Reproducibility and Dependencies for Jupyter Notebooks

Abstract: 

Project Thoth focuses on helping developers (including data scientists) in their daily tasks. One important requirement when writing code or running experiments is reproducibility. In machine learning especially, others must be able to re-run experiments in the same environment the creator used. One of the first tasks is selecting the dependencies (e.g., pandas for data exploration and manipulation, or TensorFlow for training a model).
In this talk, you will learn which practices prevent reproducibility, and why.

To guarantee reproducibility, you must pin every dependency, direct and transitive, to a specific version. You must also record the hashes used to verify each package's provenance, for security reasons (check these docs to learn more about security in software stacks). To be even more precise, the Python version, operating system, and hardware all influence the code’s behavior. You should share all of this information so that other users can observe the same behavior and obtain similar results.
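As a rough illustration of the information listed above, the sketch below uses only the Python standard library. It records the interpreter version, operating system, and CPU architecture, lists the installed packages as exact `name==version` pins, and computes the SHA-256 hash of a downloaded artifact. pip's hash-checking mode compares this kind of hash against a `--hash=sha256:<digest>` entry in a requirements file. This is a minimal sketch, not how jupyterlab-requirements or Thoth work internally. The function names (`environment_snapshot`, `pinned_requirements`, `sha256_of`) are hypothetical. "Hardware" is assumed here to mean only the CPU architecture reported by `platform.machine()`, so details such as GPUs or CUDA versions are not captured.

```python
from __future__ import annotations

import hashlib
import platform
from importlib import metadata


def environment_snapshot() -> dict:
    """Hypothetical helper: collect interpreter, OS, architecture, and package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            packages[name.lower()] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        # Assumption: CPU architecture stands in for "hardware".
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items())),
    }


def pinned_requirements(snapshot: dict) -> list[str]:
    """Hypothetical helper: render installed packages as exact `name==version` pins."""
    return [f"{name}=={version}" for name, version in snapshot["packages"].items()]


def sha256_of(path: str) -> str:
    """Hash a downloaded file (e.g. a wheel) to compare against a pinned hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large artifacts are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    snap = environment_snapshot()
    print(snap["python_version"], snap["os"], snap["machine"])
    for line in pinned_requirements(snap):
        print(line)
```

Note that a snapshot like this only captures what is installed. It does not tell you which packages are direct and which are transitive dependencies, which is one reason dedicated lock-file tooling is preferable to ad hoc dumps.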

Project Thoth aims to help data scientists handle this task in the easiest and smartest way possible, so that they can focus on more important and challenging problems. Today you will learn about the jupyterlab-requirements library for dependency management.

Bio: 

Francesco is a senior data scientist/software engineer at Red Hat and part of the AI Centre of Excellence (AICoE) and the Office of the CTO. He works with the Thoth team, which created a recommender system that helps developers (including data scientists) focus on important problems by offloading many tasks to automated pipelines and bots. He is passionate about AI, software, and space. He previously worked on a research project with the European Space Agency (ESA) and industrial partners, combining AI and space to create a recommender system for satellite design. He loves reading, traveling, and learning languages.

Open Data Science
