ODSC East 2019 Warm-Up: AI for Engineers
Daniel Gerlanc
President, Enplus Advisors Inc.
Programming with Data: Python and Pandas
Whether in R, MATLAB, Stata, or Python, modern data analysis, for many researchers, requires some kind of programming. The preponderance of tools and specialized languages for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.
In this workshop, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality: loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
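To make those core operations concrete, here is a minimal sketch using a small hypothetical dataset (the column names and values are illustrative; in a real analysis the data would typically arrive via `pd.read_csv` or a similar loader):

```python
import pandas as pd

# Hypothetical sales data standing in for a file you might load
# with pd.read_csv("sales.csv").
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "units": [10, 15, 7, 12],
    "price": [2.5, 2.5, 3.0, 3.0],
})

# Filtering: keep only the rows matching a condition.
east = df[df["region"] == "East"]

# Transforming: derive a new column from existing ones.
df["revenue"] = df["units"] * df["price"]

# Grouping: aggregate revenue per region.
totals = df.groupby("region")["revenue"].sum()
print(totals)
```

Each of these steps returns a new object you can inspect interactively, which is what makes Pandas well suited to exploratory work.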
Daniel Gerlanc has worked as a data scientist for more than a decade and written software professionally for 15 years. He spent 5 years as a quantitative analyst with two Boston hedge funds before starting Enplus Advisors. At Enplus, he works with clients on data science and custom software development, with a particular focus on projects requiring expertise in both areas. He teaches data science and software development from introductory through advanced levels. He has coauthored several open source R packages, published in peer-reviewed journals, and is active in local predictive analytics groups.
Scott Haines
Principal Software Engineer, Twilio
Real-ish Time Predictive Analytics with Spark Structured Streaming
In this workshop we will dive deep into what it takes to build and deliver an always-on “real-ish time” predictive analytics pipeline with Spark Structured Streaming.
The core of the workshop addresses a common but difficult problem: we have an unbounded time-series dataset with no labeled data, and we need to understand the substructure of that chaos in order to apply common supervised and statistical modeling techniques to the data in a streaming fashion.
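Spark Structured Streaming itself requires a running Spark session, but the underlying idea, incrementally discovering substructure in unlabeled data as it arrives in micro-batches, can be sketched in plain Python with scikit-learn's `MiniBatchKMeans` (a deliberate stand-in, not the workshop's Spark pipeline; the two "regimes" below are synthetic):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
model = MiniBatchKMeans(n_clusters=2, random_state=0)

# Simulate an unbounded stream arriving in micro-batches,
# drawn from two hidden regimes we never labeled.
for _ in range(50):
    batch = np.vstack([
        rng.normal(0.0, 0.3, size=(20, 2)),   # regime A
        rng.normal(5.0, 0.3, size=(20, 2)),   # regime B
    ])
    model.partial_fit(batch)  # incremental update, one micro-batch at a time

# The discovered cluster ids can then serve as labels for
# downstream supervised or statistical modeling.
labels = model.predict(np.array([[0.0, 0.0], [5.0, 5.0]]))
```

In a Structured Streaming pipeline the same pattern appears as stateful transformations applied per micro-batch rather than an explicit Python loop.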
Scott Haines is a full-stack engineer with a current focus on real-time, highly available, trustworthy analytics systems. He currently works at Twilio as Principal Engineer and Tech Lead of the Voice Insights team, where he helped drive Spark adoption and streaming pipeline architectures. Prior to Twilio, he wrote backend Java APIs for Yahoo Games, as well as a real-time game ranking/ratings engine (built on Storm) that provided personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo at Flurry Analytics, where he wrote the alerts/notifications system for mobile.
Leonardo De Marchi
Head of Data Science and Analytics, Badoo
Modern and Old Reinforcement Learning
Reinforcement Learning has recently made great strides in industry as one of the leading techniques for sequential decision making and learning control policies.
In this presentation we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms.
We will use OpenAI Gym to try out our RL algorithms.
We will then explore other RL frameworks and more advanced concepts, such as policy gradient methods and Deep Reinforcement Learning, which have recently transformed the field.
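The fundamentals need no framework at all. Below is a minimal tabular Q-learning sketch on a hypothetical five-state corridor that stands in for a Gym environment (the environment, reward scheme, and hyperparameters are illustrative choices, not workshop material):

```python
import random

# A toy 5-state corridor: start at state 0, reward 1.0 for reaching state 4.
# Actions: 0 = left, 1 = right. This stands in for a Gym environment's step().
N_STATES, ACTIONS = 5, (0, 1)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

# Q[s][a] estimates the return of taking action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.3

random.seed(0)
for episode in range(300):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection: explore sometimes, else exploit.
        a = random.choice(ACTIONS) if random.random() < epsilon else Q[s].index(max(Q[s]))
        nxt, r, done = step(s, a)
        # Q-learning update: move toward reward plus discounted best future value.
        Q[s][a] += alpha * (r + gamma * max(Q[nxt]) - Q[s][a])
        s = nxt

# The learned greedy policy should prefer moving right in every non-terminal state.
policy = [q.index(max(q)) for q in Q[:-1]]
```

Gym's value is that it replaces the hand-rolled `step` function above with a catalogue of standard environments sharing this same interface.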
Leonardo De Marchi holds a Master's in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks such as JustGiving. He now works as Lead Data Scientist at Badoo, the largest dating site with over 360 million users. He is also the lead instructor at ideai.io, a company specialized in Deep Learning and Machine Learning training, and a contractor for the European Commission.
Sourav Dey, PhD
CTO, Manifold
Reproducible Data Science Using Orbyter
Artificial Intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment at enterprise scale? It is imperative to plan for deployment from day one, both in tool selection and in the feedback and development process. Additionally, just as DevOps is about people working at the intersection of development and operations, there are now people working at the intersection of data science and software engineering who need to be integrated into the team with tools and support.
At Manifold, we’ve developed the Lean AI process to streamline machine learning projects and the open-source Orbyter package for Docker-first data science to help your engineers work as an integrated part of your development and production teams. In this workshop, Sourav and Alex will focus heavily on the DevOps side of things, demonstrating how to use Orbyter to spin up data science containers and discussing experiment management as part of the Lean AI process.
Sourav Dey & Alexander Ng (co-presenters)
As CTO for Manifold, Sourav is responsible for the overall delivery of data science and data product services to make clients successful. Before Manifold, Sourav led teams to build data products across the technology stack, from smart thermostats and security cams (Google / Nest) to power grid forecasting (AutoGrid) to wireless communication chips (Qualcomm). He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He earned his PhD, MS, and BS degrees from MIT in Electrical Engineering and Computer Science.
Alexander Ng is a Senior Data Engineer at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Prior to Manifold, Alex served as both a Sales Engineering Tech Lead and a DevOps Tech Lead for Kyruus, a startup that built SaaS products for enterprise healthcare organizations. Alex got his start as a Software Systems Engineer at the MITRE Corporation and the Naval Undersea Warfare Center in Newport, RI. His recent projects at the intersection of systems and machine learning continue to combine a deep understanding of the entire development lifecycle with cutting-edge tools and techniques. Alex earned his Bachelor of Science degree in Electrical Engineering from Boston University, and is an AWS Certified Solutions Architect.
Kirk Borne
Principal Data Scientist, Booz Allen
Becoming The Complete Data Scientist with Data Literacy and Data Storytelling
I will review some of the key data literacy components that contribute to successful data science in real-world applications. In discussing these concepts, I will give examples through the art of data storytelling, which aims to answer the core questions that your clients, colleagues, and stakeholders want answered: What? So what? Now what? By focusing your effort on the user questions and requirements, which drive your project’s data and modeling activities, which in turn fuel your final data products and project deliverables, you will establish yourself as a key contributor to any analytics team. Your technical skills may bring you customers, but it is not the technical work you deliver (i.e., your successes) that brings your customers back. What brings customers back is your customers’ successes, which are nurtured and grown through clear explanations of the data, the modeling activities, and the results, which they can then share with others.
Kirk Borne is a data scientist and an astrophysicist who has used his talents at Booz Allen since 2015. He was professor of astrophysics and computational science at George Mason University (GMU) for 12 years. He served as undergraduate advisor for the GMU data science program and graduate advisor in the computational science and informatics Ph.D. program.
Kirk spent nearly 20 years supporting NASA projects, including NASA’s Hubble Space Telescope as data archive project scientist, NASA’s Astronomy Data Center, and NASA’s Space Science Data Operations Office. He has extensive experience in large scientific databases and information systems, including expertise in scientific data mining. He was a contributor to the design and development of the new Large Synoptic Survey Telescope, for which he contributed in the areas of science data management, informatics and statistical science research, galaxies research, and education and public outreach.
Andreas Mueller
Ph.D., Author, Lecturer, Core Contributor of scikit-learn
Introduction to Machine Learning
Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your own research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data, and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
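That whole round trip, formalizing input-output pairs, fitting a model, and evaluating it on held-out data, takes only a few lines with scikit-learn. This sketch uses the built-in iris dataset purely for illustration; the dataset, model choice, and split ratio are all stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Input-output pairs: X holds the inputs x, y the outputs.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the pairs to evaluate generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # "learn" the function mapping x to y
acc = accuracy_score(y_test, model.predict(X_test))
print(acc)
```

The key discipline is that accuracy is measured on examples the model never saw during fitting; scoring on the training data would overstate how well the learned function generalizes.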
Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he completed his PhD thesis at the Institute for Computer Science at the University of Bonn. After working for a year as a machine learning scientist at the Amazon Development Center Germany in Berlin, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has for several years been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, and has authored and contributed to a number of open source projects related to machine learning.
Francesco Mosconi
Ph.D. in Physics and Data Scientist at Catalit LLC, Instructor at Udemy
Pre-trained Models, Transfer Learning, and Advanced Keras Features
You have been using Keras for deep learning models and are ready to bring your skills to the next level. In this workshop we will explore the use of pre-trained networks for image classification, transfer learning to adapt a pre-trained network to your use case, multi-GPU training, data augmentation, Keras callbacks, and support for different kernels.
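The transfer-learning pattern in Keras boils down to freezing a pre-trained base and training a new head on top of it. The sketch below uses a tiny randomly initialized base as a stand-in for a real pre-trained network (which you might instead load via `keras.applications`), so it illustrates the mechanics rather than a realistic image task; all shapes and data are synthetic:

```python
import numpy as np
from tensorflow import keras

# Stand-in for a pre-trained base; in practice you would load one, e.g.
# keras.applications.MobileNetV2(include_top=False, ...).
base = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
])
base.trainable = False  # freeze the "pre-trained" weights

# New classification head for our own task.
model = keras.Sequential([
    base,
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Train only the head on a small synthetic dataset.
X = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 3, size=64)
model.fit(X, y, epochs=1, verbose=0)
```

Because the base is frozen, only the head's kernel and bias are trainable; un-freezing some top layers of the base afterward, at a low learning rate, is the usual fine-tuning step.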
Francesco Mosconi, Ph.D. in Physics, is a Data Scientist at Catalit LLC and an instructor at Udemy. He was formerly co-founder and Chief Data Officer at Spire, a YC-backed company that invented the first consumer wearable device capable of continuously tracking respiration and physical activity. He is a machine learning and Python expert and also served as Data Science lead instructor at General Assembly and The Data Incubator.
Senior Software Engineer at Comet.ML
Easy Visualizations for Deep Learning
Visualizations are important in order to debug and understand how a Deep Learning model is representing a problem. In this talk, I will introduce a layer of software (ConX) that was developed on top of Keras in Jupyter Notebooks for making useful (and beautiful) visualizations of activations of a neural network. We will develop a model from scratch, train it, test it, and explore various tools for visualizing learning over time in representational space.