ODSC Webinar Calendar

ODSC’s free webinar series educates our community on the languages, tools, and topics of AI and Data Science.


ODSC East 2019 Warm-Up: AI for Engineers

February 20th, 1 pm – 3 pm EST

This event will feature four 30-minute tutorials presented by our distinguished speakers, listed below. These sessions will highlight some of the most integral topics, tools, and languages in AI for engineers, and give attendees a preview of what can be expected at ODSC East 2019.



Click here for Webinar Access

Daniel Gerlanc
President, Enplus Advisors Inc.

Programming with Data: Python and Pandas

Whether in R, MATLAB, Stata, or Python, modern data analysis, for many researchers, requires some kind of programming. The preponderance of tools and specialized languages for data analysis suggests that general purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.

In this workshop, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality, specifically, loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
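As a rough illustration of the four core operations named above (loading, filtering, grouping, and transforming), here is a minimal sketch. It is not taken from the workshop materials: the inline CSV data and the column names `city`, `year`, and `sales` are made up for the example, and it assumes the `pandas` package is installed.

```python
import io

import pandas as pd

# Hypothetical data, inlined so the example is self-contained.
CSV_TEXT = """city,year,sales
Boston,2017,10
Boston,2018,14
Austin,2017,7
Austin,2018,9
"""

# Loading: read CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Filtering: keep only the rows for one year using a boolean mask.
recent = df[df["year"] == 2018]

# Grouping: total sales per city.
total_by_city = df.groupby("city")["sales"].sum()

# Transforming: each row's share of its city's total sales.
df["share_of_city"] = df["sales"] / df.groupby("city")["sales"].transform("sum")

print(recent)
print(total_by_city)
print(df)
```

Using `transform` rather than `sum` in the last step keeps the result aligned with the original rows, which is what makes the per-row division possible.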

Presenter bio

Daniel Gerlanc has worked as a data scientist for more than a decade and written software professionally for 15 years. He spent 5 years as a quantitative analyst with two Boston hedge funds before starting Enplus Advisors. At Enplus, he works with clients on data science and custom software development with a particular focus on projects requiring an expertise in both areas. He teaches data science and software development at introductory through advanced levels. He has coauthored several open source R packages, published in peer-reviewed journals, and is active in local predictive analytics groups.

Scott Haines
Principal Software Engineer, Twilio

Real-ish Time Predictive Analytics with Spark Structured Streaming

In this workshop we will dive deep into what it takes to build and deliver an always-on “real-ish time” predictive analytics pipeline with Spark Structured Streaming.

The core focus of the workshop material will be on how to solve a common complex problem in which we have no labeled data in an unbounded timeseries dataset and need to understand the substructure of said chaos in order to apply common supervised and statistical modeling techniques to our data in a streaming fashion.

Presenter bio

Scott Haines is a full stack engineer with a current focus on real-time, highly available, trustworthy analytics systems. He is currently working at Twilio (as Principal Engineer / Tech Lead of the Voice Insights team) where he helped drive Spark adoption and streaming pipeline architectures. Prior to Twilio, he wrote the backend Java APIs for Yahoo Games, as well as the real-time game ranking/ratings engine (built on Storm) to provide personalized recommendations and page views for 10 million customers. He finished his tenure at Yahoo working for Flurry Analytics where he wrote the alerts/notifications system for mobile.

Leonardo De Marchi
Head of Data Science and Analytics, Badoo

Modern and Old Reinforcement Learning

Reinforcement Learning has recently made great progress in industry as one of the best techniques for sequential decision making and control policies.
In this presentation we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms.
We will use OpenAI Gym to try out our RL algorithms.
We will then explore other RL frameworks and more complex concepts such as policy gradient methods and Deep Reinforcement Learning, which recently changed the field of Reinforcement Learning.
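To give a flavor of the fundamentals mentioned above, here is a minimal sketch of tabular Q-learning, one of the classic RL algorithms. It is not the presenter's code and does not use OpenAI Gym: the five-state chain environment, the `step` function, and all hyperparameters are assumptions chosen to keep the example self-contained.

```python
import random

N_STATES = 5      # states 0..4 on a line; state 4 is the goal (hypothetical toy task)
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: move along the chain; reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in every non-terminal state after training.
policy = [row.index(max(row)) for row in q_table[:-1]]
print(policy)
```

In this toy task the learned greedy policy steps right in every state. With a Gym environment, the hand-written `step` function would be replaced by the environment's own step call, while the update rule stays the same.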

Presenter bio

Leonardo De Marchi holds a Master’s degree in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks, like Justgiving. He now works as Lead Data Scientist at Badoo, the largest dating site with over 360 million users. He is also the lead instructor at ideai.io, a company specialized in Deep Learning and Machine Learning training, and a contractor for the European Commission.

Sourav Dey, PhD
CTO, Manifold

Reproducible Data Science Using Orbyter

Artificial Intelligence is already helping many businesses become more responsive and competitive, but how do you move machine learning models efficiently from research to deployment at enterprise scale? It is imperative to plan for deployment from day one, both in tool selection and in the feedback and development process. Additionally, just as DevOps is about people working at the intersection of development and operations, there are now people working at the intersection of data science and software engineering who need to be integrated into the team with tools and support.

At Manifold, we’ve developed the Lean AI process to streamline machine learning projects and the open-source Orbyter package for Docker-first data science to help your engineers work as an integrated part of your development and production teams. In this workshop, Sourav and Alex will focus heavily on the DevOps side of things, demonstrating how to use Orbyter to spin up data science containers and discussing experiment management as part of the Lean AI process.

Sourav Dey & Alex Ng (co-presenters) bios

As CTO for Manifold, Sourav is responsible for the overall delivery of data science and data product services to make clients successful. Before Manifold, Sourav led teams to build data products across the technology stack, from smart thermostats and security cams (Google / Nest) to power grid forecasting (AutoGrid) to wireless communication chips (Qualcomm). He holds patents for his work, has been published in several IEEE journals, and has won numerous awards. He earned his PhD, MS, and BS degrees from MIT in Electrical Engineering and Computer Science.

Alexander Ng is a Senior Data Engineer at Manifold, an artificial intelligence engineering services firm with offices in Boston and Silicon Valley. Prior to Manifold, Alex served as both a Sales Engineering Tech Lead and a DevOps Tech Lead for Kyruus, a startup that built SaaS products for enterprise healthcare organizations. Alex got his start as a Software Systems Engineer at the MITRE Corporation and the Naval Undersea Warfare Center in Newport, RI. His recent projects at the intersection of systems and machine learning continue to combine a deep understanding of the entire development lifecycle with cutting-edge tools and techniques. Alex earned his Bachelor of Science degree in Electrical Engineering from Boston University, and is an AWS Certified Solutions Architect.


Leveraging Apache Arrow to improve PySpark performance

February 28th, 2019
1 pm – 2 pm India Standard Time
Click here to register



Click here for Webinar Access

Vipul Modi
Software Engineer and Spark Specialist at Qubole

Leveraging Apache Arrow to improve PySpark performance

Abstract:
Apache Arrow is an in-memory columnar data format that is used in Spark to efficiently transfer data between JVM and Python processes. This currently is most beneficial to Python users who work with Pandas/NumPy data. In this webinar, we will learn how PySpark works and how Spark uses Arrow to improve the performance of Python UDFs. We will also learn how to use this new feature and see real performance gains by enabling and disabling Arrow optimizations.

Agenda:
1. How PySpark works
2. What is Apache Arrow
3. How Arrow helps PySpark
4. Demo

Presenter bio

Vipul is a widely experienced software engineer with a demonstrated history of working in the internet industry, including software giants InMobi, Oracle, Microsoft, Flipkart, and now Qubole, where he is a recognized Spark specialist. Skilled in Java, SQL, ROR and Big Data Technologies (esp. Apache Spark), Vipul is a strong engineering professional backed with a masters/MSc. (Tech) focused in Information Systems from Birla Institute of Technology and Science.


OmniSci and RAPIDS: An End-to-End Open-Source Data Science Workflow

POSTPONED – new date will be confirmed soon

Randy Zwitch
Senior Developer Advocate at OmniSci

OmniSci and RAPIDS: An End-to-End Open-Source Data Science Workflow

In this session, attendees will learn how the OmniSci GPU-accelerated SQL engine fits into the overall RAPIDS partner ecosystem for open-source GPU analytics. Using open bike-share data, users will learn how to ingest streaming data from Apache Kafka into OmniSci, perform descriptive statistics and feature engineering using both SQL and cuDF with Python and return the results as a GPU DataFrame. By the end of the session, attendees should feel comfortable that an entire data science workflow can be accomplished using tools from the RAPIDS eco-system, all without the data ever leaving the GPU.

Topics to be highlighted:
– What is RAPIDS? (discussion of NVIDIA open-source RAPIDS project, how it relates to Apache Arrow, etc.)
– What is OmniSci and how does it fit into the RAPIDS eco-system
– Example:
  – Ingesting a data stream from Apache Kafka into OmniSci
  – Using pymapd (Python) to query data from OmniSci and do basic visualizations
  – Using cuDF to do data cleaning and feature engineering
  – Showing how cuDF dataframes can be passed to machine learning libraries like TensorFlow, PyTorch or XGBoost

Presenter bio

Randy Zwitch is a Senior Developer Advocate at OmniSci, enabling customers and community users alike to utilize OmniSci to its fullest potential. With broad industry experience in Energy, Digital Analytics, Banking, Telecommunications and Media, Randy brings a wealth of knowledge across verticals as well as an in-depth knowledge of open-source tools for analytics.


Previous Webinars


Check out our previous AI talks at learnai.odsc.com below


Kubeflow and Beyond: Automation of Model Training, Deployment, Testing, Monitoring and Retraining

Click here to access free recording


Stepan Pushkarev 
CTO, Hydrosphere.io

Ilnur Garifullin
ML Engineer, Hydrosphere.io

Abstract

Very often, the workflow of training models and delivering them to a production environment involves a great deal of manual work. That could mean building a Docker image and deploying it to a Kubernetes cluster, packaging the model as a Python package and installing it into your Python application, or even changing your Java classes with the defined weights and recompiling the whole project. Not to mention that all of this should be followed by testing your model’s performance. It can hardly be called “continuous delivery” if you do it all manually. Imagine you could run the whole process of assembling/training/deploying/testing/running a model via a single command in your terminal.

In this webinar, we will present a way to build the whole workflow of data gathering/model training/model deployment/model testing into a single flow and run it with a single command.

Presenter bio: Stepan Pushkarev

Stepan Pushkarev is a CTO of Hydrosphere.io. His background is in the engineering of data platforms. He spent the last couple of years building continuous delivery and monitoring tools for machine learning applications as well as designing streaming data platforms. He works closely with data scientists to make them productive and successful in their daily operations.

Presenter bio: Ilnur Garifullin

Ilnur Garifullin is an ML Engineer at Hydrosphere.io, focused on bringing the company’s latest research and platform developments into the day-to-day practice of Hydrosphere.io users.


ODSC East 2019 Warm-Up: Machine Learning and Deep Learning

Click here to access free recording


Dr. Kirk Borne
Principal Data Scientist

Becoming The Complete Data Scientist with Data Literacy and Data Storytelling

I will review some of the key data literacy components that contribute to successful data science in real world applications. In discussing these concepts, I will give examples through the art of data storytelling, which aims to answer the core questions that your clients, colleagues, and stakeholders want to have answered: What? So what? Now what? By focusing your effort on addressing the user questions and user requirements, which then drive your project’s data and modeling activities, which then fuel your final data products and project deliverables, you will establish yourself as a key contributor to any analytics team. Your technical skills may bring you customers, but it’s not the technical stuff that you know (i.e., your successes) that brings your customers back. What brings customers back is your customers’ successes, which are nurtured and grown through clear explanations of the data, the modeling activities, and the results, which they can then share with others.

Presenter bio

Kirk Borne is a data scientist and an astrophysicist who has used his talents at Booz Allen since 2015. He was professor of astrophysics and computational science at George Mason University (GMU) for 12 years. He served as undergraduate advisor for the GMU data science program and graduate advisor in the computational science and informatics Ph.D. program.

Kirk spent nearly 20 years supporting NASA projects, including NASA’s Hubble Space Telescope as data archive project scientist, NASA’s Astronomy Data Center, and NASA’s Space Science Data Operations Office. He has extensive experience in large scientific databases and information systems, including expertise in scientific data mining. He was a contributor to the design and development of the new Large Synoptic Survey Telescope, for which he contributed in the areas of science data management, informatics and statistical science research, galaxies research, and education and public outreach.

Andreas Mueller, Ph.D.
Author, Lecturer, Core Contributor of scikit-learn

Introduction to Machine Learning

Machine learning has become an indispensable tool across many areas of research and commercial applications. From text-to-speech for your phone to detecting the Higgs boson, machine learning excels at extracting knowledge from large amounts of data. This talk will give a general introduction to machine learning, as well as introduce practical tools for you to apply machine learning in your research. We will focus on one particularly important subfield of machine learning, supervised learning. The goal of supervised learning is to “learn” a function that maps inputs x to an output y, by using a collection of training data consisting of input-output pairs. We will walk through formalizing a problem as a supervised machine learning problem, creating the necessary training data and applying and evaluating a machine learning algorithm. The talk should give you all the necessary background to start using machine learning yourself.
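The supervised learning setup described above can be sketched in a few lines. This is only an illustration, not material from the talk: it fits a straight line by ordinary least squares in plain Python, and the training and held-out pairs are made-up numbers. In practice a library such as scikit-learn would replace the hand-written `fit_line`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Training data: hypothetical input-output pairs (x, y).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.1, 4.9, 7.2, 8.8]

# "Learning" the function: estimate the parameters from the pairs.
slope, intercept = fit_line(train_x, train_y)


def predict(x):
    """The learned function mapping an input x to a predicted output y."""
    return slope * x + intercept


# Evaluation: mean squared error on pairs not used for fitting.
test_x = [5.0, 6.0]
test_y = [11.0, 13.1]
mse = sum((predict(x) - y) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)
print(slope, intercept, mse)
```

Keeping the evaluation pairs separate from the training pairs is the key design choice here: the error is measured on inputs the fitted function has never seen.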

Presenter bio

Andreas Mueller received his MS degree in Mathematics (Dipl.-Math.) in 2008 from the Department of Mathematics at the University of Bonn. In 2013, he finalized his PhD thesis at the Institute for Computer Science at the University of Bonn. After working as a machine learning scientist at the Amazon Development Center Germany in Berlin for a year, he joined the Center for Data Science at New York University at the end of 2014. In his current position as assistant research engineer at the Center for Data Science, he works on open source tools for machine learning and data science. He has been one of the core contributors to scikit-learn, a machine learning toolkit widely used in industry and academia, for several years, and has authored and contributed to a number of open source projects related to machine learning.

Francesco Mosconi
Ph.D. in Physics and Data Scientist at Catalit LLC, Instructor at Udemy

Pre-trained models, Transfer Learning and Advanced Keras Features

You have been using Keras for deep learning models and are ready to bring your skills to the next level. In this workshop we will explore the use of pre-trained networks for image classification, transfer learning to adapt a pre-trained network to your use case, multi-GPU training, data augmentation, Keras callbacks and support for different kernels.

Presenter bio

Francesco Mosconi, Ph.D. in Physics and Data Scientist at Catalit LLC, Instructor at Udemy. Formerly co-founder and Chief Data Officer at Spire, a YC-backed company that invented the first consumer wearable device capable of continuously tracking respiration and physical activity. Machine Learning and Python expert. Also served as Data Science lead instructor at General Assembly and The Data Incubator.

Douglas Blank
Senior Software Engineer at Comet.ML

Easy Visualizations for Deep Learning

Visualizations are important in order to debug and understand how a Deep Learning model is representing a problem. In this talk, I will introduce a layer of software (ConX) that was developed on top of Keras in Jupyter Notebooks for making useful (and beautiful) visualizations of activations of a neural network. We will develop a model from scratch, train it, test it, and explore various tools for visualizing learning over time in representational space.

Presenter bio

Doug Blank is now a Senior Software Engineer at Comet.ML, a start-up in New York City. Comet.ML helps data scientists and engineers track, manage, replicate, and analyze machine learning experiments.
Doug was a professor of Computer Science for 18 years at Bryn Mawr College, a small, all-women’s liberal arts college outside of Philadelphia. He has been working on artificial neural networks for almost 30 years. His focus has been on creating models to make analogies, and for use with robot control systems. He is one of the core developers of ConX.

Tuning the untunable: Lessons for tuning expensive deep learning functions

Click here to access free recording

Patrick Hayes,
CTO & Co-Founder at SigOpt

Tuning the untunable: Lessons for tuning expensive deep learning functions

Models with lengthy training cycles, typically found in deep learning, can be extremely expensive to train and tune. In certain instances, this high cost may even render tuning infeasible for a particular model. Even if tuning is feasible, it is often extremely expensive. Popular methods for tuning these types of models, such as evolutionary algorithms, typically require several orders of magnitude more time and compute than other methods. And techniques like parallelism often come with a performance-degradation trade-off that results in the use of many more expensive computational resources. This leaves most teams with few good options for tuning particularly expensive deep learning functions.

But new methods related to task sampling in the tuning process create the chance for teams to dramatically lower the cost of tuning these models. This method, referred to as multitask optimization, combines “strong anytime performance” from bandit-based methods with “strong eventual performance” of Bayesian optimization. As a result, this process can unlock tuning for some deep learning models that have particularly lengthy training and tuning cycles.

During this talk, Patrick Hayes, CTO & Co-Founder of SigOpt, walks through a variety of methods for training models with lengthier training cycles before diving deep on this multitask optimization functionality. The rest of the talk will focus on how this type of method works and explain the ways in which deep learning experts are deploying it today. Finally, we will talk through the implications of early findings in this area of research and next steps for exploring this functionality further. This is a particularly valuable and interesting talk for anyone who is working with large data sets or complex deep learning models.

Presenter bio

Patrick is happiest when building the most efficient architecture to reliably scale complex systems. He is responsible for the innovation and evolution of SigOpt’s products, and for evangelizing the value they bring to our customers. Prior to SigOpt, Patrick led engineering efforts at Foursquare to develop passive local recommendations and supported a team that built a more scalable approach to user growth experimentation. Before Foursquare, Patrick was a software engineer at Facebook and Wish responsible for building systems that scaled to tens of millions of users. Patrick holds a Bachelor of Mathematics in Computer Science and Pure Mathematics from the University of Waterloo.


Data Science for Good

3 presentations focused on Data Science for Good
Click here to access free recording

Data wrangling to provide solar energy access across Africa

Brianna Schuyler, PhD
Data Science team Lead at Fenix International

Data wrangling to provide solar energy access across Africa

More than 600 million people in Sub-Saharan Africa have no access to electricity, and the majority of those have no documented financial history. These two facts set the stage for some incredibly cool applications of data science. A family can light their home and keep necessary electronics (such as a cell phone) charged using a small solar panel and battery, but most solar devices are not affordable to a vast number of people making $2 a day or less.

One solution to this problem is offering solar energy kits on a Pay As You Go basis, providing financial loans to families until they are able to pay off the cost of their device (paying around 10-20 cents per day over several months to years). However, people with severely restricted income are very susceptible to financial shocks and oftentimes exhibit sporadic payment behavior which poses an interesting prediction problem. By mining data from a variety of data sources – demographic, past repayment patterns, weather and climate data, satellite imagery, and data from the devices themselves – we can predict repayment and develop credit histories for solar energy users. This rich and unique dataset can be used to develop credit profiles for individuals, allowing them access to credit for other life-changing loans or utilities.

In addition to financial information, the solar devices themselves send millions of bits of information (from their internal temperature, to the amount of energy flowing from the panel, to the number of hours of light that the kit is providing) regularly using a GSM chip. We can identify, diagnose, and predict system malfunction using anomaly detection and classification algorithms, and even plan mobile clinic routes to fix the systems in the field. Information transferred through GSM, along with the financial data amassed through loan repayment, provide a fascinating dataset on which to model and explore. Data analysis and machine learning techniques allow increased energy access to those for whom the costs of solar were previously prohibitive, as well as increased adoption of renewable energy sources in a rapidly growing population.
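As a very simple illustration of the anomaly detection idea mentioned above, the sketch below flags device readings that sit far from the mean. It is not Fenix's method: the temperature values are invented, and the z-score rule with a threshold of 2.0 is just one basic technique chosen for the example.

```python
import statistics


def flag_anomalies(readings, threshold=2.0):
    """Return indices of readings more than `threshold` std devs from the mean."""
    mean = statistics.mean(readings)
    spread = statistics.pstdev(readings)
    if spread == 0:
        return []  # all readings identical, nothing stands out
    return [
        i for i, value in enumerate(readings)
        if abs(value - mean) / spread > threshold
    ]


# Hypothetical internal temperature readings (degrees Celsius) from one kit.
internal_temps_c = [31.0, 30.5, 32.1, 31.4, 30.9, 45.0, 31.2]

anomalies = flag_anomalies(internal_temps_c)
print(anomalies)
```

A production system would likely use more robust statistics or a trained classifier, since a single extreme value also inflates the mean and standard deviation it is compared against.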

Presenter bio

Brianna leads the data science team at Fenix International. Their work spans multiple countries, including the US, Uganda, Zambia, and Ivory Coast. She and the data team at Fenix work on a wide range of problems to help provide clean, safe, and sustainable energy to people living off the grid in Sub-Saharan Africa. She has a bachelor’s degree in Physics from Johns Hopkins University, a master’s degree in Physics from the University of Wisconsin – Madison, and a Ph.D. in Neuroscience from the University of Wisconsin – Madison. After years of particle physics and functional MRI analyses, she took a break from academia and served as a Peace Corps volunteer in Northern Uganda. She’s delighted to use her background in big data at the perfect crossroads of sustainable energy and energy access for underserved populations.

AI Ethics: Current challenges

Abhishek Gupta,
AI Ethics Researcher, Software Engineer

AI Ethics: Current challenges

This talk will highlight some of the emerging challenges when it comes to the responsible and ethical development and deployment of AI. It will use recent examples to illustrate some of the challenges and present potential strategies on how to best mitigate these issues. The talk will also highlight 2 projects coming up from the Montreal AI Ethics Institute that are aiming to concretely address some of these challenges.

Presenter bio

Abhishek Gupta is the founder of Montreal AI Ethics Institute and an AI Ethics Researcher at McGill University, Montreal, Canada. His research focuses on applied technical and policy methods to address ethical, safety and inclusivity concerns in using AI in different domains. Abhishek comes from a strong technical background, working as a Software Engineer, Machine Learning at Microsoft in Montreal.

He is also the founder of the AI Ethics community in Montreal that has more than 1350 members from diverse backgrounds who do a deep dive into AI ethics and offer public consultations to initiatives like the Montreal Declaration for Responsible AI. His work has been featured by the United Nations, Oxford, Stanford Social Innovation Review, and World Economic Forum, and he travels frequently across North America and Europe to help governments, industry and academia understand AI and how they can incorporate ethical, safe and inclusive development processes within their work. More information can be found at https://atg-abhishek.github.io

Detecting semantic bias through interpretability

Eric Schles,
Data Scientist at Microsoft

Detecting semantic bias through interpretability

In this session, we will juxtapose classical statistical interpretability techniques against cutting-edge techniques. We will show how these newer techniques allow us to interpret models like neural networks, ensembles and support vector machines. The two main new tools we will use are SHAP and LIME.

We will apply these techniques to synthetic datasets, showing how one could detect semantic bias (non-statistical bias).

Presenter bio

Eric Schles is a data scientist for Microsoft working on machine learning models in production. He is an alumnus of the Obama White House, the DA’s office in the southern district of New York, and 18F. In his spare time Eric runs the New York Data Science Meetup and plays with his cat.


Jason Prentice, Senior Manager, Data Science at S&P Global Market Intelligence

“Mapping the Global Supply Chain Graph”

Click here to access free recording.

Mapping the Global Supply Chain Graph

Panjiva maps the network of global trade using over one billion shipping records sourced from 15 governments around the world. We perform large-scale entity extraction and entity resolution from this raw data, identifying over 8 million companies involved in international trade, located across every country in the world. Moreover, we track detailed information on the 25 million+ relationships between them, yielding a map of the global trade network with unprecedented scope and granularity. We have developed a powerful platform facilitating search, analysis, and visualization of this network as well as a data feed integrated into S&P Global’s Xpressfeed platform.

We can explore the global supply chain graph at many levels of granularity. At the micro level, we can surface the close relationships around a given company to, for example, identify overseas suppliers shared with a competitor. At the macro level, we can track patterns such as the flow of products among geographic areas or industries. By linking to S&P Global’s financial and corporate data, we can understand how supply chains flow within or between multinational corporate structures and correlate trade volumes and anomalies to financial metrics and events.

Presenter bio - Jason Prentice, Senior Manager, Data Science at S&P Global Market Intelligence

Jason Prentice leads the data team at Panjiva, where he focuses on developing the fundamental machine learning technologies that power our data collection. Before joining Panjiva as a data scientist, he researched computational neuroscience as a C.V. Starr fellow at Princeton University and earned a Ph.D. in Physics from the University of Pennsylvania.

Matthew Rubashkin, Ph.D. AI Program Director at Insight Data Science

“Building an image search service from scratch”

Click here to access free recording.

Building an image search service from scratch

We are bringing a workshop on how you would go about building your own representations, both for image and text data, and efficiently do similarity search. By the end of this workshop, you should be able to build a quick semantic search model from scratch, no matter the size of your dataset.
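To make the idea of similarity search over representations concrete, here is a minimal sketch. It is not the workshop's code: the three-dimensional vectors and file names are invented stand-ins for embeddings that a real system would compute from images or text with a trained model, and the search is a brute-force scan using cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, index, k=2):
    """Return the names of the k indexed items most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embeddings; real ones would have hundreds of dimensions.
INDEX = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "truck.jpg": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.0, 0.0], INDEX, k=2)
print(results)
```

A linear scan like this compares the query against every stored vector, so its cost grows with the dataset. Large collections typically rely on an approximate nearest neighbor index instead.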

Presenter bio - Matthew Rubashkin, Ph.D. AI Program Director at Insight Data Science

Michael Mahoney, PhD, Professor at UC Berkeley

“Matrix Algorithms at Scale: Randomization and using Alchemist to bridge the Spark-MPI gap”

Click here to access free recording.

Matrix Algorithms at Scale: Randomization and using Alchemist to bridge the Spark-MPI gap

In this talk we will describe some of the underlying randomized linear algebra techniques. Finally, we’ll describe Alchemist, a system for interfacing between Spark and existing MPI libraries that is designed to address this performance gap. The libraries can be called from a Spark application with little effort, and we illustrate how the resulting system leads to efficient and scalable performance on large datasets. We describe use cases from scientific data analysis that motivated the development of Alchemist and that benefit from this system. We’ll also describe related work on communication-avoiding machine learning, optimization-based methods that can call these algorithms, and extending Alchemist to provide an IPython notebook <=> MPI interface.

Presenter Bio - Michael Mahoney, PhD, Professor at UC Berkeley

Michael Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he spent fall 2013 at UC Berkeley co-organizing the Simons Foundation’s program on the Theoretical Foundations of Big Data Analysis.

Joshua Cook, Curriculum Developer at Databricks

“Engineering for Data Science”

Click here to access free recording.

Engineering for Data Science

This talk will discuss Docker as a tool for the data scientist, in particular in conjunction with the popular interactive programming platform, Jupyter, and the cloud computing platform, Amazon Web Services (AWS). Using Docker, Jupyter, and AWS, the data scientist can take control of their environment configuration, prototype scalable data architectures, and trivially clone their work toward replicability and communication. This talk will work toward developing a set of best practices for Engineering for Data Science.

Presenter Bio - Joshua Cook, Curriculum Developer at Databricks

Joshua Cook is a mathematician. He writes code in Bash, C, and Python and has done pure and applied computational work in geospatial predictive modeling, quantum mechanics, semantic search, and artificial intelligence. He also has ten years of experience teaching mathematics at the secondary and post-secondary level. His research interests lie in high-performance computing, interactive computing, feature extraction, and reinforcement learning. He is always willing to discuss orthogonality or to explain why Fortran is the language of the future over a warm or cold beverage.

Nisha Talagala, CTO/VP of Engineering at ParallelM

“Bringing Your Machine Learning and Deep Learning Algorithms to Life: From Experiments to Production Use”

Click here to access free recording.

Bringing Your Machine Learning and Deep Learning Algorithms to Life: From Experiments to Production Use

In this hands-on workshop, attendees will learn how to take Machine Learning and Deep Learning programs into a production use case and manage the full production lifecycle. This workshop is targeted at data scientists with some basic knowledge of Machine Learning and/or Deep Learning algorithms who would like to learn how to turn their promising experimental results on ML and DL algorithms into production success. In the first half of the workshop, attendees will learn how to develop an ML algorithm in a Jupyter notebook and transition this algorithm into an automated production scoring environment using Apache Spark. The audience will then learn how to diagnose production scenarios for their application (for example, data and model drift) and optimize their ML performance further using retraining. In the second half of the workshop, users will perform a similar exercise for Deep Learning. They will learn how to experiment with Convolutional Neural Network algorithms in TensorFlow and then deploy their chosen algorithm into production use. They will learn how to monitor the behavior of Deep Learning algorithms in production, along with approaches to optimizing production DL behavior via retraining and transfer learning.

Attendees should have basic knowledge of ML and DL algorithm types. Deep mathematical knowledge of algorithm internals is not required. All experiments will use Python. Environments will be provided in Azure for hands-on use by all attendees. Each attendee will receive an account for use during the workshop and access to the notebook environments, Spark and TensorFlow engines, as well as an ML lifecycle management environment. For the ML experiments, sample algorithms and public data sets will be provided for Anomaly Detection and Classification. For the DL experiments, sample algorithms and public data sets will be provided for Image Classification and Text Recognition.

Presenter Bio - Nisha Talagala, CTO/VP of Engineering at ParallelM

Nisha Talagala is Co-Founder, CTO/VP of Engineering at ParallelM, a startup focused on Production Machine Learning. As Fellow at SanDisk and Fellow/Lead Architect at Fusion-io, she led advanced technology development in Non-Volatile Memory and applications. Nisha has more than 15 years of expertise in software, distributed systems, machine learning, persistent memory, and flash. Nisha was also technology lead for server flash at Intel and the CTO of Gear6. Nisha earned her PhD at UC Berkeley, where she did research on distributed systems. Nisha holds 54 patents, is a frequent speaker at both industry and academic conferences, and serves on multiple technical conference program committees.

Kirk Borne, PhD, Principal Data Scientist, Executive Advisor Booz Allen Hamilton

“Solving the Data Scientist’s Dilemma – The Cold Start Problem”

Click here to access free recording.

Solving the Data Scientist's Dilemma - The Cold Start Problem

Supervised machine learning is a great tool when you have labeled training data and known classes that you are trying to predict for new, previously unseen data. But the assumptions of labeled data and known classes generally do not hold in unsupervised machine learning. So, how can you maximize the data science outcomes, benefits, and applications when faced with the cold start problem? We will discuss this challenge and some solutions with several illustrative examples.

Presenter Bio - Kirk Borne, PhD, Principal Data Scientist, Executive Advisor Booz Allen Hamilton

Kirk Borne is a data scientist and an astrophysicist who has used his talents at Booz Allen since 2015. He was professor of astrophysics and computational science at George Mason University (GMU) for 12 years. Kirk spent nearly 20 years supporting NASA projects.


Sean Patrick Gorman, PhD, Head of Technical Product Management, DigitalGlobe

Steven Pousty, Director of Developer Relations, DigitalGlobe

“How to use Satellite Imagery to be a Machine Learning Mantis Shrimp”

Click here to access free recording.

How to use Satellite Imagery to be a Machine Learning Mantis Shrimp

In this session we are going to start by showing you how satellite imagery actually allows you to “see” in more bands of color than the mantis shrimp (how about 26 bands), where each band is a massive amount of data about the earth. We will show you how you can work with this data in Jupyter notebooks to extract all sorts of information about the world. Last, we will wrap up with how to make ML models using this data, extract the features we care about, and then run them through a cloud-based processing model.

Presenter Bio - Sean Patrick Gorman, PhD, Head of Technical Product Management, DigitalGlobe

1. Sean Patrick Gorman, PhD.
Sean is the Head of Technical Product Management at DigitalGlobe helping build GBDX and next generation machine learning tools for satellite imagery. Sean received his PhD from George Mason University as the Provost’s High Potential Research Candidate, Fisher Prize winner and an INFORMS Dissertation Prize recipient.

2. Steven Pousty.
Steve is the Developer Relations lead for DigitalGlobe. He goes around and shows off all the great work the DigitalGlobe engineers do. Steve has a Ph.D. in Ecology from the University of Connecticut.

Free access to ODSC talks and content is available at our

AI Learning Accelerator

ODSC EAST | Boston

– April 30th – May 3rd, 2019 –

The World’s Largest Applied Data Science Conference

ODSC EUROPE | London

– Nov 19th – 22nd, 2019 –

Europe’s Fastest Growing Data Science Community

ODSC WEST | San Francisco

– Oct 29th – Nov 1st, 2019 –

The World’s Largest Applied Data Science Conference

Accelerate AI

Business Conference

The Accelerate AI conference series is where executives and business professionals meet the best and brightest innovators in AI and Data Science. The conference brings together top industry executives and CxOs that will help you understand how AI and data science can transform your business.

Accelerate AI East | Boston

– April 30th – May 1st, 2019 –

The ODSC summit on accelerating your business growth with AI

Accelerate AI Europe | London 

– Nov 19th – 20th, 2019 –

The ODSC summit on accelerating your business growth with AI

Accelerate AI West | San Francisco 

– Oct 29th – 30th, 2019 –

The ODSC summit on accelerating your business growth with AI