ODSC Webinar Calendar

ODSC’s free webinar series serves to educate our community on the languages, tools, and topics of AI and Data Science.


Data Science for Good

December 19th – 4 presentations focused on Data Science for Good


Data wrangling to provide solar energy access across Africa

Brianna Schuyler, PhD
Data Science team Lead at Fenix International

Time: 2 – 2:30 pm EST


More than 600 million people in Sub-Saharan Africa have no access to electricity, and the majority of those have no documented financial history. These two facts set the stage for some incredibly cool applications of data science. A family can light their home and keep necessary electronics (such as a cell phone) charged using a small solar panel and battery, but most solar devices are not affordable to a vast number of people making $2 a day or less.

One solution to this problem is offering solar energy kits on a Pay As You Go basis, providing financial loans to families until they are able to pay off the cost of their device (paying around 10-20 cents per day over several months to years). However, people with severely restricted income are very susceptible to financial shocks and often exhibit sporadic payment behavior, which poses an interesting prediction problem. By mining data from a variety of sources – demographic data, past repayment patterns, weather and climate data, satellite imagery, and data from the devices themselves – we can predict repayment and develop credit histories for solar energy users. This rich and unique dataset can be used to develop credit profiles for individuals, allowing them access to credit for other life-changing loans or utilities.

In addition to financial information, the solar devices themselves regularly send millions of bits of information (from their internal temperature, to the amount of energy flowing from the panel, to the number of hours of light that the kit is providing) using a GSM chip. We can identify, diagnose, and predict system malfunction using anomaly detection and classification algorithms, and even plan mobile clinic routes to fix the systems in the field. Information transferred through GSM, along with the financial data amassed through loan repayment, provides a fascinating dataset to model and explore. Data analysis and machine learning techniques allow increased energy access for those for whom the costs of solar were previously prohibitive, as well as increased adoption of renewable energy sources in a rapidly growing population.
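As a minimal sketch of the anomaly-detection idea described above, the snippet below flags device readings that sit far from a kit's typical value using a robust z-score (median and median absolute deviation). The temperature values, the function name, and the threshold of 3.5 are illustrative assumptions, not the pipeline used at Fenix.

```python
from statistics import median


def flag_anomalies(readings, threshold=3.5):
    """Return the indices of readings whose robust z-score exceeds threshold."""
    med = median(readings)
    # Median absolute deviation: a spread estimate that outliers barely move
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        return []  # all readings (nearly) identical, nothing to flag
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [
        i for i, r in enumerate(readings)
        if abs(0.6745 * (r - med) / mad) > threshold
    ]


# Hypothetical internal temperatures (degrees C) reported by one kit
temps = [31.0, 30.5, 32.1, 31.4, 30.9, 58.7, 31.2, 30.8]
anomalies = flag_anomalies(temps)
print(anomalies)
```

A median-based score is used here instead of the mean and standard deviation because a single extreme reading would otherwise inflate the spread estimate and hide itself.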

Presenter bio

Brianna leads the data science team at Fenix International. Their work spans multiple countries, including the US, Uganda, Zambia, and Ivory Coast. She and the data team at Fenix work on a wide range of problems to help provide clean, safe, and sustainable energy to people living off the grid in Sub-Saharan Africa. She has a bachelor’s degree in Physics from Johns Hopkins University, a master’s degree in Physics from the University of Wisconsin – Madison, and a Ph.D. in Neuroscience from the University of Wisconsin – Madison. After years of particle physics and functional MRI analyses, she took a break from academia and served as a Peace Corps volunteer in Northern Uganda. She’s delighted to use her background in big data at the perfect crossroads of sustainable energy and energy access for underserved populations.

Detecting semantic bias through interpretability

Eric Schles, Data Scientist at Microsoft

Time: 2:30 – 3 pm EST


In this session, we will juxtapose classical statistical interpretability techniques against cutting-edge techniques. We will show how these newer techniques allow us to interpret models like neural networks, ensembles and support vector machines. The two main new tools we will use are SHAP and LIME.

We will apply these techniques to synthetic datasets, showing how one could detect semantic bias (non-statistical bias).
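SHAP is built on Shapley values, so the idea can be sketched without any library. The example below computes exact Shapley values by brute force for a tiny synthetic scoring model and shows how a sensitive feature's contribution surfaces. The model, its feature names, and the input values are hypothetical. This is not the `shap` or `lime` API, and brute-force enumeration only scales to a handful of features.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input, averaged over all feature orderings."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value
            current[i] = x[i]
            new = model(current)
            totals[i] += new - prev  # marginal contribution of feature i
            prev = new
    return [t / len(orderings) for t in totals]


# Hypothetical synthetic model over (income, tenure, group), where "group"
# is a sensitive attribute that ideally should not influence the score.
def model(features):
    income, tenure, group = features
    return 0.5 * income + 0.2 * tenure - 2.0 * group


phi = shapley_values(model, x=[4.0, 5.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(dict(zip(["income", "tenure", "group"], phi)))
```

A large attribution on `group` is the kind of signal that would prompt a closer look at whether the model encodes a bias that summary statistics alone would not reveal.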

Presenter bio

Eric Schles is a data scientist for Microsoft working on machine learning models in production. He is an alumnus of the Obama White House, the DA’s office in the southern district of New York, and 18F. In his spare time, Eric runs the New York Data Science Meetup and plays with his cat.



Register Here



More presentations coming soon!


Tuning the untunable: Lessons for tuning expensive deep learning functions

January 10th, 2019

11:00 am – 1:00 pm PT

Click here to register


Patrick Hayes, CTO & Co-Founder at SigOpt


Models with lengthy training cycles, typically found in deep learning, can be extremely expensive to train and tune. In certain instances, this high cost may even render tuning infeasible for a particular model. Even when tuning is feasible, it is often extremely expensive. Popular methods for tuning these types of models, such as evolutionary algorithms, typically require several orders of magnitude more time and compute than other methods. And techniques like parallelism often come with a performance degradation trade-off that results in the use of many more expensive computational resources. This leaves most teams with few good options for tuning particularly expensive deep learning functions.

But new methods related to task sampling in the tuning process create the chance for teams to dramatically lower the cost of tuning these models. This method, referred to as multitask optimization, combines the “strong anytime performance” of bandit-based methods with the “strong eventual performance” of Bayesian optimization. As a result, this process can unlock tuning for some deep learning models that have particularly lengthy training and tuning cycles.
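To give a feel for the bandit-based side of this idea, here is a minimal sketch of successive halving: many configurations are evaluated cheaply, and only the most promising ones receive a larger budget. This is a stand-in chosen for illustration, not SigOpt's multitask optimization, and the toy loss function and learning-rate candidates are hypothetical.

```python
def successive_halving(configs, evaluate, min_budget=1, rounds=3):
    """Keep the better half of the configs each round while doubling the budget."""
    survivors = list(configs)
    budget = min_budget
    spent = 0
    for _ in range(rounds):
        # Lower loss is better; every survivor is evaluated at the current budget
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        spent += budget * len(survivors)
        survivors = scored[:max(1, len(survivors) // 2)]
        budget *= 2
    return survivors[0], spent


# Hypothetical validation loss: improves with budget and is best near lr = 0.1
def toy_loss(lr, budget):
    return (lr - 0.1) ** 2 + 1.0 / budget


candidates = [0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best, spent = successive_halving(candidates, toy_loss)
print(best, spent)
```

In this toy run the search spends 24 budget units, compared with 32 for evaluating all eight candidates at the final budget of 4, which is the kind of saving that cheap partial evaluations are meant to provide.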

During this talk, Patrick Hayes, CTO & Co-Founder of SigOpt, walks through a variety of methods for training models with lengthier training cycles before diving deep on this multitask optimization functionality. The rest of the talk will focus on how this type of method works and explain the ways in which deep learning experts are deploying it today. Finally, we will talk through the implications of early findings in this area of research and next steps for exploring this functionality further. This is a particularly valuable and interesting talk for anyone who is working with large data sets or complex deep learning models.

Presenter bio

Patrick is happiest when building the most efficient architecture to reliably scale complex systems. He is responsible for the innovation and evolution of SigOpt’s products, and for evangelizing the value they bring to our customers. Prior to SigOpt, Patrick led engineering efforts at Foursquare to develop passive local recommendations and supported a team that built a more scalable approach to user growth experimentation. Before Foursquare, Patrick was a software engineer at Facebook and Wish responsible for building systems that scaled to tens of millions of users. Patrick holds a Bachelor of Mathematics in Computer Science and Pure Mathematics from the University of Waterloo.


Data Storytelling, January 16th

1:00 – 3:00 pm EST

Click here to register

We will talk about Data Storytelling, a structured approach for communicating data insights that combines three key elements: data, visuals, and a narrative. In data storytelling, it’s key to clearly demonstrate the ‘so what’ of the analysis: why is this important to the business and its decision makers? As with all stories, you must have a beginning, a middle, and an end. Join us as we discuss the necessary components of effective data storytelling.
5 presenters:

Kate Strachnyi

Data visualization specialist and author of The Disruptors: Data Science Leaders and Journey to Data Scientist. Founder and host of Humans of Data Science (HoDS) on the Story by Data YouTube channel. I’m also an instructor at Udemy. Prior to work in data analytics, I worked in the financial services space helping clients address regulatory compliance and risk management-related issues. I enjoy spending time with my family, running, and playing guitar.

Kristen Kehrer

Kristen is #8 LinkedIn Global Top Voice 2018 – Data Science & Analytics. Since 2010, Kristen has been delivering innovative and actionable machine learning solutions as a Data Scientist across multiple industries, including utilities, healthcare, and eCommerce. She finished a BS in Mathematics in 2004, and a Master’s Degree in Applied Statistics. Kristen is the founder of Data Moves Me, LLC and is a co-founder of Data Science Live.

Eric Weber

Eric is a data scientist, educator and lifelong learner. He has worked on data science efforts in the technology, healthcare, property/real-estate and finance sectors and currently holds two positions: principal data scientist on the data management team at CoreLogic in Irvine, CA and managing director of Method Data Science. At Method, Eric helps data scientists gain real world experience in a supported setting with mentors and project leads.

Favio Vázquez

Physicist and computer engineer with a master’s degree in physical sciences. Working on Data Science and Big Data. I have a passion for science, philosophy, programming, and music. Right now I’m working on data science, machine learning and big data as Data Science Instructor at Business Science and Senior Data Scientist at Raken Data Group. Also, I’m the creator of Ciencia y Datos, a Data Science publication in Spanish. I love new challenges, working with a good team, and having interesting problems to solve, as well as applying my knowledge and expertise in science, data analysis, visualization, and automatic learning to help the world become a better place.

Sarah Nooravi

Sarah Nooravi is a lifelong learner and data geek. She has a history of delivering innovative marketing tools to help drive better business decisions in the entertainment and gaming industries at Operam and MobilityWare. She is also passionate about teaching and giving back to the community. In that spirit, she leads and coordinates monthly Machine Learning meetups at MobilityWare and mentors aspiring data scientists and engineers through groups like: WiBD, SWE and GLAD. These activities support a core motivation for Sarah: helping set up others for success in industry.

Previous Webinars


Check out our previous AI talks at learnai.odsc.com below


Jason Prentice, Senior Manager, Data Science at S&P Global Market Intelligence

“Mapping the Global Supply Chain Graph”

Click here to access free recording.


Panjiva maps the network of global trade using over one billion shipping records sourced from 15 governments around the world. We perform large-scale entity extraction and entity resolution from this raw data, identifying over 8 million companies involved in international trade, located across every country in the world. Moreover, we track detailed information on the 25 million+ relationships between them, yielding a map of the global trade network with unprecedented scope and granularity. We have developed a powerful platform facilitating search, analysis, and visualization of this network as well as a data feed integrated into S&P Global’s Xpressfeed platform.

We can explore the global supply chain graph at many levels of granularity. At the micro level, we can surface the close relationships around a given company to, for example, identify overseas suppliers shared with a competitor. At the macro level, we can track patterns such as the flow of products among geographic areas or industries. By linking to S&P Global’s financial and corporate data, we can understand how supply chains flow within or between multinational corporate structures and correlate trade volumes and anomalies to financial metrics and events.
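The micro-level query described above (finding suppliers shared with a competitor) can be sketched as a set intersection over a buyer-to-supplier graph. The shipment records, company names, and the crude name normalization standing in for entity resolution are all hypothetical. Panjiva's actual extraction and resolution methods are not described here.

```python
import re
from collections import defaultdict

# Corporate suffixes dropped during normalization (illustrative, not exhaustive)
SUFFIXES = {"ltd", "inc", "llc", "co", "corp"}


def normalize(name):
    """Crude entity resolution: lowercase, strip punctuation and suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def suppliers_by_buyer(records):
    """Build a mapping from each buyer to its set of resolved suppliers."""
    graph = defaultdict(set)
    for supplier, buyer in records:
        graph[buyer].add(normalize(supplier))
    return graph


# Hypothetical shipment records as (supplier, buyer) pairs
shipments = [
    ("ACME Components Ltd.", "Buyer A"),
    ("Delta Textiles", "Buyer A"),
    ("Acme Components", "Buyer B"),
    ("Orion Plastics Inc", "Buyer B"),
]

graph = suppliers_by_buyer(shipments)
shared = graph["Buyer A"] & graph["Buyer B"]
print(shared)
```

Without the normalization step, the two spellings of the same supplier would be treated as different companies and the shared relationship would be missed.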

Presenter bio - Jason Prentice, Senior Manager, Data Science at S&P Global Market Intelligence

Jason Prentice leads the data team at Panjiva, where he focuses on developing the fundamental machine learning technologies that power our data collection. Before joining Panjiva as a data scientist, he researched computational neuroscience as a C.V. Starr fellow at Princeton University and earned a Ph.D. in Physics from the University of Pennsylvania.

 

Matthew Rubashkin, Ph.D. AI Program Director at Insight Data Science

“Building an image search service from scratch”

Click here to access free recording.


This workshop covers how to build your own representations, for both image and text data, and how to perform similarity search over them efficiently. By the end of this workshop, you should be able to build a quick semantic search model from scratch, no matter the size of your dataset.
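The core of similarity search can be sketched in a few lines: represent each item as a vector and rank items by cosine similarity to the query. The three-dimensional embeddings and file names below are hypothetical placeholders. In practice the vectors would come from a trained model, and this brute-force scan would typically be replaced by an approximate nearest-neighbor index for large datasets.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, index, k=2):
    """Return the names of the k items most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; real ones would have hundreds of dimensions
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "kitten.jpg": [0.8, 0.2, 0.1],
    "truck.jpg": [0.0, 0.1, 0.9],
}

results = search([1.0, 0.0, 0.0], index, k=2)
print(results)
```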

Presenter bio - Matthew Rubashkin, Ph.D. AI Program Director at Insight Data Science

Michael Mahoney, PhD, Professor at UC Berkeley

“Matrix Algorithms at Scale: Randomization and using Alchemist to bridge the Spark-MPI gap”

Click here to access free recording.


In this talk we will describe some of the underlying randomized linear algebra techniques. Finally, we’ll describe Alchemist, a system for interfacing between Spark and existing MPI libraries that is designed to address this performance gap. The libraries can be called from a Spark application with little effort, and we illustrate how the resulting system leads to efficient and scalable performance on large datasets. We describe use cases from scientific data analysis that motivated the development of Alchemist and that benefit from this system. We’ll also describe related work on communication-avoiding machine learning, optimization-based methods that can call these algorithms, and extending Alchemist to provide an ipython notebook <=> MPI interface.

Presenter Bio - Michael Mahoney, PhD, Professor at UC Berkeley

Michael Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets, and he spent fall 2013 at UC Berkeley co-organizing the Simons Foundation’s program on the Theoretical Foundations of Big Data Analysis.

Joshua Cook, Curriculum Developer at Databricks

“Engineering for Data Science”

Click here to access free recording.


This talk will discuss Docker as a tool for the data scientist, in particular in conjunction with the popular interactive programming platform, Jupyter, and the cloud computing platform, Amazon Web Services (AWS). Using Docker, Jupyter, and AWS, the data scientist can take control of their environment configuration, prototype scalable data architectures, and trivially clone their work toward replicability and communication. This talk will work toward developing a set of best practices for Engineering for Data Science.

Presenter Bio - Joshua Cook, Curriculum Developer at Databricks

Joshua Cook is a mathematician. He writes code in Bash, C, and Python and has done pure and applied computational work in geospatial predictive modeling, quantum mechanics, semantic search, and artificial intelligence. He also has ten years of experience teaching mathematics at the secondary and post-secondary level. His research interests lie in high-performance computing, interactive computing, feature extraction, and reinforcement learning. He is always willing to discuss orthogonality or to explain why Fortran is the language of the future over a warm or cold beverage.

Nisha Talagala, CTO/VP of Engineering at ParallelM

“Bringing Your Machine Learning and Deep Learning Algorithms to Life: From Experiments to Production Use”

Click here to access free recording.


In this hands-on workshop, attendees will learn how to take Machine Learning and Deep Learning programs into a production use case and manage the full production lifecycle. This workshop is aimed at data scientists with some basic knowledge of Machine Learning and/or Deep Learning algorithms who would like to learn how to turn their promising experimental results on ML and DL algorithms into production success. In the first half of the workshop, attendees will learn how to develop an ML algorithm in a Jupyter notebook and transition this algorithm into an automated production scoring environment using Apache Spark. The audience will then learn how to diagnose production scenarios for their application (for example, data and model drift) and optimize their ML performance further using retraining. In the second half of the workshop, users will perform a similar exercise for Deep Learning. They will learn how to experiment with Convolutional Neural Network algorithms in TensorFlow and then deploy their chosen algorithm into production use. They will learn how to monitor the behavior of Deep Learning algorithms in production and approaches to optimizing production DL behavior via retraining and transfer learning.

Attendees should have basic knowledge of ML and DL algorithm types. Deep mathematical knowledge of algorithm internals is not required. All experiments will use Python. Environments will be provided in Azure for hands-on use by all attendees. Each attendee will receive an account for use during the workshop and access to the notebook environments, Spark and TensorFlow engines, as well as an ML lifecycle management environment. For the ML experiments, sample algorithms and public data sets will be provided for Anomaly Detection and Classification. For the DL experiments, sample algorithms and public data sets will be provided for Image Classification and Text Recognition.
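As a rough illustration of what diagnosing data drift can look like, the sketch below measures how far the mean of a live feature has moved from its training mean, in units of the training standard deviation. The feature values and the alert threshold of 3 are hypothetical, and this simple check is only one of many possible drift tests. It is not the workshop's tooling.

```python
from statistics import mean, stdev


def drift_score(train, live):
    """Shift of the live mean, measured in training standard deviations."""
    return abs(mean(live) - mean(train)) / stdev(train)


# Hypothetical values of one input feature
train = [10.0, 11.0, 9.0, 10.5, 9.5]
live_ok = [10.2, 9.8, 10.1]
live_shifted = [14.0, 15.0, 13.5]

THRESHOLD = 3.0  # assumed alerting threshold

for name, batch in [("live_ok", live_ok), ("live_shifted", live_shifted)]:
    score = drift_score(train, batch)
    print(name, round(score, 2), "DRIFT" if score > THRESHOLD else "ok")
```

A batch that trips the threshold would be a candidate trigger for the retraining step the workshop describes.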

Presenter Bio - Nisha Talagala, CTO/VP of Engineering at ParallelM

Nisha Talagala is Co-Founder, CTO/VP of Engineering at ParallelM, a startup focused on Production Machine Learning. As Fellow at SanDisk and Fellow/Lead Architect at Fusion-io, she led advanced technology development in Non-Volatile Memory and applications. Nisha has more than 15 years of expertise in software, distributed systems, machine learning, persistent memory, and flash. Nisha was also technology lead for server flash at Intel and the CTO of Gear6. Nisha earned her PhD at UC Berkeley on distributed systems research. Nisha holds 54 patents, is a frequent speaker at both industry and academic conferences, and serves on multiple technical conference program committees.

Kirk Borne, PhD, Principal Data Scientist, Executive Advisor Booz Allen Hamilton

“Solving the Data Scientist’s Dilemma – The Cold Start Problem”

Click here to access free recording.


Supervised machine learning is a great tool when you have labeled training data and known classes that you are trying to predict for new, previously unseen data. But the assumptions of labeled data and known classes generally do not hold in unsupervised machine learning. So, how can you maximize the data science outcomes, benefits, and applications when faced with the cold start problem? We will discuss this challenge and some solutions with several illustrative examples.

Presenter bio - Kirk Borne, PhD. Principal Data Scientist, Executive Advisor Booz Allen Hamilton

Kirk Borne is a data scientist and an astrophysicist who has used his talents at Booz Allen since 2015. He was professor of astrophysics and computational science at George Mason University (GMU) for 12 years. Kirk spent nearly 20 years supporting NASA projects.


Sean Patrick Gorman, PhD, Head of Technical Product Management, DigitalGlobe

Steven Pousty, Director of Developer Relations, DigitalGlobe

“How to use Satellite Imagery to be a Machine Learning Mantis Shrimp”

Click here to access free recording.


In this session we are going to start by showing you how satellite imagery actually allows you to “see” in more bands of color than the mantis (how about 26 bands) – each band is a massive amount of data about the earth. We will show you how you can work with this data in Jupyter notebooks to extract all sorts of information about the world. Last, we will wrap up with how to make ML models using this data, extract features we care about, and then run it through a cloud-based processing model.

Presenter Bio - Sean Patrick Gorman, PhD, Head of Technical Product Management, DigitalGlobe

1. Sean Patrick Gorman, PhD.
Sean is the Head of Technical Product Management at DigitalGlobe helping build GBDX and next generation machine learning tools for satellite imagery. Sean received his PhD from George Mason University as the Provost’s High Potential Research Candidate, Fisher Prize winner and an INFORMS Dissertation Prize recipient.

 

2. Steven Pousty.
Steve is the Developer Relations lead for DigitalGlobe. He goes around and shows off all the great work the DigitalGlobe engineers do. Steve has a Ph.D. in Ecology from the University of Connecticut.

Free access to ODSC talks and content is available at our

AI Learning Accelerator

ODSC EAST | Boston

– May 1-4, 2018 –

The World’s Largest Applied Data Science Conference

ODSC EUROPE | London

– Sept 19-22, 2018 –

Europe’s Fastest Growing Data Science Community

ODSC WEST | San Francisco

– Oct 31- Nov 3, 2018 –

The World’s Largest Applied Data Science Conference

Accelerate AI

Business Conference

The Accelerate AI conference series is where executives and business professionals meet the best and brightest innovators in AI and Data Science. The conference brings together top industry executives and CxOs to help you understand how AI and data science will transform your business.

Accelerate AI East | Boston

– May 1 to 4, 2018 –

The ODSC summit on accelerating your business growth with AI

Accelerate AI Europe | London 

– Sept 19 to 22, 2018 –

The ODSC summit on accelerating your business growth with AI

Accelerate AI West | San Francisco 

– Oct 31 to Nov 3, 2018 –

The ODSC summit on accelerating your business growth with AI