ODSC West Schedule 2020
Day by Day Session Schedule
If you are an attendee, please refer to the most up-to-date schedule at live.staging6.odsc.com
Half-Day Training | R Programming | Data Visualization | Beginner-Intermediate
Turning raw data into meaningful information and telling data-driven stories is one of the great challenges of data science. When you are not there to speak for your data in a live setting, your application needs to communicate your message clearly and provide simple interfaces and meaningful interactions to drive your message home to consumers.
In this session you will learn to use Shiny to build a dashboard from blank page to interactive application using the programming language R, the free R development environment RStudio, and Redis. We will use free public data and open source libraries as we sculpt our dashboard together...more details
Bethany Poulin is a data scientist and educator with expertise in statistical analysis, data visualization and complex algorithmic problem solving. She has worked as a professional data scientist and educator for the last 4 years and loves sharing what she has learned with her students. Her unusual background in Fine Arts and experience teaching high school give her a unique perspective on both problem-solving and the learning-teaching process. Prior to teaching this part-time course, she was lead instructor in our Data Science Immersive program on the Boston campus. She teaches simply because she loves students and enjoys being a part of their success. She holds a BFA in Professional Photography from the Rochester Institute of Technology, did post-bachelor's studies at the University of Montana in Environmental Biology, where she was recognized nationally as a Morris K. Udall Scholar, and is one semester away from an MS in Data Science from the City University of New York. She has presented at PyOhio 2018 and 2019 and at ODSC East in 2019. In her spare time, Bethany is an avid fly fisherman, potter and maker.
Half-Day Training | Data Science Kick-Starter | Beginner
This tutorial offers a comprehensive introduction to the powerful pandas library for data analysis built on top of the Python programming language. Pandas represents a great step forward for graphical spreadsheet users looking to grow their data manipulation skills. I like to call it “Excel on steroids”. By completing this workshop, you’ll have a strong foundation for using Pandas in your day-to-day data analysis needs. We’ll start out with the basics — importing datasets, selecting rows and columns, filtering rows by criteria — and progress to advanced concepts like grouping values, joining multiple datasets together, and cleaning text…more details
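For readers who want a feel for the workflow described above, here is a minimal sketch of the Pandas operations mentioned (importing, selecting, filtering, grouping, joining, cleaning text); the file names and columns are hypothetical placeholders.

```python
import pandas as pd

df = pd.read_csv("sales.csv")                               # import a dataset
subset = df[["region", "product", "revenue"]]               # select columns
big_deals = subset[subset["revenue"] > 10_000]              # filter rows by criteria
by_region = big_deals.groupby("region")["revenue"].sum()    # group values

products = pd.read_csv("products.csv")
merged = df.merge(products, on="product", how="left")       # join multiple datasets
merged["product"] = merged["product"].str.strip().str.lower()  # clean text
```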
Boris Paskhaver is a full-stack web developer based in New York City with experience building apps in React / Redux and Ruby on Rails. His favorite part of programming is the never-ending sense that there’s always something new to master — a secret language feature, a popular design pattern, an emerging library or — most importantly — a different way of looking at a problem.
Half-Day Training | Deep Learning | Intermediate
This workshop hopes to convince participants that Keras is a worthwhile addition to their Machine Learning toolbelt. It teaches them how to build their own Keras models, initially using components already available in Keras, then extending them by customizing some of these components, and finally exploiting the underlying TensorFlow platform for maximum flexibility and performance. They will also be able to work with the many cool (sometimes SOTA) models shared by the Keras community…more details
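As a rough illustration of the progression this workshop describes, the sketch below builds a model from stock Keras components and then customizes one component by subclassing a layer; the shapes and layer choices are illustrative, not the workshop's actual material.

```python
import tensorflow as tf

# 1. A model built only from stock Keras components
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 2. Customizing a component by subclassing a Keras layer
class ScaledDense(tf.keras.layers.Layer):
    def __init__(self, units, scale=0.1, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(units)
        self.scale = scale

    def call(self, inputs):
        # Apply a dense transform, then scale the result
        return self.scale * self.dense(inputs)
```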
Sujit Pal builds intelligent systems around research content that help researchers and medical professionals achieve better outcomes. His areas of interest are Information Retrieval, Natural Language Processing and Machine Learning (including Deep Learning). As an individual contributor in the Elsevier Labs team, he works with diverse product teams to help them solve tough problems in these areas, as well as build proofs of concept at the cutting edge of applied research.
Half-Day Training | Machine Learning | Intermediate-Advanced
Ever wondered how quantum computers work, and how they do machine learning? With quantum computing technologies nearing the era of commercialization and quantum advantage, machine learning has been proposed as one of the most promising applications. One of the areas in which quantum computing is showing great potential is in generative models in unsupervised and semi-supervised learning. In this training you will develop a basic understanding of quantum computing and how it can be used in machine learning models, with special emphasis on generative models. We will focus on a particular architecture, the quantum circuit Born machine (QCBM), and use it to generate a simple dataset of bars and stripes…more details
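For context, the bars-and-stripes target dataset mentioned above can be generated with plain NumPy as sketched below; the quantum circuit Born machine itself is not shown here.

```python
import numpy as np
from itertools import product

def bars_and_stripes(n=2):
    """Generate all n-by-n bars-and-stripes patterns as flattened binary vectors."""
    patterns = set()
    for bits in product([0, 1], repeat=n):
        grid = np.tile(np.array(bits), (n, 1))
        patterns.add(tuple(grid.flatten()))      # bars: every column is uniform
        patterns.add(tuple(grid.T.flatten()))    # stripes: every row is uniform
    return np.array(sorted(patterns))

print(bars_and_stripes(2))   # the 6 distinct 2x2 patterns
```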
Luis Serrano is a Quantum AI Research Scientist at Zapata Computing. He is the author of the book Grokking Machine Learning and maintains a popular YouTube channel where he explains machine learning in plain, accessible terms.
Luis has previously worked in machine learning at Apple and Google, and at Udacity as the head of content for AI and data science. He has a PhD in mathematics from the University of Michigan, a master's and a bachelor's from the University of Waterloo, and worked as a postdoctoral researcher in mathematics at the University of Quebec at Montreal.
Kaitlin Gili is a Quantum Applications Intern at Zapata Computing. She has previously worked at Los Alamos National Laboratory and the IBMQ hub within Keio University as a quantum algorithm intern, and at the University of Oxford as a visiting quantum hardware research student. Kaitlin is passionate about quantum computing outreach for young scientists and has previously delivered quantum computing workshops to Girls Who Code middle/high school programs. She received her Bachelor's in Physics from Stevens Institute of Technology and will be starting her PhD in Physics at the University of Oxford in January 2021.
Alejandro Perdomo-Ortiz is a Lead Quantum Application Scientist at Zapata Computing. He did his graduate studies, M.A and Ph.D. in Chemical Physics, at Harvard University. For over 12 years, he has worked to enhance the performance of quantum computing algorithms with physics-based approaches while maintaining a practical, application-relevant perspective. Before joining Zapata Computing, Alejandro spent over 5 years at NASA’s Quantum Artificial Intelligence Laboratory (NASA QuAIL), where he was the quantum machine learning technical lead. Between NASA and joining Zapata, he co-founded a consulting company called Qubitera LLC, which was acquired by Rigetti.
Half-Day Training | Data Visualization | Intermediate
Data visualization is fundamental to the data science process. Using plots and graphs to convey a complex idea makes your data more accessible to everyone. In this session, you will learn the fundamentals of plotting with Pandas in Jupyter by building an interactive visualization prototype that can also run as a standalone web application/dashboard. This session is for anyone who wants hands-on familiarity with data visualization using Python, Pandas, Matplotlib, interactive widgets, and Flask...more details
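A minimal sketch of the notebook-to-dashboard idea described above: a quick Pandas plot for exploration, then the same figure served from a tiny Flask route. The data and the route name are illustrative.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt
from flask import Flask, Response

df = pd.DataFrame({"month": range(1, 13),
                   "sales": [3, 4, 6, 5, 7, 9, 8, 10, 12, 11, 13, 15]})
df.plot(x="month", y="sales", kind="line")   # quick exploration in a notebook

app = Flask(__name__)

@app.route("/plot.png")
def plot_png():
    # Render the same plot as a PNG so it can back a standalone dashboard
    fig, ax = plt.subplots()
    df.plot(x="month", y="sales", ax=ax)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return Response(buf.getvalue(), mimetype="image/png")

# app.run() would serve the dashboard endpoint locally
```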
At the age of 8, David began learning the BASIC programming language while living in the outskirts of Alaska. He studied music performance but began his career building a small software and consulting company in the late '90s. David's career spans almost 20 years, including many startups as a lead engineer, building scalable data services from prototype to production. During his time at Sony/Gracenote, he led the development of a handful of prototypes featured at the Consumer Electronics Show across multiple years, spanning recommender systems, content classification, and profiling problems. More recently, David was a data scientist at a YC-backed dating app company, researching and building scalable recommendation pipelines.
Currently, David works at General Assembly as a Global Data Science Instructor, where he helped architect the first version of the data science immersive curriculum and pilot many new programs developed internally. He regularly delivers lectures in a hybrid-format classroom on topics ranging from engineering to statistics and ML.
Half-Day Training | Deep Learning | Advanced
Although supervised learning has dominated industry machine learning implementations, unsupervised and semi-supervised methods have started to be practically applied to real world problems (outside of playing video games). Generative Adversarial Networks (GANs) are being utilized to augment data and generate dialogue, and Reinforcement Learning (RL) is helping people plan marketing campaigns and control robots. In this training, you will develop a theoretical understanding of these and other related state-of-the-art AI methods along with the hands-on skills needed to train and utilize them. You will implement a variety of models in TensorFlow for tasks including object recognition, image generation and robotics…more details
Daniel Whitenack (aka Data Dan) is a PhD-trained data scientist who has been developing artificial intelligence applications in the real world for over 10 years. He knows how to see beyond the hype of AI and machine learning to build systems that create business value, and he has taught these skills to thousands of developers, data scientists, and engineers all around the world. Now with the AI Classroom event, Data Dan is bringing this knowledge to a live, online learning environment so that you can level up your career from anywhere!
Full-Day Training | NLP | Advanced
Natural Language Processing (NLP) has recently experienced its own “ImageNet” moment. Rapidly evolving language models have enabled practitioners to decipher long lost languages, translate speech in one language to speech in another language directly without converting to text, generate long form text that adapts to the style and content of human prompts, and translate between language pairs never seen explicitly by computer systems (among many other impressive results).
In this training, you will develop a theoretical understanding of modern NLP along with the hands-on skills needed to develop state-of-the-art models. You will implement a variety of recurrent layer and transformer based architectures in both TensorFlow and PyTorch for tasks including text classification, machine translation, and predictive text…more details
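As a rough sketch of one of the architectures covered, the snippet below defines a small recurrent text classifier in PyTorch; vocabulary size and dimensions are illustrative and not taken from the course material.

```python
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    """A small LSTM-based text classifier over token-id sequences."""
    def __init__(self, vocab_size=10_000, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)     # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)     # final hidden state of the LSTM
        return self.fc(hidden[-1])               # (batch, num_classes) logits

# A dummy batch of 4 sequences, each 32 tokens long
logits = TextClassifier()(torch.randint(0, 10_000, (4, 32)))
```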
Daniel Whitenack (aka Data Dan) is a PhD-trained data scientist who has been developing artificial intelligence applications in the real world for over 10 years. He knows how to see beyond the hype of AI and machine learning to build systems that create business value, and he has taught these skills to thousands of developers, data scientists, and engineers all around the world. Now with the AI Classroom event, Data Dan is bringing this knowledge to a live, online learning environment so that you can level up your career from anywhere!
Full-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate
Relatively obscure a few short years ago, Deep Learning is ubiquitous today across data-driven applications as diverse as machine vision, natural language processing, and super-human game-playing.
This Deep Learning primer brings the revolutionary machine-learning approach behind contemporary artificial intelligence to life with interactive demos featuring TensorFlow 2, the major, cutting-edge revision of the world’s most popular Deep Learning library. To facilitate an intuitive understanding of Deep Learning’s artificial-neural-network foundations, the essential theory will be introduced visually and pragmatically. Paired with tips for overcoming common pitfalls and hands-on Python code run-throughs provided in straightforward Jupyter notebooks, this foundational knowledge empowers you to build powerful state-of-the-art Deep Learning models…more details
Jon Krohn is Chief Data Scientist at the machine learning company untapt. He authored the book Deep Learning Illustrated, which was released by Addison-Wesley in 2019 and became an instant #1 bestseller that was translated into six languages. Jon is renowned for his compelling lectures, which he offers in-person at Columbia University, New York University, and the NYC Data Science Academy, as well as online via O’Reilly, YouTube, and his A4N podcast on A.I. news. Jon holds a doctorate in neuroscience from Oxford and has been publishing on machine learning in leading academic journals since 2010.
Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate
Reinforcement Learning has recently progressed greatly in industry as one of the best techniques for sequential decision making and control policies. DeepMind used RL to greatly reduce energy consumption in Google's data centers. It has been used for text summarization, autonomous driving, dialog systems, media advertisements, and in finance by JPMorgan Chase. We are at the very beginning of the adoption of these algorithms, as systems are required to operate more and more autonomously.
In this workshop we will explore Reinforcement Learning, starting from its fundamentals and ending with creating our own algorithms. We will use OpenAI Gym to try our RL algorithms. OpenAI is a non-profit organization committed to open-sourcing its research on Artificial Intelligence…more details
Leonardo De Marchi holds a Master's in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks, like Justgiving. He now works as Head of Data Science and Analytics at Badoo, the largest dating site with over 420 million users. He is also the lead instructor at ideai.io, a company specializing in Reinforcement Learning, Deep Learning, and Machine Learning training.
Half-Day Training | Kick-starter | Machine Learning | All Levels
The field of machine learning and data science has seen a sudden resurgence in the last few years. The contributions of machine learning in solving data-driven problems and creating intelligent applications cannot be overemphasized. This field, which intersects statistics and probability, mathematics, computer science and algorithms, can be used to learn iteratively from complex data and find hidden insights. Understanding the mathematics behind machine learning allows us to choose the right algorithms for our problem, make good choices on parameter settings and validation strategies, recognize under- and over-fitting, troubleshoot ambiguous results and put appropriate confidence bounds on results...more details
Adewale (Wale) Akinfaderin is a Data Scientist and Team Lead at the Duke Energy Innovation Center, where he works at the intersection of machine learning and renewable energy. His expertise is in machine learning, deep learning, statistical experimentation and general information theory. He has broad experience implementing and extending ML techniques to solve practical and business problems. In his spare time, he conducts research on Machine Learning for the Developing World, and he was on the Program Committee for the NeurIPS 2019 Workshop on Machine Learning for the Developing World. He is affiliated with Google as a Google Developer Expert in Machine Learning.
Half-Day Training | NLP | Machine Learning | Beginner-Intermediate
Learn why the truly open source HPCC Systems platform is Better at Big Data and how ECL can empower you to build powerful data queries with ease. HPCC Systems is a comprehensive, dedicated data lake platform that makes combining different types of data easier and faster than competing platforms — even data stored in massive, mixed-schema data lakes — and it scales very quickly as your data needs grow...more details
Bob Foreman has worked with LexisNexis and the open source big data HPCC Systems technology platform and the ECL programming language for more than 10 years, and has been a technical trainer for more than 25 years. He is the developer and designer of the HPCC Systems Online Training Courses, and is the Senior Instructor for all classroom and remote training. In addition to being one of HPCC Systems' favorite trainers, Bob has more than 30 years of industry experience in training, consulting and technical writing with RDBMS platforms and, most recently, large-scale data technology.
Hugo has supported the development and delivery of training programs for the HPCC Systems platform in the Brazil region since 2019. He has worked for over 15 years in various technical roles in the IT industry with a focus on High-Performance Computing. He is also a part-time researcher in Information Systems and a member of the UK Academy for Information Systems.
Alejandro is a Ph.D. Candidate at Stanford University and Professor Ron Howard’s last doctoral student. His research focus is on approximation of multi-attribute utility functions with the use of thresholds and the implications that this approach has on the creation of better decision support automation. He is passionate about operationalizing decision making within organizations in a way that creates an “alive process”. Before coming to Stanford, he was an Operations consultant and an Operations Manager for several manufacturing businesses in his native Venezuela. He has led large teams in delivering results in operations rich fields such as food and heavy metal manufacturing.
Half-Day Training | Deep Learning | Machine Learning | Beginner-Intermediate
We will use OpenAI Gym to try our RL algorithms. OpenAI is a non-profit organization committed to open-sourcing its research on Artificial Intelligence. To foster innovation, OpenAI created a virtual environment, OpenAI Gym, where it's easy to test Reinforcement Learning algorithms. In particular, we will start with some popular techniques like the Multi-Armed Bandit, going through Markov Decision Processes and Dynamic Programming. We will then explore other RL frameworks and more complex concepts like policy gradient methods and Deep Reinforcement Learning, which recently changed the field of Reinforcement Learning. In particular, we will see Actor-Critic models and Proximal Policy Optimization, which allowed OpenAI to beat some of the best Dota players. We will also provide the necessary Deep Learning concepts for the course…more details
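For orientation, the sketch below shows the basic interaction loop with an OpenAI Gym environment using the classic Gym API (reset and a four-value step return), with a random policy standing in for the learned agents built during the workshop.

```python
import gym

env = gym.make("CartPole-v1")
observation = env.reset()
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                 # replace with a learned policy
    observation, reward, done, info = env.step(action)  # classic Gym step signature
    total_reward += reward
print("Episode return:", total_reward)
```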
Leonardo De Marchi holds a Master's in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks, like Justgiving. He now works as Head of Data Science and Analytics at Badoo, the largest dating site with over 420 million users. He is also the lead instructor at ideai.io, a company specializing in Reinforcement Learning, Deep Learning, and Machine Learning training.
Workshop | MLOps & Management | NLP | Intermediate-Advanced
During the workshop, the audience will gain not only a holistic overview of the TensorFlow ecosystem but also learn the necessary steps to bring ML projects from experiments to production. With this knowledge, participants can translate their ML projects into TFX pipelines and simplify their ML model production processes…more details
Hannes Hapke works in machine learning at Digits. Previously, he was a senior machine learning scientist for Concur Labs at SAP Concur, where he explored innovative ways to use machine learning to improve the experience of a business traveler. Hannes has also solved machine learning and ML infrastructure problems in various industries including healthcare, retail, recruiting, and renewable energies. He was recognized as a Google Developer Expert for ML and has co-authored two machine learning publications: “Building Machine Learning Pipelines” by O’Reilly Media and “Natural Language Processing in Action” by Manning Publications.
Workshop | R Programming | Intermediate
Apache Arrow is a cross-language development platform for in-memory analytics. In this tutorial, I’ll show how you can use Arrow in Python and R, both separately and together, to speed up data analysis on datasets that are bigger than memory. We’ll cover the fundamentals of Arrow in Python and R, then explore in depth Arrow’s Dataset feature, which provides for fast, efficient querying of large, multi-file datasets. Finally, we’ll discuss Flight, an Arrow-native client-server framework for transporting data, and show how to set up a server and query against it...more details
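A minimal sketch of the Python side of Arrow's Dataset feature as described above; the directory of Parquet files and the column names are hypothetical.

```python
import pyarrow.dataset as ds

# Point a Dataset at a directory of Parquet files that may not fit in memory
dataset = ds.dataset("data/taxi/", format="parquet")

# Push column selection and filtering down to the scan
table = dataset.to_table(
    columns=["passenger_count", "fare_amount"],
    filter=ds.field("fare_amount") > 10,
)
print(table.num_rows)
```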
Experienced product manager and engineering leader with a focus on data science and data products, currently working on the Apache Arrow project with the Ursa Labs team. Previously, he led product development at Crunch.io, where he grew a geographically distributed, cross-functional team from 5 to 25, shaped the product vision and roadmap, and developed processes to deliver a high-quality product tailored to the needs of survey researchers.
He is also an open-source software developer, author of several packages for the R language, and contributor to many others in R and Python. Many of his projects sit at the intersection of data science and web services. In his projects and public talks, he is a strong advocate for intuitive user experience/API design and for comprehensive test coverage.
Prior to working in software development, he received a Ph.D. in Political Science from UC Berkeley and worked in data science at YouGov, where he analyzed survey data and developed tools to streamline data pipelines and workflows. Among the statistical tools and methods he has used professionally and in peer-reviewed publications are experimental and quasi-experimental methods, text classification and sentiment analysis, survey weighting, and web scraping.
Jeffrey is a VP of Data Science, Data Engineering, and Platform Engineering in Store Associate Technology at Walmart Global Technology. His prior roles include Chief Data Scientist at AllianceBernstein, a global asset-management firm that managed nearly $700 billion; Vice President and Head of Data Science at Silicon Valley Data Science; and senior leadership positions at Charles Schwab Corporation and KPMG. He has also taught econometrics, statistics, and machine learning at UC Berkeley, Cornell, NYU, University of Pennsylvania, and Virginia Tech. Jeffrey is active in the data science community and often speaks at data science conferences and local events. He has many years of experience in applying a wide range of econometric and machine learning techniques to create analytic solutions for financial institutions, businesses, and policy institutions. Jeffrey holds a Ph.D. and an M.A. in Economics from the University of Pennsylvania and a B.S. in Mathematics and Economics from UCLA.
Workshop | Machine Learning | Open-source | Intermediate
Gradient boosting is a powerful machine-learning technique that achieves state-of-the-art results in a variety of practical tasks. For a number of years, it has remained the primary method for learning problems with heterogeneous features, noisy data, and complex dependencies: web search, recommendation systems, weather forecasting, and many others.
Some problems contain different types of data, including numerical, categorical and text data. In this case the best solution is either building new numerical features in place of the text and categorical columns and passing them to gradient boosting, or using out-of-the-box solutions that handle such data directly…more details
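As one example of the "out-of-the-box" route mentioned above, the sketch below uses CatBoost (named here only as an illustration, not necessarily the session's tool), which accepts categorical and text columns directly; the toy data is invented.

```python
import pandas as pd
from catboost import CatBoostClassifier

X = pd.DataFrame({
    "city": ["NYC", "SF", "NYC", "LA"],                                   # categorical
    "review": ["great product", "slow delivery", "works fine", "broken"],  # text
    "price": [10.0, 25.0, 14.0, 9.0],                                      # numerical
})
y = [1, 0, 1, 0]

# CatBoost builds its own encodings for categorical and text columns
model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y, cat_features=["city"], text_features=["review"])
```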
Workshop | Deep Learning | Beginner-Intermediate
Adoption of messaging communication and voice assistants has grown rapidly in recent years. This creates a huge demand for tools that speed up development of Conversational AI systems. The open-source DeepPavlov framework was created for the development of multi-skill conversational agents. It prioritizes efficiency, modularity, and extensibility with the goal of making it easier to develop dialogue systems from scratch and with limited data available. It also supports modular as well as end-to-end approaches to the implementation of conversational agents...more details
Mikhail Burtsev is the head of the Neural Networks and Deep Learning Laboratory at the Moscow Institute of Physics and Technology. He is also the founder and leader of the open-source conversational AI framework DeepPavlov. Mikhail has proposed and co-organized a series of academic Conversational AI Challenges (including NIPS 2017, NeurIPS 2018, EMNLP 2020). His research interests are in the fields of Natural Language Processing, Machine Learning, Artificial Intelligence and Complex Systems. Mikhail Burtsev has published more than 20 technical papers, including in Nature, Artificial Life, the Lecture Notes in Computer Science series and other peer-reviewed venues.
Workshop | Machine Learning | ML for Programmers | Beginner-Intermediate
The talk will emphasize both data preparation (ETL) and machine learning operations, with a hands-on demonstration of porting a typical workflow from CPU to GPU and measuring the speedup. We’ll go into more detail on real-world applications taking advantage of these speed improvements, including hyperparameter optimization for machine learning models, single cell genomics analysis, and applications in finance. For large-data users, we’ll discuss some of the options for scaling RAPIDS to multiple GPUs or multiple nodes, emphasizing the tight integration with the Dask ecosystem…more details
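A minimal sketch of the CPU-to-GPU port idea described above, using RAPIDS cuDF with Pandas-style calls; the file and columns are hypothetical, and a CUDA-capable GPU with RAPIDS installed is assumed.

```python
import cudf

# The same read_csv / groupby / sum pattern as Pandas, executed on the GPU
gdf = cudf.read_csv("transactions.csv")
summary = gdf.groupby("customer_id")["amount"].sum()
print(summary.head())
```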
Tutorial | Machine Learning | Data Visualization | Beginner-Intermediate
Data scientists often get asked questions of the form “Does X Drive Y”: (1) did recent PR coverage drive sign ups, (2) does customer support increase sales, or (3) did improving the recommendation model drive revenue? Supporting company stakeholders requires every data scientist to learn techniques that can answer questions like these, which are centered around issues of causality. Oftentimes we cannot use A/B testing to answer these questions and must turn to causal inference techniques instead…more details
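As a small illustration of the kind of question above, the sketch below estimates the effect of PR exposure on sign-ups with simple regression adjustment on observational data; the dataframe and columns are hypothetical, and a real analysis would need careful handling of confounders.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical observational data: signed_up (0/1), saw_pr (0/1), prior_visits
df = pd.read_csv("signups.csv")

# Adjust for a pre-treatment covariate when estimating the "effect" of PR exposure
model = smf.ols("signed_up ~ saw_pr + prior_visits", data=df).fit()
print(model.params["saw_pr"])   # adjusted association, not a guaranteed causal effect
```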
Vinod Bakthavachalam is a Data Scientist working with the Content Strategy and Enterprise teams, focusing on using Coursera’s data to understand the most valuable skills across roles, industries, and geographies. Prior to Coursera, he worked in quantitative finance and studied Economics, Statistics, and Molecular & Cellular Biology at UC Berkeley.
Workshop | Deep Learning | Data Visualization | Intermediate
In this hands-on tutorial, we will introduce attendees to the siVAE (scalable, interpretable VAE) model that infers a set of factor loadings that explicitly map latent dimensions to the input features that define them, during training of the VAE model. Using standard datasets from computer vision (MNIST, Fashion-MNIST and CIFAR-10), we will walk attendees through the process of training the siVAE model, visualizing the sample embeddings inferred by classic VAEs, and extracting and visualizing the features that contribute to individual latent dimensions. We will also teach attendees how to estimate and visualize feature awareness, a new metric for measuring the overall importance of individual features for embedding a sample in the latent space. At the end of the tutorial, attendees will be able to train an siVAE model on their own datasets and interpret and visualize the latent dimensions inferred…more details
Gerald Quon is an Assistant Professor in the Department of Molecular and Cellular Biology at the University of California at Davis. He obtained his Ph.D. in Computer Science from the University of Toronto, M.Sc. in Biochemistry from the University of Toronto, and B. Math in Computer Science from the University of Waterloo. He also completed postdoctoral research training at MIT. His lab focuses on applications of machine learning to human genetics, genomics and health, and is funded by the National Science Foundation, National Institutes of Health, the Chan Zuckerberg Initiative, and the American Cancer Society.
Yongin is a PhD candidate at UC Davis advised by Gerald Quon. His research focuses on computational biology, involving applications of machine learning to answer questions in the field of biology. More specifically, his current research focuses on the interpretation of deep neural network architectures trained on genomics data to understand the underlying gene-to-gene relationships.
Workshop | Machine Learning | Beginner-Intermediate
Most deployed Machine Learning models use Supervised Learning powered by human training data. Selecting the right data for human review is known as Active Learning. This talk will introduce a set of Active Learning methods that help you understand where your model is currently confused (Uncertainty Sampling) and identify gaps in your model’s knowledge (Diversity Sampling). We’ll cover techniques that are only a few lines of code through to techniques that build on recent advances in transfer learning. We’ll use code examples from my open source PyTorch Active Learning library...more details
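A minimal sketch of least-confidence uncertainty sampling, one flavor of the Uncertainty Sampling mentioned above; this is a generic NumPy illustration, not code from the speaker's library.

```python
import numpy as np

def least_confidence(probabilities, n_samples=10):
    """Pick the unlabeled items whose top predicted class is least confident."""
    top_prob = probabilities.max(axis=1)           # confidence of the best class
    uncertainty = 1.0 - top_prob
    return np.argsort(-uncertainty)[:n_samples]    # most uncertain items first

# Predicted class probabilities for three unlabeled items
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.70, 0.30]])
print(least_confidence(probs, n_samples=2))        # items 1 and 2 go to human review
```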
Robert Munro is an expert in combining Human and Machine Intelligence, working with Machine Learning approaches to Text, Speech, Image and Video Processing. Robert has founded several AI companies, building some of the top teams in Artificial Intelligence. He has worked in many diverse environments, from Sierra Leone, Haiti and the Amazon, to London, Sydney and Silicon Valley, in organizations ranging from startups to the United Nations. He has shipped Machine Learning Products at startups and at/with Amazon, Google, IBM & Microsoft.
Robert has published more than 50 papers on Artificial Intelligence and is a regular speaker about technology in an increasingly connected world. He has a PhD from Stanford University. Robert is the author of Human-in-the-Loop Machine Learning (Manning Publications, 2020).
Tutorial | NLP | Machine Learning | Intermediate
This is a hands-on tutorial on applying the latest advances in deep learning and transfer learning for common NLP tasks such as named entity recognition, document classification, spell checking, and sentiment analysis. Learn to build complete text analysis pipelines using the highly accurate, high-performance, open-source Spark NLP library in Python…more details
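A minimal sketch of running a pretrained Spark NLP pipeline from Python; the pipeline name is one of the library's published pretrained pipelines, and the input text is invented.

```python
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# Start a Spark session configured for Spark NLP
spark = sparknlp.start()

# Download and run a published pretrained pipeline (tokens, POS, NER, spell checking)
pipeline = PretrainedPipeline("explain_document_dl", lang="en")
result = pipeline.annotate("Spark NLP makes building text pipelines straightforward.")
print(result["entities"])
```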
David Talby is a chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe and worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a Ph.D. in computer science and master’s degrees in both computer science and business administration.
Workshop | NLP | Deep Learning | Intermediate-Advanced
NLP is one of the fastest-growing fields within AI. A wide variety of tasks can be tackled with NLP such as text classification, question-answering (e.g. chatbots), translation, topic modelling, sentiment analysis, summarization, and so on. In this workshop, we focus on text summarization, as it is not commonly showcased in tutorials despite being a powerful and challenging application of NLP. We see a trend towards pre-training Deep Learning models on a large text corpus and fine-tuning them for a specific downstream task (also known as transfer learning). In this hands-on workshop, you’ll get the opportunity to apply a state-of-the-art summarization model to generate news headlines. We finetuned this model on Reuters news data, which is professionally produced by journalists and strictly follows rules of integrity, independence and freedom from bias…more details
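As a rough stand-in for the workshop's Reuters-finetuned model, the sketch below runs abstractive summarization with a generic pretrained model from the Hugging Face transformers library; the article text is invented.

```python
from transformers import pipeline

# Loads a default pretrained summarization model (not the workshop's Reuters model)
summarizer = pipeline("summarization")

article = (
    "The central bank announced on Tuesday that it would hold interest rates "
    "steady, citing slowing inflation and a resilient labor market, while "
    "signaling that future decisions would depend on incoming economic data."
)
print(summarizer(article, max_length=20, min_length=5)[0]["summary_text"])
```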
Nadja Herger is a Data Scientist at Thomson Reuters Labs, based in Switzerland. She is primarily focusing on Deep Learning PoCs within the Labs, where she is working on applied NLP projects in the legal and news domains, applying her skills to text classification, metadata extraction, and summarization tasks. Before joining Thomson Reuters, she obtained her Ph.D. in Climate Science from the University of New South Wales, Australia. She has successfully made the transition from working with spatio-temporal data to working with text-based data on the job. Nadja is passionate about education, which is reflected in her ongoing mentorship of students within Thomson Reuters, as well as South African students from previously disadvantaged groups who are aspiring to get into Data Science.
Nina Hristozova is a Data Scientist at Thomson Reuters (TR) Labs. She has a BSc in Computer Science from the University of Glasgow, Scotland. As part of her role at TR she has worked on a wide range of projects applying ML and DL to a variety of NLP problems. Her current focus is on applied summarization of legal text. She is actively engaged with local technology Meetups to spread the love and knowledge for NLP through tech talks and workshops. In her free time she plays volleyball for the local team in Zug, enjoys going to the mountains and SUP in the lakes.
Viktoriia Samatova is Head of the Applied Innovation team of Data Scientists within the Reuters Technology division, focused on discovering and applying new technologies to enhance Reuters products and improving the efficiency of news content production and discoverability. Prior to joining Reuters, Viktoriia spent over 5 years at State Street Bank’s Global Exchange division, where she managed product development of an academic and financial industry content ingestion platform, which was the first company-wide product application utilizing emerging technologies of AI and machine learning.
Workshop | Machine Learning | Research Frontiers | Intermediate
The Randomized Controlled Trial (RCT) has been the gold standard for determining the Average Treatment Effect of a program for marketing, medical, economic, political, and policy applications. Uplift modeling is an additional step to uncover individuals who are most positively influenced by treatment through predictive analytics and machine learning by identifying Heterogeneous Treatment Effects. This approach enables us to identify the likely “persuadables” in order to maximize treatment impact through optimal target selection. It has gained significant attention in recent years from multiple fields such as personalized medicine, political elections, personalized marketing, and personalized healthcare, with growing publications and presentations from industry and academic experts across the world...more details
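A minimal sketch of one common uplift approach, the "two-model" (T-learner) strategy: fit separate outcome models for treated and control groups and score the difference. The column names and estimator choice are illustrative.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical RCT data: a treatment flag, an outcome, and pre-treatment features
df = pd.read_csv("campaign_rct.csv")
features = [c for c in df.columns if c not in ("treated", "converted")]

# Fit one outcome model per arm
m_treat = GradientBoostingClassifier().fit(
    df[df.treated == 1][features], df[df.treated == 1]["converted"])
m_ctrl = GradientBoostingClassifier().fit(
    df[df.treated == 0][features], df[df.treated == 0]["converted"])

# Uplift score = predicted outcome if treated minus predicted outcome if not
df["uplift"] = (m_treat.predict_proba(df[features])[:, 1]
                - m_ctrl.predict_proba(df[features])[:, 1])
# Target the highest-uplift individuals, the likely "persuadables"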
Victor has managed teams of quantitative analysts in multiple organizations. He is currently Senior Vice President, Data Science and Artificial Intelligence in Workplace Investing at Fidelity Investments. Previously he managed advanced analytics / data science teams in Personal Investing, Corporate Treasury, Managerial Finance, and Healthcare and Total Well-being at Fidelity Investments. Prior to Fidelity, he was VP and Manager of Modeling and Analysis at FleetBoston Financial (now Bank of America), and Senior Associate at Mercer Management Consulting (now Oliver Wyman).
For academic services, Victor is an elected board member of the National Institute of Statistical Sciences (NISS), where he provides guidance to the board and general education to the statistics community. He has also been a visiting research fellow and corporate executive-in-residence at Bentley University, as well as serving on the steering committee of the Boston Chapter of the Institute for Operations Research and the Management Sciences (INFORMS). Victor earned a master’s degree in Operational Research at Lancaster University, UK, and a PhD in Statistics at the University of Hong Kong, and was a Postdoctoral Fellow in Management Science at University of British Columbia. He has co-authored a graduate level econometrics book and published numerous articles in Data Science, Marketing, Statistics, and Management Science literature. and is co-authoring a graduate-level data science textbook titled “Cause-and-Effect Business Analytics.
Workshop | Machine Learning
Bayesian statistical methods are becoming more common, but there are not many resources to help beginners get started. People who know Python can use their programming skills to get a head start. In this workshop, I introduce Bayesian methods using grid algorithms, which help develop understanding, and MCMC, which is a powerful algorithm for real-world problems.
As the primary example, we will estimate goal scoring rates in hockey and soccer. This example is meant to be fun, but it is also useful; the same methods apply to any system well-modeled by a Poisson process, including customers arriving at a business, requests coming in to a server, and many other applications…more details
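A minimal sketch of the grid approach applied to the goal-scoring example, assuming hypothetical match scores and a flat prior over the Poisson rate.

```python
import numpy as np
from scipy.stats import poisson

lams = np.linspace(0.01, 10, 1000)      # grid of candidate goal-scoring rates
prior = np.ones_like(lams)              # flat prior over the grid
observed_goals = [2, 1, 3, 0, 2]        # hypothetical match results

# Likelihood of the observed scores under a Poisson model at each grid point
likelihood = np.ones_like(lams)
for k in observed_goals:
    likelihood *= poisson.pmf(k, lams)

posterior = prior * likelihood
posterior /= posterior.sum()            # normalize over the grid
print("Posterior mean scoring rate:", (lams * posterior).sum())
```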
Allen Downey is a Professor of Computer Science at Olin College of Engineering in Needham, MA. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. Prof Downey has taught at Colby College and Wellesley College, and in 2009 he was a Visiting Scientist at Google. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.
Workshop | Machine Learning | Intermediate
Effective predictive modeling projects follow the analytics life cycle, from data and discovery to deployment and decisions. Data scientists use a variety of tools, both commercial and open-source, to collaborate and develop enterprise applications of analytics and artificial intelligence. SAS Viya provides a unified platform to perform all these from one graphical user interface or through programming APIs. In this workshop, you will load data into memory, prepare input variables for modeling and build complex analytics pipelines to demonstrate powerful machine learning models. Need to integrate open source models? No problem. We’ll show you how to do that and deploy any model. Then you can save and package the best-performing model for deployment while keeping the ability to retrain it on new data…more details
Jordan Bakerman holds a Ph.D. in statistics from North Carolina State University. His dissertation centered on using social media to forecast real-world events, such as civil unrest and influenza rates. As an intern at SAS, Jordan wrote the SAS Programming for R Users course to help students efficiently transition from R to SAS using a cookbook-style approach. As an employee, Jordan has developed courses demonstrating how to integrate open source software within SAS products. He is passionate about statistics, programming, and helping others become better statisticians.
Ari Zitin holds bachelor’s degrees in both physics and mathematics from UNC-Chapel Hill. His research focused on collecting and analyzing low energy physics data to better understand the neutrino. Ari taught introductory and advanced physics and scientific programming courses at UC-Berkeley while working on a master’s in physics with a focus on nonlinear dynamics. While at SAS, Ari has worked to develop courses that teach how to use Python code to control SAS analytical procedures, including machine learning and optimization.
Workshop | Machine Learning | Research Frontiers | Intermediate-Advanced
The values of a categorical variable frequently have a structure that is not ordinal or linear in nature. For example, the months of the year have a circular structure, and the US States have a geographical structure. Standard approaches such as one-hot or numerical encoding are unable to effectively exploit the structural information of such variables. In this tutorial, we will introduce the StructureBoost gradient boosting package, wherein the structure of categorical variables can be represented by a graph, and exploited to improve predictive performance. Moreover, StructureBoost can make informed predictions on categorical values for which there is little or no data, by leveraging the knowledge of the structure. We will walk through examples of how to configure and train models using StructureBoost and demonstrate other features of the package…more details
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of StructureBoost, ML-Insights, and the SplineCalib calibration tool. In previous roles he has served as Senior VP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Tutorial | Machine Learning | NLP | Intermediate-Advanced
NLP-driven AI is a fast-growing technology domain with diverse applications in customer engagement, employee collaboration, marketing, and social media. Does having an accurate model by itself mean that we have a successful product, service, or solution? No! The most difficult phase begins after that. We still have the following challenges to solve: How do we create a sellable service around this model? How do we scale this service to handle millions of inference requests in a cost-effective manner? How do we build automated deployment pipelines for software and models? How do we add security, privacy, manageability, and observability to the service? How do we track model drift and analyze model performance? In this session, we will discuss these questions and explore the answers. We will go through the unique challenges that NLP serving poses and the solutions and best practices to overcome them…more details
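As a tiny slice of the production story discussed above, the sketch below wraps a stand-in NLP model in a Flask endpoint; the dummy model and route are illustrative, and real deployments would add batching, authentication, autoscaling, and drift monitoring.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

class DummySentimentModel:
    """Stand-in for a real NLP model loaded once at startup."""
    def predict(self, texts):
        return ["positive" if "good" in t.lower() else "negative" for t in texts]

model = DummySentimentModel()

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON body like {"text": "..."} and returns a predicted label
    text = request.get_json()["text"]
    return jsonify({"label": model.predict([text])[0]})

if __name__ == "__main__":
    app.run(port=8080)
```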
Kumaran Ponnambalam is an AI and Big Data leader with 15+ years of experience. He is currently the Director of AI for Webex Contact Center at Cisco. He focuses on creating robust, scalable AI platforms and models to drive effective customer engagements. In his current and previous roles, he has built data pipelines, ML models, analytics and integrations around customer engagement. He has also authored several courses on the LinkedIn Learning platform in the Machine Learning and Big Data areas. He holds an MS in Information Technology and advanced certificates in Deep Learning and Data Science.
Workshop | Machine Learning | ML for Programmers | Intermediate-Advanced
A brief introduction to Reinforcement Learning (RL), and a walkthrough of using the Dopamine library for running RL experiments…more details
Pablo Samuel Castro was born and raised in Quito, Ecuador, and moved to Montreal after high school to study at McGill. He stayed in Montreal for the next 10 years, finished his bachelor’s, worked at a flight simulator company, and then eventually obtained his master’s and PhD at McGill, focusing on Reinforcement Learning. After his PhD, Pablo did a 10-month postdoc in Paris before moving to Pittsburgh to join Google. He has worked at Google for over 8 years, and is currently a Staff Research Software Developer in Google Brain in Montreal, focusing on fundamental Reinforcement Learning research, as well as Machine Learning and Creativity. Aside from his interest in coding/AI/math, Pablo is actively trying to increase the presence of the Latin American community in the AI research ecosystem. On the side, he’s an active musician (https://www.psctrio.com/), loves running (5 marathons so far, including Boston!), and enjoys discussing politics and activism.
Workshop | Research Frontiers | Intermediate
Novel single-cell transcriptome sequencing assays allow researchers to measure gene expression levels at the resolution of single cells and offer the unprecedented opportunity to investigate fundamental biological questions at the cellular level, such as stem cell differentiation or the discovery and characterization of rare cell types. The majority of the computational methods to analyze single-cell RNA-Seq data are implemented in R making it a natural tool to start working with single-cell transcriptomic data…more details
Fanny Perraudeau is a Senior Manager in Data Science & Software at Pendulum where she leads research into the application of state-of-the-art biostatistical, bioinformatics and computational methodology to better understand and leverage the human microbiome to deliver health innovations. She designs and implements novel clinical programs and develops novel algorithms and bioinformatics pipelines to associate clinical, genomic, mobile, and consumer data streams with metagenomics datasets. She has a master’s in Engineering from Ecole Polytechnique, France, and a Ph.D. in Biostatistics from the University of California, Berkeley, the USA with a Designated Emphasis in Computational and Genomic Biology.
Eric P. Xing is a Professor of Computer Science at Carnegie Mellon University, and the Founder, CEO, and Chief Scientist of Petuum Inc., a 2018 World Economic Forum Technology Pioneer company that builds standardized artificial intelligence development platform and operating system for broad and general industrial AI applications. He completed his undergraduate study at Tsinghua University, and holds a PhD in Molecular Biology and Biochemistry from the State University of New Jersey, and a PhD in Computer Science from the University of California, Berkeley. His main research interests are the development of machine learning and statistical methodology, and large-scale computational system and architectures, for solving problems involving automated learning, reasoning, and decision-making in high-dimensional, multimodal, and dynamic possible worlds in artificial, biological, and social systems. Prof. Xing currently serves or has served the following roles: associate editor of the Journal of the American Statistical Association (JASA), Annals of Applied Statistics (AOAS), IEEE Journal of Pattern Analysis and Machine Intelligence (PAMI) and the PLoS Journal of Computational Biology; action editor of the Machine Learning Journal (MLJ) and Journal of Machine Learning Research (JMLR); member of the United States Department of Defense Advanced Research Projects Agency (DARPA) Information Science and Technology (ISAT) advisory group. He is a recipient of the Carnegie Science Award, National Science Foundation (NSF) Career Award, the Alfred P. Sloan Research Fellowship in Computer Science, the United States Air Force Office of Scientific Research Young Investigator Award, the IBM Open Collaborative Research Faculty Award, as well as several best paper awards. Prof Xing is a board member of the International Machine Learning Society; he has served as the Program Chair (2014) and General Chair (2019) of the International Conference of Machine Learning (ICML); he is also the Associate Department Head of the Machine Learning Department, founding director of the Center for Machine Learning and Health at Carnegie Mellon University; and he is a Fellow of the Association of Advancement of Artificial Intelligence (AAAI), and an IEEE Fellow.
Workshop | R Programming | Machine Learning | Intermediate
With customer privacy laws, missing data is becoming more of a normal situation. Many approaches are either computationally expensive or throw out the baby with the bathwater. There are certain models which allow missing data directly, and variants of common models which can be adapted to do so. Surprisingly, we can create effective predictability even in scenarios where data is missing not at random and at rates higher than 80%. This is a hands-on workshop using R or Python...more details
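As one example of a model family that accepts missing values directly (rather than imputing or dropping rows), the sketch below uses scikit-learn's histogram-based gradient boosting on synthetic data with roughly 80% of entries missing; it illustrates the idea, not the workshop's specific models.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier  # handles NaN natively

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)               # label depends on the first feature
X[rng.random(X.shape) < 0.8] = np.nan       # knock out ~80% of the entries

clf = HistGradientBoostingClassifier().fit(X, y)   # no imputation step required
print(clf.score(X, y))
```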
Anne Lifton has ten years of experience in data science and 3 years in data science management. She has worked across a range of industries from medical devices to retail to engineering, and specializes in reducing the cycle time to delivery of models.
Workshop | Machine Learning | Deep Learning | All Levels
Leveraging data to produce algorithmic rules is nothing new. However, we used to second-guess these rules, and now we don’t. Therefore, a human used to be accountable, and now it’s the algorithm… but can an algorithm be accountable? How do we solve this accountability gap and ethical quandary?…more details
Serg Masís is a Data Scientist in agriculture with a lengthy background in entrepreneurship and web/app development, and the author of the bestselling book “Interpretable Machine Learning with Python”. He is passionate about machine learning interpretability, responsible AI, behavioral economics, and causal inference.
Tutorial | Deep Learning | Machine Learning | Intermediate
Deep generative models have made great progress in synthesizing data realistically in various domains such as image generation and speech synthesis. Generative modeling brings a new paradigm shift in AI from content classification and regression to content analysis and creation. This tutorial aims at introducing the basics of deep generative models such as Variational AutoEncoders (VAEs) and Generative Adversarial Networks (GANs), as well as user interaction with the models for content manipulation and creation. By completing this workshop, you will have an understanding of deep generative models, their strengths and weaknesses, and their promising applications in content analytics and creation. The workshop will focus on the generative models used in image synthesis, but the introduced methodology can be extended to other domains…more details
Bolei Zhou is an Assistant Professor with the Information Engineering Department at the Chinese University of Hong Kong. He received his PhD in computer science at the Massachusetts Institute of Technology. His research is on machine perception and decision, with a focus on visual scene understanding and interpretable AI systems. He received the MIT Tech Review’s Innovators under 35 in Asia-Pacific award, Facebook Fellowship, Microsoft Research Asia Fellowship, MIT Greater China Fellowship, and his research was featured in media outlets such as TechCrunch, Quartz, and MIT News. More about his research is at http://bzhou.ie.cuhk.edu.hk/.
Many problems in systems and chip design are in the form of combinatorial optimization on graph structured data. In this talk, I will motivate taking a learning based approach to combinatorial optimization problems with a focus on deep reinforcement learning (RL) agents that generalize. I will discuss our work on a new domain-transferable reinforcement learning methodology for optimizing chip placement, a long pole in hardware design…more details
Azalia Mirhoseini is a Senior Research Scientist at Google Brain. She is the co-founder/tech-lead of the Machine Learning for Systems Team in Google Brain where they focus on deep reinforcement learning based approaches to solve problems in computer systems and metalearning. She has a Ph.D. in Electrical and Computer Engineering from Rice University. She has received a number of awards, including the MIT Technology Review 35 under 35 award, the Best Ph.D. Thesis Award at Rice and a Gold Medal in the National Math Olympiad in Iran. Her work has been covered in various media outlets including MIT Technology Review, IEEE Spectrum, and Wired.
We live in the era of Analytics Heterogeneity – where analytics is not limited to one single methodology, tool or algorithm, but is able to leverage the full potential of the fast-growing and rapid-changing ecosystem of analytical solutions and technologies available. However, one byproduct of this evolution has been increasing complexity of how to turn analytics into value. Which new capabilities are needed to be successful in this new era? What can we do to be champions and convert this complexity into a competitive advantage? In this session, I will walk you through the best practices to move from sandbox to production…more details
Marinela Profi uses her cross-domain expertise in statistics, business and marketing to showcase the value of SAS Platforms and Industry Solutions specifically for the Data Science and Machine Learning field. She is a keynote speaker at global conferences, where she shares trends and priorities of the data science industry, and a published author and contributor to major industry and data science publications. Her main contributions have been around the combination of forecasting and machine learning techniques. As a proud woman in tech, she finds many ways to empower and educate other women on why joining the tech world is inspiring and rewarding.
AI for Good | Machine Learning | All Levels
Every field has data. We use data to discover new knowledge, to interpret the world, to make decisions, and even to predict the future. The recent convergence of big data, cloud computing, and novel machine learning algorithms and statistical methods is causing an explosive interest in data science and its applicability to all fields. This convergence has already enabled the automation of some tasks that better human performance. The novel capabilities we derive from data science will drive our cars, treat disease, and keep us safe. At the same time, such capabilities risk leading to biased, inappropriate, or unintended action. The design of data science solutions requires both excellence in the fundamentals of the field and expertise to develop applications which meet human challenges without creating even greater risk...more details
Jeannette M. Wing is Avanessians Director of the Data Science Institute and Professor of Computer Science at Columbia University. From 2013 to 2017, she was a Corporate Vice President of Microsoft Research. She is Adjunct Professor of Computer Science at Carnegie Mellon where she twice served as the Head of the Computer Science Department and had been on the faculty since 1985. From 2007-2010 she was the Assistant Director of the Computer and Information Science and Engineering Directorate at the National Science Foundation. She received her S.B., S.M., and Ph.D. degrees in Computer Science, all from the Massachusetts Institute of Technology. Professor Wing’s general research interests are in the areas of trustworthy computing, specification and verification, concurrent and distributed systems, programming languages, and software engineering. Her current interests are in the foundations of security and privacy, with a new focus on trustworthy AI. She was or is on the editorial board of twelve journals, including the Journal of the ACM and Communications of the ACM. Professor Wing is known for her work on linearizability, behavioral subtyping, attack graphs, and privacy-compliance checkers. Her 2006 seminal essay, titled “Computational Thinking,” is credited with helping to establish the centrality of computer science to problem-solving in fields where previously it had not been embraced. She is currently a member of: the National Library of Medicine Blue Ribbon Panel; the Science, Engineering, and Technology Advisory Committee for the American Academy for Arts and Sciences; the Board of Trustees for the Institute of Pure and Applied Mathematics; the Advisory Board for the Association for Women in Mathematics; and the Alibaba DAMO Technical Advisory Board. She has been chair and/or a member of many other academic, government, and industry advisory boards. She received the CRA Distinguished Service Award in 2011 and the ACM Distinguished Service Award in 2014. She is a Fellow of the American Academy of Arts and Sciences, American Association for the Advancement of Science, the Association for Computing Machinery (ACM), and the Institute of Electrical and Electronic Engineers (IEEE).
An AI expert and health AI pioneer, Suchi Saria’s research has led to myriad new inventions to improve patient care. Her work first demonstrated the use of machine learning to make early detection possible in sepsis, a life-threatening condition (Science Trans. Med. 2015). In Parkinson’s, her work showed a first demonstration of using readily-available sensors to easily track and measure symptom severity at home, to optimize treatment management (JAMA Neurology 2018). On the technical front, her work at the intersection of machine learning and causal inference has led to new ideas for building and evaluating reliable ML (ACM FAT 2019). Suchi currently holds a John C. Malone endowed chair at Johns Hopkins University, with appointments across engineering, public health, and medicine. She is also the Founder of Bayesian Health, aiming to revolutionize the delivery of healthcare by empowering providers and health systems with real-time access to essential clinical inferences. She is the recipient of numerous prizes and honors, including being named a Sloan Research Fellow, a National Academy of Medicine Emerging Leader in Health and Medicine, MIT Technology Review’s 35 Innovators Under 35, and a World Economic Forum Young Global Leader.
Zoubin Ghahramani is Chief Scientist of Uber and a world leader in the field of machine learning, significantly advancing the state-of-the-art in algorithms that can learn from data. He is known in particular for fundamental contributions to probabilistic modeling and Bayesian approaches to machine learning systems and AI. Zoubin also maintains his roles as Professor of Information Engineering at the University of Cambridge and Deputy Director of the Leverhulme Centre for the Future of Intelligence. He was one of the founding directors of the Alan Turing Institute (the UK’s national institute for Data Science and AI), and is a Fellow of St John’s College Cambridge and of the Royal Society.
Ben Taylor has over 16 years of machine learning experience. After studying chemical engineering, Taylor joined Intel and Micron, working in their photolithography, process control, and yield prediction groups. Pursuing his love for high-performance computing (HPC) and predictive modeling, Taylor joined an artificial intelligence hedge fund (AIQ) as their HPC/AI expert and built models on a 600-GPU cluster to predict stock movements from the news. Taylor then joined a young HR startup called HireVue, where he built out their data science group, filed 7 patents, and helped launch HireVue’s AI insights product using video and audio from candidate interviews. That work allowed Taylor’s team of PhD physicists to help pioneer bias-mitigation strategies for AI. In 2017 Taylor co-founded Zeff.ai with David Gonzalez to pursue deep learning for image, audio, video, and text for the enterprise. Zeff was acquired by DataRobot.
Distributed applications are not new. The first distributed applications were developed over 50 years ago with the arrival of computer networks such as ARPANET. Since then, developers have leveraged distributed systems to scale out applications and services, including large-scale simulations, web serving, and big data processing. Until recently, however, distributed applications have been the exception rather than the norm. This is changing quickly…more details
Ion Stoica is a Professor in the EECS Department at the University of California, Berkeley. He does research on cloud computing and networked computer systems. Past work includes Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State (DPS). He is an ACM Fellow and has received numerous awards, including the SIGOPS Hall of Fame Award (2015), the SIGCOMM Test of Time Award (2011), and the ACM Doctoral Dissertation Award (2001). In 2013, he co-founded Databricks, a startup to commercialize technologies for Big Data processing, and in 2006 he co-founded Conviva, a startup to commercialize technologies for large-scale video distribution.
Multiple organizations often wish to aggregate their sensitive data and learn from it, but they cannot do so because they cannot share their data. For example, banks wish to train models jointly over their aggregate transaction data to detect money launderers because criminals hide their traces across different banks. To address such problems, my students and I developed MC^2, a framework for secure collaborative computation. My talk will overview our MC^2 platform, our technical approach, results, and adoption…more details
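The general idea behind secure collaborative computation can be illustrated with a toy building block: additive secret sharing, in which each party splits its private value into random shares so that only the aggregate is ever reconstructed. The sketch below is purely illustrative and is not MC^2's actual protocol; the bank totals and party count are hypothetical.

```python
import random

PRIME = 2**61 - 1  # a large prime defining the finite field the shares live in

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to the secret mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three banks each hold a private transaction total; no single share reveals it.
totals = [1_200_000, 450_000, 980_000]
all_shares = [share(t, 3) for t in totals]

# Each computing party receives one share from every bank and sums them; only
# the combination of the partial sums reveals the aggregate, never any input.
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
aggregate = sum(partial_sums) % PRIME
print(aggregate)  # 2630000: the joint total, learned without pooling raw data
```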
Raluca Ada Popa is an assistant professor of computer science at UC Berkeley. She is interested in security, systems, and applied cryptography. Raluca developed practical systems that protect data confidentiality by computing over encrypted data, as well as designed new encryption schemes that underlie these systems. Some of her systems have been adopted into or inspired systems such as SEEED of SAP AG, Microsoft SQL Server’s Always Encrypted Service, and others. Raluca received her PhD in computer science as well as her two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of an Intel Early Career Faculty Honor award, George M. Sprowls Award for best MIT CS doctoral thesis, a Google PhD Fellowship, a Johnson award for best CS Masters of Engineering thesis from MIT, and a CRA Outstanding undergraduate award from the ACM.
Track Keynote | Machine Learning | All levels
ML models in production lose accuracy over time. A number of factors contribute to this change, such as shifts in demographic mix and consumer behavior. Using these models’ output results in incorrect decisions that could lead to catastrophic failures for the organization. This gap calls for solutions that help Data Science teams predict such failures in advance...more details
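One common ingredient in such solutions is checking whether the distribution of a deployed model's scores has drifted away from what was observed at validation time. The sketch below is a hedged, generic illustration of that idea using a two-sample Kolmogorov-Smirnov test on synthetic score distributions; it is not the speaker's specific approach.

```python
# Generic drift check: compare recent model scores against a baseline sample.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline_scores = rng.beta(2, 5, size=5000)   # scores captured at deployment time
recent_scores = rng.beta(2, 3, size=5000)     # scores this week (shifted distribution)

stat, p_value = ks_2samp(baseline_scores, recent_scores)
if p_value < 0.01:
    print(f"Score distribution shift detected (KS={stat:.3f}); consider retraining")
```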
Aravind heads the Data Science organization at Tredence, where his team works on R&D and algorithm development for new Data Science solutions.
Talk | Machine Learning | AI for Good | All Levels
Rainforests are the most diverse, complex, and imperiled terrestrial ecosystems on Earth. Despite supporting the livelihoods of local communities, harboring immense biodiversity, regulating climate, and being the origin of most modern medicines, rainforests are undervalued, understudied, and overexploited.
In addition, rainforests are dense, vast, and home to notoriously harsh environmental conditions, all of which present barriers to current technology and research. The 5-year, $10M Rainforest XPRIZE attempts to solve this problem by improving our understanding of rainforest ecosystems. The competition challenges teams to integrate current and emerging technology such as data science, artificial intelligence, machine learning, robotics, and remote sensing to survey biodiversity in multiple stories of the rainforest, and to use that data to deliver insights in near real-time and in unprecedented detail. These insights will reveal the true value of standing forests, ultimately leading to their sustainable use and preservation globally.
Tune in to our talk to learn more and see how you can become involved in this groundbreaking competition for the benefit of humanity…more details
Peter Houlihan specializes in planning and leading expeditions into understudied and threatened rainforests all over the world for conservation. Regularly operating in more than 20 countries across Africa, the Americas, and Asia, Peter is passionate about working with local scientists and communities and inspiring others to learn about our natural world. He has lived and worked extensively throughout the tropics, where he has led nearly 50 large-scale expeditions and managed long-term conservation programs, particularly in Borneo, Madagascar, the Amazon, Central America, and the Congo Basin. A tropical ecologist and conservation scientist by training, Peter is a Senior Research Fellow with UCLA’s Center for Tropical Research, a frequent visiting scientist at the Smithsonian Tropical Research Institute in Panama, International Advisor for the Borneo Nature Foundation, and an Adjunct Professor at Johns Hopkins University, teaching international graduate-level field courses in tropical ecology and conservation.
Globally, Peter’s research, conservation work, and science communication have helped lead to the establishment of new protected areas and the documentation of species new to science. As a multilingual photographer, videographer, and producer, Peter fuses conservation science with high-impact media to inform broad international audiences about our planet. On camera, Peter has appeared alongside Sir David Attenborough in a BAFTA-nominated series for the BBC, across National Geographic platforms, and in an Emmy-nominated PBS documentary. A recent film about his scientific research in the Florida Everglades has received many accolades as an Official Selection of numerous film festivals. Peter is also a National Geographic Explorer, Photographer, and Expeditions Expert, a Fellow of the Explorers Club, and a gear tester for Patagonia.
Track Keynote | AI for Good | Research Frontiers | All Levels
This talk will describe the dramatic creation of the COVID-19 Open Research Dataset (CORD-19) and the broad range of efforts, both inside and outside of the Semantic Scholar project, to garner insights into COVID-19 and its treatment based on this data...more details
Dr. Oren Etzioni has served as the Chief Executive Officer of the Allen Institute for AI (AI2) since its inception in 2014. He has been a Professor at the University of Washington’s Computer Science department since 1991, and a Venture Partner at the Madrona Venture Group since 2000. He has garnered several awards including Seattle’s Geek of the Year (2013), the Robert Engelmore Memorial Award (2007), the IJCAI Distinguished Paper Award (2005), AAAI Fellow (2003), and a National Young Investigator Award (1993). He has been the founder or co-founder of several companies, including Farecast (sold to Microsoft in 2008) and Decide (sold to eBay in 2013). He has written commentary on AI for The New York Times, Nature, Wired, and the MIT Technology Review. He helped to pioneer meta-search (1994), online comparison shopping (1996), machine reading (2006), and Open Information Extraction (2007). He has authored over 100 technical papers that have garnered over 2,000 highly influential citations on Semantic Scholar. He received his Ph.D. from Carnegie Mellon in 1991 and his B.A. from Harvard in 1986.
Track Keynote
Scikit-learn was born as a tool from machine-learning geeks for computer geeks. But it has grown as an industry standard, used by many with various impacts on the world. Growing up brings new challenges. How can an open source, community-driven project address the needs of a diverse and huge user base?…more details
Gaël Varoquaux is a research director working on data science and health at Inria (the French national institute for research in computer science). His research focuses on using data and machine learning for scientific inference, with applications to health and social science, as well as developing tools that make it easier for non-specialists to use machine learning. He has long applied it to brain-imaging data to understand cognition. Years before the NSA, he was hoping to make bleeding-edge data processing available across new fields, and he has been working on a mastermind plan building easy-to-use open-source software in Python. He is a core developer of scikit-learn, joblib, Mayavi, and nilearn, a nominated member of the PSF, and often teaches scientific computing with Python using the scipy lecture notes.
Track Keynote | NLP | Intermediate
Measuring the impact of scientific articles is important for evaluating the research output of individual scientists, academic institutions and journals. While citations are raw data for constructing impact measures, there exist biases and potential issues if factors affecting citation patterns are not properly accounted for. In this work, we address the problem of field variation and introduce an article level metric useful for evaluating individual articles’ visibility. This measure derives from joint probabilistic modeling of the content in the articles and the citations among them using latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMSB). Our proposed model provides a visibility metric for individual articles adjusted for field variation in citation rates, a structural understanding of citation behavior in different fields, and article recommendations which take into account article visibility and citation patterns...more details
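As a rough orientation to the content-modeling half of this approach, the hedged sketch below fits a small LDA topic model to a handful of toy abstracts with scikit-learn. The citation model (MMSB), the joint estimation, and the visibility adjustment described in the talk are not shown, and the example texts are invented.

```python
# Minimal LDA topic model over toy article abstracts (content side only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "bayesian inference for network models",
    "deep learning for image classification",
    "stochastic blockmodels for community detection",
    "convolutional networks and transfer learning",
]

counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)   # per-article topic mixtures
print(doc_topics.round(2))
```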
Tian Zheng is Professor and Department Chair of Statistics at Columbia University. She develops novel methods for exploring and understanding patterns in complex data from different application domains. Her current projects are in the fields of statistical machine learning, spatiotemporal modeling, and social network analysis. Professor Zheng’s research has been recognized by the 2008 Outstanding Statistical Application Award from the American Statistical Association (ASA), the Mitchell Prize from ISBA, and a Google research award. She became a Fellow of the American Statistical Association in 2014. Professor Zheng is the recipient of Columbia’s 2017 Presidential Award for Outstanding Teaching. From 2018 to 2020, she served as chair-elect, chair, and past-chair of ASA’s section on Statistical Learning and Data Science.
Talk | Machine Learning | Deep Learning | Beginner-Intermediate
This session will begin by introducing the concept of Reinforcement Learning, as well as some common use cases. After the high-level introduction, a more formal mathematical framework will be introduced. Finally, we will review and demonstrate these ideas by building a Tic-Tac-Toe-playing AI, code-free, in the KNIME Analytics Platform…more details
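For readers who want a feel for what such an agent learns under the hood, here is a minimal, hedged sketch of the tabular Q-learning update that commonly drives simple game-playing agents, written in Python even though the session itself is code-free in KNIME. The board encoding, environment, and opponent are omitted, and the specific state strings are made up.

```python
# Tabular Q-learning update rule, the core of many simple game-playing agents.
from collections import defaultdict

alpha, gamma = 0.1, 0.9          # learning rate and discount factor
Q = defaultdict(float)           # Q[(state, action)] -> estimated value

def q_update(state, action, reward, next_state, legal_next_actions):
    """One Q-learning step: move Q(s, a) toward the observed one-step return."""
    best_next = max((Q[(next_state, a)] for a in legal_next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example: after playing square 4 from the empty board and receiving no reward,
# nudge that move's value toward the best estimate in the resulting position.
q_update("---------", 4, 0.0, "----X----", [0, 1, 2, 3, 5, 6, 7, 8])
print(Q[("---------", 4)])
```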
Corey studied Mathematics at Michigan State University and works as a Data Scientist at KNIME, where he focuses on Time Series Analysis, Forecasting, and Signal Analytics. He is the creator and instructor of the KNIME Time Series Analysis course, author of the e-book Alteryx to KNIME, creator of the KNIME Time Series Analysis components, and co-author of the upcoming Codeless Time Series Analysis book with Packt.
Track Keynote | Deep Learning | All Levels
Games have been used for decades as an important way to test and evaluate the performance of artificial intelligence systems. As capabilities have increased, the research community has sought games with increasing complexity that capture different elements of intelligence required to solve scientific and real-world problems. In recent years, StarCraft, considered to be one of the most challenging Real-Time Strategy (RTS) games and one of the longest-played esports of all time, has emerged by consensus as a “grand challenge” for AI research.
In this talk, I will introduce our StarCraft II program AlphaStar, the first Artificial Intelligence to reach Grandmaster status without any game restrictions. The focus will be on the technical contributions that made this milestone in AI possible…more details
Oriol Vinyals is a Principal Scientist at Google DeepMind and a team lead of the Deep Learning group. His work focuses on Deep Learning and Artificial Intelligence. Prior to joining DeepMind, Oriol was part of the Google Brain team. He holds a Ph.D. in EECS from the University of California, Berkeley and is a recipient of the 2016 MIT TR35 innovator award. His research has been featured multiple times in The New York Times, Financial Times, WIRED, the BBC, and elsewhere, and his articles have been cited over 85,000 times. Some of his contributions, such as seq2seq, knowledge distillation, and TensorFlow, are used in Google Translate, Text-To-Speech, and Speech Recognition, serving billions of queries every day. He was the lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft and achieved Grandmaster level, work that was featured on the cover of Nature. At DeepMind he continues working on his areas of interest, which include artificial intelligence, with particular emphasis on machine learning, deep learning, and reinforcement learning.
Talk | Machine Learning | Beginner-Intermediate
In this talk, ZS will showcase:
How to use graph neural networks to learn rich, latent representations for customers (customer embeddings) to encode interactions and behaviors.
Graph neural networks’ superior performance on a variety of customer-level ML tasks, such as prediction of brand adoption (pre-launch, early launch), channel preferences, and engagement with digital channels (e.g. email), compared to existing approaches; a minimal message-passing sketch follows this list…more details
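As a rough, framework-free illustration of the message-passing idea behind such customer embeddings, the sketch below averages each node's neighbors' features over a toy interaction graph and applies a transformation. The graph, features, and weights are synthetic, and this is not ZS's actual model.

```python
# One round of neighborhood message passing over a toy customer graph.
import numpy as np

rng = np.random.default_rng(0)
n_customers, dim = 5, 8
features = rng.normal(size=(n_customers, dim))        # raw customer features
adjacency = np.array([[0, 1, 1, 0, 0],                # customer interaction graph
                      [1, 0, 0, 1, 0],
                      [1, 0, 0, 0, 1],
                      [0, 1, 0, 0, 1],
                      [0, 0, 1, 1, 0]], dtype=float)

W = rng.normal(size=(dim, dim)) * 0.1                 # stand-in for learnable weights

def message_pass(H, A, W):
    """Average each customer's neighbours' representations, then transform."""
    deg = A.sum(axis=1, keepdims=True)
    neighbour_mean = (A @ H) / np.maximum(deg, 1.0)
    return np.tanh(neighbour_mean @ W)

embeddings = message_pass(features, adjacency, W)     # one layer of embeddings
print(embeddings.shape)                               # (5, 8)
```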
Srinivas leads the ZS AI Research Lab with a focus on frontier innovation and the development of cutting-edge algorithms. Srinivas’s core areas of expertise include automated machine learning, natural language processing, and marketing AI across industries. He has authored several thought-leadership articles and presented at conferences. Prior to joining ZS, Srinivas spent time as a solution architect building expert systems to automate product design and manufacturing across multiple industries, including automotive, power systems, medical devices, and retail.
Track Keynote
As Deep Neural Nets have come forward to provide the best accuracy on an ever-increasing number of tasks in computer vision, audio processing, natural language processing, and recommendation systems, efficient training and inference of Deep Neural Nets have naturally emerged as critical computing challenges of our time. While these Deep Neural Net computations are hosted on a very diverse range of processors, from hyperscale systems in datacenters to microcontrollers at the edge, there is a common set of underlying principles for making Deep Neural Nets fast and energy-efficient on these hosts…more details
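One widely used lever for efficient inference, regardless of host, is reducing numerical precision. The hedged sketch below applies PyTorch's post-training dynamic quantization to a toy model; it illustrates the general principle only and is not necessarily among the specific techniques covered in the keynote.

```python
# Post-training dynamic quantization: replace float32 Linear weights with int8.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10]), now with int8 weight kernels
```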
Kurt received his Ph.D. degree in Computer Science from Indiana University in 1984 and then joined the research division of AT&T Bell Laboratories. In 1991 he joined Synopsys, Inc. where he ultimately became Chief Technical Officer and Senior Vice-President of Research. In 1998 Kurt became Professor of Electrical Engineering and Computer Science at the University of California at Berkeley. Kurt’s research now focuses on systems issues associated with the application of Deep Learning to computer vision, speech recognition, natural language processing, and finance.
Kurt has published six books and over 250 refereed articles, and is among the most highly cited authors in Hardware and Design Automation. Kurt was elected a Fellow of the IEEE in 1996. At the 50th Design Automation Conference, Kurt received a number of awards reflecting achievements over the 50-year history of the conference, including “Top Ten Cited Author” and “Top Ten Cited Paper.” He was also recognized as one of only three people to have received four Best Paper Awards in the history of the conference. Kurt’s research on Deep Learning has also received Best Paper Awards at the Embedded Vision Workshop and at the International Conference on Parallel Processing.
As an entrepreneur Kurt has served as an angel investor and advisor to over twenty-five start-up companies including C-Cube Microsystems, Coverity, Simplex, and Tensilica. Kurt co-founded DeepScale with his PhD student Forrest Iandola. DeepScale was acquired by Tesla in 2019.
Talk | Deep Learning | Beginner
The talk will introduce Ludwig, a deep learning toolbox that allows users to train models and use them for prediction without writing code. It is unique in its ability to make deep learning easier to understand for non-experts and to enable faster model-improvement iteration cycles for experienced machine learning developers and researchers alike. By using Ludwig, experts and researchers can simplify the prototyping process and streamline data processing so that they can focus on developing deep learning architectures…more details
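To give a flavor of the declarative workflow, here is a hedged sketch using Ludwig's Python API: the model is described as a configuration rather than as code. The dataset name and column names are hypothetical, and exact field and argument names vary across Ludwig versions, so treat this as illustrative rather than as the definitive API.

```python
# Illustrative Ludwig workflow: declare features, then train and predict.
from ludwig.api import LudwigModel

config = {
    "input_features": [{"name": "review_text", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}

model = LudwigModel(config)
# 'reviews.csv' is a hypothetical dataset with 'review_text' and 'sentiment' columns.
train_results = model.train(dataset="reviews.csv")      # return shape varies by version
predictions = model.predict(dataset="reviews.csv")      # predictions for each row
```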
Piero Molino is a Senior Research Scientist at Uber AI with a focus on machine learning for language and dialogue. Piero completed a PhD on Question Answering at the University of Bari, Italy. He founded QuestionCube, a startup that built a framework for semantic search and QA, worked for Yahoo Labs in Barcelona on learning to rank, and worked at IBM Watson in New York on natural language processing with deep learning before joining Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, he became one of the founding members of Uber AI Labs. At Uber he works on research topics including Dialogue Systems, Language Generation, Graph Representation Learning, Computer Vision, Reinforcement Learning, and Meta Learning. He also worked on several deployed applications, such as COTA, an ML and NLP model for Customer Support; dialogue systems for driver hands-free dispatch, pickup, and communications; and the Uber Eats Recommender System. He is the author of Ludwig, a code-free deep learning toolbox.
Talk | DataOps and Management | AI for Good | Beginner-Intermediate
The Human Rights Data Analysis Group (HRDAG) uses methods from statistics and computer science to help answer questions about mass violence. Examples from HRDAG’s work will be presented to illustrate the many steps involved, including evaluating data quality, determining what’s missing from observed data, conducting analyses that withstand adversarial political and/or legal climates, and explaining analyses to non-technical audiences, including judges and prosecutors...more details
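One of the questions mentioned above, determining what is missing from observed data, is often approached with capture-recapture style estimation. The toy sketch below uses the simple two-list Lincoln-Petersen estimator with made-up numbers; HRDAG's actual multiple-systems estimation is considerably more sophisticated.

```python
# Toy two-list capture-recapture: estimate undocumented events from list overlap.
list_a = 400          # deaths documented by organization A
list_b = 300          # deaths documented by organization B
in_both = 120         # deaths appearing on both lists

estimated_total = list_a * list_b / in_both                       # Lincoln-Petersen
estimated_missing = estimated_total - (list_a + list_b - in_both) # never documented
print(round(estimated_total), round(estimated_missing))           # 1000, 420
```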
As the Executive Director of the Human Rights Data Analysis Group, Megan Price designs strategies and methods for statistical analysis of human rights data for projects in a variety of locations, including Guatemala, Colombia, and Syria. Her work in Guatemala includes serving as the lead statistician on a project in which she analyzed documents from the National Police Archive; she has also contributed analyses submitted as evidence in two court cases in Guatemala. Her work in Syria includes serving as the lead statistician and author on three reports, commissioned by the Office of the United Nations High Commissioner for Human Rights (OHCHR), on documented deaths in that country. Megan is a member of the Technical Advisory Board for the Office of the Prosecutor at the International Criminal Court and a Research Fellow at the Carnegie Mellon University Center for Human Rights Science. She is the Human Rights Editor for the Statistical Journal of the International Association for Official Statistics (IAOS) and serves on the editorial board of Significance Magazine. She earned her doctorate in biostatistics and a Certificate in Human Rights from the Rollins School of Public Health at Emory University. She also holds a master of science degree and a bachelor of science degree in Statistics from Case Western Reserve University.
Talk | Data Visualization | Advanced
Cloud migration and modernization projects might not be the easiest to execute, but there is a better way of doing them than turning them into a trial-and-error process. In this session you will learn how combining a hybrid cloud data warehouse with automated lineage will help you prepare for migration and modernization projects and accelerate their execution...more details
Ernie is SVP of Products at MANTA, focusing on solutions for lineage and metadata integration. He has over thirty years of experience in the data integration space, including twenty-plus years at IBM, working in a variety of roles with responsibilities in product management and technical sales support. For most of the past decade Ernie has been providing guidance in information governance and helping architect custom lineage solutions. Earlier in his career, Ernie was building decision support systems with fourth generation languages and data access middleware. Ernie maintains a blog on open metadata, data lineage and overall metadata management and governance. He is a graduate of Boston College. While not pursuing MANTA initiatives, working on his blog, or spending time with family, Ernie enjoys woodturning and motorcycles.
Talk | Deep Learning | Machine Learning | Beginner-Intermediate
Twitter is what’s happening in the world right now. In order to understand and organize content on the platform, we leverage a semantic text representation that is useful across a variety of tasks. Because content on Twitter spans a wide range of diverse topics and is constantly changing, supervised training that traditionally relies on human-annotated corpora proves to be expensive and unscalable.
Sijun He and Kenny Leung share their experience building and serving self-supervised content representations for heterogeneous content on Twitter. They also highlight various applications of content embeddings in recommendation systems, as well as the engineering challenge of maintaining such embeddings at scale...more details
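As a generic, hedged illustration of turning text into reusable content embeddings, the sketch below mean-pools token representations from a pretrained transformer (bert-base-uncased is used purely as a stand-in) and compares two short texts with cosine similarity. Twitter's actual self-supervised models, training objectives, and serving infrastructure are not represented here.

```python
# Generic content embeddings: mean-pool transformer token states, then compare.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch)[0]                     # last hidden state: (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)        # mean over non-padding tokens

vecs = embed(["breaking news about the election", "latest election results"])
similarity = torch.nn.functional.cosine_similarity(vecs[0:1], vecs[1:2])
print(similarity.item())
```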
Sijun He is a machine learning engineer at Twitter Cortex, where he works on content understanding with deep learning and NLP. Previously, he was a data scientist at Autodesk. Sijun holds an MS in statistics from Stanford University.
Kenny is interested in bringing human capabilities – particularly language, vision, and the acquisition of everyday knowledge – to modern technology.
Talk | Deep Learning | Intermediate-Advanced
Much real-world data, such as medical records, customer interactions, or financial transactions, is recorded at irregular time intervals. However, most deep learning time series models, such as recurrent neural networks, require data to be recorded at regular intervals, such as hourly or daily. I’ll explain some recent advances in building deep stochastic differential equation models that specify continuous-time dynamics. This allows us to fit a new family of richly-parameterized distributions over time series. We’ll discuss their strengths and limitations, and demonstrate these models on medical records and motion capture data. All the tools discussed are open-source…more details
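To make the continuous-time idea concrete, here is a hedged sketch that evaluates a small neural-network-parameterized ODE at irregularly spaced observation times using the open-source torchdiffeq solver. This shows only the deterministic ODE building block, not the stochastic differential equation models or the fitting procedure discussed in the talk; the dimensions and time stamps are arbitrary.

```python
# Evaluate learned continuous-time dynamics at irregular observation times.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class Dynamics(nn.Module):
    """dz/dt = f(t, z), parameterized by a small neural network."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, z):
        return self.net(z)

z0 = torch.randn(1, 4)                                   # latent state at time 0
obs_times = torch.tensor([0.0, 0.3, 1.7, 2.2, 5.9])      # irregularly spaced times
trajectory = odeint(Dynamics(4), z0, obs_times)          # latent state at each time
print(trajectory.shape)                                  # torch.Size([5, 1, 4])
```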
David Duvenaud is an assistant professor in computer science and statistics at the University of Toronto, where he holds a Canada Research Chair in generative models. His postdoctoral research was done at Harvard University, where he worked on hyperparameter optimization, variational inference, and automatic chemical design. He did his Ph.D. at the University of Cambridge, studying Bayesian nonparametrics with Zoubin Ghahramani and Carl Rasmussen. David spent two summers in the machine vision team at Google Research, and also co-founded Invenia, an energy forecasting and trading company. David is a founding member of the Vector Institute for Artificial Intelligence.
Talk | AI for Good | Research Frontiers | Intermediate
We recently fit a series of models to account for uncertainty and variation in coronavirus tests (see here: http://www.stat.columbia.edu/~gelman/research/published/specificity.pdf). We will talk about the background of this problem and our analysis, and then we will expand into a general discussion of Bayesian workflow...more details
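A small worked example helps show why test uncertainty matters: with imperfect sensitivity and specificity, the raw positivity rate is a biased estimate of prevalence. The point-estimate correction below uses invented numbers; the models discussed in the talk go further and propagate uncertainty in all three quantities within a full Bayesian workflow.

```python
# Correct a raw positivity rate for imperfect test sensitivity and specificity.
raw_positive_rate = 0.015   # 1.5% of sampled people test positive
sensitivity = 0.80          # P(test positive | infected)
specificity = 0.995         # P(test negative | not infected)

# raw rate = prevalence*sensitivity + (1 - prevalence)*(1 - specificity)
prevalence = (raw_positive_rate - (1 - specificity)) / (sensitivity - (1 - specificity))
print(f"{prevalence:.3%}")  # about 1.258%, noticeably below the raw 1.5%
```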
Andrew Gelman is a professor of statistics and political science and director of the Applied Statistics Center at Columbia University. He has received the Outstanding Statistical Application award from the American Statistical Association, the award for best article published in the American Political Science Review, and the Council of Presidents of Statistical Societies award for outstanding contributions by a person under the age of 40. His books include Bayesian Data Analysis (with John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Don Rubin), Teaching Statistics: A Bag of Tricks (with Deb Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, and Jeronimo Cortina), and A Quantitative Tour of the Social Sciences (co-edited with Jeronimo Cortina). Andrew has done research on a wide range of topics, including: why it is rational to vote; why campaign polls are so variable when elections are so predictable; why redistricting is good for democracy; reversals of death sentences; police stops in New York City, the statistical challenges of estimating small effects; the probability that your vote will be decisive; seats and votes in Congress; social network structure; arsenic in Bangladesh; radon in your basement; toxicology; medical imaging; and methods in surveys, experimental design, statistical inference, computation, and graphics.
Track Keynote | NLP | All Levels
Our research focuses on the language complexity of earnings calls and its impact on volatility. The hypothesis is that the more complex the language used during an earnings call, the more difficult it will be for market participants to process the additional information conveyed. Therefore, management’s choice of words and use of language affect participants’ ability to determine the value of the firm, which will lead to higher stock price volatility following the earnings calls. With our research design, use of a novel measure of idiosyncratic volatility, and access to a unique dataset produced using NLP, machine learning (ML), and linguistic text processing (LTP) techniques, we are finding convincing support for our hypothesis. In this talk, Dr. Karagozoglu will present details of our methodology and dataset and share our findings with the audience. A discussion of further applications of NLP, ML, and data science in financial markets, especially in volatility estimation, trading, and risk management, will wrap up the talk...more details
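As a hedged, generic illustration of quantifying language complexity in a transcript, the snippet below computes two standard readability statistics with the textstat package. These are not the study's actual linguistic measures, and the example sentence is invented.

```python
# Generic readability proxies for the complexity of an earnings-call excerpt.
import textstat

transcript = (
    "Our consolidated adjusted EBITDA margin expanded notwithstanding "
    "idiosyncratic headwinds attributable to heterogeneous channel dynamics."
)

print(textstat.flesch_reading_ease(transcript))   # lower score = harder to process
print(textstat.gunning_fog(transcript))           # higher score = more complex language
```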