ODSC West 2022
Preliminary Schedule
more sessions added weekly

Virtual | Talk | Machine Learning | Beginner – Intermediate
In this talk, we will present the challenges of learning from dirty data, survey data poisoning attacks on systems such as spam detection, image classification and rating systems, discuss the problem of learning from web traffic – probably the dirtiest data in the world, and explain different approaches for learning from dirty and poisoned data. We will focus on threshold-learning mitigation for data poisoning, which aims to reduce the impact of any single data source, and discuss a mundane but crucial aspect of threshold learning – memory complexity. We will present a robust learning scheme optimized to work efficiently on streamed data with bounded memory consumption. We will give examples from the web security arena with robust learning of URLs, parameters, character sets, cookies and more…more details
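As a rough illustration of the threshold idea described above, the sketch below accepts a value (for example a URL path) only after it has been reported by a minimum number of distinct sources, so a single noisy or malicious source cannot teach the model anything by itself. The class name, the eviction policy and the parameters are hypothetical and are not taken from the talk; the speakers' actual scheme may differ substantially.

```python
from collections import OrderedDict


class ThresholdLearner:
    """Hypothetical sketch: learn a value only after `threshold` distinct sources report it."""

    def __init__(self, threshold: int, max_candidates: int):
        self.threshold = threshold
        self.max_candidates = max_candidates
        self.learned = set()             # accepted values (this set is not bounded here)
        self.candidates = OrderedDict()  # pending value -> set of source ids seen so far

    def observe(self, source: str, value: str) -> bool:
        """Process one streamed observation; return True if the value is (now) learned."""
        if value in self.learned:
            return True
        sources = self.candidates.setdefault(value, set())
        self.candidates.move_to_end(value)  # mark as most recently seen
        sources.add(source)                 # repeats from the same source add nothing
        if len(sources) >= self.threshold:
            del self.candidates[value]
            self.learned.add(value)
            return True
        if len(self.candidates) > self.max_candidates:
            # Assumed policy: evict the least recently seen pending value
            # so the pending state stays within a fixed memory budget.
            self.candidates.popitem(last=False)
        return False


learner = ThresholdLearner(threshold=3, max_candidates=2)
for _ in range(100):
    learner.observe("attacker", "/evil")   # one source, many repeats
for source in ["client-a", "client-b", "client-c"]:
    learner.observe(source, "/home")       # three independent sources
```

In this sketch the pending state holds at most `max_candidates` values with fewer than `threshold` source ids each, which is one simple way to keep memory bounded on a stream.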
Virtual | Talk | Machine Learning | Research Frontiers | Intermediate-Advanced
This talk will overview our recent work on reasoning about the behavior of learned classifiers. I will cover techniques to inject domain knowledge into machine learning models, for example by enforcing monotonicity constraints, or logical structure, on neural network outputs…more details
In-person | Half-Day Training | Beginner-Intermediate
This course will examine anomaly detection through the example of fraud, but all these techniques can be applied to other areas as well. We will start with the importance of feature creation and transformation. We will then cover more statistical based approaches to anomaly detection. Last, we will end with more machine learning based approaches to allow the learner to approach anomalies from any angle and industry need…more details
A Teaching Associate Professor at the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management.
Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government.
Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.
Virtual | Talk | All Levels
Data-centric AI broadly describes the idea that *data*, rather than models, is increasingly the crux of success or failure in AI for many settings and use cases. More specifically, data-centric AI defines ML development workflows that center around principally iterating on the *training data*–e.g. labeling, sampling, slicing, augmenting, etc.–rather than the model architecture. In this talk, I’ll describe how programmatic or weak supervision can not only facilitate these data-centric workflows (in ways that manual labeling cannot), but more importantly, will present an overview about how it can serve as an API for rich organizational knowledge sources, presenting recent technical results and user case studies…more details
Alex Ratner has a Ph.D. in computer science from Stanford, advised by Chris Re, where his research focuses on weak supervision: the idea of using higher-level, noisier input from domain experts to train complex state-of-the-art models where limited or no hand-labeled training data is available. He leads the development of the Snorkel framework (snorkel.stanford.edu) for weakly supervised ML, which has been applied to machine learning problems in domains like genomics, radiology, and political science. He is supported by a Stanford Bio-X SIGF fellowship.
In-person | Talk
In this talk, we explore:
- Challenges of building recommender systems
- Strategies for reducing latency, while balancing requirements for freshness
- Challenges in mitigating data quality issues
- Technical and organizational challenges feature stores solve
- How to integrate Feast, an open-source feature store, into an existing recommender system to support production systems…more details
Danny Chiao is an engineering lead at Tecton/Feast Inc working on building a next-generation feature store. Previously, Danny was a technical lead at Google working on end to end machine learning problems within Google Workspace, helping build privacy-aware ML platforms / data pipelines and working with research and product teams to deliver large-scale ML powered enterprise functionality. Danny holds a Bachelor’s degree in Computer Science from MIT.
Virtual | Talk | All Levels
Coming Soon!
Stefano Ermon is an Associate Professor of Computer Science in the CS Department at Stanford University, where he is affiliated with the Artificial Intelligence Laboratory, and a fellow of the Woods Institute for the Environment. His research is centered on techniques for probabilistic modeling of data and is motivated by applications in the emerging field of computational sustainability. He has won several awards, including Best Paper Awards (ICLR, AAAI, UAI and CP), an NSF CAREER Award, ONR and AFOSR Young Investigator Awards, a Microsoft Research Fellowship, a Sloan Fellowship, and the IJCAI Computers and Thought Award. Stefano earned his Ph.D. in Computer Science at Cornell University in 2015.
In-person | Talk | Beginner-Intermediate
Did my data change after a certain intervention? This is a common question with data observed over time. Classical statistical and engineering approaches include control charts to see if the series falls outside of the normal boundaries of expected data. A Bayesian approach to this problem calculates the probability that the data series changes at every point along the series. Bayesian change point analysis allows the analyst to evaluate a whole series and look where the highest probability of change occurred. Has the financial asset lost value after the recent financial report? Are the healthcare outcomes at this hospital better after our new process to help patients? Did the manufacturing process improve after upgrading the machinery? All these questions and more can be answered with these techniques which will be shown in R…more details
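A minimal sketch of the idea in this abstract, written in Python rather than the R used in the talk. It assumes a single change in the mean of Gaussian data with known noise level `sigma`, a zero-mean Normal prior with standard deviation `prior_sd` on each segment mean, and a uniform prior over the change location. These modelling choices and all function names are illustrative assumptions, not the speaker's method.

```python
import math
from typing import List


def segment_log_evidence(xs: List[float], sigma: float, prior_sd: float) -> float:
    """Log marginal likelihood of one segment with its unknown mean integrated out."""
    m = len(xs)
    s = sum(xs)
    q = sum(x * x for x in xs)
    var, pvar = sigma ** 2, prior_sd ** 2
    return (
        -0.5 * m * math.log(2 * math.pi * var)
        - 0.5 * math.log(1 + m * pvar / var)
        - q / (2 * var)
        + pvar * s * s / (2 * var * (var + m * pvar))
    )


def change_point_posterior(xs: List[float], sigma: float = 1.0,
                           prior_sd: float = 10.0) -> List[float]:
    """Posterior probability that the change happens after the first t points, t = 1..n-1."""
    n = len(xs)
    log_post = [
        segment_log_evidence(xs[:t], sigma, prior_sd)
        + segment_log_evidence(xs[t:], sigma, prior_sd)
        for t in range(1, n)
    ]
    peak = max(log_post)  # subtract the maximum before exponentiating for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Toy series: 20 points near 0 followed by 20 points near 5.
series = [0.1, -0.1] * 10 + [5.1, 4.9] * 10
posterior = change_point_posterior(series)
best_t = posterior.index(max(posterior)) + 1  # number of points before the change
```

Each entry of `posterior` is the probability that the series changed at that position, which is the quantity an analyst would scan for its highest values.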
A Teaching Associate Professor at the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management.
Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government.
Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.
In-person | Talk | Deep Learning | Machine Learning | ML Ops and Data Engineering | All Levels
In this talk, we discuss these challenges and share our findings and recommendations from working on real-world examples at SPINS, a data/tech company focused on the natural grocery industry. More specifically, we describe how we leverage state-of-the-art language models to seamlessly automate parts of SPINS’ data ingestion workflow and drive substantial business outcomes. We provide a walk-through of our end-to-end MLOps system and discuss how using the right tools and methods has helped to mitigate some of these challenges. We also share our findings from our experimentation and provide insights on when one should use these massive transformer models instead of classical ML models. Considering that we have a variety of challenges in our use cases, from an ill-defined label space to a huge number of classes (~86,000) and massive data imbalance, we believe our findings and recommendations can be applied to most real-world settings. We hope that the learnings from this talk can help you to solve your own problems more effectively and efficiently!…more details
Azin Asgarian is currently an applied research scientist on Georgian’s R&D team where she works with companies to help adopt applied research techniques to overcome business challenges. Prior to joining Georgian, Azin was a research assistant at the University of Toronto and part of the Computer Vision Group where she was working on the intersection of Machine Learning, Transfer Learning, and Computer Vision.
Elliot Henry is a Data Science Manager at SPINS; he is responsible for the successful execution of data science projects and the overall support and care of his team. He has experience managing projects from the ideation phase through delivery of the final product, and is passionate about building end-to-end machine learning systems using software-based methods. Prior to SPINS, Elliot worked as a data scientist in the fields of retail, marketing, and digital. He holds a B.A. in Biochemistry from Dartmouth College and an M.S. in Analytics from the University of Chicago. In his free time, he enjoys playing board games with friends.
In-person | Talk | Responsible AI | Beginner
There are many types of users and stakeholders that require Explainable AI. Without explanations, end-users are less likely to trust and adopt ML-based technologies. Without a means of understanding model decision-making, business stakeholders have a difficult time assessing the value and risks associated with launching a new ML-based product. And without insights into why an ML application is behaving in a certain way, application developers have a harder time troubleshooting issues, and ML scientists have a more difficult time assessing their models for fairness and bias. To further complicate an already challenging problem, the audiences for ML model explanations come from varied backgrounds, have different levels of experience with statistics and mathematical reasoning, and are subject to cognitive biases…more details
Meg is currently a UX Researcher for Google Cloud AI and Industry Solutions, where she focuses her research on Explainable AI and Model Understanding. She has had a varied career working for start-ups and large corporations alike across fields such as EdTech, weather forecasting, and commercial robotics. She has published articles on topics such as information visualization, educational-technology design, human-robot interaction (HRI), and voice user interface (VUI) design. Meg is also a proud alumna of Virginia Tech, where she received her Ph.D. in Human-Computer Interaction (HCI).
In-person | Talk | MLOps and Data Engineering | Intermediate
This session will explore the challenges our clients face in data sourcing, data preparation and model evaluation by humans, and discuss how automated labelling and responsible AI optimize predictive modeling. Our expert analysts will examine the technologies and processes used to enable success for clients across data types and industries…more details
Rachel is a Product Manager in Appen’s Autonomous Vehicles working group. In that role, she is working to provide high quality data on all levels of autonomy for motor vehicle clients. Prior to joining Appen, Rachel worked on data science tools to enable model interpretability, fairness testing and automated machine learning. Other passions of hers include using AI and technology to act as a catalyst towards solving humanitarian-centered problems for non-profits around the world.
In-person | Talk | Machine Learning | Data Science Research Frontiers | All Levels
Heart rate variability biofeedback (HRV-B) is a clinically effective therapy in which patients can improve their mental and physical well-being through real-time monitoring of the heart-rate and specialized breathing techniques. HRV-B can improve health outcomes in a number of medical or wellness-related conditions, ranging from depression and anxiety, to cardiovascular disease, asthma, cancer fatigue, women’s health, better sleep, peak athletic performance, and stress resilience…more details
In-person | Talk | ML Ops and Data Engineering | Data Science Research Frontiers | Data Analytics | Beginner – Intermediate
We at LinkedIn leverage Jupyter notebooks extensively to do ad-hoc data analysis, and our data scientists, engineers and developers spend a lot of time iterating over the query development lifecycle. We have created a hosted notebook platform at LinkedIn (for our internal users) called Darwin (Data Analytics & Relevance Workbench at LinkedIn) – a one-stop solution for a complete Jupyter notebook/query life cycle (query development, query testing and query productionizing)…more details
In-person | Talk
In this talk we navigate through the latest buzz around semantic search and separate the noise from the meaningful advancements. Is dense retrieval better than BM25’s keyword search? Do large language models outperform smaller transformers? How well do the models generalize to industry corpora? How can we leverage Question Answering? We will benchmark different methods, share best practices from industry use cases and show how you can use the open source framework Haystack to build, test and deploy stellar search pipelines easily yourself…more details
Malte Pietsch is CTO & Co-Founder at deepset. His current focus is on building deepset Cloud – a SaaS platform for developers to build, deploy and operate modern NLP pipelines. He holds an M.Sc. with honors from TU Munich and conducted research at Carnegie Mellon University. Before founding deepset he worked as a data scientist for multiple startups. He is an active open-source contributor and author of the NLP framework Haystack.
Virtual | Talk | Machine Learning | All Levels
Coming Soon!
Jacob Schreiber is a fifth-year Ph.D. student and NSF IGERT big data fellow in the Computer Science and Engineering department at the University of Washington. His primary research focus is on the application of machine learning methods, primarily deep learning ones, to the massive amount of data being generated in the field of genome science. His research projects have involved using convolutional neural networks to predict the three-dimensional structure of the genome and using deep tensor factorization to learn a latent representation of the human epigenome. He routinely contributes to the Python open source community, currently as the core developer of the pomegranate package for flexible probabilistic modeling, and in the past as a developer for the scikit-learn project. Future projects include graduating.
In-person | Talk | Deep Learning | ML Ops and Data Engineering | Intermediate – Advanced
Only a significant minority of companies unlock the true potential of AI as trained models accumulate dust due to challenges in MLOps. Serving reliable AI predictions to customers involves cost, effort, and planning to set up a continuous deployment pipeline. MLOps for Deep Learning demands a carefully crafted deployment pipeline. We discuss our open-source project which is a robust continuous deployment pipeline by integrating our unique drift detection and model retrain algorithms for serving DL models. We show how to efficiently deploy, monitor, and maintain DL models in production using our solution which is a Kubernetes native POC solution…more details
Diego Klabjan is a professor at Northwestern University, Department of Industrial Engineering and Management Sciences. He is also Founding Director, Master of Science in Analytics, and the Deep Learning Lab. His expertise is focused on data science and deep learning with a concentration in finance, insurance, and healthcare. Professor Klabjan has led projects with large companies such as The Chicago Mercantile Exchange Group, Intel, General Motors and many others, and he is also assisting numerous start-ups with their analytics needs. He is also a founder of Opex Analytics.
Yegna Jambunath is a Researcher at Centre for Deep Learning, Northwestern University. Yegna has six years of total work experience with four years of industry focused research experience in ML and Data Science. His areas of interest are MLOps, ML in Healthcare and RL.
In-person | Talk | Machine Learning | Data Analytics | Responsible AI | Intermediate
In his talk “Responsible AI Is Not an Option,” Dr. Scott Zoldi brings to bear his decades of experience in delivering analytic innovation in a highly regulated environment, to underscore the urgency with which the topics of AI fairness and bias must be ushered onto Boards of Directors’ agendas. In his dynamic presentation, Dr. Zoldi spells out the “why” and “how” of fulfilling the social covenant and soon, regulatory requirements of enterprises using AI ethically, transparently, securely and in their customers’ best interests…more details
Scott Zoldi is chief analytics officer at FICO responsible for advancing the company's leadership in artificial intelligence (AI) and analytics in its product and technology solutions. At FICO Scott has authored more than 120 analytic patents, with 71 granted and 49 pending. Scott is actively involved in the development of analytics applications, Responsible AI technologies and AI governance frameworks, the latter including FICO's blockchain-based model development governance methodology. Scott is a member of the Board of Advisors of FinRegLab, a Cybersecurity Advisory Board Member of the California Technology Council, and a Board Member of Tech San Diego and the San Diego Cyber Center of Excellence. He is also a member of the CNBC Technology Executive Council. Scott received his Ph.D. in theoretical and computational physics from Duke University.
In-person | Talk | Deep Learning | Machine Learning | Intermediate
In this talk I will give an overview of this problem, what makes it difficult, how we have been addressing it at Wayfair, and the lessons that we have learned so far. I will start by giving an overview of the different types of competing business objectives that can arise in the e-commerce use case and how they compete against each other in practice. Along the way I will introduce some of the fundamental concepts of multi-objective optimization, including Pareto Efficiency and the Pareto Frontier, and how they relate to this problem. Finally, I will discuss the pros and cons of various strategies for making recommendation systems profit aware…more details
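To make the Pareto concepts mentioned above concrete, here is a small sketch that filters candidate recommendations scored on two objectives, both to be maximized. The objective names (relevance and profit), the scores and the function names are illustrative assumptions and do not come from the talk.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (relevance, profit); both objectives are maximized


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the Pareto-efficient points, i.e. those no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (relevance, profit) scores for five candidate items.
candidates = [(0.9, 1.0), (0.8, 3.0), (0.5, 2.0), (0.4, 5.0), (0.9, 0.5)]
frontier = pareto_frontier(candidates)
# frontier == [(0.9, 1.0), (0.8, 3.0), (0.4, 5.0)]
```

Every point on the frontier represents a different trade-off: moving along it gains on one objective only by giving up some of the other, so choosing among frontier points is a business decision rather than an optimization one.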
Ali Vanderveld is a Senior Staff Data Scientist at Wayfair, where she serves as a technical leader for machine learning, currently leading the development of novel search and recommendation technologies. Prior to Wayfair, she led a team focused on language AI at Amazon Web Services and was the Director of Data Science at ShopRunner. She has also worked at Civis Analytics, at Groupon, and as a technical mentor for the Data Science for Social Good Fellowship. Ali has a PhD in theoretical astrophysics from Cornell University and got her start working as an academic researcher at Caltech, the NASA Jet Propulsion Laboratory, and the University of Chicago, working on the development teams for several space telescope missions, including ESA’s Euclid.
In-person | Talk
Differential privacy can be viewed as a technical solution for protecting individual privacy to meet legal or policy requirements for disclosure limitation while analyzing and sharing personal data. Using examples and some mathematical formalism, this talk will introduce differential privacy concepts including the definition of differential privacy, how differentially private analyses are constructed, and how these can be used in practice…more details
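As a pointer to the kind of formalism involved: a randomized mechanism M is ε-differentially private if, for any two datasets D and D' that differ in one individual's record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. The sketch below shows one standard construction, the Laplace mechanism applied to a counting query. It is not taken from the talk, the function names and data are hypothetical, and it is for illustration only since it ignores floating-point subtleties that matter in real deployments.

```python
import random
from typing import Callable, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values: Sequence[int], predicate: Callable[[int], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy the predicate."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: how many people are 40 or older?
ages = [23, 35, 41, 52, 67, 29, 38]
rng = random.Random(0)
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy; a larger one gives a more accurate answer with a weaker guarantee.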
Veena Mendiratta is an applied researcher in network reliability and analytics at Nokia Bell Labs based in Naperville, Illinois, USA. Her research interests include network dependability, software reliability engineering, programmable networks resiliency, and telecom data analytics. Current work is focused on network reliability and analytics – architecting and modeling the reliability of next-generation programmable networks; and the development of analytics-based algorithms for anomaly detection, network slicing and network control for improving network performance and reliability. She has led projects on customer experience analytics using data mining and social network analysis techniques, and the development of algorithms and visual analytics for anomaly detection in telecommunications networks. She is a member of the SIAM Visiting Lecturer Program, Life Member of SIAM, Senior Member of IEEE, Member of INFORMS; member of ASA; and was a Fulbright Specialist Scholar for 5 years during which time she visited universities in India, Norway and New Zealand. She holds a B.Tech in engineering from IIT-Delhi, India, and a Ph.D. in operations research from Northwestern University, USA.
In-person | Talk
In this talk, I will provide a holistic review of the research methods and tools for causal analytics in business decisions. I focus especially on causal inference in data science. I will discuss a decision tree that helps data scientists to identify the best causal research method based on the problem, context, and the nature of the data. I will draw on my proprietary research on prescriptive analytics (https://docs.google.com/document/d/1b8yaDzriVB2JyIBNQMsUn-uz4bXnsdFe6hTLLTOs1q4/edit)…more details
Dr. Victor Zitian Chen, CFA, is a believer and action-taker on the idea of a world brain. Dr. Chen is currently the Director of Data Analytics and Insights, Experimental Design and Causal Inference at Fidelity Investments. He leads the causal analytics efforts across the personal investing business at Fidelity, including experimentation, prescriptive analytics, and causal knowledge graph-based applications. Before joining Fidelity, Dr. Chen was a tenured professor in management and data science at the University of North Carolina, Charlotte, and a visiting professor in international business at Copenhagen Business School, Denmark. He led two major National Science Foundation (NSF) grants focusing on causal knowledge graph-based explainable AI and analytics applications. He founded and led the Global OpenLabs for Performance Enhancement-Analytics and Knowledge System (GoPeaks) – a startup to advance and commercialize knowledge synthesis and causal/prescriptive analytics solutions for business decisions.
In-person | Demo Talk
In this demo, we’ll cover the best practices for setting up data science environments, how to store them so you can replicate your code, and how to share a setup with your coworkers…more details
Dr. Jacqueline Nolis is a data science leader with 15 years of experience in running data science teams and projects at companies ranging from Airbnb to Boeing. She is the Chief Product Officer at Saturn Cloud where she helps design products for data scientists. Jacqueline has a PhD in Industrial Engineering and her academic research focused on optimization under uncertainty. Data science is also her hobby—like making an R package that mails physical postcards of your plots.
Virtual | Bootcamp | Data Analytics
Data literacy is an essential foundational topic for anyone considering data science or machine learning. This session will help one understand core data workflow concepts including what is data, data generation and collecting, data profiling, data transformation, data shaping, and other essential workflow topics. As this is an interactive training session, in addition to covering these data topics, we will layer on SQL and an introduction to relational databases…more details
Sheamus McGovern is the founder of ODSC (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.
In-person | Half-Day Training | Machine Learning | Data Visualization | Intermediate-Advanced
While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood…more details
Stefanie Molin is a data scientist and software engineer at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
In-person | Half-Day Training | Machine Learning | All Levels
By completing this hands-on workshop, you will develop an understanding of machine learning concepts and methodologies and learn how to fit, tune, and evaluate the predictive performance of a variety of parametric and non-parametric models for classification and regression. You will become familiar with how to preprocess data, build, tune, and cross-validate predictive models, and make predictions with the models in Python…more details
Clinton Brownley, Ph.D., is a data scientist at Meta (formerly Facebook), where he’s responsible for a variety of analytics projects designed to empower employees to do their best work. Prior to this role, he was a data scientist at WhatsApp, working to improve messaging and VoIP calling performance and reliability. Before WhatsApp, he worked on large-scale infrastructure analytics projects to inform hardware acquisition, maintenance, and data center operations decisions at Facebook.
As an avid student and teacher of modern data analysis and visualization techniques, Clinton teaches a graduate course in interactive data visualization for UC Berkeley’s MIDS program, taught a short-term graduate course in regression analysis and machine learning workshop for NYU’s A3SR program, leads an annual machine learning in Python workshop, and is the author of two books, “Foundations for Analytics with Python” and “Multi-objective Decision Analysis”.
Clinton is a past-president of the San Francisco Bay Area Chapter of the American Statistical Association and is a council member for the Section on Practice of the Institute for Operations Research and the Management Sciences. Clinton received degrees from Carnegie Mellon University and American University.
Virtual | Workshop | Deep Learning | MLOps & Data Engineering | Intermediate
OpenMPI is used for high-performance computing at supercomputing centers as part of distributed computing systems. In the first half of this workshop, we will get to know more about OpenMPI and work with it hands-on using a Python interface. The second half of the workshop will focus on using DeepSpeed on OpenMPI and work through a few examples. Examples for MPI basics will include inferring pi using distributed computing and running Python scripts using OpenMPI…more details
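The pi example mentioned above is typically a Monte Carlo estimate split across workers. The sketch below shows only the structure of that computation in plain Python, with a loop standing in for the separate processes: it uses no MPI library, and the abstract does not say which Python interface the workshop uses. In a real MPI run each worker would be its own process and the final sum would be a reduce operation. All names here are hypothetical.

```python
import random
from typing import List


def count_hits(n_samples: int, seed: int) -> int:
    """Work done by one worker: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)  # a distinct seed per worker keeps the streams different
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers: int, samples_per_worker: int) -> float:
    """Combine the partial counts; the sum plays the role of a reduce step."""
    partial: List[int] = [
        count_hits(samples_per_worker, seed=rank) for rank in range(n_workers)
    ]
    total_hits = sum(partial)
    return 4.0 * total_hits / (n_workers * samples_per_worker)


pi_estimate = estimate_pi(n_workers=4, samples_per_worker=50_000)
```

Because the workers never need to communicate until the final sum, this kind of problem scales almost linearly with the number of processes, which is why it is a common first exercise in distributed computing.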
Jennifer Davis, Ph.D. is a Staff Field Data Scientist at Domino Data Labs, where she empowers clients on complex data science projects. She has completed two postdocs in computational and systems biology, trained at a supercomputing center at the University of Texas, Austin, and worked on hundreds of consulting projects with companies ranging from start-ups to the Fortune 100. Jennifer has previously presented topics for Association for Computing Machinery on LSTMs and Natural Language Generation and at conferences across the US and in Italy. Jennifer was part of a panel discussion for an IEEE conference on artificial intelligence in biology and medicine. She has practical experience teaching both corporate classes and at the college level.
Virtual | Workshop
Regardless of their concrete application, the primary goal of forecasting systems is to produce the most accurate forecast possible. However, while beating benchmarks is important, a forecast useable in business processes additionally needs to fulfill many more criteria, which significantly increases the complexity of real-world solutions…more details
Virtual | Workshop | NLP | Machine Learning | Beginner – Intermediate
In this talk, I will give some background on Conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances that we see. I will walk the audience through hands-on examples and show how they can leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications such as text classification, summarization and question-answering…more details
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, wherein, his team is transforming AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got-It, Chandra was leading various AI applied and research groups at Uber, Amazon Alexa and eBay.
At Uber, he was leading Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon he was a founding member of the Alexa Prize Competition and Alexa AI, where he led R&D and had the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which are considered the holy grail of Conversational AI and one of the open-ended problems in AI. And at eBay he was driving applied research projects related to NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
In-person | Workshop | Machine Learning | Beginner-Intermediate
This workshop will teach hands-on coding techniques covering all the foundations of a complete churn-fighting data pipeline, including churn measurement, feature engineering from raw data, data set creation, and machine learning. The workshop is taught using Python and SQL from the open source fightchurn package (PyPI: pypi.org/project/fightchurn/, GitHub: github.com/carl24k/fight-churn). Participants should come with their own laptop prepared with Python and an IDE such as PyCharm or VSCode that will allow them to put breakpoints in code that will be demonstrated and discussed…more details
Carl Gold is currently the Data Science Director at OfferFit.ai, an AI-as-a-Service reinforcement learning engine that maximizes customer upsell and retention. Before coming to OfferFit, Carl was Chief Data Scientist of Zuora, the Subscription Economy leading billing platform. Based on his experiences fighting churn for SaaS companies during his time at Zuora, Carl wrote the first book dedicated to customer churn analytics and data science: “Fighting Churn With Data”. Carl has a PhD from the California Institute of Technology and first author publications in leading Machine Learning and Neuroscience journals.
Virtual | Tutorial | Deep Learning | Intermediate-Advanced
Deep Reinforcement Learning equips AI agents with the ability to learn from their own trial and error. Success stories include learning to play Atari games, Go, and Dota2, and robots learning to run, jump, and manipulate. This tutorial will cover the foundations of Deep Reinforcement Learning, including MDPs, DQN, Policy Gradients, TRPO, PPO, DDPG, SAC, TD3, model-based RL, as well as current research frontiers…more details
Professor Pieter Abbeel is Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) Lab. Abbeel’s research strives to build ever more intelligent systems; his lab pushes the frontiers of deep reinforcement learning and deep unsupervised learning, especially as they pertain to robotics. Abbeel’s Intro to AI class has been taken by over 100K students through edX, and his Deep Unsupervised Learning materials are standard references for AI researchers. Abbeel has founded several companies, including Gradescope (AI to help instructors with grading homework, projects and exams) and Covariant (AI for robotic automation of warehouses and factories). He advises many AI and robotics start-ups, and is a frequently sought-after speaker worldwide for C-suite sessions on AI future and strategy. Abbeel has received many awards and honors, including ACM Prize, IEEE Fellow, PECASE, NSF-CAREER, ONR-YIP, AFOSR-YIP, DARPA-YFA, TR35, and 10+ best paper awards/finalists. His work is frequently featured in the press, including the New York Times, Wall Street Journal, BBC, Rolling Stone, Wired, and Tech Review.
In-person | Tutorial | Deep Learning | All Levels
Deep neural networks can make overconfident errors and assign high confidence predictions to inputs far away from the training data. Well-calibrated predictive uncertainty estimates are important to know when to trust a model’s predictions, especially for safe deployment of models in applications where the train and test distributions can be different. I’ll first present some concrete examples that motivate the need for uncertainty and out-of-distribution (OOD) robustness in deep learning…more details
Balaji is currently a Staff Research Scientist at Google Brain working on Machine Learning and its applications. Previously, he was a research scientist at DeepMind for 4.5+ years. Before that, he received a PhD in machine learning from the Gatsby Unit, UCL, supervised by Yee Whye Teh.
His research interests are in scalable, probabilistic machine learning. More recently, he has focused on:
– Uncertainty and out-of-distribution robustness in deep learning
– Deep generative models including generative adversarial networks (GANs), normalizing flows and variational auto-encoders (VAEs)
– Applying probabilistic deep learning ideas to solve challenging real-world problems.
In-person | Half-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate
Most production information retrieval systems are built on top of Lucene, which uses TF-IDF and BM25. Current state-of-the-art techniques use embeddings for retrieval. This workshop aims to demystify what is involved in building such a system…more details
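To make the keyword-scoring side of that comparison concrete, here is a minimal sketch of Okapi BM25 in plain Python. It is not Lucene's implementation: the tokenization (whitespace split), the parameter defaults `k1=1.5` and `b=0.75`, the non-negative IDF form, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "embeddings power semantic search".split(),
]
scores = bm25_scores("semantic search".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Documents that share no term with the query score exactly zero here, which is the gap that embedding-based retrieval is meant to close.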
In-person | Workshop | NLP | Machine Learning | Beginner – Intermediate
In this talk, I will give some background on Conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances that we see. I will walk the audience through hands-on examples and show how they can leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications such as text classification, summarization and question-answering…more details
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied AI and research groups at Uber, Amazon Alexa and eBay.
At Uber, he led Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon he was the founding member of the Alexa Prize Competition and Alexa AI, where he led R&D and had the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which is considered the holy grail of Conversational AI and is one of the open-ended problems in AI. At eBay he drove applied research projects in NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
Virtual | Workshop | All Levels
Coming Soon!
Daniel Lenton is the creator of Ivy, which is an open-source framework with an ambitious mission to unify all other ML frameworks. Prior to starting Ivy, Daniel was a PhD student at Imperial College London, where he published research in the areas of machine learning, robotics and computer vision.
In-person | Workshop | All Levels
In this workshop, we will explore some popular NLP techniques that have broad applicability. From the basics of bagging and word vectors to the creation of contextualized representations of words and sentences, the workshop will equip participants with the tools they need to turn messy text data into useful insights. The focus of the workshop will be building NLP approaches of increasing complexity. Each step in the progression will build on the previous ones, and the approaches will be evaluated against one another…more details
Benjamin is a Data Scientist at ThriveHive where he works with business stakeholders to create tools and models providing guidance to internal teams and business customers. Before ThriveHive he worked with the City of Boston’s analytics team on various modelling projects. He received his PhD in Policy Analysis from the RAND Corporation, with a focus on projects related to health, infrastructure and defense. He has experience with a wide array of methods and their applications across a diverse set of verticals.
In-person | Workshop | All Levels
Your model is only as good as the data that goes into it, so removing the maximum amount of noise while retaining signal is vital. This talk will introduce basic signal processing and motion artefact removal techniques, such as low-pass filtering, Kalman filtering, and Savitzky-Golay filtering. The session will take you through practical examples that you can apply straight away to your biological data. I will also make recommendations for data collection to consider when designing a sensor so that you can get the best possible data (because prevention is better than treatment!). Those interested in machine learning and signal analysis for biomedical processing, electrical and optical signals, and wearables technology will enjoy this talk!…more details
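As a rough illustration of one of the techniques named above, the following is a minimal sketch of a scalar Kalman filter that smooths a noisy sensor reading. It assumes a slowly drifting (random-walk) signal; the function name, the variance values, and the synthetic alternating-noise data are hypothetical and not taken from the session.

```python
def kalman_1d(measurements, process_var, meas_var):
    """Filter a 1-D signal assuming a random-walk state model."""
    estimate = measurements[0]  # start from the first reading
    error = meas_var            # initial uncertainty of that estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state may have drifted, so uncertainty grows.
        error += process_var
        # Update: blend prediction and measurement by their uncertainties.
        gain = error / (error + meas_var)
        estimate += gain * (z - estimate)
        error *= 1 - gain
        filtered.append(estimate)
    return filtered


# Synthetic data (hypothetical): a true level of 1.0 with +/-0.5 artefacts.
raw = [1.0 + 0.5 * (-1) ** i for i in range(50)]
smooth = kalman_1d(raw, process_var=1e-4, meas_var=0.25)
```

A small `process_var` relative to `meas_var` tells the filter to trust its running estimate over any single reading, which is what suppresses the artefacts.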
In-person | Workshop | Machine Learning | Intermediate-Advanced
Often, categorical variables possess a natural structure that is not linear or ordinal in nature. The months of the year have a circular structure while the US states have a structure that can be represented by a graph. StructureBoost uses novel techniques that allow this known structure to be exploited to yield better predictions. Recently, StructureBoost has been enhanced to utilize the structure in the target variable (i.e. in multi-classification) as well as in the predictor variables. This hands-on workshop will demonstrate how to use StructureBoost in different problems involving categorical variables with known structure…more details
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of StructureBoost, ML-Insights, and the SplineCalib calibration tool. In previous roles he has served as Senior VP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Virtual | Talk | Machine Learning | Beginner – Intermediate
In this talk, we will present the challenges of learning from dirty data, overview data poisoning attacks on different systems like Spam detection, image classification and rating systems, discuss the problem of learning from web traffic – probably the dirtiest data in the world, and explain different approaches for learning from dirty data and poisoned data. We will focus on threshold-learning mitigation for data poisoning, aiming to reduce the impact of any single data source, and discuss a mundane but crucial aspect of threshold learning – memory complexity. We will present a robust learning scheme optimized to work efficiently on streamed data with bounded memory consumption. We will give examples from the web security arena with robust learning of URLs, parameters, character sets, cookies and more…more details
Virtual | Talk | Machine Learning | Research Frontiers | Intermediate-Advanced
This talk will overview our recent work on reasoning about the behavior of learned classifiers. I will cover techniques to inject domain knowledge into machine learning models, for example by enforcing monotonicity constraints, or logical structure, on neural network outputs…more details
In-person | Half-Day Training | Beginner-Intermediate
This course will examine anomaly detection through the example of fraud, but all these techniques can be applied to other areas as well. We will start with the importance of feature creation and transformation. We will then cover more statistical based approaches to anomaly detection. Last, we will end with more machine learning based approaches to allow the learner to approach anomalies from any angle and industry need…more details
A Teaching Associate Professor at the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management.
Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government.
Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.
Virtual | Talk | All Levels
Data-centric AI broadly describes the idea that *data*, rather than models, is increasingly the crux of success or failure in AI for many settings and use cases. More specifically, data-centric AI defines ML development workflows that center around principally iterating on the *training data*–e.g. labeling, sampling, slicing, augmenting, etc.–rather than the model architecture. In this talk, I’ll describe how programmatic or weak supervision can not only facilitate these data-centric workflows (in ways that manual labeling cannot), but more importantly, will present an overview about how it can serve as an API for rich organizational knowledge sources, presenting recent technical results and user case studies…more details
Alex Ratner has a Ph.D. in computer science from Stanford, advised by Chris Re, where his research focuses on weak supervision: the idea of using higher-level, noisier input from domain experts to train complex state-of-the-art models where limited or no hand-labeled training data is available. He leads the development of the Snorkel framework (snorkel.stanford.edu) for weakly supervised ML, which has been applied to machine learning problems in domains like genomics, radiology, and political science. He is supported by a Stanford Bio-X SIGF fellowship.
In-person | Talk
In this talk, we explore:
- Challenges of building recommender systems
- Strategies for reducing latency, while balancing requirements for freshness
- Challenges in mitigating data quality issues
- Technical and organizational challenges feature stores solve
- How to integrate Feast, an open-source feature store, into an existing recommender system to support production systems…more details
Danny Chiao is an engineering lead at Tecton/Feast Inc working on building a next-generation feature store. Previously, Danny was a technical lead at Google working on end to end machine learning problems within Google Workspace, helping build privacy-aware ML platforms / data pipelines and working with research and product teams to deliver large-scale ML powered enterprise functionality. Danny holds a Bachelor’s degree in Computer Science from MIT.
Virtual | Talk | All Levels
Coming Soon!
Stefano Ermon is an Associate Professor of Computer Science in the CS Department at Stanford University, where he is affiliated with the Artificial Intelligence Laboratory, and a fellow of the Woods Institute for the Environment. His research is centered on techniques for probabilistic modeling of data and is motivated by applications in the emerging field of computational sustainability. He has won several awards, including Best Paper Awards (ICLR, AAAI, UAI and CP), an NSF Career Award, ONR and AFOSR Young Investigator Awards, Microsoft Research Fellowship, Sloan Fellowship, and the IJCAI Computers and Thought Award. Stefano earned his Ph.D. in Computer Science at Cornell University in 2015.
In-person | Talk | Beginner-Intermediate
Did my data change after a certain intervention? This is a common question with data observed over time. Classical statistical and engineering approaches include control charts to see if the series falls outside of the normal boundaries of expected data. A Bayesian approach to this problem calculates the probability that the data series changes at every point along the series. Bayesian change point analysis allows the analyst to evaluate a whole series and look where the highest probability of change occurred. Has the financial asset lost value after the recent financial report? Are the healthcare outcomes at this hospital better after our new process to help patients? Did the manufacturing process improve after upgrading the machinery? All these questions and more can be answered with these techniques which will be shown in R…more details
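The idea of computing a change probability at every point can be sketched as follows. The session itself uses R; this Python version is only an illustrative sketch under strong assumptions: exactly one change in the mean, Gaussian noise with a known standard deviation `sigma`, flat priors on the two segment means, and a uniform prior on the change location. The function name and the toy series are hypothetical.

```python
import math


def change_point_posterior(series, sigma=1.0):
    """Posterior probability that the mean changes just before index tau.

    Returns a list whose entry i corresponds to tau = i + 1.
    """
    n = len(series)
    log_post = []
    for tau in range(1, n):
        log_p = 0.0
        for seg in (series[:tau], series[tau:]):
            mean = sum(seg) / len(seg)
            ss = sum((x - mean) ** 2 for x in seg)
            # Marginal likelihood of a segment with its mean integrated out
            # under a flat prior (terms constant in tau are dropped).
            log_p += -0.5 * math.log(len(seg)) - ss / (2 * sigma ** 2)
        log_post.append(log_p)
    # Normalize in a numerically stable way.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Toy series (hypothetical): the level shifts from about 0 to about 2.
series = [0.1, -0.2, 0.0, 0.3, -0.1, 2.1, 1.9, 2.2, 2.0, 1.8]
posterior = change_point_posterior(series, sigma=0.5)
most_likely = 1 + max(range(len(posterior)), key=posterior.__getitem__)
```

Here `most_likely` is the index of the first observation in the new regime; plotting `posterior` against the index shows where the evidence for a change is concentrated.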
A Teaching Associate Professor at the Institute for Advanced Analytics, Dr. Aric LaBarr is passionate about helping people solve challenges using their data. There he helps design the innovative program to prepare a modern workforce to wisely communicate and handle a data-driven future at the nation’s first Master of Science in Analytics degree program. He teaches courses in predictive modeling, forecasting, simulation, financial analytics, and risk management.
Previously, he was Director and Senior Scientist at Elder Research, where he mentored and led a team of data scientists and software engineers. As director of the Raleigh, NC office he worked closely with clients and partners to solve problems in the fields of banking, consumer product goods, healthcare, and government.
Dr. LaBarr holds a B.S. in economics, as well as a B.S., M.S., and Ph.D. in statistics — all from NC State University.
In-person | Talk | Deep Learning | Machine Learning | ML Ops and Data Engineering | All Levels
In this talk, we discuss these challenges and share our findings and recommendations from working on real-world examples at SPINS, a data/tech company focused on the natural grocery industry. More specifically, we describe how we leverage state-of-the-art language models to seamlessly automate parts of SPINS’ data ingestion workflow and drive substantial business outcomes. We provide a walk-through of our end-to-end MLOps system and discuss how using the right tools and methods has helped to mitigate some of these challenges. We also share our findings from our experimentation and provide insights on when one should use these massive transformer models instead of classical ML models. Considering that we have a variety of challenges in our use cases from an ill-defined label space to a huge number of classes (~86,000) and massive data imbalance, we believe our findings and recommendations can be applied to most real-world settings. We hope that the learnings from this talk can help you to solve your own problems more effectively and efficiently!…more details
Azin Asgarian is currently an applied research scientist on Georgian’s R&D team where she works with companies to help adopt applied research techniques to overcome business challenges. Prior to joining Georgian, Azin was a research assistant at the University of Toronto and part of the Computer Vision Group where she was working on the intersection of Machine Learning, Transfer Learning, and Computer Vision.
Elliot Henry is a Data Science Manager at SPINS; he is responsible for the successful execution of data science projects and the overall support and care of his team. He has experience managing projects from the ideation phase through delivery of the final product, and is passionate about building end to end machine learning systems using software based methods. Prior to SPINS, Elliot has experience as a data scientist in the fields of retail, marketing, and digital. He holds a B.A. in Biochemistry from Dartmouth College and a M.S. in Analytics from the University of Chicago. In his free time, he enjoys playing board games with friends.
In-person | Talk | Responsible AI | Beginner
There are many types of users and stakeholders that require Explainable AI. Without explanations, end-users are less likely to trust and adopt ML-based technologies. Without a means of understanding model decision-making, business stakeholders have a difficult time assessing the value and risks associated with launching a new ML-based product. And without insights into why an ML application is behaving in a certain way, application developers have a harder time troubleshooting issues, and ML scientists have a more difficult time assessing their models for fairness and bias. To further complicate an already challenging problem, the audiences for ML model explanations come from varied backgrounds, have different levels of experience with statistics and mathematical reasoning, and are subject to cognitive biases…more details
Meg is currently a UX Researcher for Google Cloud AI and Industry Solutions, where she focuses her research on Explainable AI and Model Understanding. She has had a varied career working for start-ups and large corporations alike across fields such as EdTech, weather forecasting, and commercial robotics. She has published articles on topics such as information visualization, educational-technology design, human-robot interaction (HRI), and voice user interface (VUI) design. Meg is also a proud alumna of Virginia Tech, where she received her Ph.D. in Human-Computer Interaction (HCI).
In-person | Talk | MLOps and Data Engineering | Intermediate
This session will explore the challenges our clients face in data sourcing, data preparation and model evaluation by humans, and discuss how automated labelling and responsible AI optimize predictive modeling. Our expert analysts will examine the technologies and processes used to enable success for clients across data types and industries…more details
Rachel is a Product Manager in Appen’s Autonomous Vehicles working group. In that role, she is working to provide high quality data on all levels of autonomy for motor vehicle clients. Prior to joining Appen, Rachel worked on data science tools to enable model interpretability, fairness testing and automated machine learning. Other passions of hers include using AI and technology to act as a catalyst towards solving humanitarian-centered problems for non-profits around the world.
In-person | Talk | Machine Learning | Data Science Research Frontiers | All Levels
Heart rate variability biofeedback (HRV-B) is a clinically effective therapy in which patients can improve their mental and physical well-being through real-time monitoring of the heart-rate and specialized breathing techniques. HRV-B can improve health outcomes in a number of medical or wellness-related conditions, ranging from depression and anxiety, to cardiovascular disease, asthma, cancer fatigue, women’s health, better sleep, peak athletic performance, and stress resilience…more details
In-person | Talk | ML Ops and Data Engineering | Data Science Research Frontiers | Data Analytics | Beginner – Intermediate
We at LinkedIn leverage Jupyter notebooks extensively to do ad-hoc data analysis, and our data scientists, engineers and developers spend a lot of time iterating over the query development lifecycle. We have created a hosted notebook platform at LinkedIn (for our internal users) called Darwin (Data Analytics & Relevance Workbench at LinkedIn) – a one-stop solution for a complete Jupyter notebook/query life cycle (query development, query testing and query productionizing)…more details
In-person | Talk
In this talk we navigate through the latest buzz around semantic search and separate the noise from the meaningful advancements. Is dense retrieval better than BM25’s keyword search? Do large language models outperform smaller transformers? How well do the models generalize to industry corpora? How can we leverage Question Answering? We will benchmark different methods, share best practices from industry use cases and show how you can use the open source framework Haystack to build, test and deploy stellar search pipelines easily yourself…more details
Malte Pietsch is CTO & Co-Founder at deepset. His current focus is on building deepset Cloud – a SaaS platform for developers to build, deploy and operate modern NLP pipelines. He holds a M.Sc. with honors from TU Munich and conducted research at Carnegie Mellon University. Before founding deepset he worked as a data scientist for multiple startups. He is an active open-source contributor and author of the NLP framework Haystack.
Virtual | Talk | Machine Learning | All Levels
Coming Soon!
Jacob Schreiber is a fifth year Ph.D. student and NSF IGERT big data fellow in the Computer Science and Engineering department at the University of Washington. His primary research focus is on the application of machine learning methods, primarily deep learning ones, to the massive amount of data being generated in the field of genome science. His research projects have involved using convolutional neural networks to predict the three dimensional structure of the genome and using deep tensor factorization to learn a latent representation of the human epigenome. He routinely contributes to the Python open source community, currently as the core developer of the pomegranate package for flexible probabilistic modeling, and in the past as a developer for the scikit-learn project. Future projects include graduating.
In-person | Talk | Deep Learning | ML Ops and Data Engineering | Intermediate – Advanced
Only a significant minority of companies unlock the true potential of AI as trained models accumulate dust due to challenges in MLOps. Serving reliable AI predictions to customers involves cost, effort, and planning to set up a continuous deployment pipeline. MLOps for Deep Learning demands a carefully crafted deployment pipeline. We discuss our open-source project which is a robust continuous deployment pipeline by integrating our unique drift detection and model retrain algorithms for serving DL models. We show how to efficiently deploy, monitor, and maintain DL models in production using our solution which is a Kubernetes native POC solution…more details
Diego Klabjan is a professor at Northwestern University, Department of Industrial Engineering and Management Sciences. He is also Founding Director, Master of Science in Analytics, and the Deep Learning Lab. His expertise is focused on data science and deep learning with a concentration in finance, insurance, and healthcare. Professor Klabjan has led projects with large companies such as The Chicago Mercantile Exchange Group, Intel, General Motors and many others, and he is also assisting numerous start-ups with their analytics needs. He is also a founder of Opex Analytics.
Yegna Jambunath is a Researcher at Centre for Deep Learning, Northwestern University. Yegna has six years of total work experience with four years of industry focused research experience in ML and Data Science. His areas of interest are MLOps, ML in Healthcare and RL.
In-person | Talk | Machine Learning | Data Analytics | Responsible AI | Intermediate
In his talk “Responsible AI Is Not an Option,” Dr. Scott Zoldi brings to bear his decades of experience in delivering analytic innovation in a highly regulated environment, to underscore the urgency with which the topics of AI fairness and bias must be ushered onto Boards of Directors’ agendas. In his dynamic presentation, Dr. Zoldi spells out the “why” and “how” of fulfilling the social covenant and soon, regulatory requirements of enterprises using AI ethically, transparently, securely and in their customers’ best interests…more details
Scott Zoldi is chief analytics officer at FICO responsible for advancing the company's leadership in artificial intelligence (AI) and analytics in its product and technology solutions. At FICO Scott has authored more than 120 analytic patents, with 71 granted and 49 pending. Scott is actively involved in the development of analytics applications, Responsible AI technologies and AI governance frameworks, the latter including FICO's blockchain-based model development governance methodology. Scott is a member of the Board of Advisors of FinRegLab, a Cybersecurity Advisory Board Member of the California Technology Council, and a Board Member of Tech San Diego and the San Diego Cyber Center of Excellence. He is also a member of the CNBC Technology Executive Council. Scott received his Ph.D. in theoretical and computational physics from Duke University.
In-person | Talk | Deep Learning | Machine Learning | Intermediate
In this talk I will give an overview of this problem, what makes it difficult, how we have been addressing it at Wayfair, and the lessons that we have learned so far. I will start by giving an overview of the different types of competing business objectives that can arise in the e-commerce use case and how they compete against each other in practice. Along the way I will introduce some of the fundamental concepts of multi-objective optimization, including Pareto Efficiency and the Pareto Frontier, and how they relate to this problem. Finally, I will discuss the pros and cons of various strategies for making recommendation systems profit aware…more details
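For readers new to the terms, a Pareto-efficient item is one that no other item beats on every objective at once, and the Pareto Frontier is the set of all such items. The sketch below illustrates the concept only; it is not Wayfair's method, and the two objectives (relevance and profit margin) and the candidate scores are hypothetical.

```python
def pareto_frontier(points):
    """Return the points not dominated by any other (higher is better)."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere
        # and strictly better on at least one objective.
        return all(x >= y for x, y in zip(a, b)) and any(
            x > y for x, y in zip(a, b)
        )

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (relevance, profit margin) scores for five candidate items.
candidates = [(0.9, 0.10), (0.8, 0.30), (0.7, 0.25), (0.5, 0.40), (0.4, 0.35)]
frontier = pareto_frontier(candidates)
```

Every item on the frontier is a defensible recommendation; choosing among them requires a business decision about how much relevance to trade for margin.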
Ali Vanderveld is a Senior Staff Data Scientist at Wayfair, where she serves as a technical leader for machine learning, currently leading the development of novel search and recommendation technologies. Prior to Wayfair, she led a team focused on language AI at Amazon Web Services and was the Director of Data Science at ShopRunner. She has also worked at Civis Analytics, at Groupon, and as a technical mentor for the Data Science for Social Good Fellowship. Ali has a PhD in theoretical astrophysics from Cornell University and got her start working as an academic researcher at Caltech, the NASA Jet Propulsion Laboratory, and the University of Chicago, working on the development teams for several space telescope missions, including ESA’s Euclid.
In-person | Talk
Differential privacy can be viewed as a technical solution for protecting individual privacy to meet legal or policy requirements for disclosure limitation while analyzing and sharing personal data. Using examples and some mathematical formalism, this talk will introduce differential privacy concepts including the definition of differential privacy, how differentially private analyses are constructed, and how these can be used in practice…more details
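One common way a differentially private analysis is constructed is the Laplace mechanism: compute the true answer, then add Laplace noise scaled to the query's sensitivity divided by the privacy parameter epsilon. The sketch below applies it to a counting query. It is illustrative only: the function name, data, and epsilon value are hypothetical, and naive floating-point noise sampling like this is not considered safe for production use.

```python
import random


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


rng = random.Random(0)  # seeded only to make the illustration repeatable
ages = [23, 35, 41, 29, 52, 67, 38, 45]  # hypothetical records
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=rng)
```

Smaller values of `epsilon` give stronger privacy but noisier answers; the released value is unbiased, so averages over many independent releases approach the true count (at the cost of spending more privacy budget).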
Veena Mendiratta is an applied researcher in network reliability and analytics at Nokia Bell Labs based in Naperville, Illinois, USA. Her research interests include network dependability, software reliability engineering, programmable networks resiliency, and telecom data analytics. Current work is focused on network reliability and analytics – architecting and modeling the reliability of next-generation programmable networks; and the development of analytics-based algorithms for anomaly detection, network slicing and network control for improving network performance and reliability. She has led projects on customer experience analytics using data mining and social network analysis techniques, and the development of algorithms and visual analytics for anomaly detection in telecommunications networks. She is a member of the SIAM Visiting Lecturer Program, Life Member of SIAM, Senior Member of IEEE, Member of INFORMS; member of ASA; and was a Fulbright Specialist Scholar for 5 years during which time she visited universities in India, Norway and New Zealand. She holds a B.Tech in engineering from IIT-Delhi, India, and a Ph.D. in operations research from Northwestern University, USA.
In-person | Talk
In this talk, I will provide a holistic review of the research methods and tools for causal analytics in business decisions. I focus especially on causal inference in data science. I will discuss a decision tree that helps data scientists to identify the best causal research method based on the problem, context, and the nature of the data. I will draw on my proprietary research on prescriptive analytics (https://docs.google.com/document/d/1b8yaDzriVB2JyIBNQMsUn-uz4bXnsdFe6hTLLTOs1q4/edit)…more details
Dr. Victor Zitian Chen, CFA, is a believer and action-taker on the idea of a world brain. Dr. Chen is currently the Director of Data Analytics and Insights, Experimental Design and Causal Inference at Fidelity Investments. He leads the causal analytics efforts across the personal investing business at Fidelity, including experimentation, prescriptive analytics, and causal knowledge graph-based applications. Before joining Fidelity, Dr. Chen was a tenured professor in management and data science at the University of North Carolina, Charlotte, and a visiting professor in international business at Copenhagen Business School, Denmark. He led two major National Science Foundation (NSF) grants focusing on causal knowledge graph-based explainable AI and analytics applications. He founded and led the Global OpenLabs for Performance Enhancement-Analytics and Knowledge System (GoPeaks) – a startup to advance and commercialize knowledge synthesis and causal/prescriptive analytics solutions for business decisions.
In-person | Demo Talk
In this demo, we’ll cover the best practices for setting up data science environments, how to store them so you can replicate your code, and how to share a setup with your coworkers…more details
Dr. Jacqueline Nolis is a data science leader with 15 years of experience in running data science teams and projects at companies ranging from Airbnb to Boeing. She is the Chief Product Officer at Saturn Cloud where she helps design products for data scientists. Jacqueline has a PhD in Industrial Engineering and her academic research focused on optimization under uncertainty. Data science is also her hobby—like making an R package that mails physical postcards of your plots.
Virtual | Bootcamp | Data Analytics
Data literacy is an essential foundational topic for anyone considering data science or machine learning. This session will help you understand core data workflow concepts, including what data is, data generation and collection, data profiling, data transformation, data shaping, and other essential workflow topics. As this is an interactive training session, in addition to covering these data topics, we will layer on SQL and an introduction to relational databases…more details
Sheamus McGovern is the founder of ODSC (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.
In-person | Half-Day Training | Machine Learning | Data Visualization | Intermediate-Advanced
While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood…more details
Stefanie Molin is a data scientist and software engineer at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
In-person | Half-Day Training | Machine Learning | All Levels
By completing this hands-on workshop, you will develop an understanding of machine learning concepts and methodologies and learn how to fit, tune, and evaluate the predictive performance of a variety of parametric and non-parametric models for classification and regression. You will become familiar with how to preprocess data, build, tune, and cross-validate predictive models, and make predictions with the models in Python…more details
Clinton Brownley, Ph.D., is a data scientist at Meta (formerly Facebook), where he’s responsible for a variety of analytics projects designed to empower employees to do their best work. Prior to this role, he was a data scientist at WhatsApp, working to improve messaging and VoIP calling performance and reliability. Before WhatsApp, he worked on large-scale infrastructure analytics projects to inform hardware acquisition, maintenance, and data center operations decisions at Facebook.
As an avid student and teacher of modern data analysis and visualization techniques, Clinton teaches a graduate course in interactive data visualization for UC Berkeley’s MIDS program, has taught a short-term graduate course in regression analysis and a machine learning workshop for NYU’s A3SR program, leads an annual machine learning in Python workshop, and is the author of two books, “Foundations for Analytics with Python” and “Multi-objective Decision Analysis”.
Clinton is a past-president of the San Francisco Bay Area Chapter of the American Statistical Association and is a council member for the Section on Practice of the Institute for Operations Research and the Management Sciences. Clinton received degrees from Carnegie Mellon University and American University.
Virtual | Workshop | Deep Learning | MLOps & Data Engineering | Intermediate
OpenMPI is used for high-performance computing at supercomputing centers as part of distributed computing systems. In the first half of this workshop, we will get to know more about OpenMPI and work with it hands-on using a Python interface. The second half of the workshop will focus on using DeepSpeed on OpenMPI and work through a few examples. Examples for MPI basics will include inferring pi using distributed computing and running Python scripts using OpenMPI…more details
Jennifer Davis, Ph.D., is a Staff Field Data Scientist at Domino Data Lab, where she empowers clients on complex data science projects. She has completed two postdocs in computational and systems biology, trained at a supercomputing center at the University of Texas, Austin, and worked on hundreds of consulting projects with companies ranging from start-ups to the Fortune 100. Jennifer has previously presented on LSTMs and Natural Language Generation for the Association for Computing Machinery and at conferences across the US and in Italy. Jennifer was part of a panel discussion for an IEEE conference on artificial intelligence in biology and medicine. She has practical experience teaching both corporate classes and at the college level.
Virtual | Workshop
Regardless of their concrete application, the primary goal of forecasting systems is to produce the most accurate forecast possible. However, while beating benchmarks is important, a forecast useable in business processes additionally needs to fulfill many more criteria, which significantly increases the complexity of real-world solutions…more details
Virtual | Workshop | NLP | Machine Learning | Beginner – Intermediate
In this talk, I will give some background on Conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances that we see. I will walk the audience through hands-on examples and show how they can leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications such as text classification, summarization, and question answering…more details
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied and research AI groups at Uber, Amazon Alexa, and eBay.
At Uber, he led Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon, he was the founding member of the Alexa Prize Competition and Alexa AI, where he led the R&D and had the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which is considered the holy grail of Conversational AI and is one of the open-ended problems in AI. At eBay, he drove applied research projects related to NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
In-person | Workshop | Machine Learning | Beginner-Intermediate
This workshop will teach hands-on coding techniques covering all the foundations of a complete churn-fighting data pipeline, including churn measurement, feature engineering from raw data, data set creation, and machine learning. The workshop is taught using Python and SQL from the open source fightchurn package (PyPI: pypi.org/project/fightchurn/, GitHub: github.com/carl24k/fight-churn). Participants should bring their own laptop, prepared with Python and an IDE such as PyCharm or VSCode that will allow them to put breakpoints in the code that will be demonstrated and discussed…more details
Carl Gold is currently the Data Science Director at OfferFit.ai, an AI-as-a-Service reinforcement learning engine that maximizes customer upsell and retention. Before coming to OfferFit, Carl was Chief Data Scientist of Zuora, the leading billing platform for the Subscription Economy. Based on his experiences fighting churn for SaaS companies during his time at Zuora, Carl wrote the first book dedicated to customer churn analytics and data science: “Fighting Churn With Data”. Carl has a PhD from the California Institute of Technology and first-author publications in leading Machine Learning and Neuroscience journals.
Virtual | Tutorial | Deep Learning | Intermediate-Advanced
Deep Reinforcement Learning equips AI agents with the ability to learn from their own trial and error. Success stories include learning to play Atari games, Go, and Dota 2, and robots learning to run, jump, and manipulate objects. This tutorial will cover the foundations of Deep Reinforcement Learning, including MDPs, DQN, Policy Gradients, TRPO, PPO, DDPG, SAC, TD3, and model-based RL, as well as current research frontiers…more details
Professor Pieter Abbeel is Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) Lab. Abbeel’s research strives to build ever more intelligent systems, and his lab pushes the frontiers of deep reinforcement learning and deep unsupervised learning, especially as they pertain to robotics. Abbeel’s Intro to AI class has been taken by over 100K students through edX, and his Deep Unsupervised Learning materials are standard references for AI researchers. Abbeel has founded several companies, including Gradescope (AI to help instructors with grading homework, projects and exams) and Covariant (AI for robotic automation of warehouses and factories). He advises many AI and robotics start-ups, and is a frequently sought-after speaker worldwide for C-suite sessions on AI future and strategy. Abbeel has received many awards and honors, including ACM Prize, IEEE Fellow, PECASE, NSF-CAREER, ONR-YIP, AFOSR-YIP, DARPA-YFA, TR35, and 10+ best paper awards/finalists. His work is frequently featured in the press, including the New York Times, Wall Street Journal, BBC, Rolling Stone, Wired, and Tech Review.
In-person | Tutorial | Deep Learning | All Levels
Deep neural networks can make overconfident errors and assign high confidence predictions to inputs far away from the training data. Well-calibrated predictive uncertainty estimates are important to know when to trust a model’s predictions, especially for safe deployment of models in applications where the train and test distributions can be different. I’ll first present some concrete examples that motivate the need for uncertainty and out-of-distribution (OOD) robustness in deep learning…more details
Balaji is currently a Staff Research Scientist at Google Brain working on Machine Learning and its applications. Previously, he was a research scientist at DeepMind for 4.5+ years. Before that, he received a PhD in machine learning from the Gatsby Unit, UCL, supervised by Yee Whye Teh.
His research interests are in scalable, probabilistic machine learning. More recently, he has focused on:
– Uncertainty and out-of-distribution robustness in deep learning
– Deep generative models including generative adversarial networks (GANs), normalizing flows and variational auto-encoders (VAEs)
– Applying probabilistic deep learning ideas to solve challenging real-world problems.
In-person | Half-Day Training | Deep Learning | Machine Learning | Beginner – Intermediate
Most production information retrieval systems are built on top of Lucene, which uses tf-idf and BM25. Current state-of-the-art techniques use embeddings for retrieval. This workshop aims to demystify what is involved in building such a system…more details
In-person | Workshop | NLP | Machine Learning | Beginner – Intermediate
In this talk, I will give some background on Conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances that we see. I will walk the audience through hands-on examples and show how they can leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications such as text classification, summarization, and question answering…more details
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver the world’s first fully autonomous Conversational AI system. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied and research AI groups at Uber, Amazon Alexa, and eBay.
At Uber, he led Conversational AI, Multi-modal AI, and Recommendation Systems. At Amazon, he was the founding member of the Alexa Prize Competition and Alexa AI, where he led the R&D and had the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which is considered the holy grail of Conversational AI and is one of the open-ended problems in AI. At eBay, he drove applied research projects related to NLP, Deep Learning, and Recommendation Systems.
He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India. His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language and Multi-modal Understanding, and Introducing Common Sense within Artificial Agents.
Virtual | Workshop | All Levels
Coming Soon!
Daniel Lenton is the creator of Ivy, which is an open-source framework with an ambitious mission to unify all other ML frameworks. Prior to starting Ivy, Daniel was a PhD student at Imperial College London, where he published research in the areas of machine learning, robotics and computer vision.
In-person | Workshop | All Levels
In this workshop, we will explore some popular NLP techniques that have broad applicability. From the basics of bagging and word vectors to the creation of contextualized representations of words and sentences, the workshop will equip participants with the tools they need to turn messy text data into useful insights. The focus of the workshop will be building NLP approaches of increasing complexity. Each step in the progression will build on the previous ones, and the approaches will be evaluated against one another…more details
Benjamin is a Data Scientist at ThriveHive where he works with business stakeholders to create tools and models providing guidance to internal teams and business customers. Before ThriveHive he worked with the City of Boston’s analytics team on various modelling projects. He received his PhD in Policy Analysis from the RAND Corporation, with a focus on projects related to health, infrastructure and defense. He has experience with a wide array of methods and their applications across a diverse set of verticals.
In-person | Workshop | All Levels
Your model is only as good as the data that goes into it, so removing the maximum amount of noise while retaining signal is vital. This talk will introduce basic signal processing and motion artefact removal techniques, such as low-pass filtering, Kalman filtering, and Savitzky-Golay filtering. The session will take you through practical examples that you can apply straight away to your biological data. I will also recommend data collection practices to consider when designing a sensor so that you can get the best possible data (because prevention is better than treatment!). Those interested in machine learning and signal analysis for biomedical processing, electrical and optical signals, and wearable technology will enjoy this talk!…more details
In-person | Workshop | Machine Learning | Intermediate-Advanced
Often, categorical variables possess a natural structure that is not linear or ordinal in nature. The months of the year have a circular structure, while the US states have a structure that can be represented by a graph. StructureBoost uses novel techniques that allow this known structure to be exploited to yield better predictions. Recently, StructureBoost has been enhanced to utilize the structure in the target variable (i.e., in multi-class classification) as well as in the predictor variables. This hands-on workshop will demonstrate how to use StructureBoost in different problems involving categorical variables with known structure…more details
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of StructureBoost, ML-Insights, and the SplineCalib calibration tool. In previous roles he has served as Senior VP of Analytics at PCCI, Principal Data Scientist at Clover Health, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
Register for ODSC West 2022 - November 1-3rd
How It Works


Virtual conference experience includes networking lounge area, speaker auditorium, expo halls, and prizes
Access multiple livestream tracks on Tuesday, Wednesday, Thursday
Switch between sessions or tracks as your interests dictate
Multiple focus areas including deep learning, machine learning, NLP, research frontiers, AI X for business, and more
Sessions you missed can be viewed on demand at your leisure
Engage virtually with fellow attendees, speakers, and Expo partners
Participate in Q&A sessions with your speaker over live chat
Directly download slides and other session materials
(Training only) Access training and workshop prerequisites, notebooks, and other materials before the training session starts
(Training only) Access hands-on training and workshops with instructor-led code labs and notebooks.