3 DAYS of HANDS-ON TRAINING, WORKSHOPS, AND TUTORIALS
MAY 9-11th
– Boost your machine learning and deep learning skills –
Taught by World-Class Data Scientists
Learn the latest concepts, tools, and techniques from the best in the field. Forge a connection with these rock stars from industry and academia, who are passionate about molding the next generation of data scientists.
Register your interest for East 2023
Beginner to Advanced Level Training
From the Leading Instructors in the Industry
Machine Learning
Meta-learning for Machine Learning
Self-Supervised Learning: New Techniques
Federated Learning for Data Privacy
Explainable AI and Bias in Machine Learning
Machine Learning at Scale using Apache Spark
Safety & Robustness in Machine Learning Modeling
Semi-Supervised Learning
Causal Inference with Machine Learning
Deep Learning
Deep Reinforcement Learning
Deep Learning with PyTorch & TensorFlow
Deep Learning Deep Dive
Computer Vision 1/2 Day Training
Deep Learning with Keras
Introduction to Deep Learning
Deepfakes Tutorial
Graph Representation Learning
NLP
Self-Supervised Learning: New Techniques
Transfer Learning in NLP
Introduction to NLP and Topic Modeling
NLP Pre-trained Transformer Models with BERT, ERNIE, and GPT-2
State-of-the-Art NLP with PyTorch and TensorFlow
Semi-Supervised Learning
Hugging Face Transformer Library Workshop
Applications of NLP: Sentiment Analysis, Dialog Systems, and Semantic Search
ADDITIONAL TUTORIALS & WORKSHOPS
Machine Learning for Cyber Security
Real-time Streaming Analytics
MLOps and Machine Learning Pipelines
Introduction to Machine Learning Using scikit-learn
Automated Machine Learning (AutoML)
Distributed Machine Learning
Introduction to Data Analysis with Python Pandas
Machine Learning Workflow with Kubeflow & Kubernetes
East 2022 Preliminary Training and Workshop Schedule
We are delighted to announce our East 2022 Schedule!
Training | Virtual | Bootcamp | Machine Learning | Intermediate-Advanced
Abstract Coming Soon!
Mona is a Data Science Manager at Greenhouse Software in New York City, where they contribute to data-informed decision-making across the company and machine learning solutions to improve the hiring process for Greenhouse customers. They’ve previously worked in government, creating analytics and machine learning solutions to improve the lives of New Yorkers, and continue to be involved in civic projects through a number of volunteer and non-profit organizations. They’ve also been a statistics and data science educator with DataCamp, Emeritus, and in university settings. They hold a graduate degree in Developmental Psychology, and are passionate about contributing to the ethical use of data science methodology in the public and private sectors.
Training | Virtual | Bootcamp | Open-source | Intermediate
In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for tabular data analysis. We start by learning the core Pandas data structures, the Series and the DataFrame. From these foundations, we will learn to use the split-apply-combine paradigm for grouped computations, manipulate time series, and perform advanced joins between datasets, covering the full cycle of loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals and advanced features of Pandas, be aware of common pitfalls, and be ready to perform your own analyses.
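As a rough illustration of the three techniques named above (split-apply-combine, time series manipulation, and joins), here is a minimal Pandas sketch. The four-row sales table, the `stores` lookup table, and all column names are invented for this example and are not taken from the training materials.

```python
import pandas as pd

# Invented example data: one row per sale
sales = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11"]),
    "store": ["A", "B", "A", "B"],
    "revenue": [100.0, 80.0, 120.0, 90.0],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})

# Split-apply-combine: split rows by store, apply a sum, combine into one Series
per_store = sales.groupby("store")["revenue"].sum()

# Time series: resample daily sales into weekly totals (weeks ending on Sunday)
weekly = sales.set_index("date")["revenue"].resample("W").sum()

# Join: attach each store's region; validate= guards against accidental duplicate keys
enriched = sales.merge(stores, on="store", how="left", validate="many_to_one")

print(per_store)
print(weekly)
print(enriched)
```

The `validate="many_to_one"` argument makes the merge raise an error if `stores` ever contains a duplicated store key, which is one way to catch a silent row explosion in a join.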
Daniel Gerlanc has worked as a data scientist for more than a decade and has been writing software for nearly 20 years. He frequently teaches live trainings on oreilly.com and is the author of the video course Programming with Data: Python and Pandas. He has coauthored several open source R packages, published in peer-reviewed journals, and is a graduate of Williams College.
Training | Virtual | NLP | Machine Learning | All Levels
In this course we will go through Natural Language Processing fundamentals, such as pre-processing techniques, tf-idf, embeddings, and more. This will be followed by practical coding examples in Python that show how to apply the theory to real use cases. The goal of this workshop is to provide attendees with all the basic tools and knowledge they need to solve real problems and understand the most recent and advanced NLP topics.
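To make tf-idf concrete, here is a stdlib-only sketch of the textbook, unsmoothed variant: term frequency (a term's share of a document's tokens) times the log of the inverse document frequency. The three toy documents and the naive whitespace tokenizer are assumptions for illustration. Libraries differ in the details; scikit-learn's `TfidfVectorizer`, for instance, smooths the idf and L2-normalizes rows by default, so its numbers will not match these.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (unsmoothed tf-idf)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            # term frequency (share of tokens) times inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog sat", "the dog barked"]
for doc, doc_scores in zip(docs, tf_idf(docs)):
    print(doc, doc_scores)
```

A word like "the" that appears in every document gets an idf of log(1) = 0, so it contributes nothing, while rarer words such as "cat" or "barked" score highest in the documents that contain them.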
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks and Manchester United, and with large social networks, like JustGiving. He now works as Head of Data Science and Analytics at Badoo, the largest dating site with over 420 million users. He is also the lead instructor at ideai.io, a company specializing in Reinforcement Learning, Deep Learning, and Machine Learning training.
Training | In-person | Machine Learning | MLOps & Data Engineering | All Levels
Data professionals (analysts, scientists, operators, …) use data to extract insights and subsequently to make decisions that impact day-to-day operations as well as long-term strategy for organizations. Going from data to insights to decisions typically involves (a) extracting data from varied structured and unstructured sources; (b) normalizing, cleaning, and stitching such varied data sources to obtain “ground truth”; (c) extracting structure within the data, interacting with it, and visualizing it to obtain insights; (d) prediction, optimization, and scenario analysis to make decisions; and (e) automating all of the above while allowing for human-in-the-loop intervention. In this hands-on workshop, we shall discuss all of these with the help of illustrative datasets in a low-code / no-code AI environment.
Devavrat Shah, PhD is the founding director of Statistics and Data Science at MIT. He is also a member of IDSS, LIDS, CSAIL and ORC at MIT. He co-founded Celect, Inc. (now part of Nike) in 2013 to help retailers decide what to put where by accurately predicting demand using omni-channel data. He is a co-founder and CTO of IkigaiLabs with the mission to build self-driving organizations by enabling data-driven operations with human-in-the-loop.
His research focuses on statistical inference and stochastic networks. His contributions span a variety of areas including resource allocation in communications networks, inference and learning on graphical models, algorithms for social data processing including ranking, recommendations and crowdsourcing and more recently causal inference. He has made foundational contributions to the development of “gossip” protocols and “message-passing” algorithms for statistical inference which have been the building blocks of modern distributed data processing systems. His work spans a range of areas across electrical engineering, computer science and operations research.
His work has received broad recognition, including prize paper awards in Machine Learning, Operations Research and Computer Science, and career prizes including the 2010 Erlang Prize from the INFORMS Applied Probability Society, awarded bi-annually to a young researcher who has made outstanding contributions to applied probability. He is a distinguished alumnus of his alma mater, IIT Bombay, from which he graduated with the honor of the President of India Gold Medal. His work has been covered in the popular press, including the NY Times, Forbes, Wired and Reddit.
Workshop | Virtual | MLOps & Data Engineering | Machine Learning | Intermediate
This session is all about vector databases. If you are a data scientist or data/software engineer, this session will interest you. You will learn how you can easily run your favourite ML models with the vector database Weaviate. You’ll get an overview of what a vector database like Weaviate can offer, such as semantic search, question answering, data classification, named entity recognition, multimodal search, and much more. After this session, you will be able to load in your own data and query it with your preferred ML model!
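The core idea behind semantic search in a vector database is nearest-neighbor lookup over embedding vectors. The sketch below is not Weaviate's API: it uses hand-made three-dimensional vectors as stand-ins for embeddings an ML model would produce, and a brute-force cosine-similarity scan, whereas a production vector database typically relies on an approximate nearest-neighbor index to stay fast at scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query, index, k=2):
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity."""
    ranked = sorted(index, key=lambda doc_id: cosine_similarity(query, index[doc_id]), reverse=True)
    return ranked[:k]

# Toy vectors standing in for model-generated embeddings (assumed values)
index = {
    "article about cats": [0.9, 0.1, 0.0],
    "article about dogs": [0.8, 0.3, 0.0],
    "article about stocks": [0.0, 0.1, 0.95],
}
query = [1.0, 0.2, 0.0]  # pretend embedding of the query "pets"
print(nearest(query, index))
```

Because the query is compared by meaning (vector direction) rather than by keyword, the two pet articles rank above the finance article even though none of them contains the word "pets".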
Laura is a Data Scientist at SeMI, which builds the open-source vector search engine Weaviate. She researches new machine learning features for Weaviate and works on everything UX/DX related to Weaviate; for example, she is responsible for the GraphQL API design. She is in close contact with the Weaviate open-source community. Additionally, she likes to solve custom use cases with Weaviate, and introduces Weaviate to other people by means of meetups, talks and presentations.
Workshop | Virtual | Big Data Analytics | Machine Learning | All Levels
This talk will preview some of the latest features in Streamlit and Streamlit Cloud, and will finish by deploying a machine learning model in the cloud so that others can see and interact with our results.
Adrien is co-founder and CEO of Streamlit, which is pioneering next-generation tools for machine learning engineers. Dr. Treuille has been VP of Simulation at Zoox, led a Google X project, and was a Professor of Computer Science and Robotics at Carnegie Mellon. He gives talks around the world, including to the President’s Council of Advisors on Science and Technology, and has won numerous scientific awards, including the MIT TR35. Adrien and his work have been featured in the documentaries “What Will the Future Be Like” by PBS/NOVA, and “Lo and Behold” by Werner Herzog.
Workshop | Virtual | Machine Learning | All Levels
Data scientists can spend 60 to 80% of their time exploring and cleaning data. When they’re given an updated dataset, this process should be repeated, but often it isn’t. This can lead to a model that poorly describes the system it represents. However, there is something you can do about this. The “feature type” system in OCI Data Science’s Accelerated Data Science (ADS) SDK classifies data based on what they represent, not how they’re stored in memory. It also gives you the tools to compute custom statistics, create visualizations, use a validator and a warning system, and select columns based on their feature types.
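To make the idea of a semantic "feature type" concrete, here is a hypothetical, stdlib-only sketch. The `FeatureType` class, its `validate` and `warnings` methods, `select_columns`, and the ZIP-code rule are all invented for illustration; they are not the ADS SDK's API. The point is that a column stored as plain strings can still carry a meaning (here, a five-digit US postal code) that a validator and a warning system can check every time the data is refreshed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FeatureType:
    """Hypothetical feature type: describes what values mean, not how they are stored."""
    name: str
    is_valid: Callable[[str], bool]

    def validate(self, values: List[str]) -> List[bool]:
        # Per-value validator results
        return [self.is_valid(v) for v in values]

    def warnings(self, values: List[str], max_invalid: float = 0.0) -> List[str]:
        # Emit a warning when the share of invalid values exceeds a tolerance
        invalid_share = self.validate(values).count(False) / len(values)
        if invalid_share > max_invalid:
            return [f"{self.name}: {invalid_share:.0%} of values fail validation"]
        return []

# Stored as str (an int would drop the leading zero in "02139"), but means "postal code"
zip_code = FeatureType("zip_code", lambda v: re.fullmatch(r"\d{5}", v) is not None)

def select_columns(column_types: Dict[str, FeatureType], type_name: str) -> List[str]:
    # Select columns by what they represent rather than by storage dtype
    return [col for col, ft in column_types.items() if ft.name == type_name]

column_types = {"home_zip": zip_code, "work_zip": zip_code}
print(zip_code.validate(["02139", "2139", "10001"]))
print(zip_code.warnings(["02139", "2139", "10001"]))
print(select_columns(column_types, "zip_code"))
```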
A modern polymath, John holds advanced degrees in mechanical engineering, kinesiology and data science, with a focus on solving novel and ambiguous problems. As a senior applied data scientist at Amazon, John worked closely with engineering to create machine learning models to arbitrate chatbot skills, entity resolution, search, and personalization.
As a principal data scientist for Oracle Cloud Infrastructure, he is now defining tooling for data science at scale. John frequently gives talks on best practices and reproducible research. To that end, he has developed an approach to improve validation and reliability by using data unit tests and has pioneered Data Science Design Thinking. He also coordinates SoCal RUG, the largest R meetup group in Southern California.
Training | Virtual | Machine Learning | Beginner
We will start this training by learning about scikit-learn’s API for supervised machine learning. scikit-learn’s API mainly consists of three methods: fit to build models, predict to make predictions from models, and transform to modify data. This consistent and straightforward interface abstracts away the underlying algorithm, thus enabling us to focus on our particular problems. We will learn about the importance of splitting your data into train and test sets for model evaluation. Next, we will learn about combining preprocessing techniques with machine learning models using scikit-learn’s Pipeline.
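Here is a minimal sketch of the fit/predict/transform methods, a train/test split, and a Pipeline, using scikit-learn's bundled iris dataset. The choice of `StandardScaler` and `LogisticRegression` is ours for illustration, not necessarily what the training uses.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a test set so the model is evaluated on data it never saw during fit
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (a transformer) chained with a model (an estimator)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)            # fit: scaler statistics and model weights learned on train only
predictions = model.predict(X_test)    # predict: scaler.transform, then classifier.predict
accuracy = model.score(X_test, y_test)

# transform: slicing off the final step leaves just the fitted preprocessing
scaled = model[:-1].transform(X_test)
print(f"test accuracy: {accuracy:.3f}")
```

Wrapping the scaler inside the pipeline ensures its mean and variance are computed from the training split only, which avoids leaking test-set information into preprocessing.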
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer for scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Masters in Physics from Stony Brook University and a Masters in Mathematics from New York University.
Training | Virtual | Bootcamp | Data Visualization | Intermediate-Advanced
In this workshop, we will build an interactive data visualization from scratch using d3.js in the browser. The possibilities shown in d3 examples are exciting, but the API surface of d3 and the various browser standards like HTML, CSS, SVG, and JavaScript can be overwhelming. Think of this workshop as a guided tour that will point out the important things to pay attention to as we go step-by-step from CSV file to interactive visualization.
Ian Johnson is a User Experience Engineer at Google. He also organizes Bay Area d3. The workshop starts with SVG and then dives deep into d3, including DOM manipulation, categorical and quantitative scales, axes, brushes, color schemes, events and histograms. Ian likes to make sense of data by exploring it visually with D3.js!
Workshop | Virtual | NLP | Machine Learning | Beginner – Intermediate
In this talk, I will give some background on Conversational AI and NLP, along with self-supervised and unsupervised techniques. Transformer-based large language models (LLMs) such as GPT-3, Jurassic, and T5 have been foundational to the advances that we see. I will walk the audience through hands-on examples of how they can leverage transformers and large language models for few-shot and zero-shot learning in a variety of NLP applications such as text classification, summarization and question answering.
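With prompted LLMs, the difference between few-shot and zero-shot often comes down to how the prompt is built: a few-shot prompt includes a handful of labeled examples, a zero-shot prompt includes none. The hypothetical `build_prompt` helper below only assembles the text and does not call any model; the "Text:/Label:" layout is an assumption for illustration, not a format required by GPT-3, Jurassic, or T5.

```python
from typing import List, Tuple

def build_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Hypothetical helper: zero-shot if `examples` is empty, few-shot otherwise."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}\n")
    # The model is expected to continue the text after the final "Label:"
    parts.append(f"Text: {query}\nLabel:")
    return "\n".join(parts)

instruction = "Classify the sentiment of each text as positive or negative."
examples = [("I loved this workshop!", "positive"), ("The audio kept cutting out.", "negative")]

few_shot = build_prompt(instruction, examples, "Great hands-on exercises.")
zero_shot = build_prompt(instruction, [], "Great hands-on exercises.")
print(few_shot)
```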
Chandra Khatri is the Chief Scientist and Head of AI at Got It AI, where his team is transforming the AI space by leveraging state-of-the-art technologies to deliver Self-Discovering, Self-Training, and Self-Optimizing products. Under his leadership, Got It AI is democratizing Conversational AI and related ecosystems through automation. Prior to Got It AI, Chandra led various applied research projects at Uber AI, such as Conversational AI, Multi-modal AI, and Recommendation Systems.
Prior to Uber AI, he was the founding member of the Alexa Prize Competition at Amazon, where he led the R&D and had the opportunity to significantly advance the field of Conversational AI, particularly Open-domain Dialog Systems, which are considered the holy grail of Conversational AI and one of the open-ended problems in AI. Prior to Alexa AI, he drove NLP, Deep Learning, and Recommendation Systems related Applied Research at eBay. He graduated from Georgia Tech with a specialization in Deep Learning in 2015 and holds an undergraduate degree from BITS Pilani, India.
His current areas of research include Artificial and General Intelligence, Democratization of AI, Reinforcement Learning, Language Understanding, Conversational AI, Multi-modal and Human-agent Interactions, and Introducing Common Sense within Artificial Agents.
Training | Virtual | NLP | Machine Learning | Intermediate-Advanced
This half-day training will teach attendees how to use Hugging Face’s Hub as well as the Transformers and Datasets libraries to efficiently prototype and productize machine learning models.
Patrick von Platen is a research engineer at Hugging Face and one of the core maintainers of the popular Transformers library. He specializes in speech recognition, encoder-decoder models and long-range sequence modeling. Before joining Hugging Face, Patrick conducted research in speech recognition at Uber AI, Cambridge University, and RWTH Aachen University.
Tutorial | Virtual | Deep Learning | Machine Learning | Intermediate – Advanced
In this tutorial, we will introduce RLlib (http://rllib.io/), an open-source RL library with a proven track record for solving real-life industry problems at scale. We will walk through different industrial RL use cases and the solutions RLlib offers for those. In particular, we will build a recommender system using offline RL, show how to train policies that master complex multi-agent games, and demonstrate how you can connect external simulators to RLlib at scale for faster learning.
Richard Liaw is an engineering manager at Anyscale, where he leads a team in building open source machine learning libraries on top of Ray. He is on leave from the PhD program at UC Berkeley, where he worked at the RISELab advised by Ion Stoica, Joseph Gonzalez, and Ken Goldberg. In his time in the PhD program, he was part of the Ray team, building scalable ML libraries on top of Ray.
Christy is a Developer Advocate at Anyscale. Her work involves figuring out how to parallelize different AI algorithms and creating demos and tutorials on how to use Ray and Anyscale. Before that, she was a Senior AI/ML Specialist Solutions Architect at AWS and Data Scientist at a banking startup and at Atlassian. In her spare time, she enjoys hiking and bird watching.
Avnish Narayan is an ML Engineer at Anyscale where he works on RLlib. He’s passionate about exploring where RL can improve upon existing solutions in industrial applications. He previously received his MS in Computer Science at USC, where he did research on the applications of RL in robotic manipulation problems.
Training | Virtual | Machine Learning | Intermediate
We will learn about evaluating, calibrating, and inspecting models during this training. Model evaluation is an essential piece of the ML workflow. We will cover multiple metrics and see how they behave on various combinations of datasets and models. We will explore scikit-learn’s plotting API to visualize a model’s performance. Next, we will learn how to calibrate a machine learning model with scikit-learn, and we will see how models behave before and after calibration by visualizing an estimator’s calibration. Finally, we will explore techniques to inspect machine learning models; specifically, we will see how to examine open-box machine learning models, such as linear models and random forests.
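As a sketch of "before and after calibration", the example below compares a Gaussian naive Bayes classifier with the same model wrapped in scikit-learn's `CalibratedClassifierCV` (isotonic regression, 3-fold cross-validation), using the Brier score and the points of a reliability curve from `calibration_curve`. The synthetic dataset and model choice are our assumptions; naive Bayes is used only because correlated features tend to make it overconfident, so calibration usually helps, but we do not claim specific numbers.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary task; redundant (correlated) features violate naive Bayes' assumptions
X, y = make_classification(n_samples=5000, n_features=20, n_informative=5,
                           n_redundant=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "uncalibrated": GaussianNB().fit(X_train, y_train),
    # Isotonic calibration fitted with 3-fold cross-validation on the training set
    "isotonic": CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=3).fit(X_train, y_train),
}

brier, curves = {}, {}
for name, clf in models.items():
    prob_pos = clf.predict_proba(X_test)[:, 1]
    brier[name] = brier_score_loss(y_test, prob_pos)  # lower is better
    # (fraction of positives, mean predicted probability) per bin: the reliability curve
    curves[name] = calibration_curve(y_test, prob_pos, n_bins=10)
    print(f"{name}: Brier score = {brier[name]:.3f}")
```

Plotting each pair in `curves` against the diagonal gives the familiar reliability diagram: a perfectly calibrated model's points lie on the line y = x.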
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer for scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Masters in Physics from Stony Brook University and a Masters in Mathematics from New York University.
Tutorial | Virtual | Machine Learning | Deep Learning | Intermediate
In this presentation, we will share our experience on handling missing values. We will start with classical methods (single imputation, multiple imputation, likelihood-based methods) developed in the inferential framework, where the aim is to estimate the parameters and their variance as well as possible in the presence of missing data. Then we will present recent results in a supervised-learning setting.
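In the supervised-learning setting, one simple and widely used approach is to impute and then append "was missing" indicator columns, so the model can exploit missingness itself. The scikit-learn sketch below shows that pattern on synthetic data; it is our illustration of one technique, not a summary of the results the presenters will discuss. The data, the 20% missing-completely-at-random rate, and the linear model are all assumptions.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Knock out roughly 20% of entries completely at random
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Mean imputation plus binary missing-indicator columns, then a linear model
model = make_pipeline(SimpleImputer(strategy="mean", add_indicator=True), LinearRegression())
model.fit(X_missing, y)

imputed = model[0].transform(X_missing)  # 3 imputed columns + 3 indicator columns
predictions = model.predict(X_missing)   # the pipeline accepts data containing NaN
print(imputed[:3])
```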
Julie Josse is a senior researcher in statistics and machine learning applied to health at Inria, a French research institute in digital sciences, and Professor at Ecole Polytechnique (Paris). She is an expert in the treatment of missing values (inference, multiple imputation, matrix completion, MNAR, supervised learning with missing values) and has created a website on the topic (https://rmisstastic.netlify.app/) for users. Her research also focuses on causal inference techniques (causal inference with missing values, combining RCT and observational data) for personalized medicine. Julie Josse is dedicated to reproducible research with R statistical software: she has developed packages including FactoMineR and missMDA to transfer her work.
Gaël Varoquaux is a research director working on data science and health at Inria (the French research institute in digital sciences). His research focuses on using data and machine learning for scientific inference, with applications to health and social science, as well as developing tools that make it easier for non-specialists to use machine learning. He has long applied it to brain-imaging data to understand cognition. Years before the NSA, he was hoping to make bleeding-edge data processing available across new fields, and he has been working on a mastermind plan building easy-to-use open-source software in Python. He is a core developer of scikit-learn, joblib, Mayavi and nilearn, a nominated member of the PSF, and often teaches scientific computing with Python using the scipy lecture notes.
Workshop | In-person | Deep Learning | Beginner – Intermediate
Thanks to packages like Keras and Torch, you can get started with neural networks with only a few lines of R code. Once you understand the basic concepts, you will be able to use deep learning to make AI-generated humorous content! In this workshop you’ll get to make a neural network on text data to generate pet names. By the end of the workshop you should feel comfortable using neural networks in a variety of contexts.
Dr. Jacqueline Nolis is a data science leader with 15 years of experience in running data science teams and projects at companies ranging from Airbnb to Boeing. She is the Head of Data Science at Saturn Cloud where she helps design products for data scientists. Jacqueline has a PhD in Industrial Engineering and her academic research focused on optimization under uncertainty. For fun Jacqueline likes to use data science for humor—like using deep learning to generate offensive license plates.
Workshop | In-person | Machine Learning | All Levels
This talk will cut through some of the biggest issues I’ve seen with Pandas code after working with the library for a while and writing three books on it.
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage. He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON as well as local user conferences.
Training | In-Person | Machine Learning Safety and Security | Responsible AI | Intermediate
In this workshop, which is aimed at both a Data Science audience who may want to learn DFIR (Digital Forensics and Incident Response) and a DFIR audience who may want to learn Data Science, Jess Garcia will explain the fundamentals of Data Science and DFIR, and will lead the audience through all the steps of an end-to-end investigation using exclusively Data Science tools and techniques. In the process, Jess will introduce multiple forensic artifacts and explain the value they provide to the overall investigation.
Jess Garcia is the Founder of the global Cybersecurity/DFIR firm One eSecurity and a Senior Instructor with the SANS Institute. During his 25 years in the field, Jess has led a myriad of complex multinational investigations for Fortune 500 companies and global organizations. He is one of the most prolific and veteran SANS Instructors, having taught 10+ different highly technical Cybersecurity/DFIR courses at hundreds of conferences worldwide over the last 19 years. Jess is also an active Cybersecurity/DFIR Researcher. With the mission of bringing Data Science/AI to the DFIR field, Jess launched the DS4N6 initiative (www.ds4n6.io) in 2020, under which he is leading the development of multiple open source tools, standards and analysis platforms for DS/AI+DFIR interoperability.
David Contreras is a Senior Forensic Analyst at One eSecurity, working in Incident Response and leading the Research team and internal product development. David has more than six years of experience in DFIR, having worked on multiple notable incidents at international organizations and many other projects related to Threat Hunting, SOCs, etc. He also contributes to the research of the DS4N6 project (www.ds4n6.io), helping to provide Data Science and Machine Learning content to the Cybersecurity community.
Training | In-Person | Machine Learning | Deep Learning | Beginner
Have you ever wondered how data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. In this tutorial, we will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks.
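As a small taste of constructing and summarizing a network, here is a stdlib-only sketch that builds an undirected graph as an adjacency dict, computes degree centrality, and ranks friend suggestions by number of mutual friends (the classic common-neighbors heuristic). The five-person graph is invented, and real recommendation systems at Facebook or LinkedIn are far more involved; this is a textbook baseline, not their method.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected graph as an adjacency dict: node -> set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def recommend_friends(graph, node):
    """Rank non-friends by number of mutual friends (friends of friends)."""
    mutual = {}
    for friend in graph[node]:
        for candidate in graph[friend]:
            if candidate != node and candidate not in graph[node]:
                mutual[candidate] = mutual.get(candidate, 0) + 1
    # Most mutual friends first; ties broken alphabetically
    return sorted(mutual, key=lambda c: (-mutual[c], c))

edges = [("ana", "ben"), ("ana", "cai"), ("ben", "dee"), ("cai", "dee"), ("cai", "eve")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
print(centrality)
print(recommend_friends(graph, "ana"))
```

Here "dee" is suggested to "ana" first because both of ana's friends know dee, while "eve" shares only one mutual friend.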
Eric is an Investigator at the Novartis Institutes for Biomedical Research, where he solves biological problems using machine learning. He obtained his Doctor of Science (ScD) from the Department of Biological Engineering, MIT, and was an Insight Health Data Fellow in the summer of 2017. He has taught Network Analysis at a variety of data science venues, including PyCon USA, SciPy, PyData, and ODSC, and has also co-developed the Python Network Analysis curriculum on DataCamp. As an open-source contributor, he has made contributions to PyMC3, matplotlib, and bokeh. He has also led the development of the graph visualization package nxviz, and a data cleaning package pyjanitor (a Python port of the R package janitor).
Training | In-person | Machine Learning | Data Visualization | Intermediate-Advanced
This session will equip you with the skills to make customized visualizations for your data using Python. While there are many plotting libraries to choose from, the prolific Matplotlib library is always a great place to start. Since various Python data science libraries utilize Matplotlib under the hood, familiarity with Matplotlib itself gives you the flexibility to fine-tune the resulting visualizations (e.g., add annotations, animate, etc.). This session will also introduce interactive visualizations using HoloViz, which provides a higher-level plotting API capable of using Matplotlib and Bokeh (a Python library for generating interactive, JavaScript-powered visualizations) under the hood.
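As one example of the fine-tuning mentioned above, here is a minimal Matplotlib sketch that adds an arrow annotation to a line chart and renders it off-screen to an in-memory PNG. The monthly signup numbers are made up, and the styling choices are ours, not the session's.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]
signups = [3, 5, 9, 4]  # made-up values

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, signups, marker="o")

# Categorical x values are placed at positions 0, 1, 2, ... on the axis
peak = signups.index(max(signups))
ax.annotate(
    "peak",
    xy=(peak, signups[peak]),                  # point being annotated
    xytext=(peak + 0.4, signups[peak] - 2),    # where the label text sits
    arrowprops={"arrowstyle": "->"},
)
ax.set_title("Monthly signups")
ax.set_ylabel("signups")
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```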
Stefanie Molin is a data scientist and software engineer at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science. She is currently pursuing a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
Workshop | In-person | Machine Learning | Deep Learning | Intermediate
Learn how to pair traditional AutoML workflows with orchestration (the automated configuration, management, and coordination of data and models) and experiment tracking (management of a system of record for our data and models) to provide yourself with the tools to turn your machine learning models into an operational machine learning workflow which can plug into common DevOps platforms.
Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, using traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!
Workshop | In-person | Machine Learning | Intermediate
This workshop is a hands-on introduction to Bayesian Decision Analysis (BDA), which is a framework for using probability to guide decision-making under uncertainty. I start with Bayes’s Theorem, which is the foundation of Bayesian statistics, and work toward the Bayesian bandit strategy, which is used for A/B testing, medical tests, and related applications. For each step, I provide a Jupyter notebook where you can run Python code and work on exercises.
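The two endpoints of that progression can be sketched in a few lines of stdlib Python: Bayes's Theorem over a discrete set of hypotheses, and a Bayesian bandit using Thompson sampling with Beta(1, 1) priors on Bernoulli success rates. This is our own illustration, not the workshop's notebooks; the two-bowl cookie numbers and the simulated "true" conversion rates are assumptions.

```python
import random

def posterior(prior, likelihood):
    """Bayes's Theorem over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two bowls: bowl1 has 30 vanilla / 10 chocolate cookies, bowl2 has 20 / 20.
# A vanilla cookie was drawn from a bowl chosen at random: which bowl was it?
post = posterior({"bowl1": 0.5, "bowl2": 0.5}, {"bowl1": 30 / 40, "bowl2": 20 / 40})

def thompson_sampling(true_rates, rounds=2000, seed=0):
    """Bernoulli bandit with Beta(1, 1) priors; returns how often each arm was played."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible success rate for each arm from its Beta posterior
        draws = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        arm = draws.index(max(draws))  # play the arm whose draw is highest
        if rng.random() < true_rates[arm]:  # simulated outcome, e.g. a conversion
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]

print(post)                              # bowl1 is now more likely than bowl2
print(thompson_sampling([0.2, 0.5]))     # most plays go to the better arm
```

Thompson sampling balances exploration and exploitation automatically: an arm with little data has a wide posterior and still wins the draw occasionally, while a clearly better arm wins most draws.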
Allen Downey is a Professor of Computer Science at Olin College of Engineering in Needham, MA. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. Prof Downey has taught at Colby College and Wellesley College, and in 2009 he was a Visiting Scientist at Google. He received his Ph.D. in Computer Science from U.C. Berkeley, and M.S. and B.S. degrees from MIT.
Workshop | In-Person | Machine Learning | MLOps & Data Engineering | Beginner-Intermediate
Although powerful, modern machine learning models can be sensitive. Seemingly subtle changes in a data distribution can destroy the performance of otherwise state-of-the-art models, which can be especially problematic when ML models are deployed in production. In this workshop, we will give a hands-on overview of drift detection, the discipline focused on detecting such changes. We will start by building an understanding of the ways in which drift can occur, and why it pays to detect it. We’ll then explore the anatomy of a drift detector, and learn how drift detectors can be used to detect drift in a principled manner.
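A minimal drift detector compares a reference sample (for example, a feature's values at training time) with a window of production data using a statistical test. The stdlib-only sketch below uses the two-sample Kolmogorov-Smirnov test on a single numeric feature, with the standard large-sample critical value c(α)·sqrt((n + m)/(nm)) and c(0.05) ≈ 1.358. It is our illustration of one common choice, not the API of Seldon's tooling; the simulated data and the one-standard-deviation mean shift are assumptions.

```python
import bisect
import math
import random

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:  # the maximum gap is attained at one of the sample points
        f_ref = bisect.bisect_right(ref, x) / len(ref)
        f_cur = bisect.bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap

def drift_detected(reference, current, c_alpha=1.358):
    """Flag drift when D exceeds the large-sample critical value (c_alpha=1.358 for alpha=0.05)."""
    n, m = len(reference), len(current)
    return ks_statistic(reference, current) > c_alpha * math.sqrt((n + m) / (n * m))

rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(500)]  # e.g. a feature at training time
shifted = [rng.gauss(1, 1) for _ in range(500)]    # the same feature after a mean shift
print(ks_statistic(reference, shifted), drift_detected(reference, shifted))
```

In practice a detector like this runs per feature on sliding windows, and testing many features at once calls for a multiple-testing correction, which is one reason dedicated drift-detection libraries exist.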
Ed Shee is Head of Developer Relations at Seldon. Having previously led a tech team at IBM, Ed comes from a cloud computing background and is a strong believer in making deployments as easy as possible for developers. With an education in computational modelling and an enthusiasm for machine learning, Ed has blended his work in ML and cloud native computing together to cement himself firmly in the emerging field of MLOps.
Ashley is a data science research engineer at Seldon, where he works on developing production-ready tools for drift, adversarial and outlier detection. Prior to joining Seldon, he spent a number of years as a Research Fellow at The Alan Turing Institute, where he explored the use of machine learning for tackling aerospace engineering problems, with a focus on explainability and uncertainty quantification. Ashley also completed a PhD at the University of Cambridge, and is a keen proponent of open-source software.
Bootcamp | Virtual | Machine Learning | Beginner
In this workshop we cover the basics of modern R. You’ll learn how to read data from a CSV using the readr package, manipulate data with dplyr and make compelling visualizations with ggplot2.
Workshop | In-person
In this workshop, you’ll learn an easy way to incorporate data science and AI/ML into an OpenShift development workflow. As an example, you’ll use an object detection model to detect ‘dog(s)’ in an image.
Audrey Reznik has been in the IT industry (private and public sectors) for 27 years in multiple verticals. In the last 4 years, she worked as a Data Scientist at ExxonMobil where she created a Data Science Enablement team to help data scientists easily deploy ML models in a Hybrid Cloud environment. Audrey was instrumental in educating scientists about what the OpenShift platform was and how to use OpenShift containers (images) to organize, run, and visualize data analysis results. Audrey now works as a Data Scientist with the Red Hat Data Science Team where she is focused on next-generation applications. She is passionate about Data Science and in particular the current opportunities with ML and Federated Data.
Prasanth Anbalagan is a Senior Principal Software Engineer (QE and Analysis) on the Red Hat OpenShift Data Science team. Prasanth earned his M.S. and Ph.D. in Computer Science from North Carolina State University focusing on Software Reliability Engineering, Predictive Modeling and Automated Software Engineering. As a member of the AI team at Red Hat, Prasanth focuses on development of services and tools to analyze, manipulate, and visualize data and execute automated operations as part of an Analytics, Machine Learning and AI platform.
Workshop | In-person | Data Analytics | All Levels
This workshop will introduce you to optimization as a powerful tool in your analytics toolbox. You’ll learn what optimization is, how to think about a problem through an optimization lens, and how to formulate the problem. Since supply chains have been in the news for the past two years due to shipping delays and labor shortages, it’s only fitting to discuss an optimization problem within this domain. You will see how to formulate that problem mathematically, how to add complexity to it step by step, and how to solve it using Python and Gurobi, a leading commercial optimization solver.
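For a flavor of formulating a supply-chain problem mathematically, here is the classic transportation problem: choose shipments x[i, j] ≥ 0 from warehouse i to store j to minimize the total of cost[i, j]·x[i, j], subject to each warehouse shipping no more than its supply and each store receiving exactly its demand. The workshop uses Gurobi; to keep this sketch free of a commercial license we swap in SciPy's open-source `linprog` with the HiGHS solver, and the costs, supplies, and demands are made up.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data: 2 warehouses (rows) shipping to 3 stores (columns)
cost = np.array([[2.0, 4.0, 5.0],
                 [3.0, 1.0, 7.0]])   # cost per unit shipped
supply = [20, 30]                     # units available at each warehouse
demand = [10, 25, 15]                 # units each store must receive
n_w, n_s = cost.shape

# Decision variables x[i, j], flattened row-major into a vector of length n_w * n_s
A_supply = np.zeros((n_w, n_w * n_s))
for i in range(n_w):
    A_supply[i, i * n_s:(i + 1) * n_s] = 1   # sum over j of x[i, j] <= supply[i]
A_demand = np.zeros((n_s, n_w * n_s))
for j in range(n_s):
    A_demand[j, j::n_s] = 1                  # sum over i of x[i, j] == demand[j]

result = linprog(cost.ravel(), A_ub=A_supply, b_ub=supply,
                 A_eq=A_demand, b_eq=demand, bounds=(0, None), method="highs")
shipments = result.x.reshape(n_w, n_s)
print(result.status, result.fun)
print(shipments)
```

Adding complexity step by step, as the workshop describes, then amounts to adding variables and constraints (for example, fixed warehouse-opening costs, which turn this into a mixed-integer problem better suited to a solver like Gurobi).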
Ehsan is a Principal Operations Research Scientist at Decision Spot, with expertise in the logistics and transportation industries. Over the years, he has worked with several Fortune 500 companies, including GE, Norfolk Southern, and C.H. Robinson. Ehsan has worked on a variety of supply chain projects, focusing primarily on network optimization and routing. Before joining Decision Spot, he worked at Opex Analytics, which was acquired by Llamasoft, which was in turn later acquired by Coupa.
He holds a PhD in Industrial Engineering and has been an Adjunct Lecturer in Northwestern’s Master of Science in Analytics (MSiA) program since Fall 2019.
Workshop | In-person | Deep Learning | Machine Learning
Some of the most exciting tech research happening now is in the area of deep learning, but how do we get started with hands-on practice and how do we gain a basic understanding of what is going on within all of those deep learning layers?
This lesson will help a beginner navigate this new landscape. We’ll start by identifying some of the major breakthroughs that make deep learning what it is today. Then we’ll get a chance to learn about the math and architecture behind neural nets. Lastly, we’ll talk about how you can create and tune your own neural networks via Keras…
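The "math behind neural nets" starts with a single dense layer. Each output is a weighted sum of the inputs plus a bias, passed through a nonlinearity. Below is a minimal plain-Python sketch of one such layer with ReLU activation. The weights and inputs are hypothetical. Conceptually, a Keras `Dense` layer with `activation="relu"` computes the same thing, although Keras stores its kernel with shape (inputs, outputs), while here each row of `weights` belongs to one output.

```python
def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def dense_forward(inputs, weights, biases):
    """One fully connected layer: output_k = relu(sum_j weights[k][j] * inputs[j] + biases[k])."""
    return [relu(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

x = [1.0, 2.0]
W = [[0.5, -1.0],   # weights feeding output neuron 0
     [1.0, 0.25]]   # weights feeding output neuron 1
b = [0.0, -1.0]
print(dense_forward(x, W, b))  # [0.0, 0.5]
```

Stacking several such layers, and learning `W` and `b` by gradient descent, is what a deep network adds on top of this building block.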
Julia Lintern currently works as an instructor for the Metis Data Science Flex Program. Previously, she worked as a Data Scientist for the New York Times. Julia began her career as a structures engineer designing repairs for damaged aircraft. Julia holds an MA in applied math from Hunter College, where she focused on visualizations of various numerical methods and discovered a deep appreciation for the combination of mathematics and visualization. At various points in her career, she has also worked on creative side projects such as Lia Lintern, her own fashion label.
Tutorial | In-person | Deep Learning | All Levels
This is an introductory, hands-on guided tutorial on Ray, which provides powerful yet easy-to-use abstractions for implementing distributed systems in Python. The tutorial includes a brief talk giving an overview of concepts in Ray Core, why you should use Ray to distribute Python and machine learning workloads, and a brief discussion of Ray’s library ecosystem.
Primarily, the tutorial will focus on the Ray Core APIs for writing remote functions and actors. Attendees will walk away with an understanding of why distributed computing is a necessity today, the common design patterns for writing distributed Python applications in Ray, and a basic understanding of how Ray works under the hood…
Stephanie is a final-year PhD student at UC Berkeley and a software engineer at Anyscale. She is interested in abstractions for distributed computing and problems in fault tolerance. Towards this end, she is also a maintainer for the open-source project Ray, which provides a simple, universal API for building distributed applications in Python.
Tutorial | In-person | Responsible AI | Intermediate
In this tutorial, Olivier Blais, Project Editor of the “ISO/IEC TS 5471 – Quality evaluation guidelines for AI Systems” technical specification, will share modern best practices and techniques for improving the quality evaluation of AI systems. This session targets AI practitioners, AI project managers, and AI leads, since AI system quality is paramount in every AI project, delivery process, and expert’s toolbox. Participants will learn about upcoming approaches and methods for evaluating AI system quality that will inspire future certifications…
Olivier is co-founder and VP of decision science at Moov AI. He is the editor of the international ISO standard that defines the quality of artificial intelligence systems, and in that role he leads a team of 50 AI professionals from around the world.
His cutting-edge AI and machine learning knowledge has led him to implement a data culture in various industries and to support digital transformation projects at many companies, such as Pratt & Whitney, Metro, Sharethrough, Merck, and Premier Tech.
He is an AI mentor for Creative Destruction Labs and coaches several start-ups. As a speaker, his topics of choice are adopting and applying AI, and responsible AI.
Olivier is the recipient of the prestigious “30 under 30” award (2019) and is co-author of a patent for an advanced algorithm that evaluates a borrower’s creditworthiness.
Workshop | In-person | NLP | Machine Learning | Intermediate
In this workshop, I will demonstrate an end-to-end NER application that identifies medications in social media data using the spaCy NLP library. We will begin with an overview of neural NER models, including recent transformer-based models (e.g., BERT). We will then dig into spaCy’s project structure and how to set up its base neural model for NER. Our first objective will be to identify medication entities in social media conversations on Twitter. Once we’ve run and examined the results of the model, we will see how we can achieve significant performance boosts by tweaking project parameters and using a transformer architecture…
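NER training data is usually annotated as character spans, while models predict one tag per token. The sketch below shows how such spans map onto token-level BIO tags: B for the beginning of an entity, I for inside it, and O for outside. It assumes simple whitespace tokenization, a made-up sentence, and a hypothetical `MED` label. It is not spaCy's API. spaCy has its own tokenizer and converters, and its training utilities use the related BILUO scheme.

```python
import re

def spans_to_bio(text, spans):
    """Convert character-level entity spans to token-level BIO tags.

    Tokens are whitespace-delimited; spans are (start, end, label) with end exclusive.
    A token is tagged only if it lies entirely inside a span.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        start, end = match.start(), match.end()
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged

text = "I took vitamin D and tylenol"
spans = [(7, 16, "MED"), (21, 28, "MED")]  # "vitamin D" and "tylenol"
for token, tag in spans_to_bio(text, spans):
    print(token, tag)
```

The multi-token entity "vitamin D" becomes `B-MED I-MED`, which lets a token classifier learn where entities start and where they continue.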
Ben is a Senior Data Scientist at the Institute for Experiential AI. He obtained his Masters in Public Health (MPH) from Johns Hopkins and his PhD in Policy Analysis from the Pardee RAND Graduate School. Since 2014, he has been working in data science for government, academia and the private sector. His major focus has been on Natural Language Processing (NLP) technology and applications. Throughout his career, he has pursued opportunities to contribute to the larger data science community. He has spoken at data science conferences, taught courses in Data Science, and helped organize the Boston chapter of PyData. He also contributes to volunteer projects applying data science tools for public good.
Workshop | Virtual | NLP | Machine Learning | Intermediate
To build a proper understanding of modeling, I will explain traditional NLP techniques based on TF-IDF and then go into the details of different deep learning architectures, such as feed-forward neural networks and convolutional neural networks (CNNs). Along with these concepts, I will also show code snippets in Keras for building the classifier. I will conclude with some of the metrics commonly used to measure the performance of the classifier…
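TF-IDF weights a term by how often it appears in a document (term frequency) and discounts terms that appear in many documents (inverse document frequency). Here is a minimal sketch of the unsmoothed textbook variant, using lowercase whitespace tokenization and a made-up three-document corpus. scikit-learn's `TfidfVectorizer` differs by default: it uses a smoothed IDF and L2-normalizes each row, so its numbers will not match these.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(N / document frequency)   (unsmoothed variant)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = ["the movie was great", "the plot was boring", "the acting was great"]
w = tfidf(docs)
print(w[1])
```

Terms present in every document ("the", "was") get weight 0, while a term unique to one document ("boring") gets the highest weight. These vectors are what a classic linear classifier would consume before moving on to neural architectures.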
Sanghamitra Deb is a Staff Data Scientist at Chegg, where she works on problems related to school and college education to sustain and improve the learning process. Her work involves recommendation systems, computer vision, graph modeling, deep NLP analysis, data pipelines, and machine learning. Previously, Sanghamitra was a data scientist at Accenture, where she worked on a wide variety of problems related to data modeling, architecture, and visual storytelling. She is an avid fan of Python and has been programming for more than a decade.
Trained as an astrophysicist (she holds a PhD in physics), she applies her analytical mind not only to work across a range of domains, such as education, healthcare, and recruitment, but also to her leadership style. She mentors junior data scientists at her current organization and coaches students from various fields to transition into Data Science. Sanghamitra enjoys addressing technical and non-technical audiences at conferences and encourages women to join tech careers. She is passionate about diversity and has organized Women In Data Science meetups.
Tutorial | Virtual | Machine Learning | Advanced
In this tutorial, I’ll discuss how machine learning tools can be rigorously integrated into observational study analyses, and how they interact with classical statistical ideas around randomization, semiparametric modeling, double robustness, etc. I’ll also survey some recent advances in methods for treatment heterogeneity and illustrate them with example applications in R. When deployed carefully, machine learning enables us to develop causal estimators that reflect an observational study design more closely than methods based on basic linear regression…
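Double robustness is easiest to see in the augmented inverse-propensity-weighted (AIPW) estimator of the average treatment effect. The estimator combines an outcome model with a propensity model, and it stays consistent if either one is correctly specified. The sketch below is in Python rather than the R used in the tutorial. It takes already-fitted predictions as inputs, so the propensities `e` and outcome predictions `mu1`/`mu0` are hypothetical stand-ins for any ML model. In practice they are often fit with cross-fitting.

```python
def aipw_ate(y, t, e, mu1, mu0):
    """Augmented inverse-propensity-weighted (AIPW) estimate of the average treatment effect.

    y:   observed outcomes
    t:   treatment indicators (0 or 1)
    e:   estimated propensity scores P(T = 1 | X), strictly between 0 and 1
    mu1: predicted outcomes under treatment; mu0: predicted outcomes under control
    """
    total = 0.0
    for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0):
        total += (m1 - m0                              # outcome-model contrast
                  + ti * (yi - m1) / ei                # reweighted residual, treated units
                  - (1 - ti) * (yi - m0) / (1 - ei))   # reweighted residual, control units
    return total / len(y)

y, t, e = [3, 1, 4, 2], [1, 0, 1, 0], [0.5] * 4
print(aipw_ate(y, t, e, mu1=[3, 2, 4, 3], mu0=[1, 1, 2, 2]))  # 1.5
print(aipw_ate(y, t, e, mu1=[0] * 4, mu0=[0] * 4))            # 2.0
```

When the outcome model fits the observed data exactly, the residual terms vanish and the estimate is the model's average contrast. When the outcome model is useless (all zeros), the estimator falls back to plain inverse-propensity weighting, which is still valid if the propensities are right.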
Stefan Wager is an Associate Professor of Operations, Information and Technology at Stanford Graduate School of Business, and an Associate Professor of Statistics (by courtesy). He received his PhD in Statistics from Stanford in 2016, and has worked with or consulted for several Silicon Valley companies, including Dropbox, Facebook, Google and Uber. His research lies at the intersection of causal inference, optimization, and statistical learning. He is particularly interested in developing new solutions to problems in statistics, economics and decision making that leverage recent advances in machine learning.
Workshop | Virtual
Existing ETL and MLOps tools claim to solve orchestration problems, but none of them does it the right way. In this hands-on workshop, we’ll go through a sample ML data pipeline that represents a typical data science use case: extracting data from multiple sources (a database and a data warehouse), transforming it, viewing it, and cleaning it. Then we’ll make sure it meets quality standards and start training the model. During each of these phases, we will talk about testing (unit and integration tests). As a pre-pipeline step, we’ll discuss optional data preparation flows and some strategies for accelerating the whole process through quality gates, data testing, and some of the available labeling services…
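A quality gate is a check that stops a pipeline before training if the data falls short of agreed standards. It is also a natural target for unit tests. Below is a minimal sketch of one. The record layout, the column names, the `clean` step, and the `quality_gate` helper are all hypothetical illustrations, not Ploomber's API.

```python
def clean(rows):
    """Example transform (hypothetical): drop records whose 'label' is missing."""
    return [row for row in rows if row.get("label") is not None]

def quality_gate(rows, required_columns, min_rows=1):
    """Stop the pipeline before training if a batch of records fails basic checks."""
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = [col for col in required_columns if row.get(col) is None]
        if missing:
            raise ValueError(f"row {i} is missing {missing}")
    return rows

raw = [{"feature": 1.0, "label": 0}, {"feature": 2.5, "label": None}]
batch = quality_gate(clean(raw), required_columns=["feature", "label"])
print(len(batch))  # 1
```

Because each step is a plain function, the transform and the gate can each be unit-tested in isolation, and the gate's failure mode (an exception) halts an orchestrated pipeline before bad data reaches training.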
Ido Michael co-founded Ploomber to help data scientists build faster. Previously, he worked at AWS leading data engineering and data science teams; together with his team, he built hundreds of data pipelines during customer engagements. He came to New York for his MS at Columbia University. He focused on building Ploomber after repeatedly finding that projects dedicated about 30% of their time just to refactoring development work (prototypes) into production pipelines.
Workshop | Virtual
In this workshop, you’ll learn an easy way to incorporate data science and AI/ML into an OpenShift development workflow. As an example, you’ll use an object detection model to detect ‘dog(s)’ in an image…
Audrey Reznik has been in the IT industry (private and public sectors) for 27 years in multiple verticals. Before joining Red Hat, she spent four years as a Data Scientist at ExxonMobil, where she created a Data Science Enablement team to help data scientists easily deploy ML models in a Hybrid Cloud environment. Audrey was instrumental in educating scientists about what the OpenShift platform is and how to use OpenShift containers (images) to organize, run, and visualize data analysis results. Audrey now works as a Data Scientist with the Red Hat Data Science Team, where she is focused on next-generation applications. She is passionate about Data Science, in particular the current opportunities with ML and Federated Data.
Prasanth Anbalagan is a Senior Principal Software Engineer (QE and Analysis) on the Red Hat OpenShift Data Science team. Prasanth earned his M.S. and Ph.D. in Computer Science from North Carolina State University, focusing on Software Reliability Engineering, Predictive Modeling, and Automated Software Engineering. As a member of the AI team at Red Hat, Prasanth focuses on developing services and tools to analyze, manipulate, and visualize data and to execute automated operations as part of an Analytics, Machine Learning, and AI platform.
Tutorial | Virtual | NLP | Beginner – Intermediate
This talk gives an overview of the suite of NLP methods grounded in neural network architectures, including recurrent neural networks (RNNs), transformers, and convolutional neural networks (CNNs). We will connect them by diving into their similarities and differences. You will come away with an overall picture of NLP and a grasp of the theoretical essence that underpins NLP methods. This talk aims to equip you with foundational NLP knowledge and lower the barrier to jumpstarting your NLP projects…
Chengyin Eng is a Senior Data Science Consultant on the Machine Learning Practice team at Databricks. She is experienced in developing end-to-end scalable machine learning solutions for cross-functional clients. She also teaches courses on deep learning and ML in production, and she regularly gives talks at universities and conferences. Prior to Databricks, she worked in the life insurance industry, where she contributed to risk modeling and marketing pipelines. She holds an MS in Computer Science from the University of Massachusetts, Amherst, and Bachelor’s degrees in Statistics and Environmental Studies from Mount Holyoke College.
Workshop | Virtual | Responsible AI | Machine Learning Safety & Security
In this workshop, we’ll walk you through several real-world use cases for synthetic data. You’ll learn how to balance a biased medical dataset to improve early cancer detection in women, generate realistic time-series financial data for forecasting, and more. You can test the examples yourself – some with Gretel-synthetics, a fully open-source package, and some using Gretel Blueprints, a collection of notebooks and sample code that leverage the open-source package through Gretel’s client…
Lipika Ramaswamy is a Senior Applied Scientist at Gretel.ai where she focuses on developing advanced synthetic data generation technologies that include privacy guarantees. Prior to Gretel.ai, she worked as a data scientist at LeapYear, a differential privacy software company. Lipika attended Bryn Mawr College for her undergrad, where she began her STEM career, and holds a Master’s in Data Science from Harvard University.
Workshop | Virtual | Machine Learning | NLP | Beginner – Intermediate
This talk will present self-supervised speech representation learning approaches and their connection to related research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we will review recent efforts to benchmark learned representations and extend their application beyond speech recognition…
Abdelrahman Mohamed (PhD) is a research scientist at Meta AI Research (formerly Facebook AI Research, FAIR). He was previously a principal scientist/manager at Amazon Alexa and a researcher at Microsoft Research. Abdelrahman was part of the team that started the Deep Learning revolution in Spoken Language Processing in 2009. He is the recipient of the IEEE Signal Processing Society Best Journal Paper Award for 2016. His current research focuses on improving, using, and benchmarking learned speech representations, e.g., HuBERT, Wav2vec 2.0, TextlessNLP, and SUPERB.
Training | Virtual | Machine Learning | Intermediate-Advanced
During this training, we will learn about processing text data, working with imbalanced data, and Poisson regression. We will start by processing text data with scikit-learn’s vectorizers. Since the output of these vectorizers is sparse, we will also review scikit-learn estimators that can handle sparse data. Next, we will explore how to work with imbalanced data, where one of the classes appears much more frequently than the others: we will look at estimators with class weights, resampling techniques provided by imbalanced-learn, and a bagging classifier with balancing…
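Class weights counter imbalance by making errors on rare classes cost more. scikit-learn documents its `class_weight="balanced"` option as weighting each class by `n_samples / (n_classes * count)`. The sketch below reproduces that formula in plain Python on a made-up label list. It only computes the weights; passing them to an estimator is left to the library.

```python
from collections import Counter

def balanced_class_weights(y):
    """weight[c] = n_samples / (n_classes * count[c]); rarer classes get larger weights."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}

print(balanced_class_weights([0, 0, 0, 1]))  # {0: 0.6666666666666666, 1: 2.0}
```

With three negatives and one positive, each positive counts three times as much as each negative, so both classes contribute equally to the total loss.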
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer for scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Master’s in Physics from Stony Brook University and a Master’s in Mathematics from New York University.
Training | Virtual | NLP | Machine Learning | Intermediate
Named Entity Recognition (NER) and Relationship Extraction (RE) are foundational for many downstream NLP tasks, such as Information Retrieval and Knowledge Base construction. While pre-trained models exist for both NER and RE tasks, they are usually specialized for some narrow application domain. If your application domain is different, your best bet is to train your own models. However, the costs associated with training, specifically generating training data, can be a significant deterrent to doing so…
Sujit Pal builds intelligent systems around research content that help researchers and medical professionals achieve better outcomes. His areas of interest are Information Retrieval, Natural Language Processing and Machine Learning (including Deep Learning). As an individual contributor in the Elsevier Labs team, he works with diverse product teams to help them solve tough problems in these areas, as well as build proofs of concept at the cutting edge of applied research.
Training | Virtual | Machine Learning | Beginner-Intermediate
Scikit-learn is a Python machine learning library used by data science practitioners from many disciplines. We will learn about cross-validation, tuning machine learning algorithms, and pandas interoperability during this training. Cross-validation enables us to evaluate our machine learning models by splitting our data into multiple training and testing datasets. We will learn to handle missing values with imputation using univariate and multivariate techniques. Next, we will explore tuning algorithms in scikit-learn with grid search and random search. We will learn about categorical features and how to use scikit-learn’s encoders to convert these categorical features into numerical features for a machine-learning algorithm to consume…
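The splitting behind k-fold cross-validation is simple to write out. Every sample lands in exactly one test fold and is used for training in all the other folds. The sketch below follows the fold-size convention that scikit-learn documents for `KFold` without shuffling: the first `n_samples % n_splits` folds get one extra sample. It is an illustration in plain Python, not the library's implementation.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for unshuffled k-fold cross-validation.

    Fold sizes differ by at most one; the first n_samples % n_splits folds are one larger.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]  # everything outside the test fold
        yield train, test
        start += size

for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

A model is then fit on each `train` split and scored on the matching `test` split, and the k scores are averaged. Grid search and random search repeat this loop for every candidate hyperparameter setting.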
Thomas J. Fan is a Senior Software Engineer at Quansight Labs, working to sustain and evolve the PyData open-source ecosystem. He is a maintainer for scikit-learn, an open-source machine learning library written for Python. Previously, he worked at Columbia University, improving the interoperability between scikit-learn and AutoML systems. Thomas holds a Master’s in Physics from Stony Brook University and a Master’s in Mathematics from New York University.
Training | Virtual | Machine Learning | MLOps & Data Engineering | Intermediate
In this session, we will provide a tutorial on TensorFlow and Keras and guide you through a series of hands-on examples, ranging from the basic MNIST dataset to time-series processing for model building. We will also cover data input and output processing with TensorFlow, from simple CSV files to cloud data warehouse services such as Google Cloud BigQuery. As a bonus, we will also cover the integration of TensorFlow with Apache Kafka to illustrate the streaming data pipelines used broadly across the industry…
Yong Tang is the Director of Engineering at MobileIron. His most recent focus is on data processing in machine learning. He is a maintainer and the SIG I/O lead of the TensorFlow project. He received the Open Source Peer Bonus Award from Google for his contributions to TensorFlow and is the author of the Kafka Dataset module in TensorFlow. In addition to TensorFlow, Yong Tang also contributes to many other projects for the open-source community. He is a maintainer of Docker, CoreDNS, and SwarmKit. Yong Tang received his PhD in Computer Science & Engineering from the University of Florida.
Tutorial | In-person | Machine Learning Safety & Security | Responsible AI | Beginner – Intermediate
After presenting a brief history of cybersecurity data science, I’ll discuss different cybersecurity data science specializations, such as malware detection and intrusion detection. The workshop will also introduce prominent practitioners and companies within the cybersecurity data science field. Finally, I’ll cover pathways to becoming a cybersecurity data scientist…
John Speed Meyers is a security data scientist at Chainguard. His interests include software supply chain security, open source software security and applications of data science to cybersecurity. He has a PhD in policy analysis from the Pardee RAND Graduate School.
Training | In-person | Machine Learning Safety & Security | Responsible AI | Intermediate
We can easily trick a classifier into making embarrassingly false predictions. When this is done systematically and intentionally, it is called an adversarial attack; this particular kind of attack is called an evasion attack. In this session, we will examine an evasion use case and elaborate on other forms of attacks. Then, we will explain two defense methods: spatial smoothing preprocessing and adversarial training. Lastly, we will demonstrate one robustness evaluation method and one certification method to ascertain that the model can withstand such attacks…
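One classic evasion attack is the Fast Gradient Sign Method (FGSM). It nudges every input feature by a small amount `eps` in whichever direction increases the model's loss. The session does not say which attack it uses, so treat FGSM here as an illustrative choice. The sketch attacks a hypothetical logistic-regression model. With log loss, the gradient with respect to the input is `(p - y) * w`, so no autodiff library is needed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: x_adv = x + eps * sign(d loss / d x).

    For log loss on a logistic model, d loss / d x = (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [1.0, -2.0], 0.0   # hypothetical trained weights
x, y = [0.5, 0.1], 1      # an input correctly classified as class 1
x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict(w, b, x), predict(w, b, x_adv))
```

A perturbation of at most 0.2 per feature moves the predicted probability from above 0.5 to below it, flipping the label. Adversarial training, one of the defenses mentioned above, feeds examples like `x_adv` back into training.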
Serg Masís is a Data Scientist in agriculture with a lengthy background in entrepreneurship and web/app development, and the author of the bestselling book “Interpretable Machine Learning with Python”. He is passionate about machine learning interpretability, responsible AI, behavioral economics, and causal inference.
Tutorial | In-person | Machine Learning for Biotech & Pharma | Beginner – Intermediate
In this tutorial session, attendees will learn how a set of open source tools can be leveraged to perform standardization, characterization, and data quality assessment for various health data sources. Open source tools including Synthea, ETL-Synthea, Achilles, Data Quality Dashboard, and Ares will be reviewed and demonstrated in a data operations pipeline. We will demonstrate how the global health information community leverages this strategy to ensure research-ready health data…
Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development, where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open source architecture working group. Frank is a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorial classes on architecture and common data model vocabulary.
In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open source software repositories developed and released by OHDSI, including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios, and others. Frank’s areas of expertise include computational epidemiology, large-scale data platforms, software development and architecture, data visualization, and informatics.
Prior to joining Janssen Research and Development, Frank held the position of Senior Principal and Director of Collaboration and Analytics at British Telecom where he was a strategic advisor for multiple Fortune 100 companies across sectors including Consumer Products, Telecommunications and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology at Rutgers University.
Training | In-person | Machine Learning | Intermediate
This tutorial will show how to use XGBoost. It will demonstrate model creation, model tuning, model evaluation, and model interpretation…
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage. He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON, as well as at local user conferences.
Workshop | In-person | Machine Learning Safety & Security | Machine Learning | Beginner – Intermediate
In this talk, I aim to help advance the AI for Cybersecurity discipline by summarizing the state of the field and promising future directions. I will offer a multi-disciplinary AI for Cybersecurity roadmap that centers on major themes such as cybersecurity applications and data, advanced AI methodologies for cybersecurity, and AI-enabled decision making. I will also provide examples of recent research at the intersection of AI and cybersecurity, particularly around detecting vulnerable code in GitHub repositories and detecting emerging threats from the Dark Web for proactive cyber threat intelligence capabilities…
Dr. Sagar Samtani is an Assistant Professor and Grant Thornton Scholar in the Department of Operations and Decision Technologies at Indiana University. Dr. Samtani earned his Ph.D. from the AI Lab at the University of Arizona. Dr. Samtani’s research interests are in AI for Cybersecurity, developing deep learning approaches for cyber threat intelligence, vulnerability assessment, open-source software, AI risk management, and Dark Web analytics. He has received funding from NSF’s SaTC, CICI, and SFS programs and has published over 40 peer-reviewed articles in leading information systems, machine learning, and cybersecurity venues. He is deeply involved with industry, serving on the Board of Directors for the DEFCON AI Village and the Executive Advisory Council for the CompTIA ISAO.
Tutorial | In-person | Deep Learning | Machine Learning | Beginner – Intermediate
In this talk, Sabrina Smai (Program Manager, Microsoft) shares the most recent updates to PyTorch Profiler, a demo, and tips for leveraging the Profiler API to help you quickly locate and address common bottlenecks, as well as a look at what’s to come…
Sabrina Smai is a Product Manager in Microsoft’s AI Frameworks team. She works with all things PyTorch and ONNX Runtime.
Training | In-person | NLP | Machine Learning for Biotech & Pharma | Intermediate-Advanced
Spark NLP is a Natural Language Processing (NLP) library built on top of Apache Spark ML. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that can scale easily in a distributed environment. Spark NLP comes with 3000+ pretrained pipelines and models in more than 200 languages. It supports nearly all NLP tasks, with modules that can be used seamlessly in a cluster. Downloaded more than 1 million times every month and having grown 20x over the last year, Spark NLP is the world’s most widely used NLP library in the enterprise and is used by 54% of healthcare organizations. In this talk, Veysel will conduct a hands-on session to go over the library’s healthcare components and teach how to solve any NLP problem in healthcare with state-of-the-art methods and practices across the industry. He will also explain the best practices for building production-grade solutions around the latest research…
Veysel is a well-known thought leader in healthcare NLP and works as a Lead Data Scientist and ML Engineer at John Snow Labs, improving the Spark NLP for Healthcare library and delivering hands-on projects in Healthcare and Life Science. He is a seasoned data scientist with over ten years of experience and a strong background in every aspect of data science, including NLP, machine learning, deep learning, and big data. He is also pursuing his Ph.D. in ML at Leiden University, Netherlands, and delivers graduate-level lectures in AutoML and Distributed Data Processing. He has broad consulting experience in Statistics, Data Science, Software Architecture, MLOps, Machine Learning, and AI with several start-ups, boot camps, and companies around the globe. He speaks at Data Science & AI events, conferences, and workshops, and has delivered more than a hundred talks at international and national conferences and meetups.
Workshop | In-person | MLOps & Data Engineering | Intermediate
This session will explain:
Why a knowledge graph approach was chosen to power Beamery AI
Why RDF standard was chosen
How our core ontology was designed
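RDF represents a knowledge graph as a set of subject–predicate–object triples. Queries are patterns over those triples, which is the idea behind SPARQL's basic graph patterns. The sketch below shows that data model with plain Python tuples and a wildcard matcher. The `ex:` identifiers and the people/skills facts are hypothetical, and the code says nothing about how Beamery's graph is built. A production system would use full IRIs and an RDF library or triple store.

```python
# Hypothetical triples; "rdf:type" is the standard RDF typing predicate.
triples = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:hasSkill", "ex:Python"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:hasSkill", "ex:SQL"),
    ("ex:Python", "rdf:type", "ex:Skill"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching a (subject, predicate, object) pattern; None is a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

print(match(triples, p="ex:hasSkill", o="ex:Python"))
# [('ex:alice', 'ex:hasSkill', 'ex:Python')]
```

Because every fact has the same three-part shape, new entity types and relationships (skills, companies, occupations) can be added without a schema migration. An ontology then constrains which predicates make sense between which types.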
Henri is a Senior Knowledge Engineer at Beamery, a full-talent-lifecycle scaleup making sense of enterprise people data, based in London and the US. At work, he specialises in ontology design, entity reconciliation, and data provenance for semantic web databases. Henri has worked at a number of HRTech startups and is passionate about modelling and serving AI models in the domain of people, skills, companies, and occupations.
Tutorial | Virtual | Responsible AI | Machine Learning Safety & Security | All Levels
This talk will introduce recent AI incidents as the motivation for discussing governance in AI/ML. It will then address the fundamentals of model risk management and accountability structures for AI/ML systems, and also introduce novel approaches and technologies that can enhance AI/ML risk mitigation efforts. With government and public scrutiny on the rise, AI/ML’s wild west days could end sooner rather than later. Join this talk to learn some basics about making a better future for these ever more present technologies…
Tutorial | Virtual | Machine Learning Safety & Security | Responsible AI | Beginner
In this workshop, we introduce differential privacy and show how to do SQL-like analytics using the Tumult Labs differential privacy platform. Tumult Analytics (https://www.tmlt.io/platform) is used by a variety of organizations, including the US Census Bureau, the US Internal Revenue Service, and Wikimedia, to publicly share aggregate statistics about populations of interest. Tumult Analytics will be made open source this year and is available now for trial users…
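The textbook way to release a differentially private count is the Laplace mechanism. It adds noise drawn from Laplace(0, sensitivity / ε), where a counting query has sensitivity 1 because one person can change the count by at most 1. This is a minimal sketch of that mechanism, not Tumult Analytics' API or necessarily the mechanism it applies internally. The count and ε values are made up.

```python
import random

def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Noise ~ Laplace(0, sensitivity / epsilon). Smaller epsilon means stronger
    privacy and noisier answers.
    """
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

rng = random.Random(0)
print(laplace_count(100, epsilon=1.0, rng=rng))  # 100 plus noise of typical size ~1
```

Each released answer spends privacy budget, so a platform has to track the total ε across queries. Handling that bookkeeping is much of what a dedicated differential privacy library adds on top of a single mechanism.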
Ashwin Machanavajjhala is an Assistant Professor in the Department of Computer Science, Duke University, and an Associate Director at the Information Initiative@Duke (iiD). Previously, he was a Senior Research Scientist in the Knowledge Management group at Yahoo! Research. His primary research interests lie in algorithms for ensuring privacy in statistical databases and augmented reality applications. He received the National Science Foundation Faculty Early CAREER Award in 2013 and a 2008 ACM SIGMOD Jim Gray Dissertation Award Honorable Mention. Ashwin graduated with a Ph.D. from the Department of Computer Science, Cornell University, and a B.Tech in Computer Science and Engineering from the Indian Institute of Technology, Madras.
Michael Hay is an Associate Professor of Computer Science at Colgate University and founder/CTO of Tumult Labs, a startup that helps organizations safely release data using differential privacy. His research interests include data privacy, databases, data mining, machine learning, and social network analysis. He was previously a Research Data Scientist at the US Census Bureau and a Computing Innovation Fellow at Cornell University. He holds a Ph.D. from the University of Massachusetts Amherst and a bachelor’s degree from Dartmouth College. His research is supported by grants from DARPA and NSF.
Workshop | Virtual | NLP | Beginner – Intermediate
We are witnessing the rapid adoption of Pandas as the main library for representing and manipulating structured data in Python. By contrast, for Natural Language Processing (NLP) applications, we usually rely on various NLP libraries with complex and, more importantly, incompatible output structures. That makes integrating NLP features and solutions into machine learning and data science pipelines difficult and time-consuming…
Monireh Ebrahimi is a Senior Cognitive Software Developer at IBM’s Center for Open-Source Data and AI Technologies (CODAIT) in San Francisco, where she works on Open Source, Data & AI Technologies; she received an “Outstanding Technical Award” in 2021. She obtained her Ph.D. from the Data Semantics (DaSe) lab at Kansas State University with a major focus on Neuro-Symbolic Integration. In her dissertation, titled “Generalizable Neuro-Symbolic Reasoners”, she covers her recent work on bridging the neural and symbolic divide in the context of deep deductive reasoners. Her primary research interests include Deep Learning, Knowledge Graphs, Reasoning, Semantic Web, and Natural Language Processing. She is also keen on applying NLP and Data Science to real-world applications and on working with customers and partners in various industries to help them on their Data Science journey. She has organized several tutorials on “Current and Future Trends of Neural Knowledge Graph Representation and Reasoning” at IJCAI 2020, US2TS 2019, and US2TS 2020, has served as a PC member or reviewer for Artificial Intelligence (NeurIPS, AAAI, IJCAI, ICML, ICLR, JAIR) and Semantic Web (ISWC, ESWC, TheWebConf) venues, and received the Most Outstanding Reviewer Award from WWW 2017.
Sepideh Seifzadeh is a Data Scientist/Machine Learning Engineer with the IBM Data Science Elite (DSE) team, based in San Francisco. After completing her PhD in Computer Engineering at the Center for Pattern Analysis and Machine Intelligence at the University of Waterloo, she began her career as a Big Data & Analytics consultant. She then joined IBM as an open source solution engineer and worked at IBM Canada for two years, where she was honored with the “Best of IBM” award in 2018. She recently moved from Toronto to San Francisco to join the DSE team, which she really enjoys: she is learning how to apply data science to real-world applications and gets the chance to meet customers in different industries and help them in their data science journey. She also enjoys working with, and learning from, the talented people on the team.
Tutorial | Virtual | Deep Learning | Machine Learning | Intermediate
Quantization is a common technique for speeding up model inference by reducing the numerical precision of the model, for example to int8. In this tutorial, we will introduce the basics of quantization and the quantization support in PyTorch…
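To make "reducing precision to int8" concrete, here is a minimal sketch of affine (asymmetric), per-tensor int8 quantization in plain Python, assuming a simple min/max calibration over the values. The function names and sample weights are hypothetical illustrations, not PyTorch's quantization API, which automates these steps with its own observers and quantized kernels.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed int8 range


def choose_qparams(values: List[float]) -> Tuple[float, int]:
    """Derive a per-tensor scale and zero point from the data's min/max."""
    lo = min(min(values), 0.0)  # widen the range to include 0.0 so it maps exactly
    hi = max(max(values), 0.0)
    if hi == lo:  # all values are zero: any scale works
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to int8 codes: q = clamp(round(x / scale) + zero_point)."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]  # illustrative values, not from a real model
scale, zero_point = choose_qparams(weights)
codes = quantize(weights, scale, zero_point)
approx = dequantize(codes, scale, zero_point)
print(codes)
print(approx)
```

With this scheme, every value inside the calibrated range comes back within half a quantization step (`scale / 2`) of the original, and 0.0 round-trips exactly, which is why the range is widened to include zero.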
Jerry Zhang is a Software Engineer on the PyTorch Architecture Optimization team in Meta’s AI Frameworks organization. He has been working on PyTorch Quantization for the past three years, building self-serve, easy-to-use tools that help people optimize the inference speed of their models while maintaining accuracy. Before Meta, he was a master’s student in computer science at Carnegie Mellon University; he received his bachelor’s degree in Computer Science and Technology from Zhejiang University, China.
Workshop | Virtual | Machine Learning | All Levels
This presentation gives an overview of the GeoAI and spatiotemporal analytics tools and workflows used at ORNL. By attending this workshop, you will learn the basics of geospatial data and the tools used to analyze, process, and store it. Two hands-on examples will show how geospatial data and software are used in practice for specific applications…
Dalton Lunga is a senior R&D scientist and research group leader for GeoAI at Oak Ridge National Laboratory. His research interests include domain adaptation, manifold learning, and unsupervised representation learning, using deep learning approaches for geospatial imagery analytics. His technical background includes image processing, statistical machine learning, remote sensing, and geospatial data analysis. He currently conducts research and development in machine learning techniques and advanced workflows for handling large volumes of geospatial data. Before joining ORNL, Dalton worked as a machine learning research scientist on various projects at the Council for Scientific and Industrial Research in South Africa. He received his Ph.D. in electrical and computer engineering from Purdue University, West Lafayette, Indiana.
Jacob Arndt received the B.S. degree in geography and the master’s degree in geographic information science (MGIS) from the University of Minnesota – Twin Cities, Minneapolis, MN, USA, in 2016 and 2018, respectively. He is a Geospatial Data Scientist with the Geospatial Artificial Intelligence (GeoAI) Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA. His research interests include remote sensing, machine learning, and geographic information systems.
Jesse Piburn is a Research Scientist in Geographic Data Science at Oak Ridge National Laboratory. His work includes research and development in spatiotemporal analytics, time series data mining, and machine learning. As a member of the GeoAI group at ORNL, he has had the opportunity to work across multiple domains, including population dynamics, socio-economic modeling, and image analysis. Jesse has a background in spatial statistics and geographic information science; he received his M.S. in Geography and is currently pursuing a PhD in Data Science and Engineering at the University of Tennessee.
Workshop | Virtual | Machine Learning | Beginner
This session covers the fundamentals of few-shot learning, including popular network architectures and the loss functions used to train them, as well as applications of few-shot learning techniques to real-world use cases. The session is aimed at an audience with at least intermediate knowledge of neural networks and machine learning. It will also walk through code implementing Siamese Neural Networks with Triplet Loss on image data. By the end of the workshop, attendees will have a good grasp of few-shot learning techniques and their applications, and a deeper understanding of deep learning overall…
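In a Siamese setup, one shared-weight encoder embeds an anchor, a positive (same class), and a negative (different class), and the triplet loss pushes the negative at least a margin farther from the anchor than the positive. The sketch below computes that loss, plus a nearest-neighbour few-shot classifier, on precomputed embedding vectors in plain Python. The embeddings, labels, and margin of 1.0 are hypothetical, and a real implementation would compute the loss on framework tensors so gradients can flow back into the encoder.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """max(d(a, p) - d(a, n) + margin, 0).

    Zero once the negative is at least `margin` farther from the anchor
    than the positive; otherwise the remaining gap is penalized.
    """
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def nearest_label(query: Sequence[float], support: Dict[str, Sequence[float]]) -> str:
    """Few-shot inference: return the label of the closest support embedding."""
    return min(support, key=lambda label: euclidean(query, support[label]))


anchor, positive = [0.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, positive, [3.0, 4.0]))  # easy negative (distance 5) → 0.0
print(triplet_loss(anchor, positive, [0.0, 1.5]))  # hard negative (distance 1.5) → 0.5
print(nearest_label([0.1, 0.9], {"cat": [0.0, 1.0], "dog": [3.0, 4.0]}))  # → cat
```

Only "hard" triplets, where the negative is not yet far enough away, contribute a non-zero loss, which is why triplet mining strategies matter in practice.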
Isha is a principal data scientist at Capital One, working in the conversational AI space. Before Capital One, Isha worked at Ericsson as a data scientist on the computer vision team. Isha completed a master’s degree in an Urban Data Science program at New York University in 2018 and has worked in several NYU research labs (the NYU Urban Observatory and the NYU Sounds of New York City (SONYC) lab). Before moving to New York, Isha lived in Hong Kong for five years, completing a bachelor’s degree in Environmental Technology and Computer Science at the Hong Kong University of Science and Technology (HKUST) and later working as a Research Assistant at the HKUST-Deutsche Telekom Systems and Media Lab, which focuses on augmented reality and computer vision.
Training | Virtual | Deep Learning | Machine Learning | Intermediate
This talk introduces the PyTorch Lightning ecosystem one step at a time. We start with Lightning Flash, an AI factory and easy-to-use framework for getting started with the Lightning ecosystem, and use it to showcase a basic NLP task: classifying the toxicity level of user comments. Then we delve into the core components of the PyTorch Lightning framework and use them to show how to implement state-of-the-art research without worrying about engineering hassles…
Jirka has been working in machine learning and data science for several years. He holds a Ph.D. in medical imaging and, in parallel, gained practical experience working for a few IT companies as a consultant or data scientist. He currently focuses on exploring interesting real-world problems and solving them with state-of-the-art techniques.
He has developed several open-source Python packages, is a core contributor to `PyTorch-Lightning` and `TorchMetrics`, and actively participates in other well-known projects.
Kaushik Bokka is a Senior Research Engineer at Grid.ai and one of the core maintainers of the PyTorch Lightning library. He has prior experience building production-scale machine learning and computer vision systems for several products, ranging from video analytics to fashion AI workflows. He has also contributed to a few other open source projects and aims to empower people and organizations to build AI applications.
Workshop | In-Person | Data Visualisation | Intermediate-Advanced
This workshop will teach you how to manipulate and structure your data for visualization; the elements of a graph and their associated terminology; how to select the appropriate graph for your data; and how to avoid common graphing mistakes. You will also learn how to customize data visualizations and give them the ‘personal touches’ that make them memorable to your audience…
Martin is a Senior Clinical Programmer at BioMarin, where he builds dashboards and tools for making data-informed decisions. Previously, Martin built statistical tools and dashboards for the Diabetes Technology Society and other volunteer and non-profit organizations, and he has been a contributing author for Data Journalism in R on the Northeastern University School of Journalism blog/website. He is a data journalism instructor for California State University, Chico. Martin holds a graduate degree in Clinical Research and is passionate about data literacy and open source technologies.
Peter is a hands-on data science leader with a business-focused approach to building data science solutions and telling stories with data. He is experienced in translating business problems into data products, using advanced statistical techniques and ML to support decision making in a variety of rapid-growth environments, and has scaled data science solutions for user acquisition, retention, channel optimization, revenue, and fraud at Lyft, Alibaba, and Citrix. He currently leads Marketing Science for Growth at Nextdoor.
Tutorial | In-person | Big Data Analytics | MLOps & Data Engineering | Beginner – Intermediate
This talk will introduce Apache Flink as a general data processor for all these use cases on both finite and infinite streams. We demonstrate Flink’s SQL engine as a changelog processor, with an ecosystem tailored to processing change data capture (CDC) data and maintaining materialized views. We will use Kafka as an upsert log and Debezium to connect to databases, and we will enrich streams from various sources using different kinds of joins. Finally, we illustrate how to combine Flink’s Table API with the DataStream API for event-driven applications beyond SQL…
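The idea of reading an upsert log as a changelog that maintains a materialized view can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Flink or Kafka code: it assumes upsert semantics in which each record carries a key, a non-null value replaces the current row for that key, and a null value (a tombstone) deletes it. The `+I`, `-U`, `+U`, `-D` labels mirror the short notation Flink uses for its row kinds (insert, update-before, update-after, delete); the function name and sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

UpsertRecord = Tuple[str, Optional[dict]]  # (key, latest row) or (key, None) as a tombstone
ChangeEvent = Tuple[str, str, dict]        # (row kind, key, row)


def upserts_to_changelog(log: Iterable[UpsertRecord]) -> Tuple[List[ChangeEvent], Dict[str, dict]]:
    """Turn an upsert stream into changelog events plus the resulting materialized view."""
    view: Dict[str, dict] = {}
    events: List[ChangeEvent] = []
    for key, row in log:
        old = view.get(key)
        if row is None:  # tombstone: retract the current row, if any
            if old is not None:
                events.append(("-D", key, old))
                del view[key]
        elif old is None:  # first row for this key
            events.append(("+I", key, row))
            view[key] = row
        else:  # key already present: retract the old row, then emit the new one
            events.append(("-U", key, old))
            events.append(("+U", key, row))
            view[key] = row
    return events, view


log = [
    ("user-1", {"name": "Ann"}),
    ("user-2", {"name": "Bo"}),
    ("user-1", {"name": "Anna"}),
    ("user-2", None),
]
events, view = upserts_to_changelog(log)
for event in events:
    print(event)
print(view)
```

Emitting an explicit retraction (`-U`) before each update is what lets downstream aggregations and joins correct results they have already produced, rather than double-counting the old row.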
Workshop | In-person | All Levels
Abstract Coming Soon!
Multi-discipline leader with experience in marketing, product management, and business leadership. Broad experience across multiple business functions, including product management, marketing, business development, business transformation, go-to-market (GTM), and strategy.
Deep technical understanding of data platforms, data science, mobility, OS/platforms, networking, and cloud technologies (PaaS, IaaS, SaaS).
Workshop | In-person | Machine Learning Safety & Security | Machine Learning
We will start with linear models and discuss the conditions under which their weights can be used as explanations. I will introduce the pros and cons of perturbation feature importance, a model-agnostic approach to calculating global explanations. Some model-specific explanations will also be briefly discussed. We will close with what I consider to be the state-of-the-art technique for calculating local feature importances: Shapley Additive Explanations (SHAP)…
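Perturbation (permutation) feature importance can be sketched in plain Python: shuffle one feature column at a time and measure how much the model's error grows. The model, data, and choice of mean squared error below are hypothetical, picked so the result is easy to check against the linear model's weights; this is a simplified illustration, not a particular library's implementation (scikit-learn, for example, ships its own as `sklearn.inspection.permutation_importance`).

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], float], X: List[Row], y: Sequence[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled (global, model-agnostic)."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def predict(row: Row) -> float:
    """Hypothetical linear model whose second weight is zero."""
    return 3.0 * row[0] + 0.0 * row[1]


X = [[float(i), float((7 * i) % 8)] for i in range(8)]
y = [3.0 * row[0] for row in X]
print(permutation_importance(predict, X, y))
```

Because the method only calls `predict`, it works for any model; the trade-off is extra inference cost and, with correlated features, shuffled rows that may fall outside the realistic data distribution.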
Andras Zsom is an Assistant Professor of the Practice of Data Science and Director of Industry and Research Engagement at Brown University, Providence, RI. He works with high-level academic administrators to tackle predictive modeling problems, collaborates with faculty members on data-intensive research projects, and was the instructor of a data science course offered to data science master’s students at Brown.
Tutorial | Virtual | Machine Learning | MLOps & Data Engineering | All Levels
In this tutorial, we present an overview of the field of machine programming (MP), which aims to automate certain aspects of software development. The goal of MP is not to replace programmers; instead, it is meant to help improve software developer productivity and software quality across many dimensions (e.g., correctness, performance, security, and maintainability)…
Justin Gottschlich is the Founder, CEO & Chief Scientist of Merly, Inc. (http://merly.ai), a company aimed at making software developers more productive using state-of-the-art machine programming systems. Justin also has an academic appointment as an Adjunct Assistant Professor at the University of Pennsylvania. Before founding Merly, Justin was a Principal AI Scientist and the Founder & Director of Machine Programming Research at Intel Labs. In 2017, he co-founded the ACM SIGPLAN Machine Programming Symposium (MAPS) and now serves as its Steering Committee Chair. Justin also serves on the 2020 NSF Expeditions advisory board “Understanding the World Through Code”, led by MIT Prof. Armando Solar-Lezama. Justin received his PhD in Computer Engineering from the University of Colorado-Boulder in 2011 and has 40+ peer-reviewed publications, 50+ issued patents, and 100+ patents pending. Justin’s research has been highlighted in venues like The New York Times, Communications of the ACM, MIT Technology Review, and The Wall Street Journal.
Workshop | Virtual | Big Data Analytics | Machine Learning | All Levels
Ready to show the world you can throw down better than the competition? Join us as we build a competitive rock/paper/scissors application, with a built-in global leaderboard, in X minutes. By combining Roboflow’s computer vision capabilities with the Streamlit rapid deployment package, we’ll end up with an interactive, production-ready app for consumers to adopt…
