Many companies now use data science and machine learning, but there’s still a lot of room for improvement in terms of ROI. A 2021 VentureBeat analysis suggests that 87% of AI models never make it to a production environment, and an MIT Sloan Management Review article found that 70% of companies reported minimal impact from AI projects. Yet despite these difficulties, Gartner forecasts that investment in artificial intelligence will reach an unprecedented $62.5 billion in 2022, an increase of 21.3% from 2021.

Nevertheless, we are still left with the question: How can we do machine learning better? To find out, we’ve gathered some of the upcoming tutorials and workshops from ODSC East 2023 and let the experts’ topics guide us toward building better machine learning.

Mathematics for Data Science

Data science uses a combination of mathematics, statistics, and computer science to help us answer important questions in a large number of fields. In this workshop, we will introduce the underlying mathematical principles of the field, with example problems gleaned from a number of different industries. By the end of the workshop, participants will know enough data science to explore their own problems and be ready for more intermediate and advanced courses.

Speaker: Eric Eager, PhD | VP of Research and Development | SumerSports

Using Data Science to Better Evaluate American Football Players

The game of football is undergoing a significant shift towards the quantitative. Much of the progress made in the analytics space can be attributed to play-by-play data and charting data. However, recent years have given rise to tracking data, which has opened the door to innovation that was not possible before. In this talk, Eric will describe how to gain an edge in player evaluation by building on traditional charting data with state-of-the-art player tracking data, and preview how such methods will revolutionize the sport of football in the future.

Speaker: Eric Eager, PhD | VP of Research and Development | SumerSports

Improving ML Datasets with Cleanlab, a Standard Framework for Data-Centric AI

This talk will describe how high-level ideas from data-centric AI can be operationalized across a wide variety of datasets (image, text, tabular, etc.) and introduce new algorithmic strategies to improve data, which we have researched and published with extensive benchmarks. Jonas will conclude with a discussion of connections to AutoML, where the data-centric AI movement is headed next, and key obstacles that deserve more attention.
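
To make one data-centric idea concrete, here is a minimal sketch of flagging examples whose given label disagrees with a model's confident prediction. It is a simplified illustration only, not Cleanlab's API or the algorithms presented in the talk; the function name `flag_label_issues` and the toy data are assumptions made for this example, and the predicted probabilities are assumed to come from any classifier evaluated out of sample.

```python
def flag_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks suspicious.

    labels: list of integer class labels (possibly noisy).
    pred_probs: list of per-class probability lists from any classifier.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: the average probability the model assigns to
    # class k over the examples that are labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            best = max(confident, key=lambda k: p[k])
            if best != y:
                issues.append(i)
    return issues


# Toy data (hypothetical): examples 2 and 5 carry labels the model disputes.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.30, 0.70],
    [0.85, 0.15],
]
suspects = flag_label_issues(labels, pred_probs)
```

Flagged examples would typically be reviewed or relabeled by a person rather than dropped automatically.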

Speaker: Jonas Mueller | Chief Scientist and Co-Founder | Cleanlab

If We Want AI to be Interpretable, We Need to Measure Interpretability

AI tools are ubiquitous, but most users treat them as a black box: a handy tool that suggests purchases, flags spam, or autocompletes text. While researchers have presented explanations for making AI less of a black box, a lack of metrics makes it hard to optimize explicitly for interpretability. Thus, Jordan proposes two metrics for interpretability suitable for unsupervised and supervised AI methods.

Speaker: Jordan Boyd-Graber, PhD | Associate Professor | University of Maryland

Towards the Next Generation of Artificial Intelligence with its Applications in Practice

Artificial intelligence has reached or surpassed human standards in perceptual intelligence fields such as “listening, speaking, and seeing”, but it is still in its infancy in the field of cognitive intelligence, which requires external knowledge, logical reasoning, or domain migration. After long-term exploration and verification in large-scale online businesses such as Taobao and Alipay, we have built a “super brain” through the extremely large-scale pre-trained model M6, and built “flexible limbs” through the edge-cloud collaboration platform Gemini, to fully present the comprehensive picture of the next generation of AI. In this talk, we will cover details of state-of-the-art recommendation systems in practice, M6, and Gemini.

Speaker: Hongxia Yang, PhD | Director & Senior Staff Scientist, Alibaba | Joint Adjunct Professor, Zhejiang University

A Practical Tutorial on Building Machine Learning Demos with Gradio

In this workshop, we will cover how to build machine learning web applications using the Gradio library. First, we will walk through Gradio’s core concepts, Interface and Blocks, and show how they can be used to create responsive machine learning web applications. Then we will apply these core concepts to build real-time leaderboards and dashboards.

Speaker: Freddy Boulton | Senior Software Engineer | Alteryx Innovation Labs

When Privacy Meets AI – Your Kick-Start Guide to Machine Learning with Synthetic Data

Over 80% of all AI projects fail – but a huge chunk of projects don’t even get started due to privacy constraints. Gartner therefore predicts that by 2024, 60% of all machine learning training data will be synthetic. It’s high time to kick-start your synthetic data (SD) journey! Join Alexandra for a hands-on tutorial on synthetic data fundamentals to learn how to create synthetic data you can trust, assess its quality, and use it for privacy-preserving ML training. As a bonus, we’ll look into boosting your ML performance with smart upsampling.

Speaker: Alexandra Ebert | Chief Trust Officer | Chair of the IEEE Synthetic Data IC Expert Group | AI, Privacy & GDPR Expert, MOSTLY AI | IEEE Standards Association | #humanAIze

Uncovering Behavioral Segments by Applying Unsupervised Learning to Location Data

Location data is a powerful tool. The places people go reflect who they are and what they care about – especially during the holiday season, when shopping is at its peak and shoppers often deviate from their normal habits, shopping more frequently and making larger purchases. To help marketers understand and best meet holiday shoppers’ needs, Foursquare applied unsupervised learning methods to location data to derive meaningful segments of individuals based on their demographics and shopping behaviors. Rather than invest in reaching more general segments like Moms or Millennials, marketers are now able to focus their efforts by targeting these data-driven segments during the 2022 holiday season.
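
As a rough illustration of deriving segments with unsupervised learning, the sketch below runs a small k-means clustering over visit counts. The abstract does not name the method Foursquare used, so k-means is a stand-in here, and the features (weekly mall visits, weekly grocery visits) and data are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (assignments, centroids)."""
    # Deterministic initialization for reproducibility: the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical per-person features: (weekly mall visits, weekly grocery visits).
visits = [(1, 5), (6, 1), (2, 6), (7, 2), (1, 4), (8, 1)]
segments, centers = kmeans(visits, k=2)
```

With this toy data, one segment groups the frequent grocery shoppers and the other the frequent mall visitors. A real pipeline would also need feature scaling and a principled way to choose the number of segments.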

Speaker: Ali Rossi | Data Science Tech Lead | Foursquare

Beyond Credit Scoring: Hybrid Scorecard Models for Accuracy and Interpretability

In this presentation, we will demonstrate how to combine subject-matter expertise and scorecard techniques with ML methods to build high-performing models whose outputs are immediately understandable. In combination with robust validation strategies, these models balance power and predictive accuracy and provide a useful tool to ML practitioners searching for interpretable modeling techniques.
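
For readers unfamiliar with scorecards, here is a minimal sketch of the points-based scoring they rely on. Each feature value falls into a bin, each bin carries points, and the final score is the sum, so every prediction can be explained feature by feature. The features, bin boundaries, and point values are hypothetical, and the sketch does not attempt the hybrid ML component described in the talk.

```python
# Hypothetical scorecard: for each feature, a list of (lower_bound, points)
# pairs sorted by lower bound. A value earns the points of the last bin
# whose lower bound it reaches.
SCORECARD = {
    "age": [(0, 10), (25, 20), (40, 30)],
    "years_at_job": [(0, 5), (2, 15), (5, 25)],
}


def score(applicant, scorecard=SCORECARD):
    """Return (total_score, per_feature_points) for one applicant."""
    total = 0
    breakdown = {}
    for feature, bins in scorecard.items():
        value = applicant[feature]
        points = 0
        for lower, pts in bins:
            if value >= lower:
                points = pts
        breakdown[feature] = points
        total += points
    return total, breakdown


total, breakdown = score({"age": 30, "years_at_job": 6})
```

The per-feature breakdown is what makes the output immediately understandable: a reviewer can see exactly which bins contributed which points.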

Speaker: Tom Shafer, PhD | Lead Data Scientist | Elder Research

Generating Content-based Recommendations for Millions of Merchants and Products

Shopify’s Search and Discovery team is responsible for generating recommendations for millions of merchants that span many industries and countries. Specifically, when we think about content-based recommendations, we deal with product descriptions that vary in length, cleanliness, and even coherence. In this talk, we will explore the challenges we face when building content-based recommendation systems at Shopify, among other topics.
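
To show the basic idea behind content-based recommendations, the sketch below ranks products by the cosine similarity of their description word counts. This is a generic textbook approach and not Shopify's system; the product IDs, descriptions, and function names are made up for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a description into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(query_id, descriptions, top_n=2):
    """Return the top_n product IDs most similar to query_id."""
    vectors = {pid: vectorize(text) for pid, text in descriptions.items()}
    query = vectors[query_id]
    scored = [
        (cosine(query, vec), pid) for pid, vec in vectors.items() if pid != query_id
    ]
    # Highest similarity first; ties broken alphabetically by product ID.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for _, pid in scored[:top_n]]


# Hypothetical catalog.
products = {
    "mug": "ceramic coffee mug",
    "cup": "ceramic tea cup",
    "tee": "cotton t-shirt",
    "press": "french press coffee maker",
}
similar_to_mug = recommend("mug", products)
```

Raw word counts are sensitive to exactly the issues the abstract mentions: descriptions of very different length, noisy text, and incoherent wording all distort the similarity scores, which is why production systems tend to need more robust text representations.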

Speakers: Madhav Thaker, Senior Data Scientist and Chen Karako-Argaman, Senior Data Science Manager | Shopify

Learn Better Methods For Better Machine Learning at ODSC East 2023

To dive deeper into these topics, join us at ODSC East 2023 this May 9th-11th, either in-person or virtually. The conference will also feature hands-on training sessions in focus areas such as machine learning, deep learning, MLOps and data engineering, responsible AI, and more. What’s more, you can extend your immersive training to 4 days with a bootcamp pass. Check out all of our pass types here.