AI EXPO & DEMO HALL
AI Solutions Showcase & Networking
BOSTON | MAY 9-10
HYNES CONVENTION CENTER

Discover How to Generate the Future with AI
Want to keep up with the latest AI developments, trends, and insights? Dealing with the build or buy dilemma to grow your business? Seeking to interact with data-obsessed peers and build your network?
Look no further: The ODSC AI Expo & Demo Hall is the right destination for you.
Expo Hall Topics
Partner sessions offer compelling insights on how to make data science and AI work for your industry. Here are some of the topics you can expect at the AI Expo & Demo Hall. The full agenda is coming soon.
In-Person | Keynote | Machine Learning | All Levels
Large Language Models (LLMs) like ChatGPT have taken the world by storm with their ability to answer questions, write essays, and even compose lyrics. These tools have profound implications for industries like financial services, retail, and healthcare. However, most organizations have yet to take advantage of LLMs. Why? The amount of compute, data, and knowledge required to build a proprietary model is daunting for many, yet the alternative is relying on LLMs that are only accessible behind API paywalls, compromising your data privacy. In this keynote, we will explore why training, deploying, and owning your own LLM is critical (and at times even imperative). We will discuss how you can train and deploy your own models, while protecting your data and your business IP. Spoiler alert: in contrast to common wisdom, ownership of your own LLM is within reach for most organizations, and provides major benefits in increased security, flexibility, and accuracy…more details
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He shipped products across various domains: from 3D medical imaging, through global-scale web systems, and up to deep learning systems that power apps and services used by billions of people worldwide.
Jay is a VP of the Artificial Intelligence and Machine Learning organization at Oracle Cloud. He completed a degree in neuroscience and started his career in technology at Oracle, holding to the idea that these two paths would converge.
Virtual | Keynote | Machine Learning | All Levels
AI is red hot, but in practice many projects still fail. This talk will cover some of the key things you need to know to succeed, including:
– What current AI is and is not good for
– The difference between a demo and a product
– Pitfalls to avoid
– Organizing AI teams…more details
Pedro Domingos is a professor emeritus of computer science and engineering at the University of Washington and the author of The Master Algorithm. He is a winner of the SIGKDD Innovation Award and the IJCAI John McCarthy Award, two of the highest honors in data science and AI. He is a Fellow of the AAAS and AAAI, and has received an NSF CAREER Award, a Sloan Fellowship, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. Pedro received an undergraduate degree (1988) and M.S. in Electrical Engineering and Computer Science (1992) from IST, in Lisbon, and an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others. He has written for the Wall Street Journal, Spectator, Scientific American, Wired, and others. He helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
Virtual | Keynote | Machine Learning | All Tracks | All Levels
Modern medicine has given us effective tools to treat some of the most significant and burdensome diseases. At the same time, it is becoming consistently more challenging and more expensive to develop new therapeutics. A key factor in this trend is that we simply don’t understand the underlying biology of disease, and which interventions might meaningfully modulate clinical outcomes and in which patients. To achieve this goal, we are bringing together large amounts of high content data, taken both from humans and from human-derived cellular systems generated in our own lab…more details
Demo Talk | In-person
Thinking about incorporating relationships into your data to improve predictions and machine learning models? Maybe you are creating a knowledge graph or looking for a way to improve customer 360, fraud detection, or supply chain performance. Relationships are highly predictive of behavior. With graphs, they're embedded in the data itself, making it easy to unlock and add predictive capabilities to your existing practices.
Join us for a demo to learn why graph databases are a top choice for scalable analytics, intelligent app development, and advanced AI/ML pipelines. We'll showcase graphs using Neo4j's enterprise-ready graph data platform. You'll see firsthand how easy it is to get started, and we'll highlight a graph use case using Neo4j's cloud platform for Graph Data Science. All attendees will get a link to download and try Neo4j for free using your own data.
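As a rough illustration of why relationships are predictive (plain Python for clarity, not Neo4j's API or Cypher), consider deriving a fraud-style feature directly from connection data:

```python
# Toy sketch: relationships embedded in the data let you derive
# predictive features, e.g. how many accounts two customers share,
# a common signal in fraud detection. All names here are invented.
from collections import defaultdict

# edges: (customer, account) pairs
edges = [
    ("alice", "acct1"), ("bob", "acct1"),
    ("bob", "acct2"), ("carol", "acct2"),
    ("dave", "acct3"),
]

accounts_of = defaultdict(set)
for customer, account in edges:
    accounts_of[customer].add(account)

def shared_accounts(a, b):
    """Graph-derived feature: number of accounts two customers share."""
    return len(accounts_of[a] & accounts_of[b])

print(shared_accounts("alice", "bob"))   # 1 account in common
print(shared_accounts("alice", "dave"))  # 0, no overlap
```

In a graph database the same feature is a one-hop traversal rather than a join, which is what makes it cheap to compute at scale.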
Sydney “Syd” became a graph enthusiast through her work with clients to build graph-based solutions as well as supporting data science teams during her time at Deloitte and Accenture. Now she uses her graph expertise to help customers realize the value of graph technology for their organization. She also contributes by teaching Neo4j graph database and data science training classes. Syd’s hobbies include interior design and defeating her car navigation system’s estimated drive time.
Demo Talk | In-person
Oracle Cloud Infrastructure (OCI) is proud to showcase a new product demo of Stable Diffusion for game content creation using popular user interfaces and the 3D modeling tool Blender. OCI’s demo of Stable Diffusion is powered by NVIDIA A10 Tensor Core GPUs in the Oracle cloud. Stable Diffusion, an innovative deep learning model released in 2022, has been primarily used for generating detailed images based on text descriptions. However, its capabilities extend to creating game textures, models, depth maps, skins, and other game content.
Diffusion models can even be utilized for other modalities, enabling tasks as diverse as music generation. The combination of Stable Diffusion and Blender allows artists to create high-quality game assets with complete control over the creative process while benefitting from quicker creative iterations. Artists can further train Stable Diffusion on their individual styles and develop complete workflows that allow for greater creative freedom and flexibility in game development.
Allen is a Principal Machine Learning Architect and AI Researcher working for Oracle Cloud Infrastructure.
Demo Talk | In-person
In the Python open-source ecosystem, many packages are available that cater to:
– building great algorithms
– visualizing data
Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, a new open-source Python framework, Data Scientists/Python Developers are able to build great pilots as well as stunning production-ready applications for end-users.
Taipy provides two independent modules: Taipy GUI and Taipy Core.
In this talk, we will demonstrate how:
Taipy GUI goes way beyond the capabilities of the standard graphical stack: Gradio, Streamlit, Dash, etc.
Taipy Core fills a void in the standard Python back-end stack.
Florian Jacta is a specialist in Taipy, a low-code open-source Python package enabling any Python developer to easily build a production-ready AI application. He handles package pre-sales and after-sales functions. He is a Data Scientist for Groupe Les Mousquetaires (Intermarché) and ATOS, where he developed several predictive models as part of strategic AI projects. Florian holds a master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
Albert applies his skills in machine learning and big data to solve (financial) optimization problems. He developed projects of varying skill levels for Taipy’s tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
Demo Talk | In-person
Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise-scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
- Create a sample knowledge graph from several sources
- Demonstrate flexible data preparation for training datasets
- Analyze the knowledge graph with native visualizations and graph algorithms
- Connect to the knowledge graph for additional data science operations
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.
A member of CSI for a decade, Greg has developed a wealth of expertise on knowledge graph technology. His true speciality lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Demo Talk | Virtual
Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.
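TIM's feature engineering itself is proprietary, but the general idea of turning a raw series into a tabular modeling dataset can be sketched in plain Python (the function and data below are invented for illustration, not part of TIM):

```python
# Minimal sketch of time series feature engineering: convert a
# univariate series into (lag features -> next value) rows that a
# tabular model can be trained on.
series = [10, 12, 13, 15, 14, 16, 18]

def make_lag_features(values, n_lags):
    """Each row pairs the previous n_lags observations with the next value."""
    rows = []
    for i in range(n_lags, len(values)):
        rows.append((tuple(values[i - n_lags:i]), values[i]))
    return rows

rows = make_lag_features(series, n_lags=3)
print(rows[0])  # ((10, 12, 13), 15)
```

Real multivariate modeling adds calendar features, cross-series lags, and automatic lag selection, which is where engines like TIM earn their keep.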
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning with customers from various industries such as Siemens, BASF, Borealis, and Volkswagen. With a commercial background and experience in data engineering, analysis, and data science, his goal is to find and extract the business value in the enormous amounts of time series data that exist at companies today.
Demo Talk | Virtual
Experienced machine learning engineers and data scientists care about ways to easily get their models up and running quickly and share ML assets across teams for collaboration. Collaborate and streamline the management of thousands of models across teams with new, innovative features in Azure Machine Learning. Come and join us in this interactive session with our product experts and get your questions answered on the latest capabilities in Azure Machine Learning!
My name is Seth Juarez. I currently live near Redmond, Washington and work for Microsoft.
I received my Bachelor’s Degree in Computer Science at UNLV with a Minor in Mathematics. I also completed a Master’s Degree in Computer Science at the University of Utah. I am currently interested in Artificial Intelligence, specifically in the realm of Machine Learning, and I work as a Program Manager in the Azure Artificial Intelligence Product Group.
I’ve been married now for 21 years to a fabulously talented woman and have two beautiful daughters, and two feisty sons.
Demo Talk | In-person
Great AI starts with great features. While the modern data stack has made self-service ingestion and consumption a reality for BI, AI data remains a huge challenge. Feature engineering is non-standard, ML pipelines are manual, and data governance is a nightmare, limiting the scalability you can achieve with AI. We will discuss the shortcomings of the modern data stack for AI and practical approaches for creating a self-service data environment for data scientists. Learn about strategies to accelerate feature engineering and experimentation, shorten the time to deploy feature pipelines, govern the data and infrastructure, and ultimately scale AI across your organization.
Presented by Razi Raziuddin, Youssef Idelcaid, FeatureByte
Demo Talk | In-person
Have you ever wandered into a department store and wondered if the item you are looking at is on sale? What if you had the ability to scan the item ‘yourself’ and determine if it was discounted? That ‘ability’ would make your shopping experience easier as you would not have to wait for customer service to look up your item. You would have the ability to find merchandise discounts, yourself, as you browse through the department store!
In this session, we will examine the model development, training, and deployment process we used to create a discount coupon app that is executed from an edge device – your phone. Join us as we walk you through the model development and deployment lifecycle, including topics of object vision, containerizing models, data streaming, and MLOps.
At the end of the session, you can try the app yourself.
Kaitlyn Abdo is an Associate Technical Marketing Manager, working on technical enablement surrounding the AI/ML products and services at Red Hat. She has been at Red Hat for 2 years, and is interested in discovering and learning about new and innovative solutions in the AI/ML space. In her free time, Kaitlyn enjoys building Legos, cooking and spending time with animals.
Demo Talk | Virtual
With DALLE and ChatGPT, we have reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. With that said, general architectural understanding and intuition into how these models make decisions do not translate into minute detail interpretability.
We’re at a crossroads. This new “breed” of ML applications is here to stay, and unstructured data is only growing, but they are black boxes, and black boxes fail silently. So how can we as practitioners leverage NLP and vision while enjoying similar monitoring, interpretability, and explainability available to their tabular counterparts?
In this talk, we will introduce Elemeta, our OSS meta-feature extractor library in Python, which applies a structured approach to unstructured data by extracting information from text and images to create enriched tabular representations. With Elemeta, practitioners can utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations and engineer alternative features to be utilized in simpler models such as decision trees.
In this talk, we’ll introduce you to Elemeta through a live notebook example and explain how it can be applied to text and images.
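As a rough sketch of the meta-feature idea (illustrative only, not Elemeta's actual API), a few simple text statistics already yield a tabular representation that standard monitoring can consume:

```python
# Toy meta-feature extractor: turn raw text into a flat dict of
# numeric features suitable for tabular monitoring or simple models.
import string

def text_meta_features(text):
    words = text.split()
    return {
        "char_count": len(text),
        "word_count": len(words),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "punct_ratio": sum(c in string.punctuation for c in text) / max(len(text), 1),
    }

features = text_meta_features("Black boxes fail silently.")
print(features["word_count"])  # 4
```

A drift detector or decision tree can then operate on these columns even when the underlying model is an opaque embedding-based one.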
Lior Durahly is a data and ML engineer at Superwise, where he is responsible for researching and developing monitoring capabilities related to Responsible AI, including feature importance, fairness, and explainability. He is also the key contributor to the OSS package Elemeta, a meta-feature extractor for NLP and vision.
Demo Talk | In-person
Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.
Philip Wauters is Customer Success Manager and Value engineer at Tangent Works working on practical applications of time series machine learning at customers from various industries such as Siemens, BASF, Borealis and Volkswagen. With a commercial background and experience with data engineering, analysis and data science his goal is to find and extract the business value in the enormous amounts of time-series data that exists at companies today.
Demo Talk | In-person
Yet another meeting to discuss nuisance hairsplitting details for your data taxonomies and keywords list?
It shouldn’t take a team of domain experts, Excel specialists, Python developers, and Data Scientists weeks or months to build it. It is a simple problem that requires a simple solution.
You should be able to quickly and accurately analyze contracts, customer comments, and any other text-based content while easily building explainable NLP models.
Stop scrubbing through volumes of data to find key examples and then reducing the content to specific keywords and variations.
Join us as we explore a new and exciting solution using human language to easily develop ontologies, data taxonomies, and keyword lists, which you can share across your business with just a few simple clicks.
Accelerate these NLP tasks in every project and help eliminate those long-drawn meetings to discuss keywords for data taxonomies. Unless you enjoy those nuisance meetings 🙂
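To make the end goal concrete, here is a toy sketch (invented names and data, not the vendor's tool) of an explainable keyword taxonomy applied to free text:

```python
# A small, shareable taxonomy of keyword variants. Matching logic this
# simple stays fully explainable: you can always say which keyword fired.
taxonomy = {
    "billing": {"invoice", "charge", "refund"},
    "shipping": {"delivery", "shipment", "tracking"},
}

def categorize(comment):
    """Return every taxonomy category whose keywords appear in the text."""
    words = {w.strip(".,!?") for w in comment.lower().split()}
    return sorted(cat for cat, keywords in taxonomy.items() if words & keywords)

print(categorize("Where is my delivery and my refund?"))  # ['billing', 'shipping']
```

The hard part, which the session addresses, is building and maintaining the taxonomy itself without weeks of cross-team meetings.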
Kenny’s experience in startup product management and marketing led him to join the Gram X team. He became interested in the AI industry because of its potential impact on every industry, and believes the most successful companies will be those that leverage AI. Fun fact: Kenny has been to 43 states.
Demo Talk | In-person
The process of investigating model performance issues in production environments has long been plagued by complexity, inefficiency, and limited success. In this demo talk, we unveil a new paradigm for streamlining performance diagnostics and performing effective RCA. Our Root Cause Analysis enables you to effortlessly slice and dice production data and identify previously hidden relationships and valuable insights from your raw data. Easily collaborate to find when, where, and why issues originated to expedite response and remediation. Investigate any use case, any production issue, in every model type, and turn deep insights into actionable success.
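The core of slice-based root cause analysis can be illustrated with a toy example (plain Python with invented names, not Arize's API): group production predictions by a segment and compare error rates across slices:

```python
# Minimal "slice and dice" RCA sketch: find the segment where model
# errors concentrate. Records are (segment, was_prediction_wrong) pairs.
from collections import defaultdict

records = [
    ("mobile", True), ("mobile", True), ("mobile", False),
    ("desktop", False), ("desktop", False), ("desktop", True),
    ("desktop", False),
]

totals = defaultdict(lambda: [0, 0])  # segment -> [errors, count]
for segment, wrong in records:
    totals[segment][0] += wrong
    totals[segment][1] += 1

error_rates = {seg: errs / n for seg, (errs, n) in totals.items()}
worst = max(error_rates, key=error_rates.get)
print(worst, round(error_rates[worst], 2))  # the slice to investigate first
```

Production tooling does this across many dimensions at once and surfaces the slices whose error rates diverge most from the baseline.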
Reah Miyara is Head of Product at Arize AI, a startup focused on ML observability. He was previously at Google AI, where he led product development for research, tools, and infrastructure related to graph-based machine learning, data-driven large-scale optimization, and market economics. Reah’s experience as a team and product leader is extensive, building and growing products across a broad cross-section of the AI landscape. He’s played pivotal roles in ML and AI initiatives at IBM Watson, Intuit, and NASA Jet Propulsion Laboratory. Reah also co-led Google Research’s Responsible AI initiative, confronting the risks of AI being misused and taking steps to minimize AI’s negative influence on the world. He has a bachelor’s from UC Berkeley’s Electrical Engineering and Computer Science program and was the founder and president of the Cal UAV team in 2014.
Demo Talk | Virtual
Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise-scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
- Create a sample knowledge graph from several sources
- Demonstrate flexible data preparation for training datasets
- Analyze the knowledge graph with native visualizations and graph algorithms
- Connect to the knowledge graph for additional data science operations
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.
A member of CSI for a decade, Greg has developed a wealth of expertise on knowledge graph technology. His true speciality lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Demo Talk | In-person
Learn why the truly open-source HPCC Systems platform is better at Big Data and offers an end-to-end solution for Developers and Data Scientists. Learn how ECL can empower you to build powerful data queries with ease. HPCC Systems, a comprehensive and dedicated data lake platform, makes combining different types of data easier and faster than competing platforms — even data stored in massive, mixed schema data lakes — and it scales very quickly as your data needs grow. Topics include HPCC Architecture, Embedded Languages and external data stores, Machine Learning Library, Visualization, Application Security and more.
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
Demo Talk | In-person
We will use the plotly library to create visualizations in S&P Global Marketplace Workbench, which is powered by Databricks, and showcase a Databricks Dashboard built from the different charts. This demo talk is best suited for a beginner-to-intermediate audience.
Dani Herzberg is an Analyst on the Product Management and Development team at S&P Global Market Intelligence. On this team, she creates notebooks in Databricks, assists in analytic visualizations of S&P Global data, and provides SQL query support. She holds a Master of Science in Business Analytics from Georgetown University.
Demo Talk | In-person (AI Startup Showcase)
As companies collect larger and larger amounts of data and apply more complex ML models, the time and resources required to build and maintain the models continues to grow. In fact, in the past decade, training compute time has been growing at a staggering pace of 10x per year! But do we really need so much data to build better models?
In this talk, we will walk you through DataHeroes’ Python-based framework that uses a unique data reduction methodology to reduce your dataset size by orders of magnitude while maintaining the statistical properties and corner cases of the full dataset. We will demonstrate how having the reduced dataset makes it significantly easier and faster to clean the data and train and tune the model, which produces a better and more accurate model, at a fraction of the time and cost.
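DataHeroes' coreset methodology is more sophisticated than this, but the general shape of dataset reduction that preserves label structure can be sketched with simple stratified downsampling (all names here are invented for illustration):

```python
# Toy dataset reduction: keep a fixed number of samples per class so the
# reduced set preserves the label distribution of the full dataset.
from collections import defaultdict

def reduce_dataset(samples, keep_per_class):
    """samples: list of (features, label); keep the first k of each label."""
    kept, seen = [], defaultdict(int)
    for features, label in samples:
        if seen[label] < keep_per_class:
            kept.append((features, label))
            seen[label] += 1
    return kept

data = [((i,), "pos" if i % 3 == 0 else "neg") for i in range(90)]
small = reduce_dataset(data, keep_per_class=5)
print(len(small))  # 10 samples instead of 90
```

A real coreset additionally weights the kept samples and retains corner cases, so models trained on it approximate models trained on the full data.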
Oren is a serial entrepreneur with 23 years of experience. In 2007, Oren founded DoubleVerify (NYSE: DV), which pioneered the ad verification category and grew to become a global leader in advertising measurement and analytics. In 2012, as CEO of DoubleVerify, Oren received the distinguished Technology Pioneers Award from the World Economic Forum in Davos. Later, Oren went on to start cClearly, an advertising optimization company, before starting his most recent company, DataHeroes.
Eitan has 10 years of experience as a data scientist and has taught data science and machine learning at the Technion, Israel’s Institute of Technology. Prior to his data science career, Eitan served as a Systems Engineer in Unit 8200 of the Israel Defense Forces.
Demo Talk | In-person
When putting models into production it’s critical to know how they’re performing over time. As the last mile of the data pipeline, models can be impacted by a variety of issues, often outside the control of the data science team. “Observability” promises to help teams detect and prevent issues that could impact their models — but what is observability vs. data observability vs. ML observability? Get practical answers and recommendations from Kyle Kirwan, former product leader for Uber’s metadata tools and founder of the data observability company Bigeye.
Demo Talk | In-person
In this 25-minute demo, we will explore the top 5 cool tricks of Delta for data scientists and discuss why your data lake should be a Delta Lake. Delta Lake is an open-source storage layer that brings reliability to data lakes by providing ACID transactions, scalable metadata handling, and data versioning. We will first introduce the concept of Delta Lake and explain how it helps data scientists to manage their data pipelines with ease. We will then dive into the top 5 cool tricks of Delta Lake, which include performance optimizations, time travel, schema enforcement, automatic data merging, and data validation. We will demonstrate these tricks using real-world examples and show how they can simplify your data pipeline and reduce your development time. By the end of this talk, you will have a better understanding of Delta Lake’s features and how it can help you to manage your data lake efficiently. You will also have learned about the benefits of using Delta Lake and why it’s a must-have for data scientists working with large data sets.
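Delta Lake implements time travel through its transaction log; the user-facing idea can be sketched with a toy versioned table (this is a plain-Python sketch, not the Delta API):

```python
# Toy sketch of "time travel": every write creates a new version and
# old versions remain queryable, which is what enables reproducible
# training runs and easy rollback.
class VersionedTable:
    def __init__(self):
        self._versions = []

    def write(self, rows):
        self._versions.append(list(rows))

    def read(self, version=None):
        """Latest version by default; pass an index to time-travel."""
        if version is None:
            version = len(self._versions) - 1
        return self._versions[version]

t = VersionedTable()
t.write([{"id": 1, "v": "a"}])
t.write([{"id": 1, "v": "b"}])
print(t.read())           # latest state
print(t.read(version=0))  # state as of the first write
```

In actual Delta Lake the same idea is exposed declaratively, e.g. querying a table `VERSION AS OF` an earlier commit.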
Eric Vogelpohl is the Managing Director of Tech Strategy at Blueprint. He’s a proven IT professional with more than 20 years of experience and a high degree of technical and business acumen. He has an insatiable passion for all-things-tech, pro-cloud/SaaS, leadership, learning, and sharing ideas on how technology can turn data into information & transform user experiences.
Demo Talk | In-person
Location intelligence can provide valuable insights by leveraging geospatial data in machine learning. This demonstration will showcase how machine learning and location information can work together to help organizations extract more value from their data. We will use a comprehensive suite of geocoding, spatial analytics, and data enrichment capabilities to visualize and analyze data, identify patterns, and derive insights that can be used to make informed business decisions.
In this session, we will use Amazon SageMaker to train a machine-learning model using property attributes, historical weather data, and fire data. The goal will be to predict the fire risk for a property. You will see how quickly and efficiently we can build and train a machine-learning model using various algorithms, such as decision trees and neural networks, to find the best approach for our dataset.
While this demo example will highlight fire predictions, these location intelligence solutions can be applied across multiple industries, including financial services, telecommunications, insurance, retail, real estate, and more. We look forward to discussing how you can leverage geospatial data in your machine-learning models.
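As a toy stand-in for the session's workflow (invented data and function names, not the SageMaker API), a one-feature "decision stump" shows the kind of rule a decision-tree model learns from enriched property data:

```python
# Toy decision stump: pick the threshold on one enriched feature
# (here a made-up average summer temperature) that best separates
# fire-risk labels in a tiny invented dataset.
def train_stump(records, feature, thresholds):
    """Return the threshold on `feature` that classifies most records correctly."""
    best = None
    for t in thresholds:
        correct = sum((r[feature] >= t) == r["fire"] for r in records)
        if best is None or correct > best[1]:
            best = (t, correct)
    return best[0]

records = [
    {"avg_summer_temp": 31, "fire": True},
    {"avg_summer_temp": 33, "fire": True},
    {"avg_summer_temp": 22, "fire": False},
    {"avg_summer_temp": 25, "fire": False},
]
threshold = train_stump(records, "avg_summer_temp", thresholds=range(20, 36))
print(threshold)  # the learned temperature cutoff
```

A real decision tree stacks many such splits across geocoded and enriched features, which is what the SageMaker training step automates at scale.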
Mayank Kasturia is a Senior Sales Engineer at Precisely responsible for developing interactive demo applications, demonstrating proof of concepts (POC), and implementing solutions for customers using Precisely’s Geo Addressing, Spatial Analytics and Data Enrichment capabilities in big data and cloud-native environments.
Demo Talk | Virtual
Proper tracking is crucial for ensuring the reproducibility of results obtained during model development and fostering effective collaboration among multiple developers on a machine learning project. In this talk, Kristen will discuss the process of developing a dog detection system using YOLOv8 on-edge devices and the role of Comet, an experiment management platform, in handling the intricacies of the project.
Kristen will guide you through the entire process, from generating a data artifact to deploying the model, emphasizing the benefits of utilizing Comet at each stage. She will showcase how Comet was employed to monitor experiment metrics, visualize model performance, and illustrate the ease with which the selected model can be tracked in production. Participants will gain valuable insights on how to leverage an experiment tracking and monitoring solution like Comet to enhance their model development process, making it more transparent and reproducible.
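What an experiment tracker records can be sketched minimally (invented names, not Comet's API): parameters and metrics per run, queryable afterwards so results stay reproducible and comparable:

```python
# Minimal experiment-tracking sketch: log each run's hyperparameters
# and metrics with a timestamp, then query for the best run.
import time

class RunTracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        self.runs.append({"time": time.time(), "params": params, "metrics": metrics})

    def best(self, metric):
        """Return the logged run with the highest value of `metric`."""
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = RunTracker()
tracker.log_run({"lr": 0.01}, {"mAP": 0.62})
tracker.log_run({"lr": 0.001}, {"mAP": 0.71})
print(tracker.best("mAP")["params"])  # {'lr': 0.001}
```

Platforms like Comet add to this the artifact versioning, visualizations, and production monitoring the talk demonstrates.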
Kristen is the founder of Data Moves Me, LLC and a data scientist who has been delivering innovative and actionable machine learning solutions across the utilities, healthcare, and eCommerce industries since 2010. Kristen is the #8 Global LinkedIn Top Voice in Data Science & Analytics and holds an MS in Applied Statistics and a BS in Mathematics.
Demo Talk | In-person
MLRun is an open-source MLOps orchestration framework. It exists to accelerate the integration of AI/ML applications into existing business workflows. MLRun introduces Data Scientists to a simple Python SDK that transforms their code into a production-quality application. It does so by abstracting the many layers involved in the MLOps pipeline. Developers can build, test, and tune their work anywhere and leverage MLRun to integrate with other components of their business workflow.
The capabilities of MLRun are extensive, and we will cover the basics to get you started. You will leave this session with enough information to:
- Get started with MLRun, on your own, in 10 minutes, so you can automate and accelerate your path to production and have your first AI app running in 20 minutes
- Run locally, then move to Kubernetes
- Understand how your Python code can run as a Kubernetes job with no code changes
- Track your experiments
- Get an introduction to advanced MLOps topics using MLRun
Demo Talk | Virtual
This technical talk delves into the paradigm shift from model-centric to data-centric AI, emphasizing the importance of data quality in improving machine learning outcomes. We will explore the current AI landscape and discuss the reasons behind this shift. Focusing on the Pachyderm platform for data-driven processing and versioning, attendees will learn practical steps and principles to streamline their data-centric AI efforts. This talk aims to equip practitioners with the knowledge and tools necessary to harness AI’s full potential by embracing a data-driven approach and leveraging Pachyderm’s innovative platform.
Jimmy Whitaker is the Chief Scientist of AI at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn’t at work, he’s either playing music or trying to learn something new, because “You suddenly understand something you’ve understood all your life, but in a new way.”
Demo Talk | In-person
With DALLE and ChatGPT, we have reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. With that said, general architectural understanding and intuition into how these models make decisions do not translate into minute detail interpretability.
We’re at a crossroads. This new “breed” of ML applications is here to stay, and unstructured data is only growing, but they are black boxes, and black boxes fail silently. So how can we as practitioners leverage NLP and vision while enjoying the monitoring, interpretability, and explainability available to their tabular counterparts?
In this talk, we will introduce Elemeta, our OSS meta-feature extractor library in Python, which applies a structured approach to unstructured data by extracting information from text and images to create enriched tabular representations. With Elemeta, practitioners can utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations and engineer alternative features to be utilized in simpler models such as decision trees.
In this talk, we’ll introduce you to Elemeta through a live notebook example and explain how it can be applied to text and images.
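The meta-feature idea described above can be sketched in a few lines of plain Python. This is only an illustration of the concept of turning unstructured text into a tabular row; it is not Elemeta's actual API, and the function name here is hypothetical:

```python
import string

def text_meta_features(text: str) -> dict:
    """Extract simple structured meta-features from unstructured text.

    Illustrative only: Elemeta ships a much richer catalogue of
    extractors. These hand-rolled features just show the idea of
    turning raw text into a tabular row that standard ML monitoring
    and simple models (e.g. decision trees) can consume.
    """
    words = text.split()
    n_chars = len(text)
    n_words = len(words)
    return {
        "char_count": n_chars,
        "word_count": n_words,
        "avg_word_length": (sum(len(w) for w in words) / n_words) if n_words else 0.0,
        "punctuation_ratio": (sum(c in string.punctuation for c in text) / n_chars) if n_chars else 0.0,
        "digit_ratio": (sum(c.isdigit() for c in text) / n_chars) if n_chars else 0.0,
    }

row = text_meta_features("Order #123 arrived late, again!")
print(row["word_count"])  # 5
```

Once each text is reduced to a row like this, drift detection, slicing, and other tabular monitoring techniques apply directly.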
Lior Durahly is a data and ML engineer at Superwise, where he is responsible for researching and developing monitoring capabilities related to Responsible AI, including feature importance, fairness, and explainability. He is also the key contributor to the OSS package Elemeta, a meta-feature extractor for NLP and vision.
Demo Talk | In-person
Pressure to maximize ROI and reduce costs continues to grow for ML organizations. Data labeling is often one of the line items under the most scrutiny, so how do you get more efficient with your spending?
In this demo, we’ll walk through several ways to reduce the number of labels you need and make smarter choices about what you label, through a combination of automation and human-in-the-loop review.
Our clients using these techniques were able to:
- Reduce the amount of data labeled by up to 90%
- Find frames with rare objects within huge datasets
- Optimize annotation guidelines to improve their model performance
Cassie is a Senior Product Marketing Manager at CloudFactory, serving as the bridge between technical expertise and creative communications for their AI data labeling products. She holds an MBA from North Carolina State University and has spent her career in product marketing roles for B2B technology companies.
Demo Talk | In-person
Proper tracking is crucial for ensuring the reproducibility of results obtained during model development and fostering effective collaboration among multiple developers on a machine learning project. In this talk, Kristen will discuss the process of developing a dog detection system using YOLOv8 on-edge devices and the role of Comet, an experiment management platform, in handling the intricacies of the project.
Kristen will guide you through the entire process, from generating a data artifact to deploying the model, emphasizing the benefits of utilizing Comet at each stage. She will showcase how Comet was employed to monitor experiment metrics, visualize model performance, and illustrate the ease with which the selected model can be tracked in production. Participants will gain valuable insights on how to leverage an experiment tracking and monitoring solution like Comet to enhance their model development process, making it more transparent and reproducible.
Kristen is the founder of Data Moves Me, LLC and a data scientist who has been delivering innovative and actionable machine learning solutions across the utilities, healthcare, and eCommerce industries since 2010. Kristen is the #8 Global LinkedIn Top Voice in Data Science & Analytics and holds an MS in Applied Statistics and a BS in Mathematics.
Demo Talk | In-person
When OpenAI launched ChatGPT at the end of 2022, more than one million people had tried the model in just a week and this trend has only accelerated with ChatGPT recently reaching 100 Million monthly users. It’s clear that NLP and Generative Large Language Models are becoming mainstream. In this talk, you will learn how to enable ChatGPT on your own data with vector search.
Generative LLMs like ChatGPT are trained on huge datasets of open data from the internet. This enables them to have vast amounts of general knowledge about the world and natural language. However, there is one disadvantage: once trained, a Generative Pretrained Transformer (GPT) can only draw on the data it was trained on. If you ask what today’s news is, ChatGPT can’t answer that question.
In order to benefit from the capabilities of LLMs like ChatGPT in real-life use cases, it would be ideal if we could apply the generative power to new or custom data, such as a chatbot for your e-commerce platform which has knowledge about the products you sell, your detailed return policies or specific promotions currently going on. This becomes possible if we combine ChatGPT with a vector search engine! Integrating a generative LLM with a vector search engine allows you to filter through your entire personal database and search for information relevant to a prompt, which can then be provided to ChatGPT along with the prompt. This framework allows you to harness the LLM model’s power to solve tasks grounded in the context of your own data!
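The retrieve-then-prompt framework described above can be sketched in plain Python. Here a toy word-count "embedding" and brute-force cosine similarity stand in for a real embedding model and a vector engine such as Weaviate, and the documents are invented for illustration:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(text):
    # Toy "embedding": in practice this comes from a real embedding model.
    vocab = ["return", "refund", "shipping", "promotion", "discount"]
    t = text.lower()
    return [t.count(w) for w in vocab]

# Your private documents, indexed ahead of time. A vector engine like
# Weaviate does this at scale with approximate nearest-neighbour search.
docs = [
    "Items can be returned within 30 days for a full refund.",
    "Standard shipping takes 3-5 business days.",
    "Spring promotion: 20% discount on all shoes.",
]
index = [(d, embed(d)) for d in docs]

def retrieve(query, k=1):
    # Rank stored documents by similarity to the query and keep the top k.
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [d for d, _ in ranked[:k]]

# Build the grounded prompt that would be sent to the LLM.
query = "What is your refund policy for a return?"
context = retrieve(query, k=1)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
```

The LLM then answers from the retrieved context rather than from its frozen training data, which is what makes up-to-date, domain-specific answers possible.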
In this demo, we will show you how ChatGPT can be implemented with the open-source vector search engine Weaviate in live demos.
You will leave this talk with a solid understanding of how you can enable ChatGPT on your own data using vector search. Whether you’re a data scientist, developer, or NLP enthusiast, this talk will provide valuable insights and practical skills for enhancing your NLP projects with vector search and OpenAI’s ChatGPT.
Zain Hasan is a Senior Developer Advocate at Weaviate, an open-source vector database. He is an engineer and data scientist by training, who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies for elderly patients. He then founded his company, developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients using data from their medical devices. More recently, he practised as a consultant senior data scientist in Toronto. He is passionate about the field of data science and machine learning and loves to share his love for the field with anyone interested in the domain.
Demo Talk | In-person
This technical talk delves into the paradigm shift from model-centric to data-centric AI, emphasizing the importance of data quality in improving machine learning outcomes. We will explore the current AI landscape and discuss the reasons behind this shift. Focusing on the Pachyderm platform for data-driven processing and versioning, attendees will learn practical steps and principles to streamline their data-centric AI efforts. This talk aims to equip practitioners with the knowledge and tools necessary to harness AI’s full potential by embracing a data-driven approach and leveraging Pachyderm’s innovative platform.
Jimmy Whitaker is the Chief Scientist of AI at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn’t at work, he’s either playing music or trying to learn something new, because “You suddenly understand something you’ve understood all your life, but in a new way.”
Demo Talk | In-person
Beacon Analytics helps customers transition from rigid, monolithic data solutions to a flexible microservices architecture, enabling better performance and faster access to critical information. By breaking up data into smaller, independent services, customers gain greater access and modification capabilities. The team recommends using the Polars library, which is based on Apache Arrow, in combination with Dash Plotly to create easy-to-maintain, high-performance solutions at an excellent price-to-performance ratio. Join Danny Bharat, Senior Vice President of Analytics at Cedric Millar and co-founder of Beacon Analytics, as he shares how his team’s innovative approach to data solutions allows them to build comprehensive 360° intelligence and deliver actionable insights. Beacon Analytics empowers customers to achieve success in a rapidly changing business and technology landscape by utilizing schema-on-read approaches, unstructured data storage, and on-the-fly analysis and transformation.
Danny Bharat is a seasoned supply chain industry professional and the Senior Vice President of Analytics at Cedric Millar Integrated Solutions. As a co-founder of Beacon Analytics, powered by Cedric Millar, he leads a growing team of solutions architects and data scientists in delivering comprehensive business intelligence and supply-chain solutions for end-to-end operations. With a deep focus on corporate planning, strategy, and digital transformation, Danny has accumulated a wealth of experience in multiple industries. He is dedicated to encouraging continuous professional growth and development through mentorship. Danny strongly believes that leaders with technical competence are more effective, and he practices what he preaches by being a self-taught dabbler in Python and DAX languages. He is passionate about using his expertise to help businesses succeed and deliver exceptional results for their customers.
In-Person | Keynote | Machine Learning | All Levels
Large Language Models (LLMs) like ChatGPT have taken the world by storm with their ability to answer questions, write essays and even compose lyrics. These tools have profound implications for industries like financial services, retail, and healthcare. However, most organizations have yet to take advantage of LLMs. Why? The amount of compute, data, and knowledge required to build a proprietary model is daunting for many, yet the alternative is reliance on LLMs that are only accessible behind API paywalls and compromising your data privacy. In this keynote, we will explore why training, deploying, and owning your own LLM is critical (and at times even imperative). We will discuss how you can train and deploy your own models, while protecting your data and your business IP. Spoiler alert: in contrast to common wisdom, ownership of your own LLM is within reach for most organizations, and provides major benefits in increased security, flexibility, and accuracy…more details
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He shipped products across various domains: from 3D medical imaging, through global-scale web systems, and up to deep learning systems that power apps and services used by billions of people worldwide.
Jay is VP of the Artificial Intelligence and Machine Learning organization at Oracle Cloud. He completed a degree in neuroscience and started his career in technology at Oracle, believing that these two paths would converge.
Virtual | Keynote | Machine Learning | All Levels
AI is red hot, but in practice many projects still fail. This talk will cover some of the key things you need to know to succeed, including:
– What current AI is and is not good for
– The difference between a demo and a product
– Pitfalls to avoid
– Organizing AI teams…more details
Pedro Domingos is a professor emeritus of computer science and engineering at the University of Washington and the author of The Master Algorithm. He is a winner of the SIGKDD Innovation Award and the IJCAI John McCarthy Award, two of the highest honors in data science and AI. He is a Fellow of the AAAS and AAAI, and has received an NSF CAREER Award, a Sloan Fellowship, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. Pedro received an undergraduate degree (1988) and M.S. in Electrical Engineering and Computer Science (1992) from IST, in Lisbon, and an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others. He has written for the Wall Street Journal, Spectator, Scientific American, Wired, and others. He helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
Virtual | Keynote | Machine Learning | All Tracks | All Levels
Modern medicine has given us effective tools to treat some of the most significant and burdensome diseases. At the same time, it is becoming consistently more challenging and more expensive to develop new therapeutics. A key factor in this trend is that we simply don’t understand the underlying biology of disease, and which interventions might meaningfully modulate clinical outcomes and in which patients. To achieve this goal, we are bringing together large amounts of high content data, taken both from humans and from human-derived cellular systems generated in our own lab…more details
Demo Talk | In-person
Thinking about incorporating relationships into your data to improve predictions and machine learning models? Maybe you are creating a knowledge graph or looking for a way to improve customer 360, fraud detection, or supply chain performance. Relationships are highly predictive of behavior. With graphs, they’re embedded in the data itself, making it easy to unlock and add predictive capabilities to your existing practices.
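As a toy illustration of why relationships are predictive, consider a fraud-style signal computed over shared devices. Plain Python stands in for a real graph database like Neo4j here, and the accounts and edges are invented:

```python
from collections import defaultdict

# Edges: (account, device) pairs. In a graph database these would be
# relationships stored natively; here a plain adjacency map stands in.
logins = [
    ("alice", "phone-1"),
    ("bob", "laptop-7"),
    ("mallory", "phone-9"),
    ("eve", "phone-9"),
    ("trudy", "phone-9"),
]

device_to_accounts = defaultdict(set)
for account, device in logins:
    device_to_accounts[device].add(account)

def shared_device_count(account):
    """Relationship-derived feature: how many *other* accounts share a
    device with this one. A high value is a classic fraud-ring signal
    that is invisible to purely row-based (tabular) features."""
    others = set()
    for accounts in device_to_accounts.values():
        if account in accounts:
            others |= accounts - {account}
    return len(others)
```

In a graph database this feature is a one-hop traversal rather than a batch join across tables, which is what makes such relationship signals cheap to compute and keep fresh.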
Join us for a demo to learn why graph databases are a top choice for scalable analytics, intelligent app development, and advanced AI/ML pipelines. We’ll showcase graphs using Neo4j’s enterprise-ready graph data platform. You’ll see firsthand how easy it is to get started, and we’ll highlight a graph use case using Neo4j’s cloud platform for Graph Data Science. All attendees will get a link to download and try Neo4j for free using your own data.
Sydney “Syd” became a graph enthusiast through her work with clients to build graph-based solutions, as well as supporting data science teams, during her time at Deloitte and Accenture. Now she uses her graph expertise to help customers realize the value of graph technology for their organization. She also contributes by teaching Neo4j graph database and data science training classes. Syd’s hobbies include interior design and defeating her car navigation system’s estimated drive time.
Demo Talk | In-person
Oracle Cloud Infrastructure (OCI) is proud to showcase a new product demo of Stable Diffusion for game content creation using popular user interfaces and the 3D modeling tool Blender. OCI’s demo of Stable Diffusion is powered by NVIDIA A10 Tensor Core GPUs in the Oracle cloud. Stable Diffusion, an innovative deep learning model released in 2022, has been primarily used for generating detailed images based on text descriptions. However, its capabilities extend to creating game textures, models, depth maps, skins, and other game content.
Diffusion models can even be utilized for other modalities, enabling tasks as diverse as music generation. The combination of Stable Diffusion and Blender allows artists to create high-quality game assets with complete control over the creative process while benefitting from quicker creative iterations. Artists can further train Stable Diffusion on their individual styles and develop complete workflows that allow for greater creative freedom and flexibility in game development.
Allen is a Principal Machine Learning Architect and AI Researcher working for Oracle Cloud Infrastructure.
Demo Talk | In-person
In the Python open-source ecosystem, many packages are available that cater to:
– the building of great algorithms
– the visualization of data
Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, a new open-source Python framework, Data Scientists/Python Developers are able to build great pilots as well as stunning production-ready applications for end-users.
Taipy provides two independent modules: Taipy GUI and Taipy Core.
In this talk, we will demonstrate how:
– Taipy GUI goes way beyond the capabilities of the standard graphical stack: Gradio, Streamlit, Dash, etc.
– Taipy Core fills a void in the standard Python back-end stack.
Florian Jacta is a specialist in Taipy, a low-code open-source Python package enabling any Python developer to easily build a production-ready AI application, where he handles the package’s pre-sales and after-sales functions. He has been a data scientist for Groupe Les Mousquetaires (Intermarche) and ATOS, developing several predictive models as part of strategic AI projects. Florian holds a master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
Albert applies his machine learning and big data skills to (financial) optimization problems. He has developed projects of varying skill levels for Taipy’s tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
Demo Talk | In-person
Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise-scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
- Create a sample knowledge graph from several sources
- Demonstrate flexible data preparation for training datasets
- Analyze the knowledge graph with native visualizations and graph algorithms
- Connect to the knowledge graph for additional data science operations
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.
A member of CSI for a decade, Greg has developed a wealth of expertise on knowledge graph technology. His specialty lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Demo Talk | Virtual
Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.
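TIM's own feature construction is not shown here, but the general idea of automatically turning a time series into a tabular modeling problem can be illustrated with simple lag and rolling-window features in plain Python (an illustrative sketch, not Tangent Works' method):

```python
def lag_features(series, lags=(1, 2), window=3):
    """Build a tabular feature matrix from a univariate series:
    one row per time step with lagged values, a rolling mean, and
    the value to predict. Purely illustrative of time-series
    feature engineering in general."""
    rows = []
    max_lag = max(lags)
    for t in range(max_lag, len(series)):
        row = {f"lag_{lag}": series[t - lag] for lag in lags}
        start = max(0, t - window)
        row["rolling_mean"] = sum(series[start:t]) / (t - start)
        row["target"] = series[t]
        rows.append(row)
    return rows

rows = lag_features([10, 12, 11, 13, 14])
```

Each resulting row can feed any standard regressor, which is what makes this style of feature engineering a bridge between time-series data and ordinary tabular ML.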
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning with customers from various industries such as Siemens, BASF, Borealis, and Volkswagen. With a commercial background and experience in data engineering, analysis, and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
Demo Talk | Virtual
Experienced machine learning engineers and data scientists care about ways to easily get their models up and running quickly and share ML assets across teams for collaboration. Collaborate and streamline the management of thousands of models across teams with new, innovative features in Azure Machine Learning. Come and join us in this interactive session with our product experts and get your questions answered on the latest capabilities in Azure Machine Learning!
My name is Seth Juarez. I currently live near Redmond, Washington and work for Microsoft.
I received my Bachelor’s degree in Computer Science from UNLV with a minor in Mathematics, and completed a Master’s degree in Computer Science at the University of Utah. I am currently interested in Artificial Intelligence, specifically in the realm of Machine Learning, and work as a Program Manager in the Azure Artificial Intelligence Product Group.
I’ve been married now for 21 years to a fabulously talented woman and have two beautiful daughters, and two feisty sons.
Demo Talk | In-person
Great AI starts with great features. While the modern data stack has made self-service ingestion and consumption a reality for BI, AI data remains a huge challenge. Feature engineering is non-standard, ML pipelines are manual, and data governance is a nightmare – limiting the scalability you can achieve with AI. We will discuss the shortcomings of the modern data stack for AI and practical approaches for creating a self-service data environment for data scientists. Learn about strategies to accelerate feature engineering and experimentation, shorten the time to deploy feature pipelines, and govern the data and infrastructure, ultimately enabling you to truly scale AI in your organization.
Presented by Razi Raziuddin, Youssef Idelcaid, FeatureByte
Demo Talk | In-person
Have you ever wandered into a department store and wondered if the item you are looking at is on sale? What if you had the ability to scan the item ‘yourself’ and determine if it was discounted? That ‘ability’ would make your shopping experience easier as you would not have to wait for customer service to look up your item. You would have the ability to find merchandise discounts, yourself, as you browse through the department store!
In this session, we will examine the model development, training, and deployment process we used to create a discount coupon app that is executed from an edge device – your phone. Join us as we walk you through the model development and deployment lifecycle, including topics of object vision, containerizing models, data streaming, and MLOps.
At the end of the session, you can try the app yourself.
Kaitlyn Abdo is an Associate Technical Marketing Manager, working on technical enablement surrounding the AI/ML products and services at Red Hat. She has been at Red Hat for 2 years, and is interested in discovering and learning about new and innovative solutions in the AI/ML space. In her free time, Kaitlyn enjoys building Legos, cooking and spending time with animals.
Demo Talk | Virtual
With DALLE and ChatGPT, we have reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. With that said, general architectural understanding and intuition into how these models make decisions do not translate into minute detail interpretability.
We’re at a crossroads. This new “breed” of ML applications is here to stay, and unstructured data is only growing, but they are black boxes, and black boxes fail silently. So how can we as practitioners leverage NLP and vision while enjoying the monitoring, interpretability, and explainability available to their tabular counterparts?
In this talk, we will introduce Elemeta, our OSS meta-feature extractor library in Python, which applies a structured approach to unstructured data by extracting information from text and images to create enriched tabular representations. With Elemeta, practitioners can utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations and engineer alternative features to be utilized in simpler models such as decision trees.
In this talk, we’ll introduce you to Elemeta through a live notebook example and explain how it can be applied to text and images.
Lior Durahly is a data and ML engineer at Superwise, where he is responsible for researching and developing monitoring capabilities related to Responsible AI, including feature importance, fairness, and explainability. He is also the key contributor to the OSS package Elemeta, a meta-feature extractor for NLP and vision.
Demo Talk | In-person
Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning with customers from various industries such as Siemens, BASF, Borealis, and Volkswagen. With a commercial background and experience in data engineering, analysis, and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
Demo Talk | In-person
Yet another meeting to discuss nuisance hairsplitting details for your data taxonomies and keywords list?
It shouldn’t take a team of domain experts, Excel specialists, Python developers, and Data Scientists weeks or months to build it. It is a simple problem that requires a simple solution.
You should be able to quickly and accurately analyze contracts, customer comments, and any other text-based content while easily building explainable NLP models.
Stop scrubbing through volumes of data to find key examples and then reducing the content to specific keywords and variations.
Join us as we explore a new and exciting solution using human language to easily develop ontologies, data taxonomies, and keyword lists, which you can share across your business with just a few simple clicks.
Accelerate these NLP tasks in every project and help eliminate those long-drawn meetings to discuss keywords for data taxonomies. Unless you enjoy those nuisance meetings 🙂
Kenny’s experience in startup product management and marketing led him to join the Gram X team. He became interested in the AI industry due to its future potential impact on every industry and believes that the biggest successful companies will be so due to their leveraging of AI. Fun fact: Kenny has been to 43 states.
Demo Talk | In-person
The process of investigating model performance issues in production environments has long been plagued by complexity, inefficiency, and limited success. In this demo talk, we unveil a new paradigm for streamlining performance diagnostics and performing effective RCA. Our Root Cause Analysis enables you to effortlessly slice and dice production data and identify previously hidden relationships and valuable insights from your raw data. Easily collaborate to find when, where, and why issues originated to expedite response and remediation. Investigate any use case, any production issue, in every model type, and turn deep insights into actionable success.
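The slice-and-dice move at the heart of root cause analysis can be sketched in a few lines of plain Python. The records and segment names below are invented for illustration; a platform like Arize does this interactively over full production datasets:

```python
from collections import defaultdict

# Production prediction records with metadata and an error flag.
records = [
    {"region": "US", "device": "ios", "error": 0},
    {"region": "US", "device": "android", "error": 0},
    {"region": "EU", "device": "ios", "error": 1},
    {"region": "EU", "device": "ios", "error": 1},
    {"region": "EU", "device": "android", "error": 0},
]

def error_rate_by(records, key):
    """Slice records by one metadata field and compute per-slice
    error rates: the basic move behind finding *where* a model is
    failing before asking *why*."""
    errs, counts = defaultdict(int), defaultdict(int)
    for r in records:
        errs[r[key]] += r["error"]
        counts[r[key]] += 1
    return {value: errs[value] / counts[value] for value in counts}

rates = error_rate_by(records, "region")
```

Here the EU slice concentrates all the errors, pointing investigation at region-specific data or behavior rather than at the model as a whole.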
Reah Miyara is Head of Product at Arize AI, a startup focused on ML Observability. He was previously at Google AI, where he led product development for research, tools, and infrastructure related to graph-based machine learning, data-driven large-scale optimization, and market economics. Reah’s experience as a team and product leader is extensive, building and growing products across a broad cross-section of the AI landscape. He’s played pivotal roles in ML and AI initiatives at IBM Watson, Intuit, and NASA Jet Propulsion Laboratory. Reah also co-led Google Research’s Responsible AI initiative, confronting the risks of AI being misused and taking steps to minimize AI’s negative influence on the world. He has a bachelor’s from UC Berkeley’s Electrical Engineering and Computer Science program and was the founder and president of the Cal UAV team in 2014.
Demo Talk | Virtual
Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise-scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
- Create a sample knowledge graph from several sources
- Demonstrate flexible data preparation for training datasets
- Analyze the knowledge graph with native visualizations and graph algorithms
- Connect to the knowledge graph for additional data science operations
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.
A member of CSI for a decade, Greg has developed a wealth of expertise on knowledge graph technology. His specialty lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Demo Talk | In-person
Learn why the truly open-source HPCC Systems platform is better at Big Data and offers an end-to-end solution for Developers and Data Scientists. Learn how ECL can empower you to build powerful data queries with ease. HPCC Systems, a comprehensive and dedicated data lake platform makes combining different types of data easier and faster than competing platforms — even data stored in massive, mixed schema data lakes — and it scales very quickly as your data needs grow. Topics include HPCC Architecture, Embedded Languages and external data stores, Machine Learning Library, Visualization, Application Security and more.
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
Demo Talk | In-person
We will use the plotly library to create visualizations in S&P Global Marketplace Workbench, which is powered by Databricks, and showcase a Databricks dashboard built from the different charts. This demo talk is best suited for a beginner-to-intermediate audience.
Dani Herzberg is an Analyst on the Product Management and Development team at S&P Global Market Intelligence. On this team, she creates notebooks in Databricks, assists in analytic visualizations of S&P Global data, and provides SQL query support. She holds a Master of Science in Business Analytics from Georgetown University.
Demo Talk | In-person (AI Startup Showcase)
As companies collect larger and larger amounts of data and apply more complex ML models, the time and resources required to build and maintain the models continues to grow. In fact, in the past decade, training compute time has been growing at a staggering pace of 10x per year! But do we really need so much data to build better models?
In this talk, we will walk you through DataHeroes’ Python-based framework that uses a unique data reduction methodology to reduce your dataset size by orders of magnitude while maintaining the statistical properties and corner cases of the full dataset. We will demonstrate how having the reduced dataset makes it significantly easier and faster to clean the data and train and tune the model, which produces a better and more accurate model, at a fraction of the time and cost.
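DataHeroes' own coreset methodology is not reproduced here, but the general notion of shrinking a dataset while preserving its statistical structure can be illustrated with simple class-stratified sampling in plain Python (an illustrative sketch only):

```python
import random

def stratified_reduce(samples, labels, fraction, seed=0):
    """Reduce (samples, labels) to `fraction` of its size while keeping
    class proportions intact. A naive stand-in for coreset methods,
    which additionally weight points so the reduced set preserves the
    loss landscape, not just the label balance."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    kept_samples, kept_labels = [], []
    for y, group in by_class.items():
        k = max(1, round(len(group) * fraction))  # keep at least one per class
        for s in rng.sample(group, k):
            kept_samples.append(s)
            kept_labels.append(y)
    return kept_samples, kept_labels

samples = list(range(1000))
labels = [0] * 900 + [1] * 100   # 90/10 class imbalance
small_s, small_y = stratified_reduce(samples, labels, fraction=0.1)
```

The reduced set is an order of magnitude smaller yet keeps the original 90/10 class balance, so cleaning, training, and tuning loops run on far less data.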
Oren is a serial entrepreneur with 23 years of experience. In 2007, Oren founded DoubleVerify (NYSE: DV), which pioneered the ad verification category and grew to become a global leader in advertising measurement and analytics. In 2012, as CEO of DoubleVerify, Oren received the distinguished Technology Pioneers Award from the World Economic Forum in Davos. Later, Oren went on to start cClearly, an advertising optimization company, before starting his most recent company, DataHeroes.
Eitan has 10 years of experience as a data scientist and has taught data science and machine learning at the Technion, Israel's Institute of Technology. Prior to his data science career, Eitan served as a Systems Engineer in the Israel Defense Forces' Unit 8200.
Demo Talk | In-person
When putting models into production it’s critical to know how they’re performing over time. As the last mile of the data pipeline, models can be impacted by a variety of issues, often outside the control of the data science team. “Observability” promises to help teams detect and prevent issues that could impact their models—but what is observability vs. data observability vs. ML observability? Get practical answers and recommendations from Kyle Kirwan, former product leader for Uber’s metadata tools, and founder of data observability company, Bigeye.
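One concrete piece of what observability tools automate is drift detection on model inputs: comparing a feature's live distribution against its training baseline and alerting when they diverge. A minimal, hand-rolled sketch of the idea (the thresholds and data are invented; platforms like Bigeye run far richer statistical tests):

```python
def mean_shift_alert(baseline, live, threshold=3.0):
    """Flag a feature whose live mean drifts more than `threshold`
    baseline standard deviations away from the training-time mean.
    A toy stand-in for production drift checks."""
    n = len(baseline)
    mean = sum(baseline) / n
    var = sum((x - mean) ** 2 for x in baseline) / n
    std = var ** 0.5 or 1e-9          # avoid division by zero
    live_mean = sum(live) / len(live)
    return abs(live_mean - mean) / std > threshold

baseline = [10.0, 11.0, 9.0, 10.5, 9.5]   # feature values at training time
drifted  = [25.0, 26.0, 24.0]             # same feature, broken upstream feed
healthy  = [10.2, 9.8, 10.1]              # same feature, behaving normally
```

Running the check on `drifted` fires an alert while `healthy` passes — exactly the kind of upstream issue that would otherwise silently degrade a model.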
Demo Talk | In-person
In this 25-minute demo, we will explore the top 5 cool tricks of Delta for data scientists and discuss why your data lake should be a Delta Lake. Delta Lake is an open-source storage layer that brings reliability to data lakes by providing ACID transactions, scalable metadata handling, and data versioning. We will first introduce the concept of Delta Lake and explain how it helps data scientists to manage their data pipelines with ease. We will then dive into the top 5 cool tricks of Delta Lake, which include performance optimizations, time travel, schema enforcement, automatic data merging, and data validation. We will demonstrate these tricks using real-world examples and show how they can simplify your data pipeline and reduce your development time. By the end of this talk, you will have a better understanding of Delta Lake’s features and how it can help you to manage your data lake efficiently. You will also have learned about the benefits of using Delta Lake and why it’s a must-have for data scientists working with large data sets.
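Time travel works because every Delta write produces a new table version in the transaction log. The real interface is Spark (e.g. reading with `option("versionAsOf", 0)` on the Delta format), but the underlying idea can be sketched in plain Python — this is a conceptual illustration, not Delta's implementation:

```python
class ToyDeltaTable:
    """Conceptual sketch of Delta time travel: each commit produces an
    immutable version, and reads can target any historical version."""

    def __init__(self):
        self._versions = []   # version N = full table state after commit N

    def commit(self, rows):
        """Append rows and record a new version, like a commit in the log."""
        current = list(self._versions[-1]) if self._versions else []
        current.extend(rows)
        self._versions.append(current)
        return len(self._versions) - 1    # the new version number

    def read(self, version_as_of=None):
        """Read the latest version, or 'time travel' to an older one."""
        if not self._versions:
            return []
        idx = len(self._versions) - 1 if version_as_of is None else version_as_of
        return list(self._versions[idx])

table = ToyDeltaTable()
v0 = table.commit([{"id": 1}])
v1 = table.commit([{"id": 2}])
# A latest read sees both rows; "time travel" to v0 sees only the first.
```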
Eric Vogelpohl is the Managing Director of Tech Strategy at Blueprint. He’s a proven IT professional with more than 20 years of experience and a high degree of technical and business acumen. He has an insatiable passion for all-things-tech, pro-cloud/SaaS, leadership, learning, and sharing ideas on how technology can turn data into information & transform user experiences.
Demo Talk | In-person
Location intelligence can provide valuable insights by leveraging geospatial data in machine learning. This demonstration will showcase how machine learning and location information can work together to help organizations extract more value from their data. We will use a comprehensive suite of geocoding, spatial analytics, and data enrichment capabilities to visualize and analyze data, identify patterns, and derive insights that can be used to make informed business decisions.
In this session, we will use Amazon SageMaker to train a machine-learning model using property attributes, historical weather data, and fire data. The goal will be to predict the fire risk for a property. You will see how quickly and efficiently we can build and train a machine-learning model using various algorithms, such as decision trees and neural networks, to find the best approach for our dataset.
While this demo example will highlight fire predictions, these location intelligence solutions can be applied across multiple industries, including financial services, telecommunications, insurance, retail, real estate, and more. We look forward to discussing how you can leverage geospatial data in your machine-learning models.
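The session trains on SageMaker, but the modeling step itself can be pictured with a toy example: a one-level decision tree (a stump) that searches for the single feature threshold best separating high-risk from low-risk properties. All feature names and values below are invented for illustration:

```python
def best_stump(rows, labels):
    """Exhaustively pick the (feature, threshold) pair minimizing
    misclassifications for the rule: value > threshold -> predict 1
    (high fire risk). The simplest possible decision tree."""
    n_features = len(rows[0])
    best = None
    for f in range(n_features):
        for threshold in sorted({r[f] for r in rows}):
            preds = [1 if r[f] > threshold else 0 for r in rows]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[2]:
                best = (f, threshold, errors)
    return best

# Invented columns: [avg summer temp (C), days since last rainfall]
rows = [[22, 3], [35, 40], [24, 5], [38, 55], [21, 2], [36, 48]]
labels = [0, 1, 0, 1, 0, 1]   # 1 = property in a historically burned area
feature, threshold, errors = best_stump(rows, labels)
```

A real pipeline would enrich `rows` with geocoded property attributes and weather history, then let SageMaker sweep over far richer models than a stump, but the train-evaluate-pick loop is the same.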
Mayank Kasturia is a Senior Sales Engineer at Precisely responsible for developing interactive demo applications, demonstrating proof of concepts (POC), and implementing solutions for customers using Precisely’s Geo Addressing, Spatial Analytics and Data Enrichment capabilities in big data and cloud-native environments.
Demo Talk | Virtual
Proper tracking is crucial for ensuring the reproducibility of results obtained during model development and fostering effective collaboration among multiple developers on a machine learning project. In this talk, Kristen will discuss the process of developing a dog detection system using YOLOv8 on-edge devices and the role of Comet, an experiment management platform, in handling the intricacies of the project.
Kristen will guide you through the entire process, from generating a data artifact to deploying the model, emphasizing the benefits of utilizing Comet at each stage. She will showcase how Comet was employed to monitor experiment metrics, visualize model performance, and illustrate the ease with which the selected model can be tracked in production. Participants will gain valuable insights on how to leverage an experiment tracking and monitoring solution like Comet to enhance their model development process, making it more transparent and reproducible.
Kristen is the founder of Data Moves Me, LLC and a data scientist who has been delivering innovative and actionable machine learning solutions across the utilities, healthcare, and eCommerce industries since 2010. Kristen is the #8 Global LinkedIn Top Voice in Data Science & Analytics and holds an MS in Applied Statistics and a BS in Mathematics.
Demo Talk | In-person
MLRun is an open-source MLOps orchestration framework. It exists to accelerate the integration of AI/ML applications into existing business workflows. MLRun introduces Data Scientists to a simple Python SDK that transforms their code into a production-quality application. It does so by abstracting the many layers involved in the MLOps pipeline. Developers can build, test, and tune their work anywhere and leverage MLRun to integrate with other components of their business workflow.
The capabilities of MLRun are extensive, and we will cover the basics to get you started. You will leave this session with enough information to:
- Get started with MLRun, on your own, in 10 minutes, so you can automate and accelerate your path to production and have your first AI app running in 20 minutes
- Run locally, then move to Kubernetes
- Understand how your Python code can run as a Kubernetes job with no code changes
- Track your experiments
- Get an introduction to advanced MLOps topics using MLRun
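MLRun's SDK handles this automatically as it runs your jobs, but the experiment-tracking idea in the list above can be illustrated with a minimal hand-rolled tracker. This is conceptual only — not MLRun's API — showing what it means to record params and metrics per run and query for the best one:

```python
import time

class RunTracker:
    """Record parameters and metrics per run, the way an MLOps
    framework logs each training job for later comparison."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        self.runs.append({
            "run_id": len(self.runs),
            "timestamp": time.time(),
            "params": params,
            "metrics": metrics,
        })

    def best_run(self, metric, maximize=True):
        key = lambda r: r["metrics"][metric]
        return max(self.runs, key=key) if maximize else min(self.runs, key=key)

tracker = RunTracker()
tracker.log_run({"lr": 0.1}, {"accuracy": 0.81})
tracker.log_run({"lr": 0.01}, {"accuracy": 0.88})
best = tracker.best_run("accuracy")
# best["params"] identifies the hyperparameters of the winning run.
```

The value of a framework doing this for you is that the logging happens with no code changes, whether the job runs locally or as a Kubernetes job.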
Demo Talk | Virtual
This technical talk delves into the paradigm shift from model-centric to data-centric AI, emphasizing the importance of data quality in improving machine learning outcomes. We will explore the current AI landscape and discuss the reasons behind this shift. Focusing on the Pachyderm platform for data-driven processing and versioning, attendees will learn practical steps and principles to streamline their data-centric AI efforts. This talk aims to equip practitioners with the knowledge and tools necessary to harness AI’s full potential by embracing a data-driven approach and leveraging Pachyderm’s innovative platform.
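Data-centric iteration depends on knowing exactly which version of the data produced which model. Pachyderm does this with commit-style data versioning; the core idea can be sketched with content hashing (illustrative only, not Pachyderm's API):

```python
import hashlib
import json

def dataset_version(rows):
    """Deterministic fingerprint of a dataset: any change to the data
    yields a new version id, analogous to a commit in a data repo."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

v1 = dataset_version([{"text": "good product", "label": 1}])
v2 = dataset_version([{"text": "good product", "label": 0}])  # one label fixed
# Different content -> different version id. Training runs that record
# the id stay reproducible even as the dataset keeps being cleaned.
```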
Jimmy Whitaker is the Chief Scientist of AI at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn’t at work, he’s either playing music or trying to learn something new, because “You suddenly understand something you’ve understood all your life, but in a new way.”
Demo Talk | In-person
With DALLE and ChatGPT, we have reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. With that said, general architectural understanding and intuition into how these models make decisions do not translate into minute detail interpretability.
We’re at a crossroads. This new “breed” of ML applications is here to stay, and unstructured data is only growing, but these models are black boxes, and black boxes fail silently. So how can we as practitioners leverage NLP and vision while enjoying the monitoring, interpretability, and explainability available to their tabular counterparts?
In this talk, we will introduce Elemeta, our OSS meta-feature extractor library in Python, which applies a structured approach to unstructured data by extracting information from text and images to create enriched tabular representations. With Elemeta, practitioners can utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations and engineer alternative features to be utilized in simpler models such as decision trees.
In this talk, we’ll introduce you to Elemeta through a live notebook example and explain how it can be applied to text and images.
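Elemeta's extractor set is much richer, but the core move — turning unstructured text into an enriched tabular representation that standard monitoring and simple models can consume — can be sketched with a few hand-rolled meta-features (these are illustrative features, not Elemeta's actual API):

```python
import re

def text_meta_features(text):
    """Extract simple structured meta-features from raw text, so that
    tabular techniques (drift checks, decision trees) can run on top."""
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "char_count": len(text),
        "word_count": len(words),
        "avg_word_length": sum(map(len, words)) / max(len(words), 1),
        "exclamation_count": text.count("!"),
        "digit_ratio": sum(c.isdigit() for c in text) / max(len(text), 1),
    }

rows = [text_meta_features(t) for t in [
    "Great product, would buy again!!!",
    "Order #4521 arrived damaged on 03/12.",
]]
# `rows` is now a small tabular dataset: one row per document, fixed
# columns -- ready for the same monitoring used on structured features.
```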
Lior Durahly is a data and ML engineer at Superwise, where he is responsible for researching and developing monitoring capabilities related to Responsible AI, including feature importance, fairness, and explainability. He is also the key contributor to the OSS package Elemeta, a meta-feature extractor for NLP and vision.
Demo Talk | In-person
Pressure to maximize ROI and reduce costs continues to grow for ML organizations. Data labeling is often one of the line items under the most scrutiny, so how do you get more efficient with your spending?
In this demo, we’ll walk through several ways you can reduce the number of labels you need and make smarter choices about what you label, through a combination of automation and human-in-the-loop techniques.
Our clients using these techniques were able to:
- Reduce the amount of data labeled by up to 90%
- Find frames with rare objects within huge datasets
- Optimize annotation guidelines to improve their model performance
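One technique behind needing fewer labels is active learning: let the current model score the unlabeled pool and send only the least-confident items to human annotators. A minimal sketch, with invented frame ids and model confidences standing in for a real classifier:

```python
def select_for_labeling(items, predict_proba, budget):
    """Rank unlabeled items by model uncertainty (probability closest
    to 0.5 for a binary classifier) and return the `budget` most
    ambiguous ones -- the items where a human label helps most."""
    scored = [(abs(predict_proba(x) - 0.5), x) for x in items]
    scored.sort(key=lambda pair: pair[0])
    return [x for _, x in scored[:budget]]

# Hypothetical per-frame model confidence that a frame contains the
# object of interest.
confidences = {"frame_a": 0.97, "frame_b": 0.52, "frame_c": 0.08, "frame_d": 0.45}
to_label = select_for_labeling(
    list(confidences), lambda x: confidences[x], budget=2
)
# Only the ambiguous frames go to annotators; confident ones are skipped.
```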
Cassie is a Senior Product Marketing Manager at CloudFactory, serving as the bridge between technical expertise and creative communications for their AI data labeling products. She holds an MBA from North Carolina State University and has spent her career in product marketing roles for B2B technology companies.
Demo Talk | In-person
When OpenAI launched ChatGPT at the end of 2022, more than one million people had tried the model in just a week and this trend has only accelerated with ChatGPT recently reaching 100 Million monthly users. It’s clear that NLP and Generative Large Language Models are becoming mainstream. In this talk, you will learn how to enable ChatGPT on your own data with vector search.
Generative LLMs like ChatGPT are trained on huge datasets of open data from the internet. This enables them to have vast amounts of general knowledge about the world and natural language. However, there is one disadvantage: once trained, you can use a Generative Pretrained Transformer (GPT) only on the data it was trained on. When you ask what today’s news is, ChatGPT can’t answer that question.
In order to benefit from the capabilities of LLMs like ChatGPT in real-life use cases, it would be ideal if we could apply the generative power to new or custom data, such as a chatbot for your e-commerce platform which has knowledge about the products you sell, your detailed return policies or specific promotions currently going on. This becomes possible if we combine ChatGPT with a vector search engine! Integrating a generative LLM with a vector search engine allows you to filter through your entire personal database and search for information relevant to a prompt, which can then be provided to ChatGPT along with the prompt. This framework allows you to harness the LLM model’s power to solve tasks grounded in the context of your own data!
In this demo, we will show you how ChatGPT can be implemented with the open-source vector search engine Weaviate in live demos.
You will leave this talk with a solid understanding of how you can enable ChatGPT on your own data using vector search. Whether you’re a data scientist, developer, or NLP enthusiast, this talk will provide valuable insights and practical skills for enhancing your NLP projects with vector search and OpenAI’s ChatGPT.
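The retrieve-then-prompt pattern the abstract describes can be sketched without any external services: embed your documents, find the nearest neighbors to the question by cosine similarity, and prepend them to the LLM prompt. Below, toy 3-dimensional "embeddings" stand in for a real embedding model and a vector database like Weaviate, and the final ChatGPT API call is elided:

```python
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

# Toy vector index: (embedding, document). In practice the embeddings
# come from a model and live in a vector search engine.
index = [
    ([0.9, 0.1, 0.0], "Returns are accepted within 30 days."),
    ([0.1, 0.9, 0.0], "Free shipping on orders over $50."),
    ([0.0, 0.2, 0.9], "Our stores are open 9am-9pm daily."),
]

def retrieve(query_embedding, k=1):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(index, key=lambda pair: cosine(query_embedding, pair[0]),
                    reverse=True)
    return [doc for _, doc in ranked[:k]]

question = "What is your return policy?"
question_embedding = [0.95, 0.05, 0.0]   # would come from the embedding model
context = retrieve(question_embedding)
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
# `prompt` is what would be sent to the generative LLM, grounding its
# answer in your own data.
```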
Zain Hasan is a Senior Developer Advocate at Weaviate, an open-source vector database. He is an engineer and data scientist by training, who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies for elderly patients. He then founded his company, developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients using data from their medical devices. More recently, he practised as a consultant senior data scientist in Toronto. He is passionate about the field of data science and machine learning and loves to share his love for the field with anyone interested in the domain.
Demo Talk | In-person
Beacon Analytics helps customers transition from rigid and monolithic data solutions to flexible microservices architecture, enabling better performance and faster access to critical information. By breaking up data into smaller, independent services, customers gain greater access and modification capabilities. The team recommends using the Polars library, which is based on Apache Arrow, in combination with Dash Plotly to create easy to maintain, high-performance solutions at an excellent price-to-performance ratio. Join Danny Bharat, Senior Vice President of Analytics at Cedric Millar and co-founder of Beacon Analytics, as he shares how his team’s innovative approach to data solutions allows them to build comprehensive 360° intelligence and deliver actionable insights. Beacon Analytics empowers customers to achieve success in a rapidly changing business and technology landscape by utilizing schema-on-read approaches, unstructured data storage, and on-the-fly analysis and transformation.
Danny Bharat is a seasoned supply chain industry professional and the Senior Vice President of Analytics at Cedric Millar Integrated Solutions. As a co-founder of Beacon Analytics, powered by Cedric Millar, he leads a growing team of solutions architects and data scientists in delivering comprehensive business intelligence and supply-chain solutions for end-to-end operations. With a deep focus on corporate planning, strategy, and digital transformation, Danny has accumulated a wealth of experience in multiple industries. He is dedicated to encouraging continuous professional growth and development through mentorship. Danny strongly believes that leaders with technical competence are more effective, and he practices what he preaches by being a self-taught dabbler in Python and DAX languages. He is passionate about using his expertise to help businesses succeed and deliver exceptional results for their customers.
In-person passes are sold out online, but you can still buy a pass on-site
Save 20% on Full Price
Visionaries and Thought Leaders
With an AI Expo Pass you can take advantage of 40+ demo sessions and ODSC Keynotes. Our Speakers will provide compelling insights on how to make data science and AI work for your industry.
KEYNOTE SPEAKERS

Pedro Domingos, PhD
Pedro Domingos is a professor emeritus of computer science and engineering at the University of Washington and the author of The Master Algorithm. He is a winner of the SIGKDD Innovation Award and the IJCAI John McCarthy Award, two of the highest honors in data science and AI. He is a Fellow of the AAAS and AAAI, and has received an NSF CAREER Award, a Sloan Fellowship, a Fulbright Scholarship, an IBM Faculty Award, several best paper awards, and other distinctions. Pedro received an undergraduate degree (1988) and M.S. in Electrical Engineering and Computer Science (1992) from IST, in Lisbon, and an M.S. (1994) and Ph.D. (1997) in Information and Computer Science from the University of California at Irvine. He is the author or co-author of over 200 technical publications in machine learning, data mining, and other areas. He is a member of the editorial board of the Machine Learning journal, co-founder of the International Machine Learning Society, and past associate editor of JAIR. He was program co-chair of KDD-2003 and SRL-2009, and has served on the program committees of AAAI, ICML, IJCAI, KDD, NIPS, SIGMOD, UAI, WWW, and others. He has written for the Wall Street Journal, Spectator, Scientific American, Wired, and others. He helped start the fields of statistical relational AI, data stream mining, adversarial learning, machine learning for information integration, and influence maximization in social networks.
Secrets of Successful AI Projects (Keynote)

Eve Psalti
Eve Psalti is a 20+ year tech and business leader, currently a Senior Director in Microsoft’s Azure AI engineering organization, responsible for scaling and commercializing artificial intelligence solutions.
She was previously the Head of Strategic Platforms at Google Cloud where she worked with F500 companies helping them grow their businesses through digital transformation initiatives.
Prior to Google, Eve held business development, sales and marketing leadership positions at Microsoft and startups across the US and Europe leading 200-people teams and $600M businesses.
A native of Greece, she holds a Master’s degree and several technology and business certifications from London Business School and the University of Washington. Eve currently serves on the board of WE Global Studios, a full-stack startup innovation studio supporting female entrepreneurs.
Infuse Generative AI in your Apps Using Azure OpenAI Service (Keynote)

Daphne Koller, PhD

Raluca Ada Popa, PhD
Raluca Ada Popa is the Robert E. and Beverly A. Brooks associate professor of computer science at UC Berkeley working in computer security, systems, and applied cryptography. She is a co-founder and co-director of the RISELab and SkyLab at UC Berkeley, as well as a co-founder of Opaque Systems and PreVeil, two cybersecurity companies. Raluca has received her PhD in computer science as well as her Masters and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of the 2021 ACM Grace Murray Hopper Award, a Sloan Foundation Fellowship award, Jay Lepreau Best Paper Award at OSDI 2021, Distinguished Paper Award at IEEE Euro S&P 2022, Jim and Donna Gray Excellence in Undergraduate Teaching Award, NSF Career Award, Technology Review 35 Innovators under 35, Microsoft Faculty Fellowship, and a George M. Sprowls Award for best MIT CS doctoral thesis.
Protecting Sensitive Data Throughout the ML Pipeline using Confidential Computing (Track Keynote)

Hagay Lupesko
Hagay Lupesko is the VP of Engineering at MosaicML, where he focuses on making generative AI training and inference efficient, fast, and accessible. Prior to MosaicML, Hagay held AI engineering leadership roles at Meta, AWS, and GE Healthcare. He shipped products across various domains: from 3D medical imaging, through global-scale web systems, and up to deep learning systems that power apps and services used by billions of people worldwide.
Unlocking the Power of Large Language Models: Why Owning Your Own Model is Critical—and Within Reach (Keynote)

Jay Jackson
Jay is VP of the Artificial Intelligence and Machine Learning organization at Oracle Cloud. He completed a degree in neuroscience and started his career in technology at Oracle, maintaining the idea that these two paths would converge.
Unlocking the Power of Large Language Models: Why Owning Your Own Model is Critical—and Within Reach (Keynote)
AI EXPO SPEAKERS

Sydney Beckett
Sydney “Syd” became a graph enthusiast through her work with clients building graph-based solutions, and through supporting data science teams during her time at Deloitte and Accenture. Now she uses her graph expertise to help customers realize the value of graph technology for their organizations. She also contributes by teaching Neo4j graph database and data science training classes. Syd’s hobbies include interior design and defeating her car navigation system’s estimated drive time.
Session Title: Unlocking the Value of Graph Data Science in the Age of AI
Abstract:
Thinking about incorporating relationships into your data to improve predictions and machine learning models? Maybe you are creating a knowledge graph or looking for a way to improve customer 360, fraud detection, or supply chain performance. Relationships are highly predictive of behavior. With graphs, they’re embedded in the data itself, making it easy to unlock and add predictive capabilities in your existing practices.
Join us for a demo to learn why graph databases are a top choice for scalable analytics, intelligent app development and advanced AI/ML pipelines. We’ll showcase graphs using Neo4j’s enterprise-ready graph data platform. You’ll see firsthand how easy it is to get started and we’ll highlight a graph use case using Neo4j’s cloud platform for Graph Data Science. All attendees will get a link to download and try Neo4j for free using your own data.
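The point that relationships are predictive can be made concrete: a graph feature as simple as node degree (or PageRank) often boosts a tabular fraud or churn model. A tiny sketch with an invented transaction graph, computed in plain Python rather than Neo4j's Graph Data Science library:

```python
from collections import defaultdict

# Hypothetical money transfers: (sender, receiver)
edges = [
    ("alice", "bob"), ("bob", "carol"), ("dave", "bob"),
    ("eve", "bob"), ("carol", "alice"),
]

# Node degree: how many transfers each account participates in.
degree = defaultdict(int)
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

# Join the graph feature back onto a tabular customer dataset:
customers = [{"name": n, "degree": degree[n]} for n in sorted(degree)]
# "bob" touches 4 edges -- exactly the kind of hub a fraud model can
# only see when relationship structure is turned into a feature.
```

In a graph database this feature would come from a degree or centrality algorithm run in place, then be exported to the ML pipeline.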

Gary Nakanelua
Gary Nakanelua is a professional technologist with over 17 years of experience and the author of Experiment or Expire. Gary is the Managing Director of Innovation at Blueprint, a data intelligence company based in Bellevue, WA. He’s responsible for the experimentation and creation of Blueprint’s transformative solutions and accelerators. With his diverse background, Gary brings a different perspective to problems that businesses are facing today to create quantifiable solutions driven through a high level of collaborative thought processing, strategic planning, and cannibalization.
Streamlining Your Streaming Analytics with Delta Lake & Rust (Talk)

Brennan Smith
Brennan is an experienced Machine Learning professional with a background in Information Technology solutions, Business Analytics, big data and AI. Currently, he serves as Senior Machine Learning Engineer at Iguazio (acquired by McKinsey & Company), bringing his expertise to help Data Scientists, Data Engineers, and ML Engineers work together to deploy AI/ML applications faster, more efficiently and in a reproducible way. Before that he spent 8 years at SAS in various technology roles. Brennan holds a BS in Computer Science from UNC Wilmington and previously served in the Marine Corps. He lives in North Carolina with his family, and when he’s not tangling with big data for customers, he enjoys tangling with big fish!
Session Title: Building an ML Factory with OS MLOps Orchestration tool MLRun
Abstract:
MLRun is an open-source MLOps orchestration framework. It exists to accelerate the integration of AI/ML applications into existing business workflows. MLRun introduces Data Scientists to a simple Python SDK that transforms their code into a production-quality application. It does so by abstracting the many layers involved in the MLOps pipeline. Developers can build, test, and tune their work anywhere and leverage MLRun to integrate with other components of their business workflow.
The capabilities of MLRun are extensive, and we will cover the basics to get you started. You will leave this session with enough information to:
Get started with MLRun, on your own, in 10 minutes, so you can automate and accelerate your path to production and have your first AI app running in 20 minutes
Run locally, then move to Kubernetes
Understand how your Python code can run as a Kubernetes job with no code changes
Track your experiments
Get an introduction to advanced MLOps topics using MLRun

Pavel Klushin
Pavel Klushin is a seasoned solution architecture expert who currently leads the function at Qwak. With years of experience in the technology industry, he is known for his exceptional ability to design and deliver innovative solutions that meet the specific needs of his clients. Pavel previously led the solution architecture team at Spot (acquired by NetApp).
Session Title: End to End Machine Learning Pipeline Management
Abstract: Join this demo to find how to centralize your ML pipeline and cut down operational complexities at each stage along the way. Qwak’s platform supports multiple use cases across any business vertical and allows data teams to productionize their models more efficiently and without depending on engineering resources. Join us to watch how <presenter name> uses Qwak to create features from data and build, train and deploy models into production. All under a single platform and with unprecedented simplicity.

Zain Hasan
Zain Hasan is a Senior Developer Advocate at Weaviate, an open-source vector database. He is an engineer and data scientist by training, who pursued his undergraduate and graduate work at the University of Toronto, building artificially intelligent assistive technologies for elderly patients. He then founded his company, developing a digital health platform that leveraged machine learning to remotely monitor chronically ill patients using data from their medical devices. More recently, he practised as a consultant senior data scientist in Toronto. He is passionate about the field of data science and machine learning and loves to share his love for the field with anyone interested in the domain.

Allen Roush
Allen is a Principal Machine Learning Architect and AI Researcher working for Oracle Cloud Infrastructure.
Enabling MLOps at scale with Oracle Cloud (Talk)
Session Title: Open Source Generative AI: The Future of Game Asset Creation on Oracle Cloud
Abstract:
Oracle Cloud Infrastructure (OCI) is proud to showcase a new product demo of Stable Diffusion for game content creation using popular user interfaces and the 3D modeling tool Blender. OCI’s demo of Stable Diffusion is powered by NVIDIA A10 Tensor Core GPUs in the Oracle cloud. Stable Diffusion, an innovative deep learning model released in 2022, has been primarily used for generating detailed images based on text descriptions. However, its capabilities extend to creating game textures, models, depthmaps, skins, and other game content. Diffusion models can even be utilized for other modalities, enabling tasks as diverse as music generation. The combination of Stable Diffusion and Blender allows artists to create high-quality game assets with complete control over the creative process, while benefitting from quicker creative iterations. Artists can further train Stable Diffusion on their individual styles and develop complete workflows that allow for greater creative freedom and flexibility in game development.

Seth Juarez
My name is Seth Juarez. I currently live near Redmond, Washington and work for Microsoft.
I received my Bachelor’s Degree in Computer Science at UNLV with a Minor in Mathematics. I also completed a Master’s Degree at the University of Utah in the field of Computer Science. I am currently interested in Artificial Intelligence, specifically in the realm of Machine Learning, and work as a Program Manager in the Azure Artificial Intelligence Product Group.
I’ve been married now for 21 years to a fabulously talented woman and have two beautiful daughters, and two feisty sons.
Session Title: Ask the Experts! ML Pros Deep-Dive into Machine Learning Techniques and MLOps
Abstract: Experienced machine learning engineers and data scientists care about ways to easily get their models up and running quickly and share ML assets across teams for collaboration. Collaborate and streamline the management of thousands of models across teams with new, innovative features in Azure Machine Learning. Come and join us in this interactive session with our product experts and get your questions answered on the latest capabilities in Azure Machine Learning!

Cassie Thompson
Cassie is a Senior Product Marketing Manager at CloudFactory, serving as the bridge between technical expertise and creative communications for their AI data labeling products. She holds an MBA from North Carolina State University and has spent her career in product marketing roles for B2B technology companies.

Eric Vogelpohl
Eric Vogelpohl is the Managing Director of Tech Strategy at Blueprint. He’s a proven IT professional with more than 20 years of experience and a high degree of technical and business acumen. He has an insatiable passion for all-things-tech, pro-cloud/SaaS, leadership, learning, and sharing ideas on how technology can turn data into information & transform user experiences.
Session Title: Top 5 Cool Tricks of Delta for Data Scientists – Why Your Data Lake Should be a Delta Lake
Abstract:
In this 25-minute demo, we will explore the top 5 cool tricks of Delta for data scientists and discuss why your data lake should be a Delta Lake. Delta Lake is an open-source storage layer that brings reliability to data lakes by providing ACID transactions, scalable metadata handling, and data versioning. We will first introduce the concept of Delta Lake and explain how it helps data scientists to manage their data pipelines with ease. We will then dive into the top 5 cool tricks of Delta Lake, which include performance optimizations, time travel, schema enforcement, automatic data merging, and data validation. We will demonstrate these tricks using real-world examples and show how they can simplify your data pipeline and reduce your development time. By the end of this talk, you will have a better understanding of Delta Lake’s features and how it can help you to manage your data lake efficiently. You will also have learned about the benefits of using Delta Lake and why it’s a must-have for data scientists working with large data sets.

Andrew Cheesman
Andrew is the head of data science at Bigeye, a data observability company. Prior to joining Bigeye, Andrew built ML-powered tools for Citi and (as a consultant) a range of top consumer banks; he specialized in pricing and underwriting problems. In his free time, Andrew enjoys cooking, travel, and using his TVR Chimaera to escape New York.
Human-in-the-Loop: Strategies for Improving Time Series Anomaly Detection (Talk)
Session Title: Data Observability for Data Science Teams
Abstract:
When putting models into production it’s critical to know how they’re performing over time. As the last mile of the data pipeline, models can be impacted by a variety of issues, often outside the control of the data science team. “Observability” promises to help teams detect and prevent issues that could impact their models—but what is observability vs. data observability vs. ML observability? Get practical answers and recommendations from Kyle Kirwan, former product leader for Uber’s metadata tools, and founder of data observability company, Bigeye.
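At its simplest, data observability means continuously checking pipeline outputs against expectations, such as volume and null rates, so a broken upstream feed is caught before it reaches a model. A tool-agnostic sketch with hypothetical thresholds:

```python
# Minimal illustration of data-observability checks: validate volume and
# null rate of a batch before it feeds a model (thresholds are illustrative).
def check_batch(rows, min_rows=100, max_null_rate=0.05, column="amount"):
    issues = []
    if len(rows) < min_rows:
        issues.append(f"low volume: {len(rows)} < {min_rows}")
    nulls = sum(1 for r in rows if r.get(column) is None)
    null_rate = nulls / len(rows) if rows else 1.0
    if null_rate > max_null_rate:
        issues.append(f"null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return issues

healthy = [{"amount": i} for i in range(200)]
broken = [{"amount": None} for _ in range(200)]
print(check_batch(healthy))   # no alerts
print(check_batch(broken))    # null-rate alert
```

Platforms like Bigeye automate generating, scheduling, and alerting on checks like these across many tables.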

Mayank Kasturia
Mayank Kasturia is a Senior Sales Engineer at Precisely responsible for developing interactive demo applications, demonstrating proof of concepts (POC), and implementing solutions for customers using Precisely’s Geo Addressing, Spatial Analytics and Data Enrichment capabilities in big data and cloud-native environments.
Session Title: Leverage Geospatial Data and Machine Learning to Discover Hidden Insights – A Live Demo
Abstract:
Location intelligence can provide valuable insights by leveraging geospatial data in machine learning. This demonstration will showcase how machine learning and location information can work together to help organizations extract more value from their data. We will use a comprehensive suite of geocoding, spatial analytics, and data enrichment capabilities to visualize and analyze data, identify patterns, and derive insights that can be used to make informed business decisions.
In this session, we will use Amazon SageMaker to train a machine learning model using property attributes, historical weather data, and fire data. The goal will be to predict the fire risk for a property. You will see how quickly and efficiently we can build and train a machine learning model using various algorithms, such as decision trees and neural networks, to find the best approach for our dataset. While this demo highlights fire prediction, these location intelligence solutions can be applied across multiple industries, including financial services, telecommunications, insurance, retail, real estate, and more. We look forward to discussing how you can leverage geospatial data in your machine learning models.
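The modeling step can be pictured with a toy decision rule, a hand-written stand-in for the decision trees the demo trains in SageMaker; the features and thresholds below are invented for illustration:

```python
# Toy stand-in for a fire-risk model: a hand-written decision "tree" over
# invented property and weather features (thresholds are illustrative only).
def fire_risk(vegetation_density, days_since_rain, roof_material):
    if days_since_rain > 30 and vegetation_density > 0.6:
        return "high"
    if roof_material == "wood" and days_since_rain > 14:
        return "medium"
    return "low"

print(fire_risk(0.8, 45, "tile"))   # dry spell + dense vegetation -> "high"
print(fire_risk(0.2, 20, "wood"))   # wood roof during a dry spell -> "medium"
print(fire_risk(0.1, 3, "tile"))    # recent rain -> "low"
```

A trained model learns such split points from data rather than having them hand-coded, which is what the SageMaker demo shows at scale.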

Drazen Dodik
Bio Coming Soon!
Session Title: Driving AI Forward: Continental Tire’s Journey to MLOps Excellence
Abstract:
In this session, we will hear from Continental Tire about their journey towards implementing MLOps since 2015. We will explore how they enable data scientists from diverse backgrounds to easily build models with the languages, frameworks, and tools they are comfortable with.
The session will delve into the challenges faced by Continental Tire’s data science teams, and the strategies they have used to address them. Additionally, the session will cover important considerations for those starting on their MLOps journey, including what to keep in mind when building infrastructure and workflows for data science projects.
The session will conclude with a demo and overview of the Valohai platform, which has been used by Continental Tire to streamline their MLOps workflows.

Philip Wauters
Philip Wauters is Customer Success Manager and Value engineer at Tangent Works working on practical applications of time series machine learning at customers from various industries such as Siemens, BASF, Borealis and Volkswagen. With a commercial background and experience with data engineering, analysis and data science his goal is to find and extract the business value in the enormous amounts of time-series data that exists at companies today.
Learn how to Efficiently Build and Operationalize Time Series Models in 2023 (Workshop)
Demo Talk: The Tangent Information Modeler, time series modeling reinvented
Abstract:
Modeling time series data is difficult due to its large quantities and constantly evolving nature. Existing techniques have limitations in scalability, agility, explainability, and accuracy. Despite 50 years of research, current techniques often fall short when applied to time series data. The Tangent Information Modeler (TIM) offers a game-changing approach with efficient and effective feature engineering based on Information Geometry. This multivariate modeling co-pilot can handle a wider range of time series use cases with award-winning results and incredible performance.
During this demo session we will showcase how best-in-class and very transparent time series models can be built with just one iteration through the data. We will cover several concrete use cases for advanced time series forecasting, anomaly detection and root cause analysis.
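Anomaly detection, one of the use cases above, is often bootstrapped with something as simple as a rolling z-score before a dedicated modeler like TIM is brought in. A self-contained sketch:

```python
import statistics

# Simple rolling z-score anomaly detector: flag points that deviate more
# than `threshold` standard deviations from the trailing window's mean.
def detect_anomalies(series, window=5, threshold=3.0):
    anomalies = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mean, stdev = statistics.mean(past), statistics.pstdev(past)
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

data = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(detect_anomalies(data))   # index 6 (the spike to 50) is flagged
```

Approaches like TIM go further by engineering multivariate features automatically, but the baseline above is a useful sanity check on any series.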

Kaitlyn Abdo
Kaitlyn Abdo is an Associate Technical Marketing Manager, working on technical enablement surrounding the AI/ML products and services at Red Hat. She has been at Red Hat for 2 years, and is interested in discovering and learning about new and innovative solutions in the AI/ML space. In her free time, Kaitlyn enjoys building Legos, cooking and spending time with animals.

Danny Bharat
Danny Bharat is a seasoned supply chain industry professional and the Senior Vice President of Analytics at Cedric Millar Integrated Solutions. As a co-founder of Beacon Analytics, powered by Cedric Millar, he leads a growing team of solutions architects and data scientists in delivering comprehensive business intelligence and supply-chain solutions for end-to-end operations. With a deep focus on corporate planning, strategy, and digital transformation, Danny has accumulated a wealth of experience in multiple industries. He is dedicated to encouraging continuous professional growth and development through mentorship. Danny strongly believes that leaders with technical competence are more effective, and he practices what he preaches by being a self-taught dabbler in Python and DAX languages. He is passionate about using his expertise to help businesses succeed and deliver exceptional results for their customers.
Demo Session Title: Achieving Flexibility and Speed with Schema-on-Read Architecture: Moving Beyond SQL and RDBMS
Abstract:
Beacon Analytics helps customers transition from rigid and monolithic data solutions to flexible microservices architecture, enabling better performance and faster access to critical information. By breaking up data into smaller, independent services, customers gain greater access and modification capabilities. The team recommends using the Polars library, which is based on Apache Arrow, in combination with Dash Plotly to create easy to maintain, high-performance solutions at an excellent price-to-performance ratio. Join Danny Bharat, Senior Vice President of Analytics at Cedric Millar and co-founder of Beacon Analytics, as he shares how his team’s innovative approach to data solutions allows them to build comprehensive 360° intelligence and deliver actionable insights. Beacon Analytics empowers customers to achieve success in a rapidly changing business and technology landscape by utilizing schema-on-read approaches, unstructured data storage, and on-the-fly analysis and transformation.
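"Schema-on-read" simply means the shape of the data is interpreted at query time rather than enforced at write time. The Polars/Arrow stack above is one high-performance implementation; the idea itself fits in a few lines of standard-library Python:

```python
import json

# Schema-on-read in miniature: store raw, heterogeneous JSON records as-is,
# and project them onto a schema only at read/query time.
raw_store = [
    '{"order_id": 1, "amount": 250.0, "region": "east"}',
    '{"order_id": 2, "amount": 90.5}',                             # no region
    '{"order_id": 3, "amount": 410.0, "region": "west", "rush": true}',
]

def read_with_schema(store, fields):
    """Apply a schema on read: pick the requested fields, default to None."""
    return [{f: json.loads(rec).get(f) for f in fields} for rec in store]

rows = read_with_schema(raw_store, ["order_id", "region"])
print(rows)   # missing fields surface as None instead of failing the load
```

With a schema-on-write RDBMS, the second record would have been rejected or forced into a migration; here the schema decision is deferred to each consumer.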

Raghu Marwaha
Raghu Marwaha has been in the IT industry for the past 25 years. His recent interest in AI has led to a quest to find the best tools for applications that could benefit from recent advances in AI. As a director at IntraEdge, he oversees multiple IT projects, products, and teams.
Session Title: Easy tools for Business Professionals to quickly build Ontologies, Data Taxonomies, and Keyword Lists.
Abstract:
Yet another meeting to discuss hairsplitting details for your data taxonomies and keyword lists?
It shouldn’t take a team of domain experts, Excel specialists, Python developers and Data Scientists weeks or months to build it. It is a simple problem that requires a simple solution.
You should be able to quickly and accurately analyze contracts, customer comments and any other text-based content while easily building explainable NLP models.
Stop scrubbing through volumes of data to find key examples and then reducing the content to specific keywords and variations.
Join us as we explore a new and exciting solution using human language to easily develop ontologies, data taxonomies and keyword lists, which you can share across your business with just a few simple clicks.
Accelerate these NLP tasks in every project and help eliminate those long-drawn meetings to discuss keywords for data taxonomies. Unless you enjoy those nuisance meetings 🙂
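Whatever tool produces it, the artifact itself is simple: a taxonomy maps categories to keyword variants, which can then tag free text. A hypothetical minimal form, with invented categories and terms:

```python
# A keyword taxonomy in its simplest form: category -> keyword variants,
# applied to tag free text (categories and terms are invented examples).
taxonomy = {
    "payment_terms": ["net 30", "net 60", "payment due"],
    "termination":   ["terminate", "termination", "cancel the agreement"],
}

def tag_text(text, taxonomy):
    """Return the sorted categories whose keywords appear in the text."""
    text = text.lower()
    return sorted(cat for cat, keywords in taxonomy.items()
                  if any(kw in text for kw in keywords))

print(tag_text("Either party may terminate with notice; invoices are Net 30.",
               taxonomy))   # ['payment_terms', 'termination']
```

The hard part, and what the session's tooling targets, is building and maintaining the taxonomy dictionary itself without weeks of meetings.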

Dani Herzberg
Dani Herzberg is an Analyst on the Product Management and Development team at S&P Global Market Intelligence. On this team, she creates notebooks in Databricks, assists in analytic visualizations of S&P Global data, and provides SQL query support. She holds a Master of Science in Business Analytics from Georgetown University.
Session Title: Data Visualizations Utilizing S&P Global Marketplace Workbench
Abstract:
We will be using the Plotly library to create visualizations in S&P Global Marketplace Workbench, which is powered by Databricks, and showcasing a Databricks Dashboard built from the different charts. This demo talk is best suited for a beginner-to-intermediate audience.

Albert Vu
Albert applies machine learning and big data skills to solving financial optimization problems. He has developed projects of varying skill levels for Taipy's tutorial videos. He holds a Bachelor of Science from McGill University, with a major in Computer Science & Statistics and a minor in Finance.
How to build stunning Data Science Web applications in Python – Taipy Tutorial (Workshop)
Bringing AI to Retail and Fast Food with Taipy’s Applications (Track Keynote)
Demo Talk Session Title: Turning your Data/AI Algorithms into full web apps in no time with Taipy
Abstract:
In the Python open-source ecosystem, many packages are available that cater to:
– the building of great algorithms
– the visualization of data
Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, a new open-source Python framework, Data Scientists/Python Developers are able to build great pilots as well as stunning production-ready applications for end-users.
Taipy provides two independent modules: Taipy GUI and Taipy Core.
In this talk, we will demonstrate how:
1. Taipy GUI goes way beyond the capabilities of the standard graphical stack: Gradio, Streamlit, Dash, etc.
2. Taipy Core fills a void in the standard Python back-end stack.

Lior Durahly
Lior Durahly is a data and ML engineer at Superwise, where he is responsible for researching and developing monitoring capabilities related to Responsible AI, including feature importance, fairness, and explainability. He is also the key contributor to the OSS package Elemeta, a meta-feature extractor for NLP and vision. Prior to Superwise, Lior held positions as a software and data engineer at APM observability leader Coralogix and as a data science engineer in the Israeli Defense Forces’ 8200 intelligence unit. He is currently in the second year of a B.Sc. in Computer Science (with a focus on Data Science) at the Open University of Israel. He is also passionate about physics and medicine and how they intersect with artificial intelligence. In his free time, Lior studies violin, a passion he picked up only last year, or can be found hunting for eateries in Tel Aviv with Asian food or unique themes.
Session Title: Introducing Elemeta: OSS meta-feature extractor for NLP and vision
Abstract:
With DALLE and ChatGPT, we have reached incredible capabilities and results, fundamentally changing our ability to tap into and leverage unstructured data in machine learning. With that said, general architectural understanding and intuition into how these models make decisions do not translate into minute detail interpretability.
We’re at a crossroads. This new “breed” of ML applications is here to stay, and unstructured data is only growing, but they are black boxes, and black boxes fail silently. So how can we as practitioners leverage NLP and vision while enjoying similar monitoring, interpretability, and explainability available to their tabular counterparts?
In this talk, we will introduce Elemeta, our OSS meta-feature extractor library in Python, which applies a structured approach to unstructured data by extracting information from text and images to create enriched tabular representations. With Elemeta, practitioners can utilize structured ML monitoring techniques in addition to the typical latent embedding visualizations and engineer alternative features to be utilized in simpler models such as decision trees.
In this talk, we’ll introduce you to Elemeta through a live notebook example and explain how it can be applied to text and images.
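The core idea, turning unstructured text into an enriched tabular representation, can be sketched without the library itself (the feature names below are invented for illustration; Elemeta's actual extractors are richer):

```python
# Sketch of meta-feature extraction for text, in the spirit of Elemeta:
# map unstructured strings to a tabular row of simple, monitorable features.
def extract_meta_features(text):
    words = text.split()
    return {
        "char_count": len(text),
        "word_count": len(words),
        "avg_word_length": round(sum(len(w) for w in words) / len(words), 2)
                           if words else 0.0,
        "exclamation_count": text.count("!"),
    }

row = extract_meta_features("Elemeta makes black boxes easier to watch!")
print(row)
```

Rows like this can be fed to tabular drift monitors or simple models such as decision trees, which is exactly the monitoring bridge the talk describes.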

Florian Jacta
Florian Jacta is a specialist of Taipy, a low-code open-source Python package enabling any Python developer to easily build a production-ready AI application; he supports the package’s pre-sales and after-sales functions. He is a data scientist for Groupe Les Mousquetaires (Intermarché) and ATOS, where he developed several predictive models as part of strategic AI projects. Florian holds a master’s degree in Applied Mathematics from INSA, with a major in Data Science and Mathematical Optimization.
How to Build Stunning Data Science Web applications in Python – Taipy Tutorial (Workshop)
Bringing AI to Retail and Fast Food with Taipy’s Applications (Track Keynote)
Demo Session Title: Turning your Data/AI algorithms into full web apps in no time with Taipy
Abstract:
In the Python open-source ecosystem, many packages are available that cater to:
– the building of great algorithms
– the visualization of data
Despite this, over 85% of Data Science Pilots remain pilots and do not make it to the production stage.
With Taipy, a new open-source Python framework, Data Scientists/Python Developers are able to build great pilots as well as stunning production-ready applications for end-users.
Taipy provides two independent modules: Taipy GUI and Taipy Core.
In this talk, we will demonstrate how:
1. Taipy GUI goes way beyond the capabilities of the standard graphical stack: Gradio, Streamlit, Dash, etc.
2. Taipy Core fills a void in the standard Python back-end stack.

Greg West
A member of CSI for a decade, Greg has developed a wealth of expertise in knowledge graph technology. His true specialty lies in demonstrating and developing custom solutions that leverage Anzo’s unique capabilities.
Session Title: Accelerating AI/ML Initiatives with Knowledge Graph
Abstract: Integrating and unifying data from diverse sources is foundational to AI and ML workflows. This workshop will demonstrate how Anzo’s knowledge graph platform can create an enterprise scale knowledge graph from several sources – setting organizations up for sustainable success with collective intelligence. During this workshop, users will:
Create a sample knowledge graph from several sources.
Demonstrate flexible data preparation for training datasets.
Analyze the knowledge graph with native visualizations and graph algorithms.
Connect to the knowledge graph for additional data science operations.
From its hyper agile in-memory MPP graph engine to its point-and-click user experience and open flexible architecture, Anzo transcends the limitations of traditional knowledge graphs and gives you all the capabilities and flexibilities that complex, enterprise-scale solutions need.
Join this demo to see why Anzo might be the solution you need.
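Underneath any knowledge graph platform sits the same structure: entities as nodes, typed relationships as edges, and queries as traversals. A toy standard-library version of those pieces (Anzo's actual engine and query language are far richer; the entities below are invented):

```python
from collections import defaultdict

# Toy knowledge graph: (subject, predicate, object) triples plus a lookup.
triples = [
    ("Acme Corp", "supplies", "Widget A"),
    ("Widget A", "contains", "Part X"),
    ("Beta LLC", "supplies", "Part X"),
    ("Part X", "made_in", "Plant 7"),
]

graph = defaultdict(list)
for subj, pred, obj in triples:
    graph[subj].append((pred, obj))

def neighbors(entity, predicate=None):
    """Query: follow edges from an entity, optionally filtered by predicate."""
    return [obj for pred, obj in graph[entity]
            if predicate is None or pred == predicate]

print(neighbors("Acme Corp"))              # ['Widget A']
print(neighbors("Widget A", "contains"))   # ['Part X']
```

The value of a platform comes from doing this at enterprise scale: loading the triples from many sources, indexing them in memory, and exposing graph algorithms over them.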

Kristen Kehrer
Kristen is a Developer Advocate at CometML. Since 2010, Kristen has been delivering innovative and actionable statistical modeling solutions in the utilities, healthcare, and eCommerce industries. Kristen was a LinkedIn Top Voice – Data Science & Analytics in 2018. Previously Kristen was Faculty/SME at Emeritus Institute of Management and Creator of Data Moves Me, LLC. Kristen holds an MS in Applied Statistics from Worcester Polytechnic Institute and a BS in Mathematics.
Session Title: On the Scent: Detecting Dogs on Edge Devices With YOLOv8 and Comet
Abstract:
Proper tracking is crucial for ensuring the reproducibility of results obtained during model development and fostering effective collaboration among multiple developers on a machine learning project. In this talk, Kristen will discuss the process of developing a dog detection system using YOLOv8 on edge devices and the role of Comet, an experiment management platform, in handling the intricacies of the project.
Kristen will guide you through the entire process, from generating a data artifact to deploying the model, emphasizing the benefits of utilizing Comet at each stage. She will showcase how Comet was employed to monitor experiment metrics, visualize model performance, and illustrate the ease with which the selected model can be tracked in production. Participants will gain valuable insights on how to leverage an experiment tracking and monitoring solution like Comet to enhance their model development process, making it more transparent and reproducible.
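Tooling aside, the bookkeeping an experiment tracker automates is easy to picture: log each run's parameters and metrics, then promote the best run. A tool-agnostic sketch with illustrative model names and metric values (Comet adds persistent storage, dashboards, and production monitoring on top):

```python
# Tool-agnostic sketch of experiment tracking: record each run's params and
# metrics, then pick the best run to promote (values are illustrative).
runs = []

def log_run(params, metrics):
    runs.append({"params": params, "metrics": metrics})

log_run({"model": "yolov8n", "epochs": 50},  {"mAP50": 0.81})
log_run({"model": "yolov8s", "epochs": 50},  {"mAP50": 0.87})
log_run({"model": "yolov8s", "epochs": 100}, {"mAP50": 0.85})

best = max(runs, key=lambda r: r["metrics"]["mAP50"])
print(best["params"])   # the run to deploy
```

Without this record, reproducing "the model that worked" weeks later becomes guesswork, which is the problem the talk addresses.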

Hiro Kobashi
Hiro Kobashi is the head of the Artificial Intelligence Division at Fujitsu Research of America, where he leads a team of researchers in both the United States and Japan working on AutoML (Automation for Machine Learning) to realize sustainable and efficient AI creation. He joined Fujitsu in 2003 and has worked at Fujitsu research organizations in both Japan and the United Kingdom. His research interests include artificial intelligence, machine learning, and distributed systems.
Session Title: Fujitsu AI Innovation Platform: Advanced AI Technologies Ready for Customer Adoption
Abstract:
In this session, we will introduce Fujitsu’s unique and advanced AI technologies, which are being demonstrated at the Fujitsu booth as part of the Fujitsu AI Innovation Platform. The first is Actlyzer, a technology that automatically senses human behavior and the relationships between people and their environment, and predicts future actions via human and context sensing, supporting applications in many industries including retail, security, and manufacturing. Second, we will present Fujitsu’s AutoML technology for structured data, which creates high-quality ML models quickly with less data and limited resources, and automatically generates production-ready ML code, accelerating AI adoption by enterprises. Finally, we will present the Galileo XAI solution, jointly developed by Fujitsu and our partner Larus. Galileo XAI enables the extraction of insights, with built-in explainability, from graphs, which are ubiquitous in today’s connected world, leading to several practical applications including fraud detection, business process optimization, pandemic tracking, and threat analysis.

Bob Foreman
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
EXTRA EVENTS
ODSC Networking Reception
Wednesday, April 20th, 5:00 PM to 7:00 PM EST
Socialize with fellow attendees as you recount the day’s talks and workshops with a few well-deserved drinks and small bites. Network, connect and collaborate with those leading the future of data science and AI.
Women in Data Science Ignite
April 20th, 12:30 – 1:15 PM EST
Women in Data Science Ignite Session fuels creativity and innovation among conference attendees. Fast-paced, short presentations will allow YOU to pitch a unique, interesting project you’re working on.
AI Startups Showcase
April 20th-21st, 10:00 AM – 5:00 PM EST
Join AI Startups Showcase to meet with innovative founders and learn about new AI technologies reinventing industries.
Book Signing
Wednesday, April 20th and Thursday, April 21st
Featuring industry-leading authors working at the forefront of AI, this session will give attendees an opportunity to learn about critical Data Science concepts, approaches, supported programming languages, and their related packages.
AI Investors Reverse Pitch
April 20th, 4:30 – 5:30 PM EST
At the AI Investors Reverse Pitch, you’ll hear top investment firms & VCs explain why YOUR Startup should choose THEM, not the other way around. Learn what top firms look for in startups when they consider investing.
Showcase and Speak at ODSC AI Expo
Request brochure
2023 PARTNERS
ODSC is proud to partner with numerous industry leaders providing organizations with the tools to accelerate digital transformation with AI. You can reach out to our Expo partners prior to the event for more information.
PLATINUM PARTNERS
GOLD PARTNERS
SILVER PARTNERS
NETWORKING PARTNERS
MEDIA & COMMUNITY PARTNERS
Interested in Partnering with ODSC?
Across 2021 and 2022, ODSC welcomed nearly 20,000 attendees to an unparalleled range of events, from large conferences and hackathons to small community gatherings.
*Limited Booth Availability
Who Should Attend?
The AI Expo & Demo Hall gathers executives, business professionals, experts, and data scientists who are transforming the enterprise with Artificial Intelligence.
Business Leaders and Executives: Chief Data Scientists, Chief AI Officers, CDO, CIO, CTO, VPs of Engineering, R&D, Marketing, Business Development, Product, Development, Data
Directors of Data Science, Data Analytics Managers, Heads of Data and Innovation; Software, IT, and Product Managers
Data Science Professionals: Data Scientists, Data Engineers, Data Analysts, Architects, ML and DL Experts, Database Admins
Software Development Experts: Software Architects, Engineers, and Developers
ARE YOU AN EARLY-STAGE STARTUP?
Companies in Attendance
Connect with like-minded professionals to learn about the latest languages, tools, and frameworks in data science and AI. Here’s a sampling of companies that have attended ODSC events.
Participate at ODSC East 2023
As part of the global data science community we value inclusivity, diversity, and fairness in the pursuit of knowledge and learning. We seek to deliver a conference agenda, speaker program, and attendee participation that moves the global data science community forward with these shared goals. Learn more on our code of conduct, speaker submissions, or speaker committee pages.
ODSC Newsletter
Stay current with the latest news and updates in open source data science. In addition, we’ll inform you about our many upcoming Virtual and in person events in Boston, NYC, Sao Paulo, San Francisco, and London. And keep a lookout for special discount codes, only available to our newsletter subscribers!