May 9th-11th, 2023
Data Engineering & Big Data Track
Learn the latest models, advancements, and trends from the top practitioners and researchers behind Data Analytics
Big Data has seen rapid advances in recent years. With some of the sharpest minds in data science presenting, you’ll learn the latest techniques and processes for analyzing raw data and automating data into mechanical processes and algorithms, and hear use cases focused on how data can be used to optimize business performance.
This focus area will cover many of the techniques for drawing conclusions and insights from raw data. You’ll learn from leading experts in the field and leave the conference understanding how to analyze data more efficiently and accurately, putting your knowledge of SQL, Python, and Data Storytelling to work.
Some of Our Past Data Engineering & Big Data Speakers

Adrien Treuille, PhD
Adrien is co-founder and CEO of Streamlit. Previously, Dr. Treuille was VP of Simulation at Zoox, led a Google X project, and was a Professor of Computer Science and Robotics at Carnegie Mellon. He gives talks around the world, including to the President’s Council of Advisors on Science and Technology, and has won numerous scientific awards, including the MIT TR35. Adrien and his work have been featured in the documentaries “What Will the Future Be Like” by PBS/NOVA and “Lo and Behold” by Werner Herzog.
Streamlit: Next-generation Communication of Data Insights (Workshop)

Srinivasa Kadamati
Srini is a PMC member for the Apache Superset project and heads up community & developer relations efforts at Preset. Before joining Preset, Srini worked as a data scientist and a data science educator for over 6 years. Most recently, he was the first employee at Dataquest and helped lead educational content, engineering, and product efforts there.
Deep Dive Workshop for Apache Superset (Workshop)

Karin Wolok
Karin currently leads developer community programming on the Developer Relations team at StarTree. Karin began her career in entertainment marketing, working with names like Eminem and Live Nation. She also launched a successful professional women’s network in two major U.S. cities, organized events for her local Data Science meetup, and helped lead an ongoing hackathon to put machine learning in the hands of cancer biologists. Her journey working in data eventually led her to a position as Program Manager for Community Development at Neo4j, the leading graph database in the world. Most recently, she was brought on to StarTree to improve the adoption and success of its overall developer community.
Real-Time Analytics: Going Beyond Stream Processing with Apache Pinot (Workshop)

Michael Mahoney, PhD
Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also an Amazon Scholar as well as a faculty scientist at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught at Yale University in the mathematics department, at Yahoo Research, and at Stanford University in the mathematics department. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council’s Committee on the Analysis of Massive Data, he co-organized the Simons Institute’s fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute’s 2016 PCMI Summer Session on The Mathematics of Data, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
WeightWatcher, an Open-Source Diagnostic Tool for Analyzing Deep Neural Nets (Talk)

Amir Meimand, PhD
Amir Meimand is a Principal Solution Engineer on the Salesforce strategic solution team, focusing on Data Science and Machine Learning. Amir has 10+ years of experience building, deploying, and applying advanced analytics to solve enterprise business problems. Previously, he was the Director of Data Science at Zilliant, a SaaS company providing machine learning solutions for price optimization and sales maximization that was recently acquired by Madison Dearborn. Amir’s current focus is scaling advanced analytics solutions by democratizing data science and machine learning. Amir received his Ph.D. in Statistics and Operations Research from Pennsylvania State University in 2013.
Bridging the Gap Between Data Scientists and Business Users (Workshop)

Itai Yaffe
Itai Yaffe is a Senior Solutions Architect at Databricks. Prior to Databricks, Itai was a Principal Solutions Architect at Imply, and before that – a big data tech lead at Nielsen Identity, where he dealt with big data challenges using tools like Spark, Druid, Kafka, and others. He is also a part of the Israeli chapter’s core team of Women in Big Data. Itai is keen on sharing his knowledge and has presented his real-life experience in various forums in the past.
A Bamboo of Pandas: Crossing Pandas’ Single-machine Barrier with Apache Spark (Talk)

Frank DeFalco
Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development, where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open source architecture working group. Frank is a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorial classes on architecture and common data model vocabulary. In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open source software repositories developed and released by OHDSI, including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios, and others. Frank’s areas of expertise include computational epidemiology, large-scale data platforms, software development and architecture, data visualization, and informatics. Prior to joining Janssen Research and Development, Frank held the position of Senior Principal and Director of Collaboration and Analytics at British Telecom, where he was a strategic advisor for multiple Fortune 100 companies across sectors including Consumer Products, Telecommunications, and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology from Rutgers University.

Ryan Blue
Ryan is the co-creator of Apache Iceberg and spent the last decade working on big data infrastructure at Netflix, Cloudera, and now Tabular. He is an ASF member and a committer in the Apache Parquet, Avro, and Spark communities.

John Peach
A modern polymath, John holds advanced degrees in mechanical engineering, kinesiology and data science, with a focus on solving novel and ambiguous problems. As a senior applied data scientist at Amazon, John worked closely with engineering to create machine learning models to arbitrate chatbot skills, entity resolution, search, and personalization.
As a principal data scientist for Oracle Cloud Infrastructure, he is now defining tooling for data science at scale. John frequently gives talks on best practices and reproducible research. To that end, he has developed an approach to improve validation and reliability by using data unit tests and has pioneered Data Science Design Thinking. He also coordinates SoCal RUG, the largest R meetup group in Southern California.
Tired of Cleaning your Data? Have Confidence in Data with Feature Types (Workshop)

Max Urbany
As Max progresses through his Master’s Program, he is particularly interested in intelligent digital accessibility design, along with the ethical analysis of existing predictive models. His passion for creating quality user-centered tools drives him to understand as much as he can about end users while leveraging what data can reveal.
Z by HP Panel Discussion on the Diverse Role of Data Science in Education (Talk)

Dan Chaney
Dan Chaney is the VP, Enterprise AI / Data Science Solutions, for Future Tech Enterprise, Inc., an award-winning global IT solutions provider. He oversees all sales, marketing, and technical activities focused on Future Tech’s comprehensive range of AI and data science workstation solutions. Prior to joining Future Tech, Dan spent 20 years at Northrop Grumman, most recently serving as the company’s Enterprise Director of IT Solution Architecture & Engineering. Dan earned his bachelor’s and master’s degrees in communication and computer science from the University of Kentucky. Dan is a Certified Information Systems Security Professional (CISSP) and adjunct instructor for the University of Louisville’s cybersecurity workforce program sponsored by the National Centers of Academic Excellence in Cybersecurity.
Z by HP Panel Discussion on the Diverse Role of Data Science in Education (Talk)

Kristin Hempstead
Kristin has been with HP for 11 years and is currently the North America business development manager for HP’s data science and artificial intelligence solutions, focusing on federal, education, and public sector customers. She has an MBA from the University of South Florida with a specialization in Finance and MIS, and a BS in Agriculture from the University of Georgia.
Z by HP Panel Discussion on the Diverse Role of Data Science in Education (Talk)

Anais Dotis-Georgiou
Anais Dotis-Georgiou is a Developer Advocate for InfluxData with a passion for making data beautiful with the use of Data Analytics, AI, and Machine Learning. She takes the data that she collects, does a mix of research, exploration, and engineering to translate the data into something of function, value, and beauty. When she is not behind a screen, you can find her outside drawing, stretching, boarding, or chasing after a soccer ball.
InfluxDB: The Database for Your Time Series Data Science Problems (Demo Talk)

Seth Wiesman
Bio Coming Soon!

Jay Lowe
Jay is a field engineer with a background in deep learning, full stack development, and marine research. At Roboflow, he combines technical CV skills with business acumen to help customers rapidly build value and empower developers to integrate CV into their own applications.

Daniel Haviv
Throughout his career, Daniel Haviv has worked with a multitude of companies, helping them solve their data challenges, most recently as a Senior Solutions Architect at Databricks and as an Analytics Specialist SA at AWS.
A Bamboo of Pandas: Crossing Pandas’ Single-machine Barrier with Apache Spark (Talk)

Bob Foreman
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote training.
Relational Dataset Analytics for Clear Customer Insights (Workshop)

James Olejniczak
James Olejniczak is a Product Manager on the Data Management Solutions global product management team at S&P Global Market Intelligence. He leads multiple teams in developing visualizations for the S&P Global Marketplace platform. As the BI tool expert, he is responsible for guiding initiatives and helping clients bridge the gap between highly structured data and data structured for optimized BI ingestion. Before moving into product management, he gained a broad range of experience during his eight years at S&P Global, from fundamental data collection to the onboarding and ongoing support of feed clients. Mr. Olejniczak holds a bachelor’s degree in Business Finance from the Metropolitan State University of Denver.
Accelerating Your Advanced Analytics & ESG (Environmental, Social, and Governance) Journey (Talk)

Gavin McCormick
While a PhD student in energy econometrics at UC Berkeley, Gavin McCormick invented “Automated Emissions Reduction”: software that instantly reduces pollution from smart devices such as electric vehicles and smart thermostats. Today he is Executive Director of WattTime, a nonprofit that helps Fortune 500 companies and governments use this technology to lower their carbon footprint at scale. He is also a cofounder of Climate TRACE: a coalition of nonprofits, tech companies, and universities using satellites and AI to monitor every source of greenhouse gas emissions on Earth.
Timing IoT Devices to Slash Carbon Emissions at Scale (Business Talk)

Denis Coady
An experienced product manager with a demonstrated history of providing valuable products and services in the big data and AI/ML industry, Denis currently serves as a Technical Product Manager for Molecula. Denis is driven to empower organizations with easier-to-use data products and to make cutting-edge advancements accessible to more people. He has a strong engineering background that informs his work. He most recently worked as a Senior Solutions Architect at Cloudera, and has previous experience at IBM, Microsoft, and Boeing.
A New Data Format to Deliver Real-Time Data at Massive Scale (Demo Talk)

Kirk DeBaets
Kirk DeBaets is a Senior Solution Engineer at Clarifai. He has an MBA and a passion for turning technologies into positive business outcomes. A former VP of Database Engineering in both the Investment Bank and Global Technology lines of business at JP Morgan Chase, he has spent the last several years working with customers to derive business value from their AI/ML investment.
Demystifying AI — Everything You Need to Know for Successful Deployment(Demo Talk)

Ben Amaba, Ph.D.
Dr. Ben Amaba is focused on AI, IoT, Data, and Edge Computing. Ben received his Ph.D. in Industrial Engineering from the University of Miami. Dr. Amaba is a registered and licensed Professional Engineer with International Registry; certified in Production, Operations, and Inventory Management by APICS ®; a LEED® Accredited Professional (Leadership in Energy & Environmental Design); and certified in Corporate Strategy by the Massachusetts Institute of Technology. Ben holds a copyright and several patents. Ben earned his BS in Electrical Engineering as well as his Master’s in Engineering/Industrial Management. Dr. Amaba holds positions as Board Member of the Oakland University Artificial Intelligence Research Center (OUAIRC), Founding Member of the Institute of Advanced Systems Engineering, Founding Member of the Center of Advanced Supply Chain Management, Industry Council Advisor for the Project Production Institute, Industrial Engineering Fellow, Board Member of the Council on Industrial and Systems Engineering (CISE), Executive Board Member of Applied Human Factors and Ergonomics (AHFE), Editorial Board Member of IEEE (Institute of Electrical and Electronics Engineers) IT Professionals, and Editorial Board Member of The Open Cybernetics and Systemics Journal.
Demystifying AI — Everything You Need to Know for Successful Deployment(Demo Talk)

Fletcher Berryman
Fletcher Berryman is a lifelong geographer currently serving as a product manager for SafeGraph with a focus on international spatial data. At work and beyond, Berryman is most drawn to research questions that involve the intersection of geographers’ traditional considerations of “space and place” with modern technologies previously unavailable for use in examination, especially in developing economies. Outside of SafeGraph, Berryman is a co-chair of the world’s largest geospatial meetup (GeoNYC) and a research associate at the University of Chicago’s Center for Spatial Data Science.
Analyzing Dynamic Global Markets with Places Data (Talk)
Perform Detailed Spatial Analysis with SafeGraph and CARTO (Demo Talk)

Ehsan Khodabandeh, PhD
Ehsan is a Principal Operations Research Scientist at Decision Spot, with expertise in the logistics and transportation industries. Over the years, he has worked with several Fortune 500 companies, including GE, Norfolk Southern, and C.H. Robinson. Ehsan has worked on a variety of supply chain projects and has focused primarily on network optimization and routing. Before joining Decision Spot, he worked at Opex Analytics, which was acquired by Llamasoft, and later by Coupa.
He holds a PhD in Industrial Engineering and has been an Adjunct Lecturer in Northwestern’s Master of Science in Analytics (MSiA) program since Fall 2019.
More talks, hands-on workshops, and training sessions
You Will Meet
Some of the world’s best data science speakers
The brains and authors behind today’s most popular open data science tools, topics, and languages
Hundreds of attendees focused on data science
Chief Data Scientists
Thought leaders working in data science
Data Scientists and Analysts
Software Developers
CEOs, CTOs, CIOs
Data Visualization professionals
Venture Capitalists and Investors
Startup Founders and Executives
Attendees from Healthcare, Finance, Education, Business, Intelligence, and other industries
Big data and data science innovators
Why Attend?
Several of the best minds and biggest names in data science will be presenting
Network with attendees from leading data science companies to learn how others are tackling similar problems
Gain quality training in the hottest data science topics, tools, and languages
Learn the latest in data science from industry leaders without having to make room in the budget — tickets are surprisingly inexpensive
What You'll Learn
Talks & Workshops on these topics:
Topics
Data Analytics Systems
Building Advanced Analytics and Data Science Capabilities
Analytics with Graph Representations
Data Analytics with Kubernetes and OpenShift
Distributed Analytical Database
Sentiment Analysis
Analytics: Challenges and Opportunities
Infrastructure Slowing Your Data Analytics and AI Projects
Data Analytics Use Cases
Models
BERT
XLNet
GPT-2
Transformers
Word2Vec
Deep Learning Models
RNN & LSTM
Machine Learning Models
ULMFiT
Transfer Learning
Tools
TensorFlow 2.0
Hugging Face Transformers
PyTorch
Theano
spaCy
NLTK
AllenNLP
Stanford CoreNLP
Keras
FLAIR
ODSC EAST 2023 | May 9th-11th