
Please Note: In-Person attendees will have access to virtual sessions. If you have a virtual pass, please note that we will not live-stream any in-person sessions. Only virtual sessions will be recorded. The schedule overview is available HERE.
50+ Sessions Added
100+ more coming soon.
Bootcamp/Pre-Bootcamp
Pre-Bootcamp: Introduction to Data Course
Pre-Bootcamp: Introduction to Programming with Python Course
Pre-Bootcamp: Data Wrangling with Python Course
Pre-Bootcamp: Introduction to SQL Course
Pre-Bootcamp: Introduction to AI Course
70+ Sessions Added
In-person | Keynote | Machine Learning | All Levels
In this talk, I’ll discuss some ways that we might cope with and address the unreliability of neural network models. As an initial coping strategy, I will first discuss a technique for detecting whether some content was generated by a machine learning model, leveraging the probability distribution that the model assigns to different content. Next, I will describe an approach for enabling neural network models to better estimate what they don’t know, such that they can refrain from making predictions on such inputs (i.e. selective classification). I will lastly describe methods for adapting models with small amounts of data to improve their accuracy under distribution shift…more details
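The selective classification idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible abstention rule, a threshold on the maximum softmax probability, which is a common baseline and not necessarily the method presented in the talk. The function names and the 0.8 threshold are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def selective_predict(logits, threshold=0.8):
    """Return the predicted class index, or None to abstain.

    Assumption: the maximum softmax probability serves as the confidence score.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

confident = selective_predict([4.0, 0.5, 0.1])   # one class clearly dominates
uncertain = selective_predict([1.0, 0.9, 0.8])   # scores are nearly tied
print(confident, uncertain)
```

Raising the threshold trades coverage for accuracy: the model answers fewer inputs but is more often right on those it does answer.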
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has pioneered end-to-end deep learning methods for vision-based robotic manipulation, meta-learning algorithms for few-shot learning, and approaches for scaling robot learning to broad datasets. Her research has been recognized by awards such as the Sloan Fellowship, the NSF CAREER Award, the MIT Tech Review 35 Under 35, and the ACM doctoral dissertation award, and has been covered by various media outlets including the New York Times, Wired, and Bloomberg. Prior to Stanford, she received her Bachelor’s degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley.
Virtual | Talk | Machine Learning | Intermediate
In this talk, I will attempt to provide several “bird’s eye” views on GNNs. Following a quick motivation on the utility of graph representation learning, I will derive GNNs from first principles of permutation invariance and equivariance. We will discuss how we can build GNNs that are not strictly reliant on the input graph structure…more details
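As a rough illustration of the permutation equivariance and invariance properties the abstract refers to, the sketch below runs one round of sum aggregation on a tiny made-up graph and checks both properties under a node relabelling. The graph, the features, and the helper names are all hypothetical; real GNN layers add learned weights and nonlinearities on top of this.

```python
def message_pass(features, edges):
    """One round of sum aggregation: each node adds its neighbours' features."""
    out = list(features)
    for u, v in edges:  # edges are treated as undirected
        out[u] += features[v]
        out[v] += features[u]
    return out

def relabel(features, edges, perm):
    """Apply a node relabelling in which old node i becomes node perm[i]."""
    new_features = [0] * len(features)
    for old, new in enumerate(perm):
        new_features[new] = features[old]
    new_edges = [(perm[u], perm[v]) for u, v in edges]
    return new_features, new_edges

features = [1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3)]
perm = [2, 0, 3, 1]

out = message_pass(features, edges)
p_features, p_edges = relabel(features, edges, perm)
p_out = message_pass(p_features, p_edges)

# Equivariance: node outputs are permuted exactly as the inputs were.
equivariant = all(p_out[perm[i]] == out[i] for i in range(len(out)))
# Invariance: a sum readout over all nodes ignores the ordering entirely.
invariant = sum(p_out) == sum(out)
print(equivariant, invariant)
```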
Petar Veličković is a Staff Research Scientist at Google DeepMind, Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. Petar holds a PhD in Computer Science from the University of Cambridge (Trinity College), obtained under the supervision of Pietro Liò. His research concerns geometric deep learning: devising neural network architectures that respect the invariances and symmetries in data (a topic he has co-written a proto-book about). Petar’s research has been used in substantially improving travel-time predictions in Google Maps, and guiding intuition of mathematicians towards new top-tier theorems and conjectures.
Virtual | Talk | NLP & LLMs | Intermediate
Large language models (LLMs) have achieved a milestone that undeniably changed many held beliefs in artificial intelligence (AI). However, there remain many limitations of these LLMs when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, and due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable…more details
Walid Saba is a Senior Research Scientist at the Institute for Experiential AI at Northeastern University. Prior to joining the institute in 2023, he worked at two Silicon Valley startups, focusing on conversational AI. This work included high-level roles as the principal AI scientist for telecommunications company Astound and CTO of software company Klangoo, where he helped develop its state-of-the-art digital content semantic engine (Magnet).
Saba’s career to date has seen him hold various positions in both the private sector and academia. His resume includes entities such as the American Institutes for Research, AT&T Bell Labs, IBM and Cognos, while he has also spent a cumulative seven years teaching computer science at the University of Ottawa, the New Jersey Institute of Technology (NJIT), the University of Windsor (a public research university in Ontario, Canada), and the American University of Beirut (AUB).
Walid is frequently invited for interviews and as a keynote speaker on AI and NLP, and has published over 45 technical articles, including an award-winning paper that he presented at the German Artificial Intelligence Conference (KI-2008). Walid received his BSc and MSc in Computer Science from the University of Windsor, and a Ph.D. in Computer Science from Carleton University in 1999.
Virtual | Talk | NLP & LLMs | Beginner
Generative models have demonstrated how helpful they can be on general knowledge, helping students with their writing assignments. But as soon as you want to use them in a professional setting with prompts like “what are the three main feature requests from our largest customers?”, they demonstrate their lack of knowledge.
In this session, I will introduce how Large Language Models (LLMs) can be connected to your data via semantic search. As I will present, there are many pitfalls and challenges. Some can be solved by using the right technologies; others are still open problems…more details
Nils Reimers is an NLP / Deep Learning researcher with extensive experience in representing text in dense vector spaces and using these representations for various applications. During his research career, he created sentence-transformers, which became the foundation for many of today’s semantic search applications.
In 2022, Nils joined Cohere.com to lead the team on smarter semantic search technologies and how to connect LLMs to enterprise data. Here, his teams develop new foundation models that can understand and reason over complex data.
In-person | Case Study
By adopting the proposed framework, organizations can effectively harness the potential of Gen AI, empowering all employees with invaluable insights to make data-driven decisions. This talk aims to inspire organizations to embrace a safe agile approach and cultivate a culture of experimentation and collaboration for the successful deployment of Gen AI solutions…more details
In-person | Talk | Data Visualization & Data Analysis | All Levels
Save time and money by equipping everyone with fundamental data literacy and analytics skills. It sounds great, but most organizations bite off more than they can chew. Instead of trying to turn everyone into a data analyst, teach people the most vital descriptive analytics skills, and eliminate the flow of tedious work requests to your data experts…more details
A TEDx speaker, Dom brings a wealth of data storytelling experience to StoryIQ from his career at QBE, one of Australia’s largest insurance companies. At QBE, he was a senior leader in data analytics and business improvement, presenting data-driven strategy recommendations to the company’s senior executives and producing reports for the Group Board of Directors.
In-person | Talk | MLOps | Data Engineering & Big Data | Machine Learning | Deep Learning | Intermediate
In this presentation, Amber Roberts, Machine Learning Engineer at Arize AI, will present findings from research on ways to measure vector/embedding drift for image and language models. With lessons learned from testing different approaches (including Euclidean and Cosine distance) across billions of streams and use cases, Roberts will dive into how to detect whether two unstructured language datasets are different — and, if so, how to understand that difference using techniques such as UMAP…more details
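One simple way to picture the Euclidean and cosine distance measures named in the abstract is to compare the centroids of a baseline and a production set of embeddings. The sketch below does exactly that on tiny made-up 2-D vectors. Comparing centroids is an assumption made for illustration, not a description of the approach evaluated in the talk, and all names and data are hypothetical.

```python
import math

def centroid(vectors):
    """Average the embeddings dimension by dimension."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Made-up embeddings: the production set points in a different direction.
baseline = [[1.0, 0.0], [0.8, 0.2], [1.2, -0.2]]
production = [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]]

base_c = centroid(baseline)
prod_c = centroid(production)
euclid_drift = euclidean_distance(base_c, prod_c)
cosine_drift = cosine_distance(base_c, prod_c)
print(round(euclid_drift, 3), round(cosine_drift, 3))
```

In practice a drift value like this would be tracked over time and compared against a threshold chosen from historical variation, since the raw number has no universal meaning.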
Amber Roberts is an ML Growth Lead at Arize AI, an ML observability company built for maintaining models in production. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.
In-person | Talk | Machine Learning | Beginner
Statistics do not come intuitively to humans; they always try to find simple ways to describe complex things. Given a complex dataset, they may feel tempted to use simple summary statistics like the mean, median, or standard deviation to describe it. However, these numbers are not a replacement for visualizing the distribution…more details
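The point about summary statistics can be made concrete with a small sketch. The two samples below are made up for illustration and chosen so that their mean, median, and population standard deviation coincide, yet a crude text histogram shows that their shapes are very different.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Two made-up samples chosen so that the summary statistics coincide.
bimodal = [-2, -2, -2, -2, 2, 2, 2, 2]
spiky = [-4, 0, 0, 0, 0, 0, 0, 4]

def summarize(values):
    """Return (mean, median, population standard deviation)."""
    return mean(values), median(values), pstdev(values)

def text_histogram(values):
    """Draw one row of '#' marks per distinct value."""
    counts = Counter(values)
    return "\n".join(f"{v:>3} | {'#' * counts[v]}" for v in sorted(counts))

for name, sample in [("bimodal", bimodal), ("spiky", spiky)]:
    print(name, summarize(sample))
    print(text_histogram(sample))
```

The first sample has no values near its mean at all, while the second sits almost entirely on it, which the three summary numbers cannot reveal.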
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
In-person | Track Keynote | Generative AI | All Levels
Generative AI has taken the world by storm, but building Gen AI applications for production comes with a unique set of challenges. Questions around cost performance, risk, simplifying implementation for production, setting guardrails, adding automation where possible and leveraging CI/CD for ML all become even more important when Gen AI is involved…more details
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in AI, cloud, data and networking to leading startups and enterprises since the late 1990s. As the Co-Founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift towards the production-first approach to data science and catering to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless framework with over 4,000 Github stars and MLRun, a cutting-edge open source MLOps orchestration framework. Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA – NASDAQ: NVDA), where he led technology innovation, software development and solution integrations. He also served as the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007 and was later acquired by Mellanox (NASDAQ:MLNX). Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He sits on the Data Science Committee of the AI Infrastructure Alliance (AIIA), of which Iguazio is a founding member. He is co-authoring a book on Implementing MLOps in the Enterprise for O’Reilly. Yaron presents at major industry events worldwide and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.
Virtual | Talk | Machine Learning | Beginner – Intermediate
In this training session, we will delve into the intricacies of building robust and scalable recommendation engines specifically tailored for online food delivery services. We will introduce a newly released dataset called the Delivery Hero Recommendation Dataset (DHRD) and understand how this can be used for training different recommendation models. We will explore the challenges faced in this domain and discuss the techniques and best practices to overcome them, ensuring our recommendation systems can handle large-scale operations and adapt to changing customer preferences…more details
Raghav is a seasoned Data Science professional with over a decade’s experience in research & development of large-scale solutions in Finance, Digital Experience, IT Infrastructure and Healthcare for giants such as Intel, American Express, UnitedHealth Group and DeliveryHero. He is an innovator with 10+ patents, a published author of multiple well-received books & peer-reviewed papers and a regular speaker at leading conferences on topics in the areas of Generative AI, Recommendation Systems, Computer Vision, NLP, Deep Learning, Machine Learning and Augmented Reality.
Vishal is an experienced Data science professional with 12+ years of experience. He is currently leading the Ranking and Recommendation topic at DeliveryHero, where he joined in May 2022. Before joining DeliveryHero, he worked at Monotype as a Data science manager, leading the AI Research team in building products like WhatTheFont (identifying fonts from images), Font Similarity and many more. He has a couple of patents filed in his name.
Virtual | Talk | Machine Learning | LLMs | Intermediate
A highlight of this workshop is the introduction to the “causal LLM”, a pioneering concept where LLMs are designed using foundational causal principles. By the end of this session, participants will be equipped with the skills and knowledge to employ LLMs effectively in discerning complex causal models and pioneering advancements in the field of causal AI…more details
Robert Osazuwa Ness is a researcher at Microsoft Research and author of the book Causal Machine Learning. He leads the development of MSR’s causal machine learning platform and conducts research into probabilistic models for advanced causal reasoning. He has worked as a machine learning engineer in various machine learning startups. He attended graduate school at both Johns Hopkins SAIS (Hopkins-Nanjing Center) and Purdue University. He received his Ph.D. in Statistics from Purdue, where his dissertation research focused on Bayesian active learning models for causal discovery.
Virtual | Talk | MLOps and Data Engineering | Intermediate
With the wide adoption of generative artificial intelligence (AI), ensuring the robustness of machine learning models is becoming more crucial than ever. One of the most concerning security threats to machine learning (ML) is the potential for adversarial attacks, techniques that exploit vulnerabilities of models to cause incorrect output. The perturbations involved are imperceptible to humans but sufficient for machine learning models to misclassify the data, potentially harming the end users. Therefore, including adversarial training in the ML lifecycle is important to consider as you build out your model and prepare it for production usage. Join this talk to learn how the symbiosis between ML security open source projects like Adversarial Robustness Toolbox and an ML operations (MLOps) project, Kubeflow, can streamline your machine learning workflow and improve your model’s robustness and security. With the numerous benefits MLOps offers for generating and defending your model, you can accelerate your development of secure machine learning models…more details
Anna Jung is a Senior ML Open Source Engineer at VMware, leading the open source team as part of the VMware AI Labs. She currently contributes to various upstream ML-related open source projects focusing on the project’s overall health, adoption, and innovation. She believes in the importance of giving back to the community and is passionate about increasing diversity in open source. When away from the keyboard, Anna is often at film festivals supporting independent filmmakers.
Virtual | Talk | Machine Learning | Deep Learning | Intermediate-Advanced
In this session we will deep dive into all the new developments and techniques in PyTorch and provide recommendations on how you can accelerate your models using native PyTorch code…more details
Supriya is an Engineering Manager working on PyTorch at Meta. Her team works on architecture optimization techniques like quantization and pruning, as well as other core components of PyTorch 2.0, thereby enabling users to run AI models efficiently on different hardware using native PyTorch. Prior to Meta, she worked as a software engineer at Nvidia on improving their GPU Architecture and accelerating AI models via TensorRT for inference. Supriya has an MS in CSE from University of Michigan, Ann Arbor and a bachelor’s degree from BITS Pilani, India.
In-person | Talk | Data Analytics and Big data | Machine Learning | MLOps | Beginner – Intermediate
When choosing an architecture for your enterprise data environment, you always need to decide how to manage the trade-offs in the CAP theorem. The CAP theorem states that you cannot have consistency, availability, and partition tolerance at the same time, but what if choosing a Kappa architecture makes it possible to have it all?…more details
Joep has more than 12 years of experience developing, engineering, architecting and visualising data products in various markets ranging from energy to clothing manufacturing. He focuses on enabling teams to be better at handling data and providing the teams with the tools and the knowledge needed to go live and to stay in production.
He was a member of the Teqnation program committee and has given presentations on Kafka and Hue usage during football, developing and deploying on HoloLens, Total DevOps using GitLab, the evolution of a data science product, using the Elastic Stack from PoC to production, and Xbox Kinect on a bike at Devoxx London.
Virtual | Talk | Responsible AI | Intermediate
Digital experimentation and A/B testing are invaluable methodologies within the AI domain, facilitating the validation and continuous improvement of models, solutions, and systems. This talk will delve into the intricate role of these testing mechanisms in the AI landscape.
We will start by introducing the concept of digital experimentation and A/B testing, elaborating on their integral role in testing hypotheses and making data-driven decisions. The discussion will further touch upon the traditional uses of these methodologies in the digital marketing sphere, enabling businesses to optimize their online content and increase user engagement…more details
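To make the hypothesis-testing step of an A/B test concrete, here is a minimal sketch of a two-proportion z-test on conversion counts. It assumes a simple two-sided test with a pooled standard error, which is one standard choice and not necessarily what the talk covers. The function name and the conversion numbers are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns the z statistic and the two-sided p-value under the null
    hypothesis that both variants share the same conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up example: 10% vs 13% conversion with 2,000 users per variant.
z, p_value = two_proportion_z_test(200, 2000, 260, 2000)
print(round(z, 2), round(p_value, 4))
```

A small p-value suggests the observed difference is unlikely under the null hypothesis, but the significance level and sample size should be fixed before the experiment starts rather than chosen after looking at the data.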
Alessandro is a highly experienced data scientist with a Bachelor’s degree in computer science and a Master’s in data science. He has collaborated with a variety of companies and organizations and currently holds the role of senior data scientist at logistics giant Kuehne+Nagel. Alessandro is particularly passionate about statistics and digital experimentation and has a strong track record of applying these skills to solve complex problems. He shares his knowledge regularly, speaking at events like the Data Innovation Summit and DataMass Gdansk Summit.
In-person | Talk | LLMs | Machine Learning | Intermediate
In the realm of advanced computational linguistics, the efficacy of Large Language Models (LLMs) is intrinsically tied to their contextual precision. In this session presented by Jake from Cloudera (not State Farm), we’ll navigate the complexities of ensuring LLMs yield contextually accurate results, a necessity in today’s intricate data environments. Crucially, attendees will be treated to a live demonstration showcasing the utilization of RAG (Retrieval-Augmented Generation) and PEFT (Parameter Efficient Fine-Tuning) techniques, two of the leading approaches for this task that underpin the success of Cloudera’s Applied ML Prototypes (AMPs)…more details
Jake currently holds the position of Principal Technical Evangelist at Cloudera, where he promotes the strengths of Cloudera’s Lakehouse for delivering trusted AI. His tenure at Cloudera began as a Senior Product Marketing Manager for Cloudera Machine Learning (CML).
Before Cloudera, Jake developed his ML expertise at ExxonMobil, starting as a Data Scientist and later transitioning to a Data Science and Analytics Solution Architect role. He also contributed significantly at FarmersEdge, taking on responsibilities as a Senior Data Scientist and subsequently as a Data Science Manager.
Jake earned both his bachelor’s and master’s degrees from Brigham Young University in Information Systems Management with an emphasis in Statistics.
Outside of work, Jake is passionate about outdoor activities. He enjoys skiing, golfing, rafting, and hiking. However, spending time with his family amidst the mountains remains his most rewarding pastime.
In-person | Talk | NLP & LLMs | GenAI | Advanced
In this talk, I will present data2vec, a framework for general self-supervised learning that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches…more details
Michael Auli is a principal research scientist/director at FAIR in Menlo Park, California. His work focuses on speech and NLP and he helped create projects such as wav2vec/data2vec, the widely used fairseq toolkit, the first modern feed-forward seq2seq models outperforming RNNs for NLP, and several top ranked submissions at the WMT news translation task in 2018 and 2019. Before that Michael was at Microsoft Research, where he did early work on neural machine translation and using neural language models for conversational applications. During his PhD at the University of Edinburgh he worked on natural language processing and parsing. http://michaelauli.github.io
Virtual | Talk | Machine Learning Safety and Security | Intermediate
This session will focus on security threats to machine learning models, including a demonstration of the similarities and differences between adversarial attacks in the domains of computer vision and natural language processing (NLP) with examples from open source projects like Adversarial Robustness Toolbox and TextAttack. Join the session to discuss applying adversarial research to real-world systems and learn how to be proactive with machine learning security…more details
Teodora is an open source software engineer at VMware AI Labs. During her first couple of years in VMware, as part of the Open Source Program Office, she was an active contributor and maintainer of The Update Framework (TUF) – a framework for securing software update systems. Currently, she invests her time in open source projects related to machine learning security.
Virtual | Talk | NLP | Beginner
In this talk, we will delve into the intricate relationship between NLP task definition and the outcomes of NLP system development, exploring how the precise formulation of tasks can either propel or hinder progress. Through case studies and examples, attendees will understand how ambiguous or biased task definitions can lead to misguided model objectives, high data acquisition costs, and misleading system evaluations, and learn how to strike the right balance between specificity and adaptability in NLP task design…more details
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.
In-person | Talk | All Levels
Open source technology has historically driven technological progress, from operating systems, to web development, and now, AI and machine learning. We’re at a point in history where the canonical “open source ML stack” has yet to be discovered. In this talk, you’ll hear about HPE’s approach to building an end-to-end ML stack, including Determined AI for model training and Pachyderm for data preparation. You’ll also see a live demo of how to use Determined AI for a transfer learning use case in the biomedical image domain. We’ll also touch on some key collaborations, including our partnerships with the AI Infrastructure Alliance, NVIDIA, and Aleph Alpha…more details
Isha Ghodgaonkar is a Machine Learning Developer Advocate at Hewlett Packard Enterprise (HPE), working within the HPC, AI & Labs business unit, which includes open source platforms Determined AI and Pachyderm. Prior to HPE, Isha worked at MITRE as an Autonomous Systems Engineer, a Data Science Intern at Genentech, and a technical intern at The Aerospace Corporation. Isha is a graduate of Purdue University.
In-person | Ai X Talk | Intermediate
In an era of Generative AI and Large Language Models, using old school machine learning might sound quaint. The reality is, however, that hidden within your corporate data is a trove of information and business patterns. Leveraging key techniques from Machine Learning can help businesses uncover hidden patterns that will have immediate impact on your bottom line and on your ROI…more details
Ryohei is the Founder & CEO of dotData. Prior to founding dotData, he was the youngest research fellow ever in NEC Corporation’s 119-year history, a title granted to only six individuals among 1000+ researchers.
During his tenure at NEC, Ryohei was heavily involved in developing many cutting-edge data science solutions with NEC’s global business clients, and was instrumental in the successful delivery of several high-profile analytical solutions that are now widely used in industry.
Ryohei received his Ph.D. degree from the University of Tokyo in the field of machine learning and artificial intelligence.
In-person | Ai X Talk
Abstract Coming Soon!
Cai GoGwilt has always been passionate about building products that make people more effective at their jobs. He is an MIT graduate in Computer Science and Physics, as well as a former software engineer at Palantir Technologies. By day, he’s a celebrated software engineer and thoughtful leader; by night, he’s a concert cellist and karaoke enthusiast.
In-person | Talk | NLP & LLMs | GenAI | Intermediate
There seems to be a new large ML model grabbing headlines every week. Whether it’s OpenAI’s big releases like GPT-3, Dalle-2 or Whisper, or one of the many open source projects generating state-of-the-art models, like Stable Diffusion, OpenFold or Craiyon, these models have found their way into the mainstream. Lukas will map the landscape for you and share how these teams use W&B to accelerate their work…more details
Lukas Biewald is the founder and Chief Data Scientist of CrowdFlower. Founded in 2009, CrowdFlower is a data enrichment platform that taps into an on-demand workforce to help companies collect training data and do human-in-the-loop machine learning.
Following his graduation from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science, Lukas led the Search Relevance Team for Yahoo! Japan. He then worked as a senior data scientist at Powerset, acquired by Microsoft in 2008. Lukas was featured in Inc Magazine’s 30 Under 30 list.
Lukas is also an expert level Go player.
In-person | Ai X Talk
Abstract Coming Soon!
Dr. Alex Liu is the founder and Director of the RMDS Lab, a data and AI ecosystem service provider. From 2013 to 2019, Alex was one of the data science thought leaders and a distinguished data scientist at IBM, where he served as a Chief Data Scientist for analytics services. Before joining IBM, Dr. Liu worked as a chief data scientist for a few companies including iSKY and Retention Science. Previously, Dr. Liu taught advanced quantitative methods to PhD candidates at the University of Southern California and the University of California at Irvine, while consulting for many well-known organizations such as the United Nations and Ingram Micro. Alex has an MS in Statistical Computing and a PhD in Sociology from Stanford University.
Virtual | Talk | LLMs | ML/AI Safety and Security | All Levels
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you…more details
Patrick Hall is an assistant professor of decision sciences at the George Washington University School of Business, teaching data ethics, business analytics, and machine learning classes. He also conducts research in support of NIST’s AI risk management framework and is affiliated with leading fair lending and AI risk management advisory firms.
Patrick studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University. He has been invited to speak on AI and machine learning topics at the National Academies of Science, Engineering, and Medicine, ACM SIG-KDD, and the Joint Statistical Meetings. He has been published in outlets like Information, Frontiers in AI, McKinsey.com, O’Reilly Ideas, and Thomson Reuters Regulatory Intelligence, and his technical work has been profiled in Fortune, Wired, InfoWorld, TechCrunch, and others. Patrick is the lead author of the book Machine Learning for High-Risk Applications.
Prior to joining the GW School of Business, Patrick co-founded BNH.AI, a boutique law firm focused on AI governance and risk management. He led H2O.ai’s efforts in responsible AI, resulting in one of the world’s first commercial applications for explainability and bias mitigation in machine learning. Patrick also held global customer-facing roles and R&D roles at SAS Institute. Patrick has built machine learning software solutions and advised on matters of AI risk for Fortune 100 companies, cutting-edge startups, Big Law, and U.S. and foreign government agencies.
In-person | Talk | Deep Learning | Machine Learning | Intermediate
Keras, the popular deep learning library, is now becoming multi-backend with support for TensorFlow, JAX, and PyTorch. Available now, Keras Core allows developers to create models with all of the simple high-level components of Keras while interchanging between frameworks to take advantage of the benefits of each. Existing users of each framework can seamlessly integrate their code, including backend-specific code, with Keras. Additionally, Keras Core’s ops suite, which includes the NumPy API, allows you to develop custom components for use on any framework. For current users of tf.keras, Keras Core with the TensorFlow backend serves as a drop-in replacement, meaning that changing your imports is all that is necessary to start taking advantage of the framework-agnostic future…more details
Neel is currently an engineer on the Keras team at Google. His work has been focused on open source development, adding major features to the latest releases of TensorFlow and Keras. With broad experience ranging from data science to ML infrastructure, he is responsible for the saving and export of ML models and is a developer for cross-framework compatibility at Google. As a key maintainer of keras.io and tensorflow.org, Neel is passionate about educating the data science community at large and helping the latest innovations of AI reach everyone.
In-person | Talk | Beginner – Intermediate
Modern manufacturers are looking for data science solutions to help move three key metrics: producing more, using resources efficiently, and shipping with the highest possible quality. Given these three KPIs, we will look at how data science influences each of them and take a deep dive into the key projects that have moved them. Examples include how we use recursive hypothesis testing to reduce testing frequency, how we understand the physics behind a test in order to optimize it, and how we make mass production decisions based on design of experiments and small distribution analysis…more details
Angad Arora is a seasoned professional with a wealth of expertise in manufacturing and quality for leading tech companies. With his innovative approach to lean manufacturing using data science, he has successfully saved millions of dollars for consumer electronics manufacturing lines. Currently serving as a Product Quality Manager at Google Inc., Angad continues to drive excellence and ensure top-notch product quality in the dynamic world of technology.
In-person | Talk | Beginner
Little is widely known about the history of neural networks and of “Artificial Intelligence.” The history goes back to around 1946, when researchers noticed that the mathematics of stretched materials, in which neighboring atoms are affected, was best modeled with a branch of math known as tensors. Tensors are used today to create neural networks. Neural networks are run on powerful Graphics Processing Units (GPUs). GPUs came about because of video games and entertainment. Thus we can say that video games laid the groundwork for AI…more details
Jack McCauley is an Innovator in Residence at the Jacobs Institute for Design Innovation at UC Berkeley, a professor at UC Berkeley, and a co-founder of Oculus. He is an American engineer, hardware designer, inventor, video game developer, and philanthropist. Jack is best known for designing the guitars and drums for the Guitar Hero video game series, and as a co-founder and former chief engineer at Oculus VR. At Oculus, Jack designed and built the Oculus DK1 and DK2 virtual reality headsets. Oculus was acquired by Facebook for $2 billion. McCauley holds numerous U.S. patents for inventions in software, audio effects, virtual reality, motion control, computer peripherals, and video game hardware and controllers. Jack was awarded a full scholarship to attend the University of California, Berkeley, where he earned a BSc in Electrical Engineering and Computer Science (EECS) in 1986. Jack has authored numerous research papers in the field of artificial intelligence (AI) and mathematical modeling of AI-based systems and is currently pursuing new projects at his private R&D facility and hardware incubator in Pleasanton, California.
In-person | Talk | LLMs | Machine Learning | Intermediate
In the fast-paced world of data science and AI, we will explore how large language models (LLMs) can elevate the development process of Apache Spark™ applications.
We’ll demonstrate how LLMs can simplify SQL query creation, data ingestion, and DataFrame transformations, leading to faster development and more precise code that’s easier to review and understand. We’ll also show how LLMs can assist in creating visualizations and clarifying data insights, making complex data easy to understand…more details
Denny Lee is a long-time Apache Spark™ and MLflow contributor, Delta Lake maintainer, and a Sr. Staff Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale data platforms and predictive analytics systems. He has previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server. He was also the Senior Director of Data Sciences Engineering at SAP Concur. He holds a Master’s in Biomedical Informatics from Oregon Health & Science University and has implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include Distributed Systems, Delta Lake, Apache Spark, Deep Learning, Machine Learning, and Genomics.
In-person | Talk | All Levels
Attendees will learn how to apply NLP to financial data. Earnings data covers all industries in the US; attendees will be surprised to learn how much information is hiding in plain sight. Earnings calls are public information and I’m going to show you what you’re able to learn from them using ASR and NLP tools.
Speech recognition technology has come a long way in recent years, but there are still some larger market challenges when it comes to transcribing speech accurately. This comes into play especially for domain-specific terminology around certain industries. For example, in the renewable energy industry, transcription might include terms like “hydro DM,” “biomass conversion,” and “renewable portfolio standard,” that are outside normal conversation…more details
Katie is the Product Lead for Scribe at Kensho. Scribe is an Artificial Intelligence, Natural Language Processing tool that does voice to text transcription. Katie’s background is in data pipeline management and machine learning – specifically unsupervised machine learning in aerospace to detect satellite collisions. She is a guest lecturer at Johns Hopkins University and graduated with distinction from the University of Virginia.
In-person | Talk | Generative AI | NLP | Machine Learning | Research Frontiers (Research topics in ML, DL, Data Science etc) | All Levels
This talk will dive into the end-to-end process of and framework for building a generative AI application, leveraging a fun and engaging case study with open-source tooling (e.g., HuggingFace models, Python, PyTorch, Jupyter Notebooks). We will guide attendees through key stages from model selection and training to deployment, while also addressing fine-tuning versus prompt-engineering for specific tasks, ensuring the quality of output, and mitigating risks…more details
Michelle is a technology leader who specializes in machine learning and cloud computing. She has 15 years of experience in the technology industry, contributed to the original IBM Watson showcased on Jeopardy!, and enjoys building and leading teams that develop and deploy AI solutions to solve real-world problems. Michelle is passionate about diversity and STEM education/careers for our minority communities, and serves both on the board of Women in Data and as an avid volunteer for Girls Who Code.
In-person | Keynote | Machine Learning | All Levels
We have seen amazing technical progress in AI applications in recent years. This talk considers the human side rather than the technical side: how can we gain confidence that our applications will be fair, just, truthful, beneficial, and well-suited for their users, the other stakeholders, and society at large?…more details
Peter Norvig is a Distinguished Education Fellow at Stanford’s Human-Centered Artificial Intelligence Institute and a researcher at Google Inc.; previously he directed Google’s core search algorithms group and Google’s Research group. He was head of NASA Ames’s Computational Sciences Division, where he was NASA’s senior computer scientist and a recipient of NASA’s Exceptional Achievement Award in 2001. He has taught at the University of Southern California, Stanford University, and the University of California at Berkeley, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He was co-teacher of an Artificial Intelligence class that signed up 160,000 students, helping to kick off the current round of massive open online classes. His publications include the books Data Science in Context (to appear in 2022), Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg Powerpoint Presentation and the world’s longest palindromic sentence. He is a fellow of the AAAI, ACM, California Academy of Sciences, and American Academy of Arts & Sciences.
In-person | Business Talk | All Levels
By harnessing the power of machine learning and AI, companies can unlock advanced capabilities and gain valuable insights that streamline sales processes and enhance sellers’ performance. This infusion eliminates the laborious task of manual information analysis across multiple disparate data sources, empowering sellers to dedicate their time and expertise to building meaningful customer relationships. In architecting AI-based solutions, it is essential to understand the users’ daily workflow and challenges, and not just to surface insights but to surface the right information at the right place to simplify users’ workflow…more details
In-person | Ai X Talk | Generative AI
In this session, you will learn about some early experiences and best practices in deploying Generative AI solutions in enterprises…more details
In-person | Ai X Talk | Intermediate
A critical success factor for enterprise-level AI adoption and sponsorship is the ability to rapidly deploy AI technologies, solve use cases quickly, and deliver iterative business value to stakeholders. In this session, we will discuss the challenges of AI adoption and share some pragmatic approaches that enterprises can adopt to accelerate their AI/ML initiatives and deliver quick and iterative business value. Examples include leveraging pre-integrated AI frameworks and automation techniques such as AutoML, along with ready-to-use industry-specific AI solutions. The faster and more incremental the delivery of business value through AI, the higher the likelihood of successful adoption and implementation of AI within enterprises…more details
Durga Kota is the Chief Technology Officer for the Americas region and is responsible for leading the strategies to translate Fujitsu technology innovation into differentiated and industry-ready solutions for the region’s customers.
Throughout his 24-year career, Durga has held customer-facing, consultative solution-selling and technology leadership roles. His collaborative spirit and deep technical expertise enable him to build trust, work in true partnership with customers, and provide solution leadership that leverages advanced technologies to solve even the most complex challenges.
Durga holds an MBA from the Indian Institute of Foreign Trade (IIFT), Delhi, and a B.Tech. in Computer Science and Engineering from the Jawaharlal Nehru Technological University, Hyderabad, India.
Durga resides in the San Diego area with his family and likes to spend his free time trekking on one of the many trails in the area, or volunteering for the local food bank.
In-person | Talk | Intermediate-Advanced
In today’s data-driven world, recommendation systems have become ubiquitous, driving user engagement and increasing revenue for businesses across various domains. This talk will take you on a journey from the raw data to vectorization techniques, ultimately culminating in the creation of a robust recommendation system. Whether you’re a seasoned data scientist or just starting your journey into recommendation systems, this presentation will provide valuable insights and practical takeaways for building powerful recommendation engines…more details
Hudson Buzby is a dynamic Solutions Architect with a passion for leveraging technology to solve complex challenges. With over a decade of experience in the tech industry, Hudson has made significant contributions to the world of data engineering and cloud solutions. Hudson currently serves as a Solutions Architect at Qwak, where he continues to push the boundaries of what’s possible in the world of cloud computing and data management.
In-person | Talk | NLP & LLMs | Intermediate-Advanced
In this session, we will take a deep dive into a novel application of AI: training Large Language Models (LLMs) on individual employees’ Slack messages. The first portion of our discussion is dedicated to the technical aspects of this process, where we will explain the steps involved in fine-tuning the LLM. We will demonstrate how such models, tailored to mimic specific individuals’ textual styles, can serve as the foundation for applications in text generation and automated question answering systems.
Transitioning into the second part of our talk, we will spotlight the often-underemphasized side of AI deployment – the risks and ethical concerns. Using our project as a case study, we will expose you to the potential pitfalls we encountered and the proactive measures taken to manage these risks. Our aim is to underline the necessity of an encompassing risk management framework for AI…more details
Eli is CTO and Co-Founder at Credo AI. He has led teams building secure and scalable software at companies like Netflix and Twitter. Eli has a passion for unraveling how things work and debugging hard problems. Whether it’s using cryptography to secure software systems or designing distributed system architecture, he is always excited to learn and tackle new challenges. Eli graduated with an Electrical Engineering and Computer Science degree from U.C. Berkeley.
In-person | Talk | Machine Learning | Deep Learning | Intermediate
In this talk I will present a new, hybrid approach which combines the best aspects of both methods. The process begins with a careful observation of customer data and assessment of whether there are naturally formed clusters in the data. It continues with the selection of a clustering algorithm and the fine-tuning of a model to create clusters…more details
Evie Fowler is a data scientist based in Pittsburgh, Pennsylvania. She currently works in the healthcare sector leading a team of data scientists who develop predictive models centered on the patient care experience. She holds a particular interest in the ethical application of predictive analytics and in exploring how qualitative methods can inform data science work. She holds an undergraduate degree from Brown University and a master’s degree from Carnegie Mellon.
In-person | Talk | Generative AI | Machine Learning Safety and Security | Beginner-Intermediate
In today’s digital world, we have access to everything at our fingertips. We generate and consume data at lightning speed. Data is growing at an exponential rate. But how safe is your data? Our data is always surrounded by advanced persistent threats (APTs). On the other hand, Data Science, Machine Learning, and Artificial Intelligence (AI), with sub-domains like Deep Learning and Large Language Models (LLMs), are making tremendous progress. In this talk, I will focus on advanced technologies like LLMs and GPT models in the domain of cyber security, though some of these concepts are relatable to other industries as well…more details
Nirmal Budhathoki is a Senior Data Scientist currently working at Microsoft in Cloud Security. Nirmal has 12+ years of experience in the IT industry, including 5+ years in data science. Nirmal’s strong belief in continuous learning has led him to complete three master’s degrees, with majors in Information Systems, Business Administration, and Data Science. Nirmal loves to help the data science community and has completed 600+ free mentoring sessions with aspiring data scientists to help them navigate their data science careers. Nirmal also conducts mentored learning sessions for MIT’s Data Science and Machine Learning certification program in collaboration with Great Learning. Nirmal has experience working with the US government for the Department of the Navy, and he is also a US Army veteran. Nirmal yearns to solve data science problems that are aligned with product strategy and business outcomes. In his free time, Nirmal loves using his data science skills in sports analytics.
In-person | Talk | Machine Learning | MLOps and Data Engineering | All Levels
The robotaxi industry has one of the most extreme use cases for Machine Learning, with very large image/LiDAR datasets and resource-hungry training jobs. I will recount my experience leading the ML Infrastructure team at Cruise, the #1 robotaxi company. From training on desktop GPUs to being entirely cloud-based, I will detail the challenges we faced and how we solved them. I will also detail what guarantees need to be built into any production-grade ML platform to ensure fast, robust, and safe ML development…more details
Emmanuel Turlay is the founder and CEO of Sematic, an open-source ML infrastructure company. Emmanuel started his career in academia researching particle physics at CERN, before branching out into tech and moving to the US in 2014. He led engineering teams at Instacart, and then Cruise, where he led the ML Infrastructure team for four years. In 2022, Emmanuel founded Sematic to bring learnings from building ML infrastructure for robotaxis to the rest of the industry in an open-source manner.
In-person | Talk | MLOps | Data Engineering & Big Data | All Levels
Over the past decade, the democratization of data science tooling, particularly through Python libraries like pandas and NumPy, has empowered practitioners of all levels to work with data efficiently. Yet, despite the popularity of these tools, they present challenges as practitioners look to scale their workflows to production. In this talk, we explore the limitations of these tools and pain points that data scientists encounter when working with data at scale. I will share how our open-source project Modin (10M+ downloads) addresses this issue by seamlessly scaling up your pandas code with just a single line of code change…more details
Doris Lee is the CEO and co-founder of Ponder (https://ponder.io/). Doris received her Ph.D. from the UC Berkeley RISE Lab and School of Information in 2021, where she developed tools that help data scientists explore and understand their data. She was named to the Forbes 30 Under 30 list for Enterprise Technology in 2023.
In-person | Talk | MLOps | Intermediate
Ensuring proper data quality is critical in the effective implementation of data pipelines for ML, data science, geospatial analysis, or general analytics.
Most engineering teams address data quality and pipeline orchestration as two separate tasks. In this presentation, Sandy Ryza will explain the benefits of a model in which arbitrary checks are included in the data orchestration logic, resulting in better control and integration of data quality checks at various steps in the pipeline…more details
Sandy is a lead engineer, author, and thought leader in the domain of data engineering. Sandy co-wrote “Advanced Analytics with PySpark” and “Advanced Analytics with Spark”. He led ML and data science teams at Cloudera, Remix, Clover Health, and KeepTruckin.
Sandy is currently the lead engineer on the Dagster project, an open-source data orchestration platform used in MLOps, data science, IoT, and analytics. Sandy is a regular speaker at data engineering and ML conferences.
In-person | Talk | Deep Learning | MLOps and Data Engineering | Intermediate – Advanced
In this talk, we’ll provide an overview of the core ideas behind Saturn, how it works on a technical level to reduce runtimes & costs, and the process of using Saturn for large-model finetuning. We’ll demonstrate how Saturn can accelerate and optimize large-model workloads in just a few lines of code and describe some high-value real-world use cases we’ve already seen in industry & academia…more details
Kabir Nagrecha is a Ph.D. candidate at UC San Diego, working with Professors Arun Kumar & Hao Zhang. His work focuses on systems infrastructure to support deep learning at scale, aiming to democratize large models and amplify the impact of machine learning applications. He is the recipient of the Meta Research Fellowship, as well as fellowships from the Halicioglu Data Science Institute and Jacobs School of Engineering at UCSD.
Kabir is the youngest-ever Ph.D. student at UCSD, having started his doctorate at the age of 17. He’s previously worked with companies such as Apple, Meta, & Netflix to build the core infrastructure that supports widely-used services such as Siri & Netflix’s recommendation algorithms. Most recently, he’s been working on Saturn, a new system to support automatic parallelization, scheduling, and resource apportioning for training large neural networks.
In-person | Talk | Machine Learning | Intermediate
Many retail giants are experiencing huge inventory loss and shrinkage problems because they process a huge number of transactions every day across the U.S. and offer liberal shopping policies to provide a convenient shopping and return experience. Investigating customers’ return behaviors and preventing fraudulent activities is challenging for two reasons: 1) the transaction data is highly imbalanced and complex, since the volume of transactions is enormous while the different types of anomalies are exceedingly rare; and 2) predefined labels are seldom available, since it is not feasible for human experts to manually review every transaction and identify anomalies…more details
Chuying (Annie) Ma is a senior data scientist on the Walmart Inkiru team, where she works on developing and implementing machine learning models and strategies for real-time fraud detection and risk mitigation. She has a Master of Science degree in Biostatistics from Harvard University and a bachelor’s degree in Statistics and Mathematics from the University of Michigan – Ann Arbor. In her spare time, she enjoys playing violin and ukulele.
In-person | Talk | LLMs | Machine Learning | Intermediate
In this presentation, we explore an approach that utilizes LLMs to guide feature engineering. By leveraging the contextual understanding within LLMs, we have developed a system for LLM-assisted feature engineering. Our research demonstrates the practical benefits of this synergy – from feature ideation to improved feature relevance to enhanced model interpretability and efficiency…more details
Sergey is a data scientist with a background in physics and neurobiology. FeatureByte is Sergey’s second startup. He was one of the first employees at DataRobot, where he created and led a professional services group and helped the company grow into a unicorn. Sergey is widely known for being a Kaggle Grandmaster and for having held the #1 rank on Kaggle. He has been named one of the top data scientists by various publications multiple times. Sergey’s passion is in machine learning, predictive modeling, and inventive feature engineering.
In-person | Talk | Machine Learning
Deploying advanced Machine Learning technology to serve customers and/or business needs requires a rigorous approach and production-ready systems, which has led to the development of MLOps. Large models make rigorous engineering and scalable architectures even more important, which in turn has led to the emergence of LMOps. The sheer size of the models and of the datasets used for training requires highly efficient infrastructure. More complex pipeline topologies, which include transfer learning, fine tuning, instruction tuning, and evaluation along a collection of complex dimensions, require a high degree of flexibility for customization. Add to this the complex inference-time systems and requirements, and LMOps starts to look somewhat challenging to implement…more details
A data scientist and ML enthusiast, Robert has a passion for helping developers quickly learn what they need to be productive. Robert is currently the Senior Product Manager for TensorFlow Open-Source and MLOps at Google and helps ML teams meet the challenges of creating products and services with ML. Previously Robert led software engineering teams for both large and small companies, always focusing on moving fast to implement clean, elegant solutions to well-defined needs. You can find him on LinkedIn at robert-crowe.
In-person | Talk | Deep Learning | NLP | Machine Learning | MLOps and Data Engineering | Beginner – Intermediate
We discuss a unified and user-friendly approach to developing ML applications on MySQL HeatWave AutoML. We describe a simple-to-use MySQL API for HeatWave AutoML, its extension for different use cases, and its ability to interface with third-party applications. We also discuss how this facilitates the development of classification, regression, anomaly detection, forecasting, and recommendation system use cases, and compare it with other platforms that offer ML features for databases. Lastly, we present how a user can fine-tune the AutoML pipeline for their needs…more details
Sanjay Jinturkar is Senior Director, MySQL HeatWave, focused on building machine learning capabilities inside the MySQL HeatWave database. These capabilities enable the user to automatically develop and deploy machine learning models inside the database using AutoML in a cloud native environment for a variety of use cases, without the need to pull the data or the model outside the database. In the past, he has held multiple technology and engineering management positions in systems software, mobile communications software, applications development, and diagnostics. His interests include machine learning, cloud computing, databases, and architecture. Sanjay has a PhD in Computer Science from the University of Virginia.
Sandeep Agrawal leads the HeatWave Machine Learning (HeatWave ML) project within MySQL HeatWave. HeatWave ML is the product of years of research and advanced development, and aims to help both data scientists and non-data scientists quickly apply ML to a given problem. Prior to HeatWave, Sandeep led the Oracle AutoML project within Oracle Labs, creating a state-of-the-art distributed AutoML engine. He is passionate about Machine Learning and Systems Architecture, and a project like HeatWave ML that combines the two is heaven for him. Prior to Oracle, he completed his PhD in Computer Science at Duke University in 2015.
In-person | Half-Day Training | Machine Learning | Intermediate
Finding good datasets or web assets to build data products or websites with, respectively, can be time-consuming. For instance, data professionals might require data from heavily regulated industries like healthcare and finance, and, in contrast, software developers might want to skip the tedious task of collecting images, text, and videos for a website. Luckily, both scenarios can now benefit from the same solution, Synthetic Data…more details
Ramon is a data scientist, researcher, and educator currently working in the Developer Relations team at Seldon in London. Prior to joining Seldon, he worked as a freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Before freelancing, Ramon wore different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.
In-person | Half-Day Training | Machine Learning | All Levels
In this 90-minute hands-on lab, you’ll learn how to start your Graph journey with ArangoDB. We will walk you through getting a basic deployment up and running in minutes. We will then help you create a deployment in the ArangoGraph insights platform and learn how to load and query your data…more details
Asif Kazi has directed global sales engineering, professional services, and technical customer success teams for over 15 years at Foursquare, Ahana, Amazon Web Services, StreamSets, and Couchbase. He is passionate about working with customers and solving their business problems. His core competencies include scaling teams globally, building referenceable client relationships, and accelerating revenue growth.
Arthur Keen, Ph.D. is a Graph evangelist with more than 20 years of experience in Graph AI. His expertise spans Graph Machine Learning, graph analytics, and ontology. Arthur has developed knowledge graph-based solutions in intelligence, cyber, financial services, logistics, retail, and energy. He has led 3 product teams from concept to GA and first customers. He has been awarded 7 patents and has held leadership roles including CTO, Chief Scientist, and VP of Engineering. He has a Computer Science/Industrial Engineering Ph.D. and has been invited to speak at conferences including Data Day TX, NoSQL Now, SemTech, and SXSW.
In-person | Half-Day Training | Machine Learning | Intermediate-Advanced
As machine learning models have become more present in our lives, there has been increasing attention on the reliability of these models. A major component of this is understanding how uncertain the model is about its prediction. No model is exactly right 100% of the time, so we need methods and approaches by which we can quantify the level of uncertainty around a prediction.
Approaches to uncertainty quantification (UQ) vary and depend on the type of problem. For classification problems, the primary approach is probability calibration: making sure that the model outputs corresponding to each class “behave well” as probabilities. For regression problems, there are several different approaches. One can configure models to output an interval, rather than a single point prediction, along with a “coverage” value that specifies the probability that the interval covers the true value. The framework of Conformal Prediction provides theoretical guarantees around such interval predictions. Or one can use methods that output an entire conditional density for y given X. This is called probabilistic regression, or conditional density estimation. Several parametric and non-parametric approaches exist for this problem, including PrestoBoost, Coarsage, and NGBoost…more details
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
In-person | Workshop | Machine Learning | Deep Learning | NLP | ALL | Intermediate
In this hands-on, example-driven workshop, we’ll introduce Mojo🔥 language features by starting with Python code and making minor changes to convert it into high-performance Mojo🔥 code. Mojo provides full interoperability with the Python ecosystem, and we’ll show you how to integrate Mojo🔥 into your existing Python workflows. We’ll share the workshop material as a hosted tutorial with several Mojo🔥 scripts and Jupyter Notebooks which you can use as a starting point for your projects…more details
Shashank is an engineer, educator, and doodler. He writes and talks about machine learning, specialized machine learning hardware (AI Accelerators), and AI Infrastructure in the cloud. He previously worked at Meta, AWS, NVIDIA, MathWorks (MATLAB), and Oracle in developer relations and marketing, product management, and software development roles, and holds an M.S. in electrical engineering.
Virtual | Training | Generative AI | All Levels
This engaging workshop provides a hands-on journey into the world of Generative AI and Autonomous Agents, crucial building blocks towards achieving Artificial General Intelligence (AGI). Participants will delve into the transformative shift in AI, gaining insights into the revolutionary impact of Generative AI across various domains such as text, image, video, and 3D object generation, as well as data augmentation. The workshop will equip attendees with cutting-edge tools to substantially boost their AI capabilities and efficiency. A key feature of the session is a practical exercise in crafting an Autonomous AI Agent, a technology poised to work in synergy with us in the near future. This succinct yet thorough workshop is tailored for those eager to maintain a competitive edge in the swiftly advancing field of AI…more details
Long before the buzz surrounding generative AI, Martin Musiol was already advocating for its significance in 2015. Since then, he has been a frequent speaker at conferences, podcasts, and panel discussions, addressing the technological advancements, practical applications, and ethical considerations of generative AI. Martin Musiol is a founder of generativeAI.net, a lecturer on AI to over 3000 students, and publisher of the newsletter ‘Generative AI: Short & Sweet’. As the lead for GenAI Projects in Europe at Infosys Consulting (previously at IBM), Martin Musiol helps companies globally harness the power of generative AI to gain a competitive advantage. LinkedIn: https://www.linkedin.com/in/martinmusiol1/ Webpage: https://generativeai.net/
In-person | Training | MLOps and Data Engineering | Machine Learning | Deep Learning | All Levels
Our objective is to ensure that you are equipped with the essential knowledge and practical tools to proficiently manage your machine-learning models in a real-world production environment…more details
Oliver Zeigermann has been developing software with different approaches and programming languages for more than 3 decades. In the past decade, he has been focusing on Machine Learning and its interactions with humans.
Virtual | Workshop | Machine Learning | Intermediate
In this talk, we will cover the use of Generative models, such as LLMs and GANs, for the generation of smart synthetic data that can be leveraged to impute missing data. By using a generative model to impute missing data, we can generate new samples that are representative of the underlying data distribution, which can help to reduce the impact of missing data on our models. In addition, these models can be fine-tuned to specific datasets, allowing us to generate synthetic data that is tailored to our particular use case…more details
Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations. Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies. She hosts the “When Machine Learning meets privacy” podcast, has been a guest speaker at Datacast and Privacy Please, is a previous WebSummit speaker, and was recently awarded “Founder of the Year” by the South Europe Startup Awards.
In-person | Workshop | Data Engineering & Big Data | Machine Learning | Generative AI | Data Visualization | Beginner
In this presentation, Alison will take you through the example of one independent researcher’s journey from no-code to knowledge graph master. She will walk you through the steps the researcher took, including how to format data stores, what data to collect, and ways to leverage LLMs in the process, ultimately arriving at a working knowledge graph in a dashboard with minimal coding experience…more details
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications.
Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies.
Alison’s academic journey includes pursuing her Master of Science in Data Science program, specializing in Artificial Intelligence, at Northwestern University and research with Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience. She leverages this expertise to educate and empower individuals and organizations in the field of data science.
Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.
In-person | Workshop | Machine Learning | Beginner
Pandas can be tricky, and there is a lot of bad advice floating around. This tutorial will cut through some of the biggest issues I’ve seen with Pandas code after working with the library for a while and writing three books on it…more details
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCON, and OSCON as well as local user conferences.
Virtual | Tutorial | GenAI | LLM | All Levels
The tutorial will consist of the following parts: (1) Introduction and overview of the generative AI landscape, (2) Technical and ethical challenges with generative AI, and (3) Solutions for alleviating the challenges with real-world use cases and case studies (including practical challenges and lessons learned in industry)…more details
Krishnaram Kenthapadi is the Chief AI Officer & Chief Scientist of Fiddler AI, an enterprise startup building a responsible AI and ML monitoring platform. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives in the Amazon AI platform. Prior to joining Amazon, he led similar efforts at the LinkedIn AI team, and served as LinkedIn’s representative in Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Previously, he was a Researcher at Microsoft Research Silicon Valley Lab. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006. He serves regularly on the senior program committees of FAccT, KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 50+ papers, with 7000+ citations and filed 150+ patents (70 granted). He has presented tutorials on privacy, fairness, explainable AI, model monitoring, responsible AI, and generative AI at forums such as ICML, KDD, WSDM, WWW, FAccT, and AAAI, given several invited industry talks, and instructed a course on responsible AI at Stanford.
Virtual | Workshop | NLP & LLMs | Intermediate
The workshop will be hands-on and composed of 4 modules, targeting data scientists, machine learning experts, software engineers, and product managers. Prior exposure to LLMs is not necessary. At the end of the workshops, participants will have gained insights on how to effectively make the best of open-source LLMs and learned the necessary steps to bridge the gap between prototype and production-ready applications…more details
Suhas Pai is an NLP researcher and co-founder/CTO at Bedrock AI, a Toronto-based startup. At Bedrock AI, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, being the Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and also NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at Big Science, as part of the BLOOM project.
Virtual | Workshop | Machine Learning | Intermediate
In this talk, we will explore and demystify the world of Causal AI for data science practitioners, with a focus on understanding cause-and-effect relationships within data to drive optimal decisions. We will focus on:
* from Shapley to DAGs: the dangers of using post-hoc explainability methods as tools for decision making, and how traditional ML isn’t suited to situations where we want to perform interventions on the system.
* discovering causality: how do we figure out what is causal and what isn’t, with a brief introduction to methods of structure learning and causal discovery
* optimal decision making: by understanding causality, we can now accurately estimate the impact we can make on our system. How do we use this knowledge to derive the best possible actions to take?…more details
Andre joined causaLens from Goldman Sachs, where he was an executive director in the Model Risk Management group in Hong Kong and Frankfurt. Today he is working with industry leading, global organisations to apply cutting edge Causal AI research in production level solutions that empower individuals and teams to make better decisions. Andre received his PhD in theoretical physics from the University of Munich, where he studied the interplay between quantum mechanics and general relativity in black-holes.
In-person | Workshop | LLM | Intermediate
This can be either a talk or a workshop, depending on how much time you have. This session aims to provide hands-on, engaging content that gives developers a basic understanding of Llama 2 models, how to access and use them, and how to build core components of an AI chatbot using LangChain and Tools. The audience will also learn core concepts around Prompt Engineering and Fine-Tuning and programmatically implement them using Responsible AI principles. Lastly, we will conclude the talk by explaining how they can leverage this powerful tech, the different use cases, and what the future looks like…more details
Amit Sangani is the Director of Partner Engineering leading the Applied AI Platforms team at Meta. Amit has been with Meta for 8+ years and manages developer-facing engineering teams working on Gen AI platforms such as Llama 2 and PyTorch. Amit’s mission is to democratize AI and increase the adoption of these platforms by making it easier for developers to integrate them into their products and spur innovation and increased productivity.
In-person | Tutorial | Generative AI | Machine Learning | Beginner – Intermediate
The tutorial will cover the existing research on the capabilities of LLMs versus small traditional ML models. If an LLM is the best solution, the tutorial covers several evaluation techniques, including evaluation suites like the EleutherAI Harness, head-to-head competition approaches, and using LLMs for evaluating other LLMs. The tutorial will also touch on subtle factors that affect evaluation, including the role of prompts, tokenization, and requirements for factual accuracy. Finally, a discussion of model bias and ethics will be integrated into the working examples…more details
Rajiv Shah is a machine learning engineer at Hugging Face who focuses on enabling enterprise teams to succeed with AI. Rajiv is a leading expert in the practical application of AI. Previously, he led data science enablement efforts across hundreds of data scientists at DataRobot. He was also a part of data science teams at Snorkel AI, Caterpillar, and State Farm. Rajiv is a widely recognized speaker on AI, has published over 20 research papers, and has received over 20 patents in areas including sports analytics, deep learning, and interpretability. Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He also has a large following for his AI-related short videos on TikTok and Instagram at @rajistics.
In-person | Workshop
Have you ever thought about building an application that can help people find places to live that maximize their quality of life and happiness?
The goal of this workshop is to analyze public datasets across different categories and discover correlations among them using the HPCC Systems platform. After analyzing, we will design an interface to query this data and assign it a scoring system, and then deliver it to the user via the HPCC ROXIE cluster and show the user the best places to live in the United States. Users should be given choices in an easy-to-use form that when submitted will generate a unique set of scores based on locations (Example: By State)…more details
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
In-person | Workshop | Machine Learning | Deep Learning | All Levels
During this workshop, you will build complex time series forecasting and anomaly detection models from the ground up, perform feature engineering and selection, assess the accuracy, and utilize the ModelZoo browser and root cause analysis functionalities to investigate the outcomes…more details
Philip Wauters is Customer Success Manager and Value engineer at Tangent Works working on practical applications of time series machine learning at customers from various industries such as Siemens, BASF, Borealis and Volkswagen. With a commercial background and experience with data engineering, analysis and data science his goal is to find and extract the business value in the enormous amounts of time-series data that exists at companies today.
In-person | Workshop | Deep Learning | All Levels
Facial recognition systems are everywhere. Of course, they are where you would expect them, such as airports, border crossings, and government offices. However, they are also in some public surveillance cameras, all over social media, embedded in smart home solutions, and even in your phone. Have you ever wondered how facial recognition systems work? In this hands-on session, we will build a facial recognition system from scratch using open-source technologies and publicly available pre-trained models…more details
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. He’s an Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Whether concerning leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making. He wrote the bestselling book “Interpretable Machine Learning with Python” and is currently working on a new book titled “DIY AI” with do-it-yourself projects for AI hobbyists and practitioners alike.
In-person | Workshop | LLM | Gen AI | Machine Learning | Intermediate
In this talk, we show how to personalize LLMs using a feature store and prompt engineering. We walk through how to build a free, serverless, personalized example LLM application using Hopsworks, an open-source feature store with a built-in vector database. We will look at how to build templates for prompts, and how they can be easily constructed and included in user queries…more details
Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He is lead architect of the open-source Hopsworks Feature Store platform. He is the organizer of the annual feature store summit conference and featurestore.org community, as well as co-organizer of PyData Stockholm.
In-person | Tutorial | Machine Learning | Intermediate
Machine learning has undergone a profound transformation with open source: from a technology that could only do naive curve fitting to one that could potentially end humanity’s dominion. In 2017, Ali Rahimi declared that machine learning is the new alchemy, and we’d like to go further and claim that machine learning is the new necromancy: a forgotten science with a passionately strong open source ethos that was eventually destroyed by the Catholic Church…more details
Mark Saroufim is an engineer on PyTorch at Meta working on open infrastructure, compilers and community. Mark is fond of hot takes and shares them on his blog https://marksaroufim.substack.com/. Prior to Meta, Mark worked as a Machine Learning engineer at Graphcore, Microsoft and yuri.ai.
Virtual | Talk | Generative AI | All Levels
Human creators have provided the works (whether prose writing, source code, or visual works) that act as the basis for training huge DNNs, which arguably creates an obligation for the AI outputs. This feels most evident when prompts ask AIs to create something “in the style of such-and-such-human.” While such outputs are often flawed in interesting ways, they are also usually recognizable in their connection to the prompted human creator. What rights should those source humans have to control those uses, including the moral right simply to be formally recognized as the source? Few laws exist governing attribution and moral rights in generative AI, but many will come to exist soon. Laws and technical standards may follow good or bad principles, both ethical and technical…more details
David is founder of KDM Training, a partnership dedicated to educating developers and data scientists in machine learning and scientific computing. He created the data science training program for Anaconda Inc. and was a senior trainer for them. With the advent of deep neural networks he has turned to training our robot overlords as well.
He was honored to work for 8 years with D. E. Shaw Research, who have built the world’s fastest, highly-specialized (down to the ASICs and network layer), supercomputer for performing molecular dynamics.
David was a Director of the PSF for six years, and remains co-chair of its Trademarks Committee and of its Scientific Python Working Group. His columns, Charming Python and XML Matters, written in the 2000s, were the most widely read articles in the Python world. He has written previous books for Manning, Packt, O’Reilly and Addison-Wesley, and has given keynote addresses at numerous international programming conferences.
In-person | Workshop | Data Analytics and Big data | Machine Learning | All Levels
Today, everything is online – meters, cars, elevators, assembly lines, and even bicycles are connected to the Internet. And all of these items are emitting a relentless stream of metrics and events. With the advent of IoT and the cloud, the volume of time-series data has begun growing exponentially in an unprecedented way. The massive size of time-series data sets is a major challenge for general database management systems like relational and NoSQL databases…more details
Jeff Tao is the founder and CEO of TDengine. He has a background as a technologist and serial entrepreneur, having previously conducted research and development on mobile Internet at Motorola and 3Com and established two successful tech startups. Foreseeing the explosive growth of time-series data generated by machines and sensors now taking place, he founded TDengine in May 2017 to develop a high-performance time-series database purpose-built for modern IoT and IIoT businesses.
In-person | Workshop | Machine Learning | All Levels
In the modern age of Machine Learning, the importance of high-quality, accurately labeled data cannot be overstated. This presentation introduces how to create high-quality, annotated datasets for training Machine Learning (ML) models. In this tutorial, we will use Label Studio, an open-source, multi-type data labeling tool, to explore common methods for annotating raw datasets, including both human and automated labeling techniques…more details
Chris Hoge is the Head of Community for HumanSignal, where he is helping to grow the Label Studio community. He has spent over a decade working in open source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, with an emphasis on using high-performance numerical methods for simulating physical systems. He makes his home in the Pacific Northwest, where he spends his free time trail running and playing piano.
In-person | Tutorial | Deep Learning | Machine Learning | MLOps and Data Engineering | Beginner – Intermediate
This tutorial will teach you how to operationalize fundamental ideas from data-centric AI across a wide variety of datasets (image, text, tabular, etc). We will cover recent algorithms to automatically identify common issues in real-world data (label errors, bad data annotators, outliers, low-quality examples, and other dataset problems that once identified can be easily addressed to significantly improve trained models)…more details
Jonas Mueller is Chief Scientist and Co-Founder at Cleanlab, a software company providing data-centric AI tools to efficiently improve ML datasets. Previously, he was a senior scientist at Amazon Web Services developing AutoML and Deep Learning algorithms which now power ML applications at hundreds of the world’s largest companies. In 2018, he completed his PhD in Machine Learning at MIT, also doing research in NLP, Statistics, and Computational Biology.
Jonas has published over 30 papers in top ML and Data Science venues (NeurIPS, ICML, ICLR, AAAI, JASA, Annals of Statistics, etc). This research has been featured in Wired, VentureBeat, Technology Review, World Economic Forum, and other media. He has also contributed open-source software, including the fastest-growing open-source libraries for AutoML (https://github.com/awslabs/autogluon) and Data-Centric AI (https://github.com/cleanlab/cleanlab).
Virtual | Tutorial | Prompt Engineering | LLM | Intermediate-Advanced
In this talk, I will demonstrate to students how to create a SequentialChain for generating articles using a colab notebook script. The aim of the script is to generate an article from an LLM via the following calls:
1. Generate an article outline.
2. Create an article from step 1.
3. Edit article from step 2.
4. Translate the article into Spanish, German and French versions from step 3…more details
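The four calls above form a pipeline in which each step consumes the previous step's output. The sketch below shows that data flow in plain Python. It deliberately does not use LangChain's `SequentialChain` API, and `fake_llm` and `run_article_chain` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it only echoes part of the prompt."""
    return f"<response to: {prompt[:40]}>"

def run_article_chain(topic: str, llm: Callable[[str], str],
                      languages: List[str]) -> Dict[str, str]:
    """Run the outline -> article -> edit -> translate steps in order."""
    # Step 1: generate an article outline.
    outline = llm(f"Write an outline for an article about {topic}.")
    # Step 2: create an article from the outline.
    draft = llm(f"Write an article following this outline:\n{outline}")
    # Step 3: edit the article.
    edited = llm(f"Edit this article for clarity and style:\n{draft}")
    # Step 4: translate the edited article into each target language.
    translations = {
        lang: llm(f"Translate this article into {lang}:\n{edited}")
        for lang in languages
    }
    return {"outline": outline, "draft": draft, "edited": edited, **translations}

result = run_article_chain("time-series databases", fake_llm,
                           ["Spanish", "German", "French"])
for name, text in result.items():
    print(name, "->", text)
```

Swapping `fake_llm` for a function that calls a real model leaves the control flow unchanged, which is the main appeal of chaining the steps explicitly.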
James is a full-stack engineer who specialises in automating marketing and business processes with AI-based solutions.
Virtual | Tutorial | Generative AI | Prompt Engineering | Intermediate
Most people who call themselves prompt engineers are really just blind prompting: chatting with ChatGPT and manually reviewing the results through crude trial and error. If you’re planning to use a prompt at scale (as part of a template, workflow, or product), it’s essential you run that prompt 20-30 times via the GPT-4 API to see how often it fails. You also need to be rigorously A/B testing your prompt against alternative approaches to find what really makes a difference to results. LangChain is a development framework for Large Language Models (LLMs) like GPT-4, and it can help you build a consistent system for running, monitoring, and measuring the performance of your prompts, so you can optimize them against your success metric…more details
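To make the "run it 20-30 times and count failures" advice concrete, here is a small sketch of measuring a failure rate for two prompt variants. Everything in it is an assumption for illustration: `stub_model` is a hypothetical random stand-in rather than a GPT-4 or LangChain call, and the pass criterion `is_json_like` is just one possible success metric.

```python
import random
from typing import Callable

def failure_rate(prompt: str, model: Callable[[str], str],
                 passes: Callable[[str], bool], runs: int = 30) -> float:
    """Call the model `runs` times and return the fraction of failing outputs."""
    failures = sum(not passes(model(prompt)) for _ in range(runs))
    return failures / runs

rng = random.Random(0)  # seeded so repeated runs of the sketch behave the same

def stub_model(prompt: str) -> str:
    """Hypothetical model: follows a format instruction more often when it is explicit."""
    p_valid = 0.9 if "Respond only with JSON" in prompt else 0.6
    return '{"ok": true}' if rng.random() < p_valid else "Sure! Here you go..."

def is_json_like(output: str) -> bool:
    """Success metric for this sketch: the output looks like a JSON object."""
    return output.strip().startswith("{")

prompt_a = "List three product names."
prompt_b = "List three product names. Respond only with JSON."

rate_a = failure_rate(prompt_a, stub_model, is_json_like)
rate_b = failure_rate(prompt_b, stub_model, is_json_like)
print(f"variant A failure rate: {rate_a:.2f}")
print(f"variant B failure rate: {rate_b:.2f}")
```

With only 30 runs per variant the estimates are noisy, so a difference between A and B should be checked with more runs or a significance test before acting on it.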
Mike is a data-driven, technical marketer who built a 50 person marketing agency (Ladder), and 300k people have taken his online courses (LinkedIn, Udemy, Vexpower). He now works freelance on generative AI projects, and is writing a book on Prompt Engineering for O’Reilly Media.
Virtual | Tutorial | Machine Learning | Intermediate
The talk will explore technical approaches for responsible AI, including explainability, model validation, bias management, data privacy, and ML security. Additionally, we’ll discuss the formation of an impactful AI risk management practice. Moreover, the presentation will provide an introductory guide to existing standards, laws, and assessments for AI technology adoption, including the newly released NIST AI Risk Management Framework. Audiences will also be introduced to the available resources on GitHub and Colab, which serve as valuable tools for continued learning after the presentation…more details
Parul Pandey has a background in Electrical Engineering and currently works as a Principal Data Scientist at H2O.ai. Prior to this, she was working as a Machine Learning Engineer at Weights & Biases. Parul is one of the co-authors of the book Machine Learning for High-Risk Applications, which focuses on the responsible implementation of AI. She is also a Kaggle Grandmaster in the notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019. Parul has written multiple articles focused on Data Science and Software development for various publications, and she mentors, speaks, and delivers workshops on topics related to Responsible AI.
Virtual | Tutorial | Deep Learning | Machine Learning | Research Frontiers | Explainable AI | Intermediate
You have to see it to believe it! Imagine a technique where you randomly delete as many as 80% of your observations in the training set, without decreasing the predictive power (actually improving it in many cases), and reducing computing time by an order of magnitude. In its simplest version, that’s what stochastic thinning does. Here, performance improvement is measured outside the training set, on the validation set also called test data. I illustrate this method on a real-life dataset, in the context of regression and neural networks. In the latter, it speeds up the training stage by a noticeable factor. The thinning process applies to the training set, and may involve multiple tiny random subsets called fractional training sets, representing less than 20% of the training data when combined together. It can also be used for data compression, or to measure the strength of a machine learning algorithm…more details
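The simplest version described above (randomly delete about 80% of the training observations, then compare performance on held-out data) can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the model is ordinary least squares rather than a neural network, and the sketch does not implement the fractional training sets mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic regression data (assumption): linear signal plus Gaussian noise.
n = 5000
X = rng.normal(size=(n, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=n)

# Hold out a validation set that is never thinned.
X_train, y_train = X[:4000], y[:4000]
X_val, y_val = X[4000:], y[4000:]

def fit_and_score(X_fit, y_fit):
    """Fit least squares with an intercept and return the validation MSE."""
    A = np.column_stack([X_fit, np.ones(len(X_fit))])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    A_val = np.column_stack([X_val, np.ones(len(X_val))])
    return float(np.mean((A_val @ coef - y_val) ** 2))

# Thinning: keep each training observation independently with probability 0.2.
keep = rng.random(len(X_train)) < 0.2

mse_full = fit_and_score(X_train, y_train)
mse_thinned = fit_and_score(X_train[keep], y_train[keep])
print(f"kept {keep.sum()} of {len(X_train)} training rows")
print(f"validation MSE, full training set:    {mse_full:.4f}")
print(f"validation MSE, thinned training set: {mse_thinned:.4f}")
```

Whether thinning helps, hurts, or makes no difference depends on the dataset and model. The sketch only shows how to set up the comparison on data outside the training set.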
Virtual | Workshop | LLMs | GenAI | Beginner
Abstract Coming Soon!
Valentina is a Data Science MSc graduate and Cloud Specialist at Microsoft, focusing on Analytics and AI workloads within the manufacturing and pharmaceutical industry since 2022. She has been working on customers’ digital transformations, designing cloud architecture and modern data platforms, including IoT, real-time analytics, Machine Learning, and Generative AI. She is also a tech author, contributing articles on machine learning, AI, and statistics, and recently published a book on Generative AI and Large Language Models.
In her free time, she loves hiking and climbing around the beautiful Italian mountains, running, and enjoying a good book with a cup of coffee.
Virtual | Half-Day Training | All Levels
In this 90-minute hands-on lab, you’ll learn how to start your Graph journey with ArangoDB. We will walk you through getting a basic deployment up and running in minutes. We will then help you create a deployment in the ArangoGraph insights platform and learn how to load and query your data…more details
Asif Kazi has directed global sales engineering, professional services, and technical customer success teams for over 15 years at Foursquare, Ahana, Amazon Web Services, StreamSets, and Couchbase. He is passionate about working with customers and solving their business problems. His core competencies include scaling teams globally, building referenceable client relationships, and accelerating revenue growth.
Arthur Keen, Ph.D. is a Graph evangelist with more than 20 years of experience in Graph AI. His expertise spans Graph Machine Learning, graph analytics, and ontology. Arthur has developed knowledge graph-based solutions in intelligence, cyber, financial services, logistics, retail, and energy. He has led 3 product teams from concept to GA and first customers. He has been awarded 7 patents and has held leadership roles including CTO, Chief Scientist, and VP of Engineering. He has a Computer Science/Industrial Engineering Ph.D. and has been invited to speak at conferences including Data Day TX, NoSQL Now, SemTech, and SXSW.
In-person | Workshop | AI Safety
Abstract Coming Soon!
Ian Eisenberg is Head of AI Governance Research at Credo AI, where he advances best practices in AI governance to support Credo AI’s product and policy strategy. He is also the founder of the AI Salon, an SF-based group supporting conversations on the meaning and impact of AI. His interest in AI started as a cognitive neuroscientist at Stanford, which developed into a focus on the sociotechnical challenges of AI technologies and reducing AI risk. Ian has been a researcher at Stanford, the NIH, Columbia and Brown University. He received his PhD from Stanford University, and BS from Brown University.
In-person | Workshop | Generative AI | NLP&LLMs | Machine Learning | Intermediate
This session is relevant to practitioners in a variety of industries, including:
Creative industries: Stable Diffusion can be used to generate images for marketing materials, product designs, and other creative projects.
Technology industries: Stable Diffusion can be used to develop new applications for text-to-image generation, such as chatbots and virtual assistants.
Research industries: Stable Diffusion can be used to conduct research on text-to-image generation and its applications…more details
Sandeep Singh is a leader in applied AI and computer vision in Silicon Valley’s mapping industry, and he is at the forefront of developing cutting-edge technology to capture, analyze and understand satellite imagery, visual and location data. With a deep expertise in computer vision algorithms, machine learning and image processing and applied ethics, Sandeep is responsible for creating innovative solutions that enable mapping and navigation software to accurately and efficiently identify and interpret features to remove inefficiencies of logistics and mapping solutions. His work includes developing sophisticated image recognition systems, building 3D mapping models, and optimizing visual data processing pipelines for use in logistics, telecommunications and autonomous vehicles and other mapping applications. With a keen eye for detail and a passion for pushing the boundaries of what’s possible with AI and computer vision, Sandeep’s leadership is driving the future of applied AI forward.
In-person | Workshop | Machine Learning | Big Data Analytics | Data Visualization | Intermediate
As part of Salesforce’s Platform performance engineering team, we use Python to employ anomaly detection techniques and other analytics methods for monitoring customer data. This helps us gain real-time insights into the performance of our production data and its impact on customers. By promptly identifying anomalies and extracting valuable insights, we improve system reliability, reduce downtime, and enhance customer trust—core values at Salesforce…more details
Geeta Shankar is a software engineer who specializes in leveraging data for business success. With expertise in computer science, data science, machine learning, and artificial intelligence, she stays updated with the latest data-driven innovations. Her Indian classical music background has taught her the value of sharp thinking, spontaneity, and connecting with diverse individuals. Geeta uses these skills to translate complex data into meaningful insights that enhance performance and customer experiences.
In-person | Workshop | Machine Learning | MLOps and Data Engineering | Intermediate – Advanced
This talk will outline the complexity of feature engineering from raw entity-level data, the reduction in complexity that comes with composable compute graphs, and an example of the working solution. We will also discuss a case study of the impact on a logistics & supply chain machine learning problem. If you work on large scale MLOps projects, this talk may be of interest…more details
Wes is a machine learning expert with over a decade of experience delivering business value with AI. Wes’s experience spans multiple industries, but always with an MLOps focus. His recent areas of focus and interest are graphs, distributed computing, and scalable feature engineering pipelines.
In-person | Workshop | NLP & LLMs | GenAI | Intermediate
Large Language Models (LLMs) are starting to revolutionize how users can search for, interact with, and generate new content. Some recent stacks around Retrieval Augmented Generation (RAG) have emerged where users are building LLM search/retrieval applications (e.g. chatbots) on their own private data. Moreover, there is an opportunity to have an even richer set of interactions with data; by empowering LLM agents with both read and write capabilities over a set of diverse data tools, they hold the promise of automating knowledge workflows. LlamaIndex provides the core tools to build LLM-powered search and retrieval systems as well as more automated knowledge workers capable of interfacing with your data sources in more sophisticated manners. In this workshop, we help you build both a simple QA bot as well as an automated workflow agent, all powered by LLMs…more details
Jerry is the co-founder/CEO of LlamaIndex, an open-source tool that provides a central data management/query interface for your LLM application. Before this, he spent his career at the intersection of ML, research, and startups. He led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, and worked on recommendation systems at Quora. He graduated from Princeton in 2017 with a degree in CS.
In-person | Tutorial | Generative AI | Beginner-Intermediate
As the digital frontier expands, the inclusion of Generative and Autonomous AI technologies is becoming an integral part of the next evolutionary step in games, simulations, and the emergent Metaverse. In this in-depth, hands-off tutorial, we delve into the transformative potential of these cutting-edge AI paradigms, focusing on their capacity to create virtual landscapes that are not just visually stunning but also highly interactive, intelligent, and perpetually evolving. Participants will be led through a balanced curriculum that covers both the theoretical foundations and the practical techniques, right from core algorithms to real-world implementation…more details
Matt White is a distinguished expert in artificial intelligence and business, renowned for successfully deploying large-scale AI platforms across the telecom, gaming, media, and entertainment industries. With over two decades of experience, Matt has consistently demonstrated his ability to stay ahead of the curve in technological innovation, spearheading advancements in AI applications in diverse domains.
In-person | Half-Day Training | LLMs | Beginner-Intermediate
In this workshop, you will learn the tips and tricks of creating and fine-tuning LLMs along with implementing cutting-edge ideas for building these systems from the best research papers. We will start by learning the foundations behind what makes an LLM, quickly moving into fine-tuning our own GPT and finally implementing some of the cutting-edge tricks of building these models…more details
Sanyam Bhutani is a Sr Data Scientist and Kaggle Grandmaster at H2O where he drinks chai and makes content for the community. When not drinking chai, he is to be found hiking the Himalayas, often with LLM research papers. For the past 6 months, he has been writing about Generative AI every day on the internet. Before that, he was recognised for his #1 Kaggle Podcast, Chai Time Data Science, and became widely known on the internet for “maximising compute per cubic inch of an ATX case” by fixing 12 GPUs into his home office.
In-person | Workshop | LLMs | All Levels
In this workshop, we’ll explore how Label Studio, LangChain, Chroma, and Gradio can be employed as tools for continuous improvement, specifically in building a Question-Answering (QA) system trained to answer questions about an open source project, in this case Label Studio itself, using the domain-specific knowledge from Label Studio’s GitHub documentation…more details
Chris Hoge is the Head of Community for HumanSignal, where he is helping to grow the Label Studio community. He has spent over a decade working in open source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, with an emphasis on using high-performance numerical methods for simulating physical systems. He makes his home in the Pacific Northwest, where he spends his free time trail running and playing piano.
In-person | Tutorial | Machine Learning | Beginner-Intermediate
By completing this workshop, you will develop an understanding of no-code and low-code frameworks, how they are used in the ML workflow, how they can be used for data ingestion and analysis, and for building, training, and deploying ML models. You will become familiar with Google’s Vertex AI for both no-code and low-code ML model training, and Google’s Colab, a free Jupyter Notebook service for running Python and the Keras Sequential API, a simple and easy-to-use API that is well-suited for beginners. You will also become familiar with how to assess when to use low-code, no-code, and custom ML training frameworks…more details
Gwendolyn Stripling, Ph.D., is an Artificial Intelligence and Machine Learning Content Developer at Google Cloud. Stripling is author of the widely popular YouTube video, “Introduction to Generative AI” and of the O’Reilly Media book “Low-Code AI: A Practical Project Driven Approach to Machine Learning”. They are also the author of the LinkedIn Learning video “Introduction to Neural Networks”. Stripling is an Adjunct Professor and member of Golden Gate University’s Masters in Business Analytics Advisory Board. Stripling enjoys speaking on AI/ML, having presented at Dominican University of California’s Barowsky School of Business Analytics, Golden Gate University’s Ageno School of Business Analytics, and numerous Tech conferences.
In-Person | Workshop | NLP & LLMs | Intermediate-Advanced
The workshop will provide a comprehensive understanding of the challenges and intricacies involved in aligning LLMs. By learning to navigate through data preprocessing and quality assessment, participants will gain insights into identifying the most relevant data for fine-tuning LLMs effectively. Moreover, the practical application of reinforcement learning techniques will empower attendees to tailor LLMs for specific tasks, ensuring enhanced performance and precision in real-world applications…more details
Sinan Ozdemir is a mathematician, data scientist, NLP expert, lecturer, and accomplished author. He is currently applying his extensive knowledge and experience in AI and Large Language Models (LLMs) as the founder and CTO of LoopGenius, transforming the way entrepreneurs and startups market their products and services.
Simultaneously, he is providing advisory services in AI and LLMs to Tola Capital, an innovative investment firm. He has also worked as an AI author for Addison Wesley and Pearson, crafting comprehensive resources that help professionals navigate the complex field of AI and LLMs.
Previously, he served as the Director of Data Science at Directly, where his work significantly influenced their strategic direction. As an official member of the Forbes Technology Council from 2017 to 2021, he shared his insights on AI, machine learning, NLP, and emerging technologies-related business processes.
He holds a B.A. and an M.A. in Pure Mathematics (Algebraic Geometry) from The Johns Hopkins University, and he is an alumnus of the Y Combinator program. Sinan actively contributes to society through various volunteering activities.
Sinan’s skill set is strongly endorsed by professionals from various sectors and includes data analysis, Python, statistics, AI, NLP, theoretical mathematics, data science, function analysis, data mining, algorithm development, machine learning, game-theoretic modeling, and various programming languages.
Virtual | Half-Day Training | LLM | LangChain | Intermediate
During this workshop, we will walk through each component of a simple RAG system – from vector stores (we’ll use Pinecone) to embedding models (we’ll pick one from Hugging Face’s embedding leaderboard) to the LLM Ops infrastructure glue that holds it all together (LangChain of course!) – including how to think about each component conceptually in addition to how to set them up in Python code…more details
Dr. Greg Loughnane is the Founder & CEO of AI Makerspace, where he serves as lead instructor for their LLM Ops: LLMs in Production course. Since 2021 he has built and led industry-leading Machine Learning & AI bootcamp programs. Previously, he has worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris Alexiuk is the Head of LLMs at AI Makerspace, where he serves as a programming instructor, curriculum developer, and thought leader for their flagship LLM Ops: LLMs in Production course. During the day, he’s a Founding Machine Learning Engineer at Ox. He is also a solo YouTube creator, Dungeons & Dragons enthusiast, and is based in Toronto, Canada.
In-person | Tutorial | NLP & LLMs | Generative AI
Large language models have swept through the world of AI in the past few years. This talk will give some background on large language models, their origins from the early days of information theory, current capabilities, some pitfalls, and potential future capabilities with a focus on the innovations behind some recent models including the sparse LLM model GLaM and the recent model PaLM 2…more details
Andrew Dai did his PhD at the University of Edinburgh before joining Google Brain in 2014, where he did research on language models, story generation and conversational agents and products including SmartReply. He moved to Google Health in 2017 to research deep learning for medical records. He then returned to continue research at Google Brain (now Google DeepMind) in 2020 and since then has co-led the development and training of LLMs including PaLM 2 and GLaM. Andrew is also a lead for Google SGE modelling, Gemini and data research and is excited by the new abilities we see from LLMs.
In-person | Workshop | MLOps | Machine Learning | Intermediate
In this workshop, you will learn to build a Streamlit data application to help visualize the ROI of different advertising spends of an example organization…more details
Vino is a Developer Advocate for Snowflake. She started as a software engineer at NetApp, and worked on data management applications for NetApp data centers when on-prem data centers were still a cool thing. She then hopped onto the cloud and big data world and landed at the data teams of Nike and Apple. There she worked mainly on batch processing workloads as a data engineer, built custom NLP models as an ML engineer and even touched upon MLOps a bit for model deployments. When she is not working with data, you can find her doing yoga or strolling Golden Gate Park and Ocean Beach.
In-person | Workshop | Machine Learning | Intermediate
This workshop will show how to use XGBoost. It will demonstrate model creation, model tuning, model evaluation, and model interpretation…more details
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON as well as local user conferences.
In-person | Workshop | Machine Learning | Responsible AI | Explainability
This talk will examine the implications of incorporating graphs into the realm of generative AI, exploring the potential for even greater advancements. Learn about foundational concepts such as directed acyclic graphs (DAGs), Judea Pearl’s “do” operator, and keeping domain expertise in the loop. You’ll hear how the explainability landscape is evolving, comparisons of graph-based models to other methods, and how we can evaluate the different fairness models available…more details
Amy Hodler is an evangelist for graph analytics and responsible AI. She’s the co-author of O’Reilly books on Graph Algorithms and Knowledge Graphs as well as a contributor to the Routledge book, Massive Graph Analytics and Bloomsbury book, AI on Trial. Amy has decades of experience in emerging tech at companies such as Microsoft, Hewlett-Packard (HP), Hitachi IoT, Neo4j, Cray, and RelationalAI. Amy is the founder of GraphGeeks.org promoting connections everywhere.
Michelle is a technology leader who specializes in machine learning and cloud computing. She has 15 years of experience in the technology industry, contributed to the original IBM Watson showcased on Jeopardy, and enjoys building and leading teams that develop and deploy AI solutions to solve real-world problems. Michelle is passionate about diversity and STEM education/careers for our minority communities, and serves both on the board of Women in Data and as an avid volunteer for Girls Who Code.
This course is aimed at helping people begin their AI journey and gain valuable insights that we will build up in subsequent SQL, programming and AI courses…more details

This course covers topics such as database design and normalization, data wrangling, aggregate functions, subqueries, and join operations. Students will learn how to design and write SQL code to solve real-world problems…more details

This course aims to provide a basic foundation in Python and help participants develop the skills needed to progress in the field of data science and machine learning…more details

This AI literacy course is designed to introduce participants to the basics of artificial intelligence (AI) and machine learning.
Upon completion, individuals will have a foundational understanding of machine learning and its capabilities…more details

Virtual | Bootcamp | Machine Learning | Beginner
With the availability of data, there is a growing demand for talent who can analyze and make sense of it. This makes practical math all the more important because it helps infer insights from data. However, mathematics comprises many topics, and it is hard to identify which ones are applicable and relevant for a data science career. Knowing these essential math topics is key to integrating knowledge across data science, statistics, and machine learning. It has become even more important with the prevalence of libraries like PyTorch and scikit-learn, which can create “black box” approaches where data science professionals use these libraries but do not fully understand how they work…more details
Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He’s authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).
He is also the founder and inventor of Yawman Flight, a company developing universal handheld flight controls for flight simulation and unmanned aerial vehicles.

Virtual | Talk | Machine Learning | Intermediate
In this talk, I will attempt to provide several “bird’s eye” views on GNNs. Following a quick motivation on the utility of graph representation learning, I will derive GNNs from first principles of permutation invariance and equivariance. We will discuss how we can build GNNs that are not strictly reliant on the input graph structure…more details
Petar Veličković is a Staff Research Scientist at Google DeepMind, Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. Petar holds a PhD in Computer Science from the University of Cambridge (Trinity College), obtained under the supervision of Pietro Liò. His research concerns geometric deep learning: devising neural network architectures that respect the invariances and symmetries in data (a topic he has co-written a proto-book about). Petar’s research has been used in substantially improving travel-time predictions in Google Maps, and guiding intuition of mathematicians towards new top-tier theorems and conjectures.
Virtual | Talk | NLP & LLMs | Intermediate
Large language models (LLMs) have achieved a milestone that undeniably changed many held beliefs in artificial intelligence (AI). However, there remain many limitations of these LLMs when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, and due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable…more details
Walid Saba is a Senior Research Scientist at the Institute for Experiential AI at Northeastern University. Prior to joining the institute in 2023, he worked at two Silicon Valley startups, focusing on conversational AI. This work included high-level roles as the principal AI scientist for telecommunications company Astound and CTO of software company Klangoo, where he helped develop its state-of-the-art digital content semantic engine (Magnet).
Saba’s career to date has seen him hold various positions in both the private sector and academia. His resume includes entities such as the American Institutes for Research, AT&T Bell Labs, IBM and Cognos, while he has also spent a cumulative seven years teaching computer science at the University of Ottawa, the New Jersey Institute of Technology (NJIT), the University of Windsor (a public research university in Ontario, Canada), and the American University of Beirut (AUB).
Walid is frequently invited for interviews and as a keynote speaker on AI and NLP, and has published over 45 technical articles, including an award-winning paper that he presented at the German Artificial Intelligence Conference (KI-2008). Walid received his BSc and MSc in Computer Science from the University of Windsor, and a Ph.D. in Computer Science from Carleton University in 1999.
Virtual | Talk | NLP & LLMs | Beginner
Generative models have demonstrated how helpful they can be on general knowledge, helping students on their writing assignments. But as soon as you want to run them in a professional setting with prompts like “what are the three main feature requests from our largest customers?”, they demonstrate their lack of knowledge.
In this session, I will introduce how Large Language Models (LLMs) can be connected to your data via semantic search. As I will present, there are many pitfalls and challenges. Some can be solved with the right technologies; others are still open problems…more details
Nils Reimers is an NLP / Deep Learning researcher with extensive experience in representing text in dense vector spaces and using these representations for various applications. During his research career, he created sentence-transformers, which became the foundation for many of today’s semantic search applications.
In 2022, Nils joined Cohere.com to lead the team on smarter semantic search technologies and how to connect LLMs to enterprise data. Here, his teams develop new foundation models that can understand and reason over complex data.
In-person | Case Study
By adopting the proposed framework, organizations can effectively harness the potential of Gen AI, empowering all employees with invaluable insights to make data-driven decisions. This talk aims to inspire organizations to embrace a safe agile approach and cultivate a culture of experimentation and collaboration for the successful deployment of Gen AI solutions…more details
In-person | Talk | Data Visualization & Data Analysis | All Levels
Save time and money by equipping everyone with fundamental data literacy and analytics skills. It sounds great, but most organizations bite off more than they can chew. Instead of trying to turn everyone into a data analyst, teach people the most vital descriptive analytics skills, and eliminate the flow of tedious work requests to your data experts…more details
A TEDx speaker, Dom brings a wealth of data storytelling experience to StoryIQ from his career at QBE, one of Australia’s largest insurance companies. At QBE, he was a senior leader in data analytics and business improvement, presenting data-driven strategy recommendations to the company’s senior executives and producing reports for the Group Board of Directors.
In-person | Talk | MLOps | Data Engineering & Big Data | Machine Learning | Deep Learning | Intermediate
In this presentation, Amber Roberts, Machine Learning Engineer at Arize AI, will present findings from research on ways to measure vector/embedding drift for image and language models. With lessons learned from testing different approaches (including Euclidean and Cosine distance) across billions of streams and use cases, Roberts will dive into how to detect whether two unstructured language datasets are different — and, if so, how to understand that difference using techniques such as UMAP…more details
Amber Roberts is an ML Growth Lead at Arize AI, an ML observability company built for maintaining models in production. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.
In-person | Talk | Machine Learning | Beginner
Statistics do not come intuitively to humans; they always try to find simple ways to describe complex things. Given a complex dataset, they may feel tempted to use simple summary statistics like the mean, median, or standard deviation to describe it. However, these numbers are not a replacement for visualizing the distribution…more details
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a bachelor’s of science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
In-person | Track Keynote | Generative AI | All Levels
Generative AI has taken the world by storm, but building Gen AI applications for production comes with a unique set of challenges. Questions around cost performance, risk, simplifying implementation for production, setting guardrails, adding automation where possible and leveraging CI/CD for ML all become even more important when Gen AI is involved…more details
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in AI, cloud, data and networking to leading startups and enterprises since the late 1990s. As the Co-Founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift towards the production-first approach to data science and catering to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless framework with over 4,000 GitHub stars, and MLRun, a cutting-edge open source MLOps orchestration framework. Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA – NASDAQ: NVDA), where he led technology innovation, software development and solution integrations. He also served as the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007 and was later acquired by Mellanox (NASDAQ:MLNX). Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He sits on the Data Science Committee of the AI Infrastructure Alliance (AIIA), of which Iguazio is a founding member. He is co-authoring a book on Implementing MLOps in the Enterprise for O’Reilly. Yaron presents at major industry events worldwide and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.
Virtual | Talk | Machine Learning | Beginner – Intermediate
In this training session, we will delve into the intricacies of building robust and scalable recommendation engines specifically tailored for online food delivery services. We will introduce a newly released dataset called the Delivery Hero Recommendation Dataset (DHRD) and understand how this can be used for training different recommendation models. We will explore the challenges faced in this domain and discuss the techniques and best practices to overcome them, ensuring our recommendation systems can handle large-scale operations and adapt to changing customer preferences…more details
Raghav is a seasoned Data Science professional with over a decade’s experience of research & development of large-scale solutions in Finance, Digital Experience, IT Infrastructure and Healthcare for giants such as Intel, American Express, UnitedHealth Group and Delivery Hero. He is an innovator with 10+ patents, a published author of multiple well-received books & peer-reviewed papers and a regular speaker at leading conferences on topics in the areas of Generative AI, Recommendation Systems, Computer Vision, NLP, Deep Learning, Machine Learning and Augmented Reality.
Vishal is an experienced Data Science professional with 12+ years of experience. He is currently leading the Ranking and Recommendation topic at Delivery Hero, where he joined in May 2022. Before joining Delivery Hero, he worked at Monotype as a Data Science manager, leading the AI Research team in building products like WhatTheFont (identifying fonts from images), Font Similarity and many more. He has a couple of patents filed in his name.
Virtual | Talk | Machine Learning | LLMs | Intermediate
A highlight of this workshop is the introduction to the “causal LLM”, a pioneering concept where LLMs are designed using foundational causal principles. By the end of this session, participants will be equipped with the skills and knowledge to employ LLMs effectively in discerning complex causal models and pioneering advancements in the field of causal AI…more details
Robert Osazuwa Ness is a researcher at Microsoft Research and author of the book Causal Machine Learning. He leads the development of MSR’s causal machine learning platform and conducts research into probabilistic models for advanced causal reasoning. He has worked as a machine learning engineer in various machine learning startups. He attended graduate school at both Johns Hopkins SAIS (Hopkins-Nanjing Center) and Purdue University. He received his Ph.D. in Statistics from Purdue, where his dissertation research focused on Bayesian active learning models for causal discovery.
Virtual | Talk | MLOps and Data Engineering | Intermediate
With the wide adoption of generative artificial intelligence (AI), ensuring the robustness of machine learning models is becoming more crucial than ever. One of the most concerning security threats to machine learning (ML) is the potential for adversarial attacks, which exploit vulnerabilities of models to cause incorrect output. The changes these attacks make to the input are imperceptible to humans but sufficient for machine learning models to misclassify the data, potentially harming the end users. Therefore, including adversarial training in the ML lifecycle is important to consider as you build out your model and prepare it for production usage. Join this talk to learn how the symbiosis between ML security open source projects like Adversarial Robustness Toolbox and an ML operations (MLOps) project, Kubeflow, can streamline your machine learning workflow and improve your model’s robustness and security. Use the numerous benefits of MLOps for generating and defending your model to accelerate your development of secure machine learning models…more details
Anna Jung is a Senior ML Open Source Engineer at VMware, leading the open source team as part of the VMware AI Labs. She currently contributes to various upstream ML-related open source projects focusing on the project’s overall health, adoption, and innovation. She believes in the importance of giving back to the community and is passionate about increasing diversity in open source. When away from the keyboard, Anna is often at film festivals supporting independent filmmakers.
Virtual | Talk | Machine Learning | Deep Learning | Intermediate-Advanced
In this session we will deep dive into all the new developments and techniques in PyTorch and provide recommendations on how you can accelerate your models using native PyTorch code…more details
Supriya is an Engineering Manager working on PyTorch at Meta. Her team works on architecture optimization techniques like quantization and pruning, as well as other core components of PyTorch 2.0, thereby enabling users to run AI models on different HW efficiently using native PyTorch. Prior to Meta, she worked as a software engineer at Nvidia on improving their GPU Architecture and accelerating AI models via TensorRT for inference. Supriya has an MS in CSE from University of Michigan, Ann Arbor and a bachelor’s degree from BITS Pilani, India.
In-person | Talk | Data Analytics and Big data | Machine Learning | MLOps | Beginner – Intermediate
When choosing between architectures for your enterprise data environment, you always want to know how to manage the trade-offs in the CAP theorem. The CAP theorem states that you cannot have consistency, availability, and partition tolerance at the same time, but what if choosing a Kappa architecture makes it possible to have it all?…more details
Joep has more than 12 years of experience developing, engineering, architecting and visualising data products in various markets ranging from energy to clothing manufacturing. He focuses on enabling teams to be better at handling data and providing the teams with the tools and the knowledge needed to go live and to stay in production.
He was a member of the Teqnation program committee and has presented on Kafka and Hue usage during football, developing and deploying on Hololens, Total Devops using Gitlab, Evolution of a datascience product, Using the elastic stack from PoC to Production, and Xbox Kinect on a bike at Devoxx London.
Virtual | Talk | Responsible AI | Intermediate
Digital experimentation and A/B testing are invaluable methodologies within the AI domain, facilitating the validation and continuous improvement of models, solutions, and systems. This talk will delve into the intricate role of these testing mechanisms in the AI landscape.
We will start by introducing the concept of digital experimentation and A/B testing, elaborating on their integral role in testing hypotheses and making data-driven decisions. The discussion will further touch upon the traditional uses of these methodologies in the digital marketing sphere, enabling businesses to optimize their online content and increase user engagement…more details
Alessandro is a highly experienced data scientist with a Bachelor’s degree in computer science and a Master’s in data science. He has collaborated with a variety of companies and organizations and currently holds the role of senior data scientist at logistics giant Kuehne+Nagel. Alessandro is particularly passionate about statistics and digital experimentation and has a strong track record of applying these skills to solve complex problems. He shares his knowledge regularly, speaking at events like the Data Innovation Summit and DataMass Gdansk Summit.
In-person | Talk | LLMs | Machine Learning | Intermediate
In the realm of advanced computational linguistics, the efficacy of Large Language Models (LLMs) is intrinsically tied to their contextual precision. In this session presented by Jake from Cloudera (not State Farm), we’ll navigate the complexities of ensuring LLMs yield contextually accurate results, a necessity in today’s intricate data environments. Crucially, attendees will be treated to a live demonstration showcasing the utilization of RAG (Retrieval-Augmented Generation) and PEFT (Parameter Efficient Fine-Tuning) techniques, two of the leading approaches for this task that underpin the success of Cloudera’s Applied ML Prototypes (AMPs)…more details
Jake currently holds the position of Principal Technical Evangelist at Cloudera, where he promotes the strengths of Cloudera’s Lakehouse for delivering trusted AI. His tenure at Cloudera began as a Senior Product Marketing Manager for Cloudera Machine Learning (CML).
Before Cloudera, Jake developed his ML expertise at ExxonMobil, starting as a Data Scientist and later transitioning to a Data Science and Analytics Solution Architect role. He also contributed significantly at FarmersEdge, taking on responsibilities as a Senior Data Scientist and subsequently as a Data Science Manager.
Jake earned both his bachelor’s and master’s degrees from Brigham Young University in Information Systems Management with an emphasis in Statistics.
Outside of work, Jake is passionate about outdoor activities. He enjoys skiing, golfing, rafting, and hiking. However, spending time with his family amidst the mountains remains his most rewarding pastime.
In-person | Talk | NLP & LLMs | GenAI | Advanced
In this talk, I will present data2vec, a framework for general self-supervised learning that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches…more details
Michael Auli is a principal research scientist/director at FAIR in Menlo Park, California. His work focuses on speech and NLP and he helped create projects such as wav2vec/data2vec, the widely used fairseq toolkit, the first modern feed-forward seq2seq models outperforming RNNs for NLP, and several top ranked submissions at the WMT news translation task in 2018 and 2019. Before that Michael was at Microsoft Research, where he did early work on neural machine translation and using neural language models for conversational applications. During his PhD at the University of Edinburgh he worked on natural language processing and parsing. http://michaelauli.github.io
Virtual | Talk | Machine Learning Safety and Security | Intermediate
This session will focus on security threats to machine learning models, including a demonstration of the similarities and differences between adversarial attacks in the domains of computer vision and natural language processing (NLP) with examples from open source projects like Adversarial Robustness Toolbox and TextAttack. Join the session to discuss applying adversarial research to real-world systems and learn how to be proactive with machine learning security…more details
Teodora is an open source software engineer at VMware AI Labs. During her first couple of years at VMware, as part of the Open Source Program Office, she was an active contributor and maintainer of The Update Framework (TUF), a framework for securing software update systems. Currently, she invests her time in open source projects related to machine learning security.
Virtual | Talk | NLP | Beginner
In this talk, we will delve into the intricate relationship between NLP task definition and the outcomes of NLP system development, exploring how the precise formulation of tasks can either propel or hinder progress. Through case studies and examples, attendees will understand how ambiguous or biased task definitions can lead to misguided model objectives, high data acquisition costs, and misleading system evaluations, and learn how to strike the right balance between specificity and adaptability in NLP task design…more details
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.
In-person | Talk | All Levels
Open source technology has historically driven technological progress, from operating systems, to web development, and now, AI and machine learning. We’re at a point in history where the canonical “open source ML stack” has yet to be discovered. In this talk, you’ll hear about HPE’s approach to building an end-to-end ML stack, including Determined AI for model training and Pachyderm for data preparation. You’ll also see a live demo of how to use Determined AI for a transfer learning use case in the biomedical image domain. We’ll also touch on some key collaborations, including our partnerships with the AI Infrastructure Alliance, NVIDIA, and Aleph Alpha…more details
Isha Ghodgaonkar is a Machine Learning Developer Advocate at Hewlett Packard Enterprise (HPE), working within the HPC, AI & Labs business unit, which includes open source platforms Determined AI and Pachyderm. Prior to HPE, Isha worked as an Autonomous Systems Engineer at MITRE, a Data Science Intern at Genentech, and a technical intern at The Aerospace Corporation. Isha is a graduate of Purdue University.
In-person | Ai X Talk | Intermediate
In an era of Generative AI and Large Language Models, using old school machine learning might sound quaint. The reality is, however, that hidden within your corporate data is a trove of information and business patterns. Leveraging key techniques from Machine Learning can help businesses uncover hidden patterns that will have immediate impact on your bottom line and on your ROI…more details
Ryohei is the Founder & CEO of dotData. Prior to founding dotData, he was the youngest research fellow ever in NEC Corporation’s 119-year history, a title awarded to only six individuals among 1000+ researchers.
During his tenure at NEC, Ryohei was heavily involved in developing many cutting-edge data science solutions with NEC’s global business clients, and was instrumental in the successful delivery of several high-profile analytical solutions that are now widely used in industry.
Ryohei received his Ph.D. degree from the University of Tokyo in the field of machine learning and artificial intelligence.
In-person | Ai X Talk
Abstract Coming Soon!
Cai GoGwilt has always been passionate about building products that make people more effective at their jobs. He is an MIT graduate in Computer Science and Physics, as well as a former software engineer at Palantir Technologies. By day, he’s a celebrated software engineer and thoughtful leader; by night, he’s a concert cellist and karaoke enthusiast.
In-person | Talk | NLP & LLMs | GenAI | Intermediate
There seems to be a new large ML model grabbing headlines every week. Whether it’s OpenAI’s big releases like GPT-3, DALL-E 2 or Whisper, or one of the many open source projects generating state-of-the-art models, like Stable Diffusion, OpenFold or Craiyon, these models have found their way into the mainstream. Lukas will map the landscape for you and share how these teams use W&B to accelerate their work…more details
Lukas Biewald is the founder and Chief Data Scientist of CrowdFlower. Founded in 2009, CrowdFlower is a data enrichment platform that taps into an on-demand workforce to help companies collect training data and do human-in-the-loop machine learning.
Following his graduation from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science, Lukas led the Search Relevance Team for Yahoo! Japan. He then worked as a senior data scientist at Powerset, acquired by Microsoft in 2008. Lukas was featured in Inc Magazine’s 30 Under 30 list.
Lukas is also an expert level Go player.
In-person | Ai X Talk
Abstract Coming Soon!
Dr. Alex Liu is the founder and Director of the RMDS Lab, a data and AI ecosystem service provider. From 2013 to 2019, Alex was one of the data science thought leaders and a distinguished data scientist at IBM, where he served as a Chief Data Scientist for analytics services. Before joining IBM, Dr. Liu worked as a chief data scientist for a few companies including iSKY and Retention Science. Previously, Dr. Liu taught advanced quantitative methods to PhD candidates at the University of Southern California and the University of California at Irvine, while consulting for many well-known organizations such as the United Nations and Ingram Micro. Alex has an MS in Statistical Computing and a PhD in Sociology from Stanford University.
Virtual | Talk | LLMs | ML/AI Safety and Security | All Levels
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you…more details
Patrick Hall is an assistant professor of decision sciences at the George Washington University School of Business, teaching data ethics, business analytics, and machine learning classes. He also conducts research in support of NIST’s AI risk management framework and is affiliated with leading fair lending and AI risk management advisory firms.
Patrick studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University. He has been invited to speak on AI and machine learning topics at the National Academies of Sciences, Engineering, and Medicine, ACM SIG-KDD, and the Joint Statistical Meetings. He has been published in outlets like Information, Frontiers in AI, McKinsey.com, O’Reilly Ideas, and Thomson Reuters Regulatory Intelligence, and his technical work has been profiled in Fortune, Wired, InfoWorld, TechCrunch, and others. Patrick is the lead author of the book Machine Learning for High-Risk Applications.
Prior to joining the GW School of Business, Patrick co-founded BNH.AI, a boutique law firm focused on AI governance and risk management. He led H2O.ai’s efforts in responsible AI, resulting in one of the world’s first commercial applications for explainability and bias mitigation in machine learning. Patrick also held global customer-facing roles and R&D roles at SAS Institute. Patrick has built machine learning software solutions and advised on matters of AI risk for Fortune 100 companies, cutting-edge startups, Big Law, and U.S. and foreign government agencies.
In-person | Talk | Deep Learning | Machine Learning | Intermediate
Keras, the popular deep learning library, is now becoming multi-backend with support for TensorFlow, JAX, and PyTorch. Available now, Keras Core allows developers to create models with all of the simple high-level components of Keras while interchanging between frameworks to take advantage of the benefits of each. Existing users of each framework can seamlessly integrate their code, including backend-specific code, with Keras. Additionally, Keras Core’s ops suite, which includes the NumPy API, allows you to develop custom components for use on any framework. For current users of tf.keras, Keras Core with the TensorFlow backend serves as a drop-in replacement, meaning that changing your imports is all that is necessary to start taking advantage of the framework-agnostic future…more details
Neel is currently an engineer on the Keras team at Google. His work has been focused on open source development, adding major features to the latest releases of TensorFlow and Keras. With broad experience ranging from data science to ML infrastructure, he is responsible for the saving and export of ML models and works on cross-framework compatibility at Google. As a key maintainer of keras.io and tensorflow.org, Neel is passionate about educating the data science community at large and helping the latest innovations of AI reach everyone.
In-person | Talk | Beginner – Intermediate
Modern manufacturers are looking for data science solutions to help move their 3 key metrics: produce more, be efficient and optimize resource utilization, and ship with the highest possible quality. Given these 3 KPIs, we will look at how data science influences each of them and deep dive into the key projects that have moved these KPIs, e.g., how we use recursive hypothesis testing to reduce the testing frequency, how we understand the physics behind a test and optimize it, and how we make mass production decisions based on design of experiments and small distribution analysis…more details
Angad Arora is a seasoned professional with a wealth of expertise in manufacturing and quality for leading tech companies. With his innovative approach to lean manufacturing using data science, he has successfully saved millions of dollars for consumer electronics manufacturing lines. Currently serving as a Product Quality Manager at Google Inc., Angad continues to drive excellence and ensure top-notch product quality in the dynamic world of technology.
In-person | Talk | Beginner
Little is known about the history of Neural Networks. For instance, “Artificial Intelligence.” The history goes back to around 1946, when researchers noticed that the linear algebra describing how stretching a material affects its neighboring atoms was best modeled with a branch of math known as tensors. Tensors are used today to create neural networks. Neural networks are run on powerful Graphics Processing Units (GPUs). GPUs came about because of video games and entertainment. Thus we can say that video games laid the groundwork for AI…more details
Jack McCauley is an Innovator in Residence at the Jacobs Institute for Design Innovation at UC Berkeley, a professor at UC Berkeley, a co-founder of Oculus, and an American engineer, hardware designer, inventor, video game developer and philanthropist. Jack is best known for designing the guitars and drums for the Guitar Hero video game series, and as a co-founder and former chief engineer at Oculus VR. At Oculus, Jack designed and built the Oculus DK1 and DK2 virtual reality headsets. Oculus was acquired by Facebook for $2 billion. McCauley holds numerous U.S. patents for inventions in software, audio effects, virtual reality, motion control, computer peripherals, and video game hardware and controllers. Jack was awarded a full scholarship to attend the University of California, Berkeley, where he earned a BSc in Electrical Engineering and Computer Science (EECS) in 1986. Jack has authored numerous research papers in the field of artificial intelligence (AI) and mathematical modeling of AI-based systems and is currently pursuing new projects at his private R&D facility and hardware incubator in Pleasanton, California.
In-person | Talk | LLMs | Machine Learning | Intermediate
In the fast-paced world of data science and AI, we will explore how large language models (LLMs) can elevate the development process of Apache Spark™ applications.
We’ll demonstrate how LLMs can simplify SQL query creation, data ingestion, and DataFrame transformations, leading to faster development and more precise code that’s easier to review and understand. We’ll also show how LLMs can assist in creating visualizations and clarifying data insights, making complex data easy to understand…more details
Denny Lee is a long-time Apache Spark™ and MLflow contributor, Delta Lake maintainer, and a Sr. Staff Developer Advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale data platforms and predictive analytics systems. He has previously built enterprise DW/BI and big data systems at Microsoft, including Azure Cosmos DB, Project Isotope (HDInsight), and SQL Server. He was also the Senior Director of Data Sciences Engineering at SAP Concur. He also has a Masters of Biomedical Informatics from Oregon Health & Science University and has implemented powerful data solutions for enterprise Healthcare customers. His current technical focuses include Distributed Systems, Delta Lake, Apache Spark, Deep Learning, Machine Learning, and Genomics.
In-person | Talk | All Levels
Attendees will learn how to apply NLP to financial data. Earnings data covers all industries in the US; attendees will be surprised to learn how much information is hiding in plain sight. Earnings calls are public information and I’m going to show you what you’re able to learn from them using ASR and NLP tools.
Speech recognition technology has come a long way in recent years, but there are still some larger market challenges when it comes to transcribing speech accurately. This comes into play especially for domain-specific terminology around certain industries. For example, in the renewable energy industry, transcription might include terms like “hydro DM,” “biomass conversion,” and “renewable portfolio standard,” that are outside normal conversation…more details
Katie is the Product Lead for Scribe at Kensho. Scribe is an Artificial Intelligence, Natural Language Processing tool that does voice to text transcription. Katie’s background is in data pipeline management and machine learning – specifically unsupervised machine learning in aerospace to detect satellite collisions. She is a guest lecturer at Johns Hopkins University and graduated with distinction from the University of Virginia.
In-person | Talk | Generative AI | NLP | Machine Learning | Research Frontiers (Research topics in ML, DL, Data Science etc) | All Levels
This talk will dive into the end-to-end process of and framework for building a generative AI application, leveraging a fun and engaging case study with open-source tooling (e.g., HuggingFace models, Python, PyTorch, Jupyter Notebooks). We will guide attendees through key stages from model selection and training to deployment, while also addressing fine-tuning versus prompt-engineering for specific tasks, ensuring the quality of output, and mitigating risks…more details
Michelle is a technology leader who specializes in machine learning and cloud computing. She has 15 years of experience in the technology industry, contributed to the original IBM Watson showcased on Jeopardy, and enjoys building and leading teams that develop and deploy AI solutions to solve real-world problems. Michelle is passionate about diversity and STEM education/careers for our minority communities, and serves both on the board of Women in Data and as an avid volunteer for Girls Who Code.
In-person | Keynote | Machine Learning | All Levels
We have seen amazing technical progress in AI applications in recent years. This talk considers the human side rather than the technical side: how can we gain confidence that our applications will be fair, just, truthful, beneficial, and well-suited for their users, the other stakeholders, and society at large?…more details
Peter Norvig is a Distinguished Education Fellow at Stanford’s Human-Centered Artificial Intelligence Institute and a researcher at Google Inc; previously he directed Google’s core search algorithms group and Google’s Research group. He was head of NASA Ames’s Computational Sciences Division, where he was NASA’s senior computer scientist and a recipient of NASA’s Exceptional Achievement Award in 2001. He has taught at the University of Southern California, Stanford University, and the University of California at Berkeley, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He was co-teacher of an Artificial Intelligence class that signed up 160,000 students, helping to kick off the current round of massive open online classes. His publications include the books Data Science in Context (to appear in 2022), Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg Powerpoint Presentation and the world’s longest palindromic sentence. He is a fellow of the AAAI, ACM, California Academy of Sciences and American Academy of Arts & Sciences.
In-person | Business Talk | All Levels
By harnessing the power of machine learning and AI, companies can unlock advanced capabilities and gain valuable insights that streamline sales processes and enhance sellers’ performance. This infusion eliminates the laborious task of manual information analysis across multiple disparate data sources, empowering sellers to dedicate their time and expertise to building meaningful customer relationships. In architecting AI-based solutions, it is highly important to understand the users’ daily workflow and challenges, and to not just surface insights but surface the right information at the right place to simplify users’ workflow…more details
In-person | Ai X Talk | Generative AI
In this session you will learn about some early experiences and best practices for deploying Generative AI solutions in enterprises…more details
In-person | Ai X Talk | Intermediate
A critical success factor for enterprise-level AI adoption and sponsorship is the ability to rapidly deploy AI technologies, solve use cases quickly, and deliver iterative business value to stakeholders. In this session, we will discuss the challenges of AI adoption and share some pragmatic approaches that enterprises can adopt to accelerate their AI / ML initiatives and deliver quick and iterative business value. Examples of such approaches include leveraging pre-integrated AI frameworks and automation techniques such as AutoML, along with ready-to-use industry-specific AI solutions. The faster and more incremental the delivery of business value through AI, the higher the likelihood of successful adoption and implementation of AI within enterprises…more details
Durga Kota is the Chief Technology Officer for the Americas region and is responsible for leading the strategies to translate Fujitsu technology innovation into differentiated and industry-ready solutions for the region’s customers.
Throughout his 24-year career, Durga has held customer-facing, consultative solution-selling and technology leadership roles. His collaborative spirit and deep technical expertise enable him to build trust, work in true partnership with customers, and provide solution leadership that leverages advanced technologies to solve even the most complex challenges.
Durga holds an MBA from the Indian Institute of Foreign Trade (IIFT), Delhi, and a B.Tech. in Computer Science and Engineering from the Jawaharlal Nehru Technological University, Hyderabad, India.
Durga resides in the San Diego area with his family and likes to spend his free time trekking on one of the many trails in the area, or volunteering for the local food bank.
In-person | Talk | Intermediate-Advanced
In today’s data-driven world, recommendation systems have become ubiquitous, driving user engagement and increasing revenue for businesses across various domains. This talk will take you on a journey from the raw data to vectorization techniques, ultimately culminating in the creation of a robust recommendation system. Whether you’re a seasoned data scientist or just starting your journey into recommendation systems, this presentation will provide valuable insights and practical takeaways for building powerful recommendation engines…more details
Hudson Buzby is a dynamic Solutions Architect with a passion for leveraging technology to solve complex challenges. With over a decade of experience in the tech industry, Hudson has made significant contributions to the world of data engineering and cloud solutions. Hudson currently serves as a Solutions Architect at Qwak, where he continues to push the boundaries of what’s possible in the world of cloud computing and data management.
In-person | Talk | NLP & LLMs | Intermediate-Advanced
In this session, we will take a deep dive into a novel application of AI: training Large Language Models (LLMs) on individual employees’ Slack messages. The first portion of our discussion is dedicated to the technical aspects of this process, where we will explain the steps involved in fine-tuning the LLM. We will demonstrate how such models, tailored to mimic specific individuals’ textual styles, can serve as the foundation for applications in text generation and automated question answering systems.
Transitioning into the second part of our talk, we will spotlight the often-underemphasized side of AI deployment – the risks and ethical concerns. Using our project as a case study, we will expose you to the potential pitfalls we encountered and the proactive measures taken to manage these risks. Our aim is to underline the necessity of an encompassing risk management framework for AI…more details
Eli is CTO and Co-Founder at Credo AI. He has led teams building secure and scalable software at companies like Netflix and Twitter. Eli has a passion for unraveling how things work and debugging hard problems. Whether it’s using cryptography to secure software systems or designing distributed system architecture, he is always excited to learn and tackle new challenges. Eli graduated with an Electrical Engineering and Computer Science degree from U.C. Berkeley.
In-person | Talk | Machine Learning | Deep Learning | Intermediate
In this talk I will present a new, hybrid approach which combines the best aspects of both methods. The process begins with a careful observation of customer data and assessment of whether there are naturally formed clusters in the data. It continues with the selection of a clustering algorithm and the fine-tuning of a model to create clusters…more details
Evie Fowler is a data scientist based in Pittsburgh, Pennsylvania. She currently works in the healthcare sector leading a team of data scientists who develop predictive models centered on the patient care experience. She holds a particular interest in the ethical application of predictive analytics and in exploring how qualitative methods can inform data science work. She holds an undergraduate degree from Brown University and a master’s degree from Carnegie Mellon.
In-person | Talk | Generative AI | Machine Learning Safety and Security | Beginner-Intermediate
In today’s digital world, we have access to everything at our fingertips. We generate and consume data at lightning speed. Data is growing at an exponential rate. But how safe is your data? Our data is always surrounded by advanced persistent threats (APTs). On the other hand, Data Science, Machine Learning & Artificial Intelligence (AI), with sub-domains like Deep Learning and Large Language Models (LLMs), are making tremendous progress. In this talk, I will focus on advanced technologies like LLMs and GPT models in the domain of cyber security, though some of these concepts are relatable to other industries as well…more details
Nirmal Budhathoki is a Senior Data Scientist who is currently working at Microsoft in Cloud Security. Nirmal has over 12 years of experience in the IT industry, including more than 5 years in data science. Nirmal’s strong belief in continuous learning has led him to complete three master’s degrees with majors in Information Systems, Business Administration, and Data Science. Nirmal loves to help the data science community and has completed over 600 free mentoring sessions with aspiring data scientists to help them navigate their data science career. Nirmal also conducts mentored learning sessions for MIT’s Data Science and Machine Learning certification program in collaboration with Great Learning. Nirmal has experience working with the US government for the Department of the Navy, and he is also a US Army veteran. Nirmal yearns to solve data science problems that are aligned with product strategy and business outcomes. In his free time, Nirmal loves using his data science skills in sports analytics.
In-person | Talk | Machine Learning | MLOps and Data Engineering | All Levels
The robotaxi industry has one of the most extreme use cases for Machine Learning, with very large image/LiDAR datasets and resource-hungry training jobs. I will recount my experience leading the ML Infrastructure team at Cruise, the #1 robotaxi company. From training on desktop GPUs to being entirely cloud-based, I will detail the challenges we faced and how we solved them. I will also detail what guarantees need to be built into any production-grade ML platform to ensure fast, robust, and safe ML development…more details
Emmanuel Turlay is the founder and CEO of Sematic, an open-source ML infrastructure company. Emmanuel started his career in academia researching particle physics at CERN, before branching out into tech and moving to the US in 2014. He led engineering teams at Instacart, and then Cruise, where he led the ML Infrastructure team for four years. In 2022, Emmanuel founded Sematic to bring learnings from building ML infrastructure for robotaxis to the rest of the industry in an open-source manner.
In-person | Talk | MLOps | Data Engineering & Big Data | All Levels
Over the past decade, the democratization of data science tooling, particularly through Python libraries like pandas and NumPy, has empowered practitioners of all levels to work with data efficiently. Yet, despite the popularity of these tools, they present challenges as practitioners look to scale their workflows to production. In this talk, we explore the limitations of these tools and pain points that data scientists encounter when working with data at scale. I will share how our open-source project Modin (10M+ downloads) addresses this issue by seamlessly scaling up your pandas code with just a single line of code change…more details
Doris Lee is the CEO and co-founder of Ponder (https://ponder.io/). Doris received her Ph.D. from the UC Berkeley RISE Lab and School of Information in 2021, where she developed tools that help data scientists explore and understand their data. She is the recipient of Forbes 30 under 30 for Enterprise Technology in 2023.
In-person | Talk | MLOps | Intermediate
Ensuring proper data quality is critical in the effective implementation of data pipelines for ML, data science, geospatial analysis, or general analytics.
Most engineering teams address data quality and pipeline orchestration as two separate tasks. In this presentation, Sandy Ryza will explain the benefits of a model in which arbitrary checks are included in the data orchestration logic, resulting in better control and integration of data quality checks at various steps in the pipeline…more details
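The idea of attaching checks to the orchestration logic can be illustrated with a minimal sketch. This is plain Python, not Dagster's API: the `Step` class, `run_pipeline` function, and the check functions are all hypothetical names invented for illustration. Each step carries its own checks, and the runner stops the pipeline as soon as a check fails.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

Check = Callable[[Any], bool]


class DataQualityError(Exception):
    """Raised when a check attached to a pipeline step fails."""


@dataclass
class Step:
    # Hypothetical container: a transformation plus the checks run on its output.
    name: str
    fn: Callable[[Any], Any]
    checks: List[Check] = field(default_factory=list)


def run_pipeline(steps: List[Step], data: Any) -> Any:
    """Run steps in order, validating each step's output before moving on."""
    for step in steps:
        data = step.fn(data)
        for check in step.checks:
            if not check(data):
                raise DataQualityError(
                    f"check '{check.__name__}' failed after step '{step.name}'"
                )
    return data


# Example checks (assumed for illustration).
def non_empty(rows):
    return len(rows) > 0


def no_missing_amounts(rows):
    return all(r.get("amount") is not None for r in rows)


# Example transformations (assumed for illustration).
def drop_incomplete(rows):
    return [r for r in rows if r.get("amount") is not None]


def to_cents(rows):
    return [{**r, "amount": round(r["amount"] * 100)} for r in rows]


raw = [
    {"id": 1, "amount": 12.5},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 3.0},
]

steps = [
    Step("clean", drop_incomplete, [non_empty, no_missing_amounts]),
    Step("convert", to_cents, [non_empty]),
]

result = run_pipeline(steps, raw)
print(result)
```

Because the checks live next to the step they validate, a failure is reported at the point in the pipeline where the bad data first appears rather than in a separate downstream job.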
Sandy is a lead engineer, author, and thought leader in the domain of data engineering. Sandy co-wrote “Advanced Analytics with PySpark” and “Advanced Analytics with Spark”. He led ML and data science teams at Cloudera, Remix, Clover Health, and KeepTruckin.
Sandy is currently the lead engineer on the Dagster project, an open-source data orchestration platform used in MLOps, data science, IoT, and analytics. Sandy is a regular speaker at data engineering and ML conferences.
In-person | Talk | Deep Learning | MLOps and Data Engineering | Intermediate – Advanced
In this talk, we’ll provide an overview of the core ideas behind Saturn, how it works on a technical level to reduce runtimes & costs, and the process of using Saturn for large-model finetuning. We’ll demonstrate how Saturn can accelerate and optimize large-model workloads in just a few lines of code and describe some high-value real-world use cases we’ve already seen in industry & academia…more details
Kabir Nagrecha is a Ph.D. candidate at UC San Diego, working with Professors Arun Kumar & Hao Zhang. His work focuses on systems infrastructure to support deep learning at scale, aiming to democratize large models and amplify the impact of machine learning applications. He is the recipient of the Meta Research Fellowship, as well as fellowships from the Halicioglu Data Science Institute and Jacobs School of Engineering at UCSD.
Kabir is the youngest-ever Ph.D. student at UCSD, having started his doctorate at the age of 17. He’s previously worked with companies such as Apple, Meta, & Netflix to build the core infrastructure that supports widely-used services such as Siri & Netflix’s recommendation algorithms. Most recently, he’s been working on Saturn, a new system to support automatic parallelization, scheduling, and resource apportioning for training large neural networks.
In-person | Talk | Machine Learning | Intermediate
Many retail giants are experiencing huge inventory loss and shrinkage problems because they process a huge number of transactions every day across the U.S. and offer liberal shopping policies to provide a convenient shopping and return experience. Investigating customers’ return behaviors and preventing fraudulent activities is a challenging task for two reasons: 1) the transaction data sets are highly imbalanced and complex, since the volume of transactions is enormous while the different types of anomalies are exceedingly rare; and 2) predefined labels are seldom available, since it is not feasible to have human experts manually review every transaction and identify anomalies…more details
Chuying (Annie) Ma is a senior data scientist on the Walmart Inkiru team, where she works on developing and implementing machine learning models and strategies for real-time fraud detection and risk mitigation. She has a Master of Science degree in Biostatistics from Harvard University and a bachelor’s degree in Statistics and Mathematics from the University of Michigan – Ann Arbor. In her spare time, she enjoys playing violin and ukulele.
In-person | Talk | LLMs | Machine Learning | Intermediate
In this presentation, we explore an approach that utilizes LLMs to guide feature engineering. By leveraging the contextual understanding within LLMs, we have developed a system for LLM-assisted feature engineering. Our research demonstrates the practical benefits of this synergy – from feature ideation to improved feature relevance to enhanced model interpretability and efficiency…more details
Sergey is a data scientist with a background in physics and neurobiology. FeatureByte is Sergey’s second startup. He was one of the first employees at DataRobot, where he created and led a professional services group and helped the company grow into a unicorn. Sergey is widely known for being a Kaggle Grandmaster and holding the #1 rank on Kaggle in the past. Various publications have named him one of the top data scientists multiple times. Sergey’s passion is in machine learning, predictive modeling, and inventive feature engineering.
In-person | Talk | Machine Learning
Deploying advanced Machine Learning technology to serve customers and/or business needs requires a rigorous approach and production-ready systems, which has led to the development of MLOps. Large models make rigorous engineering and scalable architectures even more important, which in turn has led to the emergence of LMOps. Just the size of the models themselves, and the datasets used for training, require highly efficient infrastructure. More complex pipeline topologies which include transfer learning, fine tuning, instruction tuning, and evaluation along a collection of complex dimensions, require a high degree of flexibility for customization. Add to this the complex inference-time systems and requirements, and LMOps starts to look somewhat challenging to implement…more details
A data scientist and ML enthusiast, Robert has a passion for helping developers quickly learn what they need to be productive. Robert is currently the Senior Product Manager for TensorFlow Open-Source and MLOps at Google and helps ML teams meet the challenges of creating products and services with ML. Previously Robert led software engineering teams for both large and small companies, always focusing on moving fast to implement clean, elegant solutions to well-defined needs. You can find him on LinkedIn at robert-crowe.
In-person | Talk | Deep Learning | NLP | Machine Learning | MLOps and Data Engineering | Beginner – Intermediate
We discuss a unified and user-friendly approach to developing ML applications on MySQL HeatWave AutoML. We describe a simple-to-use MySQL API for HeatWave AutoML, its extension for different use cases, and its ability to interface with third-party applications. We also discuss how this facilitates the development of classification, regression, anomaly detection, forecasting, and recommendation system use cases, and compare it with other platforms that offer ML features for databases. Lastly, we present how a user can fine-tune the AutoML pipeline for their needs…more details
Sanjay Jinturkar is Senior Director, MySQL HeatWave, focused on building machine learning capabilities inside the MySQL HeatWave database. These capabilities enable the user to automatically develop and deploy machine learning models inside the database using AutoML in a cloud-native environment for a variety of use cases, without the need to pull the data or the model outside the database. In the past, he has held multiple technology and engineering management positions in systems software, mobile communications software, applications development, and diagnostics. His interests include machine learning, cloud computing, databases, and architecture. Sanjay has a PhD in Computer Science from the University of Virginia.
Sandeep Agrawal leads the HeatWave Machine Learning (HeatWave ML) project within MySQL HeatWave. HeatWave ML is the product of years of research and advanced development, and aims to help both data scientists and non-data scientists quickly apply ML to a given problem. Prior to HeatWave, Sandeep led the Oracle AutoML project within Oracle Labs, creating a state-of-the-art distributed AutoML engine. He is passionate about Machine Learning and Systems Architecture, and a project like HeatWave ML that combines the two is heaven for him. Prior to Oracle, he completed his PhD in Computer Science at Duke University in 2015.
In-Person | Half-Day Training | Machine Learning | Intermediate
Finding good datasets or web assets to build data products or websites with, respectively, can be time-consuming. For instance, data professionals might require data from heavily regulated industries like healthcare and finance, and, in contrast, software developers might want to skip the tedious task of collecting images, text, and videos for a website. Luckily, both scenarios can now benefit from the same solution, Synthetic Data…more details
Ramon is a data scientist, researcher, and educator currently working in the Developer Relations team at Seldon in London. Prior to joining Seldon, he worked as a freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Before freelancing, Ramon wore different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.
In-person | Half-Day Training | Machine Learning | All Levels
In this 90-minute hands-on lab, you’ll learn how to start your Graph journey with ArangoDB. We will walk you through getting a basic deployment up and running in minutes. We will then help you create a deployment in the ArangoGraph insights platform and learn how to load and query your data…more details
Asif Kazi has directed global sales engineering, professional services, and technical customer success teams for over 15 years at Foursquare, Ahana, Amazon Web Services, StreamSets, and Couchbase. He is passionate about working with customers and solving their business problems. His core competencies include scaling teams globally, building referenceable client relationships, and accelerating revenue growth.
Arthur Keen, Ph.D. is a Graph evangelist with more than 20 years of experience in Graph AI. His expertise spans Graph Machine Learning, graph analytics, and ontology. Arthur has developed knowledge graph-based solutions in intelligence, cyber, financial services, logistics, retail, and energy. He has led 3 product teams from concept to GA and first customers. He has been awarded 7 patents and has held leadership roles including CTO, Chief Scientist, and VP of Engineering. He has a Computer Science/Industrial Engineering Ph.D. and has been invited to speak at conferences including Data Day TX, NoSQL Now, SemTech, and SXSW.
In-person | Half-Day Training | Machine Learning | Intermediate-Advanced
As machine learning models have become more present in our lives, there has been increasing attention on the reliability of these models. A major component of this is understanding how uncertain the model is about its prediction. No model is exactly right 100% of the time, so we need methods and approaches by which we can quantify the level of uncertainty around a prediction.
Approaches to uncertainty quantification (UQ) vary and depend on the type of problem. For classification problems, the primary approach is probability calibration: making sure that the model output corresponding to each class “behaves well” as a probability. For regression problems, there are several different approaches. One can configure models to output an interval, rather than a single point prediction, along with a “coverage” value that specifies the probability that the interval covers the true value. The framework of Conformal Prediction provides theoretical guarantees around such interval predictions. Alternatively, one can use methods that output an entire conditional density for y given X. This is called probabilistic regression, or conditional density estimation. Several parametric and non-parametric approaches exist for this problem, including PrestoBoost, Coarsage, and NGBoost…more details
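As a rough illustration of interval prediction with a coverage target, here is a minimal sketch of split conformal prediction in plain Python. It is not taken from the training material: the `predict` function stands in for any pre-trained point predictor, the data is synthetic, and all names are hypothetical. The coverage guarantee is marginal and assumes the calibration and test points are exchangeable.

```python
import math
import random


def conformal_quantile(residuals, coverage):
    """Quantile of absolute calibration residuals with the finite-sample
    (n + 1) correction used in split conformal prediction."""
    n = len(residuals)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return float("inf")
    return sorted(residuals)[rank - 1]


def predict(x):
    # Hypothetical pre-trained point predictor (assumed for illustration).
    return 2.0 * x


random.seed(0)

# Held-out calibration set: synthetic data with Gaussian noise.
calib_x = [random.uniform(0, 10) for _ in range(500)]
calib_y = [2.0 * x + random.gauss(0, 1) for x in calib_x]

# Nonconformity score: absolute error of the point prediction.
residuals = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
q = conformal_quantile(residuals, coverage=0.9)


def predict_interval(x):
    """Symmetric interval around the point prediction with half-width q."""
    center = predict(x)
    return center - q, center + q


# Empirical check on fresh data drawn from the same distribution.
test_x = [random.uniform(0, 10) for _ in range(2000)]
test_y = [2.0 * x + random.gauss(0, 1) for x in test_x]
covered = sum(
    1 for x, y in zip(test_x, test_y)
    if predict_interval(x)[0] <= y <= predict_interval(x)[1]
)
empirical_coverage = covered / len(test_x)
print(f"half-width: {q:.3f}, empirical coverage: {empirical_coverage:.3f}")
```

The interval width here is constant across inputs; adaptive variants exist, and this sketch only shows the simplest form.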
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
In-person | Workshop | Machine Learning | Deep Learning | NLP | ALL | Intermediate
In this hands-on, example-driven workshop, we’ll introduce Mojo🔥 language features by starting with Python code and making minor changes to convert it into high-performance Mojo🔥 code. Mojo provides full interoperability with the Python ecosystem, and we’ll show you how to integrate Mojo🔥 into your existing Python workflows. We’ll share the workshop material as a hosted tutorial with several Mojo🔥 scripts and Jupyter Notebooks which you can use as a starting point for your projects…more details
Shashank is an engineer, educator, and doodler. He writes and talks about machine learning, specialized machine learning hardware (AI Accelerators), and AI Infrastructure in the cloud. He previously worked at Meta, AWS, NVIDIA, MathWorks (MATLAB), and Oracle in developer relations and marketing, product management, and software development roles, and holds an M.S. in electrical engineering.
Virtual | Training | Generative AI | All Levels
This engaging workshop provides a hands-on journey into the world of Generative AI and Autonomous Agents, crucial building blocks towards achieving Artificial General Intelligence (AGI). Participants will delve into the transformative shift in AI, gaining insights into the revolutionary impact of Generative AI across various domains such as text, image, video, and 3D object generation, as well as data augmentation. The workshop will equip attendees with cutting-edge tools to substantially boost their AI capabilities and efficiency. A key feature of the session is a practical exercise in crafting an Autonomous AI Agent, a technology poised to work in synergy with us in the near future. This succinct yet thorough workshop is tailored for those eager to maintain a competitive edge in the swiftly advancing field of AI…more details
Long before the buzz surrounding generative AI, Martin Musiol was already advocating for its significance in 2015. Since then, he has been a frequent speaker at conferences, podcasts, and panel discussions, addressing the technological advancements, practical applications, and ethical considerations of generative AI. Martin Musiol is a founder of generativeAI.net, a lecturer on AI to over 3000 students, and publisher of the newsletter ‘Generative AI: Short & Sweet’. As the lead for GenAI Projects in Europe at Infosys Consulting (previously at IBM), Martin Musiol helps companies globally harness the power of generative AI to gain a competitive advantage. LinkedIn: https://www.linkedin.com/in/martinmusiol1/. Webpage: https://generativeai.net/
In-person | Training | MLOps and Data Engineering | Machine Learning | Deep Learning | All Levels
Our objective is to ensure that you are equipped with the essential knowledge and practical tools to proficiently manage your machine-learning models in a real-world production environment…more details
Oliver Zeigermann has been developing software with different approaches and programming languages for more than 3 decades. In the past decade, he has been focusing on Machine Learning and its interactions with humans.
Virtual | Workshop | Machine Learning | Intermediate
In this talk, we will cover the use of Generative models, such as LLMs and GANs, for the generation of smart synthetic data that can be leveraged to impute missing data. By using a generative model to impute missing data, we can generate new samples that are representative of the underlying data distribution, which can help to reduce the impact of missing data on our models. In addition, these models can be fine-tuned to specific datasets, allowing us to generate synthetic data that is tailored to our particular use case…more details
Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations. Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies. She hosts the “When Machine Learning meets privacy” podcast, has been a guest speaker on Datacast and Privacy Please, is a previous WebSummit speaker, and was recently awarded “Founder of the Year” by the South Europe Startup Awards.
In-person | Workshop | Data Engineering & Big Data | Machine Learning | Generative AI | Data Visualization | Beginner
In this presentation, Alison will take you through the example of one independent researcher’s journey from no-code to knowledge graph master. She will walk you through the steps the researcher took, including how to format data stores, what data to collect, and ways to leverage LLMs in the process, to ultimately have a working knowledge graph in a dashboard with minimal coding experience…more details
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications.
Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies.
Alison’s academic journey includes pursuing her Master of Science in Data Science program, specializing in Artificial Intelligence, at Northwestern University and research with Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience. She leverages this expertise to educate and empower individuals and organizations in the field of data science.
Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.
In-person | Workshop | Machine Learning | Beginner
Pandas can be tricky, and there is a lot of bad advice floating around. This tutorial will cut through some of the biggest issues I’ve seen with Pandas code after working with the library for a while and writing three books on it…more details
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON, as well as local user conferences.
Virtual | Tutorial | GenAI | LLM | All Levels
The tutorial will consist of the following parts: (1) Introduction and overview of the generative AI landscape, (2) Technical and ethical challenges with generative AI, and (3) Solutions for alleviating the challenges with real-world use cases and case studies (including practical challenges and lessons learned in industry)…more details
Krishnaram Kenthapadi is the Chief AI Officer & Chief Scientist of Fiddler AI, an enterprise startup building a responsible AI and ML monitoring platform. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives in the Amazon AI platform. Prior to joining Amazon, he led similar efforts at the LinkedIn AI team, and served as LinkedIn’s representative in Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Previously, he was a Researcher at Microsoft Research Silicon Valley Lab. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006. He serves regularly on the senior program committees of FAccT, KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 50+ papers, with 7000+ citations and filed 150+ patents (70 granted). He has presented tutorials on privacy, fairness, explainable AI, model monitoring, responsible AI, and generative AI at forums such as ICML, KDD, WSDM, WWW, FAccT, and AAAI, given several invited industry talks, and instructed a course on responsible AI at Stanford.
Virtual | Workshop | NLP & LLMs | Intermediate
The workshop will be hands-on and composed of 4 modules, targeting data scientists, machine learning experts, software engineers, and product managers. Prior exposure to LLMs is not necessary. At the end of the workshops, participants will have gained insights on how to effectively make the best of open-source LLMs and learned the necessary steps to bridge the gap between prototype and production-ready applications…more details
Suhas Pai is an NLP researcher and co-founder/CTO at Bedrock AI, a Toronto-based startup. At Bedrock AI, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, being the Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and also NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at Big Science, as part of the BLOOM project.
Virtual | Workshop | Machine Learning | Intermediate
In this talk, we will explore and demystify the world of Causal AI for data science practitioners, with a focus on understanding cause-and-effect relationships within data to drive optimal decisions. We will focus on:
* from Shapley to DAGs: the dangers of using post-hoc explainability methods as tools for decision making, and how traditional ML isn’t suited to situations where we want to perform interventions on the system.
* discovering causality: how do we figure out what is causal and what isn’t, with a brief introduction to methods of structure learning and causal discovery
* optimal decision making: by understanding causality, we can now accurately estimate the impact we can make on our system. How do we use this knowledge to derive the best possible actions?…more details
Andre joined causaLens from Goldman Sachs, where he was an executive director in the Model Risk Management group in Hong Kong and Frankfurt. Today he is working with industry-leading global organisations to apply cutting-edge Causal AI research in production-level solutions that empower individuals and teams to make better decisions. Andre received his PhD in theoretical physics from the University of Munich, where he studied the interplay between quantum mechanics and general relativity in black holes.
In-person | Workshop | LLM | Intermediate
This can be either a talk or a workshop, depending on how much time you have. This session aims to provide hands-on, engaging content that gives developers a basic understanding of Llama 2 models, how to access and use them, and how to build core components of an AI chatbot using LangChain and Tools. The audience will also learn core concepts around Prompt Engineering and Fine-Tuning and programmatically implement them using Responsible AI principles. Lastly, we will conclude the talk by explaining how they can leverage this powerful tech, different use cases, and what the future looks like…more details
Amit Sangani is the Director of Partner Engineering leading the Applied AI Platforms team at Meta. Amit has been with Meta for 8+ years and manages developer-facing engineering teams working on Gen AI platforms such as Llama 2 and PyTorch. Amit’s mission is to democratize AI and increase the adoption of these platforms by making it easier for developers to integrate them into their products and spur innovation and increased productivity.
In-person | Tutorial | Generative AI | Machine Learning | Beginner – Intermediate
The tutorial will cover the existing research on the capabilities of LLMs versus small traditional ML models. If an LLM is the best solution, the tutorial covers several evaluation techniques, including evaluation suites like the EleutherAI Harness, head-to-head competition approaches, and using LLMs for evaluating other LLMs. The tutorial will also touch on subtle factors that affect evaluation, including the role of prompts, tokenization, and requirements for factual accuracy. Finally, a discussion of model bias and ethics will be integrated into the working examples…more details
Rajiv Shah is a machine learning engineer at Hugging Face who focuses on enabling enterprise teams to succeed with AI. Rajiv is a leading expert in the practical application of AI. Previously, he led data science enablement efforts across hundreds of data scientists at DataRobot. He was also a part of data science teams at Snorkel AI, Caterpillar, and State Farm. Rajiv is a widely recognized speaker on AI, has published over 20 research papers, and has received over 20 patents in areas including sports analytics, deep learning, and interpretability. Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana-Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He also has a large following on AI-related short videos on TikTok and Instagram at @rajistics.
In-person | Workshop
Have you ever thought about building an application that can help people find places to live that maximize their quality of life and happiness?
The goal of this workshop is to analyze public datasets across different categories and discover correlations among them using the HPCC Systems platform. After analyzing, we will design an interface to query this data and assign it a scoring system, and then deliver it to the user via the HPCC ROXIE cluster and show the user the best places to live in the United States. Users should be given choices in an easy-to-use form that when submitted will generate a unique set of scores based on locations (Example: By State)…more details
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
In-person | Workshop | Machine Learning | Deep Learning | All Levels
During this workshop, you will build complex time series forecasting and anomaly detection models from the ground up, perform feature engineering and selection, assess the accuracy, and utilize the ModelZoo browser and root cause analysis functionalities to investigate the outcomes…more details
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning for customers from various industries, such as Siemens, BASF, Borealis, and Volkswagen. With a commercial background and experience in data engineering, analysis, and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
In-person | Workshop | Deep Learning | All Levels
Facial recognition systems are everywhere. Of course, they are where you would expect them, such as airports, border crossings, and government offices. However, they are also in some public surveillance cameras, all over social media, embedded in smart home solutions, and even in your phone. Have you ever wondered how facial recognition systems work? In this hands-on session, we will build a facial recognition system from scratch using open-source technologies and publicly available pre-trained models…more details
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. He’s an Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Whether concerning leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making. He wrote the bestselling book “Interpretable Machine Learning with Python” and is currently working on a new book titled “DIY AI” with do-it-yourself projects for AI hobbyists and practitioners alike.
In-person | Workshop | LLM | Gen AI | Machine Learning | Intermediate
In this talk, we show how to personalize LLMs using a feature store and prompt engineering. We walk through how to build an example free, serverless, personalized LLM application using Hopsworks, an open-source feature store with a built-in vector database. We will look at how to build templates for prompts, and how they can be easily constructed and included in user queries…more details
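The general pattern of filling a prompt template with per-user features can be sketched as follows. This is a hedged illustration only: the in-memory `FEATURES` dictionary stands in for a feature store lookup and is not the Hopsworks API, and the template text, feature names, and function names are all invented for the example.

```python
from string import Template

# Stand-in for a feature store: a real application would fetch these values
# from a feature store service rather than a hard-coded dictionary.
FEATURES = {
    "user_42": {
        "name": "Ada",
        "favorite_genre": "science fiction",
        "recent_purchases": 3,
    },
}

# Prompt template with placeholders for user features and the user's question.
PROMPT_TEMPLATE = Template(
    "You are a helpful bookstore assistant.\n"
    "Customer name: $name\n"
    "Favorite genre: $favorite_genre\n"
    "Purchases in the last 30 days: $recent_purchases\n"
    "Customer question: $question"
)


def get_features(user_id):
    """Look up precomputed features for a user (hypothetical helper)."""
    return FEATURES[user_id]


def build_prompt(user_id, question):
    """Combine the user's features and question into a personalized prompt."""
    features = get_features(user_id)
    return PROMPT_TEMPLATE.substitute(question=question, **features)


prompt = build_prompt("user_42", "What should I read next?")
print(prompt)
```

The resulting string would then be sent to whichever LLM the application uses; keeping the template separate from the feature lookup makes it easy to change either one independently.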