more sessions added weekly
Pre-Bootcamp: Introduction to Data Course
Pre-Bootcamp: Introduction to Programming with Python Course
Pre-Bootcamp: Data Wrangling with Python Course
Pre-Bootcamp: Introduction to SQL Course
Pre-Bootcamp: Introduction to AI Course
Check the most up-to-date schedule:
– for in-person: Download the ODSC EventX Ai app (the login details will be shared in email/calendar invite)
– for virtual: live.odsc.com (agenda section)
In-Person | Keynote
In-person | Keynote | Machine Learning | All Levels
In this talk, I’ll discuss some ways that we might cope with and address the unreliability of neural network models. As an initial coping strategy, I will first discuss a technique for detecting whether some content was generated by a machine learning model, leveraging the probability distribution that the model assigns to different content. Next, I will describe an approach for enabling neural network models to better estimate what they don’t know, such that they can refrain from making predictions on such inputs (i.e. selective classification). I will lastly describe methods for adapting models with small amounts of data to improve their accuracy under distribution shift…more details
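The selective classification idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple maximum-softmax-probability rule with a hypothetical threshold of 0.8; this is a common baseline and is not necessarily the method presented in the talk. The function names are illustrative.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def selective_predict(logits, threshold=0.8):
    """Return the predicted class index, or None to abstain.

    Assumption: confidence is the maximum softmax probability, and the
    model abstains whenever that confidence falls below `threshold`.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best
    return None  # abstain: the model is not confident enough

# A confident input yields a class index; an ambiguous one yields None.
confident = selective_predict([5.0, 0.0, 0.0])
ambiguous = selective_predict([0.1, 0.0, 0.0])
```

Raising the threshold trades coverage (how often the model answers) for accuracy on the inputs it does answer.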
Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Her research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has pioneered end-to-end deep learning methods for vision-based robotic manipulation, meta-learning algorithms for few-shot learning, and approaches for scaling robot learning to broad datasets. Her research has been recognized by awards such as the Sloan Fellowship, the NSF CAREER Award, the MIT Tech Review 35 Under 35, and the ACM doctoral dissertation award, and has been covered by various media outlets including the New York Times, Wired, and Bloomberg. Prior to Stanford, she received her Bachelor’s degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley.
We have seen amazing technical progress in AI applications in recent years. This talk considers the human side rather than the technical side: how can we gain confidence that our applications will be fair, just, truthful, beneficial, and well-suited for their users, the other stakeholders, and society at large…more details
Peter Norvig is a Distinguished Education Fellow at Stanford’s Human-Centered Artificial Intelligence Institute and a researcher at Google Inc; previously he directed Google’s core search algorithms group and Google’s Research group. He was head of NASA Ames’s Computational Sciences Division, where he was NASA’s senior computer scientist and a recipient of NASA’s Exceptional Achievement Award in 2001. He has taught at the University of Southern California, Stanford University, and the University of California at Berkeley, from which he received a Ph.D. in 1986 and the distinguished alumni award in 2006. He was co-teacher of an Artificial Intelligence class that signed up 160,000 students, helping to kick off the current round of massive open online classes. His publications include the books Data Science in Context (to appear in 2022), Artificial Intelligence: A Modern Approach (the leading textbook in the field), Paradigms of AI Programming: Case Studies in Common Lisp, Verbmobil: A Translation System for Face-to-Face Dialog, and Intelligent Help Systems for UNIX. He is also the author of the Gettysburg Powerpoint Presentation and the world’s longest palindromic sentence. He is a fellow of the AAAI, ACM, California Academy of Science and American Academy of Arts & Sciences.
In-Person | AI X Talk
Generative models like GPT-4, Claude, and PaLM have high levels of coherence and amazing reasoning capabilities, but they model knowledge “implicitly,” making them vulnerable to breaking down in out-of-domain settings. To build a conversational assistant / chatbot that is aware of new, domain-specific knowledge, we will explain the necessity of Retrieval Augmented Generation (RAG) and the role of domain-adapted hybrid semantic search. We will also explain how RAG approaches protect against the privacy and sensitivity implications of fine-tuning generative models, by applying permissions at the appropriate input layer. Finally, we will give an overview of how LLM Agents and a Tool framework build off of RAG to increase the space of solvable workflows and intents.
Eddie joined Glean as a founding engineer. He leads a team of engineers and AI experts at Glean who work on challenging problems across Machine Learning, LLMs, NLU, domain adaptation, semantic search, and much more in order to build the world’s best work assistant. Previously, he worked as a research and software engineer for Google, contributing to a number of major projects, including natural language and semantic solutions. His current mission is to leverage AI and NLP technology to create the enterprise solution that speaks the language of each individual workplace.
In-person | Track Keynote | AI Safety/Security | Beginner-Intermediate
In this talk, I will describe the powerful engine of confidential computing, which promises to address these problems through a combination of hardware advances and cryptographic techniques. I will discuss our research from UC Berkeley as well as its tech transfer to industry. The concept is a powerful one: the servers can compute on data without seeing it. The attendees will understand what confidential computing is, what problems it can solve in their sensitive data lifecycle, and which tools they can use today for this purpose…more details
Raluca Ada Popa is the Robert E. and Beverly A. Brooks associate professor of computer science at UC Berkeley working in computer security, systems, and applied cryptography. She is a co-founder and co-director of the RISELab and SkyLab at UC Berkeley, as well as a co-founder of Opaque Systems and PreVeil, two cybersecurity companies. Raluca received her PhD in computer science as well as her Masters and two BS degrees, in computer science and in mathematics, from MIT. She is the recipient of the 2021 ACM Grace Murray Hopper Award, a Sloan Foundation Fellowship award, Jay Lepreau Best Paper Award at OSDI 2021, Distinguished Paper Award at IEEE Euro S&P 2022, Jim and Donna Gray Excellence in Undergraduate Teaching Award, NSF Career Award, Technology Review 35 Innovators under 35, Microsoft Faculty Fellowship, and a George M. Sprowls Award for best MIT CS doctoral thesis.
In-person | Ai X Talk | Generative AI
In this session, you will learn about some early experiences and best practices of deploying Generative AI solutions in enterprises…more details
Rama Akkiraju is a VP of Enterprise AI & Automation at NVIDIA. Prior to this, Rama held various leadership roles at IBM in her two decades-plus career. Most recently, Rama was an IBM Fellow, and a Master Inventor at IBM where she was the CTO of IBM Watson AI Operations product, a product to optimize IT operations management with AI. Prior to this role, Rama led the AI mission of enabling natural, personalized, and compassionate conversations between computers and humans where she and her team developed and delivered several differentiating AI Services such as Personality Insights, Tone Analyzer, Emotion Analysis, and Sentiment Analysis services to the IBM Watson platform. Rama also led the effort to scale IBM Watson’s AI services including Automatic Speech Recognition, Natural Language Understanding, and Chatbot services to dozens of languages. Before this, Rama led various projects and Research teams at IBM Watson Research Center and IBM Almaden Research Center in the areas of agent-based decision support systems, business process management, electronic marketplaces, and semantic Web services, for which she led a World-Wide-Web (W3C) standard. Rama has co-authored over 100 technical papers. Rama has 50+ issued patents and 25+ pending. She is the recipient of 4 best paper awards in AI and Operations Research areas. Rama served as the President of ISSIP, a Service Science professional society for 2018, and continues to be on the organization’s Strategy committee. Rama holds a master’s degree in Computer Science and has received a gold medal from New York University for her MBA for the highest academic excellence. Rama was named by Forbes as one of the ‘Top 20 Women in AI Research’ in May 2017, has been featured in ‘A-Team in AI’ by Fortune magazine in July 2018, and was named ‘Top 10 pioneering women in AI and Machine Learning’ by Enterprise Management 360 in April 2019. 
Rama is also the recipient of the University of California, Berkeley’s Athena award for Technical and Executive Leadership for 2020. In 2022, Rama was named ‘USA Industry innovator of the Year 2022’ by the Women in AI Organization and received the ‘AI Industry Leader of the Year 2022’ award from the Women Leaders in Data & AI (WLDA) organization.
In-person | Ai X Talk | Intermediate
A critical success factor for enterprise-level AI adoption and sponsorship is the ability to rapidly deploy AI technologies, solve use cases quickly, and deliver iterative business value to stakeholders. In this session, we will discuss the challenges of AI adoption and share some pragmatic approaches that enterprises can adopt to accelerate their AI / ML initiatives and deliver quick and iterative business value. Examples include leveraging pre-integrated AI frameworks and automation techniques such as AutoML, along with ready-to-use industry-specific AI solutions. The faster and more incremental the delivery of business value through AI, the higher the likelihood of successful adoption and implementation of AI within enterprises…more details
Durga Kota is the Chief Technology Officer for the Americas region and is responsible for leading the strategies to translate Fujitsu technology innovation into differentiated and industry-ready solutions for the region’s customers.
Throughout his 24-year career, Durga has held customer-facing, consultative solution-selling and technology leadership roles. His collaborative spirit and deep technical expertise enable him to build trust, work in true partnership with customers, and provide solution leadership that leverages advanced technologies to solve even the most complex challenges.
Durga holds an MBA from the Indian Institute of Foreign Trade (IIFT), Delhi, and a B.Tech. in Computer Science and Engineering from the Jawaharlal Nehru Technological University, Hyderabad, India.
Durga resides in the San Diego area with his family and likes to spend his free time trekking on one of the many trails in the area, or volunteering for the local food bank.
In-Person | Book Author
Get your copy of Amy Hodler’s book signed at our book-author event! We will have a limited number of copies available for free.
Amy Hodler is an evangelist for graph analytics and responsible AI. She’s the co-author of O’Reilly books on Graph Algorithms and Knowledge Graphs as well as a contributor to the Routledge book, Massive Graph Analytics and Bloomsbury book, AI on Trial. Amy has decades of experience in emerging tech at companies such as Microsoft, Hewlett-Packard (HP), Hitachi IoT, Neo4j, Cray, and RelationalAI. Amy is the founder of GraphGeeks.org promoting connections everywhere.
Get your copy of Stefanie Molin’s book signed at our book-author event! We will have a limited number of copies available for free.
Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. She is also the author of “Hands-On Data Analysis with Pandas,” which is currently in its second edition. She holds a Bachelor of Science degree in operations research from Columbia University’s Fu Foundation School of Engineering and Applied Science, as well as a master’s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
Get your copy of Matt Harrison’s book signed at our book-author event! We will have a limited number of copies available for free.
Matt Harrison has been using Python since 2000. He runs MetaSnake, a Python and Data Science consultancy and corporate training shop. In the past, he has worked across the domains of search, build management and testing, business intelligence, and storage.
He has presented and taught tutorials at conferences such as Strata, SciPy, SCALE, PyCon, and OSCON as well as local user conferences.
Virtual | Talk | Machine Learning | Intermediate
In this talk, I will attempt to provide several “bird’s eye” views on GNNs. Following a quick motivation on the utility of graph representation learning, I will derive GNNs from first principles of permutation invariance and equivariance. We will discuss how we can build GNNs that are not strictly reliant on the input graph structure…more details
Petar Veličković is a Staff Research Scientist at Google DeepMind, Affiliated Lecturer at the University of Cambridge, and an Associate of Clare Hall, Cambridge. Petar holds a PhD in Computer Science from the University of Cambridge (Trinity College), obtained under the supervision of Pietro Liò. His research concerns geometric deep learning: devising neural network architectures that respect the invariances and symmetries in data (a topic he has co-written a proto-book about). Petar’s research has been used in substantially improving travel-time predictions in Google Maps, and guiding intuition of mathematicians towards new top-tier theorems and conjectures.
Many retail giants are experiencing huge inventory loss and shrinkage problems because they process a huge number of transactions every day across the U.S. and offer a liberal shopping policy to provide a convenient customer shopping and return experience. Investigating customers’ return behaviors and preventing fraudulent activities is challenging for two reasons: 1) the transaction data sets are highly imbalanced and complex, since there is an enormous volume of transaction data while the different types of anomalies are exceedingly rare; and 2) predefined labels are seldom available, as it is not feasible to have human experts manually review every transaction and identify anomalies…more details
Chuying (Annie) Ma is a senior data scientist on the Walmart Inkiru team, where she works on developing and implementing machine learning models and strategies for real-time fraud detection and risk mitigation. She has a Master of Science degree in Biostatistics from Harvard University and received her bachelor’s degree in Statistics and Mathematics from the University of Michigan – Ann Arbor. In her spare time, she enjoys playing violin and ukulele.
Virtual | Talk | NLP & LLMs | Intermediate
Large language models (LLMs) have achieved a milestone that undeniably changed many held beliefs in artificial intelligence (AI). However, there remain many limitations of these LLMs when it comes to true language understanding, limitations that are a byproduct of the underlying architecture of deep neural networks. Moreover, and due to their subsymbolic nature, whatever knowledge these models acquire about how language works will always be buried in billions of microfeatures (weights), none of which is meaningful on its own, making such models hopelessly unexplainable…more details
Walid Saba is a Senior Research Scientist at the Institute for Experiential AI at Northeastern University. Prior to joining the institute in 2023, he worked at two Silicon Valley startups, focusing on conversational AI. This work included high-level roles as the principal AI scientist for telecommunications company Astound and CTO of software company Klangoo, where he helped develop its state-of-the-art digital content semantic engine (Magnet).
Saba’s career to date has seen him hold various positions in both the private sector and academia. His resume includes entities such as the American Institutes for Research, AT&T Bell Labs, IBM and Cognos, while he has also spent a cumulative seven years teaching computer science at the University of Ottawa, the New Jersey Institute of Technology (NJIT), the University of Windsor (a public research university in Ontario, Canada), and the American University of Beirut (AUB).
Walid is frequently invited for interviews and as a keynote speaker on AI and NLP, and has published over 45 technical articles, including an award-winning paper that he presented at the German Artificial Intelligence Conference (KI-2008). Walid received his BSc and MSc in Computer Science from the University of Windsor, and a Ph.D. in Computer Science from Carleton University in 1999.
Virtual | Talk | NLP & LLMs | Beginner
Generative models have demonstrated how helpful they can be on general knowledge, helping students with their writing assignments. But as soon as you want to run them in a professional setting with prompts like “what are the three main feature requests from our largest customers?”, they demonstrate their lack of knowledge. In this session, I will introduce how Large Language Models (LLMs) can be connected to your data via semantic search. As I will present, there are many pitfalls and challenges. Some can be solved by using the right technologies; others are still open problems…more details
Nils Reimers is an NLP / Deep Learning researcher with extensive experience in representing text in dense vector spaces and using those representations for various applications. During his research career, he created sentence-transformers that were the foundation for many of today’s semantic search applications.
In 2022, Nils joined Cohere.com to lead the team on smarter semantic search technologies and how to connect LLMs to enterprise data. Here, his teams develop new foundation models that can understand and reason over complex data.
In-person | Talk | NLP & LLMs | GenAI | Intermediate
There seems to be a new large ML model grabbing headlines every week. Whether it’s OpenAI’s big releases like GPT-3, DALL-E 2 or Whisper, or one of the many open source projects generating state-of-the-art models, like Stable Diffusion, OpenFold or Craiyon, these models have found their way into the mainstream. Lukas will map the landscape for you and share how these teams use W&B to accelerate their work…more details
Lukas Biewald is the founder and Chief Data Scientist of CrowdFlower. Founded in 2009, CrowdFlower is a data enrichment platform that taps into an on-demand workforce to help companies collect training data and do human-in-the-loop machine learning.
Following his graduation from Stanford University with a B.S. in Mathematics and an M.S. in Computer Science, Lukas led the Search Relevance Team for Yahoo! Japan. He then worked as a senior data scientist at Powerset, acquired by Microsoft in 2008. Lukas was featured in Inc Magazine’s 30 Under 30 list.
Lukas is also an expert level Go player.
In-person | Talk | Beginner
Little is known about the history of Neural Networks. For instance, “Artificial Intelligence.” The history goes back to around 1946, when researchers noticed that the mathematics of stretched materials, in which neighboring atoms are affected, was best modeled with a branch of math known as tensors. Tensors are used today to create neural networks. Neural networks are run on a powerful Graphics Processing Unit (GPU). GPUs came about because of video games and entertainment. Thus we can say that video games laid the groundwork for AI…more details
Jack McCauley is an Innovator in Residence at the Jacobs Institute for Design Innovation at UC Berkeley, a Professor at UC Berkeley, and a co-founder of Oculus. He is an American engineer, hardware designer, inventor, video game developer and philanthropist. Jack is best known for designing the guitars and drums for the Guitar Hero video game series, and as a co-founder and former chief engineer at Oculus VR. At Oculus, Jack designed and built the Oculus DK1 and DK2 virtual reality headsets. Oculus was acquired by Facebook for $2 Billion. McCauley holds numerous U.S. patents for inventions in software, audio effects, virtual reality, motion control, computer peripherals, and video game hardware and controllers. Jack was awarded a full scholarship to attend the University of California, Berkeley, where he earned a BSc in Electrical Engineering and Computer Science (EECS) in 1986. Jack has authored numerous research papers in the field of artificial intelligence (AI) and mathematical modeling of AI-based systems and is currently pursuing new projects at his private R&D facility and hardware incubator in Pleasanton, California.
In-Person | Talk | All Levels
Alison Cossette is a dynamic Data Science Strategist, Educator, and Podcast Host. As a Developer Advocate at Neo4j specializing in Graph Data Science, she brings a wealth of expertise to the field. With her strong technical background and exceptional communication skills, Alison bridges the gap between complex data science concepts and practical applications.
Alison’s passion for responsible AI shines through in her work. She actively promotes ethical and transparent AI practices and believes in the transformative potential of responsible AI for industries and society. Through her engagements with industry professionals, policymakers, and the public, she advocates for the responsible development and deployment of AI technologies.
Alison’s academic journey includes pursuing her Master of Science in Data Science program, specializing in Artificial Intelligence, at Northwestern University and research with Stanford University Human-Computer Interaction Crowd Research Collective. Alison combines academic knowledge with real-world experience. She leverages this expertise to educate and empower individuals and organizations in the field of data science.
Overall, Alison Cossette’s multifaceted background, commitment to responsible AI, and expertise in data science make her a respected figure in the field. Through her role as a Developer Advocate at Neo4j and her podcast, she continues to drive innovation, education, and responsible practices in the exciting realm of data science and AI.
In-person | Talk | Deep Learning | Machine Learning | Responsible AI | Intermediate – Advanced
In this talk, we’ll delve into the intricate landscape of popularity bias in billion-scale recommender systems and review existing approaches to detect, quantify and mitigate it. We will also identify some of the open challenges in this area and offer a glimpse into the exciting future directions that are currently being explored…more details
Amey Porobo Dharwadker is a seasoned Machine Learning Engineering Manager at Meta, leading the Facebook Video Recommendations Ranking team. He is renowned for his pivotal role in developing personalization models used by billions of global users every day, which have significantly contributed to Facebook’s impressive user growth. His contributions have led to the success of Facebook Watch and Reels, now engaging over 1.25 billion monthly users. Prior to this, he made substantial strides in improving user engagement and revenue growth at Facebook through his work on News Feed and Ads Machine Learning. He is a prolific researcher with multiple international publications in recommender systems, and he actively serves as a program committee member for top-tier AI conferences including AAAI, AISTATS, IJCAI, CIKM and ECIR. As a thought leader, Amey is a sought-after speaker at prestigious AI venues and contributes to hackathons, angel syndicates and startup accelerators as a mentor. He also plays a significant role on the juries of renowned global technology competitions, including the CES Innovation Awards and Edison Awards. He holds a Master’s degree from Columbia University in the City of New York and a Bachelor’s degree from the National Institute of Technology Tiruchirappalli, India.
In-person | Talk | Machine Learning | All Levels
In this presentation, our speaker will commence by conducting a comprehensive review of a list of common failure factors. Additionally, they will present an AI-driven ecosystem approach for knowledge discovery and discuss a real-world test of this approach involving 14 knowledge discovery projects. Through this, attendees will gain valuable insights into the crucial connection between the success of data-driven knowledge discovery projects and advancements in AI technology. Participants will also grasp how this ecosystem approach can yield remarkable gains in speed, enhanced quality, and effective risk mitigation for knowledge discovery initiatives. Furthermore, the session will highlight additional advantages brought about by the implementation of AI, such as the utilization of diverse data forms, the application of various algorithmic approaches, and the improvement of communication among project stakeholders. To conclude the presentation, the speaker will review the history of AI development. Subsequently, they will engage in a discussion of insights gleaned from this historical review and explore future trends in the field. The session will conclude with a question-and-answer segment open to all participants…more details
Dr. Alex Liu is the founder and Director of the RMDS Lab, a data and AI ecosystem service provider. From 2013 to 2019, Alex was one of the data science thought leaders and a distinguished data scientist at IBM, where he served as a Chief Data Scientist for analytics services. Before joining IBM, Dr. Liu worked as a chief data scientist for a few companies including iSKY and Retention Science. Previously, Dr. Liu taught advanced quantitative methods to PhD candidates at the University of Southern California and the University of California at Irvine, while consulting for many well-known organizations such as the United Nations and Ingram Micro. Alex has an MS in Statistical Computing and a PhD in Sociology from Stanford University.
In-person | Track Keynote | Generative AI | All Levels
Generative AI has taken the world by storm, but building Gen AI applications for production comes with a unique set of challenges. Questions around cost performance, risk, simplifying implementation for production, setting guardrails, adding automation where possible and leveraging CI/CD for ML all become even more important when Gen AI is involved…more details
Yaron Haviv is a serial entrepreneur who has been applying his deep technological experience in AI, cloud, data and networking to leading startups and enterprises since the late 1990s. As the Co-Founder and CTO of Iguazio, Yaron drives the strategy for the company’s MLOps platform and led the shift towards the production-first approach to data science and catering to real-time AI use cases. He also initiated and built Nuclio, a leading open source serverless framework with over 4,000 GitHub stars and MLRun, a cutting-edge open source MLOps orchestration framework. Prior to co-founding Iguazio in 2014, Yaron was the Vice President of Datacenter Solutions at Mellanox (now NVIDIA – NASDAQ: NVDA), where he led technology innovation, software development and solution integrations. He also served as the CTO and Vice President of R&D at Voltaire, a high-performance computing, IO and networking company which floated on the NYSE in 2007 and was later acquired by Mellanox (NASDAQ:MLNX). Yaron is an active contributor to the CNCF Working Group and was one of the foundation’s first members. He sits on the Data Science Committee of the AI Infrastructure Alliance (AIIA), of which Iguazio is a founding member. He is co-authoring a book on Implementing MLOps in the Enterprise for O’Reilly. Yaron presents at major industry events worldwide and writes tech content for leading publications including TheNewStack, Hackernoon, DZone, Towards Data Science and more.
In-Person | AI X Panel
Join us for an engaging discussion with industry leaders as we delve into the dynamic realm of synthetic data. Our distinguished panelists will explore the intricacies of generating synthetic datasets and the profound impact synthetic data is having on machine learning, privacy, and innovation. Discover how synthetic data is revolutionizing AI research and applications, and gain insights into its potential to drive progress in diverse fields, from healthcare to autonomous vehicles.
Alex Watson, co-founder and chief product officer at Gretel.ai, has been a trailblazer in the technology sector, focusing on data security and innovation. Before founding Gretel, he launched Harvest.ai, a security startup that leveraged NLP and AI to protect cloud data, which was acquired by Amazon in 2016. At AWS, he pioneered Macie, one of the company’s leading security services. Alex’s technological roots run deep, with experience at the NSA and co-founding BTS to revolutionize battlefield communications, leading to the U.S. Army’s first deployment of 3G and 4G networks.
In-person | Talk | MLOps | Data Engineering & Big Data | Machine Learning | Deep Learning | Intermediate
In this presentation, Amber Roberts, Machine Learning Engineer at Arize AI, will present findings from research on ways to measure vector/embedding drift for image and language models. With lessons learned from testing different approaches (including Euclidean and Cosine distance) across billions of streams and use cases, Roberts will dive into how to detect whether two unstructured language datasets are different — and, if so, how to understand that difference using techniques such as UMAP…more details
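As a rough illustration of the distance-based drift measures named in the abstract, the sketch below compares the centroid (mean vector) of a baseline embedding set with that of a production set using Euclidean and cosine distance. Comparing centroids is an assumption made here for brevity; it is one simple option and is not necessarily the approach evaluated in the talk. All function names are hypothetical.

```python
import math

def centroid(vectors):
    """Mean vector of a non-empty list of equal-length embeddings."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def embedding_drift(baseline, production):
    """Compare two embedding sets by the distance between their centroids."""
    base_c = centroid(baseline)
    prod_c = centroid(production)
    return {
        "euclidean": euclidean_distance(base_c, prod_c),
        "cosine": cosine_distance(base_c, prod_c),
    }

# Toy 2-D embeddings: the production set points in a different direction.
baseline = [[1.0, 0.0], [1.0, 0.0]]
production = [[0.0, 1.0], [0.0, 1.0]]
drift = embedding_drift(baseline, production)
```

In practice the embeddings would come from a model, and a threshold on either distance would be chosen empirically before raising a drift alert.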
Amber Roberts is an ML Growth Lead at Arize AI, an ML observability company built for maintaining models in production. Previously, Amber was a product manager of AI at Splunk and the Head of Artificial Intelligence at Insight Data Science. A Carnegie Fellow, Amber has an MS in Astrophysics from the Universidad de Chile.
In-person | Talk | Data Visualization & Data Analysis | All Levels
Save time and money by equipping everyone with fundamental data literacy and analytics skills. It sounds great, but most organizations bite off more than they can chew. Instead of trying to turn everyone into a data analyst, teach people the most vital descriptive analytics skills, and eliminate the flow of tedious work requests to your data experts…more details
A TEDx speaker, Dom brings a wealth of data storytelling experience to StoryIQ from his career at QBE, one of Australia’s largest insurance companies. At QBE, he was a senior leader in data analytics and business improvement, presenting data-driven strategy recommendations to the company’s senior executives and producing reports for the Group Board of Directors.
Virtual | Talk | Machine Learning | LLMs | Intermediate
A highlight of this workshop is the introduction to the “causal LLM”, a pioneering concept where LLMs are designed using foundational causal principles. By the end of this session, participants will be equipped with the skills and knowledge to employ LLMs effectively in discerning complex causal models and pioneering advancements in the field of causal AI…more details
Robert Osazuwa Ness is a researcher at Microsoft Research and author of the book Causal Machine Learning. He leads the development of MSR’s causal machine learning platform and conducts research into probabilistic models for advanced causal reasoning. He has worked as a machine learning engineer in various machine learning startups. He attended graduate school at both Johns Hopkins SAIS (Hopkins-Nanjing Center) and Purdue University. He received his Ph.D. in Statistics from Purdue, where his dissertation research focused on Bayesian active learning models for causal discovery.
Virtual | Talk | Generative AI | All Levels
Human creators have provided the works (whether prose writing, source code, or visual works) that act as the basis for training huge DNNs, which arguably creates an obligation for the AI outputs. This feels most evident when prompts ask AIs to create something “in the style of such-and-such-human.” While such outputs are often flawed in interesting ways, they are also usually recognizable in their connection to the prompted human creator. What rights should those source humans have to control those uses, including the moral right simply to be formally recognized as the source? Few laws exist governing attribution and moral rights in generative AI, but many will come to exist soon. Laws and technical standards may follow good or bad principles, both ethical and technical…more details
David is founder of KDM Training, a partnership dedicated to educating developers and data scientists in machine learning and scientific computing. He created the data science training program for Anaconda Inc. and was a senior trainer for them. With the advent of deep neural networks he has turned to training our robot overlords as well.
He was honored to work for 8 years with D. E. Shaw Research, who have built the world’s fastest, highly specialized (down to the ASICs and network layer) supercomputer for performing molecular dynamics.
David was a Director of the PSF for six years, and remains co-chair of its Trademarks Committee and of its Scientific Python Working Group. His columns, Charming Python and XML Matters, written in the 2000s, were the most widely read articles in the Python world. He has written previous books for Manning, Packt, O’Reilly and Addison-Wesley, and has given keynote addresses at numerous international programming conferences.
Virtual | Talk | Machine Learning | Beginner – Intermediate
In this training session, we will delve into the intricacies of building robust and scalable recommendation engines specifically tailored for online food delivery services. We will introduce a newly released dataset called the Delivery Hero Recommendation Dataset (DHRD) and understand how this can be used for training different recommendation models. We will explore the challenges faced in this domain and discuss the techniques and best practices to overcome them, ensuring our recommendation systems can handle large-scale operations and adapt to changing customer preferences…more details
Raghav is a seasoned data science professional with over a decade of experience in research and development of large-scale solutions in Finance, Digital Experience, IT Infrastructure and Healthcare for giants such as Intel, American Express, UnitedHealth Group and Delivery Hero. He is an innovator with 10+ patents, a published author of multiple well-received books and peer-reviewed papers, and a regular speaker at leading conferences on topics in the areas of Generative AI, Recommendation Systems, Computer Vision, NLP, Deep Learning, Machine Learning and Augmented Reality.
Vishal is a data science professional with 12+ years of experience. He currently leads the Ranking and Recommendation topic at Delivery Hero, which he joined in May 2022. Before joining Delivery Hero, he worked at Monotype as a data science manager, leading the AI Research team in building products like WhatTheFont (identifying fonts from images), Font Similarity and many more. He has a couple of patents filed in his name.
In-person | Talk | LLMs | Machine Learning | Intermediate
In the realm of advanced computational linguistics, the efficacy of Large Language Models (LLMs) is intrinsically tied to their contextual precision. In this session presented by Jake from Cloudera (not State Farm), we’ll navigate the complexities of ensuring LLMs yield contextually accurate results, a necessity in today’s intricate data environments. Crucially, attendees will be treated to a live demonstration showcasing the utilization of RAG (Retrieval-Augmented Generation) and PEFT (Parameter Efficient Fine-Tuning) techniques, two of the leading approaches for this task that underpin the success of Cloudera’s Applied ML Prototypes (AMPs)…more details
Jake currently holds the position of Principal Technical Evangelist at Cloudera, where he promotes the strengths of Cloudera’s Lakehouse for delivering trusted AI. His tenure at Cloudera began as a Senior Product Marketing Manager for Cloudera Machine Learning (CML).
Before Cloudera, Jake developed his ML expertise at ExxonMobil, starting as a Data Scientist and later transitioning to a Data Science and Analytics Solution Architect role. He also contributed significantly at FarmersEdge, taking on responsibilities as a Senior Data Scientist and subsequently as a Data Science Manager.
Jake earned both his bachelor’s and master’s degrees from Brigham Young University in Information Systems Management with an emphasis in Statistics.
Outside of work, Jake is passionate about outdoor activities. He enjoys skiing, golfing, rafting, and hiking. However, spending time with his family amidst the mountains remains his most rewarding pastime.
In-person | Talk | Intermediate-Advanced
In today’s data-driven world, recommendation systems have become ubiquitous, driving user engagement and increasing revenue for businesses across various domains. This talk will take you on a journey from the raw data to vectorization techniques, ultimately culminating in the creation of a robust recommendation system. Whether you’re a seasoned data scientist or just starting your journey into recommendation systems, this presentation will provide valuable insights and practical takeaways for building powerful recommendation engines…more details
Hudson Buzby is a dynamic Solutions Architect with a passion for leveraging technology to solve complex challenges. With over a decade of experience in the tech industry, Hudson has made significant contributions to the world of data engineering and cloud solutions. Hudson currently serves as a Solutions Architect at Qwak, where he continues to push the boundaries of what’s possible in the world of cloud computing and data management.
Virtual | Talk | Responsible AI | Intermediate
Digital experimentation and A/B testing are invaluable methodologies within the AI domain, facilitating the validation and continuous improvement of models, solutions, and systems. This talk will delve into the intricate role of these testing mechanisms in the AI landscape. We will start by introducing the concept of digital experimentation and A/B testing, elaborating on their integral role in testing hypotheses and making data-driven decisions. The discussion will further touch upon the traditional uses of these methodologies in the digital marketing sphere, enabling businesses to optimize their online content and increase user engagement…more details
Alessandro is a highly experienced data scientist with a Bachelor’s degree in computer science and a Master’s in data science. He has collaborated with a variety of companies and organizations and currently holds the role of senior data scientist at logistics giant Kuehne+Nagel. Alessandro is particularly passionate about statistics and digital experimentation and has a strong track record of applying these skills to solve complex problems. He shares his knowledge regularly, speaking at events like the Data Innovation Summit and DataMass Gdansk Summit.
Virtual | Talk | MLOps and Data Engineering | Intermediate
With the wide adoption of generative artificial intelligence (AI), ensuring the robustness of machine learning models is becoming more crucial than ever. One of the most concerning security threats to machine learning (ML) is the potential for adversarial attacks, a technique that exploits vulnerabilities in models to cause incorrect output. The perturbations involved are imperceptible to humans but sufficient to make machine learning models misclassify the data, potentially harming end users. Therefore, including adversarial training in the ML lifecycle is important to consider as you build out your model and prepare it for production usage. Join this talk to learn how the symbiosis between ML security open source projects like Adversarial Robustness Toolbox and an ML operations (MLOps) project, Kubeflow, can streamline your machine learning workflow and improve your model’s robustness and security. With the numerous benefits MLOps offers for generating and defending your model, you can accelerate your development of secure machine learning models…more details
Anna Jung is a Senior ML Open Source Engineer at VMware, leading the open source team as part of the VMware AI Labs. She currently contributes to various upstream ML-related open source projects focusing on the project’s overall health, adoption, and innovation. She believes in the importance of giving back to the community and is passionate about increasing diversity in open source. When away from the keyboard, Anna is often at film festivals supporting independent filmmakers.
In-Person | Talk | NLP & LLMs | GenAI | Advanced
In this talk, I will present data2vec, a framework for general self-supervised learning that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches…more details
Michael Auli is a principal research scientist/director at FAIR in Menlo Park, California. His work focuses on speech and NLP and he helped create projects such as wav2vec/data2vec, the widely used fairseq toolkit, the first modern feed-forward seq2seq models outperforming RNNs for NLP, and several top ranked submissions at the WMT news translation task in 2018 and 2019. Before that Michael was at Microsoft Research, where he did early work on neural machine translation and using neural language models for conversational applications. During his PhD at the University of Edinburgh he worked on natural language processing and parsing. http://michaelauli.github.io
In-person | Talk | Machine Learning | Beginner
Statistics do not come intuitively to humans; people tend to look for simple ways to describe complex things. Given a complex dataset, they may feel tempted to use simple summary statistics like the mean, median, or standard deviation to describe it. However, these numbers are not a replacement for visualizing the distribution…more details
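As a rough illustration of that point, here is a minimal sketch using two made-up datasets (the values and the helper names `summarize` and `text_histogram` are illustrative assumptions, not material from the talk). Both datasets share the same mean, median, and population standard deviation, yet a simple text histogram shows they have very different shapes.

```python
import statistics
from collections import Counter

# Two made-up datasets (illustrative values, not from the talk).
bimodal = [1, 1, 1, 1, 9, 9, 9, 9]      # two clusters, nothing in the middle
peaked = [-3, 5, 5, 5, 5, 5, 5, 13]     # one tight peak plus two outliers


def summarize(data):
    """Return (mean, median, population standard deviation)."""
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.pstdev(data),
    )


def text_histogram(data):
    """Return one line per distinct value, with one '#' per occurrence."""
    counts = Counter(data)
    return [f"{value:>3} | {'#' * counts[value]}" for value in sorted(counts)]


if __name__ == "__main__":
    for name, data in (("bimodal", bimodal), ("peaked", peaked)):
        print(name, summarize(data))
        print("\n".join(text_histogram(data)))
```

Running the script prints matching summaries for both datasets, while the histograms make the difference in shape obvious at a glance.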
Virtual | Talk | Machine Learning | Deep Learning | Intermediate-Advanced
In this session we will deep dive into all the new developments and techniques in PyTorch and provide recommendations on how you can accelerate your models using native PyTorch code…more details
Supriya is an Engineering Manager working on PyTorch at Meta. Her team works on architecture optimization techniques such as quantization and pruning, as well as other core components of PyTorch 2.0, thereby enabling users to run AI models efficiently on different hardware using native PyTorch. Prior to Meta, she worked as a software engineer at Nvidia on improving their GPU Architecture and accelerating AI models via TensorRT for inference. Supriya has an MS in CSE from University of Michigan, Ann Arbor and a bachelor’s degree from BITS Pilani, India.
In-person | AI X Talk | All Levels
When it comes to developing AI agents that can deliver reliable real-world results, the possibilities are broad and exciting… but the challenges can be profound and unexpected. Join Cai GoGwilt, Co-Founder and Chief Architect of Ironclad, as he shares the lessons learned in building the first-ever agent capable of performing complex legal contract analysis. This agent acts like a “talented intern,” capable of using tools independently, processing multiple logic loops, and applying imagination and creativity while still delivering precise results. It was designed to be immediately accepted by, and useful to, untrained business analysts and legal staff…more details
Cai GoGwilt has always been passionate about building products that make people more effective at their jobs. He is an MIT graduate in Computer Science and Physics, as well as a former software engineer at Palantir Technologies. By day, he’s a celebrated software engineer and thoughtful leader; by night, he’s a concert cellist and karaoke enthusiast.
This talk is aimed at data science and AI/ML research practitioners, and business leaders / AI strategists alike. The audience will leave with a view into some of the latest GPT genAI models, datasets, methods, and real-world business use cases, as well as insight into Cerebras’ revolutionary AI computing platform and GPT/LLM development services to transform data and AI initiatives into industry-leading generative AI solutions…more details
Andy Hock is VP of Product Management at Cerebras Systems. Andy was the Senior Director of Advanced Technologies at Skybox Imaging, makers of high-resolution satellites, when the company was purchased by Google in 2014 for $500M. After the acquisition, he continued on as Product Manager at Google before joining the team at Cerebras. Prior to Skybox, Andy was the Senior Program Manager, business development lead and a Senior Scientist for Arete Associates. He holds a PhD in Geophysics and Space Physics from UCLA.
Virtual | Talk | Machine Learning Safety and Security | Intermediate
This session will focus on security threats to machine learning models, including a demonstration of the similarities and differences between adversarial attacks in the domains of computer vision and natural language processing (NLP) with examples from open source projects like Adversarial Robustness Toolbox and TextAttack. Join the session to discuss applying adversarial research to real-world systems and learn how to be proactive with machine learning security…more details
Teodora is an open source software engineer at VMware AI Labs. During her first couple of years in VMware, as part of the Open Source Program Office, she was an active contributor and maintainer of The Update Framework (TUF) – a framework for securing software update systems. Currently, she invests her time in open source projects related to machine learning security.
Virtual | Talk | NLP | Beginner
In this talk, we will delve into the intricate relationship between NLP task definition and the outcomes of NLP system development, exploring how the precise formulation of tasks can either propel or hinder progress. Through case studies and examples, attendees will understand how ambiguous or biased task definitions can lead to misguided model objectives, high data acquisition costs, and misleading system evaluations, and learn how to strike the right balance between specificity and adaptability in NLP task design…more details
Panos Alexopoulos has been working since 2006 at the intersection of data, semantics, and software, building intelligent systems that deliver value to business and society. Born and raised in Athens, Greece, he currently works as Head of Ontology at Textkernel, in Amsterdam, Netherlands, where he leads a team of Data Professionals in developing and delivering a large cross-lingual Knowledge Graph in the HR and Recruitment domain. Panos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published more than 60 papers at international conferences, journals and books. He is the author of the book “Semantic Modeling for Data – Avoiding Pitfalls and Breaking Dilemmas” (O’Reilly, 2020), and a regular speaker and trainer in both academic and industry venues.
In an era of Generative AI and Large Language Models, using old school machine learning might sound quaint. The reality is, however, that hidden within your corporate data is a trove of information and business patterns. Leveraging key techniques from Machine Learning can help businesses uncover hidden patterns that will have an immediate impact on your bottom line and on your ROI…more details
Ryohei is the Founder & CEO of dotData. Prior to founding dotData, he was the youngest research fellow ever in NEC Corporation’s 119-year history, a title awarded to only six individuals among 1,000+ researchers.
During his tenure at NEC, Ryohei was heavily involved in developing many cutting-edge data science solutions with NEC’s global business clients, and was instrumental in the successful delivery of several high-profile analytical solutions that are now widely used in industry.
Ryohei received his Ph.D. degree from the University of Tokyo in the field of machine learning and artificial intelligence.
Get your copy of Sinan Ozdemir’s book signed at our book-author event! We will have a limited number of copies available for free.
Sinan Ozdemir is a mathematician, data scientist, NLP expert, lecturer, and accomplished author. He is currently applying his extensive knowledge and experience in AI and Large Language Models (LLMs) as the founder and CTO of LoopGenius, transforming the way entrepreneurs and startups market their products and services.
Simultaneously, he is providing advisory services in AI and LLMs to Tola Capital, an innovative investment firm. He has also worked as an AI author for Addison Wesley and Pearson, crafting comprehensive resources that help professionals navigate the complex field of AI and LLMs.
Previously, he served as the Director of Data Science at Directly, where his work significantly influenced their strategic direction. As an official member of the Forbes Technology Council from 2017 to 2021, he shared his insights on AI, machine learning, NLP, and emerging technologies-related business processes.
He holds a B.A. and an M.A. in Pure Mathematics (Algebraic Geometry) from The Johns Hopkins University, and he is an alumnus of the Y Combinator program. Sinan actively contributes to society through various volunteering activities.
Sinan’s skill set is strongly endorsed by professionals from various sectors and includes data analysis, Python, statistics, AI, NLP, theoretical mathematics, data science, function analysis, data mining, algorithm development, machine learning, game-theoretic modeling, and various programming languages.
Virtual | Talk | LLMs | ML/AI Safety and Security | All Levels
Language models are incredible engineering breakthroughs but require auditing and risk management before productization. These systems raise concerns about toxicity, transparency and reproducibility, intellectual property licensing and ownership, disinformation and misinformation, supply chains, and more. How can your organization leverage these new tools without taking on undue or unknown risks? While language models and associated risk management are in their infancy, a small number of best practices in governance and risk are starting to emerge. If you have a language model use case in mind, want to understand your risks, and do something about them, this presentation is for you…more details
Patrick Hall is an assistant professor of decision sciences at the George Washington University School of Business, teaching data ethics, business analytics, and machine learning classes. He also conducts research in support of NIST’s AI risk management framework and is affiliated with leading fair lending and AI risk management advisory firms.
Patrick studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University. He has been invited to speak on AI and machine learning topics at the National Academies of Sciences, Engineering, and Medicine, ACM SIG-KDD, and the Joint Statistical Meetings. He has been published in outlets like Information, Frontiers in AI, McKinsey.com, O’Reilly Ideas, and Thomson Reuters Regulatory Intelligence, and his technical work has been profiled in Fortune, Wired, InfoWorld, TechCrunch, and others. Patrick is the lead author of the book Machine Learning for High-Risk Applications.
Prior to joining the GW School of Business, Patrick co-founded BNH.AI, a boutique law firm focused on AI governance and risk management. He led H2O.ai’s efforts in responsible AI, resulting in one of the world’s first commercial applications for explainability and bias mitigation in machine learning. Patrick also held global customer-facing roles and R&D roles at SAS Institute. Patrick has built machine learning software solutions and advised on matters of AI risk for Fortune 100 companies, cutting-edge startups, Big Law, and U.S. and foreign government agencies.
In-person | AI X Talk | Intermediate
The session will also highlight the importance of Declarative AI models, which offer a rapid composition of Applied AI applications, reducing time-to-market and emphasizing the crucial last mile of application delivery. Join us to learn about harnessing AI’s potential in real-world applied environments, focusing on tangible value and outcome-driven solutions…more details
Nick King is the Founder and CEO of Data Kinetic, a platform-agnostic, industry-focused applied AI research, development, and advisory company. Nick has been working on enterprise technology platforms for the last 20 years. In his role at Data Kinetic, he has focused on driving the applied use of AI/ML with enterprise companies. Prior to Data Kinetic, Nick worked across multiple startups and open source projects, from Snowplow to DataRobot. He has also held roles at Microsoft, Google, VMware, and Cisco focusing on applied technologies and platforms.
In this presentation, we explore an approach that utilizes LLMs to guide feature engineering. By leveraging the contextual understanding within LLMs, we have developed a system for LLM-assisted feature engineering. Our research demonstrates the practical benefits of this synergy – from feature ideation to improved feature relevance to enhanced model interpretability and efficiency…more details
Sergey is a data scientist with a background in physics and neurobiology. FeatureByte is Sergey’s second startup. He was one of the first employees at DataRobot, where he created and led a professional services group and helped the company grow into a unicorn. Sergey is widely known for being a Kaggle Grandmaster and for having held the #1 rank on Kaggle. He has been named one of the top data scientists by various publications on multiple occasions. Sergey’s passion is in machine learning, predictive modeling and inventive feature engineering.
In-person | AI X Talk | Beginner-Intermediate
In this session, Yochai will share the essential skills needed to transform your product with an AI-first approach. He’ll cover how to:
– Decide on a North Star that will set you up for success
– Build a rapid evaluation framework to elevate your product
– Align your AI with human guidance so it becomes more efficient and effective
– Create unique and defensible generative AI…more details
Yochai Konig is Ada’s Vice President of Machine Learning. He has over 30 years of experience in Machine Learning research, conversational AI, and Deep Learning. Prior to joining Ada, Yochai led the Genesys Innovation group as well as Genesys Advanced Research. Yochai was the co-founder and CTO of Utopy (acquired by Genesys) where he created the first Speech Analytics product “SpeechMiner” which started the Speech Analytics market. He started his career at SRI International as a Research Scientist where he served as Principal Investigator for Department of Defense (DoD) research projects in the areas of Speech Recognition, Speaker Verification and Natural Language Processing (NLP). Yochai Konig is the inventor of over 100 patents and author of numerous academic papers. Yochai holds a Ph.D. in computer science from Berkeley, and a bachelor’s degree in computer engineering, summa cum laude, from the Technion, Israel Institute of Technology.
In the fast-paced world of data science and AI, we will explore how large language models (LLMs) can elevate the development process of Apache Spark™ applications.
We’ll demonstrate how LLMs can simplify SQL query creation, data ingestion, and DataFrame transformations, leading to faster development and more precise code that’s easier to review and understand. We’ll also show how LLMs can assist in creating visualizations and clarifying data insights, making complex data easy to understand…more details
Allison is a Senior Software Engineer at Databricks where she focuses on Spark SQL and PySpark. Before Databricks, she was an early member of Robinhood’s data team. She holds a Bachelor’s degree in Computer Science from Carnegie Mellon University.
Gengliang Wang, a committed Apache Spark PMC Member and Committer, actively works on important Spark projects including ANSI SQL mode, TIMESTAMP_NTZ data type, and data sources. His contributions extend to enhancing the SQL compiler and UI. In addition, he’s engaged in a project using large language models to streamline Spark application development. His work underscores a dedication to improving and making Apache Spark more user-friendly.
In-person | Talk | Deep Learning | Machine Learning | Intermediate
Keras, the popular deep learning library, is now becoming multi-backend with support for TensorFlow, JAX, and PyTorch. Available now, Keras Core allows developers to create models with all of the simple high-level components of Keras while interchanging between frameworks to take advantage of the benefits of each. Existing users of each framework can seamlessly integrate their code, including backend-specific code, with Keras. Additionally, Keras Core’s ops suite, which includes the NumPy API, allows you to develop custom components for use on any framework. For current users of tf.keras, Keras Core with the TensorFlow backend serves as a drop-in replacement, meaning that changing your imports is all that is necessary to start taking advantage of the framework-agnostic future…more details
Neel is currently an engineer on the Keras team at Google. His work has been focused on open source development, adding major features to the latest releases of TensorFlow and Keras. With broad experience spanning data science and ML infrastructure, he is responsible for the saving and export of ML models and is a developer for cross-framework compatibility at Google. As a key maintainer of keras.io and tensorflow.org, Neel is passionate about educating the data science community at large and helping the latest innovations of AI reach everyone.
In-person | Talk | All Levels
Open source technology has historically driven technological progress, from operating systems, to web development, and now, AI and machine learning. We’re at a point in history where the canonical “open source ML stack” has yet to be discovered. In this talk, you’ll hear about HPE’s approach to building an end-to-end ML stack, including Determined AI for model training and Pachyderm for data preparation. You’ll also see a live demo of how to use Determined AI for a transfer learning use case in the biomedical image domain. We’ll also touch on some key collaborations, including our partnerships with the AI Infrastructure Alliance, NVIDIA, and Aleph Alpha…more details
Isha Ghodgaonkar is a Machine Learning Developer Advocate at Hewlett Packard Enterprise (HPE), working within the HPC, AI & Labs business unit, which includes open source platforms Determined AI and Pachyderm. Prior to HPE, Isha worked as an Autonomous Systems Engineer at MITRE, a Data Science Intern at Genentech, and a technical intern at The Aerospace Corporation. Isha is a graduate of Purdue University.
Attendees will learn how to apply NLP to financial data. Earnings data covers all industries in the US; attendees will be surprised to learn how much information is hiding in plain sight. Earnings calls are public information and I’m going to show you what you’re able to learn from them using ASR and NLP tools.
Speech recognition technology has come a long way in recent years, but there are still some larger market challenges when it comes to transcribing speech accurately. This comes into play especially for domain-specific terminology around certain industries. For example, in the renewable energy industry, transcription might include terms like “hydro DM,” “biomass conversion,” and “renewable portfolio standard,” that are outside normal conversation…more details
Katie is the Product Lead for Scribe at Kensho. Scribe is an Artificial Intelligence, Natural Language Processing tool that does voice to text transcription. Katie’s background is in data pipeline management and machine learning – specifically unsupervised machine learning in aerospace to detect satellite collisions. She is a guest lecturer at Johns Hopkins University and graduated with distinction from the University of Virginia.
In-person | Talk | Data Analytics and Big data | Machine Learning | MLOps | Beginner – Intermediate
When choosing between architectures for your enterprise data environment, you always need to decide how to manage the trade-offs in the CAP theorem. The CAP theorem states that a distributed system cannot guarantee consistency, availability, and partition tolerance all at the same time, but what if choosing a Kappa architecture makes it possible to have it all?…more details
Joep has more than 12 years of experience developing, engineering, architecting and visualising data products in various markets ranging from energy to clothing manufacturing. He focuses on enabling teams to be better at handling data and providing them with the tools and the knowledge needed to go live and to stay in production.
He was a member of the Teqnation program committee and has given presentations on Kafka and Hue usage during football, developing and deploying on HoloLens, Total DevOps using GitLab, Evolution of a data science product, Using the Elastic Stack from PoC to production, and Xbox Kinect on a bike at Devoxx London.
In-person | Talk | Generative AI | NLP | Machine Learning | Research Frontiers (Research topics in ML, DL, Data Science etc) | All Levels
This talk will dive into the end-to-end process of and framework for building a generative AI application, leveraging a fun and engaging case study with open-source tooling (e.g., HuggingFace models, Python, PyTorch, Jupyter Notebooks). We will guide attendees through key stages from model selection and training to deployment, while also addressing fine-tuning versus prompt-engineering for specific tasks, ensuring the quality of output, and mitigating risks…more details
Michelle is a technology leader who specializes in machine learning and cloud computing. She has 15 years of experience in the technology industry, contributed to the original IBM Watson showcased on Jeopardy, and enjoys building and leading teams that develop and deploy AI solutions to solve real-world problems. Michelle is passionate about diversity and STEM education/careers for our minority communities, and serves both on the board of Women in Data and as an avid volunteer for Girls Who Code.
In-person | Talk | Beginner – Intermediate
Modern manufacturers are looking for data science solutions to help move their three key metrics: produce more, be efficient and optimize resource utilization, and ship with the highest possible quality. Given these three KPIs, we will examine how data science influences each of them and take a deep dive into the key projects that have moved them. We will drill down into specific projects, e.g., how we use recursive hypothesis testing to reduce testing frequency, how we understand the physics behind a test and optimize it, and how we make mass-production decisions based on design of experiments and small distribution analysis…more details
Angad Arora is a seasoned professional with a wealth of expertise in manufacturing and quality for leading tech companies. With his innovative approach to lean manufacturing using data science, he has successfully saved millions of dollars for consumer electronics manufacturing lines. Currently serving as a Product Quality Manager at Google Inc., Angad continues to drive excellence and ensure top-notch product quality in the dynamic world of technology.
In-Person | AI Rapid Case Studies | All Levels
By harnessing the power of machine learning and AI, companies can unlock advanced capabilities and gain valuable insights that streamline sales processes and enhance sellers’ performance. This infusion eliminates the laborious task of manual information analysis across multiple disparate data sources, empowering sellers to dedicate their time and expertise to building meaningful customer relationships. In architecting AI-based solutions, it is important to understand the users’ daily workflow and challenges, and not just to surface insights but to surface the right information at the right place to simplify that workflow…more details
Bio Coming Soon!
In-person | AI Rapid Case Studies | All Levels
By adopting the proposed framework, organizations can effectively harness the potential of Gen AI, empowering all employees with invaluable insights to make data-driven decisions. This talk aims to inspire organizations to embrace a safe agile approach and cultivate a culture of experimentation and collaboration for the successful deployment of Gen AI solutions…more details
In-person | Talk | Machine Learning | Deep Learning | Intermediate
In this talk I will present a new, hybrid approach which combines the best aspects of both methods. The process begins with a careful observation of customer data and assessment of whether there are naturally formed clusters in the data. It continues with the selection of a clustering algorithm and the fine-tuning of a model to create clusters…more details
Evie Fowler is a data scientist based in Pittsburgh, Pennsylvania. She currently works in the healthcare sector leading a team of data scientists who develop predictive models centered on the patient care experience. She holds a particular interest in the ethical application of predictive analytics and in exploring how qualitative methods can inform data science work. She holds an undergraduate degree from Brown University and a master’s degree from Carnegie Mellon.
In-person | Talk | MLOps | Data Engineering & Big Data | All Levels
Over the past decade, the democratization of data science tooling, particularly through Python libraries like pandas and NumPy, has empowered practitioners of all levels to work with data efficiently. Yet, despite the popularity of these tools, they present challenges as practitioners look to scale their workflows to production. In this talk, we explore the limitations of these tools and pain points that data scientists encounter when working with data at scale. I will share how our open-source project Modin (10M+ downloads) addresses this issue by seamlessly scaling up your pandas code with just a single line of code change…more details
Doris Lee is the CEO and co-founder of Ponder (https://ponder.io/). Doris received her Ph.D. from the UC Berkeley RISE Lab and School of Information in 2021, where she developed tools that help data scientists explore and understand their data. She is the recipient of Forbes 30 under 30 for Enterprise Technology in 2023.
In-person | Talk | Machine Learning | Intermediate
Machine learning has undergone a profound transformation with open source, from a technology that could only do naive curve fitting to one that could potentially end humanity’s dominion. In 2017, Ali Rahimi declared that machine learning is the new alchemy, and we’d like to go further and claim that machine learning is the new necromancy: a forgotten science with a passionately strong open source ethos that was eventually destroyed by the Catholic Church…more details
Mark Saroufim is an engineer on PyTorch at Meta working on open infrastructure, compilers and community. Mark is fond of hot takes and shares them on his blog https://marksaroufim.substack.com/. Prior to Meta, Mark worked as a Machine Learning engineer at Graphcore, Microsoft and yuri.ai.
In-person | Talk | NLP & LLMs | Intermediate-Advanced
In this session, we will take a deep dive into a novel application of AI: training Large Language Models (LLMs) on individual employees’ Slack messages. The first portion of our discussion is dedicated to the technical aspects of this process, where we will explain the steps involved in fine-tuning the LLM. We will demonstrate how such models, tailored to mimic specific individuals’ textual styles, can serve as the foundation for applications in text generation and automated question answering systems.
Transitioning into the second part of our talk, we will spotlight the often-underemphasized side of AI deployment – the risks and ethical concerns. Using our project as a case study, we will expose you to the potential pitfalls we encountered and the proactive measures taken to manage these risks. Our aim is to underline the necessity of an encompassing risk management framework for AI…more details
Eli is CTO and Co-Founder at Credo AI. He has led teams building secure and scalable software at companies like Netflix and Twitter. Eli has a passion for unraveling how things work and debugging hard problems. Whether it’s using cryptography to secure software systems or designing distributed system architecture, he is always excited to learn and tackle new challenges. Eli graduated with an Electrical Engineering and Computer Science degree from U.C. Berkeley.
In-person | Talk | Generative AI | Machine Learning Safety and Security | Beginner-Intermediate
In today’s digital world, we have access to everything at our fingertips. We generate and consume data at lightning speed, and data is growing at an exponential rate. But how safe is your data? Our data is always surrounded by advanced persistent threats (APTs). At the same time, Data Science, Machine Learning, and Artificial Intelligence (AI), with sub-domains like Deep Learning and Large Language Models (LLMs), are making tremendous progress. In this talk, I will focus on advanced technologies such as LLMs and GPT models in the domain of cyber security, though some of these concepts are relatable to other industries as well…more details
Nirmal Budhathoki is a Senior Data Scientist currently working at Microsoft in Cloud Security. Nirmal has over 12 years of experience in the IT industry, including more than 5 years in data science. Nirmal’s strong belief in continuous learning has led him to complete three master’s degrees, with majors in Information Systems, Business Administration, and Data Science. Nirmal loves to help the data science community and has completed over 600 free mentoring sessions with aspiring data scientists to help them navigate their data science careers. Nirmal also conducts mentored learning sessions for MIT’s Data Science and Machine Learning certification program in collaboration with Great Learning. Nirmal has experience working with the US government for the Department of the Navy, and he is also a US Army veteran. Nirmal yearns to solve data science problems that are aligned with product strategy and business outcomes. In his free time, Nirmal loves using his data science skills in sports analytics.
In-person | Talk | Machine Learning | MLOps and Data Engineering | All Levels
The robotaxi industry has one of the most extreme use cases for Machine Learning, with very large image/LiDAR datasets and resource-hungry training jobs. I will recount my experience leading the ML Infrastructure team at Cruise, the #1 robotaxi company. From training on desktop GPUs to being entirely cloud-based, I will detail the challenges we faced and how we solved them. I will also detail what guarantees need to be built into any production-grade ML platform to ensure fast, robust, and safe ML development…more details
Emmanuel Turlay is the founder and CEO of Sematic, an open-source ML infrastructure company. Emmanuel started his career in academia researching particle physics at CERN, before branching out into tech and moving to the US in 2014. He led engineering teams at Instacart, and then Cruise, where he led the ML Infrastructure team for four years. In 2022, Emmanuel founded Sematic to bring learnings from building ML infrastructure for robotaxis to the rest of the industry in an open-source manner.
In-person | Talk | MLOps | Intermediate
Ensuring proper data quality is critical in the effective implementation of data pipelines for ML, data science, geospatial analysis, or general analytics. Most engineering teams address data quality and pipeline orchestration as two separate tasks. In this presentation, Sandy Ryza will explain the benefits of a model in which arbitrary checks are included in the data orchestration logic, resulting in better control and integration of data quality checks at various steps in the pipeline…more details
Sandy is a lead engineer, author, and thought leader in the domain of data engineering. Sandy co-wrote “Advanced Analytics with PySpark” and “Advanced Analytics with Spark”. He led ML and data science teams at Cloudera, Remix, Clover Health, and KeepTruckin.
Sandy is currently the lead engineer on the Dagster project, an open-source data orchestration platform used in MLOps, data science, IoT, and analytics. Sandy is a regular speaker at data engineering and ML conferences.
In-person | Talk | Deep Learning | MLOps and Data Engineering | Intermediate – Advanced
In this talk, we’ll provide an overview of the core ideas behind Saturn, how it works on a technical level to reduce runtimes & costs, and the process of using Saturn for large-model finetuning. We’ll demonstrate how Saturn can accelerate and optimize large-model workloads in just a few lines of code and describe some high-value real-world use cases we’ve already seen in industry & academia…more details
Kabir Nagrecha is a Ph.D. candidate at UC San Diego, working with Professors Arun Kumar & Hao Zhang. His work focuses on systems infrastructure to support deep learning at scale, aiming to democratize large models and amplify the impact of machine learning applications. He is the recipient of the Meta Research Fellowship, as well as fellowships from the Halicioglu Data Science Institute and Jacobs School of Engineering at UCSD.
Kabir is the youngest-ever Ph.D. student at UCSD, having started his doctorate at the age of 17. He’s previously worked with companies such as Apple, Meta, & Netflix to build the core infrastructure that supports widely-used services such as Siri & Netflix’s recommendation algorithms. Most recently, he’s been working on Saturn, a new system to support automatic parallelization, scheduling, and resource apportioning for training large neural networks.
Gaurav is currently the Executive Vice President and General Manager of Machine Learning and AI at AtScale. He is responsible for defining and leading the business that extends the company’s semantic layer platform to address the rapidly expanding set of Enterprise AI and machine learning applications.
Most recently, Gaurav served as VP of Product at Neural Magic, innovators in software acceleration for deep learning utilizing sparse model architectures. Previously, he served in a number of executive roles at IBM spanning product, engineering, and sales that were focused on taking cutting-edge data science, machine learning, and AI offerings to market. He specializes in model training and serving, MLOps, and trusted AI in the context of driving business outcomes for enterprise applications. He is also an advisor to data and AI companies.
Outside of work Gaurav is an avid sports fan, foodie, traveler, and enjoys spending time with his wife and kids.
In-person | Talk | Deep Learning | Machine Learning | Beginner – Intermediate
We will start by exploring the landscape of model explainability, distinguishing between “intrinsic” and “post-hoc” methods. Intrinsic methods, like decision trees and linear models, are naturally interpretable but may fall short in performance. Post-hoc methods, applicable to complex models like neural networks, are the focus of our session. We’ll dive into techniques such as Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and Integrated Gradients. Through hands-on examples, we’ll demonstrate how these methods dissect a model’s decision boundary, feature importance, and decision paths…more details
Swagata is a Data Professional with over 6 years of experience in the Healthcare, Retail, and Platform Integration industries. She is an avid blogger and writes about state-of-the-art developments in the AI space. She is particularly interested in Natural Language Processing, and focuses on researching how to make NLP models work in practical settings. In her spare time, she loves to play her guitar, sip masala chai, and find new spots for doing yoga. Connect with her here – https://www.linkedin.com/in/swagata-ashwani/
In-person | Talk | Machine Learning
Deploying advanced Machine Learning technology to serve customers and/or business needs requires a rigorous approach and production-ready systems, which has led to the development of MLOps. Large models make rigorous engineering and scalable architectures even more important, which in turn has led to the emergence of LMOps. Just the size of the models themselves, and the datasets used for training, require highly efficient infrastructure. More complex pipeline topologies which include transfer learning, fine tuning, instruction tuning, and evaluation along a collection of complex dimensions, require a high degree of flexibility for customization. Add to this the complex inference-time systems and requirements, and LMOps starts to look somewhat challenging to implement…more details
A data scientist and ML enthusiast, Robert has a passion for helping developers quickly learn what they need to be productive. Robert is currently the Senior Product Manager for TensorFlow Open-Source and MLOps at Google and helps ML teams meet the challenges of creating products and services with ML. Previously Robert led software engineering teams for both large and small companies, always focusing on moving fast to implement clean, elegant solutions to well-defined needs. You can find him on LinkedIn at robert-crowe.
In-person | Talk | Deep Learning | NLP | Machine Learning | MLOps and Data Engineering | Beginner – Intermediate
We discuss a unified and user-friendly approach to developing ML applications on MySQL HeatWave AutoML. We describe a simple-to-use MySQL API for HeatWave AutoML, its extension for different use cases, and its ability to interface with third-party applications. We also discuss how this facilitates the development of classification, regression, anomaly detection, forecasting, and recommendation system use cases, and compare it with other platforms that offer ML features for databases. Lastly, we present how a user can fine-tune the AutoML pipeline for their needs…more details
Sanjay Jinturkar is Senior Director, MySQL HeatWave, focused on building machine learning capabilities inside the MySQL HeatWave database. These capabilities enable the user to automatically develop and deploy machine learning models inside the database using AutoML in a cloud native environment for a variety of use cases, without the need to pull the data or the model outside the database. In the past, he has held multiple technology and engineering management positions in systems software, mobile communications software, applications development, and diagnostics. His interests include machine learning, cloud computing, databases, and architecture. Sanjay has a PhD in Computer Science from the University of Virginia.
Sandeep Agrawal leads the HeatWave Machine Learning (HeatWave ML) project within MySQL HeatWave. HeatWave ML is the product of years of research and advanced development, and aims to help both data scientists and non-data scientists quickly apply ML to a given problem. Prior to HeatWave, Sandeep led the Oracle AutoML project within Oracle Labs, creating a state-of-the-art distributed AutoML engine. He is passionate about Machine Learning and Systems Architecture, and a project like HeatWave ML that combines the two is heaven for him. Prior to Oracle, he completed his PhD in Computer Science at Duke University in 2015.
In-person | Half-Day Training | Machine Learning | All Levels
In this 90-minute hands-on lab, you’ll learn how to start your Graph journey with ArangoDB. We will walk you through getting a basic deployment up and running in minutes. We will then help you create a deployment in the ArangoGraph insights platform and learn how to load and query your data…more details
Asif Kazi has directed global sales engineering, professional services, and technical customer success teams for over 15 years at Foursquare, Ahana, Amazon Web Services, StreamSets, and Couchbase. He is passionate about working with customers and solving their business problems. His core competencies include scaling teams globally, building referenceable client relationships, and accelerating revenue growth.
Arthur Keen, Ph.D. is a Graph evangelist with more than 20 years of experience in Graph AI. His expertise spans Graph Machine Learning, graph analytics, and ontology. Arthur has developed knowledge graph-based solutions in intelligence, cyber, financial services, logistics, retail, and energy. He has led 3 product teams from concept to GA and first customers. He has been awarded 7 patents and has held leadership roles including CTO, Chief Scientist, and VP of Engineering. He has a Computer Science/Industrial Engineering Ph.D. and has been invited to speak at conferences including Data Day TX, NoSQL Now, SemTech, and SXSW.
In-person | Half-Day Training | Machine Learning | Intermediate-Advanced
As machine learning models have become more present in our lives, there has been increasing attention on the reliability of these models. A major component of this is understanding how uncertain the model is about its prediction. No model is exactly right 100% of the time, so we need methods and approaches by which we can quantify the level of uncertainty around a prediction.
Approaches to uncertainty quantification (UQ) vary, and depend on the type of problem. For classification problems, the primary approach is probability calibration: making sure that the model outputs corresponding to each class “behave well” as probabilities. For regression problems, there are several different approaches. One can configure models to output an interval, rather than a single point prediction, along with a “coverage” value that specifies the probability that the interval covers the true value. The framework of Conformal Prediction provides theoretical guarantees around such interval predictions. Alternatively, one can use methods that output an entire conditional density for y given X. This is called probabilistic regression, or conditional density estimation. Several parametric and non-parametric approaches exist for this problem, including PrestoBoost, Coarsage, and NGBoost…more details
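As a rough illustration of the interval-with-coverage idea, here is a minimal sketch of split conformal prediction for regression. It is not taken from the training materials: the toy data, the fixed `model` function, and the helper names are all assumptions made for this example, and a real workflow would fit the model on separate training data.

```python
import math
import random


def conformal_quantile(residuals, coverage):
    """Finite-sample-adjusted quantile of absolute calibration residuals."""
    n = len(residuals)
    # Taking the ceil((n + 1) * coverage)-th smallest residual is what gives
    # split conformal prediction its marginal coverage guarantee.
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        return float("inf")  # too few calibration points for this coverage
    return sorted(residuals)[rank - 1]


def predict_interval(point_prediction, q):
    """Symmetric interval around a point prediction."""
    return (point_prediction - q, point_prediction + q)


def model(x):
    """Hypothetical already-fitted point predictor (assumption for the demo)."""
    return 2.0 * x


def make_data(n, rng):
    """Toy data: y = 2x + Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        data.append((x, 2.0 * x + rng.gauss(0, 1)))
    return data


rng = random.Random(0)
calibration = make_data(500, rng)   # held out from model fitting
held_out = make_data(2000, rng)     # used only to check coverage

# Calibrate: absolute residuals of the model on the calibration set.
residuals = [abs(y - model(x)) for x, y in calibration]
q = conformal_quantile(residuals, coverage=0.9)

# Evaluate: fraction of held-out targets that fall inside their interval.
covered = 0
for x, y in held_out:
    lo, hi = predict_interval(model(x), q)
    covered += lo <= y <= hi
empirical_coverage = covered / len(held_out)
print(f"interval half-width: {q:.2f}, empirical coverage: {empirical_coverage:.3f}")
```

The empirical coverage should land close to the requested 90%. The guarantee is marginal (averaged over inputs), so individual regions of the input space may be over- or under-covered.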
Brian Lucena is Principal at Numeristical, where he advises companies of all sizes on how to apply modern machine learning techniques to solve real-world problems with data. He is the creator of three Python packages: StructureBoost, ML-Insights, and SplineCalib. In previous roles he has served as Principal Data Scientist at Clover Health, Senior VP of Analytics at PCCI, and Chief Mathematician at Guardian Analytics. He has taught at numerous institutions including UC-Berkeley, Brown, USF, and the Metis Data Science Bootcamp.
In-person | Training | MLOps and Data Engineering | Machine Learning | Deep Learning | All Levels
Our objective is to ensure that you are equipped with the essential knowledge and practical tools to proficiently manage your machine-learning models in a real-world production environment…more details
Oliver Zeigermann has been developing software with different approaches and programming languages for more than 3 decades. In the past decade, he has been focusing on Machine Learning and its interactions with humans.
Virtual | Training | Generative AI | All Levels
This engaging workshop provides a hands-on journey into the world of Generative AI and Autonomous Agents, crucial building blocks towards achieving Artificial General Intelligence (AGI). Participants will delve into the transformative shift in AI, gaining insights into the revolutionary impact of Generative AI across various domains such as text, image, video, and 3D object generation, as well as data augmentation. The workshop will equip attendees with cutting-edge tools to substantially boost their AI capabilities and efficiency. A key feature of the session is a practical exercise in crafting an Autonomous AI Agent, a technology poised to work in synergy with us in the near future. This succinct yet thorough workshop is tailored for those eager to maintain a competitive edge in the swiftly advancing field of AI…more details
Long before the buzz surrounding generative AI, Martin Musiol was already advocating for its significance in 2015. Since then, he has been a frequent speaker at conferences, podcasts, and panel discussions, addressing the technological advancements, practical applications, and ethical considerations of generative AI. Martin Musiol is a founder of generativeAI.net, a lecturer on AI to over 3000 students, and publisher of the newsletter ‘Generative AI: Short & Sweet’. As the lead for GenAI Projects in Europe at Infosys Consulting (previously at IBM), Martin Musiol helps companies globally harness the power of generative AI to gain a competitive advantage. LinkedIn: https://www.linkedin.com/in/martinmusiol1/; website: https://generativeai.net/
Virtual | Half-Day Training | MLOps | Intermediate-Advanced
Our workshop begins with diving into exploratory data analysis on the Brackish Underwater dataset. This dataset comprises images captured 9 meters below the surface on the Limfjords bridge in northern Denmark by Aalborg University. Dive deep into the world of underwater AI as we explore marine life, including fish, crabs, and other captivating aquatic creatures…more details
Ayush Thakur is an MLE at Weights and Biases and a Google Developer Expert in Machine Learning. He is interested in everything computer vision and representation learning. For the past 8 months he has been working with LLMs and has covered RLHF and the how and what of building LLM-based systems.
ML Engineer working on E2E computer vision projects as part of Weights & Biases. My research interests lie in the domains of generative computing, image restoration, and computer graphics. I contribute to open-source actively, primarily through the implementation of research papers and end-to-end ML examples.
In-person | Full-Day Training | Generative AI | Intermediate
This workshop is designed to explore how artificial intelligence can be used to generate creative outputs and to inspire technical audiences to use their skills in new and creative ways…more details
Leonardo De Marchi holds a Master’s in Artificial Intelligence and has worked as a Data Scientist in the sports world, with clients such as the New York Knicks. He now works at Thomson Reuters as VP of Labs, and also provides consultancy and training for small and large companies. His previous experience includes being Head of Data Science and Analytics at Bumble, the largest dating site with over 500 million users, heading the team through acquisition and an IPO.
In-person | Workshop | Machine Learning | Beginner
Pandas can be tricky, and there is a lot of bad advice floating around. This tutorial will cut through some of the biggest issues I’ve seen with Pandas code after working with the library for a while and writing three books on it…more details
Virtual | Tutorial
Hao is currently a postdoctoral researcher at the Sky Lab, UC Berkeley, working with Prof. Ion Stoica. He has recently been working on the Alpa project and the Sky project, aiming at democratizing large models like GPT-3. He is an Assistant Professor at Halıcıoğlu Data Science Institute and Department of Computer Science and Engineering (affiliate) at UC San Diego in Fall 2023.
His research is primarily focused on large-scale distributed ML in the joint context of ML and systems, concerning performance, usability, cost, and privacy. His work spans distributed ML algorithms, large models, parallelisms, performance optimizations, system architectures, ML privacy, and AutoML, with applications in computer vision, natural language processing, and healthcare.
In-person | Tutorial | Generative AI | Beginner-Intermediate
As the digital frontier expands, the inclusion of Generative and Autonomous AI technologies is becoming an integral part of the next evolutionary step in games, simulations, and the emergent Metaverse. In this in-depth, hands-off tutorial, we delve into the transformative potential of these cutting-edge AI paradigms, focusing on their capacity to create virtual landscapes that are not just visually stunning but also highly interactive, intelligent, and perpetually evolving. Participants will be led through a balanced curriculum that covers both the theoretical foundations and the practical techniques, right from core algorithms to real-world implementation…more details
Matt White is a distinguished expert in artificial intelligence and business, renowned for successfully deploying large-scale AI platforms across the telecom, gaming, media, and entertainment industries. With over two decades of experience, Matt has consistently demonstrated his ability to stay ahead of the curve in technological innovation, spearheading advancements in AI applications in diverse domains.
In-person | Workshop | Data Engineering & Big Data | Machine Learning | Generative AI | Data Visualization | Beginner
In this presentation, Alison will take you through the example of one independent researcher’s journey from no-code to knowledge graph master. She will walk you through the steps the researcher took, including how to format data stores, what data to collect, ways to leverage LLMs in the process, and ultimately how to have a working knowledge graph in a dashboard with minimal coding experience…more details
Virtual | Workshop | NLP & LLMs | Intermediate
The workshop will be hands-on and composed of 4 modules, targeting data scientists, machine learning experts, software engineers, and product managers. Prior exposure to LLMs is not necessary. At the end of the workshop, participants will have gained insights on how to effectively make the best of open-source LLMs and learned the necessary steps to bridge the gap between prototype and production-ready applications…more details
Suhas Pai is an NLP researcher and co-founder/CTO at Bedrock AI, a Toronto-based startup. At Bedrock AI, he works on text ranking, representation learning, and productionizing LLMs. He is also currently writing a book on Designing Large Language Model Applications with O’Reilly Media. Suhas has been active in the ML community, being the Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021 and also NLP lead at Aggregate Intellect (AISC). He was also co-lead of the Privacy working group at Big Science, as part of the BLOOM project.
Virtual | Workshop | Machine Learning | Intermediate
In this talk, we will explore and demystify the world of Causal AI for data science practitioners, with a focus on understanding cause-and-effect relationships within data to drive optimal decisions. We will focus on:
* From Shapley to DAGs: the dangers of using post-hoc explainability methods as tools for decision making, and why traditional ML isn’t suited to situations where we want to perform interventions on the system.
* Discovering causality: how do we figure out what is causal and what isn’t, with a brief introduction to methods of structure learning and causal discovery.
* Optimal decision making: by understanding causality, we can accurately estimate the impact we can make on our system; how do we use this knowledge to derive the best possible actions to take?…more details
Andre joined causaLens from Goldman Sachs, where he was an executive director in the Model Risk Management group in Hong Kong and Frankfurt. Today he is working with industry-leading global organisations to apply cutting-edge Causal AI research in production-level solutions that empower individuals and teams to make better decisions. Andre received his PhD in theoretical physics from the University of Munich, where he studied the interplay between quantum mechanics and general relativity in black holes.
Virtual | Workshop | LLMs | GenAI | Beginner
By completing this workshop, you will develop an understanding of Generative AI and Large Language Models, including the architecture behind them, how they function, and how to leverage their unique conversational capabilities. You will also become familiar with the concept of the LLM as a reasoning engine that can power your applications, paving the way to a new landscape of software development in the era of Generative AI. Finally, we will cover some examples of LLM-powered applications in Python using popular AI orchestrators such as LangChain…more details
Valentina is a Data Science MSc graduate and Cloud Specialist at Microsoft, focusing on Analytics and AI workloads within the manufacturing and pharmaceutical industry since 2022. She has been working on customers’ digital transformations, designing cloud architecture and modern data platforms, including IoT, real-time analytics, Machine Learning, and Generative AI. She is also a tech author, contributing articles on machine learning, AI, and statistics, and recently published a book on Generative AI and Large Language Models.
In her free time, she loves hiking and climbing around the beautiful Italian mountains, running, and enjoying a good book with a cup of coffee.
In-person | Workshop | MLOps | Machine Learning | Intermediate
In this workshop, you will learn to build a Streamlit data application to help visualize the ROI of different advertising spends of an example organization…more details
Vino is a Developer Advocate for Snowflake. She started as a software engineer at NetApp, and worked on data management applications for NetApp data centers when on-prem data centers were still a cool thing. She then hopped onto the cloud and big data world and landed on the data teams of Nike and Apple. There she worked mainly on batch processing workloads as a data engineer, built custom NLP models as an ML engineer, and even touched upon MLOps a bit for model deployments. When she is not working with data, you can find her doing yoga or strolling through Golden Gate Park and Ocean Beach.
In-person | Workshop | Machine Learning | Deep Learning | All Levels
During this workshop, you will build complex time series forecasting and anomaly detection models from the ground up, perform feature engineering and selection, assess the accuracy, and utilize the ModelZoo browser and root cause analysis functionalities to investigate the outcomes…more details
Philip Wauters is Customer Success Manager and Value Engineer at Tangent Works, working on practical applications of time series machine learning for customers from various industries such as Siemens, BASF, Borealis, and Volkswagen. With a commercial background and experience in data engineering, analysis, and data science, his goal is to find and extract the business value in the enormous amounts of time-series data that exist at companies today.
In-person | Workshop | Deep Learning | All Levels
Facial recognition systems are everywhere. Of course, they are where you would expect them, such as airports, border crossings, and government offices. However, they are also in some public surveillance cameras, all over social media, embedded in smart home solutions, and even in your phone. Have you ever wondered how facial recognition systems work? In this hands-on session, we will build a facial recognition system from scratch using open-source technologies and publicly available pre-trained models…more details
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. He’s an Agronomic Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Whether concerning leisure activities, plant diseases, or customer lifetime value, Serg is passionate about providing the often-missing link between data and decision-making. He wrote the bestselling book “Interpretable Machine Learning with Python” and is currently working on a new book titled “DIY AI” with do-it-yourself projects for AI hobbyists and practitioners alike.
Virtual | Tutorial | Generative AI | Prompt Engineering | Intermediate
Most people who call themselves prompt engineers are really just blind prompting: chatting with ChatGPT and manually reviewing the results through crude trial and error. If you’re planning to use a prompt at scale – as part of a template, workflow, or product – it’s essential you run that prompt 20-30 times via the GPT-4 API to see how often it fails. You also need to be rigorously A/B testing your prompt against alternative approaches to find what really makes a difference to results. Langchain is a development framework for Large Language Models (LLMs) like GPT-4, and it can help you build a consistent system for running, monitoring, and measuring the performance of your prompts, so you can optimize them against your success metric…more details
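As a rough illustration of the idea above (run a prompt many times, count failures, and compare it against an alternative), here is a minimal sketch. It does not call the GPT-4 API or use LangChain: `fake_model`, its made-up failure rates, and the `is_success` check are hypothetical stand-ins so the example runs offline.

```python
import random


def fake_model(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call (not a real API)."""
    # Assumption for illustration only: prompts that ask for JSON fail less often.
    failure_probability = 0.1 if "JSON" in prompt else 0.4
    return "sorry, here is some text" if rng.random() < failure_probability else '{"ok": true}'


def is_success(output: str) -> bool:
    """Success metric for this sketch: the output looks like a JSON object."""
    return output.startswith("{") and output.endswith("}")


def failure_rate(prompt: str, runs: int = 30, seed: int = 0) -> float:
    """Run the same prompt `runs` times and return the fraction of failures."""
    rng = random.Random(seed)
    failures = sum(not is_success(fake_model(prompt, rng)) for _ in range(runs))
    return failures / runs


# A/B comparison of two prompt variants on the same metric.
prompt_a = "List three product names."
prompt_b = "List three product names. Respond only with JSON."
results = {"A": failure_rate(prompt_a), "B": failure_rate(prompt_b)}
print(results)
```

With a real model, 30 runs per variant gives only a coarse estimate, so small differences between variants should be treated with caution.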
Mike is a data-driven, technical marketer who built a 50-person marketing agency (Ladder), and 300k people have taken his online courses (LinkedIn, Udemy, Vexpower). He now works freelance on generative AI projects, and is writing a book on Prompt Engineering for O’Reilly Media.
In-person | Workshop | LLM | Gen AI | Machine Learning | Intermediate
In this talk, we show how to personalize LLMs using a feature store and prompt engineering. We walk through how to build an example free serverless, personalized LLM application using Hopsworks, an open-source feature store with a built-in vector database. We will look at how to build templates for prompts, and how they can be easily constructed and included in user queries…more details
Jim Dowling is CEO of Hopsworks and an Associate Professor at KTH Royal Institute of Technology. He is lead architect of the open-source Hopsworks Feature Store platform. He is the organizer of the annual feature store summit conference and featurestore.org community, as well as co-organizer of PyData Stockholm.
In-person | Workshop
Have you ever thought about building an application that can help people find places to live that maximize their quality of life and happiness?
The goal of this workshop is to analyze public datasets across different categories and discover correlations among them using the HPCC Systems platform. After analyzing, we will design an interface to query this data and assign it a scoring system, and then deliver it to the user via the HPCC ROXIE cluster and show the user the best places to live in the United States. Users should be given choices in an easy-to-use form that when submitted will generate a unique set of scores based on locations (Example: By State)…more details
Bob has worked with the HPCC Systems technology platform and the ECL programming language for over a decade and has been a technical trainer for over 30 years. He is the developer and designer of the HPCC Systems Online Training Courses and is the Senior Instructor for all classroom and remote based training.
In-person | Tutorial | Generative AI | Machine Learning | Beginner – Intermediate
The tutorial will cover the existing research on the capabilities of LLMs versus small traditional ML models. If an LLM is the best solution, the tutorial covers several techniques, including evaluation suites like the EleutherAI Harness, head-to-head competition approaches, and using LLMs for evaluating other LLMs. The tutorial will also touch on subtle factors that affect evaluation, including the role of prompts, tokenization, and requirements for factual accuracy. Finally, a discussion of model bias and ethics will be integrated into the working examples…more details
Rajiv Shah is a machine learning engineer at Hugging Face who focuses on enabling enterprise teams to succeed with AI. Rajiv is a leading expert in the practical application of AI. Previously, he led data science enablement efforts across hundreds of data scientists at DataRobot. He was also a part of data science teams at Snorkel AI, Caterpillar, and State Farm. Rajiv is a widely recognized speaker on AI, has published over 20 research papers, and has received over 20 patents, in areas including sports analytics, deep learning, and interpretability. Rajiv holds a PhD in Communications and a Juris Doctor from the University of Illinois at Urbana-Champaign. While earning his degrees, he received a fellowship in Digital Government from the John F. Kennedy School of Government at Harvard University. He also has a large following for his AI-related short videos on TikTok and Instagram at @rajistics.
Tutorial | NLP and LLMs | Intermediate
This tutorial will take participants on a journey to explore the transformative capabilities of Large Language Models (LLMs) like GPT-3 and GPT-4 in revolutionizing the field of information retrieval. A key highlight will be the introduction to LangChain, an increasingly popular framework designed to enable data connectivity for LLMs, enhancing their ability to offer context-aware and personalized results. This tutorial is designed for tech professionals, AI researchers, data scientists, and developers, offering a fresh perspective on the intersection of AI and interactive information retrieval…more details
Chaine San Buenaventura is the co-founder of Voilabs, an early-stage AI startup based in Paris specializing in voice chatbots for customer service. They are exploring the transformative capabilities of AI in reshaping digital interactions and are committed to driving innovation in this space. Chaine continues to contribute her expertise to Wizy.io, where she has been serving as the Lead Machine Learning Engineer, assisting in the advancement of their AI initiatives. Passionate about the future of AI, Chaine consistently explores the intersection of deep learning and natural, context-rich digital interactions, continually pushing the boundaries of what’s possible in Human-Machine Interaction. Her years of dedicated work in developing AI solutions and active participation in research, conferences, and community dialogues underscore her commitment to AI innovation and knowledge-sharing in the expanding field.
In-person | Workshop | Machine Learning | All Levels
In the modern age of Machine Learning, the importance of high-quality, accurately labeled data cannot be overstated. This presentation introduces how to create high-quality, annotated datasets for training Machine Learning (ML) models. In this tutorial, we will use Label Studio, an open-source, multi-type data labeling tool, to explore common methods for annotating raw datasets, including both human and automated labeling techniques…more details
Chris Hoge is the Head of Community for HumanSignal, where he is helping to grow the Label Studio community. He has spent over a decade working in open source machine learning and infrastructure communities, including Apache TVM, Kubernetes, and OpenStack. He has an M.S. in Applied Mathematics from the University of Colorado, with an emphasis on using high-performance numerical methods for simulating physical systems. He makes his home in the Pacific Northwest, where he spends his free time trail running and playing piano.
In-person | Tutorial | Machine Learning | Beginner-Intermediate
By completing this workshop, you will develop an understanding of no-code and low-code frameworks, how they are used in the ML workflow, how they can be used for data ingestion and analysis, and for building, training, and deploying ML models. You will become familiar with Google’s Vertex AI for both no-code and low-code ML model training, and Google’s Colab, a free Jupyter Notebook service for running Python and the Keras Sequential API, a simple and easy-to-use API that is well-suited for beginners. You will also become familiar with how to assess when to use low-code, no-code, and custom ML training frameworks…more details
Gwendolyn Stripling, Ph.D., is an Artificial Intelligence and Machine Learning Content Developer at Google Cloud. Stripling is author of the widely popular YouTube video, “Introduction to Generative AI” and of the O’Reilly Media book “Low-Code AI: A Practical Project Driven Approach to Machine Learning”. They are also the author of the LinkedIn Learning video “Introduction to Neural Networks”. Stripling is an Adjunct Professor and member of Golden Gate University’s Masters in Business Analytics Advisory Board. Stripling enjoys speaking on AI/ML, having presented at Dominican University of California’s Barowsky School of Business Analytics, Golden Gate University’s Ageno School of Business Analytics, and numerous Tech conferences.
In-person | Workshop | Data Analytics and Big data | Machine Learning | All Levels
Today, everything is online – meters, cars, elevators, assembly lines, and even bicycles are connected to the Internet. And all of these items are emitting a relentless stream of metrics and events. With the advent of IoT and the cloud, the volume of time-series data has begun growing exponentially in an unprecedented way. The massive size of time-series data sets is a major challenge for general database management systems like relational and NoSQL databases…more details
Jeff Tao is the founder and CEO of TDengine. He has a background as a technologist and serial entrepreneur, having previously conducted research and development on mobile Internet at Motorola and 3Com and established two successful tech startups. Foreseeing the explosive growth of time-series data generated by machines and sensors now taking place, he founded TDengine in May 2017 to develop a high-performance time-series database purpose-built for modern IoT and IIoT businesses.
Virtual | Tutorial | Machine Learning | Intermediate
The talk will explore technical approaches for responsible AI, including explainability, model validation, bias management, data privacy, and ML security. Additionally, we’ll discuss the formation of an impactful AI risk management practice. Moreover, the presentation will provide an introductory guide to existing standards, laws, and assessments for AI technology adoption, including the newly released NIST AI Risk Management Framework. Attendees will also be introduced to the resources available on GitHub and Colab, which they can use to continue learning after the presentation…more details
Parul Pandey has a background in Electrical Engineering and currently works as a Principal Data Scientist at H2O.ai. Prior to this, she was working as a Machine Learning Engineer at Weights & Biases. Parul is one of the co-authors of the book Machine Learning for High-Risk Applications, which focuses on the responsible implementation of AI. She is also a Kaggle Grandmaster in the notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019. Parul has written multiple articles focused on Data Science and Software development for various publications, and she mentors, speaks, and delivers workshops on topics related to Responsible AI.
Virtual | Tutorial | Prompt Engineering | LLM | Intermediate-Advanced
In this talk, I will demonstrate to students how to create a SequentialChain for generating articles using a Colab notebook script. The aim of the script is to generate an article from an LLM via the following calls:
1. Generate an article outline.
2. Create an article from step 1.
3. Edit article from step 2.
4. Translate the article into Spanish, German and French versions from step 3…more details
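The four steps above can be sketched as a simple chain in which each call receives the previous call's output. This is a plain-Python illustration of the pattern, not LangChain's SequentialChain API: `fake_llm`, `run_chain`, and the instruction labels are hypothetical names, and the stand-in "model" only tags its input so the data flow is visible.

```python
from typing import Callable, Dict, List

Step = Callable[[str], str]


def fake_llm(instruction: str) -> Step:
    """Hypothetical stand-in for an LLM call: it only tags the text it receives."""
    def call(text: str) -> str:
        return f"[{instruction}] {text}"
    return call


def run_chain(steps: List[Step], initial: str) -> str:
    """Feed the output of each step into the next one, in order."""
    result = initial
    for step in steps:
        result = step(result)
    return result


# Steps 1-3: outline, then draft from the outline, then edit the draft.
steps = [fake_llm("outline"), fake_llm("draft"), fake_llm("edit")]
edited = run_chain(steps, "topic: time-series databases")

# Step 4: one translation call per target language, all starting from the edited article.
translations: Dict[str, str] = {
    lang: fake_llm(f"translate:{lang}")(edited) for lang in ("es", "de", "fr")
}

print(edited)
print(translations["es"])
```

The translation step fans out from the edited article rather than chaining the languages together, so an error in one translation does not propagate to the others.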
James is a full-stack engineer who specialises in automating marketing and business processes with AI-based solutions.
Virtual | Tutorial | Deep Learning | Machine Learning | Research Frontiers | Explainable AI | Intermediate
You have to see it to believe it! Imagine a technique where you randomly delete as many as 80% of your observations in the training set, without decreasing the predictive power (actually improving it in many cases), and reducing computing time by an order of magnitude. In its simplest version, that’s what stochastic thinning does. Here, performance improvement is measured outside the training set, on the validation set (also called test data). I illustrate this method on a real-life dataset, in the context of regression and neural networks. In the latter, it speeds up the training stage by a noticeable factor. The thinning process applies to the training set, and may involve multiple tiny random subsets called fractional training sets, representing less than 20% of the training data when combined together. It can also be used for data compression, or to measure the strength of a machine learning algorithm…more details
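To make the simplest version described above concrete, the sketch below keeps a random 20% of a training set, fits the same regression on the full and the thinned data, and scores both on held-out validation data. It is an assumption-laden toy: the synthetic data, the one-variable least-squares model, and the helper names (`thin`, `fit_line`, `mse`) are illustrative and are not the speaker's implementation or dataset.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = a * x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, points):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)


def thin(points, keep_fraction, rng):
    """Randomly keep only `keep_fraction` of the observations."""
    keep = max(2, round(len(points) * keep_fraction))
    return rng.sample(points, keep)


rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(1000)]
data = [(x, 3.0 * x + 1.0 + rng.gauss(0, 1)) for x in xs]  # synthetic: y = 3x + 1 + noise
train, validation = data[:800], data[800:]

thinned = thin(train, 0.2, rng)  # delete 80% of the training observations
full_model = fit_line(train)
thin_model = fit_line(thinned)

# Performance is compared outside the training set, on the validation data.
print("full   :", mse(full_model, validation))
print("thinned:", mse(thin_model, validation))
```

On easy synthetic data like this the two errors are expected to be close; whether thinning helps on a real dataset is an empirical question this toy does not answer.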
Virtual | Half-Day Training | LLM | LangChain | Intermediate
During this workshop, we will walk through each component of a simple RAG system – from vector stores (we’ll use Pinecone) to embedding models (we’ll pick one from Hugging Face’s embedding leaderboard) to the LLM Ops infrastructure glue that holds it all together (LangChain of course!) – including how to think about each component conceptually in addition to how to set them up in Python code…more details
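Conceptually, the components listed above fit together as: embed documents, store the vectors, retrieve the closest ones for a question, and pass them to the LLM as context. The sketch below shows that flow with standard-library stand-ins only. It does not use Pinecone, Hugging Face models, or LangChain; the word-count "embedding", the `VectorStore` class, and `build_prompt` are hypothetical simplifications.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (a real system would use a trained model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((embed(text), text))

    def search(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question, store):
    """Retrieval step plus prompt assembly; the LLM call itself is left out."""
    context = "\n".join(store.search(question, k=1))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
store.add("the vector store holds embeddings")
store.add("the llm writes the final answer")

question = "which component holds embeddings"
print(build_prompt(question, store))
```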
Dr. Greg Loughnane is the Founder & CEO of AI Makerspace, where he serves as lead instructor for their LLM Ops: LLMs in Production course. Since 2021 he has built and led industry-leading Machine Learning & AI bootcamp programs. Previously, he has worked as an AI product manager, a university professor teaching AI, an AI consultant and startup advisor, and ML researcher. He loves trail running and is based in Dayton, Ohio.
Chris Alexiuk is the Head of LLMs at AI Makerspace, where he serves as a programming instructor, curriculum developer, and thought leader for their flagship LLM Ops: LLMs in Production course. During the day, he’s a Founding Machine Learning Engineer at Ox. He is also a solo YouTube creator and Dungeons & Dragons enthusiast, and is based in Toronto, Canada.
In this talk, we will cover the use of Generative models, such as LLMs and GANs, for the generation of smart synthetic data that can be leveraged to impute missing data. By using a generative model to impute missing data, we can generate new samples that are representative of the underlying data distribution, which can help to reduce the impact of missing data on our models. In addition, these models can be fine-tuned to specific datasets, allowing us to generate synthetic data that is tailored to our particular use case…more details
Fabiana Clemente is the co-founder and CDO of YData, combining Data Understanding, Causality, and Privacy as her main fields of work and research, with the mission to make data actionable for organizations. Passionate about data, Fabiana has vast experience leading data science teams in startups and multinational companies. Host of the “When Machine Learning meets privacy” podcast, a guest speaker at Datacast and Privacy Please, and a previous WebSummit speaker, she was recently awarded “Founder of the Year” by the South Europe Startup Awards.
This tutorial aims to walk the participants through a brief introduction to recommender systems and an explanation of traditional algorithms along with their limitations. Next, we will discuss how graphs are proving to be an effective resource in developing recommender systems. Additionally, we will discuss other challenges related to recommender systems, including identifying the method best suited to measuring the effectiveness of a recommendation system in an industrial setup, and the explainability of recommendations…more details
Harshita is an Applied Scientist at Amazon based out of Seattle. She creates ML models that help various teams across AWS Sales and Marketing drive important business decisions. She specializes in generating actionable insights from data and quantifying the business impact of various initiatives. Harshita is an ML enthusiast who graduated from Syracuse University with a Masters in Data Science in 2021.
Virtual | Half-Day Training | All Levels
In this 90-minute hands-on lab, you’ll learn how to start your Graph journey with ArangoDB. We will walk you through getting a basic deployment up and running in minutes. We will then help you create a deployment in the ArangoGraph insights platform and learn how to load and query your data…more details
In-person | Tutorial | Deep Learning | Machine Learning | MLOps and Data Engineering | Beginner – Intermediate
This tutorial will teach you how to operationalize fundamental ideas from data-centric AI across a wide variety of datasets (image, text, tabular, etc). We will cover recent algorithms to automatically identify common issues in real-world data (label errors, bad data annotators, outliers, low-quality examples, and other dataset problems that once identified can be easily addressed to significantly improve trained models)…more details
Jonas Mueller is Chief Scientist and Co-Founder at Cleanlab, a software company providing data-centric AI tools to efficiently improve ML datasets. Previously, he was a senior scientist at Amazon Web Services developing AutoML and Deep Learning algorithms which now power ML applications at hundreds of the world’s largest companies. In 2018, he completed his PhD in Machine Learning at MIT, also doing research in NLP, Statistics, and Computational Biology.
Jonas has published over 30 papers in top ML and Data Science venues (NeurIPS, ICML, ICLR, AAAI, JASA, Annals of Statistics, etc). This research has been featured in Wired, VentureBeat, Technology Review, World Economic Forum, and other media. He has also contributed open-source software, including the fastest-growing open-source libraries for AutoML (https://github.com/awslabs/autogluon) and Data-Centric AI (https://github.com/cleanlab/cleanlab).
In-person | Workshop | Machine Learning | Big Data Analytics | Data Visualization | Intermediate
As part of Salesforce’s Platform performance engineering team, we use Python to employ anomaly detection techniques and other analytics methods for monitoring customer data. This helps us gain real-time insights into the performance of our production data and its impact on customers. By promptly identifying anomalies and extracting valuable insights, we improve system reliability, reduce downtime, and enhance customer trust—core values at Salesforce…more details
Geeta Shankar is a software engineer who specializes in leveraging data for business success. With expertise in computer science, data science, machine learning, and artificial intelligence, she stays updated with the latest data-driven innovations. Her Indian classical music background has taught her the value of sharp thinking, spontaneity, and connecting with diverse individuals. Geeta uses these skills to translate complex data into meaningful insights that enhance performance and customer experiences.
Tuli Nivas is a Software Engineering Architect at Salesforce with extensive experience in design and implementation of test automation and monitoring frameworks. Her interests lie in software testing, cloud computing, big data analytics, systems engineering, and architecture. Tuli holds a PhD in computer science with a focus on building processes to set up robust and fault-tolerant performance engineering systems. Her recent area of expertise has been around machine learning and building data analytics for better and faster troubleshooting of performance problems and anomaly detection in production.
In-person | Workshop | Generative AI | NLP&LLMs | Machine Learning | Intermediate
This session is relevant to practitioners in a variety of industries, including:
Creative industries: Stable Diffusion can be used to generate images for marketing materials, product designs, and other creative projects. Technology industries: Stable Diffusion can be used to develop new applications for text-to-image generation, such as chatbots and virtual assistants. Research industries: Stable Diffusion can be used to conduct research on text-to-image generation and its applications…more details
Sandeep Singh is a leader in applied AI and computer vision in Silicon Valley’s mapping industry, and he is at the forefront of developing cutting-edge technology to capture, analyze and understand satellite imagery, visual and location data. With a deep expertise in computer vision algorithms, machine learning and image processing and applied ethics, Sandeep is responsible for creating innovative solutions that enable mapping and navigation software to accurately and efficiently identify and interpret features to remove inefficiencies of logistics and mapping solutions. His work includes developing sophisticated image recognition systems, building 3D mapping models, and optimizing visual data processing pipelines for use in logistics, telecommunications and autonomous vehicles and other mapping applications. With a keen eye for detail and a passion for pushing the boundaries of what’s possible with AI and computer vision, Sandeep’s leadership is driving the future of applied AI forward.
In-person | Workshop | NLP & LLMs | GenAI | Intermediate
Large Language Models (LLMs) are starting to revolutionize how users can search for, interact with, and generate new content. Some recent stacks around Retrieval Augmented Generation (RAG) have emerged where users are building LLM search/retrieval applications (e.g. chatbots) on their own private data. Moreover, there is an opportunity to have an even richer set of interactions with data; by empowering LLM agents with both read and write capabilities over a set of diverse data tools, they hold the promise of automating knowledge workflows. LlamaIndex provides the core tools to build LLM-powered search and retrieval systems as well as more automated knowledge workers capable of interfacing with your data sources in more sophisticated ways. In this workshop, we help you build both a simple QA bot and an automated workflow agent, all powered by LLMs…more details
Jerry is the co-founder/CEO of LlamaIndex, an open-source tool that provides a central data management/query interface for your LLM application. Before this, he spent his career at the intersection of ML, research, and startups. He led the ML monitoring team at Robust Intelligence, did self-driving AI research at Uber ATG, and worked on recommendation systems at Quora. He graduated from Princeton in 2017 with a degree in CS.
In-person | Half-Day Training | LLMs | Beginner-Intermediate
In this workshop, you will learn the tips and tricks of creating and fine-tuning LLMs along with implementing cutting-edge ideas for building these systems from the best research papers. We will start by learning the foundations behind what makes an LLM, quickly moving into fine-tuning our own GPT and finally implementing some of the cutting-edge tricks for building these models…more details
Sanyam Bhutani is a Sr Data Scientist and Kaggle Grandmaster at H2O where he drinks chai and makes content for the community. When not drinking chai, he is to be found hiking the Himalayas, often with LLM Research papers. For the past 6 months, he has been writing about Generative AI every day on the internet. Before that, he was recognised for his #1 Kaggle podcast, Chai Time Data Science, and he is also widely known on the internet for “maximising compute per cubic inch of an ATX case” by fitting 12 GPUs into his home office.
In-Person | Half-Day Training | Machine Learning | Intermediate
Finding good datasets or web assets to build data products or websites with, respectively, can be time-consuming. For instance, data professionals might require data from heavily regulated industries like healthcare and finance, and, in contrast, software developers might want to skip the tedious task of collecting images, text, and videos for a website. Luckily, both scenarios can now benefit from the same solution, Synthetic Data…more details
Ramon is a data scientist, researcher, and educator currently working in the Developer Relations team at Seldon in London. Prior to joining Seldon, he worked as a freelance data professional and as a Senior Product Developer at Decoded, where he created custom data science tools, workshops, and training programs for clients in various industries. Before freelancing, Ramon wore different research hats in the areas of entrepreneurship, strategy, consumer behavior, and development economics in industry and academia. Outside of work, he enjoys giving talks and technical workshops and has participated in several conferences and meetup events. In his free time, you will most likely find him traveling to new places, mountain biking, or both.
In-person | Workshop | Machine Learning | Intermediate
This workshop will show how to use XGBoost. It will demonstrate model creation, model tuning, model evaluation, and model interpretation…more details
In-person | Workshop | AI Safety
We’ll explore the development of AI risk profiles, which are standardized pre-deployment disclosures highlighting key risks such as fairness, security, and societal impact. I will introduce a proposed risk taxonomy, encompassing vital risk categories relevant across various industries. These profiles play a significant role in AI governance across the entire value chain…more details
Ian Eisenberg is Head of AI Governance Research at Credo AI, where he advances best practices in AI governance to support Credo AI’s product and policy strategy. He is also the founder of the AI Salon, an SF-based group supporting conversations on the meaning and impact of AI. His interest in AI started as a cognitive neuroscientist at Stanford, which developed into a focus on the sociotechnical challenges of AI technologies and reducing AI risk. Ian has been a researcher at Stanford, the NIH, Columbia and Brown University. He received his PhD from Stanford University, and BS from Brown University.
In-person | Workshop | LLMs | All Levels
In this workshop, we’ll explore how Label Studio, LangChain, Chroma, and Gradio can be employed as tools for continuous improvement, specifically in building a Question-Answering (QA) system trained to answer questions about an open source project, in this case Label Studio itself, using the domain-specific knowledge from Label Studio’s GitHub documentation…more details
In-person | Workshop | Machine Learning | MLOps and Data Engineering | Intermediate – Advanced
This talk will outline the complexity of feature engineering from raw entity-level data, the reduction in complexity that comes with composable compute graphs, and an example of the working solution. We will also discuss a case study of the impact on a logistics & supply chain machine learning problem. If you work on large scale MLOps projects, this talk may be of interest…more details
Wes is a machine learning expert with over a decade of experience delivering business value with AI. Wes’s experience spans multiple industries, but always with an MLOps focus. His recent areas of focus and interest are graphs, distributed computing, and scalable feature engineering pipelines.
In-Person | Workshop | NLP & LLMs | Intermediate-Advanced
The workshop will provide a comprehensive understanding of the challenges and intricacies involved in aligning LLMs. By learning to navigate through data preprocessing and quality assessment, participants will gain insights into identifying the most relevant data for fine-tuning LLMs effectively. Moreover, the practical application of reinforcement learning techniques will empower attendees to tailor LLMs for specific tasks, ensuring enhanced performance and precision in real-world applications…more details
In-person | Workshop | LLM | Intermediate
This session aims to provide hands-on, engaging content that gives developers a basic understanding of Llama 2 models, how to access and use them, and how to build core components of an AI chatbot using LangChain and Tools. The audience will also learn core concepts around Prompt Engineering and Fine-Tuning and programmatically implement them using Responsible AI principles. Lastly, we will conclude by explaining how they can leverage this powerful tech, covering different use cases and what the future looks like…more details
Amit Sangani is the Director of Partner Engineering leading the Applied AI Platforms team at Meta. Amit has been with Meta for 8+ years and manages developer-facing engineering teams working on Gen AI platforms such as Llama 2 and PyTorch. Amit’s mission is to democratize AI and increase the adoption of these platforms by making it easier for developers to integrate them into their products and spur innovation and increased productivity.
Jeffrey Yau is currently Chief Data & A.I. Officer at Fanatics Collectibles. Most recently, he served as Global Head of Data Science, Analytics & Engineering at Amazon Music where he oversaw multiple teams who developed both insights-packed analytics and end-to-end statistical and machine learning systems. Prior to Amazon, Jeffrey worked at WalmartLabs as the VP of Data Science & Engineering where he led the team responsible for powering Walmart store mobile apps and the entire store finance system. Further, his team created end-to-end machine learning systems for key business initiatives and had a multi-billion dollar impact annually on Walmart U.S.
Over the years, he has held various senior-level positions in quantitative finance at global investment management firm AllianceBernstein, consulting firm Silicon Valley Data Science, multinational financial services company Charles Schwab Corporation, and the world’s leading professional services firm KPMG. He began his career as a tenure-track Assistant Professor of Economics at Virginia Tech, and he was an adjunct professor at UC Berkeley, Cornell, and NYU, teaching machine learning and advanced statistical modeling for finance and business.
Virtual | Tutorial | GenAI | LLM | All Levels
The tutorial will consist of the following parts: (1) Introduction and overview of the generative AI landscape, (2) Technical and ethical challenges with generative AI, and (3) Solutions for alleviating the challenges with real-world use cases and case studies (including practical challenges and lessons learned in industry)…more details
Krishnaram Kenthapadi is the Chief AI Officer & Chief Scientist of Fiddler AI, an enterprise startup building a responsible AI and ML monitoring platform. Previously, he was a Principal Scientist at Amazon AWS AI, where he led the fairness, explainability, privacy, and model understanding initiatives in the Amazon AI platform. Prior to joining Amazon, he led similar efforts at the LinkedIn AI team, and served as LinkedIn’s representative in Microsoft’s AI and Ethics in Engineering and Research (AETHER) Advisory Board. Previously, he was a Researcher at Microsoft Research Silicon Valley Lab. Krishnaram received his Ph.D. in Computer Science from Stanford University in 2006. He serves regularly on the senior program committees of FAccT, KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. His work has been recognized through awards at NAACL, WWW, SODA, CIKM, ICML AutoML workshop, and Microsoft’s AI/ML conference (MLADS). He has published 50+ papers, with 7000+ citations and filed 150+ patents (70 granted). He has presented tutorials on privacy, fairness, explainable AI, model monitoring, responsible AI, and generative AI at forums such as ICML, KDD, WSDM, WWW, FAccT, and AAAI, given several invited industry talks, and instructed a course on responsible AI at Stanford.
In-person | Workshop | Machine Learning | Deep Learning | NLP | ALL | Intermediate
In this hands-on, example-driven workshop, we’ll introduce Mojo🔥 language features by starting with Python code and making minor changes to convert it into high-performance Mojo🔥 code. Mojo🔥 provides full interoperability with the Python ecosystem, and we’ll show you how to integrate Mojo🔥 into your existing Python workflows. We’ll share the workshop material as a hosted tutorial with several Mojo🔥 scripts and Jupyter Notebooks, which you can use as a starting point for your projects…more details
Shashank is an engineer, educator and doodler. He writes and talks about machine learning, specialized machine learning hardware (AI Accelerators) and AI Infrastructure in the cloud. He previously worked at Meta, AWS, NVIDIA, MathWorks (MATLAB) and Oracle in developer relations and marketing, product management, and software development roles and holds an M.S. in electrical engineering.
In-person | Tutorial | NLP & LLMs | Generative AI
Large language models have swept through the world of AI in the past few years. This talk will give some background on large language models, their origins in the early days of information theory, current capabilities, some pitfalls, and potential future capabilities, with a focus on the innovations behind some recent models including the sparse LLM GLaM and the more recent PaLM 2…more details
Andrew Dai did his PhD at the University of Edinburgh before joining Google Brain 9 years ago, in 2014, where he did research on language models, story generation and conversational agents and products including SmartReply. He moved to Google Health in 2017 to research deep learning for medical records. He then returned to continue research at Google Brain (now Google DeepMind) in 2020 and since then has co-led the development and training of LLMs including PaLM 2 and GLaM. Andrew is also a lead for Google SGE modelling, Gemini and data research and is excited by the new abilities we see from LLMs.
In-person | Workshop | Machine Learning | Responsible AI | Explainability
This talk will examine the implications of incorporating graphs into the realm of generative AI, exploring the potential for even greater advancements. Learn about foundational concepts such as directed acyclic graphs (DAGs), Judea Pearl’s “do” operator, and keeping domain expertise in the loop. You’ll hear how the explainability landscape is evolving, comparisons of graph-based models to other methods, and how we can evaluate the different fairness models available…more details
Virtual | Bootcamp | Machine Learning | All Levels
Statistics and statistical inference form the core of making sense of data. Inference is what allows us to extrapolate and generalize from our data to the populations we are trying to understand. Beneath inference lies measurement: how we attach numbers to the phenomena we’d like to understand. In this tutorial we consider measurement and inference, especially as they pertain to scientific repeatability. We focus in particular on using artificial intelligence and machine learning as methods of measurement and on the fundamental role that inference plays. Specifically, we focus on validation…more details
This course is aimed at helping people begin their AI journey and gain valuable insights that we will build up in subsequent SQL, programming and AI courses…more details
This course covers topics such as database design and normalization, data wrangling, aggregate functions, subqueries, and join operations. Students will learn how to design and write SQL code to solve real-world problems…more details
This course aims to provide a basic foundation in Python and help participants develop the skills needed to progress in the field of data science and machine learning…more details
This AI literacy course is designed to introduce participants to the basics of artificial intelligence (AI) and machine learning.
Upon completion, individuals will have a foundational understanding of machine learning and its capabilities…more details
Virtual | Bootcamp | Machine Learning | Beginner
With the availability of data, there is a growing demand for talent who can analyze and make sense of it. This makes practical math all the more important because it helps infer insights from data. However, mathematics comprises many topics, and it is hard to identify which ones are applicable and relevant for a data science career. Knowing these essential math topics is key to integrating knowledge across data science, statistics, and machine learning. It has become even more important with the prevalence of libraries like PyTorch and scikit-learn, which can create “black box” approaches where data science professionals use these libraries but do not fully understand how they work…more details
Thomas Nield is the founder of Nield Consulting Group and Yawman Flight, as well as an instructor at the University of Southern California. He enjoys making technical content relatable and relevant to those unfamiliar with or intimidated by it. Thomas regularly teaches classes on data analysis, machine learning, mathematical optimization, and practical artificial intelligence. At USC he teaches AI System Safety, developing systematic approaches for identifying AI-related hazards in aviation and ground vehicles. He’s authored three books, including Essential Math for Data Science (O’Reilly) and Getting Started with SQL (O’Reilly).
He is also the founder and inventor of Yawman Flight, a company developing universal handheld flight controls for flight simulation and unmanned aerial vehicles.