All Sessions by ODSC Instructor
Programming with Python | Machine Learning | Beginner
The Python language is one of the most popular programming languages in data science and machine learning as it offers a number of powerful and accessible libraries and frameworks specifically designed for these fields. This programming course is designed to give participants a quick introduction to the basics of coding using the Python language. It covers topics such as data structures, control structures, functions, modules, and file handling. This course aims to provide a basic foundation in Python and help participants develop the skills needed to progress in the field of data science and machine learning.
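As a small taste of the topics listed above (data structures, control structures, functions, and file handling), here is a short illustrative sketch; `io.StringIO` stands in for an open file so the example is self-contained:

```python
import io
from collections import Counter

def word_counts(lines):
    """Count words across lines: a function combining loops and a dict-like container."""
    counts = Counter()                  # data structure: a counting dictionary
    for line in lines:                  # control structure: iteration
        for word in line.split():
            counts[word.lower()] += 1
    return counts

# File handling works line by line; io.StringIO stands in for an open file here.
fake_file = io.StringIO("data science with Python\npython for data wrangling\n")
counts = word_counts(fake_file)
print(counts["python"], counts["data"])  # 2 2
```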
Introduction to AI | Machine Learning | Beginner
This AI literacy course is designed to introduce participants to the basics of artificial intelligence (AI) and machine learning. We will first explore the various types of AI and then progress to fundamental concepts such as algorithms, features, and models. We will study the machine learning workflow and how it is used to design, build, and deploy models that can learn from data to make predictions. This will cover model training and types of machine learning, including supervised and unsupervised learning, as well as some of the most common models such as regression and k-means clustering. Upon completion, individuals will have a foundational understanding of machine learning and its capabilities and be well positioned to take advantage of introductory-level hands-on training in machine learning and data science, such as ODSC East’s Mini-Bootcamp.
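To make one of the models mentioned above concrete, here is a deliberately tiny, dependency-free sketch of k-means clustering in one dimension; the data and naive initialization are purely illustrative, and the course itself goes well beyond this:

```python
def kmeans_1d(points, k=2, iters=10):
    """Tiny 1-D k-means: assign each point to its nearest centroid, then re-average."""
    centroids = points[:k]  # naive initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 10.0, 10.4, 9.6]
centroids = kmeans_1d(data)
print(centroids)  # two centroids, near 1.0 and 10.0
```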
Data Wrangling with Python | Machine Learning | Beginner
Data wrangling is the cornerstone of any data-driven project, and Python stands as one of the most powerful tools in this domain. In preparation for the ODSC conference, our specially designed course on “Data Wrangling with Python” offers attendees hands-on experience to master the essential techniques. From cleaning and transforming raw data to making it ready for analysis, this course will equip you with the skills needed to handle real-world data challenges. As part of a comprehensive series leading up to the conference, this course not only lays the foundation for more advanced AI topics but also aligns with the industry’s most popular coding language. Upon completion of this short course, attendees will be fully equipped with the knowledge and skills to manage the data lifecycle and turn raw data into actionable insights, setting the stage for advanced data analysis and AI applications.
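The cleaning-and-transforming idea described above can be sketched with nothing but the standard library; the messy inline CSV and the cleaning rules (trim whitespace, normalize case, drop incomplete rows, deduplicate) are illustrative stand-ins for real-world data:

```python
import csv, io

raw = """name,age,city
 Alice ,34,Boston
Bob,,Chicago
alice,34,Boston
Carol,29,
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Cleaning steps: strip whitespace, normalize case, drop rows with a missing age,
# and deduplicate on (name, age).
seen, clean = set(), []
for r in rows:
    name = r["name"].strip().title()
    age = r["age"].strip()
    if not age:
        continue                      # drop incomplete record
    key = (name, age)
    if key in seen:
        continue                      # drop duplicate
    seen.add(key)
    clean.append({"name": name, "age": int(age), "city": r["city"].strip()})

print([r["name"] for r in clean])  # ['Alice', 'Carol']
```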
Introduction to Machine Learning | Machine Learning | Beginner
This introductory machine learning live training covers key topics including defining machine learning, distinguishing supervised from unsupervised learning, basic concepts, and common algorithms (e.g., decision trees, neural networks).
Data and Generative AI Literacy | Generative AI | Beginner
Data is the essential building block of Data Science, Machine Learning, and AI. This course is the first in the series and is designed to teach you the foundational skills and knowledge required to understand, work with, and analyze data.
Data Wrangling with SQL | Machine Learning | Beginner
This SQL coding course teaches students the basics of Structured Query Language (SQL), a standard programming language used for managing and manipulating data and an essential tool in AI. The course covers topics such as database design and normalization, data wrangling, aggregate functions, subqueries, and join operations, and students will learn how to design and write SQL code to solve real-world problems.
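A join combined with an aggregate, two of the topics listed above, can be tried directly in Python's built-in sqlite3 module; the two-table schema here is a made-up example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A tiny normalized schema: customers and their orders in separate tables.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0);
""")

# Join + aggregate: total order amount per customer.
cur.execute("""
SELECT c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
ORDER BY total DESC
""")
rows = cur.fetchall()
print(rows)  # [('Ada', 100.0), ('Grace', 25.0)]
```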
Introduction to Large Language Models | LLMs | Beginner
This hands-on course serves as a comprehensive introduction to Large Language Models (LLMs), covering a spectrum of topics from their differentiation from other language models to their underlying architecture and practical applications. It delves into the technical aspects, such as the transformer architecture and the attention mechanism, which are the cornerstones of modern language models. The course also explores the applications of LLMs, focusing on zero-shot learning, few-shot learning, and fine-tuning, which showcase the models' ability to adapt and perform tasks with limited to no examples. Furthermore, it introduces the concept of flow chaining as a method to generate coherent and extended text, demonstrating its usefulness in tackling token limitations in real-world scenarios such as Q&A bots. Through practical examples and code snippets, participants are given a hands-on experience on how to utilize and harness the power of LLMs across various domains.
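The attention mechanism mentioned above can be sketched in a few lines of plain Python; this is scaled dot-product attention for a single query over a tiny hand-picked sequence, not the course's actual code:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a small sequence."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)            # how much each position matters
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# The query matches the second key most closely, so attention
# concentrates most of its weight on the second value vector.
q = [1.0, 0.0]
K = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, weights = attention(q, K, V)
print(max(range(3), key=lambda i: weights[i]))  # 1
```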
Prompt Engineering Fundamentals | Generative AI | Beginner
This workshop on Prompt Engineering explores the pivotal role of prompts in guiding Large Language Models (LLMs) like ChatGPT to generate desired responses. It emphasizes how prompts provide context, control output style and tone, aid in precise information retrieval, offer task-specific guidance, and ensure ethical AI usage. Through practical examples, participants learn how varying prompts can yield diverse responses, highlighting the importance of well-crafted prompts in achieving relevant and accurate text generation. Additionally, the workshop introduces temperature control to balance creativity and coherence in model outputs, and showcases LangChain, a Python library, to simplify prompt construction. Participants are equipped with practical tools and techniques to harness the potential of prompt engineering effectively, enhancing their interaction with LLMs across various contexts and tasks.
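Temperature control, mentioned above, can be illustrated without calling a real model: temperature rescales the model's token scores before softmax sampling. The three "logits" below are made up for demonstration:

```python
import math, random

def sample_with_temperature(token_scores, temperature, rng):
    """Temperature rescales scores before softmax: low T is near-greedy,
    high T is closer to uniform (more 'creative')."""
    scaled = [s / temperature for s in token_scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

scores = [2.0, 1.0, 0.5]     # hypothetical logits for three candidate tokens
rng = random.Random(0)
cold = [sample_with_temperature(scores, 0.1, rng) for _ in range(200)]
hot  = [sample_with_temperature(scores, 5.0, rng) for _ in range(200)]
print(cold.count(0))     # nearly all 200: low temperature is almost greedy
print(len(set(hot)))     # 3: high temperature explores all options
```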
Prompt Engineering with OpenAI | Generative AI | Beginner
This workshop on prompt engineering with OpenAI covers best practices for utilizing OpenAI models. We will review how to separate instructions and context using special characters, which helps improve instruction clarity, isolate context, and enhance control over the generation process. The workshop also includes code for installing the langchain library and demonstrates how to create prompts effectively, emphasizing the importance of clarity, specificity, and precision in prompts. Additionally, the workshop shows how to craft prompts for specific tasks, such as extracting entities from text. It provides templates for prompts and highlights the significance of specifying the desired output format through examples for improved consistency and customization. Lastly, the workshop addresses the importance of using prompts as safety guardrails. It introduces prompts that mitigate hallucination and jailbreaking risks by instructing the model to generate well-supported and verifiable information, thereby promoting responsible and ethical use of language models.
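The separation of instructions, context, and output format with special characters can be sketched as a plain template function; the `###` delimiters and the entity-extraction example are illustrative conventions, not a requirement of any particular model:

```python
def build_prompt(instruction, context, output_format):
    """Separate instruction, context, and format with ### delimiters so the
    model can tell the sections apart (the delimiter choice is a convention)."""
    return (
        f"### Instruction\n{instruction}\n\n"
        f"### Context\n{context}\n\n"
        f"### Output format\n{output_format}\n"
    )

prompt = build_prompt(
    instruction="Extract all person names from the context.",
    context="Ada Lovelace worked with Charles Babbage on the Analytical Engine.",
    output_format='A JSON list of strings, e.g. ["Name One", "Name Two"]',
)
print(prompt)
```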
Build a Question & Answering Bot | Generative AI | Beginner
The workshop notebook delves into building a Question and Answering Bot based on a fixed knowledge base, covering the integration of concepts discussed in earlier notebooks about LLMs (Large Language Models) and prompting. Initially, it introduces a high-level architecture focusing on vector search—a method to retrieve similar items based on vector representations. The notebook explains the steps involved in vector search including vector representation, indexing, querying, similarity measurement, and retrieval, detailing various technologies used for vector search such as vector libraries, vector databases, and vector plugins. The example utilizes an Open Source vector database, Chroma, to index data and uses state-of-the-union text data for the exercise. The notebook then transitions into the practical implementation, illustrating how text data is loaded, chunked into smaller pieces for effective vector search, and mapped into numeric vectors using the MPNetModel from the SentenceTransformer library via HuggingFace. Following this, the focus shifts to text generation where Langchain Chains are introduced. Chains, as described, allow for more complex applications by chaining several steps and models together into pipelines. A RetrievalQA chain is used to build a Q&A Bot application which utilizes an OpenAI chat model for text generation.
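The chunk-embed-index-retrieve steps above can be sketched end to end without Chroma or SentenceTransformer; here a bag-of-words Counter stands in for the MPNet encoder and a list stands in for the vector database, so only the shape of the pipeline matches the workshop:

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Split text into fixed-size word chunks (real pipelines often overlap chunks)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy 'embedding': a bag-of-words Counter standing in for a neural encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = ("The union is strong and the economy is growing . "
          "We will invest in clean energy and new jobs . "
          "Health care costs must come down for families .")
chunks = chunk(corpus)
index = [(c, embed(c)) for c in chunks]          # 'indexing' step

query = embed("clean energy jobs")
best = max(index, key=lambda item: cosine(query, item[1]))[0]  # retrieval step
print(best)
```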
Fine Tuning Embedding Models | LLMs | Beginner
This workshop explores the importance of fine-tuning embedding models, which map natural language to vectors and are crucial in pipelines that combine multiple models, since they must adapt to the nuances of specific data. An example demonstrates fine-tuning an embedding model for legal text. The notebook discusses existing solutions and hardware considerations, emphasizing GPU usage for large datasets. The practical part of the notebook shows the fine-tuning process of the "distilroberta-base" model from the SentenceTransformer library. It utilizes the QQP_triplets dataset from Quora for training, designed around semantic meaning. The notebook prepares the data, sets up a DataLoader, and employs Triplet Loss to encourage the model to map similar data points close together while distancing dissimilar ones. It concludes by mentioning the training duration and the resources needed for further improvements.
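The Triplet Loss objective described above is simple to state: punish the model unless the anchor sits closer to the positive than to the negative by at least a margin. A dependency-free sketch with made-up 2-D "embeddings":

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the anchor is closer to the positive than to the negative
    by at least `margin`; positive otherwise, signalling an update is needed."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor   = [1.0, 0.0]   # e.g. embedding of a question
positive = [0.9, 0.1]   # a paraphrase: should sit nearby
negative = [-1.0, 0.5]  # an unrelated question: should sit far away

loss_good = triplet_loss(anchor, positive, negative)
loss_bad  = triplet_loss(anchor, negative, positive)
print(loss_good)  # 0.0 -> this triplet is already well separated
print(loss_bad)   # > 0 -> training would push the model to fix this
```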
Fine Tuning an Existing LLM | LLMs | Beginner
The workshop explores the process of fine-tuning Large Language Models (LLMs) for Natural Language Processing (NLP) tasks. It highlights the motivations for fine-tuning, such as task adaptation, transfer learning, and handling low-data scenarios, using a Yelp Review dataset. The notebook employs the HuggingFace Transformers library, including tokenization with AutoTokenizer, data subset selection, and model choice (BERT-based model). Hyperparameter tuning, evaluation strategy, and metrics are introduced. It also briefly mentions DeepSpeed for optimization and Parameter Efficient Fine-Tuning (PEFT) for resource-efficient fine-tuning, providing a comprehensive introduction to fine-tuning LLMs for NLP tasks.
LangChain Agents | LLMs | Beginner
The "LangChain Agents" workshop delves into the "Agents" component of the LangChain library, offering a deeper understanding of how LangChain integrates Large Language Models (LLMs) with external systems and tools to execute actions. This workshop builds on the concept of "chains," which can link multiple LLMs to tackle various tasks like classification, text generation, code generation, and more. "Agents" enable LLMs to interact with external systems and tools, making informed decisions based on available options. The workshop explores the different types of agents, such as "Zero-shot ReAct," "Structured input ReAct," "OpenAI Functions," "Conversational," "Self ask with search," "ReAct document store," and "Plan-and-execute agents." It provides practical code examples, including initializing LLMs, defining tools, creating agents, and demonstrates how these agents can answer questions using external APIs, offering participants a comprehensive overview of LangChain's agent capabilities.
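The decide-then-act loop that agents perform can be illustrated without LangChain at all; in this toy sketch a keyword rule stands in for the LLM's reasoning step, and two local functions stand in for external tools and APIs:

```python
# A toy decision loop in the spirit of ReAct: the 'model' picks a tool
# from a registry based on the question, calls it, and returns an answer.
# (Real LangChain agents let the LLM choose tools; a keyword rule stands in here.)

def calculator(expression):
    # Restricted eval: digits and arithmetic characters only, nothing else.
    allowed = set("0123456789+-*/(). ")
    if not set(expression) <= allowed:
        raise ValueError("unsupported expression")
    return str(eval(expression))

def lookup(term):
    facts = {"capital of france": "Paris"}   # stand-in for a search API
    return facts.get(term.lower(), "unknown")

TOOLS = {"calculator": calculator, "lookup": lookup}

def run_agent(question):
    # 'Reason' step: decide which tool applies (a real agent asks the LLM).
    if any(ch.isdigit() for ch in question):
        tool, arg = "calculator", question
    else:
        tool, arg = "lookup", question
    observation = TOOLS[tool](arg)           # 'Act' step: call the tool
    return f"[{tool}] {observation}"

ans_math = run_agent("2 * (3 + 4)")
ans_fact = run_agent("capital of France")
print(ans_math)   # [calculator] 14
print(ans_fact)   # [lookup] Paris
```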
Parameter Efficient Fine-tuning | LLMs | Beginner
For the next workshop, our focus will be on parameter-efficient fine-tuning (PEFT) techniques in the field of machine learning, specifically within the context of large neural language models like GPT or BERT. PEFT is a powerful approach that allows us to adapt these pre-trained models to specific tasks while minimizing additional parameter overhead. Instead of fine-tuning the entire massive model, PEFT introduces compact, task-specific parameters known as "adapters" into the pre-trained model's architecture. These adapters enable the model to adapt to new tasks without significantly increasing its size. PEFT strikes a balance between model size and adaptability, making it a crucial technique for real-world applications where computational and memory resources are limited, while still maintaining competitive performance. In this workshop, we will delve into the different PEFT methods, such as additive, selective, re-parameterization, adapter-based, and soft prompt-based approaches, exploring their characteristics, benefits, and practical applications. We will also demonstrate how to implement PEFT using the Hugging Face PEFT library, showcasing its effectiveness in adapting large pre-trained language models to specific tasks. Join us to discover how PEFT can make state-of-the-art language models more accessible and practical for a wide range of natural language processing tasks.
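The adapter idea above (freeze the big pretrained weights, train only a small bottleneck added around them) can be sketched with tiny hand-written matrices; this is a minimal illustration of the concept, not the Hugging Face PEFT API:

```python
# Sketch of the adapter idea behind PEFT: the pretrained weight matrix W_frozen
# stays fixed; only a small bottleneck (down-project, nonlinearity, up-project)
# added around it would be trained. Pure-Python matrices keep it dependency-free.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def relu(v):
    return [max(0.0, x) for x in v]

def adapter_forward(x, W_frozen, W_down, W_up):
    """Frozen layer output plus a trainable low-rank residual correction."""
    h = matvec(W_frozen, x)                        # frozen pretrained computation
    bottleneck = relu(matvec(W_down, h))           # project 4 dims down to 1
    correction = matvec(W_up, bottleneck)          # project back up to 4 dims
    return [a + b for a, b in zip(h, correction)]  # residual connection

W_frozen = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # 4x4, frozen
W_down = [[0.5, 0.5, 0.0, 0.0]]       # 1x4 trainable down-projection
W_up = [[0.1], [0.1], [0.1], [0.1]]   # 4x1 trainable up-projection

x = [1.0, 1.0, 0.0, 0.0]
out = adapter_forward(x, W_frozen, W_down, W_up)
print(out)
# Trainable parameters: 4 + 4 = 8, versus 16 in the frozen matrix alone.
```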
Retrieval-Augmented Generation (RAG) | Generative AI | Beginner
Retrieval-Augmented Generation (RAG) is a powerful natural language processing (NLP) architecture introduced in this workshop notebook. RAG combines retrieval and generation models, enhancing language understanding and generation tasks. It consists of a retrieval component, which efficiently searches vast text databases for relevant information, and a generation component, often based on Transformer models, capable of producing coherent responses based on retrieved context. RAG's versatility extends to various NLP applications, including question answering and text summarization. Additionally, this notebook covers practical aspects such as indexing content, configuring RAG chains, and incorporating prompt engineering, offering a comprehensive introduction to harnessing RAG's capabilities for NLP tasks.
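The two-component flow described above, retrieve relevant context and then feed it to a generator, can be sketched in a few lines; simple word overlap stands in for a vector database, and the sketch stops at building the prompt that would be sent to the generation model:

```python
# Minimal RAG flow: retrieve the most relevant document, then splice it into
# a generation prompt. All documents and the retrieval rule are illustrative.

DOCS = [
    "RAG combines a retriever with a generator model.",
    "K-means clustering groups points by nearest centroid.",
    "Transformers rely on the attention mechanism.",
]

def retrieve(question, docs):
    """Pick the document sharing the most words with the question."""
    q = set(question.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_rag_prompt(question):
    context = retrieve(question, DOCS)           # retrieval component
    return (f"Answer using only the context below.\n"
            f"Context: {context}\n"
            f"Question: {question}")             # input to the generation component

prompt = build_rag_prompt("What does a retriever do in RAG?")
print(prompt)
```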
Introduction to NLPNLP | Beginner
Welcome to the Introduction to NLP workshop! In this workshop, you will learn the fundamentals of Natural Language Processing. From tokenization and stop word removal to advanced topics like deep learning and large language models, you will explore techniques for text preprocessing, word embeddings, classic machine learning, and cutting-edge NLP methods. Get ready to dive into the exciting world of NLP and its applications!
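Tokenization and stop word removal, the first steps mentioned above, fit in a few lines of Python; the regex tokenizer and the small stop word set are simplifications of what real NLP libraries provide:

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def preprocess(text):
    """Tokenize, lowercase, and remove stop words: a first NLP pipeline."""
    tokens = re.findall(r"[a-z']+", text.lower())   # simple word tokenizer
    return [t for t in tokens if t not in STOP_WORDS]

tokens_out = preprocess("The quick brown fox is jumping over the lazy dog.")
print(tokens_out)
# ['quick', 'brown', 'fox', 'jumping', 'over', 'lazy', 'dog']
```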
Introduction to R | All Tracks | Beginner
Dive into the world of R programming in this interactive workshop, designed to hone your data analysis and visualization skills. Begin with a walkthrough of the Colab interface, understanding cell manipulation and library utilization. Explore core R data structures like vectors, lists, and data frames, and learn data wrangling techniques to manipulate and analyze datasets. Grasp the basics of programming with iterations and function applications, transitioning into Exploratory Data Analysis (EDA) to derive insights from your data. Discover data visualization using ggplot2, unveiling the stories hidden within data. Lastly, get acquainted with RStudio, the robust Integrated Development Environment, enhancing your R programming journey. This workshop is your gateway to mastering R, catering to both novices and seasoned programmers.