Speed up Machine Learning Workflows in the Cloud with Lightning


In 2020, training GPT-3 on a single GPU was estimated to cost around $4.6 million and take 355 years. Training large models in the cloud successfully requires optimizations that come out of the box with Lightning, the open source library by Lightning AI. In this workshop, we will walk through how to use Lightning to speed up machine learning workflows in the cloud. We will begin with an introduction to the different methods of speeding up training, along with their cost implications and technical complexities. Then we’ll learn how to leverage key features of the Lightning library, like LightningDataset and multi-GPU training, to speed up our training workflows in the cloud. By the end of this workshop, you should be able to train a basic model with fast data loading on multiple GPUs.

Session Outline:

Lesson 1: Overview & Environment setup
Learn why training fast is important and the impact it has on costs. We’ll review the current challenges with efficient training and how Lightning was built to solve those challenges. Bring your computer so you can set up a basic model that we’ll learn how to train efficiently together.
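
To make the cost argument concrete, here is a back-of-the-envelope sketch in Python built on the two figures quoted in the abstract (about $4.6 million and 355 years on a single GPU). The hourly price is only what those two numbers imply, and the `efficiency` parameter is a hypothetical scaling factor introduced for illustration; perfect linear scaling across GPUs is an idealization, not a measured result.

```python
# Back-of-the-envelope training cost, using the figures quoted in the abstract.
HOURS_PER_YEAR = 365 * 24

SINGLE_GPU_YEARS = 355        # from the abstract
TOTAL_COST_USD = 4_600_000    # from the abstract


def implied_price_per_gpu_hour(total_cost_usd: float, gpu_years: float) -> float:
    """Hourly GPU price implied by a total cost and a single-GPU duration."""
    return total_cost_usd / (gpu_years * HOURS_PER_YEAR)


def _check_efficiency(efficiency: float) -> None:
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")


def wall_clock_days(gpu_years: float, num_gpus: int, efficiency: float = 1.0) -> float:
    """Wall-clock days when the same work is spread over `num_gpus` GPUs.

    `efficiency` (hypothetical) is the fraction of ideal speedup achieved;
    1.0 means perfect linear scaling.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    _check_efficiency(efficiency)
    return gpu_years * 365 / (num_gpus * efficiency)


def total_cost_usd(gpu_years: float, price_per_gpu_hour: float, efficiency: float = 1.0) -> float:
    """Total bill: GPU-hours of useful work, inflated by scaling inefficiency.

    The number of GPUs cancels out: N GPUs run for 1/N of the time, so only
    the efficiency changes the bill in this simplified model.
    """
    _check_efficiency(efficiency)
    return gpu_years * HOURS_PER_YEAR * price_per_gpu_hour / efficiency


if __name__ == "__main__":
    price = implied_price_per_gpu_hour(TOTAL_COST_USD, SINGLE_GPU_YEARS)
    print(f"Implied price per GPU-hour: ${price:.2f}")
    for gpus in (1, 8, 1024):
        days = wall_clock_days(SINGLE_GPU_YEARS, gpus)
        print(f"{gpus:>5} GPUs -> {days:,.1f} days (ideal scaling)")
```

In this simplified model, adding GPUs shortens the wall-clock time but does not lower the bill; any loss of scaling efficiency raises it, which is why efficient training matters for cost as well as speed.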

Lesson 2: Large datasets
Understand the cost considerations when working with large datasets in the cloud. We’ll also review the most common libraries for training with large datasets and learn how to create a custom LightningDataset that works efficiently with the ImageNet dataset on S3.
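
As a rough illustration of the idea behind a dataset that reads from object storage, the sketch below implements the map-style dataset protocol (`__len__` and `__getitem__`) in plain Python, fetching each object lazily and keeping a small cache of recent ones. It is not the LightningDataset API; `LazyRemoteDataset`, `fetch`, and the example keys are hypothetical names, and `fetch` stands in for whatever call would download an object from S3.

```python
from collections import OrderedDict
from typing import Callable, Sequence


class LazyRemoteDataset:
    """Map-style dataset that downloads items on first access.

    `fetch` is a hypothetical callable mapping an object key to its bytes;
    in practice it would wrap an S3 download.
    """

    def __init__(self, keys: Sequence[str], fetch: Callable[[str], bytes],
                 cache_size: int = 128) -> None:
        if cache_size < 1:
            raise ValueError("cache_size must be >= 1")
        self._keys = list(keys)
        self._fetch = fetch
        self._cache_size = cache_size
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, index: int) -> bytes:
        key = self._keys[index]
        if key in self._cache:
            # Cache hit: mark as most recently used, no remote request.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: one remote request, then remember the result.
        data = self._fetch(key)
        self._cache[key] = data
        if len(self._cache) > self._cache_size:
            # Evict the least recently used item to bound memory.
            self._cache.popitem(last=False)
        return data


if __name__ == "__main__":
    # An in-memory dict plays the role of the bucket (hypothetical keys).
    fake_bucket = {f"train/img_{i}.jpg": bytes([i]) for i in range(4)}
    dataset = LazyRemoteDataset(sorted(fake_bucket), fake_bucket.__getitem__, cache_size=2)
    print(len(dataset), dataset[0])
```

Every cache miss here corresponds to one remote request, which is where the cost and latency considerations of cloud-hosted datasets show up.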

Lesson 3: Multi-GPU
Here we’ll go over the cost and operational complexities of multi-GPU training and learn how to use Lightning’s out-of-the-box multi-GPU support.
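
In Lightning, multi-GPU training is typically requested through `Trainer` arguments such as `accelerator`, `devices`, and `strategy`. To show what data-parallel training does with the data underneath, here is a plain-Python sketch of how sample indices can be split so that each GPU process (rank) gets an equally sized shard. The function name `shard_indices` is hypothetical, and the padding scheme follows the general approach of distributed samplers rather than reproducing Lightning's or PyTorch's exact code.

```python
import math
import random
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int,
                  seed: int = 0, shuffle: bool = True) -> List[int]:
    """Return the sample indices that process `rank` handles in one epoch."""
    if num_samples < 1:
        raise ValueError("num_samples must be >= 1")
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")

    indices = list(range(num_samples))
    if shuffle:
        # Every rank uses the same seed, so all ranks agree on the order.
        random.Random(seed).shuffle(indices)

    # Pad by repeating samples from the start so the total divides evenly
    # and every rank runs the same number of steps.
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    while len(indices) < total:
        indices += indices[: total - len(indices)]

    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total:world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, shard_indices(num_samples=10, world_size=4, rank=r, shuffle=False))
```

Because the shards have equal length, the processes stay in step when they synchronize gradients; the price is that a few samples are seen twice when the dataset size is not a multiple of the number of GPUs.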

Lesson 4 (optional): Further challenges
What happens when training doesn’t fit on a single GPU? If time allows, we’ll talk about some of the ongoing challenges with large-scale training and how Lightning is constantly evolving to address the hardest and most common of them.


Background Knowledge:

Python, elementary knowledge of ML


Daniela Dapena is a research scientist at Lightning AI, where she works on different deep-learning models and front-end development. She obtained her Ph.D. from the University of Delaware in 2022, where her research focused on graph signal processing and its applications to machine learning. She holds a bachelor’s degree in electrical engineering from the Universidad de Los Andes in Mérida, Venezuela, obtained in 2018.

Open Data Science
One Broadway
Cambridge, MA 02142
