Abstract: The field of AI has been advancing at unprecedented speed over the past few years, due to the rise of large-scale, self-supervised pre-trained models (a.k.a. “foundation models”), such as GPT-3, GPT-4, ChatGPT, Chinchilla, LLaMA, CLIP, DALL-E, Stable Diffusion and many others. The impressive few-shot generalization capabilities of such models on a very wide range of novel tasks appear to emerge primarily from the drastic increase in the size of the models, training data and compute resources. Thus, predicting how a model’s performance and other important characteristics (e.g., robustness, truthfulness, etc.) scale with data, model size and compute has become a rapidly growing area of research in the past couple of years. Neural scaling laws serve as “investment tools”: they suggest the optimal allocation of compute resources across various aspects of the training process (e.g., the ratio of model size to data size), and they help compare different architectures and algorithms by predicting which ones will stand the test of time as larger compute resources become available. Last, but not least, accurate prediction of emerging behaviors in large-scale AI systems is essential from an AI safety perspective. In this talk, we will present a brief history of neural scaling laws, with a focus on our recent Broken Neural Scaling Laws, which generalize previously observed scaling laws to more complex behaviors, metrics, and settings.
While the recent advances in foundation models are truly exciting, they also pose a new challenge to academic and non-profit AI research organizations, which historically have not had access to the level of compute resources available in industry. This motivated us (a rapidly growing international collaboration across several universities and non-profit organizations, including U of Montreal/Mila, LAION, EleutherAI, and many others) to join forces and initiate an effort to develop common objectives and tools for advancing the field of open-source foundation models, in order to avoid the concentration of state-of-the-art AI in a small set of large companies and to facilitate the democratization of AI. We will give an overview of our recent efforts to obtain large compute resources (e.g., the Summit supercomputer) and of the ongoing large-scale projects we are working on.
Bio: Irina Rish is an Associate Professor in the Computer Science and Operations Research Department at the Université de Montréal (UdeM) and a core faculty member of Mila - Quebec AI Institute. She holds a Canada Excellence Research Chair (CERC) in Autonomous AI and a Canadian Institute for Advanced Research (CIFAR) Canada AI Chair. She received her MSc and PhD in AI from the University of California, Irvine, and an MSc in Applied Mathematics from the Moscow Gubkin Institute. Dr. Rish's research focuses on machine learning, neural data analysis and neuroscience-inspired AI. Before joining UdeM and Mila in 2019, she was a research scientist at the IBM T.J. Watson Research Center, where she worked on various projects at the intersection of neuroscience and AI, and led the Neuro-AI challenge. She has received multiple IBM awards, including the IBM Eminence & Excellence Award and the IBM Outstanding Innovation Award in 2018, the IBM Outstanding Technical Achievement Award in 2017, and the IBM Research Accomplishment Award in 2009. Dr. Rish holds 64 patents and has published over 80 research papers in peer-reviewed conferences and journals, as well as several book chapters, three edited books, and a monograph on Sparse Modeling.