Tuning the un-tunable: lessons for tuning expensive deep learning functions

Abstract: Models with lengthy training cycles, typical in deep learning, can be extremely expensive to train and tune. In some cases, the cost makes tuning infeasible for a particular model; even when tuning is feasible, it remains very expensive. Popular methods for tuning these models, such as evolutionary algorithms, typically require several orders of magnitude more time and compute than other methods. Techniques like parallelism often degrade performance, which results in the use of many more expensive computational resources. This leaves most teams with few good options for tuning expensive deep learning functions.

But new methods related to task sampling in the tuning process give teams a chance to dramatically lower the cost of tuning these models. One such method, referred to as multitask optimization, combines the “strong anytime performance” of bandit-based methods with the “strong eventual performance” of Bayesian optimization. As a result, it can unlock tuning for some deep learning models that have particularly lengthy training and tuning cycles.
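The abstract does not specify the algorithm, so the following is only a rough sketch of the general idea of screening configurations on cheap tasks before spending full-cost evaluations on them. It uses successive halving, a simple bandit-style method, and is not SigOpt's multitask optimization. The objective `train_and_score`, the cost fractions, and the learning-rate search space are all hypothetical stand-ins.

```python
import random


def train_and_score(lr, cost_fraction, rng):
    """Hypothetical stand-in for an expensive training run.

    The score peaks at lr = 0.1. A lower cost_fraction models a cheaper
    task (for example, fewer epochs) that gives a noisier estimate.
    """
    true_score = -(lr - 0.1) ** 2
    noise = rng.gauss(0.0, 0.01 * (1.0 - cost_fraction))
    return true_score + noise


def successive_halving(configs, cost_fractions, rng):
    """Score all configs on the cheapest task, keep the top half, repeat."""
    survivors = list(configs)
    total_cost = 0.0
    for fraction in cost_fractions:
        scored = [(train_and_score(lr, fraction, rng), lr) for lr in survivors]
        total_cost += fraction * len(survivors)  # cost in full-run units
        scored.sort(reverse=True)                # best score first
        keep = max(1, len(scored) // 2)
        survivors = [lr for _, lr in scored[:keep]]
    return survivors[0], total_cost


rng = random.Random(0)
configs = [rng.uniform(0.001, 1.0) for _ in range(16)]
best_lr, total_cost = successive_halving(configs, [0.1, 0.3, 1.0], rng)
full_cost = float(len(configs))  # cost of running every config at full cost
print(f"best lr: {best_lr:.3f}, cost: {total_cost:.1f} vs {full_cost:.1f} full-cost units")
```

In this toy setup only four of the sixteen configurations are ever run at full cost. A Bayesian optimization component, which the sketch omits, would additionally use the cheap results to propose new configurations instead of sampling them at random.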

During this talk, Patrick Hayes, Chief Technology Officer & Co-Founder of SigOpt, walks through a variety of methods for training models with lengthier training cycles before diving deep into this multitask optimization functionality. The rest of the talk focuses on how this type of method works and explains the ways in which deep learning experts are deploying it today. Finally, he talks through the implications of early findings in this area of research and next steps for exploring this functionality further. This talk is particularly valuable for anyone working with large data sets or complex deep learning models.

Bio: Patrick is happiest when building the most efficient architecture to reliably scale complex systems. He is responsible for the innovation and evolution of SigOpt’s products, and for evangelizing the value they bring to our customers. Prior to SigOpt, Patrick led engineering efforts at Foursquare to develop passive local recommendations and supported a team that built a more scalable approach to user growth experimentation. Before Foursquare, Patrick was a software engineer at Facebook and Wish responsible for building systems that scaled to tens of millions of users. Patrick holds a Bachelor of Mathematics in Computer Science and Pure Mathematics from the University of Waterloo.

Open Data Science Conference