Learning a Universal Latent Space for Accelerated Drug Discovery

Abstract: The use of machine learning and data science to accelerate drug discovery has attracted increasing interest in recent years. Accelerating drug discovery involves many subproblems, including target identification, ADMET (absorption, distribution, metabolism, excretion, and toxicity) prediction, and identification of the most effective compounds for a given disease. There are equally many data modalities that can be used to solve them, ranging from chemical structures and biochemical assay endpoints to microscopy images, gene expression data, and more. Many approaches to these problems collect only the data needed to answer a single, narrowly focused question. But what if we could integrate datasets generated in the pharmaceutical industry over months or even years, spanning multiple data modalities and tasks, to create a universal latent space from which many problems could be solved simultaneously? By combining recent methods in domain adaptation, multitask learning, and conditioning for deep neural networks, we can create such a latent space, one that contains the information needed to solve problems such as classification of various conditions, prediction of compounds that rescue a morphological change, identification of compounds' mechanisms of action, prediction of various toxicity endpoints, and more. In this presentation, we explain how this can be done using fluorescence microscopy imaging as the backbone data modality, augmented with additional datasets such as chemical structures, bioassay endpoints, and experimental contexts. The presentation will also discuss handling missing data when combining multiple modalities and tasks, overcoming batch effects caused by varying experimental conditions, and the challenges of learning from data generated at a rate of over 20 TB per week.
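One common way to handle missing labels when training a single model on many tasks (not necessarily the approach used in this talk) is to mask them out of a multitask loss, so that each compound contributes only to the endpoints it was actually measured on. The following is a minimal NumPy sketch, assuming binary endpoints with missing measurements encoded as NaN; the function name masked_multitask_bce is hypothetical.

```python
import numpy as np


def masked_multitask_bce(logits, labels, eps=1e-7):
    """Mean binary cross-entropy over observed (sample, task) entries only.

    Assumption (illustrative): logits and labels have shape
    (n_samples, n_tasks), and unmeasured labels are stored as NaN.
    """
    labels = np.asarray(labels, dtype=float)
    logits = np.asarray(logits, dtype=float)

    # True where a label was actually measured.
    mask = ~np.isnan(labels)
    if not mask.any():
        return 0.0  # nothing observed, so no training signal

    # Replace NaNs before the arithmetic, since NaN * 0 is still NaN.
    y = np.where(mask, labels, 0.0)

    # Sigmoid probabilities, clipped to keep the logs finite.
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)
    per_entry = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

    # Average only over observed entries; missing tasks contribute nothing.
    return float(per_entry[mask].mean())


labels = np.array([[1.0, np.nan, 0.0],
                   [np.nan, np.nan, 1.0]])
print(masked_multitask_bce(np.zeros((2, 3)), labels))  # ln(2) ≈ 0.693
```

Because the average is taken over observed entries rather than over all cells, sparsely measured tasks are not diluted by the many missing values, and predictions for unmeasured tasks receive no gradient.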

Bio: As Recursion Pharmaceuticals' VP of Data Science, Mason is focused on upending the pharmaceutical industry by using fluorescence microscopy imaging as a fundamental data modality for understanding biology and rapidly finding treatments for a large number of diseases in parallel. Before joining Recursion, Mason earned his MS in mathematics from BYU, where he focused on developing new approaches to identifying individuals at risk of developing chronic kidney disease. After an internship with the NSA, unable to decide between becoming a spook and pursuing a PhD, he chose the obvious route: joining a start-up instead, where he built systems to mine unstructured medical records, analyze human behavior through cell phone logs, and simulate complex call centers. He joined Recursion three years ago to build its data science and machine learning teams and platform.

Open Data Science Conference