
Abstract: Healthcare and biology have complex, interconnected data models. Can graph databases be used to store these data sets? Can machine learning models identify novel relationships in these graph networks to improve prediction accuracy? Is feature extraction or training performance impacted by the use of graph databases? This session will demonstrate how open-source graph databases can be used to model biomedical data. We will show how data pipelines can be used to create complex networks that integrate biological and clinical data sets for downstream machine learning applications. We will review the approaches available to create graph-based embeddings of these data and use these embeddings to generate predictive models. Finally, we will highlight methods to visualize model performance and effectively communicate these approaches and results to organizational stakeholders. At the end of the session, participants will understand best practices for creating graph data models and be able to describe approaches for efficiently ingesting and integrating data into a graph database. They will also be able to explain the benefits of graph database technology, the differences among graph embedding methods, and approaches to creating predictive models from these data. We will provide participants with a list of open-source applications, tools, and open data sets to further explore these concepts and begin to create their own graph networks, with an emphasis on healthcare and life sciences. We will illustrate these concepts using Python and Neo4j, but the approaches are generalizable and can be readily transferred to other languages and database technologies.
Background Knowledge
Intermediate proficiency with Python and ML
Bio: Dr. Schulz is a physician scientist with a background in computational healthcare, molecular biology, and virology. Dr. Schulz has over 20 years’ experience in software development with a focus on enterprise system architecture and has research interests in the management of large biomedical data sets and the use of real-world data for predictive modeling. At Yale School of Medicine, he has led the deployment of the organization’s data science infrastructure, which consists of a composable computing infrastructure to support the development of biomedical AI applications. Dr. Schulz is also a co-founder of Refactor Health, a digital health startup focused on the development of AI-driven digital signatures and automated healthcare DataOps.

Wade Schulz, MD, PhD
Title
Assistant Professor; Director, Center for Computational Health | Yale University
