Patient Level Prediction with Supervised Learning Models in Federated Data Networks


Global federated data networks with patient level data have emerged in recent years thanks to the adoption of data standards such as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). The Observational Health Data Sciences and Informatics (OHDSI) community has leveraged these data networks and developed a standardized, open-source patient level prediction framework to provide reliable and reproducible methods to generate medical evidence.

In this tutorial we will review the fundamentals of standardized data sources in federated health data networks and describe the data standards that enable open-source software development. We will demonstrate how to use a suite of patient level prediction tools to develop data-driven prediction models using standardized observational health data. The session will include a review of a broad set of supervised learning models, including regularized logistic regression, random forests, gradient boosting machines and neural networks. We will highlight the flexibility of the approach by showing how supervised learning models from Python, R and Java can be integrated into the framework.

Session Outline:

Section 1: Observational data standards for federated data networks
Gain an understanding of how standardized data models enable open-source development of tools for data characterization and evidence generation including patient level prediction.

Section 2: The open-source Patient Level Prediction package
Learn how machine learning can be applied to a global network of federated health data sources to build models with a broad set of supervised learning algorithms.

Background Knowledge:

An understanding of machine learning is required.


Frank DeFalco is the Director of Epidemiology Analytics at Janssen Research and Development, where he architects software solutions and data platforms for the analysis and application of observational data sources. He is currently the leader and Benevolent Dictator of the OHDSI open-source architecture working group. Frank has been a presenter and panelist at OHDSI symposiums and has served as faculty for OHDSI symposium tutorials on architecture and the Common Data Model vocabulary.

In addition to leading the OHDSI Architecture working group, Frank initiated development of a standardized platform for observational analytics known as ATLAS. He is an active contributor to the open-source software repositories developed and released by OHDSI, including ATLAS, WebAPI, Achilles, Circe, Arachne, Visualizations, Hermes, Helios and others. Frank's areas of expertise include computational epidemiology, large-scale data platforms, software development and architecture, data visualization and informatics. Prior to joining Janssen Research and Development, Frank was Senior Principal and Director of Collaboration and Analytics at British Telecom, where he was a strategic advisor to multiple Fortune 100 companies in sectors including Consumer Products, Telecommunications and Pharmaceuticals. Frank received his undergraduate degrees in Computer Science and Psychology from Rutgers University.

Open Data Science
One Broadway
Cambridge, MA 02142