Gaining Disease Insights Using Machine Learning: Predicting Disease Diagnosis, Progression, and Treatment Patterns in Real World Evidence Datasets

Abstract: Real world evidence (RWE) healthcare databases such as claims and electronic health records (EHRs) are increasingly used and valued for a wide range of analyses in the healthcare and pharmaceutical industries. They are important from a commercial perspective in understanding market trends, and global regulators and payers are increasingly requiring evidence of real world product value, safety, and effectiveness. Machine learning techniques are now enabling RWE databases to be used not only to understand the efficacy of drugs in a real world setting, but also to gain insights into the disease itself. RWE databases contain anonymized patient-level longitudinal information including clinical diagnoses, treatments, procedures, medications, laboratory tests, and patient demographics. Machine learning models can handle large numbers of these diverse variables and analyze the complex relationships and interactions among them. Additional challenges with using US claims or EHR data arise from the fact that the US healthcare system is a multi-payer, multi-provider system financed and delivered through a combination of private and public resources. Machine learning models are well suited to handle the systemic differences among the diverse claims and EHR systems used by US providers. For these reasons, machine learning is now being applied to RWE databases to gain disease insights, including earlier detection and the prediction of diagnosis, progression, and treatment patterns. It is also being used to further our understanding of risk factors, disease sub-types, and patient journeys. Case studies will be described, such as the prediction of pulmonary arterial hypertension (PAH) diagnosis, the severity of a stroke event, and treatment progression in Crohn's disease (CD).
Ultimately, these disease insights will help to improve the quality of care for patients, shift the focus from treating diseases to preventing diseases, and enable the delivery of more personalized medicine.

Bio: Kathryn joined the Data Sciences team at Janssen R&D (the pharmaceutical company of Johnson & Johnson) in 2014, and her main areas of expertise are machine learning, predictive modeling, and advanced analytics. She has worked in multidisciplinary teams delivering data-driven insights to business partners across the value chain of Janssen. As part of the Machine Learning and Advanced Algorithms group, she is helping to shape the analytic capabilities of Janssen through the implementation of innovative technology and solutions. Kathryn holds a bachelor’s degree in Chemistry and Physics from Dartmouth College, and a Ph.D. in Physical Chemistry from the University of Oxford, where she was awarded the Rhodes Scholarship.

Open Data Science Conference