Revolutionizing Healthcare with Synthetic Clinical Trial Data


Synthetic data is transforming the field of AI across industries, particularly in healthcare, where patient privacy concerns often limit the sharing of patient information and datasets. With advancements in generative AI, it is now possible to create realistic synthetic health data that can be used to train machine learning models without compromising patient privacy. Most existing generative models require large-scale datasets for training, which can be challenging in clinical trial settings where data may be limited. In this context, the development of generative models that can accommodate smaller training datasets is critical. In this talk, we explore how synthetic data is breaking down silos and improving clinical trial data sharing capabilities. We discuss how synthetic data can be used to generate simulated heart models of patients, providing doctors with a virtual representation of a patient's heart that can be used for better planning of medical interventions. We examine the development of in silico clinical trials and how synthetic data can enhance the simulation process to create better virtual twins. Augmenting in silico clinical trials with synthetic data can significantly accelerate the simulation process and, in turn, the drug development process, reducing costs and improving patient safety. By using synthetic data to create more accurate virtual twins, in silico clinical trials can provide more reliable results, bringing us one step closer to precision medicine. We introduce Simulants, an open-source Python library that can generate high-quality synthetic clinical trial data at scale, even from small training datasets. The talk will also cover the qualitative and quantitative measures that can be used to assess the quality of synthetic data, including data fidelity and privacy, in the context of clinical trial data.
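To make the fidelity and privacy measures mentioned above a little more concrete, here is a minimal sketch of two commonly used checks: a two-sample Kolmogorov-Smirnov statistic for per-column fidelity, and distance to the closest real record as a simple privacy signal. It is not taken from Simulants or from the talk; the function names `ks_statistic` and `distance_to_closest_record` are hypothetical, and it assumes numeric columns that have already been scaled to comparable ranges.

```python
import math


def ks_statistic(real, synth):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric column.

    Returns the largest gap between the two empirical CDFs:
    0.0 means the distributions match exactly, 1.0 means no overlap.
    """
    real_sorted, synth_sorted = sorted(real), sorted(synth)
    n, m = len(real_sorted), len(synth_sorted)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        # Step both empirical CDFs past the next smallest value.
        x = min(real_sorted[i], synth_sorted[j])
        while i < n and real_sorted[i] <= x:
            i += 1
        while j < m and synth_sorted[j] <= x:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def distance_to_closest_record(real_rows, synth_rows):
    """Euclidean distance from each synthetic row to its nearest real row.

    Assumption: rows are equal-length tuples of already-scaled numbers.
    A distance of 0.0 means a synthetic row duplicates a real patient row.
    """
    return [min(math.dist(s, r) for r in real_rows) for s in synth_rows]
```

In this sketch, a low KS statistic on each column suggests the synthetic data preserves the marginal distributions, while very small closest-record distances flag synthetic rows that may sit too close to real patients. Real evaluations typically combine several such metrics rather than relying on either one alone.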


Afrah Shafquat is a Senior Data Scientist II at Medidata, a Dassault Systèmes company, where she leads synthetic data solutions in clinical trials. At Medidata, her work focuses on innovative solutions for generating synthetic data, evaluating synthetic data (fidelity and privacy metrics), and developing new use cases for synthetic data. She has a Ph.D. in Computational Biology from Cornell University and an S.B. in Biological Engineering from the Massachusetts Institute of Technology.

Open Data Science




One Broadway
Cambridge, MA 02142
