Applying State-of-the-art Natural Language Processing for Personalized Healthcare


Accelerating progress in personalized healthcare requires learning the causal relationships between diseases, genes, treatments, medications, labs, and other clinical information – at scale, over a large population and time range. More than half of the clinically relevant data in oncology is found only in free-text pathology reports, radiology reports, sequencing reports, and progress notes.

Extracting and normalizing these facts requires training oncology-specific models that remain accurate across a variety of clinical documents. This talk describes results and lessons learned from a real-world project doing this at scale, in three areas:

1. Applying state-of-the-art deep-learning-based NLP models for entity recognition, entity resolution, negation detection, and document segmentation. This is one of the first projects outside a research setting to apply BioBERT; we’ll compare it with “vanilla” BERT, share tricks for improving embeddings using vocabularies, and discuss the impact of this form of transfer learning on the ability to learn from small labeled datasets.
2. Using Spark NLP for training and inference of these NLP pipelines, unifying processing from document loading to generating final results in a way that runs well locally and scales natively on a Spark cluster. We’ll share benchmarks from optimized builds on Intel and Nvidia hardware.
3. Considerations for architecting an AI platform that can process protected health information (PHI) in a secure and compliant way, for both training and inference. This involves operating the whole process – data integration, experimentation, scaling to a cluster, versioning, reproducibility, model deployment – within an air-gapped environment without Internet access.


Guneet Walia is a multi-disciplinary scientist with expertise in oncology and infectious disease biology, and experience in scientific alliance development and management, outreach and due diligence for new business development opportunities, and portfolio strategy.