Introduction to Natural Language Processing in Healthcare

Abstract: Healthcare is benefiting greatly from data science and machine learning. To build predictive models successfully, healthcare data scientists must extract and combine data of various types (numerical, categorical, text, and/or images) from electronic medical records. Unfortunately, many clinical signs and symptoms (e.g., coughing, vomiting, or diarrhea) are often not captured as numerical data and usually appear only in the clinical notes of physicians and nurses.

In this workshop, the audience will build a machine learning model that predicts unplanned hospital readmission from discharge summaries in the MIMIC-III data set. Throughout the tutorial, the audience will have the opportunity to prepare data for a machine learning project, preprocess unstructured notes using a bag-of-words approach, build a simple predictive model, assess the quality of the model, and strategize how to improve it. Note to the audience: access to the MIMIC-III data set must be requested in advance, so please request access as early as possible.
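
For readers unfamiliar with the bag-of-words approach mentioned above, the following is a minimal sketch in plain Python of how free-text notes can be turned into word-count vectors. It is not the workshop's actual code: the two example notes are invented, and the helper names (tokenize, build_vocabulary, vectorize) are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Invented toy notes (assumption); these are NOT taken from MIMIC-III.
notes = [
    "Patient reports coughing and vomiting overnight.",
    "No vomiting. Mild diarrhea, otherwise stable.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a word-count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector


vocabulary = build_vocabulary(notes)
vectors = [vectorize(note, vocabulary) for note in notes]
```

Each note becomes a fixed-length list of counts, one position per vocabulary word, which is the kind of numerical input a simple classifier can be trained on. Word order is discarded, which is the main simplification a bag-of-words representation makes.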

Bio: Andrew Long is a Data Scientist at Fresenius Medical Care North America (FMCNA). Andrew holds a PhD in biomedical engineering from Johns Hopkins University and a Master’s degree in mechanical engineering from Northwestern University. Andrew joined FMCNA last year after participating in the Insight Health Data Fellows Program. At FMCNA, he is responsible for building predictive models using machine learning to improve the quality of life of every patient who receives dialysis from FMCNA. He is currently creating a model to predict which patients are at the highest risk of imminent hospitalization.