Abstract: Nausea, vomiting, and diarrhea are words you would not frequently find in a natural language processing (NLP) project for tweets or product reviews. However, these words are common in healthcare. In fact, many clinical signs and patient symptoms (e.g., shortness of breath, fever, or chest pain) appear only in free-text notes and are not captured in structured numerical data. As a result, it is important for healthcare data scientists to be able to extract insight from unstructured clinical notes in electronic medical records.
In this hands-on workshop, the audience will have the opportunity to complete a Python NLP project with doctors’ discharge summaries to predict unplanned hospital readmission. The audience will learn how to prepare data for a machine learning project, preprocess text using a bag-of-words approach, train a few predictive models, evaluate the performance of the models, and strategize how to improve the models. This tutorial uses the MIMIC-III dataset, for which access must be requested in advance (an artificial dataset will be provided for those without access).
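For readers unfamiliar with the bag-of-words approach mentioned above, the following is a minimal sketch of the idea using only the Python standard library. It is not the workshop's code: the function names (`tokenize`, `build_vocabulary`, `vectorize`) and the example notes are made up for illustration, and a real project would more likely rely on a library vectorizer and on actual discharge summaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(notes, max_features=None):
    """Map the most frequent tokens across all notes to column indices."""
    counts = Counter(token for note in notes for token in tokenize(note))
    most_common = counts.most_common(max_features)
    return {token: index for index, (token, _) in enumerate(most_common)}


def vectorize(note, vocabulary):
    """Turn one note into a list of token counts, one entry per vocabulary word."""
    vector = [0] * len(vocabulary)
    for token in tokenize(note):
        index = vocabulary.get(token)
        if index is not None:  # tokens outside the vocabulary are ignored
            vector[index] += 1
    return vector


# Made-up example notes (hypothetical, not taken from MIMIC-III).
notes = [
    "Patient reports nausea and vomiting.",
    "No fever. Patient denies chest pain.",
    "Fever and shortness of breath; chest pain on exertion.",
]

vocabulary = build_vocabulary(notes)
matrix = [vectorize(note, vocabulary) for note in notes]
```

Each row of `matrix` is one note and each column is one vocabulary word, so word order is discarded and only counts remain. A matrix of this shape is what a predictive model would then be trained on.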
Bio: Andrew Long is a Senior Data Scientist at Fresenius Medical Care North America (FMCNA). Andrew holds a PhD in biomedical engineering from Johns Hopkins University and a Master’s degree in mechanical engineering from Northwestern University. Andrew joined FMCNA in 2017 after participating in the Insight Health Data Fellows Program. At FMCNA, he is responsible for building, piloting, and deploying predictive models using machine learning to improve the quality of life of every patient who receives dialysis from FMCNA. He currently has multiple models in production to predict which patients are at the highest risk of negative outcomes.