Identify Heart Disease Risk Factors from Clinical Notes


There is a wealth of information in clinical text that can be leveraged using NLP and deep learning techniques. In 2014, i2b2 (Informatics for Integrating Biology and the Bedside) issued a shared task challenge on identifying risk factors for heart disease in clinical narratives. This work proposes using contextual embeddings (the first such use on this shared task dataset) and stacking embeddings to improve on the work already done as part of the i2b2 2014 challenge.
Conceptually, stacking embeddings is simply a way to combine multiple embeddings: the vectors that each embedding produces for a token are joined into one longer vector. The approach has shown good results on the i2b2 heart disease risk factors challenge dataset. By stacking BERT and character embeddings (BERT-CHAR Embedding), we achieved an F1 score of 93.07% on the test set when restricted to 4 of the 8 indicators for heart disease. This is an encouraging result that points towards further research that could surpass the existing benchmark for deep learning models built on this shared task. Our work also shows that contextual embeddings further enhance the power of NLP and deep learning. This research is a step towards an implementation that has the potential to beat the current state-of-the-art results with minimal feature engineering and to offer a solution that can perform better than human annotators.
I would like to present the work done as part of this research initiative and show how we used BERT embeddings and classifiers, with Google Cloud TPUs providing the compute power.
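To make the idea of stacking concrete, here is a minimal sketch in plain Python. It is not the model from the talk: the word vectors below are made-up stand-ins for BERT output, and `char_embedding`, `stack_embeddings`, and `embed` are hypothetical helper names chosen only for illustration. The sketch shows the one point that matters, which is that stacking joins the per-token vectors from each embedding end to end.

```python
from typing import Dict, List

# Made-up 3-dimensional vectors standing in for a contextual (e.g. BERT) embedding.
# Assumption: real BERT vectors are much larger and depend on the sentence context.
WORD_VECTORS: Dict[str, List[float]] = {
    "diabetes": [0.2, -0.1, 0.7],
    "smoker": [0.5, 0.3, -0.4],
}
WORD_DIM = 3


def char_embedding(token: str, dim: int = 4) -> List[float]:
    """Toy character-level embedding: bucketed character counts divided by token length."""
    vec = [0.0] * dim
    for ch in token:
        vec[ord(ch) % dim] += 1.0
    length = max(len(token), 1)
    return [value / length for value in vec]


def stack_embeddings(*vectors: List[float]) -> List[float]:
    """Stack embeddings by concatenating the vectors end to end."""
    stacked: List[float] = []
    for vector in vectors:
        stacked.extend(vector)
    return stacked


def embed(token: str) -> List[float]:
    """Return the stacked (word + character) vector for one token."""
    word_vec = WORD_VECTORS.get(token, [0.0] * WORD_DIM)  # zeros for unknown tokens
    return stack_embeddings(word_vec, char_embedding(token))


if __name__ == "__main__":
    for tok in ["smoker", "hypertension"]:
        print(tok, len(embed(tok)))
```

Each token ends up with a vector whose length is the sum of the individual embedding sizes (3 + 4 = 7 in this toy setup), so a downstream classifier sees both sources of information at once, including character-level signal for tokens that the word-level table does not cover.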


Sudha Subramanian is a Data Scientist and NLP enthusiast. Sudha has over 24 years of professional experience that encompasses statistical analysis as well as building analytics-based solutions.

Sudha has worked on multiple assignments that involved building custom algorithms and applying machine learning techniques for business, government and non-profit organizations such as Delta Analytics and Data Kind. She is also active in the community via organizations such as R-Ladies and WiML. Sudha has demonstrated the ability to leverage the power of data through exploration and statistical modeling techniques to convey a story that influences business decisions.
