The Healthy Approach – Organic Data Enrichment Through Entity Extraction


Assembling comprehensive datasets is often the biggest challenge in building effective analytics and machine learning models. Closing the gaps usually means enriching existing datasets with external sources, such as product catalogs or company records. However, external data can be time-consuming and expensive to acquire. What if, instead, you could use machine learning to extract previously inaccessible information from the data you already have? A solution that reliably extracts important information from text can enrich datasets in a repeatable way. With a relatively small amount of manual labelling, a Named Entity Recognition (NER) model can be trained to identify and extract entities.
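The abstract does not say which labelling scheme the talk uses. A common choice for NER training data is BIO tagging, where each token is marked as the Beginning of an entity, Inside one, or Outside any entity. The minimal sketch below assumes BIO tags and shows how a tagged product description (such as a model's per-token predictions) can be turned back into extracted attributes. The entity types `MATERIAL`, `PRODUCT`, `SIZE` and `QTY`, and the helper `bio_to_entities`, are hypothetical illustrations.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper: assumes tags look like "B-TYPE", "I-TYPE" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity (if any) and drop this token.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []

    # Flush an entity that runs to the end of the sequence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


if __name__ == "__main__":
    # Hypothetical hand-labelled product description.
    tokens = ["Stainless", "steel", "hex", "bolt", "M8", "x", "40", "mm", "box", "of", "100"]
    tags = ["B-MATERIAL", "I-MATERIAL", "B-PRODUCT", "I-PRODUCT",
            "B-SIZE", "I-SIZE", "I-SIZE", "I-SIZE", "O", "O", "B-QTY"]
    print(bio_to_entities(tokens, tags))
    # [('MATERIAL', 'Stainless steel'), ('PRODUCT', 'hex bolt'), ('SIZE', 'M8 x 40 mm'), ('QTY', '100')]
```

Each `(type, text)` pair could then become a new column in the enriched dataset. Dropping a stray `I-` tag that has no matching `B-` is one design choice; another is to treat it as the start of a new entity.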
In this session, we discuss how Named Entity Recognition algorithms can expand the information accessible in a dataset by extracting known entities from unstructured attributes. As an example, we use a Recurrent Neural Network (RNN) to identify and retrieve product properties from supply chain datasets. These extracted attributes can feed downstream data curation and analytics, such as unit-of-measure standardization or price comparison. We benchmark our models against more traditional extraction methods, including regular expressions. Finally, we show how the RNN models can be served through a simple API for data enrichment at scale.
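For context on the regular-expression baseline, here is a minimal sketch of how a rule-based extractor for measurements might look, followed by a simple unit-of-measure standardization step. It is not the benchmark used in the talk: the pattern, the `UNIT_FACTORS` table and the `extract_measures` function are hypothetical, and the unit list is deliberately small.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical conversion table: unit -> (base unit, multiplication factor).
UNIT_FACTORS: Dict[str, Tuple[str, float]] = {
    "mm": ("m", 0.001),
    "cm": ("m", 0.01),
    "m": ("m", 1.0),
    "in": ("m", 0.0254),
    "g": ("kg", 0.001),
    "kg": ("kg", 1.0),
    "lb": ("kg", 0.45359237),
    "ml": ("l", 0.001),
    "l": ("l", 1.0),
}

# A number (optionally with a decimal part) followed by a known unit.
# Longer units are listed before their prefixes ("mm" before "m") so the
# alternation prefers the longer match; \b stops "in" matching "inches".
MEASURE_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm|kg|ml|lb|in|m|g|l)\b",
    re.IGNORECASE,
)


def extract_measures(text: str) -> List[Tuple[float, str]]:
    """Return every (value, base_unit) pair the regex finds, standardized."""
    results: List[Tuple[float, str]] = []
    for match in MEASURE_RE.finditer(text):
        unit = match.group("unit").lower()
        base_unit, factor = UNIT_FACTORS[unit]
        results.append((float(match.group("value")) * factor, base_unit))
    return results


if __name__ == "__main__":
    # Two measurements, converted to metres and kilograms respectively.
    print(extract_measures("Hex bolt M8 x 40 mm, 0.5 kg box"))
    # Context-blind false positive: "5 in" is read as 5 inches.
    print(extract_measures("Ships 5 in stock"))
```

The second call shows a typical weakness of pattern matching: the rule has no notion of context, so "5 in stock" is read as a length. A learned sequence model that sees surrounding tokens is one way to reduce errors of this kind, though how much it helps on a given dataset is an empirical question.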


Ian is a DataOps Engineer at Tamr, where he works on designing and implementing Tamr’s data solutions for clients. Before Tamr, Ian applied machine learning models to research properties of materials. His previous work includes high-throughput modeling of superelasticity in a novel class of intermetallic crystals. Ian has a PhD in Mechanical Engineering from Colorado State University.

Open Data Science