Real-time Field Extraction on Mobile Devices Using Machine Learning


Extracting key fields from a variety of document types remains a challenging machine learning problem. Services such as AWS and Google Cloud provide text extraction products to "digitize" images or PDFs. These return phrases, words and characters with their corresponding coordinate locations. Working with these outputs remains challenging and does not scale, as different document types require different heuristics. Extraction latency also remains in the ~3-5+ second range, too slow for using video and AR to present results.
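To make the heuristics problem concrete, here is a minimal sketch, not the approach presented in the talk. It assumes OCR output has already been flattened into words with pixel bounding boxes; the `Word` type, the `find_value_right_of` helper and the sample coordinates are all hypothetical and do not correspond to any vendor's response format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR token with a hypothetical bounding box (pixels, origin top-left)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def find_value_right_of(words: List[Word], label: str, y_tol: float = 5.0) -> Optional[str]:
    """Return the nearest word to the right of `label` on roughly the same line."""
    anchors = [wd for wd in words if wd.text.lower().rstrip(":") == label.lower()]
    if not anchors:
        return None
    anchor = anchors[0]
    # Keep only words on the same text line (within y_tol) and to the right.
    candidates = [
        wd for wd in words
        if wd is not anchor and abs(wd.y - anchor.y) <= y_tol and wd.x > anchor.x
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda wd: wd.x - anchor.x).text


# Illustrative data only: a receipt where the value sits beside the label...
receipt = [
    Word("Subtotal", 10, 100, 60, 12),
    Word("18.00", 200, 100, 40, 12),
    Word("Total:", 10, 120, 40, 12),
    Word("19.62", 200, 121, 40, 12),
]
# ...and an invoice where the value sits below a differently worded label.
invoice = [
    Word("Amount", 10, 300, 50, 12),
    Word("Due", 65, 300, 25, 12),
    Word("19.62", 10, 320, 40, 12),
]

print(find_value_right_of(receipt, "total"))  # rule written for receipts
print(find_value_right_of(invoice, "total"))  # same rule on a different layout
```

The rule that works for the receipt finds nothing on the invoice, because both the label wording and the position of the value differ. Each new layout would need its own rule, which is the scaling problem described above.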

We propose a compressed on-device solution that extracts fields in the sub-second range, with speeds approaching 200ms on modern devices for field extraction from invoices and receipts. Real-time scanning and extraction opens up new possible product workflows. We are working to build a paperless future. We parse through millions of documents a year, ranging from invoices, contracts and receipts to a variety of other types. Understanding those documents is critical to building intelligent products for our users.


Eitan is a Chief Data Scientist with many years of experience as a researcher. His recent focus is on machine learning, deep learning, applied statistics and software engineering. Previously, he was a Postdoctoral Scholar at Lawrence Berkeley National Lab. He received his PhD in Physics from Boston University and a B.S. in Astrophysics from the University of California, Santa Cruz. Eitan holds 4 patents and 11 publications to date and has spoken about data at various conferences around the world.

Open Data Science



