Neural Networks for Information Extraction from Financial Documents

Abstract: Paying invoices can be quite a hassle. Visiting a bank, standing in a queue, and filling in forms by hand all take a lot of time. Online banking does not completely solve the problem either, since you still need to type a lot. In this talk we will give an overview of how one could automate information extraction from financial documents using basic and advanced recurrent neural networks. We will describe one such system that combines traditional and neural-based approaches, the pros and cons of each, and potential improvements. We will report extraction rates on real-life documents and describe how neural-based approaches scale in production, in contrast to sandboxed environments and artificially created datasets. Finally, we will share the difficulties and setbacks we experienced when moving the prototypes into production.

Bio: Pavel Shkadzko is a semantics engineer at Gini GmbH, working on a neural-based solution to improve the quality of information extraction from financial documents. He has conducted research in natural language generation and automatic semantic role labelling using neural and statistical approaches. Pavel received a master's degree in Language Science and Technology from Saarland University (Saarbrücken) in 2018.

Open Data Science Conference