Software Knowledge Base Construction from Scientific Articles


Much cutting-edge science is built on scientific software, which often makes that software as important as the scholarly articles themselves. Despite this, software is not always treated accordingly, especially when it comes to funding, credit, and citations. In addition, as the number of open-source software tools keeps growing, many researchers find it impossible to keep track of the tools, databases, and methods in their field.
In this work, we leverage machine learning to automate systematic cataloging and build a comprehensive, queryable knowledge base of scientific software mined from scientific articles (including preprints), with a focus on the CORD-19 dataset.


Ivana Williams is a Staff Research Scientist at the Chan Zuckerberg Initiative. She is passionate about delivering state-of-the-art machine learning and data science solutions in support of accelerating scientific discovery, unlocking insights from scientific publications, and delivering personalized content. Her recent research focuses on novel approaches to data representation and automated knowledge base construction.

Open Data Science



