Abstract: "The web is overflowing with knowledge buried in unstructured sources, ready to be extracted through analysis of text, tabular data, page structure, and semantic annotations. In this talk, we will explore our AI engine for unsupervised full-site web extraction from millions of websites every day. Supervised machine learning methods fuse these distinct information sources and link them into our domain-specific probabilistic knowledge graph, which already contains over 275 million mined facts. We will share practical lessons on how we combine traditional information extraction techniques with recent advances in deep learning for a variety of NLP tasks, such as entity-level sentiment and relation extraction, on tens of millions of new documents a day across 15 languages. Finally, we will highlight how we are democratizing insight building to empower a growing external developer community for next-generation competitive intelligence applications by opening up our data, our wrangling platform, and our interoperable NLP pipelines, ready to be distributed to our 28,000 enterprise customers globally."
Bio: Aditya Jami is the Senior Director of Engineering at Meltwater, where he oversees all AI and Big Data initiatives carried out by teams composed of experienced researchers from top universities (Oxford, Stanford, etc.), big data engineers with prior industry experience at Netflix, Walmart, Yahoo, and MongoDB, and committers to popular open source projects. He was the chief software architect of RoboBrain, a cross-university joint project building a large-scale computational system that learns from publicly available Internet resources, computer simulations, and real-life robot trials; it was listed among MIT Technology Review's 10 Breakthrough Technologies of 2016. He built the first version of Chaos Monkey (later extended into the Simian Army), which ran on Netflix's production cloud; its open source version has been widely adopted by many companies. Before that, at Yahoo, he worked on Datahighway, a real-time data platform that collected and analyzed 300 billion events on a hardware footprint of 500K nodes to power its news feed recommendation and behavioral advertising models.