Why Knowledge Matters in Natural Language Understanding

Abstract: "Word embeddings" are a useful technique for giving computers an approximate understanding of what words mean. This allows an application to recognize different ways of saying the same thing, to give more intuitive search results, and to respond to what the user meant instead of just the exact words they said. The most popular word-embedding techniques right now, such as Google's word2vec, involve training a learning algorithm that starts out knowing nothing on a large amount of natural language text. What happens when we instead make a system that *does* start out knowing things? I'll show how adding ConceptNet, a large, open knowledge graph, to the learning process creates a more effective representation of meaning. It significantly improves a system's ability to match people's intuitions about what words mean, and works well in many different languages, not just English.

Bio: Rob Speer is the chief scientist at Luminoso. He is an alumnus of the MIT Media Lab, where he worked on the ConceptNet project, an open, multilingual semantic network. With ConceptNet as a foundation, he developed a system that analyzes natural language text and represents its word meanings as embeddings in a vector space, which would become the foundation of Luminoso.At Luminoso, he directs the NLP research and data science that improves the effectiveness of Luminoso's software and makes new techniques possible in text understanding. He supports Luminoso's contributions to open-source software, including "ftfy", a tool for fixing Unicode errors that has been recommended by the Awesome Python list (http://awesome.re).He continues to develop and support ConceptNet, and uses it to produce state-of-the-art multilingual word representations, which perform better than popular systems such as word2vec at matching human intuitions about word meanings.