Let’s Embed Everything!

Abstract: Word embeddings have emerged as a revolutionary technique in Natural Language Processing over the last decade, allowing machines to read large reams of unlabeled text and automatically answer questions such as "What is to man as queen is to woman?" The core idea behind word embeddings is to use neural networks to 'embed' each word into a continuous, real-valued vector space of a few hundred dimensions, and then to infer analogies such as the one above using simple vector arithmetic. Following the success of word embeddings, there have been massive efforts in both academia and industry to embed all kinds of data, including images, speech, video, sentences, phrases, documents, structured data, and even computer programs. These piecemeal approaches are now starting to converge, drawing on a similar mix of techniques. In this talk, I will discuss this ongoing movement to embed every conceivable kind of data, sometimes jointly, into rich vector spaces. I will conclude by showing the exciting, still largely unrealized, possibilities for general AI that such multi-modal embeddings open up.
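To make the analogy step concrete, here is a minimal sketch of the vector-offset approach popularized by word2vec-style embeddings. It assumes hypothetical, hand-made 3-dimensional vectors (real embeddings are learned from text and have a few hundred dimensions), and the names EMBEDDINGS, cosine, and analogy are illustrative only. To answer "What is to man as queen is to woman?", it computes queen - woman + man and returns the nearest remaining word by cosine similarity.

```python
import math

# Hypothetical toy embeddings for illustration only; real word embeddings
# are learned by a neural network and have a few hundred dimensions.
EMBEDDINGS = {
    "king":  [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word x completing 'x is to a as b is to c'.

    Builds the target vector b - c + a, then picks the most
    cosine-similar word, excluding the three query words.
    """
    target = [vb - vc + va for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(analogy("man", "queen", "woman"))  # king
```

With these toy vectors, queen - woman + man equals the vector for king exactly; with learned embeddings the match is only approximate, which is why the nearest neighbour is taken rather than an exact lookup.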

Bio: Mayank Kejriwal is a research scientist and lecturer at the University of Southern California's Information Sciences Institute (ISI). He received his Ph.D. from the University of Texas at Austin. His dissertation, on Web-scale data linking, has been published as a book and was recently recognized with an international Best Dissertation award in his field. His research is highly applied and sits at the intersection of knowledge graphs, social networks, Web semantics, network science, data integration, and AI for social good. He has contributed to systems used by DARPA and by law enforcement, and he has active collaborations in both academia and industry. He is currently co-authoring a textbook on knowledge graphs (MIT Press, 2018), and has delivered tutorials and demonstrations at numerous conferences and venues, including KDD, AAAI, and ISWC.