What Do Neural Embedding Based Translation Algorithms Tell Us About Language Similarity?

Abstract: In this talk we review insights generated from training and then analyzing several neural embedding based machine translation algorithms. We use several methodologies to train the linear translation matrices that map neural word embeddings of one language's vocabulary to those of a second language. The training process is designed to ensure that the resulting translation matrices satisfy the mathematical properties required for a spectral decomposition, giving us (in effect) a representation vector of the inter-language relationships. Analysis of these translation "spectra" provides mathematical criteria for measuring similarity between languages. Results confirm several hypotheses from linguistic etymology about, for example, the similarity among Romance languages versus English or East Asian languages.
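The pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual method: it assumes a Mikolov-style linear mapping fit by least squares on synthetic stand-ins for pre-trained embeddings, and it uses the singular-value spectrum (which always exists) as the "translation spectrum"; the variable names and the `spectrum_distance` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for pre-trained word embeddings: rows are words, columns are
# embedding dimensions. In the talk's setting these would come from models
# (e.g. word2vec) trained separately on each language's corpus.
d, n_words = 50, 500
X = rng.standard_normal((n_words, d))                         # source-language embeddings
true_map = rng.standard_normal((d, d))                        # hidden ground-truth map
Y = X @ true_map + 0.01 * rng.standard_normal((n_words, d))   # target-language embeddings

# Fit the linear translation matrix W minimizing ||X W - Y||_F
# (ordinary least squares over a bilingual dictionary of word pairs).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# A general W need not be diagonalizable over the reals, so this sketch
# analyzes its singular values instead: the SVD always exists, and the
# singular-value "spectrum" summarizes how the map stretches embedding
# space between the two languages.
singular_values = np.linalg.svd(W, compute_uv=False)

# Two language pairs could then be compared by a distance between their
# spectra (hypothetical helper, one of many possible choices).
def spectrum_distance(s1, s2):
    return float(np.linalg.norm(s1 - s2))

print(singular_values[:5])
```

The actual work presumably constrains the training so that a full spectral (eigen-) decomposition applies; the SVD here stands in as the decomposition that exists unconditionally.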

Bio: Amit is a Research Fellow at Skymind Labs. His research interests include natural language processing and deep learning, which he uses to perform machine translation and to develop scalable data pipelines.