Cross-linguality and machine translation without bilingual data
Short bio: Eneko Agirre is a Professor at the University of the Basque Country and a member of the IXA Natural Language Processing group. His research focuses on lexical and computational semantics, with applications in information retrieval and machine translation. He has published more than 100 peer-reviewed articles. He has been president of ACL SIGLEX and a member of the editorial board of Computational Linguistics, and has received two Google research awards. He is currently an action editor of the TACL journal.
Machine translation is one of the most successful text processing applications. Current state-of-the-art systems leverage large amounts of translated text to learn how to translate, but is it possible to translate between two languages without having any bilingual data? In this presentation we will show that this is indeed the case. We will first map the word embedding spaces of two languages to each other, with and without seed bilingual dictionaries. This makes it possible to produce accurate bilingual dictionaries from monolingual corpora alone, with the same quality as supervised methods. Based on these mappings, it is then possible to train machine translation systems without accessing any bilingual data.
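To make the mapping step more concrete, here is a minimal sketch of the seeded case only, assuming the map between the two embedding spaces is constrained to be orthogonal and is fitted with the standard orthogonal Procrustes solution. This is an illustration, not necessarily the method presented in the talk: the function names, the synthetic embeddings, and the convention that source word i translates to target word i are all assumptions made for the example. The case without a seed dictionary is not covered.

```python
import numpy as np


def normalize_rows(M):
    """Scale every row (one word vector per row) to unit length."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)


def learn_orthogonal_map(X_seed, Y_seed):
    """Orthogonal Procrustes: the orthogonal W minimizing ||X_seed @ W - Y_seed||_F.

    Row i of X_seed and row i of Y_seed are the embeddings of a seed
    dictionary pair (source word, target word).
    """
    U, _, Vt = np.linalg.svd(X_seed.T @ Y_seed)
    return U @ Vt


def induce_dictionary(X, Y, W):
    """For each source word, return the index of the nearest target word
    (cosine similarity) after mapping the source space with W."""
    sims = normalize_rows(X @ W) @ normalize_rows(Y).T
    return sims.argmax(axis=1)


# Synthetic stand-in for two monolingual embedding spaces (hypothetical data):
# the "target" space is an exact rotation of the "source" space.
rng = np.random.default_rng(0)
n_words, dim, n_seed = 50, 5, 20
X = rng.normal(size=(n_words, dim))
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
Y = X @ Q

# Fit the map on the seed pairs only, then induce translations for all words.
W = learn_orthogonal_map(X[:n_seed], Y[:n_seed])
predicted = induce_dictionary(X, Y, W)

# Evaluate on the words that were not part of the seed dictionary.
accuracy = float(np.mean(predicted[n_seed:] == np.arange(n_seed, n_words)))
print(f"held-out dictionary accuracy: {accuracy:.2f}")
```

Because the toy target space is an exact rotation of the source space, the fitted map recovers that rotation. Real embedding spaces are only approximately related in this way, so nearest-neighbour retrieval on real data is noisier than this sketch suggests.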