Virtually all NLP systems today use vector representations of words, also known as word embeddings. Similarly, the processing of language combined with vision or other sensory modalities relies on multimodal embeddings. While embeddings do capture some form of semantic relatedness, the exact nature of that relatedness remains unclear. This lack of precise semantic information can affect downstream tasks. The goals of IMPRESS are to investigate the integration of semantic and commonsense knowledge into linguistic and multimodal embeddings, and to assess its impact on selected downstream tasks. IMPRESS will also develop open-source software and lexical resources, focusing on video activity recognition as a practical testbed. Furthermore, while there is a growing body of NLP research on languages other than English, most research on multimodal embeddings is still carried out on English. IMPRESS will therefore consider a multilingual extension of the developed methods to handle French, German and English.
- DFKI
- INRIA