Publication

Can Pre-training help VQA with Lexical Variations?

Shailza Jolly; Shubham Kapoor

In: Findings of Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing (EMNLP-2020), November 16-20, Online-Conference, ACL, 2020.

Abstract

Rephrasings or paraphrases are sentences with similar meanings expressed in different ways. Visual Question Answering (VQA) models are closing the gap with the oracle performance for datasets like VQA2.0. However, these models fail to perform well on rephrasings of a question, which raises important questions: Are these models robust to linguistic variations? Is it the architecture or the dataset that we need to optimize? In this paper, we analyze VQA models in the space of paraphrasing. We explore the role of language and cross-modal pre-training to investigate the robustness of VQA models to lexical variations. Our experiments find that pre-trained language encoders generate efficient representations of question rephrasings, which help VQA models correctly infer these samples. We empirically determine why pre-training language encoders improves lexical robustness. Finally, we observe that although pre-training all VQA components obtains state-of-the-art results on the VQA-Rephrasings dataset, it still fails to completely close the performance gap between the original and rephrasing validation splits.

Projects

DeFuseNN - Deep Fusion für Neuronale Netze (Deep Fusion for Neural Networks)

jolly_cameraready.pdf (pdf, 283 KB)