Publication

Heteronym Sense Linking

Lenka Bajčetić; Thierry Declerck; John P. McCrae

In: Proceedings of the eLex 2021 conference. Electronic lexicography in the 21st century (eLex-2021), post-editing lexicography, July 5-7, Brno, Czech Republic, pages 503-513, Lexical Computing CZ, s.r.o., 7/2021.

Abstract

In this paper we present ongoing work that aims to semi-automatically connect pronunciation information to lexical semantic resources that currently lack such information, with a focus on WordNet. This is particularly relevant for heteronyms (homographs whose different meanings are associated with different pronunciations), because they require a re-design and adaptation of the formal representation of the targeted lexical semantic resources: for heteronyms, it is not enough to simply add a slot for pronunciation information to each WordNet entry. Moreover, numerous tools and resources rely on WordNet, so we hope that enriching WordNet with pronunciation information can prove beneficial for many applications in the future. Our work consists of compiling a small gold-standard dataset of heteronymous words, which contains short documents created for each WordNet sense, in total 136 senses matched with their pronunciations from Wiktionary. For the task of matching WordNet senses with their corresponding Wiktionary entries, we train several supervised classifiers that rely on various similarity metrics, and we investigate both whether these metrics serve as useful features and how well the different classifiers perform on our dataset. Finally, we explain how these results could be stored in OntoLex-Lemon and integrated into the Open English WordNet.
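The abstract does not specify which similarity metrics or classifiers were used, so the following is only a minimal sketch of the general idea of similarity-based sense linking. It assumes two simple metrics (token Jaccard overlap and term-frequency cosine similarity) feeding a logistic-regression classifier trained with plain stochastic gradient descent, which then picks, for a WordNet sense, the pronunciation whose Wiktionary definition scores highest. All function names, the stopword list, and the toy glosses and definitions (for the heteronym "bass", /bæs/ the fish vs. /beɪs/ the low musical range) are hypothetical illustrations, not data from the paper, WordNet, or Wiktionary.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for illustration only).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "in", "to", "with"}


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two definitions."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Cosine similarity of term-frequency vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def features(gloss: str, definition: str) -> list[float]:
    """Feature vector for one (WordNet gloss, Wiktionary definition) pair."""
    return [jaccard(gloss, definition), cosine(gloss, definition)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, lr: float = 0.5, epochs: int = 200):
    """Fit logistic regression with plain SGD; label 1 means 'same sense'."""
    X = [features(g, d) for g, d, _ in pairs]
    y = [label for _, _, label in pairs]
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def score(model, gloss: str, definition: str) -> float:
    """Estimated probability that gloss and definition describe the same sense."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(gloss, definition))) + b)


def link_sense(model, gloss: str, candidates: dict[str, str]) -> str:
    """Return the pronunciation whose definition best matches the gloss."""
    return max(candidates, key=lambda pron: score(model, gloss, candidates[pron]))


# Hypothetical toy training pairs for another heteronym ("lead").
TRAINING_PAIRS = [
    ("a soft heavy metal", "a heavy pliable metal element", 1),
    ("a soft heavy metal", "to guide or conduct someone somewhere", 0),
    ("take somebody somewhere", "to guide or conduct someone somewhere", 1),
    ("take somebody somewhere", "a heavy pliable metal element", 0),
]

# Hypothetical toy Wiktionary-style entries, keyed by pronunciation.
BASS_ENTRIES = {
    "/bæs/": "a fish of the perch family, freshwater or marine fish",
    "/beɪs/": "the lowest part in music, a low sound or voice; a bass guitar",
}

if __name__ == "__main__":
    model = train(TRAINING_PAIRS)
    for gloss in ("the lean flesh of a saltwater fish",
                  "the lowest part of the musical range"):
        print(gloss, "->", link_sense(model, gloss, BASS_ENTRIES))
```

Framing the task as binary classification over (sense, entry) pairs and then taking the best-scoring candidate per sense is one common design; it lets any number of similarity metrics be added as extra features without changing the linking step.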

Projects

Pret-a-LLOD - Scalable Open Linked Data environment
