An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes

Thierry Declerck, Kseniya Egorova, Eileen Schnur

In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 7-12, Miyazaki, Japan. ELRA, Paris, 5/2018.


This paper presents our work on a potential application in e-lexicography: the automated creation of specialized multilingual dictionaries from structured data available in the form of comparable multilingual classification schemes or taxonomies. As starting examples, we use comparable industry classification schemes, which frequently occur in the context of stock exchanges and business reports. Initially, we planned to follow an approach based on cross-taxonomy and cross-language string mapping to automatically detect candidate multilingual dictionary entries for this specific domain. However, it soon became apparent that the comparable classification schemes first had to be transformed into a shared formal representation language, so that their components could be properly aligned before the algorithms for multilingual lexicon extraction were implemented. We opted for the SKOS-XL vocabulary to model the multilingual terminological part of the comparable taxonomies and for OntoLex-Lemon to model the multilingual lexical entries that can be extracted from the original data. In this paper, we present the suggested modelling architecture, which demonstrates how terminological elements and lexical items can be formally integrated and explicitly cross-linked in the context of Linguistic Linked Open Data (LLOD).
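To make the modelling idea more concrete, the following is a minimal sketch of how one category of a classification scheme might be represented with SKOS-XL labels and OntoLex-Lemon lexical entries. It is not the architecture of the paper itself: the `http://example.org/industry#` namespace, the category identifier, the sample labels, the URI naming pattern and the choice of `ontolex:denotes` as the cross-link between entry and concept are all assumptions made for illustration. Only the SKOS, SKOS-XL and OntoLex class and property names are taken from the published vocabularies.

```python
# Vocabulary namespaces (published W3C / OntoLex URIs).
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for the sample data (assumption, not from the paper).
EX = "http://example.org/industry#"


def literal(text, lang):
    """Format a language-tagged literal in N-Triples syntax."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def model_category(concept_id, labels):
    """Build triples for one taxonomy category.

    labels maps a language tag to the category label in that language.
    Each label becomes a SKOS-XL label (terminological side) and an
    OntoLex lexical entry (lexical side), both tied to the same concept.
    """
    concept = EX + concept_id
    triples = [(concept, RDF_TYPE, SKOS + "Concept")]
    for lang, text in sorted(labels.items()):
        label = f"{concept}_label_{lang}"
        entry = f"{concept}_entry_{lang}"
        form = f"{concept}_form_{lang}"
        lit = literal(text, lang)
        triples += [
            # Terminological part: a reified SKOS-XL label.
            (concept, SKOSXL + "prefLabel", label),
            (label, RDF_TYPE, SKOSXL + "Label"),
            (label, SKOSXL + "literalForm", lit),
            # Lexical part: an OntoLex entry with its canonical form.
            (entry, RDF_TYPE, ONTOLEX + "LexicalEntry"),
            (entry, ONTOLEX + "canonicalForm", form),
            (form, RDF_TYPE, ONTOLEX + "Form"),
            (form, ONTOLEX + "writtenRep", lit),
            # Cross-link from the lexical entry to the concept (assumed choice).
            (entry, ONTOLEX + "denotes", concept),
        ]
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines; literals start with a quote."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


# Hypothetical sample category with an English and a German label.
sample = model_category("oil_gas", {"en": "Oil & Gas", "de": "Öl & Gas"})
for line in to_ntriples(sample):
    print(line)
```

Because the English and German entries denote the same concept, candidate translation pairs could in principle be read off by grouping lexical entries per concept. Aligning two different classification schemes would additionally require mapping their concepts to each other, which this sketch does not attempt.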



