DFKI-LT - Proceedings of the 2nd Workshop on Collaboration and Computing for Under-Resourced Languages

Claudia Soria, Laurette Pretorius, Thierry Declerck, Joseph Mariani, Kevin Scannell, Eveline Wandl-Vogt (eds.)
Proceedings of the 2nd Workshop on Collaboration and Computing for Under-Resourced Languages
1 Portorož, Slovenia, European Language Resources Association (ELRA), Paris, 5/2016
The LREC 2016 Workshop on “Collaboration and Computing for Under-Resourced Languages: Towards an Alliance for Digital Language Diversity” (CCURL 2016) explores the relationship between language and the Internet, specifically the web of documents and the web of data as well as the emerging Internet of things. This relationship is a growing area of research, development, innovation and policy interest. The emerging picture is one where language profoundly affects a person’s experience of the Internet by determining the amount of accessible information, the range of available services (e.g. by shaping the results of a search engine), and the number of everyday tasks that can be carried out virtually. The extent to which a language can be used over the Internet or on the Web not only affects a person’s experience and choice of opportunities; it also affects the language itself. If a language is poorly or insufficiently supported on digital devices, for instance if the keyboard of the device is not equipped with the characters and diacritics necessary to write in the language, or if there is no spell checker for the language, then its usability is severely affected, and it might never be used online. The language could become “digitally endangered”, and its value and profile could be lessened, especially in the eyes of new generations. On the other hand, concerted efforts to develop a language technologically could contribute to the digital ascent and digital vitality of a language, and therefore to digital language diversity. These considerations call for a closer examination of a number of related issues. First, the issue of “digital language diversity”: the Internet appears to be far from linguistically diverse. With a handful of languages dominating the Web, there is a linguistic divide that parallels and reinforces the digital divide.
The amount of information and services available in digitally less widely used languages is reduced, thus creating inequality in the digital opportunities and linguistic rights of citizens. This may ultimately lead to unequal digital dignity, i.e. uneven perception of a language’s importance as a function of its presence on digital media, and unequal opportunities for digital language survival. Second, it is important to reflect on the conditions that make it possible for a language to be used over digital devices, and on what can be done to grant this possibility to languages other than the so-called “major” ones. Despite its increasing penetration in daily applications, language technology is still under development even for these major languages, and with the current pace of technological development, there is a serious risk that some languages will be left wanting in terms of advanced technological solutions such as smart personal assistants, adaptive interfaces, or speech-to-speech translation. We refer to such languages as under-resourced. The notion of digital language diversity may therefore be interpreted as a digital universe that allows the comprehensive use of as many languages as possible.
