The SALSA corpus: A German corpus resource for lexical semantics

Aljoscha Burchardt, Katrin Erk, Anette Frank, Andrea Kowalski, Sebastian Pado, Manfred Pinkal

In: Proceedings of LREC 2006. International Conference on Language Resources and Evaluation (LREC), Genoa, Italy, pages 969-974, 2006.


This paper describes the SALSA corpus, a large German corpus manually annotated with role-semantic information, based on the syntactically annotated TIGER newspaper corpus (Brants et al., 2002). The first release, comprising about 20,000 annotated predicate instances (about half the TIGER corpus), is scheduled for mid-2006. In this paper we discuss the frame-semantic annotation framework and its cross-lingual applicability, problems arising from exhaustive annotation, strategies for quality control, and possible applications.

lrec06_burchardt1-1.pdf (PDF, 130 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence