Publication

Automatic Testing and Evaluation of Multilingual Language Technology Resources and Components

Ulrich Schäfer, Daniel Beck

In: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, May 2006.

Abstract

We describe SProUTomat, a tool for daily building, testing and evaluating a complex general-purpose multilingual natural language text processor including its linguistic resources (lingware). Software and lingware are developed, maintained and extended in a distributed manner by multiple authors and projects, i.e., the source code stored in a version control system is modified frequently. The modular design of dedicated lingware modules such as tokenizers, morphology, gazetteers, type hierarchy and rule formalism increases flexibility and re-usability on the one hand, but on the other hand may lead to fragility with respect to changes. Therefore, frequent testing as known from software engineering is necessary for lingware as well, to ensure a high level of quality and overall stability of the system. We describe the build, testing and evaluation methods for LT software and lingware we have developed on the basis of the open source, platform-independent Apache Ant tool and the configurable evaluation tool JTaCo.
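As an illustration of the kind of Ant-driven build-and-test automation the abstract describes, the sketch below shows a minimal Ant buildfile that chains a version-control update, a compile step and a test run into one nightly target. All target, property and class names here are hypothetical, not taken from SProUTomat.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch of a nightly build-and-test buildfile.
     Target/property names and the TestRunner class are illustrative only. -->
<project name="nightly-sketch" default="nightly" basedir=".">
  <property name="src.dir" value="src"/>
  <property name="build.dir" value="build"/>

  <target name="update">
    <!-- fetch the latest sources from version control (command is illustrative) -->
    <exec executable="svn">
      <arg line="update ${src.dir}"/>
    </exec>
  </target>

  <target name="compile" depends="update">
    <mkdir dir="${build.dir}"/>
    <javac srcdir="${src.dir}" destdir="${build.dir}"/>
  </target>

  <target name="test" depends="compile">
    <!-- run the regression/evaluation step; failonerror aborts the build
         on test failure, so breakage is detected immediately -->
    <java classname="TestRunner" classpath="${build.dir}" failonerror="true"/>
  </target>

  <target name="nightly" depends="test"/>
</project>
```

Because Ant targets declare their dependencies, invoking `ant nightly` runs update, compile and test in order, which is the basic pattern behind daily automated builds of both software and lingware.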

sproutomat.pdf (PDF, 254 KB)

German Research Center for Artificial Intelligence (DFKI)