DFKI

The VOICE Awards Corpus

Facts

1,970 human-machine dialogs
120 unique dialog systems
25 domains
> 23,000 user utterances
> 500,000 words

Available Data

Data/Annotations
Audio
Transcription
Dialog acts
Error/Miscommunication
Repetitions
Dialog success
User ratings (per user/system)
Domain (per system)
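The data and annotation layers listed above could be bundled, for example, into a per-dialog record. The sketch below is a minimal illustration; the field names and types are assumptions for this example, not the corpus's actual file format or schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record for one annotated dialog; all field names are
# illustrative assumptions, not the corpus's actual schema.
@dataclass
class DialogRecord:
    system_id: str                   # which dialog system was called
    domain: str                      # domain label (annotated per system)
    audio_path: str                  # path to the audio recording
    transcription: list[str]         # one string per utterance
    dialog_acts: list[str] = field(default_factory=list)
    miscommunications: list[int] = field(default_factory=list)  # utterance indices
    repetitions: list[int] = field(default_factory=list)        # utterance indices
    task_success: Optional[bool] = None   # dialog success annotation
    user_rating: Optional[float] = None   # per-user/system satisfaction score
```

A record like this keeps the per-system annotations (domain, rating) alongside the per-utterance ones (dialog acts, miscommunication markers) in a single unit.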

Description

The annual “VOICE Awards” competition is an evaluation of commercially deployed spoken dialog systems from the German-speaking area. Since 2004, the best German spoken dialog applications have been entered in this benchmarking evaluation, where they are tested by novice and expert users. The corpus consists of the available recordings from this competition, covering the years 2005-2009.

The corpus covers a large breadth of dialog systems and constitutes a cross-section of the current state of the art in commercially deployed German SDSs. Altogether, there are 150 dialog systems in the corpus, with a total of 1,970 dialogs. A few dialog systems were entered in the VOICE Awards contest in several consecutive years; since each year's version of such a system usually differs, a system is counted as many times as it was entered.
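The counting convention described above (one count per entry rather than per unique system) can be illustrated with a toy example; the system names and years below are invented:

```python
# Toy illustration of the counting convention: a system entered in
# several years is counted once per entry, so the per-entry count
# exceeds the number of unique systems.
entries = [
    ("SystemA", 2005), ("SystemB", 2005),
    ("SystemA", 2006),              # SystemA re-entered the next year
    ("SystemC", 2007),
]
per_entry_count = len(entries)                      # each entry counts once
unique_count = len({name for name, _ in entries})   # distinct systems
print(per_entry_count, unique_count)  # 4 entries, 3 unique systems
```

This is why the corpus's per-entry system count exceeds its number of unique dialog systems.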

In each year of the competition, several novice users were asked to call the dialog systems under test and to perform a given task in each of them. The task was pre-determined by the expert testers according to the developers' system descriptions and was the same for all users. After completing the task, the users filled out satisfaction surveys, which formed the bulk of the evaluation for the award. In addition to these naïve callers, two experts interacted with each system and performed more intensive tests. These expert interactions are included in the corpus only in some cases.

The VOICE Awards corpus was hand-annotated with the NITE XML Toolkit on three levels on top of the transcription: dialog acts, markers of miscommunication and repetitions, and measures of task success. The annotations are used to learn dialog and error strategies for a user simulation; therefore, only data that can be obtained in real time by a running spoken dialog application is considered for mark-up.
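The NITE XML Toolkit stores annotations in stand-off XML layers that reference the transcription by id. The sketch below parses an invented NXT-style dialog-act layer with Python's standard `xml.etree.ElementTree`; the element and attribute names (`act`, `type`, `href`) are assumptions for illustration, not the corpus's actual schema:

```python
import xml.etree.ElementTree as ET

# Invented NXT-style stand-off snippet: dialog-act elements point at
# utterances in a separate transcription file via href ids. Element and
# attribute names are illustrative assumptions, not the actual schema.
acts_xml = """
<dialog-acts>
  <act id="da1" type="greeting" href="words.xml#u1"/>
  <act id="da2" type="request"  href="words.xml#u2"/>
</dialog-acts>
"""

root = ET.fromstring(acts_xml)
acts = [(a.get("id"), a.get("type"), a.get("href")) for a in root.findall("act")]
for act_id, act_type, target in acts:
    print(act_id, act_type, "->", target)
```

Stand-off annotation of this kind lets each layer (dialog acts, miscommunication, task success) be added or revised independently of the transcription it points into.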

In addition, the corpus contains content- and goal-domain classifications of the systems.