Human-Machine Corpus Analysis for Generation and Interaction with Spoken Dialog Systems

Roland Roller; Tatjana Scheffler; Norbert Reithinger
In: Joscha Bach; Stefan Edelkamp (Eds.). KI 2011: Advances in Artificial Intelligence. German Conference on Artificial Intelligence (KI-2011), October 4-7, Berlin, Germany, Pages 272-276, Lecture Notes in Artificial Intelligence (LNAI), Vol. 7006, ISBN 978-3-642-24454-4, Springer, 2011.


This paper describes a new approach to language generation for simulated users, based on the construction of flexible templates extracted from a corpus. In our opinion, a realistic user simulation on the speech level consists of two parts: user behavior and language generation. In this work, we concentrate mainly on language generation for simulated user interaction with spoken dialog systems (SDS). The presented approach could be used as part of a user simulation for intensive end-to-end system tests and evaluations, and for testing the speech recognition and natural language understanding modules of an SDS. We present our semi-automatic analysis of a human-machine corpus; the corpus-based language generation process, which generates realistic user replies on the basis of their usage frequency and verbosity; and a speech enrichment approach that increases the variability of the output. In user simulation experiments realized with synthesized speech, we demonstrate that the generated output is comparable in its variability to the utterances of human testers.
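
The abstract does not spell out the generation procedure, so the following is only a minimal sketch of one way that template selection by usage frequency and verbosity might look. The `Template` class, the example templates, their counts, the three-level verbosity scale, and the `generate_reply` function are all hypothetical illustrations, not taken from the paper or its corpus.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    text: str       # reply pattern with named slots, e.g. "to {city}"
    count: int      # assumed number of occurrences in a corpus
    verbosity: int  # assumed scale: 1 = terse, 2 = medium, 3 = verbose


# Hypothetical templates for a single user intent (invented for illustration).
TEMPLATES = [
    Template("{city}", 50, 1),
    Template("to {city}", 30, 2),
    Template("I would like to go to {city} please", 20, 3),
]


def generate_reply(templates, slots, max_verbosity, rng):
    """Pick a template at random, weighted by its corpus count,
    among those not exceeding max_verbosity, and fill its slots."""
    candidates = [t for t in templates if t.verbosity <= max_verbosity]
    if not candidates:
        raise ValueError("no template matches the requested verbosity")
    weights = [t.count for t in candidates]
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen.text.format(**slots)


if __name__ == "__main__":
    rng = random.Random(2011)  # seeded only to make runs repeatable
    for _ in range(3):
        print(generate_reply(TEMPLATES, {"city": "Berlin"}, 3, rng))
```

In this sketch, frequent patterns are drawn more often than rare ones, and the verbosity cap models a terse versus a talkative simulated user. The speech enrichment step mentioned in the abstract is not modeled here.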