Human-Machine Corpus Analysis for Generation and Interaction with Spoken Dialog Systems

Roland Roller, Tatjana Scheffler, Norbert Reithinger

In: Joscha Bach, Stefan Edelkamp (Eds.). KI 2011: Advances in Artificial Intelligence. German Conference on Artificial Intelligence (KI-2011), October 4-7, Berlin, Germany. Pages 272-276, Lecture Notes in Artificial Intelligence (LNAI) 7006, ISBN 978-3-642-24454-4, Springer, 2011.


This paper describes a new approach to language generation for simulated users, based on flexible templates extracted from a corpus. In our view, a realistic user simulation at the speech level has two parts: user behavior and language generation. This work concentrates mainly on language generation for simulated user interaction with spoken dialog systems (SDS). The approach could serve as part of a user simulation for intensive end-to-end system tests and evaluations, and for testing the speech recognition and natural language understanding modules of an SDS. We present our semi-automatic analysis of a human-machine corpus; the corpus-based language generation process, which generates realistic user replies on the basis of their usage frequency and verbosity; and a speech enrichment approach that increases the variability of the output. In user simulation experiments realized with synthesized speech, we demonstrate that the generated output is comparable in its variability to the utterances of human testers.
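
To make the idea of frequency-based template generation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hypothetical corpus in which slot values have already been replaced by placeholders; the names CORPUS, build_templates, and generate_reply are illustrative only. Verbosity and speech enrichment, which the abstract also mentions, are not modeled here.

```python
import random
from collections import Counter

# Hypothetical mini-corpus of user utterances, with slot values already
# replaced by placeholders. The paper's actual corpus format is not shown here.
CORPUS = [
    "to {city}",
    "to {city}",
    "to {city}",
    "i want to go to {city}",
    "i want to go to {city}",
    "{city} please",
]


def build_templates(corpus):
    """Count how often each template occurs in the corpus."""
    return Counter(corpus)


def generate_reply(templates, slots, rng=random):
    """Sample one template in proportion to its corpus frequency, then fill its slots."""
    patterns = list(templates)
    weights = [templates[p] for p in patterns]
    pattern = rng.choices(patterns, weights=weights, k=1)[0]
    return pattern.format(**slots)


templates = build_templates(CORPUS)
# A seeded generator keeps the example reproducible.
reply = generate_reply(templates, {"city": "Berlin"}, random.Random(0))
print(reply)
```

Under these assumptions, frequent templates such as "to {city}" are chosen more often than rare ones, so the simulated replies roughly mirror the distribution observed in the corpus.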


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence