German Research Center for Artificial Intelligence GmbH
User acceptance of a speech dialogue system depends critically on the degree of ``naturalness'' it achieves. This is particularly true of the speech generation and synthesis modules, since these form the output ``visible'' to the user. It is therefore of utmost importance that these two modules be able to run in real time, a requirement that necessitates incremental processing.
EFFENDI is a real-time incremental generation module developed specifically for use in a speech dialogue system for train inquiries. In order to produce natural-sounding speech, the synthesizer requires not only knowledge of which words to say in which order, but also information about how these words are structurally related to each other. This latter information is expressed acoustically in the form of prosody, i.e. how the voice rises and falls during an utterance, the rhythm, where pauses are placed, etc. Prosody is also influenced by the properties associated with given words in the context of an utterance, e.g. the focus of a sentence or certain emphatic elements. This article describes an interface protocol for conveying this information from the generator to the synthesis module and describes how (parts of) this information are derived in the EFFENDI system.