Safira - Related Software - by September 2001

Speech Markup Languages

  • SABLE: A Standard for TTS Markup. SABLE is an XML/SGML-based (Extensible Markup Language / Standard Generalized Markup Language) markup scheme for text-to-speech (TTS) synthesis, developed to address the need for a common TTS control paradigm. SABLE builds in part on two earlier proposals by a subset of its authors: the Spoken Text Markup Language (STML), together with its predecessor SSML, and the Java Speech Markup Language (JSML).
  • JSML: Java Speech Markup Language. The Java Speech Markup Language (JSML) is used by applications to annotate text input to Java Speech API speech synthesizers. The JSML elements provide a speech synthesizer with detailed information on how to say the text. JSML includes elements that describe the structure of a document, provide pronunciations of words and phrases, and place markers in the text. JSML also provides prosodic elements that control phrasing, emphasis, pitch, speaking rate, and other important characteristics. Appropriate markup of text improves the quality and naturalness of the synthesized voice. JSML uses the Unicode character set, so JSML can be used to mark up text in most languages of the world.
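As a rough illustration of what such markup looks like, the Python sketch below assembles a small SABLE-style document with the standard library's xml.etree.ElementTree. It uses only a SABLE root element plus EMPH and BREAK elements and leaves out attribute-level detail (break strength, rate, pitch), so element and attribute usage should be checked against the SABLE specification. The helper build_sable and its (text, style) input format are hypothetical conventions of this sketch. A JSML document would follow the same pattern with JSML's own element names.

```python
import xml.etree.ElementTree as ET

def build_sable(parts):
    """Assemble a SABLE-style document from (text, style) pairs.

    style is None for plain text, "emph" for emphasized text, or
    "pause" for a break (text is ignored). These labels are
    hypothetical conventions of this sketch, not part of SABLE.
    """
    root = ET.Element("SABLE")
    last = None  # most recent child; plain text after it goes in its .tail
    for text, style in parts:
        if style == "emph":
            last = ET.SubElement(root, "EMPH")
            last.text = text
        elif style == "pause":
            last = ET.SubElement(root, "BREAK")
        elif last is None:
            root.text = (root.text or "") + text
        else:
            last.tail = (last.tail or "") + text
    # ElementTree escapes characters such as & and < automatically.
    return ET.tostring(root, encoding="unicode")

doc = build_sable([
    ("The train to ", None),
    ("Mons", "emph"),
    (None, "pause"),
    (" leaves now.", None),
])
print(doc)
```

Building the markup through an XML library rather than by string concatenation keeps the output well-formed even when the input text contains reserved characters.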

Speech Synthesizers (TTS, DTS)

  • MBROLA. MBROLA is a high-quality, diphone-based speech synthesizer that is available free of charge. It is provided by the TCTS Lab of the Faculté Polytechnique de Mons (Belgium), whose aim is to build a set of speech synthesizers for as many languages as possible, free to use for non-commercial, non-military applications. MBROLA 2.00 takes a list of phonemes as input, together with prosodic information (phoneme durations and a piecewise linear description of pitch), and produces 16-bit speech samples at the sampling frequency of the diphone database (typically 16 kHz). It is therefore NOT a text-to-speech (TTS) synthesizer, since it does not accept raw text as input.
  • Festival. Festival is a general multilingual speech synthesis system developed at CSTR (Centre for Speech Technology Research, University of Edinburgh). It offers a full text-to-speech system with various APIs, as well as an environment for development and research in speech synthesis techniques. It is written in C++, with a Scheme-based command interpreter for general control. Festival also supports the speech markup languages JSML and SABLE.
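Because MBROLA expects phonemes plus prosody rather than raw text, its input is usually written as a .pho file: one phoneme per line, followed by its duration in milliseconds and optional pairs of (position within the phoneme in percent, pitch in Hz). The Python sketch below renders such lines and estimates the output length at 16 kHz. The SAMPA-style phoneme symbols are assumptions (the valid inventory depends on the diphone database used), and the helpers to_pho and n_samples are hypothetical names for this illustration.

```python
# Each segment: (phoneme, duration in ms, [(position %, pitch Hz), ...]).
# Symbols follow SAMPA-style conventions; the valid set depends on the voice database.
HELLO = [
    ("_", 50, []),                      # leading silence
    ("h", 60, []),
    ("@", 90, [(0, 120), (100, 110)]),  # pitch falls from 120 to 110 Hz
    ("l", 70, []),
    ("@U", 150, [(50, 100)]),
    ("_", 50, []),                      # trailing silence
]

def to_pho(segments):
    """Render segments as lines of MBROLA .pho input."""
    lines = []
    for phone, dur_ms, pitch_points in segments:
        fields = [phone, str(dur_ms)]
        for pos, f0 in pitch_points:
            fields += [str(pos), str(f0)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"

def n_samples(segments, rate_hz=16000):
    """Number of output samples for the total duration at rate_hz."""
    total_ms = sum(dur_ms for _, dur_ms, _ in segments)
    return total_ms * rate_hz // 1000

print(to_pho(HELLO), end="")
# 470 ms at 16 kHz -> 7520 samples, 15040 bytes at 16 bit
print(n_samples(HELLO), "samples,", 2 * n_samples(HELLO), "bytes at 16 bit")
```

With a suitable voice installed, the resulting file would then be passed to the mbrola program together with a database name and an output file (for example `mbrola en1 hello.pho hello.wav`, assuming the English en1 database is available).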

Template- or Rule-based Text Generators

  • TG/2. Many intelligent applications that provide information or otherwise need to communicate are characterized by stereotypical formulations that instantiate only a limited set of parameters. Such applications allow the texts to be generated to be fully structured a priori. This makes it possible to dispense with the complex text planning and inference tasks that are usually a prerequisite for generating high-quality, readable text; a simpler, more lightweight approach is perfectly adequate.
    It is for this kind of application that TG/2 offers a solution. TG/2 is organized as a classical production system, separating the generation grammar rules from their interpreter. Generation rules are defined as condition-action pairs. They can represent not only templates but also context-free grammar rules, using category symbols. Integrating these rule types into a single formalism allows for a shallow modelling of language where this proves sufficient, or for more fine-grained models whenever necessary. The interpreter consists of the standard three-step processing cycle: determine the set of applicable rules, select one from this set, and execute it.
    Other features include personalized formulations through generic parameter settings and adjustable output formats (e.g. HTML, LaTeX).
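The production-system organization described above can be sketched in a few lines of Python. This is not TG/2's rule language or interpreter: the Rule and Generator classes, the weather example, and the first-match selection strategy are assumptions chosen for illustration. Each rule pairs a condition on the input parameters with an action; a rule whose action emits fixed text with parameter slots behaves like a template, while one that calls back into the generator for other category symbols behaves like a context-free grammar rule.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    category: str                               # category symbol this rule can expand
    condition: Callable[[dict], bool]           # test on the input parameters
    action: Callable[[dict, "Generator"], str]  # produces the text

class Generator:
    def __init__(self, rules):
        self.rules = rules  # rule order doubles as priority in this sketch

    def generate(self, category, params):
        # Step 1: determine the set of applicable rules.
        applicable = [r for r in self.rules
                      if r.category == category and r.condition(params)]
        if not applicable:
            raise ValueError(f"no applicable rule for {category}")
        # Step 2: select one (here simply the first; an assumption).
        rule = applicable[0]
        # Step 3: execute its action.
        return rule.action(params, self)

RULES = [
    # Context-free style rule: REPORT -> TEMP WIND
    Rule("report", "REPORT", lambda p: True,
         lambda p, gen: gen.generate("TEMP", p) + " " + gen.generate("WIND", p)),
    # Template rules with parameter slots
    Rule("frost", "TEMP", lambda p: p["temp"] < 0,
         lambda p, gen: f"Expect frost, {p['temp']} degrees."),
    Rule("temp", "TEMP", lambda p: True,
         lambda p, gen: f"Temperatures around {p['temp']} degrees."),
    Rule("wind", "WIND", lambda p: p.get("wind") is not None,
         lambda p, gen: f"Wind from the {p['wind']}."),
    Rule("calm", "WIND", lambda p: True,
         lambda p, gen: "Little wind."),
]

g = Generator(RULES)
print(g.generate("REPORT", {"temp": -3, "wind": "north"}))
# Expect frost, -3 degrees. Wind from the north.
print(g.generate("REPORT", {"temp": 12}))
# Temperatures around 12 degrees. Little wind.
```

Keeping both rule types in one rule set is what lets shallow templates and finer-grained grammar rules coexist; a fuller system might replace the first-match selection with a more informed strategy.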

Contact: Elisabeth André, Patrick Gebhard, © DFKI GmbH, Last modified: Mon Oct 7 11:01:57 CEST 2002