Practical Generation of Natural Language Text

Dynamically generated text is needed in many intelligent applications that provide information or need to communicate. Many of these applications (for instance, stock market reports, weather reports, and appointment scheduling dialogues) are characterized by stereotypical formulations that instantiate only a limited set of parameters. In appointment scheduling negotiations, for instance, the basic speech acts are suggesting, accepting, rejecting, or counter-suggesting a date. For environmental air quality reports, condensed descriptions of time series and of threshold exceedances are most important.

These applications allow the texts to be generated to be fully structured a priori. This makes it possible to dispense with the complex text planning and inference tasks that are usually a prerequisite for generating high-quality, readable text. A simpler, more practical approach is perfectly adequate.
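To illustrate what "stereotypical formulations with a limited set of parameters" means, here is a minimal sketch in Python. It assumes one fixed template per appointment-scheduling speech act, filled only with dates. The act names, the template wording, and the helper names fmt and realize are hypothetical illustrations, not TG/2's actual rules or interface.

```python
from datetime import date

# Hypothetical speech-act templates; each one instantiates only date parameters.
TEMPLATES = {
    "suggest": "How about {day}?",
    "accept": "{day} suits me fine.",
    "reject": "Unfortunately, {day} does not work for me.",
    "counter_suggest": "{day} is not possible, but what about {alt_day}?",
}

def fmt(d: date) -> str:
    """Render a date as e.g. 'Tuesday, March 5' (English names from the default C locale)."""
    return f"{d:%A}, {d:%B} {d.day}"

def realize(act: str, **params: date) -> str:
    """Fill the template for one speech act with its formatted date parameters."""
    return TEMPLATES[act].format(**{name: fmt(d) for name, d in params.items()})

print(realize("counter_suggest", day=date(2024, 3, 5), alt_day=date(2024, 3, 7)))
# prints: Tuesday, March 5 is not possible, but what about Thursday, March 7?
```

Because the space of messages is fixed in advance, no text planning is needed: choosing the speech act and its parameters fully determines the output.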

It is for this kind of application that TG/2 offers a commercially viable solution. TG/2 can deal with different kinds of input representations: flat input, such as a list of domain-specific attribute-value pairs, can be accommodated as well as complex, structured logical-form representations of sentential semantics.
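The two kinds of input can be pictured as plain data. The following sketch assumes a dictionary encoding for both a flat attribute-value input and a nested logical form, plus a path accessor so that rule conditions can test either kind in the same way. The attribute names, values, and the functions get_path, exceeds_flat, and exceeds_structured are hypothetical; TG/2's real input format is not described here.

```python
from typing import Any

# Flat input: domain-specific attribute-value pairs (hypothetical air-quality data).
flat_input = {"pollutant": "ozone", "station": "S1", "max_value": 192, "threshold": 180}

# Structured input: a nested logical form of one sentence (hypothetical encoding).
structured_input = {
    "pred": "exceed",
    "agent": {"pred": "concentration", "of": "ozone"},
    "theme": {"pred": "threshold", "value": 180},
}

def get_path(node: Any, *path: str) -> Any:
    """Follow a path of keys into the input; return None if any step is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# Rule conditions can then test either kind of input through the same accessor.
def exceeds_flat(inp: dict) -> bool:
    value, limit = get_path(inp, "max_value"), get_path(inp, "threshold")
    return value is not None and limit is not None and value > limit

def exceeds_structured(inp: dict) -> bool:
    return get_path(inp, "pred") == "exceed"
```

For example, get_path(structured_input, "agent", "of") and get_path(flat_input, "pollutant") both return "ozone".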

TG/2 represents a new, flexible generation of template-based generators. The system is organized as a classical production system that separates the generation rules from their interpreter. Generation rules are defined as condition-action pairs. Besides templates, they can also represent context-free grammar rules by using category symbols. Integrating these rule types into a single formalism allows for shallow modelling of language where this proves sufficient, and for more fine-grained models wherever necessary. The interpreter runs the standard three-step processing cycle:

  • determine the set of applicable rules,
  • select one from this set, and
  • execute it.
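The three-step cycle above can be sketched as a tiny production system in Python. In this sketch, a rule's action mixes canned text (plain strings), template slots (Slot), and category symbols (Cat) that trigger recursive generation, which mirrors how one formalism covers canned text, templates, and context-free rules. The selection step simply takes the first applicable rule in rule order, and there is no backtracking. Both are simplifying assumptions, as are the classes Slot, Cat, and Rule, the categories SENT and NP, and the example rules. This is not TG/2's actual rule syntax, which is implemented in Common Lisp.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union

@dataclass
class Slot:
    key: str       # template slot: insert the input value under `key` verbatim

@dataclass
class Cat:
    category: str  # category symbol: generate the sub-input under `key` recursively
    key: str

Element = Union[str, Slot, Cat]  # a plain str is canned text

@dataclass
class Rule:
    name: str
    category: str                     # category symbol this rule expands
    condition: Callable[[Any], bool]  # test on the current (sub-)input
    action: list[Element]             # right-hand side of the rule

def generate(category: str, inp: Any, rules: list[Rule]) -> str:
    # Step 1: determine the set of applicable rules.
    applicable = [r for r in rules if r.category == category and r.condition(inp)]
    if not applicable:
        raise ValueError(f"no applicable rule for {category!r}")
    # Step 2: select one; this sketch simply takes the first in rule order.
    rule = applicable[0]
    # Step 3: execute it, expanding category symbols recursively.
    parts = []
    for el in rule.action:
        if isinstance(el, Slot):
            parts.append(str(inp[el.key]))
        elif isinstance(el, Cat):
            parts.append(generate(el.category, inp[el.key], rules))
        else:
            parts.append(el)
    return " ".join(parts)

# Hypothetical rules: a template with a sub-category, a canned text, and an NP rule.
RULES = [
    Rule("exceed", "SENT", lambda i: i.get("exceeded", False),
         [Cat("NP", "agent"), "exceeded the threshold of", Slot("threshold"), "µg/m³."]),
    Rule("no-exceed", "SENT", lambda i: not i.get("exceeded", False),
         ["No threshold values were exceeded."]),
    Rule("np", "NP", lambda i: "pollutant" in i,
         ["The", Slot("pollutant"), "concentration"]),
]

print(generate("SENT", {"exceeded": True, "agent": {"pollutant": "ozone"}, "threshold": 180}, RULES))
# prints: The ozone concentration exceeded the threshold of 180 µg/m³.
```

Keeping the rules as data and the interpreter as a separate function is what allows a rule set to be swapped or extended without touching the generation loop.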

By designing rule systems for multiple languages in parallel, one gets multilingual generation almost for free. Developing a rule system for an additional language takes much less effort than developing the initial system, since the rules' associations with input structures can be copied in most cases.
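A minimal Python sketch of why an additional language is cheap. It assumes the conditions (the associations with input structures) are written once and shared, and only the output side and a small lexicon are written per language. The dictionaries CONDITIONS, ACTIONS, and WEEKDAYS, the function realize, and the English and German wording are hypothetical illustrations.

```python
from typing import Callable

# Conditions (the rules' associations with input structures) are written once.
CONDITIONS: dict[str, Callable[[dict], bool]] = {
    "accept": lambda i: i["act"] == "accept",
    "reject": lambda i: i["act"] == "reject",
}

# Only the output side is language-specific (hypothetical wording).
ACTIONS: dict[str, dict[str, str]] = {
    "en": {"accept": "{day} suits me fine.",
           "reject": "Unfortunately, {day} is not possible."},
    "de": {"accept": "{day} passt mir gut.",
           "reject": "Leider geht es am {day} nicht."},
}

# Per-language lexicon; index 0 is Monday.
WEEKDAYS = {
    "en": ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
    "de": ("Montag", "Dienstag", "Mittwoch", "Donnerstag", "Freitag"),
}

def realize(inp: dict, lang: str) -> str:
    """Pick the first rule whose shared condition holds and render it in `lang`."""
    for name, condition in CONDITIONS.items():
        if condition(inp):
            day = WEEKDAYS[lang][inp["weekday"]]
            return ACTIONS[lang][name].format(day=day)
    raise ValueError("no applicable rule")
```

With this split, realize({"act": "reject", "weekday": 1}, "de") yields "Leider geht es am Dienstag nicht.". Adding a third language means adding one ACTIONS entry and one lexicon entry, while CONDITIONS stays untouched.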

TG/2 is implemented in Allegro Common Lisp and runs on Unix and PC platforms.

  • TG/2 offers solutions for limited sublanguages that are tuned towards the domain and the task at hand;
  • TG/2 can quickly be adapted to new tasks (average effort approx. three person-months);
  • TG/2 can be integrated smoothly with deep generation processes;
  • TG/2 integrates canned text, templates, and context-free rules into a single formalism;
  • TG/2 efficiently reuses generated substrings for alternative formulations;
  • TG/2 can be parameterized to produce the preferred formulation first (regarding style, grammar, fine-grained rhetoric, etc.).
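The last two points above (reusing generated substrings and producing the preferred formulation first) can be illustrated together. This Python sketch assumes alternative sentence rules tagged with a style value. A lazy generator yields the variant matching a preferred style first and builds the shared noun phrase only once, caching it for the alternatives. The style tags, the wording, and the names build_np, SENT_RULES, alternatives, and CALLS are hypothetical, and the sketch does not describe how TG/2 actually parameterizes or caches.

```python
from typing import Callable, Iterator

CALLS = {"np": 0}  # counts how often the noun phrase is actually built

def build_np(pollutant: str) -> str:
    """Stand-in for an expensive sub-derivation (hypothetical)."""
    CALLS["np"] += 1
    return f"the {pollutant} concentration"

# Alternative sentence rules, each tagged with a (hypothetical) style value.
SENT_RULES: list[tuple[str, Callable[[str, int], str]]] = [
    ("brief", lambda noun, v: f"Threshold exceeded: {noun} reached {v}."),
    ("formal", lambda noun, v: f"It was observed that {noun} exceeded the threshold, reaching {v}."),
]

def alternatives(pollutant: str, value: int, preferred: str) -> Iterator[str]:
    """Lazily yield formulations, preferred style first, reusing the built NP."""
    cache: dict[str, str] = {}
    # Stable sort: rules matching `preferred` (key False) come before the rest.
    ranked = sorted(SENT_RULES, key=lambda rule: rule[0] != preferred)
    for _style, build in ranked:
        if pollutant not in cache:
            cache[pollutant] = build_np(pollutant)  # built once, reused afterwards
        yield build(cache[pollutant], value)

variants = alternatives("ozone", 192, preferred="formal")
print(next(variants))  # only the preferred formulation has been produced so far
```

Because the generator is lazy, a caller that accepts the first result never pays for the alternatives. A caller that rejects it gets the next variant without rebuilding the noun phrase.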

Publications about the project

Stephan Busemann; Tim vor der Brück

In: Zeitschrift für Sprachwissenschaft, Vol. 26, No. 2, Pages 291-315, de Gruyter, 2007.

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz