TG/2 Practical Text Generation: Some technical details
For a general overview, you may want to read the TG/2 flyer first, in case you haven't done so already :-).
TG/2 is a shallow generation system. The notion of shallow generation (not to be confused with surface generation!) emphasizes a philosophical similarity to shallow parsing: in both cases, the use of shallow models of language sacrifices completeness of coverage and many linguistic generalizations. On the other hand, many useful applications can be realized in an easy way. For both shallow parsing and shallow generation, the relations to more comprehensive, theory-based models remain to be established.
While TG/2 produces surface strings as output, it is much more than a surface generator. The most important difference is that TG/2 can adapt to 'deep' linguistic representations or even domain-semantic representations. You can see a multilingual application demo with TG/2 at work (in Chinese, French, English, German, Japanese and Portuguese).
TG/2 is based on restricted production-system techniques that preserve the modularity of processing and linguistic knowledge, making the system transparent and reusable for various applications. Here is an overview of TG/2's architecture, which is discussed below.
Generation rules are written in the language TGL, which expresses preconditions and actions in a uniform format (for more details on TGL see [Busemann 1996, 1998]). A context-free backbone allows the system to select rules on the basis of their categories: the left-hand side category is part of the preconditions, and each right-hand side category is assigned to an action.
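As an illustration only (this is not TGL's actual syntax, which is defined in [Busemann 1996, 1998]), a rule with a context-free backbone, preconditions, and per-category actions might be modeled like this:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical encoding of a TGL-style rule; names and structure are
# invented for illustration and differ from real TGL.
@dataclass
class Rule:
    lhs: str                             # left-hand side category (part of the preconditions)
    tests: list[Callable[[dict], bool]]  # further preconditions on the input structure
    rhs: list[str]                       # right-hand side categories, one action each

# Example: expand SENTENCE into SUBJECT + PREDICATE if the input
# structure carries an 'agent' role.
rule = Rule(
    lhs="SENTENCE",
    tests=[lambda inp: "agent" in inp],
    rhs=["SUBJECT", "PREDICATE"],
)

assert rule.tests[0]({"agent": "the plant", "action": "emit"})
```

Each right-hand side category would, in the real system, trigger an action that consumes part of the input structure.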
Input to TG/2 is first translated into a system-internal language - called GIL in the figure - in order to abstract away from application-specific requirements on the input structure representations. A GIL structure is fed to the generation engine, which performs the three-step processing cycle known from AI production systems on the available TGL rules:
- identify all applicable rules,
- select an applicable rule (e.g. according to preferences),
- fire that rule.
The processing strategy for constructing derivations is top-down and depth-first. The set of actions in a rule is fired from left to right. Each TGL rule may pick up some part of the current input structure, which forms the input for an action. If a TGL rule fails, backtracking is used to try another applicable rule.
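The match-select-fire cycle, combined with top-down, depth-first derivation and backtracking, can be sketched as a recursive generator. This is a simplification: the grammar encoding and all names below are invented for illustration and are not TG/2's internal representation.

```python
# Sketch of the production-system cycle: identify applicable rules,
# select one, fire it; iterating the generators realizes depth-first
# backtracking into the next candidate on failure.
def generate(category, inp, grammar):
    """Yield every string the toy grammar derives for `category` on input `inp`."""
    for rule in grammar.get(category, []):                 # 1. identify applicable rules
        if not all(test(inp) for test in rule.get("tests", [])):
            continue                                       # preconditions failed
        # 2./3. select and fire: expand RHS categories left to right.
        def expand(cats):
            if not cats:
                yield []
                return
            head, rest = cats[0], cats[1:]
            if head.islower():                             # lowercase = terminal string
                for tail in expand(rest):
                    yield [head] + tail
            else:                                          # nonterminal: recurse top-down
                for piece in generate(head, inp, grammar):
                    for tail in expand(rest):
                        yield [piece] + tail
        for parts in expand(rule["rhs"]):
            yield " ".join(parts)

toy_grammar = {
    "S":  [{"tests": [lambda i: "agent" in i], "rhs": ["NP", "VP"]}],
    "NP": [{"rhs": ["the plant"]}, {"rhs": ["it"]}],
    "VP": [{"rhs": ["emits ozone"]}],
}

gen = generate("S", {"agent": "plant"}, toy_grammar)
print(next(gen))   # first formulation: "the plant emits ozone"
```

Because the interpreter is a generator, the first formulation is produced immediately and alternatives (here, "it emits ozone") are enumerated only on demand, mirroring the behavior described above.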
The interpreter yields all formulations the grammar can generate. It attempts to generate and output a first formulation, producing possible alternatives only on external demand. The order in which formulations are generated can be influenced by parameterizing the generic backtracking mechanism. This also allows the user to have the system generate a preferred formulation first. [Busemann 1996, 1998] and [Wein 1996] give the details.
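The "preferred formulation first" behavior amounts to ordering the applicable rules before the engine tries them, so backtracking visits less preferred alternatives later. A minimal standalone sketch (the rule names and preference scores are invented for illustration):

```python
# Sketch: parameterizing rule selection with preference scores so the
# most preferred applicable rule is tried, and hence fires, first.
applicable = [
    {"name": "passive",  "preference": 0.3},
    {"name": "active",   "preference": 0.9},
    {"name": "elliptic", "preference": 0.5},
]

# Highest preference first; backtracking then tries the rest in order.
ordered = sorted(applicable, key=lambda r: r["preference"], reverse=True)
print([r["name"] for r in ordered])   # ['active', 'elliptic', 'passive']
```

Swapping in a different scoring function changes the order in which formulations are generated without touching the grammar itself.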
A current survey of ten years of TG/2
development and usage is found in [Busemann 2005].
Larger TG/2 grammars are nowadays developed using the grammar development environment eGram, implemented in Java [Busemann 2004]. The rule format is easier to understand, syntactic consistency is checked, and grammars can easily be tested with TG/2 or with XtraGen, the sister implementation in Java [Stenzhorn 2002].
Acknowledgments
Many thanks to Matthias Rinck for considerably improving and debugging
the system, to Michael Wein, who implemented a first version of the
interpreter
and the backtracking mechanism (and who drew the above picture), and to
Jan Alexandersson for influential work on an early version of the
system.
Work on TG/2 was partially funded by the German Federal Ministry for Research and Technology (BMBF) under contract ITW 9402 (project COSMA, 1994-1996) and by the European Commission (Telematics Application Program) under contracts C9-2945 (project TEMSIS, 1996-1998) and MLIS 5015 (project MUSI, 2000-2001).
Publications on TG/2
(to download a paper, search the LT publications page)
- Stephan Busemann. Ten Years After: An Update on TG/2 (and
Friends), in: Graham Wilcock, Kristiina Jokinen, Chris Mellish, and
Ehud Reiter (eds): Proceedings of the Tenth European Natural
Language Generation
Workshop (ENLG 2005), Aberdeen, 2005, 32-39.
- Stephan Busemann. Best-First Surface Realization, in: Donia Scott
(ed.): Proceedings of the Eighth International Natural Language
Generation
Workshop (INLG '96), Herstmonceux, Sussex, 1996, 101-110. Also at
the Computation and
Language
Archive.
- Matthias Rinck. Ein Metaregelformalismus für TG/2. Master's thesis, Institute for Computational Linguistics, University of the Saarland, 2002.
- Holger Stenzhorn. XtraGen. A Natural Language Generation
System
Using
Java- and XML-Technologies. Master's thesis, Institute for
Computational
Linguistics, University of the Saarland, 2002.
- Stephan Busemann. A Shallow Formalism for Defining Personalized
Text,
Workshop Professionelle Erstellung von Papier- und
Online-Dokumenten:
Perspektiven
für die automatische Textgenerierung at the 22nd Annual German
Conference on Artificial Intelligence (KI-98), Bremen, September 16-17,
1998.
- Michael Wein. Eine parametrisierbare Generierungskomponente
mit
generischem
Backtracking. Master's thesis, Department for Computer Science,
University
of the Saarland, 1996.
Publications on Applications Using TG/2
(to download a paper, search the LT publications page)
- Stephan Busemann. eGram - a Grammar Development Environment and
Its Usage for Natural Language Generation, in Proc. Fourth International Conference on
Language Resources and Evaluation (LREC), Lisbon, Portugal, 2004.
- Stephan Busemann. Language Generation for Cross-Lingual Document
Summarization,
in: Hyanye Sheng: Proceedings of the International Workshop on
Innovative
Language Technology and Chinese Information Processing, Shanghai, 2001.
Science Press, Beijing.
- Stephan Busemann and Helmut Horacek. A Flexible Shallow Approach
to
Text
Generation, in: E. Hovy (ed.): Proceedings of the Ninth International Natural Language Generation Workshop (INLG '98),
Niagara-on-the-Lake,
August 1998. Also at the Computation
and Language Archive. Related
Online Demo
- Stephan Busemann and Helmut Horacek. Generating Air-Quality
Reports
from
Environmental Data, in: Tilman Becker, Stephan Busemann, and Wolfgang
Finkler
(eds.), DFKI Workshop on Natural Language Generation, April
1997,
DFKI Document D-97-06, Saarbrücken. Related
Online Demo
- Helmut Horacek and Stephan Busemann. Towards a Methodology for
Developing
Application-Oriented Report Generation, in: O. Herzog (ed.): KI-98.
22nd Annual German Conference on Artificial Intelligence, Bremen,
September
1998. Related Online
Demo
- Stephan Busemann, Thierry Declerck, Abdel Kader Diagne, Luca
Dini,
Judith
Klein, and Sven Schmeier. "Natural Language Dialogue Service for Appointment Scheduling Agents", in Proc. 5th Conference on Applied Natural
Language
Processing, Washington, DC., 1997. Also at the Computation
and Language Archive.
last modified: October 24, 2005, by Stephan Busemann (busemann@dfki.de)