TG/2 Practical Text Generation: Some technical details


For a general overview, you may want to read the TG/2 flyer first, if you have not done so already :-).

TG/2 is a shallow generation system. The notion of shallow generation (not to be confused with surface generation!) emphasizes a philosophical similarity to shallow parsing. In both cases, the use of shallow models of language sacrifices completeness of coverage and many linguistic generalizations. In return, many useful applications can be realized easily. For both shallow parsing and shallow generation, relations to the more comprehensive and theory-based models remain to be established.

While TG/2 produces surface strings as output, it is much more than a surface generator. The most important difference is that TG/2 can adapt to 'deep' linguistic representations or even domain-semantic representations.

You can see a multilingual application demo with TG/2 at work (in Chinese, French, English, German, Japanese and Portuguese).

TG/2 is based on restricted production system techniques that preserve the modularity of processing and linguistic knowledge, making the system transparent and reusable for various applications. Here is an overview of TG/2's architecture, which is discussed in what follows.

Generation rules are written in the language TGL, which expresses preconditions and actions in a uniform format (for more details on TGL see [Busemann 1996, 1998]). A context-free backbone allows the system to select rules on the basis of their categories: the left-hand side category is part of the preconditions, and each right-hand side category is assigned to an action.
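To make the rule format more concrete, here is a minimal Python sketch of rules with a category, preconditions and actions. It is not TGL syntax and not TG/2's implementation: the names `Rule`, `Action`, `applicable_rules` and the example rules are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Action:
    """One right-hand side element (hypothetical model, not TGL syntax)."""
    category: Optional[str] = None   # category to expand next, if any
    literal: Optional[str] = None    # fixed surface string, if any
    # Picks the part of the current input that this action works on.
    select: Callable[[Any], Any] = lambda inp: inp


@dataclass
class Rule:
    name: str
    category: str                                    # left-hand side category
    tests: List[Callable[[Any], bool]] = field(default_factory=list)
    actions: List[Action] = field(default_factory=list)

    def applicable(self, category: str, inp: Any) -> bool:
        # The category check is part of the preconditions, next to the tests.
        return self.category == category and all(t(inp) for t in self.tests)


def applicable_rules(rules: List[Rule], category: str, inp: Any) -> List[Rule]:
    """Select rules through the context-free backbone and their tests."""
    return [r for r in rules if r.applicable(category, inp)]


# Invented example rules; the input is a plain dict standing in for GIL.
RULES = [
    Rule("formal-greeting", "GREETING",
         tests=[lambda inp: inp.get("register") == "formal"],
         actions=[Action(literal="Dear"),
                  Action(category="NAME", select=lambda inp: inp["person"])]),
    Rule("casual-greeting", "GREETING",
         actions=[Action(literal="Hi"),
                  Action(category="NAME", select=lambda inp: inp["person"])]),
    Rule("closing", "CLOSING", actions=[Action(literal="Regards")]),
]
```

With a formal input both greeting rules are applicable; with any other register only the casual one is, and the closing rule is never considered for the category GREETING.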

Input to TG/2 is first translated into a system-internal language, called GIL in the figure, in order to abstract away from many application-driven requirements on the input structure representations. A GIL structure is fed to the generation engine, which performs the three-step processing cycle known from AI production systems on the available TGL rules: identifying the applicable rules, selecting one of them by conflict resolution, and applying it.

The processing strategy for constructing derivations is top-down and depth-first. The set of actions in a rule is fired from left to right. Each TGL rule may pick up some part of the current input structure, which forms the input for some action. If a TGL rule fails, backtracking is used to try another applicable rule.
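The strategy described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TG/2 interpreter: rules are plain tuples, the input is a dict standing in for a GIL structure, rule selection simply follows textual order, and the rule set and the function name `derive` are invented.

```python
from typing import Any, List, Optional

# A rule is (category, test, actions). An action is a literal string,
# a function rendering the current input as a string, or a pair
# (category, selector) that requests a sub-derivation on part of the input.
RULES = [
    ("REPORT", lambda i: True,
     [("VALUE", lambda i: i.get("value")), "exceeds",
      ("LIMIT", lambda i: i.get("limit"))]),
    ("REPORT", lambda i: True,
     [("VALUE", lambda i: i.get("value")), "was measured"]),
    ("VALUE", lambda i: i is not None, [lambda i: f"{i} mg"]),
    ("LIMIT", lambda i: i is not None, ["the limit of", lambda i: f"{i} mg"]),
]


def derive(category: str, inp: Any, rules=RULES) -> Optional[List[str]]:
    """Return the words of the first successful derivation, or None."""
    # Match: collect the rules whose category and test fit the input.
    candidates = [r for r in rules if r[0] == category and r[1](inp)]
    # Select: take candidates in textual order (an assumption of this sketch).
    for _, _, actions in candidates:
        words: List[str] = []
        # Apply: fire the actions from left to right.
        for action in actions:
            if isinstance(action, str):
                words.append(action)
            elif isinstance(action, tuple):
                sub_category, selector = action
                # Top-down, depth-first: finish the sub-derivation first.
                sub = derive(sub_category, selector(inp), rules)
                if sub is None:
                    break            # the rule fails: backtrack to the next one
                words.extend(sub)
            else:
                words.append(action(inp))
        else:
            return words             # every action succeeded
    return None                      # no applicable rule led to a result


print(" ".join(derive("REPORT", {"value": 120, "limit": 100})))
print(" ".join(derive("REPORT", {"value": 120})))
```

For the second input the first REPORT rule fails at its LIMIT action because no limit is given, so the sketch backtracks and the second REPORT rule produces the output.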

The interpreter can yield all formulations the grammar can generate. It first attempts to generate and output a single formulation and produces possible alternatives only on external demand. The order in which formulations are generated can be influenced by parameterizing the generic backtracking mechanism. This also allows the user to have the system generate a preferred formulation first. [Busemann 1996, 1998] and [Wein 1996] give the details.
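Producing alternatives only on demand maps naturally onto lazy generators. The following Python sketch illustrates the idea under stated assumptions: the rule tuples, the example data and the `prefer` parameter are hypothetical, and `prefer` merely stands in for the parameterization of backtracking described in [Busemann 1996, 1998] and [Wein 1996], whose actual mechanism is not reproduced here.

```python
from typing import Any, Callable, Iterator, List

# A rule is (name, category, test, actions). An action is a literal string,
# a function rendering the current input, or a pair (category, selector).
RULES = [
    ("long", "GREETING", lambda i: True,
     ["good morning", ("NAME", lambda i: i["name"])]),
    ("short", "GREETING", lambda i: True,
     ["hi", ("NAME", lambda i: i["name"])]),
    ("titled", "NAME", lambda i: "title" in i,
     [lambda i: f"{i['title']} {i['last']}"]),
    ("plain", "NAME", lambda i: True, [lambda i: i["last"]]),
]

INPUT = {"name": {"title": "Dr.", "last": "Lee"}}


def expand(actions, inp, rules, prefer) -> Iterator[List[str]]:
    """Lazily yield every word sequence for a list of actions."""
    if not actions:
        yield []
        return
    first, rest = actions[0], actions[1:]
    if isinstance(first, tuple):
        category, selector = first
        heads = formulations(category, selector(inp), rules, prefer)
    else:
        word = first if isinstance(first, str) else first(inp)
        heads = [[word]]
    for head in heads:
        for tail in expand(rest, inp, rules, prefer):
            yield head + tail


def formulations(category: str, inp: Any, rules=RULES,
                 prefer: Callable[[tuple], Any] = lambda rule: 0
                 ) -> Iterator[List[str]]:
    """Yield formulations one at a time; rules with a lower key come first."""
    candidates = [r for r in rules if r[1] == category and r[2](inp)]
    # sorted() is stable, so equally ranked rules keep their textual order.
    for rule in sorted(candidates, key=prefer):
        yield from expand(rule[3], inp, rules, prefer)


gen = formulations("GREETING", INPUT)
print(" ".join(next(gen)))        # only the first formulation is computed
print(" ".join(next(gen)))        # an alternative, produced on demand

# Prefer the rule named "short": its formulations are generated first.
short_first = formulations("GREETING", INPUT,
                           prefer=lambda rule: rule[0] != "short")
print(" ".join(next(short_first)))
```

Because generators are evaluated lazily, no work is spent on alternatives unless the caller asks for them, and changing `prefer` reorders the results without touching the rules.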

A survey of ten years of TG/2 development and usage is found in [Busemann 2005].

Larger TG/2 grammars are nowadays developed using eGram, a development environment implemented in Java [Busemann 2004]. The rule format is easier to understand, syntactic consistency is checked, and grammars can easily be tested with TG/2 or XtraGen, the sister implementation in Java [Stenzhorn 2002].


Many thanks to Matthias Rinck for considerably improving and debugging the system, to Michael Wein, who implemented a first version of the interpreter and the backtracking mechanism (and who drew the above picture), and to Jan Alexandersson for influential work on an early version of the system. Work on TG/2 was partially funded by the German Minister for Research and Technology (BMBF) under contract ITW 9402 (project COSMA, 1994-1996) and by the European Commission (Telematics Application Program) under contracts C9-2945 (project TEMSIS, 1996-1998) and MLIS 5015 (project MUSI, 2000-2001).

Publications on TG/2

(to download a paper, search the LT publications page)

Publications on Applications Using TG/2

(to download a paper, search the LT publications page)

last modified: October 24, 2005, Stephan Busemann