TG/2 Practical Text Generation: Some technical details
For a general overview, you may want to read the TG/2 flyer first, in case you haven't done so already :-).
TG/2 is a shallow generation system. The notion of shallow generation (not to be confused with surface generation!) emphasizes a philosophical similarity to shallow parsing: in both cases, the use of shallow models of language sacrifices completeness of coverage and many linguistic generalizations. On the other hand, many useful applications can be realized in an easy way. For both shallow parsing and shallow generation, the relations to more comprehensive, theory-based models remain to be established.
While TG/2 produces surface strings as output, it is much more than a surface generator. The most important difference is that TG/2 can adapt to 'deep' linguistic representations or even domain-semantic representations. You can see a multilingual application demo with TG/2 at work (in Chinese, French, English, German, Japanese and Portuguese).
TG/2 is based on restricted production-system techniques that preserve the modularity of processing and linguistic knowledge, making the system transparent and reusable for various applications. Here is an overview of TG/2's architecture, which is discussed below.
Generation rules are written in the language TGL, which expresses preconditions and actions in a uniform format (for more details on TGL see [Busemann 1996, 1998]). A context-free backbone allows the system to select rules on the basis of their categories: the left-hand side category is part of the preconditions, and each right-hand side category is assigned to an action.
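As an illustration only (this is not TGL's actual syntax, which is defined in [Busemann 1996, 1998]), a rule with a context-free backbone, preconditions, and per-category actions might be modeled like this:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical encoding of a TGL-style rule; names and structure are
# invented for illustration and differ from real TGL.
@dataclass
class Rule:
    lhs: str                             # left-hand side category (part of the preconditions)
    tests: list[Callable[[dict], bool]]  # further preconditions on the input structure
    rhs: list[str]                       # right-hand side categories, one action each

# Example: expand SENTENCE into SUBJECT + PREDICATE if the input
# structure carries an 'agent' role.
rule = Rule(
    lhs="SENTENCE",
    tests=[lambda inp: "agent" in inp],
    rhs=["SUBJECT", "PREDICATE"],
)

assert rule.tests[0]({"agent": "the plant", "action": "emit"})
```

Each right-hand side category would, in the real system, trigger an action that consumes part of the input structure.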
Input to TG/2 is first translated into a system-internal language - called GIL in the figure - in order to abstract away from application-specific requirements on the input structure representations. A GIL structure is fed to the generation engine, which performs the three-step processing cycle known from AI production systems on the available TGL rules:
- identify all applicable rules,
- select an applicable rule (e.g. according to preferences),
- fire that rule.
The processing strategy for constructing derivations is top-down and depth-first. The set of actions in a rule is fired from left to right. Each TGL rule may pick up some part of the current input structure, which forms the input for an action. If a TGL rule fails, backtracking is used to try another applicable rule.
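The match-select-fire cycle, combined with top-down, depth-first derivation and backtracking, can be sketched as a recursive generator. This is a simplification: the grammar encoding and all names below are invented for illustration and are not TG/2's internal representation.

```python
# Sketch of the production-system cycle: identify applicable rules,
# select one, fire it; iterating the generators realizes depth-first
# backtracking into the next candidate on failure.
def generate(category, inp, grammar):
    """Yield every string the toy grammar derives for `category` on input `inp`."""
    for rule in grammar.get(category, []):                 # 1. identify applicable rules
        if not all(test(inp) for test in rule.get("tests", [])):
            continue                                       # preconditions failed
        # 2./3. select and fire: expand RHS categories left to right.
        def expand(cats):
            if not cats:
                yield []
                return
            head, rest = cats[0], cats[1:]
            if head.islower():                             # lowercase = terminal string
                for tail in expand(rest):
                    yield [head] + tail
            else:                                          # nonterminal: recurse top-down
                for piece in generate(head, inp, grammar):
                    for tail in expand(rest):
                        yield [piece] + tail
        for parts in expand(rule["rhs"]):
            yield " ".join(parts)

toy_grammar = {
    "S":  [{"tests": [lambda i: "agent" in i], "rhs": ["NP", "VP"]}],
    "NP": [{"rhs": ["the plant"]}, {"rhs": ["it"]}],
    "VP": [{"rhs": ["emits ozone"]}],
}

gen = generate("S", {"agent": "plant"}, toy_grammar)
print(next(gen))   # first formulation: "the plant emits ozone"
```

Because the interpreter is a generator, the first formulation is produced immediately and alternatives (here, "it emits ozone") are enumerated only on demand, mirroring the behavior described above.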
The interpreter yields all formulations the grammar can generate. It attempts to generate and output a first formulation, producing possible alternatives only on external demand. The order in which formulations are generated can be influenced by parameterizing the generic backtracking mechanism. This also allows the user to have the system generate a preferred formulation first. [Busemann 1996, 1998] and [Wein 1996] give the details.
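The "preferred formulation first" behavior amounts to ordering the applicable rules before the engine tries them, so backtracking visits less preferred alternatives later. A minimal standalone sketch (the rule names and preference scores are invented for illustration):

```python
# Sketch: parameterizing rule selection with preference scores so the
# most preferred applicable rule is tried, and hence fires, first.
applicable = [
    {"name": "passive",  "preference": 0.3},
    {"name": "active",   "preference": 0.9},
    {"name": "elliptic", "preference": 0.5},
]

# Highest preference first; backtracking then tries the rest in order.
ordered = sorted(applicable, key=lambda r: r["preference"], reverse=True)
print([r["name"] for r in ordered])   # ['active', 'elliptic', 'passive']
```

Swapping in a different scoring function changes the order in which formulations are generated without touching the grammar itself.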
A current survey of ten years of TG/2
development and usage is found in [Busemann 2005].
Larger TG/2 grammars are nowadays developed using the grammar development environment eGram, implemented in Java [Busemann 2004]. The rule format is easier to understand, syntactic consistency is checked, and grammars can easily be tested with TG/2 or with XtraGen, the sister implementation in Java [Stenzhorn 2002].
Acknowledgments
Many thanks to Matthias Rinck for considerably improving and debugging
the system, to Michael Wein, who implemented a first version of the
interpreter
and the backtracking mechanism (and who drew the above picture), and to
Jan Alexandersson for influential work on an early version of the
system.
Work on TG/2 was partially funded by the German Federal Ministry for Research and Technology (BMBF) under contract ITW 9402 (project COSMA, 1994-1996) and by the European Commission (Telematics Application Program) under contracts C9-2945 (project TEMSIS, 1996-1998) and MLIS 5015 (project MUSI, 2000-2001).
Publications on TG/2
(to download a paper, search the LT publications page)
- Stephan Busemann. Ten Years After: An Update on TG/2 (and
Friends), in: Graham Wilcock, Kristiina Jokinen, Chris Mellish, and
Ehud Reiter (eds): Proceedings of the Tenth European Natural
Language Generation
Workshop (ENLG 2005), Aberdeen, 2005, 32-39.
- Stephan Busemann. Best-First Surface Realization, in: Donia Scott
(ed.): Proceedings of the Eighth International Natural Language
Generation
Workshop (INLG '96), Herstmonceux, Sussex, 1996, 101-110. Also at
the Computation and
Language
Archive.
- Matthias Rinck. Ein Metaregelformalismus für TG/2. Master's thesis, Institute for Computational Linguistics, University of the Saarland, 2002.
- Holger Stenzhorn. XtraGen. A Natural Language Generation
System
Using
Java- and XML-Technologies. Master's thesis, Institute for
Computational
Linguistics, University of the Saarland, 2002.
- Stephan Busemann. A Shallow Formalism for Defining Personalized
Text,
Workshop Professionelle Erstellung von Papier- und
Online-Dokumenten:
Perspektiven
für die automatische Textgenerierung at the 22nd Annual German
Conference on Artificial Intelligence (KI-98), Bremen, September 16-17,
1998.
- Michael Wein. Eine parametrisierbare Generierungskomponente
mit
generischem
Backtracking. Master's thesis, Department for Computer Science,
University
of the Saarland, 1996.
Publications on Applications Using TG/2
(to download a paper, search the LT publications page)
- Stephan Busemann. eGram - a Grammar Development Environment and
Its Usage for Natural Language Generation, in Proc. Fourth International Conference on
Language Resources and Evaluation (LREC), Lisbon, Portugal, 2004.
- Stephan Busemann. Language Generation for Cross-Lingual Document
Summarization,
in: Hyanye Sheng: Proceedings of the International Workshop on
Innovative
Language Technology and Chinese Information Processing, Shanghai, 2001.
Science Press, Beijing.
- Stephan Busemann and Helmut Horacek. A Flexible Shallow Approach
to
Text
Generation, in: E. Hovy (ed.): Proceedings of the Ninth International Natural Language Generation Workshop (INLG '98),
Niagara-on-the-Lake,
August 1998. Also at the Computation
and Language Archive. Related
Online Demo
- Stephan Busemann and Helmut Horacek. Generating Air-Quality
Reports
from
Environmental Data, in: Tilman Becker, Stephan Busemann, and Wolfgang
Finkler
(eds.), DFKI Workshop on Natural Language Generation, April
1997,
DFKI Document D-97-06, Saarbrücken. Related
Online Demo
- Helmut Horacek and Stephan Busemann. Towards a Methodology for
Developing
Application-Oriented Report Generation, in: O. Herzog (ed.): KI-98.
22nd Annual German Conference on Artificial Intelligence, Bremen,
September
1998. Related Online
Demo
- Stephan Busemann, Thierry Declerck, Abdel Kader Diagne, Luca
Dini,
Judith
Klein, and Sven Schmeier. "Natural Language Dialogue Service for Appointment Scheduling Agents", in Proc. 5th Conference on Applied Natural
Language
Processing, Washington, DC., 1997. Also at the Computation
and Language Archive.
last modified: October 24, 2005, by Stephan Busemann (busemann@dfki.de)