Natural Language Generation is a subfield of Computational Linguistics
and language-oriented Artificial Intelligence research devoted to
studying and simulating the production of written or spoken discourse.
The study of human language generation is a multidisciplinary
enterprise, requiring expertise in areas of linguistics, psychology,
engineering and computer science.
One of the central goals is to investigate how computer programs can
be made to produce high-quality natural language text from
computer-internal representations of information.
Natural language generation is often characterized as a process that
has to start from the communicative goals of the writer or speaker and
needs to employ some sort of planning to progressively convert them into
written or spoken words.
In this view, the general aims of the language producer are refined
into goals that are increasingly linguistic in nature, culminating in
low-level goals to produce particular words.
The generation process is usually assumed to be modular, with a rough
distinction between a strategic component (deciding what to say) and
a tactical component (deciding how to say it).
This strategy-tactics distinction is partly mirrored by a distinction
between text planning and sentence generation.
Text planning is concerned with working out the large-scale structure
of the text to be produced and may also comprise content selection.
The result of this subprocess is commonly taken to be a tree-like
discourse structure, which has at each leaf an instruction to
produce a single sentence.
These instructions are then passed in turn to a sentence generator,
whose task can be further subdivided into sentence planning,
i.e. organizing the content of each sentence, and the final step of
surface realization, i.e. converting sentence-sized chunks of
representation into grammatically correct sentences.
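
The division of labour described above can be illustrated with a minimal
sketch. It assumes a toy input (a list of fact dictionaries with the
hypothetical keys agent, action, patient and relevant) and a flat
discourse tree; all names are illustrative and not taken from any
particular generation system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Discourse tree node; a leaf carries one sentence instruction."""
    relation: str = ""
    children: list = field(default_factory=list)
    instruction: Optional[dict] = None  # set only on leaves


def plan_text(facts):
    """Text planning: select content and arrange it in a (flat) tree."""
    selected = [f for f in facts if f.get("relevant", True)]
    return Node(relation="sequence",
                children=[Node(instruction=f) for f in selected])


def plan_sentence(instruction):
    """Sentence planning: organize the content of a single sentence."""
    return {"subject": instruction["agent"],
            "verb": instruction["action"],
            "object": instruction["patient"]}


def realize(spec):
    """Surface realization: turn a sentence plan into a string."""
    text = " ".join([spec["subject"], spec["verb"], spec["object"]])
    return text[0].upper() + text[1:] + "."


def leaf_instructions(node):
    """Yield the leaf instructions of the tree from left to right."""
    if node.instruction is not None:
        yield node.instruction
    for child in node.children:
        yield from leaf_instructions(child)


def generate(facts):
    """Run text planning, then sentence generation for each leaf."""
    tree = plan_text(facts)
    return " ".join(realize(plan_sentence(i))
                    for i in leaf_instructions(tree))
```

A real text planner would build a deeper tree with rhetorical relations
between its nodes; the flat "sequence" node here only stands in for that
structure.
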
The different types of generation techniques can be classified into four
main categories:
- Canned text systems constitute the simplest approach for
single-sentence and multi-sentence text generation.
They are trivial to create, but very inflexible.
- Template systems, the next level of sophistication, rely on the
application of pre-defined templates or schemas and can accommodate
some variation in the text they produce.
The template approach is used mainly for multi-sentence generation,
particularly in applications whose texts are fairly regular in
structure.
- Phrase-based systems employ what can be seen as generalized
templates.
In such systems, a phrasal pattern is first selected to match the
top level of the input, and then each part of the pattern is
recursively expanded into a more specific phrasal pattern that
matches some subportion of the input.
At the sentence level, the phrases resemble phrase structure grammar
rules and at the discourse level they play the role of text plans.
- Feature-based systems, which are as yet restricted to
single-sentence generation, represent each possible minimal
alternative of expression by a single feature.
Accordingly, each sentence is specified by a unique set of features.
In this framework, generation consists in the incremental collection
of features appropriate for each portion of the input.
Feature collection itself can be based either on unification or on
the traversal of a feature selection network.
The expressive power of the approach is very high since any
distinction in language can be added to the system as a feature.
Sophisticated feature-based generators, however, require very
complex input and make it difficult to maintain feature
interrelationships and control feature selection.
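
The recursive expansion used by phrase-based systems can be sketched as
follows. This is only an illustration under stated assumptions: the
pattern table PATTERNS, the "lit"/"slot" markers and the nested input
format are hypothetical and do not come from any specific system.

```python
# Hypothetical phrasal patterns: each maps an input type to a sequence of
# literal words ("lit") and slots ("slot") filled from the input.
PATTERNS = {
    "event": [("slot", "agent"), ("slot", "action"), ("slot", "patient")],
    "entity": [("lit", "the"), ("slot", "modifier"), ("slot", "head")],
}


def expand(value):
    """Recursively expand the pattern matching this portion of the input."""
    if isinstance(value, str):
        return [value]  # a plain word needs no further expansion
    words = []
    for kind, item in PATTERNS[value["type"]]:
        if kind == "lit":
            words.append(item)
        elif item in value:  # slots missing from the input are skipped
            words.extend(expand(value[item]))
    return words


def generate_sentence(event):
    """Expand the top-level pattern and format the result as a sentence."""
    text = " ".join(expand(event))
    return text[0].upper() + text[1:] + "."
```

For example, an input of type "event" whose agent and patient are of
type "entity" is first matched by the event pattern, and each entity is
then expanded by its own pattern.
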
Many natural language generation systems follow a hybrid approach by
combining components that utilize different techniques.
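
As a rough illustration of such a hybrid, the sketch below combines
canned text for fixed messages with templates for regular ones. The
message kinds, slot names and wording are invented for the example;
string.Template is the standard Python template class.

```python
from string import Template

# Canned text: fixed strings, trivial to write but inflexible.
CANNED = {
    "greeting": "Welcome to the daily report.",
}

# Templates: fixed wording with slots that are filled from the input.
TEMPLATES = {
    "temperature": Template("The temperature in $city is $degrees degrees."),
}


def generate(message):
    """Pick the simplest technique that can express the message."""
    kind = message["kind"]
    if kind in CANNED:
        return CANNED[kind]
    if kind in TEMPLATES:
        return TEMPLATES[kind].substitute(message["slots"])
    raise ValueError(f"no generator available for message kind {kind!r}")
```

A fuller system might add a phrase-based or feature-based component as a
further fallback for messages that neither table covers.
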
Gerd Herzog
Last update: Fri Feb 26 13:17:30 MET 1999
Send comments to herzog@acm.org