MATE Deliverable D1.1
Supported Coding Schemes
Responsible Editor: Marion Klein (DFKI)
Niels Ole Bernsen (MIP), Sarah Davies (HCRC), Laila Dybkjær (MIP),
Juanma Garrido (DFE), Henrik Kasch (MIP), Andreas Mengel (IMS),
Vito Pirrelli (ILC), Massimo Poesio (HCRC), Silvia Quazza (CSELT),
Claudia Soria (ILC)
Abstract:
The first step of the MATE project is to define an overall mark-up formalism
based on the TEI/CES standards. This formalism must accommodate the
needs of current and emerging coding schemes for the levels of prosody,
(morpho-)syntax, co-reference, dialogue acts, communication problems,
and cross-level issues. To accomplish this, existing coding schemes have
to be examined. These schemes should have proved their reliability
in that they have been used in systems by a couple of novice and/or
expert users to annotate a corpus of reasonable size. This report
presents a survey of coding schemes that meet this criterion.
Each scheme is described in detail with regard to its coding
book, the number of annotators working with it, the number of annotated
dialogues / segments / utterances, evaluation results, the underlying task,
the list of annotated phenomena, and the mark-up language used. Annotation
examples are also provided.
Keywords:
coding scheme, communication problem, co-reference, cross-level issue,
dialogue act, multilevel annotation, morphosyntax, prosody, standardization,
tools engineering
Executive Summary
This report gives an overview of the state of the art of coding schemes.
Schemes for the levels of prosody, morpho-syntax, co-reference, dialogue
acts, communication problems, and cross-level issues have been examined.
To allow an appropriate comparison of schemes, guidelines have
been developed (see section 1.3). These guidelines will inform the decision
about which coding schemes the MATE project will support and which
ones are of less interest because they lack reliability. The MATE annotation
standards will be developed on the basis of the results of this
report.
A brief overview of the chapters of this report is given below:
Chapter 1 gives a general introduction to the theme, summarizes the
project's approach, and discusses the guidelines used to standardize the
retrieval of important information about schemes.
Chapters 2 - 7 present the state of the art for the five annotation
levels that MATE is going to investigate (prosody, morpho-syntax,
co-reference, dialogue acts, and communication problems), plus
cross-level issues.
Chapter 8 draws conclusions from the scheme comparisons on the different
levels and outlines future work.
A detailed list of all schemes under consideration can be found in the Annexes.
Acknowledgements
We would like to thank Masahiro Araki, Florence Bruneseaux, Sherri Condon,
Mark Core, Barbara di Eugenio, Giovanni Flammia, Arne Jönsson, Staffan
Larsson, Lori Levin, Christine Nakatani, Joakim Nivre, Laurent Romary,
Jacques Terken, Ann Thyme-Gobbel, and Hans de Vreught for providing information
on their schemes.
Glossary of Terms
- Cross-level: an annotation relation that is established between
  any two or more linguistic units that are considered to belong to distinct
  phenomenological classes
- DRI: Discourse Resource Initiative
- EAGLES: Expert Advisory Group on Language Engineering Standards
- Explicit tags: tags that are represented by character combinations and
  included in the representation of the target of description
- Hierarchical annotation: theory-dependent and consistent annotation
  of hierarchically structured levels of description
- Layout tags: tags that are represented by special positions in the
  representation of the targets of description
- MUC: Message Understanding Conference
- Multilevel annotation: annotation on more than one level of description
- NLP: Natural Language Processing
Last Modification: 26.8.1998 by Marion Klein