MATE Deliverable D1.1
Supported Coding Schemes
1. Overview
Introduction
MATE's Aims
Scheme Evaluation Guidelines
Further Content of this Document
Introduction
In recent years, corpus-based approaches have gained significant
importance in the field of natural language processing (NLP). Large corpora
for many different languages are currently being collected all over the
world. To reuse these data for training and testing NLP systems, the
corpora must be annotated in various ways [Carletta et al. 1997]. This
annotation assumes an underlying coding scheme.
The design of such schemes depends on the task and on the linguistic
phenomena on which the developers focus. The author's own style also
affects the scheme. For these reasons, reusing annotated corpora is
extremely difficult.
The Discourse Resource Initiative (DRI) was started as an effort to
assemble discourse resources to support discourse research and application.
Its goal is to develop a standard for the semantic, pragmatic, and
discourse features of annotated corpora [Carletta et al. 1997]. Another
project, LE-EAGLES, aims to provide preliminary guidelines for the
representation or annotation of dialogue resources for language
engineering [Leech et al. 1998].
These guidelines cover orthographic transcription and morpho-syntactic,
syntactic, prosodic, and pragmatic annotation. Rather than developing
a standard, however, they describe the most widely used schemes, mark-up
languages, and systems for annotation.
MATE's Aims
MATE aims to develop a preliminary standard for annotation schemes at
the levels of prosody, morpho-syntax, co-reference, dialogue acts, and
communication problems, as well as for the interaction between these levels.
MATE's annotation standard is meant to be closely related to the standardisation
efforts in the US, Europe and Japan and will thus build on the work of
DRI and EAGLES, mentioned above.
The annotation standard will allow multi-linguality and the co-existence
of a multitude of coding schemes. This report provides the basis for the
decision on which existing coding schemes MATE should support. It gives
a broad overview of current schemes and covers all levels under consideration.
The information in this report was collected from the web, from recent
proceedings, and through personal contact. We will continue to search for
schemes that were inadvertently left out of this report. A web version of
this report will be available and regularly updated even after the
deadline of deliverable D1.1.
The results of this report will feed into the implementation of the MATE
workbench (WP3), a toolbox in support of the MATE standard, and will form
the basis of the definition of level mark-up (WP2).
Scheme Evaluation Guidelines
A great deal of research has been done in the field of annotation schemes.
Each scheme therefore has to be examined carefully to decide whether it
will be supported in the MATE project or whether it does not seem reliable
enough and, hence, has to be omitted. To ease this decision, the guidelines
listed below have been used:
-
Existence of coding book
The schemes have to be well-documented. Therefore a coding book has
to be available that describes the purpose, the domain, and the application
for which the scheme has been developed.
-
Number of annotators
The schemes must have been used by a decent number of different coders.
This is because coding schemes that have only been used by their developers
tend to be too subjective and difficult to use.
-
Number of dialogues / utterances / segments
The schemes must have been used for annotating a certain number of
dialogues to prove their usability.
-
Evaluations of scheme
The evaluation of inter-coder agreement reflects the reliability of
the coding scheme. As a common measurement the kappa-value is used.
The kappa-coefficient is computed as:
$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$
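where P(A) represents the probability that the annotators agree, while
P(E) stands for the probability that the coders agree by chance. The
chance agreement is determined as:
$$P(E) = p_1 \cdot p_1 + p_2 \cdot p_2 + \dots + p_n \cdot p_n$$
where $p_i$ is the proportion of all coding decisions that fall into
category $i$, and $n$ is the number of categories.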
Researchers appear to be settling on the interpretation that coding schemes
with overall reliabilities of kappa = 0.8 or higher are good enough
that it is not necessary to try to improve on them, while values between
0.67 and 0.8 allow tentative conclusions to be drawn but indicate that
the scheme should be improved [Carletta 96].
Another measurement that should be mentioned is the alpha-value [Krippendorff
80]. It's calculated as:
$$\alpha = 1 - \frac{D_O}{D_E}$$
where $D_O$ describes the observed disagreements
and $D_E$ describes the expected disagreements.
-
Underlying task
Schemes for annotation are frequently linked to an underlying domain
or task. This might reduce their general usability.
-
List of phenomena annotated
For comparison of schemes and the development of a standard a list
of phenomena is essential.
-
Examples
For a better understanding of schemes examples are essential.
-
Mark-up language
The mark-up language of a scheme has to be known for writing mark-up
translators, for instance. Also it is interesting to see which mark-up
language is used most.
-
Existence of annotation tools
Annotation tools for schemes make annotation easier and are therefore
more likely to be used. Also the tool might be integrated in the MATE workbench.
-
Usability
Schemes should be used in existing systems to show their usability.
-
Contact
In order to gain further information on schemes a contact address is
given.
All schemes that have been surveyed were assessed against these guidelines.
A detailed listing of all schemes can be found in the Appendices.
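The two agreement measures described under "Evaluations of scheme" can be illustrated with a minimal Python sketch. It is not part of the MATE deliverable and rests on several assumptions: exactly two coders, nominal (unordered) categories, no missing labels, and chance agreement computed from the pooled label proportions of both coders, in line with the formula for P(E) given above. The function names `kappa` and `alpha_nominal` and the sample labels are hypothetical.

```python
from collections import Counter


def kappa(coder_a, coder_b):
    """Kappa for two coders, with P(E) taken from pooled category proportions."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n_items = len(coder_a)
    # P(A): observed proportion of items on which both coders agree.
    p_a = sum(a == b for a, b in zip(coder_a, coder_b)) / n_items
    # p_i: proportion of all labels (both coders pooled) in category i.
    pooled = Counter(coder_a) + Counter(coder_b)
    total = 2 * n_items
    # P(E) = p_1 * p_1 + p_2 * p_2 + ... + p_n * p_n
    p_e = sum((count / total) ** 2 for count in pooled.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_a - p_e) / (1 - p_e)


def alpha_nominal(coder_a, coder_b):
    """Alpha = 1 - D_O / D_E for two coders, nominal data, no missing labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two non-empty label lists of equal length")
    n_items = len(coder_a)
    # D_O: observed proportion of items on which the coders disagree.
    d_o = sum(a != b for a, b in zip(coder_a, coder_b)) / n_items
    # D_E: chance that two labels drawn without replacement from the
    # pooled labels of both coders belong to different categories.
    pooled = Counter(coder_a) + Counter(coder_b)
    total = 2 * n_items
    same = sum(count * (count - 1) for count in pooled.values())
    d_e = 1 - same / (total * (total - 1))
    if d_e == 0.0:
        raise ValueError("alpha is undefined when only one category is used")
    return 1 - d_o / d_e


# Hypothetical dialogue-act labels for ten utterances (illustration only).
coder_a = ["statement"] * 5 + ["question"] * 5
coder_b = ["statement"] * 4 + ["question"] * 5 + ["statement"]

print(round(kappa(coder_a, coder_b), 2))
print(round(alpha_nominal(coder_a, coder_b), 2))
```

For these sample labels the coders agree on 8 of 10 utterances and each category receives half of all labels, so P(A) = 0.8 and P(E) = 0.5, which gives a kappa of about 0.6 and an alpha of about 0.62. Under the interpretation cited from [Carletta 96], such a value would fall below the 0.67 threshold for tentative conclusions.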
Further Content of this Document
Chapters 2 to 7 cover the individual levels: communication problems,
co-reference, cross-level issues, dialogue acts, (morpho-)syntax, and
prosody. The surveyed schemes, which are described in the Annexes, are
compared level by level. At the end of each chapter, conclusions about
which schemes to support are drawn from these comparisons.
In chapter 8, a summary of the results of the research on the different
levels is presented and further work that could be done in this field is
outlined.
Last Modification: 28.8.1998 by Marion Klein