MATE Deliverable D1.1

Supported Coding Schemes

(Carnegie Mellon University, Department of Psychology)

Coding book:
Author: Brian MacWhinney
Title: The CHILDES Project: Tools for Analyzing Talk

Number of annotators:
The CHAT system is a de facto standard for the transcription and coding of child language in a number of European and non-European languages. A great many annotators have therefore used CHAT for different purposes, which makes it difficult to state an exact number. Most of the annotators were linguists.

Number of annotated dialogues:
A very large number of dialogues has been, and is still being, annotated with the CHAT coding scheme. This number exceeds the number of dialogues in the database, because many child language projects use CHAT without contributing to the overall CHILDES database. The internationally recognized CHILDES database includes transcripts from over forty major projects in English and additional data from 19 other languages. The additional languages are Brazilian Portuguese, Chinese (Mandarin), Chinese (Cantonese), Danish, Dutch, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Mambila, Polish, Russian, Spanish, Swedish, Tamil, Turkish, and Ukrainian. The total size of the database is now approximately 160 million characters (160 MB).

Evaluation of scheme:
As a result of its worldwide use, CHAT is continuously evaluated and updated to meet the needs of different languages and different users. However, we are not aware of any statistical or quantitative evaluation of its reliability.

Underlying task:
Analysis of child language.

List of phenomena annotated:
Speech act codes:

*MOT:    are you okay?
%spa:    $x:dhs $i:yq
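
The layout of the example above can be read mechanically. The following Python sketch is only an illustration, not part of the CHILDES tools: it assumes, from this single example, that a line starting with `*` is a speaker tier, that a line starting with `%` is a dependent tier attached to the preceding speaker line, and that speech act codes are space-separated `$prefix:value` tokens. The function names `parse_tiers` and `split_codes` are hypothetical.

```python
import re

# Pattern assumed from the example only: a marker ('*' for a speaker tier,
# '%' for a dependent tier), a short label, a colon, then the content.
TIER_RE = re.compile(r"^([*%])(\w+):\s*(.*)$")


def parse_tiers(text):
    """Group each speaker line with the dependent tiers that follow it."""
    utterances = []
    for line in text.splitlines():
        match = TIER_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        marker, label, content = match.groups()
        if marker == "*":
            utterances.append({"speaker": label, "text": content, "tiers": {}})
        elif utterances:
            # A '%' line is attached to the most recent speaker line.
            utterances[-1]["tiers"][label] = content
    return utterances


def split_codes(tier_content):
    """Split a string such as '$x:dhs $i:yq' into (prefix, value) pairs."""
    pairs = []
    for token in tier_content.split():
        prefix, _, value = token.lstrip("$").partition(":")
        pairs.append((prefix, value))
    return pairs


sample = "*MOT:    are you okay?\n%spa:    $x:dhs $i:yq\n"
parsed = parse_tiers(sample)
codes = split_codes(parsed[0]["tiers"]["spa"])
print(parsed[0]["speaker"], codes)
```

The sketch keeps the codes as opaque prefix/value pairs and does not interpret them; the meaning of each code is defined in the coding book.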

Mark-up language:
CHAT's own format.

Existence of annotation tools:
The CHILDES system contains several separate yet integrated tools, grouped into two major components. The first is a full-fledged, ASCII-oriented editor (CED, Childes EDitor), specifically designed to facilitate the editing of CHAT files and to check the accuracy of transcriptions. The second, in fact a collection of smaller tools, is a set of computer programs called CLAN (Child Language ANalysis), which serve different analysis purposes. The full system is presented in detail in MacWhinney (1991) and illustrated through practical examples in Sokolov and Snow (1994).

MacWhinney, B. (1991). The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Erlbaum.
Sokolov, J. and C. Snow (Eds.). (1994). Handbook of research in language development using CHILDES. Hillsdale, NJ: Erlbaum.

Used in the CHILDES project.

Contact person:
Brian MacWhinney (macw@cmu.edu)

Last Modification: 27.8.1998 by Marion Klein