Domain-specific Classification Methods for Disfluency Detection

Sebastian Germesin; Peter Poller; Tilman Becker

In: Janet Fletcher; Deborah Loakes; Roland Göcke; Denis Burnham; Michael Wagner (Eds.). Interspeech 2008. Conference in the Annual Series of Interspeech Events (INTERSPEECH-08), September 22-26, Brisbane, QLD, Australia, ISBN 1990-9772, International Speech Communication Association, 9/2008.


Speech disfluencies are very common in everyday life and considerably affect NLP systems, which makes systems that can detect or even repair them highly desirable. Previous research achieved good results in disfluency detection, but only on subsets of the disfluency types. The aim of this study was to develop a technology that can cope with a broad range of disfluency types. A thorough investigation of our corpus led us to a detection design in which basic rule-matching techniques are complemented with machine learning and N-gram based approaches. In this paper, we describe the different detection techniques, each specialized for its own disfluency domain, and the results we obtained.

Further Links

germesin_dsfl_is08(2).pdf (pdf, 156 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence