An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages
Georg Heigold; Günter Neumann; Josef van Genabith
In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2017), Volume 1: Long Papers, April 3-7, Valencia, Spain, Pages 505-515, ISBN 978-1-945626-34-0, Association for Computational Linguistics (ACL), 4/2017.
This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive because they handle rare and unseen words gracefully. We evaluate on 14 languages and observe consistent gains over a state-of-the-art morphological tagger for all languages except English and French, where we match the state of the art. We compare two architectures for computing character-based word vectors, one using recurrent (RNN) and one using convolutional (CNN) nets. We show that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach. Small but systematic gains are observed when the two architectures are combined by ensembling.
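The abstract does not say how the two architectures are ensembled. As a rough illustration only, the sketch below assumes the simplest scheme: average the per-tag probabilities that an RNN-based and a CNN-based tagger assign to a word, then pick the highest-scoring tag. The function name `ensemble_tag`, the tag strings, and the probabilities are hypothetical and are not taken from the paper.

```python
def ensemble_tag(prob_rnn, prob_cnn):
    """Average two per-tag probability distributions for one word.

    Assumption: equal-weight averaging; the paper may combine models differently.
    Returns the best tag and the averaged distribution.
    """
    tags = set(prob_rnn) | set(prob_cnn)
    # A tag missing from one model's output is treated as probability 0.
    avg = {t: 0.5 * (prob_rnn.get(t, 0.0) + prob_cnn.get(t, 0.0)) for t in tags}
    # Iterate in sorted order so ties are broken deterministically.
    best = max(sorted(avg), key=avg.get)
    return best, avg


# Hypothetical outputs of the two taggers for a single word.
prob_rnn = {"NOUN|Case=Nom": 0.55, "NOUN|Case=Acc": 0.45}
prob_cnn = {"NOUN|Case=Nom": 0.30, "NOUN|Case=Acc": 0.70}

best, avg = ensemble_tag(prob_rnn, prob_cnn)
print(best)
```

In this made-up example the two models disagree, and the averaged distribution favors the tag about which the CNN-based model is more confident.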