Publication

Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning

Daniel Kondratyuk

In: Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology. Workshop on Computational Research in Phonetics, Phonology, and Morphology (SIGMORPHON-2019), August 1-2, Florence, Italy, Pages 12-18, Association for Computational Linguistics, 8/2019.

Abstract

We present our CHARLES-SAARLAND system for the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology, Task 2: Morphological Analysis and Lemmatization in Context. We leverage the multilingual BERT model and apply several fine-tuning strategies introduced by UDify demonstrating exceptional evaluation performance on morpho-syntactic tasks. Our results show that fine-tuning multilingual BERT on the concatenation of all available treebanks allows the model to learn cross-lingual information that boosts lemmatization and morphology tagging accuracy over fine-tuning it purely monolingually. Unlike UDify, however, we show that when paired with additional character-level and word-level LSTM layers, a second stage of fine-tuning on each treebank individually can improve evaluation performance even further. Out of all submissions for this shared task, our system achieves the highest average accuracy and F1 score in morphology tagging and places second in average lemmatization accuracy.

Projects

DEEPLEE - Deep Learning for End-to-End Applications in Language Technology

Further Links

https://www.aclweb.org/anthology/W19-4203.pdf

W19-4203_Cross-Lingual_Lemmatization_and_Morphology_Tagg.pdf (pdf, 274 KB)