ParDeepBank: Multiple Parallel Deep Treebanking

Daniel Flickinger; Valia Kordoni; Yi Zhang; António Branco; Kiril Simov; Petya Osenova; Catarina Carvalheiro; Francisco Costa; Sérgio Castro
In: Proceedings of the Eleventh International Workshop on Treebanks and Linguistic Theories. International Workshop on Treebanks and Linguistic Theories (TLT-11), 11th, November 30 - December 1, Lisbon, Portugal, Pages 97-108, Edições Colibri, Lisbon, 2012.


This paper describes the creation of an innovative and highly parallel treebank of three languages from different language groups: English, Portuguese and Bulgarian. The linguistic analyses for the three languages are done by compatible parallel automatic HPSG grammars using the same formalism, tools and implementation strategy. The final analysis for each sentence in each language consists of (1) a detailed feature structure analysis by the corresponding grammar and (2) derivative information such as derivation trees, constituent trees, dependency trees, and Minimal Recursion Semantics structures. The parallel sentences are extracted from the Penn Treebank and translated into the other languages. The Parallel Deep Bank (ParDeepBank) has many potential applications, including HPSG grammar development, machine translation, and the evaluation of parsers on comparable data.



