Determining the Origin and Structure of Person Names
Proceedings of the 7th International Conference on Language Resources and Evaluation, Valletta, Malta, European Language Resources Association (ELRA), 5/2010
This paper presents HENNA (Hybrid Person Name Analyzer), a novel system for identifying the language origin of person names and analyzing their linguistic structure. We apply maximum entropy (ME) based classification to language origin identification and achieve very promising performance. We show that word-internal character sequences provide surprisingly strong evidence for predicting the language origin of person names. Our approach is context-, language- and domain-independent and can thus be easily adapted to person names in or from other languages. Furthermore, we provide a novel strategy for handling origin ambiguities or multiple origins in a name. HENNA also provides a person name parser for analyzing the linguistic and knowledge structures of person names. All knowledge about a person name in HENNA is modelled in a person-name ontology, including the relationships between language origins, the linguistic features and grammars of person names of a specific language, and the interpretation of name elements. The approaches presented here are useful extensions of the named entity recognition task.
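The idea that word-internal character sequences carry evidence of language origin can be illustrated with a minimal sketch. This is not HENNA's implementation: it assumes that "ME" stands for maximum entropy, uses character n-grams of length 1 to 3 as the only features, and trains a small softmax (multinomial logistic) model by plain stochastic gradient ascent. The function names (`char_ngrams`, `train`, `predict`), the hyperparameters, and the six-name training set `TOY` are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def char_ngrams(name, n_min=1, n_max=3):
    """Character n-grams per name token; ^ and $ mark token boundaries."""
    feats = []
    for token in name.lower().split():
        padded = "^" + token + "$"
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                feats.append(padded[i:i + n])
    return feats

def scores(weights, labels, feats):
    """Softmax over summed feature weights (the maximum-entropy model form)."""
    raw = {y: sum(weights[y][f] for f in feats) for y in labels}
    top = max(raw.values())  # subtract the max for numerical stability
    exp = {y: math.exp(v - top) for y, v in raw.items()}
    z = sum(exp.values())
    return {y: e / z for y, e in exp.items()}

def train(data, epochs=100, lr=0.05):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    labels = sorted({y for _, y in data})
    weights = {y: defaultdict(float) for y in labels}
    examples = [(char_ngrams(name), y) for name, y in data]
    for _ in range(epochs):
        for feats, gold in examples:
            probs = scores(weights, labels, feats)
            for y in labels:
                # Gradient per feature occurrence: observed minus expected.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[y][f] += lr * grad
    return weights, labels

def predict(model, name):
    """Return the most probable origin and the full distribution."""
    weights, labels = model
    probs = scores(weights, labels, char_ngrams(name))
    return max(probs, key=probs.get), probs

# Hypothetical toy data, far too small for any real evaluation.
TOY = [
    ("Giovanni Rossi", "Italian"),
    ("Luca Bianchi", "Italian"),
    ("Hiroshi Tanaka", "Japanese"),
    ("Yuki Nakamura", "Japanese"),
    ("Hans Schmidt", "German"),
    ("Wolfgang Schneider", "German"),
]

model = train(TOY)
label, distribution = predict(model, "Hiroshi Tanaka")
print(label, distribution)
```

Because `predict` returns a full probability distribution rather than a single label, a name whose probability mass is spread over several origins could be flagged as ambiguous. Whether HENNA handles ambiguity this way is not stated in the abstract.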
Files: BibTeX, 763_Paper.pdf, 763.html