Determining the Origin and Structure of Person Names

Yu Fu, Feiyu Xu, Hans Uszkoreit

In: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias (eds.). Proceedings of the 7th International Conference on Language Resources and Evaluation. International Conference on Language Resources and Evaluation (LREC-2010), May 19-21, Valletta, Malta, ISBN 2-9517408-6-7, European Language Resources Association (ELRA), 5/2010.


This paper presents HENNA (Hybrid Person Name Analyzer), a novel system for identifying the language origin and analyzing the linguistic structure of person names. We apply maximum entropy (ME) classification to language origin identification and achieve very promising performance. We show that word-internal character sequences provide surprisingly strong evidence for predicting the language origin of person names. Our approach is context-, language- and domain-independent and can thus be easily adapted to person names in or from other languages. Furthermore, we provide a novel strategy for handling origin ambiguities and multiple origins within a single name. HENNA also provides a person name parser that analyzes the linguistic and knowledge structures of person names. All knowledge about a person name in HENNA is modelled in a person-name ontology, which covers the relationships between language origins, the linguistic features and grammars of person names in a specific language, and the interpretation of name elements. The approaches presented here are useful extensions of the named entity recognition task.
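
To illustrate how word-internal character sequences can serve as evidence for language origin, the following is a minimal sketch, not HENNA's implementation. The abstract does not specify the feature set or the ME toolkit, so everything here is an assumption: the n-gram lengths, the boundary markers "^" and "$", the function names char_ngrams, predict_proba and train_maxent, and the four toy training names. The classifier is a plain multinomial logistic regression, which is one standard formulation of a maximum entropy model.

```python
import math
from collections import defaultdict


def char_ngrams(name, n_min=2, n_max=3):
    """Return character n-grams for each token of a name.

    Hypothetical feature scheme: "^" and "$" mark token start and end,
    so prefixes and suffixes are distinguishable from inner sequences.
    """
    feats = []
    for token in name.lower().split():
        padded = "^" + token + "$"
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                feats.append(padded[i:i + n])
    return feats


def predict_proba(weights, name):
    """Softmax over per-label sums of n-gram weights."""
    feats = char_ngrams(name)
    scores = {label: sum(w.get(f, 0.0) for f in feats)
              for label, w in weights.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def train_maxent(samples, epochs=200, lr=0.5):
    """Fit the model with simple stochastic gradient ascent on the log-likelihood."""
    labels = sorted({label for _, label in samples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for name, gold in samples:
            feats = char_ngrams(name)
            probs = predict_proba(weights, name)
            for label in labels:
                # Gradient: observed indicator minus predicted probability.
                grad = (1.0 if label == gold else 0.0) - probs[label]
                for f in feats:
                    weights[label][f] += lr * grad / len(feats)
    return weights


# Toy data invented for illustration only; not taken from the paper.
TRAIN = [
    ("Hans Müller", "German"),
    ("Fritz Schneider", "German"),
    ("Wei Zhang", "Chinese"),
    ("Xiaoming Li", "Chinese"),
]

if __name__ == "__main__":
    model = train_maxent(TRAIN)
    for name, _ in TRAIN:
        print(name, predict_proba(model, name))
```

Because the features are computed from the name string alone, the sketch needs no surrounding context, which mirrors the context-independence claimed in the abstract. A realistic system would need far more training names per language and some form of regularization.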


Further Links

763_Paper.pdf (pdf, 759 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence