ARNE - A tool for Named Entity Recognition from Arabic Text

Carolin Shihadeh; Günter Neumann

In: Fourth Workshop on Computational Approaches to Arabic Script-based Languages (CAASL-4), held at the Tenth Biennial Conference of the Association for Machine Translation in the Americas (AMTA), November 1, San Diego, CA, USA, 2012.


In this paper, we study the problem of finding named entities in Arabic text. For this task, we present our pipeline software for Arabic named entity recognition (ARNE), which performs tokenization, morphological analysis, Buckwalter transliteration, part-of-speech tagging, and recognition of person, location and organisation named entities. In our first attempt to recognize named entities, we used a simple, fast and language-independent gazetteer lookup approach. In our second attempt, we used the morphological analysis provided by the pipeline to remove affixes before lookup, which improved performance. The pipeline presented in this paper can serve as the basis for a future named entity recognition system that uses not only gazetteers but also morphological information and part-of-speech tags.
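The abstract contrasts plain gazetteer lookup with lookup after affix removal. The following is a minimal sketch of that contrast, not the ARNE implementation. It assumes the tokens are already in Buckwalter transliteration and uses a hypothetical toy gazetteer and a hypothetical list of single-letter proclitics. ARNE takes affix information from its morphological analysis, whereas this sketch simply strips candidate prefixes from the string.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer keyed by Buckwalter-transliterated forms.
# The entries are illustrative only, not the gazetteers used in ARNE.
GAZETTEER: Dict[str, str] = {
    "mHmd": "PER",      # Muhammad
    "AlqAhrp": "LOC",   # Cairo
    "brlyn": "LOC",     # Berlin
    "Alywnskw": "ORG",  # UNESCO
}

# Hypothetical list of single-letter proclitics (Buckwalter):
# w "and", f "so", b "in/with", l "for", k "like".
PREFIXES: Tuple[str, ...] = ("w", "f", "b", "l", "k")


def candidate_forms(token: str, max_strip: int = 2, min_len: int = 2) -> List[str]:
    """Return the surface form followed by forms with up to `max_strip` prefixes removed."""
    forms = [token]
    frontier = [token]
    for _ in range(max_strip):
        next_frontier: List[str] = []
        for form in frontier:
            for prefix in PREFIXES:
                stem = form[len(prefix):]
                # Only strip if the prefix matches and a reasonably long stem remains.
                if form.startswith(prefix) and len(stem) >= min_len and stem not in forms:
                    forms.append(stem)
                    next_frontier.append(stem)
        frontier = next_frontier
    return forms


def tag_surface(tokens: List[str]) -> List[Tuple[str, str]]:
    """First attempt: exact gazetteer lookup on the surface token."""
    return [(tok, GAZETTEER.get(tok, "O")) for tok in tokens]


def tag_with_affix_removal(tokens: List[str]) -> List[Tuple[str, str]]:
    """Second attempt: try the surface form first, then progressively stripped forms."""
    tagged: List[Tuple[str, str]] = []
    for tok in tokens:
        label = "O"
        for form in candidate_forms(tok):
            if form in GAZETTEER:
                label = GAZETTEER[form]
                break
        tagged.append((tok, label))
    return tagged


if __name__ == "__main__":
    # Token glosses: "and Cairo", "in Berlin", "for Muhammad", "traveled"
    tokens = ["wAlqAhrp", "bbrlyn", "lmHmd", "sAfr"]
    print(tag_surface(tokens))
    # [('wAlqAhrp', 'O'), ('bbrlyn', 'O'), ('lmHmd', 'O'), ('sAfr', 'O')]
    print(tag_with_affix_removal(tokens))
    # [('wAlqAhrp', 'LOC'), ('bbrlyn', 'LOC'), ('lmHmd', 'PER'), ('sAfr', 'O')]
```

Blind prefix stripping like this can over-generate. A word that genuinely begins with b or l may be cut down into an unrelated gazetteer entry. This is one reason to take affix boundaries from a morphological analyzer rather than from string matching.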


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence