DFKI-LT - Shallow, Deep and Hybrid Processing with UIMA and Heart of Gold

Ulrich Schäfer
Shallow, Deep and Hybrid Processing with UIMA and Heart of Gold
Proceedings of the LREC-2008 Workshop Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP, 6th International Conference on Language Resources and Evaluation, pages 43-50, Marrakesh, Morocco, ELRA, 2008.

The Unstructured Information Management Architecture (UIMA) is a generic platform for processing text and other unstructured, human-generated data. For text, it has been proposed and is being used mainly for shallow natural language processing (NLP) tasks such as part-of-speech tagging, chunking, named entity recognition and shallow parsing. However, it is commonly accepted that extracting interesting structure and semantics from documents requires deeper methods. Therefore, one future goal for UIMA is the inclusion of openly available, deep linguistic parsing technology for generating semantic representations from documents.

Heart of Gold is a lightweight, XML-based middleware architecture that has been developed for this purpose. It supports hybrid (i.e. combined shallow and deep) processing workflows over multiple NLP components to increase robustness and exploit synergy, as well as linguistic resources for multiple languages. The notion of explicit transformation between component input and output enables flexible interaction of existing NLP components. Heart of Gold provides for both tightly coupled (same process) and loosely coupled (via networked services) processing modes. Assuming familiarity with UIMA, we introduce Heart of Gold, then propose and discuss hybrid integration scenarios in the context of UIMA. Possible applications include precision-oriented question answering, deep information extraction and opinion mining, textual entailment checking and machine translation.
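
The hybrid idea described above can be illustrated with a minimal sketch. It is not Heart of Gold's or UIMA's actual API: every function name, XML element name and the toy lexicon below are assumptions made purely for illustration. The sketch shows a shallow component producing XML annotations, an explicit transformation step that rewrites them into the input format a deep component expects, and a fallback to the shallow result when the deep component fails, which is one simple way to gain robustness.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def shallow_tokenizer(text: str) -> ET.Element:
    """Hypothetical shallow component: one <token> per whitespace-separated word."""
    root = ET.Element("tokens")
    for i, word in enumerate(text.split()):
        ET.SubElement(root, "token", id=str(i)).text = word
    return root


def tokens_to_parser_input(tokens: ET.Element) -> ET.Element:
    """Explicit transformation: tokenizer output -> (made-up) deep parser input format."""
    root = ET.Element("parser-input")
    for tok in tokens.iter("token"):
        ET.SubElement(root, "w", ref=tok.get("id"), form=tok.text.lower())
    return root


def deep_parser(parser_input: ET.Element) -> Optional[ET.Element]:
    """Stand-in for a deep parser: returns None for words outside a toy lexicon."""
    lexicon = {"dogs", "bark"}  # assumption: tiny lexicon to simulate coverage gaps
    words = [w.get("form") for w in parser_input.iter("w")]
    if not all(word in lexicon for word in words):
        return None
    analysis = ET.Element("analysis", depth="deep")
    analysis.text = " ".join(words)
    return analysis


def hybrid_analyze(text: str) -> ET.Element:
    """Run shallow processing, try deep processing, fall back to shallow on failure."""
    tokens = shallow_tokenizer(text)
    deep = deep_parser(tokens_to_parser_input(tokens))
    if deep is not None:
        return deep
    fallback = ET.Element("analysis", depth="shallow")
    fallback.text = " ".join(t.text for t in tokens.iter("token"))
    return fallback
```

Calling `hybrid_analyze` on a sentence covered by the toy lexicon yields an element marked `depth="deep"`; any other sentence yields the shallow fallback. Because the transformation is a separate function, either component could be swapped without changing the other.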
Files: BibTeX, uima4hog2008.pdf