Jakub Piskorski, Petr Homola, Małgorzata Marciniak, Agnieszka Mykowiecka, Adam Przepiórkowski, Marcin Woliński
Information Extraction for Polish Using the SProUT Platform
Proceedings of the International Conference on Intelligent Information Systems: New Trends in Intelligent Information Processing and Web Mining, Zakopane, Poland, May 2004
The aim of this article is to present the initial results of adapting SProUT, a multilingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by integrating Morfeusz, an external morphological analyzer for Polish, and various ways of coping with the lack of extensive gazetteers for Polish. The main sections report on initial experiments in applying the adapted system to the Information Extraction task of identifying various classes of Named Entities in financial and medical texts, perhaps the first such Information Extraction effort for Polish.
