Project

SProUT

Shallow Processing with Unification and Typed Feature Structures

SProUT is a system for the partial analysis of texts. It is used in particular, though not exclusively, for named entity recognition (NER) and opinion mining. Basic named entity recognition covers, among others, persons, locations, date and currency expressions, functions, companies, and organizations. In addition, SProUT can recognize relations between named entities and analyse, for example, "Theodore Roosevelt, President of the United States of America" as a single complex, coherent entity. Going further, SProUT instances can be "cascaded": the results of one instance are processed further by a subsequent instance. In this way, complex relationships can be extracted from texts.
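
The cascading idea can be illustrated with a small, language-neutral sketch. The Python code below does not use SProUT or its Java API; the names `Annotation`, `recognize_entities`, `relate_entities`, and `cascade`, as well as the toy lexicon, are hypothetical and chosen only for illustration. It shows a first stage that marks simple entities and a second stage that consumes those results to build a complex entity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    kind: str  # e.g. "person", "function", "location" (hypothetical labels)
    text: str  # the matched surface string


# A stage receives the text plus all annotations produced so far
# and returns an extended annotation list.
Stage = Callable[[str, List[Annotation]], List[Annotation]]


def recognize_entities(text: str, annotations: List[Annotation]) -> List[Annotation]:
    """First stage: toy lexicon lookup standing in for simple NER."""
    lexicon = {
        "Theodore Roosevelt": "person",
        "President": "function",
        "United States of America": "location",
    }
    found = [Annotation(kind, name) for name, kind in lexicon.items() if name in text]
    return annotations + found


def relate_entities(text: str, annotations: List[Annotation]) -> List[Annotation]:
    """Second stage: combine earlier results into one complex entity."""

    def of_kind(kind: str) -> List[Annotation]:
        return [a for a in annotations if a.kind == kind]

    combined = []
    for person in of_kind("person"):
        for function in of_kind("function"):
            for location in of_kind("location"):
                phrase = f"{person.text}, {function.text} of the {location.text}"
                if phrase in text:
                    combined.append(Annotation("person_with_function", phrase))
    return annotations + combined


def cascade(text: str, stages: List[Stage]) -> List[Annotation]:
    """Run the stages in order, feeding each one the previous results."""
    annotations: List[Annotation] = []
    for stage in stages:
        annotations = stage(text, annotations)
    return annotations


text = "Theodore Roosevelt, President of the United States of America, spoke."
result = cascade(text, [recognize_entities, relate_entities])
```

Here the second stage never inspects the raw text for names itself; it only works with the annotations handed over by the first stage, which is the essence of a cascade.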

SProUT currently recognizes named entity expressions in German, English, French, Italian, Spanish, and Dutch with high quality. Linguistic resources for other languages are integrated on a continuous basis.

SProUT is implemented in Java and C and provides a Java API, so the system can be easily integrated into other applications. SProUT processes text files and delivers structured results in XML format. For grammar developers, SProUT offers a development and test platform with a convenient graphical user interface, so that language resources can be adapted to individual requirements.