SMES - A fast and robust information extraction core system
                 for real-world German text processing

Günter Neumann, LT-lab, DFKI, Saarbrücken

NEWS (date March, 2006)

SMES is an information extraction core system for real-world German text processing. The basic design criterion of the
system is to provide a set of powerful, robust, and efficient natural language components and generic linguistic
knowledge sources that can easily be customized for processing different tasks in a flexible manner.
The main components of SMES are:

SMES has been implemented in Allegro CommonLisp (ACL) and runs on Sun, Linux, and Windows. It processes natural-language text in real time, robustly and efficiently (about 4 msec/token on a standard PC). SMES has very high lexical and syntactic coverage: about 1 million word forms and almost all German clause types. It also performs morphosyntactic agreement and determination of verb group morphology.

SMES now also runs with CLISP (e.g., using CYGWIN), a freely available CommonLisp implementation. To make things a bit easier, I have defined a specific system definition file; see the notes below. The performance of SMES under Franz Allegro CL and CLISP is approximately the same. For a brief overview, see also smes-pd-docu.pdf, which also serves as a (brief) user guide.

SMES can be used freely for research purposes. Please fill out and send me the research license contract, which helps me keep track of who is using the system; it also lets me announce new material to you more directly.


The major references for SMES are:

Application areas

SMES has already been used in a number of different areas at universities, research institutes, and in industry (mostly in departments of computer science and computational linguistics), e.g., Universities of Potsdam (Berlin, Germany), Rostock (Germany), Karlsruhe (AIFB, Germany), CMU (USA), Stanford (USA), Edinburgh (UK), Berlin (Germany), Dortmund (Germany), EML (Germany), Würzburg, FHG München, Koblenz; companies: Ontoprise (Germany), Gecko (Rostock), FGAN/FKIE (Germany), Ecircle (Germany). The main application areas are:

1.   information extraction
2.   text mining
3.   ontology extraction & semantic web
4.   statistical approaches to semantic analysis
5.   open-domain question answering

Example text analysis:
About the Software

I would be very happy if you could download the research license agreement and send me a signed copy before downloading the system. This helps me keep track of how the system is used, and on that basis I plan to improve it.

The Lisp version of SMES is a standalone version implemented in Common Lisp. Although it comes with no server/client interfaces or major bells and whistles, it provides the major linguistic and processing functionality. Several documents are included that help with writing grammars and extending the lexicon.

The system processes ASCII-based input documents and returns a list of all found expressions. It also creates an HTML-based marked-up version of the input file on-line. L-SMES consists of:

Installing SMES

Once you have downloaded the SMES gzipped tar file, gunzip and untar it. This creates a folder pd-smes in your current folder. pd-smes contains several subfolders, among them docu (which contains the documentation) and systems (which contains several system definitions).

Now go to the systems folder. Here you only have to edit one file, namely smes-sys.lisp.
NEW! If you prefer to use CLISP, edit the file clisp-smes-sys.lisp instead.

In this file you only need to edit two variables:

1. *smes-sys-root*:   set it to the pathname that ends in pd-smes. This should automatically set all other pathnames properly.

2. *image-pathname*: set it to the folder in which the SMES image file should be stored. You can create your own
   image file for SMES using the function (dump-smes), which creates an image file named SMES in the folder
   you have set via *image-pathname*. With CLISP, you can then simply start SMES via clisp -M YOUR-IMAGE-FILE
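
As an illustration, the two settings might look as follows once edited. This is only a sketch: both paths are hypothetical, and the defining forms actually used in smes-sys.lisp (defvar, defparameter, or setq; strings or pathnames; with or without a trailing slash) may differ. Keep whatever form the file already uses and change only the values.

```lisp
;; Hypothetical values; adapt both paths to your own installation.
;; Assumption: both variables take directory names as strings with a trailing slash.
(defparameter *smes-sys-root* "/home/me/lisp/pd-smes/")  ; must end in pd-smes
(defparameter *image-pathname* "/home/me/lisp/images/")  ; folder where (dump-smes) writes the image SMES
```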


Loading and compiling SMES

1. Load the system file smes-sys.lisp (or clisp-smes-sys.lisp).
2. The first time you use SMES, call the function (compile-smes).
3. Once SMES is compiled, load it by calling the function (load-smes).
4. Since loading might take some time, you should create your own image using the function (dump-smes).
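
The four steps above correspond roughly to the following session at the Lisp prompt. It is a sketch that assumes Lisp was started in the systems folder and that the SMES functions are accessible from the current package once the system file has been loaded; if they are not, a package prefix may be needed.

```lisp
;; Step 1: load the system definition
;; (use "clisp-smes-sys.lisp" instead when running under CLISP).
(load "smes-sys.lisp")

;; Step 2: compile the system; only needed the first time SMES is used.
(compile-smes)

;; Step 3: load the compiled system.
(load-smes)

;; Step 4 (optional): write an image named SMES into the folder given by
;; *image-pathname*, so that later sessions can start from the image.
(dump-smes)
```

In later sessions step 2 is skipped. With CLISP, a dumped image can be started directly via clisp -M YOUR-IMAGE-FILE, as noted in the installation section.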

Initialization of SMES
(for a more complete description see file ../pd-smes/docu/smes-pd-docu.pdf which is contained in the tar file of SMES)

Once you have loaded SMES, call the function (init-smes) to initialize the system. This function switches directly into the main package of SMES, namely "SMES".
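
At the Lisp prompt, initialization and a quick check of the package switch could look like this. The check via *package* is plain Common Lisp and is only a suggestion; it is not part of SMES.

```lisp
;; Initialize SMES; according to the description above, this also
;; makes the SMES package the current package.
(init-smes)

;; Standard Common Lisp: return the name of the current package,
;; which lets you confirm that the switch has happened.
(package-name *package*)
```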

Now follow the description in section 6 of ../pd-smes/docu/smes-pd-docu.pdf, which tells you how to call the main top-level functions.