


General Document Extraction Technology

The GDET project will create a software prototype that automatically extracts data from diverse kinds of unstructured documents and automates their further processing, considerably reducing the staff effort these tasks currently require. Such a system is in high demand in industry, where it would allow management processes to be optimized in ways that current market solutions do not support.

In large companies, several hundred thousand documents typically need to be classified, indexed, and stored every day. Today, users must categorize, index, and archive these documents under time pressure, applying a multitude of rules. As companies set up modern process management structures for assessing diverse kinds of documents, they express a need for semi-automatic, legally compliant archiving of those documents based on a standardized taxonomy.

More and more enterprises use electronic archiving systems for their documents. The personnel costs of capturing and assigning documents nevertheless persist.

The GDET project aims to create a technology that extends automatic e-mail management so that any business document is assigned, fully automatically, to one or several classes of a standard taxonomy, e.g. invoice, complaint letter, or reminder. In addition, process-specific properties will be extracted from the documents, supporting their largely automatic further processing and compliant archiving.

GDET is a partner project between the Berlin-based SMEs docs&rules GmbH and StoneOne AG. DFKI is supporting docs&rules GmbH with extraction and classification results.


docs&rules GmbH, StoneOne AG, DFKI GmbH


docs&rules GmbH