DFKI-LT - IDX
Information Retrieval with IDX
IDX is a professional text indexing system built on high-quality linguistic knowledge. It indexes large amounts of German or English text with a high level of consistency. IDX determines the base words of derivations and compounds, so that every instance of a search term is found systematically. Used in conjunction with other tools, IDX therefore forms an ideal basis for high-performance information retrieval.
IDX offers the following services:
- Full text retrieval
- Proper name recognition
- Interfaces to information retrieval systems
Full Text Retrieval with IDX
Words are reduced to their base forms (example: houses → house; went → go). For German, spelling errors are recognized using the Primus correction manager, a well-recognized tool in professional spelling correction. The part of speech of each base form is preserved and can be used for information retrieval.
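As a rough illustration of indexing by base form, the following Python sketch maps each token to a (base form, part of speech) pair and records where it occurs. The `LEXICON` table and the `build_index` function are hypothetical stand-ins invented for this example; they are not part of IDX, whose linguistic resources are not described here.

```python
from collections import defaultdict

# Hypothetical toy lexicon: surface form -> (base form, part of speech).
# A real system would rely on a full morphological component instead.
LEXICON = {
    "houses": ("house", "NOUN"),
    "house": ("house", "NOUN"),
    "went": ("go", "VERB"),
    "goes": ("go", "VERB"),
}


def build_index(tokens):
    """Map each (base form, part of speech) pair to its token positions."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        # Unknown words fall back to their lowercased surface form, no POS.
        base, pos = LEXICON.get(token.lower(), (token.lower(), None))
        index[(base, pos)].append(position)
    return dict(index)


index = build_index(["Houses", "went", "house"])
```

Because "Houses" and "house" share the entry `("house", "NOUN")`, a search for either form finds both occurrences.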
If required, stop words (e.g. "the", "and") can be excluded from indexing using a dictionary. These words are usually not wanted in an index.
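Dictionary-based stop word exclusion can be sketched in a few lines of Python. The word list and the function name `remove_stop_words` are assumptions made for illustration only.

```python
# Hypothetical stop word dictionary; a real one would be much larger.
STOP_WORDS = frozenset({"the", "and"})


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not listed in the stop word dictionary."""
    return [token for token in tokens if token.lower() not in stop_words]


content_words = remove_stop_words(["The", "farm", "and", "the", "house"])
```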
Compound words and derived words are additionally reduced to their (meaningful) components (e.g., farmhouse → farm, house), so that the individual words, or compounds formed from them, can also be searched. Compounds are analyzed either after word identification or, if they are already lexicalized, by means of a relational dictionary.
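The two routes to compound analysis can be illustrated with a small Python sketch: a lookup in a relational dictionary for lexicalized compounds, and otherwise a recursive split against a component lexicon. Both tables, the minimum component length, and the splitting strategy are assumptions made for this example and do not describe how IDX itself analyzes compounds.

```python
# Hypothetical component lexicon and relational dictionary.
COMPONENT_LEXICON = {"farm", "house", "boat"}
RELATIONAL_DICTIONARY = {"cupboard": ["cup", "board"]}


def split_compound(word, lexicon=COMPONENT_LEXICON, min_len=3):
    """Return the components of a compound, or [word] if none are found."""
    word = word.lower()
    # Lexicalized compounds are resolved directly from the dictionary.
    if word in RELATIONAL_DICTIONARY:
        return list(RELATIONAL_DICTIONARY[word])
    # Otherwise try every split point, preferring the longest known head.
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head not in lexicon:
            continue
        if tail in lexicon:
            return [head, tail]
        rest = split_compound(tail, lexicon, min_len)
        if len(rest) > 1:
            return [head] + rest
    return [word]


parts = split_compound("farmhouse")
```

Indexing "farmhouse" under "farm" and "house" as well as under the full word lets a query for either component retrieve it.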
Abbreviations and their extensions are indexed into the same entry (e.g., USA - United States - United States of America).
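One simple way to picture a shared entry for an abbreviation and its expansions is a variant table that maps every form to one canonical key. The table and the function `index_entry` below are hypothetical and only illustrate the idea.

```python
# Hypothetical variant table: every known form points to one shared entry.
CANONICAL = "United States of America"
VARIANTS = {
    "usa": CANONICAL,
    "united states": CANONICAL,
    "united states of america": CANONICAL,
}


def index_entry(term):
    """Return the shared index entry for a term, or the term itself."""
    normalized = " ".join(term.lower().split())  # collapse whitespace
    return VARIANTS.get(normalized, term)


same_entry = index_entry("USA") == index_entry("United States")
```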
Among words with the same spelling, selections can be made according to word classes. For example in the case of "race", the verb "to race" can be blocked (thereby permitting only the noun "race").
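The "race" example can be sketched in Python as a filter over the possible readings of a word. The `READINGS` table, the tag names, and the function `select_readings` are assumptions for illustration.

```python
# Hypothetical table of readings: surface form -> (base form, word class).
READINGS = {
    "race": [("race", "NOUN"), ("race", "VERB")],
}


def select_readings(word, blocked=frozenset()):
    """Return the readings of a word whose word class is not blocked."""
    return [
        (base, word_class)
        for base, word_class in READINGS.get(word.lower(), [])
        if word_class not in blocked
    ]


noun_only = select_readings("race", blocked={"VERB"})
```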
Multi-word terms, as far as they can be lexically identified, e.g. on the basis of dedicated translation dictionaries, will also be provided for retrieval (example: legal actions → legal action).
Synonyms or hypernyms can be assigned to the same index entries, if desired, e.g. gate → door; door → building. Other associative relations can be defined freely.
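The following Python sketch shows one possible reading of such relations: a term is indexed under itself and under every term reachable through the relation table. Whether relations are followed transitively in IDX is not stated here, so the transitive traversal, the `RELATIONS` table, and the function name are all assumptions.

```python
# Hypothetical relation table built from the examples in the text.
RELATIONS = {"gate": ["door"], "door": ["building"]}


def index_entries(term, relations=RELATIONS):
    """Return the term plus all terms reachable through its relations."""
    entries = []
    pending = [term]
    while pending:
        current = pending.pop()
        if current in entries:
            continue  # guards against cycles in the relation table
        entries.append(current)
        pending.extend(relations.get(current, []))
    return entries


entries = index_entries("gate")
```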
A word-based translation dictionary allows for the recognition of foreign terms based on English or German search terms. For instance: Secretary of State → Außenminister (USA); Chancellor of the Exchequer → Finanzminister (GB); goal → Ziel; goal → Tor.
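A word-based translation lookup of this kind might look like the Python sketch below. The data structure, the optional region label, and the function `foreign_terms` are hypothetical; only the example pairs come from the text above.

```python
# Hypothetical translation table: search term -> (translation, region note).
TRANSLATIONS = {
    "secretary of state": [("Außenminister", "USA")],
    "chancellor of the exchequer": [("Finanzminister", "GB")],
    "goal": [("Ziel", None), ("Tor", None)],
}


def foreign_terms(term):
    """Return all translations recorded for a search term."""
    return [word for word, _region in TRANSLATIONS.get(term.lower(), [])]


candidates = foreign_terms("goal")
```

An ambiguous term such as "goal" yields several candidates, so a query can be expanded with all of them.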
Proper Name Recognition with IDX
Another component of IDX is the recognition of proper names, a feature developed for the languages German and English. It uses semantic information to examine a text for the presence of possible proper names: the surrounding words are examined for certain pre-stored patterns, e.g. "Sweets and Knuckles Ltd.".
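As a simplified stand-in for such pre-stored patterns, the Python sketch below uses a regular expression that looks for capitalized words followed by a company suffix. The pattern, the suffix list, and the function name are assumptions for illustration; the semantic information that IDX uses is not modeled here.

```python
import re

# A capitalized word, used as the building block of a candidate name.
NAME = r"[A-Z][A-Za-z]+"

# Hypothetical pattern: capitalized words, optionally joined by "and" or "&",
# followed by a company suffix that acts as the trigger.
COMPANY_PATTERN = re.compile(
    rf"{NAME}(?:\s+(?:and\s+|&\s+)?{NAME})*\s+(?:Ltd\.|Inc\.|GmbH)"
)


def find_company_names(text):
    """Return all substrings that match the company name pattern."""
    return COMPANY_PATTERN.findall(text)


names = find_company_names(
    "The contract was signed by Sweets and Knuckles Ltd. in May."
)
```

A purely surface-based pattern like this one over-matches in some contexts, for instance when a capitalized sentence-initial word directly precedes the name.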
IDX Interfaces to Information Retrieval Systems
An XML-based interface is provided to integrate the results into an appropriate full-text retrieval system without much effort. More extensive adaptations for the retrieval of formatted data can be carried out easily and efficiently on demand.
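The XML format itself is not specified here. Purely as an illustration of how index results could be serialized for a retrieval system, the Python sketch below uses invented element and attribute names (`index`, `entry`, `occurrence`, `base`, `pos`, `position`); none of them are taken from IDX.

```python
import xml.etree.ElementTree as ET


def index_to_xml(entries):
    """Serialize (base form, part of speech, positions) triples as XML."""
    root = ET.Element("index")
    for base, pos, positions in entries:
        entry = ET.SubElement(root, "entry", {"base": base, "pos": pos})
        for position in positions:
            ET.SubElement(entry, "occurrence", {"position": str(position)})
    return ET.tostring(root, encoding="unicode")


xml_text = index_to_xml([("house", "NOUN", [0, 2]), ("go", "VERB", [1])])
```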
IDX can be licensed for commercial purposes. For more information, please use the contact information given below.
Project Manager: Stephan Busemann (Stephan.Busemann@dfki.de)
Contact: Stefania Racioppa (Stefania.Racioppa@dfki.de)