The semantic search capabilities of the Searchbench go far beyond state-of-the-art text search. Its sentence-semantic analysis methods were developed in the project TAKE (Technologies for Advanced Knowledge Extraction), funded by the BMBF.
The Searchbench is a web application that combines sentence-semantic search (statement search) with full-text, terminology and bibliographic search in digital libraries. The system is domain-independent: even domain terminology (topics) is extracted fully automatically from collections and individual documents. In the Searchbench user interface, information seekers can formulate structured queries of the form "subject-predicate-object", e.g. "method improves precision", or simply "improve precision". The system automatically resolves passive constructions such as "precision is improved by ... method", assigning subject and object their correct semantic roles. Queries may also use predicate synonyms.
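The query handling described above can be illustrated with a minimal sketch. It is not the Searchbench implementation: it assumes sentences have already been parsed into clauses (grammatical subject, verb, complement, voice), and the lemma table `LEMMAS`, the synonym sets `SYNONYMS` and the helpers `normalize`, `parse_query` and `search` are hypothetical names chosen for illustration. Passive clauses are turned into active-voice statements at indexing time, so the query "method improves precision", the shorter "improve precision" and the passive sentence all meet in the same normalized triple.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical toy lexicons; a real system would use a morphological
# analyser and a curated predicate-synonym resource.
LEMMAS = {"improves": "improve", "improved": "improve",
          "enhances": "enhance", "enhanced": "enhance"}
SYNONYMS = {"improve": {"improve", "enhance"}, "enhance": {"improve", "enhance"}}


def lemma(word: str) -> str:
    w = word.lower()
    return LEMMAS.get(w, w)


@dataclass(frozen=True)
class Statement:
    """One subject-predicate-object statement, always stored in active voice."""
    subject: str
    predicate: str  # verb lemma
    obj: str
    sentence: str


def normalize(gram_subject: str, verb: str, complement: str,
              passive: bool, sentence: str) -> Statement:
    """Turn a parsed clause into an active-voice statement.

    `complement` is the direct object of an active clause or the by-agent of a
    passive one. In "precision is improved by the method" the grammatical
    subject is "precision" and the agent is "method"; swapping them yields the
    same statement as "the method improves precision".
    """
    if passive:
        gram_subject, complement = complement, gram_subject
    return Statement(gram_subject.lower(), lemma(verb), complement.lower(), sentence)


def parse_query(query: str) -> Tuple[Optional[str], str, str]:
    """'method improves precision' -> (s, p, o); 'improve precision' -> (None, p, o)."""
    words = query.lower().split()
    if len(words) == 3:
        s, p, o = words
    elif len(words) == 2:
        s = None
        p, o = words
    else:
        raise ValueError("expected 'subject predicate object' or 'predicate object'")
    return s, lemma(p), o


def search(query: str, index: List[Statement]) -> List[str]:
    """Return sentences whose statement matches the query, allowing predicate synonyms."""
    s, p, o = parse_query(query)
    predicates = SYNONYMS.get(p, {p})
    return [st.sentence for st in index
            if (s is None or st.subject == s)
            and st.predicate in predicates
            and st.obj == o]


# Toy index; clauses are assumed to be pre-parsed down to head words.
INDEX = [
    normalize("method", "improves", "precision", False, "Our method improves precision."),
    normalize("precision", "improved", "method", True, "Precision is improved by the method."),
    normalize("method", "enhances", "recall", False, "The method enhances recall."),
]

if __name__ == "__main__":
    # ['Our method improves precision.', 'Precision is improved by the method.']
    print(search("method improves precision", INDEX))
    # Same two sentences: the subject slot is left open.
    print(search("improve precision", INDEX))
    # ['The method enhances recall.'], found via the synonym "enhance".
    print(search("method improves recall", INDEX))
```

Normalizing at indexing time rather than at query time keeps the query side simple: one pattern matches both voices, and synonym expansion only touches the predicate slot.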
The main advantage of the new search paradigm is precision: it searches for complete or partial statements and excludes hits in which the query words merely co-occur without the relation specified in the query. The search predicate can even be omitted: specifying only subject and object enumerates all the different relations between them that are expressed in the text. Search can include or exclude negated sentences, where negation also covers antonyms and some forms of soft negation (such as "rarely"). Search results are always presented in their sentence context; with the Acrobat Reader browser plug-in, this works even in the original PDF view. Queries can be bookmarked and sent via email.
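Statement matching with an optional predicate and polarity filtering can be sketched as follows. The sketch assumes each sentence has already been reduced to a (subject, predicate, object) statement plus a list of modifier words; the lexicons `NEGATORS`, `SOFT_NEGATORS` and `ANTONYM_OF`, and the functions `make_statement`, `search` and `relations`, are hypothetical, and the Searchbench's actual negation handling may differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical toy lexicons, for illustration only.
NEGATORS = {"not", "never", "no"}                # explicit negation
SOFT_NEGATORS = {"rarely", "seldom", "hardly"}   # soft negation
ANTONYM_OF = {"worsen": "improve", "degrade": "improve"}


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str  # base predicate after antonym mapping
    obj: str
    negated: bool
    sentence: str


def make_statement(subject: str, predicate: str, obj: str,
                   modifiers: Iterable[str], sentence: str) -> Statement:
    """Mark a statement as negated if it has an explicit or soft negator, or if
    its predicate is an antonym of a base predicate. (Toy rule: doubly negated
    clauses such as "does not worsen" would need finer treatment.)"""
    negated = any(m in NEGATORS or m in SOFT_NEGATORS for m in modifiers)
    if predicate in ANTONYM_OF:
        predicate = ANTONYM_OF[predicate]
        negated = True
    return Statement(subject, predicate, obj, negated, sentence)


def search(index: List[Statement], subject: str, obj: str,
           predicate: Optional[str] = None,
           include_negated: bool = False) -> List[str]:
    """Match whole statements, not word co-occurrence; predicate=None is a wildcard."""
    return [st.sentence for st in index
            if st.subject == subject and st.obj == obj
            and (predicate is None or st.predicate == predicate)
            and (include_negated or not st.negated)]


def relations(index: List[Statement], subject: str, obj: str) -> Dict[str, List[str]]:
    """With the predicate omitted, group all sentences relating subject and object by relation."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for st in index:
        if st.subject == subject and st.obj == obj:
            key = ("not " if st.negated else "") + st.predicate
            groups[key].append(st.sentence)
    return dict(groups)


INDEX = [
    make_statement("method", "improve", "precision", [], "The method improves precision."),
    make_statement("method", "improve", "precision", ["rarely"], "The method rarely improves precision."),
    make_statement("method", "worsen", "precision", [], "On short texts, the method worsens precision."),
    make_statement("method", "affect", "precision", [], "The method affects precision."),
    # "method" and "precision" merely co-occur here; the statement relates
    # "author" to "precision", so a (method, ?, precision) query never returns it.
    make_statement("author", "report", "precision", [], "The authors report the precision of the method."),
]

if __name__ == "__main__":
    # ['The method improves precision.']: negated sentences are excluded by default.
    print(search(INDEX, "method", "precision", "improve"))
    # Groups keyed 'improve', 'not improve' and 'affect'.
    print(relations(INDEX, "method", "precision"))
```

Because matching works on statements rather than on word positions, the co-occurrence sentence at the end of the index never surfaces, whichever predicate is queried.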
The statement search can be combined with classical full-text search, bibliographic metadata search and faceted search with autosuggestion fields. This makes the system a powerful tool for precision-oriented search in large digital text collections.
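One plausible way to combine these searches is to treat statement-search hits as a set of document ids, narrow that set with full-text and metadata constraints, then count facet values and offer prefix-based suggestions. The sketch below assumes exactly that design; the `Doc` record, its fields, the functions `combined_search`, `facet_counts` and `autosuggest`, and the sample documents are made up for illustration and do not describe the Searchbench's data model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Doc:
    doc_id: str
    year: int
    venue: str
    text: str


def combined_search(docs: Iterable[Doc], statement_hits: Set[str],
                    fulltext: Optional[str] = None,
                    year_from: Optional[int] = None,
                    venue: Optional[str] = None) -> List[Doc]:
    """Intersect statement-search hits with full-text and metadata constraints.

    `statement_hits` is assumed to be the set of document ids returned by the
    statement index for some query; every other argument narrows that set.
    """
    results = []
    for d in docs:
        if d.doc_id not in statement_hits:
            continue
        if fulltext is not None and fulltext.lower() not in d.text.lower():
            continue
        if year_from is not None and d.year < year_from:
            continue
        if venue is not None and d.venue != venue:
            continue
        results.append(d)
    return results


def facet_counts(results: Iterable[Doc], attr: str) -> Dict[object, int]:
    """Count results per value of a metadata field, as a facet panel would show them."""
    return dict(Counter(getattr(d, attr) for d in results))


def autosuggest(docs: Iterable[Doc], attr: str, prefix: str, limit: int = 5) -> List[str]:
    """Suggest distinct field values that start with what the user has typed so far."""
    values = sorted({str(getattr(d, attr)) for d in docs})
    return [v for v in values if v.lower().startswith(prefix.lower())][:limit]


# Made-up sample documents.
DOCS = [
    Doc("d1", 2008, "ACL", "We show that the method improves precision."),
    Doc("d2", 2010, "COLING", "Precision is improved by the method on news text."),
    Doc("d3", 2010, "ACL", "Precision and recall are reported for the method."),
]

if __name__ == "__main__":
    hits = {"d1", "d2"}  # assumed output of a statement query
    print([d.doc_id for d in combined_search(DOCS, hits, year_from=2009)])  # ['d2']
    print(facet_counts(combined_search(DOCS, hits), "venue"))  # {'ACL': 1, 'COLING': 1}
    print(autosuggest(DOCS, "venue", "co"))  # ['COLING']
```

Applying the statement filter first keeps the expensive semantic match as the primary selector, while the cheaper metadata checks only prune an already small candidate set.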
The Searchbench technology does not only improve search: further innovative, semantics-oriented knowledge techniques have been developed that make use of the semantic index, namely automatic, unsupervised domain term extraction, taxonomy extraction and glossary extraction. Their main advantage is that they work domain-independently. Further advanced applications that could build on these basic technologies include automatic question answering, text summarization, and controlled-language and style checking, among many others.
A graphical citation browser in the Searchbench helps users explore related publications by displaying citation sentence information in the graph. Clicking on a node or an edge shows the original citation context in the PDF. Altogether, the system is meant to be used as a workbench for search, hence the name Searchbench.
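The citation browser can be thought of as a directed graph whose edges carry the citing sentences. The following sketch assumes that model; `CitationGraph`, `CitationContext` and the `page` field used to jump into the PDF are hypothetical names, not part of any documented Searchbench API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class CitationContext:
    sentence: str  # the citing sentence
    page: int      # hypothetical: page in the citing PDF, used to open the context


class CitationGraph:
    """Papers are nodes; a directed edge citing -> cited carries the citation sentences."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], List[CitationContext]] = defaultdict(list)
        self._out: Dict[str, Set[str]] = defaultdict(set)
        self._in: Dict[str, Set[str]] = defaultdict(set)

    def add_citation(self, citing: str, cited: str, sentence: str, page: int) -> None:
        self._edges[(citing, cited)].append(CitationContext(sentence, page))
        self._out[citing].add(cited)
        self._in[cited].add(citing)

    def neighbours(self, paper: str) -> Set[str]:
        """Clicking a node: the papers it cites plus the papers citing it."""
        return set(self._out.get(paper, set())) | set(self._in.get(paper, set()))

    def contexts(self, citing: str, cited: str) -> List[CitationContext]:
        """Clicking an edge: the citation sentences to show or open in the PDF."""
        return list(self._edges.get((citing, cited), []))


if __name__ == "__main__":
    g = CitationGraph()
    g.add_citation("A", "B", "We follow the parser of B.", 2)
    g.add_citation("A", "B", "Unlike B, we use no grammar.", 5)
    g.add_citation("C", "A", "A reports higher precision.", 1)
    print(sorted(g.neighbours("A")))  # ['B', 'C']
    for ctx in g.contexts("A", "B"):
        print(ctx.page, ctx.sentence)
```

Keeping the sentences on the edge rather than on the nodes lets a single paper pair accumulate several distinct citation contexts, each with its own location in the citing PDF.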
- "DFG Cluster of Excellence Multimodal Computing and Interaction (M2CI): Robust, Efficient and Intelligent Processing of Text, Speech, Visual Data and High Dimensional Representations; Open Science Web":http://www.mmci.uni-saarland.de/index.php?id=1&L=0
- DELPH-IN: Deep Linguistic Processing with HPSG Initiative