Hybride semantische Suche - eine Kombination aus Fakten- und Dokumentretrieval (Hybrid Semantic Search: A Combination of Fact and Document Retrieval)

Kinga Schumacher

PhD thesis, Universität Potsdam, 2017.


The subject of this doctoral thesis is semantic search in the context of today’s information management systems. These systems include intranets and Web 3.0 applications, as well as many web portals that contain information in heterogeneous formats and structures. On the one hand, they contain data in structured form; on the other hand, they contain documents that are related to this data. These documents, however, are usually only partially structured or completely unstructured. For example, travel portals describe the period, the destination, and the cost of a trip through structured data, while additional information, such as descriptions of the hotel, the destination, or excursions, is in unstructured form. Today’s semantic search engines focus on finding knowledge either in structured form (also called fact retrieval) or in semi-structured or unstructured form, which is commonly referred to as semantic document retrieval. Only a few search engines try to close the gap between these two approaches. Although they search structured and unstructured data simultaneously, the results are either analyzed independently or the search possibilities are highly limited: for example, they might support only specific question patterns. Accordingly, the information available in the system is not fully exploited, and the relationships between individual pieces of content in the respective information systems, as well as complementary information, do not reach the user. To close this gap, this thesis develops and evaluates a new hybrid semantic search approach that combines structured and semi-structured or unstructured content throughout the entire search process. This approach not only finds facts and documents; it also uses the relationships that exist between the different items of structured data at every stage of the search and integrates them into the search results.
If the answer to a query is neither completely structured (like a fact) nor completely unstructured (like a document), this approach provides a query-specific combination of both. However, taking structured as well as semi-structured or unstructured content into account throughout the entire search process poses a special challenge to the search engine. The engine must be able to search facts and documents independently, to combine them, and to rank the differently structured results in an appropriate order. Furthermore, the complexity of the data should not be apparent to the end user. Rather, the contents must be presented in a way that is understandable and easy to interpret, both when the query is formulated and when the results are displayed. The central question of this thesis is whether a hybrid approach can answer the queries on a given database better than semantic document search alone, fact retrieval alone, or any other hybrid search that does not combine these approaches during the search process. The evaluations from the perspective of both the system and the users show that the hybrid semantic search solution developed in this thesis provides better answers than the methods above by combining structured and unstructured content in the search process, and therefore offers an advantage over previous approaches. A user survey shows that the hybrid semantic search is perceived as understandable and preferable for heterogeneously structured datasets.

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence