Learning of Knowledge-Intensive Similarity Measures in Case-Based Reasoning

Armin Stahl

PhD thesis, Technische Universität Kaiserslautern, ISBN 3-89825-886-6, 2004.


Retrieving information from a set of data is a classical research topic in computer science. Depending on the requirements of the particular application field, however, different approaches to realising retrieval functionality have been developed. On the one hand, traditional database management systems assume that users are aware of their information needs and are able to express these needs exactly in a standardised query language. Here, the retrieval objective is the efficient selection of data records that exactly match the users’ queries. On the other hand, in other application fields such well-formulated queries cannot be presumed, either because one has to deal with unstructured data (e.g. documents on the World Wide Web), or because users are not aware of their actual information needs. One such application field is a problem-solving technique developed in Artificial Intelligence (AI), called Case-Based Reasoning (CBR). Here, collected data records, called cases, represent information about problems solved in the past, and the basic idea is to reuse this knowledge when solving new problems. Therefore, cases useful for the current problem-solving episode have to be retrieved from all collected cases. The major problem of this retrieval task is that users are usually unable to express exact retrieval criteria describing which cases are useful and which are not. However, they should be able to describe their current problem situation. The selection of corresponding useful cases is then left to the CBR system, which retrieves cases to be used for solving the problem by employing so-called similarity measures. Basically, a similarity measure represents a heuristic for estimating the a priori unknown utility of a case. As is typical for heuristics, the quality of a similarity measure can be improved by incorporating as much knowledge about the particular application domain as possible.
However, the definition of such knowledge-intensive similarity measures leads to the well-known knowledge acquisition problem of AI. Unfortunately, the difficulty of acquiring and formalising specific domain knowledge often prevents the use of these very powerful kinds of similarity measures in commercial applications. The objective of this thesis is the development of a framework and algorithms based on Machine Learning methods in order to facilitate the definition of knowledge-intensive similarity measures in CBR. The basic idea of this framework is to extract the required domain knowledge from special training data that can be acquired more easily than the actual knowledge itself.
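
To make the role of a similarity measure in retrieval concrete, the following is a minimal sketch, not the thesis's own formulation. It assumes a weighted average of per-attribute (local) similarities, a form commonly used in CBR; the attribute names, value ranges, and weights are hypothetical and chosen only for illustration. The weights stand in for domain knowledge: changing them changes which case is judged most useful.

```python
from typing import Dict, List, Tuple

Case = Dict[str, float]  # attribute name -> numeric value (illustrative only)


def local_similarity(q: float, c: float, value_range: float) -> float:
    """Similarity of two numeric values in [0, 1]; 1.0 means identical."""
    return max(0.0, 1.0 - abs(q - c) / value_range)


def global_similarity(query: Case, case: Case,
                      weights: Dict[str, float],
                      ranges: Dict[str, float]) -> float:
    """Weighted average of local similarities (assumed form, see lead-in)."""
    total = sum(weights.values())
    return sum(
        w * local_similarity(query[a], case[a], ranges[a])
        for a, w in weights.items()
    ) / total


def retrieve(query: Case, case_base: List[Case],
             weights: Dict[str, float],
             ranges: Dict[str, float], k: int = 1) -> List[Tuple[float, Case]]:
    """Return the k cases with the highest estimated utility for the query."""
    scored = [(global_similarity(query, c, weights, ranges), c) for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical product-recommendation example.
case_base = [
    {"price": 1100.0, "ram": 8.0},   # slightly too expensive, right memory
    {"price": 1000.0, "ram": 4.0},   # right price, too little memory
]
query = {"price": 1000.0, "ram": 8.0}
ranges = {"price": 1000.0, "ram": 16.0}

# Without domain knowledge: all attributes count equally.
uniform = {"price": 1.0, "ram": 1.0}
# With (assumed) domain knowledge: price matters far more than memory.
price_sensitive = {"price": 5.0, "ram": 1.0}

best_uniform = retrieve(query, case_base, uniform, ranges)[0]
best_informed = retrieve(query, case_base, price_sensitive, ranges)[0]
```

With uniform weights the first case is ranked highest, whereas the price-sensitive weights favour the second. Setting such weights by hand is an instance of the knowledge acquisition problem described above.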

Dissertation_Stahl.pdf (pdf, 2 MB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence