Model-based Classification of Unstructured Data Sources

Kerstin Bach; Klaus-Dieter Althoff
In: Isabelle Bichindaritz; Cindy Marling; Stefania Montani (Eds.). Proceedings Workshop Case-Based Reasoning in the Health Sciences. ICCBR Workshop on Case-Based Reasoning in the Health Sciences (CBR-HS-13), located at International Conference on Case-Based Reasoning ICCBR, July 8-11, Saratoga Springs, NY, USA, Pages 50-59, ICCBR, 7/2013.


In this paper we present an approach that uses the knowledge provided in Case-Based Reasoning (CBR) systems to classify unknown and unstructured textual data. In the course of developing distributed CBR systems, heterogeneous knowledge sources are mined to populate the knowledge containers of various CBR systems. We show how available knowledge, especially the knowledge stored in the vocabulary knowledge container, can be applied to identify relevant experiences and distribute them among various CBR systems. The work presented is part of the SEASALT architecture, which provides a framework for developing distributed, agent-based CBR systems. We focus on the implementation of the knowledge mining task within SEASALT and apply the approach in a travel medicine application domain. Our underlying data source is a user forum in which various travel medicine topics are discussed, and we show that our approach outperforms the C4.5 and SVM classifiers in terms of accuracy and efficiency in identifying the relevant forum entries from which cases are created.
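
The idea of using vocabulary knowledge to route forum entries to CBR systems can be illustrated with a minimal sketch. It is not the implementation described in the paper: the topic names, the term sets, the `min_hits` threshold, and the simple term-count scoring are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical vocabularies: one term set per CBR system (topic agent).
# These terms are illustrative only and are not taken from the paper.
VOCABULARIES = {
    "disease": {"malaria", "dengue", "hepatitis", "fever"},
    "medication": {"malarone", "doxycycline", "vaccine", "dose"},
    "region": {"kenya", "thailand", "brazil", "india"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text, vocabularies=VOCABULARIES, min_hits=1):
    """Return the names of the CBR systems whose vocabulary matches the text.

    Each system is scored by how many tokens of the text occur in its
    vocabulary. Systems below `min_hits` are dropped, and the rest are
    ordered by score (highest first), with ties broken by name.
    """
    counts = Counter(tokenize(text))
    scores = {
        name: sum(counts[term] for term in terms)
        for name, terms in vocabularies.items()
    }
    relevant = [(name, score) for name, score in scores.items() if score >= min_hits]
    return [name for name, _ in sorted(relevant, key=lambda pair: (-pair[1], pair[0]))]


if __name__ == "__main__":
    post = "Is Malarone a good choice against malaria in Kenya? Malaria risk seems high."
    print(classify(post))
    print(classify("Where can I buy cheap flights?"))
```

A forum entry that matches no vocabulary yields an empty list and would be treated as irrelevant for case creation in this sketch. An entry that matches several vocabularies is distributed to each matching system.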
