Publication

Learning of Semantic Sibling Group Hierarchies - K-Means vs. Bi-Secting-K-Means

Marko Brunzel

In: I.Y. Song; J. Eder; T.M. Nguyen (Eds.). Data Warehousing and Knowledge Discovery. Proceedings of the 9th International Conference (DaWaK-2007), September 3-7, 2007, Regensburg, Germany. International Conference on Data Warehousing and Knowledge Discovery (DaWaK), Pages 365-374, LNCS, Vol. 4654, Springer, 2007.

Abstract

In [3] we showed that our XTREEM-SG (Xhtml TREE Mining for Sibling Groups) method finds semantically motivated sibling groups of higher quality than prior methods. This was done with a flat K-Means clustering, which worked as a kind of lossy compression facility. As a result, the human expert has to inspect a potentially long list of sets. In this publication we improve this situation by performing a hierarchical clustering. We expect similar clusters to be arranged in a hierarchy, so the human expert only needs to inspect the hierarchy down to the desired granularity of structuring. This method is called XTREEM-SGH (XTREEM for Sibling Group Hierarchies). We investigate how the quality of results obtained by a divisive hierarchical clustering method (Bi-Secting-K-Means) compares to the quality obtained by a flat, simple K-Means clustering. The clusterings are evaluated on external criteria, namely two gold standard ontologies. We have to revise the statement of Steinbach, Karypis and Kumar [16]: our finding is that K-Means clustering is better than Bi-Secting-K-Means clustering for finding semantic sibling groups based on the Group-By-Path [3] method.
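
To illustrate the two clustering strategies the abstract compares, here is a minimal Python sketch of flat K-Means and of Bi-Secting-K-Means. It is not the XTREEM-SGH implementation. The function names, the Euclidean distance, the rule of always splitting the largest cluster, and the toy 2D points are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance (assumed metric, chosen for simplicity).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    # Component-wise mean of a non-empty list of equal-length tuples.
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, iters=20, seed=0):
    """Flat K-Means (Lloyd's algorithm). Returns a list of k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct positions as initial centroids
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

def bisecting_kmeans(points, k, seed=0):
    """Divisive clustering: repeatedly split one cluster with 2-Means.

    Returns (leaves, splits). Each entry of `splits` is a tuple
    (parent, left, right), so the splits record the hierarchy top-down.
    Assumption: the largest cluster is always the one chosen for splitting.
    """
    leaves = [list(points)]
    splits = []
    while len(leaves) < k:
        leaves.sort(key=len)
        parent = leaves.pop()          # largest remaining cluster
        if len(parent) < 2:            # nothing left to split
            leaves.append(parent)
            break
        left, right = kmeans(parent, 2, seed=seed)
        if not left or not right:      # degenerate split, e.g. duplicate points
            leaves.append(parent)
            break
        splits.append((parent, left, right))
        leaves.extend([left, right])
    return leaves, splits

# Toy data (hypothetical): three small groups of 2D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

flat = kmeans(points, 3)
leaves, splits = bisecting_kmeans(points, 3)
```

The flat variant returns only a list of clusters, whereas the bisecting variant also records each split, which is what would let an expert stop inspecting at a chosen depth. Comparing the quality of the two against a gold standard, as the paper does, is outside the scope of this sketch.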