Novel Properties and Well-Tried Performance of EM-Based Multivariate Clustering

Detlef Prescher

In: Proceedings of the EuroConference on Recent Advances in Natural Language Processing (RANLP-01), International Conference on Recent Advances in Natural Language Processing (RANLP), Tzigov Chark, Bulgaria, September 5-7, 2001, pages 216-222.


We present three novel properties of EM-based multivariate clustering: simplified re-estimation formulas, a simple pruning technique, and an invariance property that preserves the characteristics of the given empirical distribution. Evaluation on two tasks shows that EM-based multivariate clustering models require only twice the storage space of the original sample and that these models yield reliable estimates for unknown data. Moreover, we refer to selected experiments showing that EM-based multivariate clustering improves several real-world applications.
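The abstract does not spell out the model. A common instance of EM-based multivariate clustering is the two-dimensional latent class model P(x, y) = Σ_c P(c) P(x|c) P(y|c), trained on pair frequencies f(x, y). The sketch below assumes that model and uses standard EM re-estimation, not necessarily the paper's simplified formulas. All function names and the toy counts are hypothetical. It also checks one invariance that holds for this model: after every EM step, the model marginal Σ_c P(c) P(x|c) equals the empirical relative frequency of x (and likewise for y). Whether this is exactly the paper's invariance property is an assumption.

```python
import random
from collections import defaultdict


def random_init(counts, k, seed=0):
    """Hypothetical helper: random positive start values for k classes."""
    rng = random.Random(seed)
    xs = sorted({x for x, _ in counts})
    ys = sorted({y for _, y in counts})

    def rand_dist(keys):
        w = [rng.random() + 0.1 for _ in keys]  # strictly positive weights
        s = sum(w)
        return {key: v / s for key, v in zip(keys, w)}

    p_c = list(rand_dist(range(k)).values())
    return p_c, [rand_dist(xs) for _ in range(k)], [rand_dist(ys) for _ in range(k)]


def em_step(counts, p_c, p_x_c, p_y_c):
    """One EM iteration for P(x, y) = sum_c P(c) P(x|c) P(y|c).

    counts maps (x, y) -> observed frequency f(x, y).
    """
    k = len(p_c)
    total = sum(counts.values())
    n_c = [0.0] * k                                # expected class counts
    n_xc = [defaultdict(float) for _ in range(k)]  # expected (x, c) counts
    n_yc = [defaultdict(float) for _ in range(k)]  # expected (y, c) counts
    for (x, y), f in counts.items():
        # E-step: posterior P(c | x, y), weighted by the pair frequency.
        joint = [p_c[c] * p_x_c[c].get(x, 0.0) * p_y_c[c].get(y, 0.0) for c in range(k)]
        z = sum(joint)
        for c in range(k):
            post = f * joint[c] / z
            n_c[c] += post
            n_xc[c][x] += post
            n_yc[c][y] += post
    # M-step: relative-frequency re-estimation from the expected counts.
    new_p_c = [n / total for n in n_c]
    new_p_x_c = [{x: v / n_c[c] for x, v in n_xc[c].items()} for c in range(k)]
    new_p_y_c = [{y: v / n_c[c] for y, v in n_yc[c].items()} for c in range(k)]
    return new_p_c, new_p_x_c, new_p_y_c


def model_marginal_x(p_c, p_x_c, x):
    """Model marginal P(x) = sum_c P(c) P(x|c)."""
    return sum(p_c[c] * p_x_c[c].get(x, 0.0) for c in range(len(p_c)))


# Invented toy verb-object counts, for illustration only.
TOY_COUNTS = {
    ("eat", "apple"): 4, ("eat", "bread"): 3,
    ("drive", "car"): 5, ("drive", "bus"): 2, ("eat", "car"): 1,
}

if __name__ == "__main__":
    params = random_init(TOY_COUNTS, k=2)
    for _ in range(20):
        params = em_step(TOY_COUNTS, *params)
    p_c, p_x_c, _ = params
    total = sum(TOY_COUNTS.values())
    for x in ("eat", "drive"):
        empirical = sum(f for (xx, _), f in TOY_COUNTS.items() if xx == x) / total
        print(x, round(model_marginal_x(p_c, p_x_c, x), 6), round(empirical, 6))
```

The marginal check holds by construction: summing the M-step estimate P(c) P(x|c) = Σ_y f(x, y) P(c|x, y) / N over c gives Σ_y f(x, y) / N, the empirical frequency of x, whatever the starting values.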


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence