Balanced Clustering for Content-based Image Browsing
Christopher Tim Althoff; Adrian Ulges; Andreas Dengel
In: GI-Informatiktage 2011. GI-Informatiktage (Informatik-2011), March 25-26, Bonn, Germany, Gesellschaft für Informatik e.V. 3/2011.
In recent years, the explosive growth of digitally stored image and video data has raised the need for tools that search and organize visual data automatically by content. Browsing environments, which structure image and video collections, are one solution to this problem. Such environments require image clustering techniques that group semantically related images, are highly scalable, and produce balanced structures. We propose a simple and efficient strategy to enforce a more balanced clustering, based on a hierarchical variant of the online k-means algorithm that favors small clusters over larger ones by adapting the prior probability of each cluster. We compare our method to standard hierarchical agglomerative techniques using multiple standard features and real-world datasets, showing that the proposed approach yields clusters of comparable quality while being substantially more balanced and scalable.
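
The idea of biasing online k-means toward small clusters can be illustrated with a minimal sketch. The abstract does not give the exact form of the prior, so the cost used here (squared distance plus `penalty * log(1 + cluster size)`) is an assumption, as are the function name `balanced_online_kmeans` and all of its parameters. The sketch is also flat rather than hierarchical; a hierarchy could be obtained by applying it recursively to each resulting cluster.

```python
import math
import random


def balanced_online_kmeans(points, k, penalty=1.0, seed=0):
    """Single-pass online k-means with a size-dependent assignment cost.

    Assumed cost (not taken from the paper):
        squared_distance(x, center_j) + penalty * log(1 + count_j)
    A larger cluster pays a larger penalty, which plays the role of a
    lower prior probability for that cluster.
    """
    rng = random.Random(seed)
    # Initialise the centers with k randomly chosen points.
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    labels = []

    for x in points:
        # Cost of assigning x to each cluster: distance plus size penalty.
        costs = [
            sum((a - b) ** 2 for a, b in zip(x, c)) + penalty * math.log(1 + n)
            for c, n in zip(centers, counts)
        ]
        j = min(range(k), key=costs.__getitem__)

        # Online update: move the winning center toward x (running mean).
        counts[j] += 1
        rate = 1.0 / counts[j]
        centers[j] = [c + rate * (a - c) for a, c in zip(x, centers[j])]
        labels.append(j)

    return labels, centers, counts
```

With `penalty=0.0` this reduces to plain online k-means; increasing `penalty` trades cluster compactness for more even cluster sizes.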