Skip to main content Skip to main navigation


Efficient and Scalable k-Means on GPUs

Clemens Lutz; Sebastian Breß; Tilmann Rabl; Steffen Zeuch; Volker Markl
In: Theo Härder; Ralf Schenkel (Hrsg.). Datenbank-Spektrum, Vol. Schwerpunktbeitrag, No. 13222, Pages 1-13, Springer, 2018.


k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data to knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We identify two main shortcomings of this approach. First, it requires expensive data exchange between proces- sors when switching between the two processing steps point assignment and centroid update. Second, even when processing both steps of k-means on the same processor, points still need to be read two times within an iteration, leading to inefficient use of memory bandwidth. In this paper, we present a novel approach for centroid update that allows us to efficiently process both phases of k-means on GPUs. We fuse point assignment and centroid update to execute one iteration with a single pass over the points. Our evaluation shows that our k-means approach scales to very large data sets. Overall, we achieve up to 20× higher throughput compared to the state-of-the-art approach.