Efficient k-Means on GPUs

Clemens Lutz; Sebastian Breß; Tilmann Rabl; Steffen Zeuch; Volker Markl
In: Proceedings of the 14th International Workshop on Data Management on New Hardware. International Workshop on Data Management on New Hardware (DaMoN-2018), 14th, located at ACM SIGMOD International Conference on Management of Data, June 10-15, Houston, TX, USA, ISBN 978-1-4503-5853-8/18/06, ACM, New York, NY, USA, 2018.


k-Means is a versatile clustering algorithm that is widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We show that this approach has two main drawbacks. First, it separates the two algorithm phases over different processors, which requires an expensive data exchange between devices. Second, even when both phases are computed on the GPU, the same data are read twice per iteration, leading to inefficient use of memory bandwidth. In this paper, we describe a new approach that executes k-means in a single data pass per iteration. We propose a new algorithm to update centroids that allows us to perform both phases efficiently on GPUs. Thereby, we remove data transfers within each iteration. We fuse both phases to eliminate artificial synchronization barriers, and thus compute k-means in a single data pass. Overall, we achieve up to 20× higher throughput compared to the state-of-the-art approach.
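
The idea of fusing the assignment and update phases can be illustrated with a small sequential sketch. This is not the authors' GPU algorithm; it is a plain Python illustration, under the assumption of squared Euclidean distance, of how each point can be read once per iteration: the point is assigned to its nearest centroid and immediately accumulated into that cluster's running sum. The function name `kmeans_iteration_fused` and the handling of empty clusters (keeping the old centroid) are hypothetical choices made for this example.

```python
def kmeans_iteration_fused(points, centroids):
    """Run one k-means iteration in a single pass over `points`.

    Illustrative CPU sketch only: each point is read once, assigned to
    its nearest centroid, and added to that cluster's running sum.
    """
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]  # per-cluster coordinate sums
    counts = [0] * k                        # per-cluster point counts

    for p in points:
        # Assignment phase: nearest centroid by squared Euclidean distance.
        best = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        # Update phase, fused into the same pass: accumulate the point.
        for d in range(dim):
            sums[best][d] += p[d]
        counts[best] += 1

    # Finalize: divide sums by counts. An empty cluster keeps its old
    # centroid (an assumption of this sketch, not taken from the paper).
    new_centroids = []
    for c in range(k):
        if counts[c] == 0:
            new_centroids.append(list(centroids[c]))
        else:
            new_centroids.append([s / counts[c] for s in sums[c]])
    return new_centroids


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans_iteration_fused(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

A two-pass formulation would instead store an assignment for every point in the first pass and read the points again in a second pass to compute the new centroids. On a GPU, the accumulation step additionally has to be made safe across many concurrent threads, which this sequential sketch does not address.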

