Panagiotis Ritsos, Kai Xu (editor).
Computer Graphics & Visual Computing (CGVC) 2020. Computer Graphics and Visual Computing (CGVC-2020) September 10-11 London England United Kingdom The Eurographics Association 2020.
Nearest neighbor search algorithms on GPUs have been improving for years. Starting with tree-based approaches in the middle
70’s, state-of-the-art methods use hash-based or grid-based methods. Leveraging high-performance hardware functionality
decreases runtime of these search algorithms. Furthermore, memory consumption has been decreased significantly as well
using Shared Memory. In the scope of these enhancements, particles have been reordered by different constraints that simplify
neighbor processing. However, inspecting the existing algorithms reveals underused capabilities caused by algorithm desing.
Exploiting these capabilities in a smart way can increase occupancy and efficiency on GPUs. In this paper, we present a
neighbor processing approach that is based on dynamic load balancing. We rely on a lightweight workload-analysis phase that
is applied during neighbor processing to distribute work throughout all warps in a thread group on-the-fly. In different domains,
the neighbor function is often symmetric and, thus, commutative in each argument. In contrast to prior work, we use this
domain knowledge to reduce the number of memory accesses considerably. Measurements of the newly introduced features on
our evaluation scenarios show a comparable runtime performance to state-of-the-art methods. Increasing the overall workload
by processing million-particle domains leads to significant improvements in terms of runtime. At the same time, we minimize
global memory consumption to enable more particles to be processed compared to current approaches.