A Data Layout Transformation for Vectorizing Compilers

Arsène Pérard-Gayot; Richard Membarth; Philipp Slusallek; Simon Moll; Roland Leißa; Sebastian Hack
In: Proceedings of the 2018 Workshop on Programming Models for SIMD/Vector Processing (WPMVP-2018), co-located with PPoPP 2018: Principles and Practice of Parallel Programming, February 24, Vösendorf / Vienna, Austria, Pages 7:1-7:8, ACM, 2/2018.


Modern processors are often equipped with vector instruction sets. Such instructions operate on multiple elements of data at once, and greatly improve performance for specific applications. A programmer has two options to take advantage of these instructions: writing manually vectorized code, or using an auto-vectorizing compiler. In the latter case, the programmer only has to place annotations that instruct the auto-vectorizing compiler to vectorize a particular piece of code. Thanks to auto-vectorization, the source program remains portable, and the programmer can focus on the task at hand instead of the low-level details of intrinsics programming. However, the performance of the vectorized program strongly depends on the precision of the analyses performed by the vectorizing compiler. In this paper, we improve the precision of these analyses by selectively splitting stack-allocated variables of a structure or aggregate type. Without this optimization, automatic vectorization slows the execution down compared to the scalar, non-vectorized code. When this optimization is enabled, we show that the vectorized code can be as fast as hand-optimized, manually vectorized implementations.
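
To make the idea of splitting a stack-allocated aggregate concrete, the following is a minimal hand-written sketch in C. It is not the paper's compiler transformation or its implementation. The type `vec2`, the functions `kernel_aggregate`, `kernel_split` and `outputs_match`, and the constant `N` are hypothetical names chosen only for illustration. The first loop keeps a local struct in its body; the second replaces that struct with one scalar variable per field, which is the kind of form a vectorizer's analyses can typically reason about more precisely.

```c
#include <assert.h>

#define N 8

/* Hypothetical aggregate type, used only for illustration. */
struct vec2 { float x; float y; };

/* Before: each iteration stores its intermediate values in a
 * stack-allocated struct. */
static void kernel_aggregate(const float *in, float *out, float k) {
    for (int i = 0; i < N; ++i) {
        struct vec2 v;          /* stack-allocated aggregate */
        v.x = in[i] * k;
        v.y = in[i] + k;
        out[i] = v.x * v.y;
    }
}

/* After: the aggregate is split by hand into one scalar per field.
 * The computation is unchanged; only the data layout differs. */
static void kernel_split(const float *in, float *out, float k) {
    for (int i = 0; i < N; ++i) {
        float v_x = in[i] * k;
        float v_y = in[i] + k;
        out[i] = v_x * v_y;
    }
}

/* Runs both versions on the same input and returns 1 if every
 * output element matches, 0 otherwise. */
static int outputs_match(void) {
    float in[N], a[N], b[N];
    for (int i = 0; i < N; ++i) {
        in[i] = (float)i;       /* small integers are exact in float */
    }
    kernel_aggregate(in, a, 2.0f);
    kernel_split(in, b, 2.0f);
    for (int i = 0; i < N; ++i) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    assert(a[1] == 6.0f);       /* (1 * 2) * (1 + 2) */
    return 1;
}
```

Both functions compute the same values, so the split only changes how the intermediate data is laid out. Whether a given compiler vectorizes either loop, and how fast the result is, depends on the compiler and target and is not claimed here.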


