Revisiting Reuse in Main Memory Database Systems
Kayhan Dursun; Carsten Binnig; Ugur Çetintemel; Tim Kraska
In: Semih Salihoglu; Wenchao Zhou; Rada Chirkova; Jun Yang; Dan Suciu (Eds.). Proceedings of the 2017 ACM International Conference on Management of Data. ACM SIGMOD International Conference on Management of Data (SIGMOD-2017), May 14-19, Chicago, IL, USA, Pages 1275-1289, ACM, 2017.
Reusing intermediates in databases to speed up analytical query processing has been studied in the past. Existing solutions typically require intermediate results of individual operators to be materialized into temporary tables before they can be considered for reuse in subsequent queries. However, these approaches are fundamentally ill-suited for use in modern main memory databases. The reason is that modern main memory DBMSs are typically limited by the bandwidth of the memory bus, so query execution is heavily optimized to keep tuples in the CPU caches and registers. Adding materialization operations to a query plan therefore not only adds traffic to the memory bus but, more importantly, destroys cache- and register-locality opportunities, resulting in high performance penalties. In this paper we study a novel reuse model for intermediates, which caches internal physical data structures materialized during query processing (due to pipeline breakers) and externalizes them so that they become reusable for upcoming operations. We focus on hash tables, the most commonly used internal data structure in main memory databases to perform join and aggregation operations. As queries arrive, our reuse-aware optimizer reasons about the reuse opportunities for hash tables, employing cost models that take into account hash table statistics together with the CPU and data movement costs within the cache hierarchy. Experimental results based on our HashStash prototype demonstrate performance gains of $2\times$ for typical analytical workloads with no additional overhead for materializing intermediates.
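
The core idea of the abstract, keeping the hash table that a join builds anyway and handing it to later queries, can be illustrated with a minimal Python sketch. This is not the HashStash implementation: the names `HashTableCache`, `get_or_build`, and `hash_join` are hypothetical, the cache is keyed only by table name and join column, and the sketch ignores filter predicates, partial reuse, and the cost models the paper describes.

```python
from collections import defaultdict


class HashTableCache:
    """Hypothetical cache of build-side hash tables, keyed by (table name, join column)."""

    def __init__(self):
        self._tables = {}
        self.builds = 0  # counts how often a hash table had to be built from scratch

    def get_or_build(self, table_name, rows, key):
        cache_key = (table_name, key)
        if cache_key not in self._tables:
            # The build phase is a pipeline breaker: the hash table is
            # materialized regardless, so keeping it costs no extra pass.
            table = defaultdict(list)
            for row in rows:
                table[row[key]].append(row)
            self._tables[cache_key] = table
            self.builds += 1
        return self._tables[cache_key]


def hash_join(cache, build_name, build_rows, probe_rows, key):
    """Equi-join that reuses a cached build-side hash table when one exists."""
    table = cache.get_or_build(build_name, build_rows, key)
    # Probe phase: look up each probe row and merge it with its matches.
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]


customers = [{"cid": 1, "name": "Ada"}, {"cid": 2, "name": "Bob"}]
orders_q1 = [{"cid": 1, "total": 30}, {"cid": 2, "total": 5}]
orders_q2 = [{"cid": 1, "total": 7}]

cache = HashTableCache()
result_q1 = hash_join(cache, "customers", customers, orders_q1, "cid")
result_q2 = hash_join(cache, "customers", customers, orders_q2, "cid")  # reuses the table
```

In this toy setting the second query skips the build phase entirely, so `cache.builds` stays at 1. A real system would also have to decide, per query, whether a cached table matches the required predicates and whether reusing it is cheaper than rebuilding, which is the role of the reuse-aware optimizer mentioned above.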