FITing-Tree: A Data-aware Index Structure
Alex Galakatos; Michael Markovitch; Carsten Binnig; Rodrigo Fonseca; Tim Kraska
In: Peter A. Boncz; Stefan Manegold; Anastasia Ailamaki; Amol Deshpande; Tim Kraska (Eds.). Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019. ACM SIGMOD International Conference on Management of Data (SIGMOD-2019), June 30 - July 5, Amsterdam, Netherlands, Pages 1189-1206, ACM, 2019.
Index structures are one of the most important tools that DBAs leverage to improve the performance of analytics and transactional workloads. However, building several indexes over large datasets can often become prohibitive and consume valuable system resources. In fact, a recent study showed that indexes created as part of the TPC-C benchmark can account for 55% of the total memory available in a modern DBMS. This overhead consumes valuable and expensive main memory, and limits the amount of space available to store new data or process existing data. In this paper, we present FITing-Tree, a novel form of a learned index which uses piece-wise linear functions with a bounded error specified at construction time. This error knob provides a tunable parameter that allows a DBA to FIT an index to a dataset and workload by being able to balance lookup performance and space consumption. To navigate this tradeoff, we provide a cost model that helps determine an appropriate error parameter given either (1) a lookup latency requirement (e.g., 500ns) or (2) a storage budget (e.g., 100MB). Using a variety of real-world datasets, we show that our index is able to provide performance that is comparable to full index structures while reducing the storage footprint by orders of magnitude.
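To make the idea of a piece-wise linear index with a bounded error more concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation from the paper: the greedy "feasible slope range" segmentation, the function names `build_segments` and `lookup`, and the tuple layout of a segment are all hypothetical choices made for this example. It assumes the keys are sorted, distinct numbers held in an in-memory list.

```python
import math
from bisect import bisect_left, bisect_right


def build_segments(keys, error):
    """Greedily cover sorted, distinct keys with linear segments.

    Each segment is (start_key, start_pos, slope) and predicts
    pos ~= start_pos + slope * (key - start_key), off by at most `error`
    positions for every key that the segment covers.
    """
    segments = []
    n = len(keys)
    i = 0
    while i < n:
        x0, y0 = keys[i], i
        # Range of slopes that keeps every key seen so far within the bound.
        lo, hi = 0.0, math.inf
        j = i + 1
        while j < n:
            dx = keys[j] - x0
            dy = j - y0
            new_lo = max(lo, (dy - error) / dx)
            new_hi = min(hi, (dy + error) / dx)
            if new_lo > new_hi:
                break  # no single slope fits key j as well: close the segment
            lo, hi = new_lo, new_hi
            j += 1
        # A one-key segment has no upper limit on the slope; use 0 there.
        slope = lo if math.isinf(hi) else (lo + hi) / 2
        segments.append((x0, y0, slope))
        i = j
    return segments


def lookup(keys, segments, starts, key, error):
    """Return the position of `key` in `keys`, or None if it is absent.

    `starts` is the list of segment start keys, in segment order.
    """
    s = bisect_right(starts, key) - 1  # last segment starting at or before key
    if s < 0:
        return None
    x0, y0, slope = segments[s]
    pred = y0 + slope * (key - x0)
    # Only the window of positions allowed by the error bound is searched.
    lo = max(0, math.floor(pred - error))
    hi = min(len(keys), math.ceil(pred + error) + 1)
    idx = bisect_left(keys, key, lo, hi)
    if idx < hi and keys[idx] == key:
        return idx
    return None


if __name__ == "__main__":
    keys = [i * i // 3 + i for i in range(500)]  # synthetic, strictly increasing
    error = 8
    segments = build_segments(keys, error)
    starts = [seg[0] for seg in segments]
    print(len(keys), "keys covered by", len(segments), "segments")
    print(lookup(keys, segments, starts, keys[123], error))
```

In this sketch, `error` plays the role of the tunable knob described in the abstract: a larger value generally lets each segment cover more keys, so fewer segments need to be stored, while the final search window of roughly `2 * error` positions grows and makes each lookup do more work.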