Towards a Benchmark for Interactive Data Exploration
Philipp Eichmann; Emanuel Zgraggen; Zheguang Zhao; Carsten Binnig; Tim Kraska
In: IEEE Data Engineering Bulletin, Vol. 39, No. 4, Pages 50-61, IEEE, 2016.
Existing benchmarks for analytical database systems such as TPC-DS and TPC-H are designed for static reporting scenarios. The main metric of these benchmarks is the performance of running different SQL queries over a predefined database. In this paper, we argue that such benchmarks are not suitable for evaluating modern interactive data exploration (IDE) systems, which allow data scientists of varying skill levels to manipulate, analyze, and explore large data sets, as well as to build models and apply machine learning at interactive speeds. While query performance is still important for data exploration, we believe that a much better metric would reflect the number and complexity of insights users gain in a given amount of time. This paper discusses challenges of creating such a metric and presents ideas towards a new benchmark that simulates typical user behavior and allows IDE systems to be compared in a reproducible way.