
Publication

Demonstrating PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

Pratyush Agnihotri; Carsten Binnig
In: Volker Markl; Joseph M. Hellerstein; Azza Abouzied (eds.). Companion of the 2025 International Conference on Management of Data, SIGMOD/PODS 2025, Berlin, Germany, June 22-27, 2025. ACM SIGMOD International Conference on Management of Data (SIGMOD), pages 7-10, ACM, 2025.

Abstract

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of the performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream Processing Systems (SPS) use operator parallelism and the available resources to process the massive workloads of modern applications. Existing benchmarking systems focus on analyzing SPS using queries with sequential operator pipelines within a homogeneous, centralized environment. In contrast, PDSP-Bench emphasizes parallel stream processing in a distributed, heterogeneous environment and also supports integrating machine learning models for SPS workloads. In our results, we benchmark a well-known SPS, Apache Flink, using parallel query structures derived from real-world applications as well as synthetic queries, to show the capabilities of PDSP-Bench for parallel stream processing. Moreover, we compare different learned cost models on SPS workloads generated with PDSP-Bench and present their evaluation in terms of model and training efficiency. We present key observations from our experiments with PDSP-Bench that highlight interesting trends across different query workloads, such as non-linear and paradoxical effects of parallelism on performance.
