Efficient Migration of Very Large Distributed State for Scalable Stream Processing

Bonaventura Del Monte
In: Proceedings of the VLDB 2017 PhD Workshop. International Conference on Very Large Data Bases (VLDB-2017), located at co-located with the 43rd International Conference on Very Large Databases (VLDB 2017), August 28, München, Germany,, 2017.


Any scalable stream data processing engine must handle the dynamic nature of data streams and it must quickly react to every fluctuation in the data rate. Many systems successfully address data rate spikes through resource elasticity and dynamic load balancing. The main challenge is the presence of stateful op- erators because their internal, mutable state must be scaled out while assuring fault-tolerance and continuous stream processing. Both rescaling, load balancing, and recovering demand state movement among work units. Therefore, how to guarantee those features in the presence of large distributed state with minimal impact on the performance is still an open issue. We propose an incremental migration mechanism for fine-grained state shards through periodic incremental checkpoints and replica groups. This enables moving large state with minimal impact on stream processing. Finally, we present a low-latency hand-over protocol that smoothly migrates tuples processing among work units.

Weitere Links