European Extreme Performing Big Data Stacks


In today’s world, data is streamed from the local network or edge devices to a cloud provider, where the customer rents resources to process it. The Big Data software stack, in an application- and hardware-agnostic manner, splits the execution stream into multiple tasks and sends them for processing on the nodes the customer has paid for. If the outcome does not meet the strict three-second business requirement, the customer has three options: 1) scale up (by upgrading processors at node level), 2) scale out (by adding nodes to their clusters), or 3) manually implement code optimizations specific to the underlying hardware. However, the customer does not have the financial capability to pursue these options. Ideally, they would like to meet their business requirements without stretching their hardware budget. To address these alarming scalability concerns, both end users and cloud infrastructure vendors (such as Google, Microsoft, Amazon, and Alibaba) are investing in heterogeneous hardware resources that combine a diverse selection of architectures, such as CPUs, GPUs, FPGAs, and MICs, aiming to further increase performance while containing climbing operational costs. Furthermore, despite current investments in heterogeneous resources, large companies such as Google develop in-house ASICs, with the Tensor Processing Unit (TPU), built to accelerate TensorFlow workloads, being the prime example.

E2Data proposes an end-to-end solution for Big Data deployments that will fully exploit and advance the state of the art in infrastructure services by delivering a performance increase of up to 10x while utilizing up to 50% fewer cloud resources. E2Data will provide a new Big Data software paradigm that achieves maximum resource utilization for heterogeneous cloud deployments without affecting current Big Data programming norms (i.e., no code changes in the original source). The proposed solution takes a cross-layer approach by allowing vertical communication between the four key layers of Big Data deployments (application, Big Data software, scheduler/cloud provider, and execution runtime).


European Union





Project publications

Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl

In: 22nd International Conference on Extending Database Technology (EDBT-2019), March 26-29, Lisbon, Portugal. OpenProceedings, 2019.

Philipp Grulich, Jonas Traub, Asterios Katsifodimos, Tilmann Rabl, Sebastian Breß, Volker Markl

In: Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems (DEBS-2019), June 24-28, Darmstadt, Germany. Pages 256-257, ISBN 978-1-4503-6794-3/19/06, ACM, 2019.


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence