Project

E2Data

European Extreme Performing Big Data Stacks



In today's world, data is streamed from the local network or edge devices to a cloud provider, where a customer rents resources to process it. The Big Data software stack, in an application- and hardware-agnostic manner, splits the execution stream into multiple tasks and sends them for processing on the nodes the customer has paid for. If the outcome does not meet the strict three-second business requirement, then the customer has three options: 1) scale up (by upgrading processors at node level), 2) scale out (by adding nodes to their clusters), or 3) manually implement code optimizations specific to the underlying hardware. However, the customer does not have the financial capability to do so. Ideally, they would like to meet their business requirements without stretching their hardware budget. To address these alarming scalability concerns, both end users and cloud infrastructure vendors (such as Google, Microsoft, Amazon, and Alibaba) are investing in heterogeneous hardware resources that combine a diverse selection of architectures, such as CPUs, GPUs, FPGAs, and MICs, aiming to further increase performance while minimizing climbing operational costs. Furthermore, beyond these investments in heterogeneous resources, large companies such as Google develop in-house ASICs, with the Tensor Processing Unit (TPU), built for TensorFlow workloads, being the prime example.

E2Data proposes an end-to-end solution for Big Data deployments that will fully exploit and advance the state of the art in infrastructure services by delivering a performance increase of up to 10x while utilizing up to 50% fewer cloud resources. E2Data will provide a new Big Data software paradigm that achieves maximum resource utilization for heterogeneous cloud deployments without affecting current Big Data programming norms (i.e., no code changes in the original source). The proposed solution takes a cross-layer approach by allowing vertical communication between the four key layers of Big Data deployments (application, Big Data software, scheduler/cloud provider, and execution runtime).

Partners

The University of Manchester, Institute of Communications and Computer Systems, Neurocom Luxembourg, KALEAO Limited, Computer Technology Institute and Press "Diophantus" (CTI), Spark Works Limited, iProov Limited

Sponsors

European Union (EU)




Publications about the project

Clemens Lutz, Steffen Zeuch, Volker Markl

In: David Maier, Rachel Pottinger (editors). Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD-2020), June 14-19, Portland, OR, United States. Pages 1633-1649, ISBN 978-1-4503-6735-6, The Association for Computing Machinery, 2020.

Clemens Lutz, Steffen Zeuch, Volker Markl

In: Proceedings of the VLDB Endowment (PVLDB), 12(5), pages 516-530, VLDB Endowment, 2019.


German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz