In today´world, data is streamed from the local network or edge devices to a cloud provider which is rented by a customer to perform the data execution. The Big Data software stack, in an application and hardware agnostic manner, splits the execution stream into multiple tasks and send them for processing on the nodes the customer has paid for. If the outcome does not match the strict three second business requirement, then the customer has two options: 1) scale-up (by upgrading processors at node level) 2) scale-out (by adding nodes to their clusters), or 3) manually implement code optimizations specific to the underlying hardware. However, the customer does not have the financial capability to achieve that. Ideally, they would like to achieve their business requirements without stretching their hardware budget. In order to address the alarming scalability concerns, both end-users as well as cloud infrastructure vendors (such as Google, Microsoft, Amazon, and Alibaba) are investing in heterogeneous hardware resources able to utilize a diverse selection of architectures such as CPUs, GPUs, FPGAs, and MICs aiming to further increase performance while minimizing the climbing operational costs. Furthermore, despite current investments in heterogeneous resources, large companies such as Google develop in-house ASICs with TensorFlow being the prime example.
E2Data proposes an end-to-end solution for Big Data deployments that will fully exploit and advance the state-of-the-art in infrastructure services by delivering a performance increase of up to 10x while utilizing up to 50% less cloud resources. E2Data will provide a new Big Data software paradigm of achieving the maximum resource utilization for heterogeneous cloud deployments without affecting current Big Data programming norms (i.e., no code changes in the original source). The proposed solution takes a cross-layer approach by allowing vertical communication between the four key layers of Big Data deployments (application, Big Data software, scheduler/cloud provider, and execution run time).
The University of Manchester, Institute of Communications and Computer Systems, Neurocom Luxembourg, KALEAO Limited, Computer Technology Institute and Press "Diophantus" (CTI), Spark Works Limited, iProov Limited