Publication
ELEET: Efficient Learned Query Execution over Text and Tables
Matthias Urban; Carsten Binnig
In: Proceedings of the VLDB Endowment (PVLDB), Vol. 17, No. 13, Pages 4867-4880, VLDB, 2024.
Abstract
In this paper, we present ELEET, a novel execution engine that allows
one to seamlessly query and process text as a first-class citizen
along with tables. To enable such a seamless integration of text and
tables, ELEET leverages learned multi-modal operators (MMOps)
such as joins and unions that combine structured data with
unstructured textual data. While large language models (LLMs) such
as GPT-4 are interesting candidates to enable such learned multi-modal
operations, we deliberately do not follow this trend to enable
MMOps, since it would result in high overhead at query runtime.
Instead, to enable MMOps, ELEET comes with a more efficient small
language model (SLM) that is targeted to extract structured data
from text. Thanks to our novel architecture and pre-training procedure,
the ELEET-model enables high-accuracy extraction with low
overheads. In our evaluation, we compare query execution based
on ELEET to baselines leveraging LLMs such as GPT-4 and show
that ELEET can speed up multi-modal queries over tables and text
by up to 575× without sacrificing accuracy.
