Publication

Efficient Learned Query Execution over Text and Tables [Technical Report]

Matthias Urban; Carsten Binnig
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2410.22522, Pages 1-23, arXiv, 2024.

Abstract

In this paper, we present ELEET, a novel execution engine that allows one to seamlessly query and process text as a first-class citizen along with tables. To enable such a seamless integration of text and tables, ELEET leverages learned multi-modal operators (MMOps) such as joins and unions that seamlessly combine structured with unstructured textual data. While large language models (LLMs) such as GPT-4 are interesting candidates to enable such learned multi-modal operations, we deliberately do not follow this trend to enable MMOps, since it would result in high overhead at query runtime. Instead, to enable MMOps, ELEET comes with a more efficient small language model (SLM) that is targeted to extract structured data from text. Thanks to our novel architecture and pre-training procedure, the ELEET-model enables high-accuracy extraction with low overheads. In our evaluation, we compare query execution based on ELEET to baselines leveraging LLMs such as GPT-4 and show that ELEET can speed up multi-modal queries over tables and text by up to 575× without sacrificing accuracy.
