
Publication

JOB-Complex: A Challenging Benchmark for Traditional & Learned Query Optimization

Johannes Wehrstein; Timo Eckmann; Roman Heinrich; Carsten Binnig
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2507.07471, Pages 1-14, arXiv, 2025.

Abstract

Large Language Models (LLMs) have demonstrated significant potential for automating data engineering tasks on tabular data, giving enterprises a valuable opportunity to reduce the high costs associated with manual data handling. However, the enterprise domain introduces unique challenges that existing LLM-based approaches for data engineering often overlook, such as large table sizes, more complex tasks, and the need for internal knowledge. To bridge these gaps, we identify key enterprise-specific challenges related to data, tasks, and background knowledge and conduct a comprehensive study of their impact on recent LLMs for data engineering. Our analysis reveals that LLMs face substantial limitations in real-world enterprise scenarios, resulting in significant accuracy drops. Our findings contribute to a systematic understanding of LLMs for enterprise data engineering to support their adoption in industry.
