Publication
Towards Complex Table Question Answering Over Tabular Data Lakes
Daniela Risis; Jan-Micha Bodensohn; Matthias Urban; Carsten Binnig
In: Carsten Binnig; Andreas Henrich; Daniela Nicklas; Maximilian E. Schüle; Klaus Meyer-Wegener (Eds.). Datenbanksysteme für Business, Technologie und Web (BTW 2025) Workshopband, Bamberg, Germany, March 3-7, 2025. BTW Workshop on Data Engineering for Data Science (DE4DS-2025), Pages 267-275, LNI, Vol. P-363, Gesellschaft für Informatik e.V. 2025.
Abstract
Natural Language Interfaces for Databases (NLIDBs) are an interesting alternative to SQL since they empower non-experts to query data. However, they require this data to first be integrated into a database schema, which incurs high upfront data engineering and integration overhead. Open Table Question Answering (OTQA) is therefore promising, since it allows tables in data lakes to be queried directly without first incorporating them into a relational schema. Many recent OTQA approaches combine Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs): relevant tables are first retrieved from a data lake and then passed as input to an LLM to answer the user query. In this paper, we take the first systematic step toward investigating how LLMs paired with table retrievers can answer queries over private tabular data lakes. As a main finding, we see that even when several parameters of this approach are tuned, current LLMs still fail to answer queries that focus on the simple extraction of individual cell values, let alone aggregate queries. Thus, they are far from the rich querying capabilities that NLIDB approaches offer today. To address this gap, we point towards promising future work that would enable complex question answering over tabular data lakes.