Democratizing Data Science through Interactive Curation of ML Pipelines

Zeyuan Shang; Emanuel Zgraggen; Benedetto Buratti; Ferdinand Kossmann; Philipp Eichmann; Yeounoh Chung; Carsten Binnig; Eli Upfal; Tim Kraska
In: Peter A. Boncz; Stefan Manegold; Anastasia Ailamaki; Amol Deshpande; Tim Kraska (Eds.). Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019. ACM SIGMOD International Conference on Management of Data (SIGMOD-2019), June 30 - July 5, Amsterdam, Netherlands, Pages 1171-1188, ACM, 2019.


Statistical knowledge and domain expertise are key to extracting actionable insights from data, yet such skills rarely coexist. In Machine Learning, high-quality results are only attainable via mindful data preprocessing, hyperparameter tuning and model selection. Domain experts are often overwhelmed by such complexity, which de facto inhibits wider adoption of ML techniques in other fields. Existing libraries that claim to solve this problem still require well-trained practitioners. Those frameworks involve heavy data preparation steps and are often too slow for interactive feedback from the user, severely limiting the scope of such systems.
