
Symposium on Intelligent Data Analysis (IDA 2021), Date: 2021/04/26 - 2021/04/28, Location: Online

Publication date: 2021-01-01
Volume: 12695
Pages: 235-247
ISBN: 978-3-030-74251-5
Publisher: Springer, Cham

Series: Lecture Notes in Computer Science

Authors:

Verbruggen, Gust ; Van Wolputte, Elia ; Dumancic, Sebastijan ; De Raedt, Luc

Editors:

Abreu, PH ; Rodrigues, PP ; Fernandez, A ; Gama, J

Keywords:

Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods; Computer Science; Data wrangling; Program synthesis; Machine learning

Abstract:

A large part of the time invested in data science is spent on manually preparing data, and transforming wrongly formatted columns into useful features takes up a significant share of that time. We present the avatar algorithm for automatically learning programs that perform this type of feature wrangling. Instead of relying on users to guide the wrangling process, avatar directly uses the predictive performance of machine learning models to measure its progress while wrangling. Using datasets from Kaggle, we show that avatar improves raw data for prediction, and we compare its results with those of human data scientists.
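The abstract describes the core idea only at a high level: candidate wrangling steps are judged by how much they improve a model's predictive performance, not by user feedback. The following is a minimal sketch of that general idea under stated assumptions. It is not the authors' avatar algorithm. The table representation, the two cell transformations (`strip_non_numeric`, `first_token`), the greedy search in `greedy_wrangle`, and the use of leave-one-out 1-nearest-neighbour accuracy as the "predictive performance" score are all hypothetical choices made for illustration.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: column name -> raw cell values (all strings).
Table = Dict[str, List[str]]


def to_float(value: str) -> Optional[float]:
    """Parse a cell as a number, or return None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def loo_1nn_accuracy(table: Table, labels: List[str]) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy on the fully numeric columns.

    Assumed stand-in for "predictive performance": non-numeric columns are
    ignored, and a table with no usable column scores 0. Assumes >= 2 rows.
    """
    cols = []
    for values in table.values():
        parsed = [to_float(v) for v in values]
        if all(p is not None for p in parsed):
            cols.append(parsed)
    if not cols:
        return 0.0
    rows = list(zip(*cols))
    correct = 0
    for i, row in enumerate(rows):
        # Nearest other row by squared Euclidean distance (ties: lowest index).
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


# Hypothetical candidate transformations, applied cell by cell to one column.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "strip_non_numeric": lambda v: re.sub(r"[^0-9.\-]", "", v),  # "12 kg" -> "12"
    "first_token": lambda v: v.split()[0] if v.split() else v,   # "12 kg" -> "12"
}


def greedy_wrangle(
    table: Table, labels: List[str], max_steps: int = 5
) -> Tuple[List[Tuple[str, str]], Table, float]:
    """Repeatedly apply the single (transform, column) step that most improves
    the score; stop when no step helps. Returns (program, table, score)."""
    program: List[Tuple[str, str]] = []
    best = loo_1nn_accuracy(table, labels)
    for _ in range(max_steps):
        chosen = None
        for col in table:
            for name, fn in TRANSFORMS.items():
                trial = dict(table)  # shallow copy; input lists are not mutated
                trial[col] = [fn(v) for v in table[col]]
                score = loo_1nn_accuracy(trial, labels)
                if score > best:
                    best, chosen = score, (name, col, trial)
        if chosen is None:
            break  # no single step improves predictive performance
        name, col, table = chosen
        program.append((name, col))
    return program, table, best


raw = {"id": ["a1", "b2", "c3", "d4"],
       "weight": ["10 kg", "12 kg", "50 kg", "55 kg"]}
labels = ["light", "light", "heavy", "heavy"]
program, wrangled, score = greedy_wrangle(raw, labels)
print(program, score)  # [('strip_non_numeric', 'weight')] 1.0
```

In this toy run, cleaning the `id` column also makes it numeric (score 0.75), but the search keeps the `weight` step because it scores higher. The learned "program" is the list of accepted steps. The greedy search is only the simplest possible choice; the scoring function is the part that mirrors the abstract's idea, and it could be replaced by, for example, cross-validated accuracy of any classifier.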