Symposium on Intelligent Data Analysis (IDA 2021), Date: 2021/04/26 - 2021/04/28, Location: Online
Lecture Notes in Computer Science
Author:
Keywords:
Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods; Computer Science; Data wrangling; Program synthesis; Machine learning
Abstract:
A large part of the time invested in data science is spent on manually preparing data, and a significant share of that time goes into transforming wrongly formatted columns into useful features. We present the avatar algorithm for automatically learning programs that perform this type of feature wrangling. Instead of relying on users to guide the wrangling process, avatar directly uses the predictive performance of machine learning models to measure its progress during wrangling. Using datasets from Kaggle, we show that avatar improves raw data for prediction, and we compare it against human data scientists.
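The idea of using predictive performance as the wrangling signal can be illustrated with a toy sketch. This is not the avatar algorithm itself, only a minimal illustration under stated assumptions: the candidate transformations (`strip_unit`, `string_length`), the scoring model (leave-one-out 1-nearest-neighbour on a single feature), and the example column are all hypothetical and do not come from the paper.

```python
def loo_1nn_accuracy(xs, ys):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on one numeric feature."""
    hits = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Find the closest other point and predict its label.
        j = min((k for k in range(len(xs)) if k != i), key=lambda k: abs(xs[k] - x))
        hits += ys[j] == y
    return hits / len(xs)


# Hypothetical candidate transformations from raw strings to numeric features.
def strip_unit(value):
    """Turn a string such as '12 kg' into 12.0."""
    return float(value.split()[0])


def string_length(value):
    """Use the length of the raw string as a feature."""
    return float(len(value))


CANDIDATES = {"strip_unit": strip_unit, "string_length": string_length}


def select_transformation(column, labels, candidates):
    """Pick the transformation whose output feature predicts the labels best."""
    best_name, best_score = None, 0.0
    for name, transform in candidates.items():
        try:
            feature = [transform(value) for value in column]
        except (ValueError, IndexError):
            continue  # The transformation does not apply to this column.
        score = loo_1nn_accuracy(feature, labels)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Illustrative data: a wrongly formatted weight column and a binary target.
column = ["2 kg", "3 kg", "4 kg", "7 kg", "8 kg", "9 kg"]
labels = [0, 0, 0, 1, 1, 1]

best_name, best_score = select_transformation(column, labels, CANDIDATES)
print(best_name, best_score)
```

In this sketch no user input steers the choice: a transformation is preferred only because the feature it produces makes the downstream model more accurate, which is the kind of signal the abstract describes.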