17th International Symposium on Intelligent Data Analysis (IDA), Date: 2018/10/24 - 2018/10/26, Location: 's-Hertogenbosch, Netherlands

Publication date: 2018-01-01
Volume: 11191 (LNCS)
Pages: 367-379
ISBN: 9783030017675
Publisher: Springer Verlag

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Authors:

Verbruggen, G
De Raedt, L; Duivesteijn, W; Siebes, A; Ukkonen, A

Keywords:

Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods; Computer Science; Data wrangling; Program synthesis; Spreadsheets; Preprocessing; Inductive programming

Abstract:

© Springer Nature Switzerland AG 2018.

To help automate pre-processing, an important step in machine learning and data mining, we introduce synth-a-sizer, a tool that semi-automatically wrangles spreadsheets into attribute-value format so that popular machine learning tools can use them. The user only needs to mark the cells that belong to a single example. synth-a-sizer is based on inductive programming principles. We describe synth-a-sizer's transformations and search algorithm, as well as a heuristic and a distance measure for identifying types. We also report on a first experimental evaluation.
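The abstract names synth-a-sizer's ingredients but not their details. As a rough illustration of the input/output contract only, the following is a minimal Python sketch. It assumes that each example occupies a fixed-height block of rows and that the user marks the cells of the first block. The functions `extract_examples` and `infer_type` are hypothetical. They are not the paper's transformations, search algorithm, type heuristic, or distance measure.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column), zero-based


def extract_examples(grid: Sequence[Sequence[Optional[str]]],
                     marked: List[Cell]) -> List[List[str]]:
    """Hypothetical sketch (not synth-a-sizer's algorithm): repeat the
    user's marked example downwards with a fixed row stride and emit one
    attribute-value row per repetition."""
    rows = [r for r, _ in marked]
    stride = max(rows) - min(rows) + 1  # assumed height of one example block
    examples: List[List[str]] = []
    offset = 0
    while True:
        cells = [(r + offset, c) for r, c in marked]
        # Stop once any translated cell falls outside the grid or is empty.
        if any(r >= len(grid) or c >= len(grid[r]) or grid[r][c] in (None, "")
               for r, c in cells):
            break
        examples.append([grid[r][c] for r, c in cells])
        offset += stride
    return examples


def infer_type(values: Sequence[str]) -> str:
    """Hypothetical type heuristic: return the most specific of
    int/float/str that parses every value in an attribute column."""
    for name, parse in (("int", int), ("float", float)):
        try:
            for v in values:
                parse(v)
            return name
        except ValueError:
            continue
    return "str"


if __name__ == "__main__":
    # Each record spans two rows; the user marks the value cells of the first.
    grid = [["name", "Alice"], ["age", "31"], ["name", "Bob"], ["age", "27"]]
    examples = extract_examples(grid, [(0, 1), (1, 1)])
    print(examples)  # [['Alice', '31'], ['Bob', '27']]
    print([infer_type(col) for col in zip(*examples)])  # ['str', 'int']
```

A fixed stride is the simplest generalization of a single marked example. A real wrangling system would search over many candidate transformations rather than assume one.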