The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Date: 2014/09/15 - 2014/09/19, Location: Nancy, France

Publication date: 2014-01-01
Volume: 8725 LNAI
Pages: 98 - 113
ISBN: 9783662448502
Publisher: Springer

Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2014

Author:

Le Van, Thanh ; van Leeuwen, Matthijs ; Nijssen, Siegfried ; Fierro Gutiérrez, Ana Carolina Elisa ; Marchal, Kathleen ; De Raedt, Luc ; Calders, Toon ; Esposito, Floriana ; Hüllermeier, Eyke ; Meo, Rosa

Abstract:

Tiling is a well-known pattern mining technique. Traditionally, it discovers large areas of ones in binary databases or matrices, where an area is defined by a set of rows and a set of columns. In this paper, we introduce the novel problem of ranked tiling, which is concerned with finding interesting areas in ranked data. In this data, each transaction defines a complete ranking of the columns. Ranked data occurs naturally in applications like sports or other competitions. It is also a useful abstraction when dealing with numeric data in which the rows are incomparable. We introduce a scoring function for ranked tiling, as well as an algorithm using constraint programming and optimization principles. We empirically evaluate the approach on both synthetic and real-life datasets, and demonstrate the applicability of the framework in several case studies. One case study involves a heterogeneous dataset concerning the discovery of biomarkers for different subtypes of breast cancer patients. An analysis of the tiles by a domain expert shows that our approach can lead to the discovery of novel insights.
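
To make the notions of ranked data and a tile concrete, here is a minimal sketch in Python. It is an illustration only: the abstract does not state the scoring function, so the threshold-based score below is an assumption, and the names `rank_rows`, `tile_score`, and `theta` are hypothetical. The sketch converts each row of a numeric matrix into a complete ranking of its columns, then scores a tile (a set of rows and a set of columns) by summing how far each covered rank lies above the threshold.

```python
def rank_rows(matrix):
    """Replace each row's values with ranks 1..n (1 = smallest value).

    Ties are broken by column index, because Python's sort is stable.
    A complete ranking, as described in the abstract, has no ties.
    """
    ranked = []
    for row in matrix:
        order = sorted(range(len(row)), key=lambda j: row[j])
        ranks = [0] * len(row)
        for position, j in enumerate(order, start=1):
            ranks[j] = position
        ranked.append(ranks)
    return ranked


def tile_score(ranked, rows, cols, theta):
    """Assumed illustrative score: sum of (rank - theta) over the tile's cells."""
    return sum(ranked[r][c] - theta for r in rows for c in cols)


# Rows on very different numeric scales become comparable once ranked.
data = [
    [0.2, 5.1, 3.3, 4.0],
    [10.0, 90.0, 70.0, 20.0],
    [7.0, 1.0, 2.0, 3.0],
]
ranked = rank_rows(data)
score = tile_score(ranked, rows=[0, 1], cols=[1, 2], theta=2.5)
print(ranked)  # [[1, 4, 2, 3], [1, 4, 3, 2], [4, 1, 2, 3]]
print(score)   # 3.0
```

Under this assumed score, a tile is rewarded for covering cells whose ranks exceed `theta` and penalized for cells below it, so adding row 2 (where columns 1 and 2 rank low) would reduce the score. Ranking each row separately is what makes rows with incomparable numeric scales usable together.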