Download PDF

Mining Sets of Patterns (Zoeken naar verzamelingen van patronen)

Publication date: 2009-05-29

Author:

Zimmermann, Albrecht
De Raedt, Luc

Abstract:

Local pattern mining is an integral part of a variety of approaches in d ata mining and machine learning. Especially in data mining, a lot of res earch exists on how to mine local patterns efficiently from large data b ases. As a side-effect, the resulting pattern collections are far too bi g to be useful. In machine learning, on the other hand, local patterns a re ingredients to more complex models used for classification, or cluste ring, for instance. In both cases, efficiently and effectively assembling those local patter ns into sets of patterns is an important task. In this thesis, we presen t a in-depth discussion of various aspects of pattern set mining. The thesis has three main contributions. The first one is the introducti on of a formal framework for pattern set mining as a distinct task. To t he best of our knowledge, this is the first time that such a formal defi nition has been given. By identifying types of pattern sets, properties of those types, and developing constraints based on these properties, th e framework enables the principled discussion of existing and future pat tern set mining techniques. As a direct benefit, in a second step, we leverage the framework for the discussion of existing approaches from data mining and machine learning . We identify the two main approaches to pattern set mining, post-proces sing and iterative mining, and characterize existing heuristic technique s. Following our discussion, we suggest an exhaustive alternative to exi sting heuristic data mining techniques, as well as possibilities for the upgrade of those techniques. Additionally, we inspect the local pattern mining approaches to which most pattern set mining techniques from mach ine learning are coupled and argue that they can be replaced by differen t methods. Finally, to validate our theoretical findings, we perform a number of ex perimental evaluations of existing and proposed systems. We show that an exhaustive post-processing method does indeed allows us to perform cons traint-based pattern set mining. We also evaluate the effect of heuristi c parameters, such as orders and quality measures, for heuristic post-pr ocessing. Additionally, we explore the application of exhaustive local p attern mining in iterative pattern set mining, showing both its applicab ility. A surprising finding in this last setting is that exhaustive meth ods, guided by iterative mining, are not more computationally expensive than their heuristic counterparts.