Title: Finding robust itemsets under subsampling
Authors: Tatti, Nikolaj ×
Moerchen, Fabian
Calders, Toon #
Issue Date: 2013
Publisher: Association for Computing Machinery
Series Title: ACM Transactions on Database Systems vol:39 issue:3 pages:20:1-20:27
Article number: 20
Abstract: Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction tech-
niques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern
reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-
derivability. The robustness of a property is the probability that this property holds on random subsets of
the original data. We study four properties: an itemset being closed, free, non-derivable or totally shattered,
and demonstrate how to compute the robustness analytically without actually sampling the data. Our con-
cept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not
assume a null hypothesis or any noise model and in contrast to noise tolerant or approximate patterns, the
robust patterns for a given property are always a subset of the patterns with this property. If the underlying
property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust itemsets.
We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches.
Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of
patterns and that ranking yields interesting itemsets.
ISSN: 0362-5915
Publication status: published
KU Leuven publication type: IT
Appears in Collections:Informatics Section
× corresponding author
# (joint) last author

Files in This Item:
File Description Status SizeFormat
akoglu13dot.pdf Published 861KbAdobe PDFView/Open


All items in Lirias are protected by copyright, with all rights reserved.

© Web of science