ACM Transactions on Database Systems vol:39 issue:3 pages:20:1-20:27
Mining frequent patterns is plagued by the problem of pattern explosion making pattern reduction tech-
niques a key challenge in pattern mining. In this paper we propose a novel theoretical framework for pattern
reduction. We do this by measuring the robustness of a property of an itemset such as closedness or non-
derivability. The robustness of a property is the probability that this property holds on random subsets of
the original data. We study four properties: an itemset being closed, free, non-derivable or totally shattered,
and demonstrate how to compute the robustness analytically without actually sampling the data. Our con-
cept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not
assume a null hypothesis or any noise model and in contrast to noise tolerant or approximate patterns, the
robust patterns for a given property are always a subset of the patterns with this property. If the underlying
property is monotonic, then the measure is also monotonic, allowing us to efficiently mine robust itemsets.
We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches.
Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of
patterns and that ranking yields interesting itemsets.