Download PDF

IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans

Publication date: 2011-09-01
Volume: 41 Pages: 1013 - 1025
Publisher: Institute of Electrical and Electronics Engineers

Author:

Haghir Chehreghani, Mostafa
Haghir Chehreghani, Morteza ; Lucas, Caro ; Rahgozar, Masoud

Keywords:

data mining, frequent patterns, Science & Technology, Technology, Computer Science, Cybernetics, Computer Science, Theory & Methods, Computer Science, Breadth-first candidate generation, frequency counting, frequent tree pattern, induced subtree, rooted ordered labeled tree, tree encoding, 08 Information and Computing Sciences, 09 Engineering, Artificial Intelligence & Image Processing, 40 Engineering, 46 Information and computing sciences

Abstract:

Frequent tree patterns have many practical applications in different domains such as XML mining, web usage analysis, etc. In this paper, we present OInduced, a novel and efficient algorithm for finding frequent ordered induced tree patterns. OInduced uses a breadth-first candidate generation method and improves it by means of an indexing scheme. We also introduce frequency counting using tree encoding. For this purpose, we present two novel tree encodings, m-coding and cmcoding, and show how they can restrict nodes of input trees and compute frequencies of generated candidates. We perform extensive experiments on both real and synthetic datasets to show efficiency and scalability of OInduced.