Mine, Interact, Learn, Repeat: Interactive Pattern-based Data Exploration

Publication date: 2017-06


Dzyuba, Vladimir


In many fields, the rapid growth of available data has created a need for automated tools that help analysts understand these data and discover useful knowledge in them. Pattern mining is a well-studied knowledge discovery task that aims to provide concise, comprehensible descriptions of coherent regions in the data. Many variations of pattern mining have been proposed in the literature, together with even more algorithms to mine the corresponding patterns efficiently. However, the vast majority of these methods do not adapt their results to the goals and interests of a particular analyst, which makes pattern mining inaccessible to non-expert users and hampers its adoption as a practical data exploration tool. In this thesis, we investigate algorithmic approaches to interactive pattern mining, where an analyst only needs to provide feedback on intermediate results; this feedback is then used to steer the mining process towards subjectively interesting patterns. We frame this problem as an interactive mining and learning loop that can be summarized by the formula “Mine, interact, learn, repeat.” The main contributions of this thesis are techniques that implement the individual steps of this loop. The first contribution is an algorithm that learns user preferences for patterns from ordered feedback, together with methods that minimize the amount of user feedback required to learn an accurate user model. The second contribution is a flexible pattern sampling algorithm that supports a wide range of pattern constraints and sampling distributions and generates diverse, representative collections of patterns on demand. The third contribution is an end-to-end interactive pattern mining algorithm that combines preference learning with “anytime” mining by sampling.
Experiments demonstrate that the techniques presented in this thesis perform well in a variety of pattern mining tasks and thus are promising building blocks for practical interactive data exploration systems.
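
The "Mine, interact, learn, repeat" loop described in the abstract can be illustrated with a toy sketch. This is not the thesis's algorithm. It assumes itemset patterns over a hypothetical five-item universe, uses uniform random sampling as a stand-in for the pattern sampler, a simple pairwise perceptron as a stand-in for the preference learner, and a simulated user with a hidden utility in place of a real analyst. All names (`ITEMS`, `HIDDEN`, `sample_patterns`, `update_weights`, `interactive_loop`, `simulated_user`) are illustrative.

```python
import random

ITEMS = ["a", "b", "c", "d", "e"]  # hypothetical item universe
HIDDEN = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0, "e": -2.0}  # simulated taste


def sample_patterns(items, k, rng):
    """Mine: draw k distinct random itemsets (stand-in for a real sampler)."""
    patterns = set()
    while len(patterns) < k:
        size = rng.randint(1, 3)
        patterns.add(frozenset(rng.sample(items, size)))
    return sorted(patterns, key=sorted)


def score(weights, pattern):
    """Linear utility of a pattern: sum of the weights of its items."""
    return sum(weights[i] for i in pattern)


def update_weights(weights, ranking, lr=1.0):
    """Learn: pairwise perceptron update from an ordered list (best first)."""
    for hi in range(len(ranking)):
        for lo in range(hi + 1, len(ranking)):
            better, worse = ranking[hi], ranking[lo]
            # Update only when the model fails to rank the pair correctly.
            if score(weights, better) <= score(weights, worse):
                for i in better - worse:
                    weights[i] += lr
                for i in worse - better:
                    weights[i] -= lr
    return weights


def simulated_user(patterns):
    """Interact: order the shown patterns by the hidden utility, best first."""
    return sorted(patterns, key=lambda p: score(HIDDEN, p), reverse=True)


def interactive_loop(items, user_rank, rounds=20, pool=10, shown=4, seed=0):
    rng = random.Random(seed)
    weights = {i: 0.0 for i in items}
    for _ in range(rounds):                                   # repeat
        candidates = sample_patterns(items, pool, rng)        # mine
        # Steer: show the candidates the current model scores highest.
        candidates.sort(key=lambda p: score(weights, p), reverse=True)
        ranking = user_rank(candidates[:shown])               # interact
        update_weights(weights, ranking)                      # learn
    return weights


if __name__ == "__main__":
    print(interactive_loop(ITEMS, simulated_user))
```

In this sketch the user only orders a handful of patterns per round, and the learned weights decide which candidates are shown next, which mirrors the idea of steering mining through feedback on intermediate results.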