Clustering Evolving Data using Kernel-Based Methods (Clusteren van evoluerende data met behulp van kernel-gebaseerde methodes)

Publication date: 2014-07-02

Author:

Langone, Rocco

Keywords:

SISTA

Abstract:

Thanks to recent developments in information technology, data are now abundant across application domains ranging from science and engineering to biology and business. As a result, the demand for real-time data processing, mining and analysis has grown explosively in recent years. Since labels are usually not available and a full understanding of the data is generally missing, clustering plays a major role in providing initial insight. In this context, generalization to out-of-sample data, model selection criteria, consistency of the clustering results over time and scalability to large data sets become key issues. A successful modelling framework is offered by the Least Squares Support Vector Machine (LS-SVM), which is formulated in a primal-dual optimization setting. This setting allows the core models to be extended by adding constraints to the primal problem, by changing the objective function or by introducing new model selection criteria. In this thesis, we propose several modelling strategies to tackle evolving data in different contexts. In the framework of static clustering, we start by introducing a soft kernel spectral clustering (SKSC) algorithm, which handles overlapping clusters better than kernel spectral clustering (KSC) and provides more interpretable outcomes. Afterwards, a complete strategy based on KSC for community detection in static networks is proposed, where the extraction of a high-quality training sub-graph, the choice of the kernel function, the model selection and the applicability to large-scale data are key aspects.
This paves the way for a novel clustering algorithm for the analysis of evolving networks, called kernel spectral clustering with memory effect (MKSC), in which temporal smoothness between the clustering results of successive time steps is incorporated at the level of the primal optimization problem by suitably modifying the KSC formulation. Next, an application of KSC to fault detection in an industrial machine is presented. Here, pre-processing the data with an appropriate windowing operation is necessary to capture the ongoing degradation process affecting the machine. In this way, it is possible to raise an early warning when necessary, online and in a fully unsupervised manner. Finally, we propose a new algorithm called incremental kernel spectral clustering (IKSC) for online learning of non-stationary data. This challenge is addressed by exploiting the out-of-sample property of KSC to adapt the initial model, in order to handle the merging, splitting or drifting of clusters over time. Real-world applications considered in this thesis include image segmentation, time-series clustering, and community detection in static and evolving networks.