Sparsity in Large Scale Kernel Models

Publication date: 2015-06-30

Author:

Mall, Raghvendra

Keywords:

Kernel Methods, Sparsity, Scalability, Community Detection, Reweighted L1-norm penalty, Visualization

Abstract:

In the modern era, with the advent of technology and its widespread usage, there is a huge proliferation of data. Gigabytes of data from mobile devices, market baskets, geo-spatial images, search engines, online social networks etc. can easily be obtained, accumulated and stored. This immense wealth of data has resulted in massive datasets and has led to the emergence of the concept of Big Data. Mining useful information from this big data is a challenging task. As more data becomes available, the choice of predictive models narrows, because very few tools are feasible for processing large scale datasets. A successful learning framework for performing various learning tasks like classification, regression, clustering, dimensionality reduction and feature selection is offered by Least Squares Support Vector Machines (LSSVM), which is designed in a primal-dual optimization setting. It provides the flexibility to extend core models by adding additional constraints to the primal problem, by changing the objective function or by introducing new model selection criteria. The goal of this thesis is to explore the role of sparsity in large scale kernel models using core models adopted from the LSSVM framework. Real-world data is often noisy and only a small fraction of it contains the most relevant information. Sparsity plays a big role in the selection of this representative subset of the data. We first explored sparsity in the case of large scale LSSVM using fixed-size methods with a re-weighted L1 penalty on top, resulting in very sparse LSSVM (VS-LSSVM). An important aspect of kernel based methods is the selection of a subset on which the model is built and validated. We proposed a novel fast and unique representative subset (FURS) selection technique to select a subset from complex networks which retains the inherent community structure of the network.
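The re-weighted L1 penalty mentioned above iteratively solves an L1-penalized problem whose per-coefficient weights are set inversely to the magnitude of the previous estimate, so small coefficients are penalized harder and driven to exact zero. As an illustration only, and not the thesis's VS-LSSVM formulation, a minimal sketch of iterative re-weighting on a plain least-squares objective, solved by proximal gradient (ISTA) steps:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the (weighted) L1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def reweighted_l1(A, b, lam=0.1, n_outer=5, n_inner=200, eps=1e-3):
    """Sparse solution of min_x 0.5*||A x - b||^2 + lam * sum_i w_i |x_i|,
    with weights w_i = 1/(|x_i| + eps) refreshed after each outer pass.
    (Illustrative sketch; function name and solver choice are assumptions.)"""
    m, n = A.shape
    x = np.zeros(n)
    w = np.ones(n)                       # first pass is an ordinary Lasso
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    for _ in range(n_outer):
        for _ in range(n_inner):         # proximal gradient (ISTA) steps
            grad = A.T @ (A @ x - b)
            x = soft_threshold(x - grad / L, lam * w / L)
        w = 1.0 / (np.abs(x) + eps)      # small coefficients get large weights
    return x
```

Each outer pass refreshes the weights, so coefficients near zero receive a very large penalty and are thresholded away, which makes the re-weighted penalty a closer surrogate for the L0 "norm" than a plain L1 penalty.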
We extended this method to Big Data learning by constructing k-NN graphs from dense data on a distributed computing platform (Hadoop), and then applied the FURS selection technique to obtain representative subsets on which kernel based models are built. We then focused on scaling the kernel spectral clustering (KSC) technique to big data networks. We devised two model selection techniques, namely balanced angular fitting (BAF) and self-tuned KSC (ST-KSC), which exploit the structure of the projections in the eigenspace to obtain the optimal number of communities k in the large graph. A multilevel hierarchical kernel spectral clustering (MH-KSC) technique was then proposed which performs agglomerative hierarchical clustering using similarity information between the out-of-sample eigen-projections. Furthermore, we developed an algorithm based on the Gershgorin Circle theorem to identify eigenvalue intervals for hierarchical clustering. These intervals were used in combination with the KSC model to identify the optimal number of clusters at a given level of the hierarchy. The MH-KSC technique was extended from networks to images and datasets using the BAF model selection criterion. We also proposed optimal sparse reductions of the KSC model by reconstructing the model using a reduced set, exploiting the Group Lasso and a convex re-weighted L1 penalty to sparsify it. Finally, we explored the role of the re-weighted L1 penalty in feature selection in combination with LSSVM. We also proposed a visualization toolkit (Netgram) to track the evolution of communities/clusters over time in dynamic, time-evolving communities and datasets. Real world applications considered in this thesis include classification and regression on large scale datasets, image segmentation, flat and hierarchical community detection in large scale graphs, and visualization of evolving communities.
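The Gershgorin Circle theorem invoked above states that every eigenvalue of a matrix lies in at least one disc centered at a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. A minimal sketch of computing these intervals for a real symmetric matrix (illustrative only; the function name is an assumption and this is not the thesis's interval-selection algorithm):

```python
import numpy as np

def gershgorin_intervals(M):
    """Gershgorin intervals of a real symmetric matrix M: each eigenvalue
    lies in at least one interval [c_i - r_i, c_i + r_i], where c_i is the
    i-th diagonal entry and r_i the sum of absolute off-diagonal entries
    in row i. (Illustrative sketch, not from the source.)"""
    centers = np.diag(M)
    radii = np.abs(M).sum(axis=1) - np.abs(centers)
    return list(zip(centers - radii, centers + radii))
```

For symmetric matrices the discs collapse to real intervals, so their union gives cheap a priori bounds on where the spectrum can lie, without computing any eigenvalue.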