Incorporation of Prior Knowledge into Kernel Based Models

Mehrkanoon, Siamak; Suykens, Johan

Keywords:

SISTA

Abstract:

Incorporating the available prior knowledge into the learning framework can play an important role in improving the generalization of a machine learning algorithm. The type of available side information varies with the context. The scope of this thesis is the development of learning algorithms that exploit such side information. In particular, the focus is on learning the solution of a dynamical system, parameter estimation, and semi-supervised learning. To this end, the prior knowledge is incorporated into the kernel-based core model by adding a regularization term and/or a set of constraints. In the context of dynamical systems, the available differential equations, together with the initial/boundary conditions, are treated as side information. Starting from a least squares support vector machine (LSSVM) core formulation, extensions are considered for learning the solution of dynamical systems governed by ordinary differential equations (ODEs), differential algebraic equations (DAEs), and partial differential equations (PDEs). The primal-dual optimization formulation typical of LSSVM allows side information to be integrated by modifying the primal problem. A kernel-based approach is also introduced for estimating the unknown (constant or time-varying) parameters of a dynamical system described by ODEs. Here the LSSVM serves as a core model that estimates the state trajectories and their derivatives from the observational data. The approach has several advantages. In particular, it avoids repeated integration of the system, and for systems that are affine in the parameters it yields a convex optimization problem. Moreover, for systems with state delays, where the objective function can be non-smooth, the approach shows promising results by converting the problem into an algebraic optimization problem.
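The parameter-estimation idea above (fit a kernel model to the observed states, differentiate it analytically, then fit the parameters without integrating the ODE) can be illustrated with a minimal two-step sketch. It is not the thesis's formulation; it assumes a hypothetical scalar ODE dx/dt = -theta * x, a Gaussian RBF kernel, and the standard LS-SVM regression dual system, and the function names and hyperparameter values (`gamma`, `sigma`) are illustrative choices only.

```python
import numpy as np

def rbf(s, t, sigma):
    """Gaussian kernel k(s, t) = exp(-(s - t)^2 / sigma^2) for 1-D time arrays."""
    return np.exp(-((s[:, None] - t[None, :]) ** 2) / sigma**2)

def lssvm_fit(t, y, gamma, sigma):
    """Solve the LS-SVM regression dual system
        [0   1^T          ] [b    ]   [0]
        [1   K + I / gamma] [alpha] = [y]
    and return (b, alpha)."""
    n = len(t)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(t, t, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_eval(t_train, b, alpha, sigma, t):
    """Evaluate x_hat(t) = sum_i alpha_i k(t, t_i) + b and its exact derivative."""
    K = rbf(t, t_train, sigma)                 # shape (len(t), len(t_train))
    diff = t[:, None] - t_train[None, :]
    dK = -2.0 * diff / sigma**2 * K            # d/dt of each kernel column
    return K @ alpha + b, dK @ alpha           # the bias b has zero derivative

def estimate_theta(t, y, gamma=1e4, sigma=1.0):
    """Estimate theta in the hypothetical model dx/dt = -theta * x.

    Step 1: smooth the observations with LS-SVM and differentiate the model.
    Step 2: the ODE residual dx_hat + theta * x_hat is affine in theta, so
    minimizing its squared norm is a convex (linear least squares) problem
    with a closed-form solution; the ODE is never integrated."""
    b, alpha = lssvm_fit(t, y, gamma, sigma)
    x_hat, dx_hat = lssvm_eval(t, b, alpha, sigma, t)
    return -np.dot(dx_hat, x_hat) / np.dot(x_hat, x_hat)

# Usage: noisy samples of x(t) = exp(-0.5 t), i.e. true theta = 0.5
rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 4.0, 50)
y_obs = np.exp(-0.5 * t_obs) + 1e-3 * rng.standard_normal(t_obs.size)
print("estimated theta:", estimate_theta(t_obs, y_obs))
```

Because the derivative comes from the fitted kernel model rather than from a numerical solver, each candidate parameter value costs no ODE integration; for a model that is nonlinear in its parameters, step 2 would become a general (possibly non-convex) optimization.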
In many applications, ranging from machine learning to data mining, obtaining labeled samples is costly and time-consuming. On the other hand, with the recent development of information technologies, huge amounts of unlabeled data are readily available from the web, smartphones, satellites, and other sources. In these situations, one may design an algorithm that learns from both labeled and unlabeled data. In this context, handling data streams (real-time data analysis), scalability to large-scale data, and model selection criteria become key aspects. Starting from the Kernel Spectral Clustering (KSC) core formulation, which is an unsupervised algorithm, this thesis also develops extensions that integrate the available side information into a semi-supervised algorithm. A novel multi-class semi-supervised learning algorithm (MSS-KSC) is developed that addresses both semi-supervised classification and clustering. The labeled data points are incorporated into the KSC formulation at the primal level by adding a regularization term. This changes the solution of KSC from an eigenvalue problem into a system of linear equations in the dual. The algorithm realizes a low-dimensional embedding for discovering micro clusters. Although the fraction of labeled data points is small, the number of unlabeled data points can be huge. To make the algorithm scalable to large-scale data, two approaches are proposed: fixed-size and reduced-kernel MSS-KSC (FS-MSS-KSC and RD-MSS-KSC). The former relies on the Nyström method to approximate the feature map and solves the problem in the primal, whereas the latter uses a reduced kernel technique and solves the problem in the dual. Both approaches have the out-of-sample extension property, i.e. they can be applied to unseen data points. In today's applications, evolving data streams are ubiquitous.
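The fixed-size variant relies on the Nyström method to obtain an explicit, finite-dimensional approximation of the feature map, so that a model can be trained in the primal and applied directly to unseen points. The sketch below illustrates only that approximation step, under stated assumptions: a Gaussian RBF kernel, landmarks drawn uniformly at random (the fixed-size LS-SVM literature often selects prototypes by a quadratic Rényi entropy criterion instead), and a plain ridge-regression primal solve standing in for the MSS-KSC objective, which is not reproduced here. All names and hyperparameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix, k(a, b) = exp(-||a - b||^2 / sigma^2)."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)   # clip tiny negatives

def nystrom_feature_map(landmarks, sigma, eps=1e-10):
    """Return phi with phi(X) @ phi(Z).T ~= K(X, Z), built from m landmarks.

    With K_mm = U diag(lam) U^T, each row of phi(X) is
    diag(lam)^(-1/2) U^T k_m(x), giving the Nystrom approximation
    K(X, Z) ~= K_xm K_mm^+ K_mz."""
    lam, U = np.linalg.eigh(rbf_kernel(landmarks, landmarks, sigma))
    keep = lam > eps                        # drop numerically zero directions
    proj = U[:, keep] / np.sqrt(lam[keep])  # scale each eigenvector column
    return lambda X: rbf_kernel(X, landmarks, sigma) @ proj

# Usage: explicit features for 2000 points from 100 random landmarks,
# followed by a ridge-regression solve in the primal (a stand-in objective).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
landmarks = X[rng.choice(len(X), size=100, replace=False)]
phi = nystrom_feature_map(landmarks, sigma=1.0)
Phi = phi(X)                                 # shape (2000, d), d <= 100
gamma = 10.0
w = np.linalg.solve(Phi.T @ Phi + np.eye(Phi.shape[1]) / gamma, Phi.T @ y)
y_unseen = phi(rng.standard_normal((5, 2))) @ w   # out-of-sample prediction
```

The primal system is only d x d (at most the number of landmarks) regardless of how many unlabeled points are available, which is what makes the fixed-size route attractive for large data sets.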
Due to the complex underlying dynamics and non-stationary behavior of real-life data, the demand for adaptive learning mechanisms is increasing. An incremental multi-class semi-supervised kernel spectral clustering (I-MSS-KSC) algorithm is proposed for online clustering/classification of time-evolving data. It uses the available side information to continuously adapt the initial MSS-KSC model and to learn the underlying complex dynamics of the data stream. The performance of the proposed method is demonstrated on synthetic data sets and real-life videos. Furthermore, for the video segmentation tasks, Kalman filtering is used to provide the labels for the objects in motion, thereby regularizing the solution of I-MSS-KSC.
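The abstract states only that Kalman filtering supplies labels for moving objects in the video experiments. As a hedged illustration of the filtering part alone, here is a standard linear Kalman filter with a constant-velocity motion model tracking a 2-D object position (for example, a centroid). The state layout, the noise levels `q` and `r`, the simplified isotropic process noise, and the class name are assumptions; how tracked positions are turned into labels for I-MSS-KSC is not shown.

```python
import numpy as np

class ConstantVelocityKalman:
    """Track a 2-D point with state s = [px, py, vx, vy] under a
    constant-velocity motion model, observing only the position."""

    def __init__(self, dt=1.0, q=1e-4, r=0.25, p0=1e3):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # observe position only
        self.Q = q * np.eye(4)   # simplified isotropic process noise (assumption)
        self.R = r * np.eye(2)   # measurement noise covariance (assumption)
        self.s = np.zeros(4)     # initial state guess
        self.P = p0 * np.eye(4)  # large initial uncertainty

    def predict(self):
        """Propagate the state one frame ahead; return the predicted position."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.H @ self.s

    def update(self, z):
        """Correct the state with a measured position z; return the filtered position."""
        innov = z - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = np.linalg.solve(S, self.H @ self.P).T   # Kalman gain P H^T S^-1
        self.s = self.s + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.H @ self.s

# Usage: noise-free positions of an object moving with velocity (1, 0.5)
kf = ConstantVelocityKalman()
velocity = np.array([1.0, 0.5])
for k in range(1, 51):
    kf.predict()
    estimate = kf.update(velocity * k)
next_position = kf.predict()   # where the object is expected in the next frame
```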