Neural Networks

Publication date: 2021-10-01
Volume: 142, Pages: 661-679
Publisher: Elsevier

Authors:

Tonin, Francesco; Patrinos, Panagiotis; Suykens, Johan

Keywords:

Science & Technology, Technology, Life Sciences & Biomedicine, Computer Science, Artificial Intelligence, Neurosciences, Computer Science, Neurosciences & Neurology, Kernel methods, Unsupervised learning, Manifold learning, Learning disentangled representations, COMPONENT ANALYSIS, SUBOPTIMAL SOLUTIONS, BOLTZMANN MACHINES, PCA, Algorithms, Reproducibility of Results, Unsupervised Machine Learning, Artificial Intelligence & Image Processing, 4602 Artificial intelligence, 4611 Machine learning, 4905 Statistics

Abstract:

We introduce Constr-DRKM, a deep kernel method for the unsupervised learning of disentangled data representations. We propose augmenting the original deep restricted kernel machine formulation for kernel PCA with orthogonality constraints on the latent variables, both to promote disentanglement and to make it possible to carry out optimization without first defining a stabilized objective. After discussing a number of algorithms for end-to-end training, we quantitatively evaluate the proposed method's effectiveness in disentangled feature learning. On four benchmark datasets, we demonstrate that this approach performs similarly overall to β-VAE on several disentanglement metrics when few training points are available, while being less sensitive to randomness and hyperparameter selection than β-VAE. We also present a deterministic initialization of Constr-DRKM's training algorithm that significantly improves the reproducibility of the results. Finally, we empirically evaluate and discuss the role of the number of layers in the proposed methodology, examining the influence of each principal component in every layer. We show that components in lower layers act as local feature detectors capturing the broad trends of the data distribution, while components in deeper layers build on the representation learned by previous layers and more accurately reproduce higher-level features.
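
Illustrative example:

As a rough intuition for the constrained formulation described in the abstract, the following minimal Python sketch poses a single "layer" of kernel PCA as an optimization over latent variables H under the orthogonality constraint H^T H = I, enforced here by a QR retraction. This is not the authors' Constr-DRKM implementation: the RBF kernel choice, the names rbf_kernel and kpca_orthogonal, and the step size and iteration count are all illustrative assumptions.

import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel; gamma is an arbitrary choice here.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kpca_orthogonal(K, n_components=2, lr=1e-2, n_iter=500, seed=0):
    # Maximize trace(H^T K H) subject to H^T H = I, the classical variational
    # characterization of kernel PCA. Gradient ascent followed by a QR
    # re-orthonormalization keeps H on the Stiefel manifold (a hypothetical
    # optimizer for illustration, not the paper's training algorithm).
    rng = np.random.default_rng(seed)
    H, _ = np.linalg.qr(rng.standard_normal((K.shape[0], n_components)))
    for _ in range(n_iter):
        H = H + lr * (K @ H)    # gradient of trace(H^T K H) is 2*K@H; the 2 is folded into lr
        H, _ = np.linalg.qr(H)  # retraction: re-orthonormalize the columns of H
    return H

X = np.random.default_rng(0).standard_normal((100, 5))
H = kpca_orthogonal(rbf_kernel(X - X.mean(axis=0)), n_components=3)
print(np.round(H.T @ H, 6))     # ~ identity matrix: the orthogonality constraint holds

In Constr-DRKM, analogous constraints are imposed on the latent variables of every layer, which, per the abstract, both promotes disentanglement and removes the need to define a stabilized objective before end-to-end training; a deterministic initialization, rather than the random seed used in this sketch, is what the authors propose to improve reproducibility.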