
Change Point Detection based on Time-invariant Modeling of Time Series

Publication date: 2024-11-28

Author:

Cao, Zhenxiang
Bertrand, Alexander ; De Vos, Maarten

Keywords:

STADIUS-24-165

Abstract:

Time series data, comprising sequential measurements of one or more variables over time, offer significant insights into the dynamics of generative systems when segmented based on statistical properties. Change Point Detection (CPD) refers to identifying abrupt changes in statistics within these sequences, which often indicate transitions between different underlying states. Given CPD's crucial role as a pre-processing step in various applications, numerous algorithms have been developed over the past several decades. However, many existing algorithms are tailored to specific datasets and typically require prior knowledge of the application. Even when methods claim broad applicability, their simple assumptions (such as tracking changes solely through the similarity of coefficients of a pre-defined generative model) often limit their detection accuracy and practical adoption. This thesis aims to address these challenges by proposing generic unsupervised algorithms for CPD that can be effectively applied across diverse applications. Our approach begins with an autoencoder-based CPD model known as time-invariant representation (TIRE). While TIRE has demonstrated superior performance compared to many state-of-the-art CPD methods, it has several limitations: 1) its time-invariant loss does not effectively prevent the leakage of information between time-invariant and time-variant features, resulting in reduced detection accuracy; 2) its numerous tunable hyperparameters complicate practical use; and 3) the original TIRE model was designed for low-dimensional time series data, making it unsuitable for multi-channel data. To overcome these issues, we propose a new loss function, termed "diamond loss", to replace the combination of reconstruction loss and time-invariant loss.
This new loss function imposes constraints on time-invariant and time-variant features, enhancing their separation and eliminating the need for a trade-off hyperparameter between the two loss terms. Additionally, we develop a multi-view TIRE structure that can automatically preserve CPD-related information from either the time domain, frequency domain, or both, within the time-invariant features without requiring domain-specific knowledge or compromising detection accuracy. Furthermore, we design a new multi-channel TIRE model that explicitly incorporates cross-channel information, making it more suitable for multi-channel time series datasets. In addition to our work on unsupervised CPD algorithms, we explore CPD in a semi-supervised setting. Unlike unsupervised CPD algorithms, which often produce results that diverge significantly from user expectations, semi-supervised CPD algorithms can focus on detecting application-relevant change points while ignoring false positives or irrelevant changes. In this context, we introduce an algorithm called active-learning change point detection (ALCPD). ALCPD consists of a semi-supervised TIRE model that continuously proposes newly detected change points and a random forest model that aims to remove false-positive samples from the new candidate change point sets. These two models are trained alternately, with feedback from the end-user. Our experimental results demonstrate that user feedback significantly enhances detection accuracy. We then propose an interactive change point detection (ICPD) algorithm, where we replace the semi-supervised TIRE model and the random forest model in the ALCPD method with a one-class support vector machine (OCSVM) model. This modification stabilizes the training process by preventing counteraction between the two models in the ALCPD method, allowing the ICPD model to converge quickly after collecting a sufficient number of queries. 
Through these contributions, this thesis establishes a comprehensive suite of CPD tools that can be readily applied across different settings and scenarios. Our extensive quantitative experiments on a variety of simulated and real-life datasets validate the advantages of these newly proposed methods over existing state-of-the-art CPD approaches. The findings of this research advance the field of time series analysis by providing more robust, accurate, and versatile CPD techniques, enhancing their applicability across diverse domains. This work not only addresses the limitations of current CPD methodologies but also paves the way for future research and development in adaptive and generalizable time series segmentation techniques.
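
To make the unsupervised CPD setting described in the abstract concrete, the following is a minimal illustrative sketch, not the TIRE model or any other method from the thesis. It assumes a common window-based scheme: each sliding window is summarised by a feature vector, adjacent windows are compared, and peaks in the resulting dissimilarity curve are reported as change points. The hand-crafted (mean, standard deviation) features stand in for learned time-invariant features, and the function names, window length, and threshold are all hypothetical choices made for this example.

```python
import numpy as np

def window_features(x, win):
    """Summarise every sliding window of length `win` by (mean, std).
    Assumption: a hand-crafted stand-in for learned window features."""
    windows = np.lib.stride_tricks.sliding_window_view(x, win)
    return np.stack([windows.mean(axis=1), windows.std(axis=1)], axis=1)

def dissimilarity(x, win):
    """d[i] is the feature distance between x[i:i+win] and the adjacent,
    non-overlapping window x[i+win:i+2*win]."""
    feats = window_features(x, win)
    return np.linalg.norm(feats[win:] - feats[:-win], axis=1)

def detect_change_points(x, win=25, threshold=1.0):
    """Greedily pick the largest dissimilarity peaks above `threshold`,
    keeping selected peaks at least `win` samples apart."""
    d = dissimilarity(x, win)
    chosen = []
    for i in np.argsort(d)[::-1]:          # indices from largest d to smallest
        if d[i] < threshold:
            break
        if all(abs(i - j) >= win for j in chosen):
            chosen.append(int(i))
    # The boundary between the two compared windows lies at index i + win.
    return sorted(i + win for i in chosen)

# Synthetic example: the mean shifts from 0 to 3 at sample 200.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(3.0, 0.5, 200)])
change_points = detect_change_points(x)
print(change_points)
```

On this synthetic signal the sketch should report a single change point close to sample 200. A learned representation would replace `window_features`, while the dissimilarity and peak-picking steps could stay structurally the same.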