
Machine Learning for Time Series

Lecture 4: Data Enhancement and Preprocessings

Laurent Oudre
[Link]@[Link]

Master MVA
2023-2024



Contents
1. Problem statement
2. Denoising
2.1 Filtering
2.2 Sparse approximations
2.3 Low-rank approximations
2.4 Other techniques
3. Detrending
3.1 Least Square regression
3.2 Other approaches
4. Interpolation of missing samples
4.1 Polynomial interpolation
4.2 Low-rank interpolation
4.3 Model-based interpolation
5. Outlier removal
5.1 Isolated samples
5.2 Contiguous samples



Problem statement

Contents

1. Problem statement

2. Denoising

3. Detrending

4. Interpolation of missing samples

5. Outlier removal



Problem statement

The need for preprocessing

▶ Typical use case: noisy time series with outliers and missing values
▶ In order to apply ML algorithms, the data scientist needs to clean and consolidate the data
▶ Time-consuming and tedious task: fortunately, ML also provides tools to that end!
▶ Careful! All these preprocessings have a strong impact on the expected results and on the future learned rules!



Problem statement

Introductory example

ECG signal during general anesthesia



Problem statement

Introductory example

Presence of measurement noise → Denoising



Problem statement

Introductory example

Presence of a trend → Detrending



Problem statement

Introductory example

Data loss causing missing samples → Interpolation



Problem statement

Introductory example

Presence of outliers → Outlier removal and suppression of impulsive noise



Problem statement

Introductory example

Break in stationarity → Change-point detection (see Lecture 5)



Problem statement

Introductory example

[Figure: preprocessed ECG (µV) between 30 and 35 s]

When all preprocessings have been performed, it becomes possible to retrieve the
heartbeats and thus to perform ML



Denoising

Contents

1. Problem statement

2. Denoising
2.1 Filtering
2.2 Sparse approximations
2.3 Low-rank approximations
2.4 Other techniques

3. Detrending

4. Interpolation of missing samples

5. Outlier removal



Denoising

Additive white Gaussian noise (AWGN) model


The most common model for noisy signals is

y[n] = x[n] + b[n]

▶ x[n] is the clean (unknown) signal


▶ b[n] is the measurement noise, assumed to be additive, white and Gaussian
(AWGN)
▶ y[n] is the measured signal
▶ x[n] and b[n] are uncorrelated

Denoising

Given a noisy signal y[n] corrupted by AWGN, retrieve the clean signal x[n]

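As a minimal sketch of this model (the synthetic signal and its parameters are assumptions, not data from the lecture), the AWGN corruption can be simulated as follows:

import numpy as np

# Sketch of the AWGN model y[n] = x[n] + b[n] on an assumed synthetic signal
rng = np.random.default_rng(0)
N, Ts = 1500, 60.0                         # number of samples and sampling period (assumed)
n = np.arange(N)
x = np.sin(2 * np.pi * 2e-4 * n * Ts)      # clean (unknown) signal x[n]
sigma2 = 0.01
b = rng.normal(0.0, np.sqrt(sigma2), N)    # white Gaussian noise, b[n] ~ N(0, sigma^2)
y = x + b                                  # measured signal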


Denoising

Notion of AWGN

An AWGN b[n] is:


▶ Additive: the noise therefore corrupts all the samples
▶ White: stationary process with zero mean, all samples being pairwise uncorrelated:
γ_b[m] = σ² if m = 0, and 0 otherwise
▶ Gaussian: all samples are i.i.d. according to

b[n] ∼ N (0, σ 2 )



Denoising

Example

[Figure: electricity consumption data (usage vs. minutes), clean and noisy versions]

How can we remove the noise component?



Denoising Filtering

Filtering

▶ The first solution consists in using results from signal processing and statistics
▶ Knowing that γx [m] = E [x[n]x[n + m]] and using the fact that x[n] and b[n]
are uncorrelated, we get that

γy [m] = γx [m] + γb [m]

▶ By computing the DFT of this equation, we have

|Y [k]|2 = |X [k]|2 + Nσ 2

▶ Adding AWGN is equivalent to adding a constant to the power spectrum of the signal (in linear scale)



Denoising Filtering

Example
[Figure: log-spectrum 10 log10(|X[k]|²/N) in dB of the original and noisy signals vs. frequency (Hz)]

In the frequency band where only AWGN is present (here with σ² = 0.01), the log-spectrum is equal to

10 log10(|Y[k]|²/N) = 10 log10(|X[k]|²/N + σ²) = 10 log10(0.01) = −20 dB
Denoising Filtering

Example
[Figure: log-spectrum of the noisy signal with the noise level 10 log10(σ²) overlaid, vs. frequency (Hz)]

By plotting the log-spectrum of the noisy signal and knowing the noise variance σ², one can guess that all frequencies greater than e.g. 0.001 Hz are likely to contain only noise.



Denoising Filtering

Filter design

▶ By observing the log-spectrum of the noisy signal and using either prior
knowledge on the original signal bandwidth or on the noise level, we can
determine the type of filter and associated cut-off frequencies that can be used
for denoising
▶ From there, it is just a matter of digital filter design (out of scope for this course!). Two popular solutions:
▶ Moving average filter of length L:

x̂[n] = (1/L) Σ_{k=0}^{L−1} y[n − k]

Low-pass filter with cut-off frequency fc ≈ 0.442947 × Fs / √(L² − 1)
▶ Butterworth filters: can be low-pass, band-pass, etc.



Denoising Filtering

Example

[Figure: noisy signal and low-pass denoised signal (left), original signal and denoised signal (right), usage vs. minutes]

Low-pass filtering (Butterworth filter of order 4) with fc = 0.001 Hz

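A possible implementation of this kind of Butterworth low-pass denoising, sketched with scipy; the sampling rate and test signal below are assumptions, only the filter order and cut-off frequency follow the example:

import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_denoise(y, fc, fs, order=4):
    # Design a digital Butterworth low-pass filter and apply it forward-backward
    # (zero-phase filtering), so the denoised signal is not delayed.
    b, a = butter(order, fc, btype="low", fs=fs)
    return filtfilt(b, a, y)

# Assumed setting: one sample per minute and a cut-off frequency of 0.001 Hz
fs = 1.0 / 60.0
rng = np.random.default_rng(0)
t = np.arange(1440) / fs
x_clean = np.sin(2 * np.pi * 1e-4 * t)          # assumed slow clean component
y = x_clean + rng.normal(0.0, 0.1, t.size)      # AWGN with sigma = 0.1
x_hat = lowpass_denoise(y, fc=1e-3, fs=fs, order=4)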


Denoising Sparse approximations

Filtering vs. sparsity


▶ As such, filtering a signal consists in picking the frequencies that we want to keep
▶ Instead of designing a filter, we can attempt to retrieve a sparse frequency representation of the signal, which is equivalent to removing the small values of the spectrum
▶ Assumption:
Large Fourier coefficients → Signal
Small Fourier coefficients → Noise
▶ Principle of data compression: thresholding of small values in an appropriate representation space

x̂ = Σ_{k∈K} z_k d_k, with |K| < N



Denoising Sparse approximations

Dictionaries

Several dictionaries can be used for denoising [Rubinstein et al., 2010]


▶ Fourier dictionary (also called Discrete Cosine Transform for real signals):

d_k[n] = 1/√N for k = 0, and d_k[n] = √(2/N) cos((π/N)(n + 1/2) k) for 1 ≤ k ≤ N − 1

▶ Wavelet dictionary with wavelet function ψ(t) and scaling function ϕ(t) [Percival et al., 2000; Mallat, 1999]

ϕ_{m,l}[n] = 2^{−m/2} ϕ(2^{−m} n − l),   ψ_{m,l}[n] = 2^{−m/2} ψ(2^{−m} n − l)

The dictionary is often computed up to level jmax:

▶ N × 2^{−j} wavelet functions ψ_{j,l} at level 1 ≤ j ≤ jmax with l a multiple of 2^j: details
▶ N × 2^{−jmax} scaling functions ϕ_{jmax,l} with l a multiple of 2^{jmax}: approximation
▶ Gabor dictionary, Modified Discrete Cosine Transform (MDCT)…



Denoising Sparse approximations

Sparse coding
Given an input dictionary D, the denoising task is equivalent to a sparse coding task, and all previously seen algorithms can be used to that end (see Lecture 3)
▶ ℓ0-based algorithms with hard thresholding:

z* = argmin_{z : ∥z∥₀ = K₀} ∥x − Dz∥₂²

Only keep the K₀ largest coefficients in the decomposition
▶ ℓ1-based algorithms with soft thresholding:

z* = argmin_z ∥x − Dz∥₂² + λ ∥z∥₁

Set to zero the coefficients that are lower than a given threshold (and shrink the other ones):

S_λ(z) = sign(z) × max(|z| − λ, 0)
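As an illustration, a hard/soft-thresholding sketch in an orthonormal DCT (real Fourier) dictionary, where sparse coding reduces to thresholding the transform coefficients; the function names and the choice of scipy's DCT are assumptions:

import numpy as np
from scipy.fft import dct, idct

def dct_hard_threshold(y, K0):
    # Hard thresholding: keep only the K0 largest-magnitude DCT coefficients
    z = dct(y, norm="ortho")
    keep = np.argsort(np.abs(z))[-K0:]
    z_sparse = np.zeros_like(z)
    z_sparse[keep] = z[keep]
    return idct(z_sparse, norm="ortho")      # e.g. x_hat = dct_hard_threshold(y, K0=40)

def soft_threshold(z, lam):
    # Soft-thresholding operator S_lambda(z) = sign(z) * max(|z| - lambda, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)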
Denoising Sparse approximations

Example

[Figure: noisy signal and ℓ0-denoised signal (left), original signal and denoised signal with the sparse Fourier representation (right), usage vs. minutes]

Matching pursuit with Fourier dictionary and K0 = 40



Denoising Sparse approximations

Example

[Figure: distances as a function of the number of kept coefficients K0]

Influence of the K0 parameter on the denoising performance

Blue: distance to the noisy signal, red: distance to the clean signal



Denoising Sparse approximations

How to set K0 or λ

▶ The parameters depend on the dictionary used (in particular on the orthogonality properties of the atoms) and on the algorithm used
▶ Heuristics:
▶ Use a training set
▶ Use for λ a certain percentage of λmax = ∥Dᵗ X∥∞
▶ Empirical observation of the distribution of activations can also be used to choose the parameters (find an elbow on the curve)
▶ Stochastic strategies: divide the overcomplete dictionary into several small dictionaries and average the decompositions made on these dictionaries
▶ Statistics (often requires a probabilistic model for the data and/or noise):
▶ Some statistical results on estimators can be used to get an idea of the range of relevant parameters (Stein's Unbiased Risk Estimator (SURE), minimax criteria): see next slides for an example
▶ Model selection strategies can also be used (see Lecture 5 and Tutorial session 3)



Denoising Sparse approximations

How to determine the stopping criterion?

▶ Example for the Matching Pursuit algorithm. For greedy denoising approaches (such as Matching Pursuit), a good denoising strategy would be to stop once the atoms in the dictionary have captured all relevant information on the signal, i.e. when the residual is composed of pure noise
▶ An interesting measure is the normalized coherence between the signal x and the dictionary D:

λ_D(x) = max_{d∈D} |⟨x, d⟩| / ∥x∥₂



Denoising Sparse approximations

Stopping criteria for matching pursuit

▶ By denoting r(ℓ) the residual at iteration ℓ,

r(ℓ) = r(ℓ−1) − ⟨r(ℓ−1), d*⟩ d*

where d* is the atom most correlated with r(ℓ−1)
▶ Basic calculations give that

∥r(ℓ)∥₂² / ∥r(ℓ−1)∥₂² = 1 − λ_D²(r(ℓ−1))

▶ The decrease of the ℓ2 norm of the residual is therefore linked to the normalized coherence of the residual with the dictionary
▶ If λ_D(r(ℓ−1)) is large, it is worth continuing
▶ If λ_D(r(ℓ−1)) becomes too small, the algorithm can stop
▶ When can we say that the coherence becomes too low?



Denoising Sparse approximations

Stopping criteria for matching pursuit

▶ One interesting question is therefore: what is the value of λ_D(r) when the residual r is pure noise?
▶ If r is purely random with a known distribution p(r) (e.g. AWGN), we can be interested in the quantity

λ_{p(r)}(D) = E_{p(r)}[λ_D(r)]

▶ Intuitively, denoising for x can then be achieved by stopping when the normalized coherence of the residual has the same order of magnitude as this value
▶ How to compute λp(r) (D) ?
▶ Use a training set of noise signals
▶ Use statistical considerations with parametrized distribution (see mini-project)

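A minimal matching-pursuit sketch using this coherence-based stopping rule; it assumes a dictionary D with unit-norm atoms as columns and a user-chosen coherence threshold (both are assumptions, not the lecture's reference settings):

import numpy as np

def matching_pursuit(x, D, coherence_min, max_iter=500):
    # Greedy matching pursuit stopped when the normalized coherence of the residual
    # with the dictionary falls below coherence_min (residual looks like pure noise).
    r = x.copy()
    z = np.zeros(D.shape[1])
    for _ in range(max_iter):
        corr = D.T @ r                                   # correlations <r, d> for all atoms
        coherence = np.max(np.abs(corr)) / np.linalg.norm(r)
        if coherence < coherence_min:
            break
        k = np.argmax(np.abs(corr))                      # most correlated atom d*
        z[k] += corr[k]                                  # update its coefficient
        r = r - corr[k] * D[:, k]                        # update the residual
    return z, r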


Denoising Sparse approximations

Use of adaptive dictionaries

▶ Instead of using off-the-shelf dictionaries, we can learn the representation


directly from the signal: dictionary learning (see Lecture 3)
▶ Use of the trajectory matrix X: matrix representation of the input signal
frames
▶ Noise is random: when sparsity is enforced, the approximation tends to only
model signal



Denoising Sparse approximations

Trajectory matrix

Nw: window length, No: overlap length

Nw rows, Nf = ⌊(N − Nw)/(Nw − No)⌋ + 1 columns



Denoising Sparse approximations

Dictionary learning

With algorithms already described in Lecture 3, compute an approximation of the


trajectory matrix

X̂ = DZ ≈ X

▶ D ∈ RNw ×K : dictionary composed of K atoms


▶ Z ∈ RK ×Nf : sparse activations (sparsity level specified with K0 or λ)

Each frame is approximated as a sparse linear combination of the learned atoms



Denoising Sparse approximations

Reconstruction from the approximated trajectory matrix

Unfolding of the matrix and averaging along overlapping frames

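A sketch of the trajectory-matrix construction and of the unfolding/averaging reconstruction described above (the function names are assumptions):

import numpy as np

def trajectory_matrix(x, Nw, No):
    # Stack overlapping frames of length Nw with overlap No as columns (Nw x Nf)
    hop = Nw - No
    Nf = (len(x) - Nw) // hop + 1
    return np.stack([x[j * hop : j * hop + Nw] for j in range(Nf)], axis=1)

def reconstruct(X_hat, N, No):
    # Unfold an (approximated) trajectory matrix and average the overlapping frames
    Nw, Nf = X_hat.shape
    hop = Nw - No
    acc = np.zeros(N)
    count = np.zeros(N)
    for j in range(Nf):
        acc[j * hop : j * hop + Nw] += X_hat[:, j]
        count[j * hop : j * hop + Nw] += 1
    count[count == 0] = 1          # samples not covered by any frame are left at 0
    return acc / count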


Denoising Sparse approximations

Example

[Figure: noisy signal and denoised signal with dictionary learning (left), original signal and denoised signal (right), usage vs. minutes]

Dictionary learning with K = 5, K0 = 2, Nw = 32, No = 28



Denoising Low-rank approximations

Trajectory matrix

▶ When No = Nw − 1, we have Nf = N − Nw + 1 and the trajectory matrix X ∈ R^{Nw×Nf} has a particular form:

X = [ x[0]       ···  x[N − Nw]
      x[1]       ···  x[N − Nw + 1]
       ⋮          ⋱        ⋮
      x[Nw − 1]  ···  x[N − 1] ]

▶ It contains all Nf sequences of length Nw in the time series
▶ Low-rank approximations attempt to reconstruct the matrix X as the sum of K < min(Nw, Nf) rank-one matrices



Denoising Low-rank approximations

Singular Value Decomposition

Assuming that Nw < Nf, the Singular Value Decomposition (SVD) of the matrix X writes:

X = U Λ Vᵗ   with U ∈ R^{Nw×Nw}, Λ ∈ R^{Nw×Nf}, V ∈ R^{Nf×Nf}

where
▶ U and V are orthogonal matrices
▶ Λ is a diagonal matrix containing on its main diagonal at most Nw singular values λ1 ≥ … ≥ λNw

X = Σ_{k=1}^{Nw} λ_k u_k v_kᵗ



Denoising Low-rank approximations

Interpretation of the singular values

▶ For a zero-mean stationary signal, the lag-covariance matrix for lag Nw can be estimated as:

C_X = (1/Nf) X Xᵗ

▶ Positive definite matrix with eigendecomposition

C_X = Ṽ Λ̃ Ṽᵗ

λ̃_k corresponds to the contribution of the direction given by the k-th eigenvector to the global variance (see Lecture 2 on Principal Component Analysis (PCA))
▶ Basic computations give that

λ_k ∝ √λ̃_k

which provides a natural interpretation of the singular values of the trajectory matrix



Denoising Low-rank approximations

Singular Spectrum Analysis (SSA)

This principle is the core of the Singular Spectrum Analysis (SSA) algorithm [Vautard et al., 1992]:
1. Compute the SVD of the trajectory matrix

X = Σ_{k=1}^{Nw} λ_k u_k v_kᵗ

2. By analyzing the singular value distribution, form groups K1, K2, …, KM of singular values corresponding to similar phenomena

X ≈ Σ_{k∈K1} λ_k u_k v_kᵗ + … + Σ_{k∈KM} λ_k u_k v_kᵗ



Denoising Low-rank approximations

Using SSA for denoising

▶ Intuitively, for a reasonable signal-to-noise ratio, the signal should be dominant and thus correspond to the largest singular values
▶ By plotting the singular values λk as a function of k, it is possible to detect and group the different phenomena within the time series
▶ For denoising, it is common to remove all components corresponding to small singular values
▶ Choice of Nw (only parameter): longest periodicity captured by SSA

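A compact, self-contained SSA denoising sketch (with No = Nw − 1, so the reconstruction reduces to diagonal averaging); the default values of Nw and of the number of kept components are assumptions:

import numpy as np

def ssa_denoise(x, Nw=32, n_components=2):
    # Keep the leading singular components of the trajectory matrix, then
    # reconstruct the series by diagonal averaging.
    N = len(x)
    Nf = N - Nw + 1
    X = np.stack([x[j : j + Nw] for j in range(Nf)], axis=1)   # Nw x Nf trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components, :]
    # Each sample is the average of all entries X_hat[i, j] with i + j = n
    acc = np.zeros(N)
    count = np.zeros(N)
    for j in range(Nf):
        acc[j : j + Nw] += X_hat[:, j]
        count[j : j + Nw] += 1
    return acc / count

On the electricity example, a call such as ssa_denoise(y, Nw=32, n_components=2) would mimic the setting of the next slides (assuming y holds the noisy series).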


Denoising Low-rank approximations

Example

[Figure: singular values λk as a function of k, and the first reconstructed components]

Singular values and reconstructed components with Nw = 32. From the graphs, it appears that the first two singular values are likely to correspond to signal.



Denoising Low-rank approximations

Example

[Figure: noisy signal and SSA-denoised signal (left), original signal and SSA-denoised signal (right), usage vs. minutes]

Denoising with SSA with Nw = 32, using only the first two components.



Denoising Other techniques

Other techniques

Several other decomposition techniques can be used:

▶ Independent Component Analysis (ICA): decompose the signal into the sum of statistically independent components [Comon, 1994]
▶ Useful for blind source separation and unmixing (e.g. in EEG data or audio)
▶ Algorithms based on the optimization of several measures of independence (mutual information, Gaussianity, etc.)
▶ Empirical Mode Decomposition (EMD): decompose the signal into the sum of oscillatory modes with various amplitudes and frequencies [Flandrin et al., 2004; Boudraa et al., 2006]
▶ Useful for denoising but also detrending
▶ Algorithms based on the iterative modeling of the signal as splines



Detrending

Contents

1. Problem statement

2. Denoising

3. Detrending
3.1 Least Square regression
3.2 Other approaches

4. Interpolation of missing samples

5. Outlier removal



Detrending

Trend+Seasonality model

The trend+seasonality model writes as

x[n] = x_trend[n] + x_seasonality[n] + b[n]

with x_trend[n] = α1 β1(nTs) + … + αj βj(nTs) and x_seasonality[n] = α_{j+1} β_{j+1}(nTs) + … + α_d β_d(nTs)

▶ Seasonality: pseudo-periodic component
▶ Trend: smooth variations, systematic increase or decrease in the data

Detrending

Given a signal x[n], estimate and remove the trend component x_trend[n]



Detrending Least Square regression

Standard models

The most common trend models are:


▶ Constant trend
x_trend[n] = α0
▶ Linear trend
x_trend[n] = α1 (nTs) + α0
▶ Polynomial trend
x_trend[n] = Σ_{k=0}^{K} α_k (nTs)^k



Detrending Least Square regression

Least-square regression
▶ Least-square estimator: minimization of

∥x − βα∥²

where

β = [ β0(0)            ···  βK(0)
      β0(Ts)           ···  βK(Ts)
        ⋮               ⋱     ⋮
      β0((N − 1)Ts)    ···  βK((N − 1)Ts) ]

▶ Closed-form solution

α̂ = (βᵀβ)⁻¹ βᵀ x

▶ Estimation of the trend

x_trend = β α̂

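A least-squares polynomial detrending sketch with numpy (the default order and sampling period are assumptions):

import numpy as np

def detrend_polynomial(x, Ts=1.0, order=3):
    # Fit a polynomial trend by least squares and return (detrended signal, trend)
    t = np.arange(len(x)) * Ts
    B = np.vander(t, order + 1, increasing=True)    # regression matrix with columns t^k
    alpha, *_ = np.linalg.lstsq(B, x, rcond=None)   # alpha_hat = (B^T B)^-1 B^T x
    trend = B @ alpha
    return x - trend, trend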


Detrending Least Square regression

Example


Signal with/without trend



Detrending Least Square regression

Example


Regression on polynomials of order 3



Detrending Least Square regression

Example

[Figure: original signal and detrended signal, vs. time (s)]

Regression on polynomials of order 3



Detrending Other approaches

Other approaches

Other approaches for detrending include


▶ Filtering techniques, as trends often correspond to low frequencies or smooth components (low-pass/band-pass filters, Fourier or wavelet thresholding…)
▶ Decomposition techniques, as trends may be considered independent of the seasonality and/or the noise component (EMD, SSA, ICA…)



Interpolation of missing samples

Contents

1. Problem statement

2. Denoising

3. Detrending

4. Interpolation of missing samples


4.1 Polynomial interpolation
4.2 Low-rank interpolation
4.3 Model-based interpolation

5. Outlier removal



Interpolation of missing samples

Interpolation of missing samples


[Figure: electricity consumption data (usage vs. minutes), complete and with missing samples]

Interpolation of missing samples

Given a signal x and a set of missing samples T, estimate the missing samples x̂_T



Interpolation of missing samples

Interpolation of missing samples

▶ Missing data are very frequent:
▶ Sensor malfunctions
▶ Clipping effects
▶ Corrupted samples
▶ Missing data can take several forms
▶ Isolated samples: easy to handle
▶ Contiguous samples (up to 100): requires a full reconstruction
▶ Interpolation includes prediction and inpainting [Lepot et al., 2017]



Interpolation of missing samples Polynomial interpolation

Polynomial interpolation
Given a time series x that we want to interpolate on the integer set T = {n_start, …, n_end}, the easiest interpolation strategy consists in using polynomial models for the reconstruction
▶ Constant value

∀n ∈ T, x̂[n] = (x[n_start − 1] + x[n_end + 1]) / 2

▶ Linear interpolation

∀n ∈ T, x̂[n] = β1 n + β0

where β0, β1 are determined from the values x[n_start − 1] and x[n_end + 1]
▶ Cubic spline interpolation [McKinley et al., 1998]

∀n ∈ T, x̂[n] = β3 n³ + β2 n² + β1 n + β0

where the βk are determined by solving a system of equations based on x[n_start − 2], x[n_start − 1], x[n_end + 1] and x[n_end + 2]

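A sketch of these three strategies for a single missing segment; scipy's CubicSpline is used on the four neighbouring samples instead of solving the system of equations by hand (the function name and its default behaviour are assumptions):

import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_gap(x, n_start, n_end, kind="linear"):
    # Fill x[n_start..n_end] (inclusive) in place; x is assumed to be a float array
    gap = np.arange(n_start, n_end + 1)
    if kind == "constant":
        x[gap] = 0.5 * (x[n_start - 1] + x[n_end + 1])
    elif kind == "linear":
        x[gap] = np.interp(gap, [n_start - 1, n_end + 1], [x[n_start - 1], x[n_end + 1]])
    else:  # cubic: fit on two known samples on each side of the gap
        known = np.array([n_start - 2, n_start - 1, n_end + 1, n_end + 2])
        x[gap] = CubicSpline(known, x[known])(gap)
    return x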


Interpolation of missing samples Polynomial interpolation

Example

[Figure: original signal with linear interpolation (left) and cubic spline interpolation (right) of the missing segment, vs. minutes]



Interpolation of missing samples Polynomial interpolation

Pros and cons

▶ Easy to implement and good results for small segments
▶ In particular, when there are only a few missing samples, a constant value is often the best choice
▶ When the degree of the polynomial increases, instabilities may occur (strong dependency on the neighborhood samples)
▶ When used extensively, may lead to a smoothing of the signal and hence a change in the spectrum (boosting of the low frequencies)



Interpolation of missing samples Low-rank interpolation

Low-rank interpolation

▶ The low-rank assumption on the trajectory matrix can also be used for reconstructing missing samples

X = [ x[0]       ···  x[N − Nw]
      x[1]       ···  x[N − Nw + 1]
       ⋮          ⋱        ⋮
      x[Nw − 1]  ···  x[N − 1] ]

▶ In this case, we will use the Singular Value Decomposition adapted to data
with missing values [Srebro et al., 2003]
▶ These techniques are efficient for medium-size missing patches, as the
low-rank assumption is usually only valid for relatively small windows



Interpolation of missing samples Low-rank interpolation

Principle

▶ The main idea is to compute a low-rank approximation of the trajectory matrix X ∈ R^{Nw×Nf}, where only the largest singular values are kept

X̂ = Σ_{k=1}^{K} λ_k u_k v_kᵗ

where K < min(Nw, Nf)
▶ But how do we compute the SVD of a matrix that contains missing values?
▶ Mask matrix:
W_{i,j} = 0 if X_{i,j} is missing, 1 otherwise
▶ The low-rank approximation will only be used to update the missing samples



Interpolation of missing samples Low-rank interpolation

Low-rank interpolation

Algorithm 1: Low-rank interpolation

Input:  Trajectory matrix X with missing values
        Mask matrix W
        Expected rank K
Output: Interpolated trajectory matrix X̂
Initialize X̂;
while niter < nmax do
    SVD computation;
    [U, Λ, V] = SVD(X ⊙ W + X̂ ⊙ (1 − W));
    Low-rank approximation;
    X̂ = Σ_{k=1}^{K} λ_k u_k v_kᵗ;
end

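A direct numpy transcription of Algorithm 1 (a sketch: the zero initialization of missing entries and the fixed number of iterations are assumptions):

import numpy as np

def lowrank_impute(X, W, K, n_max=50):
    # Iterative rank-K completion of a trajectory matrix X with mask W (1 = observed)
    X = np.where(W == 1, X, 0.0)          # assumption: missing entries initialized to 0
    X_hat = X.copy()
    for _ in range(n_max):
        # SVD of the matrix whose missing entries are taken from the current estimate
        U, s, Vt = np.linalg.svd(X * W + X_hat * (1 - W), full_matrices=False)
        X_hat = (U[:, :K] * s[:K]) @ Vt[:K, :]    # rank-K approximation
    return X * W + X_hat * (1 - W)        # observed entries kept, missing ones imputed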


Interpolation of missing samples Low-rank interpolation

Example

[Figure: original signal and low-rank approximation of the missing segment, vs. minutes]

Nw = 300, K = 10



Interpolation of missing samples Low-rank interpolation

Pros and cons

▶ Good results for medium-size segments
▶ Always set Nw greater than the largest missing patch: when Nw becomes too large, the low-rank approximation becomes less valid
▶ A variant with adaptive rank can provide better results [Srebro et al., 2003]

K_{niter} = max(min(Nw, Nf) − niter, K)

▶ The rank K can be estimated by computing the SVD of the initialized trajectory matrix (with a polynomial reconstruction for instance)



Interpolation of missing samples Model-based interpolation

Model-based interpolation

▶ For long segments of missing samples, interpolation becomes a full


reconstruction task
▶ In this case a model is necessary to obtain a satisfactory interpolation
1. Choice of an adequate model
2. Parameter inference from the known samples
3. Replacement of the missing samples by values in adequacy with the learned model



Interpolation of missing samples Model-based interpolation

Model-based interpolation

▶ Problem: how do we estimate the parameters from a time series with missing
data?
▶ Iterative solution
1. Initialization of the missing samples with simple rough estimates (set to zero,
constant or linear interpolation…)
2. Parameter inference from all samples
3. Reconstruction of the missing samples from the learned model
4. Repeat steps 2 and 3 until convergence



Interpolation of missing samples Model-based interpolation

AR-based interpolation

▶ For an AR(p) model, given estimates of the parameters â, the signal can be reconstructed by assuming that

x[n] ≈ − Σ_{i=1}^{p} â_i x[n − i]

▶ The prediction error on the whole time series writes

E(x) = Σ_{n=p}^{N−1} ( x[n] + Σ_{i=1}^{p} â_i x[n − i] )²

▶ The main idea is to minimize this quantity in order to retrieve appropriate values for the missing samples [Janssen et al., 1986]



Interpolation of missing samples Model-based interpolation

AR-based interpolation

x* = argmin_{x̃ : x̃[n] = x[n] ∀n ∉ T} E(x̃)

▶ This optimization problem has a closed-form solution (least-square estimates) obtained by rewriting E(x) as a sum of terms depending on the missing samples n ∈ T and others depending only on the known samples.
▶ By denoting x_T the set of missing samples, the equation rewrites

E(x) = x_Tᵀ B x_T + 2 x_Tᵀ d + C

where
▶ ∀(t, t′) ∈ T², b_{t,t′} = Σ_{l=0}^{p−|t−t′|} â_l â_{l+|t−t′|} if 0 ≤ |t − t′| ≤ p, and 0 otherwise
▶ ∀t ∈ T, d_t = Σ_{−p≤k≤p, t−k∉T} b_{|k|} x[t − k]
▶ C is a constant depending only on the known samples
▶ The final problem is simply a linear system, and thus easy to solve:

B x_T = −d
Interpolation of missing samples Model-based interpolation

AR-based interpolation

Algorithm 2: AR-based interpolation

Inputs: Time series x ∈ R^N with missing values
        Set of missing samples T
        AR model order p
Output: Interpolated samples x̂_T
x̂_T = 0_{|T|};
while niter < nmax do
    AR estimation step;
    Estimate â with the Levinson-Durbin algorithm;
    AR interpolation step;
    Compute B and d and solve for x̂_T;
    Set x_T = x̂_T;
end

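A numpy sketch of Algorithm 2 (one possible reading of the slides, not the lecture's reference code): the AR coefficients are estimated here by solving the Yule-Walker equations rather than by an explicit Levinson-Durbin recursion, and the convention â₀ = 1 is assumed when building B and d:

import numpy as np
from scipy.linalg import solve_toeplitz

def ar_interpolate(x, T, p, n_max=10):
    # Iterative AR(p)-based interpolation of the missing samples x[T] (sketch of Algorithm 2).
    # Missing samples are initialized to 0; x is assumed to be roughly zero-mean.
    x = x.astype(float).copy()
    x[T] = 0.0
    N = len(x)
    in_T = np.zeros(N, dtype=bool)
    in_T[T] = True
    for _ in range(n_max):
        # AR estimation step: Yule-Walker equations (instead of explicit Levinson-Durbin)
        r = np.array([x[: N - k] @ x[k:] for k in range(p + 1)]) / N
        a = np.concatenate(([1.0], solve_toeplitz(r[:p], -r[1 : p + 1])))   # a_0 = 1
        # AR interpolation step: build B and d, then solve B x_T = -d
        b = np.array([a[: p + 1 - j] @ a[j:] for j in range(p + 1)])        # b_j = sum_l a_l a_{l+j}
        B = np.zeros((len(T), len(T)))
        d = np.zeros(len(T))
        for i, t in enumerate(T):
            for j, tp in enumerate(T):
                if abs(t - tp) <= p:
                    B[i, j] = b[abs(t - tp)]
            for k in range(-p, p + 1):
                if 0 <= t - k < N and not in_T[t - k]:
                    d[i] += b[abs(k)] * x[t - k]
        x[T] = np.linalg.solve(B, -d)
    return x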


Interpolation of missing samples Model-based interpolation

Example

[Figure: original signal and AR(90) interpolation of the missing segment vs. minutes (left); RMSE vs. iteration (right)]

Interpolation with AR(90) model



Outlier removal

Contents

1. Problem statement

2. Denoising

3. Detrending

4. Interpolation of missing samples

5. Outlier removal
5.1 Isolated samples
5.2 Contiguous samples



Outlier removal

Outlier removal

[Figure: two signals corrupted by impulsive noise, vs. time (s)]

Outliers, also called impulsive noise (as opposed to AWGN), correspond to spurious samples (isolated or contiguous) that take unlikely values



Outlier removal

Outlier removal

Outlier removal

Given a signal x[n], outlier removal consists in detecting the locations T of the outliers (detection phase) and replacing these values with more adequate values (interpolation phase)

▶ The interpolation phase can be done using the previously described algorithms. We will therefore focus on the detection phase.
▶ Two settings: isolated samples or contiguous groups of samples
▶ Outliers are not only characterized by their values but also by their positions in the time series: context is fundamental



Outlier removal Isolated samples

Isolated samples


Impulsive noise that only corrupts isolated samples



Outlier removal Isolated samples

Histogram

[Figure: histogram of the sample values (count in % vs. value)]

If the values taken by the impulsive noise are particularly large with respect to the
signal, they can be detected by looking at the histogram of the values taken by the
samples: similar to outlier detection in statistical data



Outlier removal Isolated samples

Median filtering

▶ Outliers can be detected AND removed by using a sliding median filter that replaces each value by the median of the samples in a window of length 2w + 1:

x̂[n] = median_{−w ≤ i ≤ w} { x[n − i] }

▶ Median filtering smooths the time series while preserving the discontinuities
▶ Example: original signal [0.3 0.4 0.45] and noisy signal [0.3 0.9 0.45]
▶ Moving average filter: 0.9 → 0.55
▶ Median filter: 0.9 → 0.45

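A minimal sketch of the sliding median filter on this toy example, using scipy (the window length follows the slide; boundary handling is scipy's default zero-padding):

import numpy as np
from scipy.signal import medfilt

x_noisy = np.array([0.3, 0.9, 0.45])       # toy signal from the slide, middle sample corrupted
x_hat = medfilt(x_noisy, kernel_size=3)    # sliding median over 2w + 1 = 3 samples
# The corrupted sample 0.9 is replaced by median(0.3, 0.9, 0.45) = 0.45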


Outlier removal Isolated samples

Median filtering


Perfect reconstruction with median filtering (2w + 1 = 3 samples)



Outlier removal Contiguous samples

Contiguous samples


Impulsive noise that corrupts groups of contiguous samples



Outlier removal Contiguous samples

Contiguous samples

▶ When the impulsive noise corrupts groups of contiguous samples, studying the values alone is not sufficient
▶ In order to retrieve the set of outliers T, using a model may be necessary
▶ Outliers: samples that are far from their predicted values according to a model
▶ Same principle as model-based interpolation: parameter estimation, detection, interpolation, and reiterate
▶ Note: this task is close to the Anomaly Detection task (see Lecture 5)



Outlier removal Contiguous samples

AR-based outlier detection

x[n] = − Σ_{i=1}^{p} a_i x[n − i] + b[n]

▶ Given estimates of the AR parameters â, the prediction error writes:

e[n] = x[n] + Σ_{i=1}^{p} â_i x[n − i]

▶ If the model is adapted, the parameters well estimated and the noise variance low, this quantity must be rather small for samples that are not outliers [Oudre, 2015]
▶ Detection method with threshold λ:

T = {n s.t. |e[n]| > λ}

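A sketch of this detection step (the threshold λ and the way â is obtained are left to the user; the a₀ = 1 convention is assumed):

import numpy as np

def ar_outlier_detect(x, a_hat, lam):
    # Prediction error e[n] = x[n] + sum_i a_i x[n-i], computed by convolving x with [1, a_1, ..., a_p]
    a_full = np.concatenate(([1.0], a_hat))
    e = np.convolve(x, a_full)[: len(x)]
    e[: len(a_hat)] = 0.0                     # ignore the first p samples (no full AR context)
    return np.flatnonzero(np.abs(e) > lam)    # set T of detected outliers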


Outlier removal Contiguous samples

AR-based outlier detection


Detection with AR(10) model



Outlier removal Contiguous samples

AR-based outlier detection and removal

In order to perform both detection and removal of impulsive noise, alternate between:
1. Estimation step: learn the AR parameters from the current time series
2. Detection step: detect the set of outliers
3. Interpolation step: replace these outliers by appropriate values
4. Reiterate steps 1, 2, 3



Outlier removal Contiguous samples

AR-based outlier detection and removal

Algorithm 3: AR-based outlier detection and removal

Inputs: Time series x ∈ R^N with outliers
        AR model order p, threshold λ
Output: Denoised time series x̂ ∈ R^N
x̂ = x;
while niter < nmax do
    AR estimation step;
    Estimate â from x̂ with the Levinson-Durbin algorithm;
    Detection step;
    Compute e and set T = {n s.t. |e[n]| > λ};
    AR interpolation step;
    Compute B and d and solve for x̂_T;
end



Outlier removal Contiguous samples

Example


Iteration 1: Detection with AR(10) model



Outlier removal Contiguous samples

Example


Iteration 1: Interpolation with AR(10) model



Outlier removal Contiguous samples

Example


Iteration 2: Detection with AR(10) model



Outlier removal Contiguous samples

Example


Iteration 2: Interpolation with AR(10) model



Outlier removal Contiguous samples

References

▶ Rubinstein, R., Bruckstein, A. M., & Elad, M. (2010). Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6), 1045-1057.
▶ Percival, D. B., & Walden, A. T. (2000). Wavelet methods for time series analysis (Vol. 4). Cambridge university press.
▶ Mallat, S. (1999). A wavelet tour of signal processing. Elsevier.
▶ Vautard, R., Yiou, P., & Ghil, M. (1992). Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. Physica D: Nonlinear Phenomena, 58(1-4), 95-126.
▶ Flandrin, P., Goncalves, P., & Rilling, G. (2004, September). Detrending and denoising with empirical mode decompositions. In 2004 12th European Signal
Processing Conference (pp. 1581-1584). IEEE.
▶ Boudraa, A. O., & Cexus, J. C. (2006). Denoising via empirical mode decomposition. Proc. IEEE ISCCSP, 4(2006).
▶ Comon, P. (1994). Independent component analysis, a new concept?. Signal processing, 36(3), 287-314.
▶ Lepot, M., Aubin, J. B., & Clemens, F. H. (2017). Interpolation in time series: An introductive overview of existing methods, their performance criteria and
uncertainty assessment. Water, 9(10), 796.
▶ McKinley, S., & Levine, M. (1998). Cubic spline interpolation. College of the Redwoods, 45(1), 1049-1060.
▶ Srebro, N., & Jaakkola, T. (2003). Weighted low-rank approximations. In Proceedings of the 20th International Conference on Machine Learning (ICML-03)
(pp. 720-727).
▶ Janssen, A. J. E. M., Veldhuis, R., & Vries, L. (1986). Adaptive interpolation of discrete-time signals that can be modeled as autoregressive processes. IEEE
Transactions on Acoustics, Speech, and Signal Processing, 34(2), 317-330.
▶ Oudre, L. (2015). Automatic detection and removal of impulsive noise in audio signals. Image Processing On Line, 5, 267-281.



Outlier removal Contiguous samples

List of possible topics/projects


▶ Flandrin, P., Goncalves, P., & Rilling, G. (2004, September). Detrending and denoising with empirical mode
decompositions. In 2004 12th European Signal Processing Conference (pp. 1581-1584). IEEE.
How to use EMD for denoising and detrending.
▶ Rhif, M., Ben Abbes, A., Farah, I. R., Martínez, B., & Sang, Y. (2019). Wavelet transform application for/in
non-stationary time-series analysis: a review. Applied Sciences, 9(7), 1345.
How to use wavelets to work on non-stationary time series.
▶ Bayer, F. M., Kozakevicius, A. J., & Cintra, R. J. (2019). An iterative wavelet threshold for signal denoising. Signal
Processing, 162, 10-20.
How to use adaptive wavelet thresholding for denoising
▶ Moussallam, M., Gramfort, A., Daudet, L., & Richard, G. (2014). Blind denoising with random greedy pursuits.
IEEE Signal Processing Letters, 21(11), 1341-1345.
How to use statistical considerations to set the parameters in greedy denoising approaches
▶ Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for
sparse representation. IEEE Transactions on signal processing, 54(11), 4311-4322.
How to learn an overcomplete dictionary with K-SVD
▶ de Cheveigné, A., & Arzounian, D. (2018). Robust detrending, rereferencing, outlier detection, and inpainting for
multichannel data. Neuroimage, 172, 903-912.
How to combine detrending, outlier detection and removal for multichannel data
▶ Hassani, H., & Mahmoudvand, R. (2013). Multivariate singular spectrum analysis: A general view and new vector
forecasting approach. International Journal of Energy and Statistics, 1(01), 55-83.
How to use SSA for forecasting time series
▶ Adler, A., Emiya, V., Jafari, M. G., Elad, M., Gribonval, R., & Plumbley, M. D. (2011). Audio inpainting. IEEE
Transactions on Audio, Speech, and Language Processing, 20(3), 922-932.
How to use sparse representation to perform audio inpainting

