
Deep Learning Based Ensemble Approach For Probabilistic Wind Power Forecasting

This document presents a novel deep learning-based ensemble approach for probabilistic wind power forecasting, utilizing convolutional neural networks (CNN) and wavelet transform to enhance forecast accuracy. The proposed method effectively addresses uncertainties in wind power data by separately evaluating model misspecification and data noise. Extensive assessments using real wind farm data demonstrate the competitive performance and robustness of the approach in mitigating risks associated with wind power integration into power systems.


Applied Energy 188 (2017) 56–70

Contents lists available at ScienceDirect

Applied Energy
journal homepage: www.elsevier.com/locate/apenergy

Deep learning based ensemble approach for probabilistic wind power forecasting
Huai-zhi Wang a, Gang-qiang Li a, Gui-bin Wang b,⇑, Jian-chun Peng a, Hui Jiang c, Yi-tao Liu a
a The College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China
b Shenzhen Key Laboratory of Urban Rail Transit, The College of Urban Rail Transit, Shenzhen University, Shenzhen 518060, China
c The College of Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China

Highlights

• Convolutional neural network is designed for probabilistic wind power forecasting.
• Ensemble technique is used to cancel out the diverse errors of point forecasters.
• The model misspecification and data noise in wind power are separately evaluated.
• The competitive performance and robustness of the proposed method were proved.

Article info

Article history:
Received 9 September 2016
Received in revised form 8 November 2016
Accepted 26 November 2016
Available online 8 December 2016

Keywords:
Convolutional neural network
Ensemble
Probabilistic wind power forecast
Deep learning
Wavelet transform

Abstract

Due to the economic and environmental benefits, wind power is becoming one of the more promising supplements for electric power generation. However, the uncertainty exhibited in wind power data is generally unacceptably large. Thus, the data should be accurately evaluated by operators to effectively mitigate the risks of wind power on power system operations. Recognizing this challenge, a novel deep learning based ensemble approach is proposed for probabilistic wind power forecasting. In this approach, an advanced point forecasting method is originally proposed based on wavelet transform and convolutional neural network. Wavelet transform is used to decompose the raw wind power data into different frequencies. The nonlinear features in each frequency that are used to improve the forecast accuracy are later effectively learned by the convolutional neural network. The uncertainties in wind power data, i.e., the model misspecification and data noise, are separately identified thereafter. Consequently, the probabilistic distribution of wind power data can be statistically formulated. The proposed ensemble approach has been extensively assessed using real wind farm data from China, and the results demonstrate that the uncertainties in wind power data can be better learned using the proposed approach and that a competitive performance is obtained.
© 2016 Elsevier Ltd. All rights reserved.

⇑ Corresponding author.
E-mail addresses: [email protected] (H.-z. Wang), [email protected] (G.-q. Li), [email protected] (G.-b. Wang), [email protected] (J.-c. Peng), [email protected] (H. Jiang).

1. Introduction

Due to the continuous decrease in the storage capacity of fossil fuel, the energy crisis is becoming more significant than ever [1]. Therefore, to mitigate the energy crisis, regulatory acts that encourage the use of renewable energy have been promoted worldwide. Among the renewable energy resources, wind energy, as an alternative to fossil energy, has attracted much attention due to its beneficial impacts on climate change mitigation and environmental pollution reduction [2]. Coupled with its mature technology, wind energy has experienced an unexpected annual growth on a global scale. Wind energy can be used to drive engines directly and provide rural energy services. In [3], a novel mean flow acoustic engine with a cross-junction configuration was designed to convert wind energy in a pipeline into acoustic energy, and its efficiency was numerically analyzed in [4] by using the computational fluid dynamics method. In practice, wind energy is mainly utilized to mechanically power generators for electricity. The annual growth rate of worldwide wind power has been between 20% and 35% per year since 2000 [5]. However, due to the chaotic nature of the earth's atmosphere, wind generated power always exhibits nonlinear and non-stationary uncertainties, which pose great challenges for the management and operations of electric power and

http://dx.doi.org/10.1016/j.apenergy.2016.11.111
0306-2619/© 2016 Elsevier Ltd. All rights reserved.

Nomenclature

Abbreviations
ACE  average coverage error
BP  back-propagation algorithm
CNN  convolutional neural network
CRPS  continuous ranking probability score
DBM  deep restricted Boltzmann machine
IS  interval sharpness
MWWF  milky way wind farm
NN  neural network
PI  prediction interval
PINC  prediction interval nominal confidence
QR  quantile regression
SAE  stacked auto-encoder
SIWF  Shangchuan island wind farm
SVM  support vector machine
WPF  wind power forecasting
WT  wavelet transform

Symbols
$A_n$  wavelet approximation signal
$CDF_i$  cumulative distribution function at time step $i$
$D_n$  wavelet detail signal
$DS_{du}$  dataset used for data noise uncertainty evaluation
$DS_{mu}$  dataset used for model uncertainty evaluation
$E_m$  squared-error loss function considering $m$ batches
$GD$  Gaussian distribution
$H(\cdot)$  indicative function
$I_i^{\alpha}$  PI at time step $i$ given PINC = $100(1-\alpha)\%$
$L_h^{\alpha}$  lower bound of PI given target $h$ and PINC
$M_{du}$  mean of data noise uncertainty
$M_{\varepsilon}$  mean of the uncertainty signal $\varepsilon(x_i)$
$N_E$  number of ensembles
$N_M$  number of selected input maps
$N_S$  number of training samples
$U_h^{\alpha}$  upper bound of PI given target $h$ and PINC
$W$  CNN's weight matrix
$W_{con}^{l}$  weight matrix at $l$th convolution layer
$W_{log}^{L}$  weight matrix at $L$th logistic regression layer
$WS_i^{\alpha}$  wind speed at time step $i$ given PINC
$T$  length of the signal required to be decomposed
$b_j^l$  bias of $j$th output map at $l$th layer
$b$  CNN's bias matrix
$b_{con}^{l}$  bias matrix at $l$th convolution layer
$b_{log}^{L}$  bias matrix at $L$th logistic output layer
$c_j^l$  additive bias of $j$th output map at $l$th layer
$c$  CNN's additive bias matrix
$c_{sub}^{l}$  additive bias matrix at $l$th sub-sampling layer
$d$  output vector size of training samples
$\mathrm{down}(\cdot)$  down-sampling function
$f(\cdot)$  output activation function
$g(\cdot)$  signal required to be decomposed by wavelet
$h_j^i$  the $j$th target in $i$th training sample
$len$  length of a given map
$m$  mini-batch size of training sample
$r_i$  indicator of prediction interval coverage probability
$t$  discrete time step
$u^L$  output vector of the neurons in $(L-1)$ layer
$\mathrm{up}(\cdot)$  up-sampling function
$wid$  width of a given map
$w_{i,j}^{l}$  weight matrix at $l$th layer connecting the $i$th input map and $j$th output map
$x_i^l$  the $i$th input map at $l$th layer
$x^{L-1}$  the output of the neurons at $(L-1)$th layer
$x_{i,j,k}$  the $i$th input in $j$th input map at $k$th layer
$y_{i,j,k}$  the $i$th output in $j$th output map at $k$th layer
$\hat{y}(x_i)$  mean of the estimated model uncertainty
$y_j^i$  the $j$th output in $i$th training sample
$y_j^l$  the $j$th output map at $l$th layer
$z_{1-\alpha/2}$  critical value of a Gaussian distribution function
$\hat{y}_j(x_i)$  output of the $j$th deep CNN model
$\alpha_l$  confidence level parameter
$\beta$  CNN's multiplicative bias matrix
$\beta_j^l$  multiplicative bias of the $j$th output map at $l$th layer
$\beta_{sub}^{l}$  multiplicative bias matrix at $l$th sub-sampling layer
$\delta_i^{\alpha}$  width of the PI at time step $i$ given PINC
$\gamma_{len,wid}$  average filter parameter matrix with size $len \times wid$
$\eta$  learning rate
$\varepsilon(x_i)$  uncertainty given the input $x_i$
$\upsilon$  scaling variable
$\kappa$  translation variable
$\phi(\cdot)$  mother wavelet function
$\sigma_{du}^2$  variance of data noise uncertainty
$\sigma_h^2$  variance of total forecasting error
$\sigma_{mu}^2$  variance of model uncertainty
$\sigma_{\varepsilon}^2$  variance of the uncertainty signal $\varepsilon(x_i)$

energy systems. It is demonstrated in [6] that the impact of these uncertainties on power system operations can be, to a certain degree, mitigated via advanced WPF methods, which are considered to be the most promising solutions for the integration of a large amount of wind energy into power grids. Aimed at this task, three typical methodologies for WPF have been proposed in the literature, including physical modeling, statistical methods and soft-computing techniques.

Physical modeling methods try to establish an accurate mathematical model for WPF using various geographical and meteorological information. However, this type of approach may not be applicable for practical real-time prediction tasks due to the high calculation costs involved [7,8]. Statistical approaches, by contrast, seek an optimal relationship between future wind power and historical samples via error minimization. In [9], a generalized WPF model was proposed based on time-varying threshold autoregressive moving average, and the efficiency was numerically analyzed. In [10], a hybrid statistical approach in combination with empirical wavelet transform, partial auto-correlation function and Gaussian process regression was proposed. Simulation results indicate that the suggested approach, i.e., the generalized WPF model, performed the best among the three compared methods. In addition, soft-computing techniques, such as the artificial neural network [11,12] and Elman neural network [13], were utilized for WPF. In [14], a WPF model based on extreme learning machine was presented to evaluate wind power density. In [15], a multi-layer neural fuzzy network was proposed for hour-ahead WPF, and the model parameters were well-trained by using simultaneous perturbation stochastic approximation. In [16], a hybrid model based on wavelet packet technique and artificial neural network was originally proposed, and the model parameters were optimized by using crisscross optimization algorithm. In [17], a reproducing kernel Hilbert space based probabilistic WPF method was proposed and the performance was evaluated by CRPS. In [18], the randomness and uncertainty of wind energy were quantitatively evaluated using Gaussian process regression and teaching learning optimization. In [19], a general framework based on k-nearest neighbors algorithm and kernel

density estimator was constructed for probabilistic WPF. The main merit of the soft-computing based WPF models is their potential abilities regarding data-mining and feature extracting [20]. A detailed overview of probabilistic WPF and wind speed forecasting is presented in [21].

However, though the aforementioned prediction models can be categorized into individual forecasters, they may suffer from instability, high sensitivity to initial parameters and over-training problems. Nonetheless, these problems can be effectively mitigated by ensemble strategy [22], because the involved diverse errors from model misspecification and data noise can cancel each other out via the aggregation process. Another advantage with regard to ensemble technique is the increased accuracy and provided probabilistic uncertainties, which play a decisive role as the operators prepare for future unknown conditions. Therefore, ensemble technique based WPF approach has attracted extensive attention in recent years. In [23], a weather ensemble prediction framework for wind power forecast was constructed, and it demonstrated superior performance. In [24], an ensemble approach based on empirical mode decomposition, sample entropy technique and extreme learning machine was proposed for probabilistic WPF. In [25], an ensemble model consisting of 52 neural networks and five Gaussian process sub-models was put forward to better quantify the wind power distribution information.

Another demerit regarding the physical modeling, statistical approaches and intelligent methods previously presented is their shallow learning models. As presented in [26], considering the wind power data is complicated in nature, these shallow models may be insufficient to extract the corresponding deep nonlinear and non-stationary traits. In addition, with the rapid development of the smart grid, the environmental sensors and related technologies are becoming more widely used than ever before, which compels us into a big-data era. Accordingly, it becomes even more difficult to extract the deep features of wind power data [27]. However, one effective way to address the shallow model issue is the use of deep learning [28], due to the ability to discover the inherent abstract features and hidden high-level invariant structures in data. The characteristics specific to feature extraction result in deep learning being much more attractive for WPF [29]. In summary, the unsatisfactory feature mining of the shallow model and existing problems for individual forecasters inspire us to rethink the WPF problem based on deep learning architecture and the ensemble technique.

Deep learning mainly consists of SAE, DBM and CNN. SAE consists of an unsupervised learning subpart that uses auto-encoders as its building blocks and a logistic regression layer for data fitting [26]. SAE aims to learn a dimensionality reduction representation for the input data. DBM is a generative graphical model composed of multiple layers of hidden Boolean units with connections between the layers but not between the units within each layer [30]. DBM tries to learn a desirable probability distribution from the input units to the output units by minimizing a user-defined energy function. Compared to conventional neural network, the merit of SAE and DBM is in their training process, which can be used to effectively alleviate the local minimum dilemma. The training process of SAE and DBM consists of a layer-wise pre-training process and a fine-tuning process. The former is used to provide good initial values for all parameters, while the latter is used to search the optimum based on the given initial states of the network [31]. The earlier implementations of SAE and DBM tailored for wind power prediction were reported in [26,31,32]. However, the drawback of SAE and DBM is their limited ability to search for a global optimum, especially when the number of parameters to be optimized substantially increases as the neural network architecture goes deeper or the dimensionality of the neurons expands. Another typical deep learning algorithm is CNN, an approach in which the connectivity principles between the neurons are inspired by the organization of animal visual cortex. Compared to SAE and DBM, CNN not only has fewer parameters to be estimated because of the weight sharing technique, but it also can effectively extract the hidden structure and inherent features. Nevertheless, to the authors' knowledge, CNN designed for WPF has not yet been considered in the published literature. Therefore, this research is devoted to investigating a deep framework for probabilistic WPF. To this end, a hybrid approach based on WT and CNN that can enhance the forecasting performance and prediction efficiency is originally proposed. The main contributions of this paper are presented as follows:

(1) For the first time, CNN is introduced and tailored to comprehensively extract the deep invariant structures and hidden high-level nonlinear features exhibited at any wind power frequency.
(2) A hybrid approach based on WT, CNN and ensemble technique is proposed to quantify the wind power uncertainties with respect to model misspecification and data noise.
(3) Based on the quantified model misspecification and data noise, a WPF framework is originally formulated to probabilistically evaluate the randomness and stochastics in wind power data from the perspective of sharpness, reliability and overall skills.

The proposed probabilistic WPF approach has been thoroughly tested and benchmarked using real wind power data in time domain under various time-scales and performance criteria.

2. Deep convolutional neural network

As one of the means to effectively extract the invariant structures and inherent hidden features in data, deep CNN has been successfully applied in various fields, including image-identification, classification, and feature-mining [33]. A typical deep CNN architecture consists of alternating convolution layers and sub-sampling layers, with a fully connected layer as the output.

2.1. Convolution layer

The convolution layer is a two-layer feed-forward neural network that adopts a convolution operation to map the low-level maps with local features into several high-level maps with global features. In CNN architecture, no connections exist between the neurons in the same layer. Moreover, weight sharing technique is employed between the neurons in different CNN layers to simplify the feed forward and back propagation process. Concretely, the feature maps in the previous $(l-1)$th layer are convolved with the shared weights, also termed kernels, and then passed through a user-defined activation function to generate the $l$th layer with several output feature maps, as follows,

$$y_j^l = f\left(\sum_{i \in N_M} x_i^{l-1} * w_{i,j}^{l} + b_j^l\right) \tag{1}$$

where $*$ denotes a convolution operation. In addition, the output activation function $f(\cdot)$ is chosen to be the sigmoid function.

The weight sharing technique is the main attribute that distinguishes CNN from other deep learning algorithms, i.e., SAE and DBM. In this technique, all neurons in the same output feature map share the same weight but receive inputs from the neurons at different locations. It has been demonstrated that the memory footprints and number of parameters to be estimated can be reduced significantly and the hidden invariant features in data can be effectively extracted via the use of weight sharing technique [34].
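As an illustration of the convolution layer of Eq. (1), the following is a minimal sketch in plain Python, assuming 'valid' (no padding) 2-D convolution and the sigmoid activation. The function names and the toy 3 × 3 input are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(v):
    """Sigmoid activation, the choice of f(.) stated for Eq. (1)."""
    return 1.0 / (1.0 + math.exp(-v))

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: flip the kernel, then slide it over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + a][c + b] * flipped[a][b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def conv_layer(input_maps, kernels, biases):
    """Eq. (1): output map j = f(sum_i x_i * w_ij + b_j).

    kernels[i][j] is the shared kernel linking input map i to output map j
    (an assumed storage layout for this sketch).
    """
    outputs = []
    for j, bias in enumerate(biases):
        summed = None
        for i, x_i in enumerate(input_maps):
            part = conv2d_valid(x_i, kernels[i][j])
            if summed is None:
                summed = part
            else:
                summed = [[p + q for p, q in zip(row_s, row_p)]
                          for row_s, row_p in zip(summed, part)]
        outputs.append([[sigmoid(v + bias) for v in row] for row in summed])
    return outputs

# Toy data: one 3x3 input map, one 2x2 kernel, one output map.
x_maps = [[[0.1, 0.2, 0.3],
           [0.4, 0.5, 0.6],
           [0.7, 0.8, 0.9]]]
w = [[[[1.0, 0.0],
       [0.0, -1.0]]]]
b = [0.0]
y_maps = conv_layer(x_maps, w, b)  # a single 2x2 output map
```

Weight sharing shows up in the parameter count: every pixel of the 2 × 2 output map is produced by the same four kernel weights and one bias, whereas a fully connected mapping from nine inputs to four outputs would need 36 weights.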

2.2. Sub-sampling layer

Classical convolution layers are usually interleaved with sub-sampling layers to gradually build up the high-level invariant structures in data without sacrifice of specificity and also to reduce the calculation time. The output maps of a sub-sampling layer are more concise representations of input maps. Specifically, a sub-sampling layer generates a down-sampled version of the input maps, as follows

$$y_j^l = f\left(\beta_j^l \, \mathrm{down}\left(x_j^{l-1}\right) + c_j^l\right) \tag{2}$$

where the down-sampling function $\mathrm{down}(\cdot)$ adopts the average function, described as follows,

$$y_{i,j,k} = \sum_{len,wid} \gamma_{len,wid} \, x_{i+p,\,j+q,\,k} \tag{3}$$

The down-sampling function averages each $n \times n$ area in each input image, which makes the output image $n$ times smaller in both spatial dimensions.

2.3. Deep CNN for wind power prediction

The building block of deep CNN consists of a pair of hidden layers, namely, a convolution layer and a sub-sampling layer. Stacking the building blocks one-by-one hierarchically and adding a fully connected layer, such as a classifier, at the end of the stacks creates a typical deep CNN. The input of deep CNN contains a number of localized 2-D maps. The resolution of input maps becomes increasingly smaller as more convolution and down-sampling operations are applied [35]. The parameters in deep CNN, including the weights and bias, are trained in a data-driven fashion via the use of back propagation algorithm.

Deep CNN differs from SAE and DBM in two features. One is that CNN adopts a more concise deep architecture with fewer memory footprints and fewer parameters because of the use of weight sharing technique. Another is that the pre-training process in SAE and DBM is not required for CNN training, making CNN more appropriate for real-time implementation. These two features, coupled with its effective feature extraction and model recognition, make CNN an attractive option for WPF. However, classical deep CNN cannot be applied directly to WPF because wind power data is 1D data in a time domain. Therefore, 1D wind power data should be first converted into a 2D image for feature extraction, and then reconverted to a 1D vector for prediction. Accordingly, a deep CNN architecture for WPF is proposed in this paper, as illustrated in Fig. 1. The architecture includes a 1D-Data-to-2D-Image layer, several building blocks consisting of a convolution layer and a sub-sampling layer, a 2D-Image-to-1D-Data layer and a logistic regression layer. The convolution layer and sub-sampling layer are used to extract the hidden features in wind power data, and the logistic regression layer is employed for WPF.

3. Deep CNN based point forecast of wind power

Owing to the chaotic nature of the weather system on earth, wind power data always exhibit high variability and volatility. Therefore, a hybrid point forecaster for WPF is proposed in this research to alleviate the impact of the uncertainties on WPF accuracy. The forecaster is a hybrid of WT and deep CNN, as shown in Fig. 2. The raw wind power data are first normalized and then decomposed into several frequencies. Each of the frequencies exhibits better-behaved outliers and a smaller degree of uncertainty than the original signal. An independent deep CNN network is then designed for each frequency and trained in a back propagation manner to predict the behavior of each frequency as accurately as possible. Consequently, synthesizing all of the forecasting frequencies via wavelet reconstruction and anti-normalization produces the final results for the deterministic point forecast of wind power. Fig. 2 presents the schematic diagram of the proposed point forecaster. The details of each subpart are given below.

3.1. Wavelet decomposition

Generally, raw wind power data series may contain nonlinear and non-stationary features in the form of spikes and fluctuations. These features are the main factors that degrade the accuracy of WPF [36]. One way to mitigate the impact of spikes and fluctuations on WPF accuracy is the use of WT [37]. Generally, WT can be defined in a discrete form to improve the computational efficiency, as follows

$$\mathrm{Wavelet}(\upsilon, \kappa) = 2^{-(\upsilon/2)} \sum_{t=0}^{T-1} g(t)\, \phi\left[\frac{t - \kappa 2^{\upsilon}}{2^{\upsilon}}\right] \tag{4}$$

In this paper, the 4th Daubechies function is chosen to be the mother wavelet because it provides a proper balance between the wavelength and smoothness [38]. In this research, a fast discrete WT, termed Mallat algorithm, is adopted. This algorithm is based on decomposition filters and reconstruction filters, each comprising a low-pass and a high-pass filter. Thereafter, the raw wind power data series can be decomposed into one approximation ($A_n$) and several details ($D_n$). Compared to the original wind power data series, the obtained approximation and details exhibit better-behaved outliers and lower uncertainties, making the decomposed signals easier for WPF.

[Figure: block diagram with the labels Input, Feature Extractor, Predictor and Output; layers shown: 1D-Data-to-2D-Image Layer, Convolution Layer, Sub-sampling Layer, Sub-sampling Layer, 2D-Image-to-1D-Data Layer, Logistic Regression Layer]
Fig. 1. Deep CNN architecture for wind power prediction.
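The data flow through the non-convolutional layers of Fig. 1 can be sketched as below: a 1D series is rearranged into a 2D image, passed through an average sub-sampling layer in the sense of Eqs. (2) and (3), and flattened back to 1D. The sketch assumes a row-by-row rearrangement, non-overlapping $n \times n$ blocks with uniform averaging weights, and a sigmoid activation; all function names and numbers are hypothetical.

```python
import math

def to_image(series, p, q):
    """1D-Data-to-2D-Image: fill a p x q image row by row (length must be p * q)."""
    if len(series) != p * q:
        raise ValueError("series length must equal p * q")
    return [list(series[r * q:(r + 1) * q]) for r in range(p)]

def to_series(image):
    """2D-Image-to-1D-Data: read the image back row by row."""
    return [v for row in image for v in row]

def average_down(image, n):
    """down(.): average each non-overlapping n x n block (uniform weights assumed)."""
    rows, cols = len(image) // n, len(image[0]) // n
    return [[sum(image[r * n + a][c * n + b]
                 for a in range(n) for b in range(n)) / (n * n)
             for c in range(cols)]
            for r in range(rows)]

def sub_sampling_layer(image, n, beta, c):
    """Eq. (2): f(beta * down(x) + c), with a sigmoid f and scalar beta, c."""
    return [[1.0 / (1.0 + math.exp(-(beta * v + c))) for v in row]
            for row in average_down(image, n)]

series = [float(k) for k in range(16)]   # toy series 0, 1, ..., 15
image = to_image(series, 4, 4)           # 4 x 4 image
pooled = average_down(image, 2)          # 2 x 2 image of block averages
activated = sub_sampling_layer(image, 2, beta=0.1, c=0.0)
flat = to_series(pooled)                 # back to a 1D vector
```

With $n = 2$, the 4 × 4 image shrinks to 2 × 2, which is the two-fold compaction in both spatial dimensions described for the down-sampling function.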

Fig. 2. Schematic diagram of the proposed point forecaster for WPF.

3.2. Conversion from 1D data to 2D image

As presented in Section 2.3, the wind power data series should first be converted to a 2D image to apply deep CNN for WPF. The 1D-Data-to-2D-Image layer is responsible for the conversion process. For simplicity, this paper proposes a rearrangement procedure to achieve the 1D-Data-to-2D-Image task, described as follows: (1) Determine the size of the image $p \times q$; (2) Given a wind power data sample, note that the length of data $l$ should satisfy $l = p \times q$; (3) Sequentially arrange the first $q$ data as the first row of the image, the next $q$ data as the second row of the image, etc. With these three steps, the 1D-Data-to-2D-Image process is easily completed, and its effectiveness is tested in Section 4. Similarly, the 2D-Image-to-1D-Data procedure can be achieved by reversing these steps. In addition, the size of the image is adjusted in a trial-and-error manner to improve the forecast accuracy.

3.3. Back propagation training for deep CNN

To solve the WPF problem, the parameters of deep CNN, i.e., the weights and biases, should be trained by the back propagation rule applying stochastic gradient descent. Back propagation attempts to minimize the squared-error loss function $E_m$ between the outputs and target as follows,

$$E_m = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{d} \left(h_j^i - y_j^i\right)^2 \tag{5}$$

Thus, the updating rule for weights $W$ and biases $b$, $\beta$, $c$ can be defined in an iterative form, as follows,

$$W = W - \eta \cdot \partial E_m / \partial W \tag{6}$$

$$b = b - \eta \cdot \partial E_m / \partial b \tag{7}$$

$$\beta = \beta - \eta \cdot \partial E_m / \partial \beta \tag{8}$$

$$c = c - \eta \cdot \partial E_m / \partial c \tag{9}$$

Here, $\partial E_m/\partial W$, $\partial E_m/\partial b$, $\partial E_m/\partial \beta$, and $\partial E_m/\partial c$ are the partial derivatives of the loss function with respect to perturbations of $W$, $b$, $\beta$ and $c$. However, three types of layers with parameters to be learned exist in deep CNN, i.e., the convolution layer, sub-sampling layer, and logistic regression layer. Therefore, because the update rules of the parameters in (6)–(9) depend on the type of layer, each layer should be analyzed separately.

Regarding the logistic regression layer, the partial derivative $\partial E_m/\partial b_{log}^{L}$ is calculated as the element-wise product between the derivative of the layer with respect to its input and the output error. Then, $\partial E_m/\partial W_{log}^{L}$ is calculated by scaling up $\partial E_m/\partial b_{log}^{L}$ via the input of the layer [39], as follows,

T
@Em =@W Llog ¼ xL1 ð@Em =@b log Þ
L
ð10Þ [40]. The former, also termed model uncertainty, may be caused by
a local minimum in the training process, finite training samples,
L 0 d and stochastically generated parameters. The latter may be
@Em =@b log ¼ f ðuL Þ  ðyd  h Þ ð11Þ because wind power time-series data have stochastic characteris-
tics because of the chaotic weather system. These uncertainties
where uL = WLxL1 + bL, and ‘‘°” denotes element-wise
have a significant side impact on prediction accuracy because it
multiplication.
is difficult to estimate them in a deterministic manner. Therefore,
With respect to the convolution layer followed by a sub-
in this research, these two types of uncertainties are probabilisti-
sampling layer, it is difficult to estimate the involved partial
cally modeled as a Gaussian distribution that depends on the input,
derivatives since each neuron in an input map connects to only
as follows,
one neuron in the output map. In other words, the sensitivity maps
for one output pixel correspond to a set of sensitivity input maps.
eðxi Þ GDðMe ðxi Þ; r2e ðxi ÞÞ ð17Þ
Concretely, oEm/oblcon is obtained by synthesizing all of the entries
in the sensitivity output map. Each entry is calculated by up- Note that the adopted Gaussian distribution can fit different shapes
sampling the sub-sampling layer’s sensitivity maps to make it of distributions. Even if the actual uncertainty is not Gaussian-like,
the same size as the convolutional layer’s input map, and then it has been demonstrated in [41] that the prediction model based on
multiplying the up-sampled map with the activation derivative Gaussian distribution can also be implemented with competitive
map in an element-wise way. Thus, oEm/oWlcon is then computed performance.
as the sum of the product between oEm/oblcon and the correspond-
ing patch vector [34] in xl1 as follows,
X 4.2. Model uncertainty evaluation
0 lþ1 l1
@Em =@W lcon ¼ blþ1 ðf ðul Þ  upð@Em =@bcon ÞÞðpatch Þ ð12Þ
len;wid The model uncertainty is estimated via an ensemble of individ-
X ual deep CNNs, thus leading to a less biased approximation of the
l 0 lþ1
@Em =@bcon ¼ blþ1 ðf ðul Þ  upð@Em =@bcon ÞÞ ð13Þ true regression of the measured targets. In the proposed ensemble
len;wid approach, the adopted deep CNN model differs from the other
models in terms of the number of hidden layers, the number of
Generally, one effective way to implement the up-sampling
maps in each layer, and the length of input vector. Considering a
function is the use of the Kronecker product, as given below,
training dataset consisting of NS samples
upðxÞ x  1nn ð14Þ
DSmu ¼ ðx1 ; h1 Þ; . . . ; ðxi ; hi Þ; . . . ; ðxN ; hNS Þ ð18Þ
Considering the sub-sampling layer, the back propagation rules
depend on the location of the layer. If the sub-sampling layer con- and an ensemble of NE deep CNN models, the true regression is esti-
nects the logistic regression layer, then the update rules are the mated as the average output of the ensemble, as follows,
same with standard back propagation equations [39]. If not, the
1 X
NM
_ _
update rule for additive bias is the sum of the elements in the corresponding output patch. Accordingly, the update rule for multiplicative bias can be computed via a down-sampling operation, as follows,

$$\frac{\partial E_m}{\partial c^{l}_{sub}} = \sum_{j \in map} f'\left(u^{l}_{j}\right) \circ \left(\frac{\partial E_m}{\partial b^{l+1}_{j,con}} * W^{l+1}_{j,con}\right) \tag{15}$$

$$\frac{\partial E_m}{\partial \beta^{l}_{sub}} = \left(\sum_{j \in map} \frac{\partial E_m}{\partial c^{l}_{sub,j}} \circ down\left(x^{l-1}_{j}\right)\right) b^{l+1} \tag{16}$$

Therefore, all of the parameters in deep CNN can be periodically updated in a top-down manner based on (6)–(16) until they all converge. One period means the parameters of the entire model are updated once, resulting in smaller errors. The errors are then back propagated through the training set to re-correct the model parameters towards the optimal states. The training process is completed when all of the parameters find their optimal states.

4. Uncertainty evaluation for point forecast

Due to the chaotic nature of the weather system on earth, wind power data always exhibit nonlinear and non-stationary uncertainties. These uncertainties mainly result from model misspecification and data noise, which are estimated as follows.

4.1. Uncertainties of WPF

The uncertainties exhibited in wind power data are inevitable due to the misspecification of the forecasting model and data noise

$$\hat{y}(x_i) = \frac{1}{N_E}\sum_{j=1}^{N_E} \hat{y}_j(x_i) \tag{19}$$

Then, the variance of the model uncertainty is evaluated as the variance of the output set of the well-trained CNN models, expressed as

$$\sigma^2_{mu}(x_i) = \frac{1}{N_E - 1}\sum_{j=1}^{N_E}\left(\hat{y}_j(x_i) - \hat{y}(x_i)\right)^2 \tag{20}$$

4.3. Data uncertainty evaluation

It is challenging to estimate the data uncertainty of wind power due to the erratic nature of the weather system. In this research, it is assumed that the mean and variance of data uncertainty are also conditioned on the input variables. Given the result of the point forecaster $\hat{y}(x_i)$, the input dataset for data uncertainty is expressed as follows,

$$DS_{du} = \left\{\left(h_1 - \hat{y}(x_1)\right), \ldots, \left(h_{N_S} - \hat{y}(x_{N_S})\right)\right\} \tag{21}$$

Then, the mean and variance of the data noise can be calculated as follows,

$$M_{du} = \frac{1}{N_S}\sum_{l=1}^{N_S}\left(h_l - \hat{y}(x_l)\right) \tag{22}$$

$$\sigma^2_{du} = \frac{1}{N_S - 1}\sum_{l=1}^{N_S}\left(\left(h_l - \hat{y}(x_l)\right) - M_{du}\right)^2 \tag{23}$$
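As a rough illustration of Eqs. (19)–(23), the following Python sketch computes the ensemble mean and the model-uncertainty variance for one input, and the mean and variance of the data noise from the residuals between targets and ensemble forecasts. The function names and the toy numbers are hypothetical and only the formulas are taken from the text.

```python
from statistics import mean, variance


def model_uncertainty(replicate_forecasts):
    """Eqs. (19)-(20): ensemble mean and variance over the N_E replicate outputs for one input."""
    y_hat = mean(replicate_forecasts)
    var_mu = variance(replicate_forecasts)  # sample variance, divides by N_E - 1
    return y_hat, var_mu


def data_noise(targets, ensemble_forecasts):
    """Eqs. (21)-(23): mean and variance of the residuals target - forecast."""
    residuals = [h - y for h, y in zip(targets, ensemble_forecasts)]
    m_du = mean(residuals)
    var_du = variance(residuals)  # sample variance, divides by N_S - 1
    return m_du, var_du


# Toy usage with three hypothetical replicates and three hypothetical samples.
y_hat, var_mu = model_uncertainty([1.0, 2.0, 3.0])
m_du, var_du = data_noise([1.0, 2.0, 4.0], [0.5, 1.5, 3.0])
```

Both variances use the unbiased (N − 1) normalisation that appears in Eqs. (20) and (23).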
5. Probabilistic WPF based on ensemble approach

In this section, a novel approach for probabilistic WPF based on the ensemble technique is proposed. The performance criteria for probabilistic WPF are also presented.

5.1. Probabilistic WPF approach

Because wind power data are variable and volatile, a novel approach is proposed to evaluate the probabilistic information in wind power data. This approach is a hybrid of WT, CNN and the ensemble technique. Firstly, the input dataset is uniformly resampled to form $N_E$ point forecasters, also termed $N_E$ replicates. For each replicate, an independent point prediction model for WPF is designed. Then, based on the prediction results, the mean and variance of the model uncertainty and data noise are separately evaluated by using (18)–(20) and (21)–(23). Assuming that these two uncertainties are independent of each other, the variance of the total forecast error $\sigma^2_h$ can be obtained by combining them, i.e.,

$$\sigma^2_h(x_i) = \sigma^2_{mu}(x_i) + M_{du} + \sigma^2_{du} \tag{24}$$

Therefore, given the target $h_i$ and confidence level $100(1-\alpha)\%$, the prediction intervals can be constructed from a lower bound $L^{\alpha}_h(x_i)$ and an upper bound $U^{\alpha}_h(x_i)$, as follows,

$$L^{\alpha}_h(x_i) = \hat{y}(x_i) - z_{1-\alpha/2}\sqrt{\sigma^2_h(x_i)} \tag{25}$$

$$U^{\alpha}_h(x_i) = \hat{y}(x_i) + z_{1-\alpha/2}\sqrt{\sigma^2_h(x_i)} \tag{26}$$

The overall architecture for probabilistic WPF is illustrated in Fig. 3. The performance of the proposed approach is benchmarked and tested in Section 6.

[Figure: the input dataset feeds the first, second, ..., $N_E$-th point forecasters (outputs $\hat{y}_1$, $\hat{y}_2$, ..., $\hat{y}_{N_E}$), which give $\hat{y}$ and $\sigma^2_{mu}$; the data noise analysis gives $M_{du}$ and $\sigma^2_{du}$; the outputs are the bounds $L_h$ and $U_h$ at the 95% and 99% levels.]
Fig. 3. The overall architecture of the proposed approach for probabilistic WPF.

5.2. Performance criteria

In general, probabilistic WPF adopts average coverage error (ACE) and interval sharpness (IS) as the performance criteria. ACE measures how well the prediction quantiles match the observed values, and IS comprehensively evaluates the sharpness of the PI by rewarding narrower PIs and penalizing wider PIs, expressed as follows.

$$ACE = \left(\frac{1}{N_S}\sum_{i=1}^{N_S} r_i\right) \times 100\% - PINC \tag{27}$$

$$IS = \frac{1}{N_S}\sum_{i=1}^{N_S} \begin{cases} -2\alpha\delta^{\alpha}_i - 4\left[L^{\alpha}_i - WS^{\alpha}_i\right], & \text{if } WS^{\alpha}_i < L^{\alpha}_i \\ -2\alpha\delta^{\alpha}_i, & \text{if } WS^{\alpha}_i \in I^{\alpha}_i \\ -2\alpha\delta^{\alpha}_i - 4\left[WS^{\alpha}_i - U^{\alpha}_i\right], & \text{if } WS^{\alpha}_i > U^{\alpha}_i \end{cases} \tag{28}$$

The width of the PI, $\delta^{\alpha}_i$, can be computed as $(U^{\alpha}_i - L^{\alpha}_i)$, and the indicator $r_i$, which marks whether the observation is covered by the PI, is defined as,

$$r_i = \begin{cases} 1, & WS^{\alpha}_i \in I^{\alpha}_i \\ 0, & WS^{\alpha}_i \notin I^{\alpha}_i \end{cases} \tag{29}$$

Another widely used performance criterion for probabilistic WPF is CRPS, which considers both reliability and sharpness simultaneously [42]. Generally, given the cumulative distribution function $CDF_i$ and the measurement $y_i$ over the testing sample, the average value of CRPS can be calculated as [43]

$$CRPS = \frac{1}{N_s}\sum_{i=1}^{N_s}\int_{y=0}^{1}\left[CDF_i - H(y - y_i)\right]^2 dy \tag{30}$$

The value of the indicator function $H(y - y_i)$ is 0 if $y < y_i$; otherwise, the value is 1.

6. Numerical results and analysis

In this research, the proposed approach for probabilistic WPF based on WT, deep CNN and ensemble technique is extensively evaluated and benchmarked using real data from the MWWF in Shandong Province, China, and the SIWF in Guangdong Province, China.

6.1. Investigations on Milky Way wind farm

6.1.1. Experimental settings

The MWWF has a rated capacity of 47.5 MW, provided by 19 wind turbines of 2.5 MW each. The wind power data from MWWF are sampled in 5-min intervals and cover the period from Jan. 2011 to Dec. 2011. The wind power data are divided into a training dataset and a testing dataset. The training dataset covers the days from the 1st to the 25th of each month, and the remaining days comprise the testing dataset. The proposed approach is trained on the training dataset and then applied for probabilistic WPF. For each point forecaster, the input parameters are the wind power data at the current time step $WP_t$ and the previous 8, 15 or 24 values. The input wind power data series are then decomposed into three frequencies, including one approximation and two details, or four frequencies, including one approximation and three details. For each frequency, the number of building blocks, each consisting of one convolution layer and one subsampling layer, is 3, 4, 5 or 6. Therefore, there are 3 × 2 × 4 = 24 replicates (three input lengths, two decomposition levels and four block counts), i.e., $N_E$ = 24. These parameters are determined in a trial-and-error manner. In addition, the wavelet parameters in each replicate adopt the default values in the MATLAB wavelet toolbox because the results are not sensitive to these parameters. All of the model parameters in CNN, including the weights W and biases b, β and c, are randomly initialized and updated based on the training process presented in Section 3.3 until they all converge. Moreover, to mitigate the impact of seasonal uncertainty on prediction accuracy, the proposed WPF approach is seasonally trained and evaluated because of the erratic nature of the weather system.

Furthermore, to validate the efficiency and accuracy of the proposed approach, the obtained results are compared to classical WPF methods, including persistence, SVM, and BP. The mean of the persistence model adopts the last available observation and the variance is calculated using the latest measurements. The SVM and BP are extended to probabilistic WPF via QR, which is frequently used for probabilistic WPF. The prediction algorithms are implemented in MATLAB R2014a and conducted on a personal computer with an Intel(R) Xeon(R) E3-1225 V2 3.2-GHz CPU and 16.00 GB of RAM.

6.1.2. Numerical results

A series of simulations is conducted to demonstrate the feasibility and effectiveness of the proposed approach. Here, the performance of probabilistic WPF is evaluated in terms of ACE and IS. The 1-h-ahead seasonal results obtained from the four methods, i.e., ACE and IS, are presented in Tables 1 and 2 over a high confidence level ranging from 85% to 99% because of the high reliability required for power system optimization and operation. In this case, the constructed seasonal PIs with PINC 90% obtained from the proposed approach are graphically presented in Figs. 4–7. In addition, Figs. 8 and 9 depict the reliability deviations and IS in spring and winter over the entire confidence level range with a step of 5%. Moreover, to further demonstrate the advantages of the proposed probabilistic WPF approach, a series of simulations using different time resolutions has been conducted and the numerical results are shown in Fig. 10. In this simulation, the training dataset covers the whole year and is obtained by interval sampling of the original wind power dataset. Only the SVM results are presented for comparison because of the relatively superior performance of SVM in 1-h-ahead prediction tasks.

6.1.3. Analysis

From Table 1, it is evident that the ACEs obtained from the proposed ensemble approach perform the best in all four seasons. Quantitatively, the ACEs from the proposed WPF approach have a minimum of −1.81% and a maximum of 2.96% with an average of 0.16%. However, the average ACEs obtained from persistence, BP+QR, and SVM+QR correspond to 6.81%, 3.51%, and 1.72%, respectively. The numerical results indicate that the resultant ACEs from the proposed approach are closer to the corresponding nominal confidence level, especially at the 95% and 99% PINC levels. Therefore, compared to the three benchmarks, the proposed approach exhibits higher prediction capability in terms of reliability.

The ISs presented in Table 2 indicate that, compared to persistence, BP+QR, and SVM+QR, the ISs obtained from the proposed approach have been improved on average by 58.94%, 60.56%, and 58.33% at PINC 85%, by 61.81%, 62.67% and 58.48% at PINC 90%, by 64.79%, 61.46% and 58.10% at PINC 95%, and by 79.26%, 58.02% and 61.94% at PINC 99%, respectively. These results indicate that at a high confidence level, the sharpness errors between the observed probability and nominal confidence obtained from the proposed approach are at a minimum. The IS performance demonstrates that the proposed approach has a higher forecast capability in terms of sharpness and is thus more appealing when compared to the other three benchmarks.

Figs. 4–7 indicate that the measured wind power data are, for the most part, within the lower and upper bounds of the constructed PIs. In addition, it is noted that the shapes of the three lines in each of the four graphs are very similar to each other. Therefore, from the high PI coverage and the similarity of the lines, it can be concluded that the probabilistic performances obtained from the proposed approach are satisfactory. In addition, it is evident that the actual wind power data in spring and winter have more zero values than the data in summer and autumn. Meanwhile, the line trends of the constructed PIs in Figs. 5 and 6 exhibit stronger fluctuations than those in Figs. 4 and 7. This is because the weather systems of MWWF in summer and autumn are more erratic than those in spring and winter, and thus it is more difficult to make accurate predictions.

Fig. 8 indicates that the ACE deviations obtained from the proposed approach are smaller than those obtained from SVM in all confidence levels and seasons. The trends of the ACE deviations appear to be irregular and chaotic. From Fig. 9, it can be concluded that the sharpness of the quantiles obtained from the proposed approach is better than that obtained from the

Table 1
1-Hour ahead forecasting ACE for MWWF.

Method               Season    PINC 85%   PINC 90%   PINC 95%   PINC 99%
Persistence          Spring    3.85%      5.18%      6.52%      5.81%
Persistence          Summer    1.23%      4.66%      8.09%      8.95%
Persistence          Autumn    5.94%      8.32%      7.57%      7.90%
Persistence          Winter    5.94%      8.85%      10.71%     9.47%
BP+QR                Spring    3.85%      2.04%      2.33%      0.57%
BP+QR                Summer    2.28%      0.99%      1.81%      0.05%
BP+QR                Autumn    4.37%      3.09%      4.95%      0.48%
BP+QR                Winter    3.83%      10.42%     12.28%     3.71%
SVM+QR               Spring    3.85%      1.52%      1.28%      1.09%
SVM+QR               Summer    5.16%      2.67%      1.86%      0.58%
SVM+QR               Autumn    3.32%      4.14%      1.86%      0.57%
SVM+QR               Winter    5.42%      4.66%      2.33%      2.14%
Proposed approach    Spring    1.23%      1.52%      0.29%      0.50%
Proposed approach    Summer    1.91%      0.05%      2.91%      0.05%
Proposed approach    Autumn    2.96%      2.67%      0.76%      0.57%
Proposed approach    Winter    1.91%      0.99%      1.81%      1.62%

The bold values denote the best performance among the benchmarks.
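The ACE values in Table 1 follow Eq. (27). A minimal Python sketch of that calculation is given below, assuming the indicator counts an observation as 1 when it lies inside the PI (the convention under which an ACE close to zero signals good reliability). The function name and the sample numbers are hypothetical.

```python
def average_coverage_error(observations, lowers, uppers, pinc):
    """ACE in percentage points: empirical coverage minus the nominal coverage `pinc` (e.g. 0.90)."""
    # Coverage indicator: 1 if the observation lies inside [lower, upper], else 0.
    hits = [1 if lo <= y <= up else 0
            for y, lo, up in zip(observations, lowers, uppers)]
    coverage = sum(hits) / len(hits)
    return (coverage - pinc) * 100.0


# Toy usage: three of four hypothetical observations are covered at PINC 90%.
ace = average_coverage_error([1.0, 2.0, 3.0, 4.0],
                             [0.0, 0.0, 0.0, 0.0],
                             [2.0, 2.0, 2.0, 5.0],
                             0.90)
```

A negative result means the intervals cover fewer observations than the nominal level promises, and a positive result means they cover more.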

Table 2
1-Hour ahead forecasting IS for MWWF.

Method               Season    PINC 85%   PINC 90%   PINC 95%   PINC 99%
Persistence          Spring    11.90      9.43       6.51       3.11
Persistence          Summer    5.81       4.82       3.61       2.17
Persistence          Autumn    7.98       6.32       4.23       1.89
Persistence          Winter    11.89      9.70       6.81       3.44
BP+QR                Spring    11.38      8.59       5.00       1.17
BP+QR                Summer    6.95       5.83       3.81       0.92
BP+QR                Autumn    8.39       6.49       3.97       1.09
BP+QR                Winter    12.40      10.06      6.55       2.06
SVM+QR               Spring    10.40      7.81       4.48       1.03
SVM+QR               Summer    6.89       4.42       2.83       0.78
SVM+QR               Autumn    8.22       6.24       3.77       1.14
SVM+QR               Winter    11.52      9.37       6.70       2.83
Proposed approach    Spring    5.68       4.20       2.72       0.78
Proposed approach    Summer    3.88       3.02       1.94       0.66
Proposed approach    Autumn    1.60       1.16       0.73       0.21
Proposed approach    Winter    4.27       3.18       2.06       0.55

The bold values denote the best performance among the benchmarks.
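The IS values in Table 2 follow Eq. (28). The sketch below is one possible Python reading of that score, assuming the negatively oriented form in which every interval is charged for its width and a missed observation is charged an extra four times its distance from the violated bound. The function name and the sample numbers are hypothetical.

```python
def interval_sharpness(observations, lowers, uppers, pinc):
    """Mean interval score for nominal coverage `pinc` (e.g. 0.90); values closer to zero are better."""
    alpha = 1.0 - pinc
    scores = []
    for y, lo, up in zip(observations, lowers, uppers):
        score = -2.0 * alpha * (up - lo)  # width penalty, applied to every interval
        if y < lo:
            score -= 4.0 * (lo - y)       # observation below the lower bound
        elif y > up:
            score -= 4.0 * (y - up)       # observation above the upper bound
        scores.append(score)
    return sum(scores) / len(scores)


# Toy usage: one covered observation, one below and one above the interval [2, 10].
is_value = interval_sharpness([5.0, 1.0, 12.0],
                              [2.0, 2.0, 2.0],
                              [10.0, 10.0, 10.0],
                              0.90)
```

Under this sign convention the score is never positive, which matches the text's later reference to the absolute values of IS.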

Fig. 4. PIs with PINC 90% in Spring 2011 at MWWF obtained from the proposed approach.

[Figure: actual wind power and constructed PI; x-axis: Time (h); y-axis: Wind Power (MW).]

Fig. 5. PIs with PINC 90% in Summer 2011 at MWWF obtained from the proposed approach.

[Figure: actual wind power and constructed PI; x-axis: Time (h); y-axis: Wind Power (MW).]

Fig. 6. PIs with PINC 90% in Autumn 2011 at MWWF obtained from the proposed approach.
[Figure: actual wind power and constructed PI; x-axis: Time (h); y-axis: Wind Power (MW).]

Fig. 7. PIs with PINC 90% in Winter 2011 at MWWF obtained from the proposed approach.
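The PIs plotted in Figs. 4–7 are built from Eqs. (24)–(26). A minimal Python sketch of that construction is shown below, assuming a Gaussian critical value $z_{1-\alpha/2}$ and combining the three terms of Eq. (24) exactly as printed. The function name, the clamp at zero and the sample numbers are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def prediction_interval(y_hat, var_mu, m_du, var_du, pinc):
    """Return (lower, upper) per Eqs. (24)-(26) for nominal coverage `pinc` (e.g. 0.90)."""
    alpha = 1.0 - pinc
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal critical value
    var_total = var_mu + m_du + var_du           # Eq. (24), terms as printed
    # Assumption (not in the source): clamp at zero so the square root stays
    # defined if a negative noise mean outweighs the two variances.
    half_width = z * sqrt(max(var_total, 0.0))
    return y_hat - half_width, y_hat + half_width


# Toy usage: hypothetical forecast of 20 MW with a total variance of 4 MW^2 at PINC 90%.
lower, upper = prediction_interval(20.0, 1.0, 0.0, 3.0, 0.90)
```

In practice the bounds would probably also be clipped to the physical range between zero and the rated capacity, although the text does not state this.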

[Figure; x-axis: PINC.]
Fig. 8. The reliability deviation for 1-h-ahead prediction.

[Figure; x-axis: PINC.]
Fig. 9. The IS for 1-h-ahead prediction.

SVM in all of the confidence levels and seasons. The values of IS tend to increase as the PINC increases. Therefore, the ACE and IS performances in Figs. 8 and 9 illustrate that the proposed approach performs the best in all confidence levels and seasons.

In Fig. 10, it is obvious that the proposed approach performs better than the three benchmarks with respect to time resolutions and confidence levels. The ACE deviation from the proposed approach ranges from 0.18% to 7.00% at PINC 85%, from 0.47% to 4.24% at PINC 90%, from 0.24% to 3.43% at PINC 95%, and from 0.48% to 3.19% at PINC 99%. The average value of the deviations is 1.91%. The absolute values of IS vary from 1.36 to 8.25 at PINC 85%, from 1.04 to 6.25 at PINC 90%, from 0.73 to 3.83 at PINC 95%, and from 0.24 to 1.20 at PINC 99%. The average value is 3.20. Compared to SVM, the ACE and IS of the proposed approach are improved on average by 50.84% and 40.51%, respectively.

The ACE and IS are two typical metrics that are frequently used to evaluate the effectiveness and feasibility of probabilistic WPF. Based on the results presented herein, it is evident that the proposed probabilistic WPF approach outperforms the three benchmarks not only with respect to reliability but also from the perspective of sharpness. The high accuracy and superiority of the proposed approach are mainly derived from the deep NN architecture, which provides an effective way to approximate the inherent invariant features and hidden structures. Therefore, the high-level nonlinear, non-stationary and non-smooth characteristics exhibited in the wind power can be better extracted.

6.2. Investigations on Shangchuan Island wind farm

6.2.1. Experimental settings

The SIWF is situated on the southern coast of China. The raw wind power data collected from SIWF cover the period from Jan. 2013 to Dec. 2013 with a 15-min resolution. The SIWF has a rated capacity of 48.45 MW with 57 wind turbines of 0.85 MW each. The dataset is divided into the training dataset and testing dataset. Similarly, the model for probabilistic WPF is seasonally designed because of the erratic nature of wind power data. The design process is similar to the process described in Section 6.1. Consequently, the obtained WPF model consists of 24 replicates, and the model parameters can thus be trained for WPF. The benchmarks for performance comparison are again persistence, BP and SVM.

6.2.2. Numerical results

To comprehensively demonstrate the overall advantage of the proposed approach, the performance criterion adopted in this subsection is CRPS, which is widely used for probabilistic WPF. The advantage of CRPS is that it can simultaneously address both reliability and sharpness. Table 3 presents the CRPS data in terms of seasons and prediction horizons. The prediction horizons range from 15-min ahead to 8-h ahead. The CRPS indices are obtained by using the created quantiles ranging from 5% to 99%. The wind power density in each quantile is assumed to be uniform [42]. In the numerical simulation, a moving average process is adopted to obtain adequate wind power samples for probabilistic WPF over various prediction horizons. Moreover, Figs. 11–17 present the relationships between CRPS and its corresponding wind power. The presented samples are randomly selected from the result pool. Furthermore, the overall performance of the proposed approach depends to some extent on the number of replicates, i.e., point forecasters. Therefore, a series of sensitivity analyses was carried out to provide more realistic and persuasive comparisons. The related CRPS is presented in Table 4. The number of replicates in the proposed approach is set to 8, 24, 48 and 96.

6.2.3. Analysis

From Table 3, the CRPS values obtained from the proposed approach have a minimum of 0.2809 and a maximum of 1.7813 with an average of 0.9410 at 15-min-ahead prediction, a minimum of 1.3444 and a maximum of 1.7476 with an average of 1.6101 at 30-min-ahead prediction, a minimum of 1.4532 and a maximum of 2.4334 with an average of 1.9381 at 1-h-ahead prediction, and a minimum of 2.1296 and a maximum of 2.7067 with an average of 2.3270 at 2-h-ahead prediction. In addition, the ranges of CRPS

Fig. 10. ACE and IS in different time resolutions.
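The different time resolutions compared in Fig. 10 are obtained by interval sampling of the original series. One simple reading of that step, assumed here because the text gives no further detail, is to keep every k-th sample of the 5-min data. The function name and the toy series are hypothetical.

```python
def interval_sample(series, step):
    """Keep every `step`-th value, e.g. step=3 turns 5-min data into 15-min data."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(series[::step])


# Toy usage: one hour of hypothetical 5-min readings reduced to a 15-min resolution.
five_min = [10.0, 10.5, 11.0, 11.2, 11.1, 10.8, 10.4, 10.0, 9.7, 9.9, 10.3, 10.6]
fifteen_min = interval_sample(five_min, 3)
```

Averaging each block of k samples would be an alternative way to coarsen the resolution, and the two choices would not give identical series.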

Table 3
The CRPS in different prediction horizons.

Method               Prediction horizon   Spring    Summer   Autumn   Winter
Persistence          15-min               2.2102    0.7196   0.3514   1.0693
Persistence          30-min               2.8667    2.0422   1.6968   2.9776
Persistence          1-h                  2.5799    3.5908   2.5862   3.5578
Persistence          2-h                  4.3073    2.4523   3.6326   6.0400
Persistence          4-h                  5.4908    2.9488   5.5191   7.3005
Persistence          6-h                  7.6553    3.3618   5.2071   7.2689
Persistence          8-h                  9.2039    3.3239   5.7837   9.1547
BP+QR                15-min               2.4195    0.6995   0.4243   1.2663
BP+QR                30-min               3.2270    3.4703   2.0680   3.1258
BP+QR                1-h                  3.0575    5.3135   3.2823   3.6227
BP+QR                2-h                  4.6495    4.1036   4.7229   6.3712
BP+QR                4-h                  6.5049    3.5162   5.9394   8.0313
BP+QR                6-h                  8.3627    3.9519   5.7148   9.0757
BP+QR                8-h                  10.0314   3.4295   6.2314   10.0169
SVM+QR               15-min               1.9264    0.6611   0.4486   1.1397
SVM+QR               30-min               2.6813    2.3440   1.6697   2.7172
SVM+QR               1-h                  2.5362    3.4096   2.7210   3.2541
SVM+QR               2-h                  4.4057    3.1566   4.0988   5.7870
SVM+QR               4-h                  6.1283    3.2145   5.9401   6.3476
SVM+QR               6-h                  7.0058    2.7965   4.7600   4.3214
SVM+QR               8-h                  7.5087    2.6379   4.2695   5.4061
Proposed approach    15-min               1.7813    0.6397   0.2809   1.0621
Proposed approach    30-min               1.7476    1.6313   1.3444   1.7172
Proposed approach    1-h                  1.4532    2.4334   1.8367   2.0293
Proposed approach    2-h                  2.1489    2.1296   2.3229   2.7067
Proposed approach    4-h                  2.8531    1.9691   2.7849   3.5409
Proposed approach    6-h                  3.5699    1.9162   2.5432   3.6821
Proposed approach    8-h                  4.2750    1.8777   2.8082   4.3385

The bold values denote the best performance among the benchmarks.
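The CRPS reported in Table 3 is defined by Eq. (30). The Python sketch below approximates that integral with a midpoint rule, assuming wind power normalised to the interval [0, 1] as the integration limits suggest. It does not reproduce the quantile-based procedure the authors used to build the CDF, and the function names and test distributions are hypothetical.

```python
def crps_single(cdf, y_obs, n_grid=2000):
    """Midpoint-rule approximation of the integral over [0, 1] of (CDF(y) - H(y - y_obs))^2."""
    step = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        y = (k + 0.5) * step
        heaviside = 0.0 if y < y_obs else 1.0  # H(y - y_obs) as defined after Eq. (30)
        total += (cdf(y) - heaviside) ** 2 * step
    return total


def mean_crps(cdfs, observations, n_grid=2000):
    """Average CRPS over the testing samples, one predictive CDF per observation."""
    values = [crps_single(c, y, n_grid) for c, y in zip(cdfs, observations)]
    return sum(values) / len(values)


# Toy usage: a uniform predictive distribution on [0, 1] and an observation of 0.5.
crps_uniform = crps_single(lambda y: y, 0.5)
```

For the uniform case the exact integral is 1/12, and a forecast CDF that jumps from 0 to 1 exactly at the observation gives a CRPS of zero.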

are from 1.9691 to 3.5409 at 4-h-ahead prediction, from 1.9162 to 3.6821 at 6-h-ahead prediction, and from 1.8777 to 4.3385 at 8-h-ahead prediction. It is evident that the proposed approach exhibits the best CRPS values in all of the prediction horizons. Quantitatively, at 15-min-ahead prediction, the CRPS performance of the proposed approach has been improved, on average, by 13.48%, 21.74%, and 9.86%, respectively, compared to persistence, BP and SVM. At 30-min-ahead prediction, the improvements are 32.79%, 45.84% and 31.57%, respectively. At 1-h-ahead prediction and 2-h-ahead prediction, the improvements correspond to 37.05%,

Fig. 11. The change of CRPS for 15-min-ahead WPF in winter.

Fig. 12. The change of CRPS for 30-min-ahead WPF in spring.

Fig. 13. The change of CRPS for 1-h-ahead WPF in autumn.

Fig. 14. The change of CRPS for 2-h-ahead WPF in summer.

49.25%, 34.97%, and 43.35%, 53.10%, 46.65%, respectively. In addition, compared to the three benchmarks, the CRPS performances have been improved on average by 48.42%, 45.02%, and 45.10% when the prediction horizon ranges from 4-h ahead to 8-h ahead. The numerical results presented in Table 3 further demonstrate that the proposed approach has a higher forecasting ability with respect to overall skill, i.e., CRPS. Moreover, it can be concluded that the CRPS performance deteriorates substantially as the prediction horizon increases. This is because the long-term wind power data exhibit a more erratic nature and thus are more difficult to predict.

Figs. 11–17 indicate that the value of CRPS at power output 0–0.1 p.u. is approximately 2 in all cases. This value is relatively small when compared to the values at other output levels. The value of CRPS then continues to increase as wind power output increases until peaking at an output level between 0.5 p.u. and 0.7 p.u. Thereafter, the value of CRPS gradually decreases as the power output increases. In addition, the values of CRPS at 0.9 to 1.0 p.u. output levels are usually below 5 and thus slightly greater than the CRPS value at the 0–0.1 p.u. output level. Therefore, from the line trends of CRPS in Figs. 11–17, it can be concluded that the wind power data at a lower or higher output level are less difficult to predict and thus exhibit a relatively smaller degree of uncertainty than the data in the mid-range of output levels. Moreover, the CRPS obtained from the proposed approach is always the best regardless of seasons, power output levels or prediction horizons. The comparatively better performances also demonstrate that the proposed approach has the most robust prediction capability among the compared methods. Furthermore, the CRPS performance of SVM is within the ideal range in some cases but may become unacceptable in other cases. Persistence and BP perform likewise. Therefore, the results demonstrate that the proposed approach not only significantly improves the prediction performance, but also exhibits high stability and strong robustness. Thus the superiority and potential of the proposed approach are further illustrated.

Fig. 15. The change of CRPS for 4-h-ahead WPF in spring.

Fig. 16. The change of CRPS for 6-h-ahead WPF in winter.

Fig. 17. The change of CRPS for 8-h-ahead WPF in summer.

Table 4
The change of CRPS with the number of replicates.

Prediction horizon   Season    8 replicates   24 replicates   48 replicates   96 replicates
15-min               Spring    2.8731         1.7813          1.4254          1.2939
15-min               Autumn    1.2845         0.2809          0.2503          0.2456
30-min               Spring    3.2492         1.7476          1.5602          1.3861
30-min               Autumn    2.4735         1.3444          0.9880          0.9204
1-h                  Spring    3.9830         1.4532          1.2122          1.0723
1-h                  Autumn    2.9028         1.8367          1.5039          1.3748
2-h                  Spring    4.5097         2.1489          1.8023          1.7399
2-h                  Autumn    3.8721         2.3229          1.9304          1.7430
4-h                  Spring    4.1662         2.8531          2.6303          2.5011
4-h                  Autumn    4.5396         2.7849          2.5116          2.5020
6-h                  Spring    5.5102         3.5699          3.3307          3.1984
6-h                  Autumn    4.2019         2.5432          2.1787          2.0641
8-h                  Spring    6.6488         4.2750          3.9215          3.7714
8-h                  Autumn    5.0361         2.8082          2.6601          2.5646

From Table 4, it is evident that the value of CRPS in all of the cases gradually decreases as the number of replicates increases. This is because increasing the number of replicates reduces the uncertainties exhibited in wind power data, i.e., model misspecification and data noise. However, when the replicates increase to a certain level, the CRPS performance will no longer improve but rather suffer small fluctuations. It is further noted that the values of CRPS worsen as the prediction horizon increases. This is because long-term wind power data exhibit a more erratic nature and are thus more difficult to predict. On the other hand, the simulation for 15-min-ahead WPF takes 3 min. There are 24 replicates and 4 × 24 × 25 × 3 = 7200 samples in the simulation. When there are 48 and 96 replicates, although the CRPS performance improves, the calculation time doubles and the algorithm efficiency thus becomes unsatisfactory. Taken together, 24 replicates is a compromise between algorithm efficiency and overall performance. However, it should be noted that the proposed approach is not the most efficient among the compared methods from the perspective of calculation time. This result is understandable because the CNN-based WPF architecture contains a signal decomposition process and four independent CNN networks, each of which would require a substantial amount of time.

The results in Tables 1–4 and Figs. 4–17 demonstrate that the proposed ensemble approach exhibits the best prediction capability not only from the perspective of reliability and sharpness, but also from that of overall performance. Two important aspects account for the relatively better performance of the proposed approach based on WT, deep CNN and the ensemble technique. First, the nonlinear, non-stationary features and deep hidden invariant structure exhibited in wind power data cannot be fully modeled by shallow models, such as BP and SVM. By contrast, the primary advantage of CNN is the compact representation of a large set of hidden functions, which leads CNN to learn in a part-whole-decomposition manner that provides an effective way to extract the highly varying inherent features and hidden invariant structure. Second, the ensemble technique is capable of mitigating the wind power uncertainties with respect to model misspecification and data noise. This is because the diverse errors of individual forecasters are stochastically distributed on the input space and can thus be cancelled out in the ensemble process. Therefore, considering forecasting reliability, sharpness and overall performance, the proposed approach exhibits much better comprehensive performance than the other three benchmarks.

6.3. Practical application of the proposed approach

The in-depth numerical simulations demonstrate that the proposed approach exhibits a more accurate and robust performance for probabilistic WPF than the benchmarks. A more accurate WPF approach is able to effectively reduce the uncertainty about the future availability of wind power, and thus plays a significant role in the planning and operation of electric power and energy systems. For example, in [44], a probabilistic WPF approach was used to develop a novel unit commitment model which could be applied to analyze different commitment strategies and thus to optimize the operation of electric power and energy systems. In [45], a probabilistic nonlinear interval optimization model for power system dispatch was proposed. In this model, wind power quantiles obtained from a probabilistic WPF method were utilized to evaluate the effect of wind power uncertainty on power flow distribution, considering both the average and deviation of the objective functions. The feasibility of this model was demonstrated on a modified IEEE 30-bus system. As a further example, probabilistic WPF results can be employed to facilitate energy scheduling of system operators in electricity markets [46]. Three steps were required in this process. Firstly, wind power point forecasting results were applied to determine the day-ahead unit commitment and energy dispatch. Then, the commitment statuses of fast-starting units were adjusted in real time based on the probabilistic information of wind power. These two steps constitute a two-settlement electricity market model with clearing of day-ahead and real-time markets for energy and operating reserves. Lastly, the real-time energy dispatch was achieved based on the realized availability of wind power. The practicability of this energy scheduling strategy incorporating a probabilistic WPF approach was demonstrated on the power system in Illinois.

In addition, it has been demonstrated by the authors that wind power probabilistic information can also be implemented in economic dispatch to quantify the impacts of wind power uncertainties on total generation cost [31]. More accurate wind power forecasting results mean less uncertainty on the quantiles and thus lead to a lower operation cost of wind-embedded electric power and energy systems. Therefore, from these examples, it can be seen that a probabilistic wind power forecasting approach with high accuracy is very appealing for practical applications.

7. Conclusions

To maximize the beneficial impact of wind energy on climate change mitigation and environmental pollution reduction, probabilistic wind power forecasting with high accuracy is a pressing need. To this end, a novel hybrid approach based on wavelet transform, deep convolutional neural network and ensemble technique was proposed in this paper for probabilistic WPF. The developed hybrid approach was comprehensively compared with the benchmark persistence method and shallow models, such as BP and SVM. The wind power dataset used for WPF was collected from actual wind farms in China. The test results obtained from various seasons, time resolutions and prediction horizons demonstrate that the proposed approach outperforms all of the tested alternatives in terms of reliability, sharpness and overall skill, i.e., CRPS. These comparatively better performances in all of the cases examined herein further illustrate that the forecasted distribution from the proposed approach reflects the real wind power uncertainty. Therefore, the proposed approach exhibits high stability and strong robustness and is superior to all of the alternatives with which it was compared. The superiority of the proposed probabilistic WPF approach is attributed to the deep CNN architecture and the ensemble technique. The former can effectively extract the nonlinear and stochastic nature exhibited in each wind power frequency, and the latter can cancel the diverse errors out. It is also evident that the proposed probabilistic WPF approach demonstrates a high potential for practical applications in electrical power and energy systems.

Acknowledgement

This work was jointly supported by Natural Science Foundation of China (Nos. 51477104 and 51507103), Natural Science Foundation of Guangdong Province (Nos. 2015A030310316 and 2016A030313041), the Foundations of Shenzhen Science and Technology Committee (Nos. JCYJ20150525092941041 and JCYJ20160422165525693), Shenzhen International Cooperation Research Project (GJHZ20150313093836007), Shenzhen University Research and Development Startup Fund (Nos. 2016035 and 2015030), and National Basic Research Program (973 Program) (No. 2013CB228202).

References

[1] Zhao YN, Ye L, Li Z, Song XR, Lang YS, Su J. A novel bidirectional mechanism based on time series model for wind power forecasting. Appl Energy 2016;177:793–803.
[2] Jong P, Kiperstok A, Sanchez AS, Dargaville R, Torres EA. Integrating large scale wind power into the electricity grid in the Northeast of Brazil. Energy 2016;100:401–15.
[3] Sun DM, Xu Y, Chen HJ, Wu K, Liu KK, Yu Y. A mean flow acoustic engine capable of wind energy harvesting. Energy Convers Manage 2012;63:101–5.
[4] Yu Y, Sun DM, Wu K, Xu Y, Chen HJ, Zhang XJ, et al. CFD study on mean flow engine for wind power exploitation. Energy Convers Manage 2011;52(6):2355–9.
[5] The Global Wind Energy Council. Global wind energy outlook 2008. The Global Wind Energy Council, Belgium; Oct. 2008. Available: <http://www.gwec.net/fileadmin/documents/Publications/GWEO_2008_final.pdf> [accessed: Feb. 11, 2009].
[6] Tastu J, Pinson P, Trombe PJ, Madsen H. Probabilistic forecasts of wind power generation accounting for geographically dispersed information. IEEE Trans Smart Grid 2014;5(1):480–9.
[7] Zhao J, Guo ZH, Su ZY, Zhao ZY, Xiao X, Liu F. An improved multi-step forecasting model based on WRF ensembles and creative fuzzy systems for wind speed. Appl Energy 2016;162:808–26.
[8] Haque AU, Hashem NM, Mandal P. A hybrid intelligent model for deterministic and quantile regression approach for probabilistic wind power forecasting. IEEE Trans Power Syst 2014;29(4):1663–72.
[9] Ziel F, Croonenbroeck C, Ambach D. Forecasting wind power – modeling periodic and non-linear effects under conditional heteroscedasticity. Appl Energy 2016;177:285–97.
[19] Zhang Y, Wang J. K-nearest neighbors and a kernel density estimator for GEFCom2014 probabilistic wind power forecasting. Int J Forecast 2016;32(3):1074–80.
[20] Wang Y, Wang JZ, Wei X. A hybrid wind speed forecasting model based on phase space reconstruction theory and Markov model: a case study of wind farms in northwest China. Energy 2015;91:556–72.
[21] Giebel G. The state of the art in short term prediction of wind power – a literature overview. Deliverable 1.2b of the ANEMOS.plus project; 2011. p. 4.
[22] Li S, Wang P, Goel L. Wind power forecasting using neural network ensembles with feature selection. IEEE Trans Sustain Energy 2015;6(4):1447–56.
[23] Alessandrini S, Sperati S, Pinson P. A comparison between the ECMWF and COSMO ensemble prediction systems applied to short-term wind power forecasting on real data. Appl Energy 2013;107:271–80.
[24] Zhang G, Wu Y, Wong KP, Xu Z, Dong ZY, Herbert HC. An advanced approach for construction of optimal wind power prediction intervals. IEEE Trans Power Syst 2015;30(5):2706–15.
[25] Lee D, Baldick R. Short-term wind power ensemble prediction based on Gaussian processes and neural networks. IEEE Trans Smart Grid 2014;5(1):501–10.
[26] Hu QH, Zhang RJ, Zhou YC. Transfer learning for short-term wind speed prediction with deep neural networks. Renew Energy 2016;85:83–95.
[27] Lv Y, Duan Y, Kang WW, Li ZX, Wang FY. Traffic flow prediction with big data: a deep learning approach. IEEE Trans Intell Transport Syst 2015;16(2):865–73.
[28] Chen Y, Lin Z, Zhao X, Wang G, Gu YF. Deep learning-based classification of hyperspectral data. IEEE J Selected Top Appl Earth Observ Remote Sens 2014;7(6):2094–107.
[29] Rashwan MAA, Al Sallab AA, Raafat HM, Rafea A. Deep learning framework with confused sub-set resolution architecture for automatic Arabic diacritization. IEEE Trans Audio Speech Language Proc 2015;23(3):505–16.
[30] Hinton G. Deep belief networks. Scholarpedia 2009;4(5):786–804.
[31] Wang HZ, Wang GB, Li GQ, Peng JC, Liu YT. Deep belief network based deterministic and probabilistic wind speed forecasting approach. Appl Energy 2016;182:80–93.
[32] Zhang CY, Chen CLP, Gan M, Chen L. Predictive deep Boltzmann machine for multiperiod wind speed forecasting. IEEE Trans Sustain Energy 2015;6(4):1416–25.
[33] Ding J, Liu H, Huang M. Convolutional neural network with data augmentation for SAR target recognition. IEEE Geosci Remote Sens Lett 2016;13(3):364–8.
[34] Liang YD, Wang J, Zhou S, Gong Y, Zheng N. Incorporating image priors with deep convolutional neural networks for image super-resolution. Neurocomputing 2016;194:340–7.
[35] Abdel HO, Mohamed AR, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE Trans Audio Speech Language Proc 2014;22(10):1533–45.
[36] Deo RC, Wen XH, Qi F. A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset. Appl Energy 2016;168:568–93.
[37] Tascikaraoglu A, Sanandaji MB, Poolla K, Varaiya P. Exploiting sparsity of interconnections in spatio-temporal wind speed forecasting using wavelet transform. Appl Energy 2016;165:735–47.
[38] Catalão JPS, Pousinho HMI, Mendes VMF. Hybrid wavelet-PSO-ANFIS approach
[10] Hu J, Wang J. Short-term wind speed prediction using empirical wavelet
for short-term wind power forecasting in Portugal. IEEE Trans Sustain Energy
transform and Gaussian process regression. Energy 2015;93:1456–66.
2011;2(1):50–9.
[11] Liu H, Tian HQ, Pan DF, Li YF. Forecasting models for wind speed using wavelet,
[39] Bouvrie J. Notes on convolution neural networks. Available on-line <http://
wavelet packet, time series and artificial neural networks. Appl Energy
cogprints.org/5869/1/cnn_tutorial.pdf>.
2013;107:191–208.
[40] Wan C, Xu Z, Dong ZY, Wong KP. Probabilistic forecasting of wind power
[12] Li G, Shi J. On comparing three artificial neural networks for wind speed
generation using extreme learning machine. IEEE Trans Power Syst 2014;29
forecasting. Appl Energy 2010;87(7):2313–20.
(3):1033–44.
[13] Liu H, Tian HQ, Pan DF, Li YF. Wind speed forecasting approach using
[41] Muthén B. Moments of the censored and truncated bivariate normal
secondary decomposition algorithm and Elman neural networks. Appl Energy
distribution. Br J Math Stat Psychol 1990;43(1):131–43.
2015;157:183–94.
[42] Pinson P, Reikard G, Bidlot JR. Probabilistic forecasting of the wave energy flux.
[14] Mohammadi K, Shamshirband S, Yee PL, Petkovic D, Zamani M, Ch S. Predicting
Appl Energy 2012;93:364–70.
the wind power density based upon extreme learning machine. Energy
[43] Gneiting T, Balabdaoui F, Raftery AE. Probabilistic forecasts, calibration and
2015;86:232–9.
sharpness. J Roy Stat Soc 2007;69(2):243–68.
[15] Higgins P, Foley AM, Douglas R, Li K. Impact of offshore wind power forecast
[44] Botterud A, Zhou Z, Wang J, Sumaili J, Keko H, Mendes J, et al. Demand dispatch
error in a carbon constraint electricity market. Energy 2014;76:187–97.
and probabilistic wind power forecasting in unit commitment and economic
[16] Meng A, Ge J, Yin H, Chen S. Wind speed forecasting based on wavelet packet
dispatch: a case study of Illinois. IEEE Trans Sustain Energy 2013;4(1):250–61.
decomposition and artificial neural networks trained by crisscross
[45] Li YZ, Wu QH, Jiang L, Yang JB, Xu DL. Optimal power system dispatch with
optimization algorithm. Energy Convers Manage 2016;114:75–88.
wind power integrated using nonlinear interval optimization and evidential
[17] Cristobal GC, Ricardo B, Laura C, Oscar LG. On-line quantile regression in the
reasoning approach. IEEE Trans Power Syst 2016;31(3):2246–54.
RKHS (Reproducing Kernel Hilbert Space) for operational probabilistic
[46] Zhou Z, Botterud A, Wang J, Bessa RJ, Keko H, Sumaili J. Application of
forecasting of wind power. Energy 2016;113:355–65.
probabilistic wind power forecasting in electricity markets. Wind Energy
[18] Yan J, Li K, Bai E, Yang Z, Foley A. Time series wind power forecasting based on
2013;16(3):321–38.
variant Gaussian Process and TLBO. Neurocomputing 2016;189:135–44.

You might also like