Data-Augmented Sequential Deep Learning For Wind Power Forecasting
Keywords: Renewable energy; Wind power forecasting; Data augmentation; Deep learning; Encoder-decoder networks; Big data

Abstract

Accurate wind power forecasting plays a critical role in the operation of wind parks and the dispatch of wind energy into the power grid. With excellent automatic pattern recognition and nonlinear mapping ability for big data, deep learning is increasingly employed in wind power forecasting. However, two salient realities are that in-situ measured wind data are relatively expensive and inaccessible, and that the correlation between steps is omitted in most multistep wind power forecasts. This paper applies data augmentation to wind power forecasting for the first time, systematically summarizing and proposing both physics-oriented and data-oriented time-series wind data augmentation approaches to considerably enlarge primary datasets, and develops deep encoder-decoder long short-term memory networks that enable sequential input and sequential output for wind power forecasting. The proposed augmentation techniques and forecasting algorithm are deployed on five turbines with diverse topographies in an Arctic wind park, and the outcomes are evaluated against benchmark models and different augmentations. The main findings reveal that, on one side, the average improvement in RMSE of the proposed forecasting model over the benchmarks is 33.89%, 10.60%, 7.12%, and 4.27% before data augmentations, and increases to 40.63%, 17.67%, 11.74%, and 7.06%, respectively, after augmentations. The other side unveils that the effect of data augmentation on prediction varies intricately, but for the proposed model with and without augmentations, all augmentation approaches boost the model's outperformance from 7.87% to 13.36% in RMSE, from 5.24% to 8.97% in MAE, and similarly by over 12% in QR90. Finally, data-oriented augmentations are, in general, slightly better than physics-oriented ones.
1. Introduction

Wind is a renewable, sustainable, and environmentally friendly energy resource. As wind technology has developed in recent years, wind energy has received attention from a growing number of countries for its low-cost operation and maintenance, small turbine footprint, flexibility in development scale, and rapidly decreasing electricity generation costs. [1]

Meanwhile, the massive amount of electricity generated by wind energy is volatile, intermittent, and of low power density. These features influence the power production of generation companies and the balance of the grid, and may profoundly jeopardize its security. [2] In a large-scale grid-connected system involving wind power, an unplanned load increase or an unscheduled wind power decrease will cause a supply-demand imbalance when thermal power or hydropower ceases generation or is insufficient. [3] Hence, the uncertainty in wind power production enlarges the required reserve capacity of the system. An accurate wind power forecast minimizes this spare capacity and enables optimal dispatch of power in systems with wind power generation. Furthermore, an effective prediction serves as a basis for wind parks to engage in
Abbreviations: P̂_{i+n}, n-timestep-ahead predicted wind power; P_i, measured wind power; v_i, measured wind speed; u_{i+n}, n-timestep-ahead wind speed calculated from the weather model; m, sample number of the testing set; Cap, designed capacity of the wind turbine; T, statistic of the paired t-test; F, statistic of the paired Friedman test; BA, Bionic optimized neural networks constructed Adaboost; DA#, data-oriented data augmentation strategy number #; ED, Encoder-Decoder; EDLSTM, proposed Encoder-Decoder Long Short-Term Memory neural networks; LSTM, Long Short-Term Memory; MAE, Mean Absolute Error; MSE, Mean Square Error; NLP, Natural Language Processing; NN, three-layer backpropagation Neural Networks; NWP, Numerical Weather Prediction; PA#, physics-oriented data augmentation strategy number #; PR, Persistence model; QR90, Qualification Rate at the 90% threshold; RMSE, Root Mean Square Error; RNN, Recurrent Neural Networks; seq2seq, Sequence-to-Sequence; STD, Standard deviation; T#, wind turbine number # (different terrain).
* Corresponding author.
E-mail address: [Link]@[Link] (H. Chen).
[Link]
Received 7 July 2021; Accepted 18 September 2021
Available online 1 October 2021
0196-8904/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license ([Link]).
H. Chen et al. Energy Conversion and Management 248 (2021) 114790
generation bidding, determines a reasonable charging and discharging strategy for energy storage, and lowers the occurrence and duration of wind curtailments.

Wind power forecasting methodology is generally divided into physical, statistical, and hybrid approaches. [4] The first predicts wind power by extensive numerical computation of physical equations. It is based on fluid dynamics and uses Numerical Weather Prediction (NWP) data such as wind speed and pressure, and geoinformation like ground roughness and altitude. The method performs best in medium- or long-term forecasting and applies to the wind resource assessment of new wind parks that lack historical observations. The statistical approach aims to establish linear or nonlinear patterns within wind data sequences that can be utilized in forecasting. In particular, machine learning-based wind power forecasting methods developed in recent years are widely applied. The hybrid approach is a combination of the former categories and has shown a profound edge. [5]

In 2006, Hinton et al. successfully trained deep neural networks (i.e., artificial neural networks with several hidden layers) and achieved excellent performance on multiple datasets, [6] which signified the birth of deep learning. Since then, deep learning techniques based on neural networks of different designs have flourished and solved long-standing challenges, such as voice and image recognition and generation, preliminary implementation of autonomous driving, etc. [7] Recently, the application of deep learning to energy science has also become popular because of its powerful automatic pattern recognition and nonlinear mapping capabilities. [8] The two major drivers of deep learning evolution are progressive computational capabilities and the influx of big data. It is generally agreed that larger datasets yield better deep learning models. [9]

The effectiveness of deep supervised learning relies on the volume and quality of labeled training data as well as the topology and parameter tuning of deep networks. [10] Notably, an effective solution to establish large sets of training data is data augmentation, since the training set typically lacks a sufficient number of manually labeled samples. Especially in wind energy, it is generally challenging to acquire high-quality and long-duration meteorological and power production data.

Data augmentation is a technique to make supervised machine learning, especially deep networks, more efficient. It extends the amount of available training data by adding modified versions of existing data or new data generated from existing data. Technically, data augmentation imposes a sort of perturbation or noise on the datasets, both of which are viewed as unfavorable factors in signal processing and statistical modeling and need to be removed by implementing filters. [11, 12] In deep learning, however, the effect of the technique is to regularize the model and help mitigate overfitting during training, thereby improving the generalizability and ubiquity of the learned models. Overfitting is a phenomenon that occurs when a learner learns a function with extraordinarily large variance, such as one perfectly fitting the training data. Generalizability defines the difference in performance when a model is assessed on previously seen data in the training set compared to previously unseen data in the testing set. [13]

Essentially, using multiple inputs to make multistep wind power forecasts can be regarded as a Sequence-to-Sequence (seq2seq) prediction that is framed as a mapping of multiple inputs to multiple time-series outputs. It was demonstrated that the seq2seq model "approaches or surpasses all currently published results" [14] in Natural Language Processing (NLP), as in Google Translate, and recently it has also shown promise in renewable energy forecasting. [15, 16] The Encoder-Decoder (ED) Recurrent Neural Network (RNN) has successfully handled seq2seq problems [17] and exhibits state-of-the-art performance in text translation, which is fundamentally a time-series problem.

1.1. Previous work review

In computer science research, there are several developed methodologies in data augmentation. [18] Shorten and Khoshgoftaar [13] systematically presented current image data augmentation methods, their promising advances, and the methodologies used to implement them to boost the performance of image deep learning tasks. Cubuk et al. [19] investigated several commonly used image recognition datasets and designed an augmentation strategy that learns from the datasets. The strategy consists of many sub-strategies, is automatically selected during model training, and helps gain 0.4% to 0.6% image classification accuracy on different datasets. However, the data augmentation technique lives mainly in the field of image recognition, and transferring it to the sequence domain has received little attention. Yet both image and sequence deep learning tasks intrinsically focus on automatically exploiting data features while avoiding overfitting, so researchers should concentrate more on data augmentation applied to sequential deep learning. DeVries and Taylor [20] summarized and utilized interpolation, extrapolation, and other domain-agnostic approaches to make predictions with deep learning on time-series datasets, and tentatively proved the techniques timely and effective in some supervised learning problems. Park et al. [21] presented a speech recognition augmentation approach named SpecAugment, consisting of masking features, frequency channels, and time steps, to reach leading capabilities on two speech recognition task sets.

Deep learning techniques have received much attention from researchers in renewable energy forecasting. [8] With its distinctive automatic nonlinear recognition capabilities, deep learning has gradually emerged as an important approach to the challenge of forecasting sharply volatile wind power. [5, 22] Yildiz et al. [23] extracted features from wind datasets with variational mode decomposition and converted these features into images. The images were then handled by an improved residual-based deep convolutional neural network to forecast wind power for a wind park in Turkey. The edge of the proposed process was proved by comparison with some existing widely used large networks. Kisvari et al. [24] constructed a framework consisting of data preprocessing, anomaly detection, feature engineering, and gated recurrent deep learning models for wind power prediction, and demonstrated that the framework offered more effective predictions than traditional recurrent neural networks. Shahid et al. [25] stacked Long Short-Term Memory (LSTM) units into a large network, tuned the network with a genetic algorithm to forecast wind power, and validated the statistical advantage of the network over a single unit by the Wilcoxon Signed-Rank test. Memarzadeh et al. [26] applied a bionic algorithm, wavelet transform, feature selection, and LSTM networks to forecast wind power of two wind parks in Spain and Iran, and showed the effectiveness of the proposed method by comparison with benchmark neural networks.

While numerous wind power models based on a hybrid of traditional data methodologies and deep learning have been developed and advanced in forecasting for many sites, further sophistication of forecasting models may render the results site-specific: wind power forecasts become restricted to a certain category of terrain and weather features, are difficult to generalize, and cannot be easily deployed, because their constituent techniques, such as signal processing and feature engineering, require prolonged and special training to master. Lipu et al. [27] also summarized the most recent progress of wind power forecasting using artificial intelligence and pointed out the issues and challenges in the field, including the many different data preprocessing techniques for diverse wind data, model structure, optimization, etc. In particular, Reichstein et al. [28] recommended that, for Earth system science problems, more attention should be given to approaches coupling data with physical phenomena and to deep learning methods themselves, rather than to building more complex traditional-methods-based models.

In the present study, in contrast, we return to the physical process of wind power generation, the statistical characteristics of wind data, and the nature of deep learning to approach the forecasting problem.
After synthesizing numerous data augmentation methodologies and drawing on multiple state-of-the-art advances in sequential data prediction, robust and efficacious encoder-decoder deep neural networks with stacked LSTM units are proposed for wind turbine power forecasting in the Arctic.

1.2. Contributions

Leveraging the aforementioned literature review, attention is paid to a wind park in complex terrain inside the Arctic. The principal contributions of the present study are as follows:

1. This paper systematically applies data augmentation to wind power forecasting for the first time. Specifically, eight time-series data augmentation approaches are proposed according to physical characteristics of wind energy and statistical properties of data in wind engineering. The approaches are implemented in four benchmark models and the proposed advanced deep learning model. The methodology is particularly suitable for new wind parks that have a short period of operation and therefore a limited amount of accumulated data, since it fully and automatically deepens the information and value of these limited data.
2. We develop an end-to-end seq2seq deep learning predictive model with inputs of historical wind speed and power data and wind speed from NWP, as well as simultaneously interrelated outputs of multistep future wind power. The model is based on an encoder-decoder constructed with LSTM and shows its superiority in forecasting power.
3. It is demonstrated that the impact of the various augmentation approaches differs across forecasting algorithms. Augmentations somewhat increase the errors of linear models such as persistence. Nonetheless, augmentations improve the performance of neural networks-based algorithms, most notably the proposed deep learning model, where data-oriented augmentations generally contribute more than physics-oriented ones.
4. The data augmentations combined with the proposed and benchmark forecasting models are utilized to predict power generated by five turbines in various landscapes. The results are analyzed by rigorous statistical methods and indicate that the augmentations and the proposed forecasting model have value in wind engineering and potentially extensive applicability in other energy sectors.

The article proceeds as follows. Section 1 has introduced wind energy forecasting, the status quo of its deep learning applications, and the contributions of this work. Section 2 illustrates the principle of wind power generation and the utilized data and forecast scheme. Section 3 delves into the proposed data augmentation techniques and the novel predictive deep neural network. Section 4 provides detailed experiment procedures and model assessment metrics. Section 5 presents hierarchical experimental results and discussions, from comparisons of the models themselves to the data augmentation approaches. Finally, the main findings, research outlooks, and derivative policy recommendations are given in Section 6.

2. Data preparation and forecast scheme

Wind power generation is a conversion from wind energy to electricity. Ideally, the output of a wind turbine is expressed as in (1):

P = { 0,                    v < v_min
    { P_v(C_P, ρ, A; v),    v_min < v < v_n        (1)
    { P_r,                  v_n < v < v_max
    { 0,                    v > v_max

where P is the output power of the wind turbine (W); P_v(.), typically proportional to the cube of the wind speed, is the power curve function over the speed interval; C_P denotes the wind energy utilization efficiency; ρ is the air density (kg/m³); A is the effective area swept by the turbine blades (m²); v denotes the wind speed (m/s); v_min, v_max, and v_n are the cut-in, cut-off, and rated wind speeds, respectively; and P_r is the turbine's rated power. From (1), the output of a wind turbine is mainly influenced by the third power of the wind speed, the air density, and the swept area.

The study centers on the electricity production of the 3.0 MW Vestas V90 wind turbines of the Fakken wind park in the Arctic region, which has an installed capacity of 54 MW over 18 turbines and an average annual production of 139 GWh. Wind is predominantly influenced by the terrain; wind anomalies occur when wind moves through such areas, with the influence depending on the height and width of the barriers. The terrain of the Fakken wind park features low and flat hills and narrow valleys, and faces a fjord.

The data in this study span from 00:00 on 1st January 2017 to 23:50 on 31st December 2017. Raw wind speed and power data of each turbine, at 10-minute temporal resolution and recorded by Supervisory Control And Data Acquisition (SCADA), are supplied by a local wind energy operator. The NWP wind speed data, calculated by the Meteorological cooperation on operational Ensemble Prediction System (MEPS) NWP model, have a 2.5 km horizontal resolution, which is taken as the mesoscale. The model, operated by the Norwegian Meteorological Institute, updates at 00, 06, 12, and 18 UTC, and its forecasts for the next 66 h are available around 1 h 15 min later. The wind speed sequences from NWP comprise the nearest accessible weather prediction data.

To verify the generality and portability of the proposed methodology, five wind turbines separately situated in different topographic conditions in the wind park are selected as study subjects. Wind measurements are taken at the turbine nacelle, which is 80 m above the ground. The turbines' topographic features and the statistics of annual in-situ measured wind speed and power are shown in Table 1.

Statistically, wind power forecasting can be regarded as a multivariable regression problem, in which the wind power time series is autoregressed and wind speed serves as supplementary information to the autoregression. Updating the wind speed from NWP at the predicted time, i.e., the current information, is also a key feature in the prediction, since, according to an extensively cited reference by Giebel and Kariniotakis [29], forecasting wind power beyond three to six hours typically requires consideration of the NWP wind speed at the moment of prediction. In this study, we chose measured data of the previous six hours to make multistep forecasts of the wind power for the next six to twelve hours with the assistance of wind speed from NWP. The fundamental multistep forecasting model f(.) with timestep i + n is described as:

P̂_{i+n} = f(P_{i−j}, v_{i−j}, u_{i+n}) + ε_n        (2)

where i represents the base current time, i = 1, 2, …, 7, and, for each i, j = 0, 1, …, 6. P̂_{i+n} is the n-timestep-ahead predicted wind power, n ∈ {6, 7, 8, 9, 10, 11, 12}; v is the wind speed observed at the turbine; u represents the wind speed calculated by the mesoscale NWP model for the site; and ε_n is the error of the forecasting model.

Since the ranges of wind power and wind speed are not the same, it is beneficial to rescale the raw data onto a similar scale. Data standardization rescales variables to a mean of zero and a standard deviation (STD) of one. The technique can accelerate convergence and improve the accuracy of neural network algorithms. [30]
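As a concrete illustration of the forecast scheme in Eq. (2), the sample construction (seven historical hourly points, targets at steps +6 to +12, plus NWP speeds at the target steps) and the standardization step might be sketched as follows. This is a minimal sketch under our reading of the indexing; the function names `make_windows` and `standardize` are our own, not from the paper.

```python
import numpy as np

def standardize(x, mean=None, std=None):
    """Z-score standardization: rescale to zero mean and unit STD."""
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / std, mean, std

def make_windows(power, speed_meas, speed_nwp, hist=7, horizon=(6, 12)):
    """Build samples per Eq. (2): inputs are the previous `hist` hourly
    observations of power P_{i-j} and measured speed v_{i-j}, plus NWP
    speed u_{i+n} at the predicted steps; targets are P at steps +6..+12."""
    lo, hi = horizon
    X, U, Y = [], [], []
    for i in range(hist - 1, len(power) - hi):
        X.append(np.stack([power[i - hist + 1 : i + 1],
                           speed_meas[i - hist + 1 : i + 1]], axis=-1))
        U.append(speed_nwp[i + lo : i + hi + 1])   # u_{i+n}, n = 6..12
        Y.append(power[i + lo : i + hi + 1])       # targets P_{i+n}
    return np.array(X), np.array(U), np.array(Y)
```

In practice the standardization statistics would be fitted on the training split only and reused on the testing split, so that no testing information leaks into training.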
Table 1
The terrain and statistics of wind turbines.
Columns: Wind Turbine; Terrain; Wind power (Mean [kW], STD [kW], Skew, Kur); Wind speed (Mean [m/s], STD [m/s], Skew, Kur).
Note: STD is standard deviation, Skew is skewness, and Kur is relative kurtosis (actual kurtosis minus 3).
augmentation is a phenomenally robust approach to accomplish this aim. It attacks overfitting at its origin, the training data themselves, assuming that further information can be retrieved from the source dataset.

Based on know-how in wind energy technology and state-of-the-art data science, we divide the techniques for augmenting wind data for robust and efficient deep learning forecasts into two categories: physics-oriented and data-oriented.

3.1.1. Physics-oriented approaches

Inspired by the physics of wind power engineering, we propose three strategies to augment the training data for forecasting models. The first is the explicit perturbation of the wind power curve according to Eq. (1). The second is the implicit perturbation based on the difference between the numerically weather-predicted wind speed of the wind park area and the actual measured wind speed of the turbines. The third considers the operational data of the other wind turbines in the vicinity of the studied turbines. These three physics-oriented approaches are abbreviated as PA1, PA2, and PA3, respectively.

PA1: Considering the wind speed as the independent variable and differentiating Eq. (1), the following Eq. (3) is obtained:

dP = { 0,                        v < v_min
     { P′_v(C_P, ρ, A; v) dv,    v_min < v < v_n        (3)
     { 0,                        v_n < v < v_max
     { 0,                        v > v_max

From Eq. (1), it is observed that when v lies between the cut-in and rated wind speeds, the derivative of the power curve, i.e., the ratio of tiny variations in wind turbine power and wind speed, is proportional to the square of the wind speed at that point. Therefore, according to Eq. (3), it is possible to artificially attach a slight random perturbation to a wind speed point in that interval and calculate the corresponding power variation in accordance with the speed.

PA2: According to Eq. (2), the input to the power forecasting model contains wind speeds from both measurements and the NWP model, but they correspond to different time stamps when entering the model. The NWP datasets also contain wind speeds corresponding to the same time stamps as the measured wind speeds, and there is no significant difference between the wind speed probability distributions of the two wind speed sources in the wind park, based on our previous study. [31] So we resort to a random replacement strategy with a fixed probability to replace wind speeds in the measured datasets with the corresponding NWP wind speeds.

PA3: The turbines neighboring the target turbine experience similar wind conditions in operation. Therefore, adopting the measured wind speed of a neighboring turbine with a specific probability to replace that of the target turbine can be a strategy to augment the target wind speed dataset.

3.1.2. Data-oriented approaches

The proposed taxonomy of data-oriented methods for wind power forecasting is enlightened by feature space expansion, signal processing, and machine learning techniques. It consists of five approaches. DA1: various simple interpolation and extrapolation methods are used to obtain data on larger time scales. DA2: noise is applied to the original dataset. DA3: sequential augmentation approaches, named geometric transformations, drawing on image processing: symmetry or flipping, translation, and random erasing. DA4: methodology based on time-series decomposition. DA5: scenario generation methods for the single turbine, including statistical and machine learning generation.

DA1: Averaging is usually required to calculate the data in hourly units, as the original measured dataset is in ten-minute increments. New hourly data can be acquired by adding an interpolation or extrapolation modification to this averaging process. The new averaging is defined as:

x′_t = (1/6) Σ_{j=1}^{6} ω_j x_j        (4)

where x′_t is the hourly data and x_j denotes the raw 10-minute data. ω_j is a stochastic weight that fulfills Σ_{j=1}^{6} ω_j = 6 with −0.3 ≤ ω_j ≤ 1.3, where ω_j < 0 corresponds to extrapolation and ω_j ≥ 0 to interpolation.

DA2: Another simple, probably the simplest, method of data augmentation is the addition of white noise, following the standard normal distribution, to the data. One wind power forecasting study considered noise in the data a detrimental factor for prediction and removed it by signal processing. [32] Nonetheless, in machine learning research, applying noise to a neural network's inputs increases the generalizability of the network. [18] The noise injection is determined by a scaling parameter δ:

x′_t = x_t + δX,  X ∼ N(0, σ_i)        (5)

where x′_t is the enhanced data and x_t denotes the original hourly data.

DA3: Geometric transformations, such as flipping, cropping, and color transformations, are among the earliest data augmentation methods with excellent effectiveness in deep learning for image recognition. [13] Based on the characteristics of the measured wind speed time series and referring to image geometric augmentations, we stochastically opt, with 10% probability each, for symmetry about the average point, substitution of prior or posterior values, and stochastic erasing of some data.

DA4: Wind power forecasting is known mathematically as a special time-series problem. Ordinarily, a time series x_t can be decomposed into base α_t, trend τ_t, season s_t, and residual γ_t parts as in Eq. (6):

x_t = α_t + τ_t + s_t + γ_t,  t = 1, 2, …, N        (6)

The extensively implemented approach is first to use the time-domain plot of the series, or its Fourier analysis, to obtain the period corresponding to the seasonality, and then to decompose the series into the above four components with the loess smoothing technique, [33] a locally weighted autoregression. The weights of these four components are subsequently and stochastically adjusted by Eq. (7) to form an augmented series:

x′_t = ω_1 α_t + ω_2 τ_t + ω_3 s_t + ω_4 γ_t,  Σ_{i=1}^{4} ω_i = 4,  0.9 ≤ ω_i ≤ 1.1        (7)
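To make DA1 and DA2 concrete, a possible implementation might look like the following. This is a sketch under our own assumptions: the function names are illustrative, and the weight-sampling range is chosen narrower than the stated bounds simply so that the sum-to-6 constraint and the per-weight bounds are both guaranteed after centering.

```python
import numpy as np

def da1_weighted_hourly(x10min, rng):
    """DA1: stochastic weighted averaging of six 10-min values into one
    hourly value, x'_t = (1/6) * sum_j w_j x_j with sum_j w_j = 6."""
    x = np.asarray(x10min, dtype=float).reshape(-1, 6)  # six samples per hour
    e = rng.uniform(-0.15, 0.15, size=x.shape)
    # Center each row of perturbations so the weights sum to exactly 6;
    # this sampling range keeps every w_j well inside (-0.3, 1.3).
    w = 1.0 + e - e.mean(axis=1, keepdims=True)
    return (w * x).sum(axis=1) / 6.0

def da2_noise_injection(x_hourly, delta=0.05, sigma=1.0, rng=None):
    """DA2: white-noise injection, x'_t = x_t + delta * X, X ~ N(0, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x_hourly, dtype=float)
    return x + delta * rng.normal(0.0, sigma, size=x.shape)
```

Note that weights equal to one recover the plain hourly mean, so a constant 10-minute series maps to the same constant hourly series regardless of the sampled perturbations.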
DA5: The data augmentation methodologies described above all involve randomness, data selections, and/or weight adjustments, so they are relatively independent of the data and require considerable manual fine-tuning. Wind power scenario generation is an effective tool to resolve uncertainties in the stochastic planning of energy systems with integrated wind power. [34] Classical and advanced statistical methods and machine learning models are broadly employed [35] to predict wind power scenarios. Intrinsically, these models profile conditional distributions of a time series by assuming that the current value depends on previous points: a new time series may be generated from the learned conditional distributions provided that the original series values are perturbed in some way.

3.2. Encoder-decoder LSTM deep networks

RNNs have achieved tremendous success and wide application in numerous sequence applications. [18] An RNN is designed to process learning tasks with sequential data. 'Recurrent' means the current output is related to the previous output: the hidden nodes are structurally connected to each other, so the inputs of the hidden layers include not only the outputs of the input layer but also those of the previous-time hidden layers.

Among the RNN structures, the most extensively used and highly successful model is the LSTM network, which has a unique kind of memory unit in its hidden layers and is generally more expressive of long-short time dependencies than other RNNs. [36] Typically, the LSTM unit consists of three gates, i.e., the input gate, forget gate, and output gate. There are three primary internal phases of the unit. The first is the forget phase, which retains the important information coming in from the previous node and forgets the unimportant details. The next is the selective memory phase, which optionally remembers the inputs of this phase. Finally, an output phase determines which items should be treated as outputs of the current state. Mathematically, the long short-term memory unit can be expressed as [37]:

i_t := σ(W_xi x_t + W_hi h_{t−1} + b_i),
f_t := σ(W_xf x_t + W_hf h_{t−1} + b_f),
o_t := σ(W_xo x_t + W_ho h_{t−1} + b_o),
c̃_t := tanh(W_xc x_t + W_hc h_{t−1} + b_c),        (8)
c_t := f_t ⊙ c_{t−1} + i_t ⊙ c̃_t,
h_t := o_t ⊙ tanh(c_t).

where x_t is the input and h_{t−1} is the hidden state of the previous time step; i_t, f_t, and o_t are the input, forget, and output gates; W_· denotes the corresponding weight parameter and b_· the corresponding bias parameter; c̃_t is the candidate memory cell, c_t is the memory cell, and c_{t−1} is its previous-time-step state; h_t is the hidden state; σ(.) is the sigmoid function, tanh(.) is the hyperbolic tangent function, and ⊙ represents pointwise multiplication.

The encoder-decoder LSTM is a type of ED RNN designed to deal with seq2seq problems, and its architecture is innovative in terms of sequence embedding, i.e., the reading in and exporting out of fixed-size sequences. In this study, the encoder-decoder LSTM includes an input layer, an LSTM-based encoder and decoder, and an output layer. The LSTM unit achieves the extraction and utilization of important information in the sequence through its gate controls. The encoder reads input sequences and encodes them, via the weight of each time step, into a fixed-length context vector. The decoder decodes this fixed-length vector and outputs predicted sequences. The fixed-length context vector relates to a mechanism called attention, which highly summarizes and highlights the information learned by the encoder and feeds it to the decoder as input for translation.

3.3. Proposed deep EDLSTM for wind power forecasting

According to Eq. (2), wind power prediction involves autoregression, multiple sources of wind speed, and nonlinear functional relationships, all of which motivate the application of EDLSTM networks. In addition, multistep wind power forecasting is appropriately handled as a seq2seq problem, since the historical input data are linked and interactive. Therefore, a deep, stacked multilayer encoder-decoder LSTM, shortened to EDLSTM, is proposed and utilized to extract implicit features from layer to layer. The detailed deep EDLSTM employed in this article is illustrated in Fig. 1.

First, the encoder consists of a stack of three LSTM layers, which sequentially extracts the complex time-dependent features of the input measured and meteorological data, deeply, layer by layer, through the transferred hidden states h, and then generates a fixed-length context vector containing the extracted characteristic information. The structure and information transmission of the decoder are basically identical to those of the encoder. The context vector serves as the initial input to the decoder: regardless of updates from the encoder, the vector is sent to the first layer of the decoder as its input, whose output is used as the input of the second layer. Sequentially, the third layer's output is transformed through the output layer and cyclically fed back to the first layer as its next input. Eventually, the decoder generates a time series of the predicted wind power.

4. Experiments

4.1. Experimental scheme

The scheme of forecasting individual turbine wind power by employing EDLSTM with data augmentation is illustrated in Fig. 2. Firstly, the measured wind speed and power at ten-minute resolution are, except for the DA1 augmentation measure, average-interpolated into data with hourly resolution. All hourly data are segmented into training and testing sets, accounting for 65% and 35%, respectively. Secondly, the measured wind speed and/or wind power data in the training set are separately augmented with the approaches proposed in Section 3.1 to enlarge the data amount to five times the original training set size, i.e., new data of four times the size of the original training set are generated by the augmentations. Thirdly, the unexpanded and expanded training sets are individually fed into the benchmark models, i.e., the Persistence model (PR), simple three-layer backpropagation Neural Networks (NN), basic LSTM RNN (LSTM), and Bionic optimized neural networks constructed Adaboost (BA) ensemble learning (a popular and advanced hybrid ensemble forecasting model that has been proven to perform well and has been extensively studied [39, 40, 41, 42]), as well as the proposed deep EDLSTM network, to conduct training and obtain multiple learned models. The benchmark models have been introduced in Refs. [41, 43, 44], and their parameters are briefly summarized in Table 2. Finally, the testing set data are imported into the trained models to yield the multistep predicted wind power and to assess and compare the forecasting models' performance.

4.2. Data augmentation program

Our data augmentation strategy fine-tunes the data without altering the temporal order of the original data and ensures that the augmented training data and the original ones maintain statistical consistency. This study augments the training samples and scales up their number to five times the original sample size. The data augmentation techniques
and decoder networks are mutually independent, which indicates that explained above, apart from DA5, all involve stochastic perturbation of
their LSTM units do not share parameters during the process of networks the original data. Our method is to gradually enlarge the perturbation
training. amplitude and accordingly generate new data four times. For the DA5
method, four new datasets are produced by individually operating
autoregressive models based on four machine learning models. Details
H. Chen et al. Energy Conversion and Management 248 (2021) 114790
Fig. 1. The structural diagram of the proposed deep EDLSTM for wind power forecasting (the LSTM unit graph is cited from Ref. [38]; the three-layer stacking structure of the LSTM encoder and decoder is designed to mine the point information and the sequence-dependence information through the two state transfers between the three layers).
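As a concrete reading of Eq. (8), a single LSTM step can be sketched in NumPy as below; the dimensions, random weights, and the helper `lstm_step` are hypothetical illustrations, not the TensorFlow implementation used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eq. (8); W maps gate name -> (W_x, W_h)."""
    i = sigmoid(W["i"][0] @ x_t + W["i"][1] @ h_prev + b["i"])        # input gate
    f = sigmoid(W["f"][0] @ x_t + W["f"][1] @ h_prev + b["f"])        # forget gate
    o = sigmoid(W["o"][0] @ x_t + W["o"][1] @ h_prev + b["o"])        # output gate
    c_tilde = np.tanh(W["c"][0] @ x_t + W["c"][1] @ h_prev + b["c"])  # candidate cell
    c = f * c_prev + i * c_tilde   # memory cell update
    h = o * np.tanh(c)             # hidden state
    return h, c

# toy dimensions: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
W = {g: (rng.standard_normal((4, 3)), rng.standard_normal((4, 4))) for g in "ifoc"}
b = {g: np.zeros(4) for g in "ifoc"}
h, c = lstm_step(rng.standard_normal(3), np.zeros(4), np.zeros(4), W, b)
```

Because o_t lies in (0, 1) and tanh(c_t) in (−1, 1), the resulting hidden state is always bounded in magnitude by one, which is what keeps stacked LSTM layers numerically stable.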
Table 2
A summary of forecasting models' parameters.

PR: The predicted value for the next moment is the current moment's value.

NN: The input, hidden, and output layers have 15, 30, and 7 neurons, respectively; sigmoid activation function and MSE loss function. (The number of hidden-layer neurons is determined by a grid search with a density of 5 from 10 to 100.)

LSTM: One fully connected dense NN layer, seven LSTM units (TensorFlow-optimized default settings for regression problems), and one dense NN layer with 7 neurons as the output layer; sigmoid activation function, MSE loss function, and Adam optimizer. (The dense NN layers are the same as in the NN model.)

BA: As the performance of a neural network is intimately linked to the number of neurons in the hidden layer, the genetic algorithm [45], a bionic algorithm, is applied in training iterations to automatically search for the adaptive neuron number and constitute optimized neural networks as AdaBoost's base learners. The node-number search interval is set as [10, 100] and the maximum number of iterations is 50. AdaBoost emphasizes (with bigger weights) data mislearned by the previous base learner to establish an ensemble model that boosts the performance of the single base learners. The number of base learners is 10 and the AdaBoost maximum iteration count is 20. (The BA model behaves like a deep learning model called the residual deep network [46], which is similar to the seq2seq structure in forward propagation, integrating input and output to effectively mine features, while in backpropagation some gradients are fed directly to the output, avoiding gradient vanishing.)

EDLSTM: As described in Section 3.3 and Fig. 1. (The LSTM unit uses TensorFlow-optimized default settings for regression problems.)

Fig. 2. The main procedure of the data augmentation based EDLSTM for predicting wind power.
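As an illustration of the perturbation program of Section 4.2, a minimal sketch of a DA2-style noise augmentation follows; the toy power curve and the reading of N(0, 0.1n) as a standard deviation are assumptions for this sketch only.

```python
import numpy as np

def augment_da2(speed, power, n_copies=4, rng=None):
    """DA2-style augmentation: for n = 1..n_copies, add N(0, 0.1n) noise to the
    wind speed and N(0, 0.02n) noise to the power, preserving temporal order."""
    rng = rng or np.random.default_rng(42)
    copies = []
    for n in range(1, n_copies + 1):
        s = speed + rng.normal(0.0, 0.1 * n, size=speed.shape)
        p = power + rng.normal(0.0, 0.02 * n, size=power.shape)
        copies.append((s, p))
    return copies

speed = np.linspace(3.0, 15.0, 100)          # hypothetical hourly wind speeds (m/s)
power = np.clip(0.02 * speed**3, 0.0, 3.0)   # toy power curve capped at 3 MW
copies = augment_da2(speed, power)           # original + 4 copies = 5x training data
```

Each pass enlarges the perturbation amplitude with n, matching the "gradually enlarge the perturbation" scheme, while the sample order and timestamps stay untouched.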
Table 3
A detailed description of each data augmentation process.

Physics-oriented:
PA1: The Vestas V90 3 MW wind turbine has a cut-in and a rated wind speed of 4 and 15 m/s, respectively, according to its power curve. Select the measured wind speeds v_i in the corresponding interval; v'_i = v_i + X, X ~ U[−0.1n, 0.1n], n = 1, 2, 3, 4, where U represents the uniform distribution. Then the power variation corresponding to the wind speed variation is calculated by Eq. (3), and new power data are generated accordingly.
PA2: The measured wind speeds are randomly substituted, with 50% probability, four times with NWP wind speed data with the same timestamps, and white noise following N(0, 0.1) is added to the wind power data.
PA3: We select the measured wind speeds of the two turbines closest to the target turbine and randomly substitute them into the target wind speed dataset, with a probability of 15% for each and a total of 30%. The power data receive the same treatment as in PA2.

Data-oriented:
DA1: As described in the DA1 introduction in Section 3.1.2.
DA2: Two normally distributed noises, N(0, 0.1n) and N(0, 0.02n), are separately loaded into the measured wind speed and power data four times, where n = 1, 2, 3, 4.
DA3: As described in the DA3 introduction in Section 3.1.2.
DA4: As described in the DA4 introduction in Section 3.1.2.
DA5: Four learning algorithms augment the measured wind data as x'_t = f_i(x_{t−1}, x_{t−2}, x_{t−3}, x_{t−4}, x_{t−5}, x_{t−6}), i = 1, 2, 3, 4, where x'_t is the generated data and f_i(·) represents a single-step-ahead forecasting model established by a learning algorithm: f_1(·) is linear regression, f_2(·) is support vector regression, f_3(·) is a classification and regression tree, and f_4(·) is a simple three-layer neural network with 15 hidden neurons. All four are well-established and widespread machine learning algorithms; for space constraints, a detailed description of them can be found in Ref. [43].

Note: The units of wind speed and power in the table are m/s and MW, respectively.

where P_i and P̂_i are the normalized measured and corresponding predicted wind power, and m is the sample number of the testing set.

Nevertheless, the RMSE gives a disproportionately large weight to larger errors and is sometimes close when comparing different forecasting models. Therefore, in these cases, the Mean Absolute Error (MAE) and Qualification Rate (QR) [47] indices are introduced below to comprehensively assess the performance of the models. The MAE uniformly examines the forecasting errors, while the QR emphasizes the smaller ones.

MAE = (1/m) Σ_{i=1}^{m} |P_i − P̂_i|    (10)

QR = (1/m) Σ_{i=1}^{m} q_i, with q_i = 1 if (1 − |P_i − P̂_i|/Cap) ≥ Q and q_i = 0 if (1 − |P_i − P̂_i|/Cap) < Q    (11)

where Cap is the designed capacity of the turbine and Q is the quantile percentage for qualified predictions, chosen as 90% in this study.

Two statistical tests are employed to check whether statistically significant differences exist in the performance of the forecasting models, and both of their confidence levels are set at 0.05. The first is a paired T-test for the two-sample comparisons, with the null hypothesis H0: the averages of the samples are equivalent, and Ha: the averages are not equivalent. Its test statistic T is

T = (Ȳ_1 − Ȳ_2) / STD(Y_1 − Y_2) ~ t_{2l−2}    (12)

where Ȳ is the average and l is the number of samples.

The second, the Friedman test, is harnessed for multiple comparisons across multiple trials and checks column effects after statistically eliminating potential row effects [48], with H0: the column data do not have a significant difference, and Ha: they have a significant difference. The statistic F is given as

F = [12l / (k(k + 1))] [ Σ_{i=1}^{k} r_i² − k(k + 1)²/4 ]    (13)

where k is the number of columns and r_i is the average rank of column i; under H0, F follows a χ²_{k−1} distribution.

The standardized measured and NWP wind data of the chosen five wind turbines are respectively loaded into the four benchmark models and the proposed deep EDLSTM models to make six- to twelve-hour-ahead wind power forecasts. The RMSE is displayed in Fig. 3. In general, the RMSE of all forecasting models grows with increasing prediction steps, and the PR grows faster than the other models. The proposed deep EDLSTM performs best among all models for multistep power prediction for all wind turbines in almost all cases. The RMSE of the NN, LSTM, BA, and EDLSTM, all constructed on neural networks, is noticeably smaller than that of the PR, suggesting that neural networks can reflect the nonlinear characteristics of wind power. Moreover, these characteristics are better retained by the forecasting models as the networks become deeper and more tailored. On the overall average, the benchmarking PR, NN, LSTM, and BA models have an RMSE that is 51.46%, 11.89%, 7.67%, and 4.46% higher than that of EDLSTM, respectively. This demonstrates that the proposed model can efficiently and accurately predict the power generated by the five wind turbines.
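The error indices of Eqs. (10) and (11) can be sketched as follows; the toy measured and predicted powers are hypothetical values, not the turbine data.

```python
import numpy as np

def mae(p, p_hat):
    """Mean absolute error, Eq. (10)."""
    return np.mean(np.abs(p - p_hat))

def qualification_rate(p, p_hat, cap, q=0.90):
    """Qualification rate, Eq. (11): fraction of predictions whose relative
    accuracy 1 - |P - P_hat| / Cap reaches the quantile threshold Q."""
    return np.mean((1.0 - np.abs(p - p_hat) / cap) >= q)

p = np.array([1.0, 2.0, 2.5, 0.5])        # measured power (MW), hypothetical
p_hat = np.array([1.1, 1.9, 2.0, 0.45])   # predicted power (MW), hypothetical
print(mae(p, p_hat))                      # ≈ 0.1875
print(qualification_rate(p, p_hat, cap=3.0))
# absolute errors 0.1, 0.1, 0.5, 0.05; with cap = 3 MW and Q = 0.9 only errors
# up to 0.3 MW qualify, so 3 of 4 predictions pass -> QR = 0.75
```

Note the asymmetry the text describes: the 0.5 MW outlier dominates RMSE far more than MAE, and the QR simply counts it as one disqualified prediction.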
Fig. 3. The multistep performance of benchmarking and deep EDLSTM forecasting models for each turbine: (a) 1. Plateau, (b) 2. Valley, (c) 3. Lakeside, (d) 4.
Hilltop, (e) 5. Seaside, (f) Average.
lakeside, both of which are regarded as flat terrains. In contrast, the unique fjord topography on the Norwegian coast causes wind turbines
[Table fragment, header lost in extraction: Mean 0, 0.003876, 0.005813, 0.004610, 0.010291; p-values /, 0.002219, 0.000083, 0.000173, 0.000001]
Fig. 4. The overall average RMSE of multistep forecasting models without and with data augmentations for each turbine: (a) Without augmentations, (b) With
augmentations.
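The per-step RMSE aggregation behind Figs. 3 and 4 can be sketched as below; the forecast arrays are synthetic stand-ins, not the measured turbine data.

```python
import numpy as np

def rmse_per_step(y_true, y_pred):
    """Per-horizon-step RMSE for multistep forecasts.
    y_true, y_pred: arrays of shape (n_samples, n_steps)."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

# hypothetical 7-step forecasts for 100 test samples, power in [0, 3] MW
rng = np.random.default_rng(1)
y_true = rng.uniform(0.0, 3.0, size=(100, 7))
y_pred = y_true + rng.normal(0.0, 0.1, size=(100, 7))  # synthetic 0.1 MW error
print(rmse_per_step(y_true, y_pred))
```

Averaging over samples but not over horizon steps is what lets each model's error growth with lead time be compared curve-by-curve, as in Fig. 3.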
hidden and sophisticated patterns in the forecasting. In addition, the STD of RMSE between multiple predictions shows no significant variation before and after data augmentations, which indicates that the effects of the data augmentations are similar for each step. Generally, the average RMSE of the augmented NN, LSTM, and BA models is 21.47%, 13.30%, and 7.60% higher, respectively, than that of the augmented EDLSTM.

To show the outcomes of the various data-augmented models more explicitly, the RMSE of each step prediction based on the eight augmentation approaches is averaged and plotted in Fig. 5. By comparing Fig. 5 with Fig. 3, it can be found that: first, the tendency of gradually increasing RMSE persists in the data-augmented multistep predictions. Secondly, the augmented EDLSTM model outperforms its counterpart based on raw data in almost every prediction step for all wind turbines. Thirdly, the power prediction for the T3 wind turbine is the best, with the RMSE of the data-augmented EDLSTM model barely less than 0.11, and the second best is T1. Furthermore, the predictions for T2, T4, and T5, located in complex terrain, are also significantly improved. Thus, data augmentation improves EDLSTM for power forecasting, resulting in satisfactory reductions in model RMSE.

5.3. Competition between diverse data augmentation methodologies

The superiority of the data augmentation approaches as a whole in wind power prediction is elaborated in Section 5.2. To further investigate which data augmentation approaches are more effective, the average and STD of the RMSE for each prediction step by algorithms based on different augmentation approaches are taken and presented in Fig. 6. As can be seen, there is no obvious regularity in the average multistep forecasting performance of the different augmentation-based models. That is, the results of the various augmentation approaches across forecasting algorithms show no consistent tendency. The overall RMSE of the distinct augmentations is comparable in NN, LSTM, and BA, but the opposite holds for EDLSTM. Nevertheless, certain patterns exist for augmentations in the prediction of different turbines: regardless of the augmentation, the errors in predictions for turbines in flatter terrain are smaller, consistent with the predictions without augmentations.

As a further statistical examination of the variation among data augmentations in multistep predictions, the Friedman test is applied to answer whether there is a difference between the RMSE averages of the five wind turbines with different augmentations in the same time step. The p-values are demonstrated in Table 5. Among the power forecasts based on data augmentations for all turbines, the effect of the different augmentation approaches is not statistically significant in most cases, such as in NN, LSTM, and most cases of BA. In particular, the proposed EDLSTM models' RMSE, with a relatively complex p-value set, differs only in the sixth- and seventh-step forecasts with varying augmentations. Additionally, in view of the EDLSTM's favorable outperformance in wind power forecasting, the decrease rate of the average multistep RMSE for each augmented versus unaugmented model based on the same forecasting algorithm is computed. The rate is averaged over the five turbines and illustrated in Fig. 7. The p-value for the multivariate comparison between these RMSE decrease rates is 0.00033, much less than 0.05, indicating that the overall improvements in EDLSTM performance with the various augmentations are statistically different. In general, based on RMSE, PA3, PA2, DA1, and PA1 provide modest improvements, from 7.87% to 9.96%, to the EDLSTM model, while DA5, DA4, DA3, and DA2 improve the model relatively substantially, sequentially from 10.80% to 11.36%.

Despite the varying degrees of RMSE decrease for the EDLSTM models with different augmentation approaches, the difference between some approaches, like DA4 and DA5, is minimal. To further compare the effects of the different augmentations, the average MAE and QR90 of forecasts with the same scenario as in Fig. 7 are obtained, and their change rates before and after augmentations are calculated and tested in Figs. 8 and 9. The p-value of the MAE decrease rate comparison is 0.0023, less than 0.05 and also smaller than its RMSE counterpart, which again means that the varying augmentations give statistically different boosts to EDLSTM. Similar to Fig. 7, the DAs are better than the PAs, but Fig. 8 offers a clearer distinction between several DAs. DA4 and DA5 have a greater MAE decline, 8.97% and 8.82%, than DA2 and DA3, 8.49% and 7.79%, which generally indicates that the former two provide predictions closer to the real values. But DA4 and DA5 may have large deviations at some forecasting points, so these data-oriented augmentations appear quite close in Fig. 7. The p-value of the QR90 increase rate comparison is 0.052, bigger than 0.05, which illustrates that the different augmentations yield no significantly different improvements, around 12% to 13%, in QR90. This phenomenon reveals that any of the augmentation techniques can elevate the qualification rate of the EDLSTM model by a relatively similar amount and provide satisfactory forecasts in terms of this evaluation index.

To summarize, the impact of the different data augmentation methods on the benchmark models is not significantly different. However, the improvement for the deep EDLSTM is slightly varied,
Fig. 5. The multistep average RMSE of forecasting models with data augmentations for each turbine: (a) NN, (b) LSTM, (c) BA, (d) EDLSTM.
Fig. 6. The multistep average RMSE of forecasting models with various data augmentations for each turbine: (a) NN, (b) LSTM, (c) BA, (d) EDLSTM.
Table 5
The p-values of the RMSE Friedman test within five turbines for multiple comparisons of the different data-augmented approaches.
Forecast step: 6, 7, 8, 9, 10, 11, 12 [p-value rows not recovered in extraction]
Note: The p-values less than 0.05 are marked in italics, meaning H0 is rejected.
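The Friedman statistic of Eq. (13) can be cross-checked against SciPy on toy data; the 5x3 RMSE table below is hypothetical, not the paper's results.

```python
import numpy as np
from scipy.stats import rankdata, friedmanchisquare

def friedman_stat(table):
    """Friedman statistic of Eq. (13): `table` has l rows (blocks, e.g. turbines)
    and k columns (treatments, e.g. augmentation approaches)."""
    l, k = table.shape
    ranks = np.apply_along_axis(rankdata, 1, table)   # rank within each row
    r = ranks.mean(axis=0)                            # average rank per column
    return 12.0 * l / (k * (k + 1)) * (np.sum(r**2) - k * (k + 1) ** 2 / 4.0)

# hypothetical RMSE table: 5 turbines x 3 augmentation approaches
rmse = np.array([[0.11, 0.12, 0.10],
                 [0.14, 0.15, 0.13],
                 [0.09, 0.10, 0.08],
                 [0.16, 0.17, 0.15],
                 [0.13, 0.14, 0.12]])
F = friedman_stat(rmse)
F_ref, p_value = friedmanchisquare(*rmse.T)  # SciPy takes one sample per column
```

With no rank ties, Eq. (13) and SciPy's tie-corrected statistic coincide exactly; the p-value then comes from the χ²_{k−1} distribution, as in Table 5.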
Fig. 7. The average RMSE decrease rate of multistep EDLSTM forecasts with various augmentations, averaged over the five turbines.
Fig. 9. The average QR90 increase rate of multistep EDLSTM forecasts with various augmentations, averaged over the five turbines.
information, like signal decompositions, of the wind data by the mentioned augmentation techniques.

The impact of the eight data augmentation approaches employed, three physics-oriented and five data-oriented, on wind power prediction is sensitive to the forecasting algorithm. For the proposed well-performing EDLSTM, the various augmentations all boost the forecasting qualification rate at the 90% threshold by approximately the same amount, over 12%. But the augmentations improve the forecasting performance to slightly different degrees when evaluated by RMSE and MAE: averaged over multiple steps and turbines, the improvement varies from approximately 7.87% to 11.36% in RMSE and 5.24% to 8.97% in MAE within one standard deviation, and, generally, data-oriented augmentations outperform physics-oriented ones. Among the data-oriented augmentations, the results illustrate that EDLSTM's forecasting RMSE is significantly decreased even by simply appending noise and randomly perturbing or moving the data, in the same way as with sophisticated statistical data decomposition and learned data generation; as per MAE, however, the latter two provide predictions that are overall closer to the real power.

Our future research, building on this paper, will further investigate more advanced data augmentation techniques, integrate them into the proposed model to conduct in-depth point and probabilistic predictions, and attempt industrial applications in extensive comparisons with other forecasting models.

Additionally, ensuing policy recommendations may be extrapolated. Drawing on state-of-the-art deep learning techniques and increasing computational abilities, wind power forecasting and related data issues in energy fields should progressively move from traditional statistical and parameter-sensitive classical machine learning methods to deep learning approaches that can automatically identify complex patterns. Besides, sophisticated deep networks are particularly reliant on data amounts. Motivated by this article, the limited data of wind parks or other energy sectors could be artificially enlarged by appropriate data augmentations to serve as a stepping stone for further applications of deep learning to related scientific and engineering challenges.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The open-access publication charges for this article have been funded by a grant from the publication fund of UiT The Arctic University of Norway. Thanks also to the support of the UiT Arctic Centre for Sustainable Energy and to Dr. Fuqing Yuan and Stian Normann Anfinsen for their comments on the manuscript.

Data availability

The NWP data are publicly available from The Norwegian Meteorological Institute. The measured wind data from Fakken wind park are the property of the power company Troms Kraft AS. These data are available on reasonable request from the authors.

References

[1] Letcher TM. Wind Energy Engineering: A Handbook for Onshore and Offshore Wind Turbines. Academic Press; 2017.
[2] Rahimi E, Rabiee A, Aghaei J, Muttaqi KM, Esmaeel Nezhad A. On the management of wind power intermittency. Renew Sustain Energy Rev 2013;28:643–53.
[3] Cai Y, Bréon F-M. Wind power potential and intermittency issues in the context of climate change. Energy Convers Manage 2021;240:114276.
[4] Liu H, Chen C, Lv X, Wu X, Liu M. Deterministic wind energy forecasting: A review of intelligent predictors and auxiliary methods. Energy Convers Manage 2019;195:328–45.
[5] Hanifi S, Liu X, Lin Z, Lotfian S. A critical review of wind power forecasting methods—past, present and future. Energies 2020;13(15):3764.
[6] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science 2006;313(5786):504–7.
[7] Kim P. MATLAB Deep Learning. Berkeley, CA: Apress; 2017. p. 1–18.
[8] Wang H, Lei Z, Zhang X, Zhou B, Peng J. A review of deep learning for renewable energy forecasting. Energy Convers Manage 2019;198:111799.
[9] Sun C, Shrivastava A, Singh S, Gupta A. Revisiting unreasonable effectiveness of data in deep learning era. In: Proceedings of the IEEE International Conference on Computer Vision; 2017. p. 843–52.
[10] Wang X, Shi Y, Kitani KM. Deep supervised hashing with triplet labels. In: Asian Conference on Computer Vision. Springer; 2016. p. 70–84.
[11] Solomon C, Breckon T. Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab. John Wiley & Sons; 2011.
[12] Chatterjee S, Hadi AS. Regression Analysis by Example. John Wiley & Sons; 2013.
[13] Shorten C, Khoshgoftaar TM. A survey on image data augmentation for deep learning. Journal of Big Data 2019;6(1):1–48.
[14] Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215; 2014.
[15] Zhang Y, Li Y, Zhang G. Short-term wind power forecasting approach based on Seq2Seq model using NWP data. Energy 2020:118371.
[16] Pirhooshyaran M, Scheinberg K, Snyder LV. Feature engineering and forecasting via derivative-free optimization and ensemble of sequence-to-sequence networks with applications in renewable energy. Energy 2020;196:117136.
[17] Cho K, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078; 2014.
[18] Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016.
[19] Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. AutoAugment: Learning augmentation strategies from data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 113–23.
[20] DeVries T, Taylor GW. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538; 2017.
[21] Park DS, et al. SpecAugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779; 2019.
[22] Deng X, Shao H, Hu C, Jiang D, Jiang Y. Wind power forecasting methods based on deep learning: A survey. Computer Modeling in Engineering & Sciences 2020;122(1):273–302.
[23] Yildiz C, Acikgoz H, Korkmaz D, Budak U. An improved residual-based convolutional neural network for very short-term wind power forecasting. Energy Convers Manage 2021;228:113731.
[24] Kisvari A, Lin Z, Liu X. Wind power forecasting – A data-driven method along with gated recurrent neural network. Renewable Energy 2021;163:1895–909.
[25] Shahid F, Zameer A, Muneeb M. A novel genetic LSTM model for wind power forecast. Energy 2021;223:120069.
[26] Memarzadeh G, Keynia F. A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets. Energy Convers Manage 2020;213:112824.
[27] Lipu MH, et al. Artificial intelligence based hybrid forecasting approaches for wind power generation: Progress, challenges and prospects. IEEE Access 2021.
[28] Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N. Deep learning and process understanding for data-driven Earth system science. Nature 2019;566(7743):195–204.
[29] Giebel G, Kariniotakis G. Wind power forecasting—A review of the state of the art. In: Renewable Energy Forecasting. Elsevier; 2017. p. 59–109.
[30] Ng A. Advice for applying machine learning. Machine Learning 2011.
[31] Chen H, Birkelund Y, Anfinsen SN, Staupe-Delgado R, Yuan F. Assessing probabilistic modelling for wind speed from numerical weather prediction model and observation in the Arctic. Sci Rep 2021;11(1):1–11.
[32] Dong Q, Sun Y, Li P. A novel forecasting model based on a hybrid processing strategy and an optimized local linear fuzzy neural network to make wind power forecasting: A case study of wind farms in China. Renewable Energy 2017;102:241–57.
[33] Cleveland RB, Cleveland WS, McRae JE, Terpenning I. STL: A seasonal-trend decomposition. Journal of Official Statistics 1990;6(1):3–73.
[34] Li J, Zhou J, Chen B. Review of wind power scenario generation methods for optimal operation of renewable energy systems. Appl Energy 2020;280:115992.
[35] Chen H, Birkelund Y, Anfinsen SN, Yuan F. Comparative study of data-driven short-term wind power forecasting approaches for the Norwegian Arctic region. J Renewable Sustainable Energy 2021;13(2):023314.
[36] Gers FA, Schmidhuber J, Cummins F. Learning to forget: Continual prediction with LSTM. 1999.
[37] Staudemeyer RC, Morris ER. Understanding LSTM – a tutorial into long short-term memory recurrent neural networks. arXiv preprint arXiv:1909.09586; 2019.
[38] Olah C. Understanding LSTM networks. 2015.
[39] Ren Y, Suganthan PN, Srikanth N. Ensemble methods for wind and solar power forecasting—A state-of-the-art review. Renew Sustain Energy Rev 2015;50:82–91.
[40] Jiajun H, Chuanjin Y, Yongle L, Huoyue X. Ultra-short term wind prediction with wavelet transform, deep belief network and ensemble learning. Energy Convers Manage 2020;205:112418.
[41] Lee J, Wang W, Harrou F, Sun Y. Wind power prediction using ensemble learning-based models. IEEE Access 2020;8:61517–27.
[42] da Silva RG, Ribeiro MHDM, Moreno SR, Mariani VC, dos Santos Coelho L. A novel decomposition-ensemble learning framework for multi-step ahead wind energy forecasting. Energy 2021;216:119174.
[43] Murphy KP. Machine Learning: A Probabilistic Perspective. MIT Press; 2012.
[44] Sun W, Zhang T, Tao R, Wang A. Short-term photovoltaic power prediction modeling based on AdaBoost algorithm and Elman. In: 2020 10th International Conference on Power and Energy Systems (ICPES). IEEE; 2020. p. 184–8.
[45] Whitley D. A genetic algorithm tutorial. Statistics and Computing 1994;4(2):65–85.
[46] Veit A, Wilber MJ, Belongie S. Residual networks behave like ensembles of relatively shallow networks. Advances in Neural Information Processing Systems 2016;29:550–8.
[47] Lijuan L, Hongliang L, Jun W, Hai B. A novel model for wind power forecasting based on Markov residual correction. In: IREC2015 The Sixth International Renewable Energy Congress. IEEE; 2015. p. 1–5.
[48] Gibbons JD, Chakraborti S. Nonparametric Statistical Inference, Revised and Expanded. CRC Press; 2014.