Article
Short Term Electrical Load Forecasting Using
Mutual Information Based Feature Selection
with Generalized Minimum-Redundancy
and Maximum-Relevance Criteria
Nantian Huang *, Zhiqiang Hu, Guowei Cai and Dongfeng Yang
School of Electrical Engineering, Northeast Dianli University, Jilin 132012, China;
[email protected] (Z.H.); [email protected] (G.C.); [email protected] (D.Y.)
* Correspondence: [email protected]; Tel.: +86-432-6480-6691
Abstract: A feature selection method based on the generalized minimum redundancy and maximum
relevance (G-mRMR) is proposed to improve the accuracy of short-term load forecasting (STLF).
First, mutual information is calculated to analyze the relations between the original features and the
load sequence, as well as the redundancy among the original features. Second, a weighting factor
selected by statistical experiments is used to balance the relevance and redundancy of features when
using the G-mRMR. Third, each feature is ranked in a descending order according to its relevance
and redundancy as computed by G-mRMR. A sequential forward selection method is utilized for
choosing the optimal subset. Finally, an STLF predictor is constructed based on random forest with the
obtained optimal subset. The effectiveness and improvement of the proposed method were tested
with actual load data.
Keywords: short term load forecasting; generalized minimum redundancy and maximum relevance;
random forest; sequential forward selection
1. Introduction
Short-term load forecasting (STLF) predicts future electric loads over a prediction horizon
ranging from one hour up to several days. The primary targets of smart grids, such as reducing
the difference between peak and valley electric loads, large-scale renewable energy absorption, demand-side
response, and optimal economic operation of the power grid, require accurate STLF results [1].
In addition, with the development of competitive electricity markets, an accurate STLF is an important
basis for drafting a reasonable electricity price and improving the stability of electricity market
operation [2].
The existing STLF methods can be divided into traditional methods and artificial intelligence
methods. Among the traditional methods, autoregressive integrated moving average (ARIMA) [3],
regression analysis [4], the Kalman filter [5], and exponential smoothing [6] are commonly used.
The combination of autoregressive and moving-average components makes ARIMA an effective time series
model for STLF [7]. According to the historical time-varying load data, the ARIMA model is established and
applied for predicting the forthcoming electrical load. The regression analysis uses historical data to
establish simple but highly efficient regression models [8]. The Kalman filter improves the accuracy of
STLF by estimating each component of the load, which is apportioned into random and fixed components.
The exponential smoothing eliminates the noise in the load time series, and the influence of recent
load data on the future load can be reflected by adjusting the weights of the data, which is
helpful for improving the accuracy of STLF [9]. Overall, the traditional STLF methods can analyze
the linear relationships between input and output, but not the nonlinear relationships [10]. If the load
presents large fluctuations caused by environmental factors, the traditional methods may provide
inaccurate forecasts.
In recent years, predictors based on artificial intelligence algorithms have been widely used in the STLF
of power systems [10–17]. Methods such as fuzzy logic [14], expert systems [16,17], artificial neural
networks (ANNs) [18,19], and support vector machines (SVMs) [20,21] are currently used in STLF.
Fuzzy logic methods divide the input and the output into different kinds of membership functions,
and then the relationship between input and output is established by a set of fuzzy rules in fuzzy
systems for STLF [22]. However, fuzzy systems with simple if-then rules lack the self-learning and
adaptive ability needed to learn the input information effectively. An ANN acquires the complicated
non-linear relationship between input and output variables by learning the training samples. However,
there is no scientific way of acquiring the optimal network architecture when establishing an ANN
model. In addition, it also encounters the problems of falling into local optima and over-fitting [15,23].
SVMs overcome these deficiencies of ANNs by solving a quadratic programming problem that yields
the global optimal solution. Compared to an ANN, the SVM has many advantages.
However, the SVM parameters, such as the type and variance of the kernel function and the penalty
factor, are selected empirically. To achieve the optimal parameters, SVMs combined with genetic and
particle swarm optimization algorithms have been utilized [24,25]. The random forest (RF) is a combination of
classification and regression trees (CARTs) and the bagging learning method. By randomly sampling
from the training samples and randomly selecting features for splitting nodes, the RF gains the ability to resist
noise and is largely free from over-fitting problems [26]. Furthermore, in actual practice, there are only two
parameters (the tree number and the number of features for node splitting) that need to be set
when RF is applied for STLF [15], making RF highly suitable for STLF.
Considering the effect of various factors, artificial intelligence methods analyze the complicated
nonlinear relationships between power load and related factors to achieve higher precision of
prediction. However, the features that the predictor employs will influence the accuracy and efficiency
of STLF. Therefore, a feature selection scheme should be designed for choosing the optimal feature
subset for a predictor. The common features, including historical load, time, and meteorology, are used
for STLF modeling [11,27,28]. Historical load can reflect the variation of load accurately, which contains
plenty of information. The features of time, such as hour point, day of week, and on/off work day,
can also indirectly show the load pattern. In addition, a short-term power load is mainly affected by
the changing weather conditions which have a strong correlation with load demand. The accurate
meteorological information of the numerical weather prediction (NWP) can improve the accuracy of
STLF effectively. Consequently, NWP errors will reduce the accuracy of STLF [29].
Feature selection is the process of choosing the most effective features from an original feature
set. The optimal feature subset extracted from a given feature set can improve the efficiency and
accuracy of the predictor in STLF [30]. Nowadays, the manner of selecting features has become a hot
topic in short-term load forecasting research. Reference [31] adopted conditional mutual information
for feature selection: the mutual information values between features and load were measured and
subsequently ranked, and the first 50 features were kept as a threshold for filtering out the irrelevant
and weakly relevant features. Reference [10] constructed an original
feature set by using the phase space reconstruction theory. The correlation between features and load
was analyzed to discover the optimal feature subset. In reference [29], mutual information was
applied for extracting the effective features from the weather features, as well as from the historical
load data, for improving the accuracy of holiday load forecasting. Reference [32] used
a memetic algorithm to extract a proper feature subset from an original feature set for medium-term
load forecasting. Reference [33] analyzed the daily and weekly pattern by autocorrelation function,
and chose 50 features as the best features for very short-term load forecasting. Mutual-information-based
feature selection was used in reference [23]: by calculating the mutual information values
between feature vectors and the target variable, a lower boundary criterion was defined
Entropy 2016, 18, 330 3 of 19
to filter the features, and the optimal feature subset with the best features was achieved for STLF. All of the
studies [10,23,29,31–33] made important contributions to feature selection in STLF. However,
these feature selection methods only analyzed the correlation between features and load;
the redundancy among the features was not considered.
To improve the accuracy of STLF, a mutual-information-based generalized minimum-redundancy
and maximum-relevance (G-mRMR) feature selection method combined with RF is proposed. First, an original
feature set is formed by extracting historical load features and time features from the original load
data. Second, G-mRMR is used for generating the candidate features, which are ranked in descending
order. Third, the sequential forward selection (SFS) method and a decision criterion based on mean
absolute percentage error (MAPE) are utilized for obtaining the optimal feature subset by adding one
feature at a time to the input feature set of RF. Finally, the RF-based predictor is constructed with the
optimal feature subset to achieve the optimal predictor. The proposed method is validated through
STLF experiments using the actual load data from a city in Northeast China. The experimental results
are compared with different feature selection methods and predictors.
2. Methodology
The mutual information (MI) between two variables X and Y is defined as:

I(X, Y) = \sum_{x \in X,\, y \in Y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)}    (1)
where P(x) and P(y) are the marginal density functions, and P(x, y) is the joint probability
density function.
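As a concrete illustration, Equation (1) can be computed for discrete-valued samples with a simple plug-in (histogram) estimate. This is only a sketch under the assumption of discretized data and base-2 logarithms; the paper does not specify its MI estimator.

```python
# Plug-in estimate of discrete mutual information, following Equation (1).
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A variable carries exactly 1 bit of information about its own binary copy:
print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # 1.0
```

Independent variables give an MI of zero, which is what makes MI usable both as a relevance measure (feature vs. load) and as a redundancy measure (feature vs. feature).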
The target of feature selection methods based on MI is to find a feature subset J with n features
(n < m) that has the largest dependency on the target variable l, drawn from a feature set F_m with
m features.
The maximum-relevance criterion uses the mean value of the MI between each feature x_i and the target l,
described as follows:

\max D(J, l), \quad D = \frac{1}{|J|} \sum_{x_i \in J} I(x_i, l)    (2)
The redundancy indicated by MI value describes the overlapping information among features,
wherein a larger MI signifies more overlapping information and vice versa. In the process of feature
selection, the features selected by the maximum-relevance criterion can have high redundancy;
redundant features carry information similar to previously selected features and thus cannot improve
the accuracy of the predictor. Therefore, the redundancy among features should also be evaluated in the process of
feature selection.
The minimum-redundancy criterion requires a minimum dependency among the features:

\min R(J), \quad R = \frac{1}{|J|^2} \sum_{x_i, x_j \in J} I(x_i, x_j)    (3)
The mRMR criterion, combining Equations (2) and (3), is computed as follows:

\mathrm{mRMR}: \max_{J} \left[ D(J, l) - R(J) \right]    (4)
Generally, an incremental search method is used to search for the optimal features [34]. Supposing
there is a feature set J_{n-1} with n − 1 features that has already been selected, the aim is to select the
nth feature from the rest of the set {F_m − J_{n-1}} according to Equation (4). The incremental search method
with respect to this condition is as follows:

\mathrm{mRMR}: \max_{x_j \in F_m - J_{n-1}} \left[ I(x_j, l) - \frac{1}{|J_{n-1}|} \sum_{x_i \in J_{n-1}} I(x_j, x_i) \right]    (5)
where |J_{n-1}| refers to the number of features in J_{n-1}.
Restructuring Equation (5) by using a weighting factor α to balance the redundancy and relevance
of the feature subset yields the generalized mRMR (G-mRMR), presented as follows [35]:

\mathrm{G\text{-}mRMR}: \max_{x_j \in F_m - J_{n-1}} \left[ I(x_j, l) - \alpha \sum_{x_i \in J_{n-1}} I(x_j, x_i) \right]    (6)
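The incremental search of Equation (6) can be sketched as follows. The helper `g_mrmr_rank` and its toy relevance/redundancy values are hypothetical, and all pairwise MI values are assumed to be precomputed.

```python
# Incremental G-mRMR ranking per Equation (6): at each step, pick the feature
# maximizing I(x_j, l) - alpha * sum of I(x_j, x_i) over already-selected x_i.

def g_mrmr_rank(features, relevance, redundancy, alpha):
    """relevance[f] = I(f, l); redundancy[(f, g)] = I(f, g), keys sorted."""
    selected, remaining = [], list(features)
    while remaining:
        def score(f):
            penalty = sum(redundancy[tuple(sorted((f, g)))] for g in selected)
            return relevance[f] - alpha * penalty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected  # features in descending G-mRMR order

# Hypothetical toy values: f1 is most relevant; f2 is relevant but highly
# redundant with f1, so with alpha = 0.4 the less redundant f3 ranks second.
rel = {"f1": 0.9, "f2": 0.8, "f3": 0.5}
red = {("f1", "f2"): 0.9, ("f1", "f3"): 0.1, ("f2", "f3"): 0.2}
print(g_mrmr_rank(["f1", "f2", "f3"], rel, red, alpha=0.4))  # ['f1', 'f3', 'f2']
```

With α = 0, the ranking degenerates to pure maximum relevance; larger α penalizes overlap with the already-selected features more heavily.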
Figure 1. A simple CART.
Supposing there is a dataset D with d samples which includes C classes, the Gini index of D
can be defined as:

G(D) = 1 - \sum_{i=1}^{C} \left( \frac{d_i}{d} \right)^2    (7)
where d_i is the number of samples in the ith class.
Afterward, the feature f is used to divide D into subsets D_1 and D_2, wherein the Gini index after
the split is:

G_{split}(D) = \frac{d_1}{d} G(D_1) + \frac{d_2}{d} G(D_2)    (8)
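Equations (7) and (8) can be sketched directly; this is a minimal illustration of the CART splitting measure, not the paper's implementation.

```python
# Gini index of a labelled set (Equation (7)) and of a binary split (Equation (8)).
from collections import Counter

def gini(labels):
    d = len(labels)
    return 1.0 - sum((c / d) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    d = len(d1) + len(d2)
    return len(d1) / d * gini(d1) + len(d2) / d * gini(d2)

print(gini(["a", "a", "b", "b"]))          # 0.5 (maximally mixed, 2 classes)
print(gini_split(["a", "a"], ["b", "b"]))  # 0.0 (a perfectly pure split)
```

CART chooses the split with the smallest post-split Gini index, i.e., the one producing the purest child nodes.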
2.2.2. Bagging

The bagging is an integrated learning algorithm proposed by Leo Breiman [37]. Given a dataset
B with M features and a learning rule H, bootstrapping is carried out to generate training sets
{B^1, B^2, ..., B^q}. The samples in dataset B may appear in a training set many times or not at all.
A forecasting system consisting of a group of learning rules {H^1, H^2, ..., H^q}, which have learned
the training sets, is achieved. Breiman pointed out that bagging can improve the prediction accuracy of
unstable learning algorithms such as CART and ANN [37].
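The bootstrapping step can be sketched as follows; the constant-mean base learner is a stand-in for CART, used only to keep the example short.

```python
# Bootstrap resampling as used by bagging: each training set B^i is drawn
# from B with replacement, so some samples repeat and others are left out.
import random

def bootstrap(data, q, seed=0):
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in data] for _ in range(q)]

B = [2.0, 4.0, 6.0, 8.0]
sets = bootstrap(B, q=3)
# Bagged prediction: average the outputs of the q learned rules
# (each "rule" here is just the mean of its bootstrap set).
prediction = sum(sum(s) / len(s) for s in sets) / len(sets)
print(len(sets), len(sets[0]))  # 3 4
```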
2.2.3. RF

The RF is a group of predictors {p(x, Θ_k), k = 1, 2, ...}, which is composed of a number of CARTs,
where x is the input vector and {Θ_k} represents independent identically distributed random
vectors. The modeling process of RF is:

(1) k training sets are sampled with replacement from the dataset B by bootstrap.
(2) Each training set grows into a tree according to the CART algorithm. Supposing dataset B has
M features, mtry features are randomly selected from B for each non-leaf node. Afterward,
the node is split by a feature selected from these mtry features.
(3) Each tree grows completely without pruning.
(4) The forecasting result is obtained by calculating the mean value of the predictions of the trees.

The flow chart of the RF model is illustrated in Figure 2.
Figure 2. Random Forest modeling and predicting process.
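Steps (1)-(4) above can be sketched with depth-1 regression trees standing in for fully grown CARTs. This is an illustrative toy under that simplification, not the paper's implementation.

```python
# A compact sketch of the RF steps: bootstrap, random feature selection at the
# node, CART-style least-squares splits (depth 1 here), and averaging the trees.
import random

def fit_stump(X, y, feat_ids):
    """Best single split over the candidate features, minimizing SSE."""
    mean = sum(y) / len(y)
    best = (float("inf"), 0, float("-inf"), mean, mean)  # fallback: constant
    for f in feat_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, f, t, ml, mr)
    return best[1:]  # (feature, threshold, left mean, right mean)

def fit_forest(X, y, k=25, mtry=1, seed=0):
    rng = random.Random(seed)
    forest, m = [], len(X[0])
    for _ in range(k):
        idx = [rng.randrange(len(X)) for _ in X]   # (1) bootstrap sample
        feats = rng.sample(range(m), mtry)          # (2) random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest                                   # (3) no pruning

def predict(forest, row):                           # (4) average the trees
    return sum(ml if row[f] <= t else mr for f, t, ml, mr in forest) / len(forest)

# Toy data: load grows with the single input feature.
X = [[1], [2], [3], [4]]
y = [2.0, 4.0, 6.0, 8.0]
forest = fit_forest(X, y)
print(predict(forest, [1]) < predict(forest, [4]))  # True
```

Averaging many decorrelated trees is what gives the RF its noise resistance and its robustness to over-fitting.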
The bagging and the random selection of features for splitting ensure the good performance of RF.
3. Data Analysis

The historical load data used in this paper is archived data from a city in Northeast China from
2005 to 2012. As shown in Figure 3a,b, the load demand from 2005 to 2012 increased rapidly with
the increase in population and the development of the local society. It is difficult to generate a highly
accurate STLF for this kind of load pattern. Figure 3c shows the correlation analysis results of the
historical load by the autocorrelation function [38]. Evidently, the autocorrelation coefficient is reduced
gradually with increasing hour lag. According to Figure 3c, load far from the current moment has
low correlation; only the correlation of the load data from 2011 to 2012 is above the confidence
interval, i.e., positively correlated (above the blue line). With the increase of the load, the historic load
with large lag has very low correlation with the forecasting point. Therefore, the data from
2011 to 2012 are used for further research.
Figure 3. Yearly load curve analysis: (a) Average daily load from 8 January 2005 to 31 December 2012;
(b) The population and GDP from 2005 to 2012; (c) Hourly load autocorrelation of historical load data.
Figure 4 shows the average daily load pattern occurring in different seasons. These loads have
visibly different patterns, which are caused by the varying climate.
Figure 4. Four seasons average daily load profile from December 2010 to November 2011.
By observing Figure 5, it is possible to see that the load demand presents a cyclic pattern with a
period of 7 days. The load demand from Monday to Friday is similar, whereas on Saturday and Sunday
it is dissimilar. This pattern is due to the concurrent changing of the load level with the varying
electricity consumption behavior of people within a week.
Figure 5. Load curve from 15 to 28 August 2011.
The load point to be predicted is highly correlated with the load points of the day before as well as
those of the previous week. As shown in Figure 6, the load points throughout the week at lag 1,
lag 24, lag 48, lag 72, lag 96, lag 120, lag 144, and lag 168 have strong relevance, assuming each lag is
a 1 h difference. Furthermore, the load values at other moments also have different degrees of dependence.
Figure 6. Autocorrelation coefficient of load with 168 lags.
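The autocorrelation analysis behind Figures 3c and 6 can be sketched with the standard sample estimator; the paper cites [38] for the method, so the exact formula used there is an assumption.

```python
# Sample autocorrelation of a load series at a given hourly lag.

def autocorr(series, lag):
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

# A 24-hour periodic toy "load": the correlation peaks again at lag 24,
# mirroring the daily peaks visible in Figure 6.
load = [100 + 10 * (t % 24 < 12) for t in range(24 * 14)]
print(autocorr(load, 24) > autocorr(load, 12))  # True
```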
The original feature set for STLF can be achieved based on the above analysis. The 168 load
variables {L_t-168, L_t-167, ..., L_t-2, L_t-1} are extracted as part of the original feature set. When doing
day-ahead load forecasting, assuming the current moment is t, the load values from moment t-1 to t-24
are unknown. Therefore, the variables {L_t-24, L_t-23, ..., L_t-1} are eliminated from the original feature
set. In addition, features such as the hour of day, whether the day is a weekday or weekend, the day of week,
and the season are considered for constructing the original feature set.
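The construction described above can be sketched as follows; `make_features` is a hypothetical helper, and the season feature is left as a placeholder to be filled from the calendar.

```python
# Building one sample of the original feature set: lagged loads
# L_{t-168}..L_{t-25} (lags 1-24 are unknown for day-ahead forecasting),
# plus the calendar features.

def make_features(load, t, hour, day_of_week):
    feats = {f"L_t-{k}": load[t - k] for k in range(25, 169)}  # 144 load lags
    feats["hour"] = hour                 # hour of day
    feats["weekend"] = day_of_week >= 5  # weekday/weekend flag
    feats["day_of_week"] = day_of_week
    feats["season"] = None               # filled from the calendar in practice
    return feats

load = list(range(200))                  # placeholder hourly load series
f = make_features(load, t=190, hour=10, day_of_week=2)
print(len(f))  # 148
```

The 144 retained lags plus the four calendar features give the 148 features of the original feature set F.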
Though meteorological factors affect the load demand, they are not considered in this paper because
the error of NWP influences the accuracy of STLF [29]. If needed, the meteorological features can be added into
the original feature set for feature selection in the same manner. There are 148 features in the original
feature set F, as shown in Table 1.
The mean absolute percentage error (MAPE) is used as the evaluation criterion:

\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{|Z_i - \hat{Z}_i|}{Z_i}    (9)

where Z_i is the actual value of load, \hat{Z}_i is the forecasting value, and N is the number of samples.
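The MAPE defined above, together with the RMSE used in Section 5, can be sketched as:

```python
# MAPE (in percent) and RMSE over paired actual/forecast values.
from math import sqrt

def mape(actual, forecast):
    n = len(actual)
    return 100.0 / n * sum(abs(a - f) / a for a, f in zip(actual, forecast))

def rmse(actual, forecast):
    n = len(actual)
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

print(mape([100.0, 200.0], [110.0, 180.0]))  # 10.0
print(rmse([100.0, 200.0], [110.0, 180.0]))
```

MAPE normalizes each error by the actual load, so it is comparable across load levels; RMSE penalizes large absolute deviations more strongly.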
4.3. The Proposed STLF Model

Based on the methods in Sections 4.1 and 4.2, the method of feature selection with RF for STLF is
proposed. The feature selection and short-term load forecasting process are shown in Figure 7, where p
is the number of features and α is the weighting factor from 0.1 to 0.9, with an increment of 0.1.
Figure 7. The feature selection process based on G-mRMR and RF for STLF.
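The SFS loop at the core of Figure 7 can be sketched as follows; `train_and_score`, which would train the RF on a candidate subset and return its validation MAPE, is a hypothetical stand-in.

```python
# Sequential forward selection: add the G-mRMR-ranked features one at a time,
# evaluate the predictor's MAPE, and keep the subset with the minimum MAPE.

def sequential_forward_selection(ranked_features, train_and_score):
    best_mape, best_subset = float("inf"), []
    subset = []
    for feature in ranked_features:          # descending G-mRMR order
        subset.append(feature)
        m = train_and_score(list(subset))    # MAPE of RF with this subset
        if m < best_mape:
            best_mape, best_subset = m, list(subset)
    return best_subset, best_mape

# Toy score: error falls until 3 features, then rises again (cf. Figure 8a).
scores = {1: 5.0, 2: 4.0, 3: 2.6, 4: 2.9, 5: 3.5}
subset, err = sequential_forward_selection(list("abcde"), lambda s: scores[len(s)])
print(subset, err)  # ['a', 'b', 'c'] 2.6
```

Running this loop once per weighting factor α, and keeping the (α, subset) pair with the smallest MAPE, yields the optimal feature subset of the proposed model.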
The root mean square error (RMSE) is also used. The RMSE is defined in the following equation:

\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( Z_i - \hat{Z}_i \right)^2 }    (10)

5.1. Feature Selection Results Based on G-mRMR and RF
In this subsection, the optimal subset is achieved according to the minimum MAPE by setting
different weighting factor values of G-mRMR. Figure 8 shows the MAPE curves of the results from
RF predictions under different weighting factors α. As shown in Figure 8a, the MAPE is reduced
and reaches a minimum value with the increase in the number of features. Subsequently, it ceases
to decrease and gradually increases, indicating that the later addition of features does not improve
the performance of RF, but only brings adverse effects. As shown in Figure 8b, the error is reduced
rapidly when adopting a small value of α, for instance α = 0.1, which indicates that the features have
useful information for improving the performance of RF. When excessively considering the redundancy
among features by using a large value of α, the selected feature subset does not provide enough
relevant information for the prediction of the RF-based predictor.
Figure 8. Prediction error curves: (a) Prediction error curves corresponding to different weighting
factor α; (b) The enlarged figure of the red box in (a).
Table 3 presents the results of feature selection. When α = 0.4, the feature subset has the fewest
features and the RF generates the minimum MAPE; this optimal feature subset is selected.
Table 3. Feature subsets selected by minimum MAPE under different weighting factors.
The RF will forecast poorly with too few trees, while an excessive number of trees makes it an unnecessarily
complicated predictor. In order to obtain a reasonable number of trees for the RF, an experiment is designed as follows:
(1) The training set and test set with optimal features are used for the experiment.
(2) The initial number of tree nTree = 1.
(3) Training RF and testing with different nTree value with increment of 1 until nTree = 500.
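The sweep above can be sketched as follows; `evaluate`, standing in for training and testing the RF at a given tree count, is hypothetical, and the toy error curve dips at nTree = 184 purely for illustration.

```python
# Sweep nTree from 1 to 500 and keep the value with the minimum test MAPE.

def best_ntree(evaluate, max_trees=500):
    errors = {n: evaluate(n) for n in range(1, max_trees + 1)}
    return min(errors, key=errors.get)

# Toy error curve: decreasing and then flat, with a small dip at n = 184.
toy = lambda n: 2.5 + 1.5 / n - (0.01 if n == 184 else 0.0)
print(best_ntree(toy))  # 184
```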
Figure 9. Correlation between tree number and prediction of RF.
The prediction error decreases with the increasing number of trees. When nTree > 100, the error
tends to be steady. By analyzing the result, nTree = 184 with the minimum MAPE = 2.5389% is obtained;
this number of trees is used as the RF parameter in the subsequent experiments.
5.2. Comparison Experiments for STLF

The data shown in Table 2 are used in the comparison experiments.
5.2.1. Comparison of Different Feature Selection Methods
By using RF as the predictor, feature selection methods such as the Pearson Correlation Coefficient (PCC), MI, and SFS are compared with the proposed method in order to estimate the feature selection effect of G-mRMR. The results of these feature selection methods are presented in Figure 10. In Figure 10, with the same predictor, the SFS provides the best performance, followed by G-mRMR (α = 0.4) and MI, and finally the PCC. The SFS, used as a wrapper with RF, selects 22 features and achieves the minimum MAPE of 2.4925%. Considering both the relevance between features and load and the redundancy among features, G-mRMR (α = 0.4) selects 15 features with a minimum MAPE of 2.5597%. The MAPE of the feature subset selected by MI, which does not consider the redundancy among features, is higher than that of G-mRMR (α = 0.4). The PCC only analyzes the linear relation between features and load; the feature subset selected through this method is therefore not as good as that of G-mRMR (α = 0.4).
[Figure 10: MAPE (%) curves versus number of selected features for MI, G-mRMR (α = 0.4), PCC, and SFS, with marked minima at (15, 2.5597), (21, 2.8841), (22, 2.4925), and (56, 2.6017).]
Figure 10. Prediction error curves: (a) Prediction error curves corresponding to different feature selection methods; (b) The enlarged view of the red box in (a).
Entropy 2016, 18, 330 13 of 19
[Figure 11: load demand (MW) versus hour (0–168) for the true value and the forecasts of G-mRMR-RF (α = 0.4), MI-RF, PCC-RF, SFS-RF, and RF (full features), one panel per test week.]
Figure 11. Load curves of forecasting results of four weeks in four seasons and the true values: (a) Forecasting from 23 to 29 February 2012; (b) Forecasting from 13 to 19 May 2012; (c) Forecasting from 21 to 27 August 2012; (d) Forecasting from 24 to 30 November 2012.
By analyzing Figures 10 and 11 and Tables 4–7 comprehensively, although SFS achieved the best forecasting results in the feature selection process, the proposed method achieved the better results on the test set. When predicting the 28 days in the test set, the proposed method yields the best forecast on 20 days, and on the remaining eight days its MAPE exceeds that of the other methods by only 0.04% to 0.37%. The average MAPE and the average RMSE indicate that G-mRMR-RF performs the best among the methods, which demonstrates the validity and advancement of G-mRMR.
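The two error metrics reported in Tables 4–7 can be stated compactly; the helper names below are illustrative (MAPE in percent, RMSE in MW):

```python
# Error metrics used throughout the comparison experiments.
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the load (MW)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```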
The new method also has the minimum value of the maximum error of STLF in the testing set. As shown in Table 6, the maximum MAPE and maximum RMSE of the proposed method are 6.12% and 208.00 MW. Although the maximum error of the new method is high, the proposed method still performed better than the other methods. The high prediction error can be caused by two factors. On the one hand, the load of the forecasting day is much larger than the historical load data in the training set. In this paper, most features in the original feature set are extracted from the historical load data. Without the consideration of other features, the prediction results cannot be improved just by improving the feature selection and forecasting methods. On the other hand, with the significant economic rise of China from 2005 to 2012, the growth rate of the gross domestic product of the city is more than 10%. Under this premise, the electric load of the city increases rapidly, which makes STLF a challenging task.
Table 4. Comparison of prediction error (MAPE (%) and RMSE (MW)) from 23 to 29 February 2012.
Day      G-mRMR-RF (α = 0.4)   MI-RF          PCC-RF         SFS-RF         RF with Full Features
         MAPE    RMSE          MAPE    RMSE   MAPE    RMSE   MAPE    RMSE   MAPE    RMSE
Day 1 1.93 75.24 1.79 69.42 10.28 401.01 2.07 79.58 1.91 70.74
Day 2 1.77 66.63 1.78 67.90 9.78 388.26 2.22 79.46 1.80 69.13
Day 3 1.58 53.24 1.63 51.63 7.59 285.51 1.47 49.33 1.50 50.49
Day 4 1.69 79.28 1.59 70.02 5.35 189.65 2.52 105.32 1.98 76.33
Day 5 2.26 90.72 2.66 104.16 11.14 440.91 2.04 83.68 2.91 113.32
Day 6 1.58 57.73 2.37 83.87 9.78 396.44 1.61 57.41 2.54 87.59
Day 7 1.28 51.92 0.97 36.35 9.26 362.46 1.87 73.03 1.29 44.60
Average 1.72 67.82 1.82 69.05 9.02 352.03 1.97 75.40 1.99 73.17
Table 5. Comparison of prediction error (MAPE (%) and RMSE (MW)) from 13 to 19 May 2012.
Day      G-mRMR-RF (α = 0.4)   MI-RF          PCC-RF         SFS-RF         RF with Full Features
         MAPE    RMSE          MAPE    RMSE   MAPE    RMSE   MAPE    RMSE   MAPE    RMSE
Day 1 1.20 42.36 1.22 39.15 3.76 110.39 1.43 50.90 1.57 47.92
Day 2 1.64 60.32 1.33 50.26 8.98 273.78 1.28 46.10 1.37 53.34
Day 3 2.04 66.88 2.04 67.09 6.56 246.64 2.03 69.43 2.00 66.78
Day 4 0.94 34.38 0.96 34.48 7.04 263.29 0.89 34.98 1.11 41.54
Day 5 1.55 53.26 1.40 46.62 7.17 261.54 1.40 50.04 1.50 52.38
Day 6 1.28 41.45 1.34 44.68 6.66 237.55 1.28 40.22 1.45 40.03
Day 7 0.84 26.82 0.99 36.97 5.51 178.83 0.92 50.61 1.01 49.05
Average 1.35 46.49 1.33 48.03 6.53 224.57 1.32 48.90 1.40 50.15
Table 6. Comparison of prediction error (MAPE (%) and RMSE (MW)) from 21 to 27 August 2012.
Day      G-mRMR-RF (α = 0.4)   MI-RF          PCC-RF         SFS-RF         RF with Full Features
         MAPE    RMSE          MAPE    RMSE   MAPE    RMSE   MAPE    RMSE   MAPE    RMSE
Day 1 2.88 92.71 2.61 83.05 6.69 258.31 3.32 104.41 2.89 90.68
Day 2 1.48 55.30 1.55 56.81 8.22 319.74 1.77 62.53 1.59 57.62
Day 3 0.91 31.93 0.82 29.02 7.04 263.33 1.00 38.68 1.07 36.28
Day 4 1.88 76.95 2.27 90.86 8.97 344.82 1.99 84.88 2.17 87.44
Day 5 1.77 54.77 1.87 56.56 6.42 227.25 2.16 70.95 1.91 58.15
Day 6 2.08 73.60 1.78 71.44 5.91 181.33 1.71 65.13 1.86 74.78
Day 7 6.12 208.00 6.77 237.00 11.26 458.19 6.98 247.66 6.57 227.17
Average 2.45 72.83 2.52 89.25 7.79 293.28 2.70 96.32 2.58 90.30
Table 7. Comparison of prediction error (MAPE (%) and RMSE (MW)) from 24 to 30 November 2012.
Day      G-mRMR-RF (α = 0.4)   MI-RF          PCC-RF         SFS-RF         RF with Full Features
         MAPE    RMSE          MAPE    RMSE   MAPE    RMSE   MAPE    RMSE   MAPE    RMSE
Day 1 1.61 58.60 1.64 58.26 6.80 263.40 1.96 68.90 1.78 63.62
Day 2 1.05 48.24 1.12 43.26 6.97 242.74 2.11 78.43 1.09 40.30
Day 3 1.98 74.62 2.06 74.67 10.78 427.39 1.98 73.90 2.12 76.64
Day 4 1.47 57.14 1.33 48.09 9.21 387.30 1.80 67.13 1.50 57.63
Day 5 1.12 42.84 0.90 33.83 10.29 413.17 1.15 46.87 1.26 45.01
Day 6 1.31 53.79 1.33 52.33 9.03 389.52 1.32 54.74 1.23 47.40
Day 7 1.10 42.86 1.22 45.31 9.53 387.69 1.06 44.21 1.18 43.42
Average 1.38 54.01 1.37 50.82 8.93 358.74 1.63 62.02 1.45 53.43
Table 8. The optimal subset selected by using different intelligent STLF methods.
The test sets, with four weeks distributed over the four seasons, are used to evaluate each predictor with the features chosen above. Figure 12 shows the MAPE for comparison, and Table 9 gives the predictive accuracy of each model through the maximum, minimum, and average MAPE. In addition, a direct comparison between G-mRMR-RF, the generalized minimum redundancy and maximum relevance-back propagation neural network (G-mRMR-BPNN), and the generalized minimum redundancy and maximum relevance-support vector regression (G-mRMR-SVR), in terms of MAPE, is also presented in this figure. Except for the MAPE prediction on the seventh day shown in Figure 12c, the accuracy of G-mRMR-RF is between 1% and 2%; one point is above 2%. In the whole experiment, G-mRMR-RF forecasted worse than the other models on only four days. Clearly, G-mRMR-RF is the best prediction model, owing to its low MAPE and small fluctuation of error. G-mRMR-BPNN shows slightly better performance than G-mRMR-SVR. The maximum MAPE of these four weeks for G-mRMR-RF is 2.26%, 2.04%, 6.12%, and 1.98%, respectively, which is smaller than that of the other models. The same conclusion can be drawn by analyzing the minimum and average MAPE.
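A hedged sketch of how the three predictors might be compared on an identical feature subset, using scikit-learn analogues (RandomForestRegressor for RF, MLPRegressor as a BPNN stand-in, and SVR); the function name and hyperparameters are illustrative assumptions, not the paper's settings:

```python
# Compare the three predictor families on the same feature subset,
# scoring each by test MAPE (%). Hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

def compare_predictors(X_train, y_train, X_test, y_test):
    models = {
        "G-mRMR-RF": RandomForestRegressor(n_estimators=100, random_state=0),
        "G-mRMR-BPNN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                    random_state=0),
        "G-mRMR-SVR": SVR(C=10.0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = 100.0 * np.mean(np.abs((y_test - pred) / y_test))
    return results  # dict of model name -> test MAPE (%)
```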
[Figure 12: daily MAPE (%) for days 1–7 comparing G-mRMR-RF, G-mRMR-BPNN, and G-mRMR-SVR in each of the four test weeks (a–d).]
Figure 12. Forecasting error profiles of different predictors: (a) Forecasting from 23 to 29 February 2012; (b) Forecasting from 13 to 19 May 2012; (c) Forecasting from 21 to 27 August 2012; (d) Forecasting from 24 to 30 November 2012.
Table 9. Max, Min and Average daily MAPEs of the test set corresponding to different predictors.
6. Conclusions
To address the issue of selecting reasonable features for STLF, a feature selection method based on G-mRMR and RF is proposed in this paper. The experimental results show that the proposed feature selection approach can select fewer features than other feature selection methods, and that the features identified by the proposed approach are useful for STLF. In addition, the experimental results show that the forecasting results of RF are better than those of other predictors.
The advantages of the proposed method are as follows:
(1) MI is adopted as the criterion to measure the relevance between features and the load time series, as well as the dependency among features, which forms the basis of the quantitative feature selection analysis in mRMR.
(2) The correlation between features and load, as well as the redundancy among these features, are both considered. Compared to the maximum relevance method, the G-mRMR method reduces the size of the optimal feature subset and avoids degrading STLF accuracy through redundant features. At the same time, the relevance and redundancy are balanced by a variable weighting factor. The features selected by G-mRMR enable RF to achieve higher accuracy than those selected by mRMR.
(3) The optimal structure of RF is designed to reduce the complexity of the model and to improve the accuracy of STLF.
Acknowledgments: This work is supported by the National Natural Science Foundation of China (No. 51307020), the Science and Technology Development Project of Jilin Province (No. 20160411003XH) and the Science and Technology Foundation of the Department of Education of Jilin Province (2016, No. 90).
Author Contributions: Nantian Huang put forward the main idea and designed the overall structure of this paper. Zhiqiang Hu performed the experiments and prepared the manuscript. Guowei Cai guided the experiments and the paper writing. Dongfeng Yang provided materials. All authors have read and approved the final manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Moslehi, K.; Kumar, R. A reliability perspective of the smart grid. IEEE Trans. Smart Grid 2010, 1, 57–64.
[CrossRef]
2. Ren, Y.; Suganthan, P.N.; Srikanth, N.; Amaratunga, G. Random vector functional link network for short-term
electricity load demand forecasting. Inf. Sci. 2016, 367–368, 1078–1093. [CrossRef]
3. Lee, C.-M.; Ko, C.-N. Short-term load forecasting using lifting scheme and ARIMA models. Expert Syst. Appl.
2011, 38, 5902–5911. [CrossRef]
4. Goia, A.; May, C.; Fusai, G. Functional clustering and linear regression for peak load forecasting.
Int. J. Forecast. 2010, 26, 700–711. [CrossRef]
5. Al-Hamadi, H.M.; Soliman, S.A. Fuzzy short-term electric load forecasting using Kalman filter. IEE Proc.
Gener. Transm. Distrib. 2006, 153, 217–227. [CrossRef]
6. Ramos, S.; Soares, J.; Vale, Z. Short-term load forecasting based on load profiling. In Proceedings of the 2013
IEEE Power and Energy Society General Meeting, Vancouver, BC, Canada, 21–25 July 2013; pp. 1–5.
7. Li, W.; Zhang, Z.G. Based on Time Sequence of ARIMA Model in the Application of Short-Term Electricity
Load Forecasting. In Proceedings of the 2009 International Conference on Research Challenges in Computer
Science, Shanghai, China, 28–29 December 2009; pp. 11–14.
8. Deshmukh, M.R.; Mahor, A. Comparisons of Short Term Load Forecasting using Artificial Neural Network
and Regression Method. Int. J. Adv. Comput. Res. 2011, 1, 96–100.
9. Taylor, J.W. Short-Term Load Forecasting With Exponentially Weighted Methods. IEEE Trans. Power Syst.
2012, 27, 458–464. [CrossRef]
10. Kouhi, S.; Keynia, F.; Ravadanegh, S.N. A new short-term load forecast method based on neuro-evolutionary
algorithm and chaotic feature selection. Int. J. Electr. Power Energy Syst. 2014, 62, 862–867. [CrossRef]
11. Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for
smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372. [CrossRef]
12. Lin, C.T.; Chou, L.D.; Chen, Y.M.; Tseng, L.M. A hybrid economic indices based short-term load forecasting
system. Int. J. Electr. Power Energy Syst. 2014, 54, 293–305. [CrossRef]
13. Yu, F.; Xu, X. A short-term load forecasting model of natural gas based on optimized genetic algorithm and
improved BP neural network. Appl. Energy 2014, 134, 102–113. [CrossRef]
14. Çevik, H.H.; Çunkaş, M. Short-term load forecasting using fuzzy logic and ANFIS. Neural Comput. Appl.
2015, 26, 1355–1367. [CrossRef]
15. Lahouar, A.; Slama, J.B.H. Day-ahead load forecast using random forest and expert input selection.
Energy Convers. Manag. 2015, 103, 1040–1051. [CrossRef]
16. Ho, K.L.; Hsu, Y.Y.; Chen, C.F.; Lee, T.E. Short term load forecasting of Taiwan power system using
a knowledge-based expert system. IEEE Trans. Power Syst. 1990, 5, 1214–1221.
17. Srinivasan, D.; Tan, S.S.; Cheng, C.S.; Chan, E.K. Parallel neural network-fuzzy expert system strategy for
short-term load forecasting: System implementation and performance evaluation. IEEE Trans. Power Syst.
1999, 14, 1100–1106. [CrossRef]
18. Quan, H.; Srinivasan, D.; Khosravi, A. Uncertainty handling using neural network-based prediction intervals
for electrical load forecasting. Energy 2014, 73, 916–925. [CrossRef]
19. Hernández, L.; Baladrón, C.; Aguiar, J.M.; Carro, B.; Sánchez-Esguevillas, A.; Lloret, J. Artificial neural
networks for short-term load forecasting in microgrids environment. Energy 2014, 75, 252–264. [CrossRef]
20. Ko, C.N.; Lee, C.M. Short-term load forecasting using SVR (support vector regression)-based radial basis
function neural network with dual extended Kalman filter. Energy 2013, 49, 413–422. [CrossRef]
21. Che, J.X.; Wang, J.Z. Short-term load forecasting using a kernel-based support vector regression combination
model. Appl. Energy 2014, 132, 602–609. [CrossRef]
22. Pandian, S.C.; Duraiswamy, K.; Rajan, C.C.A.; Kanagaraj, N. Fuzzy approach for short term load forecasting.
Electr. Power Syst. Res. 2006, 76, 541–548. [CrossRef]
23. Božić, M.; Stojanović, M.; Stajić, Z.; Stajić, N. Mutual Information-Based Inputs Selection for Electric Load
Time Series Forecasting. Entropy 2013, 15, 926–942. [CrossRef]
24. Ma, L.; Zhou, S.; Lin, M. Support Vector Machine Optimized with Genetic Algorithm for Short-Term Load
Forecasting. In Proceedings of the 2008 International Symposium on Knowledge Acquisition and Modeling,
Wuhan, China, 21–22 December 2008; pp. 654–657.
25. Gao, R.; Liu, X. Support vector machine with PSO algorithm in short-term load forecasting. In Proceedings
of the 2008 Chinese Control and Decision Conference, Yantai, China, 2–4 July 2008; pp. 1140–1142.
26. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
27. Jurado, S.; Nebot, À.; Mugica, F.; Avellana, N. Hybrid methodologies for electricity load forecasting:
Entropy-based feature selection with machine learning and soft computing techniques. Energy 2015, 86,
276–291. [CrossRef]
28. Wilamowski, B.M.; Cecati, C.; Kolbusz, J.; Rozycki, P. A Novel RBF Training Algorithm for short-term Electric
Load Forecasting and Comparative Studies. IEEE Trans. Ind. Electron. 2015, 62, 6519–6529.
29. Wi, Y.M.; Joo, S.K.; Song, K.B. Holiday load forecasting using fuzzy polynomial regression with weather
feature selection and adjustment. IEEE Trans. Power Syst. 2012, 27, 596–603. [CrossRef]
30. Viegas, J.L.; Vieira, S.M.; Melício, M.; Mendes, V.M.F.; Sousa, J.M.C. GA-ANN Short-Term Electricity
Load Forecasting. In Proceedings of the 7th IFIP WG 5.5/SOCOLNET Advanced Doctoral Conference on
Computing, Electrical and Industrial Systems, Costa de Caparica, Portugal, 11–13 April 2016; pp. 485–493.
31. Li, S.; Wang, P.; Goel, L. A novel wavelet-based ensemble method for short-term load forecasting with hybrid
neural networks and feature selection. IEEE Trans. Power Syst. 2015, 1788–1798. [CrossRef]
32. Hu, Z.; Bao, Y.; Chiong, R.; Xiong, T. Mid-term interval load forecasting using multi-output support vector
regression with a memetic algorithm for feature selection. Energy 2015, 84, 419–431. [CrossRef]
33. Koprinska, I.; Rana, M.; Agelidis, V.G. Yearly and seasonal models for electricity load forecasting.
In Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN), San Jose, CA,
USA, 31 July–5 August 2011; pp. 1474–1481.
34. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information: Criteria of max-dependency,
max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [CrossRef]
[PubMed]
35. Nguyen, X.V.; Chan, J.; Romano, S.; Bailey, J. Effective global approaches for mutual information based feature
selection. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining, New York, NY, USA, 24–27 August 2014; pp. 512–521.
36. Speybroeck, N. Classification and regression trees. Int. J. Public Health 2012, 57, 243–246. [CrossRef] [PubMed]
37. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
38. Sood, R.; Koprinska, I.; Agelidis, V.G. Electricity load forecasting based on autocorrelation analysis.
In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain,
18–23 July 2010; pp. 1–8.
39. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1997, 97, 273–324. [CrossRef]
40. Dudek, G. Short-Term Load Forecasting Using Random Forests. In Intelligent Systems’2014; Springer: Cham,
Switzerland, 2015; pp. 821–828.
41. Che, J.X.; Wang, J.Z.; Tang, Y.J. Optimal training subset in a support vector regression electric load forecasting
model. Appl. Soft Comput. 2012, 12, 1523–1531. [CrossRef]
42. Sheela, K.G.; Deepa, S.N. Review on Methods to Fix Number of Hidden Neurons in Neural Networks.
Math. Probl. Eng. 2013, 2013, 425740. [CrossRef]
43. Rana, M.; Koprinska, I. Forecasting electricity load with advanced wavelet neural networks. Neurocomputing
2016, 182, 118–132. [CrossRef]
© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC-BY) license (http://creativecommons.org/licenses/by/4.0/).