
Power Load Probability Density Forecasting Using Gaussian Process Quantile Regression

This paper presents a Gaussian process quantile regression (GPQR) model for power load probability density forecasting, addressing the uncertainties in load predictions due to various external factors. The model is evaluated using real datasets from the PJM electric power company, demonstrating improved prediction intervals and forecasting accuracy compared to traditional methods. The study highlights the effectiveness of GPQR in providing a comprehensive probabilistic distribution of future power demand, which is essential for effective grid management and operations.


Applied Energy 213 (2018) 499–509

Contents lists available at ScienceDirect

Applied Energy
journal homepage: [Link]/locate/apenergy

Power load probability density forecasting using Gaussian process quantile regression

Yandong Yang a,b, Shufang Li a,b,*, Wenqi Li c, Meijun Qu a,b

a Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China
b Beijing Laboratory of Advanced Information Network, University of Posts and Telecommunications, Beijing 100876, China
c State Grid Henan Electric Power Company, Jinshui District, Zhengzhou 450052, China

HIGHLIGHTS

• Propose a Gaussian process quantile regression (GPQR) model.
• The proposed method can provide power load probability density forecasting.
• PICP, PINAW and CWC are adopted to assess the GPQR model.
• The quality of PIs can be significantly improved by the GPQR model.
• The power load forecast accuracy is evaluated using two PJM case studies.

ARTICLE INFO

Keywords:
Power load forecasting
Gaussian process
Quantile regression
Probability density forecasting

ABSTRACT

Accurately predicting the power load in certain areas is of great importance for grid management and power dispatching. A great deal of research has been conducted within the smart grid system community in developing an assortment of different algorithms that seek to increase the accuracy of these predictions. However, these predictions suffer from various sources of error, such as the variations in weather conditions, calendar effects, economic indicators, and many other sources, which are caused by the inherent stochastic and nonlinear characteristics of power demand. In order to quantify the uncertainty in load forecasting effectively, this paper proposes a comprehensive probability density forecasting method employing Gaussian process quantile regression (GPQR). GPQR is a type of Bayesian non-parametric method which can handle the uncertainties in power load data in a principled manner. Consequently, the probabilistic distribution of power load data can be statistically formulated. The effectiveness of the proposed method for short-term load forecasting has been assessed using the real dataset provided by the American PJM electric power company. Numerical results demonstrate that the uncertainties in power load data can be effectively captured by the proposed method. Meanwhile, competitive predictive performance is obtained with respect to conventionally adopted methods.

1. Introduction

The success of smart grid applications relies on the quality of grid information. This is especially true for state grid intelligent control systems, where reliable and accurate grid information is highly desired by system operators. One of the critical needs of the smart grid is to forecast the future power load. Because electricity cannot be stored efficiently in large quantities in energy storage devices, system operators need to ensure that the amount generated during a certain period is sufficient to satisfy the load while not exceeding this demand significantly. Accurately predicting the power load not only can provide insightful information for system operators to reduce the maintenance costs [1], but also can ensure reliable power systems planning and operations [2].

Due to the significance of Short-Term Load Forecasting (STLF), many efforts have been devoted to developing a variety of STLF techniques such as statistical methods, machine learning methods and hybrid models. For an overview of the related works, interested readers can refer to the recent book [3]. The classical statistical models for STLF are various types of ARIMA, which express the forecast as a function of historical load and possible exogenous variables [4–6]. Meanwhile, machine learning-based models such as Support Vector


Corresponding author at: Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China.
E-mail address: bupt_paper@[Link] (S. Li).

Received 28 August 2017; Received in revised form 2 November 2017; Accepted 4 November 2017
Available online 10 November 2017
0306-2619/ © 2017 Elsevier Ltd. All rights reserved.
Y. Yang et al. Applied Energy 213 (2018) 499–509

Regression (SVR) [7,8] and Gaussian Process Regression (GPR) [9,10] have achieved great success in the last decade. Neural Network (NN) models have gained popularity for their excellent ability to model complex nonlinear relationships as well [11]. In addition, hybrid models such as [12–15], which take advantage of existing models, have attracted considerable attention from researchers. The selection of any of these techniques for STLF depends on the problem domain, computation burden, dataset size, and analysis purpose.

Generally speaking, most of the existing literature on electric load forecasting can be divided into two categories according to the forecasting outputs: point forecasts and interval predictions [16,17]. Point forecasts are the most traditional techniques, which provide an estimate of the future load for each step throughout the forecasting horizon that is as precise as possible. Rather than providing single-valued load forecast information, interval prediction methods attempt to construct well-calibrated lower and upper bounds of the future prediction associated with a prescribed probability called the confidence level. Different from these two types of forecasts, probability density forecasting can quantify the uncertainty by constructing the probability density function of the forecasting results. What is more, it can provide a full probability distribution description of the future demand, which is especially desirable for power system management. Although calculating the probability for each possible prediction requires extra effort, the additional information is highly useful for a full understanding of the service reliability.

With the penetration of renewable energies such as wind and solar power, the level of uncertainty in power systems has significantly increased. It is imperative to quantify potential uncertainties associated with forecasts. The applications of NN-based prediction intervals (PIs) for quantifying uncertainties associated with forecasted loads were investigated in [16,18], and the numerical results showed that prediction intervals provide more information about the uncertainties that exist in the process of load forecasting. Nevertheless, the construction of PIs using these NN-based methods is computationally expensive. The interval type-2 FLS (IT2 FLS) was adopted for STLF to handle uncertainties [19], whereas the output of the IT2 FLS is an interval rather than a prediction interval. Liu et al. [20] applied the quantile regression averaging (QRA) methodology to a set of sister point forecasts and generated PIs of future electric loads. Besides, a boosting additive quantile regression model for a set of quantiles of the future distribution was proposed by Taieb et al. [21]. Unlike interval prediction, probability density forecasting can provide a new perspective on this problem of uncertainty evaluation in power systems. There have been only a limited number of studies investigating the underlying probabilistic information. A semi-parametric additive model for forecasting the probability density functions (PDF) of the half-hourly electricity demand for power systems was proposed in [22]. Recently, a quantile regression neural network (QRNN) [23] has been proposed to draw the complete conditional probability density curve of future load, but the most frequently occurring value of the probability density curve could not achieve the most accurate prediction value. This phenomenon reveals that the shallow architecture of QRNN lacks enough capability to model the complex temporal characteristics of load series [24]. Later, a kernel-based support vector quantile regression [25] was put forward to improve the forecasting accuracy and quantify the uncertainty. In the smart grid era, the electricity demand is more active and less predictable than before, so probabilistic load forecasting should be capable of quantifying the inherent uncertainties in load series and helpful for assessing the risk of relying on the forecasts. Hence, a reliable and efficient probabilistic load forecasting technique urgently needs to be designed.

As to time series modeling, Gaussian processes (GPs) are one of the most popular and advanced choices in the current state of the art for regression, for they are naturally able to handle complex relationships contained in time series. In addition, the uncertainty about the variance of the series at each point could be maintained, which is important for STLF since the nonstationarity and variability of load series can lead to high uncertainty. Kernel-based GPR [9] has been developed to fully make use of the relationships between multiple profiles. Meanwhile, the performance of three combinations of kernels has been compared. It is demonstrated that kernels with a multiplicative structure yield superior predictive performance to the widely adopted additive models. However, these kernel-based GPR models only provided point predictions instead of probabilistic predictions of the future demand.

Compared with GP regression, quantile regression (QR) is a type of regression method which aims at estimating quantiles of the response variable given certain values of the predictor variables [26,27]. Relative to the classical least squares estimation, quantile regression estimates are more robust against outliers in the response measurements. What is more, different measures of central tendency and statistical dispersion can be useful to obtain a more comprehensive analysis of the relationship between variables. Therefore, quantile regression has been applied extensively in the field of energy, where probabilistic analysis and reliability assessment of power load are very significant [28,29]. The traditional regression models assume that any uncertainty in the learned model results from incomplete knowledge of the underlying deterministic function. Quantile regression is the most relevant when the response is likely to be subject to variability or intrinsic randomness [30].

Due to the stochastic characteristics of power demand and various external impacts such as calendar effects, seasonal factors and weather conditions, the load signals exhibit volatile temporal characteristics. Thus, a more meaningful prediction scheme should provide the most probable distribution of power load rather than one crisp value. As is known, there is no certainty in forecasting. In this paper, we solve the problem of short-term probabilistic load forecasting by means of Gaussian process quantile regression (GPQR), which incorporates Gaussian processes into quantile regression to construct a more powerful nonlinear quantile regression model. GPQR can be thought of as a Bayesian alternative to the kernel methods [31]. To the best of our knowledge, this is the first application of this method to handling the uncertainties in load forecasting, which will enrich the literature on probabilistic forecasting for electricity load data. A key advantage of this approach is that it is non-parametric, which means our method can model arbitrarily complex systems given enough data.

The contributions can be summarized as follows:

• This paper proposes a comprehensive probabilistic load forecasting method based on Gaussian process quantile regression which can generate the complete probability distribution of the future demand. We compare three different kernel functions and choose the optimal kernel function for the proposed model.
• Three Prediction Interval (PI) assessment criteria are employed to evaluate the performance of the proposed probabilistic load forecasting method, i.e., PI coverage probability (PICP), PI normalized average width (PINAW) and coverage width-based criterion (CWC).
• Based on the open datasets obtained from the PJM electric power company, we demonstrate the advantages of the proposed approach compared with the commonly used methods under various time scales.
• The results obtained from two case studies show that the quality of PIs has been significantly improved compared with BPQR and SVQR models.

The remainder of this paper is organized as follows: An overview of the existing literature on short-term load forecasting is provided in the first section, and then the mathematical background of the GPQR model is described in Section 2. In Section 3, we introduce several point forecasting metrics and PI evaluation indices for the model assessment. Section 4 validates the effectiveness and superiority of the proposed algorithm by comprehensive case studies adopting real-world PJM electricity datasets. Finally, the conclusions and guidelines for future work are outlined in Section 5.


2. Methodology

This section gives a detailed description of the approach to handling the uncertainty in probabilistic load forecasting. The traditional linear quantile regression model is not suitable for solving the complex nonlinear optimization problem owing to its limited ability. Hence, the flexible Gaussian process quantile regression (GPQR) model is utilized to estimate the quantiles in this paper. It should be mentioned that this type of model has not been applied in this area so far. Generally, three steps are required to perform probabilistic load prediction. Firstly, the influential factors which are relevant to load variability should be identified from historical load and meteorological datasets. Secondly, the GPQR algorithm is utilized to predict the load on different quantiles for the next time step. Finally, the probability density function can be obtained by kernel density estimation. The flow chart of the proposed GPQR-based probability density forecasting method is shown in Fig. 1. We now provide some background on Gaussian processes.

Fig. 1. The flowchart of GPQR probability density forecasting model. (Flowchart boxes, top to bottom: Start; split dataset into training set and test set, then normalization; correlation analysis between temperature and power load; kernel function selection; Gaussian process quantile regression model; conditional quantile; kernel function and bandwidth selection; kernel density estimation; probability density forecasting; End.)

2.1. Gaussian process

A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution [32]. In our case, the random variables represent the value of the function. Thus it defines a probability distribution over functions, and can be described as:

$$f(x) \sim \mathcal{GP}\big(m(x), \kappa(x,x')\big) \tag{1}$$

where $m(x)$ and $\kappa(x,x')$ are the mean and covariance functions respectively, denoted by

$$m(x) = \mathbb{E}[f(x)] \tag{2}$$

$$\kappa(x,x') = \mathbb{E}\big[(f(x)-m(x))(f(x')-m(x'))^{T}\big] \tag{3}$$

The covariance function is crucial for a Gaussian process, as it defines the similarity between data points. The most widely used covariance function is the squared exponential (SE), which has the form

$$\kappa_{SE}(x,x') = \theta_f^{2} \exp\left(-\frac{\lVert x-x'\rVert^{2}}{\theta_l^{2}}\right) \tag{4}$$

The covariance function hyperparameters $\theta_f$ and $\theta_l$ control the signal amplitude and the length scale, respectively. This covariance function is infinitely differentiable, which means that the GP with this covariance function is very smooth. Such strong smoothness assumptions are too strict for realistic physical phenomena [33], and thus a common alternative is the Matérn covariance class:

$$\kappa_{Ma}(x-x') = \sigma^{2}\,\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\sqrt{2\nu}\,\frac{x-x'}{l}\right)^{\nu} K_{\nu}\left(\sqrt{2\nu}\,\frac{x-x'}{l}\right) \tag{5}$$

where $\nu$ and $l$ are both positive hyperparameters and $K_{\nu}$ is the modified Bessel function. This covariance function becomes especially simple when $\nu$ is a half-integer: $\nu = p + 1/2$, where $p$ is a non-negative integer. The most common cases are $\nu = 5/2$ and $\nu = 3/2$, which are named Ma5 and Ma3 in this work. The final covariance function is the periodic covariance,

$$\kappa_{pe}(x,x') = \theta_f^{2} \exp\left(-\frac{2}{\theta_l^{2}}\sin^{2}\left(\pi\,\frac{x-x'}{p}\right)\right) \tag{6}$$

which is suitable for modeling periodic phenomena. The hyperparameter $p$ controls the periodicity. Although a large body of literature covers the design of complex kernels [34], we limit our attention to the aforementioned relatively straightforward kernels, since these kernels are found to be capable of tackling the STLF problem in our study.

2.2. Gaussian process quantile regression model

Typically, load forecasts depend on a variety of features including calendar variables, weather conditions, electricity prices, historical load data, etc. A formulation that takes these features into account can be constructed as

$$\hat{y} = f(t, d, v_l, v_t, p) \tag{7}$$

where the input variables are: $t \in [0,24]$ is the hour of day, $d \in \{1,2,\ldots,365,366\}$ is the day of the year, $v_l$ is a vector of the historical power load values, $v_t$ is a vector of weather variables like temperature, and $p$ is the real-time price. The Gaussian process quantile regression (GPQR) algorithm is utilized to capture the relationship between input and output. GPQR is developed based on the quantile regression algorithm, but builds the model within a Gaussian process probabilistic framework.

Quantile regression (QR) is a regression analysis which estimates the quantiles of the conditional distribution of a response variable as a function of input variables. The special L1 norm regression case of QR is least absolute deviations regression, which involves computing the median of the conditional distribution. In contrast with L1 norm regression and L2 norm regression (least-squares regression), quantile regression involves minimizing a sum of asymmetrically weighted absolute residuals, also known as the tilted or pinball loss function [27]. Thus, QR can model a more comprehensive relationship between the response variables and input variables. In addition, making no assumptions about the nature of the error process makes the application of QR more extensive and flexible, in areas such as econometrics [35], electricity markets [36] and big data applications [37].

The tilted loss function is defined as

$$L_{\tau}(\xi_i) = \begin{cases} \tau\,\xi_i & \text{if } \xi_i \geq 0 \\ (\tau-1)\,\xi_i & \text{if } \xi_i < 0 \end{cases} \tag{8}$$

where $\tau \in [0,1]$ is the required quantile, $\xi_i = y_i - \hat{y}_i$ and $\hat{y}_i$ is the predicted (quantile) model. Many linear programming approaches are utilized to obtain the desired quantiles by minimizing the loss function directly. However, this minimization is exactly equivalent to the maximization of a likelihood function formed by combining independently distributed Asymmetric Laplace Distributions (ALD) [38]. This has provided QR with a Bayesian treatment, which has been a rich area of research with numerous estimation methods developed in the last decade [39]. In this paper, we apply a Bayesian non-parametric approach using Gaussian


process to estimate the desired quantile functions as in [30]. GPs are essentially equivalent to an infinite-dimensional multivariate Gaussian distribution [32], which makes them one of the most flexible and powerful models for estimating the quantiles. The model can be trained by maximizing the utility function based on the ALD. We can acquire the ALD by taking the exponent of the negative of the tilted loss and normalizing. The density function is:

$$L(t \mid \mu,\sigma,\tau) = \frac{\tau(1-\tau)}{\sigma}\exp\left[-\frac{t-\mu}{\sigma}\big(\tau - I(t \leq \mu)\big)\right] \tag{9}$$

The parameter $\tau \in [0,1]$ controls the skewness of the distribution. The mean $\mu$ can take any real value and the standard deviation $\sigma > 0$. The indicator function $I(t \leq \mu)$ is 1 if the condition is true, 0 otherwise. Therefore the utility function is defined as

$$U_{\tau}(y,q) = Z \exp\left[-\sum_{i=1}^{N} L_{\tau}(y_i,q_i)\right] \tag{10}$$

where $q$ is the predicted value of the $\tau$ quantile, $y$ the observations, and $Z$ the normalization constant. Then a GP prior is placed on the quantile regression function:

$$p(q) = \mathcal{GP}(q \mid 0, K) \tag{11}$$

and the model can be trained by maximizing the integral

$$\operatorname{argmax} \int_{q} U_{\tau}(y,q)\, p(q)\, dq \tag{12}$$

This integral is analytically intractable; however, it can be locally approximated using the Expectation Propagation algorithm outlined in [40]. The learning of the GP hyperparameters proceeds in a similar fashion to ordinary GP regression [32]. The numerous contributing sources of uncertainty, which lead to a highly complicated stochastic process, motivate us to adopt the flexible GPQR method to tackle this issue.

2.3. Probability density prediction based on kernel density estimation

Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function from a series of samples. It is also a fundamental data smoothing problem where inferences about the population are made based on a finite data sample. Let $X_1, X_2, \ldots, X_n$ be an independent and identically distributed sample drawn from some distribution with an unknown density $f$. We are interested in estimating the shape of this function $f$, whose kernel density estimator is

$$\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K(x, x_i) \tag{13}$$

where $K(\cdot)$ is the kernel. A kernel is a non-negative symmetric function that integrates to one and has mean zero. These kernel functions usually represent some kind of similarity between two points in a space. There are many types of kernels to choose from, although empirically the choice is not very important. The Gaussian kernel is commonly used and is represented as follows:

$$K(x_1,x_2,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_1-x_2)^{2}}{2\sigma^{2}}} \tag{14}$$

Here $\sigma > 0$ is a smoothing parameter called the bandwidth, which heavily influences the shape of the distribution. A narrow bandwidth generates a very wiggly, undersmoothed distribution, while a large bandwidth loses a lot of detail. There is a natural bias-variance tradeoff in the selection of the bandwidth. There are many bandwidth selection methods such as cross validation, plug-in methods and rules of thumb; see [41,42] for reviews. The advantage of Gaussian kernel KDE is that it can calculate the bandwidth automatically by a rule of thumb [43]. To generate a full probability density, we repeat the density estimation for a range of different values of $x$.

3. Evaluation metrics

In the literature, various methods have been employed to evaluate the performance of prediction models. Among these methods, the most popular ones are error-based measures such as Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE), to name a few. RMSE and MAPE are defined as follows:

$$\mathrm{RMSE}(y_i,\hat{y}_i) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\hat{y}_i)^{2}} \tag{15}$$

$$\mathrm{MAPE}(y_i,\hat{y}_i) = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{16}$$

where $i$ denotes the hour and $N$ indicates the total number of hours over the forecasting period. $y_i$ and $\hat{y}_i$ represent the $i$-th actual and predicted value, respectively.

Likewise, the quality of Prediction Intervals (PIs) needs to be quantitatively evaluated as well. PICP (PI coverage probability) and PINAW (PI normalized average width) are usually considered as the criteria for assessing the accuracy of the PIs. PICP is the key feature of PIs, which shows the probability that target values will be covered by the upper and lower bounds. A larger PICP means that more targets lie within the constructed PIs and vice versa. PICP is defined as follows [44]:

$$\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N}\varepsilon_i \tag{17}$$

where $\varepsilon_i$ is a Boolean variable which shows the coverage behavior of PIs. If the target value $y_i$ is covered by the lower bound $L_i$ and upper bound $U_i$, then $\varepsilon_i = 1$; otherwise $\varepsilon_i = 0$. Mathematically, $\varepsilon_i$ is defined as follows:

$$\varepsilon_i = \begin{cases} 1, & \text{if } y_i \in [L_i,U_i]; \\ 0, & \text{if } y_i \notin [L_i,U_i]. \end{cases} \tag{18}$$

Essentially, to construct valid PIs, PICP should be sufficiently close to or greater than the nominal confidence level. The ideal value for PICP is 100%, namely, all the target values lie within the constructed PIs.

High reliability of the PIs can be easily obtained by simply increasing their width. However, this is meaningless for practical applications. Moreover, in order to further evaluate the quality of PIs, the widths of PIs should be measured. The widths of PIs determine their informativeness, thus narrow PIs are more informative than wide PIs. In the literature [45], the PI normalized average width (PINAW) is defined as follows:

$$\mathrm{PINAW} = \frac{1}{NR}\sum_{i=1}^{N}(U_i-L_i) \tag{19}$$

where $R$ is the target range, calculated as the maximum minus the minimum of the target values. PINAW is the mean of the PI widths normalized by the target range. The normalization allows PIs constructed for different targets to be compared objectively. This criterion expresses the ability of the model to concentrate the uncertainty information of the probabilistic forecasts. Practically, it is desirable to have PIs with a small PINAW and a high PICP.

Both PICP and PINAW only assess one aspect of PIs individually. Focusing on only one side of PIs would lead to misleading results. In order to reconcile both aspects and evaluate the overall quality of PIs comprehensively, the coverage width-based criterion (CWC) [46] is widely used in practice.

$$\mathrm{CWC} = \mathrm{PINAW}\left(1 + \gamma\, e^{-\eta(\mathrm{PICP}-\mu)}\right) \tag{20}$$

where $\gamma$ and $\eta$ are parameters determining the contribution of PICP and PINAW. If the coverage probability is above the confidence threshold $\mu$,


i.e. $\mathrm{PICP} \geq \mu$, then $\gamma = 0$ and CWC depends only on the PI width. If the coverage probability is below the confidence threshold, i.e. the PIs are not reliable, then $\gamma = 1$ and CWC will be high regardless of the PI width measured by PINAW. This is achieved by setting a high value for $\eta$ in the exponential term. In this way, the CWC balances the usefulness of the constructed PIs (narrow width) with their correctness (acceptable coverage probability).
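
The PI criteria of Eqs. (17)–(20) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab implementation: the function names, the toy data and the default threshold `mu = 0.9` are assumptions made for the example, while `eta = 10` follows the value used in the experiments.

```python
import math

def picp(y, lower, upper):
    """PI coverage probability, Eq. (17): share of targets inside [L_i, U_i]."""
    covered = sum(1 for yi, lo, up in zip(y, lower, upper) if lo <= yi <= up)
    return covered / len(y)

def pinaw(y, lower, upper):
    """PI normalized average width, Eq. (19); R is max(y) - min(y)."""
    target_range = max(y) - min(y)
    mean_width = sum(up - lo for lo, up in zip(lower, upper)) / len(y)
    return mean_width / target_range

def cwc(y, lower, upper, mu=0.9, eta=10.0):
    """Coverage width-based criterion, Eq. (20).

    gamma is 0 when PICP >= mu and 1 otherwise, so the exponential
    penalty only applies to intervals that cover too few targets.
    The default mu is a hypothetical value chosen for this sketch.
    """
    coverage = picp(y, lower, upper)
    gamma = 0.0 if coverage >= mu else 1.0
    penalty = gamma * math.exp(-eta * (coverage - mu))
    return pinaw(y, lower, upper) * (1.0 + penalty)

# Toy data, made up for illustration (not taken from the PJM dataset).
y = [10.0, 12.0, 11.0, 14.0]
lower = [9.0, 11.5, 11.5, 13.0]
upper = [11.0, 13.0, 12.5, 15.0]

print(picp(y, lower, upper))          # three of four targets are covered
print(pinaw(y, lower, upper))
print(cwc(y, lower, upper))           # penalized, since PICP < mu
print(cwc(y, lower, upper, mu=0.7))   # no penalty: equals PINAW
```

In this toy case the third target falls outside its interval, so with the assumed threshold of 0.9 the penalty term is active and CWC is several times larger than PINAW; lowering the threshold to 0.7 removes the penalty and CWC reduces to PINAW.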

4. Numerical experiments

In this section, the performance of the proposed GPQR forecasting model is assessed using actual electricity and meteorological data. We take the dataset from the website of PJM Interconnection [47], which is a regional transmission organization (RTO) in the United States. In this dataset, the hourly load measurements and temperature data from January 1, 2011 to December 31, 2011 are available. In order to manage reliable operations and efficient wholesale markets, electricity decision making is conducted on a one-hour interval. In this sense, hour-ahead load forecasting helps to dispatch electric generating plants on a lowest-cost basis, thereby reducing the electric costs for electricity producers, so we focus on hour-ahead forecasting in the following experiments. This study concentrates on probabilistic load forecasting, so some popular existing forecasting approaches are also tested for comparison. The whole dataset is reviewed and some missing points are filled with neighboring values to guarantee satisfactory data quality. All these forecasting approaches are implemented in Matlab and tested on a personal computer with an Intel i7 2.6 GHz CPU and 8 GB of DDR4 RAM.

4.1. Identifying the relationship between temperature and load

Among all the meteorological elements such as temperature, humidity, wind and cloud cover, temperature contributes the most to the majority of load forecasting problems. Therefore, we mainly take the temperature into consideration. Fig. 2 shows the hourly power load and temperature within a month in winter and in summer. It can be seen that the load in winter is almost inversely proportional to the temperature: the lower the temperature is, the higher the loads are, which is primarily due to heating. In the summer, changes in hourly demand closely follow changes in temperature, which is primarily due to space cooling needs. The scatter plot in Fig. 3 presents a typical load and temperature relationship. On the left side, lower temperature results in higher load. On the right side, higher temperature also results in higher load. The load dependency on temperature can be well approximated by a cubic polynomial. For load forecasting, cubic polynomial fitting models are preferred over quadratic polynomial fitting models, because the data shape is usually asymmetric, while quadratic polynomial models can only produce symmetric shapes.

Fig. 2. Hourly power load and temperature (blue: hourly power load, red: temperature). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Hourly load against temperature, with cubic polynomial fits in red. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The temperature serves as an important feature in our forecasting model. In order to model the effects of recent temperatures on the load, we include the cube of the current temperature and the average of lagged temperatures as exogenous variables. Since the temperature at the future time interval t is unknown, one can instead use the forecast temperature at time interval t or the approximate temperature at time interval t−1. Note that we do not take into account the uncertainty in temperature forecasts. Because we focus on hour-ahead load forecasts, the uncertainty in temperature is expected to be small. Finally, we incorporate the lagged load for each of the preceding 12 h as predictor variables. This allows us to capture the serial correlations within the load time series.

4.2. Quantitative analysis

A series of simulations is conducted in this subsection to demonstrate the feasibility and effectiveness of the proposed approach. To mitigate the impact of seasonal uncertainty on prediction accuracy, these methods are seasonally trained and evaluated because of the erratic nature of the weather system. Furthermore, some other popular forecasting techniques are tested for comparison at the same time, including SVM, GP and BP. The SVM and BP are extended to probabilistic STLF via QR, which is frequently used for probabilistic STLF. Moreover, to further demonstrate the superiority of the proposed probabilistic


Table 1
Forecasting errors on two datasets.

Datasets  Algorithm       One hour ahead          Two hour ahead
                          MAPE (%)   RMSE (%)     MAPE (%)   RMSE (%)
2011.01   SE GPQR         1.73       3.79         2.23       5.45
          Ma3 GPQR        2.07       4.55         2.51       5.59
          Periodic GPQR   2.21       4.74         3.26       6.90
2011.07   SE GPQR         1.75       4.44         2.29       6.16
          Ma3 GPQR        1.96       5.22         2.84       6.18
          Periodic GPQR   1.76       4.47         3.18       8.14

Table 3
Forecasting errors of SVR, GPR, BPNN and GPQR on two datasets.

Datasets  Algorithm       One hour ahead          Two hour ahead
                          MAPE (%)   RMSE (%)     MAPE (%)   RMSE (%)
2011.01   SVR             2.93       5.90         2.83       5.98
          GPR             2.16       4.45         2.87       7.26
          BPNN            2.86       6.32         3.76       8.65
          GPQR(SE)        1.73       3.79         2.23       5.45
2011.07   SVR             2.11       5.08         2.46       6.30
          GPR             1.84       4.77         2.87       7.26
          BPNN            2.25       5.88         3.19       8.27
          GPQR(SE)        1.75       4.44         2.29       6.16

STLF approach, these simulations have been conducted with different time resolutions. We first choose two typical time periods in the winter (Jan 5–31) and summer (Jul 5–31) of 2011 as two subsets. Each subset has 24 load points per day. The two subsets are then further divided into a training set (Jan 5–24, 2011 and Jul 5–24, 2011) and a test set (Jan 25–31, 2011 and Jul 25–31, 2011). Before constructing the forecasting models, we normalize all samples to improve the numerical stability of the training process: all inputs and target samples are linearly rescaled to the range [0, 1] using their minimum and maximum values. In the CWC evaluation criterion, we use η = 10 and a nominal confidence level μ = 0.9. Note that a high value of η heavily penalizes PIs whose coverage probability is below the nominal confidence level.

To identify a suitable kernel function, we first assess three common kernels in our comparative studies, namely, Squared Exponential (SE), Matérn 3/2 (Ma3) and Periodic (Pe). Both cases use 20 quantiles spaced 0.05 apart, ranging from 0.01 to 0.96. The mode of the probability density function is chosen as the point prediction estimator. Table 1 presents the prediction errors by season and prediction horizon. The prediction horizons range from 1-h ahead to 2-h ahead. Two forecasting error measures, MAPE and RMSE, are used to evaluate the forecasting errors of the proposed model. It is evident that the SE kernel GPQR achieves the lowest MAPE and RMSE in both prediction horizons. Quantitatively, for the case of winter in January 2011, the MAPE is 1.73% and the RMSE is 3.79% at 1-h-ahead prediction, and the MAPE and RMSE are 2.23% and 5.45%, respectively, at 2-h-ahead prediction. For the case of summer in July 2011, the MAPE is 1.75% and the RMSE is 4.44% at 1-h-ahead prediction, and the MAPE and RMSE are 2.29% and 6.16%, respectively, at 2-h-ahead prediction. We can conclude that the SE kernel function can capture the load pattern effectively and thus achieve a good performance. Moreover, it can easily be seen that the performance deteriorates as the prediction step increases. This is because load data over a longer horizon exhibit a more erratic nature and are thus more difficult to predict.

To demonstrate the superiority of the proposed probabilistic load forecasting method, PICP, PINAW and CWC are chosen as the evaluation criteria, and the performance comparisons are presented in Table 2. We can find that the constructed PIs cover the target values with a high probability, which is highly required for the reliable operation of the power system. For the case of winter in January 2011, the PICP of all three GPQR kernels is 99.40%, while the PINAW of 1-h ahead prediction is smaller than that of 2-h ahead prediction. For the case of summer in July 2011, high PICP values are achieved for all three kernel functions as well. This also means that the constructed PIs cover the target values with a high probability. In addition, the PIs of the Ma3 GPQR model are wider (larger PINAW) than those of the other two models. Considering the high volatility of the short-term power load, the probabilistic performance obtained from the proposed approach is satisfactory.

Because the proposed comprehensive GPQR model also provides point forecasts, we compare its performance with three baselines: SVR, GPR and BP neural network (BPNN) methods. The comparison results are summarized in Table 3. For the SVR model, the Radial Basis Function (RBF) kernel is used with three adjustable parameters: the penalty parameter C, the width parameter γ and the insensitive loss parameter ∊. These parameters are calibrated using 5-fold cross validation; we choose C = 10, γ = 0.01 and ∊ = 0.1 in the experiment. For the GPR model, the hyperparameters are inferred by using a maximum a posteriori (MAP) estimate. For the BPNN model, the network structure in both cases is 14-7-1. One can see that the GPQR method is superior to the three benchmarks in both prediction horizons of the two cases. It should be noted that the accuracy degrades as the prediction horizon increases. In summary, we can say that the point forecasting performance of the proposed GPQR is very competitive compared to the existing state-of-the-art techniques.

To examine the prediction performance in a more intuitive way, we give a visual illustration of the probabilistic forecasting results for the two cases. Figs. 4 and 6 draw the point forecasts (mode) and prediction intervals given by the GPQR(SE) method in two prediction horizons, which show that the target values almost always fall into the prediction intervals and that the actual load trajectories are closely followed by the mode of the prediction results. We can conclude that the proposed method can capture the electricity load fluctuations accurately. Meanwhile, Figs. 5 and 7 provide the diagrams of the one-hour ahead probability density curves for 6 h based on the GPQR(SE) method on Jan 25, 2011 and Jul 25, 2011, respectively, with the actual demands shown as red vertical lines. In Figs. 5 and 7, the actual demand values at the six hour periods all fall

Table 2
PI evaluation indices on two datasets.

Datasets  Algorithm       One hour ahead                      Two hour ahead
                          PICP (%)   PINAW (%)   CWC (%)      PICP (%)   PINAW (%)   CWC (%)
2011.01   SE GPQR         99.40      23.82       23.82        99.40      26.16       26.16
          Ma3 GPQR        99.40      25.28       25.28        99.40      28.24       28.24
          Periodic GPQR   99.40      22.88       22.88        99.40      28.95       28.95
2011.07   SE GPQR         98.21      17.70       17.70        97.62      19.37       19.37
          Ma3 GPQR        95.24      24.06       24.06        91.07      28.24       28.24
          Periodic GPQR   98.21      14.06       14.06        98.21      16.95       16.95
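
The two interval indices reported in Table 2 can be sketched in a few lines. The definitions below are the ones commonly used in the prediction-interval literature (coverage as the fraction of targets inside their interval, width as the mean interval width divided by the target range); the paper's own definitions lie outside this section, so they are assumed here. The function names and the sample numbers are hypothetical and serve only as an illustration.

```python
def picp(targets, lower, upper):
    """Prediction interval coverage probability: fraction of targets inside [lower, upper]."""
    covered = sum(1 for t, lo, hi in zip(targets, lower, upper) if lo <= t <= hi)
    return covered / len(targets)


def pinaw(targets, lower, upper):
    """Prediction interval normalized average width: mean width divided by the target range."""
    target_range = max(targets) - min(targets)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(targets)
    return mean_width / target_range


# Hypothetical hourly loads and interval bounds (illustrative values only).
targets = [100.0, 110.0, 120.0, 130.0, 140.0]
lower = [95.0, 104.0, 121.0, 124.0, 133.0]
upper = [105.0, 114.0, 131.0, 134.0, 143.0]

coverage = picp(targets, lower, upper)   # 4 of 5 targets are covered
width = pinaw(targets, lower, upper)     # mean width 10 over a range of 40
```

Multiplying both values by 100 gives percentages in the style of Table 2. A narrow interval is only useful if the coverage stays at or above the nominal level, which is why the two indices are read together.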


Fig. 4. Prediction results based on GPQR(SE) on Jan 25–31, 2011.

within the region predicted from the forecast distribution. Moreover, almost all the actual values are located in the middle of the probability density curves, which means these values appear in the forecasting distributions with high probability. These figures provide a complete probability description of the future load, which shows the advantages of the probability density forecasting method in quantifying the uncertainty and improving the prediction accuracy.

The major goal of the proposed GPQR method is to provide probabilistic load forecasting. In view of this, its performance is validated from this perspective, and some other popular probabilistic forecasting


Fig. 5. One-hour ahead load density curve based on GPQR(SE) on Jan 25, 2011 at 6 h, the red lines denote the actual values. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of this article.)
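
Density curves such as those in Fig. 5 are built by passing the quantile forecasts for one hour through a kernel density estimator and taking the mode as the point forecast. The following is a minimal sketch of that step, not the authors' implementation: the Gaussian kernel, the rule-of-thumb bandwidth (1.06 σ n^(-1/5)) and the 20 quantile values are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a density function built from Gaussian kernels centred on the samples."""
    n = len(samples)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption; the paper's choice is not given here).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return pdf, bandwidth


def density_mode(samples, grid_size=401):
    """Evaluate the density on a regular grid and return (mode, grid, density values)."""
    pdf, h = gaussian_kde(samples)
    lo, hi = min(samples) - 3.0 * h, max(samples) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = [pdf(x) for x in grid]
    best = max(range(grid_size), key=dens.__getitem__)
    return grid[best], grid, dens


# 20 quantile levels from 0.01 to 0.96 in steps of 0.05, as described in Section 4.2.
quantile_levels = [0.01 + 0.05 * k for k in range(20)]

# Hypothetical load forecasts at those levels for one hour (symmetric around 100).
offsets = [0.2, 0.6, 1.0, 1.5, 2.0, 2.6, 3.3, 4.1, 5.2, 7.0]
quantile_forecasts = sorted([100.0 - o for o in offsets] + [100.0 + o for o in offsets])

mode, grid, dens = density_mode(quantile_forecasts)
```

Here `mode` plays the role of the point forecast and `(grid, dens)` is the curve that would be plotted. A grid search is used for the mode because it is simple and robust; a finer grid or a local optimizer would sharpen the estimate.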

techniques such as SVQR and BPQR are also tested for comparison. We have shown the advantages of using a proper kernel function for capturing complex seasonal patterns in the previous two case studies, hence the SE kernel is chosen for analysis here. The comparison results are reported in Table 4. It is evident that the proposed approach exhibits the most ideal PI coverage, while the other two methods fail to satisfy the preassigned PICP (90%). This means that the PIs constructed by the GPQR method cover the target values with a high probability. Moreover, the CWC obtained from the GPQR method is always the best regardless of season and prediction horizon. In addition, we can find that the value of CWC in both cases rises as the prediction horizon increases. This is because a longer prediction step adds more uncertainty to the load data. Meanwhile, the comparatively better performance demonstrates that the proposed approach has the most robust prediction capability among the benchmarks. In summary, the SVQR model is sensitive to the setting of its parameters, which results in poor PI performance if the parameters are not properly set. The BPQR model is easily trapped in a local optimum, and its prediction result has great randomness since the initial weight matrix is random. Therefore, the results demonstrate that the proposed approach not only significantly improves the prediction performance, but also exhibits high stability and strong robustness. Thus the superiority and potential of the proposed approach are further illustrated.

4.3. Practical application of the proposed approach

Because of the variability and nonstationarity associated with the electricity demand, an approach to handle these effects is developed using quantile regression to learn upper and lower bounds on the predictions. Considering the forecasting reliability and sharpness, the proposed approach exhibits much better comprehensive performance than the other benchmarks. The PIs learned in this way have been demonstrated to provide satisfactory coverage on real data, and the visualisations illustrate how they help to quantify the uncertainties in electricity load series. The proposed method not only outperforms the traditional prediction approaches in terms of single-valued forecasts, but can also provide a valuable probability distribution. Forecasting distributions provide an indication of the forecast accuracy and give useful information for system scheduling. Besides, they enable the development of a novel unit commitment model that can optimize the operation of energy systems by analyzing different commitment strategies in a stochastic as opposed to a deterministic context. For instance, the upper bounds of prediction intervals can be used for developing conservative electricity generation plans and schedules, while the lower bounds of prediction intervals reflect an optimistic attitude in scheduling with more attention to over-supply avoidance. As another example, probabilistic load forecasting results can be employed to facilitate hierarchical electricity grid management [2]. Firstly, the hour-ahead point forecasting results can be used to construct a real-time proxy for assessing the day-ahead unit commitment and energy dispatch. Then the commitment statuses can be adjusted in real time based on the probabilistic information of demand. Thus, the system operators could ensure high reliability in stochastic complex power systems with interleaved decision making in different time horizons.

We acknowledge that there are many other effective compound kernel designs, which could have been implemented for capturing complex relationships. However, as our focus is on the probabilistic load forecasting method, we consider that it is proper to use a relatively straightforward and uncontroversial kernel design. The simulation results have shown the good performance of the commonly used squared


Fig. 6. Prediction results based on GPQR(SE) on Jul 25–31, 2011.

exponential kernel. These GPQR models are implemented in Matlab using the GPStuff toolbox [48], which is freely available online, facilitating the deployment of the GPQR probability density forecasting method. To this end, the proposed model can not only improve the quality of the predictions made, but also quantify the associated uncertainty. More accurate load forecasting results mean less uncertainty in the quantiles and thus lead to a decrease in the operation cost of electric power and energy systems. Therefore, from these presented examples, it can be seen


Fig. 7. One-hour ahead load density curve based on GPQR(SE) on Jul 25, 2011 at 6 h, the red lines denote the actual values. (For interpretation of the references to color in this figure
legend, the reader is referred to the web version of this article.)

Table 4
PI evaluation indices of SVQR, BPQR and GPQR on two datasets.

Datasets  Algorithm   One hour ahead                      Two hour ahead
                      PICP (%)   PINAW (%)   CWC (%)      PICP (%)   PINAW (%)   CWC (%)
2011.01   SVQR        85.71      12.43       31.52        75.60      14.90       77.79
          BPQR        83.21      13.48       40.06        73.81      14.36       86.85
          GPQR        99.40      23.82       23.82        99.40      26.16       26.16
2011.07   SVQR        89.29      16.23       33.65        83.93      18.21       51.62
          BPQR        85.48      14.62       37.59        79.17      12.67       50.09
          GPQR        98.21      17.70       17.70        97.62      19.37       19.37
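
The CWC column in Table 4 combines coverage and width into one score. The sketch below assumes the form common in the prediction-interval literature, CWC = PINAW · (1 + γ · exp(−η (PICP − μ))), where γ = 1 if PICP < μ and γ = 0 otherwise, with η = 10 and μ = 0.9 as stated in Section 4.2. The paper's own definition lies outside this section, so this form is an assumption, although it does reproduce the tabulated values. The function name is hypothetical.

```python
import math


def cwc(picp, pinaw, eta=10.0, mu=0.9):
    """Coverage width-based criterion (assumed form).

    picp  : coverage as a fraction in [0, 1]
    pinaw : normalized average width (here in percent, as in Table 4)
    eta   : penalty strength for under-coverage
    mu    : nominal confidence level
    """
    # The exponential penalty is applied only when coverage is below the nominal level.
    penalty = math.exp(-eta * (picp - mu)) if picp < mu else 0.0
    return pinaw * (1.0 + penalty)


# Rows taken from Table 4 (January 2011, one hour ahead).
svqr_cwc = cwc(0.8571, 12.43)   # coverage below 90%, so the width is penalized
gpqr_cwc = cwc(0.9940, 23.82)   # coverage above 90%, so CWC equals PINAW
```

This makes the ranking in Table 4 easier to read: SVQR and BPQR produce narrower intervals than GPQR, but their coverage falls below the nominal 90%, so the exponential term inflates their CWC above the GPQR value.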

that our proposed probabilistic load forecasting approach with high accuracy is very appealing for practical applications.

5. Conclusions

In this paper, we present a novel model for short-term load probability density forecasts based on Gaussian process quantile regression (GPQR). Different from previous research on load prediction, the GPQR method can not only give point forecasts for the future load but also provide complete probability descriptions for prediction intervals, which is especially desirable for quantifying the numerous uncertainties of power systems. Since kernel functions are significant for capturing complex relationships, we compare three commonly used kernel functions and choose the best one to construct the model. Then the model forecasting results under different quantiles are input into the kernel density estimation function, and thus the probability density function can be obtained. The main advantage of the GPQR model is that it can maintain the uncertainty about the variance associated with each point, which is rather crucial for probabilistic load forecasting.

Based on the comprehensive studies on the datasets obtained from PJM, the three GPQR methods have achieved satisfactory prediction performance. It is found that the widely used squared exponential covariance function is an excellent kernel function. Compared with other conventional algorithms, the GPQR algorithm not only achieves the most accurate point prediction, but can also precisely output the lower and upper bounds of the predicted load. These results will be practically beneficial to utilities in generating an adequate amount of electricity, which could avoid grid outages and energy losses, as well as in constructing dynamic pricing schemes based upon future demand.

We have endeavoured to demonstrate the superiority of this method through a number of illustrative figures. However, the main bottleneck of this approach is the heavy computational cost induced by Markov Chain Monte Carlo (MCMC) estimation, and therefore we need to compress the scale of datasets by selecting the most informative subset


from the whole available training data. Further research will focus on improving computing efficiency by incorporating an active learning strategy and selecting the most informative training sets. Furthermore, the recently emerged dropout neural networks [49] have been shown to be equivalent to approximate variational inference in Gaussian processes, which provides an interesting direction for uncertainty analysis on load forecasting.

Acknowledgement

This work is supported by the Natural Science Foundation of China (Grant Nos. 61672292 and 61300162), and the State Grid Corporation 2016 science and technology project: Service information based business integration and data sharing service technology.

References

[1] Masa-Bote D, Castillo-Cagigal M, Matallanas E, Caamaño-Martín E, Gutiérrez A, Monasterio-Huelín F, et al. Improving photovoltaics grid integration through short time forecasting and self-consumption. Appl Energy 2014;125(2):103–13.
[2] Dalal G, Gilboa E, Mannor S. Hierarchical decision making in electricity grid management. In: International conference on machine learning; 2016. p. 2197–206.
[3] Soliman SA-h, Al-Kandari AM. Electrical load forecasting: modeling and model construction. Elsevier; 2010.
[4] Chen P, Pedersen T, Bak-Jensen B, Chen Z. ARIMA-based time series model of stochastic wind power generation. IEEE Trans Power Syst 2010;25(2):667–76. http://[Link]/10.1109/TPWRS.2009.2033277.
[5] Lee CM, Ko CN. Short-term load forecasting using lifting scheme and ARIMA models. Expert Syst Appl 2011;38(5):5902–11.
[6] Bozkurt ÖÖ, Biricik G, Tayşi ZC. Artificial neural network and SARIMA based models for power load forecasting in Turkish electricity market. PLoS One 2017;12(4):e0175915.
[7] Chen B-J, Chang M-W, Lin C-J. Load forecasting using support vector machines: a study on EUNITE competition 2001. IEEE Trans Power Syst 2004;19(4):1821–30. [Link]
[8] Elattar EE, Goulermas J, Wu QH. Electric load forecasting based on locally weighted support vector regression. IEEE Trans Syst, Man, Cybernet, Part C (Appl Rev) 2010;40(4):438–47. [Link]
[9] Fiot JB, Dinuzzo F. Electricity demand forecasting by multi-task learning. IEEE Trans Smart Grid 2016;PP(99). [Link] p. 1–1.
[10] Sheng H, Xiao J, Cheng Y, Ni Q, Wang S. Short-term solar power forecasting based on weighted Gaussian process regression. IEEE Trans Ind Electron 2017;PP(99). [Link] p. 1–1.
[11] Ding N, Benoit C, Foggia G, Bésanger Y, Wurtz F. Neural network-based model design for short-term load forecast in distribution systems. IEEE Trans Power Syst 2016;31(1):72–81. [Link]
[12] Potter CW, Negnevitsky M. Very short-term wind forecasting for Tasmanian power generation. IEEE Trans Power Syst 2006;21(2):965–72. [Link] TPWRS.2006.873421.
[13] Melin P, Soto J, Castillo O, Soria J. A new approach for time series prediction using ensembles of ANFIS models. Expert Syst Appl Int J 2012;39(3):3494–506.
[14] Amjady N, Keynia F, Zareipour H. Wind power prediction by a new forecast engine composed of modified hybrid neural network and enhanced particle swarm optimization. IEEE Trans Sustain Energy 2011;2(3):265–76. [Link] 1109/TSTE.2011.2114680.
[15] Lloyd JR. GEFCom2012 hierarchical load forecasting: gradient boosting machines and Gaussian processes. Int J Forecast 2014;30(2):369–74.
[16] Quan H, Srinivasan D, Khosravi A. Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Trans Neural Networks Learn Syst 2014;25(2):303–15. [Link]
[17] Hu Z, Bao Y, Chiong R, Xiong T. Mid-term interval load forecasting using multi-output support vector regression with a memetic algorithm for feature selection. Energy 2015;84:419–31. [Link]
[18] Khosravi A, Nahavandi S, Creighton D. Construction of optimal prediction intervals for load forecasting problems. IEEE Trans Power Syst 2010;25(3):1496–503. http://[Link]/10.1109/TPWRS.2010.2042309.
[19] Khosravi A, Nahavandi S, Creighton D, Srinivasan D. Interval type-2 fuzzy logic systems for load forecasting: a comparative study. IEEE Trans Power Syst 2012;27(3):1274–82. [Link]
[20] Liu B, Nowotarski J, Hong T, Weron R. Probabilistic load forecasting via quantile regression averaging on sister forecasts. IEEE Trans Smart Grid 2017;8(2):730–7. [Link]
[21] Taieb SB, Huser R, Hyndman RJ, Genton MG. Forecasting uncertainty in electricity smart meter data by boosting additive quantile regression. IEEE Trans Smart Grid 2016;7(5):2448–55. [Link]
[22] Fan S, Hyndman RJ. Short-term load forecasting based on a semi-parametric additive model. IEEE Trans Power Syst 2012;27(1):134–41. [Link] 1109/TPWRS.2011.2162082.
[23] He Y, Xu Q, Wan J, Yang S. Short-term power load probability density forecasting based on quantile regression neural network and triangle kernel function. Energy 2016;114:498–512.
[24] Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016. < http://[Link] >.
[25] He Y, Liu R, Li H, Wang S, Lu X. Short-term power load probability density forecasting method using kernel-based support vector quantile regression and copula theory. Appl Energy 2017;185:254–66.
[26] Koenker R, Bassett G. Regression quantiles. Econometrica 1978;46(1):33–50.
[27] Koenker R. Quantile regression. Cambridge University Press; 2005.
[28] Haque AU, Nehrir MH, Mandal P. A hybrid intelligent model for deterministic and quantile regression approach for probabilistic wind power forecasting. IEEE Trans Power Syst 2014;29(4):1663–72. [Link] 2299801.
[29] Jónsson T, Pinson P, Madsen H, Nielsen HA. Predictive densities for day-ahead electricity prices using time-adaptive quantile regression. Energies 2014;7(9):5523–47.
[30] Boukouvalas A, Barillec R, Dan C. Gaussian process quantile regression using expectation propagation. Comput Sci 2012.
[31] Murphy KP. Machine learning: a probabilistic perspective. MIT Press; 2012.
[32] Rasmussen CE, Williams CK. Gaussian processes for machine learning. Cambridge: MIT Press; 2006.
[33] Stein ML. Interpolation of spatial data: some theory for kriging. Springer Science & Business Media; 2012.
[34] Duvenaud D, Lloyd JR, Grosse R, Tenenbaum JB, Ghahramani Z. Structure discovery in nonparametric regression through compositional kernel search. Available from: arXiv preprint arXiv:1302.4922.
[35] Chiang TC, Li J. Stock returns and risk: evidence from quantile. J Risk Financ Manage 2012;5(1):20–58.
[36] Maciejowska K, Nowotarski J, Weron R. Probabilistic forecasting of electricity spot prices using factor quantile regression averaging. Int J Forecast 2016;32(3):957–65.
[37] Yang J, Meng X, Mahoney MW. Quantile regression for large-scale applications. SIAM J Sci Comput 2013;36(5):2013.
[38] Yu K, Moyeed RA. Bayesian quantile regression. Stat Probab Lett 2001;54(4):437–47.
[39] Takeuchi I, Le QV, Sears TD, Smola AJ. Nonparametric quantile estimation. J Mach Learn Res 2006;7(July):1231–64.
[40] Minka TP. Expectation propagation for approximate Bayesian inference. Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc.; 2001. p. 362–9.
[41] Turlach BA, et al. Bandwidth selection in kernel density estimation: a review. Université catholique de Louvain Louvain-la-Neuve; 1993.
[42] Bashtannyk DM, Hyndman RJ. Bandwidth selection for kernel conditional density estimation. Comput Stat Data Anal 2001;36(3):279–98.
[43] Scott DW. Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons; 2015.
[44] Khosravi A, Nahavandi S, Creighton D. Construction of optimal prediction intervals for load forecasting problems. IEEE Trans Power Syst 2010;25(3):1496–503. http://[Link]/10.1109/TPWRS.2010.2042309.
[45] Khosravi A, Nahavandi S, Creighton D. Prediction interval construction and optimization for adaptive neurofuzzy inference systems. IEEE Trans Fuzzy Syst 2011;19(5):983–8. [Link]
[46] Khosravi A, Nahavandi S, Creighton D, Atiya AF. Lower upper bound estimation method for construction of neural network-based prediction intervals. IEEE Trans Neural Networks 2011;22(3):337–46. [Link] 2096824.
[47] PJM. < [Link] >.
[48] Vanhatalo J, Riihimäki J, Hartikainen J, Jylänki P, Tolvanen V, Vehtari A. GPstuff: Bayesian modeling with Gaussian processes. J Mach Learn Res 2013;14(April):1175–9.
[49] Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning; 2016. p. 1050–9.
