Electronics 2025
1 Electrical Power and Renewable Energy, College of Engineering and Technology, University of Doha for
Science and Technology, Doha 24449, Qatar
2 Qatar Environment and Energy Research Institute, Hamad Bin Khalifa University, Qatar Foundation,
Doha 34110, Qatar; asanfilippo@[Link] (A.P.S.); dbachour@[Link] (D.B.);
dastudillo@[Link] (D.P.-A.)
* Correspondence: [Link]@[Link] or qat7@[Link]
Abstract: Solar energy is an inherently variable energy resource, and the ensuing un-
certainty in matching energy demand presents a challenge in its operational use as an
alternative energy source. The factors influencing solar energy power generation include
geographic location, solar radiation, weather conditions, and solar panel performance.
Solar energy forecasting is performed using machine learning for better accuracy and
performance. Due to the variability of solar energy, the forecasting window is an important
aspect of solar energy forecasting that must be integrated into any machine learning model.
This study evaluates the suitability of selected machine learning (ML) models comprising
Linear Regression, Decision Tree, Random Forest and XGBoost, which have been proven
to be effective at forecasting. The data forecasting horizon used was a 24-h window in
steps of 30 min. We focused on the first 30-min, 3-h, 6-h, 12-h, and 24-h windows to gain
an appreciation of the impact of forecasting duration on the accuracy of prediction using
the selected machine learning algorithms. The study results show that Random Forest
outperformed all other tested algorithms. It recorded the best values in all evaluation
metrics: an average mean absolute error of 0.13, mean absolute percentage error of 0.6,
root-mean-square error of 0.28 and R-squared value of 0.89.
will respond to the already variable energy demand. This challenge can be addressed by developing a solar energy forecasting model, which will be beneficial in several ways [4]. First, the model will ensure a reliable control system. This will maintain grid stability and optimize operating costs by committing appropriate amounts of solar energy through co-generation strategies [5]. Second, it will help in the effective integration of solar energy and storage to optimize energy resource use [6]. Third, the forecasting model will address the demand response. This maximizes the use of solar energy in times of peak consumption to reduce stress on the power grid and increase energy efficiency [7]. The forecasting model will help plants implement dynamic electricity pricing. Dynamic pricing is vital for adjusting electricity sales to energy demand that varies over time. The solar energy forecasting model will enable installed plants to optimise PV plant performance [8]. This contributes to increasing the productivity and longevity of solar PV plants. By developing and implementing an effective forecasting model, PV plants will avoid the injection of excessive solar power into the grid during times of low demand and high PV productivity [9]. The forecasting model will help with the protection of On Load Tap Changers (OLTC), which regulate the voltage ratio in sub-station transformers.
1.1. Review of Related Work
According to Mellit et al. [10], four major PV forecasting methods have dominated the field in the period from 2010. These are physical methods, statistical methods, artificial intelligence methods and emergent hybrid methods, as presented in Figure 1 below.
effectiveness of the evaluated machine learning model. Further to MAE and RMSE, we
included the mean absolute percentage error (MAPE) and R-squared measures to assess
the prediction reliability of each of the evaluated machine learning models. This article
contributes to revolutionizing the design and development of solar-based energy projects
by improving forecasting methods and narrowing down the options for the best algorithms
for developing PV forecasting models.
y = n + mx (1)
where:
- y — values of the (dependent) second dataset
- x — values of the (independent) first dataset
- n — y-intercept of the line
- m — slope of the line
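The slope m and intercept n of Equation (1) can be estimated by ordinary least squares. A minimal sketch in plain Python (the function name `fit_line` is an illustrative choice, not taken from the study's code):

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = n + m*x.

    Returns (n, m): the y-intercept and slope of Equation (1).
    """
    k = len(x)
    mean_x = sum(x) / k
    mean_y = sum(y) / k
    # Slope: covariance of (x, y) divided by the variance of x.
    m = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
    # Intercept: the fitted line passes through the mean point.
    n = mean_y - m * mean_x
    return n, m

# Example: points lying exactly on y = 2 + 3x recover n = 2, m = 3.
n, m = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
```

In practice a library routine such as `numpy.polyfit` would be used, but the closed-form solution above is all that Equation (1) requires.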
is mapped to the right as an outcome. The decision tree structure can be summarized as consisting of a root node (the entire dataset), internal nodes (decisions or tests), branches (outcomes of tests) and leaf nodes (final outcomes).

Figure 2. Decision tree algorithm structure [17].
The test applied represents the most appropriate attribute of the target dataset that will lead to the optimal dichotomy of the dataset. The selection process employs approaches like Gini impurity, entropy, and information gain. Gini impurity is a metric which evaluates the probability of occurrence of an incorrect classification of a new data point that was randomly classified. It is determined based on Equation (2) below.

Gini = 1 − ∑_{i=1}^{n} (p_i)^2 (2)

The entropy metric evaluates the level of uncertainty in the data set and is calculated according to Equation (3) below.

Entropy = − ∑_{i=1}^{n} p_i log2(p_i) (3)
The information gain splitting method evaluates the reduction in entropy or Gini impurity after a dataset has been split based on an attribute. The formula for the implementation of information gain is shown in Equation (4) below.

Info. Gain = Entropy(D) − ∑_{i=1}^{n} (|D_i|/|D|) · Entropy(D_i) (4)

In some cases, the dichotomisation may lead to too little data in the given subtree, and this results in overfitting. Decision trees, therefore, tend to have a preference for dichotomies that culminate in as few branches as possible. If the dichotomy leads to features of less significance, a process known as pruning is conducted. When a decision tree adopts an ensemble approach to ensure the accuracy of the classification process, it changes to a random forest algorithm [17], described in Section 1.5.3 below.

1.5.3. Random Forest
Random forests take note of the fact that no single model can fit all aspects of the problem to be modelled. Therefore, the method encompasses a variety of modelling techniques, each applied at the appropriate stage as deemed suitable to give better results. As in some of the previously discussed methods, it starts with an input feed that then follows a decision-tree-like analysis and, at each stage, an appropriate technique is applied to give an outcome, which forms an input into the next stage. It is best described as a collaboration among decision trees giving a single output. Random forest addresses the weakness of its parent algorithm, the decision tree, which has inherent overfitting. It prevents overfitting by introducing randomness in the construction of the decision trees. Because random forest effectively deals with overfitting and handles the issue of missing data, it presents itself as one of the most effective methods for the forecasting process.
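Equations (2)–(4) can be checked numerically. A small sketch in plain Python (illustrative helper names, not the authors' implementation):

```python
from math import log2

def gini(probs):
    """Gini impurity, Equation (2): 1 - sum of p_i squared."""
    return 1 - sum(p * p for p in probs)

def entropy(probs):
    """Entropy, Equation (3): -sum of p_i * log2(p_i)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def info_gain(parent, subsets):
    """Information gain, Equation (4): parent entropy minus the
    size-weighted entropy of the subsets produced by a split."""
    total = sum(len(s) for s in subsets)

    def class_probs(labels):
        return [labels.count(c) / len(labels) for c in set(labels)]

    weighted = sum(len(s) / total * entropy(class_probs(s)) for s in subsets)
    return entropy(class_probs(parent)) - weighted

# A perfectly separating split recovers the full parent entropy (1 bit here).
parent = ["sunny", "sunny", "cloudy", "cloudy"]
gain = info_gain(parent, [["sunny", "sunny"], ["cloudy", "cloudy"]])
```

A split that leaves both subsets pure has zero weighted entropy, so the gain equals the parent entropy — the best case the splitting criterion can achieve.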
1.5.4. XGBoost
Extreme gradient boosting (XGBoost) is a gradient-boosted machine learning model.
XGBoost is one of the ensemble machine learning algorithms that is renowned for the
efficient treatment of missing values in a dataset.
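The boosting idea behind XGBoost — fitting each new learner to the residuals of the current ensemble — can be illustrated with a stripped-down sketch using single-split "stump" learners. This is purely illustrative and omits XGBoost's regularised tree construction and missing-value handling:

```python
def fit_stump(x, residuals):
    """Best single-threshold predictor for the residuals: tries each
    threshold and predicts the mean residual on each side of the split."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)

def boost(x, y, rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals
    of the ensemble so far, scaled by a learning rate."""
    pred = [sum(y) / len(y)] * len(y)  # start from the mean prediction
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, resid)
        pred = [pi + lr * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return pred

# The ensemble's squared error shrinks as boosting rounds are added.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 3.9, 4.1, 8.0, 8.2]
pred = boost(x, y)
```

Each round corrects what the previous rounds got wrong, which is why boosted ensembles can capture structure that a single weak learner cannot.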
2.1.2. R-Squared
The R2 value is another measure that is used to evaluate an algorithm in predicting the
expected outcome. The value of R2 in a model is determined by Equation (6). The numerator
is the sum of the squares of residuals. The denominator of the function represents the
total sum of the squares. A value of R2 that is closer to 1 signifies greater accuracy of the
regression model.
R^2 = 1 − ∑(y_i − ŷ_i)^2 / ∑(y_i − ȳ)^2 (6)
MAPE = (100/n) ∑_{t=1}^{n} |A_t − F_t| / A_t (8)
where:
At —Actual value
Ft —Forecast value
n—Number of fitted points
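All four evaluation metrics can be computed directly from paired actual/forecast series. A sketch in plain Python (the helper name `metrics` is an illustrative choice, not from the study):

```python
from math import sqrt

def metrics(actual, forecast):
    """Return MAE, MAPE (Equation (8), in percent), RMSE,
    and R-squared (Equation (6)) for paired series."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_a = sum(actual) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mape, rmse, r2

# A perfect forecast gives MAE = MAPE = RMSE = 0 and R-squared = 1.
mae, mape, rmse, r2 = metrics([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])
```

Note that MAPE divides by the actual value, so it is undefined when A_t = 0 — a practical concern for solar data, where night-time output is zero.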
3. Results
3.1. Linear Regression Model
Table 1 below provides the MAE, MAPE, RMSE and R-squared values for the linear regression model. The evaluation metrics were analysed for each of the five forecast windows.
To appreciate the impact of the forecast horizon (window) on the quality of predictions
for linear regression, we developed a regression plot based on the predicted value. We
chose the 3-h window and the 24-h window. Figure 3 presents the regression plot for
the 3-h window and shows that the points are less symmetrically distributed along the
regression line.
Figure 4 shows the plot for the 24-h window for the linear regression model. The 24-h forecast window shows an improved distribution of points along the regression line in the linear regression model.
Figure 4. Linear regression based on a 24-h period.
Figure 5 shows MAE values across the forecast windows for the linear regression model. It shows a higher MAE at the 3-h window, which declines to the lowest value of 0.2434 at the 12-h forecast window.
Figure 5. Linear regression graph of MAE across the forecast windows.
A similar trend in the values of MAPE can be observed for the linear regression model (Figure 6). There is a general reduction in the value of MAPE with increases in the span of the forecast window in the linear regression model.
Figure 6. Linear regression graph of MAPE across the forecast windows.
Figure 7 represents the RMSE value for the linear regression model. It depicts a sine profile for the RMSE values across the forecast windows.
Figure 7. Linear regression graph of RMSE across the forecast windows.
The R-squared trend for the linear regression model is presented in Figure 8. It shows an irregular profile for the values of R-squared across the forecast windows.
Figure 8. Linear regression graph of R-squared across the forecast windows.
3.2. Results of Decision Tree Model
Table 2 shows the evaluation results for the decision tree model based on the five forecasting windows of 0.5, 3, 6, 12 and 24 h, respectively.
Table 2. Results of decision tree model.

Metric   0.5-h    3-h      6-h      12-h     24-h     Average
MAE      0.0431   0.0445   0.0856   0.2961   0.2927   0.1524
MAPE     0.4021   0.2812   0.3376   1.6362   1.0623   0.7439
RMSE     0.1410   0.1429   0.2380   0.5878   0.6446   0.3509
R2       0.9803   0.9793   0.9450   0.6119   0.6297   0.8292
For the decision tree prediction model, the visual results for a 3-h forecast window are presented in Figure 9. The results show that the data points are symmetrically distributed along the regression line.
Figure 10 shows results for a 24-h forecast window for the decision tree regression plot. The outcome maintains a symmetric distribution of the data points along the regression line.
Figure 10. Decision tree regression based on a 24-h period.
Figure 11 represents the graph of MAE values across the forecast windows for the decision tree algorithm. The curve depicts a sigmoid shape with a peak at the 12-h forecast window.
Figure 11. Decision tree graph of MAE across the forecast windows.
The graph of MAPE values for the decision tree algorithm across the forecast windows is shown in Figure 12. The graph reflects the sigmoid shape revealed by the MAE profile above.
Figure 12. Decision tree graph of MAPE across the forecast windows.
The corresponding sigmoid curve for the RMSE values of the decision tree across the forecasting windows is shown in Figure 13.
Figure 13. Decision tree graph of RMSE across the forecast windows.
Figure 14 represents the graph of R-squared values across the forecast windows for the decision tree. It is evident that the graph is a reverse sigmoid curve with a peak at the 6-h forecast window.
Figure 14. Decision tree graph of R-squared across the forecast windows.
3.3. Random Forest Results
Table 3 displays the MAE, MAPE, RMSE and R-squared results for the random forest model prediction analysis.

Table 3. Results of random forest model.
Metric   0.5-h    3-h      6-h      12-h     24-h      Average
MAE      0.0318   0.0347   0.0673   0.2392   0.2741    0.1294
MAPE     0.3031   0.3647   0.2315   1.1180   1.0212    0.6077
RMSE     0.1053   0.1162   0.1951   0.4164   0.54996   0.27659
R2       0.9890   0.9863   0.9631   0.8052   0.7305    0.8948
The regression plot for the evaluation of the random forest model is presented in Figure 15, showing a 3-h forecast window. As with the decision tree model, the data points for the random forest model are distributed symmetrically along the regression plot.
Figure 16 shows reduced point density at the 24-h prediction window. However, the random forest regression maintains a symmetric distribution of the points.
Figure 17. Random Forest graph of MAE across the forecast windows.
Figure 18 shows the MAPE curve across the forecast windows for the random forest model. The graph is like that of the preceding machine learning model, albeit with some level of sensitivity at the 3-h window.
Figure 18. Random Forest graph of MAPE across the forecast windows.
A sigmoid curve for the RMSE values, corresponding to the MAE and MAPE values, for the random forest model is shown in Figure 19. Unlike the decision tree model, the RMSE curve has a transition phase at the 12-h window instead of the peak.
Figure 19. Random Forest graph of RMSE across the forecast windows.
Figure 20 presents the R-squared values for the random forest model. Compared to the corresponding curve for the decision tree model, the random forest has a less prominent peak at the 3-h window on the reverse sigmoid curve.
Figure 20. Random Forest graph of R-squared across the forecast windows.
3.4. XGBoost Results
Table 4 shows the evaluation results of MAE, MAPE, RMSE and R-squared for the XGBoost algorithm. The analysis was performed for each of the five data forecast windows from 30 min to 24 h.

Table 4. Results of XGBoost model.
Metric   0.5-h    3-h      6-h      12-h     24-h     Average
MAE      0.0350   0.0441   0.0738   0.2060   0.2893   0.1296
MAPE     0.3499   0.4805   0.2372   0.9386   0.9065   0.5825
RMSE     0.1051   0.1162   0.1939   0.3616   0.6272   0.2808
R2       0.9891   0.9863   0.9635   0.8531   0.6465   0.8877
Figure 21, below, presents the regression plot for a 3-h forecast window for the XGBoost model. The graph shows a strong linear relationship based on this model as depicted by the symmetric distribution of data points along the regression line.
Figure 22 provides results for the 24-h forecast window for the XGBoost regression analysis.
Figure 23 presents the MAE values for XGBoost over all forecast windows. The trend is a curve approaching an exponential curve, with growth starting at the 6-h window.
Figure 23. XGBoost graph of MAE across the forecast windows.
The curve of the MAPE values across the forecasting windows is a sigmoid curve with regularity at the 3-h forecasting window, as shown in Figure 24.
Figure 24. XGBoost graph of MAPE across the forecast windows.
The RMSE curve for XGBoost across the forecast windows, Figure 25, depicts an exponential curve, unlike the decision tree and random forest models. An exponential increase in the RMSE value starts at the 3-h forecast window.
Figure 25. XGBoost graph of RMSE across the forecast windows.
Correspondingly, the graph of R-squared values for XGBoost across the forecast windows shows a reverse exponential curve with its maximum value at the 3-h window; Figure 26.
Figure 26. XGBoost graph of R-squared across the forecast windows.
4. Discussion
The results show that the highest accuracy was recorded at the 30-min prediction window, with the highest R-squared measurement at 0.9890 and the lowest at 0.8745. The lowest prediction accuracy was recorded for the 24-h forecast window, with random forest scoring 0.7305 and decision tree scoring the lowest, 0.6297, in this category. Predictability becomes challenging and cannot remain reliable over a longer forecasting period. Predictions within half an hour can be extremely accurate but may not be of much use to the intended application in the solar energy industry and energy production and distribution.
The linear regression model's performance was unpredictable over the five forecast windows. The underperformance of linear regression is seen here in terms of high values of MAE (up to 0.49) and RMSE (up to 0.58), accompanied by low values of R-squared, as low as 0.2 in the outlier instance. Despite the poor model outcome, the algorithm depicted considerable forecast reliability in the 24-h window, with an R-squared value approaching 0.7 (0.69699), which is similar to the results presented in [22].

Decision tree was one of the best-performing algorithms over all forecast categories. It showed consistency, recording a high of 0.98 in the half-hour window and 0.6297 in the 24-h forecast window. However, the performance of the decision tree dropped suddenly in the 12-h window, to 0.6119 from 0.9450 in the 6-h window. Generally, predictions by the decision tree model returned low values of MAE (to the tune of 0.0431) and RMSE (the lowest being 0.141).
Random forest was the most outstanding machine-learning algorithm in this study.
The values were consistent and decreased gradually across the forecast window tests. This
underscores the strength of the random forest algorithm in modelling solar radiation given
the input data. The algorithm had the lowest mean absolute error to the tune of 0.0318 and
an R2 of 0.989. However, in forecasting the 24-h window period, it did not outperform the
decision tree in terms of point–cluster symmetry. This does not, however, make it inferior to the decision tree in this test.
The last machine model analysed was XGBoost, which also depicted consistency across
the set of test windows. XGBoost maintained high values for the test metrics above all the
tested algorithms, presenting it as one of the most effective forecasting models [23,24].
Random forest outperformed all the tested algorithms in this research. It recorded the best values on all the evaluation metrics: an average mean absolute error of 0.1294, mean absolute percentage error of 0.6077, root-mean-square error of 0.27659 and R-squared value of 0.8948. These summary results are presented in Table 5 below.
Random forest and XGBoost models showed higher prediction accuracies compared to the other models. This may be attributed to their ensemble nature, which enables these models to
capture complex patterns within the dataset. Both XGBoost and random forest share a basic
prediction approach. Both algorithms are founded on the fact that data can be complex, and
a single prediction model cannot satisfactorily model the data. Therefore, both algorithms
adopt a tree-like analysis approach, which makes them robust in the prediction of data.
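The variance-reduction benefit of combining many imperfect learners can be illustrated with a toy simulation, where independent random noise stands in for each individual model's error (illustrative only; real tree ensembles also differ in how members are built):

```python
import random

random.seed(0)  # deterministic demo

def noisy_predictor(truth):
    """One 'model': the true value plus independent zero-mean noise."""
    return [t + random.gauss(0, 1.0) for t in truth]

def rmse(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

truth = [5.0] * 200
single = noisy_predictor(truth)

# Ensemble: average the outputs of 50 independent noisy predictors.
members = [noisy_predictor(truth) for _ in range(50)]
ensemble = [sum(col) / len(col) for col in zip(*members)]

# Averaging independent errors shrinks RMSE by roughly sqrt(ensemble size).
```

This averaging effect is one reason ensemble methods such as random forest tolerate the overfitting of their individual trees.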
Figure 27 provides the average MAE values for the evaluated machine learning
models. Random forest and XGBoost recorded the lowest MAE values, indicating them as
potentially reliable forecasting algorithms.
Figures 28 and 29 echo the outcome of the MAE metric, presenting random forest and
XGBoost as the models with the lowest error among the studied group of algorithms.
To conclude that random forest and XGBoost are the most reliable machine learning
algorithms suited for solar power forecasting, Figure 30 shows that both random forest and
XGBoost recorded the highest average values of R-squared at 0.89.
Figure 28. Graph of average MAPE values for each algorithm.
Figure 30. Graph of average R-squared values for each algorithm.
5. Conclusions
The Random Forest and XGBoost algorithms provide the most reliable ML models for forecasting PV power output. The 6-h window provided the best forecast period for all the models except linear regression. The longer the forecast window, the less reliable the prediction model became, as seen in the diminishing R-squared values of the tested ML models.
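The windowing scheme evaluated here (a 24-h horizon issued in 30-min steps, with the first 30-min, 3-h, 6-h, 12-h and 24-h windows scored separately) amounts to taking leading slices of the forecast sequence. The sketch below illustrates this; the helper names are hypothetical, not from the study's code:

```python
# Slice a 24-h forecast issued in 30-min steps into the evaluation
# windows used in the study (30 min, 3 h, 6 h, 12 h, 24 h).
STEP_MINUTES = 30
WINDOWS_HOURS = [0.5, 3, 6, 12, 24]

def window_steps(hours, step_minutes=STEP_MINUTES):
    """Number of forecast steps covered by a window of the given length."""
    return int(hours * 60 // step_minutes)

def window_slices(forecast):
    """Return the leading slice of the forecast for each evaluation window."""
    return {h: forecast[:window_steps(h)] for h in WINDOWS_HOURS}

full_horizon = list(range(48))  # 48 half-hourly steps = 24 h
slices = window_slices(full_horizon)
# e.g. the 6-h window covers the first 12 half-hourly steps
```

Scoring each slice against its observed values with the same metrics then reproduces the per-window comparison reported above.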
The forecasting of solar radiation is most reliable over short window periods, which can be attributed to the sensitivity of GHI to changing weather conditions. Developing a regression model requires a good understanding of both the input and output variables. The forecast of solar PV power output, which is represented by trends in global horizontal irradiance, is susceptible to changes in weather conditions; this is the underlying reason the predictions were reliable over short window periods.
The study findings present an opportunity for the solar energy industry to improve decision-making and the efficient design and implementation of solar energy plants. They also present an opportunity to develop software packages that can take the design variables for any selected location and produce design parameters quickly and efficiently. This study was limited to solar radiation as the variable affecting solar power output; future research should incorporate other factors affecting solar energy output to develop a comprehensive machine learning model.
The application of machine learning models to predicting PV power output requires a substantial amount of historical data to draw reliable conclusions, and such data may be lacking in the geographical regions of interest. Establishing a PV plant requires feasibility studies that may identify new locations for which historical meteorological data are not readily available. This presents a limitation, but also an opportunity to develop meteorological databases covering the entire globe.
Conflicts of Interest: The authors declare no conflicts of interest or situations in which financial or personal interests could compromise the research.
References
1. Global Solar Atlas. World Bank, ESMAP and Solargis. 2024. Available online: [Link]info/map (accessed on 1 September 2024).
2. Mangherini, G.; Diolaiti, V.; Bernardoni, P.; Andreoli, A.; Vincenzi, D. Review of Façade Photovoltaic Solutions for Less Energy-
Hungry Buildings. Energies 2023, 16, 6901. [CrossRef]
3. IEA. Global Energy Review 2021. 2021. Available online: [Link] (accessed on 1 September 2024).
4. Impram, S.; Nese, S.V.; Oral, B. Challenges of renewable energy penetration on power system flexibility: A survey. Energy Strategy
Rev. 2020, 31, 100539. [CrossRef]
5. Ahmed, R. A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renew.
Sustain. Energy Rev. 2020, 124, 109792. [CrossRef]
6. Kabir, M. Coordinated control of grid-connected photovoltaic reactive power and battery energy storage systems to improve the voltage profile of a residential distribution feeder. IEEE Trans. Ind. Inform. 2014, 10, 967–977.
7. Trondle, T. Trade-offs between geographic scale, cost, and infrastructure requirements for fully renewable electricity in Europe.
Joule 2020, 4, 1929–1948. [CrossRef] [PubMed]
8. Abdelshafy, A.M.; Hassan, H.; Jurasz, J. Optimal design of a grid-connected desalination plant powered by renewable energy
resources using a hybrid PSO-GWO approach. Energy Convers. Manag. 2018, 173, 331–347. [CrossRef]
9. Johnson, D.O.; Hassan, K.A. Issues of power quality in electrical systems. Int. J. Energy Power Eng. 2016, 5, 148–154. [CrossRef]
10. Mellit, A.; Massi Pavan, A.; Ogliari, E.; Leva, S.; Lughi, V. Advanced methods for photovoltaic output power forecasting: A
review. Appl. Sci. 2020, 10, 487. [CrossRef]
11. Iweh, C.D. Distributed generation and renewable energy integration into the grid: Prerequisites, push factors, practical options,
issues and merits. Energies 2021, 14, 5375. [CrossRef]
12. Abbassi, R. An efficient salp swarm-inspired algorithm for parameters identification of photovoltaic cell models. Energy Convers.
Manag. 2019, 179, 362–372. [CrossRef]
13. Khare, V.; Nama, S.; Baredar, P. Solar-wind hybrid renewable energy system. Renew. Sustain. Energy Rev. 2015, 10, 23–33.
[CrossRef]
14. Hassan, A. Thermal management and uniform temperature regulation of photovoltaic modules using hybrid change materials-
nanofluids system. Renew. Energy 2020, 145, 282–293. [CrossRef]
15. Abdel-Nasser, M.; Mahmoud, K. Accurate photovoltaic power forecasting models using deep LSTM-RNN. Neural Comput. Appl.
2019, 31, 2727–2740. [CrossRef]
16. Solar Energy Research Institute. Basic Photovoltaic Principles and Methods; Technical Information Office: Washington, DC, USA,
1981.
17. Madhavan, B.L.; Ratnam, M.V. Impact of a solar eclipse on surface radiation and photovoltaic energy. Sol. Energy 2021, 223,
351–366. [CrossRef]
18. IBM. IBM Research. 2024. Available online: [Link] (accessed on 31 August 2024).
19. Khandakar, A. Machine learning-based photovoltaics (PV) power prediction using different environmental parameters of Qatar. Energies 2019, 12, 2782. [CrossRef]
20. Long, C.N.; Dutton, E.G. BSRN Global Network Recommended QC Tests; V2.0 BSRN Technical Report, BSRN; 2010. Available
online: [Link] (accessed on 1 September 2024).
21. Perez-Astudillo, D.; Bachour, D.; Martin-Pomares, L. Improved quality control protocols on solar radiation measurements. Sol.
Energy 2018, 169, 425–433. [CrossRef]
22. Hao, J.; Ho, T.K. Machine learning made easy: A review of Scikit-learn package in Python programming language. J. Educ. Behav. Stat. 2019, 44, 348–361. [CrossRef]
23. Babatunde, A.A.; Abbasoglu, S. Predictive analysis of photovoltaic plants specific field with the implementation of multiple linear
regression tool. Environ. Prog. Sustain. Energy 2019, 38, 13098. [CrossRef]
24. Wang, J. A short-term photovoltaic power prediction model based on the gradient boost decision tree. Appl. Sci. 2018, 8, 689.
[CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.