Crop Yield Prediction
ORIGINAL ARTICLE
Received: 26 August 2023 / Accepted: 12 July 2024 / Published online: 16 August 2024
The Author(s) 2024
Abstract
Accurately predicting crop yield is essential for optimizing agricultural practices and ensuring food security. However,
existing approaches often struggle to capture the complex interactions between various environmental factors and crop
growth, leading to suboptimal predictions. Consequently, identifying the most important features is vital when leveraging
Support Vector Regressor (SVR) for crop yield prediction. In addition, the manual tuning of SVR hyperparameters may not
always offer high accuracy. In this paper, we introduce a novel framework for predicting crop yields that addresses these
challenges. Our framework integrates a new hybrid feature selection approach with an optimized SVR model to enhance
prediction accuracy efficiently. The proposed framework comprises three phases: preprocessing, hybrid feature selection,
and prediction. In the preprocessing phase, data normalization is conducted, followed by K-means
clustering in conjunction with the correlation-based filter (CFS) to generate a reduced dataset. Subsequently, in the hybrid
feature selection phase, a novel hybrid FMIG-RFE feature selection approach is proposed. Finally, the prediction phase
introduces an improved variant of the Crayfish Optimization Algorithm (COA), named ICOA, which is used to optimize the
hyperparameters of the SVR model, thereby achieving superior prediction accuracy along with the novel hybrid feature
selection approach. Several experiments are conducted to assess and evaluate the performance of the proposed framework.
The results demonstrate the superior performance of the proposed framework over state-of-the-art approaches. Furthermore,
experimental findings regarding the ICOA optimization algorithm affirm its efficacy in optimizing the hyperparameters of
the SVR model, thereby enhancing both prediction accuracy and computational efficiency, surpassing existing algorithms.
Keywords Machine learning · Feature selection · CFS · Crop yield prediction · Information gain · Hyperparameter
optimization · Crayfish optimization · Support vector machine · Support vector regression
20724 Neural Computing and Applications (2024) 36:20723–20750
MedAE Median absolute error
MAPE Mean absolute percentage error
MIG Mutual information gain
1 Introduction
Since farmers are responsible for producing a sizable proportion of the world’s food supply, agriculture is one of the significant areas of social interest. Due to population growth and food shortages, hunger persists in many nations today. Increasing food production is an attractive strategy to eradicate hunger. The United Nations has set the year 2030 as a target date for accomplishing two of its most important goals: increasing food security and decreasing need. Policymakers in a country need accurate forecasts to make informed decisions about food exports and imports. Also, farmers and growers can use yield predictions to better plan their budgets and operations. However, due to many complicated factors, accurate predictions of agricultural yields are notoriously tricky. The success of a crop relies on several factors, including the weather, the soil, the terrain, the presence of pests, the water supply, the plant’s genetic makeup, the crop’s organization, and more [1, 2].
Researchers are making more precise predictions with the help of data-driven models [3]. To enhance the accuracy of data-driven models, Machine Learning (ML) methods are essential [4]. Machine learning enables computers to gain new abilities without the need for explicit programming. Agricultural frameworks, whether nonlinear or linear, can be resolved by these procedures, which thus provide exceptional foresight [5]. Agronomic frameworks based on machine learning acquire their strategies through learning. The operations necessitate much practice before they can be executed successfully. Once the training phase is complete, the model will use its assumptions to validate the data.
While ML and its realization have made great strides, there are still some limits to what can be accomplished when relying solely on the data. ML predictions’ accuracy and limitations are influenced by model parameters, data quality, and the relationship between target and input variables in the obtained datasets [6]. Incomplete or inaccurate data, biases, outliers, and noisy data can all severely weaken the prediction ability of models [7]. Multivariate regression, random forests, regression trees, neural networks and association rules are machine learning models used in numerous studies to predict agricultural yields. The output, crop yield, is viewed by machine learning models as a function of many input factors, such as soil conditions and weather components.
ML has developed as a powerful tool for obtaining insights and patterns from data, applied to various applications and domains including the agriculture environment. ML models can be generally split into unsupervised and supervised learning models. Supervised learning, where models are trained on labeled data to make predictions or decisions, offers a structured approach to solving predictive tasks. In contrast, unsupervised learning explores patterns and structures within unlabeled data, often uncovering hidden relationships or clusters. This paper focuses on supervised learning models such as SVR, kNN, and RF for evaluating the proposed framework. Supervised learning allows us to leverage historical datasets containing labeled information about environmental factors and corresponding crop yields. This labeled data provides a clear signal for the model to learn from, enabling it to capture complex relationships and make accurate predictions. By utilizing supervised learning, we aim to develop robust models that effectively forecast crop yields, thereby informing optimized agricultural practices and contributing to food security efforts.
More recently, the Support Vector Machine (SVM) has attracted the attention of researchers, practitioners, and statisticians due to its theoretical and practical strengths in both classification and regression. SVM was originally applied to classification problems [22]. It has since been extended to handle nonlinear regression problems and named Support Vector Regression [22]. There are two distinct advantages to the application of SVR. Firstly, SVR guarantees convergence to optimal solutions using quadratic programming with linear constraints for learning data. Secondly, it is computationally efficient in modeling nonlinear relationships using kernel mapping. However, the computational efficiency of SVR depends on a couple of hyperparameters and factors that directly or indirectly affect finding the optimal solutions. Ordinarily, an exhaustive grid search is utilized to explore all the hyperparameter combinations. Cross-validation is conducted to evaluate the prediction capability of SVR. Despite its striking features, SVR has its limitations [26]. The most important one is its inability to perform feature selection [27].
On the other hand, feature selection plays an essential part in supervised learning to obtain more promising and efficient results. Feature selection filters unnecessary information from a dataset using statistical metrics to improve a learning algorithm. The primary goal of feature
selection is to collect a good set of characteristics that may be used to characterize and limit a dataset. The feature selection approach in machine learning reduces computation time [8], improves forecasting outcomes, and enhances data comprehension. Feature selection, then, is a typical preprocessing step for high-dimensional data. The goals are to make the data and the model easier to understand by reducing their dimensionality and to improve the accuracy of forecasts. In other words, feature selection involves identifying the most relevant input variables from a pool of potential predictors, such as weather conditions, soil attributes, and agricultural practices, to improve the crop prediction phase.
There are three distinct types of feature selection (FS) methodologies: wrapper, filter, and hybrid. Due to the increased complexity introduced by features of higher dimensions, this problem cannot be solved by simply enumerating all possible feature subsets. The filter approaches can identify and eliminate irrelevant features; they cannot do the same for redundant features because they fail to account for possible associations among features [9, 10]. For the filter method, it is the properties of the data themselves that define which subset of features is most important, such as correlation, Fisher score, information gain, mutual information, and entropy [11]. The wrapper feature selection approach is wrapped within the induction process [12, 13]. It is helpful to use the wrapper method when problems arise. Several search methods can be used to find a subgroup of features by restricting the suitable objective function, including recursive feature elimination and backward and forward elimination passes [14]. Wrapper methods are easily recognized by the excellent quality of the features they select, although at the expense of a higher computational cost. Hybrid approaches are another option that is occasionally studied. They aim for a middle ground between the two [1, 15], striking a good balance between accuracy and processing speed. In this study, we propose a hybrid approach that uses elements of both the filter and wrapper approaches.
Despite the promising potential of supervised learning and feature selection techniques in agriculture, challenges persist in effectively integrating these methods to enhance crop yield prediction models. Agricultural datasets often exhibit high dimensionality and contain numerous variables, necessitating robust feature selection approaches to identify the most influential factors. Moreover, manual tuning of SVR hyperparameters can be labor-intensive and may not always yield optimal results.
To address these challenges, this paper proposes a new framework with three phases: preprocessing, feature selection, and prediction. First, the k-means (KM) technique is used to cluster all the dataset’s features. It strives to maintain the clusters as far apart as feasible while making their features consistent. Then, the CFS ranking method independently ranks features in each cluster. These two techniques simplify the search space by addressing the high dimensionality and redundancy problems. After the top features from each cluster are chosen, the resulting reduced dataset is forwarded to the feature selection phase. Secondly, in the feature selection phase, a novel hybrid feature selection strategy is proposed to narrow down the pool of candidates to the top-performing features. Two filter-type approaches, Fisher score and mutual information gain, are applied. The intersection set of the resulting features from each process is fed into the wrapper approach. The wrapper-based approach, Random-Forest-based recursive feature elimination (RFE), is combined with the filter approaches to create a hybrid feature selection technique. Finally, for the prediction phase, a novel improved algorithm, ICOA, is proposed to optimize the hyperparameters of the SVR model and thereby enhance the prediction results of the final phase. The COA algorithm is enhanced with the chaotic map and the Lévy distribution function to strengthen the exploration and exploitation phases of COA, resulting in the novel ICOA algorithm. The paddy crop dataset is used with the proposed method to identify the best features for future crop production prediction.
This paper’s main contributions are summed up as follows:
• A framework that integrates a novel hybrid feature selection approach with an optimized SVR model to enhance the prediction results is proposed.
• This paper provides a hybrid approach to feature selection, combining heuristic techniques such as filter and wrapper methods.
• An improved variant of the COA algorithm is proposed to enhance the exploration and exploitation phases of COA.
• The Lévy flight and chaotic maps are integrated into the original COA, resulting in a promising ICOA applied to optimize the SVR model.
• The dataset’s redundancy and high dimensionality were mitigated through an unsupervised feature selection strategy in the preprocessing phase that combines KM clustering and the CFS ranking.
• Experimental results confirm that the proposed approach selects the most relevant features and enhances the prediction results.
The following sections make up this paper: Sect. 2 compiles previous research on predicting crop yields, and Sect. 3 explains the information collected for this study. Section 4 discusses the proposed framework and its components; Sect. 5 presents our results and discussions of the
experiments; Sect. 6 presents the discussion of the obtained results and Sect. 7 offers the conclusion and future work.
2 Related work
Estimating crop yields is critical in today’s world when an ever-growing population demands more and more food. It aids in the enhancement of management procedures vital to maximizing agricultural yield. ML methods, traditional regression methods, and crop models [16–18] have been used to estimate crop yields in the previous decade. Crop yield models are a type of crop growth model. According to these parameters, they are merely a simulacrum of actual scientific studies [19]. By providing reliable data on agricultural output, these models help policymakers, farmers, and the government achieve maximum sustainability [20]. Vani and Rathi [21] described big data analysis as gathering, maintaining, and analyzing massive amounts of data to find connections and other insights. Big data was used to analyze harvest, soil, and climate data from internal and external sources for agricultural applications. Several machine learning algorithms grouped the data to estimate agricultural productivity. However, the grouping was inaccurate and deficient. On the other hand, Proximity Likelihood Maximization Data Clustering (PLMDC) uses fewer characteristics from vast and densely packed farming data to improve clustering and farmer crop output projections. An appropriate linear regression method was utilized to remove extraneous features from dense and sparse agricultural data. The Genetic Algorithm (GA) selected clustering data features for best fitness. The A-FP development methods evaluate the decision-support system’s capability to predict agricultural yields using meteorological data and crop quality. The facts and observations showed that PLMDC was more effective than current methods.
Predictions of frost danger for Zhejiang tea plantations using ML methods have also been made [22]. Damage was calculated using meteorology, topography, and coordinate geometry (latitude and longitude). ANN and SVM were used for the estimation. The authors in [23] built a spatio-temporal hybrid model using satellite-derived hydro-meteorological data from 20 sites for 20 years in Bangladesh. Dragonfly optimization and support vector regression (SVR) were employed in this research. This hybrid model reduced the relative error in predicting tea crops by 11%.
Reyana et al. [24] utilize data from IoT sensors to remotely monitor crops. In contemporary agriculture, producers control the surrounding environment of their crops to optimize yields. The authors presented the MMLA, a novel method for recognizing multisensory information. The suggested recommendation system categorized eight crops. Different machine learning algorithms were utilized to classify different sorts of crops, specifically the Random Forest and J48 Decision Tree. The classifier’s performance was evaluated using precision, F-measure and recall. These findings were then compared to the advanced classifications. The Random Forest method demonstrated superior efficiency in categorizing agricultural-related text, exhibiting the lowest error measures such as a 13% Root Mean Square Error (RMSE).
Paudel et al. [25] used Crop Yield Prediction model (MARS) data from the Joint Research Centre of the European Commission to evaluate NN crop yield forecasting models’ accuracy and accessibility. The 1DCNN and LSTM could handle time-series data. A GBDT model with hand-crafted attributes served as the comparison baseline. Agriculture and crop yield forecasting experts used feature recognition algorithms to rate input parameters’ significance. LSTM models outperformed GBDT models economically for wheat crop in Germany. LSTM models accurately predicted the impacts of yield pattern and static features, including biomass and soil retention ability features, on crop output; however, high temperature and moisture circumstances were harder to measure. This study shows that DL can automatically learn features and provide accurate crop output estimates, and it highlights the advantages and challenges of involving human stakeholders in assessing model understanding.
Khaki et al. [26] used ML to successfully forecast corn production and yield difference among corn hybrids given either environmental or genotype data. Using remotely sensed data collected before harvest, You et al. [27] applied DL algorithms to estimate soybean production. An ANN model was also built to forecast environmental impacts and tea crops in Iran for black, green, and oolong tea [28]. A comparison with deep fully connected neural networks, LASSO, and RF found that combining a CNN and an RNN was superior for predicting soybean and corn yields [29]. Researchers created a decision-support system using information about the soil and the surrounding environment [30].
Swanth et al. [31] propose a new way of predicting crop yield using a hybrid classification model that incorporates an enhanced feature ranking fusion technique. The authors propose a new SMOTE algorithm for data enrichment to ensure the optimization of features that will be extracted. Their technique for feature extraction includes statistical features, improved correlation-based features, raw data, and entropy features. They also offer an enhanced way of combining feature rankings using the results of various feature selection techniques: Relief, RFE and Chi-square. Their hybrid model, which combines DBN and LSTM models, is used for prediction. The results of the authors
show that their approach improves upon traditional classifiers including LSTM, DBN, Bi-GRU, CNN, and SVM.
Fatma M. Talaat [32] introduces the Crop Yield Prediction Algorithm (CYPA), a new method that employs IoT techniques in precision farming. The authors integrate climate, meteorological, and chemical data and agricultural yield into CYPA to enable policymakers to forecast annual crop yields. The authors developed a decision support tool to aid farmers and decision makers in predicting agricultural yields by analyzing meteorological circumstances specific to their regions. The researchers suggested an advanced machine learning approach for predicting agricultural yields. In addition, active learning was implemented in CYPA to optimize the model’s performance by minimizing the amount of labeled data required for training. The CYPA can respond to changing field conditions, such as pest outbreaks or weather, by engaging in active learning. This involves actively choosing fresh samples for labelling that accurately reflect the current conditions.
The Levenberg–Marquardt technique was previously exploited to evaluate and forecast human gait [33], and can be utilized in surveys of forests and farms. Surveying with the old methods is difficult, time-consuming, and costly, especially in remote or rugged places or where a lot of vegetation is present, such as mountains, forests, or fields. In another paper, relevant operating laws and necessary weighted aggregation operators were devised [34]. Here, scalar multiplication and neutral addition operational rules define the properties of the neutral type in the group association degrees and the sum of probability. All facets of the proposed laws are examined.
However, research on the use of DL for predicting tea yield is scant [35]. To estimate yield, ML and DL methods analyze data on climate, soil, crops, and satellite imagery [36]. The use of different microwave and spectral wavelengths, made possible by remote sensing data, enables crop status monitoring [37]. Predictions of wheat crops have been made using satellite and climate data [38]. A prediction model for sorghum biomass was suggested with the sorghum crop model APSIM, the multi-layer perceptron, and SVM as input. After comparing other models, the authors decided that the MLP one was the most reliable [39].
Previous authors have used data mining (DM) techniques to identify and organize data corresponding to the relative importance of the critical features influencing sugarcane output and then to create mathematical models for predicting sugarcane yield [40]. Three different DM methods were used to analyze data from the databases of numerous sucrose mills in Brazil. Some DM strategies have been used to investigate relationships between weather conditions and plant care. An external dataset has been used to estimate the accuracy of the derived models. The RF algorithm was utilized for comparison.
In [41], CNN and LSTM are coupled to estimate county-level soybean yields using outdoor remote sensing data at the end of the growing season. There is a shortage of literature on using the deep learning approach to estimate agricultural yields in an indoor greenhouse setting, in contrast to outside application scenarios. The research in [42, 43] motivated us to apply an RNN with long short-term memory (LSTM) units to the problem of predicting crop yields for tomatoes and ficus. The evaluation results also demonstrate that the conventional machine learning algorithms are inferior to deep learning techniques regarding prediction accuracy and root mean square errors (RMSEs).
To define agricultural objectives for import and export, as well as to boost farmer incomes, crop yields must be predicted quickly and precisely in numerical and economic assessments. Crop production forecasts are one of the most difficult concerns in the agricultural industry, since they are used to estimate higher crop output utilizing machine learning techniques. According to previously reported related studies, work that utilizes the DT algorithm has overfitting concerns with the data, resulting in inaccurate predictions. The primary obstacles and issues in the associated work can be summarized as follows:
• More technique classifiers for agricultural yield prediction must be examined in the linked work.
• The associated work must consider the proposed technique in all variables of the agricultural sectors to enhance the forecasting process.
• The related work must add climatic data of the suggested method to boost the accuracy of prediction.
• The linked work must add more crop-related features into the suggested technique for accurate prediction.
• The linked work needs to analyze improved strategies for higher accuracy in crop forecast.
• The associated work does not use an optimized model, which can have a significant impact on prediction accuracy when compared to traditional models.
Therefore, this paper proposes a comprehensive framework that enhances the prediction performance by introducing a new hybrid feature selection approach and a novel algorithm for optimizing the different hyperparameters for the prediction process. The proposed framework helps to handle different issues raised by related works: the novel hybrid feature selection approach focuses on the best features to reduce the dimensionality. In addition, the new optimized model for the prediction enhanced the prediction results compared with the recent approaches.
Furthermore, a new set of climatic features is integrated to enhance the obtained results.
The COA randomly initializes the population X of N candidate solutions, each with dim dimensions. The position of each solution $X_{i,j}$ is modeled as:
$$ X_{i,j} = Lb + (Ub - Lb) \times rand \tag{1} $$
where Lb and Ub refer to the lower and upper bounds of dimension j.
As previously mentioned, temperature is a crucial factor in multiple phases of the crayfish and has been defined in Eq. (2). When the temperature exceeds 30 degrees, the crayfish relocates to a cooler area for its summer retreat. When the temperature is suitable, the crayfish initiates its foraging habit. The temperature range for foraging behavior is specified as 15 to 30 degrees. Therefore, the foraging behavior can be replicated using a normal distribution, which is influenced by the temperature. The mathematical representation of this relationship is presented in Eq. (3).
$$ temp = rand \times 15 + 20 \tag{2} $$
$$ p = C_1 \times \left( \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(temp - \mu)^2}{2\sigma^2} \right) \right) \tag{3} $$
where the temperature of the crayfish location is denoted by temp while $\mu$ refers to the temperature of the best crayfish. In addition, the $\sigma$ and $C_1$ parameters control the different temperatures of crayfish.
In this phase, the crayfish fight for the possession of the cave. If the temperature is over 30 degrees and the random value is greater than 0.5, the other crayfish are attracted to the same cave. Consequently, they engage in conflict with one another in order to obtain possession of the cave, as indicated by Eq. (7).
$$ X_{i,j}^{t+1} = X_{i,j}^{t} - X_{z,j}^{t} + X_{shade} \tag{7} $$
$$ z = round(rand \times (N - 1)) + 1 \tag{8} $$
where N is the total number of agents in the population, so that z indexes a randomly chosen crayfish.
3.1.4 Foraging phase
In this phase, the crayfish start the process of searching for food (the optimal solution). Hence, when temp is less than 30 degrees, the crayfish start searching for food at different locations. This process simulates the search for the optimal solution to a problem. The location and size of food are formulated as:
$$ X_{food} = X_G \tag{9} $$
$$ Q = C_3 \times \frac{fit_i}{fit_{food}} \tag{10} $$
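The initialization, temperature, and cave-competition rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the default values `mu=25.0`, `sigma=3.0`, and `c1=0.2` are placeholders not given in this text, and the cave position `x_shade` is passed in as a stand-in because its defining equations are not part of this excerpt.

```python
import math
import random


def init_population(n, dim, lb, ub, rng):
    """Eq. (1): draw every coordinate uniformly between lb and ub."""
    return [[lb + (ub - lb) * rng.random() for _ in range(dim)] for _ in range(n)]


def temperature(rng):
    """Eq. (2): temp = rand * 15 + 20, so temp lies in [20, 35)."""
    return rng.random() * 15 + 20


def intake(temp, mu=25.0, sigma=3.0, c1=0.2):
    """Eq. (3): Gaussian-shaped value p; mu, sigma, c1 are assumed defaults."""
    gauss = math.exp(-((temp - mu) ** 2) / (2 * sigma ** 2))
    return c1 * gauss / (math.sqrt(2 * math.pi) * sigma)


def competition_step(pop, i, x_shade, rng):
    """Eqs. (7)-(8): crayfish i competes with a random crayfish z for the cave."""
    n = len(pop)
    # Eq. (8) is 1-based; the "+ 1" is dropped because Python lists are 0-based.
    z = round(rng.random() * (n - 1))
    return [pop[i][j] - pop[z][j] + x_shade[j] for j in range(len(pop[i]))]


rng = random.Random(0)
pop = init_population(n=5, dim=2, lb=-10.0, ub=10.0, rng=rng)
temp = temperature(rng)
x_shade = pop[0]  # placeholder cave position, for illustration only
new_pos = competition_step(pop, 1, x_shade, rng)
```

The updated position from Eq. (7) can leave the search bounds, so a full optimizer would normally clip it back into the range [Lb, Ub] before evaluating fitness; that step is omitted here.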
3.3 Levy flight
A Lévy flight is a type of random walk where the steps taken follow a probability distribution known as the Lévy distribution, which has tails that are heavier than those of a normal distribution. The concept of Lévy flight was initially developed by Paul Lévy in 1937 and later further elaborated by Benoit Mandelbrot [45]. Multiple studies indicate that as animals and insects look for food, their flight behavior often exhibits a characteristic pattern of random direction selection, which can be described as a Lévy flight. In [46], Reynolds et al. investigated the movement patterns of fruit flies as they navigated their environment using a series of straight paths that were interrupted by a sudden 90° turn. This resulted in a search pattern known as a scale-free intermittent Lévy flight. In [47], the authors have shown that Lévy-flight can be
where $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt$.
4 Proposed framework
This section presents the proposed framework consisting of three main phases as shown in Fig. 2: preprocessing, hybrid feature selection, and prediction. The preprocessing phase includes the normalization task and clustering with the CFS task. The goal of this phase is to normalize the values of the dataset features, then to cluster the dataset into different clusters. The groups/clusters help to extract the hidden information in the dataset. After that, the CFS is applied to reduce each cluster’s features, presenting a new reduced dataset. In the second phase, a hybrid feature selection approach is proposed, which implements two stages: a filter stage and a wrapper stage. The most relevant
features are selected for the prediction phase in this phase. In the prediction phase, a novel variant of the COA algorithm is proposed to optimize the different hyperparameters of different machine learning models, particularly SVR. This is because the manual tuning of hyperparameters may not lead to a more promising solution.
4.1 Pre-processing phase
This section presents the preprocessing phase components. The preprocessing phase consists of two main stages: normalization, and clustering with CFS. Normalization is applied as a preprocessing stage to bring the values of different features into a specified range. Feature values may be of various ranges, so normalization is used. Secondly, clustering is applied to cluster the dataset into groups of similar patterns and rank the features of each cluster using CFS ranking. Top-ranked features from each cluster are chosen to form a new minimized dataset for different phases of the proposed framework. The details of each stage of the preprocessing phase are discussed in the following subsections.
4.1.1 Normalization
Normalizing values is necessary for processing data. To acquire values associated with another variable, some normalization forms require merely a rescaling step. When we have data on the size of a crop’s population, we can correct the mistakes. The population values can be normally distributed instead of randomly distributed once the inaccuracies are corrected. Getting the z-score is the initial step in normalization. The z-score can be written as:
$$ z = \frac{x - \mu}{\sigma} \tag{14} $$
where the mean and the standard deviation of the crop population are denoted by $\mu$ and $\sigma$, respectively.
4.1.2 Clustering with CFS
In this phase, we apply a preprocessing strategy that combines KM clustering and CFS ranking to deal with the high dimensionality of the input data. The KM approach is used on raw data to generate the initial ‘k’ number of clusters. The suggested method is very similar to the traditional ones in that the ‘k’ number of clusters is predefined each time, with ‘k’ values of 8, 10, and 12 being considered. Each cluster’s data is then ranked using CFS rating
and sorted in ascending order. A minimized dataset is obtained by choosing the top CFS-ranked features from each cluster. This reduced dataset is passed to the next phase because it contains less redundancy. KM clustering is applied to the training data to identify commonalities and create subsets. The rationale is that clustering can bring previously hidden information to light and expose underlying data structure that was not apparent before grouping.

The cluster analysis output can serve as a valuable guide when extracting and prioritizing critical features from many clusters. Using the KM approach, the training data is partitioned into k groups. The center of each cluster is determined by taking the mean of the data inside it. Choosing how many clusters to create, that is, the value of k, is a crucial challenge in the KM clustering process.

After the clusters have been generated, a CFS filter approach is applied to select the crucial features for each cluster’s resulting minimal subset dataset. CFS uses statistical metrics to evaluate feature subsets as part of its filtering procedure. When one feature is highly correlated with another, it is considered redundant. In this case, the features are discrete assessments of the variable of interest’s distinctive qualities. Given a set of features, if the relationship between each feature and the extrinsic variable is known, and the relationships between all pairs of features are also known, then Eq. 15 can be used to determine the relationship between the complex test, which includes all features, and the extrinsic variable,

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{15} $$

The formula above is Pearson’s correlation coefficient (PCC). Here, $\bar{x}$ and $x_i$ denote the mean and actual values of the feature under consideration. The mean and actual values of the dataset class are denoted by $\bar{y}$ and $y_i$, respectively.

As the degree of similarity between the features and the classes grows, so does the importance of the resulting feature set. Furthermore, the overall feature-output correlation is denoted by $r_{ny} = \rho(X_n, Y)$, while the overall feature-feature correlation is denoted by $r_{nn} = \rho(X_n, X_n)$. The relative importance of the chosen features is calculated using Eq. 16,
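As a minimal sketch of the correlation in Eq. 15, the following Python snippet computes Pearson’s coefficient and ranks the features of one cluster by the absolute value of their correlation with the target. The function names `pearson_r` and `rank_features`, the dictionary layout of the feature columns, and the choice to treat a constant column as uncorrelated are assumptions made for illustration. The ranking is a simplification: it uses only the feature-target correlation and does not reproduce the full CFS merit.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences (Eq. 15)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return feature names sorted by |r| with the target, strongest first."""
    scores = {name: abs(pearson_r(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy cluster: "rain" tracks the target, "noise" does not.
cluster = {"rain": [1, 2, 3, 4], "noise": [2, 1, 2, 1]}
yield_values = [2, 4, 6, 8]
print(rank_features(cluster, yield_values))
```

Taking the first name returned for each cluster would give one top-ranked feature per cluster, which is the shape of the reduced dataset described above.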
4.3 Prediction
algorithm is proposed. ICOA is a new variant of the COA algorithm that enhances the search process of COA in order to find the parameter combination that minimizes the prediction error, measured by MSE. The proposed ICOA is used to optimize the parameters of different ML models, and the results obtained indicate that SVR performs best among the compared algorithms on the crop yield prediction problem. The main enhancement used to boost the performance of COA is a combination of chaotic mapping theory and Lévy flight operators. Multiple studies indicate that incorporating a Lévy flight trajectory improves the balance between exploration and exploitation in optimization algorithms. In this paper, the Lévy flight technique is employed to further adjust the positions of the gazelles. In addition, chaotic maps are utilized to further explore the hyperparameter search space and obtain more promising parameter sets for different ML models. Therefore, the combination of chaotic and Lévy flight operators helps the ICOA algorithm avoid falling into local optima and enhances its search process by balancing the exploration and exploitation phases. The mathematical model for the newly formulated combination is defined as follows:

$$ \mathrm{Gazelle}_i(t+1) = \mathrm{Gazelle}_i(t) + ch(i) \cdot \mathrm{sign}\left[r_1 - \tfrac{1}{2}\right] \oplus \text{Lévy}(c) \tag{17} $$

$\mathrm{Gazelle}_i(t)$ indicates the position of the ith gazelle at the tth iteration, $r_1$ is a stochastic number between 0 and 1, and $ch(i)$ is a chaotic value obtained by the chaos map. The stochastic random walk in Eq. (17) helps the COA ensure that the search agent systematically explores the search area. This is achieved by increasing the step length over time, which helps to escape local minima. This study incorporates the Lévy flight trajectory, combined with the chaos map, into the COA. Therefore, the proposed ICOA can be used to optimize the parameters of the SVR ML algorithm and improve the prediction results.

The regression performance of SVR is highly dependent on the values of the bandwidth of the Gaussian kernel σ in the Radial Basis Function (RBF) and the penalty factor C. The SVR’s regression performance is enhanced by utilizing the ICOA to optimize the bandwidth of the Gaussian kernel σ and the penalty factor C. The mean square error (MSE) between the actual values and the predicted values of SVR is employed as the fitness function of ICOA. The ICOA-SVR model algorithm is depicted below.

Step 1. Initialize the population of ICOA with a set of candidate parameters, and set the population size and the maximum number of iterations. Also, the LB and UB of the optimized parameters (σ and C) are set. The initial population consists of a set of candidate solutions, each of which is a two-dimensional solution representing the two parameters. The initial population is generated randomly.
Step 2. The fitness value of each solution (set of parameters) is determined using the MSE fitness function, and XG and XL are obtained.
Step 3. The algorithm update formulas are executed according to the exploration and exploitation phases, which are selected based on whether the temperature variable is less than or greater than thirty degrees.
Step 4. The Lévy-chaotic position update (as shown in Eq. (17)) is utilized to replace the parameters with a more enhanced parameter set.
Step 5. At the end of each iteration, the solution with the minimum MSE is recorded, which indicates the best parameter set at this iteration.
Step 6. Save the best crayfish over all iterations to represent the optimal set of σ and C that minimizes the MSE fitness value.
Step 7. Steps 2–6 are repeated until the maximum number of iterations is reached, and the optimal solution, which represents the optimal parameter set, is output.

The flowchart of the proposed steps for optimizing the SVR parameters.
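The Lévy-chaotic move of Eq. (17) and the surrounding loop can be sketched in Python as follows. This is an illustrative simplification under stated assumptions, not the authors’ ICOA: the logistic map as the chaos map, Mantegna’s algorithm as the Lévy sampler, the `scale` step factor, the greedy acceptance rule, and the toy fitness function standing in for the SVR cross-validated MSE are all assumptions, and the temperature-driven crayfish updates of Step 3 are omitted entirely.

```python
import math
import random


def logistic_map(x, r=4.0):
    """One iteration of the logistic map, assumed here as the chaos map ch(i)."""
    return r * x * (1.0 - x)


def levy_step(beta, rng):
    """Draw one Levy-distributed step using Mantegna's algorithm (assumed sampler)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_chaotic_update(position, chaos, lower, upper, rng, beta=1.5, scale=0.01):
    """Apply an Eq. (17)-style move to one candidate and clip it to [LB, UB]."""
    new_position = []
    for x, lo, hi in zip(position, lower, upper):
        sign = 1.0 if rng.random() >= 0.5 else -1.0  # sign[r1 - 1/2]
        step = chaos * sign * levy_step(beta, rng)
        moved = x + scale * (hi - lo) * step  # scale is an assumed step factor
        new_position.append(min(max(moved, lo), hi))
    return new_position


def tune(fitness, lower, upper, pop_size=10, iterations=50, seed=0):
    """Minimise fitness over box bounds using only the Levy-chaotic move."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
                  for _ in range(pop_size)]
    scores = [fitness(p) for p in population]
    chaos = 0.7  # assumed starting value for the chaos map
    for _ in range(iterations):
        for i in range(pop_size):
            chaos = logistic_map(chaos)
            trial = levy_chaotic_update(population[i], chaos, lower, upper, rng)
            trial_score = fitness(trial)
            if trial_score < scores[i]:  # greedy acceptance (assumption)
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


def toy_mse(params):
    """Hypothetical stand-in for the MSE of an SVR trained with (sigma, C)."""
    sigma, c = params
    return (sigma - 0.5) ** 2 + ((c - 10.0) / 100.0) ** 2


best_params, best_score = tune(toy_mse, lower=[0.01, 0.1], upper=[2.0, 100.0])
print(best_params, best_score)
```

In a real setting, `toy_mse` would be replaced by a function that trains an SVR with the candidate σ and C and returns its validation MSE, which matches the fitness function described in the steps above.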
SN | Parameter | Parameter description | Units
1 | Net cropped area | The area that has had at least one planting of the crop in a given year | Integer (hectare)
2 | Gross cropped area | Total area dedicated to growing crops across all growing seasons | Integer (hectare)
3 | Net irrigated area | The sum of land that has been rinsed at some point during the year | Integer (hectare)
4 | Gross irrigated area | How much land has been devoted to crops watered during the year’s growth seasons | Integer (hectare)
5 | Area rice | Cumulative acreage devoted to rice cultivation | Integer (hectare)
6 | Quantity rice | Quantity of rice grown in the region | Integer (ton)
7 | Yield rice | Acquired rice amount in total | Integer (ton)
8 | Soil type | The soil type in the research location: 2 = red soil type, 1 = medium black soil type | Integer
9 | Land slope | An increase or decrease in elevation | Integer
10 | Soil PH | The soil pH scale measures both alkalinity and acidity | Integer
11 | Topsoil depth | The top layer of soil is where most of the microbes and organic stuff are | Integer (meters)
12 | N soil | The number of nitrogen molecules in the soil | Integer (kilogram/hectare)
13 | P soil | The measure of soil phosphorus content | Integer (kilogram/hectare)
14 | K soil | The potassium content of the ground | Integer (kilogram/hectare)
15 | QNitro | The application rate of nitrogen fertilizers | Integer (kilogram)
16 | QP2O5 | The ratio of phosphorus-containing fertilizers used | Integer (kilogram)
17 | QK2O | Use of potassium-based fertilizers in quantities | Integer (kilogram)
18 | Precipitation | Condensation of atmospheric water vapor, or precipitation | Integer (millimeter)
19 | Potential evapotranspiration | How much water evaporates from a given region given a sufficient supply | Integer (millimeter/day)
20 | Reference crop evapotranspiration | The rate of evaporation and transpiration from an irrigated crop reference surface | Integer (millimeter/day)
21 | Ground frost frequency | The total number of days that the soil temperature in the upper layer has been below the water freezing point | Integer (number of days)
22 | Diurnal temperature range | Temperature variation between daily high and low | Integer (°C)
23 | Wet day frequency | The total number of days with rainfall of 0.2 mm or more | Integer (number of days)
24 | Vapour pressure | Thermodynamic equilibrium is maintained thanks to the pressure exerted by water vapor in its condensed phase | Integer (hectopascal)
25 | Maximum temperature | The highest measured ambient temperature | Integer (°C)
26 | Minimum temperature | The air temperature was the coldest ever measured | Integer (°C)
27 | Average temperature | The typical level of air temperature | Integer (°C)
28 | Humidity | Levels of atmospheric water vapor | Integer (percentage)
29 | Wind speed | The velocity of the wind | Integer (miles/hour)
30 | Aquifer area percentage | Groundwater transmission capacity is a fraction of an area bounded by a body of porous rock | Integer (percentage)
31 | Aquifer well yield | The quantity of water extracted from an aquifer employing pumping | Integer (liters/minute)
32 | Aquifer transmissivity | The amount of water that can be distributed horizontally if the aquifer were completely saturated throughout its whole thickness | Integer (meter²/day)
33 | Aquifer permeability | The rate at which fluids can move through a rock is a measure of this attribute | Integer (meter/day)
34 | Post-electrical conductivity | We have a standardized electrical conductivity of groundwater after rain | Integer (siemens/meter)
Table 2 (continued)
SN | Parameter | Parameter description | Units
35 | Pre-electrical conductivity | Standard groundwater electrical conductivity before the monsoons | Integer (siemens/meter)
36 | Groundwater post-calcium | The typical calcium content of groundwater after rainfall | Integer (milligram/liter)
37 | Groundwater pre-calcium | Specific groundwater calcium content before the monsoons | Integer (milligram/liter)
38 | Groundwater post-magnesium | Groundwater magnesium levels are about average after a monsoon | Integer (milligram/liter)
39 | Groundwater pre-magnesium | Typical levels of magnesium in groundwater before the monsoon season | Integer (milligram/liter)
40 | Groundwater post-sodium | The specific concentration of sodium in groundwater after a monsoon | Integer (milligram/liter)
41 | Groundwater pre-sodium | Average sodium concentration in groundwater before the monsoons | Integer (milligram/liter)
42 | Groundwater post-potassium | Potassium concentration in groundwater, on average, following a monsoon | Integer (milligram/liter)
43 | Groundwater pre-potassium | Potassium concentration in the earth was about average before the monsoons hit | Integer (milligram/liter)
44 | Groundwater post-chloride | Chloride concentration in groundwater, on average, following a monsoon | Integer (milligram/liter)
45 | Groundwater pre-chloride | Level of chloride in groundwater, typically before the monsoons | Integer (milligram/liter)
In this section, a set of experiments is conducted to evaluate the performance of the proposed framework, including the proposed hybrid feature selection and prediction phases.

With growing expertise and the availability of suitable tools, the field of machine learning has expanded rapidly in recent years, enabling new ways to discover, evaluate, and exploit information-intensive approaches in agricultural contexts.
The dataset for this study was collected from official Indian
government websites belonging to several agricultural
ministries. Three primary keys are used to piece together
the data: states, years, and crops. There needs to be a lot of
data for feature selection methods to work well. Data with
flexible features simplify discovering patterns by filtering
out details that aren’t pertinent to the study’s goals.
This section describes the data set utilized to make
predictions about crop yield in the study. Factors such as
rainfall, crop type, market price, and yield are collected to
create a dataset that can forecast whether or not a crop will
be profitable. Data from many sources is gathered, filtered,
and combined using Python. Eands. [Link]. [50],
Agmarknet [51], and [Link]. [52] are among the
sources used.
Paddy crop production prediction in the Vellore district
of southern India is the focus of the planned study. Ponnai,
Sholinghur, Arcot, Thimiri, Ammur, and Kalavai are all
part of the study’s geographical focus. Since paddy is a
significant cash crop in the area, it makes sense to look into
the economy there. The information includes non-typical
meteorological and soil features, such as the characteristics
of the groundwater used by the crops and the amount of
fertilizer applied to them. Parameters such as evapotran-
spiration, wet day frequency, groundwater nutrients, and
aquifer features were examined in this study. Brief details
regarding the study’s many crop parameters can be found
in Table 2.
A combination of paddy output (tonnes), cultivated area (hectares), and yield acquired (kg/hectare) is used to calculate the estimated paddy crop yield. Regular climatic factors were used, such as reference crop evapotranspiration, mean temperature, humidity, potential evapotranspiration, and precipitation. In addition, less common climatic data such as diurnal temperature range, ground frost frequency, and wind speed were considered. The climate information comes from the Indian Meteorological Department’s online platform, metadata. Topsoil density, soil macronutrients, and soil pH are all examples of soil parameters. The analysis considers the many hydrochemical characteristics of groundwater, such as its permeability, aquifer type, transmissivity, electrical conductivity, and pre- and post-monsoon micro-nutrient (chloride, magnesium, potassium, calcium, and sodium) content. Table 3

Fig. 4 The proposed forecasting model’s predicted values are compared to actual values. a After cross-validation, b Before cross-validation
Table 4 Evaluation of ML models’ performance using all features of the dataset
The Efficiency Metric Using All Dataset Features
ML Model | MAE | MSE | RMSE | R2 | MAPE (%) | MedAE
Table 5 Evaluation of ML models’ performance using the inherent approach of feature importance
The Efficiency Metric Using Inherent feature importance
ML Model | MAE | MSE | RMSE | R2 | MAPE (%) | MedAE
Support Vector Machine | 0.276 | 0.089 | 0.298 | 0.472 | 21 | 0.316
Random Forest | 0.280 | 0.091 | 0.302 | 0.441 | 29 | 0.320
Decision Tree | 0.326 | 0.121 | 0.348 | 0.382 | 45 | 0.366
k-Nearest Neighbor | 0.406 | 0.169 | 0.411 | 0.478 | 51 | 0.426
Gradient Boosting | 0.286 | 0.095 | 0.309 | 0.427 | 33 | 0.326
Table 6 Evaluation of ML models’ performance using the proposed FMIG-RFE-SVM approach
The Efficiency Metric Using the FMIG-RFE-SVM approach
ML Model | MAE | MSE | RMSE | R2 | MAPE (%) | MedAE
Support Vector Machine | 0.194 | 0.039 | 0.196 | 0.542 | 20 | 0.196
Random Forest | 0.238 | 0.060 | 0.245 | 0.415 | 35 | 0.230
Decision Tree | 0.272 | 0.075 | 0.274 | 0.403 | 40 | 0.286
k-Nearest Neighbor | 0.316 | 0.102 | 0.319 | 0.384 | 45 | 0.330
Gradient Boosting | 0.252 | 0.064 | 0.253 | 0.409 | 38 | 0.266
Table 8 Optimized Prediction models evaluation without using Hybrid FMIG-RFE-SVM as feature selection
ML Model | MAE | MSE | R2 | MedAE
ICOA-SVM | 0.182 | 0.073 | 0.493 | 0.280
ICOA-RF | 0.220 | 0.077 | 0.421 | 0.320
ICOA-KNN | 0.212 | 0.091 | 0.412 | 0.318
ICOA-DT | 0.233 | 0.103 | 0.381 | 0.351
ICOA-Gradient | 0.211 | 0.083 | 0.427 | 0.360

gradient boosting [56] and Random Forest [57]. The following subsections detail the evaluation metrics used to assess the different components of the proposed model. The evaluation consists of several parts: the evaluation of the hybrid FS approach, the evaluation of the proposed optimized prediction phase, and the evaluation of the full framework.

5.3.1 Metrics of evaluation
Fig. 5 Machine learning model performance metrics with a all features of the dataset, b selected features using the inherent feature importance
approach, and c features obtained using the proposed FMIG-RFE-SVM approach
performance indicators considered in evaluating the work that has been developed.

• Mean Absolute Error (MAE) measures the typical magnitude of the errors [58] by averaging the absolute differences between a set of predictions and the corresponding observed values, as given in Eq. 18.

$$ \mathrm{MAE} = \frac{1}{n} \sum_{j=1}^{n} \left| y_j - y'_j \right| \tag{18} $$

where n is the sample size, $y_j$ is the observed (actual) value, and $y'_j$ is the predicted value.

• Mean Squared Error (MSE) is an essential metric for evaluating an estimator’s efficacy. It characterizes how well a regression line fits the points in the dataset [59]. MSE can be calculated using the formula:

$$ \mathrm{MSE} = \frac{1}{n} \sum_{j=1}^{n} \left( y_j - y'_j \right)^2 \tag{19} $$

• Root mean square error (RMSE) measures prediction error as the square root of the mean squared difference between the observed and predicted values [60]. More specifically, it indicates how closely the data is concentrated around the best-fit line. Equation 20 represents the calculation of the RMSE:
$$ \mathrm{RMSE} = \sqrt{\sum_{j=1}^{n} \frac{\left( y_j - y'_j \right)^2}{n}} \tag{20} $$

(21)

• Mean Absolute Percentage Error (MAPE) measures how far the model’s forecast differs from the actual result. Simply put, it is the average of the individual percentage errors: each absolute error is divided by the corresponding actual value, and the resulting ratios are averaged. As shown in Eq. 22, its definition is as follows:

$$ \mathrm{MAPE} = \frac{1}{n} \sum_{j=1}^{n} \left| \frac{y_j - y'_j}{y_j} \right| \tag{22} $$
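As a small illustration, the error measures above can be computed with the Python standard library as follows. The function names and the sample values are assumptions chosen for the example; the median absolute error (MedAE) is included as the median of the absolute errors, and MAPE is returned as a fraction and requires non-zero actual values.

```python
import math
from statistics import median


def mae(y_true, y_pred):
    """Mean absolute error (Eq. 18)."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error (Eq. 19)."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error (Eq. 20): square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (Eq. 22); actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


def medae(y_true, y_pred):
    """Median absolute error: median of the absolute errors."""
    return median(abs(a - p) for a, p in zip(y_true, y_pred))


# Hypothetical observed and predicted yields, for illustration only.
actual = [2.0, 4.0, 5.0, 10.0]
predicted = [1.0, 4.0, 7.0, 8.0]
for name, fn in [("MAE", mae), ("MSE", mse), ("RMSE", rmse), ("MAPE", mape), ("MedAE", medae)]:
    print(name, fn(actual, predicted))
```

Multiplying the MAPE value by 100 gives the percentage form used in the tables.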
Fig. 7 The results of a comparison of the accuracy of the Random forest model with a all of the features in the dataset and b the proposed feature
selection approach
Fig. 8 The results of a comparison of the accuracy of the Decision-tree model with a all of the features in the dataset and b the proposed feature
selection approach
Fig. 9 The results of a comparison of the accuracy of the SVM model with a all of the features in the dataset and b the proposed feature selection
approach
$$ \mathrm{MedAE}(y, \hat{y}) = \mathrm{median}\left( |y_1 - \hat{y}_1|, \ldots, |y_n - \hat{y}_n| \right) \tag{23} $$

5.3.2 Cross-validation

It is common to split the dataset into a training set and a test set when building a machine learning model, with the larger share of the data used for training as the model is refined. Because the test set is small, there is always the chance that important data that could have helped the model was left out. High variance in the data is also a cause for concern. K-fold cross-validation is used to deal with this issue. Modeling and predicting time series data is a challenging and involved process. Cross-validation based on randomly splitting a time series does not work very well. A temporal dependency issue may arise because of the reliance on earlier observations and the possibility of leakage in lag variables due to the response variable. Because of this, the information space exhibits non-stationarity, that is, a tendency toward fluctuating mean and variance values. In this case, a forward chaining method is more suitable for executing the cross-validation. The proposed approach builds its model on historical data and makes predictions using a five-fold cross-validation procedure. Table 4 summarizes these results.

Forward chaining amounts to training on a small sample of data, using the model to make predictions about new data, and assessing how well those predictions hold up. The predicted data points are then included in the next batch of training data, and predictions are made for the following data points. The
Fig. 10 The results of a comparison of the accuracy of the KNN model with a all of the features in the dataset and b the proposed feature
selection approach
Table 9 Optimized Prediction model evaluation using Hybrid FMIG-RFE-SVM as feature selection
ML Model | MAE | MSE | R2 | MedAE
ICOA-SVM | 0.151 | 0.062 | 0.572 | 0.216
ICOA-RF | 0.205 | 0.070 | 0.531 | 0.250
ICOA-KNN | 0.201 | 0.088 | 0.472 | 0.306
ICOA-DT | 0.206 | 0.098 | 0.518 | 0.316
ICOA-Gradient | 0.196 | 0.071 | 0.537 | 0.231

Table 10 Prediction model evaluation without using Hybrid FMIG-RFE-SVM as feature selection
ML Model | MAE | MSE | R2 | MedAE
ICOA-SVM | 0.151 | 0.062 | 0.572 | 0.216
COA-SVM | 0.211 | 0.102 | 0.462 | 0.336
PSO-SVM | 0.261 | 0.113 | 0.453 | 0.364
RUN-SVM | 0.196 | 0.093 | 0.521 | 0.297
WOA-SVM | 0.213 | 0.112 | 0.489 | 0.268

Table 11 Prediction model evaluation without using Hybrid FMIG-RFE-SVM as feature selection
ML Model | MAE | MSE | R2 | MedAE
RF [62] | 0.203 | 0.093 | 0.526 | 0.301
1DCNN [25] | 0.193 | 0.085 | 0.513 | 0.296
LSTM-DBN [31] | 0.194 | 0.101 | 0.561 | 0.243
CYPA [32] | 0.166 | 0.099 | 0.566 | 0.231
Proposed | 0.151 | 0.062 | 0.572 | 0.216

results of the cross-validation on the proposed method are shown in Fig. 4.

Scikit-learn, a Python machine learning library, is used to conduct cross-validation. The dataset preprocessing steps are performed first. The train_test_split function is imported from scikit-learn’s model_selection module so that the data can be split into training and test sets. To discover the best possible value for K, the cross_val_score function is used for fine-tuning the cross-validation hyperparameters. The data is divided into K equal subgroups by specifying a value of 5 for the n_splits argument. In this work, we allocate 75% of the data to training and 25% to testing. K-fold cross-validation is where the error measure for the trained model is established. The R2 score is used to quantify the accuracy of the model and is refined at each iteration until an optimal value is reached.

Below is a detailed illustration of the experimental setup for the proposed hybrid feature selection approach for crop prediction, which may be used in conjunction with several different machine learning frameworks, including KNN, gradient boosting, SVM, decision trees, and random forest.

5.4 Experimental results

Here, we summarize the results of experiments that compare the proposed framework against the baseline machine learning models. The feature selection approach, which involves picking the most pertinent features from a dataset,
[Figure: chart comparing ICOA-SVM, COA-SVM, PSO-SVM, RUN-SVM, and WOA-SVM; vertical axis from 0 to 0.5]
can boost the accuracy of a prediction model. In addition, the proposed prediction phase is optimized by using a novel ICOA algorithm to obtain the best set of parameters for the ML models. To ensure that the proposed hybrid statistical feature selection technique works as intended, it is implemented with the following machine learning models:

• Decision tree
• Random forest
• Gradient boosting
• Support vector machine
• KNN

In some machine learning algorithms, a helpful built-in mechanism known as feature importance is included. These techniques are commonly used in forecasting, allowing close monitoring of the essential model variables. Depending on the situation, this data can be utilized to modify the current models by engineering new features or discarding the noisy feature data. The proposed hybrid feature selection framework is compared to this metric as one of its benchmarks. There are five stages to this model’s analysis and evaluation.

First, the models are built using all the features in the dataset, and the prediction results are verified by several statistical assessment measures. Second, the feature importance techniques included within the algorithm are used to create models, with only the most essential features being chosen. Third, the models are built using the proposed FMIG-RFE-SVM approach, which selects the most critical features, and the predicted outcomes are assessed. Fourth, the prediction phase is evaluated by comparing the novel ICOA with other optimization algorithms for optimizing the different ML parameters. Finally, the full framework is evaluated against some state-of-the-art approaches and models.

5.4.1 Estimating the effectiveness of hybrid proposed feature selection approach

To evaluate the efficacy of the proposed hybrid feature selection method, a set of experimental results is captured. The features obtained via the proposed feature selection method, the features obtained via the built-in feature importance approach, and all of the features are used to evaluate the accuracy of the various experimental models. The evaluation metrics define the effectiveness of the running model. The differences between the expected and actual values are measured by the residuals acquired in the
Fig. 12 Diagnostic residual plots for regression analysis. a the scale versus location plot, b the residuals against leverage plot, c The residuals vs
fitted plot, and d the usual Q-Q plot
experiments. A model’s efficacy and accuracy can be measured by examining the size of the residual spread. As shown in Tables 5, 6 and 7, the evaluation metrics attained via the proposed hybrid feature selection technique outperform those acquired via the other investigated methods.

Efficiency evaluation is a crucial part of developing a better model. For future iterations, it aids in determining the best possible framework for describing the data and putting it to use. A prediction’s accuracy is evaluated by considering how close the prediction comes to the true value. It measures how often a model comes up with correct results. Accuracy measurements for the tested models are shown in Table 8 using the proposed FMIG-RFE-SVM hybrid approach for feature selection, the built-in feature importance approach, and the entire dataset.

When evaluated with the proposed feature selection approach, the results show higher accuracy. Figure 5 visually depicts the different types of machine learning models. Each was trained using either the complete set of features, the inherent feature importance approach, or a subset of features generated using the proposed feature selection method.

Figures 6, 7, 8, 9, 10 show visual representations of the accuracy of machine learning models trained on the entire dataset and on the features chosen by the proposed feature selection method. For instance, Fig. 6 shows how practical the proposed feature selection approach is by comparing the accuracy achieved by the gradient boosting algorithm with and without features produced in this manner. When using all of the features in the dataset, as shown in Fig. 6a, the gradient boosting algorithm achieves an accuracy of 84.22%. With the proposed hybrid feature selection approach, the gradient boosting algorithm achieves an accuracy of 86.77%, as depicted in Fig. 6b. In Fig. 7a, we can see that when the random forest algorithm is applied to the entire dataset, it achieves an accuracy measure of 90.84%. The proposed feature selection approach yielded an accuracy of
Fig. 13 Probability density curves of the actual data and the data predicted by the proposed FMIG-RFE-SVM used with: b Random Forest, c Gradient Boosting, d Decision Tree, e Support Vector Machine, f K-Nearest Neighbor
91.98% when fed into the random forest algorithm (as shown in Fig. 7b). Figure 8a shows that an accuracy measure of 79.25% is obtained when the decision tree method is applied to all dataset features. Fig. 8b shows the accuracy obtained after using the proposed feature selection method. Utilizing all features in the dataset with the SVM machine learning model is shown in Fig. 9a with an accuracy measure of 86.22%. Figure 9b shows an accuracy of 88.91% after applying the proposed feature selection approach with the SVM model. In Fig. 10a, we can see that when the KNN model is used for all of the features in the dataset, it achieves an accuracy measure of 78.21%. The
proposed feature selection approach yielded an accuracy of 80.69% when fed into the KNN model, as shown in Fig. 10b.

5.4.2 Estimating the effectiveness of prediction phase

This section evaluates the proposed prediction model for the crop yield. To evaluate how well the proposed ICOA optimizes the parameters of different machine learning models, ICOA is applied to the five utilized machine learning models and the results are captured. Tables 9 and 10 present the results obtained by using ICOA to optimize the parameters of different ML models. This section tests the ICOA parameter optimization with and without the proposed hybrid feature selection approach. Tables 9 and 10 test the ICOA as a parameter optimizer for several ML models without and with the proposed hybrid feature selection approach, respectively. According to Table 9, the performance of the ML models increased after utilizing ICOA compared with Tables 5 and 7. This is because the chaotic Lévy crayfish algorithm improves the search for the best set of parameters for the ML models. Moreover, Table 10 evaluates the proposed ICOA for adapting the parameters of ML models in the presence of the hybrid feature selection approach. The parameter optimization process combined with the hybrid feature selection approach substantially increased the performance of the ML models. From Table 10, SVR obtained the best results, that is, the lowest prediction errors, compared with the other algorithms. The performance of SVR is superior to that of all other algorithms, indicating that a robust prediction result can be obtained.

For further evaluation of the proposed ICOA as a parameter optimizer, several optimization algorithms are compared with ICOA. Table 11 presents the results of applying different optimization algorithms to the SVM ML model. The MAE, MSE, RMSE, R2 and MedAE are used to differentiate between the approaches. According to Table 11, ICOA-SVM is superior to all other algorithms, particularly COA-SVM, which uses the original COA. The MAE and RMSE of ICOA-SVM are 0.151 and 0.228, respectively, the minimum among all optimizers. Therefore, the proposed ICOA-SVM is a promising algorithm to optimize the parameters of ML models, particularly the SVM model. Figure 11 visualizes the comparison reported in Table 11 for further analysis.

5.4.3 Comparison with some recent state-of-art approaches

To undertake a more extensive study of the suggested framework, it is compared with various current state-of-the-art techniques. In this experiment, multiple recently published methodologies are compared to demonstrate the proposed model as a possible solution to the crop yield forecast problem. The compared state-of-the-art techniques include RF [62], 1DCNN [25], LSTM-DBN [31] and CYPA [32]. The results show that the suggested model outperformed the other prediction models, indicating a robust model. Table 12 records the captured results in terms of MAE, MSE, R2 and MedAE. According to Table 12, the MAE and RMSE obtained by the proposed framework are the minimum among all state-of-the-art works, while CYPA is ranked second best. The obtained results indicate that the proposed framework makes a strong contribution to the literature.

5.4.4 Statistical analysis on accuracy

Table 12 shows the statistical comparison of the proposed model with RF, 1DCNN, CYPA, and LSTM-DBN for agricultural yield prediction. Furthermore, it is evaluated for accuracy. Because metaheuristic procedures are unreliable, each method is rigorously tested to assure improved estimation. Furthermore, several statistical measures are investigated: the mean, maximum, Wilcoxon test with a significance level of 0.05 [63], Friedman rank [64], median, standard deviation, and minimum. The proposed framework has a maximum accuracy of 0.949, while RF has an accuracy of 0.853, LSTM-DBN has an accuracy of 0.914, CYPA has a maximum accuracy of 0.939, and 1DCNN [24] has an accuracy of 0.924. The average accuracy obtained by the proposed framework, 0.943, is the best among all compared works. The obtained p-value of the proposed framework versus all other approaches is less than 0.05, indicating the significance of the obtained results between
Neural Computing and Applications (2024) 36:20723–20750 20747
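As a reference for the error measures named in the Table 11 discussion (MAE, MSE, RMSE, R2, and MedAE), the following sketch computes them with NumPy. The function name `regression_metrics` and the four sample values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R2 and MedAE for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    abs_err = np.abs(residuals)
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MAE": float(np.mean(abs_err)),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": 1.0 - ss_res / ss_tot,  # undefined if y_true is constant
        "MedAE": float(np.median(abs_err)),
    }


# Illustrative yields (hypothetical values, not from the paper's dataset).
observed = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 3.0, 8.0]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MAE, MSE, RMSE, and MedAE and a higher R2 indicate a better fit, which is the sense in which the optimizers are ranked in the text.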
analysis. The dataset’s factorability must be evaluated before running the factor analysis. To that end, the Kaiser–Meyer–Olkin (KMO) test is used to determine whether the data are suitable for factor analysis; it indicates whether the overall model and the data set are adequate. KMO values range from 0 to 1, with anything below 0.1 deemed insufficient.
Overall, the KMO for the crop dataset is 0.83, which is a good fit for moving forward with factor analysis. The eigenvalues determine the number of factors retained from the scree plot, which plots each factor against its eigenvalue. Factors with eigenvalues greater than one are regarded as independent. The scree plot shown in Fig. 14 reveals 32 eigenvectors whose eigenvalues are larger than 1. These factors together account for 55% of the total variance. By analyzing massive datasets, factor analysis can uncover hidden relationships and identify groups of connected variables.
In any case, the same data components might be used to support competing explanations. Our proposed feature selection technique yields 32 deciding factors, close to the number of features. The overall performance and comparison findings demonstrate that the proposed hybrid feature selection approach outperforms the other feature selection processes. As a result, the framework’s prediction ability and efficiency are enhanced, as evidenced by lower mean squared error (MSE), root mean square error (RMSE), mean absolute error (MAE), and median absolute error (MedAE), and a higher coefficient of determination. The diagnostic graphs additionally detail the improved exploratory performance of the models.

7 Conclusion and future work

Agriculture is one of the most challenging domains in which to incorporate analytical results. Weather, soil, crop diseases, and pest infestations affect agricultural productivity and precision agriculture. Machine learning can change agribusiness by incorporating yield forecasting components. Machine learning models assess facts, translate data, and provide in-depth process knowledge. Feature selection using statistical measurements is critical for streamlining the predictive model’s learning process and efficiently representing the dataset. This paper proposes a novel framework with a new hybrid feature selection strategy for machine learning models. The models predict paddy crop production based on soil, climate, and groundwater hydrochemical parameters. The FMIG-RFE-SVM method identifies the most important agricultural yield features for the study area of interest. The proposed approach combines the filter and RFE-SVM wrapper approaches. The filter approach eliminates redundant and non-essential features using information gain and Fisher score, resulting in a smaller subset. These features can be used to build an intelligent agricultural model for crop prediction. Future research may first focus on new cutting-edge fuzzy-based clustering algorithms that can provide more helpful information for yield prediction. Second, we may include additional features in the dataset to improve the accuracy of the prediction model.

Author contributions All authors contributed equally to the research through conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, and writing (original draft, review and editing).

Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Data availability All data used and required are mentioned in the manuscript.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relations that could have appeared to influence the work reported in this paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit [Link] org/licenses/by/4.0/.

References

1. Holzman ME, Carmona F, Rivas R, Niclòs R (2018) Early assessment of crop yield from remotely sensed water stress and solar radiation data. ISPRS J Photogramm Remote Sens 145:297–308
2. Singh A, Ganapathysubramanian B, Singh AK, Sarkar S (2016) Machine learning for high-throughput stress phenotyping in plants. Trends Plant Sci 21(2):110–124
3. Xing L, Li L, Gong J, Ren C, Liu J, Chen H (2018) Daily soil temperatures predictions for various climates in United States using data-driven model. Energy 160:430–440
4. Liu S, Wang X, Liu M, Zhu J (2017) Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics 1(1):48–56
5. Johnson MD, Hsieh WW, Cannon AJ, Davidson A, Bédard F (2016) Crop yield forecasting on the Canadian Prairies by
remotely sensed vegetation indices and machine learning methods. Agric For Meteorol 218:74–84
6. Kuo Y-H, Li Z, Kifer D (2018) Detecting outliers in data with correlated measures. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp 287–296
7. Irita K (2011) Risk and crisis management in intraoperative hemorrhage: Human factors in hemorrhagic critical events. Korean J Anesthesiol 60(3):151–160
8. Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
9. Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M (2020) Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 143:106839
10. Askr H, Abdel-Salam M, Hassanien AE (2024) Copula entropy-based golden jackal optimization algorithm for high-dimensional feature selection problems. Expert Syst Appl 238:121582
11. Mielniczuk J, Teisseyre P (2019) Stopping rules for mutual information-based feature selection. Neurocomputing 358:255–274
12. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
13. Taher F, Abdel-salam M, Elhoseny M, El-hasnony IM (2023) Reliable Machine Learning Model for IIoT Botnet Detection. IEEE Access 11:49319–49336
14. Chen G, Chen J (2015) A novel wrapper method for feature selection and its applications. Neurocomputing 159:219–226
15. Pourpanah F, Lim CP, Wang X, Tan CJ, Seera M, Shi Y (2019) A hybrid model of fuzzy min–max and brain storm optimization for feature selection and data classification. Neurocomputing 333:440–451
16. Paudel D et al (2021) Machine learning for large-scale crop yield forecasting. Agric Syst 187:103016
17. Becker-Reshef I, Vermote E, Lindeman M, Justice C (2010) A generalized regression-based model for forecasting winter wheat yields in Kansas and Ukraine using MODIS data. Remote Sens Environ 114(6):1312–1323
18. Qader SH, Dash J, Atkinson PM (2018) Forecasting wheat and barley crop production in arid and semi-arid regions using remotely sensed primary productivity and crop phenology: A case study in Iraq. Sci Total Environ 613:250–262
19. Van Ittersum M, Donatelli M (2003) Modelling cropping systems: highlights of the symposium and preface to the special issues. Eur J Agron 18(3–4):187–197
20. Kasampalis DA, Alexandridis TK, Deva C, Challinor A, Moshou D, Zalidis G (2018) Contribution of remote sensing on crop models: a review. Journal of Imaging 4(4):52
21. Vani PS, Rathi S (2023) Improved data clustering methods and integrated A-FP algorithm for crop yield prediction. Distributed and Parallel Databases 41(1):117–131
22. Xu J et al (2021) Estimation of Frost Hazard for Tea Tree in Zhejiang Province Based on Machine Learning. Agriculture 11(7):607
23. Jui SJJ et al (2022) Spatiotemporal Hybrid Random Forest Model for Tea Yield Prediction Using Satellite-Derived Variables. Remote Sensing 14(3):805
24. Reyana A, Kautish S, Karthik PS, Al-Baltah IA, Jasser MB, Mohamed AW (2023) Accelerating Crop Yield: Multisensor Data Fusion and Machine Learning for Agriculture Text Classification. IEEE Access 11:20795–20805
25. Paudel D, de Wit A, Boogaard H, Marcos D, Osinga S, Athanasiadis IN (2023) Interpretability of deep learning models for crop yield forecasting. Comput Electron Agric 206:107663
26. Khaki S, Wang L (2019) Crop yield prediction using deep neural networks. Front Plant Sci 10:621
27. You J, Li X, Low M, Lobell D, Ermon S (2017) Deep gaussian process for crop yield prediction based on remote sensing data. In: Thirty-First AAAI conference on artificial intelligence
28. Khanali M, Mobli H, Hosseinzadeh-Bandbafha H (2017) Modeling of yield and environmental impact categories in tea processing units based on artificial neural networks. Environ Sci Pollut Res 24(34):26324–26340
29. Khaki S, Wang L, Archontoulis SV (2020) A cnn-rnn framework for crop yield prediction. Front Plant Sci 10:1750
30. Iqbal U, Shahbaz M, Khalid A (2015) Development of a Decision Support System to increase the Tea Crops yield. Bahria University Journal of Information & Communication Technologies (BUJICT) 8:2
31. Boppudi S, Jayachandran S (2024) Improved feature ranking fusion process with Hybrid model for crop yield prediction. Biomed Signal Process Control 93:106121
32. Talaat FM (2023) Crop yield prediction algorithm (CYPA) in precision agriculture based on IoT techniques and climate changes. Neural Comput Appl 35(23):17281–17292
33. Alharbi A, Equbal K, Ahmad S, Rahman HU, Alyami H (2021) Human gait analysis and prediction using the levenberg-marquardt method. J Healthcare Eng 2021:1–11
34. Garg H (2020) Neutrality operations-based Pythagorean fuzzy aggregation operators and its applications to multiple attribute group decision-making process. J Ambient Intell Humaniz Comput 11(7):3021–3041
35. Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N (2019) Deep learning and process understanding for data-driven Earth system science. Nature 566(7743):195–204
36. Kern A et al (2018) Statistical modelling of crop yield in Central Europe using climate data and remote sensing vegetation indices. Agric For Meteorol 260:300–320
37. Azzari G, Jain M, Lobell DB (2017) Towards fine resolution global maps of crop yields: Testing multiple methods and satellites in three countries. Remote Sens Environ 202:129–141
38. Cai Y et al (2019) Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric For Meteorol 274:144–159
39. Masjedi A et al (2018) Sorghum biomass prediction using UAV-based remote sensing data and crop model simulation. In: IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium. IEEE, pp 7719–7722
40. Hammer RG, Sentelhas PC, Mariano JC (2020) Sugarcane yield prediction through data mining and crop simulation models. Sugar Tech 22(2):216–225
41. Sun J, Di L, Sun Z, Shen Y, Lai Z (2019) County-level soybean yield prediction using deep CNN-LSTM model. Sensors 19(20):4363
42. Alhnaity B, Pearson S, Leontidis G, Kollias S (2019) Using deep learning to predict plant growth and yield in greenhouse environments. In: International Symposium on Advanced Technologies and Management for Innovative Greenhouses GreenSys2019 1296:425–432
43. Alhnaity B, Kollias S, Leontidis G, Jiang S, Schamp B, Pearson S (2021) An autoencoder wavelet based deep neural network with attention mechanism for multi-step prediction of plant growth. Inf Sci 560:35–50
44. Jia H, Rao H, Wen C, Mirjalili S (2023) Crayfish optimization algorithm. Artif Intell Rev 56(Suppl 2):1919–1979
45. Yang X-S, Deb S (2009) Cuckoo search via Lévy flights. In: 2009 World congress on nature & biologically inspired computing (NaBIC). IEEE, pp 210–214
46. Reynolds AM, Frye MA (2007) Free-flight odor tracking in Drosophila is consistent with an optimal intermittent scale-free search. PLoS ONE 2(4):e354
47. Barthelemy P, Bertolotti J, Wiersma DS (2008) A Lévy flight for light. Nature 453(7194):495–498
48. Kohavi R, John GH (1998) The wrapper approach. In: Feature extraction, construction and selection. Springer, pp 33–50
49. Elavarasan D, Vincent PD (2020) Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access 8:86886–86901
50. E. d. n. i. (2016) "Directorate Of Economics And Statistics, Ministry Of Agriculture, Government Of India." [Link] [Link] (accessed 21–12–2022)
51. "Agriculture Marketing." [Link] (accessed 12/21/2022)
52. M. n. i. (2016) "Ministry Of Statistics And Program Implementation, Government Of India." [Link] (accessed 21–12–2022)
53. Prasad R, Deo RC, Li Y, Maraseni T (2018) Soil moisture forecasting by a hybrid machine learning technique: ELM integrated with ensemble empirical mode decomposition. Geoderma 330:136–161
54. Oh H-J, Pradhan B (2011) Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput Geosci 37(9):1264–1276
55. Peterson LE (2009) K-nearest neighbor. Scholarpedia 4(2):1883
56. Kari D, Mirza AH, Khan F, Ozkan H, Kozat SS (2018) Boosted adaptive filters. Digital Signal Processing 81:61–78
57. Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26(1):217–222
58. Ali M, Deo RC, Downs NJ, Maraseni T (2018) Multi-stage committee based extreme learning machine model incorporating the influence of climate parameters and seasonality on drought forecasting. Comput Electron Agric 152:149–165
59. Deepa N, Ganesan K (2019) Hybrid rough fuzzy soft classifier based multi-class classification model for agriculture crop selection. Soft Comput 23(21):10793–10809
60. Torres AF, Walker WR, McKee M (2011) Forecasting daily potential evapotranspiration using machine learning and limited climatic data. Agric Water Manag 98(4):553–562
61. Brown SD, Tauler R, Walczak B (2020) Comprehensive chemometrics: chemical and biochemical data analysis. Elsevier
62. Van Klompenburg T, Kassahun A, Catal C (2020) Crop yield prediction using machine learning: A systematic literature review. Comput Electron Agric 177:105709
63. Cuzick J (1985) A Wilcoxon-type test for trend. Stat Med 4(1):87–90
64. Siegel S, Castellan N (1988) The Friedman two-way analysis of variance by ranks. Nonparametric statistics for the behavioral sciences, pp 174–184. [Link] 9781420036268.ch25
65. Srinivasan R, Lohith C (2017) Main study—detailed statistical analysis by multiple regression. In: Strategic marketing and innovation for Indian MSMEs. Springer, pp 69–92

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.