JutePest-YOLO: A Deep Learning Network for Jute Pest Identification and Detection
ABSTRACT In recent years, jute, an important natural fiber crop, has suffered increasingly from insect pests during production, causing serious harm to agricultural output. Particularly in crop pest identification scenarios with complex backgrounds, fuzzy features, and many small targets, the lack of datasets dedicated to jute pests severely limits the generalization ability of traditional pest identification models, and research on models designed specifically for jute pest detection is still in its infancy. To address this problem, we constructed a large-scale image dataset containing nine types of jute pests; the dataset is highly targeted and can effectively support model training and evaluation. In this study, we developed a deep convolutional neural network model based on YOLOv7, namely JutePest-YOLO. The model optimizes the Backbone, Head, and loss function of the baseline model and introduces a new ELAN-P module and a P6 detection layer, which effectively improve the model's ability to identify jute pests against complex backgrounds. The experimental results show that, compared with the baseline model, the Precision, Recall, and F1 score of JutePest-YOLO improved by 3.45%, 1.76%, and 2.58%, respectively; mAP@0.5 and mAP@0.5:0.95 improved by 2.24% and 3.25%; and the model's overall computation (GFLOPs) was reduced by 16.05%. Compared with other advanced methods such as YOLOv8s, JutePest-YOLO achieved superior detection accuracy, with a precision of 98.7% and mAP@0.5 reaching 95.68%. As a result, JutePest-YOLO not only achieves a significant improvement in recognition accuracy but also optimizes computational efficiency, providing a high-performance, lightweight solution for jute pest detection.
INDEX TERMS Jute pest detection, YOLOv7, PConv, Wise-IoU, object detection, deep learning.
II. RELATED WORK
In recent years, researchers have developed an increasing number of models using different Convolutional Neural Networks (CNNs), and this section highlights some recent noteworthy studies. Sourav and Wang [9] proposed a target detection model based on Transfer Learning (TL) and Deep Convolutional Neural Networks (DCNN) capable of identifying four groups of jute pests, Field cricket, Spilosoma obliqua, Jute stem weevil, and Yellow mite, with a final identification accuracy of 95% across the four pest categories. In general, however, the accuracy of such networks may decrease as the number of categories increases. Studies based on networks such as MobileNet [10], AlexNet [11], ShuffleNet [12], and GoogLeNet [13] all suggest that the richness of the dataset should be increased to improve the recognition rate of the model; the number of pest categories covered by existing datasets therefore still needs to be greatly expanded. Karim et al. [14] worked on the same dataset and proposed a deep CNN model called PestDetector for classifying the jute pest population. Their model achieved an excellent 99.18% training accuracy and 99.00% validation accuracy; however, it could perform better on unseen pest test datasets. Li et al. [15] established a new large-scale image dataset of ten types of jute diseases and pests, comprising eight diseases and two jute pests. They proposed a unique model, YOLO-JD, which integrates three new modules, the Sand Clock Feature Extraction Module (SCFEM), the Deep Sand Clock Feature Extraction Module (DSCFEM), and the Spatial Pyramid Pooling Module (SPPM), into its main architecture to extract image features efficiently and to detect multiple types of diseases and pests in the same image, as well as to find multiple instances of a disease in the same image. However, while YOLO-JD achieved an average mAP of 96.63% over all disease categories, it was less effective for jute pest category recognition. To address these issues, Talukder et al. [16] prepared a jute pest dataset containing 17 categories with about 380 photographs per pest category and designed JutePestDetect, a jute pest detection model built on DenseNet201 and transfer learning (TL) from several well-known pretrained models in previous studies, which achieved a surprising 99% accuracy. Despite the excellent accuracy of JutePestDetect on their homemade dataset, Talukder et al. did not evaluate the model on metrics such as mAP and FPS and did not compare it with other, newer models for jute pest identification. Moreover, the jute pest dataset they prepared was not targeted and lacked a description of the jute pest species.
In addition to pest identification in jute, in other areas of crop pest identification we also observed problems such as small targets being easily lost, densely distributed pests, and low individual recognition rates. To further improve the efficiency of pest detection, Limei et al. [17] proposed DF-YOLO, a pest species identification algorithm based on the YOLOv4 network; they introduced the DenseNet network into the YOLOv4 backbone CSPDarknet53 to enhance the feature extraction capability of the model and improve the recognition rate of densely distributed individual targets, and used the focal loss function to mitigate the effect of sample imbalance on training and to optimize the mining of hard samples. The algorithm achieved 94.89% mAP on their homemade pest dataset, outperforming the unimproved YOLOv4 by 4.66%. Xinming and Hong [18] compared the performance of two well-known target detection and classification models, YOLOv4 and YOLOv7, in detecting different leaf diseases. The comparison showed that both architectures were competitive in precision, F1 score, average precision, and recall, but the compound scaling and dynamic label assignment of YOLOv7 provided superior performance. In addition, several researchers have focused on defect identification in raw jute fibers, with Nageshkumar et al. [19] exploring methods to identify and classify fiber defects in this specific context.
Although researchers in various fields have utilized various deep learning algorithms and neural network models to achieve significant results in crop pest recognition and other target detection tasks, relatively few studies have addressed the recognition of geographically important insects, especially jute pests. Moreover, existing studies generally lack specialized image datasets for jute pest identification. Therefore, we produced a dataset specialized for jute pests based on the report published by the Department of Agricultural Extension, Bangladesh [20], which identified a wide range of pests damaging large-scale jute production, using the pest species in the report as a reference.
Considering the problems of feature ambiguity caused by complex pest backgrounds, misdetection and missed detection of small-target pest species, and the generally large computational cost of traditional models in the pest identification task, we proposed the JutePest-YOLO detection algorithm. The algorithm aims to break through the limitations in the field of jute pest identification and to provide an accurate, efficient, and convenient pest detection solution for jute growers.
III. METHODS
A. YOLOv7 DETECTION
The YOLO (You Only Look Once) family of algorithms is an efficient target detection framework that has undergone several iterations and optimizations since it was first proposed by Redmon et al. In July 2022, Wang et al. released YOLOv7 [8]. The network architecture of YOLOv7, as shown in Figure 2, can be divided into four main components: the Input, the Backbone, the Neck, and the Head.
For the input part, the image undergoes a series of preprocessing stages, such as data augmentation, and is then fed into the backbone for feature extraction. Next, the extracted features are fused by the Neck, which generates features of different sizes by fusing the three feature layers extracted by the backbone network. Finally, these fused features are fed to the Head module, which outputs the prediction results.
The input layer of YOLOv7 subjects the input images to a series of data augmentation algorithms, including color dithering, normalization, random cropping, etc., designed to improve the network's data diversity and generalization performance. Subsequently, the augmented images are uniformly scaled to the default size (640 × 640 × 3) to meet the backbone network's input requirements.
The main responsibility of the backbone network lies in extracting feature information from images in preparation for subsequent feature fusion and target detection tasks. The backbone network consists of three main components: the CBS, ELAN, and MPConv modules. Specifically, the CBS module consists of a convolutional layer, a batch normalization layer, and an activation function layer, and its main tasks are feature extraction and channel-number transformation. The ELAN module is an efficient layer aggregation network that enhances the learning capability of the network without destroying the original gradient path. In addition, it guides the computation of different groups of features to induce the network to learn richer and more diverse feature information. Meanwhile, the MPConv module is mainly responsible for the downsampling operation; it combines a maxpool downsampling branch with a convolutional downsampling branch and merges the feature maps obtained from the two downsampling methods. This fusion preserves as much feature information as possible without increasing the computational burden.
The neck module consists of an optimized SPPCSPC module and a Path Aggregation Feature Pyramid Network (PAFPN) for fusing feature maps of different sizes. The role of PAFPN is to retain the precise location information of the bottom levels and fully fuse it with the abstract semantic information of the top levels, achieving a complete fusion of semantic and location information across levels. This strategy further improves the model's localization accuracy for multi-sized targets, especially small targets in complex contexts.
In the detection head module, the number of channels of the PAFPN output features is adjusted using the REPConv structure [21], and multi-scale target prediction is performed by convolution on the three different sizes of feature map branches output from the neck module.
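To make the input stage concrete, the following is a minimal sketch of the uniform-scaling step described above, assuming a common letterbox-style resize to 640 × 640 with constant gray padding; the padding value and interpolation choice are assumptions, since the exact resize strategy is not detailed here:

```python
import cv2
import numpy as np

def letterbox(image: np.ndarray, new_size: int = 640, pad_value: int = 114):
    """Resize to new_size x new_size preserving aspect ratio, padding the
    remainder with a constant gray value (a common YOLO convention)."""
    h, w = image.shape[:2]
    scale = min(new_size / h, new_size / w)  # uniform scaling factor
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))),
                         interpolation=cv2.INTER_LINEAR)
    canvas = np.full((new_size, new_size, 3), pad_value, dtype=image.dtype)
    top = (new_size - resized.shape[0]) // 2
    left = (new_size - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas, scale, (left, top)  # scale/offsets map boxes back to the original

# Example: img, s, offset = letterbox(cv2.imread("pest.jpg"))
```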
B. IMPROVED JUTE PEST IDENTIFICATION ALGORITHM: JUTEPEST-YOLO
Although the traditional YOLOv7 algorithm can satisfy general image recognition tasks, its detection of jute pests still needs improvement: most major false detections occur in scenes with small targets and blurred pest features. In this study, we proposed an improved deep learning model for jute pest detection, JutePest-YOLO. Its structure is shown in Figure 3.
First, we replaced all the ELAN modules of the baseline model with the ELAN-P module, in which all the 3 × 3 regular convolutions in the ELAN module are replaced with PConv. PConv applies a regular convolution to only a single subset of the input channels to extract spatial features, thereby reducing both computational redundancy and the number of memory accesses.
Next, we added a new P6 detection layer in the Head part of the original network. The added P6 detection layer extends the receptive field of the model and enhances its ability to extract fuzzy features in complex backgrounds. This is of crucial significance for the accurate localization and identification of jute pests and can effectively alleviate the problems caused by the complex backgrounds of jute pests in this research field.
Finally, we improved the loss function of the model by abandoning the original CIoU loss, which fails to effectively distinguish differences between targets of different sizes when handling aspect ratios and is prone to missed and false detections of small targets. We therefore adopted WIoU v3 to optimize the loss function [22]. WIoU v3 adopts a dynamic non-monotonic mechanism and designs a reasonable gradient gain allocation strategy, which reduces the occurrence of large or harmful gradients from extreme samples. WIoU v3 can better take the target's size and positional information into account and effectively alleviates misdetection and missed detection of targets at all scales.

1) ELAN-P MODULE
The conventional ELAN module enables the network to learn more features and be more robust by controlling the shortest and longest gradient paths; its structure is shown in Figure 4. The ELAN module reaches a steady state when processing large-scale data or performing large-scale computations, regardless of the gradient path length and the number of computational modules. However, if more computational modules are stacked indefinitely, this stable state may be destroyed, reducing parameter utilization. The ELAN-P module addresses this by replacing the 3 × 3 regular convolutions in ELAN with the more efficient PConv.
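As an illustration of the idea, here is a minimal PyTorch sketch of a PConv-style layer as described above: a regular 3 × 3 convolution applied to one subset of the channels while the remaining channels pass through untouched. This is a sketch under stated assumptions (e.g., the 1/4 partial ratio commonly used with PConv), not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve only the first `dim // n_div` channels
    and leave the rest untouched, reducing FLOPs and memory accesses."""
    def __init__(self, dim: int, n_div: int = 4):
        super().__init__()
        self.dim_conv = dim // n_div              # channels that get convolved
        self.dim_untouched = dim - self.dim_conv  # channels passed through as-is
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_untouched], dim=1)
        x1 = self.conv(x1)                 # spatial features from a channel subset
        return torch.cat((x1, x2), dim=1)  # concatenate back: shape is preserved

# Example: y = PConv(dim=128)(torch.randn(1, 128, 80, 80))  # y keeps the input shape
```

With a 1/4 partial ratio, only a quarter of the channels pass through the 3 × 3 convolution, so the layer performs roughly 1/16 of the FLOPs of a full 3 × 3 convolution at the same width, which is consistent in spirit with the FLOPs reduction reported in the ablation study.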
FIGURE 7. Structure of the neck and head sections with the addition of the P6 detection layer.
In Equation (7), IoU denotes the intersection ratio of the predicted and real boxes. Some of the remaining parameters involved are shown in Figure 8. ρ represents the Euclidean distance between the center of the predicted bounding box and the center of the actual bounding box, where b is the coordinate of the center of the predicted bounding box and bgt is the coordinate of the center of the actual bounding box. The terms cw and ch denote the width and height of the minimum enclosing rectangle (i.e., the smallest common external rectangle) of the predicted and actual bounding boxes. The wgt and hgt are the width and height of the actual bounding box, while w and h are the width and height of the predicted bounding box.

FIGURE 8. Schematic diagram of the CIoU loss function.

The CIoU loss function considers the overlap between the predicted and real frames. It introduces a penalty term for the distance between the center points of the predicted and real frames and for the aspect ratio to optimize the loss function further. However, CIoU does not consider that, after using the aspect ratio as a penalty factor in the loss function, if the real frame and the predicted frame have the same aspect ratio but different values of width and height, the penalty term cannot reflect the real difference between the two frames.
Therefore, in this study, we replace the CIoU loss with the WIoU v3 loss. The WIoU v3 loss places greater emphasis on the aspect ratio of bounding boxes, the center distance, and the overlap area. It introduces a dynamic, non-monotonic focusing mechanism and devises a rational gradient gain allocation strategy. This reduces the occurrence of large or detrimental gradients from extreme samples, enhancing the model's performance in detecting targets of varying sizes and effectively reducing false negatives and false positives. Tong et al. [22] introduced three versions of WIoU. WIoU v1 is based on an attention-driven bounding box loss, while WIoU v2 and WIoU v3 incorporate a focusing coefficient through the construction of gradient gains and algorithmic methods.
WIoU v1 introduced distance as a metric of attention. Reducing the penalty of the geometric metric when the object frame and the prediction frame overlap within a certain range gives the model better generalization ability. The formulas for calculating WIoU v1 are shown in Equation (8) and Equation (9):

$$L_{WIoUv1} = R_{WIoU}\, L_{IoU} = \exp\!\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^{*}}\right) L_{IoU} \tag{8}$$

$$L_{IoU} = 1 - IoU \tag{9}$$

where (x, y) and (xgt, ygt) are the center coordinates of the predicted and real boxes, Wg and Hg are the width and height of the minimum enclosing box, and the superscript * indicates that Wg and Hg are detached from the computational graph so that they do not produce gradients that hinder convergence.
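For readers who prefer code, the following is a minimal PyTorch sketch of Equations (8) and (9), assuming boxes in (x1, y1, x2, y2) format; the .detach() call implements the superscript *, cutting the enclosing-box term out of the gradient computation:

```python
import torch

def wiou_v1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """WIoU v1 loss for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    # Intersection and union for plain IoU
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou  # Equation (9)

    # Squared center distance between predicted and real boxes
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    dist2 = ((cp - ct) ** 2).sum(dim=1)

    # Smallest enclosing box; detached per the superscript * in Equation (8)
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    denom = (wg ** 2 + hg ** 2).detach() + eps

    r_wiou = torch.exp(dist2 / denom)  # attention term R_WIoU
    return r_wiou * l_iou              # per-sample Equation (8); .mean() to reduce
```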
FIGURE 11. Image samples in the data augmentation. (a) Original Image.
(b) VerticalFlip. (c) HorizontalFlip. (d) RandomCrop. (e) ShiftScaleRotate.
(f) HueSaturationValue. (g) PadIfNeeded. (h) RandomBrightnessContrast.
(i) RandomFog. (j) Cutout. (k) GaussianBlur. (l) ColorJitter.
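The augmentation names listed in Figure 11 match transforms from the Albumentations library, so the pipeline can plausibly be reproduced along the following lines. This is a sketch under that assumption; the probabilities and magnitudes are illustrative, not the authors' settings:

```python
import albumentations as A

# Illustrative pipeline mirroring the transforms named in Figure 11.
augment = A.Compose(
    [
        A.VerticalFlip(p=0.5),
        A.HorizontalFlip(p=0.5),
        A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.1, rotate_limit=15, p=0.5),
        A.HueSaturationValue(p=0.3),
        A.RandomBrightnessContrast(p=0.3),
        A.RandomFog(p=0.1),
        A.GaussianBlur(p=0.1),
        A.ColorJitter(p=0.3),
        A.CoarseDropout(p=0.1),  # modern equivalent of the Cutout transform in Figure 11
        A.PadIfNeeded(min_height=640, min_width=640),
        A.RandomCrop(height=640, width=640),
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_labels"]),
)

# Example: out = augment(image=img, bboxes=boxes, class_labels=labels)
```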
TABLE 2. Environment configuration.
The evaluation metrics AP, mAP, and mAP@0.5:0.95 are defined in Equations (17)-(19).

$$AP = \int_{0}^{1} P(r)\, dr \tag{17}$$

$$mAP = \frac{1}{m} \sum \frac{1}{n} \sum P(r) \tag{18}$$

$$mAP@0.5{:}0.95 = \frac{1}{10} \sum_{r=0.5}^{0.95} mAP@r \tag{19}$$
where m denotes the number of classification categories, n denotes the number of targets predicted in a single category, P(r) denotes the precision value when the recall is r, and mAP@r denotes the mAP value at a specific IoU threshold r.
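As a minimal illustration of Equation (19), given per-threshold mAP values (computed externally, e.g., by a COCO-style evaluator), the composite metric is simply their mean over the ten IoU thresholds:

```python
import numpy as np

def map_50_95(map_at_threshold) -> float:
    """Average mAP over IoU thresholds 0.5, 0.55, ..., 0.95 (Equation (19)).
    `map_at_threshold` maps an IoU threshold to an mAP value in [0, 1]."""
    thresholds = np.arange(0.5, 1.0, 0.05)  # ten thresholds
    return float(np.mean([map_at_threshold(t) for t in thresholds]))

# Example with a hypothetical evaluator object:
# score = map_50_95(lambda t: evaluator.compute_map(iou_threshold=t))
```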
TABLE 4. Comparison of the proposed improved model and YOLOv7 detection accuracy. (The bold data in the table indicate the best results.)
FIGURE 13. Comparison of model detection index change curves.

FIGURE 14. Heatmaps of different models on all categories. (YOLOv7 on the left, JutePest-YOLO on the right.)

clearly indicates its superior performance, further confirming our model's effectiveness in addressing issues related to complex pest backgrounds and the prevalence of small targets.

E. DIFFERENT LOSS FUNCTION COMPARISON
In the experiments of training the JutePest-YOLO network for jute pest detection, to verify the superiority of introducing WIoU v3, we conducted comparative experiments using WIoU v1 and several mainstream loss functions on the JutePest-YOLO network while keeping the other training conditions consistent. Table 5 presents the experimental results, while Figure 15 compares the Precision, Recall, F1 score, mAP@0.5, and mAP@0.5:0.95 under the different loss functions.

TABLE 5. Comparison of detection results for different loss functions introduced by JutePest-YOLO.

The experimental data show that the model achieves the best mAP performance when WIoU v3 is used as the bounding box regression loss function, which is 1.13% higher than using the WIoU v1 loss function, and 1.46%
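To make the dynamic, non-monotonic focusing mechanism concrete, the following sketch extends the WIoU v1 function from Section III with the v3 gradient gain as described by Tong et al. [22]: an outlier degree β compares each sample's (detached) IoU loss with a running mean, and the gain r = β / (δ·α^(β−δ)) downweights extreme samples. The hyperparameter values and the momentum are illustrative assumptions, not the authors' settings, and per-sample (unreduced) loss tensors are assumed:

```python
import torch

class WIoUv3Focusing:
    """Non-monotonic focusing coefficient for WIoU v3 (after Tong et al. [22]).
    Keeps a running mean of the IoU loss and downweights outlier samples."""
    def __init__(self, alpha: float = 1.9, delta: float = 3.0, momentum: float = 0.01):
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.running_mean = 1.0  # running mean of L_IoU

    def __call__(self, l_iou: torch.Tensor, l_wiou_v1: torch.Tensor) -> torch.Tensor:
        # Outlier degree beta: detached so the gain itself carries no gradient.
        beta = l_iou.detach() / self.running_mean
        gain = beta / (self.delta * self.alpha ** (beta - self.delta))
        # Update the running mean of the IoU loss.
        self.running_mean = ((1 - self.momentum) * self.running_mean
                             + self.momentum * float(l_iou.detach().mean()))
        return (gain * l_wiou_v1).mean()  # L_WIoUv3

# Example: loss = WIoUv3Focusing()(l_iou, wiou_v1_loss(pred, target))
```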
TABLE 6. Comparison of ablation experiments of each module in the JutePest-YOLO model; √ indicates that the improvement strategy was used.

F. ABLATION STUDY
To verify the effectiveness of the various improvement strategies of the JutePest-YOLO model proposed in this paper, we designed an ablation study on the jute pest dataset. The experiments were divided into six groups, and their results are displayed in Table 6. Group 1 is the experimental result of the original YOLOv7 model; Groups 2 to 4 are the results after adding only one improvement method at a time to the original model, to verify the effectiveness of each improvement method on the original algorithm; Group 5 is the experimental result after adding two improvement methods; and Group 6 is the finally obtained improved algorithm, JutePest-YOLO.
As shown in Table 6, the first group represents the original YOLOv7 model without any improvement modules, achieving an accuracy and mAP@0.5 of only 95.27% and 93.26%, respectively. In comparison with the original model, all models incorporating the three improvement methods demonstrated enhanced detection performance. The analysis of the experimental results is as follows:
In the second experimental group, the original model was augmented by introducing the WIoU v3 loss function. WIoU v3, by incorporating a dynamic, non-monotonic focusing mechanism, effectively reduces the occurrence of large or detrimental gradients from extreme samples. This enhancement resulted in an increase of 1.31% in mAP@0.5 and 1.29% in mAP@0.5:0.95.
In the third set of experiments, the addition of the P6 detection layer enabled the model to more effectively capture large-scale, blurred feature information in complex background images. Consequently, this improvement led to a 2.22% increase in accuracy and a 1.43% increase in mAP@0.5, while mAP@0.5:0.95 was enhanced by 3.67%, reaching 67.53%.
The Group 4 experiments improved the ELAN module of the original YOLOv7 model; the new ELAN-P module introduced the more efficient PConv into the original module. With the ELAN-P module, the model effectively reduces redundant computations and memory accesses and significantly reduces the FLOPs, so that the GFLOPs drop from 105.3 to 85.0, a reduction of 19.3%.
In the fifth group of experiments, the P6 detection layer was introduced on the basis of the fourth group. Compared to the original model, this resulted in a 16.05% reduction in GFLOPs, while Precision and mAP@0.5 were enhanced by 2.46% and 1.96%, respectively.
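FLOPs figures like those reported above can be reproduced with a profiler; a common choice in the YOLO ecosystem is the thop package. A minimal sketch, assuming a PyTorch model and the 640 × 640 input used in this study (thop reports multiply-accumulate operations, which are conventionally doubled to get FLOPs):

```python
import torch
from thop import profile

def count_gflops(model: torch.nn.Module, img_size: int = 640) -> float:
    """Profile a detection model and return GFLOPs for one forward pass."""
    dummy = torch.randn(1, 3, img_size, img_size)
    macs, params = profile(model, inputs=(dummy,), verbose=False)
    return 2 * macs / 1e9  # 1 MAC = 2 FLOPs by the usual convention

# Example: print(f"{count_gflops(my_yolov7_variant):.1f} GFLOPs")  # hypothetical model
```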
FIGURE 18. Comparison of confusion matrix results, (a) for YOLOv7, (b) for JutePest-YOLO.
In summary, as verified by the generalization experiment on the jute pest dataset, our JutePest-YOLO model achieves excellent performance in all evaluation metrics and has significant advantages over other mainstream target detection models, especially in terms of precision, recall, per-category detection effect, and mAP metrics. These results fully demonstrate the generalization ability of our model and its wide applicability in practical applications.

I. VISUAL ANALYSIS
To show the detection effect of the proposed model more intuitively, a confusion matrix was employed to compare the model's performance before and after the improvements. In this experiment, the confusion matrix is primarily used to assess the performance of the JutePest-YOLO detection algorithm. Presented in a two-dimensional table format, the rows represent actual categories while the columns represent predicted categories. By tallying the prediction results across the different categories, various metrics such as accuracy, recall rate, and false positive rate can be determined.
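For reference, a confusion matrix of this kind can be produced from per-image predictions and labels with scikit-learn; a minimal sketch, assuming class indices for the nine pest categories have already been extracted from matched detections (the matching step itself is model-specific and omitted here):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# y_true / y_pred: class indices of matched ground-truth boxes and detections,
# e.g. 0..8 for the nine jute pest categories (P1..P9).
y_true = np.array([0, 1, 8, 8, 2])  # illustrative values only
y_pred = np.array([0, 1, 8, 3, 2])

cm = confusion_matrix(y_true, y_pred, labels=list(range(9)))
# Row-normalize so the diagonal reads as per-category accuracy,
# matching the color-coded matrices compared in Figure 18.
cm_norm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
print(cm_norm.round(2))
```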
Darker colored blocks on the diagonal of the confusion matrix indicate high accuracy of the model's detection results; values off the diagonal represent misclassifications, and these values should be as low as possible to show the model's high accuracy and low false alarm rate. It is evident that the YOLOv7 network has lighter color blocks on the diagonal of the confusion matrix for the category Yellow mite, with a Precision of 39%, and shows color blocks for all categories in the FN and FP samples. This implies that the model has a certain error rate in detecting all categories of objects. By comparison, the confusion matrix of the JutePest-YOLO network exhibits a darker color on the diagonal for the P9 (Yellow mite) category, indicating an accuracy of 53%. Meanwhile, it achieves a detection accuracy of 100% for most other categories. Additionally, only three categories show color blocks in the case of FN (False Negative) samples. Notably, P9 represents a typical example of small-target pest infestation. Therefore, based on the comparison of these confusion matrices, it can be concluded that the JutePest-YOLO model outperformed the original model in detecting objects of all categories. The results of the comparison of the confusion matrices are displayed in Figure 18.
To visually demonstrate the detection effect of our model, this study conducted inference experiments using YOLOv7 and JutePest-YOLO. We screened the images of the jute pest dataset for this experiment; for every category we tried to select images with complex backgrounds and many small targets as the inference data, and we compared the detection results for several categories of pests.
Figure 19 shows the comparative results of the YOLOv7 and JutePest-YOLO models in detecting jute pests. It can be observed that YOLOv7 has relatively poor detection performance, while JutePest-YOLO demonstrated the best detection performance. In the detection of category (i), Yellow mite, YOLOv7 produced three detection boxes, whereas JutePest-YOLO produced 14 detection boxes, identifying a large number of the visible Yellow mite targets in the image. Overall, JutePest-YOLO was able to detect a wide range of jute pests quickly, accurately, and comprehensively, providing strong technical support for crop protection.

V. CONCLUSION
In this study, a JutePest-YOLO model for jute pest detection with high detection accuracy and good effect was proposed to solve the problems of feature ambiguity, misdetection, and missed detection caused by complex backgrounds and small target categories in the field of pest recognition, and to satisfy the accuracy and effectiveness requirements of target detection in jute pest scenes while keeping resource consumption in check. First, we replaced all the ELAN modules of the YOLOv7 model with the ELAN-P module, in which all the 3 × 3 regular convolutions in the ELAN module are replaced with PConv; PConv applies a regular convolution to only a single subset of the input channels to extract spatial features, which reduces the computational redundancy and memory accesses of the network while keeping the original gradient paths unchanged. Next, we added a new P6 detection layer, which extends the receptive field of the model and fuses different levels of semantic information, enabling the network to recognize fuzzy features against complex backgrounds more clearly. Finally, we introduced the WIoU v3 loss function, which incorporates a dynamic sample allocation strategy to effectively reduce the model's focus on extreme samples and improve the overall performance. In addition, we constructed a large-scale image dataset containing nine types of jute pests, which not only provided an effective training and testing basis for the model but is also an important contribution to the research field of jute pest recognition. The experimental results showed that the average detection accuracy of the improved model increased by 3.45%, with a 12.6% accuracy improvement in the small-target P9 category in particular; mAP@0.5 and mAP@0.5:0.95 improved over YOLOv7 by 2.24% and 3.25%, respectively; and the GFLOPs were reduced by 16.05%.
The limitation of the JutePest-YOLO model is that its parameter count and inference latency are still too high, which restricts its applicability to target detection in other scenarios. In follow-up research, we will carry out lightweight structural optimization of the JutePest-YOLO model so that it can be extended to target detection on other scene datasets or applied to the field of target tracking.
HENG WANG received the B.E. degree from the Huazhong University of Science and Technology, in 2006, and the Ph.D. degree in engineering from Wuhan University, in 2013. He is currently a Professor with the School of Mathematics and Computer Science, Wuhan Polytechnic University. He is also a Postdoctoral Research Fellow with Aalto University, Finland. His research interests include the perception characteristics of acoustic spatial parameters, artificial intelligence, and the application of 3D audio and video in virtual reality.

YIMING JIANG received the B.E. degree from Wuhan Polytechnic University, Wuhan, China, in 2023, where he is currently pursuing the M.S. degree in software engineering. His research interests include music information retrieval, and artificial intelligence technology and its application.
ZHENG LIU received the B.E. degree from Wuhan Polytechnic University, Wuhan, China, in 2022, where he is currently pursuing the M.S. degree in software engineering. His research interests include music information retrieval, and artificial intelligence technology and its application.

LEI YU received the B.E. degree from Southwest Petroleum University, Chengdu, China, in 2023. She is currently pursuing the M.S. degree in software engineering with Wuhan Polytechnic University, Wuhan. Her research interest includes artificial intelligence technology and its application.