Received 8 March 2024, accepted 15 May 2024, date of publication 20 May 2024, date of current version 30 May 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3403491

JutePest-YOLO: A Deep Learning Network for Jute Pest Identification and Detection

SHUAI ZHANG¹, HENG WANG¹, CONG ZHANG², ZHENG LIU¹, YIMING JIANG¹, AND LEI YU¹
¹School of Mathematics and Computer, Wuhan Polytechnic University, Wuhan 430048, China
²School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430048, China

Corresponding author: Heng Wang ([email protected])

This work was supported in part by the Key Project of the Scientific Research Program of the Hubei Provincial Department of Education under Grant D20201601; in part by the Major Technical Innovation Projects of Hubei Province under Grant 2018ABA099; and in part by the Hubei Provincial Key Laboratory of Intelligent Robot under Grant HBIR202101.

© 2024 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

ABSTRACT In recent years, jute, an important natural fiber crop, has suffered increasingly from insect pests during production, causing serious harm to agricultural production. Especially in the field of crop pest identification with complex backgrounds, fuzzy features, and many small targets, the lack of datasets specifically for jute pests has severely limited the generalization ability of traditional pest identification models, and research on models dedicated to jute pest detection is still in its infancy. To solve this problem, we constructed a large-scale image dataset containing nine types of jute pests, which is highly targeted and can effectively support model training and evaluation. In this study, we developed a deep convolutional neural network model based on YOLOv7, namely JutePest-YOLO. The model optimizes the Backbone, Head, and loss functions of the baseline model and introduces the new ELAN-P module and a P6 detection layer, which effectively improve the model's ability to identify jute pests in complex backgrounds. The experimental results show that, compared with the baseline model, the Precision, Recall, and F1 score of the JutePest-YOLO model improved by 3.45%, 1.76%, and 2.58%, respectively; mAP@0.5 and mAP@0.5:0.95 improved by 2.24% and 3.25%; and the overall computation of the model (GFLOPs) was reduced by 16.05%. Compared to other advanced methods such as YOLOv8s, JutePest-YOLO achieves superior detection accuracy, with a Precision of 98.7% and a mAP@0.5 of 95.68%. As a result, JutePest-YOLO not only achieves a significant improvement in recognition accuracy but also optimizes computational efficiency. It is a high-performance, lightweight solution for jute pest detection.

INDEX TERMS Jute pest detection, YOLOv7, PConv, Wise-IoU, object detection, deep learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Alba Amato.

I. INTRODUCTION
Jute is a highly versatile natural fiber, widely used in the manufacture of a variety of environmentally friendly products, such as bags, handicrafts, textiles, clothing, etc. [1]. It is often seen as an ideal alternative to nylon and polypropylene because jute is not only durable and reusable but also poses minimal threat to human health and the natural environment [2]. In addition, the low cost of jute compared to synthetic fibers makes it an affordable material choice [3]. Jute in Bangladesh is known as the ''golden fiber'', not only because of its unique golden color but also because of its important contribution to the national economy. In China, jute also has a long history, and the importance of this agricultural product is self-evident [4]. One of the major challenges faced during jute production is the threat of pests. These pests not only have a serious impact on the growth of jute but also have a significant negative impact on overall yield and quality. For example, Indigo caterpillars feed heavily on jute leaves, resulting in stunted plant growth and, in severe cases, plant death. Jute semiloopers attack the tops and leaves of jute, making it difficult for the plant to flower and set seed, thus


directly affecting yield. Yellow mites affect jute by causing leaf spotting, wilting, and eventual defoliation, which not only affects the quality of the jute fiber but also reduces yield [5].

Apart from the above-mentioned pests, various other pests also pose a threat to jute production, such as root pests and larvae that colonize the soil and attack the root system of jute, affecting the plant's ability to absorb water and nutrients. The activities of these pests not only make jute production difficult but also increase the cost of plant protection and pest control for farmers. Therefore, effective identification and control of these pests are important to safeguard the yield and quality of jute.
In jute production, although the application of insecticides is a common and rapid method of pest control with significant cost-effectiveness, the effectiveness of most insecticides is limited to specific species of pests. Traditional visual inspection methods [6], while relying on specialized knowledge and experience, can easily lead to the misuse of insecticides and harm production, owing to similar pest symptoms and complex detection processes. Existing inspection methods either rely on complex hardware equipment or are difficult to deploy quickly in the field. Therefore, the development of an Artificial Intelligence (AI)-based real-time inspection technology, especially an efficient pest detection solution adapted to mobile devices, is important for real-time monitoring and effective pest control.

In recent years, deep learning methods have made significant progress in the field of crop pest detection, especially through two network models, YOLOv5s and YOLOv7. YOLOv5s, with its lightweight structure and efficient performance, performs well in small target detection and is suitable for fast detection in resource-constrained environments [7]. YOLOv7, on the other hand, has made an even greater breakthrough in pest detection accuracy and speed due to its more advanced feature extraction and target recognition capabilities. Both models demonstrate excellent recognition capabilities when dealing with complex crop backgrounds and pests of various scales, and YOLOv7, in particular, is widely regarded as the fastest and most accurate real-time object detector currently available [8].

FIGURE 1. Detection results of the current network models. In the first image, YOLOv5s produces only one detection box while YOLOv7 produces four; in the second image, YOLOv5s produces only three detection boxes while YOLOv7 produces four.

Figure 1 demonstrates the effectiveness of YOLOv5s and YOLOv7 in detecting small target pests. However, despite their effectiveness in general-purpose object detection, these two models still face certain challenges when confronted with small-target pests. These challenges mainly originate from feature ambiguity caused by the complexity of pest backgrounds and from the diversity of small-target pest species. These problems trigger misdetection and missed detection of small target pests, thus limiting the efficiency and accuracy of the models in jute pest identification applications.

To solve the above problems, we proposed a more efficient jute pest detection model, JutePest-YOLO, innovatively optimized based on YOLOv7 with special consideration for practicality on mobile devices. Our main innovations are as follows:

1) Optimize the model for mobile device compatibility: To better adapt to mobile devices, we replaced the 3 × 3 regular convolution in the ELAN module of the baseline model with PConv to form a new ELAN-P module, which not only reduces the computation and the number of memory accesses of the whole model but also improves its computational efficiency, so that the network can extract the features of jute pests more quickly.

2) Solve the feature ambiguity problem: To cope with the complexity of pest backgrounds, we optimized the Head part of the baseline model and added a new P6 detection layer, which extends the receptive field of the model and enhances its feature extraction capability on the original image, so that JutePest-YOLO can recognize ambiguous features in complex backgrounds more clearly.

3) Enhance the detection of small targets: Considering the diversity of small target pests, we improved the loss function of the model by abandoning the original CIoU loss and adopting WIoU, which effectively alleviates problems such as misdetection and missed detection of targets at all scales.

In addition, we constructed a large-scale image dataset containing nine types of jute pests, which not only provides an effective training and testing basis for the model but also is an important contribution to the research field of jute pest recognition.

The remainder of the paper is structured as follows: Section II describes the related work, Section III details the architecture of the JutePest-YOLO model, Section IV provides the results and analyses of the comparative experiments, the ablation experiments, and the visual presentation, and finally, Section V summarizes the results of the research.


II. RELATED WORK
In recent years, researchers have developed an increasing number of models using different Convolutional Neural Networks (CNNs), and this section highlights some recent noteworthy studies. Sourav and Wang [9] proposed a target detection model based on Transfer Learning (TL) and Deep Convolutional Neural Networks (DCNN), capable of identifying four groups of jute pests, Field cricket, Spilosoma obliqua, Jute stem weevil, and Yellow mite, with a final accuracy of 95% for the four pest categories. However, in general, the accuracy of a network may decrease as the number of categories increases. Studies on networks such as MobileNet [10], AlexNet [11], ShuffleNet [12], and GoogLeNet [13] all suggest that the richness of the dataset should be increased to improve the recognition rate of a model. Therefore, the number of categories in the dataset still needs to be greatly increased. Karim et al. [14] worked on the same dataset and proposed a deep CNN model called PestDetector for the classification of the jute pest population. Their model achieved an excellent 99.18% training accuracy and 99.00% validation accuracy; however, it could perform better on unseen pest test datasets. Li et al. [15] established a new large-scale image dataset of ten types of jute diseases and pests, which includes eight different diseases as well as two types of jute pests. They proposed a unique model, YOLO-JD, which integrates three new modules into its main architecture, the Sand Clock Feature Extraction Module (SCFEM), the Deep Sand Clock Feature Extraction Module (DSCFEM), and the Spatial Pyramid Pooling Module (SPPM), to extract image features efficiently and to detect multiple types of diseases and pests, as well as multiple instances of disease, in the same image. However, although YOLO-JD achieved an average mAP of 96.63% across all disease categories, it was not as effective for jute pest category recognition. To address these issues, Talukder et al. [16] prepared a jute pest dataset containing 17 categories with about 380 photographs per pest category and designed JutePestDetect, a jute pest detection model based on DenseNet201 and robust Transfer Learning (TL) built from several well-known pretrained models from previous studies, which achieved a surprising 99% accuracy. Despite the excellent accuracy of JutePestDetect on the homemade dataset, Talukder et al. did not evaluate the model on metrics such as mAP and FPS and lacked comparisons with other, newer models for jute pest identification. Moreover, the jute pest dataset they prepared was not targeted and lacked a description of the jute pest species.

In addition to pest identification in the field of jute, in other areas of crop pest identification, pest species identification also suffers from problems such as small targets being easily lost, dense distribution of pests, and low individual recognition rates. To further improve the efficiency of pest detection, Limei et al. [17] proposed an algorithm for pest species identification based on the YOLOv4 network, DF-YOLO; they introduced the DenseNet network into the YOLOv4 backbone network CSPDarknet53 to enhance the feature extraction capability of the model and improve the individual recognition rate of densely distributed targets, and used the focal loss function to mitigate the effect of sample imbalance on training and to optimize the mining of hard samples. The algorithm achieved 94.89% mAP after testing on a homemade pest dataset, 4.66% better than the previous YOLOv4. Xinming and Hong [18] compared the performance of two well-known target detection and classification models, YOLOv4 and YOLOv7, in detecting different leaf diseases. The performance comparison showed that both architectures were competitive in precision, F1 score, average precision, and recall, but the compound scaling and dynamic label assignment of YOLOv7 provided superior performance. In addition, several researchers have focused on defect identification in raw jute fibers, with Nageshkumar et al. [19] exploring methods to identify and classify fiber defects in this specific context.

Although researchers in various fields have utilized various deep learning algorithms and neural network models to achieve significant results in crop pest recognition and other target detection tasks, relatively few studies have addressed the recognition of geographically important insects, especially jute pests. Moreover, existing studies generally lack specialized image datasets for jute pest identification. Therefore, we have produced a dataset specialized for jute pests based on the report published by the Department of Agricultural Extension, Bangladesh [20], which identified a wide range of pests causing damage to large-scale jute production, using the pest species in the report as a reference.

Considering the problems of feature ambiguity due to complex pest backgrounds, misdetection and missed detection of small target pest species, and the generally large computational load of traditional models in the pest identification task, we proposed the JutePest-YOLO detection algorithm. The algorithm aims to break through the limitations in the field of jute pest identification and to provide an accurate, efficient, and convenient pest detection solution for jute growers.

III. METHODS
A. YOLOv7 DETECTION
The YOLO (You Only Look Once) family of algorithms is an efficient target detection framework that has undergone several iterations and optimizations since it was first proposed by Redmon et al. In July 2022, Wang et al. released its latest version, YOLOv7 [8]. The network architecture of YOLOv7, as shown in Figure 2, can be divided into four main components: the Input, the Backbone, the Neck, and the Head.


FIGURE 2. The overall structure of YOLOv7.

For the input part, the image undergoes a series of preprocessing stages, such as data augmentation, and is then fed into the backbone for feature extraction. Next, the extracted features are fused by the Neck, which generates features of different sizes by fusing the three feature layers extracted by the backbone network. Finally, these fused features are fed to the Head module, which outputs the prediction results.

The input layer of YOLOv7 subjects the input images to a series of data augmentation algorithms, including color dithering, normalization, random cropping, etc., designed to improve the network's data diversity and generalization performance. Subsequently, the augmented images are uniformly scaled to the default size (640 × 640 × 3) to meet the backbone network's input requirements.

The main responsibility of the backbone network is extracting feature information from images in preparation for subsequent feature fusion and target detection. The backbone network consists of three main components: the CBS, ELAN, and MPConv modules. The CBS module consists of a convolutional layer, a batch normalization layer, and an activation function layer, and performs feature extraction and channel number transformation. The ELAN module is an efficient layer aggregation network that enhances the learning capability of the network without destroying the original gradient path; in addition, it guides the computation of different groups of features to induce the network to learn richer and more diverse feature information. The MPConv module is mainly responsible for downsampling: it combines a maxpool downsampling branch with a convolutional downsampling branch to merge the feature maps obtained from the two downsampling methods. This fusion preserves as much feature information as possible without increasing the computational burden.

The neck module consists of an optimized SPPCSPC module and a Path Aggregation Feature Pyramid Network (PAFPN) for fusing feature maps of different sizes. The role of PAFPN is to retain the precise location information of the bottom levels and fully fuse it with the abstract semantic information of the top levels, achieving a complete fusion of semantic and location information across levels. This strategy further improves the model's localization accuracy for multi-sized targets, especially small targets in complex contexts.

In the detection head module, the channel numbers of the PAFPN output features are adjusted using the RepConv structure [21], and multi-scale target prediction is performed by convolution on the three different-sized feature map branches output from the neck module.
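Input scaling of this kind is usually implemented as a letterbox resize, which preserves aspect ratio and pads the remainder. The sketch below is a minimal illustration under stated assumptions (OpenCV available, three-channel images, the common YOLO gray padding value of 114), not YOLOv7's own implementation:

```python
import cv2
import numpy as np

def letterbox(img: np.ndarray, new_size: int = 640, pad_value: int = 114):
    """Scale an image to the 640 x 640 network input while preserving its
    aspect ratio, padding the remainder with a constant gray value."""
    h, w = img.shape[:2]
    scale = new_size / max(h, w)
    resized = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.full((new_size, new_size, 3), pad_value, dtype=img.dtype)
    top = (new_size - resized.shape[0]) // 2
    left = (new_size - resized.shape[1]) // 2
    canvas[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return canvas, scale, (left, top)  # scale/offsets let boxes be mapped back
```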


FIGURE 3. JutePest-YOLO structure diagram.

B. IMPROVED JUTE PEST IDENTIFICATION ALGORITHM: JUTEPEST-YOLO
Although the traditional YOLOv7 algorithm can satisfy general image recognition tasks, its detection of jute pests still needs improvement: most major false detections occur in scenes with small targets and blurred pest features. In this study, we proposed an improved deep learning model for jute pest detection, JutePest-YOLO. Its structure is shown in Figure 3.

First, we replaced all the ELAN modules of the baseline model with the ELAN-P module, in which all the 3 × 3 regular convolutions of the ELAN module are replaced with PConv. PConv applies regular convolution to a single subset of the input channels to extract spatial features; by doing so, both computational redundancy and the number of memory accesses are reduced.

Next, we added a new P6 detection layer in the Head part of the original network. The added P6 detection layer extends the receptive field of the model and enhances the model's ability to extract fuzzy features in complex backgrounds. This is of crucial significance for the accurate localization and identification of jute pests and can effectively solve the problems caused by the complex backgrounds of jute pests in this research field.

Finally, we improved the loss function of the model by abandoning the original CIoU loss, because it fails to effectively distinguish the differences between targets of different sizes when dealing with aspect ratios and is prone to missed detections and misdetections in small target detection. We therefore adopted WIoU v3 to optimize the loss function [22]. WIoU v3 adopts a dynamic non-monotonic mechanism and designs a reasonable gradient gain allocation strategy, which reduces the occurrence of large or harmful gradients from extreme samples. WIoU v3 better takes into account the target's size and positional information and effectively alleviates misdetection and missed detection of targets at all scales.

1) ELAN-P MODULE
The conventional ELAN module enables the network to learn more features and be more robust by controlling the shortest and longest gradient paths; its structure is shown in Figure 4. The ELAN module reaches a steady state when processing large-scale data or performing large-scale computations, regardless of the gradient path length and the number of computational modules. However, if more computational modules are stacked indefinitely, this stable state may be destroyed, reducing parameter utilization.


FIGURE 4. Structure diagram of the ELAN module.

The ELAN-P module proposed in this paper introduces the PConv convolution from the FasterNet network [23] to reduce the network's computation and improve its computational efficiency without destroying the original gradient path.

PConv applies regular convolution to a single subset of the input channels to extract spatial features and keeps the remaining channels unchanged. This partial convolution approach reduces computational redundancy and memory accesses, thus improving detection speed. In YOLOv7, there is a certain amount of redundant computation in the neural network structure, which results in more floating point operations (FLOPs) and thus increases the latency of the model. Equation (1) reveals the relationship between latency, FLOPs, and FLOPS:

$$ \mathrm{Latency} = \frac{\mathrm{FLOPs}}{\mathrm{FLOPS}} \tag{1} $$

Here, FLOPs represents the total number of floating point operations, and FLOPS represents the number of floating point operations per second; their ratio is a measure of computational latency. The FasterNet network increases FLOPS while effectively reducing FLOPs; as Equation (1) shows, this approach reduces latency and improves computation speed.

Depthwise Convolution (DWConv) is a commonly used method for convolutional optimization of backbone networks. Unlike conventional convolution, DWConv assigns a convolution kernel to each channel so that each channel is convolved by only one kernel, effectively reducing redundant computation and FLOPs. However, DWConv cannot simply replace conventional convolution, as this may degrade network accuracy. Typically, DWConv is therefore followed by Pointwise Convolution (PWConv) to improve accuracy. With this combined structure, to compensate for the loss of accuracy caused by DWConv, the number of channels of DWConv must be increased from c to c′ > c, more than the channel number c of regular convolution. However, this increases the number of memory accesses, which increases latency and decreases overall computational speed. The memory accesses of DWConv are given by Equation (2), where h and w represent the height and width of the feature map, c represents the number of channels, and k represents the convolution kernel size:

$$ h \times w \times 2c' + k^2 \times c' \approx h \times w \times 2c' \tag{2} $$

The memory access formula for regular convolution is as follows:

$$ h \times w \times 2c + k^2 \times c \approx h \times w \times 2c \tag{3} $$

Since c′ > c, the memory accesses of DWConv are evidently higher than those of regular convolution. A novel Partial Convolution (PConv) was proposed in FasterNet as a competitive alternative capable of reducing computational redundancy and the number of memory accesses. The design of PConv is shown in Figure 5.

FIGURE 5. Comparison of ordinary convolution, DWConv, and PConv convolution: (a) Structural diagram of ordinary convolution, (b) Structural diagram of DWConv convolution, (c) Structural diagram of PConv convolution.

In contrast to regular convolution and DWConv, PConv in FasterNet applies regular convolution to only a portion c_p of the input channels to extract spatial features, leaving the remaining channels unchanged. If the feature map is stored contiguously or periodically in memory, the first or last consecutive channels can represent the whole feature map. The FLOPs of PConv are given by Equation (4):

$$ h \times w \times k^2 \times c_p^2 \tag{4} $$

In the typical case of ratio r = c_p / c = 1/4, the FLOPs of PConv are only 1/16 of those of conventional convolution, a significant reduction. In addition, PConv also significantly reduces memory accesses compared with regular convolution, as shown in Equation (5):

$$ h \times w \times 2c_p + k^2 \times c_p^2 \approx h \times w \times 2c_p \tag{5} $$

When r = 1/4, PConv requires only 1/4 of the memory accesses of regular convolution.

PConv enables neural network models to pursue higher FLOPS while reducing the number of parameters and increasing the FPS. Based on PConv, we constructed the ELAN-P module. Each ELAN-P module consists of three CBS modules and four PConv modules; the structure of the whole module is shown in Figure 6.

FIGURE 6. Structure diagram of the ELAN-P.

The ELAN-P module is similar in structure to the conventional ELAN module, with two branches. The first branch passes through a 1 × 1 convolution module to change the number of channels. The second branch first changes the number of channels with a 1 × 1 convolutional module and then applies four 3 × 3 PConv convolutional modules for feature extraction. By replacing the original convolution modules with four 3 × 3 PConv convolutions, the computation of the entire module is greatly reduced, resulting in more efficient computation. Finally, the output is obtained by superimposing the four features. By using the ELAN-P module instead of the ELAN module in YOLOv7, we achieve more efficient spatial feature extraction thanks to the PConv convolution, reducing the amount of computation and increasing the computational efficiency of the network while keeping the original gradient paths intact. We expect this improvement to reduce redundant computations and memory accesses, significantly reducing FLOPs while boosting FLOPS.
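To make the construction concrete, here is a minimal PyTorch sketch of a PConv layer and an ELAN-P-style block under the assumptions above (r = c_p/c = 1/4; three CBS modules and four PConv modules). The exact tap points that are concatenated and the channel widths are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution (after FasterNet): apply a regular k x k
    convolution to only the first c_p = r * c channels, pass the rest through."""
    def __init__(self, channels: int, kernel_size: int = 3, ratio: float = 0.25):
        super().__init__()
        self.cp = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = torch.split(x, [self.cp, x.size(1) - self.cp], dim=1)
        return torch.cat((self.conv(head), tail), dim=1)

class ELANP(nn.Module):
    """ELAN-P sketch: two 1 x 1 CBS branches; the second branch is refined by
    a chain of four 3 x 3 PConv modules, and four feature groups are
    concatenated and fused, following the ELAN topology."""
    def __init__(self, c_in: int, c_hidden: int, c_out: int):
        super().__init__()
        cbs = lambda ci, co: nn.Sequential(
            nn.Conv2d(ci, co, 1, bias=False), nn.BatchNorm2d(co), nn.SiLU())
        self.branch1 = cbs(c_in, c_hidden)
        self.branch2 = cbs(c_in, c_hidden)
        self.pconvs = nn.ModuleList(PConv(c_hidden) for _ in range(4))
        self.fuse = cbs(4 * c_hidden, c_out)  # superimpose the four features

    def forward(self, x):
        y1, y2 = self.branch1(x), self.branch2(x)
        feats = [y1, y2]
        for i, p in enumerate(self.pconvs):
            y2 = p(y2)
            if i % 2 == 1:  # assumed tap points: every second PConv output
                feats.append(y2)
        return self.fuse(torch.cat(feats, dim=1))
```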

2) INTRODUCTION OF P6 DETECTION LAYER
In this study, we implemented a significant improvement and optimization of the YOLOv7 network model: a new P6 detection layer was added to the Head part of the original model, which extends the receptive field of the model, improves the network's ability to extract features from the original image, and enhances the recognition and detection of multi-scale targets. Its structure is shown in Figure 7.

The traditional Head module predicts the objectness, class, and box components mainly by taking the three detection layers P3, P4, and P5 output from the Neck part, adjusting their channel numbers with RepConv, and applying a 1 × 1 convolution. The introduction of the P6 detection layer significantly expands the analytical scope of the network, allowing the model to more effectively capture and understand large-scale blurred feature information in the image. It facilitates better capture and utilization of high-level semantic information by performing feature extraction and information integration at higher layers of the network.

The specific implementation of the new P6 detection layer is as follows. Firstly, the image is processed by the Backbone, which outputs three feature maps, C3, C4, and C5, with resolutions from largest to smallest. Next, the network processes C5 by reducing its channel number from 1024 to 512 through the SPPCSPC module, adjusts the resolution of C5 to the sizes of C4 and C3 through upsampling, and performs feature fusion to obtain the fused D4 and D3 feature maps. D3 is first adjusted in channel number by RepConv, and then a 1 × 1 convolution is used to predict the three parts of objectness, class, and bbox, which finally forms the P3 detection layer. Subsequently, D3 is downsampled to the resolutions of D4 and D5 and fused with them to obtain the fused M4 and M5 feature maps. M4 and M5 are adjusted in channel number by RepConv and predicted using 1 × 1 convolutions to form the P4 and P5 detection layers, respectively. Finally, we merge the D3 feature map with the M5 feature map following a downsampling process to form the M6 feature map; this is then subjected to channel adjustment via RepConv and a 1 × 1 convolution before prediction, culminating in the formation of the P6 detection layer.

The P6 detection layer fuses different levels of semantic information to enable the model to recognize ambiguous features in complex backgrounds more clearly.

FIGURE 7. Structure of the neck and head sections with the addition of the P6 detection layer.
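The following PyTorch sketch illustrates how such an extra P6 level could be wired up. The stride factors, channel widths, and the plain CBS block standing in for RepConv are illustrative assumptions (for nine classes and three anchors, the prediction tensor would have 3 × (5 + 9) = 42 channels per location):

```python
import torch
import torch.nn as nn

def conv_bn_silu(cin, cout, k=3, s=1):
    """CBS block: convolution + batch normalization + SiLU activation."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, s, k // 2, bias=False),
        nn.BatchNorm2d(cout),
        nn.SiLU(inplace=True),
    )

class P6Branch(nn.Module):
    """Sketch of the added P6 level: D3 (stride 8) is downsampled to stride 64,
    fused with a downsampled M5 (stride 32) to form M6, then predicted with a
    1 x 1 convolution. Channel sizes are assumptions, not the paper's values."""
    def __init__(self, c_d3=128, c_m5=512, c_mid=512, no=42):
        super().__init__()
        self.down_d3 = nn.Sequential(            # stride 8 -> 64
            conv_bn_silu(c_d3, c_mid, 3, 2),
            conv_bn_silu(c_mid, c_mid, 3, 2),
            conv_bn_silu(c_mid, c_mid, 3, 2),
        )
        self.down_m5 = conv_bn_silu(c_m5, c_mid, 3, 2)   # stride 32 -> 64
        self.fuse = conv_bn_silu(2 * c_mid, c_mid, 3, 1) # stands in for RepConv
        self.pred = nn.Conv2d(c_mid, no, 1)              # objectness + class + box

    def forward(self, d3, m5):
        m6 = self.fuse(torch.cat((self.down_d3(d3), self.down_m5(m5)), dim=1))
        return self.pred(m6)
```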

3) LOSS FUNCTION IMPROVEMENT
Object detection is one of the core problems of computer vision, and its effectiveness depends greatly on the loss function used [24]. In our proposed JutePest-YOLO model, we noticed that the detection accuracy for the jute mite species is low: targets of this species occupy few pixels in the image, i.e., the targets are small. The traditional YOLOv7 algorithm is less effective for jute mite pest detection, with missed and false detections occurring mainly in cases of small targets and blurred backgrounds. Improving the loss function is therefore key to improving the accuracy of small target detection.

Many current target detection algorithms use Intersection over Union (IoU) as the loss function because the intersection ratio represents the error between the prediction box and the ground-truth box, directly affecting the prediction effect. The higher the value of the loss function, the larger the error between the predicted and ground-truth boxes. In traditional IoU calculations, the IoU value of the predicted and actual bounding boxes is computed as the ratio of their intersection area to their total area. However, this traditional approach sometimes leads to sub-optimal results. For example, smaller targets are given less weight in the IoU calculation due to their smaller pixel base, which may cause the model to ignore these smaller targets due to bias.

The loss function used by the original YOLOv7 network is as follows:

$$ loss = loss_{loc} + loss_{conf} + loss_{cls} \tag{6} $$

where loss_{loc}, loss_{conf}, and loss_{cls} represent the localization loss, confidence loss, and classification loss, respectively. The confidence loss and classification loss are calculated using the cross-entropy loss function, and the localization loss is calculated using the CIoU loss function, shown in Equation (7):

$$ L_{CIoU} = 1 - IoU + \frac{\rho^2\!\left(b, b^{gt}\right)}{(c_w)^2 + (c_h)^2} + \frac{4}{\pi^2}\left(\tan^{-1}\frac{w^{gt}}{h^{gt}} - \tan^{-1}\frac{w}{h}\right)^2 \tag{7} $$

FIGURE 8. Schematic diagram of the CIoU loss function.

In Equation (7), IoU denotes the intersection ratio of the predicted and real boxes; some of the remaining parameters are shown in Figure 8. ρ represents the Euclidean distance between the center of the predicted bounding box and the center of the actual bounding box, where b is the coordinate of the center of the predicted bounding box and b^{gt} is the coordinate of the center of the actual bounding box. The terms c_w and c_h denote the width and height of the minimum enclosing rectangle (i.e., the smallest common external rectangle) of the predicted and actual bounding boxes. w^{gt} and h^{gt} are the width and height of the actual bounding box, while w and h are the width and height of the predicted bounding box.

The CIoU loss function considers the overlap between the predicted and real frames and introduces penalty terms for the distance between their center points and for the aspect ratio, further optimizing the loss function. However, CIoU does not consider that, after using the aspect ratio as a penalty factor, if the real frame and the predicted frame have the same aspect ratio but different widths and heights, the penalty term cannot reflect the real difference between the two frames. Therefore, in this study, we replace the CIoU loss with the WIoU v3 loss. The WIoU v3 loss places greater emphasis on the aspect ratio of the bounding boxes, the center distance, and the overlap area. It introduces a dynamic, non-monotonic focusing mechanism and devises a rational gradient gain allocation strategy, which reduces the occurrence of large or detrimental gradients from extreme samples, enhancing the model's performance in detecting targets of varying sizes and effectively reducing false negatives and false positives. Tong et al. [22] introduced three versions of WIoU: WIoU v1 is based on an attention-driven bounding box loss, while WIoU v2 and WIoU v3 add a focusing coefficient through the construction of gradient gains.

WIoU v1 introduces distance as a measure of attention. Reducing the penalty of the geometric metric when the target frame and prediction frame overlap within a certain range gives the model better generalization ability. The formulas for WIoU v1 are shown in Equations (8) and (9):

$$ L_{WIoUv1} = R_{WIoU}\, L_{IoU} = \exp\!\left(\frac{\left(x - x^{gt}\right)^2 + \left(y - y^{gt}\right)^2}{\left(W_g^2 + H_g^2\right)^{*}}\right) L_{IoU} \tag{8} $$

$$ L_{IoU} = 1 - IoU \tag{9} $$
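A direct reading of Equations (8) and (9) can be sketched as follows, assuming corner-format boxes, with the enclosing-box term detached from the gradient graph (the meaning of the superscript *):

```python
import torch

def wiou_v1(pred, target, eps=1e-7):
    """Minimal sketch of Eqs. (8)-(9); pred/target are (N, 4) tensors of
    (x1, y1, x2, y2) boxes. Returns the per-box WIoU v1 loss."""
    # Plain IoU from intersection and union areas
    inter_w = (torch.min(pred[:, 2], target[:, 2]) -
               torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) -
               torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    l_iou = 1.0 - iou                                        # Eq. (9)

    # Center distance over the enclosing box diagonal (detached), Eq. (8)
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) /
                       (wg ** 2 + hg ** 2 + eps).detach())
    return r_wiou * l_iou
```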


FIGURE 9. Schematic diagram of the WIoU loss function.

The weight of simple samples in the loss value is effectively reduced by constructing the monotonic focusing coefficient based on L*_{IoU} and applying it to WIoU v1, yielding WIoU v2. Considering that L_{IoU} decreases as model training proceeds, the gradient gain also decreases, which leads to slower convergence; the average value of L_{IoU} is therefore introduced to normalize L*_{IoU}. The formula for WIoU v2 is shown in Equation (10):

$$ L_{WIoUv2} = \left(\frac{L_{IoU}^{*}}{\overline{L_{IoU}}}\right)^{\gamma} L_{WIoUv1} \tag{10} $$

where γ is a hyperparameter.

WIoU v3 defines the outlier degree β to measure the quality of the anchor frame, constructs the non-monotonic focusing factor r based on β, and applies r to WIoU v1. The WIoU v3 equations are shown in Equations (11) to (13):

$$ L_{WIoUv3} = r \times L_{WIoUv1} \tag{11} $$

$$ r = \frac{\beta}{\delta\, \alpha^{\beta - \delta}} \tag{12} $$

$$ \beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \in [0, +\infty) \tag{13} $$

β denotes the degree of abnormality of a prediction frame; a smaller degree implies a higher-quality anchor frame. Therefore, using β to construct the non-monotonic focusing factor assigns smaller gradient gains to prediction frames with larger anomalies, effectively reducing the harmful gradients of low-quality training samples; α and δ are hyperparameters. The meanings of the other parameters are shown in Figure 9: x_p and y_p denote the coordinates of the prediction box, while x_{gt} and y_{gt} denote the coordinates of the Ground Truth; the corresponding W and H values denote the width and height of the two boxes, respectively.
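Given the per-sample L_IoU values and a running mean of L_IoU maintained over training, Equations (11)-(13) reduce to a few lines. The α and δ defaults below are illustrative assumptions, not values reported in this paper:

```python
import torch

def wiou_v3(l_wiou_v1, l_iou, l_iou_mean, alpha=1.9, delta=3.0):
    """Sketch of Eqs. (11)-(13): beta is the outlier degree of each prediction
    (detached from the graph), r is the non-monotonic gradient gain, and
    l_iou_mean is a running mean of L_IoU updated during training."""
    beta = l_iou.detach() / l_iou_mean            # Eq. (13), beta in [0, +inf)
    r = beta / (delta * alpha ** (beta - delta))  # Eq. (12)
    return r * l_wiou_v1                          # Eq. (11)
```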
flip, and translation rotation, etc., which are designed to
IV. EXPERIMENTAL RESULTS AND ANALYSIS
A. DATASET
1) DATASET CONSTRUCTION
In the field of target detection, model training using a single dataset containing multiple categories is usually considered to improve the recognition accuracy and training efficiency of the model. However, given the application scenario of the jute pest recognition task and the diversity of pest species, a model trained on a single dataset is less generalizable and does not perform as well in recognizing unseen or similar species as a model trained with multi-category datasets. Therefore, to address the problem of missing datasets for multiple species of jute pests, we obtained the other part of the dataset from the Baidu and Google image libraries. Through these sources, we collected images of nine types of pests that seriously damage jute plants: Black hairy, Field cricket, Indigo caterpillar, Jute semilooper, Jute stem girdler, Jute stem weevil, Leaf beetle, Pod borer, and Yellow mite. Some sample images are shown in Figure 10.

FIGURE 10. Some sample images from our jute diseases and pests data. (a) Black hairy. (b) Field cricket. (c) Indigo caterpillar. (d) Jute semilooper. (e) Jute stem girdler. (f) Jute stem weevil. (g) Leaf beetle. (h) Pod borer. (i) Yellow mite.

2) DATASET PREPROCESSING
To increase the training volume of the network model and to prevent overfitting and low generalization ability during training, we expanded the jute pest dataset from the original data using content and geometric data augmentation transformations. Geometric transformations modify image properties without changing the image content, such as RandomCrop, horizontal flip, and translation rotation; these are designed to simulate the appearance of pests under different viewpoints and locations to meet the challenge of target localization in complex backgrounds. Content transformations include color dithering, Gaussian blurring, etc. These transformations are used to simulate pest images under different lighting and environmental conditions to increase the model's adaptability to environmental changes, and certain transformations (e.g., Random Fogging, Gaussian Blurring) provide different textures and noise levels, which are essential to improve the model's robustness to the variations in image quality that may be encountered in real-world applications.

In summary, we included these data augmentation steps to comprehensively improve the model's ability to cope with diverse environments, as detailed in Figure 11. After data augmentation, we expanded the entire dataset to 3252 jute pest images. To reduce the impact of dataset division on the experiment, this study adopts a random division method, dividing the augmented dataset into a training set and a validation set at a ratio of 8:2. Subsequently, we annotated each image in the dataset with the ''LabelImg'' software to mark the real bounding boxes of the pests. All image sizes were standardized at the initial stage of the network and fixed to a uniform resolution of 640 × 640. Table 1 gives the details of the enhanced jute pest dataset.

TABLE 1. Jute pests dataset information.

FIGURE 11. Image samples in the data augmentation. (a) Original Image. (b) VerticalFlip. (c) HorizontalFlip. (d) RandomCrop. (e) ShiftScaleRotate. (f) HueSaturationValue. (g) PadIfNeeded. (h) RandomBrightnessContrast. (i) RandomFog. (j) Cutout. (k) GaussianBlur. (l) ColorJitter.
we give the details of the enhanced dataset of jute pests.

B. EXPERIMENTAL ENVIRONMENT AND PARAMETERS
In this paper, experiments were conducted on our homemade jute pest dataset under the following conditions: an Ubuntu server with a Xeon E5-2620 v4 CPU, 24 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of video memory; Python 3.8, PyTorch 1.13.0, and CUDA 11.7 constitute the programming environment. The initial learning rate for network training is set to 0.01, and the Adam optimizer is used to update the network parameters with a batch size of 8, a weight decay coefficient of 0.0005, a momentum of 0.937, and 500 epochs. To save time, the model is trained on the server and subsequently validated locally. The detailed environment configuration is shown in Table 2, and the training parameter settings are shown in Table 3.

TABLE 2. Environment configuration.

TABLE 3. Parameter settings.
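A sketch of the training configuration from Table 3 expressed in PyTorch; mapping the reported momentum of 0.937 to Adam's beta1 is an assumption on our part:

```python
import torch

# Hyperparameters as reported in the text / Table 3
hyperparameters = {
    "initial_lr": 0.01,
    "batch_size": 8,
    "weight_decay": 0.0005,
    "momentum": 0.937,   # interpreted here as Adam's beta1 (an assumption)
    "epochs": 500,
    "input_size": 640,
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam optimizer configured with the reported training parameters."""
    return torch.optim.Adam(
        model.parameters(),
        lr=hyperparameters["initial_lr"],
        betas=(hyperparameters["momentum"], 0.999),
        weight_decay=hyperparameters["weight_decay"],
    )
```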

In this study, the change curve of the loss function during model training is shown in Figure 12, which shows that the improved JutePest-YOLO model is closer to the global optimum. In the early stage of model training (Epochs 1-100), the loss value decreases rapidly and shows a clear convergence trend; the model adapts quickly to the training data at this stage, and the loss drops significantly. During the next phase (Epochs 100-400), the decline of the loss function gradually slows and flattens, indicating that the model is approaching the convergence point and learning the main features of the data. The decelerating decline at this stage indicates that the model's parameter tuning has become more subtle and that more training iterations are needed to refine its performance. In the final stage of training (Epochs 400-500), the trend of the loss function indicates that the model has approached stability: the parameters are nearing their optimal state, and the performance has reached convergence.

FIGURE 12. Loss function curve during model training.

C. PERFORMANCE METRICS
For our jute pest dataset, each detected bounding box can be classified into one of four cases: True Positive (TP), True Negative (TN), False Positive (FP), or False Negative (FN); Precision and Recall comprehensively evaluate the results of these four classifications. The F1 value is the harmonic mean of Precision and Recall, providing a single metric when dealing with data imbalance problems and enabling the simultaneous consideration of model precision and recall. The calculation principles are shown in Equations (14)-(16).

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{14} $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{15} $$

$$ F1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16} $$
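Equations (14)-(16) translate directly into code, for example:

```python
def precision_recall_f1(tp: int, fp: int, fn: int, eps: float = 1e-9):
    """Direct implementation of Equations (14)-(16) from TP/FP/FN counts."""
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# Example with hypothetical counts: 95 TP, 4 FP, 6 FN
p, r, f1 = precision_recall_f1(95, 4, 6)
```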
This paper also evaluates the model's detection accuracy using mAP@0.5 and mAP@0.5:0.95 and measures the model's computational complexity using GFLOPs. Here, mAP (mean average precision) assesses the model's prediction accuracy at various recall levels through different IoU thresholds, reflecting the model's localization and classification ability. mAP@0.5 and mAP@0.95 are calculated at IoU thresholds of 0.5 and 0.95, respectively, while mAP@0.5:0.95 is obtained by averaging over IoU thresholds from 0.5 to 0.95 with a step size of 0.05. This evaluation criterion is more stringent and demonstrates the performance variation of the model at different IoU thresholds. The calculation formulas are shown in Equations (17)-(19).

$$ AP = \int_{0}^{1} P(r)\, dr \tag{17} $$

$$ mAP = \frac{1}{m} \sum \left( \frac{1}{n} \sum P(r) \right) \tag{18} $$

$$ mAP@0.5{:}0.95 = \frac{1}{10} \sum_{r=0.5}^{0.95} mAP@r \tag{19} $$

where m denotes the number of categories, n denotes the number of targets predicted in a single category, and P(r) denotes the precision value when the recall is r; mAP@r denotes the mAP value at a specific IoU threshold r.

D. COMPARISON WITH BASELINE MODEL
To demonstrate the effect of the improvements on detection performance, we conducted a comparison experiment between the improved model and the baseline model YOLOv7. Table 4 shows the detection metrics of both models, and Figure 13 shows the change curves of the detection metrics. The results indicate that, compared to YOLOv7, the improved JutePest-YOLO model improves Precision by 3.45% and Recall by 1.76%. It achieves a mAP@0.5 of 95.68% and a mAP@0.5:0.95 of 67.11%, increases of 2.24% and 3.25%, respectively. The GFLOPs decreased from 105.3 to 88.4, a reduction of 16.05%. The F1 score increased from 94.19% to 96.77%, an overall improvement of 2.58%. Accuracy improved in all categories, especially the P9 category, by 12.6%, which proves the effectiveness of the redesigned detection head in our improvement strategy: it allows the model to better capture and understand large-scale fuzzy feature information in the image and further improves the accuracy of target detection. The results show that the improved model has better detection performance in pest target identification.

Gradient-weighted Class Activation Mapping (Grad-CAM) [25] is one of the most commonly adopted techniques in computer vision for visualizing the convolutional feature maps of deep neural networks and generating heat maps, which in turn identify the model's regions of interest more accurately. The heatmap visually reflects which areas of the feature map the model focuses on. Figure 14 demonstrates the difference between the improved and baseline models in focusing on regions of interest for specific target categories; this difference further corroborates the effectiveness of our proposed JutePest-YOLO in detecting non-salient targets. We acquired the Grad-CAMs for both the YOLOv7 and JutePest-YOLO models and visualized the detection effectiveness on the nine categories of jute pests using heat maps generated by Grad-CAM. Compared to the baseline YOLOv7 model, our JutePest-YOLO model demonstrates an enhanced focus on relevant information, particularly in augmenting the perception of non-prominent objects. This clearly indicates its superior performance, further confirming our model's effectiveness in addressing issues related to complex pest backgrounds and the prevalence of small targets.

TABLE 4. Comparison of the proposed improved model and YOLOv7 detection accuracy. (The bold data in the table indicate the best results.)
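To make the Grad-CAM visualizations above concrete, the technique can be sketched in a few lines of PyTorch. This minimal version assumes a classification-style scalar score; for a detector such as JutePest-YOLO one would instead backpropagate a chosen objectness or class score from the detection head:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM sketch: hook a convolutional layer, weight its
    activations by the spatially pooled gradients of the chosen score, and
    return a normalized heat map the size of the input image."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0]))

    score = model(image)[0, class_idx]   # assumes a (1, num_classes) output
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()

    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # GAP over gradients
    cam = F.relu((weights * feats[0]).sum(dim=1))      # weighted activations
    cam = F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                        mode="bilinear")
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    return cam.squeeze()
```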

FIGURE 13. Comparison of model detection index change curves.

E. DIFFERENT LOSS FUNCTION COMPARISON
In the experiments of training the JutePest-YOLO network for jute pest detection, to verify the superiority of introducing WIoU, we conducted comparative experiments on the JutePest-YOLO network using WIoU and several mainstream loss functions, while keeping other training conditions consistent. Table 5 presents the experimental results, while Figure 15 compares the Precision, Recall, F1 score, mAP@0.5, and mAP@0.5:0.95 under the different loss functions.

The experimental data show that the model achieves the best mAP performance when WIoU v3 is used as the bounding box regression loss function: 1.13% higher than with the WIoU v1 loss function and 1.46% higher than with the default CIoU loss function, reaching a maximum of 95.68%. The F1 score improves by 1.75%, from 95.04% to 96.79%. Moreover, as shown in Figure 15, the JutePest-YOLO model using the WIoU v1 loss function outperforms the other loss functions in terms of recall rate and mAP@0.5:0.95. Therefore, we believe that introducing the WIoU v3 loss function as the bounding box loss function for the JutePest-YOLO model is an optimal choice.

F. ABLATION STUDY
To verify the effectiveness of the various improvement strategies of the JutePest-YOLO model proposed in this paper, we designed an ablation study on our jute pest dataset. The experiments were divided into six groups, and their results are displayed in Table 6. Group 1 gives the experimental results of the original YOLOv7 model; Groups 2 to 4 give the results after adding only one improvement method at a time to the original model, to verify the effectiveness of each improvement method; Group 5 gives the results after adding two improvement methods; and Group 6 is the finally obtained improved algorithm, JutePest-YOLO.

As shown in Table 6, the first group represents the original YOLOv7 model without any improvement modules, achieving an accuracy and mAP@0.5 of only 95.27% and 93.26%, respectively. In comparison to the original model, all models incorporating the three improvement methods demonstrated enhanced detection performance. The analysis of the experimental results is as follows:

In the second experimental group, the original model was augmented by introducing the WIoU v3 loss function. WIoU v3, by incorporating a dynamic, non-monotonic focusing mechanism, effectively reduces the occurrence of large or detrimental gradients from extreme samples. This enhancement resulted in an increase of 1.31% in mAP@0.5 and 1.29% in mAP@0.5:0.95.

In the third set of experiments, the addition of the P6 detection layer enabled the model to more effectively capture large-scale, blurred feature information in complex background images. Consequently, this improvement led to a 2.22% increase in accuracy and a 1.43% increase in mAP@0.5, while mAP@0.5:0.95 was enhanced by 3.67%, reaching 67.53%.

FIGURE 14. Heatmaps of different models on all categories. (YOLOv7 on the left, JutePest-YOLO on the right.)

TABLE 5. Comparison of detection results for different loss functions introduced by JutePest-YOLO.


TABLE 6. Comparison of ablation experiments on each module of the JutePest-YOLO model. A check mark indicates that the corresponding improvement strategy was used.

Group 4 improved the ELAN module of the original YOLOv7 model; the new ELAN-P module introduced the more efficient PConv into the original module. With the ELAN-P module, the model effectively reduces redundant computations and memory accesses and significantly reduces FLOPs, so that the GFLOPs fall from 105.3 to 85.0, a reduction of 19.3%.

In the fifth group of experiments, the P6 detection layer was introduced on the basis of the fourth group. Compared to the original model, this resulted in a 16.05% reduction in GFLOPs, while Precision and mAP@0.5 were enhanced by 2.46% and 1.96%, respectively.

FIGURE 15. Comparison of Precision, Recall, mAP@0.5, and mAP@0.5:0.95 under different loss functions.

The sixth experimental group integrated all the improvement methods, resulting in the proposed JutePest-YOLO model. Compared to the original model, the improved JutePest-YOLO model showed enhancements in Precision, Recall, F1 score, mAP@0.5, and mAP@0.5:0.95 of 3.45%, 1.76%, 2.58%, 2.24%, and 3.25%, respectively, while the overall GFLOPs decreased by 16.05%.

The experimental results show that the improvement strategies proposed in this paper are effective. The improved model not only enhances accuracy but also improves the detection of small targets and blurred pest features, and it significantly increases the operational efficiency of the model, achieving the best comprehensive performance in the task of jute pest identification and detection. In addition, we visualized the results of the six groups of experiments with heatmaps generated via Grad-CAM; Figure 16 shows the results.

The darker the color in a heatmap, the more salient the target area, and the more localized it is, the more clearly the important feature regions are highlighted. From the figure, we can see that the regions of interest in the heatmaps generated after adding the improved methods are all enlarged; in particular, the JutePest-YOLO model with all the methods introduced highlights the important regions in the image more clearly, which once again proves that the overall detection performance of the improved models is better.

G. COMPARATIVE EXPERIMENTS
To demonstrate the superiority and effectiveness of the JutePest-YOLO network in jute pest detection, we conducted comparison experiments among the improved model, classical models, and recently released models on this paper's dataset. The comparison results are shown in Table 7, and the metrics of the different models are compared in Figure 17.

The experimental results show that the JutePest-YOLO model achieved 98.7% on the Precision metric, the highest of all the models listed. The closest is the YOLOv7x model, but its Precision of 96.9% is still lower than that of JutePest-YOLO. This implies that the JutePest-YOLO model performs well in reducing the number of false positives and has strong accuracy. Regarding the Recall metric, the JutePest-YOLO model achieves 94.9%, second only to the YOLO-JD model's 95.0%. The high Recall indicates that the JutePest-YOLO model identifies target objects well and reduces missed detections. The F1 score is the harmonic mean of Precision and Recall, and the JutePest-YOLO model scored 96.7% on this metric, the highest of all models, showing a good balance between Precision and Recall. In terms of mAP@0.5, the JutePest-YOLO model outperforms all other models with a score of 95.6%, highlighting its ability to maintain high detection accuracy at higher IoU thresholds. On the mAP@0.5:0.95 metric, the JutePest-YOLO model scores 67.1%; while not the highest among all models, it still surpasses the majority, such as YOLOX, YOLO-JD, and CAP-YOLOv7. This indicates that the detection performance of the JutePest-YOLO model remains relatively stable across different IoU thresholds.

In summary, the JutePest-YOLO model demonstrates strong advantages across almost all evaluation metrics, particularly Precision, F1 score, and mAP@0.5. This highlights the effectiveness of the proposed improvement strategies, demonstrating their ability to enhance the model's recognition of complex backgrounds and targets of different scales. Additionally, its performance on the mAP@0.5:0.95 metric is commendable, showcasing stability across IoU thresholds. This set of comparative experiments fully demonstrates the superiority of the JutePest-YOLO model and emphasizes its practical value in real-world applications.

H. GENERALIZATION STUDIES
To validate the generality and performance of our proposed JutePest-YOLO model on different datasets, we conducted a generalization experiment on another jute pest dataset and compared our model with other mainstream target detection models. This dataset comes from the paper of Sourav et al. [9] and contains images of four categories of jute pests (Field cricket, Spilosoma obliqua, Jute stem weevil, and Yellow mite), denoted by D1, D2, D3, and D4, respectively. The experimental results are shown in Table 8.

From the experimental results, it is obvious that our JutePest-YOLO model achieves 97.1% in Precision, significantly better than the other models, which indicates that our model can accurately identify jute pest targets and reduces the possibility of misdetection. Meanwhile, the JutePest-YOLO model also achieves excellent performance in Recall and F1 score, reaching 93.4% and 95.21%, respectively, which verifies the superiority and generalization ability of the model.

H. GENERALIZATION STUDIES

To validate the generality and performance of the proposed JutePest-YOLO model on different datasets, we conducted a generalization experiment on another jute pest dataset and compared our model with other mainstream target detection models. This dataset is taken from the paper of Sourav et al. [9] and contains images of four categories of jute pests (Field cricket, Spilosoma obliqua, Jute stem weevil, and Yellow mite), denoted by D1, D2, D3, and D4, respectively. The experimental results are shown in Table 8 below.

TABLE 8. Generalization experiment.

From the experimental results, it is clear that our JutePest-YOLO model achieves 97.1% in Precision, significantly better than the other models, which indicates that our model can accurately identify jute pest targets and reduces the possibility of misdetection. Meanwhile, the JutePest-YOLO model also achieved excellent performance in Recall and F1 score, reaching 93.4% and 95.21%, respectively, which verifies the superiority and generalization ability of the model. In the per-category average precision (AP) evaluation, the JutePest-YOLO model achieves excellent detection results on all four categories; on D1, D2, and D3 in particular, the AP values exceed 98.5%. This indicates that the JutePest-YOLO model can handle the multi-category target detection task well, with strong generalization ability.

On the [email protected] metric, our model achieved a score of 88.9%, demonstrating excellent performance. This further confirms that the JutePest-YOLO model excels not only in general object detection tasks but also maintains high-precision detection at lower IoU thresholds. On the [email protected]:0.95 metric, our model achieved a score of 64.4%. Compared to the other models, ours shows advantages across almost all evaluation metrics. In particular, compared to the RetinaNet model, it improves [email protected] by 15.5 percentage points and [email protected]:0.95 by 21.8 percentage points. Compared with the other YOLO-series models, the JutePest-YOLO model also exhibits clear advantages, indicating that our enhancements bring significant improvements.
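Since [email protected] and [email protected]:0.95 anchor most of this comparison, a brief sketch of the single-class average precision underlying them may help; the scores, match flags, and the helper name average_precision below are hypothetical stand-ins, not our experimental data or evaluation code.

```python
# Minimal sketch of average precision (AP) for one class from ranked detections.
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """scores: confidence per detection; is_tp: 1 if the detection matched a
    ground-truth box at the chosen IoU threshold (e.g. 0.5); n_gt: GT box count."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = tp_cum / max(n_gt, 1)
    precision = tp_cum / (tp_cum + fp_cum)
    # All-point interpolation: area under the monotone precision-recall envelope.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    i = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[i + 1] - r[i]) * p[i + 1]))

# [email protected] averages this AP over all classes at IoU 0.5; [email protected]:0.95 additionally
# averages over the IoU thresholds 0.5, 0.55, ..., 0.95.
ap = average_precision([0.9, 0.8, 0.6], [1, 0, 1], n_gt=3)
```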


In summary, as verified by the generalization experiment on this jute pest dataset, our JutePest-YOLO model achieves excellent performance on all evaluation metrics and has significant advantages over other mainstream target detection models, especially in terms of precision, recall, per-category detection quality, and mAP. These results fully demonstrate the generalization ability of our model and its wide applicability in practical applications.

I. VISUAL ANALYSIS

To show the detection effect of the proposed model more intuitively, a confusion matrix was employed to compare the model's performance before and after the improvements. In this experiment, the confusion matrix is primarily used to assess the performance of the JutePest-YOLO detection algorithm. Presented in a two-dimensional table format, the rows represent actual categories while the columns represent predicted categories. By tallying the prediction results across the different categories, various metrics such as accuracy, recall rate, and false positive rate can be determined.
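As a minimal sketch of this bookkeeping, assuming hypothetical class labels rather than our evaluation code (the dedicated FN/FP bins of a detection confusion matrix are handled analogously, as extra rows and columns for unmatched boxes):

```python
# Confusion matrix with per-class precision/recall derived from it.
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):   # rows = actual, columns = predicted
        cm[t, p] += 1
    return cm

cm = confusion_matrix([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], n_classes=3)
tp = np.diag(cm).astype(float)
precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums = predicted counts
recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums = actual counts
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-8)
```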

Darker blocks on the diagonal of the confusion matrix indicate high accuracy of the model's detection results; values off the diagonal represent misclassifications and should be as low as possible for a model with high accuracy and a low false-alarm rate. It is evident that the YOLOv7 network has lighter blocks on the diagonal for the Yellow mite category, with a precision of 39%, and shows blocks for all categories among the FN and FP samples. This implies that the model has a certain error rate in detecting the various categories of objects. By comparison, the confusion matrix of the JutePest-YOLO network exhibits a darker diagonal block for the P9 (Yellow mite) category, indicating an accuracy of 53%, while achieving a detection accuracy of 100% for most other categories. Additionally, only three categories show blocks among the FN (false negative) samples. Notably, P9 is a typical example of small-target pest infestation. Based on the comparison of these confusion matrices, it can therefore be concluded that the JutePest-YOLO model outperforms the original model in detecting objects of all categories. The comparison of the confusion matrices is displayed in Figure 18.

FIGURE 18. Comparison of confusion matrix results: (a) YOLOv7, (b) JutePest-YOLO.

To visually demonstrate the detection effect of our model, this study conducted inference experiments with YOLOv7 and JutePest-YOLO. We screened the images of the jute pest dataset and, for every category, tried to select images with complex backgrounds and many small targets as the inference data, then compared the detection results for several categories of pests.
Figure 19 shows the comparative detection results of the YOLOv7 and JutePest-YOLO models on jute pests. It can be observed that YOLOv7 has relatively poor detection performance, while JutePest-YOLO demonstrated the best detection performance. In the detection of the (i) category, Yellow mite, YOLOv7 produced three detection boxes, whereas JutePest-YOLO produced 14, identifying a large number of the visible Yellow mite targets in the image. Overall, JutePest-YOLO detects a wide range of jute pests quickly, accurately, and comprehensively, providing strong technical support for crop protection.

FIGURE 19. Detection results of YOLOv7.

V. CONCLUSION

In this study, a JutePest-YOLO model for jute pest detection with high detection accuracy and good effect was proposed, to address the feature ambiguity, misdetection, and omission caused by complex backgrounds and small target categories in pest recognition, and to satisfy the accuracy requirements of jute pest detection while keeping resource consumption in check. First, we replaced all the ELAN modules of the YOLOv7 model with the ELAN-P module, which substitutes PConv for every 3 × 3 regular convolution in the ELAN module. PConv applies a regular convolution to only a subset of the input channels as its way of extracting spatial features, which reduces the computational redundancy and memory accesses of the network while keeping the original gradient paths unchanged.
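The following is a minimal sketch of this partial convolution, along the lines of FasterNet [23], in which only a fixed fraction of the channels (one quarter here) is convolved and the remainder passes through untouched; the class name PConv and the ratio argument are illustrative, not our exact ELAN-P implementation.

```python
# PConv sketch: convolve only a channel subset, pass the rest through unchanged.
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, channels, ratio=4, kernel_size=3):
        super().__init__()
        self.conv_ch = channels // ratio              # e.g. 1/4 of the channels
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)  # untouched channels keep identity

y = PConv(64)(torch.randn(1, 64, 32, 32))             # shape preserved: (1, 64, 32, 32)
```

Skipping both the arithmetic and the memory traffic of the untouched channels is what yields the FLOPs and memory-access savings described above.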
Next, we added a new P6 detection layer, which enlarges the receptive field of the model and fuses different levels of semantic information, enabling the network to recognize fuzzy features against the background more clearly. Finally, we introduced the WIoU v3 loss function, whose dynamic sample allocation strategy effectively reduces the model's focus on extreme samples and improves overall performance.
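For completeness, a simplified sketch of the WIoU v3 regression loss along the lines of [22] is given below; the function name wiou_v3, the hyperparameters alpha and delta, and the running mean iou_mean of the IoU loss are implementation details assumed here, so it should be read as an illustration rather than our exact training code.

```python
# Simplified WIoU v3 sketch following [22]: distance attention times a
# non-monotonic focusing coefficient built from the outlier degree beta.
import torch

def wiou_v3(pred, target, iou_mean, alpha=1.9, delta=3.0):
    """pred/target: (N, 4) boxes as (x1, y1, x2, y2); iou_mean: running
    (e.g. exponential moving) average of the IoU loss over training."""
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    l_iou = 1.0 - iou
    # Distance attention R_WIoU over the smallest enclosing box (denominator detached).
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    enc_wh = (enc_rb - enc_lt).clamp(min=1e-7)
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    dist2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2
    r_wiou = torch.exp(dist2 / (enc_wh[:, 0] ** 2 + enc_wh[:, 1] ** 2).detach())
    # Outlier degree and focusing coefficient: extreme samples get down-weighted.
    beta = l_iou.detach() / (iou_mean + 1e-7)
    r = beta / (delta * alpha ** (beta - delta))
    return r * r_wiou * l_iou
```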

In addition, we constructed a large-scale image dataset containing nine types of jute pests, which not only provides an effective basis for training and testing the model but is also an important contribution to the research field of jute pest recognition. The experimental results showed that the average detection precision of the improved model increased by 3.45%, with a 12.6% precision gain on the small-target P9 category in particular; [email protected] and [email protected]:0.95 improved over YOLOv7 by 2.24% and 3.25%, respectively; and GFLOPs were reduced by 16.05%.

A limitation of the JutePest-YOLO model is that its parameter count and inference time are still too high, which restricts its applicability to target detection in other scenarios. In future work, we will carry out lightweight structural optimization of the JutePest-YOLO model so that it can be extended to target detection on other scene datasets or applied to the field of target tracking.
REFERENCES

[1] M. H. Saleem, S. Ali, M. Rehman, M. Hasanuzzaman, M. Rizwan, S. Irshad, F. Shafiq, M. Iqbal, B. M. Alharbi, T. S. Alnusaire, and S. H. Qari, "Jute: A potential candidate for phytoremediation of metals—A review," Plants, vol. 9, no. 2, p. 258, Feb. 2020, doi: 10.3390/plants9020258.
[2] J. Ferdous, M. Hossain, M. Alim, and M. Islam, "Effect of field duration on yield and yield attributes of tossa jute varieties at different agroecological zones," Bangladesh Agronomy J., vol. 22, no. 2, pp. 77–82, Jun. 2020, doi: 10.3329/baj.v22i2.47622.
[3] S. Akter, M. N. Sadekin, and N. Islam, "Jute and jute products of Bangladesh: Contributions and challenges," Asian Bus. Rev., vol. 10, no. 3, pp. 143–152, Aug. 2020, doi: 10.18034/abr.v10i3.480.
[4] S. Rahman, M. Kazal, I. Begum, and M. Alam, "Exploring the future potential of jute in Bangladesh," Agriculture, vol. 7, no. 12, p. 96, Nov. 2017, doi: 10.3390/agriculture7120096.
[5] V. R. Babu, G. Sivakumar, and S. Satpathy, "Characterization and field evaluation of Spilosoma obliqua nucleopolyhedrosis virus (SpobNPV) CRIJAF1 strain against jute hairy caterpillar, Spilosoma obliqua (Walker) infesting jute, Corchorus olitorius Linn," Egyptian J. Biol. Pest Control, vol. 33, no. 1, p. 8, Jan. 2023, doi: 10.1186/s41938-023-00654-7.
[6] K. Li, Q. H. Yang, H. J. Zhi, and J. Y. Gai, "Identification and distribution of soybean mosaic virus strains in southern China," Plant Disease, vol. 94, no. 3, pp. 351–357, Mar. 2010, doi: 10.1094/pdis-94-3-0351.
[7] F. Lei, F. Tang, and S. Li, "Underwater target detection algorithm based on improved YOLOv5," J. Mar. Sci. Eng., vol. 10, no. 3, p. 310, Feb. 2022, doi: 10.3390/jmse10030310.
[8] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 7464–7475.
[9] M. S. U. Sourav and H. Wang, "Intelligent identification of jute pests based on transfer learning and deep convolutional neural networks," Neural Process. Lett., vol. 55, no. 3, pp. 2193–2210, Jun. 2023, doi: 10.1007/s11063-022-10978-4.
[10] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, "MobileNetV2: Inverted residuals and linear bottlenecks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 4510–4520.
[11] W. Yu, K. Yang, Y. Bai, T. Xiao, H. Yao, and Y. Rui, "Visualizing and comparing AlexNet and VGG using deconvolutional layers," in Proc. 33rd Int. Conf. Mach. Learn., 2016, pp. 1–18.
[12] N. Ma, X. Zhang, H.-T. Zheng, and J. Sun, "ShuffleNet V2: Practical guidelines for efficient CNN architecture design," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 116–131.
[13] P. Tang, H. Wang, and S. Kwong, "G-MS2F: GoogLeNet based multi-stage feature fusion of deep CNN for scene recognition," Neurocomputing, vol. 225, pp. 188–197, Feb. 2017, doi: 10.1016/j.neucom.2016.11.023.
[14] D. Z. Karim, T. A. Bushra, and M. M. Saif, "PestDetector: A deep convolutional neural network to detect jute pests," in Proc. 4th Int. Conf. Sustain. Technol. Ind., Dec. 2022, pp. 1–6.
[15] D. Li, F. Ahmed, N. Wu, and A. I. Sethi, "YOLO-JD: A deep learning network for jute diseases and pests detection from images," Plants, vol. 11, no. 7, p. 937, Mar. 2022, doi: 10.3390/plants11070937.
[16] M. S. H. Talukder, M. R. Chowdhury, M. S. U. Sourav, A. A. Rakin, S. A. Shuvo, R. B. Sulaiman, M. S. Nipun, M. Islam, M. R. Islam, M. A. Islam, and Z. Haque, "JutePestDetect: An intelligent approach for jute pest identification using fine-tuned transfer learning," Smart Agricult. Technol., vol. 5, Oct. 2023, Art. no. 100279, doi: 10.1016/j.atech.2023.100279.
[17] L. Song, M. Liu, S. Liu, H. Wang, and J. Luo, "Pest species identification algorithm based on improved YOLOv4 network," Signal, Image Video Process., vol. 17, no. 6, pp. 3127–3134, Sep. 2023, doi: 10.1007/s11760-023-02534-x.
[18] W. Xinming and T. S. Hong, "Comparative study on leaf disease identification using YOLOv4 and YOLOv7 algorithm," AgBioForum, vol. 25, no. 1, pp. 58–67, Jun. 2023. [Online]. Available: https://hdl.handle.net/10355/95967
[19] T. Nageshkumar, P. Shrivastava, B. Saha, A. Subeesh, D. B. Shakyawar, G. Sardar, and J. Mandal, "Defects identification in raw jute fibre using convolutional neural network models," J. Textile Inst., vol. 115, no. 5, pp. 835–843, May 2024, doi: 10.1080/00405000.2023.2199489.
[20] (2019). Agricultural Extension Manual—DAE. [Online]. Available: https://dae.portal.gov.bd/sites/default/files/files/dae.portal.gov.bd/publications/38eaceb4_db27_48ff_83e1_8b45b01b6a79/Extension_Mannual_Chapt1.pdf
[21] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "RepVGG: Making VGG-style ConvNets great again," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13728–13737.
[22] Z. Tong, Y. Chen, Z. Xu, and R. Yu, "Wise-IoU: Bounding box regression loss with dynamic focusing mechanism," 2023, arXiv:2301.10051.
[23] J. Chen, S.-H. Kao, H. He, W. Zhuo, S. Wen, C.-H. Lee, and S.-H. G. Chan, "Run, don't walk: Chasing higher FLOPS for faster neural networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 12021–12031.
[24] Z. Jiang, Y. Guo, K. Jiang, M. Hu, and Z. Zhu, "Optimization of intelligent plant cultivation robot system in object detection," IEEE Sensors J., vol. 21, no. 17, pp. 19279–19288, Sep. 2021, doi: 10.1109/JSEN.2021.3077272.
[25] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618–626.
[26] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: A metric and a loss for bounding box regression," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 658–666.
[27] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: Faster and better learning for bounding box regression," in Proc. AAAI Conf. Artif. Intell., Apr. 2020, vol. 34, no. 7, pp. 12993–13000.
[28] Y.-F. Zhang, W. Ren, Z. Zhang, Z. Jia, L. Wang, and T. Tan, "Focal and efficient IOU loss for accurate bounding box regression," Neurocomputing, vol. 506, pp. 146–157, Sep. 2022, doi: 10.1016/j.neucom.2022.07.042.
[29] Z. Gevorgyan, "SIoU loss: More powerful learning for bounding box regression," 2022, arXiv:2205.12740.
[30] J. Wang, C. Xu, W. Yang, and L. Yu, "A normalized Gaussian Wasserstein distance for tiny object detection," 2021, arXiv:2110.13389.
[31] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[32] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO series in 2021," 2021, arXiv:2107.08430.
[33] H. Shi, W. Yang, D. Chen, and M. Wang, "CPA-YOLOv7: Contextual and pyramid attention-based improvement of YOLOv7 for drones scene target detection," J. Vis. Commun. Image Represent., vol. 97, Dec. 2023, Art. no. 103965, doi: 10.1016/j.jvcir.2023.103965.
[34] G. Wang, Y. Chen, P. An, H. Hong, J. Hu, and T. Huang, "UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios," Sensors, vol. 23, no. 16, p. 7190, Aug. 2023, doi: 10.3390/s23167190.

SHUAI ZHANG received the B.E. degree from Hubei Polytechnic University, Huangshi, China, in 2021. He is currently pursuing the M.S. degree in software engineering with Wuhan Polytechnic University, Wuhan. His research interest includes artificial intelligence technology and its application.


HENG WANG received the B.E. degree from the Huazhong University of Science and Technology, in 2006, and the Ph.D. degree in engineering from Wuhan University, in 2013. He is currently a Professor with the School of Mathematics and Computer Science, Wuhan Polytechnic University. He is also a Postdoctoral Research Fellow with Aalto University, Finland. His research interests include the perception characteristics of acoustic spatial parameters, artificial intelligence, and the application of 3D audio and video in virtual reality.

CONG ZHANG received the bachelor's degree in automation engineering from the Huazhong University of Science and Technology, in 1993, the master's degree in computer application technology from Wuhan University of Technology, in 1999, and the Ph.D. degree in computer application technology from Wuhan University, in 2010. He is currently a Professor with the School of Electrical and Electronic Engineering, Wuhan Polytechnic University. His research interests include multimedia signal processing, multimedia communication system theory and application, and pattern recognition.

ZHENG LIU received the B.E. degree from Wuhan Polytechnic University, Wuhan, China, in 2022, where he is currently pursuing the M.S. degree in software engineering. His research interests include music information retrieval, and artificial intelligence technology and its application.

YIMING JIANG received the B.E. degree from Wuhan Polytechnic University, Wuhan, China, in 2023, where he is currently pursuing the M.S. degree in software engineering. His research interests include music information retrieval, and artificial intelligence technology and its application.

LEI YU received the B.E. degree from Southwest Petroleum University, Chengdu, China, in 2023. She is currently pursuing the M.S. degree in software engineering with Wuhan Polytechnic University, Wuhan. Her research interest includes artificial intelligence technology and its application.
