
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 14, 2021

CNN-Based Target Detection and Classification When Sparse SAR Image Dataset Is Available

Hui Bi, Member, IEEE, Jiarui Deng, Tianwen Yang, Jian Wang, and Ling Wang, Member, IEEE

Abstract—Synthetic aperture radar (SAR) is an earth observation technology that can obtain high-resolution images in all-weather and all-time conditions, and hence has been widely used in civil and military applications. SAR target detection and classification are the key processes for extracting detailed feature information of the targets of interest. Compared with the traditional matched filtering (MF) recovered result, a sparse SAR image has lower sidelobes, noise, and clutter, and thus will theoretically have better performance in target detection and classification. In this article, we propose a novel sparse SAR image based target detection and classification framework. This framework first obtains the sparse SAR image dataset by complex approximate message passing (CAMP), which is an L1-norm regularization sparse imaging method. Different from other regularization recovery algorithms, CAMP can output not only a sparse solution, but also a nonsparse estimation of the considered scene that well preserves the statistical characteristics of the image while protruding the target. Then, we detect and classify the targets by using convolutional neural network based technologies on the sparse SAR image datasets constructed from the sparse and nonsparse solutions of CAMP, respectively. For clarity, these two kinds of sparse SAR image datasets are named DSp and DNsp. Experimental results show that under standard operating conditions, the proposed framework can obtain 92.60% and 99.29% mAP with Faster RCNN and YOLOv3 by using the DNsp sparse SAR image dataset. Under extended operating conditions, the mAP values of Faster RCNN and YOLOv3 are 95.69% and 89.91%, respectively. These values based on the DNsp dataset are much higher than the classification results based on the corresponding MF dataset.

Index Terms—Convolutional neural network (CNN), complex approximate message passing (CAMP), sparse synthetic aperture radar (SAR) image, target detection and classification.

Manuscript received December 1, 2020; revised April 14, 2021; accepted June 27, 2021. Date of publication June 30, 2021; date of current version July 15, 2021. This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant NE2020004, in part by the National Natural Science Foundation of China under Grant 61901213, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20190397, in part by the Aeronautical Science Foundation of China under Grant 201920052001, and in part by the Young Science and Technology Talent Support Project of Jiangsu Science and Technology Association. (Corresponding author: Hui Bi.)

Hui Bi, Jiarui Deng, Jian Wang, and Ling Wang are with the Key Laboratory of Radar Imaging and Microwave Photonics, Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China, and also with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China (e-mail: bihui@[Link]; djr_919@[Link]; 1528114249@[Link]; tulip_wling@[Link]).

Tianwen Yang is with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 211189, China, and also with the College of Electronics and Information Engineering, Southeast University, Nanjing 211189, China (e-mail: yangtianwen0524@[Link]).

Digital Object Identifier 10.1109/JSTARS.2021.3093645

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see [Link]

I. INTRODUCTION

As a kind of high-resolution earth observation technique, synthetic aperture radar (SAR) has all-time and all-weather surveillance ability, and has been widely used in many military and civilian fields [1], [2]. Target detection and classification are key fields of SAR application, which can extract image feature information, e.g., target position, shadow, and contour, and hence play an important role in military reconnaissance, social security, and resource exploration [3]–[6].

Traditional matched filtering (MF) based SAR imaging algorithms, such as the Range Doppler algorithm [7], [8] and the Chirp Scaling algorithm [9]–[11], need the echo data to satisfy the Shannon–Nyquist sampling theory in scene recovery [12]. Therefore, the amount of data required to obtain a high-resolution image greatly increases the load of data storage and processing, and dramatically increases the complexity of the radar system [13]. In the 1990s, sparse signal processing theory was proposed, which uses fewer samples than required by traditional sampling theory to reconstruct the original signal [14]. Then, in 2006, Donoho [15] and Candes [16], [17] proposed compressive sensing (CS), an important development of sparse signal processing. CS breaks the limitation of the Shannon–Nyquist theory and can achieve high-quality recovery of a sparse scene with a smaller amount of data. After introducing sparse signal processing into SAR imaging, sparse SAR imaging theory was formed. Compared with traditional SAR, sparse SAR imaging radar can decrease the system complexity and increase the swath width [13], and shows great application potential. However, typical regularization based sparse SAR imaging algorithms, e.g., iterative soft thresholding (IST) [18]–[20] and orthogonal matching pursuit (OMP) [21], [22], can only obtain a sparse estimation of the observed scene with a ruined background distribution. Although the sparse image has better performance than the MF based result, it also loses feature information of the target, which greatly reduces the accuracy of target detection and classification. To solve this problem, the complex approximate message passing (CAMP) algorithm was introduced to sparse SAR imaging [23]–[25]. Different from other regularization recovery algorithms, it outputs not only a sparse image, but also a nonsparse estimation of the considered scene with complete image statistical characteristics and improved quality [26]. Because of this advantage, the CAMP-based sparse SAR imaging method is used to acquire the two kinds of sparse SAR image datasets, constructed from the sparse and nonsparse solutions, respectively. For clarity, these two kinds of sparse SAR image datasets are named DSp and DNsp. Compared with DSp, DNsp provides more feature information for target detection and classification.

SAR automatic target detection methods are mainly divided into two types, i.e., template-based [27]–[29] and model-based methods [30]–[32]. The core of the template-based method is feature extraction and selection, which requires wide professional knowledge as a basis. Some hidden features may not be used effectively, which limits the detection performance. The core of the model-based method lies in the design of the target model, which relies heavily on the acquisition of target model information and requires time-consuming high-frequency electromagnetic calculation. Deep learning provides a new solution without artificial feature design and object modeling. In 2012, Hinton et al. [33] designed a deep convolutional neural network (CNN) named AlexNet. In the ImageNet Large Scale Visual Recognition Challenge [34], the Top-5 error ratio of AlexNet was just 17.0%, considerably better than the state of the art at the time. This made the CNN the most important tool in the field of target detection and classification. Meanwhile, it also attracted the attention of researchers in the field of radar image processing. CNN-based target detectors are usually divided into two types, one-stage object detectors [38]–[41] and two-stage object detectors [35]–[37]. The two-stage object detector first generates target candidate bounding boxes and then uses the target detection network to classify the candidate bounding boxes and perform border regression. The most representative two-stage object detectors are the RCNN series [35]–[37]. The one-stage object detector directly outputs the target coordinates and conditional probabilities of all classes. Its representative models are the Single Shot MultiBox Detector (SSD) [41] and the YOLO series [38]–[40]. Faster RCNN [37] and YOLOv3 [40] have the best performance among these two kinds of detectors, respectively, and hence are selected for SAR target detection and classification. Nowadays, several researchers have applied CNN-based methods to solve SAR target detection and classification problems. Dong et al. [42] proposed a modified Faster RCNN model and an SSD model with data augmentation to address the target recognition problem. Kang et al. [43] modified Faster RCNN with the traditional constant false alarm rate so as to better detect SAR targets. Wang et al. [44] designed a deep framework using multiple CNNs for feature-fused SAR target discrimination. However, all these works are based on the MF recovered SAR image. It is known that, compared with the MF-based image, the sparse SAR image has better quality with lower sidelobes and reduced noise and clutter. Thus, it is meaningful to study CNN-based target detection and classification techniques when a sparse SAR image dataset is available.

In this article, we propose a novel sparse SAR image based target detection and classification framework. This framework first obtains the sparse SAR image datasets DSp and DNsp by using the CAMP based sparse SAR imaging method. Then, it detects the targets by using two conventional CNN-based methods, Faster RCNN and YOLOv3, on the constructed sparse SAR image datasets. Experimental results based on MSTAR data show that, compared with the MF dataset and DSp composed of sparse SAR images with damaged statistical distribution, DNsp shows better performance in CNN based target detection and classification. When DNsp is available, under extended operating conditions (EOC), the mAP values of Faster RCNN and YOLOv3 are 95.69% and 88.21%, respectively. Under standard operating conditions (SOC), these values even reach 92.60% and 99.29%, which is a good result for a practical SAR target detection process.

The rest of this article is organized as follows. Section II introduces the CAMP-based sparse SAR imaging principles for echo data and complex image data, respectively. The target detection and classification models of Faster RCNN and YOLOv3 are described in Section III. Section IV shows the experimental results and performance analysis of SAR target detection and classification based on different datasets. Finally, Section V concludes this article.

II. CAMP-BASED SPARSE SAR IMAGING

In this section, the advantages of the CAMP algorithm in SAR imaging performance improvement and image statistical characteristics preservation are discussed. This is the precondition for the CNN-based target detection and classification presented later.

A. Sparse SAR Imaging From Echo Data

As discussed in [26], the one-dimensional (1-D) sparse SAR imaging model can be expressed as

y = Hx + n0    (1)

where y ∈ C^(M×1) and x ∈ C^(N×1) are the echo data and the backscattering coefficient of the considered scene, respectively, n0 ∈ C^(M×1) is the noise vector, and H ∈ C^(M×N) is the system measurement matrix, which represents the transmitted signal and the imaging geometry relationship between the radar and the surveillance area. According to CS theory [15], when x is sparse enough and H satisfies the RIP condition [17], the sparse scene can be recovered by solving

x̂ = arg min_x { (1/2) ||y − Hx||_2^2 + λ ||x||_1 }    (2)

where λ is the regularization parameter. After recovery, the 2-D backscattering coefficient X̂ of the considered scene can be obtained by reshaping x̂. For the Lasso problem in (2), the CAMP algorithm can be used for scene recovery. The detailed iterative procedures are listed in [26]. Different from other regularization recovery algorithms, CAMP can obtain not only the traditional sparse image x̂, but also a nonsparse estimation x̃ of the considered scene, which has improved image quality and a well preserved background statistical distribution compared with the MF based result. It is known that, compared with the conventional sparse SAR imaging technique via the model in (1), the MF-based method has better calculation efficiency. However, its recovered image usually suffers from serious noise and sidelobes, which affect the further application of the image. In addition, rather than the original echo, the available data are often the MF recovered SAR complex image, as in the MSTAR dataset used here. Therefore, in order to obtain a large number of sparse SAR images for further application, a complex image based sparse SAR imaging technique that enhances the MF dataset directly is essential.
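
To make the recovery step in (2) more tangible, the following sketch pairs complex soft-thresholding with an AMP-style residual update and returns both a sparse and a pre-threshold (nonsparse) estimate, mirroring the two CAMP outputs. It is only an illustrative simplification, not the authors' implementation: the threshold tau is fixed here, whereas CAMP adapts it through state evolution, and the Onsager correction is approximated (see [24], [26] for the exact iteration); the function names are assumptions.

```python
import numpy as np

def complex_soft_threshold(u, tau):
    # Complex soft-thresholding: shrink the magnitude, keep the phase.
    mag = np.abs(u)
    return np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0) * u

def camp_like_recovery(y, H, tau, n_iter=50):
    """Toy AMP-style iteration for the Lasso problem in (2).

    Returns both the sparse estimate x_hat and the pre-threshold
    (nonsparse) estimate x_tilde, mirroring the two CAMP outputs.
    """
    M, N = H.shape
    x_hat = np.zeros(N, dtype=complex)
    x_tilde = x_hat.copy()
    z = y.astype(complex).copy()                 # residual
    for _ in range(n_iter):
        x_tilde = x_hat + H.conj().T @ z         # nonsparse estimate
        x_hat = complex_soft_threshold(x_tilde, tau)
        # Simplified Onsager correction; the exact CAMP term and the
        # state-evolution-based threshold are given in [24], [26].
        onsager = (np.count_nonzero(x_hat) / M) * z
        z = y - H @ x_hat + onsager
    return x_hat, x_tilde
```

Reshaping x_hat and x_tilde to the scene grid then gives the 2-D estimates X̂ and X̃ discussed above.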

Fig. 1. Reconstructed images of simulated scene by different methods. (a) MF. (b) Sparse solution X̂ of CAMP-based method. (c) Nonsparse solution X̃ of CAMP-based method.

Fig. 2. Reconstructed images of considered scene by different methods. (a) MF. (b) Sparse solution X̂ of CAMP-based method. (c) Nonsparse solution X̃ of CAMP-based method.

TABLE I. TBR Values of Images Reconstructed by MF and CAMP-Based Methods.

B. Sparse SAR Imaging From Complex Image Data

According to [45], the complex image based sparse SAR imaging model can be written as

XMF = X + N    (3)

where X ∈ C^(NP×NQ) is the 2-D backscattering coefficient of the scene, with NP azimuth bins and NQ range bins, whose (p, q) entry is x(p, q), XMF is the known complex-valued MF recovered SAR image, and N ∈ C^(NP×NQ) is a complex matrix that denotes the difference between XMF and X, including sidelobes, noise, etc. Similar to sparse imaging from echo data, we can recover the scene of interest by solving the following Lasso problem:

X̂ = arg min_X { (1/2) ||XMF − X||_F^2 + λ ||X||_1 }.    (4)

The CAMP algorithm can also be used to solve the optimization problem in (4). The detailed iterative process is shown in [45]. Similar to the CAMP algorithm via echo data, the complex image based CAMP algorithm also outputs the sparse (X̂) and nonsparse (X̃) images of the considered scene, which have performance similar to the images recovered from echo data. The CAMP-based algorithm introduces a "state evolution" term to track the standard deviation of the effective "noise" as the iteration proceeds, and thus produces the sparse estimation X̂ and nonsparse estimation X̃ of the scene simultaneously [24]. Therefore, different from other regularization recovery algorithms, such as IST [18]–[20] and OMP [21], [22], CAMP can obtain a nonsparse solution with a background statistical distribution similar to that of the MF-based image.
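
One way to see why the complex-image model is convenient, offered here only as a hedged side observation and not as a step taken in the article: because the data-fidelity term in (4) compares X directly with XMF, the minimizer of (4) on its own reduces to elementwise complex soft-thresholding of the MF image, as sketched below with assumed names. The CAMP iteration of [45] goes further by also producing the nonsparse estimate X̃ with its preserved background statistics.

```python
import numpy as np

def lasso_identity_solution(X_mf, lam):
    # Elementwise complex soft-thresholding: the exact minimizer of (4)
    # when the data-fidelity term compares X directly with X_MF.
    mag = np.abs(X_mf)
    return np.maximum(1.0 - lam / np.maximum(mag, 1e-12), 0.0) * X_mf
```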

C. Verification

In the following, experiments based on the MSTAR dataset are used to verify the CAMP-based algorithm in terms of SAR imaging performance improvement and image statistical distribution preservation. Figs. 1 and 2 show the images recovered by the MF- and CAMP-based sparse SAR imaging methods, respectively. Compared with the MF-based image, it is seen that both the sparse and nonsparse solutions of CAMP have better image quality, which is very helpful for further SAR image applications. To evaluate the performance improvement quantitatively, the target-to-background ratio (TBR) is selected as the judging criterion, defined as [46]

TBR(X) = 20 log10 ( max_{(u,v)∈T} |X(u,v)| / ( (1/NB) Σ_{(u,v)∈B} |X(u,v)| ) )    (5)

where T is the target area, which is surrounded by the background region B, and NB is the number of pixels in the background region. Three targets indicated by the yellow rectangles in Fig. 2 are selected to calculate the TBR values (see Table I). Their zoom-in images are shown in Fig. 3. From Table I and Fig. 3, it is seen that the TBR values of both X̂ and X̃ of the selected targets are all above 50 dB, which shows better image quality than the MF recovered images. To demonstrate the ability of the CAMP-based sparse SAR imaging methods to preserve the image statistical distribution, the difference between the MF-based image in Fig. 1(a) and X̃ in Fig. 1(c) is calculated and shown in Fig. 4. From Fig. 4, it is seen that, compared with the MF-based result, X̃ has a similar background statistical distribution. However, it suppresses the amplitude by about 20 dB in the nontarget area, which means that X̃ has both better image performance and complete feature information of the target. This is very helpful for further SAR target detection and classification applications.

Fig. 3. Zoom-in plots of three selected targets in Fig. 2. From top to bottom rows: MF recovered images, sparse solutions of CAMP-based method, and nonsparse solutions of CAMP-based method. From left to right columns, the focused targets are Target1, Target2, and Target3, respectively.

Fig. 4. Amplitude deviation between X̃ and the MF-based image.

TABLE II. Data Description for SOC.
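
The TBR criterion in (5) is straightforward to reproduce from an image chip and a target mask. The helper below is an illustrative rendering of (5); the mask layout and the names are assumptions rather than details taken from the article.

```python
import numpy as np

def tbr_db(image, target_mask):
    """Target-to-background ratio of a SAR image chip, following (5).

    image: 2-D complex (or real) array covering the target and its
           surrounding background.
    target_mask: boolean array of the same shape; True marks the target
           area T, False marks the background region B.
    """
    mag = np.abs(image)
    peak_target = mag[target_mask].max()
    mean_background = mag[~target_mask].mean()   # (1/N_B) * sum over B
    return 20.0 * np.log10(peak_target / mean_background)
```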

III. CNN-BASED TARGET DETECTION AND CLASSIFICATION FRAMEWORK VIA SPARSE SAR IMAGE DATASET

A. Principle of CNN

A CNN is a deep feedforward neural network with excellent feature learning ability, which realizes the receptive field by convolution. There are three major characteristics of a CNN, i.e., locality of features based on local receptive fields, repeatability of features based on weight sharing, and the pooling operation in subsampled processing. These operations greatly decrease the number of parameters used for deep learning, and hence reduce the complexity of the network. This makes the CNN achieve better fault tolerance and robustness. Generally, as shown in Fig. 5, the typical structure of a CNN is composed of an input layer, convolution layers, pooling layers, fully-connected layers, and an output layer. The input layer is used to receive the input image data. In this article, the input is a single-channel grayscale SAR image. Thus, the input layer contains only one feature map, i.e., x^1 = {x_1^1}.

The convolutional layer simulates the response mechanism of human neurons to visual stimuli. The function of this layer is to perform a convolution operation on the input data to extract feature maps, and connect the result locally to the next layer. Generally, the more convolutional layers, the stronger the ability of the network to express features. The feature map of the convolutional layer can be described as

x_j^m = f( Σ_{i=1}^{N^{m−1}} G_{i,j}^m ( k_{i,j}^m ⊗ x_i^{m−1} ) + b_j^m )    (6)

where x_j^m is the jth feature map in the mth layer, N^{m−1} is the number of feature maps in the (m−1)th layer, G_{i,j}^m is the connection matrix between x_j^m and x_i^{m−1}, k_{i,j}^m and b_j^m are the convolution kernels and bias, respectively, and f(·) is the nonlinear activation function, which is usually set as tanh, sigmoid, ReLU, or SELU.

The pooling layer, also known as the subsampling layer, is usually located between successive convolutional layers. The main function of the pooling layer is to give the features a certain degree of spatial invariance, and to reduce the parameters and computations to avoid overfitting. The feature map of this layer can be expressed as

x_j^m = p(x_j^{m−1})    (7)

where p(·) is the pooling function. Common pooling functions are mean-pooling and max-pooling. In recent years, some CNN-based detectors have used convolutional layers with a stride greater than 1 instead of pooling layers, making pooling layers not a necessary part of a CNN.

The fully-connected layer is set after the feature extraction. Its function is to connect all the neurons in the previous layer with the neurons in the current layer, and then map the features according to the specific task of the output layer. The form of the output layer is determined by the specific task that the network needs to complete. If the convolutional neural network is used as a classifier, the output layer uses the softmax logistic regression model to output prediction vectors over all classes, i.e., y = (y1, y2, ..., yH)^T, where H represents the number of classes.
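
To make (6) and (7) concrete, the sketch below evaluates one convolutional layer and one max-pooling step on a small stack of feature maps. It is a plain NumPy/SciPy rendering of the two formulas for illustration, not the network trained in this article, and the function and argument names are assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(prev_maps, kernels, biases, G, f=np.tanh):
    """One convolutional layer following (6).

    prev_maps: list of 2-D arrays x_i^{m-1}
    kernels:   kernels[i][j] is the kernel k_{i,j}^m
    biases:    biases[j] is the scalar bias b_j^m
    G:         G[i][j] in {0, 1}, the connection matrix G_{i,j}^m
    f:         nonlinear activation (tanh, sigmoid, ReLU, ...)
    """
    n_out = len(biases)
    out = []
    for j in range(n_out):
        acc = np.zeros_like(prev_maps[0], dtype=float)
        for i, x_prev in enumerate(prev_maps):
            if G[i][j]:
                acc += convolve2d(x_prev, kernels[i][j], mode="same")
        out.append(f(acc + biases[j]))
    return out

def max_pool(x, s=2):
    """Max-pooling with window and stride s, following (7)."""
    h, w = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))
```

With stride-s max-pooling, an H×W map shrinks to roughly (H/s)×(W/s), which is the spatial-invariance and parameter-reduction effect described above.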

Fig. 5. Typical structure of CNN [33].

Fig. 6. Architecture of Faster RCNN with ZFNet [37].

Fig. 7. Architecture of RPN [43].

TABLE III. Comparison of Different Datasets on Faster RCNN Under SOC.

TABLE IV. Comparison of Different Datasets on YOLOv3 Under SOC.

TABLE V. Data Description for EOC.

B. Faster RCNN

Faster RCNN is a region-based two-stage target detection algorithm based on CNN. It first generates candidate regions, then classifies the candidate regions, and finally refines their locations. Faster RCNN consists of four parts: the convolution layers, the region proposal network (RPN), region of interest (ROI) pooling, and classification. The architecture of Faster RCNN with ZFNet is shown in Fig. 6, and its main steps can be summarized as follows [37].

1) The image data are input into the CNN to obtain the corresponding feature maps.

2) The feature maps are transmitted along two different paths, one to the RPN and the other forward to the ROI pooling stage.

3) The RPN calculates the region proposals from the feature maps, then performs nonmaximum suppression on the region proposals and outputs the score of each region proposal.

4) The Top-N ranked proposal regions from step 3 and the feature maps obtained in step 2 are passed to the ROI pooling layer to obtain the features corresponding to the region proposals.

5) The features of the region proposals are fed to the fully-connected layers. Then, the classification and regression results are output.

The main difference between Faster RCNN and other algorithms of the RCNN series is that the RPN is proposed in Faster RCNN to specifically recommend candidate regions, which realizes an end-to-end target detection framework. The RPN shares the full-image convolutional features with the entire network and thus makes region proposals almost free. The main idea of the RPN is to distinguish candidate boxes and optimize the target position according to the feature maps produced by the network convolution layers. The structure of the RPN is shown in Fig. 7 [43]. In Fig. 7, for each sliding-window location, the RPN generates k bounding boxes, known as anchor boxes, at multiple scales and aspect ratios. Then, the sliding window is mapped to a lower dimensional feature, i.e., 256 dimensions for ZFNet, and fed into a box-regression layer and a box-classification layer. For the k anchor boxes of each sliding window, the box-regression layer and the box-classification layer output 4k coordinates and 2k scores, respectively. These coordinates and scores are used to estimate the probability of each anchor box proposal being an object or not. Before the RPN was proposed, the most popular region proposal approach was the selective search method, which is computationally expensive and time-consuming. In Faster RCNN, the RPN replaces the selective search method. It optimizes the structure of the proposed region for efficient and accurate region proposal generation, and shares the convolution features with the convolutional layers to dramatically reduce the computational cost. The RPN improves the quality and efficiency of region proposal, thereby improving the accuracy and speed of target detection and classification in Faster RCNN.
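
The anchor mechanism described above is easy to picture with a small helper that enumerates the k = len(scales) × len(ratios) boxes centered at one sliding-window location; the RPN then regresses 4k coordinates and scores 2k object/non-object outputs for them. The scale and ratio values below are illustrative assumptions, not the exact configuration used in this article.

```python
import numpy as np

def anchors_at(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchor boxes (x1, y1, x2, y2)
    centered at one sliding-window location (cx, cy) in image coordinates."""
    boxes = []
    for s in scales:          # anchor area is roughly s * s pixels
        for r in ratios:      # r = height / width
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

anchors = anchors_at(100.0, 100.0)
print(anchors.shape)   # (9, 4): nine anchors, each with four corner coordinates
```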

Fig. 8. Network structure of YOLOv3 [40].

Fig. 9. SAR images of ten classes of targets in MSTAR dataset and their corresponding optical images.

Fig. 10. Examples in the dataset DNsp. Each image in this dataset is fused with 15 targets randomly.

TABLE VI. Comparison of Different Datasets on Faster RCNN Under EOC.

Fig. 11. Target detection and classification results of Faster RCNN under SOC. (a) Faster RCNN with MF dataset. (b) Faster RCNN with DNsp.

Fig. 12. Target detection and classification results of YOLOv3 under SOC. (a) YOLOv3 with MF dataset. (b) YOLOv3 with DNsp.

Fig. 13. Target detection and classification results of Faster RCNN under EOC. (a) Faster RCNN with MF dataset. (b) Faster RCNN with DNsp.

Fig. 14. Target detection and classification results of YOLOv3 under EOC. (a) YOLOv3 with MF dataset. (b) YOLOv3 with DNsp. The white box represents a false classification.

TABLE VII. Comparison of Different Datasets on YOLOv3 Under EOC.

C. YOLOv3

YOLOv3 is a regression-based one-stage target detection algorithm based on CNN, which directly outputs target coordinates and conditional class probabilities. The main idea of YOLOv3 can be summarized as follows. First, a feature extraction network, named Darknet53, is used to extract features from the input image and obtain a feature map of a certain size. The input image is divided into S × S grids, where S has three scales in YOLOv3, i.e., 13, 26, and 52. The selection of scale is determined by the size of the feature map. Each grid cell is responsible for predicting three bounding boxes of objects whose center coordinates are located in that cell. Each bounding box corresponds to five parameters, i.e., four coordinates and one objectness prediction. Only the bounding box that most overlaps the ground truth box of the object is selected to predict the target. The other bounding boxes are used only to calculate the confidence. Then, YOLOv3 outputs a result of S × S × (5 + C) dimensions for each scale, where C is the number of classes.
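
The grid-responsibility rule just described can be made concrete with a few lines that map an object's center to its owning cell at each of the three prediction scales for a 416×416 input; the helper name and the example coordinates are illustrative assumptions.

```python
def responsible_cells(cx, cy, img_size=416, grids=(13, 26, 52)):
    """Return the (row, col) grid cell that owns an object whose center
    is (cx, cy) in pixel coordinates, for each prediction scale S."""
    cells = {}
    for s in grids:
        stride = img_size / s                 # pixels covered by one cell
        cells[s] = (int(cy // stride), int(cx // stride))
    return cells

# An object centered at (200, 120) falls in a different cell at each scale.
print(responsible_cells(200.0, 120.0))   # {13: (3, 6), 26: (7, 12), 52: (15, 25)}
```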

The network structure of YOLOv3 is shown in Fig. 8. Compared with other algorithms in the YOLO series, a major improvement of YOLOv3 is that it introduces a deeper network, called Darknet53, to achieve better feature extraction [40]. Darknet53 has 53 convolutional layers, including one connected layer. It directly discards the pooling layers and uses convolutional layers with a stride of 2 for subsampling. To solve the problems of increasing training difficulty and decreasing accuracy in the network, Darknet53, following ResNet [47], uses several residual modules and then designs a novel structure with more layers and higher accuracy. Nowadays, Darknet53 is still one of the most advanced feature extraction networks. In terms of target prediction, with the help of feature pyramid networks (FPN) [48], YOLOv3 uses multiscale prediction to predict bounding boxes at three different scales. For an input image with a size of 416×416, as shown in Fig. 8, YOLOv3 predicts at the scales of 13×13, 26×26, and 52×52, respectively. The multiscale prediction enables YOLOv3 to detect targets with different receptive field sizes, which significantly improves the detection ability for small targets [40].

IV. SPARSE SAR IMAGE BASED TARGET DETECTION AND CLASSIFICATION

A. Dataset

In this section, the MSTAR dataset constructed from the MF recovered images is used to verify the proposed framework. MSTAR is a public dataset containing sufficient SAR target samples with a resolution of 0.3 × 0.3 m, and is widely used in the field of SAR target detection and classification. The dataset contains ten different classes of military vehicle targets. The aspect angle of each target class ranges from 0° to 360°. The SAR images of the ten classes of targets in MSTAR and their corresponding optical images are shown in Fig. 9. In the experiments of this article, SAR images with 15 different scenes are selected for background fusion.
Each scene is fused with 15 targets. The class of each target in the scene is completely random, which means that a fused image may contain all categories or only one category. This randomness of the dataset makes the experimental results more reliable and applicable. In the proposed framework, we first reconstruct the fused images of MSTAR data by using the CAMP-based sparse SAR imaging method, and obtain the novel sparse SAR image datasets DSp and DNsp, respectively. Examples in the DNsp dataset are shown in Fig. 10. In this article, the results of SAR target detection and classification based on the original MF dataset (MSTAR), the CAMP's sparse solution dataset DSp, and the CAMP's nonsparse solution dataset DNsp are compared. All experiments are conducted on a hardware platform with an NVIDIA RTX2080Ti GPU and an Intel Xeon CPU. The results of target detection and classification are compared by the evaluation index mAP. Average precision (AP) is an index combining precision and recall rate, which can comprehensively evaluate the recognition performance of the model. In general, the model performance is proportional to AP. mAP is the average of APs over multiple validation sets.
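
For reference, the sketch below shows one common way to turn ranked detections into the AP and mAP figures reported in the following comparisons: precision and recall are accumulated down the ranked list and the precision envelope is integrated over recall. The article does not spell out its exact AP implementation, so this all-point-interpolation version and its names are assumptions.

```python
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """All-point interpolated AP for one class.

    scores: confidence of each detection; is_true_positive: 1/0 flags
    (a detection counts as a TP when it matches an unmatched ground-truth
    box, e.g., with IoU above a threshold); n_ground_truth: number of
    ground-truth boxes of this class.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_true_positive)[order])
    fp = np.cumsum(1 - np.asarray(is_true_positive)[order])
    recall = tp / max(n_ground_truth, 1)
    precision = tp / np.maximum(tp + fp, 1e-12)
    # Integrate precision over recall using the monotone precision envelope.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

def mean_average_precision(per_class_aps):
    # mAP: mean of the per-class AP values.
    return float(np.mean(per_class_aps))
```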

B. Comparison Under SOC

SOC refers to the situation where the target categories and serial numbers in the test set are the same as those in the train set, but with different depression angles. In the train set, the targets are acquired at a 17° depression angle, while in the test set, the targets are collected at a 15° depression angle. The serial number, depression angle, and number of targets per class in the train set and test set are shown in Table II. All target slices are randomly merged into 15 different scenes. Each fused image contains 15 targets. Due to the random fusion, very few targets are repeatedly fused in different images. After the abovementioned fusion, the train set contains 220 images and the test set contains 200 images.

1) Faster RCNN. The network with a learning rate of 0.001 and a batch size of 16 is trained first. Quantitative experimental results of Faster RCNN based on the different datasets are listed in Table III. Fig. 11 shows examples of the target detection and classification results based on the MF and DNsp datasets, respectively. From Table III, it is seen that by using the CAMP's nonsparse solution dataset DNsp, Faster RCNN obtains 92.60% mAP on the test set, which is much higher than the 89.13% mAP of the MF dataset and the 86.40% mAP of the CAMP's sparse solution dataset DSp. The main reason is that the traditional sparse SAR image, such as the sparse solution of CAMP, destroys the background statistical distribution and the details of target features, and thus greatly reduces the accuracy of large-class target classification. Compared with the sparse SAR image, the nonsparse solution of the CAMP algorithm can well preserve the feature information of the target, e.g., the shadow and low-amplitude details of the target, which is very helpful for high-accuracy classification of SAR targets. In addition, compared with the MF-based image, the nonsparse solution of CAMP has better image quality, which also improves the accuracy of classification.

2) YOLOv3. In the following, the comparison based on YOLOv3 is performed for the three kinds of datasets discussed previously. The network with an initial learning rate of 0.001 and a batch size of 16 is trained in this experiment. Quantitative results of YOLOv3 are listed in Table IV. Fig. 12 shows the target detection and classification results based on the MF and DNsp datasets, respectively. Similar to the result of Faster RCNN, DNsp also shows the best classification performance with 99.29% mAP, which outperforms the MF dataset and DSp by 0.14% and 3.43%, respectively. In addition, it should be noted that since the mAP of YOLOv3 via the MF dataset has already reached 99.01%, there is little room for improvement; therefore, an increase of 0.14% is meaningful for target detection and classification. Comparing the results of YOLOv3 and Faster RCNN under SOC, it can be seen that YOLOv3 has obvious advantages in both classification accuracy and detection time. It outperforms Faster RCNN by 6.69% mAP when DNsp is available. In terms of detection time, YOLOv3 only needs 15.67 ms per image, which is much faster than Faster RCNN.

C. Comparison Under EOC

EOC consists of EOC-1 and EOC-2. EOC-1 is suitable for the situation where the target in the train set and the test set has a big change in depression angle. Without loss of generality, in this article, we use EOC-1 as the example to validate the proposed framework; similar results are obtained under EOC-2. EOC-1 contains four categories for training and testing, whose serial number, depression angle, and number of targets per class are shown in Table V. In the training set, the targets are acquired at a depression angle of 17°. In the test set, the targets are collected at a 30° depression angle. Because SAR images are extremely sensitive to changes in depression angle, the 13° depression angle change in EOC-1 increases the difficulty of target detection and classification. After background fusion, the train set contains 90 images and the test set contains 95 images. The framework and network parameters in this experiment are the same as those in SOC.

1) Faster RCNN. Experimental results of Faster RCNN under EOC are shown in Table VI. It is seen that Faster RCNN still obtains improved results with 95.69% mAP by using the DNsp dataset. Compared with the result based on the MF dataset, the mAP via DNsp is significantly increased by 6.30%, which is an important improvement of target classification performance. From Table VI, it can also be seen that, compared with the MF dataset, DSp is still helpful in improving the recognition performance, but not as good as DNsp. Examples of the target classification results under EOC are shown in Fig. 13.

2) YOLOv3. As shown in Table VII, when the CAMP's nonsparse solution dataset DNsp is used for target classification, YOLOv3 achieves the optimal result with 89.91% mAP under EOC. It outperforms the MF dataset and the CAMP's sparse solution dataset DSp by 5.09% and 0.72% mAP, respectively. Examples of the classification results are shown in Fig. 14. Compared with Faster RCNN, YOLOv3 has lower accuracy, but faster detection speed.

In this section, we conducted four groups of comparisons, twelve in total. From the experimental results, it is found that with both the Faster RCNN and YOLOv3 methods, whether under SOC or
EOC, the CAMP's nonsparse solution dataset DNsp has shown optimal performance in target detection and classification. In addition, when using DNsp, it can be seen that Faster RCNN has better performance than YOLOv3 under EOC. In contrast, YOLOv3 works better under SOC, especially in detection speed, which is desirable for real-time processing.

V. CONCLUSION

In this article, we propose a novel target detection and classification framework based on sparse SAR image datasets. First, a novel CAMP-based sparse imaging method is used to obtain the sparse SAR image datasets DSp and DNsp. Then, two conventional CNN methods, Faster RCNN and YOLOv3, are used for target detection and classification based on DSp and DNsp. Experimental results show that DNsp has optimal performance in CNN-based target detection and classification. Under EOC, the mAP values of Faster RCNN and YOLOv3 are 95.69% and 88.21%, respectively, which are higher than those of the other two kinds of datasets, the MF dataset and DSp. These two mAP values even reach 92.60% and 99.29% under SOC, which means that the novel sparse SAR image dataset has much better performance in SAR target detection and classification and shows huge application potential for the military battlefield in the future.

REFERENCES

[1] J. C. Curlander and R. N. McDonough, Synthetic Aperture Radar: Systems and Signal Processing. New York, NY, USA: Wiley, 1991.
[2] F. M. Henderson and A. J. Lewis, Principles and Applications of Imaging Radar. New York, NY, USA: Wiley, 1998.
[3] C. Clemente and J. J. Soraghan, "Vibrating target micro-Doppler signature in bistatic SAR with a fixed receiver," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3219–3227, Aug. 2012.
[4] D. Cerutti-Maori et al., "Precision SAR processing using chirp scaling," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 10, pp. 3019–3030, Jul. 2008.
[5] S. Singha, T. J. Bellerby, and O. Trieschmann, "Satellite oil spill detection using artificial neural networks," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 6, pp. 2355–2363, Dec. 2013.
[6] M. Neumann, L. Ferro-Famil, and A. Reigber, "Estimation of forest structure, ground, and canopy layer characteristics from multibaseline polarimetric interferometric SAR data," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 3, pp. 1086–1104, Mar. 2010.
[7] R. Bamler, "A comparison of range-Doppler and wavenumber domain SAR focusing algorithms," IEEE Trans. Geosci. Remote Sens., vol. 30, no. 4, pp. 706–713, Jul. 1992.
[8] Y. L. Neo, F. H. Wong, and I. G. Cumming, "Processing of azimuth-invariant bistatic SAR data using the range Doppler algorithm," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 1, pp. 14–21, Jan. 2008.
[9] R. K. Raney, H. Runge, R. Bamler, I. G. Cumming, and F. H. Wong, "Precision SAR processing using chirp scaling," IEEE Trans. Geosci. Remote Sens., vol. 32, no. 4, pp. 786–799, Jul. 1994.
[10] J. Mittermayer, R. Lord, and E. Borner, "Sliding spotlight SAR processing for TerraSAR-X using a new formulation of the extended chirp scaling algorithm," in Proc. IEEE Int. Geosci. Remote Sens. Symp., Toulouse, France, 2003, pp. 1462–1464.
[11] F. H. Wong, I. G. Cumming, and Y. L. Neo, "Focusing bistatic SAR data using the nonlinear chirp scaling algorithm," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 9, pp. 2493–2505, Sep. 2008.
[12] I. G. Cumming and F. H. Wong, Digital Processing of Synthetic Aperture Radar Data: Algorithms and Implementation. Norwood, MA, USA: Artech House, 2004.
[13] B. Zhang, W. Hong, and Y. Wu, "Sparse microwave imaging: Principles and applications," Sci. China Inf. Sci., vol. 55, no. 8, pp. 1–33, 2012.
[14] R. G. Baraniuk et al., "Applications of sparse representation and compressive sensing," Proc. IEEE, vol. 98, no. 6, pp. 906–909, Jun. 2010.
[15] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[16] E. T. Candes, "Near-optimal signal recovery from random projections: Universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[17] E. T. Candes, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Commun. Pure Appl. Math., vol. 59, no. 8, pp. 1207–1223, 2006.
[18] I. Daubechies, M. Defriese, and C. De Mol, "An iterative thresholding algorithm for linear inverse problems with a sparsity constraint," Commun. Pure Appl. Math., vol. 57, no. 11, pp. 1413–1457, 2004.
[19] H. Bi and G. Bi, "A novel iterative soft thresholding algorithm for L1 regularization based SAR image enhancement," Sci. China Inf. Sci., vol. 62, no. 4, pp. 1–3, 2019.
[20] H. Bi and G. Bi, "Performance analysis of iterative soft thresholding algorithm for L1 regularization based sparse SAR imaging," in Proc. IEEE Radar Conf., Boston, MA, USA, 2019, pp. 1–6.
[21] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Proc. 27th Asilomar Conf. Signals, Syst., Comput., Pacific Grove, CA, USA, 1993, pp. 40–44.
[22] D. L. Donoho, Y. Tsaig, I. Drori, and J. Starck, "Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1094–1121, Feb. 2012.
[23] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing," Proc. Nat. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, 2009.
[24] A. Maleki, L. Anitori, Z. Yang, and R. G. Baraniuk, "Asymptotic analysis of complex LASSO via complex approximate message passing (CAMP)," IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4290–4308, Jul. 2013.
[25] L. Anitori, A. Maleki, M. Otten, R. G. Baraniuk, and P. Hoogeboom, "Design and analysis of compressed sensing radar detectors," IEEE Trans. Signal Process., vol. 61, no. 4, pp. 813–827, Feb. 2013.
[26] H. Bi, B. Zhang, X. Zhu, W. Hong, J. Sun, and Y. Wu, "L1 regularization based SAR imaging and CFAR detection via complex approximated message passing," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 6, pp. 3426–3440, Jun. 2017.
[27] C. Shan et al., "Gesture recognition using temporal template based trajectories," in Proc. 17th Int. Conf. Pattern Recognit., Cambridge, U.K., 2004, pp. 954–957.
[28] R. Ahmmed and M. F. Hossain, "Tumor detection in brain MRI image using template based k-means and fuzzy c-means clustering algorithm," in Proc. IEEE Int. Conf. Comput. Commun. Inform., Coimbatore, India, 2016, pp. 1–6.
[29] J. Zhu, X. Qiu, Z. Pan, Y. Zhang, and B. Lei, "Projection shape template-based ship target recognition in TerraSAR-X images," IEEE Trans. Geosci. Remote Sens., vol. 14, no. 2, pp. 222–226, Feb. 2017.
[30] G. Magna et al., "Adaptive classification model based on artificial immune system for breast cancer detection," in Proc. IEEE AISEM Annu. Conf., Trento, Italy, 2015, pp. 1–4.
[31] J. J. Gertler, "Survey of model-based failure detection and isolation in complex plants," IEEE Control Syst. Mag., vol. 8, no. 6, pp. 3–11, Dec. 1988.
[32] A. Sheikhi, A. Zamani, and Y. Norouzi, "Model-based adaptive target detection in clutter using MIMO radar," in Proc. CIE Int. Conf. Radar, Shanghai, China, 2006, pp. 1–4.
[33] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. 25th Int. Conf. Neural Inf. Process. Syst., Lake Tahoe, NV, USA, 2012, pp. 1097–1105.
[34] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[35] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis., Santiago, Chile, 2015, pp. 1440–1448.
[36] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, 2014, pp. 580–587.
[37] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[38] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, 2016, pp. 779–788.
[39] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 6517–6525.
[40] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," Tech. Rep., 2018, pp. 1–6.
[41] W. Liu et al., "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis., Amsterdam, The Netherlands, 2016, pp. 21–37.
[42] M. Dong et al., "End-to-end target detection and classification with data augmentation in SAR images," in Proc. IEEE Int. Conf. Comput. Electromagn., Shanghai, China, 2019, pp. 1–3.
[43] M. Kang et al., "A modified faster R-CNN based on CFAR algorithm for SAR ship detection," in Proc. Int. Workshop Remote Sens. Intell. Process., Shanghai, China, 2017, pp. 1–4.
[44] N. Wang, Y. Wang, H. Liu, Q. Zuo, and J. He, "Feature-fused SAR target discrimination using multiple convolutional neural networks," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 10, pp. 1695–1699, Oct. 2017.
[45] H. Bi, G. Bi, B. Zhang, and W. Hong, "Complex-image-based sparse SAR imaging and its equivalence," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 9, pp. 5006–5014, Sep. 2018.
[46] M. Çetin, W. C. Karl, and D. A. Castanon, "Feature enhancement and ATR performance using nonquadratic optimization-based SAR imaging," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1375–1395, Oct. 2003.
[47] K. He et al., "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, 2016, pp. 770–778.
[48] T. Lin et al., "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, HI, USA, 2017, pp. 936–944.

Hui Bi (Member, IEEE) was born in Shandong, China, in 1991. He received the bachelor's degree in electronics and information engineering from Yantai University, Yantai, China, in 2012, and the Ph.D. degree in signal and information processing from the University of Chinese Academy of Sciences, Beijing, China, in 2017.
From 2012 to 2017, he was with the Science and Technology on Microwave Imaging Laboratory, Institute of Electronics, Chinese Academy of Sciences, China. He was a Research Fellow with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, from 2017 to 2018. Since 2018, he has been with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, as an Associate Professor. His main research interests include sparse microwave imaging with compressive sensing, synthetic aperture radar data processing and application, sparse signal processing, and tomographic SAR imaging.

Jiarui Deng was born in Jiangsu, China, in 1997. She received the bachelor's degree in electronics and information engineering from the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2020. She is currently working toward the master's degree in signal and information processing with Nanjing University of Aeronautics and Astronautics, Nanjing.
Her research interests include sparse SAR image processing and application.

Tianwen Yang was born in Jiangsu, China, in 1998. She received the bachelor's degree in electronics and information engineering from the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2020. She is currently working toward the master's degree in signal and information processing with Southeast University, Nanjing.
Her research interests include communication and information systems.

Jian Wang was born in Anhui, China, in 1999. He received the bachelor's degree in electronics and information engineering from the College of Automation, Chongqing University, Chongqing, China. He is currently working toward the master's degree in signal and information processing with Nanjing University of Aeronautics and Astronautics, Nanjing, China.
His research interests include sparse SAR image processing and application.

Ling Wang (Member, IEEE) received the B.S. degree in electrical engineering, and the M.S. and Ph.D. degrees in information acquirement and processing from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2000, 2003, and 2006, respectively.
Since 2003, she has been with the Nanjing University of Aeronautics and Astronautics, where she is currently a Professor with the Department of Information and Communication Engineering. From February 2008 to May 2009, she was a Postdoctoral Research Associate with the Department of Mathematical Sciences and the Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA. She has authored and coauthored more than 100 publications. Her current research interests include radar, imaging problems, image processing in vision-based navigation, and image-based target reconstruction and recognition.
Dr. Wang was the recipient of the Alexander von Humboldt Fellowship for Experienced Researchers in 2014 and worked with Prof. Otmar Loffeld at the University of Siegen from 2015 to 2016.
