Accident Detection

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2019.2939532, IEEE Access. Date of publication xxxx 00, 0000, date of current version June 19, 2019.
ABSTRACT Car accidents cause a large number of deaths and disabilities every day, a certain proportion of which result from untimely treatment and secondary accidents. To some extent, automatic car accident detection can shorten the response time of rescue agencies and of vehicles around accidents, improving rescue efficiency and traffic safety. In this paper, we propose an automatic car accident detection method based on Cooperative Vehicle Infrastructure Systems (CVIS) and machine vision. First, a novel image dataset, CAD-CVIS, is established to improve the accuracy of accident detection based on intelligent roadside devices in CVIS. In particular, CAD-CVIS consists of various accident types, weather conditions and accident locations, which improves the self-adaptability of accident detection methods among different traffic situations. Second, we develop a deep neural network model, YOLO-CA, based on CAD-CVIS and deep learning algorithms to detect accidents. In the model, we utilize Multi-Scale Feature Fusion (MSFF) and a loss function with dynamic weights to enhance the performance of detecting small objects. Finally, our experimental study evaluates the performance of YOLO-CA for detecting car accidents, and the results show that our proposed method can detect a car accident in 0.0461 seconds (21.6 FPS) with 90.02% average precision (AP). In addition, we compare YOLO-CA with other object detection models, and the results demonstrate a comprehensive performance improvement in accuracy and real-time performance over these models.
INDEX TERMS Car accident detection, CVIS, machine vision, deep learning
VOLUME 4, 2016 1
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
video features in traffic accidents, such as vehicle collision, rollover and so on. To some extent, these features can be used to detect or predict car accidents. Accordingly, some researchers have applied machine vision technology based on deep learning to car accident detection. These methods extract and process complex image features instead of a single vehicle motion parameter, which improves the accuracy of detecting car accidents. However, the datasets of these methods are mostly captured by car cameras or pedestrians' cell phones, which is not suitable for roadside devices in CVIS. In addition, the reliability and real-time performance of these methods need to be improved to meet the requirements of car accident detection.

In this paper, we propose a data-driven car accident detection method based on CVIS, whose goal is to improve the efficiency and accuracy of car accident response. With this goal, we focus on a general application scenario: when there is an accident on the road, roadside intelligent devices recognize and locate it efficiently. First, we build a novel dataset, the Car Accident Detection for Cooperative Vehicle Infrastructure System dataset (CAD-CVIS), which is more suitable for car accident detection based on roadside intelligent devices in CVIS. Then, a deep learning model, YOLO-CA, based on CAD-CVIS is developed to detect car accidents. In particular, we optimize the network of the traditional deep learning model YOLO [21] to build the network of YOLO-CA, which is more accurate and faster in detecting car accidents. In addition, considering the wide shooting scope of roadside cameras in CVIS, a multi-scale feature fusion method and a loss function with dynamic weights are utilized to improve the performance of detecting small objects.

The rest of this paper is organized as follows: Section 2 gives an overview of related work. We present the details of our proposed method in Section 3. The performance evaluation is discussed in Section 4. Finally, Section 5 concludes this paper.

II. RELATED WORK
Car accident detection and notification is a challenging issue and has attracted a lot of attention from researchers, who have proposed and applied various car accident detection methods. In general, car accident detection methods are mainly divided into the following two kinds: vehicle running condition-based and accident video features-based.

A. METHOD BASED ON VEHICLE RUNNING CONDITION
When an accident occurs, the motion state of the vehicle will change dramatically. Therefore, many researchers have proposed accident detection methods based on monitoring motion parameters, such as acceleration, velocity and so on. [22] used the On Board Diagnosis (OBD) system to monitor speed and engine status to detect a crash, and utilized a smart-phone to report the accident by Wi-Fi or cellular network. [23] developed an accident detection and reporting system using GPS, GPRS, and GSM. The speed of the vehicle obtained from a High Sensitive GPS receiver is considered as the index for detecting accidents, and the GSM/GPRS modem is utilized to send the location of the accident. [24] presented a prototype system called e-NOTIFY, which monitors the change of acceleration to detect accidents and utilizes V2X communication technologies to report them. To a certain extent, these methods can detect and report car accidents in a short time, and improve the efficiency of car accident warning. However, the vehicle running condition before car accidents is complex and unpredictable, and the accuracy of accident detection based only on speed and acceleration may be low. In addition, these methods rely too heavily on vehicular monitoring and communication equipment, which may be unreliable or damaged in some extreme circumstances, such as heavy canopy, underground tunnels, and serious car accidents.

B. METHOD BASED ON VIDEO FEATURES
With the development of machine vision and artificial neural network technology, more and more applications based on video processing have been applied in the transportation and vehicle fields. Against this background, some researchers have utilized video features of the car accident to detect it. [25] presented a Dynamic-Spatial-Attention Recurrent Neural Network (RNN) for anticipating accidents in dashcam videos, which can predict accidents about 2 seconds before they occur with 80% recall and 56.14% precision. [26] proposed a car accident detection system based on first-person videos, which detected anomalies by predicting the future locations of car participants and then monitoring the prediction accuracy and consistency metrics. These methods also have some limitations because of the low penetration of vehicular intelligent devices and shielding effects between vehicles.

There are also some other methods which use roadside devices instead of vehicular equipment to obtain and process video. [27] proposed a novel accident detection system at intersections, which composed background images from an image sequence and detected accidents by using a Hidden Markov Model. [28] outlined a novel method for modeling the interaction among multiple moving objects, and used the Motion Interaction Field to detect and localize car accidents. [29] proposed a novel approach for automatic road accident detection, which was based on detecting damaged vehicles in footage received from surveillance cameras installed on roads. In this method, Histogram of Gradients (HOG) and Gray Level Co-occurrence Matrix features were used to train support vector machines. [30] presented a novel dataset for car accident analysis based on traffic Closed-Circuit Television (CCTV) footage, and combined Faster Regions-Convolutional Neural Network (R-CNN) and Context Mining to detect and predict car accidents. The method in [30] achieved 1.68 seconds in terms of the Time-To-Accident measure with an Average Precision of 47.25%. [8] proposed a novel framework for automatic car accident detection, which learned feature representation from the spatio-temporal volumes of raw pixel intensity instead of traditional hand-crafted features. The experiments of the method in [8] demonstrated it
can detect on average 77.5% of accidents correctly with 22.5% false alarms.

Compared with the methods based on vehicle running condition, these methods improve the detection accuracy, and some of them can even predict accidents about 2 seconds before they occur. To some extent, these methods are significant in decreasing the accident rate and improving traffic safety. However, the detection accuracy of these methods is low and the error rate is high, and wrong accident information will have a great impact on the normal traffic flow. Concerning the core issue mentioned above, in order to avoid the drawbacks of vehicular cameras, our proposed method utilizes roadside intelligent edge devices to obtain traffic video and process images. Moreover, for the sake of improving the accuracy of the accident detection method based on intelligent roadside devices, we establish the CAD-CVIS dataset based on video sharing websites, which consists of various kinds of accident types, weather conditions and accident locations. Moreover, we develop the model YOLO-CA to improve the reliability and real-time performance among different traffic conditions by combining deep learning algorithms and the MSFF method.

III. METHODS
A. METHOD OVERVIEW

FIGURE 1. The application scenario of the automatic car accident detection method based on CVIS.

Fig. 1 shows the application principle of our proposed car accident detection method based on CVIS. Firstly, the car accident detection application program with the YOLO-CA model, developed based on CAD-CVIS and deep learning algorithms, is deployed on the edge server. Then the edge server receives and processes the real-time images captured by roadside cameras. Finally, the roadside communication unit broadcasts the accident emergency messages to the relevant vehicles and rescue agencies by DSRC and 5G networks. In the rest of this section, we will present the details of CAD-CVIS and the YOLO-CA model.

B. CAD-CVIS
1) Data collection and annotation
There are two major challenges in collecting car accident data: (1) Access: access to roadside traffic camera data is often limited. In addition, accident data from transportation administrations is often not available for public use because of many legal reasons. (2) Abnormality: car accidents are rare on the road compared with normal traffic conditions. In this work, we draw support from video sharing websites to search for videos and images including car accidents, such as news reports and documentaries. In order to improve the applicability of our proposed method to roadside edge devices, we only pick out the videos and images captured from traffic CCTV footage.

FIGURE 2. Data collection and annotation for the CAD-CVIS dataset.

Through the above steps, we obtain 633 car accident scenes, 3255 accident key frames and 225206 normal frames. Moreover, the car accident scene only occupies a small part of each accident frame. We utilize LabelImg [31] to annotate the location of the accident in each frame in detail to enhance the accuracy of locating accidents. The high accuracy enables emergency messages to be sent more efficiently to the vehicles that are in the same direction as the accident, and decreases the impact on the vehicles that are in the opposite direction. The whole procedure of data collection and annotation is shown in Fig. 2. The CAD-CVIS dataset is made available for research use through https://github.com/zzzzzzc/Car-accident-detection.

2) Statistics of the CAD-CVIS
Statistics of the CAD-CVIS dataset can be found in Fig. 3. It can be found that the CAD-CVIS dataset includes various types of car accidents, which can improve the adaptability of our method to different conditions. According to the number of vehicles in the accident, the CAD-CVIS dataset includes 323 Single Vehicle Accident frames, 2449 Double Vehicle Accident frames and 483 Multiple Vehicle Accident frames. Moreover, the CAD-CVIS dataset covers a variety of weather conditions, with 2769 accident frames under sunny conditions, 268 frames under foggy conditions, 52 accident frames under rainy conditions and 166 accident frames under snowy conditions. Besides, there are 2588 frames of accidents in the daytime and 667 accident frames at night. In addition, the CAD-CVIS dataset contains 2281 frames of accidents occurring at intersections, 596 frames on urban roads, 189 frames on expressways and 189 frames on highways.

A comparison between CAD-CVIS and related datasets can be found in Table 1. The A in Table 1 indicates that there is annotation of car accidents in the dataset. R indicates that the videos and frames are captured from roadside CCTV
footage. M indicates that there are multiple road conditions in the dataset. Compared with CUHK Avenue [32], UCSD Ped2 [33] and DAD [25], CAD-CVIS contains more car accident scenes, which can improve the adaptability of models based on CAD-CVIS. Moreover, the frames of CAD-CVIS are all captured from roadside CCTV footage, which is more suitable for accident detection methods based on intelligent roadside devices in CVIS.

FIGURE 3. Number of accident frames in CAD-CVIS categorized by different indexes. (a) Accident type (b) Weather condition (c) Accident time (d) Accident location

TABLE 2. Comparison between CAD-CVIS and related datasets

Dataset name  Scenes  Frames or Duration    A  R  M
UCSD Ped2     77      1636 frames           ×  ✓  ×
CUHK Avenue   47      3820 frames           ×  ×  ✓
DAD           620     2.4 hours             ✓  ×  ✓
CADP          1416    5.2 hours             ×  ✓  ✓
CAD-CVIS      632     3255+225206 frames    ✓  ✓  ✓

C. OUR PROPOSED DEEP NEURAL NETWORK MODEL
In the task of car accident detection, we must not only judge whether there is a car accident in the image, but also accurately locate it, because the accurate location guarantees that the RSU can broadcast the emergency message to the vehicles affected by the accident. Classification and location algorithms can be divided into two kinds: (1) Two-stage models, such as R-CNN [34], Fast R-CNN [35], Faster R-CNN [36] and Faster R-CNN with FPN [37]. These algorithms utilize selective search and the Region Proposal Network (RPN) to select about 2000 proposal regions in the image, and then detect objects using the features of these regions extracted by a CNN. These region-based models locate objects accurately, but extracting proposals takes a great deal of time. (2) One-stage models, which provide an end-to-end detection service. By eliminating the process of selecting proposal regions, these algorithms are very fast while still guaranteeing accuracy. Considering that accident detection requires high real-time performance, we design our deep neural network based on the one-stage model YOLO [21].

1) Network Design
YOLO utilizes its particular CNN to complete classification and location of multiple objects in an image at one time. In the training process of YOLO, each image is divided into S × S grids. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object [39]. This design can improve the detection speed dramatically, and the detection accuracy benefits from reference to global features. However, it will also cause serious detection errors when there is more than one object in a grid. Roadside cameras have a wide scope of shooting, so the accident area may be small in the image. Inspired by the multi-scale feature fusion (MSFF) network, in order to improve the performance of the model in detecting small objects, we utilize 24 layers to achieve image upsampling and obtain two output tensors of different dimensions. This new car accident detection model is called YOLO-CA, and the network structure diagram of YOLO-CA is shown in Fig. 4.

As shown in Fig. 4, YOLO-CA is composed of 228 neural network layers, and the number of each kind of layer is shown in Table 2. These layers constitute several kinds of basic components of the YOLO-CA network, such as DBL and ResN. DBL is the minimum component of the YOLO-CA network, and is composed of a Convolution layer, a Batch Normalization layer and a Leaky ReLU layer. ResN consists of a Zero Padding layer, DBL and N Resblock_units [40], and is designed to avoid the neural network degradation caused by increased depth. UpS in Fig. 4 is the upsampling layer, which is utilized to improve the performance of YOLO-CA in detecting small objects. Concat is the concatenate layer, which is used to concatenate a layer in Darknet-53 with the upsampling layer.

TABLE 2. Composition of YOLO-CA Network

Layer name           Number
Input                1
Convolution          65
Batch Normalization  65
Leaky ReLU           65
Zero Padding         5
Add                  23
Upsampling           1
Concatenate          1
Total                228

2) Detection principle
Fig. 5 shows the detection principle of YOLO-CA, which includes extracting the feature map and predicting bounding boxes.
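The grid-responsibility rule described above can be sketched as follows. This is an illustrative example only (the function name and the sample coordinates are ours, not the authors' implementation):

```python
# Sketch of the YOLO grid rule: the image is divided into S x S cells,
# and the cell containing an object's center is responsible for it.
def responsible_cell(cx, cy, img_w, img_h, S):
    """Return the (row, col) of the grid cell containing center (cx, cy)."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col

# With a 416x416 input and a 13x13 output grid, each cell covers 32x32 pixels:
print(responsible_cell(cx=200, cy=100, img_w=416, img_h=416, S=13))  # (3, 6)
```

The `min(..., S - 1)` clamp keeps a center lying exactly on the right or bottom edge inside the grid.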
FIGURE 4. The network structure diagram of YOLO-CA (fragments recovered from the figure: Input 416×416×3; DBL, DBL*5, UpS, Concat and Conv blocks; Output2 26×26×18).

The batch loss is the mean of the per-image losses over the b images of a batch:

$$\mathrm{Loss} = \frac{1}{b}\sum_{k=1}^{b} \mathrm{Loss\_img}_k \qquad (7)$$
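Eq. (7) is a plain average of per-image losses over a batch of b images; a minimal sketch (the form of each Loss_img term is defined by the model's loss components, not here):

```python
# Batch loss as the mean of per-image losses, following Eq. (7).
def batch_loss(per_image_losses):
    b = len(per_image_losses)
    return sum(per_image_losses) / b

print(batch_loss([0.5, 1.0, 1.5]))  # 1.0
```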
FIGURE 7. Training curves of YOLO-CA (recall, precision, loss and IoU versus training batches, ×10^4). [Additional figure residue: a precision-recall legend entry "Fast R-CNN (AP=77.65%)".]

… in training set, IoU finally stabilizes above 0.8. The Fig. 7d …
TABLE 4. AP and IoU results of different models among different scales of object

In order to compare and analyze the performance of the models in detail, the objects of the test set are divided into three parts according to object scale: (1) Large: the area of the object is larger than one tenth of the image size. (2) Medium: the area of the object is within the interval [1/100, 1/10] of the image size. (3) Small: the area of the object is less than one hundredth of the image size.

Table 4 shows the AP and IoU results of the seven models among different scales of object. We can intuitively see that the scale of objects significantly affects the accuracy and locating performance of detection models. It can be found that our proposed YOLO-CA has obvious advantages in AP and Average IoU over Fast R-CNN, Faster R-CNN and YOLOv3 without MSFF, especially for small-scale objects. There is no MSFF process in the above three models, which results in them detecting objects relying only on top-level features. However, although there is rich semantic information in top-level features, the location information of objects is rough, which does not help to locate the bounding boxes of objects correctly. On the contrary, there is little semantic information in low-level features with high resolution, but the location information of objects is …

… loss among different scales of objects. This process increases the error punishment of small objects, because the same errors in x, y, w, h have a more serious impact on the detection of a small object than of a large one. Consequently, YOLO-CA has obvious advantages in AP and Average IoU for small objects over YOLOv3. The MSFF processes of Faster R-CNN with FPN and YOLO-CA are similar: feature pyramid networks are used to extract feature maps of different scales and fuse these maps to obtain features with high semantics and high resolution. Faster R-CNN utilizes RPN to select about 20000 proposal regions, whereas there are only 13∗13∗3 + 26∗26∗3 = 2535 candidate bounding boxes in YOLO-CA. This difference results in Faster R-CNN having a slight advantage in accuracy over YOLO-CA, but also causes serious disadvantages in real-time performance.
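The candidate-box count quoted above follows directly from the two output grids (13×13 and 26×26) with 3 boxes per grid cell; a quick sanity check (the helper name is ours):

```python
# Candidate bounding boxes across YOLO-CA's two output scales:
# each grid cell predicts 3 boxes.
def candidate_boxes(grid_sizes, boxes_per_cell=3):
    return sum(s * s * boxes_per_cell for s in grid_sizes)

print(candidate_boxes([13, 26]))  # 2535  (= 13*13*3 + 26*26*3)
```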
RPN, Faster R-CNN achieves about 3.5 FPS on the test set (Faster R-CNN: 3.5, Faster R-CNN with FPN: 3.6).

Although Faster R-CNN obtains a significant improvement in real-time performance compared with Fast R-CNN, there is still a big gap with one-stage models. That is because one-stage models abandon the process of selecting proposal regions and utilize one CNN to implement both location and classification of objects. As shown in Fig. 9, SSD can achieve 15.6 FPS on the test set. The other three models based on YOLO utilize the backbone Darknet-53 instead of the VGG-16 in SSD, and the computation of the former network is significantly less than that of the latter because of the use of residual networks. Therefore, the real-time performance of SSD is lower than that of the YOLO-based models in our experiments. In addition, our proposed YOLO-CA simplifies the MSFF networks of YOLOv3, so YOLO-CA can achieve 21.7 FPS, which is higher than that of YOLOv3 (about 19.1). Because it lacks the MSFF process, YOLOv3 without MSFF has better real-time performance (about 23.6 FPS) than YOLO-CA, but this lack results in serious performance penalties in AP.

Fig. 10 shows some visual results of the seven models among different scales of objects. It can be found that there is a false positive in the large-object detection results of Fast R-CNN, but the other six models all have high accuracy and locating performance on large objects in Fig. 10. However, the locating performance of Fast R-CNN, Faster R-CNN, SSD, and YOLOv3 without MSFF decreases significantly in medium object frame (1), and the predicted bounding box cannot fit the contour of the car accident. Moreover, Fast R-CNN, SSD, and YOLOv3 without MSFF cannot detect the car accident in small object frame (1). In addition, except for Faster R-CNN with FPN and YOLO-CA, the other models have serious location errors in small object frame (3).

3) Comparison of comprehensive performance and practicality
As analyzed above, it can be found that our proposed YOLO-CA has performance advantages in detecting car accidents over Fast R-CNN, Faster R-CNN, SSD, and YOLOv3 in terms of accuracy, locating and real-time performance. For YOLOv3 without MSFF, its FPS (23.6) is higher than that of YOLO-CA (21.7), and this difference is acceptable in the practical application of detecting car accidents. However, the AP of YOLO-CA is significantly higher than that of YOLOv3 without MSFF, especially for small-scale objects (76.51% vs 58.89%). Compared with Faster R-CNN with FPN, YOLO-CA approaches its AP (90.66% vs 90.03%) with an obvious speed advantage. Faster R-CNN costs about 277 ms on average to detect one frame, whereas YOLO-CA only needs 46 ms, which illustrates that YOLO-CA is about 6x faster than Faster R-CNN with FPN. Car accident detection in CVIS requires high real-time performance because of the high dynamics of vehicles. To summarize, our proposed YOLO-CA has higher practicality and comprehensive performance in terms of accuracy and real-time.

4) Comparison with other car accident detection methods
Other car accident detection methods utilize small private datasets and do not make them public, so comparing against them may not be fair at this stage. Still, we list the performance achieved by these methods on their individual datasets. ARRS [3] achieves about 63% AP with 6% false alarms. The method of [42] achieves 89.50% AP. DSA-RNN [25] achieves about 80% recall and 56.14% AP. The method in [30] achieves about 47.25% AP. The method of [8] achieves 77.5% AP with 22.5% false alarms. Moreover, the number of accident scenes in the datasets utilized in these methods is limited, which will result in poor adaptability to new scenarios.

V. CONCLUSION
In this paper, we have proposed an automatic car accident detection method based on CVIS. First of all, we present the application principles of our proposed method in the CVIS. Secondly, we build a novel image dataset, CAD-CVIS, which is more suitable for car accident detection methods based on intelligent roadside devices in CVIS. Then we develop the car accident detection model YOLO-CA based on CAD-CVIS and deep learning algorithms. In the model, we combine multi-scale feature fusion and a loss function with dynamic weights to improve the real-time performance and accuracy of YOLO-CA. Finally, we show the simulation experiment results of our method, which demonstrate that our proposed method can detect a car accident in 0.0461 seconds with 90.02% AP. Moreover, the comparative experiment results show that YOLO-CA has comprehensive performance advantages in detecting car accidents over other detection models, in terms of accuracy and real-time performance.

REFERENCES
[1] WHO, "Global status report on road safety 2018," https://www.who.int/violence_injury_prevention/road_safety_status/2018/en/.
[2] H. L. Wang and M. A. Jia-Liang, "A design of smart car accident rescue system combined with wechat platform," Journal of Transportation Engineering, 2017.
[3] Y. Ki and D. Lee, "A traffic accident recording and reporting model at intersections," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 188–194, June 2007.
[4] W. Hao and J. Daniel, "Motor vehicle driver injury severity study under various traffic control at highway-rail grade crossings in the united states," Journal of Safety Research, vol. 51, pp. 41–48, 2014.
[5] J. White, C. Thompson, H. Turner, B. Dougherty, and D. C. Schmidt, "Wreckwatch: Automatic traffic accident detection and notification with smartphones," Mobile Networks and Applications, vol. 16, no. 3, pp. 285–303, 2011.
[6] S. Sadek, A. Al-Hamadi, B. Michaelis, and U. Sayed, "Real-time automatic traffic accident recognition using hfg," in International Conference on Pattern Recognition, 2010.
[7] A. Shaik, N. Bowen, J. Bole, G. Kunzi, D. Bruce, A. Abdelgawad, and K. Yelamarthi, "Smart car: An iot based accident detection system," in 2018 IEEE Global Conference on Internet of Things (GCIoT). IEEE, 2018, pp. 1–5.
[8] D. Singh and C. K. Mohan, "Deep spatio-temporal representation for detection of road accidents using stacked autoencoder," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 879–887, March 2019.
[9] M. Zheng, T. Li, R. Zhu, J. Chen, Z. Ma, M. Tang, Z. Cui, and Z. Wang, "Traffic accident's severity prediction: A deep-learning approach-based cnn network," IEEE Access, vol. 7, pp. 39897–39910, 2019.
FIGURE 10. Some visual results of the seven models among different scales of objects.
[10] L. Zheng, Z. Peng, J. Yan, and W. Han, "An online learning and unsupervised traffic anomaly detection system," Advanced Science Letters, vol. 7, no. 1, pp. 449–455, 2012.
[11] F. Yang, S. Wang, J. Li, Z. Liu, and Q. Sun, "An overview of internet of vehicles," China Communications, vol. 11, no. 10, pp. 1–15, Oct 2014.
[12] C. Ma, W. Hao, A. Wang, and H. Zhao, "Developing a coordinated signal control system for urban ring road under the vehicle-infrastructure connected environment," IEEE Access, vol. 6, pp. 52471–52478, 2018.
[13] S. Zhang, J. Chen, F. Lyu, N. Cheng, W. Shi, and X. Shen, "Vehicular communication networks in the automated driving era," IEEE Communications Magazine, vol. 56, no. 9, pp. 26–32, 2018.
[14] Y. Wang, D. Zhang, Y. Liu, B. Dai, and L. H. Lee, "Enhancing transportation systems via deep learning: A survey," Transportation Research Part C: Emerging Technologies, 2018.
[15] G. Wu, F. Chen, X. Pan, M. Xu, and X. Zhu, "Using the visual intervention influence of pavement markings for rutting mitigation–part i: preliminary experiments and field tests," International Journal of Pavement Engineering, vol. 20, no. 6, pp. 734–746, 2019.
[16] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, and C. Rother, "Detecting unexpected obstacles for self-driving cars: Fusing deep learning and geometric modeling," in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 1025–1032.
[17] T. Qu, Q. Zhang, and S. Sun, "Vehicle detection from high-resolution aerial images using spatial pyramid pooling-based deep convolutional neural networks," Multimedia Tools and Applications, vol. 76, no. 20, pp. 21651–21663, 2017.
[18] D. Dooley, B. McGinley, C. Hughes, L. Kilmartin, E. Jones, and M. Glavin, "A blind-zone detection method using a rear-mounted fisheye camera with combination of vehicle detection methods," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 1, pp. 264–278, Jan 2016.
[19] X. Changzhen, W. Cong, M. Weixin, and S. Yanmei, "A traffic sign detection algorithm based on deep convolutional neural network," in 2016 IEEE International Conference on Signal and Image Processing (ICSIP), Aug 2016, pp. 676–679.
[20] S. Zhang, C. Bauckhage, and A. B. Cremers, "Efficient pedestrian detec-