Self-Driving Scale Car Trained by Deep Reinforcement Learning

Qi Zhang1 and Tao Du2

*This work was supported by the Beijing Municipal Commission of Education and North China University of Technology.
1 Qi Zhang is an undergraduate student in the School of Information Science and Technology, North China University of Technology, P.R. China. His research interests include machine learning, image processing and data science. zhangqi131072 at [Link]
2 Tao Du is an Assistant Professor in the School of Information Science and Technology, North China University of Technology, P.R. China. taodu at [Link]

Abstract— This paper considers the problem of a self-driving algorithm based on deep learning. This is a hot topic because self-driving is the most important application field of artificial intelligence. Existing work has focused on deep learning, which can learn end-to-end self-driving control directly from raw sensory data, but this method is just a mapping between images and driving commands. We prefer deep reinforcement learning: we train a self-driving car in a virtual simulation environment created in Unity and then migrate it to reality. Deep reinforcement learning gives the machine its own driving decision-making ability, much like a human driver. The virtual-to-real training method also handles the problem that reinforcement learning requires rewards from the environment, which would probably damage a real car. We derive a theoretical model and analysis of how to use deep Q-learning to control a car, carry out simulations in the Unity virtual environment to evaluate the performance, and finally migrate the model to the real world and realize self-driving.

I. INTRODUCTION

The automotive industry is a special industry: to keep passengers safe, any accident is unacceptable. Therefore, reliability and security must satisfy stringent standards, and the accuracy and robustness required of the sensors and algorithms in a self-driving vehicle are extremely high. On the other hand, self-driving cars are products for average consumers, so their cost needs to be controlled. High-precision sensors [22] can improve the accuracy of the algorithms but are very expensive. This is a difficult contradiction to resolve.

Recently, the rapid development of artificial intelligence, especially deep learning, has produced major breakthroughs in fields such as image recognition and intelligent control. Deep learning techniques, typically convolutional neural networks, are widely used in many kinds of image processing, which makes them suitable for self-driving applications. Researchers have used deep learning to build end-to-end self-driving cars whose core is to learn a mapping through a neural network under supervision and then replicate the demonstrated driving skills [23]. While end-to-end driving is easy to scale and adapt, it has limited ability to handle long-term planning, which follows from the nature of imitation learning [24,25]. We prefer to let scale cars learn to drive on their own rather than under human supervision, because this pattern-replication approach has many problems, especially on the sensing side: Tesla's traffic accidents were caused by the failure of the perception module in a bright-light environment. Deep reinforcement learning can still make appropriate decisions even when some modules stop working [21].

This paper focuses on self-driving based on deep reinforcement learning: we modify a 1:16 RC car and train it with a double deep Q-network. We use a virtual-to-reality process, which means training the car in a virtual environment and testing it in reality. To obtain a reliable simulation environment, we create a Unity simulation training environment based on OpenAI Gym. We set a reasonable reward mechanism and modify the double deep Q-learning network so that the algorithm is suitable for training a self-driving car. The car was trained in the Unity simulation environment for many episodes. In the end, the scale car learned a good policy to drive itself, and we successfully transferred the learned policy to the real world.

Fig. 1. The reinforcement learning Donkey car based on DDQN.

II. RELATED WORK

Our aim is to build a self-driving car trained by deep reinforcement learning. Right now, the most common methods for training a car to drive itself are behavioral cloning and line following. At a high level, behavioral cloning works by using a convolutional neural network to learn a mapping between car images (taken by the front camera) and steering angle and throttle values through supervised learning.
It is true that behavioral cloning methods based on end-to-end deep learning can accomplish the self-driving task efficiently. However, because each part of the network serves as both feature extractor and controller (for example, the fully connected layers take the features extracted by the convolutional layers and then output the steering control signals), the boundary between the feature-extractor layers and the controller layers is vague. Therefore, if we want to improve the adaptability of such a model, we have to keep adding data until all possible scenes encountered during driving are covered. Furthermore, the training data and test data are assumed to be independently and identically distributed; if the two distributions differ substantially (in other words, if the environment keeps changing), serious problems can result. The other method, line following, works by using computer vision techniques to track the middle line and a PID controller to make the car follow it. Aditya Kumar Jain used a CNN to build a self-driving car with a camera [7]. Kaspar Sakmann proposed a behavioral learning method [10], collecting human driving data through a camera and then learning to drive with a CNN, which is typical supervised learning. Kwabena Agyeman designed a car based on linear regression and blob tracking. However, all of these capabilities operate under manual intervention. We hope that cars can learn to drive by themselves, which is a more intelligent way.

In 1989, Watkins proposed the noted Q-learning algorithm. The algorithm is based on a Q-table that records the value of each state-action pair, and each episode updates these values. In 2013, Mnih et al. pioneered the concept of deep reinforcement learning [11] and successfully applied it to Atari games; in 2015 they improved the model [4]. Two identically structured networks are used in DQN: a behavior network and a target network. Although this improves the stability of the model, Q-learning's problem of overestimating values remains. To solve it, Hasselt proposed the Double Q-learning method, which, applied to DQN, gives Double DQN (DDQN) [6]. The idea of Double Q-learning is to implement the selection of actions and the evaluation of actions with different value functions.

Recently, the approach of training reinforcement learning models in virtual simulation and then migrating them to reality has been verified many times. OpenAI developed a robotic hand system called Dactyl [18] that is trained in a virtual environment and finally deployed on the physical robot. Later research verified the approach on tasks such as picking and placing objects [17], visual servoing [19] and agile locomotion [20], all indicating its feasibility. In 2019, Luo et al. proposed an end-to-end active target tracking method based on reinforcement learning, which trained a robust active tracker in a virtual environment through a custom reward function and environment augmentation techniques [13].

From the above work, we can infer that many visual autopilot algorithms learn a mapping through a neural network under supervised learning and then use it to control the car. However, this is unintelligent. Tesla's driverless accident was caused by a perception-module failure in a bright-light environment, whereas a car trained by reinforcement learning can cope even when some modules are invalid. Reinforcement learning also makes it easier to learn a range of behaviors. Automated driving requires a series of correct actions to drive successfully. If the car learns only from the dataset we labeled, the learned model will drift a little at every step and may drift a lot in the end. Reinforcement learning can learn to correct this drift automatically. The key to a true autonomous vehicle is self-learning; using more sensors does not solve the problem, it requires better coordination [21]. For these reasons, we use deep reinforcement learning to build our self-driving car.

III. PROPOSED METHOD

A. Self-driving scale car

Autonomous vehicles are usually composed of a traditional on-board sensing system, a computer decision system and a driving control system [1]. The sensing system captures the surrounding environment and the vehicle's driving state and provides information to the decision controller. According to the scope of perception, it can be divided into environmental perception and vehicle-state perception. The environmental information includes roads, pedestrians, obstacles, traffic control signals and the vehicle's geographic location; the vehicle information includes driving speed, gear position, engine speed, wheel speed, fuel level, etc. According to the implementation technology, the sensors can be divided into ultrasonic radar, video acquisition sensors and positioning devices [2].

In our experiment we only need visual data for sensing. We use an RC car as the base for retrofitting. The hardware we used includes:
• Raspberry Pi (Raspberry Pi 3): a low-cost computer with a 1.2 GHz processor and 1 GB of memory. It runs a customized Linux system, supports Bluetooth and WiFi communication, and provides I2C, GPIO and other interfaces. It is the computing brain of our self-driving car.
• PCA9685 (Servo Driver PCA9685): an I2C-controlled PWM driver with a built-in clock that drives the modified servo system.
• Wide-angle Raspberry Pi Camera: the resolution is 2592 x 1944 and the viewing angle is 160 degrees. It is our only environment-sensing device (our eyes).
• Other: following the design provided by the Donkey Car community, we 3D-printed a bracket for carrying the various hardware devices.

Fig. 2. One 1:16 scale car. There is an open-source DIY self-driving platform for small scale cars called donkeycar (visit [Link]).

B. Environment requirements

1) Donkey Car Simulator: The first step is to create a high-fidelity simulator for the Donkey Car. Fortunately, someone from the Donkey Car community has generously created a Donkey Car simulator in Unity. However, it is specifically designed for behavioral learning (i.e. it saves the camera images with the corresponding steering angles and throttle values to a file for supervised learning) and does not cater for reinforcement learning at all. What we expect is an OpenAI Gym-like interface where we can manipulate the simulated environment by calling reset() to reset it and step(action) to advance it. We made some modifications to make the simulator compatible with reinforcement learning. Since we are going to write our reinforcement learning code in Python, we first have to figure out a way for Python to communicate with the Unity environment. It turns out that the Unity simulator created by Tawn Kramer also comes with Python code for communicating with Unity. The communication is done through the WebSocket protocol; unlike HTTP, WebSocket allows bidirectional communication between server and client. In our case, our Python server can push messages directly to Unity (e.g. steering and throttle actions), and our Unity client can push information (e.g. states and rewards) back to the Python server.

2) Create a customized OpenAI Gym environment for Donkey Car: The next step is to create an OpenAI Gym-like interface for training reinforcement learning algorithms. Anyone who has trained reinforcement learning algorithms before is accustomed to using a standard set of API calls through which the agent interacts with the environment, most commonly reset(), step(), isgameover(), etc. We can customize our own Gym environment by extending the OpenAI Gym class and implementing these methods. The resulting environment is compatible with OpenAI Gym, and we can interact with the Donkey environment through the familiar Gym-like interface. The environment also lets us set frame skipping and train the agent in headless mode (i.e. without the Unity GUI). We thus have a virtual environment we can use. We take the pixel images captured by the front camera of the Donkey car and perform the following transformations (a sketch of this wrapper and preprocessing is given after the list):
• Resize the image from (120,160) to (80,80).
• Convert it to grayscale.
• Frame stacking: stack 4 frames from previous time steps together.
• The final state has dimension (1,80,80,4).
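To make this interface concrete, the following is a minimal sketch of what such a Gym-style wrapper and the preprocessing above could look like. The class name, the client object and its reset()/send_action() methods stand in for the WebSocket layer described earlier; they are illustrative assumptions, not the actual Donkey project code.

import gym
import numpy as np
import cv2
from collections import deque

class DonkeySimEnv(gym.Env):
    """Illustrative Gym-style wrapper around the Unity Donkey simulator."""

    def __init__(self, client, frame_skip=2):
        self.client = client                    # placeholder for the WebSocket client
        self.frame_skip = frame_skip
        self.frames = deque(maxlen=4)           # the 4 most recent preprocessed frames
        self.action_space = gym.spaces.Discrete(15)
        self.observation_space = gym.spaces.Box(0, 255, (1, 80, 80, 4), dtype=np.uint8)

    def _preprocess(self, rgb):
        gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)   # grayscale
        return cv2.resize(gray, (80, 80))              # (120,160) -> (80,80)

    def _state(self):
        # stack the 4 most recent frames into a (1, 80, 80, 4) tensor
        return np.stack(self.frames, axis=-1)[np.newaxis, ...]

    def reset(self):
        raw = self.client.reset()                      # placeholder call
        frame = self._preprocess(raw)
        for _ in range(4):
            self.frames.append(frame)
        return self._state()

    def step(self, action):
        reward, done = 0.0, False
        for _ in range(self.frame_skip):               # frame skipping
            raw, r, done = self.client.send_action(action)   # placeholder call
            reward += r
            if done:
                break
        self.frames.append(self._preprocess(raw))
        return self._state(), reward, done, {}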
C. Algorithm

1) The model of reinforcement learning: Figure 3 shows the elements and the process of reinforcement learning. The agent takes an action and interacts with the environment; the environment returns a reward and moves to the next state. Through repeated interactions, the agent accumulates experience and searches for the optimal strategy within that experience. This interactive learning process is similar to the way humans learn; its main features are trial and error and delayed return. The learning process can be represented by a Markov decision process, which consists of the tuple (S, A, P, r):

S = {s_1, s_2, s_3, ...}
A = {a_1, a_2, a_3, ...}
P_{ss'}^{a} = P[S_{t+1} = s' | S_t = s, A_t = a]
r = r(s, a)

S is the set of all states; A is the set of all actions; P_{ss'}^{a} is the state transition probability, i.e. the probability of moving from state s to state s' when the agent takes action a; r is the reward function, which gives the reward for taking action a in state s.

Fig. 3. The process of reinforcement learning.

In each round of interaction with the environment the agent forms a trajectory (s_1, a_1, r_1, s_2, a_2, r_2, ..., s_T, a_T, r_T), and the cumulative return at time t is:

R_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... = Σ_{k=0}^{∞} γ^k r_{t+k+1}   (1)

γ ∈ [0, 1] is the discount coefficient of the return and is used to weigh current returns against long-term returns; the higher the value, the more attention is paid to long-term returns, and vice versa. The goal of reinforcement learning is to learn a strategy that maximizes the expected cumulative return:

π(a|s) = argmax_a E[R]   (2)
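As a concrete illustration of Eq. (1) (not taken from the training code), the discounted return of a recorded reward sequence can be computed as:

def discounted_return(rewards, gamma=0.99):
    # rewards = [r_{t+1}, r_{t+2}, ...]; returns R_t of Eq. (1) for the chosen gamma
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# e.g. three future rewards with gamma = 0.9:
# discounted_return([1.0, 0.5, 0.2], gamma=0.9) == 1.0 + 0.9*0.5 + 0.81*0.2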
In order to solve for the optimal strategy, the value function and the action-value function are introduced to evaluate how good a certain state or action is. The value function is defined as:

V_π(s) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s ]   (3)

and the action-value function is defined as:

Q_π(s, a) = E_π[ Σ_{k=0}^{∞} γ^k r_{t+k+1} | s_t = s, a_t = a ]   (4)

Methods for solving the value function and the action-value function are either tabular methods or approximation methods based on value functions [3]. Traditional dynamic programming, Monte Carlo and temporal-difference (TD) algorithms are all tabular methods: the essence is to maintain a table of Q(s, a), with states as rows and actions as columns, that is continuously updated by iterative computation. When the state space is relatively small this is completely feasible, but when the state space is large the tabular approach is not. Fitting the action-value function with the approximating ability of a deep neural network, so that Q(s, a) ≈ Q(s, a, w), has therefore become a current research hot spot.
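For reference, a minimal sketch of the tabular update described above (Watkins' Q-learning rule); the table sizes and hyper-parameters are illustrative only:

import numpy as np

n_states, n_actions = 100, 15          # illustrative sizes
Q = np.zeros((n_states, n_actions))    # the Q(s, a) table
alpha, gamma = 0.1, 0.9                # learning rate and discount factor

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])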
In 2013, DeepMind introduced the famous DQN algorithm [4], which opened a new era of deep reinforcement learning. The algorithm uses a convolutional neural network to approximate the action-value function and takes the raw screen pixels as input to directly learn Atari game strategies. At the same time, it makes use of the experience replay mechanism [5]: training samples are stored in a memory pool, and a fixed amount of data is randomly sampled each time to train the neural network, which removes the correlation between training samples and improves the stability of training.

2) Self-driving algorithm based on DDQN: With a friendly reinforcement learning training environment in place, we use a reinforcement learning algorithm as our control algorithm for automatic driving. We chose the DDQN algorithm because it is relatively simple to implement. We now introduce this method and how to apply it to the autopilot model.

In the DQN algorithm, the authors proposed an approximate representation of the value function [11], which solves the problem that the state space is too big to enumerate. The state value function is introduced as

v̂(s, w) = φ(s)^T w   (5)

and a neural network is used as the value function. However, this does not guarantee the convergence of the Q-network: the parameters may fail to converge, resulting in an inferior trained model. To address this, the double deep Q-learning network proposed by Hasselt [6] eliminates overestimation by decoupling the selection of the action for the target Q value from the calculation of the target Q value.

The double deep Q-learning network has two Q-network structures, like the deep Q-learning network. It no longer finds the maximum Q value over actions directly in the target Q-network, but first finds the action corresponding to the maximum Q value in the current Q-network:

a_max(S'_j, w) = argmax_a Q(φ(S'_j), a, w)   (6)

Then it uses the selected action a_max(S'_j, w) to calculate the target Q value:

y_j = R_j + γ Q'(φ(S'_j), a_max(S'_j, w), w')   (7)

Putting them together:

y_j = R_j + γ Q'(φ(S'_j), argmax_a Q(φ(S'_j), a, w), w')   (8)

Therefore, there is no difference between the procedures of DDQN and DQN except the way the target Q value is calculated.
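The decoupling in Eqs. (6)-(8) is the only change relative to DQN. A small NumPy sketch of the target computation (our illustration; q_online and q_target stand for the current and target networks, each mapping a batch of states to per-action Q-values):

import numpy as np

def ddqn_targets(rewards, next_states, dones, q_online, q_target, gamma=0.99):
    # Eq. (6): select the greedy action with the *online* network
    a_max = np.argmax(q_online(next_states), axis=1)
    # Eq. (7): evaluate that action with the *target* network
    next_q = q_target(next_states)[np.arange(len(a_max)), a_max]
    # Eq. (8): y_j = R_j + gamma * Q'(s'_j, a_max; w'), with no bootstrap at episode end
    return rewards + gamma * next_q * (1.0 - dones)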
Both the Donkey car in the real world and the Donkey car in the simulator take continuous steering and throttle values as input. For simplicity's sake, we set the throttle to a constant value (0.7) and only control the steering. The steering value ranges from -1 to 1. However, DQN can only handle discrete actions, so we discretize the steering value into 15 categorical bins.

The reward is a function of the cross-track error (cte), which is provided by the Unity environment and measures the distance between the center of the track and the car. Our shaped reward is given by the following formula:

reward = 1 − abs(cte) / cte_max   (9)

where cte_max is a normalizing constant chosen so that the reward stays within the range 0 to 1. We terminate the episode if abs(cte) is larger than cte_max.
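A small sketch of how the action discretization and the shaped reward of Eq. (9) can be implemented (the value of cte_max below is an assumed placeholder; only cte itself comes from Unity):

import numpy as np

N_BINS = 15
STEERING_BINS = np.linspace(-1.0, 1.0, N_BINS)   # 15 discrete steering values
THROTTLE = 0.7                                   # throttle held constant
CTE_MAX = 2.0                                    # assumed normalizing constant

def action_to_controls(action_index):
    # map a discrete DQN action index to (steering, throttle)
    return STEERING_BINS[action_index], THROTTLE

def shaped_reward(cte):
    # Eq. (9); the episode terminates once |cte| exceeds cte_max
    done = abs(cte) > CTE_MAX
    reward = 0.0 if done else 1.0 - abs(cte) / CTE_MAX
    return reward, done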
Fig. 4. The architecture of the network. The first layer convolves the input image with an 8x8x4x32 kernel at a stride of 4, and the output is put through a 2x2 max-pooling layer. The second layer convolves with a 4x4x32x64 kernel at a stride of 2, followed by another 2x2 max pool. The third layer convolves with a 3x3x64x64 kernel at a stride of 1, followed by one more max pool. The last hidden layer consists of 256 fully connected ReLU nodes.
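A possible implementation of this architecture, sketched with tf.keras (the padding scheme, optimizer and loss are our assumptions; the layer sizes follow the caption, with 15 outputs for the 15 steering bins):

from tensorflow.keras import layers, models

def build_q_network(n_actions=15, input_shape=(80, 80, 4)):
    # Layer sizes follow Fig. 4; "same" padding is assumed so the shapes work out.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (8, 8), strides=4, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, (4, 4), strides=2, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, (3, 3), strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(n_actions, activation="linear"),  # one Q-value per steering bin
    ])
    model.compile(optimizer="adam", loss="mse")
    return model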
Frame skipping is set to 2 to stabilize training. The replay memory buffer (storing <state, action, reward, next_state> tuples) has a capacity of 10,000. The target Q-network is updated at the end of each episode. The batch size for training the CNN is 64. Epsilon-greedy exploration is used: epsilon is initially set to 1 and is gradually annealed to a final value of 0.02 over 10,000 time steps.
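These settings can be captured in a few lines (a sketch under the stated hyper-parameters; the variable names are ours):

import random
from collections import deque

REPLAY_CAPACITY = 10_000
BATCH_SIZE = 64
EPS_START, EPS_FINAL, EPS_DECAY_STEPS = 1.0, 0.02, 10_000
FRAME_SKIP = 2

replay_buffer = deque(maxlen=REPLAY_CAPACITY)   # (state, action, reward, next_state, done) tuples

def epsilon_at(step):
    # linear annealing from 1.0 to 0.02 over the first 10,000 steps
    frac = min(step / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_FINAL - EPS_START)

def sample_batch():
    # uniform sampling once enough transitions have been stored
    return random.sample(replay_buffer, BATCH_SIZE)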

IV. EXPERIMENT
A. Simulation
Essentially, we want our reinforcement learning agent to base its output decision (i.e. steering) only on the location and orientation of the lane lines and to ignore everything else in the background. However, since we give it the full pixel camera images as input, it might overfit to the background patterns instead of recognizing the lane lines. This is especially problematic in real-world settings, where there might be undesirable objects lying next to the track (e.g. tables and chairs) and people walking around it. If we ever want to transfer the learned policy from the simulation to the real world, we should get the agent to ignore the background noise and focus only on the track lines.

To address this problem, we create a pre-processing pipeline that segments out the lane lines from the raw pixel images before feeding them into the CNN. The procedure is as follows (a sketch of the pipeline is given after the list):
• Detect and extract all edges using the Canny edge detector.
• Identify straight lines through the Hough line transform.
• Separate the straight lines into positively and negatively sloped ones (candidates for the left and right lines of the track).
• Reject all straight lines that do not belong to the track, using the slope information.
The resulting transformed images consist of 0 to 2 straight lines representing the lane, as illustrated in Figure 5.
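A compact OpenCV sketch of this pipeline (the Canny and Hough thresholds and the slope cut-off are illustrative values, not the exact ones used in our experiments):

import cv2
import numpy as np

def segment_lane_lines(gray):
    # Canny edges -> Hough line segments -> keep plausible left/right lane candidates
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=30,
                            minLineLength=20, maxLineGap=10)
    mask = np.zeros_like(gray)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            if x2 == x1:            # skip vertical segments (undefined slope)
                continue
            slope = (y2 - y1) / (x2 - x1)
            # reject nearly horizontal lines; keep left (negative) / right (positive) candidates
            if abs(slope) > 0.3:
                cv2.line(mask, (x1, y1), (x2, y2), 255, 2)
    return mask   # at most a few segments drawn on a blank image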
Fig. 5. Examples of raw camera images transformed into segmented images.

Fig. 6. The scale car in the Unity simulation.

We then took the segmented images, resized them to (80,80), stacked 4 successive frames together and used the result as the new input state, and trained DDQN again with the new states. The resulting RL agent was again able to learn a good policy to drive the car. With the setup above, we trained DDQN for around 100 episodes on a single CPU and a GTX 1080 GPU; the entire training took around 2 to 3 hours. The car learned a fairly good policy to drive itself and stayed at the center of the track most of the time.

B. Simulation to reality

We customized a 3.5 m x 4 m physical track. The track reproduces the Unity environment closely and resembles a real road (laid out according to the Chinese standard of driving on the right). We modified the program to switch the trained model's input from Unity's output to the camera's real-time input and then transferred the program to the Raspberry Pi. The good news is that our car successfully followed the track after several experiments. In order to improve the rate of convergence, we used a new reward function:

reward = abs(cte_prev) − abs(cte)   (10)

With this reward the agent converged to a good policy in 30 episodes, compared with 100 episodes for the reward above. Furthermore, we added an obstacle on the road to increase the level of challenge; after training with the improved reward function, the self-driving car bypassed the obstacle successfully. The experimental results demonstrate the feasibility of our method: training a self-driving scale car with the DDQN algorithm in the Unity simulator and transferring it to reality.
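The improved reward of Eq. (10) is a one-liner that rewards progress back toward the track center between consecutive steps (our illustration):

def progress_reward(cte, cte_prev):
    # Eq. (10): positive when |cte| shrinks, negative when the car drifts outward
    return abs(cte_prev) - abs(cte)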
Fig. 7. The road for the self-driving scale car, which contains two fast curves and two gentle curves.

Fig. 8. The trained self-driving scale car. The first image shows the car meeting a sharp turn; in the second and third images the car is in an "S" curve; the fourth image shows a straight road.

Fig. 9. An obstacle is added on the road. The car's point of view is shown in the lower left of the figure.

V. CONCLUSION AND DISCUSSION

In this paper, we propose a method that uses a double deep Q-learning network to set up a self-driving model requiring only one camera, trains it in Unity, and then transfers it to reality. We call this trick "sim-to-real". Through this experiment, we show that the strategy of training an automatic scale car in a virtual environment is practicable. If we wanted to train a self-driving car by reinforcement learning directly in the real world, collecting the necessary reward signal would almost certainly damage the car; training in the virtual environment avoids this, since vehicle damage costs nothing there. Besides, because the virtual environment can contain all possible driving conditions, the trained car possesses better robustness.

Although the trained self-driving scale car achieves the goal of autonomous driving, the learned policy was also less stable and the car wriggled frequently, especially when making turns. From our analysis, this is because we threw away useful background information and line-curvature information. In return, the agent should be less prone to overfitting and may even generalize to unseen and real-world tracks. Closing the reality gap is no easy task. To address this issue, our next step may adopt some sim-to-real tricks involving domain randomization (e.g. randomizing the width, color and friction of the track, adding shadows, randomizing throttle values, etc.) so that the learned policy is robust enough to be deployed to the real world. We will also train the car to maximize speed. Right now the reinforcement learning agent only generates the steering output, with the throttle value held fixed. The next step will be to have the agent learn to output a throttle value as well, in order to optimize vehicle speed; for example, it should learn to increase the throttle when the vehicle is driving straight and decrease it when the vehicle is making sharp turns. To achieve this, we need to further shape the reward with the vehicle velocity.

VI. ACKNOWLEDGEMENT

The authors would like to thank Tawn Kramer and Mr. Felix for creating a high-fidelity Unity simulator for the Donkey car. What we did was modify their existing code to make it compatible with reinforcement learning. We also want to thank the Donkey car community for initiating this wonderful project for us to learn about self-driving.
REFERENCES
[1] Janai, J., Güney, F., Behl, A., et al. "Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art." 2017.
[2] Abraham, Hillary, Chaiwoo Lee, Samantha Brady, Craig Fitzgerald, Bruce Mehler, Bryan Reimer, and Joseph F. Coughlin. "Autonomous vehicles, trust, and driving alternatives: A survey of consumer preferences." Massachusetts Inst. Technol., AgeLab, Cambridge (2016): 1-16.
[3] Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[4] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529.
[5] Lin, Long-Ji. Reinforcement Learning for Robots Using Neural Networks. No. CMU-CS-93-103. Carnegie Mellon Univ., Pittsburgh, PA, School of Computer Science, 1993.
[6] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double Q-learning." Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[7] Jain, Aditya Kumar. "Working model of self-driving car using Convolutional Neural Network, Raspberry Pi and Arduino." 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). IEEE, 2018.
[8] Santana, Eder, and George Hotz. "Learning a driving simulator." arXiv preprint arXiv:1608.01230 (2016).
[9] [Link]
[10] [Link] drive-like-yourself-dc6021152713
[11] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
[12] Watkins, Christopher John Cornish Hellaby. "Learning from delayed rewards." (1989).
[13] Luo, Wenhan, et al. "End-to-end active object tracking and its real-world deployment via reinforcement learning." IEEE Transactions on Pattern Analysis and Machine Intelligence (2019).
[14] James, Stephen, Andrew J. Davison, and Edward Johns. "Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task." arXiv preprint arXiv:1707.02267 (2017).
[15] Andrychowicz, Marcin, et al. "Learning dexterous in-hand manipulation." arXiv preprint arXiv:1808.00177 (2018).
[16] Sadeghi, Fereshteh, et al. "Sim2Real view invariant visual servoing by recurrent control." arXiv preprint arXiv:1712.07642 (2017).
[17] Tan, Jie, et al. "Sim-to-real: Learning agile locomotion for quadruped robots." arXiv preprint arXiv:1804.10332 (2018).
[18] Kendall, Alex, et al. "Learning to drive in a day." arXiv preprint arXiv:1807.00412 (2018).
[19] Dörr, Dominik, David Grabengiesser, and Frank Gauterin. "Online driving style recognition using fuzzy logic." 17th International IEEE Conference on Intelligent Transportation Systems (ITSC). IEEE, 2014.
[20] Bojarski, Mariusz, et al. "End to end learning for self-driving cars." arXiv preprint arXiv:1604.07316 (2016).
[21] Codevilla, Felipe, et al. "End-to-end driving via conditional imitation learning." 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018.
[22] Shalev-Shwartz, Shai, Shaked Shammah, and Amnon Shashua. "Safe, multi-agent, reinforcement learning for autonomous driving." arXiv preprint arXiv:1610.03295 (2016).
