This README is not finished yet. Some of the Exper_Data used for training is too large to upload; you can collect it yourself.
Example: Multi-AGV in a robotic mobile fulfillment system
This project consists of two main parts: a multi-AGV learning environment and several reinforcement learning (RL) algorithms, including well-known existing algorithms as well as our own novel algorithm.
1. The multi-AGV learning environment consists of three classes: the Layout class, which manages the layout of the scene; the Explorer class, which manages the AGVs; and the Scene class, a container that holds both the layout and the AGVs so that the AGVs can operate in a specific layout (a rough sketch follows this list).
2. For the RL algorithms, we have implemented several well-known approaches such as Deep Q-Network (DQN), Double Deep Q-Network (DDQN), Actor-Critic (AC), and Policy Gradient (PG), as well as our own novel algorithm, A* guiding DQN (AG-DQN).
- To solve the pathfinding problem on a 2D map, we use a convolutional neural network and a specifically designed state representation.
- To enhance training, we apply techniques such as Behavioral Cloning, Sparse Reward, and Limited Visual.
3. The figure on the right shows the structure of this project.
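A rough, hypothetical sketch of how the three classes fit together (the constructor arguments are illustrative only; the actual interfaces are in the project source):

class Layout:
    """Holds the grid of tracks, storage stations and picking stations."""
    def __init__(self, layout_list):
        self.grid = layout_list  # 0: track, 1: storage station, 2: picking station

class Explorer:
    """Represents a single AGV with its current and target locations."""
    def __init__(self, start, target):
        self.position = start
        self.target = target

class Scene:
    """Container that lets a set of AGVs operate inside a specific layout."""
    def __init__(self, layout, explorers):
        self.layout = layout
        self.explorers = explorers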
This project was created by LeiLuo under the supervision of Professor Zhao Ning from the University of Science and Technology Beijing (Beijing, China).
- The figure shows an RMFS made by Quicktron Robots.
- The components of an RMFS include AGVs (which transfer shelves), shelves (which store goods), tracks (on which the AGVs move), a picking station (where workers pick goods), and a charging room (where the AGVs recharge).
- The goal of the RMFS is to have AGVs transfer shelves to the picking station, where workers can correctly select, package, and deliver the goods.
- The aim of the AGVs is to transfer the necessary shelves to the picking station and return them to their original location once the workers have finished picking the goods.
- AGVs follow a special rule: when they are empty, they can travel under the shelves to shorten their path; when they are carrying a shelf, they are only allowed to move on the tracks.
- The objective of this project is to develop effective methods to guide the AGVs in finding the shortest and safest path to complete a large number of tasks without collisions.
Tips:
- Unlike other pathfinding scenes, the RMFS is unique in that it is full of obstacles (shelves) that challenge the AGVs, and the paths are often too narrow to accommodate two AGVs at once.
- Our method is not limited to the RMFS scenario; it can also be applied to other pathfinding problems built with a similar grid-based approach.
Run src.main.py to start this project.
Notes:
The red block represents a storage station, the gray block represents a picking station, and the white block represents the track.
The green block represents the current target location, and the pink block indicates that the task at the current target location has been finished.
The left figure shows construction method 2.1; the right figure shows construction method 2.2.
2.1 Create a rectangular layout by entering 5 parameters
Adjust the following parameters to create a rectangular layout:
ss_x_width, ss_y_width, ss_x_num, ss_y_num, ps_num = 4, 2, 2, 2, 2
# ss_x_width: The number of storage stations in the x-axis direction of a storage station island
# ss_y_width: The number of storage stations in the y-axis direction of a storage station island
# ss_x_num: The number of storage station islands in the x-axis direction
# ss_y_num: The number of storage station islands in the y-axis direction
# ps_num: The number of picking stations
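For illustration, here is a hedged sketch of one plausible way these five parameters could expand into a grid (islands of storage stations separated by one-cell-wide tracks, with a border track); the project's actual builder may place tracks and picking stations differently:

# Hedged illustration only: expand the five parameters into a layout grid
# (0: track, 1: storage station, 2: picking station).
ss_x_width, ss_y_width, ss_x_num, ss_y_num, ps_num = 4, 2, 2, 2, 2
width = ss_x_num * (ss_x_width + 1) + 1   # islands plus one-cell tracks and a border
height = ss_y_num * (ss_y_width + 1) + 1
layout_list = [[0] * width for _ in range(height)]
for iy in range(ss_y_num):                # fill each storage station island with 1s
    for ix in range(ss_x_num):
        for dy in range(ss_y_width):
            for dx in range(ss_x_width):
                layout_list[1 + iy * (ss_y_width + 1) + dy][1 + ix * (ss_x_width + 1) + dx] = 1
for p in range(ps_num):                   # picking stations along the bottom row (illustrative placement)
    layout_list[height - 1][1 + 2 * p] = 2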
2.2 Create a special layout by entering 'layout_list'
Enter a list to create a special layout
layout_list = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 0, 0, 2, 0, 0, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]
# 0: track, 1: storage station, 2: picking station.
# Note: Any storage station needs to be connected to a track in at least one direction.
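A small sketch that checks this rule, assuming layout_list is a rectangular list of lists as shown above (this helper is not part of the project API):

# Verify that every storage station (1) touches a track cell (0) in at least one direction.
def check_layout(layout_list):
    rows, cols = len(layout_list), len(layout_list[0])
    for y in range(rows):
        for x in range(cols):
            if layout_list[y][x] != 1:
                continue
            neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
            if not any(0 <= ny < rows and 0 <= nx < cols and layout_list[ny][nx] == 0
                       for ny, nx in neighbours):
                raise ValueError(f"Storage station at ({x}, {y}) has no adjacent track")
    return True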
Select the control mode by changing the parameter 'control_mode':
control_type = {0: "train_NN", 1: "use_NN", 2: "A_star", 3: "manual", 4: "Expert"}
control_mode = 3
3.1 Manual mode
Control an AGV with the keyboard. (In this mode, you can control only a single AGV.)
control_mode = 3
3.2 A* mode
Control the AGVs with the A* algorithm.
control_mode = 2
3.3 Training mode
Train a neural network to guide the AGVs. (The RL algorithms are discussed in the next section.)
control_mode = 0
3.4 RL mode
Control the AGVs with a trained neural network.
control_mode = 1
3.5 Expert mode
Collect expert experience by using the A* algorithm to control the AGVs.
control_mode = 4
We provide three algorithms: PG, AC, and DQN.
Import the corresponding package to try each algorithm; the control mode should be set to "Training mode" (control_mode = 0).
# from algorithm.AC_structure.Controller import ACAgentController as modelController
from algorithm.PG_structure.Controller import PGAgentController as modelController
# from src.algorithm.MADQN_structure.Controller import MADQNAgentController as modelController
Tips: You can read about the logic of these algorithms online, so we won't go into details here.
1.1 We use three matrices to construct the state: Valid_Location_Matrix, Current_Location_Matrix, and Target_Location_Matrix (see the sketch after this list).
1.2 Valid_Location_Matrix describes which blocks (marked as 1) the AGV can access.
1.3 Current_Location_Matrix describes which block (marked as 1) is the AGV's current location.
1.4 Target_Location_Matrix describes which block (marked as 1) is the AGV's target location.
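A minimal NumPy sketch of this three-matrix state for a single AGV; the stacking order and the accessibility rule (only track cells marked as accessible) are assumptions for illustration:

import numpy as np

def build_state(layout_list, agv_pos, target_pos):
    grid = np.array(layout_list)
    valid_location = (grid == 0).astype(np.float32)    # 1 where the AGV can access (track cells here)
    current_location = np.zeros_like(valid_location)
    current_location[agv_pos] = 1.0                     # 1 at the AGV's current block
    target_location = np.zeros_like(valid_location)
    target_location[target_pos] = 1.0                   # 1 at the AGV's target block
    return np.stack([valid_location, current_location, target_location])  # shape (3, H, W)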
Action space includes five actions: up, right, down, left and stop. Each action corresponds to a number.
You can look them up in src.utils.utils.py
str_value = {"UP": 0, "RIGHT": 1, "DOWN": 2, "LEFT": 3, "STOP": 4}
value_str = {0: "UP", 1: "RIGHT", 2: "DOWN", 3: "LEFT", 4: "STOP"}
Tips: If there is only one AGV, then action "stop" is unnecessary.
3.1 Depending on the result of an action, the agent receives one of three rewards: a positive reward (+1), a negative reward (-1), or a normal reward (0).
3.2 When the AGV reaches its destination, it gets the positive reward; when it hits an obstacle or leaves the scene, it gets the negative reward; in all other cases, it gets the normal reward.
Tips: This is a typical sparse reward problem. We use Reward Reshaping to improve training performance.
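A hedged sketch of the three-valued reward described in 3.1-3.2; the outcome flags are hypothetical names rather than the project's actual API:

def compute_reward(reached_target, hit_obstacle, out_of_scene):
    if reached_target:
        return 1.0    # positive reward
    if hit_obstacle or out_of_scene:
        return -1.0   # negative reward
    return 0.0        # normal reward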
Given the matrix form of the state, a convolutional neural network is used.
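Because the state is a stack of three location matrices, a small convolutional network can map it to the five action values. The sketch below assumes PyTorch and an illustrative architecture; the network actually used in the project may differ:

import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, grid_h, grid_w, n_actions=5):
        super().__init__()
        # 3 input channels: valid / current / target location matrices
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * grid_h * grid_w, 256), nn.ReLU(),
            nn.Linear(256, n_actions),              # one Q-value per action
        )

    def forward(self, state):                       # state: (batch, 3, grid_h, grid_w)
        return self.head(self.conv(state))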
1.1 This is the most efficient method we have found to greatly improve the training performance of the DQN algorithm on the AGV pathfinding problem.
1.2 The core change is to replace the random exploration in the exploration strategy with actions suggested by the A* algorithm (see the sketch below).
Tips: We will add some figures of the training process later.
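A hedged sketch of the core idea in 1.2: inside an epsilon-greedy policy, the exploratory action comes from an A* planner instead of a uniform random draw. The a_star_next_action helper is hypothetical and stands in for the project's A* expert:

import random
import torch

def select_action(q_network, state, epsilon, a_star_next_action):
    if random.random() < epsilon:
        return a_star_next_action(state)            # guided exploration: follow the A* suggestion
    with torch.no_grad():
        q_values = q_network(state.unsqueeze(0))    # greedy exploitation of the learned Q-values
        return int(q_values.argmax(dim=1).item())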
2.1 Behavioral Cloning is a specific method of imitation learning.
2.2 Simply put, Behavioral Cloning improves the utilization of data and thereby speeds up neural network training. Because the neural network is initialized completely at random, letting the agent interact with the environment from the very beginning yields little useful experience and makes the network difficult to optimize. Instead, we use expert experience to pre-train the neural network, and then let the agent interact with the environment to continue optimizing it (see the sketch after this list).
2.3 The A* algorithm is used as an expert for the pathfinding problem.
2.4 Behavioral Cloning performs well with the PG and AC algorithms.
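A minimal sketch of the pre-training step described in 2.2-2.3: fit the policy network to (state, action) pairs collected from the A* expert (e.g. via the "Expert" control mode) before switching to environment interaction. The data layout and training loop are assumptions for illustration:

import torch
import torch.nn as nn

def pretrain_with_expert(policy_net, expert_states, expert_actions, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = policy_net(expert_states)          # (N, 5) action scores
        loss = loss_fn(logits, expert_actions)      # imitate the expert's chosen actions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy_net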
Here are some pages where you can learn more about Behavioral Cloning. They are all in Chinese; you can search online for relevant English materials.
Behavioral Cloning, Code
Tips: We will add some figures of the training process later.
It is not sufficient for solving multi-AGV pathfinding in large scenes; it only works with small scenes and a few AGVs.
If you find our project helpful, please cite our paper related to this project:
[1] Luo L, Zhao N, Zhu Y, et al. A* guiding DQN algorithm for automated guided vehicle pathfinding problem of robotic mobile fulfillment systems[J]. Computers & Industrial Engineering, 2023, 178: 109112.