Optimization of Container Relocation Problem Via Reinforcement Learning
2195/lj_Proc_wei_en_202112_02
URN: urn:nbn:de:0009-14-54466
Lei Wei
Fuyin Wie
Sandra Schmitz
Kunal Kunal
Bernd Noche
This paper presents an optimization method for the Container Relocation Problem (CRP) using Reinforcement Learning (RL) based on heuristic rules. A method for calculating the theoretically lowest relocation rate is also explained. As a result, trained models for bays of different dimensions are provided. Compared with the theoretical value, the relocation rate is satisfactory and the inference speed is high. In addition, an extended version of the CRP that covers an entire container block is presented.
[Keywords: Container Relocation Problem; Block Relocation Problem; Reinforcement Learning; ML-Agents]
1 INTRODUCTION
With the increase in global container trade, efficient transshipment of terminal containers is essential. Intelligent container relocation in an inland container terminal or port is important for improving performance measures such as task completion time, energy consumption, container rehandling rate and the operating efficiency of a terminal.
[...] which is not topmost of the stack. These unproductive moves performed by the yard cranes should be minimized to improve the terminal efficiency.
Figure 1. Container yard structure and terms (figure from [ZF12]) with coordinate directions
For this purpose, a Container Relocation Problem (CRP) (also known as Block Relocation Problem (BRP) [GBJ18]) is considered in this article.
After a brief literature review in Section 2, we describe our approach to solving the CRP for a 2-dimensional stacking area, i.e., one bay, in Section 4. We use the ML-Agents toolkit from Unity to implement Reinforcement Learning based on heuristic rules to solve the CRP. We present the experimental results at the end of
that section of our approach and a comparison to other existing approaches.
In common multimodal container terminals, relocations are not limited to one bay of a container block. The bays of one block are not independent of each other, but can be served by crane movements along the x-axis. We address this issue in Section 6 by briefly previewing an extension of the CRP that aims to optimize a 3-dimensional stacking area, i.e., an entire block rather than one bay. Our main goal is to minimize the average operation (retrieval, relocation and stacking) time of containers. In addition, a simulation system was developed with Unity Engine to visualize the process.
2 PRIOR RESEARCH
2.1 THE CONTAINER RELOCATION PROBLEM
[CVR14] classified storage yard operations in container terminals, such as storage space assignment to containers, yard crane scheduling, routing of vehicles within the terminal and optimization of relocation operations at the storage blocks, by reviewing scientific journal papers published between 2004 and 2012. [KE21] extended this classification by adding recent research papers.
In most CRP papers, the objective is to find an optimal sequence of crane movements that retrieves all containers from a bay according to a predefined retrieval sequence, such that the number of movements is minimized [CSV20]. However, there are also approaches that focus on other optimization goals, such as minimizing the crane’s working time associated with any movement like relocation or retrieval of containers [LL10], [FB12], [SAT19].
As the CRP is known to be NP-hard [CSV12], only small instances can be solved with exact methods in reasonable time. Therefore, several heuristic approaches can be found in the literature. For a comprehensive literature survey of the CRP and the various exact and heuristic solution methods that have been applied to it, we refer to [SAT19], [MGM20], [CSV20].
Proximal Policy Optimization (PPO) is a class of reinforcement learning algorithms that perform comparably to or better than other modern approaches such as TRPO (Trust Region Policy Optimization) while being much simpler to implement and tune [SWDRK17].
Jeffrey Elman [Elm93] proposed the idea of training a learning machine with a curriculum back in 1993. Bengio et al. [BLCW09] presented a summary of curriculum learning in 2009. They proposed curriculum learning as a method for a stepwise progression of the complexity of the data samples used during the training process.
In Section 4 we use PPO as the training algorithm and apply curriculum learning to accelerate the training process.
2.3 OTHER ALGORITHMS
• Iterative Deepening A* (IDA*) algorithm [ZQL12] [LZL20]: Zhu et al. developed IDA* algorithms for the unrestricted CRP. Using a dominance property they derived, the algorithms take advantage of two new lower bounds and several probe heuristics. Successive target containers can be retrieved as long as they are on top of their respective stacks at the time of retrieval, until the minimum equivalent layout is reached.
• Genetic Algorithm (GA) [MGM20] [SEE15]: Said and El-Horbaty propose an optimization methodology for solving the CRP using a genetic algorithm. The computational results show the effectiveness of the proposed methodology for container terminals. GA is widely applied because of its ability to search the global solution space for the optimal solution.
• Beam Search [WT10]: Beam Search (BS) is a heuristic search algorithm based on a breadth-first branch-and-bound algorithm. The term "beam search" was coined by Raj Reddy in 1977.
3 PROBLEM DESCRIPTION
3.1 CONTAINER RELOCATION PROBLEM (CRP)
The goal of the CRP is to minimize the number of relocations (or the relocation rate) during the container retrieval process. Researchers use priority labels to identify the retrieval sequence of containers. The container with the smallest label number is retrieved first.
Static / Dynamic CRP: If no new containers are stacked on the bay during the retrieval process, the problem is called static CRP; otherwise it is called dynamic CRP. This paper mainly addresses the static CRP. Since the crane needs to serve the whole block, the dynamic CRP within one bay is usually not considered.
Restricted / Unrestricted CRP: The CRP is restricted if relocations are only allowed for the blocking containers above the container with the highest priority. Otherwise it is unrestricted, which means the unrestricted CRP is a superset of the restricted CRP. Generally, the unrestricted CRP achieves a lower relocation rate, and its corresponding algorithm is more complex than that of the restricted CRP.
Stochastic CRP: The retrieval sequence is not fully known. For instance, if several containers are to be stacked on a train, the retrieval sequence among them is not important as long as the corresponding containers are stacked at the correct position [BMBJ13], [GMB18].
CRP in block: In reality, relocations can happen in the whole or part of the container yard, which is defined as a block (Figure 1). In this scenario, the relocation rate can no longer be the only evaluation criterion of the problem; instead, several new criteria have been introduced, such as the average operation time per container and the average waiting time per truck [FHVX13]. Furthermore, the types described above can be combined in this scenario.
3.2 RULES FOR UNRESTRICTED STATIC CRP
We consider the following common properties of the CRP:
• The crane performs an operation (retrieval or relocation, no stacking) on only one container at a time.
• Only the topmost container of a stack can be operated on (relocation or retrieval).
• All containers have the same size.
• The relocation within the bay is limited [CVS11], which means no repeated operation is allowed.
• The operations happen only in one bay.
• The containers have unique predefined priorities; no two containers have the same priority.
• The bay should never be full.
• No new container will be stacked during the retrieval process.
• A relocation can happen between any two stacks as long as it is possible; e.g., a relocation is impossible if the target stack is full.
4 OUR APPROACH
4.1 TOOL INTRODUCTION
Unity Engine: Unity Engine is a 3D real-time engine for simulation and game development. With a view to the future implementation of Digital Twins (real-time crane control), Unity Engine was chosen to build the host computer application.
ML-Agents: The Reinforcement Learning toolkit ML-Agents from Unity is used as the training toolkit. ML-Agents uses the PPO algorithm by default. Several learning strategies are also supported by this toolkit, such as curriculum learning, imitation learning and behavioral cloning [Uni21]. In this paper, we used curriculum learning to accelerate the training process.
4.2 RL TRAINING
This section introduces our training method. The training is designed only for relocation, since the retrieval process does not need to be trained and should be determined before any relocation decision is made.
ML-Agents uses the term “episode”. In this context, it means the period from the initialization of a new bay layout to the retrieval of all containers in that layout. With this concept, the reward is accumulated during the operation process and reset when the episode ends, so that the rewards refer to the whole episode rather than to each step.
4.2.1 OBSERVATIONS OF RL
The observation structure is shown below.
Observation
    Stack Info []
        (Dim-Z) Hot Encoding []
        (1) Z-index
        (1) Can pick up
        (1) Can stack
        (1) Blocking Degree
        (MaxTier * (2 + MaxTier)) Container Info []
Container Info
    (MaxTier) Hot Encoding []
    (1) Whether moveable
    (1) Priority
Figure 3. Observation structure. Each square bracket contains a list; the number in parentheses is the size of the object.
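The layout in Figure 3 can be illustrated with a minimal Python sketch that flattens one stack into an observation vector. Counting the sizes in Figure 3 gives dimZ + 4 + maxTier * (2 + maxTier) values per stack. The function names, the zero padding for empty tiers, and the exact meaning of "can pick up" (stack not empty), "can stack" (stack not full) and "moveable" (topmost container) are assumptions made for illustration; they are not taken from the authors' Unity implementation.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def container_info(tier: int, max_tier: int, movable: bool, priority: int) -> List[float]:
    # (MaxTier) one-hot tier + (1) whether moveable + (1) priority
    return one_hot(tier, max_tier) + [float(movable), float(priority)]


def stack_observation(z_index: int, dim_z: int, max_tier: int,
                      stack: List[int], blocking_degree: float) -> List[float]:
    """Flatten one stack (priority labels, bottom to top) following Figure 3."""
    obs = one_hot(z_index, dim_z)              # (Dim-Z) hot encoding
    obs.append(float(z_index))                 # (1) z-index
    obs.append(float(len(stack) > 0))          # (1) can pick up (assumed: not empty)
    obs.append(float(len(stack) < max_tier))   # (1) can stack (assumed: not full)
    obs.append(float(blocking_degree))         # (1) blocking degree
    for tier in range(max_tier):
        if tier < len(stack):
            is_top = tier == len(stack) - 1    # assumed: only the top is moveable
            obs += container_info(tier, max_tier, is_top, stack[tier])
        else:
            obs += [0.0] * (max_tier + 2)      # assumed: zero padding for empty tiers
    return obs


# Example: bay with 4 stacks and at most 3 tiers; stack 0 holds labels 2 (bottom) and 1 (top).
obs = stack_observation(z_index=0, dim_z=4, max_tier=3, stack=[2, 1], blocking_degree=0)
print(len(obs))
```

The full observation would concatenate one such vector per stack, so its length is the per-stack size multiplied by the number of stacks.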
The observation size of a stack is:
$s_{stack} = dimZ + 4 + maxTier \cdot (2 + maxTier)$
Total observation size:
$s_{total} = s_{stack} \cdot dimZ$
Hot encoding is a common way for machine learning to handle categorical data. In this paper, the simplest one-hot encoding is used: for the corresponding z-index and tier, the element in the one-hot array is 1 and all other elements are 0.
During our implementation, the concept of the blocking degree proved to be insufficient. As Figure 4 shows, the blocking degrees of S1 and S3 calculated via the method from Jiang et al. [JZWW21] are both 5, whereas S1 has two containers above the container with the highest priority. Thus, we introduced a new concept called “blocking count”, which is calculated in the same way as the “blocking degree”, except that “blockingDegree += x - c” is changed to “blockingCount += 1”. The training result with the “blocking count” is slightly better than the version without it.
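The difference between blocking degree and blocking count can be sketched as follows. This section does not define x and c, so the sketch assumes that x is the priority label of a container and c is the smallest label among the containers beneath it in the same stack, and that a container contributes only when x > c. These definitions, the function name and the two example stacks are hypothetical; they are chosen only so that two stacks share a degree of 5 but differ in count, and they are not the layouts of Figure 4.

```python
from typing import List, Tuple


def blocking_metrics(stack: List[int]) -> Tuple[int, int]:
    """Return (blocking degree, blocking count) for one stack.

    `stack` lists priority labels from bottom to top; a smaller label
    means the container is retrieved earlier.
    """
    blocking_degree = 0
    blocking_count = 0
    lowest_below = None  # assumed meaning of c: smallest label beneath x
    for x in stack:
        if lowest_below is not None and x > lowest_below:
            blocking_degree += x - lowest_below  # "blockingDegree += x - c"
            blocking_count += 1                  # "blockingCount += 1"
        if lowest_below is None or x < lowest_below:
            lowest_below = x
    return blocking_degree, blocking_count


# Two hypothetical stacks with the same degree but different counts.
print(blocking_metrics([1, 3, 4]))  # two blocking containers
print(blocking_metrics([2, 7]))     # one blocking container
```

Under these assumptions, the degree alone cannot tell the two stacks apart, while the count reflects how many relocations are needed at minimum to free the lowest label.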
However, the traversal process is still very slow. Figure 8 shows the result for the 4 * 3 layout. The Total Time is the sum of the time spans that each instance took; the average calculation time for one layout is 12 min. A more detailed heuristic would accelerate the process remarkably. Another important point is that the traversal algorithm has no memory: for two scenarios with exactly the same layout, it takes exactly twice the time. Although this method can obtain the theoretical minimum number of relocations, it is not practical for real-world use.
The result of each scenario in this paper is based on at least 1000 instances to reduce fluctuation of the value, and the layouts are generated fully at random. Although a large number of instances were tested, there can still be an error of around 5% that cannot be eliminated because the layouts differ. Compared with the tree search used to obtain the theoretical optimal relocation rate, the trained model infers the result within 0.1 s, which is much faster.
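To make the idea of an exhaustive search for the theoretical minimum concrete, the following is a minimal Python sketch of a breadth-first search over bay layouts for the unrestricted static CRP. It is not the authors' traversal code. It assumes that retrieving an accessible target container immediately is never worse than delaying it, counts only relocations, and uses a set of visited layouts so that no layout is expanded twice within one instance. All function names are hypothetical.

```python
from collections import deque


def retrieve_ready(stacks):
    """Retrieve targets (smallest remaining label) while they sit on top of a stack."""
    stacks = [list(s) for s in stacks]
    while any(stacks):
        target = min(c for s in stacks for c in s)
        holder = next((s for s in stacks if s and s[-1] == target), None)
        if holder is None:
            break  # the target is blocked, a relocation is needed
        holder.pop()
    return tuple(tuple(s) for s in stacks)


def min_relocations(stacks, max_tier):
    """Minimum number of relocations needed to empty the bay, or None if impossible.

    `stacks` is a sequence of stacks, each listing priority labels bottom to top.
    """
    start = retrieve_ready(stacks)
    queue = deque([(start, 0)])
    seen = {start}  # visited layouts
    while queue:
        state, moves = queue.popleft()
        if not any(state):
            return moves  # every stack is empty
        for i, src in enumerate(state):
            if not src:
                continue  # nothing to pick up
            for j, dst in enumerate(state):
                if i == j or len(dst) >= max_tier:
                    continue  # same stack or target stack is full
                nxt = [list(s) for s in state]
                nxt[j].append(nxt[i].pop())  # relocate the topmost container
                nxt_state = retrieve_ready(nxt)
                if nxt_state not in seen:
                    seen.add(nxt_state)
                    queue.append((nxt_state, moves + 1))
    return None


# Three stacks, at most three tiers; labels 2 and 3 both sit above label 1.
print(min_relocations([(1, 2, 3), (), ()], max_tier=3))
```

Because every relocation costs one move and retrievals are folded into the state transition, breadth-first order guarantees that the first empty layout reached uses the fewest relocations. The state space still grows quickly with the bay size, which is consistent with the long run times reported above.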
stacks * tiers | No of containers | Theoretical optimal | Livia Maglic (2019) - GA | Wu, Ting (2010) - BS, B&B | (2014) – chain F | Authors - RL | opt %
3*3 | 7 | 3.01 (0.430) | 3.38 (0.482) | 3.38 (0.482) | 3.38 (0.482) | 3.20 (0.457) | 5.19%
3*4 | 9 | 5.06 (0.562) | 5.85 (0.650) | 5.95 (0.661) | 5.67 (0.630) | 5.71 (0.635) | -0.79%
4*3 | 10 | 4.15 (0.415) | 4.98 (0.498) | 4.95 (0.495) | 4.85 (0.485) | 4.51 (0.451) | 7.01%
4*4 | 13 | - | 8.55 (0.658) | 8.57 (0.659) | 8.43 (0.648) | 8.62 (0.652) | -0.60%
5*3 | 13 | - | 5.80 (0.446) | 5.80 (0.446) | 5.75 (0.442) | 5.77 (0.444) | -0.45%
Table 1. Comparison of results with different methods from other authors. Each entry gives the average number of relocations, with the relocation rate in parentheses. The theoretical optimal solution is not fully listed because of the computation time required.
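The figures in Table 1 can be checked with a short calculation. The value in parentheses matches the average number of relocations divided by the number of containers. The "opt %" column is not defined in this section; for the 3*3 and 4*3 rows it is consistent with the relative difference between the relocation rate in the last comparison column and the rate of the RL approach, which is the assumption used in this sketch. The function names are hypothetical.

```python
def relocation_rate(avg_relocations: float, n_containers: int) -> float:
    """Average relocations per container."""
    return avg_relocations / n_containers


def relative_improvement(baseline_rate: float, new_rate: float) -> float:
    """Relative reduction of the relocation rate in percent (assumed meaning of opt %)."""
    return (baseline_rate - new_rate) / baseline_rate * 100.0


# 3*3 layout with 7 containers, theoretical optimal column.
print(round(relocation_rate(3.01, 7), 3))

# 3*3 and 4*3 rows: last comparison column versus the RL column.
print(round(relative_improvement(0.482, 0.457), 2))
print(round(relative_improvement(0.485, 0.451), 2))
```

A negative value under this reading means that the RL result is slightly worse than the compared heuristic for that layout.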
Code and test results for this section (unrestricted static CRP) can be found under https://github.com/idea-lei/CRP.
5 CONCLUSION
Figure 9. Example of training failure
6 FUTURE WORK
Further work is separated into two parts. The first is to extend the problem definition by combining the CRP with the crane scheduling problem (CSP), since the CRP within one bay is not practical for use in terminals. The second is to implement Digital Twins for the terminal.
6.1 DYNAMIC RESTRICTED CRP IN BLOCK COMBINED WITH CRANE SCHEDULING PROBLEM
In reality, the dynamic restricted CRP has to be considered in a block (or the whole container yard). The unrestricted CRP in a block will not be considered, because the priorities of the containers can change dynamically due to the actual truck arrival times and the stacking of new containers. Besides, the static CRP in a whole block rarely occurs in practice, so it will not be a mainstream topic either.
The evaluation criterion for the combined CRP and CSP in a block can vary. Fotuhi et al. [FHVX13] introduced a method to reduce the average truck waiting time (although it only considered the CSP). This is the view of the truck drivers. We intend to optimize the crane operation time per container, which will maximize the port efficiency. These two criteria are almost the same and become exactly the same if every truck takes only one container.
7 ACKNOWLEDGEMENTS
Parts of the work were carried out as a part of the BMBF research project 01IS19068A “KI-LiveS” (KI-Labor für verteilte und eingebettete Systeme) and supported by the Federal Ministry of Education and Research (BMBF).
LITERATURE
[BLCW09] Bengio, Yoshua; Louradour, Jérôme; Collobert, Ronan; Weston, Jason: Curriculum learning. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, 41–48, 2009.
[BMBJ13] Borjian, Setareh; Manshadi, Vahideh H.; Barnhart, Cynthia; Jaillet, Patrick: Dynamic Stochastic Optimization of Relocations in Container Terminals. Working paper, MIT, 2013.
[CSV12] Caserta, Marco; Schwarze, Silvia; Voß, Stefan: A mathematical formulation and complexity considerations for the blocks relocation problem. European Journal of
Operational Research, Volume 219, Issue 1: 96-104, 2012.
[CSV20] Caserta, Marco; Schwarze, Silvia; Voß, Stefan: Container Rehandling at Maritime Container Terminals: A Literature Update. In: Böse J.W. (eds) Handbook of Terminal Planning. Operations Research/Computer Science Interfaces Series. Springer, Cham, 2020.
[CVR14] Carlo, Héctor J.; Vis, Iris F. A.; Roodbergen, Kees Jan: Transport operations in container terminals: Literature overview, trends, research directions and classification scheme. European Journal of Operational Research, Volume 236, Issue 1: 1-13, 2014.
[CVS11] Caserta, Marco; Voß, Stefan; Sniedovich, Moshe: Applying the corridor method to a blocks relocation problem. OR Spectrum, Volume 33: 915-929, 2011.
[Elm93] Elman, Jeffrey L.: Learning and development in neural networks: The importance of starting small. Cognition, 48: 71-99, 1993.
[FB12] Forster, Florian; Bortfeldt, Andreas: A tree search heuristic for the container retrieval problem. In: Klatte D., Lüthi HJ., Schmedders K. (eds) Operations Research Proceedings 2011. Springer, Berlin, Heidelberg, 2012.
[FHVX13] Fotuhi, Fateme; Huynh, Nathan; Vidal, Jose M.; Xie, Yuanchang: Modeling yard crane operators as reinforcement learning agents. Research in Transportation Economics, Volume 42, Issue 1: 3-12, 2013.
[GBJ18] Galle, Virgile; Barnhart, Cynthia; Jaillet, Patrick: A new binary formulation of the restricted Container Relocation Problem based on a binary encoding of configurations. European Journal of Operational Research, Volume 267, Issue 2: 467-477, 2018.
[GMB18] Galle, V.; Manshadi, V. H.; Borjian Boroujeni, S.; Barnhart, C.; Jaillet, P.: The Stochastic Container Relocation Problem. Transportation Science, Volume 52, Issue 5, 2018.
[JV14] Jovanovic, Raka; Voß, Stefan: A chain heuristic for the blocks relocation problem. Computers & Industrial Engineering, 75: 79–86, 2014.
[JZWW21] Jiang, Tiecheng; Zeng, Bo; Wang, Yong; Wei, Yan: A New Heuristic Reinforcement Learning for Container Relocation Problem. Journal of Physics: Conference Series, 1873, 012050, IWECAI, 2021.
[KE21] Kizilay, Damla; Eliiyi, Deniz Türsel: A comprehensive review of quay crane scheduling, yard operations and integrations thereof in container terminals. Flexible Services and Manufacturing Journal, Volume 33: 1-42, 2021.
[KLM96] Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4: 237–285, 1996.
[LL10] Lee, Yusin; Lee, Yen-Ju: A heuristic for retrieving containers from a yard. Computers & Operations Research, Volume 37, Issue 6: 1139-1147, 2010.
[LZL20] Lu, Chao; Zeng, Bo; Liu, Shixin: A study on the block relocation problem: Lower bound derivations and strong formulations. IEEE Transactions on Automation Science and Engineering, Volume 17, Issue 4: 1829-1853, 2020.
[MGM20] Maglic, Livia; Gulic, Marko; Maglic, Lovro: Optimization of container relocation operations in port container terminals. Transport, Volume 35, Issue 1: 37-47, 2020.
[SAT19] da Silva Firmino, Andresson; de Abreu Silva, Ricardo Martins; Times, Valeria Cesario: A reactive GRASP metaheuristic for the container retrieval problem to reduce crane’s working time. Journal of Heuristics, Volume 25, Issue 2: 141–173, 2019.
[SEE15] Said, Gamal Abd El-Nassar A.; El-Horbaty, El-Sayed M.: An optimization methodology for container handling using genetic algorithm. Procedia Computer Science, 65: 662-671, 2015.
[SWDRK17] Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg: Proximal Policy Optimization Algorithms. arXiv:1707.06347, 2017.
[Uni21] Unity-Technologies: Training ML-Agents, release 18, 2021. https://github.com/Unity-Technologies/ml-agents/blob/release_18_docs/docs/Training-ML-Agents.md