DOI: 10.2195/lj_Proc_wei_en_202112_02
URN: urn:nbn:de:0009-14-54466

Optimization of Container Relocation Problem via Reinforcement Learning

Optimierung des Container Relocation Problems mittels Reinforcement Learning

Lei Wei
Fuyin Wei
Sandra Schmitz
Kunal Kunal
Bernd Noche

Lehrstuhl für Transportsysteme und –logistik

Universität Duisburg-Essen

© 2021 Logistics Journal: Proceedings – ISSN 2192-9084. Article is protected by German copyright law.

This paper presents a method for optimizing the Container Relocation Problem (CRP) via Reinforcement Learning (RL) based on heuristic rules. The method used to calculate the theoretical lowest relocation rate is also briefly explained. As a result, trained models for bays of different dimensions are provided. Compared with the theoretical value, the resulting relocation rate is acceptable, and inference is fast. Furthermore, the extended CRP in a block is briefly demonstrated.

[Keywords: container relocation problem; block relocation problem; reinforcement learning; ML-Agents]

1 INTRODUCTION

With the increase in global container trade, efficient transshipment of containers at terminals is essential. Intelligent container relocation in an inland container terminal or port significantly improves performance measures such as task completion time, energy consumption, container rehandling rates and the operating efficiency of a terminal.

In multimodal terminals, the cranes not only have to serve the container ships, the trucks and the railroad at the yard side, but also serve as stacking cranes. Inbound and outbound containers are often stored at the container terminals for a certain period of time, waiting to be loaded onto a train or ship, or to be delivered by truck.

A rail-mounted gantry crane is usually used for handling containers at the terminal. The containers are stacked in storage blocks at the container yard to minimize storage space (Figure 1). Thus, only the topmost container of a stack is directly available for retrieval. Relocations (also known as reshuffling) are necessary to grant access to a container that is not topmost in its stack. These unproductive moves performed by the yard cranes should be minimized to improve terminal efficiency.

Figure 1. Container yard structure and terms (figure from [ZF12]) with coordinate directions

For this purpose, the Container Relocation Problem (CRP), also known as the Block Relocation Problem (BRP) [GBJ18], is considered in this article.

After a brief literature review in Section 2, we describe our approach to solving the CRP for a 2-dimensional stacking area, i.e., a single bay, in Section 4. We use the ML-Agents toolkit from Unity to implement Reinforcement Learning based on heuristic rules to solve the CRP. At the end of that section we present the experimental results of our approach and a comparison with other existing approaches.

In common multimodal container terminals, relocations are not limited to one bay of a container block. The bays of one block are not independent of each other but can be reached by crane movements along the x-axis. We address this issue in Section 6 by briefly previewing the extension of the CRP, which aims to optimize a 3-dimensional stacking area, i.e., an entire block rather than one bay. Our main goal is to minimize the average operation (retrieval, relocation and stacking) time of containers. In addition, a simulation system was developed with the Unity Engine to visualize the process.

2 PRIOR RESEARCH

2.1 THE CONTAINER RELOCATION PROBLEM

Carlo et al. [CVR14] classified storage yard operations in container terminals, such as storage space assignment to containers, yard crane scheduling, routing of vehicles within the terminal, and optimizing relocation operations at the storage blocks, by reviewing scientific journal papers published between 2004 and 2012. [KE21] extended this classification by adding recent research papers.

In most papers on the CRP, the objective is to find an optimal sequence of crane movements that retrieves all containers from a bay according to a predefined retrieval sequence, such that the number of movements is minimized [CSV20]. However, there are also approaches that focus on other optimization goals, such as minimizing the crane's working time associated with any movement like relocation or retrieval of containers [LL10], [FB12], [SAT19].

As the CRP is known to be NP-hard [CSV12], only small instances can be solved with exact methods in reasonable time. Therefore, several heuristic approaches can be found in the literature. For a comprehensive survey of the CRP and of the various exact and heuristic solution methods that have been applied to it, we refer to [SAT19], [MGM20], [CSV20].

2.2 REINFORCEMENT LEARNING

Reinforcement Learning is one of the three basic paradigms of machine learning, together with supervised learning and unsupervised learning. In 1996, Kaelbling et al. described Reinforcement Learning as “the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment” [KLM96].

Proximal Policy Optimization (PPO) is a class of reinforcement learning algorithms that perform comparably to or better than other modern approaches such as TRPO (Trust Region Policy Optimization) while being much simpler to implement and tune [SWDRK17].

Jeffrey Elman [Elm93] proposed the idea of training a learning machine with a curriculum in 1993. Bengio et al. [BLCW09] later presented a summary of curriculum learning. They proposed curriculum learning as a method for a stepwise progression of the complexity of the data samples used during the training process.

In Section 4 we use PPO as the training algorithm and apply curriculum learning to accelerate the training process.

2.3 OTHER ALGORITHMS

• Iterative Deepening A* (IDA*) algorithm [ZQL12] [LZL20]: Zhu et al. developed IDA* algorithms for the unrestricted CRP. Using a derived dominance property, the approach takes advantage of two new lower bounds and several probe heuristics. Successive target containers can be retrieved as long as they are on top of their respective stacks at the time of retrieval, until the minimum equivalent layout is reached.

• Genetic Algorithm (GA) [MGM20] [SEE15]: Said and El-Horbaty [SEE15] propose an optimization methodology for solving the CRP using a genetic algorithm. The computational results show the effectiveness of the proposed methodology for container terminals. Genetic algorithms are widely applied because of their ability to search the global solution space for good solutions.

• Beam Search [WT10]: Beam Search (BS) is a heuristic search algorithm based on the breadth-first branch-and-bound algorithm. The term "beam search" was coined by Raj Reddy in 1977.

3 PROBLEM DESCRIPTION

3.1 CONTAINER RELOCATION PROBLEM (CRP)

Figure 2. Layout demonstration of the CRP. Labels denote the retrieval priority; a container with a smaller value is retrieved earlier [JZWW21]
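As a minimal sketch of the labeling convention in Figure 2, a bay can be written as a list of stacks whose priority labels are listed bottom to top. The representation, the example bay and the helper names `next_target` and `blockers` are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Tuple


def next_target(bay: List[List[int]]) -> Tuple[int, int]:
    """Return (stack index, tier index) of the smallest label in the bay.

    Assumes a non-empty bay with unique labels; the smallest label is the
    container that is retrieved next.
    """
    _, z, tier = min(
        (label, z, tier)
        for z, stack in enumerate(bay)
        for tier, label in enumerate(stack)
    )
    return z, tier


def blockers(bay: List[List[int]]) -> List[int]:
    """Labels stacked above the next target, listed bottom to top."""
    z, tier = next_target(bay)
    return bay[z][tier + 1:]


# Hypothetical bay with three stacks (labels listed bottom to top).
bay = [[1, 3, 4], [5, 6], [2, 7]]
print(next_target(bay))  # stack and tier of container 1
print(blockers(bay))     # containers that must be relocated before 1 can leave
```

Every container returned by `blockers` needs at least one relocation before the target can be retrieved, which is why the blocking structure of a stack is a natural input for heuristics.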


The goal of the CRP is to minimize the number of relocations (or the relocation rate) during the container retrieval process. Researchers use priority labels to identify the retrieval sequence of containers. The container with the smallest label number is retrieved first.

Static / Dynamic CRP: If no new containers are stacked in the bay during the retrieval process, the problem is called static CRP; otherwise it is called dynamic CRP. This paper mainly considers the static CRP. Since the crane needs to serve the whole block, the dynamic CRP within one bay is usually not considered.

Restricted / Unrestricted CRP: The CRP is restricted if relocations are only allowed for the blocking containers above the container with the highest priority. Otherwise it is unrestricted, so the unrestricted CRP is a superset of the restricted CRP. In general, the unrestricted CRP achieves a lower relocation rate, and its algorithms are more complex than those for the restricted CRP.

Stochastic CRP: The CRP is stochastic if the retrieval sequence is not fully known. For instance, when several containers are to be loaded onto a train, their retrieval sequence is not important as long as each container ends up in the correct position [BMBJ13], [GMB18].

CRP in block: In reality, relocations can happen in the whole container yard or in a part of it, which is defined as a block (Figure 1). In this scenario, the relocation rate cannot be the only criterion for judging the problem. Instead, several new criteria were introduced, such as the average operation time of a container and the average waiting time of a truck [FHVX13]. Furthermore, the types described above can be combined in this scenario.

3.2 RULES FOR UNRESTRICTED STATIC CRP

We consider the following common properties of the CRP:

• The crane performs an operation (retrieval or relocation, no stacking) on only one container at a time.
• Only the topmost container of a stack can be operated on (relocation or retrieval).
• All containers have the same size.
• The relocation within the bay is limited [CVS11], which means that no repeat operation is allowed.
• The operations happen in one bay only.
• The containers have unique predefined priorities; no two containers have the same priority.
• The bay should never be full.
• No new container is stacked during the retrieval process.
• A relocation can happen between any two stacks as long as it is possible; e.g., a relocation is impossible if the target stack is full.

4 OUR APPROACH

4.1 TOOL INTRODUCTION

Unity Engine: Unity Engine is a 3D real-time engine for simulation and game development. With a view to the future implementation of Digital Twins (real-time crane control), Unity Engine was chosen to build the host computer application.

ML-Agents: The Reinforcement Learning toolkit ML-Agents from Unity is used as the training toolkit. ML-Agents uses the PPO algorithm by default. The toolkit also supports several learning strategies, such as curriculum learning, imitation learning and behavioral cloning [Uni21]. In this paper, we used curriculum learning to accelerate the training process.

4.2 RL TRAINING

This section introduces our training method. The training is designed only for relocation, since the retrieval process does not need to be trained and should be determined before any relocation decision is made.

ML-Agents introduces the term “episode”. In this context it means the period from the initialization of a new bay layout to the retrieval of all containers in the current layout. With the help of this concept, the reward is accumulated during the operation process and reset when the episode ends, so that the rewards refer to the whole episode rather than to each step.

4.2.1 OBSERVATIONS OF RL

The observation structure is shown below.

Observation
    Stack Info []
        (Dim-Z) Hot Encoding []
        (1) Z-index
        (1) Can pick up
        (1) Can stack
        (1) Blocking Degree
        (MaxTier * (2 + MaxTier)) Container Info []
    Container Info
        (MaxTier) Hot Encoding []
        (1) Whether moveable
        (1) Priority

Figure 3. Observation structure. Each pair of square brackets denotes a list; the value in parentheses is the size of the object
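The structure in Figure 3 can be sketched as a flat vector per stack. This is a hedged illustration only: the field order follows Figure 3, but the zero padding for empty slots, the use of raw (non-normalized) values and the function names `one_hot` and `stack_observation` are assumptions, not the authors' implementation.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """One-hot vector of length `size` with a 1 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def stack_observation(z: int, stack: List[int], dim_z: int, max_tier: int,
                      can_pick_up: bool, can_stack: bool,
                      blocking_degree: int) -> List[float]:
    """Flatten one Stack Info entry; `stack` lists labels bottom to top."""
    obs = one_hot(z, dim_z)                       # (Dim-Z) hot encoding
    obs += [float(z), float(can_pick_up),         # four scalar fields
            float(can_stack), float(blocking_degree)]
    for tier in range(max_tier):                  # MaxTier container slots
        if tier < len(stack):
            movable = tier == len(stack) - 1      # only the topmost container
            obs += one_hot(tier, max_tier) + [float(movable), float(stack[tier])]
        else:
            obs += [0.0] * (max_tier + 2)         # assumed padding for empty slots
    return obs


obs = stack_observation(0, [1, 3], dim_z=4, max_tier=3,
                        can_pick_up=True, can_stack=True, blocking_degree=2)
print(len(obs))      # size of one stack entry
print(len(obs) * 4)  # total size for a bay with dim_z = 4 stacks
```

Each stack entry has `dim_z + 4 + max_tier * (2 + max_tier)` elements, so a bay with 4 stacks and 3 tiers yields 23 values per stack and 92 in total.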


The observation size of a stack is:

$s_{\mathrm{stack}} = \mathit{dimZ} + 4 + \mathit{maxTier} \cdot (2 + \mathit{maxTier})$

Total observation size:

$s_{\mathrm{total}} = s_{\mathrm{stack}} \cdot \mathit{dimZ}$

Hot encoding is the common way for machine learning to handle categorical data. In this paper, the simplest one-hot encoding is used: the element of the one-hot array that corresponds to the z-index or tier is 1, and all other elements are 0.

The “can pick up” property considers two aspects: (1) whether the stack is empty, and (2) whether the stack was visited in the last step with an unsuccessful operation. The second condition prevents the agent from repeating an operation, which could trap it in a local optimum. The “can stack” property works in the same way; only the first criterion changes to “whether the stack is full”.

The z-index in the stack info ensures that the trainer gets the correct index after the observations are shuffled, which helps the agent avoid local optima.

The “whether moveable” observation tells the trainer whether the corresponding container is moveable (relocatable). Only the topmost container can be relocated, and if the stack is empty, no container can be removed from it.

Jiang et al. introduced a concept called “blocking degree” [JZWW21]. It describes how severely the corresponding stack is blocked. This value can be calculated with the following pseudo code:

    define blockingDegree = 0
    // elements in list are priorities
    define stack = initStack
    while (stack.elementCount > 1)
        // max priority means min label
        define c = stack.MaxPriority
        // define upper stack includes c
        define hStack = stack[c.index, end]
        if (hStack.elementCount > 1)
            foreach (x in hStack exclude c)
                blockingDegree += x - c
        // update list, without c
        stack = stack[0, c.index]
    return blockingDegree

    4
    3  6  7
    1  5  2
    S1 S2 S3

Figure 4. Blocking degree calculation.

During our implementation, this concept proved to be insufficient. As Figure 4 shows, the blocking degrees of S1 and S3 calculated with the method of Jiang et al. [JZWW21] are both 5, although S1 has two containers above the container with the highest priority and S3 has only one. Thus, we introduced a new concept called “blocking count”. It is calculated in the same way as the “blocking degree”; only “blockingDegree += x - c” has to be changed to “blockingCount += 1”. The training result with the “blocking count” is slightly better than the version without it.

Figure 5. Comparison between training with “blocking count” and without “blocking count”

4.2.2 OUTPUT

As mentioned above, all decisions the agent makes concern relocation; retrieval is determined automatically before a decision is requested from the agent brain.

An action (output) can be described as (z0, z1), where z0 is the pick-up index and z1 is the stack index. z0 and z1 must have different values; in addition, the stack at z0 must not be empty and the stack at z1 must not be full. Once all containers in the bay are retrieved, the episode of the current scenario is finished, and a new episode begins. Training continues in this way until it reaches the predefined max step.

4.2.3 REWARDING SYSTEM

• Minus “repeat times” if the agent performs a repeat operation. A repeat operation means that the action is the same as the last one when the last operation failed. Without this punishment, the agent keeps repeating an unsuccessful action. To ensure that this rule is followed, the reward value is not normalized.
• Minus “0.1” every step. The more steps the agent takes to retrieve all the containers, the more punishment it gets.
• Add “1” if a container is retrieved.
• (optional) Minus “0.01 * z1”. This encourages the agent to relocate the container near the waiting position of the truck.
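Two pieces of Section 4.2 lend themselves to a short sketch: the blocking measures of Section 4.2.1 and the reward rules listed above. The code below is an illustration under assumptions, not the authors' ML-Agents implementation. Stacks are lists of labels from bottom to top, and the names `blocking_measures` and `step_reward` as well as the way the reward terms are combined into one value per step are hypothetical.

```python
from typing import List, Optional, Tuple


def blocking_measures(stack: List[int]) -> Tuple[int, int]:
    """Return (blocking degree, blocking count) of one stack.

    Follows the pseudo code for the blocking degree; the blocking count
    adds 1 where the degree adds x - c.
    """
    degree, count = 0, 0
    remaining = list(stack)
    while len(remaining) > 1:
        c_index = remaining.index(min(remaining))  # highest priority = min label
        c = remaining[c_index]
        for x in remaining[c_index + 1:]:          # containers above c
            degree += x - c
            count += 1
        remaining = remaining[:c_index]            # keep only what lies below c
    return degree, count


def step_reward(repeat_times: int, containers_retrieved: int,
                z1: Optional[int] = None) -> float:
    """Reward of one step; `repeat_times` is 0 unless the action was a repeat."""
    reward = -0.1                          # cost of every step
    reward -= repeat_times                 # penalty for repeat operations
    reward += 1.0 * containers_retrieved   # bonus per retrieved container
    if z1 is not None:
        reward -= 0.01 * z1                # optional: prefer stacks near the truck
    return reward


# Stacks S1 and S3 of Figure 4, bottom to top.
print(blocking_measures([1, 3, 4]))  # (5, 2)
print(blocking_measures([2, 7]))     # (5, 1)
```

The two stacks share the blocking degree 5 but differ in blocking count, which is the distinction the authors describe.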


4.3 THEORETICAL OPTIMAL SOLUTION

The theoretical optimal relocation rate can be obtained via tree search, which enumerates all possible relocations in a layout. Repeated operations must be avoided in the implementation; otherwise the program falls into an infinite loop. Our solution to the repetition problem is to check all parent nodes for a node that has the same layout as the current node.

Figure 6. Example of a repeat operation case

There are other ways to accelerate the traversal, e.g., ignoring all meaningless relocations and using threads to run instances simultaneously and exploit the full power of the CPU.

Figure 7. Example of a meaningless relocation

Nevertheless, the traversal is still very slow. Figure 8 shows the result for the 4 * 3 layout. The Total Time is the sum of the time spans that the individual instances took; the average calculation time for one layout is 12 min. A more detailed heuristic would accelerate the process remarkably. Another important point is that the traversal algorithm has no memory: for two scenarios with exactly the same layout, it takes exactly twice the time. Although this method obtains the theoretical minimum number of relocations, it is not practical for real-world use.

Figure 8. Theoretical lowest relocation rate for the 4 * 3 layout

The corresponding code can be found under https://github.com/idea-lei/CRP_LowBound.

4.4 EVALUATION

Max label: the initial number of containers in the bay [WT10], chosen so that no relocation is blocked.

$\mathit{maxLabel} = (\mathit{dimZ} - 1) \cdot \mathit{maxTier} + 1$

Average relocation rate: how many relocations are needed to move one container out.

$\mathit{ARR} = \frac{\sum \mathit{relocationTimes}}{n_{\mathrm{container}}}$

Optimization ratio: a positive value means a better result than the best known.

$\mathit{OPT} = \frac{\mathit{ARR}_{\mathrm{bestknown}} - \mathit{ARR}_{\mathrm{author}}}{\mathit{ARR}_{\mathrm{bestknown}}} \cdot 100\%$

Each result in this paper is based on at least 1000 instances (scenarios) to reduce the fluctuation of the value, and the layouts are generated fully at random. Although a large number of instances were tested, an error of around 5% remains that cannot be eliminated because of the different layouts. In contrast to the tree search used for the theoretical optimal relocation rate, the trained model infers the result within 0.1 s, which is much faster.
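The three evaluation quantities can be checked against Table 1 with a few lines of Python. This is a sketch under one stated assumption: OPT is read as a ratio of relocation rates (the values in parentheses in Table 1), a reading that reproduces the opt % column for the rows checked here. The function names are hypothetical.

```python
def max_label(dim_z: int, max_tier: int) -> int:
    """Initial number of containers for a bay with dim_z stacks."""
    return (dim_z - 1) * max_tier + 1


def average_relocation_rate(relocations: float, n_container: int) -> float:
    """Relocations needed per retrieved container."""
    return relocations / n_container


def optimization_ratio(rate_best_known: float, rate_author: float) -> float:
    """Improvement over the best known rate in percent (positive is better)."""
    return (rate_best_known - rate_author) / rate_best_known * 100.0


# Row 3*3 of Table 1: 7 containers, theoretical optimum 3.01 relocations.
print(max_label(3, 3))                               # 7
print(round(average_relocation_rate(3.01, 7), 3))    # 0.43
print(round(optimization_ratio(0.482, 0.457), 2))    # 5.19
```

The same functions give 9, 10, 13 and 13 containers for the 3*4, 4*3, 4*4 and 5*3 layouts, matching the second column of Table 1.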

Cell format: average relocations (relocation rate); opt % is the optimization ratio.
Methods from other authors: Jovanovic, Voß (2014), chain F; Wu, Ting (2010), BS, B&B; Livia Maglic (2019), GA.

stacks * tiers | No of containers | Theoretical optimal | Other authors (three methods)              | Authors - RL | opt %
3*3            | 7                | 3.01 (0.430)        | 3.38 (0.482) | 3.38 (0.482) | 3.38 (0.482) | 3.20 (0.457) | 5.19%
3*4            | 9                | 5.06 (0.562)        | 5.85 (0.650) | 5.95 (0.661) | 5.67 (0.630) | 5.71 (0.635) | -0.79%
4*3            | 10               | 4.15 (0.415)        | 4.98 (0.498) | 4.95 (0.495) | 4.85 (0.485) | 4.51 (0.451) | 7.01%
4*4            | 13               | -                   | 8.55 (0.658) | 8.57 (0.659) | 8.43 (0.648) | 8.62 (0.652) | -0.60%
5*3            | 13               | -                   | 5.80 (0.446) | 5.80 (0.446) | 5.75 (0.442) | 5.77 (0.444) | -0.45%

Table 1. Comparison of results with different methods from other authors. The theoretical optimal solution is not fully listed because of the time consumption.
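The tree search of Section 4.3, which produces the theoretical optimal column of Table 1, can be illustrated with a small exhaustive search. The sketch below is not the authors' implementation: it swaps in a breadth-first search with a set of visited layouts instead of checking the parent nodes, and it assumes that the target container is retrieved as soon as it is on top of its stack. The names `retrieve_available` and `min_relocations` are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Layout = Tuple[Tuple[int, ...], ...]


def retrieve_available(layout: Sequence[Sequence[int]]) -> Layout:
    """Retrieve the smallest label repeatedly while it is on top of a stack."""
    stacks: List[List[int]] = [list(s) for s in layout]
    while True:
        labels = [c for s in stacks for c in s]
        if not labels:
            break
        target = min(labels)
        for s in stacks:
            if s and s[-1] == target:
                s.pop()          # retrieval does not count as a relocation
                break
        else:
            break                # target is blocked, a relocation is needed
    return tuple(tuple(s) for s in stacks)


def min_relocations(layout: Sequence[Sequence[int]], max_tier: int) -> Optional[int]:
    """Minimum number of relocations to empty the bay, or None if impossible."""
    start = retrieve_available(layout)
    queue = deque([(start, 0)])
    seen = {start}               # visited layouts prevent repeat operations
    while queue:
        state, moves = queue.popleft()
        if not any(state):       # all stacks are empty
            return moves
        for i, src in enumerate(state):
            if not src:
                continue
            for j, dst in enumerate(state):
                if i == j or len(dst) >= max_tier:
                    continue
                stacks = [list(s) for s in state]
                stacks[j].append(stacks[i].pop())
                nxt = retrieve_available(stacks)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, moves + 1))
    return None


# Hypothetical bay: container 1 lies below 3 and 2, two stacks are empty.
print(min_relocations([[1, 3, 2], [], []], max_tier=3))  # 2
```

Because the search proceeds level by level, the first empty bay it reaches corresponds to the minimum number of relocations. The number of layouts grows quickly with the bay size, which is consistent with the long run times reported for the 4 * 3 layout.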


Code and test results for this section (unrestricted static CRP) can be found under https://github.com/idea-lei/CRP.

5 CONCLUSION

This paper discussed the static unrestricted CRP within one bay using RL. The training result is acceptable compared with the theoretical lowest relocation rate, and it is obtained in far less time. The training is suitable for small layouts; for large layouts, the training time becomes considerably longer. The disadvantage of the method is that training relies heavily on experience in adjusting the parameters of the trainer, such as the learning rate and the hidden layers. Different configurations can lead to different results. Furthermore, the current version of the ML-Agents toolkit (Release 18) may contain a bug: the training process sometimes fails without any sign. We had to change the learning rate dynamically to keep the training process from failing, and if the learning rate is too low, the model cannot be trained.

Figure 9. Example of training failure

6 FUTURE WORK

The further work is separated into two parts. The first is to extend the problem definition so that the CRP is combined with the crane scheduling problem (CSP), since the CRP within one bay is not practical for use in terminals. The second is to implement the Digital Twins for the terminal.

6.1 DYNAMIC RESTRICTED CRP IN BLOCK COMBINED WITH CRANE SCHEDULING PROBLEM

In reality, the dynamic restricted CRP has to be considered in a block (or the whole container yard). The unrestricted CRP in a block will not be considered, because the priorities of the containers can change dynamically due to the actual truck arrival times and the stacking of new containers. Besides, the static CRP in a whole block rarely occurs in practice, so it will not be a mainstream topic either.

The criterion for judging the CRP&CSP in a block can vary. Fotuhi et al. [FHVX13] introduced a method to reduce the average truck waiting time (although it only considered the CSP). This is the view of the truck drivers. We intend to optimize the crane operation time of each container, which will maximize the port efficiency. These two criteria are almost the same and become exactly the same if every truck takes only one container.

Figure 10. Layout of the simulation, 1) crane hooks the picked-up container, 2) retrieval area in red, 3) stacking transporter in green, 4) the current block of this crane

6.2 DIGITAL TWINS

The approach is also intended for future uses such as Digital Twins for the real-time control of cranes. Much more visualized data will become available with the development of the “Digital Twins” concept. This means that management can become more intuitive through the visualization of the crane's operation process. The biggest challenge is to obtain the position data of all moving objects (trucks, crane, etc.) and to control these objects.

7 ACKNOWLEDGEMENTS

Parts of the work were carried out as a part of the BMBF research project 01IS19068A “KI-LiveS” (KI-Labor für verteilte und eingebettete Systeme) and supported by the Federal Ministry of Education and Research (BMBF).

LITERATURE

[BLCW09] Bengio, Yoshua; Louradour, Jérôme; Collobert, Ronan; Weston, Jason: Curriculum learning. ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, 41–48, 2009.

[BMBJ13] Borjian, Setareh; Manshadi, Vahideh H.; Barnhart, Cynthia; Jaillet, Patrick: Dynamic Stochastic Optimization of Relocations in Container Terminals. Working paper, MIT, 2013.

[CSV12] Caserta, Marco; Schwarze, Silvia; Voß, Stefan: A mathematical formulation and complexity considerations for the blocks relocation problem. European Journal of Operational Research, Volume 219, Issue 1: 96-104, 2012.

[CSV20] Caserta, Marco; Schwarze, Silvia; Voß, Stefan: Container Rehandling at Maritime Container Terminals: A Literature Update. In: Böse J.W. (eds) Handbook of Terminal Planning. Operations Research/Computer Science Interfaces Series. Springer, Cham, 2020.

[CVR14] Carlo, Héctor J.; Vis, Iris F. A.; Roodbergen, Kees Jan: Transport operations in container terminals: Literature overview, trends, research directions and classification scheme. European Journal of Operational Research, Volume 236, Issue 1: 1-13, 2014.

[CVS11] Caserta, Marco; Voß, Stefan; Sniedovich, Moshe: Applying the corridor method to a blocks relocation problem. OR Spectrum, Volume 33: 915-929, 2011.

[Elm93] Elman, Jeffrey L.: Learning and development in neural networks: The importance of starting small. Cognition, 48: 71-99, 1993.

[FB12] Forster, Florian; Bortfeldt, Andreas: A tree search heuristic for the container retrieval problem. In: Klatte D., Lüthi HJ., Schmedders K. (eds) Operations Research Proceedings 2011. Springer, Berlin, Heidelberg, 2012.

[FHVX13] Fotuhi, Fateme; Huynh, Nathan; Vidal, Jose M.; Xie, Yuanchang: Modeling yard crane operators as reinforcement learning agents. Research in Transportation Economics, Volume 42, Issue 1: 3-12, 2013.

[GBJ18] Galle, Virgile; Barnhart, Cynthia; Jaillet, Patrick: A new binary formulation of the restricted Container Relocation Problem based on a binary encoding of configurations. European Journal of Operational Research, Volume 267, Issue 2: 467-477, 2018.

[GMB18] Galle, V.; Manshadi, V. H.; Borjian Boroujeni, S.; Barnhart, C.; Jaillet, P.: The Stochastic Container Relocation Problem. Transportation Science, Volume 52, Issue 5, 2018.

[JV14] Jovanovic, Raka; Voß, Stefan: A chain heuristic for the blocks relocation problem. Computers & Industrial Engineering, 75: 79–86, 2014.

[JZWW21] Jiang, Tiecheng; Zeng, Bo; Wang, Yong; Wei, Yan: A New Heuristic Reinforcement Learning for Container Relocation Problem. Journal of Physics: Conference Series, 1873 012050, IWECAI, 2021.

[KE21] Kizilay, Damla; Eliiyi, Deniz Türsel: A comprehensive review of quay crane scheduling, yard operations and integrations thereof in container terminals. Flexible Services and Manufacturing Journal, Volume 33: 1-42, 2021.

[KLM96] Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W.: Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4: 237–285, 1996.

[LL10] Lee, Yusin; Lee, Yen-Ju: A heuristic for retrieving containers from a yard. Computers & Operations Research, Volume 37, Issue 6: 1139-1147, 2010.

[LZL20] Lu, Chao; Zeng, Bo; Liu, Shixin: A study on the block relocation problem: Lower bound derivations and strong formulations. IEEE Transactions on Automation Science and Engineering, Volume 17, Issue 4: 1829-1853, 2020.

[MGM20] Maglic, Livia; Gulic, Marko; Maglic, Lovro: Optimization of container relocation operations in port container terminals. Transport, Volume 35, Issue 1: 37-47, 2020.

[SAT19] da Silva Firmino, Andresson; de Abreu Silva, Ricardo Martins; Times, Valeria Cesario: A reactive GRASP metaheuristic for the container retrieval problem to reduce crane’s working time. Journal of Heuristics, Volume 25, Issue 2: 141–173, 2019.

[SEE15] Said, Gamal Abd El-Nassar A.; El-Horbaty, El-Sayed M.: An optimization methodology for container handling using genetic algorithm. Procedia Computer Science, 65: 662-671, 2015.

[SWDRK17] Schulman, John; Wolski, Filip; Dhariwal, Prafulla; Radford, Alec; Klimov, Oleg: Proximal Policy Optimization Algorithms. arXiv:1707.06347, 2017.

[Uni21] Unity-Technologies: Training ML-Agents, release 18, 2021. https://github.com/Unity-Technologies/ml-agents/blob/release_18_docs/docs/Training-ML-Agents.md


[WT10] Wu, Kun-Chih; Ting, Ching-Jung: A Beam Search Algorithm for minimizing reshuffle operations at container yards. Proceedings of the International Conference on Logistics and Maritime Systems, 2010.

[ZF12] Zehendner, Elisabeth; Feillet, Dominique: Column Generation for the Container Relocation Problem. 12th IMHRC Proceedings, 2012.

[ZQL12] Zhu, Wenbin; Qin, Hu; Lim, Andrew; Zhang, Huidong: Iterative deepening A* algorithms for the container relocation problem. IEEE Transactions on Automation Science and Engineering, Volume 9, Issue 4: 710-722, 2012.

Lei Wei, M.Sc., Researcher at the Chair of Transport Systems and Logistics, University Duisburg-Essen. E-Mail: lei.wei@uni-due.de

Fuyin Wei, M.Sc., Researcher at the Chair of Transport Systems and Logistics, University Duisburg-Essen. E-Mail: fuyin.wei@uni-due.de

Dipl.-Ök. Sandra Schmitz, Researcher at the Chair of Transport Systems and Logistics, University Duisburg-Essen. E-Mail: sandra.schmitz@uni-due.de

Kunal, Kunal, B.Sc., Research Assistant at the Chair of Transport Systems and Logistics, University Duisburg-Essen. E-Mail: kunal.kunal@stud.uni-due.de

Address: Chair of Transport Systems and Logistics (TuL), University Duisburg-Essen, Keetmanstr. 3-9, 47058 Duisburg, Germany.
