D2 - Application of Graph Neural Networks
This work is a result of project DynamiCITY: Fostering Dynamic Adaptation of Smart Cities
to Cope with Crises and Disruptions, with reference NORTE-01-0145-FEDER-000073, supported
by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020
Partnership Agreement, through the European Regional Development Fund (ERDF).
Resumo

Traffic forecasting is a crucial aspect of Intelligent Transportation Systems, since it has the potential to improve mobility and transport efficiency in cities, reducing costs and minimizing the environmental impact. Traffic forecasting is a complex challenge, since it involves dealing with the rapid evolution and dynamism of traffic, which is affected by several factors, such as accidents, road closures, social events, and weather. Furthermore, traffic flow is characterized by spatial and temporal dependencies, in which the traffic state at a given location is influenced by the traffic at other locations, and the traffic state at a given moment depends on past traffic.

In recent years, Graph Neural Networks (GNNs) have gained increasing attention in the field of Deep Learning, demonstrating state-of-the-art performance in several applications. GNNs are particularly well suited to traffic forecasting problems, since they are able to capture both spatial and temporal dependencies in the data.

This dissertation explores the use of GNNs in road traffic forecasting. The study provides a review of the existing literature on traffic forecasting techniques and investigates the potential of GNNs in handling the complex relationships between traffic conditions in both time and space. The research aims to evaluate the performance of GNNs on this problem and to complement that study by investigating the impact of missing data imputation techniques and of external factors on the prediction results. The study also aims to evaluate the generalizability of the GNN models across different datasets, including benchmarking datasets and the use-case dataset of the DynamiCITY project, of which this dissertation is part.

The empirical evaluation of the models demonstrates their effectiveness in handling the complex relationships between traffic conditions in time and space. The results show that GNNs outperform several time-series-based models commonly used in the literature, in line with other works on the topic. The missing data imputation techniques also yielded improved results, especially on the datasets with more missing data. However, the use of weather data did not improve the results obtained.

Overall, this dissertation contributes to the growing body of research on the use of GNNs in traffic forecasting and demonstrates their potential to improve transportation systems.
Abstract
Traffic forecasting is a crucial aspect of Intelligent Transportation Systems, as it has the potential to
improve the mobility and efficiency of transportation in cities while reducing costs and minimizing
environmental impact. The task of traffic forecasting is a challenging one, as it involves predicting
the rapidly changing and dynamic nature of traffic, which is affected by various factors such as
accidents, road closures, social events, and weather. Additionally, traffic flow is characterized by
both spatial and temporal dependencies, where the state of traffic in one location is influenced by
traffic in other locations and the state of traffic at a particular time is dependent on past traffic
patterns.
In recent years, Graph Neural Networks (GNNs) have gained increasing attention in the field
of deep learning, demonstrating state-of-the-art performance in various applications. GNNs are
particularly well suited for traffic forecasting problems, as they have the ability to capture both
spatial and temporal dependencies in the data.
This dissertation explores the application of GNNs in road traffic forecasting. The study pro-
vides a review of existing literature on traffic forecasting techniques and investigates the potential
of GNNs in handling the complex relationships between traffic conditions in time and space. The
research aims to evaluate the performance of GNNs in this problem and complement that study by
investigating the impact of using missing data imputation techniques and external factors on the
prediction results. The study also aims to evaluate the generalizability of the GNN models across
different datasets, including benchmarking datasets and DynamiCITY’s use-case dataset.
The empirical evaluation of the models demonstrates their effectiveness in handling the com-
plex relationships between traffic conditions in time and space. The results of the research show
that GNNs outperform several commonly used time-series-based models in the literature, in line
with other works on the topic. The missing data imputation techniques also showed improved
results, especially on the datasets with more missing data. However, the use of weather data did
not improve the results obtained.
Overall, this dissertation contributes to the growing body of research on the use of GNNs in
traffic forecasting and demonstrates their potential to improve transportation systems.
Keywords: traffic forecasting, intelligent transportation systems, graph neural networks, spatio-
temporal dependency, deep learning, missing data, external factors
Acknowledgments
First, I would like to thank Professor Daniel Silva and Professor Rosaldo Rossetti, supervisors of
this dissertation, for their guidance and encouragement throughout this dissertation process.
Without their support in the most stressful phases, this work would not be what it is. I
would like to acknowledge the support and resources provided by LIACC and the DynamiCITY
research project. The access to computing facilities and data provided were crucial in conducting
the experiments and analysis for this research.
I also want to express my gratitude to my family, in particular my parents and brother, for
always believing in me and encouraging me through all the challenges I face.
Finally, I want to thank all my friends, who accompanied me throughout these five years and in
particular these last few months. Without them this journey would not have been the same.
Clara Gadelho
Contents

1 Introduction
  1.1 Context
  1.2 Aim and Goals
  1.3 Contributions
  1.4 Outline
2 Background
  2.1 Intelligent Transportation Systems
  2.2 The Traffic Forecasting Problem
    2.2.1 Traffic Prediction
    2.2.2 Categorization of Traffic Prediction Problems
  2.3 Graph Neural Networks
    2.3.1 General GNN Architecture
    2.3.2 Attention Mechanism
3 Related Work
  3.1 Traffic Forecasting Techniques
    3.1.1 Statistical Methods
  3.2 Regression Algorithms
    3.2.1 Ensemble Learning
    3.2.2 Hybrid Methods
    3.2.3 Machine Learning and Deep Learning Techniques
    3.2.4 Graph Neural Networks
  3.3 Benchmarking Datasets
  3.4 Gap Analysis
4 Methodological Approach
  4.1 Problem Statement
  4.2 Problem Formalization
  4.3 Implementation Pipeline
  4.4 Validation
    4.4.1 Metrics
    4.4.2 Baselines
  4.5 Risk Analysis
5 Implementation
  5.1 Technologies Used
  5.2 Data Retrieval and Pre-processing
6 Empirical Evaluation
  6.1 Base Graph Neural Network Architecture
    6.1.1 Model Comparisons in Each Dataset
    6.1.2 Effect of Hyperparameters
    6.1.3 Effect of the Length of the Training Set
  6.2 Handling Missing Data
  6.3 Weather Data Incorporation
  6.4 Global Comparisons
  6.5 Results Discussion
References
List of Figures
List of Tables

6.1 Identification of the names used for the different GNN architectures
6.2 Comparison of the different models on the PeMS-BAY dataset
6.3 Comparison of the different models on the METR-LA dataset
6.4 Comparison of the different models on the VCI dataset
6.5 Comparison of the different optimizers used with the Cheb-GRUAtt on the PeMS-BAY dataset
6.6 Comparison of the different optimizers used with the Cheb-GRUAtt on the METR-LA dataset
6.7 Comparison of the different optimizers used with the Cheb-GRUAtt on the VCI dataset
6.8 Comparison of the different optimizers used with the GAT-CNNAtt on the PeMS-BAY dataset
6.9 Comparison of the different optimizers used with the GAT-CNNAtt on the METR-LA dataset
6.10 Comparison of the different optimizers used with the GAT-CNNAtt on the VCI dataset
6.11 Results of different learning rates with 100 epochs for Cheb-GRUAtt on PeMS-BAY
6.12 Results of different learning rates with 100 epochs for Cheb-GRUAtt on METR-LA
6.13 Results of different learning rates with 100 epochs for Cheb-GRUAtt on VCI
6.14 Results of different learning rates with 100 epochs for GAT-CNNAtt on PeMS-BAY
6.15 Results of different learning rates with 100 epochs for GAT-CNNAtt on METR-LA
6.16 Results of different learning rates with 100 epochs for GAT-CNNAtt on VCI
Acronyms and Abbreviations
AI Artificial Intelligence
Agg Aggregation
Att Attention
ARIMA Autoregressive Integrated Moving Average
ASTGCN Attention Based Spatial-Temporal Graph Convolutional Network
ChebConv Chebyshev Spectral Graph Convolution
CNN Convolutional Neural Network
DL Deep Learning
DNN Deep Neural Network
DSTGCN Dynamic Spatial-Temporal Graph Convolutional Network
FNN Feedforward Neural Network
GAT Graph Attention Network
GCN Graph Convolution Network
GMAN Graph Multi-Attention Network
GNN Graph Neural Network
GRU Gated Recurrent Unit
GTCN Temporal Graph Convolutional Network
ITS Intelligent Transportation Systems
LOCF Last Observation Carried Forward
LSTM Long Short-term Memory
MAE Mean Absolute Error
MAPE Mean Absolute Percentage Error
MICE Multivariate Imputation by Chained Equations
ML Machine Learning
MLP Multilayer Perceptron
NN Neural Network
NOCB Next Observation Carried Backwards
N/A Not Applicable
PeMS Caltrans Performance Measurement System
RMSE Root Mean Squared Error
RNN Recurrent Neural Network
SLC Structure Learning Convolution
STGCN Spatio-Temporal Graph Convolutional Network
ST-UNet Spatio-Temporal U-Network for Graph-structured Time Series Modeling
SVM Support Vector Machine
T-GCN Temporal Graph Convolutional Network
VAR Vector Auto Regressive Model
VCI Via de Cintura Interna (Porto’s Ring Road)
WHO World Health Organization
Chapter 1
Introduction
1.1 Context
With the acceleration of urbanization and the rapid growth of urban population, great pressure is
being placed on urban traffic management (United Nations, 2017). Expanding cities face numer-
ous challenges related to transportation, including increased air pollution and worsening traffic
congestion (W. Jiang & Luo, 2022). According to the World Health Organization (WHO, 2022),
the transportation sector is responsible for a significant portion of air pollution, a crisis that causes
more than seven million deaths each year. With this in mind, great effort is nowadays being placed
into developing Intelligent Transportation Systems (ITS), which could significantly improve the
lives of residents in future cities (Shahid, Shah, Khan, Maple, & Jeon, 2021). In the context of
ITS, traffic forecasting is seen as an essential step in improving the efficiency of the transporta-
tion system and alleviating transportation-related problems (J. Lu, Li, Li, & Al-Barakani, 2021).
This is a complex task since traffic is very dynamic and involves dependencies in terms of space
and time (W. Jiang & Luo, 2022). So, over the years, researchers have been iterating over differ-
ent ways of improving traffic predictions. This dissertation is being prepared within the scope of
an academic research project: DynamiCITY. DynamiCITY aims to create a virtual environment
for the design and implementation of solutions for mobility within smart cities. It consists of an
open-data platform and a multi-agent-based decision-support system that allows researchers and
practitioners to explore the relationship between the city and its transport system, focusing on cli-
mate neutrality, safety, and accessibility. DynamiCITY is being developed as a collaboration
between three research laboratories of FEUP: LIACC, CITTA, and SYSTEC. It will represent an
important R&D infrastructure available to the whole research community.
1.2 Aim and Goals

This dissertation aims to study the current state of research and explore how to properly deal
with the main challenges of this problem, such as the
complex relationships between traffic conditions in time, space and external factors. This research
will then be put to use in developing and implementing a machine learning model based on the
current trend of research for road traffic forecasting, considering factors such as weather and time
of day that may affect traffic patterns. Being part of a research project that focuses on smart cities,
the ultimate goal of this dissertation is to be able to use this model to make accurate and reliable
traffic predictions to be incorporated into Intelligent Transportation Systems across different road
networks.
1.3 Contributions
The contributions of this dissertation can be summarized as follows:
• A comprehensive review of the existing literature on traffic forecasting, with a special focus
on techniques based on Graph Neural Networks (GNNs).
• Comparison of the performance of GNNs with other commonly used traffic prediction mod-
els in the literature.
• Investigation of the impact of using missing data imputation techniques on traffic prediction
performance of Graph Neural Networks (GNNs).
• Evaluation of the generalizability of the GNN models across different datasets, including
benchmarking datasets and DynamiCITY’s use-case dataset.
Overall, the results of this research contributed to a better understanding of the potential of
GNNs in the context of traffic forecasting.
1.4 Outline
The dissertation begins with an introduction that sets the stage for the research. It provides the
necessary context for understanding the study and outlines the aims and goals that the research
seeks to achieve. The expected contributions of the work are also highlighted, giving an overview
of what can be expected from the document.
The background chapter, Chapter 2, delves into the main areas of study. First, exploring the
concept of Intelligent Transportation Systems (ITS) and their relevance to the research topic. Sec-
ondly, the chapter focuses on the traffic forecasting problem itself. It discusses various traffic
prediction techniques and categorizes the different types of traffic prediction problems. More-
over, the chapter introduces Graph Neural Networks (GNNs) as a potential solution to the traffic
forecasting problem. The architecture of GNNs is explained, with a particular emphasis on the
attention mechanism.
Chapter 3 reviews the existing literature and research in the field of traffic forecasting. It
provides an overview of various techniques and approaches that have been employed, includ-
ing statistical methods, ensemble learning, hybrid methods, machine learning, and deep learning
techniques. Special attention is given to the application of Graph Neural Networks in traffic fore-
casting. Furthermore, benchmarking datasets used in the field are discussed, and a gap analysis is
presented to identify areas where the current research can make valuable contributions.
The methodological approach chapter, Chapter 4, outlines the methodology followed in the
research. It begins by clearly stating the problem and formalizing it. The chapter then presents
the implementation pipeline, detailing the steps involved in the research process. Validation is
addressed, including the metrics used to evaluate the models and the baselines against which the
models are compared. Additionally, the chapter includes a risk analysis, considering potential
challenges and limitations associated with the research.
Moving on to the implementation in Chapter 5, the focus is on the practical aspects of the
research. It discusses the technologies used in the research and describes the data retrieval and
pre-processing steps. This includes the handling of traffic data, weather data, and event data. The
chapter also explains the process of generating the graph dataset, covering aspects such as adja-
cency matrix generation, traffic speed normalization, and the incorporation of additional features.
Then the base Graph Neural Network architecture is described, including the different layers and
network hyperparameters. Moreover, techniques for handling missing data and incorporating ex-
ternal data, such as weather and event data, are explored.
In the empirical evaluation, Chapter 6, the research findings are presented and analyzed. The
performance and comparisons of the base Graph Neural Network architecture on each dataset
are discussed. The effect of different hyperparameters is evaluated, and the impact of handling
missing data and incorporating weather data is explored. Global comparisons are made to gain
insights into the overall performance of the models. The results are also discussed, providing an
understanding of the empirical findings.
Finally, the dissertation concludes with Chapter 7. The main findings and contributions of the
research are highlighted, and the limitations and potential areas for future research are addressed.
This serves as a reflection on the research journey and offers closure to the document.
Chapter 2
Background
This introductory chapter presents a summary of the research themes explored in this dissertation
so the reader can become familiar with the fundamental concepts necessary to comprehend
it. Accordingly, this chapter provides an overview of the main topics covered in this dissertation:
ITS, Traffic Prediction and Graph Neural Networks.
inform a range of decisions, from planning and designing new transportation systems to optimiz-
ing existing networks.
• Data Source: Regarding the data source for the problem, traffic data can be categorized
into moving, interval, and point sources (Angarita-Zapata, Masegosa, & Triguero, 2019)
(W. Jiang & Luo, 2022). Interval data sources, such as Automatic Toll Collection
systems, Video cameras, and License Plate Recognition, differ from point detectors in that
they offer a more comprehensive understanding of vehicle movement along a particular seg-
ment of the road. These sources are equipped with sensors placed at two fixed points, which
calculate measures such as travel time between the points (Angarita-Zapata et al., 2019)
(Mori, Mendiburu, Álvarez, & Lozano, 2015). Moving data sources are the newest category,
since they appeared with the use of Global Positioning Systems (GPSs) in vehicles and other de-
vices. These can provide individual traffic data related to vehicles’ trajectories on the roads
and more detailed traffic information (Angarita-Zapata et al., 2019) (Castro, Zhang, & Li,
2012). Point data sources are sensors placed at specific locations on the roads to detect the
presence of nearby vehicles. They provide information such as traffic flow (the number of
vehicles passing per time unit), occupancy (the percentage of time
that a sensor is detecting a vehicle), and density (the number of vehicles per unit length of
the road). Common types of sensors used in this category include loop detectors, microwave
radars, laser radar sensors, infrared sensors, among others (Lopes, Bento, Huang, Antoniou,
& Ben-Akiva, 2010) (Mori et al., 2015). This is the data source that will be used for this
dissertation.
• Prediction Horizon: The predictions can be short-term, usually defined as predictions
for a time interval smaller than 60 minutes, or long-term if the prediction window is
larger than 60 minutes (Irawan et al., 2020) (K. Lee & Rhee, 2022) (Angarita-Zapata et al.,
2019). This work focuses on short-term prediction.
• Prediction Output: Traffic prediction tasks can be classified based on the traffic state to
be predicted. Predictions related to traffic flow, speed, and demand are the most widely
addressed and are typically handled as distinct problems, while other types are generally
grouped together (W. Jiang & Luo, 2022). This work focuses on traffic speed prediction.
• Traffic Representation: The prediction can also be defined in terms of how traffic states
are represented: graph/network-based or grid-based representations, both illustrated in
Fig. 2.1 (R. Jiang et al., 2021). Grid-based traffic prediction refers to the use of a grid-
based data structure, where the area of interest is divided into a regular grid of cells, and
traffic data is collected and analyzed at each cell. This approach assumes that traffic condi-
tions are homogeneous within each cell and that the data collected at each cell can be used
to make predictions about traffic conditions in that cell (D. Chen et al., 2021) (Schörner,
Hubschneider, Härtl, Polley, & Zöllner, 2019) (X. Zhou, Shen, & Huang, 2019). On the
other hand, Graph-based prediction refers to the use of a graph-based data structure, where
the area of interest is represented as a network of interconnected nodes and edges. This
approach is based on the assumption that traffic conditions are influenced by the relation-
ships and interactions between different parts of the network (Cui, Henrickson, Ke, & Wang,
2018) (L. Zhao et al., 2020). This work solely focuses on graph-based representations.
Figure 2.1: Grid-Based Traffic and Graph-Based Traffic (extracted from (R. Jiang et al., 2021))
Categorizing the various dimensions of the Traffic Prediction problem helps to organize this
broad task into comprehensible subtasks that will be explored in this dissertation, which will focus
on graph-based traffic flow and speed prediction in a freeway context. A diagram that summarizes
the dimensions of Traffic Prediction can be seen in Fig. 2.2.
Figure 2.2: Diagram that illustrates the dimensions in which traffic prediction problems can be
categorized
2.3 Graph Neural Networks

Figure 2.3: Comparison between a 2-D Convolution (a) and a Graph Convolution (b) (extracted
from (Z. Wu et al., 2021b))
In GNNs, the nodes in the graph are typically associated with some feature or attribute, and
the edges represent the relationships between the nodes. GNNs learn representations of the nodes
and edges in the graph and make predictions about the graph. They are typically composed of
multiple layers, each of which takes as input the representations of the nodes and edges from the
previous layer and then updates the representations based on the graph structure. The layers of a
GNN use various techniques to learn the graph structure, such as aggregation and pooling, which
allow the network to propagate information through the graph.
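As an illustration of this aggregate-and-update cycle, the sketch below implements one possible layer using mean aggregation over neighbours; the class and its linear transforms are illustrative choices, not a layer taken from the models developed in this work:

```python
import torch
import torch.nn as nn

class MeanAggregationLayer(nn.Module):
    """One GNN layer: average each node's neighbour features, then update."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.self_linear = nn.Linear(in_dim, out_dim)   # transform of the node itself
        self.neigh_linear = nn.Linear(in_dim, out_dim)  # transform of the aggregate

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_nodes, in_dim) node representations from the previous layer
        # adj: (num_nodes, num_nodes) binary adjacency matrix
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees, avoiding /0
        neigh_mean = (adj @ h) / deg                     # mean over each neighbourhood
        return torch.relu(self.self_linear(h) + self.neigh_linear(neigh_mean))
```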
One of the key features of GNNs is their ability to handle graph-structured data with variable
sizes and their ability to propagate information through the graph. This allows GNNs to learn from
patterns and relationships in the graph that would be difficult or impossible to learn using
traditional neural networks, which rely on flat data and do not capture these connections.
Table 2.1: Notation used to describe the GNN architecture (adapted from (Ye, Zhao, et al., 2022))
Figure 2.4: Traffic prediction problem formulated as a GNN (extracted from (Ye, Zhao, et al.,
2022))
According to the literature, GNNs can be categorized into four types, namely, recurrent
GNNs, convolutional GNNs, graph autoencoders, and spatio-temporal GNNs (Z. Wu et al., 2021a),
that will be further explored in Chapter 3.
Because of the spatio-temporal nature of the traffic forecasting problem, the GNNs used in
this field can be categorized as spatio-temporal GNNs. However, specific components of the
other types of GNNs have also been applied to this problem (W. Jiang & Luo, 2022). Spatio-
temporal GNNs are capable of capturing the complex relationships and dependencies between
the different road segments, intersections, and the temporal information, which is essential for
accurate traffic forecasting. They can be used to model the temporal dynamics of the traffic on
the graph, by using the graph structure to propagate information between nodes and edges, and
also the temporal dependencies between the different timestamps. This can be achieved by using
temporal convolutions or recurrent layers in the GNN architecture, which allow the network to
learn both spatial and temporal representations of the data (Bui, Cho, & Yi, 2022). A general
representation of these networks can be seen in Figure 2.5.
Figure 2.5: Representation of a spatio-temporal GNN (extracted from (Bui et al., 2022))
One of the most important components of GNNs is the graph convolution. Unlike traditional
convolutions, which are used to process grid-structured data, graph convolutions allow for the
processing of data that is organized as a graph or a network. In graph convolutional networks
(GCNs), each node in the graph is treated as a feature, and the edges of the graph are used to
encode relationships between the nodes (B. Yu, Yin, & Zhu, 2018). The goal of graph convolutions
is to learn a feature representation of the nodes in a graph.
The basic idea behind graph convolutions is to aggregate information from a node’s neighbours
in the graph. This is done by defining a function that takes as input the features of a node and its
neighbours and outputs a new set of features for the node. This function is applied to all the
nodes in the graph in a recurrent manner, allowing for information to flow across the graph and
the features of each node to be updated. The parameters of the graph convolution operation are
learned during the training process (Sergios Karagiannakos, 2021).
The specific way that graph convolutions are implemented can vary, but typically they involve
a combination of matrix multiplication and activation functions. For example, one common ap-
proach involves first representing the graph as a weighted adjacency matrix, where the values in
the matrix represent the strength of the relationships between nodes. This matrix is then used to
compute a feature representation for each node, which is updated through a series of convolu-
tion operations. These operations can also be performed recursively, allowing for a hierarchy of
representations to be learned, each representing different levels of abstraction.
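In matrix form, one common instance of this idea (the propagation rule popularized by Kipf & Welling, shown here purely as an illustration) computes $H' = \sigma(\hat{A} H W)$ with a symmetrically normalised adjacency:

```python
import torch

def gcn_propagate(h: torch.Tensor, adj: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """One graph-convolution step: H' = relu(A_hat @ H @ W)."""
    a_hat = adj + torch.eye(adj.size(0))        # add self-loops
    d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt    # D^{-1/2} (A + I) D^{-1/2}
    return torch.relu(a_norm @ h @ weight)

# Applying this step k times mixes information from up to k-hop neighbourhoods,
# which is the hierarchy of representations described above.
```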
Figure 2.6: Schema of attention over one node with respect to its adjacent nodes. (extracted from
(Sanchez-Lengeling et al., 2021))
Note: In the following sections of this dissertation, other specific terms and components re-
lated to GNNs and Machine Learning that were not mentioned in this section will be used, so the
interested reader is redirected to (Ripley, 1996) for the base knowledge of ML and (Hamilton,
2020)(Z. Liu & Zhou, 2020) for a full overview of GNNs and their structure.
Chapter 3
Related Work
After the context provided by Chapter 2, this section aims to summarize the current state of re-
search in road traffic forecasting, providing a review of the latest technical achievements in this
field and highlighting the main challenges that remain for future research in this field and that
guide the work developed in this dissertation.
Overall, hybrid methods have been shown to be effective in improving the accuracy of traffic
flow predictions. By combining the strengths of different models, these methods can capture the
complexity of traffic patterns and the influence of external factors. It’s important to note that
while hybrid methods can be effective in improving the accuracy of traffic flow predictions, they
may also come with higher computational costs and require more data and resources for optimal
performance.
Figure 3.1: Simplified representation of a hybrid CNN-RNN model (extracted from (K. Lee et
al., 2021))
mension, showing improvement from STGCN and other baseline models. The authors pinpointed
the inclusion of external data as a relevant future work direction.
A problem that can occur when dealing with road network datasets is that the explicit graph
structure does not necessarily reflect the true dependency relation, due to the incomplete connec-
tions in the data, so (Z. Wu, Pan, Long, Jiang, & Zhang, 2019) developed Graph WaveNet, a
graph-based adaption of WaveNet (van den Oord et al., 2016) that focuses on trying to extract an
adaptive adjacency matrix that captures hidden spatial features. Another factor that can influence
traffic predictions is the occurrence of incidents, and for that (Xie et al., 2019) developed a model
for capturing the impact of traffic incidents on traffic flow and speed, showing how this can impact
predictions.
Another proposed architecture is the encoder-decoder, used by (Zheng, Fan, Wang, & Qi,
2019) in GMAN. This architecture features an encoder and a decoder made up of multiple spatio-
temporal attention blocks that model the effect of spatio-temporal factors on traffic conditions.
The encoder processes the input traffic features while the decoder produces the output sequence.
A transform attention layer sits between the encoder and decoder and converts the encoded traffic
features into the decoder’s input. The transformer attention mechanism models the direct relation-
ships between past and future time steps, helping to mitigate the error propagation issue between
prediction time steps. Experimental results demonstrate the superiority of GMAN when compared
to other methods, including Graph WaveNet and STGNN. To try to accommodate global network
dependencies and changes in the graph structure, (Zhang, Chang, Meng, Xiang, & Pan, 2020) pre-
sented an architecture based on Structure Learning Convolution (SLC), which enables extending
the traditional CNN to graph domains and learning the underlying graph structure for traffic pre-
diction. The model has two SLC modules to capture the global and local structures respectively.
Additionally, Pseudo three Dimensional convolution (P3D) networks are combined with SLC to
model the temporal connection. It was tested in 6 different datasets, where it outperformed other
state-of-the-art approaches.
Another significant approach was developed by (L. Zhao et al., 2020), combining GCN and
GRU networks. The GCN is used to capture the topological structure of the graph to obtain the
spatial dependence, and the GRU model is used to capture the dynamic temporal dependence.
The experiments made by the authors showed the robustness of the approach when dealing with
network perturbations. Another technique that uses GRU was developed by (S. Cao, Wu, Zhang,
Li, & Wu, 2022), aiming to address the limitations of conventional graph convolutional networks
(GCNs) in mining global spatial correlations. To achieve this, the authors leverage gated recurrent
units (GRU) and attention to explore local and global temporal correlations simultaneously.
Trying to tackle the missing data problem in traffic prediction, (Zhong, Suo, Jia, Zhang, &
Su, 2021) propose RIHGCN. This method effectively handles missing data through a recurrent
imputation process and a heterogeneous graph structure that captures dynamic spatial correlations
among nodes in the road network. RIHGCN differs from standard GCN models by creating multi-
ple graphs with different edges, which better capture changing spatial correlations over time. The
method integrates data imputation and traffic prediction in a unified framework, optimizing both
objectives simultaneously and avoiding accumulated errors. Experiments show that RIHGCN out-
performs existing methods by a significant margin. This work provides an indication that paying
special attention to dealing with missing data can have a positive effect on the predictions.
Addressing limitations of other methods that only model relationships between node pairs
and node history information, neglecting node properties, the proposed approach by (Hu, Lin,
& Wang, 2022) is a dynamic spatial-temporal graph convolutional network (DSTGCN), which
includes a dynamic graph generation module that adaptively fuses geographical proximity and
spatial heterogeneity information, and a graph convolution cycle module that captures local tem-
poral dependencies. Experiments on two types of traffic prediction tasks show that DSTGCN
outperforms most baseline models, including STGCN, ASTGCN, and GMAN.
In conclusion, Graph Neural Networks have proven to be an effective solution for traffic pre-
diction tasks. Their application has gained significant attention in recent years due to their ability
to capture the complex spatial and temporal dependencies of traffic data. The combination of
Graph Convolutional Networks (GCNs) and Recurrent Neural Networks (RNNs) has been shown
to achieve excellent performance in multi-step traffic flow prediction tasks. Additionally, the in-
troduction of attention mechanisms has further improved prediction results. Furthermore, the
inclusion of external factors, such as the time of day, day of the week, and traffic accidents, in the
prediction process has been shown to have a significant impact on the accuracy of the prediction
results, highlighting the importance of considering the influence of external factors in traffic flow
prediction tasks. Table 3.1 contains a list of all the approaches addressed in this section.
Table 3.1: List of the GNN-based approaches addressed in this section

| Name | Prediction Horizon | Problem | Context | Tested On |
| --- | --- | --- | --- | --- |
| STGCN: Spatio-Temporal Graph Convolutional Network (B. Yu et al., 2018) | Short-term | Speed | Highway | BJER4, PeMSD7 |
| MRes-RGNN: Gated Residual Recurrent Graph Neural Network (C. Chen et al., 2019) | Short-term, Medium-term | Speed | Highway | PeMS-BAY, METR-LA |
| GTCN: Temporal Graph Convolutional Network (Ge et al., 2019) | Short-term, Medium-term | Speed | Highway | PeMSD4, PeMSD7 |
| ASTGCN: Attention Based Spatial-Temporal Graph Convolutional Network (Guo et al., 2019) | Short-term, Medium-term | Flow | Highway | PeMSD4, PeMSD8 |
| Graph WaveNet for Deep Spatial-Temporal Graph Modeling (Z. Wu et al., 2019) | Short-term | Speed | Highway | METR-LA, PeMS-BAY |
| DIGC-Net: Deep Graph Convolutional Network for Incident-driven Traffic Speed Prediction (Xie et al., 2019) | Short-term, Medium-term, Long-term | Speed | Urban | SFO, NYC |
| GMAN: Graph Multi-Attention Network (Zheng et al., 2019) | Short-term, Medium-term, Long-term | Speed | Urban, Highway | PeMS-BAY, Xiamen |
| SLCNN: Spatio-Temporal Graph Structure Learning (Zhang et al., 2020) | Short-term | Flow | Highway | METR-LA, PeMS-BAY, PeMS-S |
| T-GCN: Temporal Graph Convolutional Network (L. Zhao et al., 2020) | Short-term | Speed | Urban, Highway | SZ-Taxi, Los-loop |
| ST-UNet: Spatio-Temporal U-Network for Graph-structured Time Series Modeling (T. Yu, Yin, & Zhu, 2019) | Short-term, Medium-term | Flow | Highway | METR-LA, PeMS-M, PeMS-L |
| RIHGCN: Heterogeneous Spatio-Temporal Graph Convolution Network for Traffic Forecasting with Missing Values (Zhong et al., 2021) | Short-term | Speed | Highway | PeMS |
| GCRAN: Graph Convolutional Recurrent Attention Network (S. Cao et al., 2022) | Short-term, Medium-term | Flow, Speed | Highway | METR-LA, PeMS-BAY |
| DSTGCN: Dynamic Spatial-Temporal Graph Convolutional Network (Hu et al., 2022) | Short-term | Flow, Speed | Urban, Highway | PeMS-BAY, NE-BJ, PeMSD4, PeMSD8 |
requirements. However, the sensors are susceptible to hardware failures, resulting in missing or
noisy data, which necessitates the use of pre-processing techniques. Additionally, there are some
drawbacks to using traffic sensor data for graph-based modelling: these sensors can only be in-
stalled in a limited number of locations due to factors such as installation cost. As a result, only a
portion of the road network equipped with traffic sensors can be included in the graph, while the
remaining areas would be omitted (W. Jiang & Luo, 2022).
Most GNN-based approaches make their experimental predictions on loop detector datasets,
so having open-source datasets that can be used for benchmarking between different approaches
is important. Consequently, since the advent of GNNs for traffic prediction, some open-source
datasets have become popular and used in different approaches, which are enumerated in Table
3.2 and described as follows.
METR-LA is a dataset that contains traffic speed and volume collected from the highway of
the Los Angeles County road network from March 1st to June 30th, 2012, with 207 loop detectors
in total, with samples aggregated in 5-minute intervals. It is represented in Fig. 3.2a.
Caltrans Performance Measurement System (PeMS) (State of California, 2023) is a large-scale
transportation data collection system in California, United States. It is managed by the California
Department of Transportation (Caltrans) and provides real-time and historical traffic data from
over 18,000 vehicle detector stations on the freeway system. The data is collected using various
sensors, including inductive loops, side-fire radar, and magnetometers, and includes information
on the total flow, average speed, and direction of travel for each sensor. Several traffic datasets
have been extracted from PeMS, namely: PeMS-BAY, which can be seen in Fig. 3.2b, contains data
from 325 sensors in the Bay Area from January 1st to June 30th, 2017; PeMSD3, which uses
358 sensors in the North Central Area from September 1st to November 30th, 2018; PeMSD4
that contains data from 307 sensors in the San Francisco Bay Area from January 1st to February
28th, 2018; PeMSD7, which uses 883 sensors in the Los Angeles Area from May to June 2012;
and PeMSD8, which uses 170 sensors in the San Bernardino Area from July to August 2016.
Seattle Loop (Cui, 2023), represented in Fig. 3.2c, is a dataset collected on four connected
freeways (I-5, I-405, I-90, and SR-520) in the Seattle area, from January 1st to 31st, 2015. It
contains the traffic speed data from 323 detectors.
Traffic Speed Guangzhou (X. Chen, Chen, & He, 2018), consists of 214 road segments in
Guangzhou, China (mainly urban expressways and arterials) within two months from August 1 to
September 30, 2016, in 10-minute intervals.
Figure 3.2: (a) METR-LA (H. Lu et al., 2020); (b) PeMS-BAY (H. Lu et al., 2020); (c) Seattle Loop (Cui et al., 2018)
Table 3.2: List of the most popular open-source benchmarking loop-based traffic datasets

| Name | Location | Nº of Sensors | Aggregation Period | Recorded Metrics | Time Period |
| --- | --- | --- | --- | --- | --- |
| METR-LA | Los Angeles, CA, USA | 207 | 5 minutes | Speed, Volume | March 1st to June 30th, 2012 |
| PeMS-BAY | San Jose, CA, USA | 325 | 5 minutes | Speed, Volume | January 1st to June 30th, 2017 |
| PeMSD3 | North Central California, USA | 358 | 5 minutes | Speed, Volume | September 1st to November 30th, 2018 |
| PeMSD4 | San Francisco, CA, USA | 307 | 5 minutes | Speed, Volume | January 1st to February 28th, 2018 |
| PeMSD7 | Los Angeles, CA, USA | 883 | 5 minutes | Speed, Volume | May to June, 2012 |
| PeMSD8 | San Bernardino, CA, USA | 170 | 5 minutes | Speed, Volume | July to August, 2016 |
| Seattle Loop | Seattle, WA, USA | 323 | 5 minutes | Speed, Volume | January 1st to 31st, 2015 |
| Traffic Speed Guangzhou | Guangzhou, China | 214 | 10 minutes | Speed | August 1 to September 30, 2016 |
3.4 Gap Analysis

Interpretability and explainability are also important areas of concern in the use of GNNs for
traffic forecasting (Yuan & Li, 2021) (W. Jiang & Luo, 2022). These models can be complex and
difficult to interpret, making it challenging to understand the reasoning behind their predictions
(Baldassarre & Azizpour, 2019). This lack of interpretability can limit the trust that stakeholders
place in the models and make it difficult to identify areas for improvement. This indicates that
there is a need for more research to develop interpretable and explainable GNN-based models for
traffic forecasting.
Furthermore, as seen in Table 3.1, most GNN approaches focus on short-term traffic predic-
tions, with intervals from 15 to 60 minutes into the future. Long-term prediction has more complex
spatio-temporal dependencies and uncertainty factors (Yin et al., 2022). It is a relevant research
direction to make models that are able to deal with longer predictions.
In addition to these issues, there is also a need for further research on the extraction of addi-
tional information from the output of GNN-based traffic prediction models. For example, (Xie et
al., 2019) have explored the use of GNNs for accident prediction, which is an important applica-
tion of traffic forecasting. However, there is a need for further research to explore other types of
additional information that can be extracted from the output of these models.
Another area that shows a need for further improvement is the general applicability of these models.
The different approaches analysed in this literature review were tested on a small number of datasets, often
one or two, with little to no focus on exploring how well the models can handle different traffic
networks.
In conclusion, while GNNs have shown promising results in the area of traffic forecasting,
several gaps in current research need to be addressed in order to further improve their accuracy and
applicability. These include the handling of missing data, the consideration of external factors, the
development of interpretable and explainable models, and the extraction of additional information
from the output of these models. By addressing these gaps, it will be possible to further advance
the state of the art in traffic forecasting using GNNs.
Chapter 4
Methodological Approach
This chapter aims to outline the research approach, containing the problem statement and formal
definition, a pipeline of the research process, the validation methods and ends with an analysis of
the risks that can affect the work’s implementation.
• What is the suitable design of a Graph Neural Network for traffic prediction?
• How are the models affected by different missing data handling techniques?
• Can external factors, such as weather and nearby social events, be efficiently incorporated
into the GNN models and how do they influence the model’s performance?
values of all the features of all nodes at time $t$. $\mathcal{X} = (X_1, X_2, \ldots, X_\tau)^T \in \mathbb{R}^{N \times F \times \tau}$ denotes the values of all the features of all the nodes over $\tau$ time slices. In addition, we set $y^t_i = x^t_{f,i} \in \mathbb{R}$ to represent the traffic speed of node $i$ at time $t$ in the future.

So, the problem to be solved becomes: given $\mathcal{X}$, which represents all the historical measurements of all the sensors on the network over the past $\tau$ time slices, predict the future traffic states $Y = (y^1, y^2, \ldots, y^N)^T \in \mathbb{R}^{N \times T_p}$ of all the nodes on the whole traffic network over the next $T_p$ time slices, where $y^i = (y^i_{\tau+1}, y^i_{\tau+2}, \ldots, y^i_{\tau+T_p}) \in \mathbb{R}^{T_p}$ denotes the future traffic state of node $i$ from time $\tau + 1$ onwards.
4.4 Validation
Having a proper validation stage is an important part of the development since it evaluates the
performance and assesses how well the model is able to generalize to new data and identify any
potential issues.
In this work, the validation is done on three fronts:
• Evaluate the performance on different datasets to evaluate how well the models generalize
and adapt to different circumstances;
• Compare the performance of the different iterations (base, with missing data handling, and
with external factors) of the proposed models to see the impact of these factors on the pre-
dictions;
• Compare the performance with baseline approaches to assess how powerful the proposed
models are against already existing techniques.
4.4.1 Metrics
The metrics that are used in this validation process are Root Mean Squared Error (RMSE), Mean
Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). They are mathematically
defined as:
\[
\mathrm{MAE} = \frac{1}{N \times S} \sum_{i=1}^{S} \sum_{j=1}^{N} \left| x_{i,j} - \hat{x}_{i,j} \right| \tag{4.1}
\]
\[
\mathrm{MAPE} = \frac{1}{N \times S} \sum_{i=1}^{S} \sum_{j=1}^{N} \frac{\left| x_{i,j} - \hat{x}_{i,j} \right|}{x_{i,j}} \tag{4.2}
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{N \times S} \sum_{i=1}^{S} \sum_{j=1}^{N} \left( x_{i,j} - \hat{x}_{i,j} \right)^2} \tag{4.3}
\]

where $\hat{x}_{i,j}$ is the predicted traffic metric and $x_{i,j}$ is the observed traffic metric, with $N$ being
the number of samples in the test set and $S$ the number of sensors. RMSE measures the deviation
of the predictions from the actual values, taking into account the magnitude of the error. A lower
RMSE value indicates that the model’s predictions are closer to the actual values and thus perform
better. MAE measures the average magnitude of the errors in a set of predictions, and its units
are the same as the predicted target, making it easier to interpret the results. However, as it does
not amplify the effect of large errors, it is less sensitive to outliers (Karunasingha, 2022). MAPE
provides a measure of the accuracy of the model’s predictions as a percentage of the actual values,
making it a useful metric for comparing the performance of the model across different datasets.
These metrics were chosen based on their systematic presence in the recent literature as the metrics
for model performance evaluation (Barros, Araujo, & Rossetti, 2015).
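As a minimal, illustrative sketch (not the exact evaluation code of this work), Eqs. (4.1)–(4.3) translate directly into NumPy, where x and x_hat are arrays of observed and predicted values with shape (S, N):

```python
import numpy as np

def mae(x, x_hat):
    """Mean Absolute Error, Eq. (4.1): mean of |x - x_hat| over all S x N entries."""
    return np.mean(np.abs(x - x_hat))

def mape(x, x_hat):
    """Mean Absolute Percentage Error, Eq. (4.2), assuming no zero observations."""
    return np.mean(np.abs(x - x_hat) / x)

def rmse(x, x_hat):
    """Root Mean Squared Error, Eq. (4.3)."""
    return np.sqrt(np.mean((x - x_hat) ** 2))
```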
Using these metrics, it is possible to quantify the accuracy of the GNN model’s predictions and
compare it to the performance of other methods. The validation process results will provide in-
sights into the strengths and limitations of the explored models, allowing for further improvement
and refinement. The ultimate goal of the validation process is to demonstrate the effectiveness and
feasibility of the explored approaches.
It is important to note that the datasets used contain missing values, and the predictions made
for those cases should not be accounted for in the metrics calculation, since they would distort
the perceived predictive power on real values. To avoid that, other works used a technique
called masked metrics, in which the missing values are covered by a mask and not used in the
metrics computations. This work also uses this technique.
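As an illustration of the masking idea, the sketch below excludes missing entries from the MAE computation; encoding gaps as zeros is an assumption made for the example, since datasets may flag missing values differently:

```python
import numpy as np

def masked_mae(x, x_hat, missing_value=0.0):
    """MAE over observed entries only: positions equal to `missing_value`
    (assumed sentinel for gaps) are masked out of the computation."""
    mask = x != missing_value
    return np.mean(np.abs(x[mask] - x_hat[mask]))
```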
In addition to these metrics, the execution times of the models will also be measured. One
model can produce better results than others, but if it takes a much longer time to train, the dif-
ference in results might not be relevant enough to justify the added computational cost, so it’s
important to also extract some conclusions from this.
4.4.2 Baselines
To make a global comparison between the work carried out and research done in this field, several
baselines will be used. The choice was based on the baselines chosen by other works, with a focus
on time-series methods and the pioneer GNN approaches to the traffic prediction problem. So, the
chosen baselines are the following:
• Historic Average: This approach models the traffic flow as a seasonal process and uses the
weighted average of previous seasons as the prediction (Li et al., 2018). The period used is
one week, and the prediction is based on aggregated data from the previous four weeks. For
example, the prediction for a given Thursday is the average of the traffic speeds from the
last four Thursdays. A minimal sketch of this baseline and of the ARIMA baseline below is
given after this list.
• VAR - Vector Auto-regressive Model: It is a multivariate time series model that captures
the interactions and dependencies between multiple variables. It represents each variable
as a linear combination of its own past values and the past values of other variables. The
VAR model generates predictions by estimating coefficient matrices and a constant term.
The approach was implemented using the statsmodels Python package.
• ARIMA: Stands for Autoregressive Integrated Moving Average and is a widely used time
series model. It is a statistical method that combines autoregressive (AR), differencing (I),
and moving average (MA) components to capture the patterns and trends in time series data.
The orders used were (3, 0, 1), and the model was implemented using the statsmodels Python
package.
• DCRNN: Is a deep learning model created by (Li et al., 2018) specifically designed for
traffic prediction tasks. It combines concepts from GCNs and RNNs to capture both spatial
and temporal dynamics in traffic data.
• STGCN: Is a deep learning model designed by (B. Yu et al., 2018) to capture both spatial
and temporal dependencies in spatiotemporal data. It combines GCNs with convolutional
and recurrent layers to model spatial and temporal relationships within the data.
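As referenced above, the sketch below illustrates the Historic Average and ARIMA baselines; the helper functions and the assumption of 5-minute aggregation are illustrative, not the exact implementation used in this work:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

SLOTS_PER_WEEK = 7 * 24 * 12  # number of 5-minute intervals in one week

def historic_average(series: pd.Series, t: int, weeks: int = 4) -> float:
    """Predict slot t as the average of the same weekly slot over past weeks."""
    past = [series.iloc[t - w * SLOTS_PER_WEEK] for w in range(1, weeks + 1)]
    return float(np.mean(past))

def arima_forecast(series: pd.Series, steps: int):
    """Fit an ARIMA(3, 0, 1) model on one sensor's series and forecast ahead."""
    model = ARIMA(series, order=(3, 0, 1)).fit()
    return model.forecast(steps=steps)
```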
4.5 Risk Analysis

Conducting a risk analysis can help minimize potential risks and ensure the research project is
completed successfully.
One risk that can take place in this area of research is data availability and quality. Traffic
data is crucial for the development and evaluation of the GNN models, but collecting it can be a
challenge, especially if the data is proprietary or if there are privacy concerns (Laña et al., 2018).
Inaccurate or incomplete data can also negatively impact the performance of the GNN models.
To mitigate these risks, this work will utilize benchmarking datasets that have already been used
in many other well-documented approaches, which helps ensure the quality and relevance of said
datasets.
Another risk is the computational resources required for GNN models. GNNs can be com-
putationally intensive and may require significant computational power and time to train (Yin et
al., 2022). This can be a barrier to the practical implementation of GNN models. To address this
risk, it is necessary to have access to high-performance computing resources and accommodate
the appropriate time for training in the work plan. Since one of the goals of this dissertation is to
explore the incorporation of external data in the prediction models, it is necessary to use such
data carefully so as not to increase the training time and computational power required in an
unmanageable way.
In addition, there is a risk that the GNN models may not generalize well to new, unseen data
due to overfitting the training data. This can lead to poor performance in real-world applications.
To mitigate this risk, it is necessary to test the performance of the model in different datasets and
adapt it, so it generalizes better.
In conclusion, by considering these potential risks while developing this work, it is possible to
minimize their impact and increase the likelihood of achieving high-quality results.
Chapter 5
Implementation
Following the methodology proposition, this chapter aims to explain the implementation details,
stage by stage. Starting with the choice of frameworks and technologies for the development, then
explaining the data retrieval and preprocessing, followed by the generation of the graph dataset
structure. After this data-focused part, the base GNN architecture generation is explained, fol-
lowed by the additional approaches: handling missing data and adding external data.
layers, graph pooling layers, and graph attention mechanisms that are specifically designed for
graph data.
In traffic forecasting, the temporal dynamics of traffic data are critical, so the use of an-
other library, PyTorch Geometric Temporal, was very helpful. PyTorch Geometric Temporal
(Rozemberczki, 2023) extends PyTorch Geometric to support temporal graph data and provides a
set of temporal graph convolutional layers and temporal pooling layers that can model the tempo-
ral dependencies in traffic data. The library also includes utilities for loading and preprocessing
temporal graph datasets, making it easy to work with real-world spatiotemporal traffic data.
Using these libraries provides a powerful and flexible toolkit for developing and training deep
learning models for traffic forecasting on graph-structured spatiotemporal data. The libraries pro-
vide an efficient, easy-to-use, and well-documented framework for deep learning models, taking
advantage of the latest advances in deep learning.
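As an illustration of how these libraries compose, the sketch below stacks a PyTorch Geometric ChebConv spatial layer with a standard GRU temporal layer; it shows the general pattern only, not the exact Cheb-GRUAtt architecture evaluated in Chapter 6, and the class and parameter names are illustrative:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import ChebConv

class ChebGRU(nn.Module):
    """Illustrative spatio-temporal model: a Chebyshev graph convolution
    extracts spatial features at each time step, and a GRU models the
    temporal dependencies of each node's resulting feature sequence."""

    def __init__(self, in_feats: int, hidden: int, horizon: int, k: int = 3):
        super().__init__()
        self.spatial = ChebConv(in_feats, hidden, K=k)
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        self.readout = nn.Linear(hidden, horizon)

    def forward(self, x, edge_index, edge_weight=None):
        # x: (num_nodes, seq_len, in_feats), one feature sequence per sensor
        spatial_out = torch.stack(
            [torch.relu(self.spatial(x[:, t], edge_index, edge_weight))
             for t in range(x.size(1))], dim=1)    # (num_nodes, seq_len, hidden)
        _, h_n = self.temporal(spatial_out)        # final hidden state per node
        return self.readout(h_n.squeeze(0))        # (num_nodes, horizon) speeds
```

Applying the graph convolution independently at each time step and letting the GRU summarize the resulting sequence is one simple way to combine spatial and temporal modelling in this framework.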
5.2 Data Retrieval and Pre-processing

On top of the raw sensor traffic data, in order to be able to generate the graph representation of
the traffic, each of the datasets is accompanied by an auxiliary file with the geographic locations
of each sensor.
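One common recipe in the literature for turning these sensor locations into a weighted adjacency matrix (used, for instance, in DCRNN-style pipelines; the exact construction applied in this work may differ) is a thresholded Gaussian kernel over the pairwise distances between sensors:

```python
import numpy as np

def gaussian_adjacency(dist: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Weighted adjacency from a (num_sensors x num_sensors) distance matrix."""
    sigma = dist[dist > 0].std()               # kernel bandwidth from the data
    adj = np.exp(-(dist ** 2) / (sigma ** 2))  # closer sensors -> larger weights
    adj[adj < threshold] = 0.0                 # drop weak connections (sparsify)
    return adj
```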
Table 5.1 lists the main characteristics of the used datasets. The particularities and individual
pre-processing techniques applied to each of the 3 data sources will be explained in the following
subsections.
Table 5.1: Description of each of the different traffic datasets to be used

| Name | Location | Number of Sensors | Recorded Period | Period Length |
| --- | --- | --- | --- | --- |
| METR-LA | Los Angeles, CA, USA | 207 | March 1st to June 30th, 2012 | 4 months |
| PeMS-BAY | San Jose, CA, USA | 325 | January 1st to June 30th, 2017 | 6 months |
| PeMS-BAY-2years | San Jose, CA, USA | 319 | July 1st, 2020 to June 30th, 2022 | 2 years |
| VCI | Porto, Portugal | 26 | January 1st to June 30th, 2015 | 6 months |
Of the benchmarking datasets found in the literature and described in Section 3.3, PeMS-
BAY and METR-LA were the ones chosen to use in this work, because they were the two most
referenced. They are both loop-detector-based datasets from highway road networks, with the
respective sensor locations shown in Fig. 5.1. Both datasets also follow the same structure, as
illustrated in Fig. 5.2: for each time interval of 5 minutes (represented by each row), there is one
value for each sensor (represented by the columns), which is the harmonic average speed of that
interval in miles per hour (mph).
Figure 5.2: METR-LA and PeMS-BAY dataset structure, where each column represents a sensor
and each row a timestamp, with each cell holding the harmonic average speed of a sensor in that
timeframe
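A table with this structure can then be sliced into supervised samples with a simple sliding window; the sketch below is illustrative, with 12 past steps (one hour of 5-minute intervals) and a 12-step horizon as assumed defaults:

```python
import numpy as np
import pandas as pd

def make_windows(df: pd.DataFrame, history: int = 12, horizon: int = 12):
    """Turn a (time x sensor) speed table into (input, target) window pairs."""
    values = df.to_numpy()                    # shape (T, num_sensors)
    xs, ys = [], []
    for t in range(history, len(values) - horizon + 1):
        xs.append(values[t - history:t])      # past `history` steps as input
        ys.append(values[t:t + horizon])      # next `horizon` steps as target
    return np.stack(xs), np.stack(ys)
```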
For PeMS-BAY, since the referenced dataset was collected from the Caltrans Performance Measurement System (State of California, 2023), which is a large-scale transportation data collection system, it was possible to retrieve a new period of data in addition to the referenced period from January 1st to June 30th, 2017. The new instance of PeMS-BAY data collected dates from the 1st of July 2020 to the 30th of June 2022, a duration of 2 years in total. There are two reasons for this choice: on one hand, no relevant event data could be found for the older datasets, so this one can be used to test the incorporation of such factors; on the other hand, the bigger temporal window of this dataset allows for comparing the results when using longer or shorter dataset periods for training the models. This new dataset will henceforth be called PeMS-BAY-2years to distinguish between the two versions. It is relevant to note that the latter version has 319 sensors instead of the original 325 because, when retrieving the new data, six of the sensors' IDs were no longer available. Apart from this, the two datasets share the same properties.
The VCI traffic dataset was obtained from 26 loop-detector sensors, whose locations can be seen in Fig.5.3, installed on Via de Cintura Interna (VCI), a highway in Porto, Portugal. VCI is a ring-like highway that is 21 km long, with two directions separated by a lane separator. The sensors are positioned transversely in the road, under all lanes in both directions. They collect information about the type and number of vehicles passing through every 5 minutes for the specific segment they are situated in. The historical dataset contains traffic information from 2013, 2014 and 2015, but in order to have a dataset with a similar length to the benchmark ones, only the last six months recorded were used.
Unlike the benchmarking datasets described above, the VCI dataset follows a different struc-
ture, with each timestamp having other metrics besides the harmonic average speed value. The
meaning of every field in the original VCI dataset can be seen in Table 5.2. Another relevant fact
is that this dataset contains missing values, which should be addressed and will pose an additional
challenge. For the purpose of this work, it was necessary to change the structure of the dataset to
match METR-LA and PeMS-BAY.
To get the original VCI dataset to match the structure of the benchmarking datasets, the following procedure was followed (a pandas sketch is given after the list):
• The original dataset has records for the two traffic directions, and only one should be studied at a time, so only records with Lane_bunndle_direction equal to "C" were kept.
• Pivot the dataset table structure to match the structure of PeMS-BAY and METR-LA: every value of Equipment_id becomes a separate column that, for each Agg_period_start, holds the Avg_speed_harmonic speed value.
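A minimal pandas sketch of these two steps, assuming the raw VCI records are in a CSV file (the file name is illustrative; the column names are the ones from Table 5.2):

import pandas as pd

# Load the raw VCI records (illustrative file name).
vci = pd.read_csv("vci_raw.csv")

# Keep a single traffic direction.
one_direction = vci[vci["Lane_bunndle_direction"] == "C"]

# Pivot: one row per 5-minute aggregation period, one column per sensor,
# holding the harmonic average speed.
pivoted = one_direction.pivot(
    index="Agg_period_start",
    columns="Equipment_id",
    values="Avg_speed_harmonic",
)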
To have consistent weather data across all traffic datasets, instead of pre-existing datasets, an API was used to retrieve historical weather data for each dataset's location and time period. The chosen API was Weatherbit (Weatherbit, 2023a), for its free availability of high-granularity historical data for all the locations and time periods needed.
For simplicity, only the weather station closest to the centre of each road network was picked for each dataset. The smallest frequency of measurements available for historical data is 15 minutes.
Pre-processing
To better fit the needs of this work, the weather data needed to go through some pre-processing steps. First, not all of the dataset fields available4 through the API were needed for the task at hand, so they were not included; such fields were UV index, solar radiance, atmospheric and sea level pressure, humidity and solar angles. Despite being relevant for traffic prediction, the snow-rate field was also removed, given that, for the cities covered by the traffic datasets, its value was always zero. The 'pod' field was converted from string to int: 'n' became 0 and 'd' became 1.
4 Full list of available fields: [Link]
Table 5.2: Description of each field of the VCI dataset (adapted from (Alam et al., 2017))
The final number of fields became 6 out of the original 24 made available by the API. The
fields that were kept can be found in Table 5.3.
Table 5.3: Description of the fields from the weather API that were used (adapted from (Weatherbit, 2023b))
As previously mentioned, the smallest granularity available for the historical data was 15-minute intervals, while the traffic datasets have 5-minute intervals, so these intervals had to be matched for the data to be used together. Given the greater importance of the traffic data over the weather data, it did not make sense to reduce the granularity of the traffic data to 15-minute aggregates, because some information would be lost by doing that. Instead, the weather dataset was modified to have 2 more records between each 15-minute interval by linearly interpolating the available records. It is known that this introduces some inaccuracy in the data, but its effects were considered less damaging than reducing the traffic dataset granularity.
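As an illustration, this upsampling step can be done with pandas; a minimal sketch, assuming the weather records are indexed by timestamp (file and column names are illustrative):

import pandas as pd

# Weather records at 15-minute frequency, indexed by timestamp.
weather = pd.read_csv("weather.csv", parse_dates=["timestamp"], index_col="timestamp")

# Upsample to the 5-minute frequency of the traffic data: this inserts two
# empty rows between each pair of 15-minute records, which are then filled
# by linear interpolation between the surrounding observations.
weather_5min = weather.resample("5min").asfreq().interpolate(method="linear")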
To allow other researchers to follow the same procedure and generate weather datasets to use
with loop-sensor data for traffic prediction tasks, a command line tool that can generate weather
datasets with the characteristics as described above for any location and time period was created.
It retrieves the data from the Weatherbit API, processes it and saves it to a CSV file.
To use the tool, several parameters must be provided: a valid Weatherbit API key, the latitude and longitude of the location for which to generate the weather data, the start and end dates for the data and, finally, the name of the output CSV file. The usage is as follows:
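A minimal sketch of the invocation, assuming hypothetical script and flag names:

python generate_weather_dataset.py --api-key <WEATHERBIT_API_KEY> \
    --lat <LATITUDE> --lon <LONGITUDE> \
    --start <YYYY-MM-DD> --end <YYYY-MM-DD> \
    --output <OUTPUT_FILE.csv>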
So, running:
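# hypothetical script and flags as above; the coordinates are illustrative values for San Francisco, CA
python generate_weather_dataset.py --api-key abc123 --lat 37.7749 --lon -122.4194 \
    --start 2022-01-01 --end 2022-01-31 --output my_weather_data.csv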
would build weather data for San Francisco, CA from January 1, 2022 to January 31, 2022, using the Weatherbit API key 'abc123', and save it to a file named 'my_weather_data.csv'.
The events dataset was more difficult to find than the previous ones, since there is less availability of open-source quality data regarding this matter, and there was the need to find events occurring in specific timeframes and locations. Only one relevant data source with the needed information was found: the PredictHQ API (PredictHQ, 2023a). The PredictHQ API aggregates and enriches event data from various sources to provide a comprehensive and up-to-date view of events happening around the world. It includes information such as event type, location, time, duration, and attendance. Unfortunately, it is only possible to retrieve historic events data for free if the events are less than a year in the past, counting from the moment of the API call, which created the need for a more recent traffic dataset, as previously explained.
The choice of events to retrieve from the API was based on the following criteria:
• The location of the event needs to be within a 20 km radius of the road network;
• The event's occurrence needs to be within the recorded time period of the traffic dataset;
• The attendance should be at least 1000 people; this was deemed a reasonable threshold between small gatherings and events that have the potential to significantly affect the way people usually move in the city;
• The event belongs to one of the following categories: conferences, expos, concerts, festivals, performing arts, sports, community, public holidays, school holidays, severe weather and disasters. The excluded categories were non-attendance-based events, such as daylight savings, which were considered not to have an impact on mobility.
Pre-processing
In terms of pre-processing, the data provided by the API did not need much work to be ready for use in this work, so the pre-processing procedure was simple. One of the required steps consisted of removing the variables that would not be relevant to this use case. The event records directly retrieved contained 32 different fields (PredictHQ, 2023b), with information such as whether the event was private or public, whether it had been rescheduled or postponed, when it was announced, and many others that did not need to be included, so they were removed. In addition, the location of the event was originally expressed as a complex datatype with several sub-fields, so it was changed to two separate fields of longitude and latitude. Table 5.4 shows the fields and respective descriptions of the processed dataset. After these pre-processing steps, the final events dataset for PeMS-BAY-2years comprises 394 events, with the category distribution shown in Figure 5.4.
Table 5.4: Description of each field of the Events dataset
To allow other researchers to follow the same procedure, a command line tool was created that can generate events datasets with the characteristics described above, to use with loop-sensor data for traffic prediction tasks, for any location and time period, as long as the period is less than one year in the past at the moment the API call is made. It retrieves the data from the PredictHQ API, processes it and saves it to a CSV file.
To use the tool, several parameters must be provided: a valid PredictHQ API client ID, API secret and access token, the latitude and longitude of the 'epicentre' for which to generate the data, the radius around it, the start and end dates for the data and, finally, the name of the output CSV file. The usage is as follows:
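A minimal sketch, again with hypothetical script and flag names (the coordinates below are illustrative values for New York City). Running:

python generate_events_dataset.py --client-id <CLIENT_ID> --secret <SECRET> \
    --token <ACCESS_TOKEN> --lat 40.7128 --lon -74.0060 --radius 10 \
    --start 2023-03-01 --end 2023-03-31 --output nyc_events.csv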
would build a dataset with events in New York City within a radius of 10 kilometres between March 1, 2023 and March 31, 2023, and export the results to a file called nyc_events.csv.
5.3 Graph Dataset Generation
The generation of the graph dataset structure takes two input files for each traffic dataset:
• A csv file that represents the speed values per timestamp in every sensor, with the structure shown in Fig.5.2, henceforth called traffic data;
• A csv file with information about the coordinates of the sensors, with the structure shown in Figure 5.5.
Given these files, it is possible to generate the input data for the GNN models for any given
road network and time period by following the procedures shown in Fig.5.6, which will be further
described in the next subsections.
It is important to note that this data generation phase is adapted to add the additional features
of handling missing data and external data incorporation, which will be described in Section 5.5
and Section 5.6, respectively.
Having the information about the sensors' locations is not enough to extract all the spatial information of the road network, since the road structure and direction of traffic flow also play a part. An adjacency matrix that represents the distances between the sensors is needed. For the PeMS-BAY and METR-LA datasets, the distances between sensors were made available by the developers of the DCRNN approach in their GitHub repository (Li, Mensi, & Yu, 2023). However, for the VCI dataset, they were not available, so there was a need to create them. It may seem trivial to obtain these distances by calculating them from the straight-line distance between the coordinates, but this would be a very rough estimation of the true road distance (i.e. the distance that a car would need to travel from an origin to a destination) between the sensors. So, an alternative was found in Openrouteservice5, a free-of-charge and open-source API that provides road distances between points. In this way, it was possible to build adjacency matrices for this particular VCI dataset and, in general, for other datasets that might be used in the future.
Having the road distance between sensors, it is now possible to calculate the weight of the graph’s
edges. Instead of using the raw distance values between sensors to fill the adjacency matrix, this
work follows the technique used in DCRNN(Li et al., 2018) and STGCN(B. Yu et al., 2018), in
which the weighted adjacency matrix W is filled using the Thresholded Gaussian Kernel function
(Shuman, Narang, Frossard, Ortega, & Vandergheynst, 2013):
5 [Link]
W_{v_i,v_j} = \begin{cases} \exp\left(-\frac{[\mathrm{dist}(v_i, v_j)]^2}{2\sigma^2}\right) & \text{if } \mathrm{dist}(v_i, v_j) \le \kappa \\ 0 & \text{otherwise} \end{cases} \qquad (5.1)
where W_{v_i,v_j} is the edge weight between sensor v_i and sensor v_j, dist(v_i, v_j) represents the road network distance from v_i to v_j, σ is the standard deviation of the distances and κ is a threshold for sparsity, meaning that connections with a weight lower than the threshold are set to 0. This is especially helpful for bigger networks, where sensors very far from each other have almost no interaction, and therefore their connection can be ignored by the network model to speed up calculations. The threshold κ was set to 0.1, as was done in the approaches mentioned above.
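A minimal NumPy sketch of this computation, assuming dist is the matrix of road distances between sensors; following the description above (and the practice in DCRNN), the threshold is applied to the kernel weights:

import numpy as np

def weighted_adjacency(dist: np.ndarray, kappa: float = 0.1) -> np.ndarray:
    """Weighted adjacency matrix via a thresholded Gaussian kernel (Eq. 5.1)."""
    sigma = dist.std()  # standard deviation of the distances
    W = np.exp(-np.square(dist) / (2 * sigma ** 2))
    W[W < kappa] = 0.0  # sparsify: drop connections with very low weight
    return W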
Besides the speed readings themselves, a set of time features was extracted from the timestamps of the traffic data:
• Time in day: every five-minute interval of the day gets a different value between 0 and 1 sequentially, so the same time on different days shares this feature, to try to capture daily patterns.
• Day in week: all records from the same day of the week get the same value, to try to capture weekly patterns.
• Hour in day: all records from the same hour of the day get the same value.
• Is weekend: all records that belong to weekends get 1 and the others get 0.
The goal here is to train the models with different combinations of these features to see if they improve the results. Only temporal features extractable from the traffic dataset itself were described here; data from other sources is explored in Section 5.6.
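As an illustration, these features can be derived from the timestamps with pandas; a minimal sketch, assuming index holds the 5-minute timestamps (the names and dates are illustrative):

import pandas as pd

index = pd.date_range("2017-01-01", periods=2016, freq="5min")  # one week of 5-min slots

# Time in day: position of the 5-minute slot within the day, scaled to [0, 1).
time_in_day = (index - index.normalize()) / pd.Timedelta("1 day")

# Day in week: 0 (Monday) to 6 (Sunday), shared by records of the same weekday.
day_in_week = index.dayofweek

# Hour in day: shared by all records within the same hour.
hour_in_day = index.hour

# Is weekend: 1 for Saturday and Sunday, 0 otherwise.
is_weekend = (index.dayofweek >= 5).astype(int)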
5.3.6 Dataloader
After the data is split, the last step is to use PyTorch's DataLoader to divide inputs and targets into batches of a specified size, allowing for more efficient processing. It builds an iterable object from the batches of inputs and targets, allowing iteration over the batches in the training and test loops.
The data in this form, together with the generated adjacency matrix, is what is fed to the model to obtain future predictions.
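A minimal sketch of this step, assuming inputs and targets are tensors already produced by the split (the shapes and batch size are illustrative):

import torch
from torch.utils.data import DataLoader, TensorDataset

inputs = torch.randn(1000, 12, 207, 2)   # (samples, historical steps, sensors, features)
targets = torch.randn(1000, 12, 207)     # (samples, prediction horizons, sensors)

# Pair each input window with its target horizons and iterate in batches.
dataset = TensorDataset(inputs, targets)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

for x_batch, y_batch in loader:
    pass  # training / evaluation step goes here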
5.4 Base Graph Neural Network Architecture
5.4.1 Layers
The models explored in this work follow the structure illustrated in Fig.5.7. The data goes through
N Spatio-temporal blocks made up of a spatial graph mechanism followed by a temporal mech-
anism. Several mechanisms were tried, as enumerated in Table 5.5. The number of blocks is
a parameter that was explored in the empirical experiments. After the spatio-temporal blocks,
the data goes through a fully connected layer, followed by ReLU activation, to result in the final
network output.
Table 5.5: Enumeration of the temporal and spatial mechanisms that were explored
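A minimal PyTorch skeleton of this structure, with the spatial and temporal mechanisms left as interchangeable modules (a sketch of the general shape, not the exact implementation used in this work):

import torch
import torch.nn as nn

class STBlock(nn.Module):
    """One spatio-temporal block: a spatial graph mechanism followed by a temporal one."""
    def __init__(self, spatial: nn.Module, temporal: nn.Module):
        super().__init__()
        self.spatial = spatial
        self.temporal = temporal

    def forward(self, x, edge_index, edge_weight):
        x = self.spatial(x, edge_index, edge_weight)  # model sensor-to-sensor dependencies
        return self.temporal(x)                       # model dependencies over time

class STModel(nn.Module):
    """N spatio-temporal blocks, then a fully connected layer with ReLU activation."""
    def __init__(self, blocks, hidden_dim: int, horizons: int):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.fc = nn.Linear(hidden_dim, horizons)

    def forward(self, x, edge_index, edge_weight):
        for block in self.blocks:
            x = block(x, edge_index, edge_weight)
        return torch.relu(self.fc(x))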
This part of the network is responsible for spatial dependency modelling, capturing the relationships between neighbouring sensors in the network.
The mechanisms that were explored are the following (an instantiation sketch is given after the list):
• GAT - Graph Attention Network: GAT leverages attention mechanisms to model the rela-
tionships between nodes in a graph. GAT was introduced in (Velickovic et al., 2017) to cap-
ture the importance of neighbouring nodes when aggregating information during message-
passing iterations. In GAT, each node in the graph is associated with a learnable attention
mechanism. Attention allows the model to assign different weights to the neighbouring
nodes based on their importance in the context of the target node. This mechanism works
with multi-head attention, where k different mechanisms run in parallel, originating k inde-
pendent outputs that are then concatenated and linearly transformed. This parameter k is set
to 3 in this work, as was done in the original paper.
• GCN - Graph Convolution Network: The graph convolution operation was introduced
in (Kipf & Welling, 2017). It updates a node’s features by aggregating its neighbouring
nodes’ features. This process is repeated for several layers, each time considering a larger
neighbourhood, which allows the network to learn more global features of the graph.
• ChebConv - Chebyshev Spectral Graph Convolution: This graph operation was intro-
duced in (Defferrard, Bresson, & Vandergheynst, 2016), and it is a variation of GCN de-
signed to approximate spectral graph convolutions by using Chebyshev polynomials. It
operates in the graphs’ spectral domain, leveraging the Laplacian matrix’s eigenvalues and
eigenvectors. By truncating the expansion of the Chebyshev polynomials, it efficiently com-
putes convolution on graph signals without the need for costly eigenvalue decomposition.
This operator allows for the propagation of information and capturing of spatial dependen-
cies in graph data.
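In PyTorch Geometric, these three mechanisms correspond directly to existing layers; a minimal instantiation sketch (the channel sizes, and K = 3 for ChebConv, are illustrative choices):

from torch_geometric.nn import ChebConv, GATConv, GCNConv

in_channels, out_channels = 2, 64

# GAT with k = 3 attention heads, whose outputs are concatenated.
gat = GATConv(in_channels, out_channels, heads=3, concat=True)

# Plain graph convolution aggregating features from neighbouring nodes.
gcn = GCNConv(in_channels, out_channels)

# Chebyshev spectral convolution with a truncated polynomial expansion of order K.
cheb = ChebConv(in_channels, out_channels, K=3)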
This part of the network is responsible for temporal dependency modelling, capturing the influence of past sensor readings on more recent ones.
The mechanisms that were explored are the following:
• Gated Recurrent Unit: The most commonly used neural network models for processing sequence data (i.e. temporal data) are RNNs. However, traditional RNNs have long-term prediction limitations due to vanishing and exploding gradients. To address these problems, variants of RNNs such as the LSTM and GRU have been developed. Since the LSTM has a more complex structure and longer training time, while the GRU has a simpler structure and trains faster, the GRU model is chosen to obtain temporal dependence from traffic data in some works, such as (Li et al., 2018) (L. Zhao et al., 2020). The GRU model uses the hidden state from the previous time step and the current traffic information as inputs to predict the traffic status at the current time step. This allows the model to capture the current traffic information while retaining the changing trends of historical traffic information and capturing temporal dependence.
• CNN: Despite RNNs being more commonly used for temporal data, CNNs are also used to model this part of the data in works such as (B. Yu et al., 2018), due to their faster training times, simpler structures, and lack of dependency on previous steps. Inspired by (Gehring, Auli, Grangier, Yarats, & Dauphin, 2017), the authors propose a method that employs entire convolutional structures on the time axis to capture the dynamic behaviours of traffic flows. This design allows for parallel and controllable training procedures. The temporal convolutional layer comprises a one-dimensional causal convolution with a width-Kt kernel, followed by Gated Linear Units (GLU) for non-linearity. This process essentially maps the input (which can be seen as a multi-channel time series) to a single output element via a convolutional kernel.
• Gated Recurrent Unit + Attention: This mechanism was used in (Zhu, Song, Zhao, & Li, 2020) as an extension of their previous work (L. Zhao et al., 2020): after passing through the GRU, the hidden states are fed into an attention model to determine a context vector that covers the global traffic variation information in time.
• CNN + Attention: This mechanism was introduced in (Guo et al., 2019), and it consists of
combining a CNN with an attention mechanism to capture the temporal dependencies.
Besides the network layers, several training settings had to be defined:
• Learning Rate: The learning rate is a crucial hyperparameter determining the step size during optimization. A suitable learning rate can significantly improve the model's convergence speed and overall performance. However, selecting an appropriate learning rate can be challenging, as it often depends on the problem domain, data distribution, and model architecture. While a larger learning rate leads to a premature, sub-optimal solution, a smaller one makes the optimisation process converge slowly (K. Zhou, Liu, Duan, & Hu, 2022). Already existing approaches used either 0.01 or 0.001 as a base learning rate. In this work, the base learning rate was set to 0.01 in the first tests, but on top of those, experiments with other learning rates, including 0.001, were also carried out.
• Training Epochs: The number of training epochs is a key hyperparameter that determines
how many times the model will update its parameters based on the training data. Balancing
the number of epochs is essential to avoid underfitting, where the model fails to capture the
underlying patterns in the data, and overfitting, where the model becomes overly specialized
to the training data and performs poorly on unseen instances. Recent GNN studies (Li et
al., 2018)(B. Yu et al., 2018)(Guo et al., 2019)(L. Zhao et al., 2020) indicate the use of 100
epochs for training their GNN models, so the same is applied in this work.
An important detail to note is that the model that gets saved from training, to be applied to the test data and evaluated, is not the model that results from the last epoch, but rather the model from the epoch with the lowest loss on the validation data. This means that in the training loop the current validation loss is always checked against the best one so far, and if it is lower, the model gets saved.
• Loss Function: Most GNN works use the MSE loss (Li et al., 2018) (B. Yu et al., 2018) (T. Yu et al., 2019), so it was also applied in this work. It is important to note that, since these works don't do any missing data pre-processing, when calculating the loss function they use a variation of the original definition in which the loss is only calculated over non-missing values, by applying a mask to null values. These are often called masked loss functions (a sketch is given after this list).
• Optimizer: The optimizer is the algorithm that adjusts the network's weights and biases to minimize the loss function. It does this by computing the gradients of the loss function with respect to the parameters and updating them iteratively. The optimizer's goal is to find the parameter values that reduce the loss, improving the network's predictions (Lau, 2017). Previous works with GNNs mostly use the Adam optimizer. In addition to Adam, other optimizers, AdamW and AdaGrad, will be used in the experiments to see if they can produce better results.
• Train, Validation and Test proportions: This work used 80% of data for training, 10% for
validation and 20% for testing.
• Number of spatio-temporal blocks: Other techniques that use a similar consecutive spatio-
temporal blocks approach show better results with 2 blocks (Guo et al., 2019) (L. Zhao et
al., 2020), so this was the number used in the experiments of this work.
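A minimal sketch of such a masked MSE loss, assuming missing readings are encoded as zeros in the target tensor (as in these datasets):

import torch

def masked_mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """MSE computed only over non-missing values (missing readings encoded as 0)."""
    mask = (target != 0).float()
    mask = mask / mask.mean()  # rescale so the loss magnitude stays comparable
    loss = (pred - target) ** 2 * mask
    loss = torch.where(torch.isnan(loss), torch.zeros_like(loss), loss)
    return loss.mean()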
5.5 Handling Missing Data
As discussed before, some of the datasets are more affected by missing values than the others. METR-LA, despite not having blocks of null values in its sensors, has a significant amount of sparse zeros, more than in the VCI dataset.
Figure 5.8: Types of occurring missing data in the sensors (VCI dataset)
Since the imputation is performed on each sensor's readings independently (without exploiting road network correlations) and each sensor individually can be considered a time series of traffic speed records, the techniques chosen to apply should be ones used with time series data.
Mean
This is a simple and commonly used technique for handling missing data. The missing values are replaced with the mean value of the sensor's readings. When used with time series data, it has limitations, since it does not account for the temporal dependencies between sensor readings. The plotted values of the imputation on a sample of data from the VCI dataset can be seen in Fig.5.9a for a block of missing values and Fig.5.9b for sparse missing data.
Figure 5.9: Comparison between original data and after mean imputation in VCI
Last Observation Carried Forward and Next Observation Carried Backward
Last Observation Carried Forward (LOCF) and Next Observation Carried Backward (NOCB) are two distinct imputation techniques used in time series data. LOCF uses the last observed value to fill in missing values that occur after it, and NOCB uses the next observed value to fill in missing values that occur before it. In this case, they were used together, LOCF before NOCB, since there can be values missing at the beginning and end of the dataset that would be left unfilled if only one of the techniques were used. The choice of performing LOCF before NOCB comes from past values having influence over future values, so that relationship is kept whenever possible. The plotted values of this imputation on a sample of data from the VCI dataset can be seen in Fig.5.10a for a block of missing values and Fig.5.10b for sparse missing data.
Figure 5.10: Comparison between original data and after LOCF + NOCB imputation in VCI
Linear Interpolation
Linear interpolation assumes a linear relationship between the observed values before and after the missing value and estimates the missing value based on this assumption. The plotted values of this imputation on a sample of data from the VCI dataset can be seen in Fig.5.11a for a block of missing values and Fig.5.11b for sparse missing data.
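A minimal pandas sketch of these first three techniques, assuming df is the pivoted traffic table (sensors as columns, 5-minute timestamps as rows) with missing readings stored as NaN (the example values are illustrative):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sensor_a": [55.0, np.nan, np.nan, 60.0],
    "sensor_b": [40.0, 42.0, np.nan, 41.0],
})

# Mean imputation: replace NaNs with each sensor's mean reading.
mean_filled = df.fillna(df.mean())

# LOCF followed by NOCB: forward-fill first, then backward-fill what remains.
locf_nocb_filled = df.ffill().bfill()

# Linear interpolation between the observations surrounding each gap.
interp_filled = df.interpolate(method="linear", limit_direction="both")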
MICE
Multiple Imputation by Chained Equations (MICE) starts by calculating the column-wise mean for any columns containing missing values and replaces those missing values with the mean. It then proceeds to perform a series of chained regression models to impute each missing value iteratively (Ismiguzel, 2022). Similar to standard regression, MICE utilizes a feature matrix and a target variable for training. The plotted values of this imputation on a sample of data from the VCI dataset can be seen in Fig.5.12a for a block of missing values and Fig.5.12b for sparse missing data.
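A minimal sketch of MICE using scikit-learn's IterativeImputer, one possible implementation of the chained-equations idea (not necessarily the exact one used in this work); by default it also initializes missing entries with the column mean, matching the description above:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the estimator)
from sklearn.impute import IterativeImputer

# Each column is one sensor's speed series; NaN marks a missing reading.
X = np.array([[55.0, 40.0],
              [np.nan, 42.0],
              [np.nan, np.nan],
              [60.0, 41.0]])

# Iteratively regress each column on the others to fill in its missing values.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)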
Figure 5.11: Comparison between original data and after interpolation imputation in VCI
To evaluate the imputation techniques, artificial missing data was introduced into a complete subset of the data in four configurations (a sketch of how they can be generated is given after the list):
• Sparse 4%: Only sparse missing values, randomly selected at a 4% missing rate for the subset;
• Sparse 8%: Only sparse missing values, randomly selected at an 8% missing rate for the subset;
• Block I: Only block missing values, randomly selecting two sensors, removing one month's worth of data from one at a random location and two weeks from the other, also at a random location;
• Block II: Only block missing values, randomly selecting two sensors, removing two months' worth of data from one at a random location and one month from the other, also at a random location.
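A minimal NumPy sketch of how such configurations can be generated, assuming data is the (timesteps × sensors) speed matrix and NaN marks a missing reading (the sizes are illustrative):

import numpy as np

rng = np.random.default_rng(seed=0)
data = rng.uniform(30.0, 70.0, size=(17280, 26))  # e.g. two months of 5-min readings

def add_sparse_missing(x: np.ndarray, rate: float) -> np.ndarray:
    """Remove a given fraction of randomly chosen individual readings."""
    out = x.copy()
    out[rng.random(out.shape) < rate] = np.nan
    return out

def add_block_missing(x: np.ndarray, block_len: int) -> np.ndarray:
    """Remove a contiguous block of readings from one randomly chosen sensor."""
    out = x.copy()
    sensor = rng.integers(out.shape[1])
    start = rng.integers(out.shape[0] - block_len)
    out[start:start + block_len, sensor] = np.nan
    return out

sparse_4 = add_sparse_missing(data, 0.04)
block_i = add_block_missing(add_block_missing(data, 8640), 4032)  # ~1 month + ~2 weeks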
Figure 5.12: Comparison between original data and after MICE imputation in VCI
Then, the performance of each imputation technique in the different missing data configurations was evaluated using the MAE and RMSE metrics. To remove possible bias related to the random choice of the missing data, the metrics were generated by averaging over five runs with different randomly generated missing data. The results can be seen in Table 5.7, and they show that for sparse missing data, interpolation produces the best results, but for blocks of missing data, the MICE approach produces smaller errors. In response to this, a hybrid of these two approaches, which uses MICE for blocks and interpolation for sparse missing values, was created, and it produced the best results when both types of missing data exist.
Among the combinations of weather features tested were:
• precip_rate + visibility
• precip_rate + wind_spd
Figure 5.14: Comparison between the traffic in a bad weather day and a good weather day
This list is by no means an exhaustive enumeration of all the possible combinations of features, but given the constraints in terms of available training time, it was deemed a reasonable set of possibilities covering the most promising features. The results of this inclusion can be seen in Section 6.3.
Figure 5.16: Comparison between the traffic during a match day at Estádio do Dragão and a regular day
Unfortunately, given time restrictions related to the computational time and power required to run the models, it was not possible to carry out this exploration. This means that the PeMS-BAY-2years dataset and the corresponding retrieved events dataset were not used in this exploration. However, given that the datasets were retrieved and processed, it made sense to keep the references to them, since they can be used in future research on the topic.
Chapter 6
Empirical Evaluation
In this chapter, the empirical experiments carried out will be presented and discussed. Given their heavy computational time requirements, different experiments were run on different machines at the same time to save time. It was relevant to register the training time of the different models, but this became challenging when using different machines for different experiments, since they had slightly different performances. The compromise deemed adequate for this work was, since there isn't a big discrepancy between the times of different epochs of the same model, to run 10 epochs of each model on the same machine and register the average of those epochs. During those runs, there was an effort not to have other processes running on the machine, so as to have execution time comparisons as reliable as possible. The machines had the same hardware specifications, which were the following: an Intel Core i7-8700K @3.70GHz processor, 32 GB of RAM and an Nvidia GeForce GTX 1080 graphics card, running on Windows 10. The source code was developed in Python 3.10, with PyTorch 2.0, running with CUDA 11.8. Reproducing similar results will only be possible with these hardware specifications and library versions. To simplify the identification of the different models explored, they are named in this chapter following the components of their network structure, as described in Table 6.1.
Table 6.1: Identification of the names used for the different GNN architectures
Name Network
GAT-GRU GAT as spatial mechanism and GRU as temporal mechanism
GAT-CNN GAT as spatial mechanism and CNN as temporal mechanism
GAT-GRUAtt GAT as spatial mechanism and GRU+Attention as temporal mechanism
GAT-CNNAtt GAT as spatial mechanism and CNN+Attention as temporal mechanism
GCN-GRU GCN as spatial mechanism and GRU as temporal mechanism
GCN-CNN GCN as spatial mechanism and CNN as temporal mechanism
GCN-GRUAtt GCN as spatial mechanism and GRU+Attention as temporal mechanism
GCN-CNNAtt GCN as spatial mechanism and CNN+Attention as temporal mechanism
Cheb-GRU ChebConv as spatial mechanism and GRU as temporal mechanism
Cheb-CNN ChebConv as spatial mechanism and CNN as temporal mechanism
Cheb-GRUAtt ChebConv as spatial mechanism and GRU+Attention as temporal mechanism
Cheb-CNNAtt ChebConv as spatial mechanism and CNN+Attention as temporal mechanism
For this first experiment, the settings were the following:
• Optimizer: Adam
• Learning rate: 0.01
• Historical Features: 12
• Prediction Horizons: 12
Table 6.2 shows the performance of all the explored models on the PeMS-BAY dataset. It shows the values of MAE, MAPE and RMSE at 3 prediction horizons: 15, 30 and 60 minutes, as well as the average time each epoch takes to train, in seconds. It is important to note that the models generate predictions for 12 horizons, but only these three are shown, since displaying them all would make the results very dense and hard to read. This principle will be followed throughout the tables of this chapter.
One thing that all models have in common results-wise is that the predictions get progressively
worse for bigger horizons, which is in line with what is expected since the further away in time
we are, the more difficult it is to make good predictions.
Comparing the results of the same temporal mechanism used with different spatial mechanisms, it is noticeable that GCN produces the worst results of the three, which is in line with the fact that this technique is less advanced and was later improved with Chebyshev polynomials to become ChebConv. Comparing ChebConv with GAT is not as clear-cut: the results show similar performances across almost all temporal mechanisms, with the exception of Cheb-GRUAtt, which significantly outperforms the other ChebConv-based models.
In terms of comparing the results of the same spatial mechanism used with different temporal mechanisms, the temporal mechanisms with temporal attention (GRUAtt and CNNAtt) significantly outperform the ones without attention, so it can be said that the attention helps to extract the most relevant time dependencies. In terms of choosing between GRUAtt and CNNAtt, GRUAtt tends to have better results, though not by a large margin, with the exception of GAT-CNNAtt, which is better than GAT-GRUAtt.
Now looking into other datasets, Table 6.3 shows the performance of all the explored models
in the METR-LA dataset.
Table 6.3: Comparison of the different models on the METR-LA dataset
Again, the predictions get progressively worse for bigger horizons, as in PeMS-BAY. The
performance of the models in METR-LA is worse than in PeMS-BAY, in general across all models.
The performance of the temporal and spatial mechanisms follows the tendencies seen in PeMS-
BAY.
Table 6.4 shows the performance of all the explored models in the VCI dataset.
Again, the performance of the different mechanisms follows the trend of the previous two
datasets.
Comparing the results across the three datasets, some conclusions can be extracted:
• The different models show the same relative performance against each other (i.e. if a model performs better than some other model in one of the datasets, it also performs better in the other datasets), which makes it possible to consider that they are capable of being applied to different use cases.
• The results are overall better in PeMS-BAY, followed by VCI and then METR-LA. This
could have to do with the ratios of missing value occurrence in each dataset, given that
PeMS-BAY has a very small amount of missing values, then VCI sits in the middle, and
METR-LA has the most.
• In terms of training time, the models have the same relative performance across all datasets,
but the same model in different datasets takes different times. This makes sense since the
datasets have different dimensions, both in sensor number and duration of the recorded
period. Models take the most time to run on PeMS-BAY, followed by VCI and then METR-
LA.
• Models using GRU as the temporal mechanism tend to be slower than the ones using CNN,
which goes in line with the theory behind the two architectures.
• Models with ChebConv as their spatial mechanism are slower than those with GAT or GCN.
PeMS-BAY is the largest dataset in number of sensors (it has 325) and has the same recorded period as VCI (6 months), but VCI has only 26 sensors. METR-LA has 207 sensors and 4 months of recorded period. This indicates that the recorded period length has more impact on training time than the number of sensors; otherwise, the models would run faster on VCI than on METR-LA.
From now on, as they showed the best performance on all datasets and for all prediction horizons, the Cheb-GRUAtt and GAT-CNNAtt models will be the ones on which the following experiments are performed.
For this experiment, the settings, apart from the optimizer, were the following:
• Historical Features: 12
• Prediction Horizons: 12
Tables 6.5, 6.6, and 6.7 contain the results of the different optimizers on the PeMS-BAY, METR-LA and VCI datasets, respectively, for the Cheb-GRUAtt model. For this model, on METR-LA, the optimizer with the best results is Adam across all metrics and prediction horizons. On PeMS-BAY, this holds only for the 30 and 60-minute horizons; for 15 minutes, Adagrad gets better results. In the case of VCI, the results are not as clear: for the 15-minute horizon, Adagrad is better, but for the 30 and 60-minute horizons the results are very similar across all optimizers.
In terms of the time needed for training, the difference between the optimizers is minimal. Given that, for two of the datasets (METR-LA and PeMS-BAY), Adam is better, and on VCI the difference isn't very significant, the following experiments with this model will be performed using the Adam optimizer.
Table 6.5: Comparison of the different optimizers used with the Cheb-GRUAtt on the PeMS-BAY
dataset
Table 6.6: Comparison of the different optimizers used with the Cheb-GRUAtt on the METR-LA dataset
Table 6.7: Comparison of the different optimizers used with the Cheb-GRUAtt on the VCI dataset
Now, for the GAT-CNNAtt model, Tables 6.8, 6.9, and 6.10 contain the results of the different optimizers on the PeMS-BAY, METR-LA and VCI datasets, respectively. Here, the predominance of Adam observed with the previous model does not hold: in METR-LA and PeMS-BAY, Adagrad is better across all metrics and horizons. This is not the case for VCI, where Adam performs better.
Table 6.8: Comparison of the different optimizers used with the GAT-CNNAtt on the PeMS-BAY dataset
Table 6.9: Comparison of the different optimizers used with the GAT-CNNAtt on the METR-LA dataset
Table 6.10: Comparison of the different optimizers used with the GAT-CNNAtt on the VCI dataset
Comparing the results of the two models, it is not possible to conclude that one particular optimizer is better than the others for this problem, given that the two models work better with different optimizers. One conclusion that can be extracted is that the optimizers don't have a significant influence on the training time. It is also possible to observe that the results on VCI tend to behave differently from the other datasets. This dataset is a bit more distinct from the others, having a smaller network of sensors that are spaced further apart, which could explain the different behaviour.
For this experiment, the settings, apart from the learning rate, were the following:
• Optimizer: Adam (for the Cheb-GRUAtt model) / Adagrad (for the GAT-CNNAtt model)
• Historical Features: 12
• Prediction Horizons: 12
The goal here was to test learning rates smaller and bigger than 0.01 (the value used in the previous tests). The expected behaviour when experimenting with learning rates is that, starting with a bigger learning rate (which moves too fast and skips over the optimal weights), the results get progressively better as the rate decreases, until a 'sweet spot' is reached where the optimal weights can be found within the given epochs; after that, the results get worse again, since more epochs would be needed to converge. Following this principle, the procedure was to try progressively lower learning rates until the results started deteriorating.
Table 6.11 has the results of training the Cheb-GRUAtt model with different learning rates over 100 epochs on PeMS-BAY. It is possible to see that the learning rates 0.001 and 0.0005 have the best results, with 0.001 beating 0.0005 on the 15-minute horizon, while 0.0005 is better for the bigger horizons.
Table 6.11: Results of different learning rates with 100 epochs for Cheb-GRUAtt on PeMS-BAY
Learning 15 min 30 min 60 min Avg. time
Rate MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
0.020 1.842 3.963 3.403 2.502 5.643 4.630 3.268 7.915 5.972 131.674
0.015 1.825 3.964 3.373 2.392 5.490 4.545 3.195 7.698 5.857 130.322
0.010 1.869 3.920 3.341 2.406 5.428 4.495 3.077 7.463 5.736 131.120
0.005 1.789 3.793 3.295 2.345 5.369 4.467 3.069 7.420 5.687 132.002
0.0025 1.709 3.695 3.248 2.274 5.286 4.420 3.004 7.328 5.638 131.107
0.001 1.673 3.659 3.236 2.275 5.256 4.419 3.029 7.319 5.660 130.992
0.0005 1.678 3.673 3.241 2.266 5.231 4.425 2.979 7.214 5.647 130.717
0.00025 1.687 3.667 3.247 2.270 5.242 4.434 2.985 7.285 5.669 131.203
For METR-LA, the results are in Table 6.12 and don't show such a clear picture, with 0.005 being better overall.
Table 6.13 contains the results on the VCI dataset, where it is possible to see that the best learning rate across all horizons and metrics was the smallest one tested, 0.00025, with the results getting better and better the lower the learning rate is. This could indicate that an even smaller
Table 6.12: Results of different learning rates with 100 epochs for Cheb-GRUAtt on METR-LA
Learning 15 min 30 min 60 min Avg. time
Rate MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
0.020 2.908 7.907 5.480 3.548 10.456 6.721 4.483 14.190 8.241 83.135
0.015 2.916 7.911 6.732 3.547 10.396 6.732 4.467 14.112 8.248 82.854
0.010 2.995 8.201 5.621 3.622 10.748 6.876 4.586 14.770 8.456 82.450
0.005 3.139 7.907 5.493 3.532 10.350 6.728 4.408 14.028 8.261 83.123
0.0025 2.900 7.922 5.001 3.533 10.438 6.750 4.477 14.206 8.258 83.637
0.001 2.902 7.944 5.496 3.544 10.434 6.740 4.493 14.136 8.237 82.500
0.0005 2.877 7.832 5.500 3.535 10.349 6.748 4.523 14.159 8.266 83.439
0.00025 2.895 7.825 5.508 3.593 10.457 6.781 4.646 14.504 8.334 83.127
value could still provide better results, but since in the other datasets this small learning rate got worse results than some bigger ones, the tests stopped at 0.00025.
By observing the results on the three datasets, it is possible to say that they follow the same tendencies but not exactly the same results, and it is not possible to pinpoint one learning rate that is systematically better in all three datasets. Since the goal of this work is to try to generalize the process to fit different datasets, it was decided that 0.0005, which was the best performing on PeMS-BAY but also worked well with the other two, would be used in all the remaining tests.
Table 6.13: Results of different learning rates with 100 epochs for Cheb-GRUAtt on VCI
Learning 15 min 30 min 60 min Avg. time
Rate MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
0.020 4.094 6.968 6.842 4.687 8.773 8.091 5.572 11.215 9.696 120.065
0.015 4.107 7.174 6.853 4.686 8.870 8.082 5.534 11.062 9.624 119.645
0.010 4.017 6.750 6.677 4.601 8.453 7.892 5.451 10.668 9.450 122.710
0.005 3.893 6.560 6.574 4.471 8.227 7.754 5.309 10.439 9.223 120.602
0.0025 3.845 6.399 6.493 4.452 8.191 7.703 5.274 10.432 9.167 120.749
0.001 3.835 6.554 6.493 4.425 8.237 7.697 5.229 10.353 9.148 121.689
0.0005 3.800 6.452 6.458 4.397 8.180 7.658 5.210 10.399 9.151 120.941
0.00025 3.791 6.394 6.445 4.375 8.076 7.621 5.181 10.285 9.077 118.870
Now concerning the other model, GAT-CNNAtt, the results on the PeMS-BAY, METR-LA and VCI datasets can be seen, respectively, in Tables 6.14, 6.15 and 6.16. For PeMS-BAY, the best learning rate of the ones tested is 0.005, for METR-LA it is 0.015, and for VCI the results are not as clear between 0.020 and 0.015. As happened with the other model, different datasets have better results with different learning rates, but 0.015 was the one chosen to continue the next tests with this model.
In terms of the effect the learning rate has on the training time, it does not have a noticeable influence.
For this experiment, the settings, apart from the time features, were the following:
Table 6.14: Results of different learning rates and with 100 epochs for GAT-CNNAtt on PeMS-
BAY
Learning 15 min 30 min 60 min Avg. time
Rate MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
0.020 2.642 6.364 5.071 2.741 6.654 5.247 3.094 7.173 5.620 121.193
0.015 2.751 6.567 5.211 2.849 6.766 5.359 3.117 7.393 5.754 121.505
0.010 2.715 6.293 5.085 2.834 6.531 5.246 3.1525 7.276 5.735 120.810
0.005 2.602 6.172 4.933 2.727 6.447 5.123 3.009 7.170 5.594 121.225
0.0025 2.921 6.803 5.256 3.020 7.041 5.401 3.327 7.797 5.841 121.505
Table 6.15: Results of different learning rates and with 100 epochs for GAT-CNNAtt on METR-
LA
Learning 15 min 30 min 60 min Avg. time
Rate MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
0.020 4.150 13.000 7.582 4.337 13.789 7.950 4.762 15.578 8.691 54.899
0.015 3.683 10.872 6.687 3.968 12.068 7.301 4.530 14.435 8.344 55.184
0.010 3.840 11.725 6.957 4.606 12.764 7.496 4.670 14.983 8.477 55.490
0.005 4.318 13.233 7.733 4.572 14.380 8.281 5.063 16.213 9.130 55.062
0.0025 4.541 14.630 8.012 4.781 15.567 8.421 5.193 17.069 9.088 54.851
Table 6.16: Results of different learning rates with 100 epochs for GAT-CNNAtt on VCI
Learning 15 min 30 min 60 min Avg. time
Rate MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
0.020 4.444 7.572 7.177 4.848 8.696 7.949 5.426 10.512 10.482 66.538
0.015 4.444 7.802 7.292 4.800 8.813 8.006 5.437 10.461 9.176 66.919
0.010 4.610 8.346 7.750 4.947 9.141 8.269 5.552 10.586 9.201 64.365
0.005 5.075 9.597 8.635 5.435 10.416 9.146 5.965 11.738 9.986 66.625
0.0025 5.211 10.374 9.103 5.536 11.078 9.585 6.013 12.124 10.303 67.116
• Optimizer: Adam (for the Cheb-GRUAtt model) / Adagrad (for the GAT-CNNAtt model)
• Learning rate: 0.0005 (for the Cheb-GRUAtt model) / 0.015 (for the GAT-CNNAtt model)
• Historical Features: 12
• Prediction Horizons: 12
To make the tables more compact, abbreviations were used to represent the different combinations of time features, as presented in Table 6.17.
The goal was to see if adding additional time features would benefit the results. In theory, this could help the model find patterns in the temporal aspect of the data, i.e. between the same days of the week, between weekdays and weekends, and also between different times of the day. It is important to note that the problem addressed is short-term prediction, in which some of these patterns might not have as much influence, so the results should help to answer these questions.
Table 6.17: Abbreviations for time features to use in the experiments’ tables
Tables 6.18, 6.19 and 6.20 show the results of adding different combinations of time features
to the input data of the Cheb-GRUAtt model in PeMS-BAY, METR-LA and VCI datasets, respec-
tively.
Table 6.18: Results of Cheb-GRUAtt with the use of different combinations of time features in
PeMS-BAY
Time Features 15 min 30 min 60 min Avg. time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
None 1.708 3.696 3.407 2.419 5.640 4.866 3.456 8.687 6.615 131.337
Day 1.678 3.673 3.241 2.266 5.231 4.425 2.979 7.214 5.647 130.717
Day + Week 1.688 3.705 3.230 2.233 5.207 4.375 2.909 7.109 5.533 132.326
Day + Weekend 1.692 3.651 3.234 2.255 5.190 4.389 2.931 7.086 5.568 132.010
Day + Week + Hour 1.696 3.724 3.240 2.255 5.252 4.400 2.936 7.159 5.567 133.132
All 1.692 3.679 3.233 2.667 5.230 4.388 2.946 7.119 5.558 130.956
Table 6.19: Results of Cheb-GRUAtt with the use of different combinations of time features in
METR-LA
Time Features 15 min 30 min 60 min Avg. time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
None 2.850 7.720 5.565 3.554 10.477 6.924 4.695 14.867 8.666 81.829
Day 2.877 7.832 5.500 3.535 10.349 6.748 4.523 14.159 8.266 83.439
Day + Week 2.960 7.980 5.466 3.651 10.519 6.692 4.657 14.239 8.173 83.218
Day + Weekend 2.928 7.944 5.477 3.600 10.400 6.692 4.592 14.029 8.149 83.376
Day + Week + Hour 2.977 7.953 5.475 3.678 10.478 6.672 4.698 14.205 8.163 83.556
All 2.976 8.084 5.477 3.646 10.519 6.679 4.673 14.174 8.151 83.015
Table 6.20: Results of Cheb-GRUAtt with the use of different combinations of time features in
VCI
Time Features 15 min 30 min 60 min Avg. time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
None 3.785 6.432 6.594 4.451 8.378 7.928 5.382 10.963 9.595 123.141
Day 3.800 6.452 6.458 4.397 8.180 7.658 5.210 10.399 9.151 120.941
Day + Week 3.857 6.540 6.525 4.468 8.307 7.771 5.268 10.510 9.302 119.704
Day + Weekend 3.931 6.584 6.598 4.520 8.266 7.777 5.316 10.409 9.230 119.653
Day + Week + Hour 3.836 6.453 6.487 4.464 8.204 7.724 5.324 10.438 9.259 120.032
All 3.963 6.600 6.568 4.571 8.238 7.780 5.397 10.497 9.329 120.362
In PeMS-BAY, it is possible to see that having no time features produces the worst results, but in terms of the best results, it is difficult to see which combination is best. 'Time in Day + Day in Week' is better in most metrics, but 'Time in Day' and 'Time in Day + Is Weekend' perform better in others. In METR-LA, having no time features performs better for the smallest prediction horizon, but then it starts getting worse than the alternatives. 'Time in Day' is better in some cases, but in others it is 'Time in Day + Is Weekend'. In the case of VCI, having no time features works better for the smallest prediction horizon, but for the two other horizons, 'Time in Day' is better.
It is worth noting that the differences between the results are very small, sometimes only in the second or third decimal place, giving them little significance.
The choice here was to continue using ’Time in Day’ as the only time feature with this model.
Tables 6.21, 6.22 and 6.23 show the results of adding different combinations of time features
to the input data of GAT-CNNAtt in PeMS-BAY, METR-LA and VCI datasets, respectively.
Table 6.21: Results of GAT-CNNAtt with the use of different combinations of time features in
PeMS-BAY
Time Features 15 min 30 min 60 min Avg. time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
None 2.769 6.303 5.281 2.901 6.581 5.497 3.281 7.463 6.035 120.005
Day 2.751 6.567 5.211 2.849 6.766 5.359 3.117 7.393 5.754 121.505
Day + Week 2.687 6.463 5.185 2.805 6.813 5.390 3.089 7.645 5.893 123.764
Day + Weekend 2.707 6.377 5.119 2.812 6.642 5.277 3.081 7.326 5.694 123.445
Day + Week + Hour 2.746 6.561 5.208 2.878 6.881 5.426 3.186 7.609 5.883 124.842
All 2.755 6.359 5.226 2.852 6.864 5.472 3.108 7.615 5.818 126.565
Table 6.22: Results of GAT-CNNAtt with the use of different combinations of time features in
METR-LA
Time Features 15 min 30 min 60 min Avg. time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
None 4.979 16.239 9.360 5.387 18.062 9.955 6.021 20.916 10.824 53.081
Day 3.683 10.872 6.687 3.968 12.068 7.301 4.530 14.435 8.344 55.184
Day + Week 5.357 15.962 8.733 5.618 16.997 9.135 6.081 18.905 9.808 55.706
Day + Weekend 4.541 13.532 7.989 4.738 14.250 8.354 5.267 15.711 8.985 55.367
Day + Week + Hour 4.594 14.054 7.870 4.982 15.323 8.416 5.644 17.504 8.416 56.513
All 4.324 12.547 7.497 4.603 13.775 8.072 5.137 16.038 9.035 57.432
Table 6.23: Results of GAT-CNNAtt with the use of different combinations of time features in
VCI
Time Features 15 min 30 min 60 min Avg. time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE p/ Epoch (s)
None 4.220 7.670 7.174 4.625 8.873 8.057 5.351 10.776 9.421 70.147
Day 4.444 7.802 7.292 4.800 8.813 8.006 5.437 10.461 9.176 66.919
Day + Week 4.855 8.938 8.068 5.261 9.085 8.867 5.998 11.137 10.088 64.421
Day + Weekend 4.463 8.061 7.553 4.844 8.988 8.212 5.526 10.606 9.458 67.075
Day + Week + Hour 5.650 10.523 9.301 5.896 11.072 9.776 6.362 11.974 10.478 63.894
All 5.161 9.197 9.055 5.389 9.867 9.509 5.744 10.936 10.390 65.300
The influence the time features have differs across the three datasets. In the case of PeMS-BAY, the results are all very similar, but for the bigger horizons, the 'Time in Day + Day in Week' combination produces better results. In the case of METR-LA, the results show more difference between the combinations, and using only 'Time in Day' produces the best results. For VCI, the difference between the combinations is less noticeable than in METR-LA but more than in PeMS-BAY. Using no time features works best for the smallest prediction horizon, but for the bigger ones, 'Time in Day' produces better results in terms of MAPE and RMSE.
In terms of training time, the addition or removal of time features does not produce a significant
impact.
For this experiment, the settings, apart from the historical features and prediction horizons, were the following:
• Optimizer: Adam (for the Cheb-GRUAtt model) / Adagrad (for the GAT-CNNAtt model)
• Learning rate: 0.0005 (for the Cheb-GRUAtt model) / 0.015 (for the GAT-CNNAtt model)
The goal of this experiment was to see the impact that changing the number of historical features given as input would have on the results for each prediction horizon. Moreover, it was also to see how the models perform on longer predictions, i.e. predicting 90 and 120 minutes ahead. As explained in Section 5.3.4, having, for example, 'Historical Features' be 6 and 'Prediction Horizon' be 3 means that 6 × 5 = 30 minutes before the given timestamp were used as historical features to predict 3 × 5 = 15 minutes into the future from the given timestamp. Tables 6.24 and 6.25 contain the results of different combinations of historical features and prediction horizons for Cheb-GRUAtt and GAT-CNNAtt, respectively. The reason this study was only made on one dataset is that training the models, especially with more historical features, is very slow and requires a lot of computational time to extract the results to fill the tables. METR-LA is the dataset on which the models train fastest, so it was chosen as the only one where this would be tried.
Some conclusions can be extracted from observing the tables:
• Comparing the results on the same prediction horizon, increasing the number of historical features makes the results worse. For predictions up to 60 minutes (12 prediction steps), using 12, 18 or 24 historical features gives progressively worse results, and the same applies to the other prediction horizons.
• Comparing the use of the same number of historical features for different numbers of prediction horizons, the results, in general, get better the more horizons are predicted.
Table 6.24: Results of Cheb-GRUAtt with different numbers of historical features and output
predictions on METR-LA dataset
Hist. Feat. Pred. 15 min 30 min 60 min 90 min 120 min Time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE (s)
3 3 2.903 7.821 5.491 - - - - - - - - - - - - 21.041
6 3 3.573 10.620 6.796 - - - - - - - - - - - - 21.011
6 6 2.893 7.795 5.480 3.555 10.490 6.746 - - - - - - - - - 41.900
12 3 4.529 14.278 8.257 - - - - - - - - - - - - 21.192
12 6 4.126 12.670 7.592 4.563 14.320 8.227 - - - - - - - - - 42.133
12 12 2.902 7.944 5.496 3.544 10.434 6.734 4.493 14.136 8.237 - - - - - - 83.439
18 3 5.287 17.046 9.192 - - - - - - - - - - - - 21.341
18 6 4.900 15.667 8.731 5.225 16.853 9.157 - - - - - - - - - 42.169
18 12 4.090 12.578 7.593 4.506 14.178 8.209 5.205 16.911 9.164 - - - - - - 83.202
18 18 2.935 7.951 5.583 3.573 10.523 6.860 4.552 14.901 8.581 5.342 17.391 9.304 - - - 123.031
24 3 5.934 19.196 9.917 - - - - - - - - - - - - 21.774
24 6 5.577 18.028 9.552 5.813 18.938 9.855 - - - - - - - - - 42.368
24 12 4.979 15.673 8.769 5.292 16.863 9.173 5.859 18.991 9.847 - - - - - - 83.283
24 18 4.143 12.765 7.634 4.567 14.345 8.255 5.267 17.357 9.332 5.817 19.049 9.894 - - - 124.000
24 24 2.951 8.058 5.561 3.608 10.523 6.797 4.725 14.687 8.474 5.426 17.367 9.395 5.908 19.206 9.976 174.071
Table 6.25: Results of GAT-CNNAtt with different numbers of historical features and output pre-
dictions on METR-LA dataset
Hist. Feat. Pred. 15 min 30 min 60 min 90 min 120 min Time
MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE MAE MAPE RMSE (s)
3 3 3.497 10.051 6.683 - - - - - - - - - - - - 26.187
6 3 3.810 11.996 7.163 - - - - - - - - - - - - 35.618
6 6 3.903 11.862 7.055 4.178 12.980 7.672 - - - - - - - - - 34.248
12 3 5.327 18.657 9.43 - - - - - - - - - - - - 56.874
12 6 4.723 14.691 8.652 4.951 15.500 9.022 - - - - - - - - - 55.239
12 12 3.683 10.872 6.687 3.968 12.068 7.301 4.530 14.435 8.344 - - - - - - 55.356
18 3 4.753 14.136 8.762 - - - - - - - - - - - - 77.584
18 6 4.445 13.903 8.006 4.635 14.629 8.424 - - - - - - - - - 75.963
18 12 4.883 15.808 8.858 4.983 16.190 9.067 5.318 17.458 9.612 - - - - - - 76.270
18 18 5.646 18.698 9.672 5.737 19.090 9.872 6.077 20.314 10.393 6.357 21.368 10.749 - - - 76.004
24 3 6.682 24.200 11.8739 - - - - - - - - - - - - 98.448
24 6 5.369 17.501 9.819 5.504 18.026 10.058 - - - - - - - - - 94.667
24 12 5.921 20.211 10.755 6.105 20.904 11.080 6.567 22.471 11.797 - - - - - - 96.765
24 18 5.523 17.527 9.624 5.621 18.102 9.779 5.863 19.232 10.100 6.140 20.369 6.140 - - - 96.172
24 24 6.197 20.945 10.594 6.349 21.673 10.840 6.605 22.949 11.258 6.803 23.976 11.605 6.950 24.777 11.891 96.412
• As seen in the other experiments, when observing a single row (results across different horizons for the same configuration), the results get progressively worse as the horizons get bigger.
• The results obtained with Cheb-GRUAtt are better than those of GAT-CNNAtt, across all combinations and all metrics.
• Looking at the bigger horizons of 90 and 120 minutes, it is possible to see that the results achieved by the models degrade considerably, which shows that their applicability is focused on short-term prediction, as also pointed out by the recent literature. It is still possible to see, however, that the Cheb-GRUAtt model has better results and shows less degradation than the GAT-CNNAtt model.
For this experiment, the models were trained with different lengths of data while keeping the test data the same. Given training time restrictions, only the Cheb-GRUAtt model was experimented on.
The settings for this experiment were the following:
• Optimizer: Adam
• Historical Features: 12
• Prediction Horizons: 12
Table 6.26: Comparison of the results with different training dataset lengths
Table 6.26 shows the results for the different dataset lengths used for training. The results show that increasing the dataset length to 12 months produces better results. This may stem from the fact that training with a full year allows the model to capture the full extent of the seasonal dependencies. When extending the training data further to 18 months, the results go back to being similar to those obtained with 6 months. Training with 24 months is slightly better than with 18 months but worse than with 12 months. These results indicate that training with one year of data can help, but further increasing the training length is not favourable. The training times grow linearly with the dataset length, i.e. training with 12 months takes roughly twice as long as training with 6 months, and so on. The lack of improvement with longer training sets could stem from the fact that the older data they include were collected during a period more affected by COVID-19, which disrupted typical transportation patterns.
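For concreteness, the following is a minimal sketch of how the training set can be restricted to the most recent N months, assuming the readings live in a pandas DataFrame indexed by timestamp (a hypothetical layout; the thesis pipeline may organize the data differently).

```python
import pandas as pd

def last_n_months(readings: pd.DataFrame, n_months: int) -> pd.DataFrame:
    """Keep only the most recent `n_months` of sensor readings.

    `readings` is assumed to have a DatetimeIndex and one column per sensor.
    """
    cutoff = readings.index.max() - pd.DateOffset(months=n_months)
    return readings.loc[readings.index > cutoff]

# e.g. train_12m = last_n_months(readings, 12)
```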
The settings for this experiment were the following:
• Optimizer: Adam
• Historical Features: 12
• Prediction Horizons: 12
Table 6.27 shows the results of using different missing data imputation techniques on the
PeMS-BAY dataset.
Table 6.27: Comparison of the results with the different missing data imputation techniques in the
PeMS-BAY dataset
Techniques | 15 min: MAE MAPE RMSE | 30 min: MAE MAPE RMSE | 60 min: MAE MAPE RMSE | Avg. time per Epoch (s)
No imputation 1.678 3.673 3.241 2.266 5.231 4.425 2.979 7.214 5.647 130.717
Mean 1.678 3.680 3.242 2.273 5.253 4.426 2.998 7.248 5.647 131.921
LOCF+NOCB 1.678 3.680 3.243 2.265 5.262 4.426 2.997 7.249 5.647 131.257
Interpolation 1.678 3.679 3.242 2.273 5.253 4.426 2.997 7.247 5.647 131.678
MICE 1.688 3.705 3.260 2.288 5.289 4.459 3.042 7.404 5.728 132.002
MICE+interpolation 1.678 3.679 3.242 2.273 5.253 4.426 2.997 7.247 5.647 131.976
The results show very little difference between the techniques, and MICE even worsens them slightly. Since this particular dataset has very little missing data, the impact of the missing values on the results is minimal, so using the missing data imputation techniques did not make a noticeable difference. This is why no results were highlighted in the table.
Table 6.28 shows the results of using different missing data imputation techniques on the
METR-LA dataset.
Table 6.28: Comparison of the results with the different missing data imputation techniques in the
METR-LA dataset
Techniques | 15 min: MAE MAPE RMSE | 30 min: MAE MAPE RMSE | 60 min: MAE MAPE RMSE | Avg. time per Epoch (s)
No imputation 2.877 7.832 5.500 3.535 10.349 6.748 4.523 14.159 8.266 83.439
Mean 2.925 7.486 5.503 3.547 9.538 6.318 4.446 12.481 8.006 83.196
LOCF+NOCB 2.840 7.799 5.486 3.501 10.312 6.700 4.491 14.122 8.221 83.345
Interpolation 2.862 7.665 5.417 3.504 10.004 6.626 4.464 13.500 8.115 83.542
MICE 2.861 7.598 5.531 3.583 9.674 6.679 4.490 12.672 8.054 83.630
MICE+interpolation 2.862 7.665 5.417 3.504 10.004 6.626 4.464 13.500 8.114 83.965
In this dataset, the results show somewhat more variation. Using the mean, despite its simplicity and despite not being the preferred technique for time series imputation, showed the best results, although it was closely followed most of the time by MICE.
In the VCI dataset, the different techniques show more influence on the results than in PeMS-BAY but less than in METR-LA. This can stem from the fact that PeMS-BAY has the least amount of missing data, followed by VCI and then METR-LA; the less missing data there is, the less impactful the imputation used to fill it becomes. These results help explain why most previous works on GNNs for traffic forecasting pay little attention to missing data: the models can deal well with it, especially when its occurrence is small. However, the results also show that the impact grows as the amount of missing data increases, meaning that the models are not immune to missing data and can benefit from the use of these techniques. Furthermore, this suggests that for datasets with even more missing data than METR-LA, the use of an imputation technique can have an even more significant impact and should not be overlooked. One detail worth mentioning is that the imputation techniques that performed better as pre-processing for the GNN models do not match the results shown in Table 5.7, where the quality of each imputation is evaluated on its own. That is, the techniques that were better at generating values closer to the real ones were not the ones that helped the models achieve the best results. This is seen especially with mean imputation, which did not perform well as imputation on its own but, when used as pre-processing for the GNN models, helped them achieve better results.
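To make the compared techniques concrete, here is a minimal sketch of the imputation families evaluated above, implemented with pandas and scikit-learn over a hypothetical timestamps-by-sensors DataFrame with NaNs marking missing readings; the thesis's actual pre-processing code may differ in details, and the combined MICE+interpolation variant is omitted since its exact chaining is not shown here.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute(readings: pd.DataFrame, technique: str) -> pd.DataFrame:
    """Fill NaNs in a DatetimeIndex-by-sensor DataFrame of readings."""
    if technique == "mean":
        # Replace each gap with the sensor's overall mean.
        return readings.fillna(readings.mean())
    if technique == "locf+nocb":
        # Last observation carried forward, then next observation carried backward.
        return readings.ffill().bfill()
    if technique == "interpolation":
        # Linear interpolation between the surrounding observed values.
        return readings.interpolate(method="linear", limit_direction="both")
    if technique == "mice":
        # MICE-style multivariate imputation across sensors.
        filled = IterativeImputer(max_iter=10, random_state=0).fit_transform(readings)
        return pd.DataFrame(filled, index=readings.index, columns=readings.columns)
    raise ValueError(f"unknown technique: {technique}")
```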
The settings for this experiment were the following:
• Optimizer: Adam
To make the tables more compact, abbreviations were used to represent the different combinations of weather features, presented in Table 6.30.
Table 6.30: Abbreviations for weather features to use in the experiments’ tables
(p = precip_rate, v = visibility, w = wind_spd, t = temperature)
Tables 6.31, 6.32 and 6.33 show the results obtained when adding the weather features to the
input of the Cheb-GRUAtt model in PeMS-BAY, METR-LA and VCI, respectively.
Table 6.31: Results with the use of different combinations of weather features in PeMS-BAY
Weather Features | 15 min: MAE MAPE RMSE | 30 min: MAE MAPE RMSE | 60 min: MAE MAPE RMSE | Avg. time per Epoch (s)
None 1.678 3.673 3.241 2.266 5.231 4.425 2.979 7.214 5.647 130.717
p+v 1.691 3.746 3.256 2.277 5.320 4.438 2.986 7.314 5.676 132.992
p+v+w 1.781 3.984 3.530 2.412 5.767 4.885 3.396 8.647 6.548 132.659
p+w 1.696 3.726 3.247 2.285 5.294 4.431 2.989 7.270 5.673 132.973
p+w+v+t 1.736 3.899 3.304 2.343 5.482 4.499 3.093 7.501 5.761 132.638
All 1.749 3.824 3.280 2.349 5.389 4.472 3.090 7.393 5.744 134.078
Table 6.32: Results with the use of different combinations of weather features in METR-LA
Weather Features | 15 min: MAE MAPE RMSE | 30 min: MAE MAPE RMSE | 60 min: MAE MAPE RMSE | Avg. time per Epoch (s)
None 2.877 7.832 5.500 3.535 10.349 6.748 4.523 14.159 8.266 83.439
p+v 2.893 7.882 5.517 3.570 10.451 6.781 4.593 14.306 8.310 83.222
p+v+w 2.907 7.932 5.507 3.569 10.444 6.756 4.590 14.307 8.280 85.292
p+w 2.891 7.886 5.493 3.572 10.433 6.752 4.583 14.306 8.295 83.863
p+w+v+t 2.899 8.035 5.512 3.575 10.587 6.772 4.567 14.353 8.306 83.414
All 2.986 8.102 5.511 3.667 10.649 6.754 4.661 14.370 8.338 83.257
Table 6.33: Results with the use of different combinations of weather features in VCI
Weather Features | 15 min: MAE MAPE RMSE | 30 min: MAE MAPE RMSE | 60 min: MAE MAPE RMSE | Avg. time per Epoch (s)
None 3.800 6.452 6.458 4.397 8.180 7.658 5.210 10.399 9.151 120.941
p+v 3.845 6.539 6.495 4.458 8.249 7.722 5.273 10.348 9.179 123.201
p+v+w 3.860 6.540 6.532 4.470 8.286 7.756 5.293 10.544 9.252 118.959
p+w 3.866 6.585 6.540 4.489 8.362 7.815 5.365 10.614 9.371 123.039
p+w+v+t 3.926 6.741 6.603 4.558 8.554 7.873 5.431 10.841 9.436 119.066
All 3.992 6.802 6.635 4.658 8.745 7.906 5.431 10.872 9.546 120.363
The results observed in the three datasets are similar in the sense that adding the weather features does not improve the results and instead makes them slightly worse. The combinations whose performance is almost the same as using no weather features are ’precip_rate + visibility’ and ’precip_rate + wind_spd’. The lack of improvement could mean that, since the problem is short-term traffic prediction, weather has less influence than it would in longer-term predictions. In terms of training time, adding these features does not add a substantial amount of time, showing that despite the increase in input size, the models take roughly the same time to process the data.
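As an illustration of how such features can be attached to the input, the sketch below broadcasts a per-timestamp weather reading to every node of the graph and concatenates it to the traffic features; the tensor layout is a hypothetical one, not necessarily the exact shapes used by the thesis code.

```python
import numpy as np

def add_weather(x: np.ndarray, weather: np.ndarray) -> np.ndarray:
    """Concatenate weather features onto the traffic input tensor.

    x:       (samples, timesteps, nodes, traffic_features)
    weather: (samples, timesteps, weather_features), one reading per
             timestamp shared by every node.
    """
    n_nodes = x.shape[2]
    # Broadcast each timestamp's weather reading to all nodes ...
    w = np.repeat(weather[:, :, np.newaxis, :], n_nodes, axis=2)
    # ... and append it as extra input channels.
    return np.concatenate([x, w], axis=-1)
```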
6.5 Results Discussion
In PeMS-BAY, the results of the developed models fall short of expectations compared to the other approaches. Cheb-GRUAtt has worse results than the other GNN-based approaches in almost all cases, and even worse than VAR in some instances. Since the results on PeMS-BAY are much better overall than on the other datasets, the room for improvement is smaller, and the time-series techniques have results more on par with the GNN-based techniques. This may be due to the scarcity of missing data in this dataset or to the overall quality of its data. In datasets like this, where time-series techniques perform well, the question arises of whether deep-learning-based techniques (such as GNNs) are worth employing, given that the computational power and time required to train these models are substantial, especially when compared with the time-series algorithms employed here.
In METR-LA, however, Cheb-GRUAtt outperforms all time-series-based approaches in all metrics and horizons, and it also outperforms STGCN in some cases. DCRNN, however, still has the best results overall. In VCI, it is only possible to compare with the time-series-based approaches, and Cheb-GRUAtt is significantly better than those for all metrics and horizons.
• What combinations of spatial and temporal mechanisms work best for this problem?
It was found that the combination of Chebyshev Spectral Graph Convolution and Gated Recurrent Unit was the best model across all three datasets. The relative performance of the explored models was also consistent across datasets, i.e. if model X is better than model Y on dataset Z, it is also better on dataset W. A minimal sketch of this spatial-temporal combination is given after this list.
• Can the models extract helpful knowledge from added weather features and achieve better results?
No; their impact on the results was negative, arguing against the use of those features.
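As referenced in the first answer above, the following is a minimal sketch of the winning spatial-temporal combination: a Chebyshev spectral graph convolution applied per time step, followed by a GRU over the resulting sequence. It uses PyTorch Geometric's ChebConv, omits the attention block of the full Cheb-GRUAtt model, and should be read as an illustration of the idea rather than the thesis's exact architecture.

```python
import torch
from torch import nn
from torch_geometric.nn import ChebConv

class ChebGRU(nn.Module):
    """Chebyshev graph convolution per time step, then a GRU over time
    (simplified sketch; the attention mechanism is omitted)."""

    def __init__(self, in_dim: int, spatial_dim: int, hidden_dim: int, K: int = 3):
        super().__init__()
        self.cheb = ChebConv(in_dim, spatial_dim, K=K)
        self.gru = nn.GRU(spatial_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: (timesteps, nodes, in_dim) for a single sample
        spatial = torch.stack(
            [torch.relu(self.cheb(x_t, edge_index)) for x_t in x]
        )                               # (timesteps, nodes, spatial_dim)
        seq = spatial.permute(1, 0, 2)  # treat nodes as the batch dimension
        _, h = self.gru(seq)            # h: (1, nodes, hidden_dim)
        return self.out(h.squeeze(0))   # (nodes, 1): one-step prediction
```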
Something worth mentioning, visible throughout the chapter, is that time constraints limited the number of experiments carried out and the different combinations that could be explored. It is possible that some of the unexplored combinations would have led to better results than the ones presented, even though the experiments chosen were the most theoretically promising ones. This is a common struggle when studying machine learning algorithms, and despite efforts to minimize its impact by using multiple machines to run the experiments, it still posed an obstacle.
Chapter 7
Conclusions and Future Work
The motivation behind this research stems from the increasing demand for reliable and accurate
traffic predictions in today’s fast-paced world. With the increasing number of vehicles on the
road, traffic congestion is becoming a widespread issue, leading to higher levels of pollution and
decreased mobility. To address this problem, the field of traffic forecasting has been subject to
thorough research over the years, given the complexity of the spatial and temporal dependencies
of traffic data.
Approaches to traffic forecasting include various techniques that range from traditional statis-
tical methods to Deep Learning. However, most approaches have their limitations when it comes
to handling the complex temporal and spatial dependencies of graph-like traffic data. This is where
Graph Neural Networks come into the picture. GNNs have been shown to be effective in handling
complex relationships between data and have been successfully applied to various problems.
The objective of this dissertation consisted of applying GNNs to the problem of traffic forecast-
ing and evaluating their performance against other frequently used methods, considering missing
data handling and the incorporation of external factors as additional methods to try to improve
results.
An important contribution of this work was the standardization of the process of generating the input data for the GNN models, in order to easily allow training and testing on different datasets.
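In essence, this standardized process turns each dataset into fixed-shape supervised samples. The sketch below shows the core sliding-window step under assumed shapes (timesteps by nodes) and hypothetical names; the full pipeline also normalizes the data and attaches the adjacency matrix.

```python
import numpy as np

def make_windows(series: np.ndarray, hist: int = 12, pred: int = 12):
    """Slice a (timesteps, nodes) reading matrix into supervised samples.

    Returns X with shape (samples, hist, nodes) and Y with shape
    (samples, pred, nodes), the input/target format fed to the models.
    """
    xs, ys = [], []
    for t in range(hist, series.shape[0] - pred + 1):
        xs.append(series[t - hist:t])   # historical window
        ys.append(series[t:t + pred])   # prediction horizons
    return np.stack(xs), np.stack(ys)
```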
The empirical evaluation showed that the explored GNNs outperform several time-series-based models commonly used in the literature, confirming the advantage these models can provide. However, they did not outperform other GNN-based techniques from the literature, meaning that further tuning of the architectures and parameters is needed.
Another contribution of this dissertation is the study of the impact of missing data imputation techniques on prediction performance. It was shown that GNNs can benefit from these techniques, especially on the VCI and METR-LA datasets. The research also demonstrated that incorporating weather conditions does not have a positive impact on the results, contrary to the initial hypothesis.
Moving forward, the potential applications of GNNs in Intelligent Transportation Systems are
vast, and further research is needed to explore their full potential. This work opens several direc-
tions for future work, such as a more intensive study of hyperparameters and layer architectures to
try to achieve better results.
On top of that, a broader study of external data sources, such as social events near the road network, social media mentions of congestion, or accidents on the network, could be used to enrich the models.
References
Alam, I., Ahmed, M. F., Alam, M., Ulisses, J., Farid, D. M., Shatabda, S., & Rossetti, R. J. F.
(2017). Pattern mining from historical traffic big data. In 2017 ieee region 10 symposium
(tensymp) (p. 1-5). doi: 10.1109/TENCONSpring.2017.8070031
Alam, I., Farid, D. M., & Rossetti, R. J. F. (2019). The prediction of traffic flow with regres-
sion analysis. In A. Abraham, P. Dutta, J. K. Mandal, A. Bhattacharya, & S. Dutta (Eds.),
Emerging technologies in data mining and information security (pp. 661–671). Singapore:
Springer Singapore.
Alsolami, B., Mehmood, R., & Albeshri, A. (2020). Hybrid Statistical and Machine Learning
Methods for Road Traffic Prediction: A Review and Tutorial. In R. Mehmood, S. See,
I. Katib, & I. Chlamtac (Eds.), Smart Infrastructure and Applications: Foundations for
Smarter Cities and Societies (pp. 115–133). Springer International Publishing. doi: 10
.1007/978-3-030-13705-2_5
Ang, K. L.-M., Seng, J. K. P., Ngharamike, E., & Ijemaru, G. K. (2022). Emerging Technologies
for Smart Cities’ Transportation: Geo-Information, Data Analytics and Machine
Learning Approaches. ISPRS International Journal of Geo-Information, 11(2). doi: 10
.3390/ijgi11020085
Angarita-Zapata, J. S., Masegosa, A. D., & Triguero, I. (2019). A taxonomy of traffic forecasting
regression problems from a supervised learning perspective. IEEE Access, 7, 68185-68205.
doi: 10.1109/ACCESS.2019.2917228
Baldassarre, F., & Azizpour, H. (2019). Explainability techniques for graph convolutional net-
works. ArXiv, abs/1905.13686.
Barros, J., Araujo, M., & Rossetti, R. J. F. (2015, June). Short-term real-time traffic pre-
diction methods: A survey. In 2015 International Conference on Models and Tech-
nologies for Intelligent Transportation Systems (MT-ITS) (pp. 132–139). doi: 10.1109/
MTITS.2015.7223248
Bhanja, S., & Das, A. (2019, January). Impact of Data Normalization on Deep Neural Network
for Time Series Forecasting. arXiv. doi: 10.48550/arXiv.1812.05519
Bokaba, T., Doorsamy, W., & Paul, B. S. (2022, January). A Comparative Study of Ensemble
Models for Predicting Road Traffic Congestion. Applied Sciences, 12(3), 1337. doi: 10
.3390/app12031337
Bui, K.-H. N., Cho, J., & Yi, H. (2022, February). Spatial-temporal graph neural network for traffic
forecasting: An overview and open research issues. Applied Intelligence, 52(3), 2763–2774.
doi: 10.1007/s10489-021-02587-w
Cai, L., Zhang, Z., Yang, J., Yu, Y., Zhou, T., & Qin, J. (2019, December). A noise-immune
Kalman filter for short-term traffic flow forecasting. Physica A: Statistical Mechanics and
its Applications, 536, 122601. doi: 10.1016/[Link].2019.122601
Cao, S., Wu, L., Zhang, R., Li, J., & Wu, D. (2022, July). Capturing Local and Global
Spatial-Temporal Correlations of Spatial-Temporal Graph Data for Traffic Flow Predic-
tion. In 2022 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). doi:
10.1109/IJCNN55064.2022.9892616
Cao, W., Wang, D., Li, J., Zhou, H., Li, L., & Li, Y. (2018). BRITS: Bidirectional Recurrent Imputation for Time Series. In Advances in Neural Information Processing Systems (Vol. 31). Curran Associates, Inc. Retrieved 2023-01-31, from [Link]
Castro, P. S., Zhang, D., & Li, S. (2012). Urban Traffic Modelling and Prediction Using Large
Scale Taxi GPS Traces. In J. Kay, P. Lukowicz, H. Tokuda, P. Olivier, & A. Krüger (Eds.),
Pervasive Computing (pp. 57–72). Berlin, Heidelberg: Springer. doi: 10.1007/978-3-642
-31205-2_4
Che, Z., Purushotham, S., Cho, K., Sontag, D. A., & Liu, Y. (2016). Recurrent neural networks
for multivariate time series with missing values. CoRR, abs/1606.01865. Retrieved from
[Link]
Chen, C., Li, K., Teo, S. G., Zou, X., Wang, K., Wang, J., & Zeng, Z. (2019, July). Gated
Residual Recurrent Graph Neural Networks for Traffic Prediction. Proceedings of the AAAI
Conference on Artificial Intelligence, 33(01), 485–492. doi: 10.1609/aaai.v33i01.3301485
Chen, D., Yan, X., Liu, X., Li, S., Wang, L., & Tian, X. (2021). A multiscale-grid-based stacked bidirectional GRU neural network model for predicting traffic speeds of urban expressways. IEEE Access, 9, 1321-1337. doi: 10.1109/ACCESS.2020.3034551
Chen, X., Cai, X., Liang, J., & Liu, Q. (2018). Ensemble learning multiple LSSVR with improved harmony search algorithm for short-term traffic flow forecasting. IEEE Access, 6, 9347-9357. doi: 10.1109/ACCESS.2018.2805299
Chen, X., Chen, Y., & He, Z. (2018, March). Urban traffic speed dataset of Guangzhou, China. Zenodo. doi: 10.5281/zenodo.1205229
Cui, Z. (2023, January). Seattle Inductive Loop Detector Dataset V.1 (2015). Retrieved 2023-01-
30, from [Link]
Cui, Z., Henrickson, K. C., Ke, R., & Wang, Y. (2018). Traffic graph convolutional recurrent neural
network: A deep learning framework for network-scale traffic learning and forecasting.
IEEE Transactions on Intelligent Transportation Systems, 21, 4883-4894. doi: 10.48550/
arXiv.1802.07007
Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on
graphs with fast localized spectral filtering. CoRR. doi: 10.48550/arXiv.1606.09375
Ge, L., Li, H., Liu, J., & Zhou, A. (2019, June). Temporal Graph Convolutional Networks for
Traffic Speed Prediction Considering External Factors. In 2019 20th IEEE International
Conference on Mobile Data Management (MDM) (pp. 234–242). doi: 10.1109/MDM.2019
.00-52
Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017, July). Convolu-
tional Sequence to Sequence Learning. CoRR, 1243–1252. Retrieved 2023-06-19, from
[Link]
Guo, S., Lin, Y., Feng, N., Song, C., & Wan, H. (2019, July). Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 922–929. doi: 10.1609/aaai.v33i01.3301922
Hamilton, W. L. (2020). Graph Representation Learning. Cham: Springer International Publish-
ing. doi: 10.1007/978-3-031-01588-5
He, Z., Chow, C.-Y., & Zhang, J.-D. (2019, June). STCNN: A Spatio-Temporal Convolutional
Neural Network for Long-Term Traffic Prediction. In 2019 20th IEEE International Confer-
ence on Mobile Data Management (MDM) (pp. 226–233). doi: 10.1109/MDM.2019.00-53
Hu, J., Lin, X., & Wang, C. (2022, July). DSTGCN: Dynamic Spatial-Temporal Graph Convo-
lutional Network for Traffic Prediction. IEEE Sensors Journal, 22(13), 13116–13124. doi:
10.1109/JSEN.2022.3176016
Irawan, K., Yusuf, R., & Prihatmanto, A. S. (2020, December). A Survey on Traffic Flow Predic-
tion Methods. In 2020 6th International Conference on Interactive Digital Media (ICIDM)
(pp. 1–4). doi: 10.1109/ICIDM51048.2020.9339675
Ismiguzel, I. (2022, May). Imputing Missing Data with Simple and Advanced Techniques.
Retrieved 2023-06-10, from [Link]
-missing-data-with-simple-and-advanced-techniques-f5c7b157fb87
Jensen, M. (2019, August). Data Leakage and how to avoid it. Retrieved 2023-
06-12, from [Link]
-to-avoid-it/
Jiang, R., Yin, D., Wang, Z., Wang, Y., Deng, J., Liu, H., . . . Shibasaki, R. (2021). Dl-
traff: Survey and benchmark of deep learning models for urban traffic prediction. In Pro-
ceedings of the 30th acm international conference on information & knowledge manage-
ment (p. 4515–4525). New York, NY, USA: Association for Computing Machinery. doi:
10.1145/3459637.3482000
Jiang, W., & Luo, J. (2021). Big data for traffic estimation and prediction: A survey of data and
tools. CoRR, abs/2103.11824. doi: 10.48550/arXiv.2103.11824
Jiang, W., & Luo, J. (2022, November). Graph neural network for traffic forecasting: A survey.
Expert Systems with Applications, 207, 117921. doi: 10.1016/[Link].2022.117921
Karunasingha, D. S. K. (2022, March). Root mean square error or mean absolute error? Use their
ratio as well. Information Sciences, 585, 609–629. doi: 10.1016/[Link].2021.11.036
Kipf, T. N., & Welling, M. (2017, February). Semi-Supervised Classification with Graph Convo-
lutional Networks. arXiv. doi: 10.48550/arXiv.1609.02907
Kołodziej, J., Hopmann, C., Coppa, G., Grzonka, D., & Widłak, A. (2022). Intelligent Trans-
portation Systems – Models, Challenges, Security Aspects. In J. Kołodziej, M. Repetto,
& A. Duzha (Eds.), Cybersecurity of Digital Service Chains: Challenges, Methodolo-
gies, and Tools (pp. 56–82). Cham: Springer International Publishing. doi: 10.1007/
978-3-031-04036-8_3
Lau, S. (2017, August). Learning Rate Schedules and Adaptive Learning Rate Methods for
Deep Learning. Retrieved 2023-03-28, from [Link]
learning-rate-schedules-and-adaptive-learning-rate-methods-for
-deep-learning-2c8f433990d1
Laña, I., Del Ser, J., Velez, M., & Vlahogianni, E. (2018, July). Road Traffic Forecasting: Recent
Advances and New Challenges. IEEE Intelligent Transportation Systems Magazine, 10,
93–109. doi: 10.1109/MITS.2018.2806634
Lee, K., Eo, M., Jung, E., Yoon, Y., & Rhee, W. (2021). Short-term traffic prediction with deep
neural networks: A survey. IEEE Access, 9, 54739-54756. doi: 10.1109/ACCESS.2021
.3071174
Lee, K., & Rhee, W. (2022). DDP-GCN: Multi-graph convolutional network for spatiotemporal
traffic forecasting. Transportation Research Part C: Emerging Technologies, 134, 103466.
doi: [Link]
Lee, S., & Fambro, D. B. (1999). Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting. Transportation Research Record, 1678, 179–188.
Li, Y., Mensi, F., & Yu, R. (2023, June). Diffusion Convolutional Recurrent Neural Network:
Data-Driven Traffic Forecasting. Retrieved 2023-06-06, from [Link]
liyaguang/DCRNN
Li, Y., Yu, R., Shahabi, C., & Liu, Y. (2018). Diffusion convolutional recurrent neural network:
Data-driven traffic forecasting. In International conference on learning representations. doi:
10.48550/arXiv.1707.01926
Liu, J., Wu, N., Qiao, Y., & Li, Z. (2021, January). A scientometric review of research on traffic forecasting in transportation. IET Intelligent Transport Systems, 15. doi: 10.1049/itr2.12024
Liu, Z., & Zhou, J. (2020). Introduction to Graph Neural Networks. Cham: Springer International
Publishing. doi: 10.1007/978-3-031-01587-8
Lopes, J., Bento, J., Huang, E., Antoniou, C., & Ben-Akiva, M. (2010, September). Traffic and
mobility data collection for real-time applications. In 13th International IEEE Conference
on Intelligent Transportation Systems (pp. 216–223). doi: 10.1109/ITSC.2010.5625282
Lu, H., Huang, D., Song, Y., Jiang, D., Zhou, T., & Qin, J. (2020, September). ST-TrafficNet: A spatial-temporal deep learning network for traffic forecasting. Electronics. doi: 10.3390/electronics9091474
Lu, J., Li, B., Li, H., & Al-Barakani, A. (2021). Expansion of city scale, traffic modes, traffic
Shi, R., & Du, L. (2022, October). Multi-Section Traffic Flow Prediction Based on MLR-LSTM
Neural Network. Sensors (Basel, Switzerland), 22(19), 7517. doi: 10.3390/s22197517
Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., & Vandergheynst, P. (2013, May). The
Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Anal-
ysis to Networks and Other Irregular Domains. IEEE Signal Processing Magazine, 30(3),
83–98. doi: 10.1109/MSP.2012.2235192
Shuvo, M. A. R., Zubair, M., Purnota, A. T., Hossain, S., & Hossain, M. I. (2021, January).
Traffic Forecasting using Time-Series Analysis. In 2021 6th International Conference on
Inventive Computation Technologies (ICICT) (pp. 269–274). doi: 10.1109/ICICT50816
.2021.9358682
Smith, B. L., Williams, B. M., & Keith Oswald, R. (2002, August). Comparison of parametric
and nonparametric models for traffic flow forecasting. Transportation Research Part C:
Emerging Technologies, 10(4), 303–321. doi: 10.1016/S0968-090X(02)00009-8
State of California. (2023). PeMS Data Source | Caltrans. [Link]
traffic-operations/mpr/pems-source.
Sun, H., Liu, H. X., Xiao, H., He, R. R., & Ran, B. (2003). Use of local linear regression model
for short-term traffic forecasting. Transportation Research Record, 1836(1), 143-150. doi:
10.3141/1836-18
Tedjopurnomo, D. A., Bao, Z., Zheng, B., Choudhury, F. M., & Qin, A. K. (2022). A survey
on modern deep neural network for traffic prediction: Trends, methods and challenges.
IEEE Transactions on Knowledge and Data Engineering, 34(4), 1544-1561. doi: 10.1109/
TKDE.2020.3001195
United Nations, U. (2017, September). Rapid urbanisation: opportunities and challenges
to improve the well-being of societies. [Link]
rapid-urbanisation-opportunities-and-challenges-improve-well
-being-societies.
V, R., & S, G. V. (2022, July). Hybrid Time-Series Forecasting Models for Traffic Flow Prediction. Promet, 34(4), 537–549. doi: 10.7307/ptt.v34i4.3998
van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., . . . Kavukcuoglu,
K. (2016). Wavenet: A generative model for raw audio. ArXiv, abs/1609.03499.
Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio’, P., & Bengio, Y. (2017). Graph
attention networks (Vol. abs/1710.10903).
Vlahogianni, E., Golias, J., & Karlaftis, M. (2004, September). Short-term traffic forecasting: Overview of objectives and methods. Transport Reviews, 24, 533-557. doi: 10.1080/0144164042000195072
Vlahogianni, E. I., Karlaftis, M. G., & Golias, J. C. (2014, June). Short-term traffic forecast-
ing: Where we are and where we’re going. Transportation Research Part C: Emerging
Technologies, 43, 3–19. doi: 10.1016/[Link].2014.01.005
Wang, Y., Li, L., & Xu, X. (2017, October). A piecewise hybrid of ARIMA and SVMs for short-term traffic