A Framework for In-network Inference using P4

A Framework for In-network Inference using P4
Huu Nghia Nguyen Manh-Dung Nguyen Edgardo Montes de Oca

huu.nguyen@montimage.com manh.Nguyen@montimage.com edgardo.oca@montimage.com
Montimage Montimage Montimage
Paris, France Paris, France Paris, France
ABSTRACT devices to process and control network traffic. It allows network
Machine Learning (ML) has been widely used in network security administrators to centrally manage and program network behavior
monitoring. Although, its application to data intensive use cases via a software-based controller, such as managing the flow of data
and those requiring ultra-low latency remains challenging. This is traffic (data plane) and making decisions about how it should be
due to the large amounts of network data and the need of trans- forwarded based on network policies and configurations.
ferring data to a central location hosting analysis services. In this However, only the control plane has been programmable in SDN.
paper, we present a framework to perform in-network analysis by The data plane was not programmable in a flexible way, and allowed
offloading ML inference tasks from end servers to P4-capable pro- only the pre-configured policies and configurations. Applying ML
grammable network devices. This helps reduce transfer latency and, to data intensive networking use cases and those requiring ultra-
thus, allows faster attack detection and mitigation. It also improves low latency is challenging because the inference step is performed
privacy since the data is processed at the networking devices. The at a central location hosting analyses services in the control plane.
paper also presents an experimental use-case of the framework to This requires transferring data extracted from the data plane. The
classify network traffic, and to early detect and rapidly mitigate prediction results are then used to reprogram the data plane via
against IoT malicious traffic. the controller. This Round-Trip Time (RTT) causes delays in the
closed-loop detection and mitigation process. To improve this, there
KEYWORDS is a need to perform ML inference at the data plane.
Programming Protocol-independent Packet Processors (P4) [3]
In-network inference, programmable network, open-source, P4,
is a domain-specific language for network devices, specifying how
attack detection, IoT networks
data plane is to be processed by networking devices, such as,
ACM Reference Format: switches, routers, filters, etc. Thus, it opens new possibilities for
Huu Nghia Nguyen, Manh-Dung Nguyen, and Edgardo Montes de Oca. flexible and dynamic programmable data planes, such as, perform-
2024. A Framework for In-network Inference using P4. In The 19th Interna-
ing the ML inference. Offloading ML inference from the control
tional Conference on Availability, Reliability and Security (ARES 2024), July
plane side to the data plane side helps not only reducing RTT delay
30–August 02, 2024, Vienna, Austria. ACM, New York, NY, USA, 6 pages.
https://doi.org/10.1145/3664476.3670453 in the closed-loop but also avoiding privacy issues because data is
processed only by the the data plane devices.
1 INTRODUCTION This paper presents a complete open-source framework to per-
form ML-based inference at the data-plane using P4. Especially, we
Recently, ML approaches have been successfully applied in differ- make the following contributions:
ent domains and have shown significant breakthroughs. They have
• We extend our existing tool, Montimage AI Platform (AMIP)
been applied to different applications in networking, ranging from
[12], to deal with Decision Tree (DT) models. AMIP provides
traffic classification and anomaly detection to network configura-
users with easy access, through a friendly and intuitive user
tion and orchestration. A ML-based application usually consists of
interface, to prepare ML models and understand the mod-
2 main steps: model preparation and inference. The former consists
els using Explainable Artificial Intelligence (XAT). We also
of preparing data, training and optimising ML models. The latter
enhance it to give users the ability to select features to be ex-
deploys the obtained models to production to receive input data,
tracted from .pcap files, and to tune parameters for training
make predictions, and return results.
models.
Furthermore, Software Defined Networking (SDN) is pushing
• We propose a new transformation of a DT model to a single
an emerging approach to network management by decoupling
Match-Action (MA) table to reduce the number of MA entries
control and data planes. SDN enables greater automation, flexibility,
and table lookup.
and scalability in network management. It helps to reduce the
• We implement the framework1 and demonstrate its applica-
need of dedicated hardware devices by introducing programmable
tion via a network traffic classification use-case. The use-case
Permission to make digital or hard copies of all or part of this work for personal or is deployed in a Raspberry Pi to act as a smart IoT gateway
classroom use is granted without fee provided that copies are not made or distributed to be able to swiftly detect and react against malicious IoT
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the encrypted traffic.
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or This paper is structured as follows. Section 2 discusses the state
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org. of the art and the related work. Section 3 describes the proposed
ARES 2024, July 30–August 02, 2024, Vienna, Austria framework and its detailed components. An application user-case of
© 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
1 The framework is freely available at https://github.com/montimage/maip and
ACM ISBN 979-8-4007-1718-5/24/07
https://doi.org/10.1145/3664476.3670453 https://github.com/montimage/in-network-inference-using-p4
ARES 2024, July 30–August 02, 2024, Vienna, Austria Huu Nghia Nguyen, Manh-Dung Nguyen, and Edgardo Montes de Oca
the framework will be introduced in Section 4. Section 5 concludes Architecture (PISA), which generalizes the Reconfigurable Match-
the paper. Table (RMT) [4] model and provides essential line-rate packet pro-
cessing features. In the PISA architecture, packets go through a
2 RELATED WORK packet parser, which instantiates user-defined protocols. After the
We summarize in this section two main approaches related to our parser processes a packet, it follows a pipeline of control flows and
framework: XAT and offloading ML inference to programmable MA tables. Finally, packet headers are emitted by a deparser.
data planes. The former helps to better understand trained ML MA table is the core mechanism for processing packets [3]. It’s
models, thus easily optimise them. The latter helps to more rapidly basically a hash lookup table, in which an entry consists of a key to
detect anomalies and mitigate them. match against an input and a value is an action to be executed. MA
table is usually used to define a set of rules to check the packet’s
2.1 Explainable AI header fields against a set of predefined criteria. The matching
criteria can be exact, range, lpm and ternary to match exactly a
XAT [2] aims to make Artificial Intelligence (AI)/ML black-box
given value, a given range of values, a longest prefix and a ternary
models more transparent by explaining why decisions are made.
respectively. If the packet matches a rule, an Action is taken, such
Across various domains, AI/ML plays a crucial role, but trust and
as, modifying the packet’s headers, copying the packet, dropping
transparency are essential for its future applications. These models
the packet, forwarding it to a particular port, or any other defined
are usually complex and not easily interpretable due to their multi-
operations that can be applied to the packet within the switch. Users
ple layers and hyperparameters. This complexity hinders both users
can use Actions to implement different logic, such as, performing
and developers from understanding and improving their accuracy
Distributed Denial of Service (DDoS) attack detection and reaction
and performance. Therefore, incorporating explainability on top
at the edge [13], performing in-band network telemetry [11], im-
of these models is necessary to provide post-hoc explanations and
plementing a Time Sensitive Networking (TSN) mechanism [8], or
enhance interpretability. Notable post-hoc explainability methods
even performing in-network ML inference [1, 7, 19, 20] which will
include visual explanations, local explanations, explanations by
be subsequently detailed.
example, and feature relevance explanations.
In-network ML inference refers to the process of offloading ML
Local explanations seek to approximate explanations within less
inference to networking devices [21]. It provides line rate ML infer-
complex solution sub-spaces by considering only a subset of data.
ence on programmable network devices within the network. This
One popular technique is Local Interpretable Model-agnostic Expla-
is different from traditional ML services that train and deploy mod-
nations (LIME) [15], which interprets outputs of black-box models
els either on a server or an accelerator, e.g., GPU, using complex
across various fields.
frameworks such as, Sklearn2 , TensorFlow3 . Here, in-network ML
Feature relevance explanations involve computing relevance
first trains a model on a server at the control plane, then translates
scores for model features to quantify their contribution or sensitiv-
the model to a set of MA entries to define packet processing logic,
ity to the model’s output. Shapley Additive Explanations (SHAP)
and finally loads the entries to do inference on a network device, at
[10] is a popular XAT technique utilizing cooperative game theory
the data plane.
to identify the importance of each feature value in a prediction. Ad-
The authors in [19] present algorithms, called IISY, to transform
ditionally, Permutation Feature Importance is a global XAI method
different ML models, such as, Decision Tree, SVM, K-mean, Naïve
that measures changes in prediction error when feature values are
Bayes. The authors also propose an evaluation prototype of the al-
randomly permuted.
gorithms. IISY then follow different approaches to apply in-network
Explanations by example focus on extracting representative data
ML inference to different ML methods, such as, supervised, unsuper-
examples that relate to a model’s generated result, thereby enhanc-
vised, reinforcement, or distributed learning. Two surveys [14, 21]
ing understanding. Methods within this category include counter-
summarise these approaches.
factual explanations [18] and adversarial examples.
The existing approaches mainly focus on tree-based ML models,
such as, DT or Random Forest (RF) due to their simple logical
2.2 Offloading ML Inference to Programmable structure and limited operations involved at networking devices [1].
Data Planes SwitchTree [7] extends [19] and deals with RF models by using
Data plane programmability allows the network owner to define range to match a range of values, instead of exact as in IISY, to
data plane functionality using software artifacts running on pro- reduce number of MA entries. Although, these approaches need 𝑛+1
grammable networking devices. P4 [3] is a domain-specific lan- MA tables to map a tree of 𝑛 features. Our transformation requires
guage used for programming these devices to process packets. A only a single MA table, thus reducing the number of pipelines to
key feature of the P4 language is protocol independence. It sup- be executed when performing table lookup. The authors in [20]
ports flexible interaction between the programmable data plane focus on implementing a mechanism of seamless updates of in-
and control plane, which enables coordination between the control network ML inference models at runtime. The authors in [1] deal
logic and packet processing logic on devices. with the challenges and experiment the in-network ML inference
Although a P4 program is independent from a device running it, on a P4-enabled hardware switch.
its compilation needs to follow the packet processing architecture
of the device. This architecture defines the high level structure
of the device, and the interfaces between its major components. 2 https://scikit-learn.org
The most common architecture is the Protocol Independent Switch 3 https://www.tensorflow.org

A Framework for In-network Inference using P4 ARES 2024, July 30–August 02, 2024, Vienna, Austria
Path Class
(f1 ≤ x1) and (f2 ≤ y1) 1
Trained tree
1 2
(f1 ≤ x1) and (y1 < f2) 2
Extract & compute Train & optimize
features model f1 ≤ x 1 (x1 < f1) and (f1 ≤ x2) and (f2 ≤ y2) 1
.pcap
.pcap .csv
.pcap (x1 < f1) and (f1 ≤ x2) and (y2 < f2) 3
MAIP
f2 ≤ y 1 f1 ≤ x 2 (x1 < f1) and (x2 < f1) 1
3
Controller class 1 class 2 f2 ≤ y 2 class 1
Control plane .cfg Match-action entries
range of f1 range of f2 class
Data plane P4 switch class 1 class 3 [ 0, x1] [ 0, y1 ] 1
4 5 6 7
[ 0, x1] [ y1+1, Ymax ] 2
M A
traffic
Deparser
Extract &
Parser
M A [ x1+1, x2 ] [ 0, y2 ] 1
compute
M A
features [ x1+1, x2 ] [ y2+1, Ymax ] 3
ML Inference
[ max(x1,x2)+1, Xmax ] [ 0, Ymax ] 1
Figure 1: Overview of the framework Figure 2: Example of generation of Match-action entries
3 SYSTEM DESIGN
XAT methods, we can retrain the models using only the important
Figure 1 represents an overview of the framework and its compo-
features and discard those that did not contribute much to the
nents. The framework consists mainly of two parts, model prepa-
models’ outcomes. We can iterate through this loop several times to
ration and in-network ML inference. The model preparation, in-
fine-tune the process of blocks ① and ② until the model’s accuracy
cluding blocks ① and ②, is done offline at the control plane. The
is significantly improved. Furthermore, we provide users with the
in-network ML inference, blocks ④,⑤,⑥, and ⑦, is done online at a
option to select which features they want to use for training models
P4-enabled switch to verify network traffic against the model. The
based on their domain-specific knowledge. The output of these two
communication between the control plane and the data plane is
blocks, ① and ②, is a highly accurate trained model that will be
done via a controller in ③. We will detail these blocks subsequently.
ready for subsequent analysis in the following blocks.
3.1 Offline Model Preparation
3.2 Model Transformation
The blocks ① and ② show the extract feature process and model
The block ③ in Figure 1 is implemented by a controller. The con-
training process, respectively. We follow the standard ML pipeline
troller inputs a ML model, transforms it into a set of MA entries,
and implement the two processes in AMIP [12], which is an open
then loads them into the switch. It can also receive ML inference
source ML-based framework for anomaly detection in encrypted
results from the switch. We currently focus on supporting Decision
traffic with high performance, explanation and robustness against
Tree models.
adversarial attacks. It also provides an intuitive and user-friendly
The transformation is performed by visiting all possible paths
interface to access a range of ML services, including feature ex-
from the root to a leaf node of the tree. Each path is transformed
traction, model building and storing, adversarial attack injection,
into a MA entry. The Match of an entry is a set of ranges of features’
explanation generation, and AI models evaluation using different
values that satisfy the path. The Action of an entry is to return the
quantifiable metrics.
classification result.
The block ① employs our open source tool MMT-Probe4 to parse
Figure 2 demonstrates a simplified transformation of a DT with
raw network traffic in .pcap files, extract the needed information,
two features 𝑓1 and 𝑓2 . A DT is a binary tree, in which, each node
compute the features required for training ML models, and, then
is a simple boolean expression that represents a relation between a
translate them into a numeric form in Comma-Separated Values
feature and a threshold. If the feature’s value is less than or equal
(CSV) format. Specifically, MMT-Probe is a monitoring and data
to the threshold, then the left branch is taken, otherwise it’s the
extraction software that parses network traffic to extract network
right branch. In the DT tree in the left side of Figure 2, we use
and application-based events, such as protocol field values and
𝑥𝑖 ∈ [0, 𝑋𝑚𝑎𝑥 ] and 𝑦𝑖 ∈ [0, 𝑌𝑚𝑎𝑥 ] to denote the feature thresholds
statistics. It allows parsing a variety of network protocols, e.g., TCP,
𝑓1 and 𝑓2 respectively. These thresholds are literal numeric values.
UDP, HTTP, and more than 700 others, for the purpose of extracting
The ML inference process, which predicts a set of feature values
metadata. The features consist of multiple parameters that can be
𝑓1, 𝑓2 against this DT tree, is basically the verification of 𝑓1 and
directly extracted from raw traffic, such as, IP addresses, packet
𝑓2 against these boolean expressions from the root to a leaf node.
size, etc, or calculated from the extracted values, such as, variation
The leaf contains the prediction. For example, if we have (𝑓1 ≤
of packet sizes, Inter-Arrival Time (IAT) showing the time intervals
𝑥 1 ) ∧ (𝑓2 ≤ 𝑦1 ) then the prediction is class 1.
between successive arrivals of packets or events, etc.
We list all possible paths of the tree on the table on the top-
The block ② aims to train, fine-tune, and optimize ML models
right of Figure 2. Because 𝑥𝑖 and 𝑦𝑖 are literal numeric values, we
in a closed-loop manner. It employs popular XAT methods such as
can easily collapse the boolean expression of a path to ranges of
SHAP to identify a list of the most important features contributing
possible values of each feature. The bottom-right table of Figure 2
to the models’ predictions. Based on post-hoc insights provided by
represents the result MA entries. Each entry is a row of the table. For
4 https://github.com/Montimage/mmt-probe example, the first row represents an entry having the key (Match)
corresponding to range values 𝑓1 , 𝑓2 : [0, 𝑥 1 ] and [0, 𝑦1 ], and value Table 1: Overview of the dataset
(Action) that is class 1.
Previous approaches in [7, 19, 20] transform a DT tree as-is, i.e., Number of packets Label
without collapsing. [7, 19] follow the sequential top-down tree path Normal traffic 155888 0
to map the each threshold to a MA entry. The thresholds (thus Malicious traffic 10208 1
its MA entries) are then grouped by feature into a separated table. Total 166096
They then introduces another table to combine the mapping results
from the feature tables to the final result. This map is enhanced
in [20] to reduce number of entries in each MA table by breaking destination IP addresses and port numbers, and protocol identifica-
the sequential dependency of each node in a path. However, these tion), as well as the feature values. The information is encoded in a
approaches require 𝑛 + 1 tables for a DT tree of 𝑛 features, and digest message (i.e., a message to communicate information from
more tables implies more pipeline executions. By simply collapsing the data plane to the control plane), then sent to the controller. Utili-
the conjunction of a boolean expression of a path before mapping, sation of digest communication, instead of transmission of a packet
our transformation produces a single MA table for a DT input to the controller via the CPU port, reduces the processing overhead
within a minimum number of entries. Indeed, the number of entries at the controller [3] as the message is structured. Digests are sent
is less than or equal to the number of leaf nodes. Consequently, to the controller by calling digest P4 function. The controller is
this reduces the number of pipeline executions needed to match configured with P4Runtime [3] to listen to digest messages. Upon
features. receiving a message, the controller decodes it. The decoded informa-
tion can be used to perform some reaction, such as, to reconfigure
the switch, or even to retrain the model, etc. This utilisation is out
3.3 In-network ML Inference of scope of the paper, thus here, for instance, we simply save it into
The blocks ④, ⑤, ⑥ and ⑦ are implemented inside a P4-enable a .csv file.
switch. The blocks are described in the following. It is crucial to note that the blocks presented above are essential.
Depending on a specific use-case, other blocks can be introduced.
Parser. When a packet arrives at the switch, it is parsed in For example, a flow tracker block can be inserted after the parser
block ④. This parser extracts the concerned protocol headers, such block to track the flows that have already been classified, or identi-
as, Ethernet, IP, UDP, etc. The extracted values are kept as metadata fied as malware. If so, any new incoming packets belonging to those
in the Paacket Header Vector (PHV). The values can then be sub- flows are not processed by the blocks ⑤ and ⑥, but are forwarded
sequently accessed by other blocks. For example, Ethernet and IP as-is or dropped respectively.
protocols need to be parsed to be able to perform packet switching. The prediction result can also be immediately used by the switch,
for example, to decide to drop or forward the current packet. This
In-band features extraction & computation. The block ⑤ in Fig- forms a local closed-loop of extraction-detection-reaction inside
ure 1 extracts the necessary feature values required for ML infer- the switch. Consequently, it avoids RTT delay caused by the com-
ence. Several feature values can be obtained from the previous munication with the controller.
parser block, such as, the length of an IP packet, the source port
number, etc. Although, there are feature values that still need to be 4 EXPERIMENTAL EVALUATION
extracted or computed, for example, IAT feature that captures the
In this section, we evaluate our framework by applying it to imple-
different arrival times of two consecutive packets, or the maximum
ment a smart IoT wireless gateway. The gateway implements an
size of packets during a window time frame, etc. This extraction and
in-network ML-based solution for quickly detecting and immedi-
computation must take into account a number of strict constraints
ately mitigating IoT malicious traffic which is encrypted. The threat
imposed by a programmable switch, such as, low available memory,
model simply follows the block-list (or blacklist) model. This is to
limited support for mathematical operations and limited number
block any packets which are classified as malicious.
of operations per packet to maintain line rate packet processing.
ML inference. This block performs ML inference. It predicts the 4.1 Model Preparation
set of extracted feature values. The prediction is done by matching Several packet-level features [12] can be easily extracted from
the values against the MA entries. If there exists an entry having a packet header fields and utilized to address various traffic clas-
key that matches the values then the prediction is the result value. sification challenges. However, they may prove inadequate when
This result is saved as metadata so that it can be accessed by the dealing with encrypted traffic, as certain features can be obfuscated
next block. by encryption algorithms [6]. Therefore, for the sake of simplicity,
in this experimental evaluation we focus solely on statistical fea-
Deparser. This is the last block in the chain of processing the tures derived from packet sizes and timestamps. Specifically, we
packets inside the switch. This block, ⑦, packages a packet with consider three key features: IAT, representing the inter-arrival time
additional information, such as, MAC destination, new checksum, between packets; len, indicating the payload size of each IP packet;
etc, and then sends the packet to a selected outgoing port of the and diffLen, which captures the variability in packet sizes over time.
switch. This block also notifies the controller on the result obtained Table 1 presents a summary of the public dataset CSE-CIC-
from the ML inference. Additional information is also sent to the IDS2018 [16], which is used for botnet detection in encrypted traffic
controller, such as, a 5-tuple identifying the packet (i.e., source and within IoT networks. The dataset contains a total of 166096 packets
A Framework for In-network Inference using P4 ARES 2024, July 30–August 02, 2024, Vienna, Austria
Confusion Matrix
Controller clients
Actual Normal
server
46527 242 Control traffic
Raspberry Pi
True Label
wlan0
WIFI
eth0
Data traffic
RJ45 P4 switch AP
Actual Malicious
3056 4
Figure 5: Overview of the IoT testbed
Predicted Normal Predicted Malicious
Predicted Label
4.2 In-network ML Inference
Figure 3: Confusion matrix The framework prototype is implemented mainly in P4 and Python
to provide the data plane and control plane functionality (see Fig-
ure 1). We execute the P4 code in a P4 software switch which im-
plements the behavioral model version 2 architecture, BMv25 . The
iat DT model, that is prepared in the previous section, is transformed
diffLen into 5022 MA entries of a table that will be loaded into the BMv2
len Normal traffic P4 switch. With the same model, IISY [19] generates totally 10047
Malicious traffic
MA entries in 4 MA tables. Three tables for IAT, len and diffLen
0.00 0.02 0.04 0.06 0.08 0.10
mean(|SHAP value|) (average impact on model output magnitude) features contain 4731, 42 and 251MA entries respectively. The last
table contains 5023 MA entries for synthesising final results from
Figure 4: SHAP summary plot for anomaly detection the 3 tables above.
Correctness of P4-based Inference. We first want to evaluate the
correctness of the ML inference running inside the P4 switch. We
use mininet6 to create a realistic virtual network running on a single
extracted from .pcap files, with 155,888 packets classified as normal
virtual machine Ubuntu 20.04.6. The network consists of two nodes:
traffic and 10208 packets classified as malicious traffic. In the dataset,
a host h and a BMv2 P4 switch s. The captured packets in dataset
normal and malicious traffic are labeled as 0 and 1, respectively.
traces are replayed from h to s using tcpreplay7 . For each incoming
We use AMIP to extract the 3 features from this dataset. We then
packet, the P4 switch performs a prediction, e.g., steps ④, ⑤, ⑥ and
randomly spit the obtained values into training and testing datasets.
⑦ in Figure 1, and sends the prediction result together with feature
The training dataset consisting of 70% of the data is used to train
values to the controller which saves this information into a .csv
our DT model. The rest is used to test the accuracy of obtained
file. We later use sklearn to obtain the score which represents the
model.
mean accuracy of data in the .csv file. Since we obtained the score
Figure 4 shows the confusion matrix that provides a detailed
1, we can conclude that the P4-based inference and sklearn-based
breakdown of the accuracy of the obtained DT model. For instance,
inference give the same result.
the top-left cell represents instances where the model correctly
classified normal traffic (label 0) as normal, with a count of 46527. Overhead on Packet Latency. In this section, we evaluate the over-
The accuracy metric, calculated as the ratio of correctly predicted head caused by the in-network ML inference over packet latency in
instances to the total number of instances, is 93.38%. Despite their a physical testbed. We deployed the implementation prototype of
simple logical structure and lower precision compared to advanced the in-network ML inference in Raspberry PI 3 Model B, with 1GB
ML techniques such as deep neural networks [12], the DT model of RAM and Quad Core 1.2GHz, which acts as a smart IoT wireless
still achieves high accuracy and, more importantly, facilitates the gateway, as shown in Figure 5. We rely on P4PI [9] to run the P4
complex operations involved in networking devices. code on a BMv2 P4 switch in the Raspberry Pi. The controller is
SHAP provides explanations of a model’s predictions by identi- deployed on a separated machine. The controller and other IoT de-
fying the most important features based on a feature attribution vices connect to the gateway via its wireless network interface. The
framework and Shapley values. Figure 4 illustrates important fea- IoT devices can connect to each other and to a server represented
tures by sorting the sum of magnitudes of Shapley values over these on the left side of the figure.
samples. Here, the length of the bar indicates how much influence In order to assess the latency overhead, we developed a pair of
the feature has on the prediction. Among the three features used, basic client and server applications to actively gauge the end-to-
the most important one is IAT. It is also a common characteristic end packet latency. The client includes its current time in a packet
used in machine learning algorithms [5, 17], as malicious commu- payload and transmits it to the server, which promptly returns the
nications often exhibit specific flow duration patterns. For instance, packet. The client subsequently contrasts the current time with
some botnets establish brief connections, while others are more the one encapsulated in the packet to determine the RTT of the
chatty, resulting in longer duration. Please note that the efficacy of 5 https://github.com/p4lang/behavioral-model/blob/main/docs/simple_switch.md
this detection method may diminish due to attacker evasion tactics, 6 https://mininet.org/
though this aspect falls beyond the scope of this paper. 7 https://tcpreplay.appneta.com/
100
80
Distribution (%)
60
40
however those of the author(s) only and do not necessarily reflect
those of the European Union. Neither the European Union nor the
20
without ML inference
granting authority can be held responsible for them.
with ML inference
0
10000 15000 20000 25000 30000 35000 40000 45000 50000 REFERENCES
RTT ( s)
[1] Aristide Tanyi-jong Akem, Guillaume Fraysse, and Marco Fiore. 2024. Encrypted
Traffic Classification at Line Rate in Programmable Switches with Machine
Figure 6: Overhead of ML inference on packet latency Learning. In Proc. of NOMS.
[2] Alejandro Barredo Arrieta et al. 2020. Explainable Artificial Intelligence (XAI):
Concepts, taxonomies, opportunities and challenges toward responsible AI. In-
formation fusion (2020).
packet, all of this is achieved without requiring time synchroniza- [3] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer
tion between the client and the server. In the case of measurements Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and
David Walker. 2014. P4: Programming Protocol-Independent Packet Processors.
without ML inference, we removed the related P4 code of ML in Computer Communication Review 44, 3 (2014), 87–95.
the P4 switch. [4] Pat Bosshart, Glen Gibb, Hun-Seok Kim, George Varghese, Nick McKeown, Martin
We conducted several measurements. Each measurement sends Izzard, Fernando Mujica, and Mark Horowitz. 2013. Forwarding Metamorphosis:
Fast Programmable Match-Action Processing in Hardware for SDN. In Proc. of
10000 packets. We present the results in the Cumulative Distri- SIGCOMM.
bution Function (CDF) diagram in Figure 6. The horizontal axis [5] Livadas Carl, R Walsh, D Lapsley, and WT Strayer. 2006. Using machine learning
technliques to identify botnet traffic. In Local Computer Networks, Proceedings
stands for the measured RTT values and the vertical axis for their 2006 31st IEEE Conference on. IEEE.
distribution. We can see that almost all RTT values vary from 15000 [6] Hossein Doroud, Ahmad Alaswad, and Falko Dressler. 2022. Encrypted Traffic
to 30000 𝜇𝑠. The average latency with and without ML inference Detection: Beyond the Port Number Era. In 2022 IEEE 47th Conference on Local
Computer Networks (LCN). 198–204.
are 22093 𝜇𝑠 and 19069 𝜇𝑠 respectively. Therefore, the additional [7] Jong hyouk Lee and Kamal Sigh. 2020. SwitchTree: In-network Computing and
latency increases by 15.8%. This additional latency is mainly due to Traffic Analyses with Random Forests. Neural Computing and Applications (2020).
the table lookup time of the P4 switch that is executed inside the [8] Fabian Ihle, Steffen Lindner, and Michael Menth. 2023. P4-PSFP: P4-Based Per-
Stream Filtering and Policing for Time-Sensitive Networking. (2023).
Raspberry Pi. [9] Sándor Laki, Radostin Stoyanov, Dávid Kis, Robert Soulé, Péter Vörös, and Noa
Zilberman. 2021. P4Pi: P4 on Raspberry Pi for networking education. SIGCOMM
Swift Detection and Reaction at the IoT Gateway. We involved the Comput. Commun. Rev. 51, 3 (jul 2021), 17–21.
testbed in Figure 5 by introducing a new block in the P4 program [10] Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model
predictions. Advances in neural information processing systems 30 (2017).
to drop a packet if it is classified as malicious. We then use a laptop, [11] Huu Nghia Nguyen, Bertrand Mathieu, Marius Letourneau, and Guillaume Doyen.
that connects to the gateway via its wireless interface, to act as a 2023. A Comprehensive P4-based Monitoring Framework for L4S leveraging
In-band Network Telemetry. In Proc. of NOMS.
malicious IoT device, as shown in the right side of Figure 5. In this [12] Manh-Dung Nguyen, Anis Bouaziz, Valeria Valdes, Ana Rosa Cavalli, Wissam
laptop, we use tcpreplay to inject botnet traffic into the network. Mallouli, and Edgardo Montes De Oca. 2023. A deep learning anomaly detec-
We see that the gateway can detect almost all the malicious packets tion framework with explainability and robustness. In Proceedings of the 18th
International Conference on Availability, Reliability and Security (ARES ’23).
and immediately drop them. Although, there exist packets that are [13] F. Paolucci, F. Civerchia, A. Sgambelluri, A. Giorgetti, F. Cugini, and P. Castoldi.
not classified as malicious because the accuracy of the model is 2019. P4 edge node enabling stateful traffic engineering and cyber security.
93.38%. Journal of Optical Communications and Networking 11, 1 (2019), A94–A95.
[14] Ricardo Parizotto and Israat Haque. 2024. Offloading Machine Learning to
Programmable Data Planes: A Systematic Survey. January 2024 (2024).
5 CONCLUSION [15] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. " Why should
I trust you?" Explaining the predictions of any classifier. In Proc. of SIGKDD.
We presented in this paper a comprehensive framework to swiftly 1135–1144.
detect and mitigate malicious traffic by directly performing ML [16] Iman Sharafaldin, Arash Habibi Lashkari, Ali A Ghorbani, et al. 2018. Toward
generating a new intrusion detection dataset and intrusion traffic characterization.
inference at the data planes via P4-enabled switches. We imple- Proc. of ICISSP 1, 108–116.
mented the framework and experimentally evaluated it against [17] W Timothy Strayer, David E Lapsley, Robert Walsh, and Carl Livadas. 2008.
a P4 software switch. This is a step forward to applying ML to Botnet detection based on network behavior. Botnet detection 36, August (2008),
1–24.
security analysis of network traffic at line rate. Future work will [18] Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual
explore improving the accuracy of the model and applying it for explanations without opening the black box: Automated decisions and the GDPR.
encrypted network traffic classification and anomaly detection in Harv. JL & Tech. 31 (2017), 841.
[19] Z Xiong and N Zilberman. 2019. Do Switches Dream of Machine Learning?
deterministic networks. Toward In-Network Classification. Proc. of HotNets, 25–33.
[20] Mingyuan Zang, Changgang Zheng, Lars Dittmann, and Noa Zilberman. 2023. To-
wards Continuous Threat Defense: In-Network Traffic Analysis for IoT Gateways.
ACKNOWLEDGMENTS IEEE Internet of Things Journal 11, 6 (2023), 9244–9257.
This work is partially supported by the European Union’s Horizon [21] Changgang Zheng, Damu Ding, Shay Vargaftik, and Yaniv Ben-itzhak. 2023.
In-Network Machine Learning Using Programmable Network Devices : A Survey.
Europe research and innovation program under grant agreements EEE Communications Surveys & Tutorials (2023), 1–35.
Numbers 101096504 (DETERMINISTIC6G), 101070450 (AI4CYBER)
and the INFLUENCE project. Views and opinions expressed are

A Framework for In-network Inference using P4

Uploaded by

A Framework for In-network Inference using P4

Uploaded by

A Framework for In-network Inference using P4

Huu Nghia Nguyen Manh-Dung Nguyen Edgardo Montes de Oca

The most common architecture is the Protocol Independent Switch 3 https://www.tensorflow.org

Figure 1: Overview of the framework Figure 2: Example of generation of Match-action entries

46527 242 Control traffic

You might also like