A new method for flow-based network intrusion detection using the inverse Potts model
Camila Pontes, Manuela Souza, João Gondim, Matt Bishop and Marcelo Marotta

arXiv:1910.07266v5 [cs.NI] 11 Mar 2021

Abstract—Network Intrusion Detection Systems (NIDS) play an important role as tools for identifying potential network threats. In the context of ever-increasing traffic volume on computer networks, flow-based NIDS arise as good solutions for real-time traffic classification. In recent years, different flow-based classifiers have been proposed using Machine Learning (ML) algorithms. Nevertheless, classical ML-based classifiers have some limitations. For instance, they require large amounts of labeled data for training, which might be difficult to obtain. Additionally, most ML-based classifiers are not capable of domain adaptation, i.e., after being trained on a specific data distribution, they are not general enough to be applied to other related data distributions. Finally, many of the models inferred by these algorithms are black boxes, which do not provide explainable results. To overcome these limitations, we propose a new algorithm, called the Energy-based Flow Classifier (EFC). This anomaly-based classifier uses inverse statistics to infer a statistical model based on labeled benign examples. We show that EFC is capable of accurately performing binary flow classification and is more adaptable to different data distributions than classical ML-based classifiers. Given the positive results obtained on three different datasets (CIDDS-001, CICIDS17, and CICDDoS19), we consider EFC a promising algorithm for robust flow-based traffic classification.

Index Terms—Flow-based Network Intrusion Detection, Anomaly-based Network Intrusion Detection, Network Flow Classification, Network Intrusion Detection Systems, Energy-based Flow Classifier, Inverse Potts Model, Domain Adaptation.

I. INTRODUCTION

Symantec's Internet Security Threat Report [1] points out a 56% increase in the number of web attacks in 2019. Network scans, denial of service, and brute-force attacks are among the most common threats. Such malicious activities threaten individuals as well as collective organizations such as public health, financial, and government institutions. In this context, Network Intrusion Detection Systems (NIDSs) play an important role as tools for managing and identifying potential threats in the network [2].

There are two main approaches for NIDSs regarding the kind of data analyzed: packet-based and flow-based. In the former, deep packet inspection is performed, taking into account individual packet payloads and header information [3]. In the latter, flows, i.e., collections of packets, are analyzed with respect to their properties, e.g., duration, number of packets, number of bytes, and source/destination port [3]. To perform classification in real time, a massive volume of data must be analyzed, making deep packet inspection too costly in terms of processing and energy consumption. Since flow-based approaches can classify the whole traffic while inspecting the equivalent of only 0.1% of the total volume, NIDSs based on flow analysis arise as good solutions for real-time traffic classification [4]. Besides, with the advent of software-defined networking and the virtualization of network functions, distributed security systems can take advantage of the spread of flow-based NIDSs to improve their security management across the network [5].

In recent years, different flow-based classifiers have been proposed based on both shallow and deep learning [6]. According to the survey in [6], the best flow-based classifiers achieve around 99% accuracy. Although quite accurate, classical Machine Learning (ML)-based classifiers require labeled malicious traffic samples for training. However, labeling real traffic might be difficult, especially in the case of malicious traffic [7]. Besides, ML-based classifiers trained on a specific data distribution usually do not work well when applied to other data with slightly different distributions, i.e., they have low domain adaptation capability [8], [9], [10]. Such a capability is particularly important in the network context, since a standard procedure is to train classifiers on simulated data and afterward apply them in real scenarios, where the change in data distribution requires domain adaptation. Moreover, most ML algorithms are well known to be black-box mechanisms, which are challenging to understand and readjust in detail [11], [12]. In this regard, there is a clear need for a new flow-based classifier for NIDSs that generates an understandable (white-box) model, learns solely from benign examples, and adapts to different domains.

In this work, we propose a novel classifier called the Energy-based Flow Classifier (EFC), which is inspired by the inverse Potts model from statistical mechanics and adapted to network flow classification. EFC performs one-class, anomaly-based classification, i.e., by learning only the properties of benign flows, it can discriminate between benign and malicious flows. Moreover, it is a white-box algorithm, producing a statistical model that can be analyzed in detail with respect to individual parameter values. Here, we compare the performance of EFC against a variety of classifiers using three different datasets: CIDDS-001 [13], CICIDS17 [14], and CICDDoS19 [15]. Our results show that EFC's performance is comparable to that of the other classifiers. We also observed that EFC is less sensitive to changes in data distribution than the others. Our main contributions are:

C. Pontes, M. Souza, J. Gondim and M. A. Marotta are with the University of Brasilia, Brazil, emails: [email protected], [email protected], [email protected]; M. Bishop is with the University of California at Davis, Davis, USA, email: [email protected]

• The proposal and implementation of a flow classifier based on the inverse Potts model to be employed in NIDSs;
• A performance comparison of the proposed classifier with classical ML-based classifiers using three different datasets;
• An analysis of how different classifiers perform when trained within one domain and tested in another related domain.

The rest of this paper is structured as follows. In Section II, we briefly present the state of the art in flow-based NIDSs. In Section III, we describe the structure of network flows with a preliminary analysis of the datasets considered here. In Section IV, we introduce the proposed statistical model and the classifier implementation. In Section V, we present the results obtained regarding the statistical model's analysis and the classification experiments performed. Finally, in Section VI, we present our conclusions and future work.

II. RELATED WORK

In this section, we first briefly review the state of the art in flow-based network intrusion detection systems. Next, previous work on CIDDS-001, CICIDS17, and CICDDoS19 is presented to highlight their relevance as up-to-date datasets for our experiments. Finally, we discuss some challenges of ML-based traffic classification, such as the difficulty of obtaining sufficient labeled data, the non-interpretability of models, and the difficulty of adapting to different domains (data distributions).

Several ML-based classifiers have been explored in recent years for network intrusion detection. Vinayakumar et al. (2017) [16], Mahfouz et al. (2020) [17], and Khan et al. (2020) [18] independently evaluated the performance of different ML-based classifiers on internet traffic datasets. In [16], the KDDCup'99 and NSL-KDD datasets are used to evaluate the performance of both shallow and deep learning-based classifiers; deep learning-based approaches performed better at differentiating malicious attacks from benign traffic. Meanwhile, the authors of [17] considered the NSL-KDD dataset to compare the performance of different shallow learning-based classifiers. The classifier with the best performance without feature selection was the Decision Tree (DT); with feature selection, K-Nearest Neighbors (KNN) performed better at classifying malicious traffic. Finally, the authors of [18] compared the performance of a few different classifiers on the UNSW-NB15 dataset and observed that Random Forest (RF) outperformed all other classifiers. In fact, RF has been used in several recent NIDSs [19], [20], [21]. All of the aforementioned works use the F1-score as a metric to assess the performance of the different classifiers. In the present work, we consider both deep and shallow learning-based classifiers as baselines to assess EFC's performance on three different datasets, with the F1-score as one of the evaluation metrics.

To assess EFC's performance, one of the datasets we use is CIDDS-001. This dataset was used by Verma and Ranga (2018) [22] to assess the performance of KNN and k-means clustering algorithms; both achieved over 99% accuracy. Ring et al. [23] also explored slow port scan detection using CIDDS-001; their approach accurately recognizes the attacks with a low false alarm rate. Finally, Abdulhammed et al. [24] also performed flow-based classification on CIDDS-001 and proposed an approach that is robust to imbalanced network traffic. In summary, CIDDS-001 is an up-to-date and relevant dataset for network flow classification, being one of our dataset choices for assessing the performance of EFC.

The other two datasets considered here are CICIDS17 and CICDDoS19, from the Canadian Institute for Cybersecurity. Recently, Yulianto, Sukarno, and Suwastika [25] used CICIDS17 to assess the performance of an Adaboost-based classifier. Aksu et al. [26] did the same in 2018 with different ML classifiers. CICIDS17 contains benign traffic as well as the most up-to-date common attacks, resembling true real-world data, making it a relevant dataset for flow-based traffic classification. Meanwhile, CICDDoS19 is a very recent dataset focused on DDoS attacks. The work in [27] proposes a real-time entropy-based NIDS for the detection of volumetric DDoS in the Internet of Things (IoT) and performs tests on the CICDDoS19 dataset, among others. Another recent work [28] obtained over 99% accuracy on CICDDoS19 using a Convolutional Neural Network (CNN). Finally, Novaes et al. [29] proposed an intrusion detection system based on fuzzy logic, whose performance was assessed on CICDDoS19. The rising popularity of this dataset serves as evidence of its relevance for assessing the performance of different NIDSs. Hence, we use the CICIDS17 and CICDDoS19 datasets to test our classifier and compare it to classical ML classifiers.

Umer et al. (2017) [6] performed a comprehensive literature survey on flow-based network intrusion detection. Their work mentions some disadvantages of using ML-based classifiers for traffic classification, among them the high computational cost of training, the difficulty of obtaining representative datasets, and the high false positive rates observed. The present work addresses some of these issues, since the classifier proposed here has a low computational cost and learns exclusively from benign samples. In the following, these and some other issues are discussed in further detail.

One of the commonly discussed issues in the field of ML is the tight dependency most algorithms have on the amount of labeled samples available for training [7], which might be difficult to obtain in some contexts. For instance, it is difficult to obtain malicious traffic samples and to label them in the real world, which is why most network intrusion detection datasets contain simulated attacks. This makes it difficult to train intrusion detection algorithms so that they can detect zero-day threats [7]. The only way to possibly detect a zero-day attack is to rely on an anomaly-based classifier [30], such as the one we propose in this work. EFC has a great advantage over other ML-based algorithms: the capability to infer a model based solely on benign traffic samples, i.e., half of the information. Such a capability can be used to circumvent the problem of obtaining large amounts of data and labeling malicious samples.

Another common problem in ML is that inferred models lose their predictive performance when tested in different domains (data distributions) [10]. In the field of network security, this adaptability is especially important given the existence of zero-day threats and the artificiality of most datasets used for research. In [10], there is an interesting discussion about the differences between the datasets used by academics to test NIDSs and the network traffic observed in the real world. Additionally, the works of Bartos et al. [8] and Li et al. [9] also address this issue. They propose similar approaches, applying transformations to the data to reduce the differences between data distributions in different domains. In our work, we propose a classifier that is intrinsically adaptable to different domains, since the model inference is based solely on benign samples. Therefore, there is no need to transform the data or perform adjustments to adapt the model to a different domain, making our approach simpler and more straightforward.

Finally, another big issue in ML is the non-interpretability of some models [11], [12]. Artificial Neural Networks (ANNs), in particular, have become more and more opaque over time, despite outperforming other approaches in many tasks. The authors of [12] highlight that the best ML algorithms are not interpretable, hence the decisions they make cannot be explained. However, different contexts require transparent decision making, which is why the development of explainable models is so important. The authors of [11] call attention to the fact that trying to explain black-box models might not be the best approach to the problem of non-interpretability; instead, they suggest designing new models that are inherently interpretable. In line with these recent studies, EFC generates a white-box model and therefore satisfies the requirement of providing explainable results, allowing classification results to be analyzed in retrospect if needed. Next, we introduce the main concepts and intuitions that serve as a basis for EFC.

III. BACKGROUND AND DATASETS

In this section, we present some fundamental concepts for understanding flow-based network intrusion detection. First, the concept of a network flow and its features are introduced. Then, the three internet flow datasets used in this work are presented and described in detail to provide concrete examples of features and to contextualize the experimental results presented in Section V. The information provided in this section serves as a basis for understanding how EFC works.

A network flow is a set of packets that traverses intermediary nodes between end-points within a given time interval. From the perspective of an intermediary node, i.e., an observation point, all packets belonging to a given flow share a set of common features called flow keys. This means that flow keys do not change for packets belonging to the same flow, while the remaining features might vary. FlowScan [31] is an example of a tool capable of collecting data from a set of packets and extracting flow features to be exported in different formats, such as NetFlow and IPFIX. Since NetFlow is the most commonly used format, its main features are listed below:

• Source/Destination IP (flow keys) - determine the origin and destination of a given flow in the network;
• Source/Destination port (flow keys) - characterize different kinds of network services, e.g., the ssh service uses port 22;
• Protocol (flow key) - characterizes flows regarding the transport protocol used, e.g., TCP, UDP, ICMP;
• Number of packets (feature) - total number of packets captured in a flow;
• Number of bytes (feature) - total number of bytes in a flow;
• Duration (feature) - total duration of a flow in seconds;
• Initial timestamp (feature) - system time when the flow started to be captured.

Other features, such as TCP Flags and Type of Service, might also be exported in some cases. The combination of different flow keys and features characterizes one flow and determines its particular behavior.

Flow-based approaches are seen as suitable alternatives to precede packet inspection in real-time NIDSs. The idea is to deeply inspect only the packets belonging to flows considered suspicious by the flow-based classifier. Such a two-step approach would notably reduce the amount of data analyzed while maintaining a high classification accuracy [4]. In this work, we are only concerned with the first step, flow classification. We evaluate the performance of our algorithm, EFC, compared to other ML algorithms using three different datasets. We also evaluate the performance of the algorithms by training with data from one part of a dataset and testing with other parts of it. Although both parts of the data come from the same dataset, their distributions differ, which characterizes domain adaptation. In the following, we briefly describe the datasets used for testing and characterize what constitutes a domain adaptation in each of them.

A. CIDDS-001

CIDDS-001 [13] is a relatively recent dataset composed of a set of flow samples captured within a simulated OpenStack environment and another set of flow samples obtained from a real server. The former contains only simulated traffic, while the latter includes both real and simulated traffic. Each sample collected within these two environments has one of the labels described in Table I.

Table I
LABELS WITHIN CIDDS-001 DATASET

Environment      Labels
OpenStack        normal, DoS, portScan, pingScan, bruteForce
External server  normal, DoS, bruteForce, unknown, suspicious

Simulated benign flows are labeled as normal, while simulated malicious flows are labeled as dos, portScan, pingScan, or bruteForce, depending on the type of attack simulated.
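The flow keys and features listed earlier in this section can be grouped into a single record structure. The sketch below is purely illustrative: the field names and example values are ours, not the schema of any particular flow exporter or dataset.

```python
from dataclasses import dataclass

# Illustrative sketch of a NetFlow-style record grouping the flow keys and
# features described in this section; field names are hypothetical.
@dataclass(frozen=True)
class FlowRecord:
    src_ip: str        # flow key
    dst_ip: str        # flow key
    src_port: int      # flow key
    dst_port: int      # flow key
    protocol: str      # flow key, e.g., "TCP", "UDP", "ICMP"
    n_packets: int     # feature: total packets captured in the flow
    n_bytes: int       # feature: total bytes in the flow
    duration: float    # feature: total duration in seconds
    first_seen: float  # feature: initial timestamp (epoch seconds)

    def flow_keys(self):
        """Keys shared by every packet of the flow; they never change
        within a flow, while the remaining features may vary."""
        return (self.src_ip, self.dst_ip, self.src_port,
                self.dst_port, self.protocol)

# A flow toward port 22 would correspond to the ssh service example above.
flow = FlowRecord("192.168.0.2", "192.168.0.9", 51234, 22, "TCP",
                  14, 2048, 1.7, 0.0)
```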

The labels suspicious and unknown, in turn, are used for real traffic. The external server is open to user access through ports 80 and 443. Hence, flows directed at these ports were labeled as unknown, since they could be either benign or malicious. All flows directed at other ports were labeled as suspicious. Traffic was sampled in both the simulated and the external server environments for a period of four weeks. Within this dataset, a change from the simulated data distribution to the external server data distribution is a domain change, requiring the classifiers to adapt.

The CIDDS-001 flow features are shown in Table II. All features were taken into account for characterization and classification except for Src IP, Dest IP, and Date first seen. These exceptions are because the latter is intrinsically not informative for differentiating flows, and the former two are made up in the context of the simulated network and might be confounding.

Table II
FEATURES WITHIN CIDDS-001 DATASET

#   Name             Description
1   Src IP           Source IP Address
2   Src Port         Source Port
3   Dest IP          Destination IP Address
4   Dest Port        Destination Port
5   Proto            Transport Protocol (e.g., ICMP, TCP, or UDP)
6   Date first seen  Start time flow first seen
7   Duration         Duration of the flow
8   Bytes            Number of transmitted bytes
9   Packets          Number of transmitted packets
10  Flags            OR concatenation of all TCP Flags

B. CICIDS17

The CICIDS17 [14] dataset contains benign traffic and the most up-to-date common attacks, resembling real-world data. This dataset was built using the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols. The data was captured during one week in July 2017. The attacks implemented include Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet, and DDoS. They were executed both morning and afternoon on Tuesday, Wednesday, Thursday, and Friday (see Table III).

Table III
ATTACKS WITHIN CICIDS17 DATASET

Week day   Attacks
Monday     -
Tuesday    FTP-Patator, SSH-Patator
Wednesday  DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye, Heartbleed Port 444
Thursday   Brute Force, XSS, Sql Injection, Dropbox download, Cool disk
Friday     Botnet ARES, Port Scan, DDoS LOIT

Flow features on this dataset were extracted using CICFlowMeter [32]. There are 88 features in total, which are not listed here due to limited space. All features were considered except for Flow ID, Source IP, Destination IP, and Timestamp. These exceptions were made because the features were either intrinsically not informative or made up within a simulated environment.

C. CICDDoS19

CICDDoS19 [15] contains benign traffic and the most up-to-date common DDoS attacks (volumetric and application: low volume, slow rate), resembling real-world data. This dataset contains different modern reflective DDoS attacks such as PortMap, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN, NTP, DNS, and SNMP. The traffic was captured in January (first day) and March (second day) of 2019. Attacks were executed during this period (see Table IV).

Table IV
ATTACKS WITHIN CICDDOS19 DATASET

Day     Attacks
First   PortMap, NetBIOS, LDAP, MSSQL, UDP, UDP-Lag, SYN
Second  NTP, DNS, LDAP, MSSQL, NetBIOS, SNMP, SSDP, UDP, UDP-Lag, WebDDoS, SYN, TFTP

Flow features on this dataset were extracted using CICFlowMeter [32]. All features were considered except for Flow ID, Source IP, Destination IP, and Timestamp. These exceptions were made because the features were either intrinsically not informative or made up within a simulated environment.

These concepts regarding network flows, their features, and how they are presented across different datasets serve as a basis to introduce the main intuition behind EFC, presented next.

IV. PROPOSAL

EFC is based on inverse statistics. The main task of inverse statistics is to infer a statistical distribution based on a sample of it [33]. Methods using inverse statistics have been successfully applied to problems in other disciplines, e.g., predicting protein contacts in biophysics [33], [34]. Here, the statistical inference is based on the Potts model [35]. This model provides a mathematical description of interacting spins on a crystalline lattice. Within the model framework, the interacting spins are mapped into a graph G(η, ε) (see Figure 1 A)), where each node i ∈ η = {1, ..., N} has an associated spin a_i, which can assume one value from a set Ω that contains all possible individual spin states. Each node i also has an associated local field h_i(a_i) that is a function of a_i's state. Meanwhile, each edge (i, j) ∈ ε, i, j ∈ η, has an associated coupling value e_ij(a_i, a_j) that is a function of the states of the spins a_i and a_j associated to nodes i and j. A specific system configuration has an associated total energy, determined by the Hamiltonian function H(a_1...a_N), which depends on all the spin states.

Figure 1. A) Interacting spins on a crystalline lattice. B) Network flow mapped into a graph structure.

In this work, we reuse the intuitions from the Potts model to characterize network flows (see Figure 1 B)). An individual flow k is represented by a specific graph configuration G_k(η, ε). Instead of spins, each node represents a selected feature i ∈ η = {SrcPort, ..., Flags}. Within a given flow k, each network flow feature i assumes one value a_ki from the set Ω_i that contains all possible values for this feature. As in the Potts model, each feature i has an associated local field h_i(a_ki). Meanwhile, ε = {(i, j) | i, j ∈ η; i ≠ j} is the set of edges determined by all possible pairs of features, creating a fully meshed graph linking different flow samples through their common features. Each edge has an associated coupling value determined by the function e_ij(a_ki, a_kj).

Since the values of the local fields and couplings depend on the values assumed by the features within a given flow, each distinct flow will have a different combination of these quantities. As in the Potts model, the Hamiltonian involving local fields and couplings determines the total "energy" H(a_k1...a_kN) of each flow. For instance, in Figure 1 B), the total "energy" of the flow is obtained by summing up all values associated with the edges and nodes, resulting in a total of -3. Note that what we call energy is analogous to the notion of the Hamiltonian in statistical mechanics. It is important to note that the model described here is discrete; therefore, continuous features must be discretized. The classes used for continuous feature discretization are shown in Section V. In the following, we present the framework applied to perform the statistical model inference and the subsequent energy-based flow classification.

A. Model inference

In this section, a statistical model is inferred in terms of coupling and local field values to perform energy-based flow classification. The main idea consists in extracting a statistical model from benign flow samples to infer coupling and local field values that characterize this type of traffic. When calculating the energies of unlabeled flows using the inferred values, it is expected that benign flows will have lower energies than malicious flows.

Let (A_1...A_N) be an N-tuple of features, which can be instantiated for a flow k as (a_k1...a_kN), with a_k1 ∈ Ω_1, ..., a_kN ∈ Ω_N. Each feature value a_ki is encoded by an integer from the set Ω = {1, 2, ..., Q}, i.e., all feature alphabets are the same, Ω_i = Ω, of size Q. If a given feature can only assume M values and M < Q, it is considered that values M + 1, ..., Q are possible but will never be observed empirically. For instance, suppose the only possible values for the feature protocol are {'TCP', 'UDP'} and Q = 4. In this case, we would have the mapping {'TCP': 1, 'UDP': 2}, with codes 3 and 4 left unassigned; feature values 3 and 4 would therefore never occur.

Now, let K be the set of all possible flows, i.e., all possible combinations of feature values (K = Ω^N), and let S ⊂ K be a sample of flows. We can use inverse statistical physics to infer a statistical model associating a probability P(a_k1...a_kN) to each flow k ∈ K based on the sample S. The global statistical model P is inferred following the Entropy Maximization Principle [36]:

    max_P  − Σ_{k∈K} P(a_k1...a_kN) log(P(a_k1...a_kN))                  (1)
    s.t.
    Σ_{k∈K | a_ki = a_i} P(a_k1...a_kN) = f_i(a_i)                       (2)
        ∀i ∈ η; ∀a_i ∈ Ω;
    Σ_{k∈K | a_ki = a_i, a_kj = a_j} P(a_k1...a_kN) = f_ij(a_i, a_j)     (3)
        ∀(i, j) ∈ η² | i ≠ j; ∀(a_i, a_j) ∈ Ω²;

where f_i(a_i) is the empirical frequency of value a_i on feature i and f_ij(a_i, a_j) is the empirical joint frequency of the pair of values (a_i, a_j) of features i and j. Note that constraints (2) and (3) force the model P to reproduce the single and joint empirical frequency counts as marginals. This way, the model is guaranteed to be coherent with the empirical data.

The single and joint empirical frequencies f_i(a_i) and f_ij(a_i, a_j) are obtained from the set S by counting occurrences of a given feature value a_i or feature value pair (a_i, a_j), respectively, and dividing by the total number of flows in S. Since the set S is finite and much smaller than K, inferences based on S are subject to undersampling effects. Following the theoretical framework proposed in [34], we add pseudocounts to the empirical frequencies to limit undersampling effects by performing the following operations:

    f_i(a_i) ← (1 − α) f_i(a_i) + α/Q                                    (4)
    f_ij(a_i, a_j) ← (1 − α) f_ij(a_i, a_j) + α/Q²                       (5)

where (a_i, a_j) ∈ Ω² and 0 ≤ α ≤ 1 is a parameter defining the weight of the pseudocounts. The introduction of pseudocounts is equivalent to assuming that S is extended with a fraction of flows with uniformly sampled features.

The proposed maximization can be solved using a Lagrangian function, as presented in [36], yielding the following Boltzmann-like distribution:

    P*(a_k1...a_kN) = e^{−H(a_k1...a_kN)} / Z                            (6)

where

    H(a_k1...a_kN) = − Σ_{i,j | i<j} e_ij(a_ki, a_kj) − Σ_i h_i(a_ki)    (7)

is the Hamiltonian of flow k.
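As a concrete illustration of eqs. (4)-(5), the single and joint frequencies with pseudocounts can be computed as below. This is a standard-library sketch under our own conventions (flows already integer-encoded in 1..Q, as in the 'TCP'/'UDP' example above), not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

def empirical_frequencies(flows, n_features, Q, alpha=0.5):
    """Single and joint feature-value frequencies regularized with
    pseudocounts, as in eqs. (4)-(5):
        f_i(a)    <- (1 - alpha) * f_i(a)    + alpha / Q
        f_ij(a,b) <- (1 - alpha) * f_ij(a,b) + alpha / Q**2
    `flows` is a list of tuples of integer-encoded feature values in 1..Q
    (e.g., {'TCP': 1, 'UDP': 2} with Q = 4 leaves codes 3 and 4 unobserved).
    """
    n = len(flows)
    single = Counter()  # (i, a_i) -> occurrence count
    joint = Counter()   # (i, j, a_i, a_j) -> occurrence count, with i < j
    for flow in flows:
        for i, ai in enumerate(flow):
            single[(i, ai)] += 1
        for (i, ai), (j, aj) in combinations(enumerate(flow), 2):
            joint[(i, j, ai, aj)] += 1
    f_i = {(i, a): (1 - alpha) * single[(i, a)] / n + alpha / Q
           for i in range(n_features) for a in range(1, Q + 1)}
    f_ij = {(i, j, a, b): (1 - alpha) * joint[(i, j, a, b)] / n + alpha / Q**2
            for i, j in combinations(range(n_features), 2)
            for a in range(1, Q + 1) for b in range(1, Q + 1)}
    return f_i, f_ij
```

Because the pseudocount mass α/Q is spread uniformly over all Q values, each feature's single-frequency distribution still sums to one, matching the interpretation of S being extended with uniformly sampled flows.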

Here, Z (eq. (6)) is the partition function that normalizes the distribution. Since in this work we are not interested in obtaining individual flow probabilities, Z is not required and, as a consequence, its calculation is omitted. Our objective is to calculate individual flow energies, i.e., individual Hamiltonians as determined in eq. (7).

Note that the Hamiltonian, as presented above, is fully determined by the Lagrange multipliers e_ij(·) and h_i(·) associated with constraints (2) and (3), respectively. Within the Potts model framework, the Lagrange multipliers have a special meaning, with {e_ij(a_i, a_j) | (a_i, a_j) ∈ Ω²} being the set of all possible coupling values between features i and j, and {h_i(a_i) | a_i ∈ Ω} the set of possible local fields associated with feature i.

Inferring the local fields and pairwise couplings is difficult, since the number of parameters exceeds the number of independent constraints. Due to the physical properties of interacting spins, it is possible to infer the pairwise coupling values e_ij(a_i, a_j) using a Gaussian approximation. Assuming that the same properties hold for flow features, we infer the coupling values as follows:

    e_ij(a_i, a_j) = −(C⁻¹)_ij(a_i, a_j),                                (8)
        ∀(i, j) ∈ η², ∀(a_i, a_j) ∈ Ω², a_i, a_j ≠ Q

where

    C_ij(a_i, a_j) = f_ij(a_i, a_j) − f_i(a_i) f_j(a_j)                  (9)

is the covariance matrix obtained from the single and joint empirical frequencies. Taking the inverse of the covariance matrix is a well-known procedure in statistics to remove the effect of indirect correlations in the data [37]. Now, it is important to clarify that the number of independent constraints in eq. (2) and eq. (3) is actually (N(N−1)/2)(Q−1)² + N(Q−1), even though the model in eq. (6) has (N(N−1)/2)Q² + NQ parameters. So, without loss of generality, we set the parameters involving the last state Q to zero, and the local fields are inferred using a mean-field approximation, in which the interaction of a feature with its neighbors is replaced by an approximate interaction with an averaged feature, yielding an approximate value for the associated local field.

For further details about these calculations, please refer to [33]. Now that all the model parameters are known, it is possible to calculate a given flow's energy according to eq. (7). In the following, we present the implementation of this theoretical framework to perform two-class (benign and malicious) flow classification, i.e., Energy-based Flow Classification (EFC).

B. Energy-based flow classification

The energy of a given flow can be calculated according to eq. (7) based on the values of its features and the parameters of the statistical model inferred in Section IV-A. A given flow's energy is the negative sum of the couplings and local fields associated with its features, according to a given statistical model. This means that a flow that resembles the ones used to infer the model is likely to have low energy.

Since EFC is an anomaly-based classifier, the statistical model used for classification is inferred based only on benign flow samples. We would then expect the energies of benign samples to be lower than the energies of malicious samples. In other words, the energy value of a given flow captures how dissimilar that flow is to the set of known benign flows used to infer the model in the training phase. In terms of frequencies, this means that if a given flow presents feature value combinations that are very frequent in benign flow samples, its energy will be low. In this sense, it is possible to classify flow samples as benign or malicious based on a chosen energy threshold. The classification is performed by stating that samples with energy smaller than the threshold are benign, and samples with energy greater than or equal to the threshold are malicious. Note that the threshold for classification can be chosen in different ways, and it can be static or dynamic. In
ei, j (ai , Q) = ei, j (Q, a j ) = hi (Q) = 0 (10) this work, we will consider a static threshold.
Algorithm 1 shows the implementation of EFC. In lines 2-
Thus, in eq. (8) there is no need to calculate ei, j (ai , a j )in case 5, the statistical model for the sampled flows is inferred, as
ai or a j is equal to Q [34]. Afterwards, local fields hi (ai ) can described by eqs. (4), (5), (8) and (12). Afterward, on lines 6-
be inferred using a mean-field approximation [38]: 27, the classifier monitors the network waiting for a captured
! flow. When a flow is captured, its energy is calculated on lines
fi (ai )
= exp hi (ai ) + ∑ ei j (ai , a j ) f j (a j ) , (11) 9-20, according to the Hamiltonian in eq. (7). The computed
fi (Q) j,a j flow energy is compared to a known threshold (cutoff ) value
∀i ∈ η, ai ∈ Ω, ai 6= Q on line 21. In case the energy falls above the threshold, the
flow is classified as malicious and should be forwarded to deep
where fi (Q) is the frequency of the last element ai = Q packet inspection (line 23) for assessment. Otherwise, the flow
for any feature i used for normalization. It is also worth is released, and the classifier waits for another flow.
mentioning that the element Q is arbitrarily selected and could It is essential to highlight that the time complexity of the
be replaced by any other value in {1. . . Q} as long as the training step of EFC is O((M × Q)3 + N × M 2 × Q2 ), where
selected element is kept the same for calculations of the local N is the number of samples, M is the number of features,
fields of every feature i ∈ η. Note that in eq. (11) the empirical and Q is the size of the alphabet. Meanwhile, the complexity
single frequencies fi (ai ) and the coupling values ei j (ai , a j ) are of the classification step for each sample is O(M 2 ). It means
known, yielding: that, in both steps, the complexity is more dependant on the
number of features chosen, which can be kept small by using
 
fi (ai )
hi (ai ) = ln − ∑ ei j (ai , a j ) f j (a j ) (12) a feature selection mechanism, e.g., Principal Component
fi (Q) j,a j
Analysis (PCA). However, we do not currently explore any
In the mean-field approximation presented above, the inter- feature selection mechanisms because we consider it to be out
of scope of this work, in which the main aim is only to present a first version of our newly proposed classifier for NIDS.

Algorithm 1 Energy-based Flow Classifier
Input: benign_flows (K × N), Q, α, cutoff
 1: import all model inference functions
 2: f_i ← SiteFreq(benign_flows, Q, α)
 3: f_ij ← PairFreq(benign_flows, f_i, Q, α)
 4: e_ij ← Couplings(f_i, f_ij, Q)
 5: h_i ← LocalFields(e_ij, f_i, Q)
 6: while Scanning the Network do
 7:     flow ← wait_for_incoming_flow()
 8:     e ← 0
 9:     for i ← 1 to N do
10:         a_i ← flow[i]
11:         for j ← i + 1 to N do
12:             a_j ← flow[j]
13:             if a_i ≠ Q and a_j ≠ Q then
14:                 e ← e − e_ij[i, a_i, j, a_j]
15:             end if
16:         end for
17:         if a_i ≠ Q then
18:             e ← e − h_i[i, a_i]
19:         end if
20:     end for
21:     if e ≥ cutoff then
22:         stop_flow()
23:         forward_to_DPI()
24:     else
25:         release_flow()
26:     end if
27: end while

In Table V, it is possible to see that EFC has a low training cost, linear in the number of samples (N), when compared to ML-based classifiers such as Decision Tree (DT), Random Forest (RF) and Support Vector Machine (SVM). EFC's training complexity is considered to be dominated by the term NM², because the number of training samples is expected to be much bigger than both the number of features (M) and the size of the alphabet (Q), which means that the term (MQ)³ is likely not dominant over NM². Considering the implementation shown in this section, in the following we present the results obtained using EFC and ML algorithms in classification experiments.

Table V
TRAINING COMPLEXITY OF DIFFERENT ML ALGORITHMS [39] AND EFC

Algorithm | Time complexity | Notes
ANN       | O(EMNK)         | E: number of epochs; K: number of neurons
DT        | O(MN log(N))    |
KNN       | O(N log(K))     | K: number of neighbors
RF        | O(TMN log(N))   | T: number of trees
SVM       | O(N²)           |
EFC       | O(NM²)          |

V. RESULTS

In this section, we present the results obtained for EFC and ML-based classifiers in different binary classification experiments considering three different datasets, i.e., CIDDS-001, CICIDS17, and CICDDoS19. First, we show that EFC can separate benign from malicious flows based on their energies, a result that is consistent for all considered datasets. Then, we present EFC's classification performance and compare it to the classification performance of ML-based classifiers in different experiments.

It is important to highlight that the classification experiments we perform in this work were designed not only to assess the performance of different classifiers, but also to investigate their capability of adaptation to different domains, i.e., data distributions. Hence, we performed two kinds of experiments: training/testing in the same domain, and training/testing in different domains. For training/testing in the same domain, in each experiment we assessed the average performance of the classifiers over ten different test sets, containing 10,000 benign and 10,000 malicious samples each, randomly selected from the full dataset. Models were inferred based on 80% of each test set and tested on the remaining 20%. The inferred models were then used, for each experiment, to assess the performance of the classifiers over another ten test sets composed of 2,000 benign and 2,000 malicious samples from another domain (data distribution).

A. EFC characterization

To assess EFC's capability to correctly separate benign from malicious traffic flow samples, we performed classification experiments considering the CIDDS-001, CICIDS17 and CICDDoS19 datasets. First, we inferred models based on benign samples from the OpenStack (simulated) environment within the CIDDS-001 dataset. These models were used to calculate the energy of different benign and malicious flow samples also coming from the simulated traffic. Figure 2A shows energy values of 40,000 classified flow samples, a merge of the results obtained over ten randomly sampled test sets, as described in the last paragraph of the previous section. The statistical model used to calculate the energies in each test set was inferred based on 8,000 benign flows randomly sampled from the simulated traffic. Flow samples with energy values falling above the energy cutoff, defined as the 95th percentile of the benign traffic training distribution (red dashed line), would be classified as malicious, while the remaining samples would be classified as benign. It is possible to observe that the separation between the two flow classes is clear, i.e., the energy distribution of tested benign flows falls mostly on the left side of the cutoff line, while the energy distribution of tested malicious flows falls mostly on the right side of the cutoff line, as expected.

Figure 2. Energy histograms of benign (n = 20,000 in each plot) and malicious (n = 20,000 in each plot) flow samples obtained in the testing phase of classification experiments performed over the CIDDS-001 (A), CICIDS17 (B) and CICDDoS19 (C) datasets. In each panel, flow energy is on the horizontal axis and probability on the vertical axis. The energy threshold for classification is shown as a red dashed line and corresponds to the 95th percentile of the energy distribution obtained in the training phase.

Figure 2B-C shows the results of an analogous experiment performed on the remaining two datasets. Similarly to what is observed for CIDDS-001, when trained on CICIDS17, EFC is also capable of clearly separating the two classes (Figure 2B). This means that the benign energy histogram of tested samples falls mostly on the left side of the cutoff
line (95th percentile of the training distribution), while the malicious energy histogram of tested samples falls mainly on the right side of the cutoff line. Again, the same result can be observed for the CICDDoS19 dataset (Figure 2C). It is interesting to observe that, although the benign flow energy histograms look similar in terms of variance for the three datasets, the malicious flow energy histograms vary. In CIDDS-001, the malicious histogram has very low variance, reflecting the fact that this dataset contains only four classes of attacks and is highly imbalanced, while in CICIDS17 and CICDDoS19 the malicious energy histograms have a broader spread, reflecting the greater variability of malicious flows that exists in those datasets.

The white box nature of the statistical model inferred by EFC is demonstrated in Figure 3, where the energy of different attack classes is broken down into the individual contributions of each pair of features. As shown, for a given attack class, it is possible to identify which combination of features is contributing the most to its abnormality (red squares) and which pairs of features are similar to normal traffic (blue squares) and might be confounding the model. It is interesting to note that different kinds of attacks are characterized by different combinations of abnormal feature pairs, as expected. For instance, the most abnormal thing about DoS attacks is the combination of number of packets and duration, while for port scan attacks it is the combination of source and destination ports. This analysis was done considering only the couplings and not the local fields. An analogous breakdown can be done for individual flow samples, allowing for the understanding of which features cause a specific sample to be classified as malicious or benign.

Figure 3. Individual contributions of each feature pair to the total energy of each attack type within the CIDDS-001 dataset: brute force (n = 150), ping scan (n = 100), port scan (n = 800) and DoS (n = 8,950). The heatmap shows the energetic difference ∆e that each pair of features has in relation to the average expected energy value of that pair in benign flows (n = 10,000), calculated as ∆e = ē_attack − ē_benign.

In summary, the results presented in this subsection show that EFC can correctly discriminate between the two flow classes considered, i.e., benign and malicious, and the results are consistent for all datasets considered. In addition, it was shown how the total energy of different attack classes can be broken down and analyzed in detail. This is illustrative of the white box nature of the statistical model inferred by EFC. In the following, classification results are shown for different classifiers and compared with the results obtained for EFC.

B. Comparative analysis of EFC's performance

We compared EFC to five different ML classifiers: K-Nearest Neighbors (KNN) [40], Decision Tree (DT) [41], [42], Multilayer Perceptron (MLP) [43], Naive Bayes (NB) [44], and Support Vector Machine (SVM) [45], all deployed with their default scikit-learn configurations (Scikit-learn library: https://scikit-learn.org). Additionally, two ensemble methods, namely AdaBoost (AB) [46] and Random Forest (RF) [47], were considered, also with default scikit-learn parameters. Flow features were only discretized for EFC (Table VI), since discretization would impair the performance of most ML algorithms. The metrics used to compare the results were the F1 score and the area under the ROC curve (AUC). The first metric, the F1 score, is the harmonic mean of the Precision and the Recall, i.e.,

F1 = 2 / (Precision⁻¹ + Recall⁻¹) = 2 · Precision · Recall / (Precision + Recall)    (13)

where Precision = TP/(TP + FP), Recall = TP/(TP + FN), TP are the true positives, i.e., malicious traffic classified
as malicious, FP are the false positives, i.e., benign traffic classified as malicious, and FN are the false negatives, i.e., malicious traffic classified as benign. The second metric, the area under the ROC curve (AUC), is one of the most widespread evaluation metrics for binary classifiers [48], [49]. The ROC curve is constructed by plotting the true positive rate (TPR) against the false positive rate (FPR) at different classification thresholds. The AUC is thus the probability that a randomly chosen positive example will receive a higher score than a randomly chosen negative one. One of the main advantages of the AUC is that it is invariant to changes in class distribution: the ROC curve will not change if the class distribution changes in a test set but the underlying conditional distributions from which the data are drawn stay the same [50], [49]. Since we are interested in evaluating domain adaptation, this metric is particularly suitable for this work.

Table VI shows the classes considered for feature discretization on the CIDDS-001 dataset. Since TCP Flags is the discrete feature with the most possible values (32 possibilities), the alphabet size Q was set to 32. The values of each continuous feature were clustered into a certain number of classes (or bins), up to Q classes. Classes were determined in such a way that the number of values within each class was similar for all classes. Features within the CICIDS17 and CICDDoS19 datasets were also discretized in such a way that the number of values within each bin was similar for all bins. These discretizations are not shown here because of the high number of features (around 80) in these datasets.

Table VI
CLASSES CONSIDERED FOR FEATURE DISCRETIZATION ON CIDDS-001

Feature         | List of classes upper limits
Duration        | 0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.01, 0.04, 1, 10, 100, ∞
Protocol        | TCP, UDP, GRE, ICMP, IGMP
Src Port        | 50, 60, 100, 400, 500, 40000, 60000, ∞
Dst Port        | 50, 60, 100, 400, 500, 40000, 60000, ∞
Num. of Bytes   | 50, 60, 70, 90, 100, 110, 200, 300, 400, 500, 700, 1000, 5000, ∞
Num. of Packets | 2, 3, 4, 5, 6, 7, 10, 20, ∞
TCP Flags       | {(f1, f2, f3, f4, f5) | fi ∈ {0, 1}}

Table VII
AVERAGE COMPOSITION OF EACH OF THE TEST SETS IN EXPERIMENT 1

CIDDS-001 OpenStack        CIDDS-001 real traffic
Label      | Number        Label      | Number
normal     | 10,000        unknown    | 2,000
dos        | 9,800         suspicious | 2,000
pingScan   | 20
portScan   | 150
bruteForce | 30
Total      | 20,000        Total      | 4,000

Table VIII
AVERAGE COMPOSITION OF EACH OF THE TEST SETS IN CROSS-DATASET EXPERIMENTS 2 AND 3

CICIDS17                      CICDDoS19
Label            | Number     Label          | Number
benign           | 10,000     benign         | 10,000
FTP Patator      | 170        DrDoS DNS      | 890
SSH Patator      | 80         DrDoS LDAP     | 370
DDoS             | 2,740      DrDoS MSSQL    | 890
PortScan         | 1,060      DrDoS NetBIOS  | 880
Bot              | 110        DrDoS NTP      | 890
Infiltration     | 20         DrDoS SNMP     | 200
Brute force      | 50         DrDoS SSDP     | 890
SQL injection    | 10         DrDoS UDP      | 890
XSS              | 10         Syn            | 890
DoS Hulk         | 2,730      TFTP           | 890
DoS GoldenEye    | 2,730      LDAP           | 120
DoS Slowloris    | 120        NetBIOS        | 140
DoS Slowhttptest | 170        MSSQL          | 660
                              Portmap        | 400
                              UDP            | 880
                              UDPLag         | 120
Total            | 20,000     Total          | 20,000
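The equal-frequency binning behind Table VI can be sketched in a few lines: a continuous value is mapped to the index of the first class whose upper limit it does not exceed, and bin edges for a new feature can be taken from training-data quantiles so that every class holds a similar number of values. The helper below is our own illustrative reading of that procedure, not the authors' code; the example limits are the Duration bounds from Table VI.

```python
import bisect

def discretize(value, upper_limits):
    # Map a value to the index of the first class whose upper limit is
    # greater than or equal to it; the last class is open-ended (infinity).
    return bisect.bisect_left(upper_limits, value)

def quantile_edges(values, Q):
    # Choose upper limits from the sorted training values so that each of
    # the Q classes receives a similar number of values (equal-frequency bins).
    s = sorted(values)
    n = len(s)
    edges = [s[min(n - 1, (k + 1) * n // Q)] for k in range(Q - 1)]
    return edges + [float("inf")]
```

For the Duration feature of Table VI, for instance, a value of 5 seconds falls into the class bounded above by 10.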
To evaluate EFC's performance compared to other classifiers, we performed three independent experiments. The first experiment was performed on CIDDS-001: training was performed on simulated flow samples, while testing was performed on both simulated and real flow samples captured on an external server. The second and third experiments were cross-dataset experiments performed on CICIDS17 and CICDDoS19. In the former, training was performed on CICIDS17, with testing on both datasets, while in the latter, training was performed on CICDDoS19, with testing on both datasets.

Essentially, in each experiment we measured the performance of the classifiers when trained and tested in the same domain and when trained in one domain and tested in a different one. The performance was measured as the average over ten different test sets, composed of 10,000 benign and 10,000 malicious samples each, randomly selected from the full dataset, with 80% of each test set being used for training and 20% for testing. The test sets containing samples from a different domain were not used for training, hence they were composed of only 2,000 benign and 2,000 malicious samples, randomly selected from the full dataset. EFC's cutoff was defined to be at the 95th percentile of the energy distribution obtained in the training phase, based solely on benign samples. This means that we used a purely statistical threshold, computed only from benign training traffic, without the adjustments based on malicious samples that other ML algorithms require. The average composition of the test sets is shown in Tables VII and VIII.

Table IX shows the average performance and standard error (95% confidence interval) of each classifier in the first experiment, considering the CIDDS-001 dataset. When trained and tested in the same simulated environment, DT is the algorithm presenting the best performance, with an F1-score of 0.999 ± 0.000 and 0.999 ± 0.000 AUC. EFC also performs well, being the second best in terms of AUC (0.997 ± 0.001).
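Both evaluation metrics follow directly from their definitions above: eq. (13) for the F1 score and, for the AUC, its probabilistic reading (the probability that a randomly chosen positive outscores a randomly chosen negative, counting ties as one half). The sketch below is illustrative only; scikit-learn's metrics module provides equivalent, production-ready implementations.

```python
def f1_from_counts(tp, fp, fn):
    # Eq. (13): harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_by_rank(pos_scores, neg_scores):
    # AUC = P(random positive scores higher than random negative); ties count 1/2.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The pairwise-comparison form of the AUC makes its class-distribution invariance explicit: duplicating either class rescales numerator and denominator alike.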
When trained in the simulated environment and tested in the real environment, EFC outperforms the other classifiers (both simple and ensemble methods) in F1-score (0.675 ± 0.009) and AUC (0.720 ± 0.001). It is noteworthy that all algorithms present a considerable degradation in performance when tested in a different domain, showing how sensitive the inferred models are to changes in data distribution.

Table IX
AVERAGE CLASSIFICATION PERFORMANCE AND STANDARD ERROR (95% CI) - TRAINING PERFORMED ON CIDDS-001 SIMULATED TRAFFIC

           Train/Test simulated            Train simulated/Test real
Classifier | F1 score      | AUC           | F1 score      | AUC
NB         | 0.043 ± 0.016 | 0.502 ± 0.004 | 0.057 ± 0.024 | 0.517 ± 0.016
KNN        | 0.988 ± 0.001 | 0.994 ± 0.000 | 0.118 ± 0.014 | 0.524 ± 0.001
DT         | 0.999 ± 0.000 | 0.999 ± 0.000 | 0.556 ± 0.007 | 0.619 ± 0.000
SVM        | 0.805 ± 0.003 | 0.951 ± 0.002 | 0.531 ± 0.005 | 0.707 ± 0.003
MLP        | 0.979 ± 0.002 | 0.993 ± 0.001 | 0.151 ± 0.016 | 0.596 ± 0.002
EFC        | 0.975 ± 0.001 | 0.997 ± 0.001 | 0.675 ± 0.009 | 0.720 ± 0.001
Ensemble
AB         | 0.999 ± 0.000 | 1.000 ± 0.000 | 0.594 ± 0.022 | 0.630 ± 0.000
RF         | 0.999 ± 0.000 | 1.000 ± 0.000 | 0.269 ± 0.018 | 0.714 ± 0.000

Table X shows the results of experiment two, which was performed on the CICIDS17 and CICDDoS19 datasets. When trained and tested on CICIDS17, DT is again the algorithm presenting the best performance, both in terms of F1-score (0.994 ± 0.001) and AUC (0.994 ± 0.001), though indistinguishable from the MLP AUC (0.993 ± 0.001). Notably, when trained on CICIDS17 and tested on CICDDoS19, EFC outperformed the other simple algorithms in both F1-score (0.787 ± 0.004) and AUC (0.781 ± 0.003). Again, it is possible to see that EFC is the best simple algorithm at adapting to a different data distribution when evaluating both metrics. However, when also considering ensemble methods, RF outperforms EFC, which becomes second best in terms of AUC.

Table X
AVERAGE CLASSIFICATION PERFORMANCE AND STANDARD ERROR (95% CI) - TRAINING PERFORMED ON CICIDS17

           Train/Test CICIDS17             Train CICIDS17/Test CICDDoS19
Classifier | F1 score      | AUC           | F1 score      | AUC
NB         | 0.344 ± 0.049 | 0.413 ± 0.021 | 0.344 ± 0.049 | 0.413 ± 0.049
KNN        | 0.961 ± 0.001 | 0.987 ± 0.001 | 0.457 ± 0.046 | 0.771 ± 0.001
DT         | 0.994 ± 0.001 | 0.994 ± 0.001 | 0.168 ± 0.090 | 0.525 ± 0.001
SVM        | 0.930 ± 0.003 | 0.974 ± 0.001 | 0.264 ± 0.025 | 0.664 ± 0.003
MLP        | 0.961 ± 0.003 | 0.993 ± 0.001 | 0.221 ± 0.033 | 0.775 ± 0.003
EFC        | 0.898 ± 0.003 | 0.975 ± 0.001 | 0.787 ± 0.004 | 0.781 ± 0.003
Ensemble
AB         | 0.991 ± 0.002 | 1.000 ± 0.000 | 0.228 ± 0.055 | 0.698 ± 0.002
RF         | 0.997 ± 0.001 | 1.000 ± 0.000 | 0.021 ± 0.003 | 0.867 ± 0.001

Further, Table XI shows the results of experiment three, which was also performed on the CICIDS17 and CICDDoS19 datasets. Once more, DT outperformed the other classifiers when training and testing on the same dataset, with an F1-score of 0.998 ± 0.000 and an AUC of 0.998 ± 0.000. When tested on the CICIDS17 dataset, though, EFC achieved the best F1-score (0.641 ± 0.002), while KNN was the best in terms of AUC (0.670 ± 0.002). EFC's AUC (0.664 ± 0.002) was the second best, which means that EFC's performance was good when taking both the F1-score and the AUC into consideration. Even though this adaptation seems more challenging than the previous ones, EFC's performance was consistent across all the experiments performed.

Table XI
AVERAGE CLASSIFICATION PERFORMANCE AND STANDARD ERROR (95% CI) - TRAINING PERFORMED ON CICDDoS19

           Train/Test CICDDoS19            Train CICDDoS19/Test CICIDS17
Classifier | F1 score      | AUC           | F1 score      | AUC
NB         | 0.590 ± 0.006 | 0.428 ± 0.007 | 0.590 ± 0.006 | 0.428 ± 0.006
KNN        | 0.960 ± 0.002 | 0.984 ± 0.001 | 0.397 ± 0.043 | 0.670 ± 0.002
DT         | 0.998 ± 0.000 | 0.998 ± 0.000 | 0.259 ± 0.012 | 0.476 ± 0.000
SVM        | 0.933 ± 0.002 | 0.976 ± 0.002 | 0.239 ± 0.009 | 0.538 ± 0.002
MLP        | 0.968 ± 0.002 | 0.993 ± 0.001 | 0.227 ± 0.011 | 0.451 ± 0.002
EFC        | 0.916 ± 0.002 | 0.981 ± 0.001 | 0.641 ± 0.002 | 0.664 ± 0.002
Ensemble
AB         | 0.995 ± 0.001 | 1.000 ± 0.000 | 0.270 ± 0.013 | 0.660 ± 0.001
RF         | 0.997 ± 0.000 | 1.000 ± 0.000 | 0.089 ± 0.032 | 0.623 ± 0.000

Taken as a whole, the results presented in this subsection show that EFC is, on average, better at adapting to other domains than classical ML-based classifiers. In addition to that, it is possible to see that EFC achieves AUC values similar to the best ML algorithms when trained and tested in the same domain, showing that it is capable of performing well even when trained with only half of the information (benign data only) compared to the other classifiers (which use both malicious and benign data). Not using malicious samples in the training phase is likely the reason why EFC is so good at adapting to other domains. EFC's increased capability for domain adaptation when there is a significant difference in data distribution is a highly desirable trait in network flow-based classifiers, since changes in traffic composition are expected to be very frequent, and new kinds of attacks are generated continuously.

Finally, we believe EFC to be an interesting tool for network managers, given (i) its more realistic requirements for training (only benign traffic, which can be easily captured in the target network), (ii) its adaptability when faced with changes in traffic patterns, and (iii) the possibility to identify which flow features cause a specific network flow to be classified as benign or malicious. However, there is still great room for improvement. One possibility would be to incorporate EFC as the first step of a two-step NIDS, in which the flow samples detected as malicious by EFC would then be sent to deep packet inspection. Another possibility is to implement a dynamic threshold that would adapt to different network situations, improving classification accuracy. There is also the possibility of performing feature selection prior to model inference, which would greatly reduce the time spent in the model inference phase and possibly also improve classification accuracy. Finally, it would be possible to implement EFC to perform traffic classification at different points in a distributed network. In the following, we present our conclusions and future work directions.
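Before concluding, it is worth noting that EFC's decision rule is compact enough to restate end to end: the Hamiltonian of eq. (7) plus a threshold set at the 95th percentile of the benign training energies. The sketch below is illustrative, with hypothetical, hand-made parameters (couplings e_ij, fields h_i), not the authors' implementation; the dynamic-threshold variant discussed above could simply recompute the same percentile over a sliding window of recent benign energies.

```python
import numpy as np

def flow_energy(flow, e_ij, h_i, Q):
    # Eq. (7): H = -sum_{i<j} e_ij(a_i, a_j) - sum_i h_i(a_i); the gauge
    # state Q-1 contributes nothing, since its parameters are zero (eq. (10)).
    H = 0.0
    for i, a in enumerate(flow):
        if a == Q - 1:
            continue
        H -= h_i[i, a]
        for j in range(i + 1, len(flow)):
            if flow[j] != Q - 1:
                H -= e_ij[i, j, a, flow[j]]
    return H

def fit_cutoff(benign_train_energies, pct=95.0):
    # Static threshold: 95th percentile of the benign training energies.
    return np.percentile(benign_train_energies, pct)

def classify(flow, e_ij, h_i, Q, cutoff):
    # Lines 21-26 of Algorithm 1: at or above the cutoff -> malicious
    # (forward to deep packet inspection); below -> benign (release).
    return "malicious" if flow_energy(flow, e_ij, h_i, Q) >= cutoff else "benign"
```

With two features, Q = 2 and a coupling favouring the benign pattern (0, 0), a flow made only of unseen (gauge) values has energy zero, lands above any negative cutoff, and is flagged as malicious.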
VI. CONCLUSION

In this work, we present a new flow-based classifier for network intrusion detection called the Energy-based Flow Classifier (EFC). In the training phase, EFC infers a statistical model based solely on benign traffic samples. Afterward, this statistical model is used to classify network flows as benign or malicious based on "energy" values. Our results show that EFC is capable of correctly performing network flow binary classification considering three different datasets. The F1 score (around 97% at best) and AUC (around 99% at best) values obtained using EFC are comparable to the values obtained with other classical ML-based classifiers, such as k-nearest neighbors, decision tree and multilayer perceptron, even though EFC uses only half of the information in the training phase compared to the other algorithms.

In addition to that, we analyzed different classifiers in terms of their capability for domain adaptation and observed that EFC is better suited to this task than classical ML-based algorithms. In all the experiments performed to evaluate this over different datasets, EFC outperformed the other classifiers in F1-score and was among the best ones in AUC. We understand that EFC's capability for domain adaptation is probably linked to the fact that the model inference is based only on benign samples, which helps prevent overfitting.

Considering the advantages presented, we believe EFC to be a promising algorithm to perform flow-based traffic classification. Nevertheless, despite the promising results achieved, there is still room for further testing and improvement. In future work, we aim at performing a more comprehensive investigation of EFC's applicability to real-world data and different contexts, such as fraud analysis in bank data. We are already working on a multiclass version of EFC that will be capable of identifying different kinds of known attacks, as well as unknown suspicious flow samples. Finally, we will investigate which improvements can be attained by using a dynamic threshold in EFC and by performing feature selection prior to model inference.

ACKNOWLEDGMENT

The authors would like to thank Luís Paulo Faina Garcia for helping with dataset analysis. Matt Bishop was supported by the National Science Foundation under Grant Number OAC-1739025. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. João Gondim gratefully acknowledges the support from Project "EAGER: USBRCCR: Collaborative: Securing Networks in the Programmable Data Plane Era" funded by NSF (National Science Foundation), RNP (Brazilian National Research Network) and GigaCandanga.

REFERENCES

[1] Symantec, "Internet Security Threat Report (ISTR) 2019," Apr. 2019. [Online]. Available: https://www.symantec.com/security-center/threat-report
[2] M. Pradhan, C. K. Nayak, and S. K. Pradhan, "Intrusion detection system (IDS) and their types," in Securing the Internet of Things: Concepts, Methodologies, Tools, and Applications. IGI Global, 2020, pp. 481–497.
[3] M. Ring, S. Wunderlich, D. Scheuring, D. Landes, and A. Hotho, "A survey of network-based intrusion detection data sets," Computers & Security, 2019.
[4] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller, "An overview of IP flow-based intrusion detection," IEEE Communications Surveys and Tutorials, vol. 12, no. 3, pp. 343–356, 2010.
[5] J. C. Correa Chica, J. C. Imbachi, and J. F. Botero Vega, "Security in SDN: A comprehensive survey," Journal of Network and Computer Applications, p. 102595, 2020.
[6] M. F. Umer, M. Sher, and Y. Bi, "Flow-based intrusion detection: Techniques and challenges," Computers and Security, vol. 70, pp. 238–254, Sep. 2017.
[7] A. Singla, E. Bertino, and D. Verma, "Overcoming the lack of labeled data: Training intrusion detection models using transfer learning," in 2019 IEEE International Conference on Smart Computing (SMARTCOMP). IEEE, 2019, pp. 69–74.
[8] K. Bartos, M. Sofka, and V. Franc, "Optimized invariant representation of network traffic for detecting unseen malware variants," in 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 807–822.
[9] H. Li, Z. Chen, R. Spolaor, Q. Yan, C. Zhao, and B. Yang, "DART: Detecting unseen malware variants using adaptation regularization transfer learning," in ICC 2019 - 2019 IEEE International Conference on Communications (ICC). IEEE, 2019, pp. 1–6.
[10] M. Zolanvari, M. A. Teixeira, and R. Jain, "Effect of imbalanced datasets on security of industrial IoT using machine learning," in 2018 IEEE International Conference on Intelligence and Security Informatics (ISI). IEEE, 2018, pp. 112–117.
[11] C. Rudin, "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead," Nature Machine Intelligence, vol. 1, no. 5, pp. 206–215, 2019.
[12] A. Holzinger, "From machine learning to explainable AI," in 2018 World Symposium on Digital Intelligence for Systems and Machines (DISA). IEEE, 2018, pp. 55–66.
[13] M. Ring, S. Wunderlich, D. Grüdl, D. Landes, and A. Hotho, "Flow-based benchmark data sets for intrusion detection," in Proceedings of the 16th European Conference on Cyber Warfare and Security. ACPI, 2017, pp. 361–369.
[14] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in ICISSP, 2018, pp. 108–116.
[15] I. Sharafaldin, A. H. Lashkari, S. Hakak, and A. A. Ghorbani, "Developing realistic distributed denial of service (DDoS) attack dataset and taxonomy," in 2019 International Carnahan Conference on Security Technology (ICCST). IEEE, 2019, pp. 1–8.
[16] R. Vinayakumar, K. Soman, and P. Poornachandran, "Evaluating effectiveness of shallow and deep networks to intrusion detection system," in 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2017, pp. 1282–1289.
[17] A. M. Mahfouz, D. Venugopal, and S. G. Shiva, "Comparative analysis of ML classifiers for network intrusion detection," in Fourth International Congress on Information and Communication Technology. Springer, 2020, pp. 193–207.
[18] S. Khan, E. Sivaraman, and P. B. Honnavalli, "Performance evaluation of advanced machine learning algorithms for network intrusion detection system," in Proceedings of International Conference on IoT Inclusive Life (ICIIL 2019), NITTTR Chandigarh, India. Springer, 2020, pp. 51–59.
[19] X. Tan, S. Su, Z. Huang, X. Guo, Z. Zuo, X. Sun, and L. Li, "Wireless sensor networks intrusion detection based on SMOTE and the random forest algorithm," Sensors, vol. 19, no. 1, p. 203, 2019.
[20] J. Kazemitabar, R. Taheri, and G. Kheradmandian, "A novel technique for improvement of intrusion detection via combining random forrest and genetic algorithm," Journal of Advanced Defence Science and Technology, 2019.
[21] T. T. Bhavani, M. K. Rao, and A. M. Reddy, "Network intrusion detection system using random forest and decision tree machine learning techniques," in First International Conference on Sustainable Technologies for Computational Intelligence. Springer, 2020, pp. 637–643.
[22] A. Verma and V. Ranga, "Statistical analysis of CIDDS-001 dataset for network intrusion detection systems using distance-based machine learning," Procedia Computer Science, vol. 125, pp. 709–716, 2018.
[23] M. Ring, D. Landes, and A. Hotho, "Detection of slow port scans in flow-based network traffic," PLoS ONE, vol. 13, no. 9, p. e0204507, 2018.
[24] R. Abdulhammed, M. Faezipour, A. Abuzneid, and A. AbuMallouh, "Deep and Machine Learning Approaches for Anomaly-Based Intrusion
Detection of Imbalanced Network Traffic,” IEEE Sensors Letters, vol. 3, no. 1, pp. 1–4, Jan. 2019.
[25] A. Yulianto, P. Sukarno, and N. A. Suwastika, “Improving adaboost-based intrusion detection system (ids) performance on cic ids 2017 dataset,” in Journal of Physics: Conference Series, vol. 1192, no. 1. IOP Publishing, 2019, p. 012018.
[26] D. Aksu, S. Üstebay, M. A. Aydin, and T. Atmaca, “Intrusion detection with comparative analysis of supervised learning techniques and fisher score feature selection algorithm,” in International Symposium on Computer and Information Sciences. Springer, 2018, pp. 141–149.
[27] J. Li, M. Liu, Z. Xue, X. Fan, and X. He, “Rtvd: A real-time volumetric detection scheme for ddos in the internet of things,” IEEE Access, vol. 8, pp. 36191–36201, 2020.
[28] Y. Jia, F. Zhong, A. Alrawais, B. Gong, and X. Cheng, “Flowguard: An intelligent edge defense mechanism against iot ddos attacks,” IEEE Internet of Things Journal, 2020.
[29] M. P. Novaes, L. F. Carvalho, J. Lloret, and M. L. Proença, “Long short-term memory and fuzzy logic for anomaly detection and mitigation in software-defined network environment,” IEEE Access, vol. 8, pp. 83765–83781, 2020.
[30] A. AlEroud and G. Karabatis, “A contextual anomaly detection approach to discover zero-day attacks,” in 2012 International Conference on Cyber Security, 2012, pp. 40–45.
[31] D. Plonka, “Flowscan: A network traffic flow reporting and visualization tool,” in LISA, 2000, pp. 305–317.
[32] A. H. Lashkari, G. Draper-Gil, M. S. I. Mamun, and A. A. Ghorbani, “Characterization of tor traffic using time based features,” in ICISSP, 2017, pp. 253–262.
[33] S. Cocco, C. Feinauer, M. Figliuzzi, R. Monasson, and M. Weigt, “Inverse statistical physics of protein sequences: A key issues review,” Reports on Progress in Physics, vol. 81, no. 3, p. 032601, Mar. 2018.
[34] F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. S. Marks, C. Sander, R. Zecchina, J. N. Onuchic, T. Hwa, and M. Weigt, “Direct-coupling analysis of residue coevolution captures native contacts across many protein families,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 49, pp. E1293–E1301, Dec. 2011.
[35] F. Y. Wu, “The Potts model,” Reviews of Modern Physics, vol. 54, no. 1, pp. 235–268, Jan. 1982.
[36] E. T. Jaynes, “Information theory and statistical mechanics. II,” Physical Review, vol. 108, no. 2, pp. 171–190, May 1957.
[37] B. Giraud, J. M. Heumann, and A. S. Lapedes, “Superadditive correlation,” Physical Review E, vol. 59, no. 5, p. 4983, 1999.
[38] A. Georges and J. S. Yedidia, “How to expand around mean-field theory using high-temperature expansions,” Journal of Physics A: Mathematical and General, vol. 24, no. 9, p. 2173, 1991.
[39] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2015.
[40] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Journal of the Royal Statistical Society. Series C (Applied Statistics), vol. 28, no. 1, pp. 100–108, 1979.
[41] J. R. Quinlan, “Simplifying decision trees,” International Journal of Man-Machine Studies, vol. 27, no. 3, pp. 221–234, 1987.
[42] P. H. Swain and H. Hauska, “The decision tree classifier: Design and potential,” IEEE Transactions on Geoscience Electronics, vol. 15, no. 3, pp. 142–147, 1977.
[43] W. S. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115–133, 1943.
[44] D. D. Lewis, “Naive (Bayes) at forty: The independence assumption in information retrieval,” in European Conference on Machine Learning. Springer, 1998, pp. 4–15.
[45] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[46] Y. Freund, R. E. Schapire et al., “Experiments with a new boosting algorithm,” in ICML, vol. 96. Citeseer, 1996, pp. 148–156.
[47] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, “Random forest: a classification and regression tool for compound classification and QSAR modeling,” Journal of Chemical Information and Computer Sciences, vol. 43, no. 6, pp. 1947–1958, 2003.
[48] N. Japkowicz and M. Shah, Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press, 2011.
[49] D. Brzezinski and J. Stefanowski, “Prequential AUC: properties of the area under the ROC curve for data streams with concept drift,” Knowledge and Information Systems, vol. 52, no. 2, pp. 531–562, 2017.
[50] S. Wu, P. Flach, and C. Ferri, “An improved model selection heuristic for AUC,” in European Conference on Machine Learning. Springer, 2007, pp. 478–489.

Camila F. T. Pontes is a student at the University of Brasilia (UnB), Brasilia, DF, Brazil. She received her M.Sc. degree in Molecular Biology in 2016 from UnB and is currently an undergraduate student at the Department of Computer Science (CIC/UnB). Her research interests are Computational and Theoretical Biology and Network Security.

Manuela M. C. de Souza is an undergraduate Computer Science student at the University of Brasilia (UnB), Brasilia, DF, Brazil. Her research interest is Network Security.

João J. C. Gondim was awarded an M.Sc. in Computing Science at Imperial College, University of London, in 1987 and a Ph.D. in Electrical Engineering at UnB (University of Brasilia, 2017). He is an adjunct professor at the Department of Computing Science (CIC) at UnB, where he is a tenured member of the faculty. His research interests are network, information, and cyber security.

Matt Bishop received his Ph.D. in computer science from Purdue University, where he specialized in computer security, in 1984. His main research area is the analysis of vulnerabilities in computer systems. The second edition of his textbook, Computer Security: Art and Science, was published in 2002 by Addison-Wesley Professional. He is currently a co-director of the Computer Security Laboratory at the University of California, Davis.

Marcelo Antonio Marotta is an adjunct professor at the University of Brasilia, Brasilia, DF, Brazil. He received his Ph.D. degree in Computer Science in 2019 from the Institute of Informatics (INF) of the Federal University of Rio Grande do Sul (UFRGS), Brazil. His research involves Heterogeneous Cloud Radio Access Networks, Internet of Things, Software Defined Radio, Cognitive Radio Networks, and Network Security.