Article
Hybrid Intrusion Detection System Based on Combination of
Random Forest and Autoencoder
Chao Wang 1,2 , Yunxiao Sun 1,2 , Wenting Wang 3 , Hongri Liu 1,4 and Bailing Wang 1,2, *
1 School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China
2 School of Cyber Science and Technology, Harbin Institute of Technology, Harbin 150001, China
3 State Grid Shandong Electric Power Company, Electric Power Research Institute, Jinan 250003, China
4 Weihai Cyberguard Technologies Co., Ltd., Weihai 264209, China
* Correspondence: [email protected]
Abstract: To cope with the rising threats posed by network attacks, machine learning-based intrusion
detection systems (IDSs) have been intensively researched. However, there are several issues that
need to be addressed. It is difficult to deal with unknown attacks that do not appear in the training
set, and as a result, poor detection rates are produced for these unknown attacks. Furthermore, IDSs
suffer from a high false positive rate. As different models learn data characteristics from different
perspectives, in this work we propose a hybrid IDS which leverages both random forest (RF) and
autoencoder (AE). The hybrid model operates in two steps. In particular, in the first step, we utilize
the probability output of the RF classifier to determine whether a sample is an attack. The
unknown attacks can be identified with the assistance of the probability output. In the second step,
an additional AE is coupled to reduce the false positive rate. To simulate an unknown attack in
experiments, we explicitly remove some samples belonging to one attack class from the training
set. Compared with various baselines, our suggested technique demonstrates a high detection rate.
Furthermore, the additional AE detection module decreases the false positive rate.
Keywords: intrusion detection; random forest; autoencoder; hybrid model; unknown attack
networks [12]. In datasets with both normal and attack samples, a classifier can find a
decision boundary between the normal and attack samples.
As network attack methods are becoming more complicated, however, it is challenging
to obtain samples of all attack types. When encountering unknown attacks during the
detection phase, the supervised classifier may not generalize well [13], which may result in
misclassification for these samples and decrease the detection rate. To tackle this challenge,
researchers have attempted to create an IDS model that trains on the normal data alone,
such as an autoencoder (AE) [14]. AEs employ the reconstruction error as the anomaly
score, where a sample with a higher score can be detected as an attack. However, due
to the lack of supervision by both normal and attack samples, it may not obtain the same
high performance as supervised algorithms, which may learn complex decision boundaries
between the normal and attack samples.
In this paper, we focus on the case where the attack samples available for training are
limited, because it is difficult to collect samples of all attack types. For example, a supervised
model may be trained on only some known attack types. As attack variants or new types of
attacks continue to emerge, the trained detection
model may not detect a novel attack. As a result, developing a robust IDS model with a
higher detection rate and a lower false positive rate becomes critical. With the consideration
that different techniques can learn the characteristics of data from different perspectives,
in this work, we propose a hybrid IDS that combines an RF and an AE. In particular,
the hybrid IDS comprises two steps in the detection phase. The first step is the application
of RF with probabilistic methods to detect attacks. Then, considering how to reduce the
false alarm rate further, we employ the AE module in the second step. The contributions of
this study can be listed as follows:
1. In the first step, we employ RF to identify attacks. Unlike commonly used strategies,
we employ the predicted probability to distinguish the samples. With a predefined
threshold, samples with probabilities higher than the threshold are identified as attacks. In
this manner, the RF can identify some unknown attacks.
2. In the second step, we combine another detector utilizing a different detection principle.
In detail, we apply an AE to recheck samples that have been predicted as
attacks by the RF classifier. Samples with a lower reconstruction error
can be reclassified as normal. This additional step decreases the false positive rate
even further.
3. To demonstrate the effectiveness of the proposed methods, we conduct experiments
on two intrusion detection datasets. In the experiments, we explicitly set some attacks as
unknown. The combined approach provides a greater detection rate and a lower false
positive rate compared with other baseline methods.
The remainder of this paper is organized as follows: we describe the relevant
work concerning the IDS in Section 2. The whole detection framework and corresponding
methodology are presented in Section 3. Section 4 demonstrates the performance of the
suggested approach via comprehensive tests. Finally, we draw relevant conclusions and
highlight avenues for further work in Section 5.
2. Related Work
The purpose of IDSs is the discovery of anomalous operations within the monitoring
environment. There are two ways to classify IDSs: based on the data source utilized or the
detection methods. According to the data source utilized in the detection engine, IDSs can
be divided into two categories: host-based IDSs (HIDS) and network-based IDSs (NIDS) [4].
The former utilizes data generated on one host, while the latter checks network traffic
packets transmitted within the network. In this study, we focus on machine learning-based
NIDS.
The overall process of machine learning can be summarized as two parts: training
and testing [1]. During the training phase, models are trained on the collected dataset and
learn the characteristics of the input features. After training, the model is deployed in the
Symmetry 2023, 15, 568 3 of 16
testing phase to examine anomalous samples. Many classic machine learning
methods have been applied to construct IDSs [15,16]. As an ensemble learning approach,
the RF classifier yields considerable detection performance [17]. It constructs numerous
decision trees (DTs) to obtain a higher detection rate than a single DT.
As issues such as high-dimensional data [18,19] and data imbalance [20] may arise,
researchers have proposed increasingly enhanced classifiers. To reduce the
impact of irrelevant features and enhance the detection rate, the authors of [21] selected
useful features first based on the correlation between the features and classified samples
using a combination of several distinct classifiers.
RF can be employed directly as a feature selection method. The authors of [22] applied
an RF to discover the optimal features for classification based on feature importance first.
After that, the selected features are utilized to train a support vector machine. There are
also some other hybrid models involving two parts. For example, the authors of [23] used
both AE and DNN to classify attacks. However, this method has difficulty with some
unknown attacks that do not appear in the training set.
In some situations, it is difficult to collect or simulate the attack samples [24]. It is
reasonable to employ some one-class classifiers to learn about the characteristics of network
traffic. One-class learning aims to build a profile of normal traffic. For example, the one-
class support vector machine (OCSVM) [25] attempts to distinguish between normal and
anomalous data by learning the hyperplane that has the maximum distance between the
normal samples and the origin [26]. In addition, the isolation forest (IF) algorithm can be
used to detect anomalies [27]. Furthermore, there are various works employing AEs [14].
AEs are generally used for feature extraction [28,29]; however, they can also be utilized for
anomaly detection [14].
In this work, we seek to develop an IDS with the objectives of a high detection rate and
a low false alarm rate. In a real deployment, unknown attacks exist, and
we find that some supervised classifiers may incorrectly classify unknown attacks
as normal. To solve this problem, we employ a probabilistic RF and an AE. One work
on fraud detection [30] is similar to ours in that it also utilizes RF and AE. However,
it employed the AE as a dimensionality reduction approach to extract representative
features, and it applied the RF in a probabilistic manner to overcome
the problem of data imbalance.
3. Proposed Methods
In this section, we describe the suggested model in detail. We introduce the employed
techniques first, i.e., RF and AE. Then, we merge these two methods to introduce the full
detection framework.
3.1. Random Forest
As illustrated in Figure 1, given the training set, there are M distinct DT classifiers.
To obtain final prediction results, majority voting is employed to aggregate the predictions
from each DT. In the following, we present the detailed training procedure for RF. Considering
a labeled dataset with samples {x_1, x_2, ..., x_N} and labels {y_1, y_2, ..., y_N}, where
N is the number of samples and every sample includes j features, we aim to train M distinct
DTs. The general steps can be summarized as follows:
(1) Sampling from the training set with N samples using the bootstrap with replacement.
(2) Construction of a DT classifier using the selected samples.
To construct one DT, we first select k features from the j features. The value of k is set
to sqrt(j). After that, we pick the best split feature from the chosen k features and divide the
node into two child nodes. The Gini impurity is utilized as the split criterion at each
node. These procedures are repeated to grow the tree as deep as possible.
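The construction steps above correspond closely to scikit-learn's RandomForestClassifier hyperparameters; the following is a minimal sketch on synthetic data (not the paper's datasets):

```python
# A minimal sketch on synthetic data; the hyperparameters mirror the
# construction steps described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # M decision trees
    bootstrap=True,       # sample the training set with replacement
    max_features="sqrt",  # examine k = sqrt(j) features at each split
    criterion="gini",     # Gini impurity as the split criterion
    max_depth=None,       # grow each tree as deep as possible
    random_state=0,
).fit(X, y)

print(len(rf.estimators_))  # 100
```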
In this study, we focus on binary classification. To predict an input sample, the RF
utilizes the votes of the trees in the forest weighted by their probability estimates [32]. Let
p denote the probability of being predicted as the attack. Then, the probability of being
predicted as normal is q and q = 1 − p. The predicted class probabilities of an input sample
are calculated as the mean predicted class probabilities of the trees in the forest. The class
probability of a single tree is the fraction of samples of the same class in a leaf node [32].
We utilize the “predict_proba” function in scikit-learn [32] to output the probability
that an object belongs to a certain class. Usually, the samples can be classified into one
class with the highest probability. However, in this approach, some samples belonging to
unknown attacks may be wrongly classified as normal. Instead, we define a threshold
T to guide the decision. If the probability p of belonging to the attack class is
greater than the threshold T, the sample is classified as an attack. In this manner, given a
sample x_i and its corresponding probability p_i, we define a decision function f(·). The results
are indicated by ±1, where +1 denotes an anomalous sample. The calculation is shown below:
f(x_i) = { −1, if p_i ≤ T;  +1, if p_i > T }    (1)
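Assuming the attack class is encoded as label 1, the probability-based decision of Equation (1) can be sketched with scikit-learn's predict_proba as follows (the dataset and threshold value here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

T = 0.2  # illustrative threshold; the paper selects it from validation data

# predict_proba averages the per-tree class probabilities; column 1 is p,
# the probability of the attack class (assumed to be label 1 here).
p = rf.predict_proba(X)[:, 1]

# Decision function f of Equation (1): +1 = attack, -1 = normal.
f = np.where(p > T, 1, -1)
print(np.unique(f))
```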
3.2. Autoencoder
Deep learning has been shown to be quite effective in a variety of research fields [33]. It
learns data representations with multiple neural network layers. The other part of our
suggested model is a special unsupervised neural network [34], an AE. As it can rebuild the
input, the reconstruction error can serve as the anomaly score for identifying abnormalities.
The framework for anomaly detection using AE is depicted in Figure 2. We introduce the
general process later.
Figure 2. The AE-based anomaly detection framework: an encoder compresses the input X, a decoder reconstructs it as X̂, and samples whose reconstruction error exceeds the threshold are labeled as attacks, while the rest are labeled as normal.
The reconstruction error e_i of sample x_i is the squared error between the input and its reconstruction x̂_i:

e_i = ‖x_i − x̂_i‖²    (2)
The training process aims to minimize the reconstruction loss. After training, the
well-trained AE can identify anomalous samples using the MSE. Similar to the
thresholding of the RF probability with a predefined threshold T, we use a
function f(·) to make the decision, as illustrated below:
f(x_i) = { −1, if e_i ≤ T;  +1, if e_i > T }    (3)
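A minimal sketch of this decision rule, using a small MLPRegressor trained to reproduce its input as a stand-in autoencoder (the paper's exact AE architecture is not assumed here):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(500, 8))  # stand-in for normal traffic features

# Train the "AE" to reproduce its input through a bottleneck (8 -> 4 -> 8).
ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
ae.fit(X_normal, X_normal)

def decide(model, X, T):
    """Equation (3): +1 (attack) when the per-sample MSE e_i exceeds T, else -1."""
    e = np.mean((X - model.predict(X)) ** 2, axis=1)
    return np.where(e > T, 1, -1)

# Illustrative threshold: the 95th percentile of the training errors.
errors = np.mean((X_normal - ae.predict(X_normal)) ** 2, axis=1)
T = np.percentile(errors, 95)
print(decide(ae, X_normal, T).shape)  # (500,)
```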
Figure 3. The overview of our proposed method. (a) Training process; (b) testing process.
Because the AE trains on normal data only, normal samples should have a lower
MSE than anomalous ones. From this point of view, we can define a low threshold,
and samples below it can be assigned to the normal class with higher confidence.
Under this assumption, we integrate these two decision processes. After obtaining the
trained model, during the testing phase we apply a two-step detection strategy.
We list the detection procedure in Algorithm 1. Two hyperparameters are
considered for the decision: T1, used for the RF probability, and T2, used for the MSE.
First, a sample x_i is classified by the RF classifier. When its attack probability is larger
than T1, it is classified as an attack. After that, we utilize the AE to examine the attack
samples predicted by the RF again: when the reconstruction error is smaller than T2,
the sample is reclassified as normal. With this two-step approach, more testing samples
can be correctly classified; in particular, normal samples mistakenly flagged as attacks
can be recovered.
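The two-step procedure of Algorithm 1 can be sketched as follows; the function name and the synthetic demo data are ours, and we assume the attack class is encoded as label 1:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor

def two_step_detect(rf, ae, X, T1, T2):
    """Step 1: flag attacks whose RF attack probability exceeds T1.
    Step 2: let the AE reclassify flagged samples with MSE below T2 as normal."""
    pred = np.full(len(X), -1)              # -1 = normal, +1 = attack
    p = rf.predict_proba(X)[:, 1]           # attack-class probability (label 1)
    pred[p > T1] = 1
    attacks = np.flatnonzero(pred == 1)
    if attacks.size:
        e = np.mean((X[attacks] - ae.predict(X[attacks])) ** 2, axis=1)
        pred[attacks[e < T2]] = -1          # AE recheck reduces false positives
    return pred

# Illustrative demo on synthetic data (not the paper's datasets).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
ae = MLPRegressor(hidden_layer_sizes=(5,), max_iter=1500,
                  random_state=0).fit(X[y == 0], X[y == 0])
print(two_step_detect(rf, ae, X, T1=0.5, T2=0.05).shape)  # (600,)
```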
4. Experimental Results
In this section, we present the experimental results for the proposed approach. First,
we describe the dataset and preprocessing procedures applied in the experiments. The eval-
uation metric and comparison methods are then introduced. After that, the specific experi-
ment settings are listed. The results of the experiments are thoroughly analyzed in the final
part.
4.1. Dataset
To conduct the experiments, we use two intrusion detection datasets [35], namely,
NF-CSE-CIC-IDS2018-v2 and NF-BoT-IoT-v2. Both datasets are created using the NetFlow
v9 features from the original datasets CSE-CIC-IDS2018 [36] and BoT-IoT [37]. In this study,
we refer to them as IDS2018 and BoT-IoT, respectively. Considering that there is a large
number of samples in the dataset, we randomly sample different categories of data, and
the detailed distribution of the different categories is shown in Table 1.
Table 1. The sample distribution of different attack types for both datasets.
IDS2018                                        BoT-IoT
No.  Class         Number of Samples      No.  Class           Number of Samples
1    Normal        120,000                1    Normal          65,150
2    DDoS          68,000                 2    DoS             42,332
3    DoS           48,000                 3    DDoS            21,133
4    Bot           14,000                 4    Reconnaissance  13,000
5    Bruteforce    12,000                 5    Theft           2133
6    Infiltration  11,000                 -    -               -
7    Web           3502                   -    -               -
In the IDS2018 dataset, there are six attacks (not including normal data): DDoS, DoS,
Bot, Bruteforce, Infiltration, and Web attacks. As for the BoT-IoT dataset, there are four
attacks. There are 43 features for every record. To preprocess the dataset, we remove some
irrelevant columns, for example, the source IP. After that, all numeric features are transformed
via the log function to decrease the effect of large values. Furthermore, the categorical
features are encoded with the one-hot encoding method. We use min–max normalization to
scale the features into the range of 0 to 1. After preprocessing, the dimension of the
IDS2018 dataset is about 300, and the dimension of the BoT-IoT dataset is about 200.
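A sketch of this preprocessing on toy flow records (the column names are illustrative, not the actual NetFlow v9 feature names):

```python
import numpy as np
import pandas as pd

# Toy flow records; the column names are illustrative stand-ins.
df = pd.DataFrame({
    "IN_BYTES": [120, 4_000_000, 87],
    "OUT_PKTS": [3, 9_000, 1],
    "PROTOCOL": ["tcp", "udp", "tcp"],
})
numeric, categorical = ["IN_BYTES", "OUT_PKTS"], ["PROTOCOL"]

# log1p squashes large values, then min-max maps each column into [0, 1].
num = df[numeric].apply(np.log1p)
num = (num - num.min()) / (num.max() - num.min())

# Categorical features become binary one-hot columns.
cat = pd.get_dummies(df[categorical]).astype(float)

X = pd.concat([num, cat], axis=1)
print(X.shape)  # (3, 4): two scaled numeric + two one-hot columns
```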
Figure 4. Confusion matrix of detection results.
Based on these four entries of the classification results, we calculate several
performance metrics. We list four metrics commonly used in the field of classification,
including accuracy, precision, recall, and F1. Their calculations are defined as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)
Precision = TP / (TP + FP)    (5)

Recall = TP / (TP + FN)    (6)

F1 = (2 × Precision × Recall) / (Precision + Recall)    (7)
The F1 is the harmonic mean of precision and recall. Furthermore, there is an additional
metric named false positive rate (FPR) as below:
FPR = FP / (FP + TN)    (8)
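Equations (4)–(8) can be computed directly from the confusion-matrix counts; the helper below and its example counts are illustrative:

```python
def ids_metrics(tp, tn, fp, fn):
    """Equations (4)-(8) from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return accuracy, precision, recall, f1, fpr

# Illustrative counts (not from the paper's experiments).
acc, prec, rec, f1, fpr = ids_metrics(tp=98, tn=93, fp=7, fn=2)
print(round(acc, 3), round(fpr, 3))  # 0.955 0.07
```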
The comparative AE method uses the same network settings as ours. In our method,
T2 for the AE is used to reduce the FPR. It is set to a low value to ensure that the
samples it reclassifies as normal carry higher confidence. When using the AE alone to
detect attacks, we use another threshold, selected on the validation dataset as the value
that yields the best F1 score.
Figure 5. The probability of being predicted as an attack for testing samples of the IDS2018 dataset
when DoS is the unknown attack.
In the figure, the distribution of the samples' probabilities is plotted, where the x-axis is
the probability and the y-axis is the number of samples. From the figure, we can see that
normal and known attack samples are concentrated at the two extremes, with probabilities
generally near 0 or 1, whereas the unknown attack samples are located in the middle.
Consequently, if we use the default "predict" method, the normal and known attacks are
classified correctly, but the unknown attacks would be misclassified as normal.
In our method, we can use the probability to classify the samples. In this instance, we
can set the threshold to 0.2 and obtain the correct prediction. Next, we display the MSE
distribution, which is plotted in Figure 6. As there is no attack sample in the training set, it
is hard for the AE to rebuild the attack samples well. Whether known or unknown, most
attack samples have a higher MSE than normal samples.
Figure 6. The MSE of testing samples for IDS2018 dataset.
To further compare the combination methods, we plot the confusion matrix of the four
classifiers in Figure 7.
             (a) RF          (b) RF(Pro)     (c) AE          (d) Ours
True label   Normal  Attack  Normal  Attack  Normal  Attack  Normal  Attack
Normal       96.7    3.3     89.8    10.2    92.4    7.6     93.5    6.5
Attack       64.8    35.2    1.8     98.2    2.0     98.0    2.0     98.0
Figure 7. The detection confusion matrix (in %) of different classifiers on IDS2018 dataset when
the unknown attack is DoS. (a) Classification confusion matrix of RF. (b) Classification confusion
matrix of RF(Pro). (c) Classification confusion matrix of AE. (d) Classification confusion matrix of
combination methods.
The plain RF shows the lowest detection rate, as 64.8% of attacks are wrongly
categorized as normal. When it uses the highest-probability class for classification (as can
be seen from Figure 5), it wrongly classifies some unknown attack samples as normal.
From Figure 7b, it can be seen that only 1.8% of attack samples are misclassified as normal
by the RF(Pro) classifier, a large improvement over the plain RF. Furthermore, the AE
detector has a higher true positive rate. Finally, our proposed combination method achieves
the lowest false positive fraction compared with RF(Pro) and AE. As stated before, we apply
the AE to the RF results and relabel attack samples that have a lower MSE as normal. In this
manner, the combination method has a lower FP.
To further demonstrate the performance of our method, we average all of the results
when different attacks are set as unknown. The detailed results are shown in Tables 2 and 3
for both datasets. We report the mean value and standard deviation in the table.
Table 2. The detection performance (in %) of different classifiers for the BoT-IoT dataset. The table
reports the five metrics mentioned above. Both the mean value and standard deviation are reported.
First, based on the accuracy and F1 results for both datasets, our method outperforms
the others. In detail, our method has the highest F1 of 99.72% for the BoT-IoT dataset and
95.90% for the IDS2018 dataset. Although the recall of OCSVM on the IDS2018 dataset is
higher than ours, its other metrics are worse than ours.
As the experimental results lead to similar conclusions for both datasets, we analyze
them together. The first four supervised methods present higher precision and a lower FPR
than the other methods. Because some samples belong to unknown attacks in the experiments,
the supervised methods classify them directly into the normal class.
The detection performance of IF is not satisfactory; it has an F1 of only 65% on the BoT-IoT
dataset. The other two detection methods, OCSVM and AE, present higher performance
than the supervised methods. As these three methods require only normal data during training,
they are capable of dealing with both known and unknown attacks. The F1 of OCSVM and AE
reaches about 93%, which is higher than the four supervised methods.
The single detector "RF(Pro)", which is part of our method, has a performance similar
to that of our full method. The RF classifier is significantly enhanced after using probability
to make the decision: the recall of "RF(Pro)" is higher than that of RF by about 8% on the
BoT-IoT dataset and about 20% on the IDS2018 dataset. As we stated before, we aim to
reduce the false positive rate by combining the AE and RF, and the FPR on both datasets
in the tables is lower than that of either single detector. For example, on the IDS2018 dataset,
our hybrid method has the lowest FPR of 1.81% compared to "RF(Pro)" or AE. In addition,
the F1 of our method is higher than these two basic methods.
After analyzing the average detection performance on different attacks, it is reasonable
to investigate the performance of the various classifiers when dealing with different unknown
attacks. For simplicity, we report the methods related to RF and AE for the IDS2018
dataset. We plot only the F1 and FPR in Figures 8 and 9, considering that F1 is the harmonic
mean of recall and precision and that the FPR demonstrates the improvement of our method.
Figure 8. F1 (in %) of different detection methods against various unknown attacks.
Figure 9. FPR (in %) of different detection methods against various unknown attacks.
In Figure 8, there are six different unknown attacks. The probability-based RF performs
better than the classical RF method in most cases. Furthermore, our method presents the
highest F1. As shown in Figure 9, our combination method reduces the FPR significantly.
In our model, two hyperparameters have significant effects: T1 and T2.
Because there are no unknown attacks in the training or validation set, we need to set them
manually. To examine their effect, we vary these two hyperparameters over representative
values. In detail, T1 is selected from the {80, 85, 90, 95, 99}th percentiles and T2 from the
{65, 70, 75, 80, 85}th percentiles. We plot the comparisons in Figure 10. As before, we report
the F1 and FPR for the IDS2018 dataset only.
(a) F1 (in %):

T1 \ T2     65      70      75      80      85
80th       95.6    95.7    95.8    95.9    95.8
85th       95.9    96.0    96.1    96.1    96.0
90th       95.8    95.9    95.9    95.9    95.7
95th       89.9    89.9    89.9    89.8    89.6
99th       81.3    81.3    81.3    81.2    81.0

(b) FPR (in %):

T1 \ T2     65      70      75      80      85
80th        5.8     5.0     4.3     3.6     2.9
85th        4.1     3.5     3.0     2.5     2.0
90th        2.5     2.2     1.8     1.5     1.2
95th        1.2     1.1     0.9     0.7     0.6
99th        0.3     0.2     0.2     0.2     0.2
Figure 10. Performance (in %) comparison of various threshold values on the IDS2018 dataset.
Combining the five T1 values with the five T2 values yields 25 results. Our selected thresholds
are highlighted with a red circle. (a) F1, higher is better. (b) FPR, lower is better.
In this section, we restate the effects of the two parameters. T1 is the cutoff on the
probability output by the RF that decides whether a sample is an attack, while T2 is the
threshold on the MSE used to determine whether a sample may have been misclassified
as an attack. It is hard to decide the optimal values of these two hyperparameters because
there are no unknown attack samples in the training or validation set.
First, we find that the FPR in Figure 10b decreases as T1 and T2 increase.
However, F1 presents a different pattern as the thresholds change. When T1 reaches its
highest value, i.e., the 99th percentile, F1 reaches its lowest value. This is because, with a
higher T1, more and more attack samples are missed, although the FPR is lower. F1 does
not change much as T2 varies; however, the FPR becomes lower.
In our experiments, we set T1 as the 90th percentile of the probabilities of normal data
in the validation set and T2 as the 75th percentile of the MSEs of normal data in the
validation set; these results are highlighted with a red circle. We can see that the highest
F1 is achieved at T1 = 85th and T2 = 75th percentiles, which is only 0.2% higher than ours.
The results show that the selected values perform satisfactorily.
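This percentile-based threshold selection can be sketched as follows, with randomly generated stand-ins for the validation-set probabilities and MSEs of normal samples:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for validation-set statistics of NORMAL samples only:
val_probs_normal = rng.beta(1, 9, size=1000)      # RF attack probabilities
val_mses_normal = rng.gamma(2.0, 0.5, size=1000)  # AE reconstruction MSEs

# T1: 90th percentile of the probabilities; T2: 75th percentile of the MSEs.
T1 = np.percentile(val_probs_normal, 90)
T2 = np.percentile(val_mses_normal, 75)
print(0.0 <= T1 <= 1.0, T2 > 0.0)  # True True
```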
5. Conclusions
With the increasing risks of network attacks, network environments need more powerful
IDSs to protect them. As more and more attacks appear every day, it is important to
handle the issues presented by unknown attacks. In this study, we develop a hybrid IDS to
boost the detection rate when dealing with unknown attacks.
In detail, the proposed method combines RF and AE. Because the unknown attacks
may be misclassified, we use the probability output of the RF classifier to check the samples
first. Then, the AE is utilized to recheck the attacks predicted by the RF and reduce the
FPR. We conducted experiments on two intrusion detection datasets while setting some
attack samples explicitly as unknown. The experimental results prove that the combination
method boosts the detection rate and reduces the FPR in comparison to the single
detection methods.
Some directions are worth further investigation. Only one type of attack was set as
unknown in the experiments; it would be valuable to set more than one type of attack as
unknown to test the model. In this study, we focused on binary classification. We plan to
expand the method into a multi-class approach to provide more diagnostic information for
security operators in the future.
Author Contributions: Conceptualization, C.W. and Y.S.; data curation, C.W.; formal analysis, W.W.
and H.L.; funding acquisition, B.W.; investigation, C.W. and H.L.; methodology, C.W.; project
administration, B.W.; software, C.W. and Y.S.; supervision, H.L. and B.W.; validation, Y.S. and W.W.;
visualization, C.W.; writing—original draft, C.W.; writing—review and editing, B.W. All authors have
read and agreed to the published version of the manuscript.
Funding: This research is funded by the National Key Research and Development Program of China
(No. 2021YFB2012400).
Data Availability Statement: In this study, we use the intrusion detection dataset reported in [35].
Readers can refer to the corresponding paper for detailed information.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ahmad, Z.; Shahid Khan, A.; Wai Shiang, C.; Abdullah, J.; Ahmad, F. Network intrusion detection system: A systematic study of
machine learning and deep learning approaches. Trans. Emerg. Telecommun. Technol. 2021, 32, 1–29. [CrossRef]
2. Anderson, J.P. Computer Security Threat Monitoring and Surveillance; Technical Report; James P. Anderson Company: Philadelphia,
PA, USA, 1980.
3. Vanin, P.; Newe, T.; Dhirani, L.L.; O’Connell, E.; O’Shea, D.; Lee, B.; Rao, M. A Study of Network Intrusion Detection Systems
Using Artificial Intelligence/Machine Learning. Appl. Sci. 2022, 12, 11752. [CrossRef]
4. Liu, H.; Lang, B. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 2019, 9, 4396.
[CrossRef]
5. Adnan, A.; Muhammed, A.; Abd Ghani, A.A.; Abdullah, A.; Hakim, F. An Intrusion Detection System for the Internet of Things
Based on Machine Learning: Review and Challenges. Symmetry 2021, 13, 1011. [CrossRef]
6. Aldallal, A.; Alisa, F. Effective Intrusion Detection System to Secure Data in Cloud Using Machine Learning. Symmetry 2021, 13,
2306. [CrossRef]
7. Aldallal, A. Toward Efficient Intrusion Detection System Using Hybrid Deep Learning Approach. Symmetry 2022, 14, 1916.
[CrossRef]
8. Ingre, B.; Yadav, A.; Soni, A.K. Decision Tree Based Intrusion Detection System for NSL-KDD Dataset. In Proceedings of the
Information and Communication Technology for Intelligent Systems (ICTIS 2017); Satapathy, S.C., Joshi, A., Eds.; Springer International
Publishing: Cham, Switzerland, 2018; Volume 2, pp. 207–218.
9. Balyan, A.K.; Ahuja, S.; Lilhore, U.K.; Sharma, S.K.; Manoharan, P.; Algarni, A.D.; Elmannai, H.; Raahemifar, K. A Hybrid
Intrusion Detection Model Using EGA-PSO and Improved Random Forest Method. Sensors 2022, 22, 5986. [CrossRef]
10. Yang, Z.; Wang, B. A Feature Extraction Method for P2P Botnet Detection Using Graphic Symmetry Concept. Symmetry 2019, 11,
326. [CrossRef]
11. Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep Learning Approach for
Intelligent Intrusion Detection System. IEEE Access 2019, 7, 41525–41550. [CrossRef]
12. Li, Z.; Qin, Z.; Huang, K.; Yang, X.; Ye, S. Intrusion Detection Using Convolutional Neural Networks for Representation Learning.
In Proceedings of the Neural Information Processing; Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S.M., Eds.; Springer International
Publishing: Cham, Switzerland, 2017; pp. 858–866.
13. Rudd, E.M.; Rozsa, A.; Günther, M.; Boult, T.E. A Survey of Stealth Malware Attacks, Mitigation Measures, and Steps Toward
Autonomous Open World Solutions. IEEE Commun. Surv. Tutor. 2017, 19, 1145–1172. [CrossRef]
14. Song, Y.; Hyun, S.; Cheong, Y.G. Analysis of autoencoders for network intrusion detection. Sensors 2021, 21, 4294. [CrossRef]
[PubMed]
15. Magán-Carrión, R.; Urda, D.; Díaz-Cano, I.; Dorronsoro, B. Towards a reliable comparison and evaluation of network intrusion
detection systems based on machine learning approaches. Appl. Sci. 2020, 10, 1775. [CrossRef]
16. Maseer, Z.K.; Yusof, R.; Bahaman, N.; Mostafa, S.A.; Foozy, C.F.M. Benchmarking of Machine Learning for Anomaly Based
Intrusion Detection Systems in the CICIDS2017 Dataset. IEEE Access 2021, 9, 22351–22370. [CrossRef]
17. Resende, P.A.A.; Drummond, A.C. A survey of random forest based methods for intrusion detection systems. ACM Comput.
Surv. 2018, 51, 1–36. [CrossRef]
18. Di Mauro, M.; Galatro, G.; Fortino, G.; Liotta, A. Supervised feature selection techniques in network intrusion detection: A critical
review. Eng. Appl. Artif. Intell. 2021, 101, 104216. [CrossRef]
19. Abdulhammed, R.; Musafer, H.; Alessa, A.; Faezipour, M.; Abuzneid, A. Features dimensionality reduction approaches for
machine learning based network intrusion detection. Electronics 2019, 8, 322. [CrossRef]
20. Seo, J.H.; Kim, Y.H. Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection.
Comput. Intell. Neurosci. 2018, 2018, 9704672. [CrossRef]
21. Zhou, Y.; Cheng, G.; Jiang, S.; Dai, M. Building an efficient intrusion detection system based on feature selection and ensemble
classifier. Comput. Netw. 2020, 174, 107247. [CrossRef]
22. Chang, Y.; Li, W.; Yang, Z. Network intrusion detection based on random forest and support vector machine. In Proceedings of
the 2017 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on
Embedded and Ubiquitous Computing (EUC), Guangzhou, China, 21–24 July 2017; Volume 1, pp. 635–638. [CrossRef]
23. Narayana Rao, K.; Venkata Rao, K.; P.V.G.D., P.R. A hybrid Intrusion Detection System based on Sparse autoencoder and Deep
Neural Network. Comput. Commun. 2021, 180, 77–88. [CrossRef]
24. Cao, V.L.; Nicolau, M.; McDermott, J. Learning Neural Representations for Network Anomaly Detection. IEEE Trans. Cybern.
2019, 49, 3074–3087. [CrossRef]
25. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution.
Neural Comput. 2001, 13, 1443–1471. [CrossRef]
26. Mahfouz, A.M.; Abuhussein, A.; Venugopal, D.; Shiva, S.G. Network Intrusion Detection Model Using One-Class Support Vector
Machine. In Proceedings of the Advances in Machine Learning and Computational Intelligence; Patnaik, S., Yang, X.S., Sethi, I.K., Eds.;
Springer: Singapore, 2021; pp. 79–86.
27. Javed, M.A.; Khan, M.Z.; Zafar, U.; Siddiqui, M.F.; Badar, R.; Lee, B.M.; Ahmad, F. ODPV: An Efficient Protocol to Mitigate Data
Integrity Attacks in Intelligent Transport Systems. IEEE Access 2020, 8, 114733–114740. [CrossRef]
28. Al-Qatf, M.; Lasheng, Y.; Al-Habib, M.; Al-Sabahi, K. Deep Learning Approach Combining Sparse Autoencoder with SVM for
Network Intrusion Detection. IEEE Access 2018, 6, 52843–52856. [CrossRef]
29. Kunang, Y.N.; Nurmaini, S.; Stiawan, D.; Zarkasi, A.; Jasmir, F. Automatic Features Extraction Using Autoencoder in Intrusion
Detection System. In Proceedings of the 2018 International Conference on Electrical Engineering and Computer Science (ICECOS),
Pangkal, Indonesia, 2–4 October 2018; Volume 17, pp. 219–224. [CrossRef]
30. Lin, T.H.; Jiang, J.R. Credit card fraud detection with autoencoder and probabilistic random forest. Mathematics 2021, 9, 2683.
[CrossRef]
31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.;
et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
33. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
34. Géron, A. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2022.
35. Sarhan, M.; Layeghy, S.; Portmann, M. Towards a Standard Feature Set for Network Intrusion Detection System Datasets. Mob.
Netw. Appl. 2021, 27, 357–370. [CrossRef]
36. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic
characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, Funchal,
Portugal, 22–24 January 2018; pp. 108–116. [CrossRef]
37. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of
Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796. [CrossRef]
38. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch:
An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Curran
Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035.
39. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1026–1034.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.