Automated Fake News Detection System
2024
Tuba Çağlıkantar
Department of Computer Engineering, Gazi University, Ankara, Turkey
Recommended Citation
Al-obaidi, Saja A. and Çağlıkantar, Tuba (2024) "Automated Fake News Detection System," Iraqi Journal
for Computer Science and Mathematics: Vol. 5: Iss. 4, Article 2.
DOI: [Link]
Available at: [Link]
RESEARCH ARTICLE
ABSTRACT
Online news has become most people's primary information source in recent decades. However, much of the information that is accessible online is fake and is sometimes even designed to mislead. Because certain false news items closely resemble genuine ones, individuals can find it difficult to distinguish between them. Automated fake news detection (FND) techniques, among them deep learning (DL) and machine learning (ML) models, are therefore quickly becoming essential. A comparative study was conducted to analyze the performance of five prominent deep learning models across four distinct datasets, namely ISOT, FakeNewsNet, Dataset1, and Dataset2. Results indicated that while LSTM achieved the highest accuracy on the ISOT dataset (99.95%), CNN-GRU stood out with an exceptional 99.97% accuracy on the FakeNewsNet dataset. Both CNN-LSTM and LSTM exhibited almost perfect accuracies on Dataset1. On Dataset2, LSTM led with 98.64%. However, the LSTM AutoEncoder consistently demonstrated lower efficiency, with accuracies spanning between 49% and 62%. The study underscores the critical role of dataset-specific model selection in optimizing deep learning outcomes.
Keywords: Fake news, Fake news detection, ISOT dataset, FakeNewsNet, Machine learning, Deep learning
* Corresponding author.
E-mail address: derasat20192020@[Link] (S. A. Al-obaidi).
[Link]
2788-7421/© 2024 The Author(s). This is an open-access article under the CC BY license ([Link]).
that not only undermines journalistic integrity but also poses a significant threat to the very fabric of society [5]. Fake news, distinctly characterized by its intent to deceive, often serves as a vessel for propagating false beliefs, manipulating public opinion, and even instigating real-world consequences [6]. This is further exacerbated by the overlap and occasional interchangeability of terms like "fake news" and "misinformation" [7]. For clarity, "fake news" typically refers to completely false information presented as news, while "misinformation" is a broader term encompassing any false or misleading information, regardless of intent [8]. The proliferation of such false narratives, driven largely by social media, can lead to a myriad of societal issues, ranging from mere misinformation to large-scale public panic or even political upheaval. Previous attempts at combating this menace, although commendable, have often been limited in scope, either focusing solely on binary classification or lacking the nuance required to address the subtle shades of untruth. There remains a palpable gap in devising a comprehensive solution that not only identifies fake news but also discerns the varying degrees of its authenticity [9]. Our research seeks to bridge this gap by introducing a multi-dimensional approach to fake news detection, underscoring the urgency of this issue and its profound implications for society.

In this context, the current study introduces a groundbreaking approach to FND. Departing from the traditional binary classification of 'Real' and 'Fake,' we propose a multi-layered approach that incrementally introduces new classes such as 'Half-True' and 'Barely True.' This nuanced classification, implemented through eight Machine Learning (ML) and three Deep Learning (DL) algorithms, provides more precise detection of the various shades of misinformation. Utilizing the public ISOT dataset, our method adds depth to the understanding of fake news and offers a sophisticated tool for combating its spread.

In light of the ever-evolving digital landscape and the challenges it poses, our research primarily sets itself apart by introducing a layered approach to fake news detection, moving beyond the rudimentary binary classifications predominantly observed in prior studies. Our primary contributions include the conception of an incremental multi-class classification system that embraces categories like 'Half-True' and 'Barely True', providing a holistic perspective on the subtleties of misinformation. This approach is supported by a comprehensive combination of eight Machine Learning algorithms and three advanced Deep Learning models, fine-tuned for accuracy. Leveraging the public ISOT dataset, we have ensured that our methodologies are adaptable and can be validated against real-world data. Additionally, our proposed framework outshines its predecessors by offering a nuanced understanding of shades of misinformation, an element often overlooked in other solutions. While we acknowledge the robustness of our solution, we are also aware of its potential limitations, primarily concerning the adaptability of our models across varied datasets. As the abstract highlighted, the performance of models like the LSTM AutoEncoder may not be consistent across different datasets. Going forward, our research will focus on resolving these inconsistencies and expanding our models' versatility to cater to a broader spectrum of fake news categories. The rest of this study is structured as follows: Section 1.1 reviews recent FND research published in the literature. Section 2 gives a detailed description of our proposed FND architecture, emphasizing the binary and multi-class approaches. Section 3 covers model optimization. Section 4 highlights the results, underscoring the effectiveness of our contribution. Finally, the conclusion and future work are presented in Section 5.

1.1. Literature review

A collection of ensemble ML models for classifying news articles is presented in [10]. That research examines a variety of textual traits that can be utilized to differentiate between real and fake content; these properties are used to train a range of ML algorithms under several ensemble methodologies, evaluated on four real-world datasets. Results show that the suggested ensemble learner technique performs better than individual learner approaches; on the ISOT dataset, the random forest classifier achieves the greatest accuracy, 98%. In [11], lexical, sentiment, unigram, and bigram features with term frequency-inverse document frequency (TF-IDF), GloVe, and character embeddings were used to examine a number of ML and DL algorithms, including SVM, LR, DT, LSTM, convolutional HAN, and character-level C-LSTM. Results showed that the LSTM produced the best outcome, 94% accuracy, on the combined corpus dataset. In addition, in [12] an FND system was proposed using ML techniques: the TF-IDF of bag-of-words and n-grams is used for feature extraction during the pre-processing stage, and an SVM classifier is trained on a proposed dataset. Another study extracted features using the TF-IDF method [13]; three ML models were implemented, namely SVM, Naïve Bayes (NB), and the Passive Aggressive Classifier, with the SVM classifier achieving the best accuracy of 95.05%.
An ensemble-based DL model for FND utilizing the LIAR dataset was put forward in [3]. The nature of the dataset led to the deployment of two DL models: a dense model was used for all features except the "statement" feature, for which a Bi-LSTM dense model was employed. Experiments revealed that the suggested approach has an accuracy of 89 percent when using just the statement attribute. Detecting fake news in social media is the purpose of the research in [14]. The authors used SVM and RF classifiers with and without 10-fold cross-validation; the two classifiers achieved an accuracy of 83.5% after applying the cross-validation technique. Using datasets from LIAR and PolitiFact, another work combines DL, NLP, and semantics to develop a hybrid methodology [15]. That study assessed the performance of DL models including Bi-LSTM, GRU, and CapsNet, as well as traditional ML models like MNB, SGD, LR, DT, and SVM. CapsNet performed the best on the LIAR dataset with an accuracy of 64%. According to the study, including semantic data such as named entities and sentiment helped the classification model perform better. The researchers in another work provided an ensemble classification model for accurately recognizing bogus news. The model uses an ensemble of three well-known ML models, DT, RF, and Extra Tree Classifier, to extract key characteristics from fake news datasets and classify them. On the LIAR dataset, testing accuracy was 44.15 percent while training accuracy was 99.8 percent, and both training and testing accuracy were 100 percent on the ISOT dataset [16]. The authors of that study also assessed the performance of ML and three DL models on two datasets of false and true news with varying sizes, using hold-out cross-validation for the evaluation. They represented the text data using word frequency, TF-IDF, and embedding methods for the DL models. Testing accuracy for the novel stacking model developed by the authors on the ISOT and KDnugget datasets was 99.94% and 96.05%, respectively.

2. Methodology

As shown in Fig. 1, the proposed architecture for FND includes the collection of the ISOT dataset, followed by data preprocessing and the implementation of the proposed classifiers. The models are then trained and tested using the refined dataset. Finally, the results are evaluated and compared to determine the effectiveness of the models.

2.1. Datasets

The detection and categorization of fake news necessitate a robust dataset that can capture the multifaceted nature of truthfulness in news reporting. Our approach, therefore, incorporates four distinctive datasets, enabling a more intricate analysis.

The datasets chosen for this study have been meticulously selected to encompass a comprehensive range of news dynamics, ensuring a robust and exhaustive exploration of fake news detection. The ISOT corpus was an ideal starting point due to its dichotomous nature, separating news into clear 'True' and 'Fake' categories. Its extensive database of articles from both reputable and less reliable sources provided a vast landscape of real-world news, making it a suitable foundational dataset. On the other hand, the FakeNewsNet dataset offered a unique dimension to our analysis by encapsulating the propagation dynamics of fake news through social media, evidenced by its inclusion of tweet IDs. Recognizing that fake news is not just an issue of binary truth or falsehood, but exists on a spectrum, it became imperative to incorporate the LIAR-PLUS dataset. LIAR-PLUS, with its nuanced categories such as "Half-True" and "Barely True", enabled the introduction of graded truth classifications, effectively addressing the subtleties and gray areas of misinformation. Additionally, the evidence sentences in LIAR-PLUS provided an extra layer of verification, reinforcing the model's ability to ascertain the truthfulness of a claim. In essence, the amalgamation of these datasets aimed to cater to the complexities of fake news across different platforms and degrees of truthfulness, ensuring our model's efficacy across varied scenarios.

1) The ISOT Corpus: For binary classification, we utilize the ISOT corpus, fully sourced from real-world sources [17]. This dataset consists of both true and false news articles, drawing from reputable sources like [Link] and less reliable sites recognized by fact-checkers like Politifact and Wikipedia. The articles cover various subjects, emphasizing politics and global events, and are divided into two files, "[Link]" and "[Link]," with more than 12,600 articles each. The "[Link]" file aggregates articles from several fake news sites, while the "[Link]" file strictly consists of stories from [Link]. These articles, primarily from 2016 to 2017, include details such as title, text, type, and publication date, and any existing errors or punctuation in the fake news articles have been retained.

2) FakeNewsNet Dataset: The FakeNewsNet dataset is a unique compilation that serves researchers with data collected based on ground truths from two distinctive sources, namely Politifact and Gossipcop [18-20]. With an aggregate shape of
23,196 rows and 5 columns, it is structured to offer a comprehensive exploration of fake news dynamics. The dataset offers a structured insight into fake news, organized around five key columns:
• id: A unique identifier for each entry, facilitating precise data referencing and operations.
• news_url: Direct URLs to the original news sources, ensuring that researchers can trace back the news' primary origin for context and verification.
• title: The headline or title of the news article, capturing the essence of the content and serving as the initial point of engagement for readers.
• tweet_ids: A list of Twitter ID references that correspond to tweets disseminating the news article. This is particularly valuable for tracing the social media impact and propagation dynamics of the news article.
• label: A binary classification indicating the veracity of the news article, divided into 'Fake' and 'Real'. This is the linchpin for supervised learning tasks or comparative analyses in the domain of fake news detection.

3) The LIAR-PLUS Dataset: To deepen our understanding and classification of fake news, we further extend our approach by integrating classes from the LIAR-PLUS dataset, released in [21]. This extension includes evidence sentences extracted automatically from full-text verdict reports by Politifact journalists. The LIAR-PLUS dataset is structured into 15 columns, including the ID of the statement, label, statement, subjects, speaker's details, state information, party affiliation, credit history counts (including "barely true," "false," "half true," "mostly true," and "pants on fire"), context, and the extracted justification. The inclusion of these evidence sentences aims to improve the benchmark for evidence retrieval, demonstrating that the addition of evidence information invariably enhances the performance of any fake news detection method.

4) Combining the datasets: Our research contributes by thoughtfully combining the ISOT corpus with the LIAR-PLUS dataset to create a nuanced and multidimensional approach to fake news detection. This combination facilitates a step-by-step progression from binary classification to the inclusion of intermediate classes like "Half-True" and "Barely True". Specifically, dataset1 combines the ISOT corpus with the "Half-True" class from the LIAR-PLUS dataset. Dataset2 builds upon this by further integrating the "Barely True" class, essentially combining the ISOT corpus, the "Half-True" class, and the "Barely True" class from the LIAR-PLUS dataset, creating a progressive extension of dataset1. By merging the specificities of both datasets in this structured manner, we cater to the complexities of fake news, allowing our model to discern not only outright falsehoods but also shades of misinformation. This method underscores the importance of considering the gradations of truth and falsehood in the era of digital information and contributes a sophisticated tool in the fight against the propagation of fake news (a minimal construction sketch follows below).
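The paper does not provide code for this merging step. The following is a minimal sketch of how dataset1 and dataset2 could be assembled with pandas; the file names (isot_true.csv, isot_fake.csv, liar_plus.tsv) and column names are hypothetical stand-ins, not the authors' actual artifacts.

```python
import pandas as pd

# Binary ISOT corpus: label real news 1 and fake news 0.
true_df = pd.read_csv("isot_true.csv")   # hypothetical file name
fake_df = pd.read_csv("isot_fake.csv")   # hypothetical file name
true_df["label"], fake_df["label"] = 1, 0
isot = pd.concat([true_df, fake_df], ignore_index=True)
isot["content"] = isot["title"] + " " + isot["text"]

# LIAR-PLUS statements, keeping only the graded classes we need.
liar = pd.read_csv("liar_plus.tsv", sep="\t")  # hypothetical file name
liar["content"] = liar["statement"]

# dataset1 = ISOT (fake=0, true=1) + LIAR-PLUS "half-true" as class 2.
half_true = liar[liar["label"] == "half-true"].assign(label=2)
dataset1 = pd.concat([isot[["content", "label"]],
                      half_true[["content", "label"]]], ignore_index=True)

# dataset2 extends dataset1 with "barely-true" as class 3.
barely_true = liar[liar["label"] == "barely-true"].assign(label=3)
dataset2 = pd.concat([dataset1, barely_true[["content", "label"]]],
                     ignore_index=True)
```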
2.2. Pre-processing

Data preprocessing is an indispensable phase in the machine learning pipeline, ensuring that models receive high-quality input for generating reliable predictions. Given the diverse origins and nature of our datasets, meticulous preprocessing became essential to harmonize discrepancies and elevate input data quality. Here is a detailed breakdown of our approach:
1) Feature Selection and Aggregation: Owing to the rich features present in our datasets, we deemed it crucial to focus on the most pertinent ones. We singled out the 'title' and 'text' attributes, as they encompassed the crux of the news articles. By amalgamating these features into one column, we fashioned a coherent text body, streamlining subsequent preprocessing steps.
2) Balancing the Dataset: Within the FakeNewsNet dataset, there was a discernible imbalance between 'real' and 'fake' news entries. To ensure balanced model training, we downsampled the 'real' news entries, ensuring parity with the 'fake' news count.
3) Text Cleaning using Regular Expressions: Initiating the cleaning process, it was pivotal to cleanse the data of extraneous and potentially confounding elements. Utilizing regular expressions, we efficiently eradicated HTML tags, ubiquitous in web-scraped data. Additionally, superfluous punctuation, which might introduce noise, was purged. A uniform text landscape was achieved by converting all text to lowercase, ensuring words like 'The' and 'the' weren't treated distinctly.
4) Tokenization: Post cleaning, tokenization was our next endeavor. This entailed fragmenting the cleaned text into discrete words or tokens. This step is essential, as tokenized text lets algorithms discern patterns and meanings; each token is treated as a discrete entity, ready for model analysis.
5) Term Frequency-Inverse Document Frequency (TF-IDF): While tokenization parsed the text, not all tokens (or words) are of equal relevance, and certain ubiquitous words might not offer unique insights. The TF-IDF technique was thus employed to assign a weight to each token based on its relevance [16]. Essentially, a word recurrent in a specific document but sparse across the corpus garners a higher score, underscoring its importance.
6) Data Splitting: Upon concluding preprocessing, it became imperative to segregate the data into training and testing subsets. This split ensured that our model was trained on a substantial data portion (70%), while retaining an untouched segment (30%) for model validation, assessing overfitting, and gauging real-world efficacy.

Through this stringent preprocessing regimen, our models were equipped with data that was purified, balanced, tokenized, weighted for significance, and judiciously partitioned, laying the groundwork for potent and efficient machine learning. A compact sketch of this pipeline follows.
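The sketch below is a minimal, illustrative rendering of the steps above using scikit-learn; the regular expressions and the `content` column are carried over from the hypothetical merging sketch and are not taken from the paper.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

def clean(text: str) -> str:
    """Strip HTML tags and punctuation, then lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)         # remove HTML tags
    text = re.sub(r"[^a-zA-Z0-9\s]", " ", text)  # remove punctuation
    return text.lower()

texts = dataset1["content"].map(clean)
labels = dataset1["label"]

# TF-IDF weighting (the vectorizer tokenizes internally).
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)

# 70/30 train/test split, as described in step 6.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.30, random_state=42, stratify=labels)
```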
2.3. Proposed classifiers

2.3.1. Machine learning classifiers
In this study, seven ML models were used to detect false news from the ISOT dataset. The models are discussed below.

1- Decision Tree: A Decision Tree (DT) is a supervised ML algorithm that identifies relationships between variables in a dataset to predict outcomes [22]. It uses a top-down, tree-like structure where nodes are either class labels or decision points that determine the outcome. The algorithm is simple to understand and its decision-making process is transparent; however, it may not produce accurate results with small datasets because it is a weak learner. The crucial step in DT learning is choosing the most relevant attribute, which different DT algorithms approach differently, for example using Information Gain (the ID3 algorithm) or Gain Ratio (the C4.5 algorithm). For an attribute with n distinct values and a training dataset $G_t$, the information gain and gain ratio may be determined as follows:

$$\mathrm{Gain}(A; G_t) = \mathrm{Entropy}(G_t) - \sum_{j=1}^{n} \frac{|G_{t_j}|}{|G_t|}\,\mathrm{Entropy}(G_{t_j}) \tag{1}$$

$$\mathrm{GainRatio}(I; G_t) = \frac{\mathrm{Gain}(G_t; I)}{\mathrm{IV}(I)} \tag{2}$$
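To make Eq. (1) concrete, here is a small, self-contained computation of entropy and information gain on a toy binary split (not an example from the paper):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, Entropy(Gt) in Eq. (1)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, partitions):
    """Gain(A; Gt): parent entropy minus the weighted child entropies."""
    n = len(parent)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - weighted

# Toy example: 10 articles, 5 fake / 5 real, split by some attribute A.
parent = ["fake"] * 5 + ["real"] * 5
partitions = [["fake"] * 4 + ["real"],   # branch 1
              ["fake"] + ["real"] * 4]   # branch 2
print(information_gain(parent, partitions))  # ~0.278 bits
```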
maximizing the margin and minimizing the classification error, where $x_i$ represents individual data points, $y_i$ is the label associated with $x_i$, and $\xi_i$ are the slack variables that allow for soft-margin classification, facilitating the handling of non-linearly separable data and avoiding overfitting.

2.3.2. Deep learning classifiers
Convolutional neural networks (CNN) and recurrent neural networks (RNN) are two subfields of DL. Four DL models are proposed in our research: CNN, CNN-GRU, LSTM, and CNN-LSTM.

2.3.2.1. Convolutional neural network. The CNN model is a sequential model consisting of an embedding layer followed by dropout layers, three convolutional layers with different kernel sizes (3, 4, and 5), a global max pooling layer, dense layers, and dropout layers. The embedding layer maps each word in the input sequence to a 128-dimensional vector representation. Dropout layers help prevent overfitting, and the convolutional layers, each with 128 filters, capture local patterns in the input sequence. The global max pooling layer reduces dimensionality, and the dense layers capture complex relationships. The final dense layer, with the number of classes as units and sigmoid activation, produces the classification probabilities.
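This architecture is described in prose only; the following minimal Keras sketch is one plausible realization of it. The vocabulary size, sequence length, dropout rate, and dense width are assumptions rather than values from the paper, and the two-unit sigmoid output mirrors the text even though softmax would be the more conventional choice.

```python
from tensorflow.keras import Sequential, layers

VOCAB_SIZE, EMBED_DIM, MAX_LEN, NUM_CLASSES = 20000, 128, 300, 2  # assumed values

cnn = Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    layers.Dropout(0.2),
    layers.Conv1D(128, 3, activation="relu"),   # three convolutional layers
    layers.Conv1D(128, 4, activation="relu"),   # with kernel sizes 3, 4, and 5,
    layers.Conv1D(128, 5, activation="relu"),   # each with 128 filters
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(NUM_CLASSES, activation="sigmoid"),  # sigmoid output, as described
])
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```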
2.3.2.2. CNN-LSTM. This sequential model begins with an Embedding layer, which maps each word in the input sequence to a dense vector representation. The input dimension is determined by the length of the tokenizer's word index plus one, the output dimension is set to the specified embedding size, and the input length is defined as the maximum sequence length. Convolutional layers are then added to capture local patterns in the input sequence. The first Conv1D layer has 64 filters with a kernel size of 3 and uses the ReLU activation function; max pooling with a pool size of 2 is applied to reduce the dimensionality of the features. The second Conv1D layer has 128 filters with a kernel size of 3 and also uses the ReLU activation function, again followed by max pooling. Next, an LSTM (Long Short-Term Memory) layer is added. The LSTM layer has 64 units and incorporates dropout with a rate of 0.2 to prevent overfitting; it is well-suited to capturing long-range dependencies and sequential patterns in the input sequence. A dense layer with 64 units and the ReLU activation function follows the LSTM layer to capture complex relationships in the data. Finally, a dense layer with 2 units and the sigmoid activation function is added for binary classification, with each unit representing a class (e.g., true or false for fake news detection).
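Reusing the imports and assumed constants from the CNN sketch, the CNN-LSTM description translates roughly into:

```python
cnn_lstm = Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    layers.Conv1D(64, 3, activation="relu"),    # first conv block
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, 3, activation="relu"),   # second conv block
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64, dropout=0.2),               # 64 units, dropout rate 0.2
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="sigmoid"),      # one unit per class
])
cnn_lstm.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=["accuracy"])
```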
2.3.2.3. CNN-GRU. The Gated Recurrent Unit (GRU) is a powerful variation of the standard RNN that incorporates a combined gating mechanism, similar to the LSTM, to address short-term memory limitations. Within the GRU architecture, a set of gates controls and regulates the flow of information, enabling the model to determine which information is crucial to retain or discard [28]. This gating mechanism facilitates effective learning and prediction by selectively passing on important information. In this research, we implement a hybrid model that combines the CNN and GRU models: convolutional layers capture local patterns, and a GRU layer captures temporal dependencies. The model has 64 filters in the first convolutional layer and 128 filters in the second, with max pooling (pool size 2) applied after each convolutional layer. The GRU layer has 64 units and utilizes dropout with a rate of 0.2 to prevent overfitting. The output is flattened before passing through a dense layer with 64 units and ReLU activation. The final dense layer has 2 units, representing the number of classes, and utilizes the sigmoid activation function.
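A corresponding sketch for the CNN-GRU hybrid, under the same assumptions; because the text says the model is flattened before the dense layer, the GRU is assumed to return full sequences:

```python
cnn_gru = Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    layers.Conv1D(64, 3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, 3, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.GRU(64, dropout=0.2, return_sequences=True),  # keep timesteps for Flatten
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="sigmoid"),
])
cnn_gru.compile(optimizer="adam", loss="binary_crossentropy",
                metrics=["accuracy"])
```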
3. Optimization and evaluation metrics justification

For our deep learning models, optimization was a pivotal aspect of ensuring enhanced performance and prediction accuracy. The optimization process was multifaceted and included various steps, as elaborated below.

3.1. Model optimization

1) Hyperparameter Tuning: Using techniques such as grid search and random search, we systematically assessed various hyperparameters like learning rate, dropout rate, and batch size. This empirical approach aided in identifying the optimal hyperparameter set that would boost the model's performance.
2) Regularization: To counter overfitting, we incorporated dropout layers in the models. By randomly setting a fraction of input units to 0 during training, the model became less sensitive to specific weights, fostering better generalization.
3) Learning Rate Scheduling: By adapting the learning rate during training, either reducing it when a plateau was detected or applying a decay rate, we ensured that the model converged faster and more robustly.
4) Early Stopping: We constantly monitored the validation loss during training; if it ceased to improve, training was halted early to avoid overfitting.
Table 1. The machine learning classifiers results (ISOT dataset).

Classifier           ACC     Class  PREC  REC   F-S
Random Forest        83.66%  Fake   88%   40%   55%
                             Real   83%   98%   90%
Gradient Boosting    82.24%  Fake   100%  29%   45%
                             Real   81%   100%  89%
XGBoost              86.89%  Fake   89%   55%   68%
                             Real   87%   98%   92%
Linear SVM           34.15%  Fake   25%   79%   38%
                             Real   73%   19%   30%
RBF SVM              74.89%  Fake   0%    0%    0%
                             Real   75%   100%  86%
Logistic Regression  74.82%  Fake   29%   0%    0%
                             Real   75%   100%  86%
Decision Tree        99.73%  Fake   100%  100%  100%
                             Real   100%  100%  100%

Table 2. The deep learning classifiers results (ISOT dataset).

Classifier        ACC     Class  PREC  REC   F-S
CNN               98.59%  Fake   96%   99%   97%
                          Real   100%  99%   99%
CNN-LSTM          98.03%  Fake   99%   93%   96%
                          Real   98%   100%  99%
CNN-GRU           97.99%  Fake   98%   94%   96%
                          Real   98%   99%   99%
LSTM              99.95%  Fake   100%  100%  100%
                          Real   100%  100%  100%
LSTM AutoEncoder  52%     Fake   100%  52%   69%
                          Real   0%    0%    0%
for comparison being accuracy. Among the models, the top two performers were the Random Forest and XGBoost models. The Random Forest model achieved an accuracy of 83.66%, with a precision of 88% and recall of 40% for the fake class, and a precision of 83% and recall of 98% for the real class. The XGBoost model, on the other hand, achieved an accuracy of 86.89%, with a precision of 89% and recall of 55% for the fake class, and a precision of 87% and recall of 98% for the real class. These results highlight the effectiveness of ensemble-based models, specifically Random Forest and XGBoost, in accurately distinguishing between fake and real news articles. The remaining models achieved varying levels of accuracy: while Gradient Boosting performed relatively well at 82.24% and the Decision Tree reached a near-perfect 99.73%, Logistic Regression, Linear SVM, and RBF SVM showed lower accuracies ranging from 34.15% to 74.89%.
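The per-class precision, recall, F1-score, sensitivity, and specificity figures quoted throughout this section can be reproduced from a confusion matrix. An illustrative scikit-learn snippet, using hypothetical toy predictions rather than the paper's actual outputs:

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # 0 = fake, 1 = real (toy labels)
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print(classification_report(y_true, y_pred, target_names=["Fake", "Real"]))

# Sensitivity and specificity for the binary case:
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # recall of the positive class
specificity = tn / (tn + fp)  # recall of the negative class
print(sensitivity, specificity)
```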
2) Deep Learning Classifiers Results: Table 2 presents the results of the deep models (CNN, CNN-LSTM, and CNN-GRU) in classifying fake and real news articles. The models achieved high accuracy, with the CNN model achieving 98.59%. For the fake class, the CNN model achieved a precision of 96%, indicating that 96% of the articles it flagged as fake were indeed fake. The recall (or sensitivity) for the fake class was 99%, meaning it correctly classified 99% of the actual fake articles. The F1-score, which considers both precision and recall, was 97%, indicating a good balance between the two metrics. The sensitivity and specificity for the fake class were 98.52% and 98.62%, respectively. Similar high performance was observed for the real class, with a precision of 100%, recall of 99%, and F1-score of 99%; the sensitivity and specificity for the real class were 98.62% and 98.52%, respectively. For the CNN-LSTM model, an accuracy of 98.03% was achieved. The precision, recall, and F1-score for the fake class were 99%, 93%, and 96%, respectively, with a sensitivity of 93.39% and a specificity of 99.56%. For the real class, the precision, recall, and F1-score were 98%, 100%, and 99%, respectively, with a sensitivity of 99.56% and a specificity of 93.39%. The CNN-GRU model achieved an accuracy of 97.99%. The precision, recall, and F1-score for the fake class were 98%, 94%, and 96%, respectively, with a sensitivity of 93.45% and a specificity of 99.45%. For the real class, the precision, recall, and F1-score were 98%, 99%, and 99%, respectively, with a sensitivity of 99.94% and a specificity of 93.57%.

Among these three models, the CNN achieved the highest accuracy of 98.59% and exhibited excellent performance in classifying both fake and real news articles. With a precision of 96% and recall of 99% for the fake class, the CNN model demonstrated its ability to accurately identify fake articles while minimizing false negatives. Similarly, for the real class, it achieved a precision of 100% and recall of 99%, indicating its proficiency in correctly classifying real news articles.

4.4. Binary classification using the FakeNewsNet dataset

1) Machine Learning Classifiers Results: Table 3 illustrates the performance of several machine learning classifiers on the FakeNewsNet dataset. Among them, XGBoost outperforms the others with an accuracy of 99.81%, achieving perfect scores in precision, recall, and F-score for both the Fake and Real classes. Following closely, Linear SVM and RBF SVM achieve accuracies of 99.67% and 99.58%, respectively, with perfect scores on most metrics, except for a slight reduction in precision and F-score for the Real class under RBF SVM. Gradient Boosting also exhibits strong performance, with an accuracy of 99.54% and high scores across all metrics, notably achieving perfect F-scores for both classes.
Table 3. The machine learning classifiers results (FakeNewsNet dataset).

Classifier           ACC     Class  PREC  REC   F-S
Random Forest        99.14%  Fake   99%   99%   99%
                             Real   99%   99%   99%
Gradient Boosting    99.54%  Fake   100%  99%   100%
                             Real   99%   100%  100%
XGBoost              99.81%  Fake   100%  100%  100%
                             Real   100%  100%  100%
Linear SVM           99.67%  Fake   100%  100%  100%
                             Real   100%  100%  100%
RBF SVM              99.58%  Fake   100%  100%  100%
                             Real   99%   100%  99%
Logistic Regression  99.04%  Fake   99%   99%   99%
                             Real   99%   99%   99%
Decision Tree        77.84%  Fake   56%   57%   56%
                             Real   85%   85%   85%

Table 4. The deep learning classifiers results (FakeNewsNet dataset).

Classifier        ACC     Class  PREC  REC   F-S
CNN               69.07%  Fake   68%   72%   70%
                          Real   71%   67%   69%
CNN-LSTM          97.99%  Fake   98%   94%   96%
                          Real   98%   99%   99%
CNN-GRU           99.97%  Fake   100%  100%  100%
                          Real   100%  100%  100%
LSTM              93.74%  Fake   100%  100%  100%
                          Real   100%  100%  100%
LSTM AutoEncoder  52%     Fake   100%  52%   69%
                          Real   0%    0%    0%
Conversely, Random Forest and Logistic Regression, while still demonstrating high performance, present the lowest accuracies at 99.14% and 99.04%, respectively; they achieve balanced but slightly lower performance across both classes.

2) Deep Learning Classifiers Results: Table 4 showcases the performance of the deep learning classifiers, presenting a wide range of results. CNN-GRU tops the list with an extraordinary accuracy of 99.97%, achieving perfect precision, recall, and F-scores for both the Fake and Real classes. CNN-LSTM also demonstrates strong performance, with an accuracy of 97.99% and high scores across all metrics, particularly excelling in precision and recall for the Real class. In contrast, the CNN model shows a significantly lower accuracy of 69.07%, with a substantial gap in precision and recall between the Fake and Real classes. LSTM exhibits a notable result with an accuracy of 93.74%, achieving perfect per-class scores yet still falling behind the CNN-GRU model. The LSTM AutoEncoder represents a unique case: it presents an accuracy of only 52%, with significant disparities between classes. While the model performs well for the Fake class in precision and recall, it fails completely for the Real class, showing 0% on all three metrics. This range of results illustrates the capabilities and limitations of the various deep learning architectures in this specific context. The superiority of CNN-GRU underscores the effectiveness of combining convolutional layers with recurrent gates for this task. At the same time, the dramatic difference in the LSTM AutoEncoder's performance between classes reveals potential challenges in model tuning or data representation.

4.5. Results using dataset1

1) Machine Learning Classifiers Results: Our investigation into multiclass classification, as shown in Table 5, utilizing dataset1, which consists of three classes, fake (0), true (1), and half-true (2), has yielded insightful results. Among the machine learning algorithms applied, Random Forest achieved an accuracy of 93.28%, with a precision, recall, and F1-score of 0.93, 0.94, and 0.93 for class 0 and broadly comparable values for classes 1 and 2. The Gradient Boosting model exhibited a significant improvement, with an accuracy of 98.97%, demonstrating near-perfect precision and recall for classes 0 and 1 while achieving 0.93 precision and 0.97 recall for class 2. The Logistic Regression model achieved an even higher accuracy of 98.99%, with almost uniform precision, recall, and F1-scores across all three classes. XGBoost emerged as the top performer with an impressive accuracy of 99.56%, reaching perfection in precision and recall for classes 0 and 1 and a 0.98 score for class 2. The Decision Tree model closely followed, with an accuracy of 99.27% and a similar distribution of precision, recall, and F1-scores. Finally, the SVM with RBF kernel and the Linear SVM had accuracies of 92.99% and 92.22%, respectively, with comparable performance across the three classes, although slightly lower than the other models. The high accuracy, precision, and recall scores across most models, particularly XGBoost, suggest a successful implementation of the multiclass classification system. The results highlight the efficacy of complex ensemble methods like Gradient Boosting and XGBoost, demonstrating their superior capability in handling nuanced classifications such as discerning outright falsehoods, absolute truths, and the shades of misinformation in between. However, simpler models like Random Forest and SVM also performed commendably, showcasing the robustness and versatility of the approach.
Table 5. The machine learning classifiers results using dataset1.

Classifier           ACC     Class      PREC  REC   F-S
Random Forest        93.28%  Fake       93%   94%   93%
                             Real       94%   93%   93%
                             half-true  95%   88%   91%
Gradient Boosting    98.97%  Fake       99%   98%   99%
                             Real       99%   100%  100%
                             half-true  93%   97%   95%
XGBoost              99.56%  Fake       100%  99%   100%
                             Real       100%  100%  100%
                             half-true  98%   98%   98%
Linear SVM           92.22%  Fake       92%   93%   92%
                             Real       92%   92%   92%
                             half-true  96%   89%   92%
RBF SVM              92.99%  Fake       92%   94%   93%
                             Real       94%   92%   93%
                             half-true  94%   86%   90%
Logistic Regression  98.99%  Fake       99%   99%   99%
                             Real       99%   99%   99%
                             half-true  99%   95%   97%
Decision Tree        99.27%  Fake       99%   99%   99%
                             Real       100%  100%  100%
                             half-true  97%   96%   96%

Table 6. The deep learning classifiers results using dataset1.

Classifier        ACC     Class      PREC  REC   F-S
CNN               59.34%  Fake       45%   62%   52%
                          Real       72%   56%   63%
                          half-true  86%   73%   79%
CNN-LSTM          99.90%  Fake       100%  100%  100%
                          Real       100%  100%  100%
                          half-true  99%   99%   99%
CNN-GRU           99.96%  Fake       100%  100%  100%
                          Real       100%  100%  100%
                          half-true  100%  99%   100%
LSTM              99.90%  Fake       100%  100%  100%
                          Real       100%  100%  100%
                          half-true  100%  100%  100%
LSTM AutoEncoder  49%     Fake       100%  49%   66%
                          Real       0%    0%    0%
                          half-true  0%    0%    0%

Table 7. The machine learning classifiers results using dataset2.

Classifier           ACC     Class        PREC  REC   F-S
Random Forest        88.25%  Fake         90%   94%   92%
                             Real         94%   91%   92%
                             half-true    54%   59%   56%
                             barely-true  45%   30%   36%
Gradient Boosting    94.89%  Fake         100%  98%   99%
                             Real         99%   100%  100%
                             half-true    55%   76%   64%
                             barely-true  49%   29%   36%
XGBoost              95.19%  Fake         99%   99%   99%
                             Real         100%  100%  100%
                             half-true    57%   64%   60%
                             barely-true  48%   42%   45%
Linear SVM           91.08%  Fake         93%   96%   95%
                             Real         96%   95%   96%
                             half-true    55%   65%   59%
                             barely-true  45%   26%   33%
RBF SVM              89.26%  Fake         91%   95%   93%
                             Real         94%   92%   93%
                             half-true    56%   63%   59%
                             barely-true  47%   30%   36%
Logistic Regression  94.91%  Fake         99%   99%   99%
                             Real         99%   99%   99%
                             half-true    60%   70%   64%
                             barely-true  52%   37%   43%
Decision Tree        94.87%  Fake         99%   99%   99%
                             Real         100%  100%  100%
                             half-true    55%   60%   57%
                             barely-true  47%   43%   45%

2) Deep Learning Classifiers Results: Our exploration of deep learning techniques for the multiclass classification of fake news detection, with classes fake (0), true (1), and half-true (2), revealed remarkable findings. Among the models analyzed, the LSTM demonstrated extraordinary proficiency
with an accuracy of 99.90%, achieving perfect precision, recall, and F1-scores across all three classes. The CNN-LSTM model also yielded an accuracy of 99.90%, with perfect precision, recall, and F1-scores for classes 0 and 1 and near-perfect scores for class 2. The CNN-GRU model topped the performance with an outstanding accuracy of 99.96%, reaching a perfect 1.00 on nearly all metrics, save for a 0.99 recall for class 2. In stark contrast, the standalone CNN model lagged behind significantly, with an accuracy of only 59.34%, exhibiting a marked discrepancy in performance across classes: the precision ranged from 0.45 to 0.86, and the recall varied from 0.56 to 0.73.

4.6. Results using dataset2

1) Machine Learning Classifiers Results: In our comprehensive evaluation using dataset2, as shown in Table 7, the introduction of the "barely-true" class added a nuanced layer to the fake news detection task, generating varied performances across the models. Random Forest reported an accuracy of 88.25%, displaying strong results for classes 0 and 1 but a considerable decline in precision and recall for the newly introduced classes 2 and 3. Gradient Boosting marked a significant increase in accuracy to 94.89%, with near-perfect scores for the first two classes, while the results for the half-true and barely-true categories remained less optimal. Similarly, Logistic Regression achieved an accuracy of 94.91%, with a slight improvement in precision and recall for classes 2 and 3. The XGBoost model further advanced the accuracy to 95.19%, maintaining outstanding performance for classes 0 and 1, with moderate results in the new categories. The Decision Tree model closely followed with an accuracy of 94.87%. RBF SVM and Linear SVM reported accuracies of 89.26% and 91.08%, respectively, again showing strong results for classes 0 and 1, with a more varied performance for classes 2 and 3.
Table 8. The deep learning classifiers results using dataset2.

Classifier        ACC     Class        PREC  REC   F-S
CNN               61.88%  Fake         64%   65%   64%
                          Real         62%   64%   63%
                          half-true    94%   44%   60%
                          barely-true  0%    100%  0%
CNN-LSTM          96.6%   Fake         100%  100%  100%
                          Real         100%  100%  100%
                          half-true    97%   59%   73%
                          barely-true  16%   68%   26%
CNN-GRU           95.77%  Fake         100%  100%  100%
                          Real         100%  100%  100%
                          half-true    100%  56%   71%
                          barely-true  0.1%  83%   0.2%
LSTM              98.64%  Fake         100%  100%  100%
                          Real         100%  100%  100%
                          half-true    93%   83%   88%
                          barely-true  77%   92%   84%
LSTM AutoEncoder  62%     Fake         64%   65%   64%
                          Real         62%   64%   63%
                          half-true    94%   44%   60%
                          barely-true  0%    100%  0%

Table 9. Accuracy of machine learning classifiers across datasets.

Classifier           ISOT    FakeNewsNet  Dataset1  Dataset2
Random Forest        83.66%  99.14%       93.28%    88.25%
Gradient Boosting    82.24%  99.54%       98.97%    94.89%
XGBoost              86.89%  99.81%       99.56%    95.19%
Linear SVM           34.15%  99.67%       92.22%    91.08%
RBF SVM              74.89%  99.58%       92.99%    89.26%
Logistic Regression  74.82%  99.04%       98.99%    91.75%
Decision Tree        99.73%  77.84%       99.27%    89.73%
The consistent excellence in identifying outright fake and true news contrasts with the relative challenge of classifying the more nuanced categories of "half-true" and "barely-true." These results emphasize the complexity of distinguishing finer shades of truth, where models like XGBoost and Gradient Boosting outperformed the others, while also illustrating the limitations in capturing the subtleties of misinformation.

2) Deep Learning Classifiers Results: The investigation of deep learning algorithms on dataset2, as shown in Table 8, unveiled unique insights into fake news detection in a multiclass setting. The LSTM model demonstrated strong accuracy at 98.64%, achieving perfect scores for classes 0 and 1 but a noticeable drop in precision and recall for classes 2 and 3. Similarly, the CNN-LSTM model reached an accuracy of 96.6%, with flawless scores for the fake and true classes but a significant decline in the half-true category and an unexpected result in the barely-true class. The CNN model's performance was distinctly modest, with an accuracy of 61.88%, reflecting more balanced yet lower precision and recall across the four classes and an anomalous result for the barely-true class, which had only one support instance. Lastly, the CNN-GRU model reported an accuracy of 95.77%, again showcasing impeccable results for classes 0 and 1, with a sharp reduction in performance for the newly introduced classes. The disparate outcomes of these models indicate the varying capabilities of deep learning techniques in discerning between fake, true, half-true, and barely-true news. While the LSTM and CNN-GRU models showed a more consistent and robust performance, the CNN-LSTM and CNN models seemed to struggle with the more nuanced classes, especially the barely-true category.

4.7. Comparing results

As illustrated in Table 9, for the ISOT dataset the Random Forest classifier achieved an accuracy of 83.66%; when tested on the FakeNewsNet dataset, its performance jumped remarkably to 99.14%. On Dataset1 the accuracy was 93.28%, and on Dataset2 it was 88.25%. Gradient Boosting delivered an accuracy of 82.24% on the ISOT dataset. Its performance was exceptional on the FakeNewsNet dataset, with an accuracy of 99.54%; on Dataset1 and Dataset2 the accuracies were 98.97% and 94.89%, respectively. XGBoost stood out particularly on the FakeNewsNet dataset, achieving an outstanding accuracy of 99.81%. On the ISOT dataset it scored 86.89%; for Dataset1 the accuracy was 99.56%, and for Dataset2 it was 95.19%. Linear SVM had a notable drop in performance on the ISOT dataset, where it managed only 34.15% accuracy; however, its accuracy on the FakeNewsNet dataset was 99.67%. On Dataset1 it scored 92.22%, and on Dataset2 the accuracy was 91.08%. For the ISOT dataset, the RBF SVM achieved 74.89% accuracy; on the FakeNewsNet dataset its accuracy was 99.58%, and it achieved 92.99% on Dataset1 and 89.26% on Dataset2. On the ISOT dataset, Logistic Regression recorded an accuracy of 74.82%. It performed well on the FakeNewsNet dataset with an accuracy of 99.04%; on Dataset1 its accuracy was nearly perfect at 98.99%, while on Dataset2 it scored 91.75%. The Decision Tree classifier showed a stellar performance on the ISOT dataset, with an almost perfect accuracy of 99.73%; however, its performance dipped markedly on the FakeNewsNet dataset, securing an accuracy of 77.84%. For Dataset1 the accuracy was 99.27%, and on Dataset2 it was 89.73%. Most of the classifiers demonstrated exceptionally high accuracy rates on FakeNewsNet and Dataset1, with XGBoost emerging as the most consistent performer across all datasets. Linear SVM's performance was notably variable, achieving high accuracy on some datasets while lagging significantly on others, particularly ISOT. Decision Tree's almost perfect accuracy on the ISOT dataset was another standout result.
Table 10. Accuracy of deep learning models on different datasets.

Model             ISOT    FakeNewsNet  Dataset1  Dataset2
CNN               98.59%  69.07%       59.34%    61.88%
CNN-LSTM          98.03%  97.99%       99.90%    96.6%
CNN-GRU           97.99%  99.97%       99.96%    95.77%
LSTM              99.95%  93.74%       99.90%    98.64%
LSTM AutoEncoder  52%     52%          49%       62%
Table 10 presents a comparative analysis of the performance of the five deep learning models across the four datasets. When evaluating the ISOT dataset, the LSTM model exhibited the highest accuracy at 99.95%, closely followed by the CNN with 98.59%. Interestingly, the FakeNewsNet dataset saw almost impeccable performance from the CNN-GRU model at 99.97%, and the CNN-LSTM was not far behind with 97.99%. For Dataset1, both the CNN-LSTM and LSTM achieved near-perfect scores, each registering an accuracy of 99.90%. Meanwhile, on Dataset2 the LSTM led the pack with an accuracy of 98.64%. A notable outlier in these results was the LSTM AutoEncoder, which consistently lagged behind its counterparts, with its performance ranging between 49% and 62% across the datasets. This comprehensive assessment reveals significant disparities in model efficiency depending on the dataset used, emphasizing the importance of selecting a model based on the nature of the data at hand.

Upon close observation of the data from Table 9, it is evident that there are substantial differences in accuracy rates across classifiers and datasets. One of the most surprising results is the performance of the Decision Tree classifier on the ISOT dataset, with an astonishing accuracy of 99.73%. Given that Decision Trees tend to overfit when presented with many features, it is likely that the ISOT dataset has distinct features that the Decision Tree could efficiently leverage, or the result could be a sign of overfitting. In contrast, its performance dipped significantly on the FakeNewsNet dataset. This suggests that the data in FakeNewsNet is more complex, with possibly overlapping features, which usually challenges Decision Trees.

XGBoost, a gradient boosting algorithm, maintains consistently high accuracy across all datasets. Its efficiency is attributed to its ability to combine multiple decision trees and iteratively correct the errors of predecessor trees. Furthermore, the regularization parameters in XGBoost play a pivotal role in preventing overfitting, making it versatile across different datasets.

The Linear SVM has a drastic performance dip on the ISOT dataset. SVMs, especially linear ones, rely on the data being linearly separable; the ISOT dataset might have a more convoluted decision boundary that a linear hyperplane cannot efficiently dissect.

In Table 10, the deep learning models present intriguing results. The CNN, although often powerful, underperforms on FakeNewsNet, Dataset1, and Dataset2. This suggests that these datasets might contain long-range dependencies and sequential patterns that simple CNNs cannot capture effectively. The CNN-LSTM and CNN-GRU hybrids show robust performance, pointing to the significance of combining the feature extraction capabilities of CNNs with the temporal pattern detection strengths of RNNs like LSTM and GRU. The LSTM, with its memory cells, has an exceptional ability to remember past information, which might be the reason for its consistently high accuracy. The LSTM AutoEncoder's mediocre performance suggests that its compression and decompression mechanism might be losing vital features necessary for accurate classification on these specific datasets.
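The LSTM AutoEncoder's architecture is not given in the captured text; the compress-and-reconstruct mechanism referred to here can be sketched, very roughly and under the same Keras assumptions as the earlier sketches, as follows. If the 64-dimensional bottleneck discards class-relevant information, classification accuracy collapses, which is consistent with the figures in Tables 2, 4, 6, and 8.

```python
lstm_autoencoder = Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    layers.LSTM(64),                          # encoder: compress to a 64-d code
    layers.RepeatVector(MAX_LEN),             # broadcast the code over timesteps
    layers.LSTM(64, return_sequences=True),   # decoder: reconstruct the sequence
    layers.Flatten(),
    layers.Dense(2, activation="sigmoid"),    # classification head
])
```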
In summary, the performance variations across the different models accentuate the importance of understanding dataset intricacies and tailoring models that can best capture the underlying patterns in the data.

4.8. Comparative analysis of benchmark results

Table 11. Comparison of accuracy results from various studies.

Reference   Best Model/Dataset              Achieved Accuracy
[10]        Random Forest/ISOT              98%
[11]        LSTM/Combined Corpus            94%
[13]        SVM                             95.05%
[3]         Bi-LSTM Dense/LIAR (Statement)  89%
[14]        SVM, RF (with 10-fold CV)       83.5%
[15]        CapsNet/LIAR                    64%
This Study  CNN-GRU/Dataset1                99.96%

The tabulated benchmarks in Table 11 offer a holistic perspective on the performance of various methodologies in the realm of fake news detection. [10] utilized an ensemble methodology, highlighting the strength of combining various base learners, and achieved an impressive 98% accuracy on the ISOT dataset with a Random Forest classifier. On another front, LSTM's prowess in handling sequential data is evident in [11], where it outperforms other models on a combined corpus. However, the innovative use of CapsNet in [15], which encapsulates hierarchical relationships in data, showcased a modest result, underscoring the challenges of the LIAR dataset or potential model-data misalignments. The novel stacking approach of [16], combining both ML and DL techniques, demonstrated its versatility with high accuracy rates on both the ISOT and KDnugget datasets. In summary, while many methods
offer promising results, the choice of dataset and its inherent features play a pivotal role in determining the efficacy of a model, making it imperative to consider both the model and the data intricacies when benchmarking and devising new strategies.

5. Conclusion and future work

... evolve our defense mechanisms in tandem, ensuring that truth and authenticity remain unassailable.

Acknowledgement

The authors would like to thank Al_raqia for moral support.
References

12. [...] Human-Centric Smart Environments for Health and Well-being (IHSH). IEEE, 2021, pp. 125-130.
13. J. Shaikh and R. Patil, "Fake news detection using machine learning," in 2020 IEEE International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC). IEEE, 2020, pp. 1-5.
14. W. E. Noori and A. S. Albahri, "Towards trustworthy myopia detection: Integration methodology of deep learning approach, XAI visualization, and user interface system," Applied Data Science and Analysis, pp. 1-15, 2023. doi: [Link]
15. A. M. Braşoveanu and R. Andonie, "Integrating machine learning techniques in semantic fake news detection," Neural Processing Letters, vol. 53, no. 5, pp. 3055-3072, 2021.
16. T. Jiang, J. P. Li, A. U. Haq, A. Saboor, and A. Ali, "A novel stacking approach for accurate detection of fake news," IEEE Access, vol. 9, pp. 22626-22639, 2021.
17. F. A. Ozbay and B. Alatas, "Fake news detection within online social media using supervised artificial intelligence algorithms," Physica A: Statistical Mechanics and its Applications, vol. 540, p. 123174, 2020.
18. K. Shu, D. Mahudeswaran, S. Wang, D. Lee, and H. Liu, "FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media," arXiv preprint arXiv:1809.01286, 2018.
19. K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake news detection on social media: A data mining perspective," ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, pp. 22-36, 2017.
20. A. A. Nafea, S. A. Alameri, R. R. Majeed, M. A. Khalaf, and M. M. AL-Ani, "A short review on supervised machine learning and deep learning techniques in computer vision," Babylonian Journal of Machine Learning, pp. 48-55, 2024. doi: [Link]
21. T. Alhindi, S. Petridis, and S. Muresan, "Where is your evidence: Improving fact-checking by justification modeling," in Proceedings of the First Workshop on Fact Extraction and VERification (FEVER), 2018, pp. 85-90.
22. X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, S. Y. Philip, et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, vol. 14, no. 1, pp. 1-37, 2008.
23. T. K. Ho, "Random decision forests," in Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1. IEEE, 1995, pp. 278-282.
24. J. H. Friedman, "Greedy function approximation: A gradient boosting machine," Annals of Statistics, pp. 1189-1232, 2001.
25. D. R. Cox, "The regression analysis of binary sequences," Journal of the Royal Statistical Society: Series B (Methodological), vol. 20, no. 2, pp. 215-232, 1958.
26. T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). New York, NY, USA: ACM, 2016, pp. 785-794. [Online]. Available: [Link]
27. C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
28. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.