Fake News Spotting Using an Interrelated Feature Selection Model with Logistic Regression
Abstract— This study aids in evaluating the validity of fake news by employing several classification strategies. Fake news has a major effect on our social lives, specifically in the fields of politics and education. By developing a victimization-based fake news detection model with a separate approach to categorization, a solution to the problems caused by fake news is proposed. When resources such as fake news detection datasets are taken into account, things get tricky; datasets are one form of scarce resource. "Fake news," or news reports that are erroneous and come from questionable sources, can be identified with the use of applications developed using Natural Language Processing (NLP) techniques, which is what this research aims to do. Using a Term Frequency-Inverse Document Frequency (TF-IDF) matrix, which weights words based on how often they appear across the articles in the dataset, is the starting point for constructing the model. Logistic regression, the Naïve Bayes classifier, and the Passive Aggressive Classifier were all employed in the classification processes of this model. The model's feature extraction employs approaches such as TF-IDF, and logistic regression achieves an accuracy of 97%.

Keywords- Fake News; Natural Language Processing; Term Frequency; Machine Learning; Feature Extraction

I. INTRODUCTION
Over the past several years, social media has become ubiquitous.[1] The dissemination of false information is rampant on social media.[12] False information poses a threat to business, government, education, democracy, and the economy.[3] Although fake news has always been an issue, the proliferation of social media has made it more likely that people will believe it and disseminate it further.[4] It is becoming increasingly difficult to tell true from misleading information, which leads to confusion and problems. Recognizing fake news by hand is a challenging task that can only be accomplished by someone with exceptional knowledge of the news industry. People are increasingly likely to seek out and consume information from social media rather than conventional news organisations as more and more of their time is spent communicating online through these platforms.[5] The characteristics of social media sites like Facebook, Twitter, and Instagram explain why people are shifting to reading news on them rather than traditional media like newspapers and television: it is easier to share, discuss, and debate news stories with friends and other readers on these sites, and it is often more timely and less expensive to consume information there than through traditional media.[6] For instance, 62 percent of American adults relied on social media for news in 2016, up from 49 percent in 2012 [7]. It was also found that social media has surpassed television as the primary source of news.[8] Technology advancements have made it simpler to fabricate and disseminate fake information, and it is now much more difficult to determine whether or not any given piece of information is real. Products and businesses can be affected by the spread of false information,[9] and one's political career is not immune to the damaging effects of fake news.[10] According to Jency Jacob, managing director of the Mumbai-based fact-checking website BOOM, which collaborates with Facebook to examine stories and tag posts spreading misinformation on the platform, "2019 has been a unique year where truth checkers consistently kept shifting from one match to the other," making it the busiest year for them so far.[11] In this study, the Naïve Bayes classifier, logistic regression, the Passive Aggressive Classifier, and the Support Vector Machine (SVM) are compared for supervised classification.[12] In this work, a dataset is employed that combines real and fake data, with successful outcomes.

II. LITERATURE SURVEY
Conventional methods of identifying false information allow for only two possible values (Real or Fake), whereas in practice it is often impossible to determine with absolute certainty whether a piece of data is genuine, so it must instead be evaluated on a scale of confidence. The author saw this as a crucial consideration for organising data in social media.[13] A hybrid strategy employs both bag-of-words and n-gram approaches to represent data.[14] For the purpose of identifying harmful websites, a detection method based on static items was suggested; it was previously believed that IP addresses were externally focused. Rather than focusing on the
expressions alone, the aim is, to put it another way, to explain how unusual a phrase is. One method for determining which phrases are most significant in a document is Term Frequency-Inverse Document Frequency (TF-IDF). The frequency with which a phrase appears in a document raises its relevance, but the frequency with which the phrase appears across the corpus helps to mitigate this effect; words with high TF-IDF values are crucial to the text. The author started off by performing some preliminary processing on the dataset, removing any superfluous or irrelevant words or characters. After that, the author moved on to TF-IDF for the next phase of the process: feature extraction. The author looked into the Support Vector Machine (SVM), Naïve Bayes, Passive Aggressive Classifier, and logistic regression techniques. These classifiers were implemented using Python's Natural Language Toolkit (NLTK). The dataset was divided into a training portion and a testing portion: 80% of the data for training and 20% for testing. Here the Logistic Regression model is used for training the data.
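The pipeline described above can be illustrated with a short sketch. This is a minimal reconstruction rather than the authors' published code: it assumes pandas and scikit-learn, and the file name train.csv and column names follow the dataset description given later in this paper.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative: load the news dataset (columns: id, title, author, text, label).
df = pd.read_csv("train.csv")
df["text"] = df["text"].fillna("")

# TF-IDF weights a term by its in-document frequency, discounted by how
# common the term is across the corpus; stop-word removal stands in for
# the "superfluous or irrelevant words" pre-processing step.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(df["text"])
y = df["label"]

# 80% of the data for training, 20% for testing, as in the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)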
Fig. 1. Framework for detecting fake news: news articles dataset → pre-processing steps → feature abstracting → divide dataset into train and test → classifier training → parametric validation.

III. PROPOSED MODEL
A. Logistic Regression:
A bag-of-words set is selected, and the reviews are compared against the bag of words to identify the true positives and false positives. The accuracy and recall metrics are then calculated to assess the model's performance levels. The diagram presented is the Sigmoid Function, also referred to as the Logit; it transforms probabilities into binary values, which can likewise be utilised in making forecasts. If the probability value falls below 0.5, the sample belongs to Class 0, and if it is larger than 0.5, it belongs to Class 1. The proposed model strictly considers and checks the correlation of features in the dataset, and fake news is detected using the correlation factor. This process helps in achieving high accuracy levels in fake news detection.
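For reference, the sigmoid (logit) function maps the linear score z = w^T x + b produced by logistic regression to a probability in (0, 1), which is then thresholded at 0.5:

\sigma(z) = \frac{1}{1 + e^{-z}},
\qquad
\hat{y} =
\begin{cases}
0 \;(\text{Class } 0) & \text{if } \sigma(z) < 0.5 \\
1 \;(\text{Class } 1) & \text{if } \sigma(z) \ge 0.5
\end{cases}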
IV. RESULTS
The accuracy of 97% obtained by logistic regression is very high. The accuracy score on the training data is 98%, while on the testing data it is 97%. The following code snippet depicts the accuracy scores for both the training data and the testing data.
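In the published version this snippet appears as an image; a minimal sketch of what it computes, reusing model, X_train, X_test, y_train, and y_test from the earlier sketch, would be:

from sklearn.metrics import accuracy_score

# Accuracy on the data the model was fitted on (reported as 98%).
print("Training accuracy:", accuracy_score(y_train, model.predict(X_train)))

# Accuracy on the held-out 20% split (reported as 97%).
print("Testing accuracy:", accuracy_score(y_test, model.predict(X_test)))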
Making a predictive system: For making predictions, the author used a logistic regression classifier, which is a binary classification model. If the model's prediction is 0, the news is real; otherwise, the news is considered fake.
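A minimal sketch of such a predictive system, again assuming the fitted vectorizer and model from the earlier sketches; the input article is illustrative, and the 0 = real / 1 = fake convention is the paper's:

# Vectorize one unseen article with the already-fitted TF-IDF vectorizer.
sample = vectorizer.transform(["Breaking: example article text to verify..."])

# Logistic regression returns the binary class label directly.
prediction = model.predict(sample)[0]
print("REAL" if prediction == 0 else "FAKE")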
Fig. 2. Predictive system.

Fig. 3. Feature extraction accuracy level.

Feature extraction time level: The model is trained using logistic regression. The dataset used consists of 10,800 news articles, from which 5 features were extracted: id, title, author, text, and label. The feature extraction time level is low when compared to the existing model, and the feature extraction time level graph is represented in Fig. 4.

Fig. 4. Feature extraction time level.

Feature selection time level: Two features are selected in our training model; the features selected for training the model are id and title. The time required for feature selection is less when compared to the existing model. The feature selection time level graph is represented in Fig. 5.

Fig. 5. Feature selection time level.
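The feature handling described above can be sketched as follows; this assumes the DataFrame df from the first sketch, and concatenating id and title into a single text field is an illustrative choice, since the paper does not show how the two selected features are combined:

# The five features extracted from the dataset.
extracted = df[["id", "title", "author", "text", "label"]]

# Feature selection keeps only id and title for training.
selected = extracted["id"].astype(str) + " " + extracted["title"].fillna("")
X_selected = vectorizer.fit_transform(selected)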
V. CONCLUSION
The proposed model uses binary representations for prediction: if the prediction is 0 the news is REAL, and if the prediction is 1 the news is FAKE. In future work, hybrid classifiers can be implemented to reduce complexity and enhance the accuracy rate. Feature dimensionality reduction can also be applied to the model to reduce its time complexity.
REFERENCES:
International Journal of Safety and Security Engineering, Vol. 11, No. 2, 2021, pp. 201–205.
[15]. V. Pavani, M. N. Swetha, Y. Prasanthi, K. Kavya and M. Pavithra, "Drowsy Driver Monitoring Using Machine Learning and Visible Actions," 2022 International Conference on Electronics and Renewable Systems (ICEARS), 2022, pp. 1269-1279, doi: 10.1109/ICEARS53579.2022.9751890.
[16]. V. Pavani, N. M. Pujitha, P. V. Vaishnavi, K. Neha and D. S. Sahithi, "Feature Extraction based Online Job Portal," 2022 International Conference on Electronics and Renewable Systems (ICEARS), 2022, pp. 1676-1683, doi: 10.1109/ICEARS53579.2022.9752295.
[17]. V. Pavani, S. Sri. K, S. Krishna. P and V. L. Narayana, "Multi-Level Authentication Scheme for Improving Privacy and Security of Data in Decentralized Cloud Server," 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), 2021, pp. 391-394, doi: 10.1109/ICOSEC51865.2021.9591698.
[18]. Shaik, Sharmila, P. Sudhakar, and Shaik Khaja Mohiddin, "A Novel Framework for Image Inpainting," International Journal of Computer Trends and Technology (IJCTT), vol. 14, pp. 141-147.
[19]. Sharmila, Shaik, and Ch. Aparna, "VMSSS: A Proposed Model for Cloud Forensic in Cloud Computing Using VM Snapshot Server," Soft Computing for Problem Solving, Springer, Singapore, 2019, pp. 483-493.