KALPANADEVI G
21CSR074
KARTHICK P
21CSR078
INDRA B
21CSL259
in partial fulfillment of the requirements
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
NOVEMBER 2023
BONAFIDE CERTIFICATE
This is to certify that the Project report entitled FAKE NEWS DETECTION USING MACHINE
LEARNING is the bonafide record of project work done by KALPANADEVI G (21CSR074),
KARTHICK P (21CSR078), and INDRA B (21CSL259) in partial fulfillment of the requirements
for the award of the Degree of Bachelor of Engineering in Computer Science and Engineering of
Anna University, Chennai during the year 2022 - 2023.
Date :
DECLARATION
We affirm that the Project Report titled FAKE NEWS DETECTION USING MACHINE
LEARNING being submitted in partial fulfillment of the requirements for the award of Bachelor
of Engineering is the original work carried out by us. It has not formed part of any other project
report or dissertation on the basis of which a degree or award was conferred on an earlier occasion.
Date :
KALPANADEVI G
(Reg. No.:21CSR074)
KARTHICK P
(Reg. No.:21CSR078)
INDRA B
(Reg. No.:21CSL259)
I certify that the declaration made by the above candidates is true to the best of my knowledge.
ABSTRACT
Fake news is false or misleading information presented as news, often with the aim of damaging
a person's reputation or making money through advertising revenue. The constant circulation of
fake news directly or indirectly produces a huge negative impact on the vast majority of society.
Readers, however, have difficulty sensing highly ambiguous fake news, which can be detected
only after identifying its meaning and the latest related information. Fake news detection involves
analysing various data types, such as textual or media content and social context. Machine
learning plays a crucial role in fake news detection and in analysing large amounts of
misinformation data. ML is a data-driven approach that learns from labelled data to make
predictions. Machine learning algorithms discover patterns and relationships in input data, enabling
the algorithm to make predictions or classifications. Supervised learning involves training a model
on a labelled dataset containing examples of real and fake news articles. We have used Logistic
Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB)
techniques to classify fake news. This model lets us measure the accuracy of fake news detection
using different classification techniques. The results obtained show that fake news with textual
content can indeed be classified. Finally, we obtained an accuracy of around 99%.
ACKNOWLEDGEMENT
We express our sincere gratitude to [Link], [Link], M.B.A., LLB, our beloved Correspondent,
and all other philanthropic trust members of Kongu Vellalar Institute of Technology Trust, who
have always encouraged us in our academic pursuits.
We are extremely thankful, with no words of formal nature, to the dynamic Principal Dr.
V. BALUSAMY, [Link]., Ph.D., for providing the necessary facilities to complete our work.
We thank the Project Coordinator, Department of Computer Science and Engineering, for her
encouragement and valuable advice throughout the project, and our guide, Professor (SRG),
Department of Computer Science and Engineering, for her valuable ideas and
suggestions, which have been very helpful in the project. We are grateful to all the faculty members
of the department for their support.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 MOTIVATION OF THE PROJECT
2 LITERATURE REVIEW
3 SYSTEM REQUIREMENTS
3.3.1 Python
3.3.2 OpenCV
3.3.3 Matplotlib
3.3.4 TensorFlow
4 PROPOSED SYSTEM
APPENDIX 1 CODING
APPENDIX 2 SCREENSHOT
REFERENCES
LIST OF FIGURES
Figure 4.2 Fake News Detection Architecture
LIST OF ABBREVIATIONS
ML : Machine Learning
LR : Logistic Regression
SVM : Support Vector Machine
RF : Random Forest
NB : Naïve Bayes
NLP : Natural Language Processing
TF-IDF : Term Frequency-Inverse Document Frequency
CHAPTER 1
INTRODUCTION
Fake news refers to false or misleading information presented as news with the intent to
generate revenue through advertising clicks. The rise of fake news in the digital age, amplified by
social media and other platforms, has had widespread negative consequences, leading to
misinformation on important topics and influencing public opinion and behavior. Detecting fake
news is a challenging task as it requires distinguishing between factual and deceptive content, which
often involves subtle differences in text, meaning, and the context in which the news is shared. Fake
news detection involves analyzing large datasets containing a variety of data types such as text,
images, and even social context, to determine the credibility of news sources. Machine learning (ML)
has emerged as a powerful tool for detecting fake news. Through supervised learning, ML models
are trained on labeled datasets consisting of real and fake news articles. These models use algorithms
to identify patterns in the data that help distinguish between genuine and false news. Several
classification techniques are employed to achieve this, including Logistic Regression (LR), Support
Vector Machines (SVM), Random Forest (RF), and Naïve Bayes (NB). These algorithms allow the
system to analyze the news content and predict whether a news article is real or fake. In this project,
we aim to classify fake news using different machine learning models. By leveraging these
techniques, we achieved a high accuracy of around 99%, showcasing the effectiveness of machine
learning in combating misinformation.
MOTIVATION OF THE PROJECT
The increasing prevalence of fake news, particularly in the digital age, has caused significant
societal, political, and economic harm. Misinformation spreads rapidly on social media platforms, often
going viral before fact-checkers or legitimate sources can intervene. This not only misleads individuals
but also erodes public trust in reliable news sources. The motivation for this project stems from the
urgent need to address the growing threat of fake news and its impact on society. One of the main drivers
behind this project is the difficulty that people face in distinguishing fake news from real news,
especially when fake news is crafted to look authentic. Traditional methods of verifying information
manually are often time-consuming and insufficient given the sheer volume of content generated online.
Automated detection systems powered by machine learning offer a scalable solution that can analyze
large datasets and classify news articles quickly and accurately. This project also aims to highlight the
role machine learning plays in detecting fake news. By applying machine learning models to this
problem, we can explore how data-driven algorithms can be used to identify patterns, relationships, and
key indicators that separate real news from fake news. This can empower users and platforms with tools
to automatically flag or filter out misleading content. Finally, the success of this project has a broader
societal impact. Detecting and reducing the spread of fake news can contribute to a more informed
public, reduce the negative influence of misinformation on elections, public health, and other critical
areas, and restore trust in credible journalism. By achieving high accuracy with multiple machine
learning models, this project demonstrates that automated detection is a practical defence against
misinformation.
The objectives of this project are:
• To design and implement a system capable of automatically identifying and classifying news
articles as fake or real, thus reducing the manual effort required to fact-check information.
• To apply various machine learning algorithms, such as Logistic Regression, Support Vector
Machines (SVM), Random Forest, and Naive Bayes, to build robust classifiers for detecting fake news.
• To optimize the detection process by using advanced feature extraction techniques like Term
Frequency-Inverse Document Frequency (TF-IDF).
• To evaluate the performance of different machine learning models and compare their accuracy,
precision, recall, and F1 scores in classifying fake and real news. This helps identify the most
suitable model for the task.
• To provide a tool that contributes to the larger goal of controlling the spread of misinformation
and its adverse effects on society by enabling quick and accurate detection of fake news.
• To ensure that the model can handle large datasets efficiently, making it scalable and
applicable to real-time detection scenarios, such as social media platforms and news
aggregators.
By achieving these objectives, this project aims to develop a high-accuracy fake news detection system
that can help combat the spread of misinformation and contribute to a more informed and trustworthy media
environment.
CHAPTER 2
LITERATURE REVIEW
The detection of fake news has become an essential area of study due to the increasing spread of
misinformation on social media. Early research primarily focused on analyzing the textual content of
news articles, using techniques like Bag of Words (BoW) and Term Frequency-Inverse Document
Frequency (TF-IDF) to represent text in a machine-readable format. Castillo et al. (2011) explored
linguistic features to classify rumors, marking one of the first attempts to automate this process. Wang
(2017) introduced the "LIAR" dataset, a widely used benchmark that emphasized the role of text analysis
in fake news detection. Machine learning algorithms such as Logistic Regression (LR), Support Vector
Machines (SVM), and Random Forest (RF) have been employed for classification tasks, leveraging
textual features to distinguish between real and fake news. More recently, deep learning models like
Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks have shown
significant improvements in capturing contextual and semantic nuances in text. Hybrid approaches that
combine text-based features with user behavior and social context data, as suggested by Shu et al. (2017),
have further improved detection accuracy. Despite these advancements, challenges remain, such as
generalizing models across domains and handling evolving misinformation patterns. Future research
aims to address these challenges.
CHAPTER 3
SYSTEM REQUIREMENTS
HARDWARE REQUIREMENTS
Processor : Intel Core i7
Processor Speed : 2.21 GHz
Hard Disk : 475 GB
RAM : 8.00 GB
SOFTWARE REQUIREMENTS
Language : Python 3.x
Libraries : NumPy, Matplotlib, Time, OpenCV
SOFTWARE DESCRIPTION
PYTHON
Python's source code is available under an open-source license (the Python Software Foundation
License). It provides constructs that enable clear programming on both small and large scales.
Python features a dynamic type system, supports imperative, object-oriented, functional, and
procedural programming styles, and has a large and comprehensive standard library.
Python is open-source software and has a community-based development model. Python allows
programmers to build their own types using classes, which are most often used for object-oriented
programming. Its main features include:
• Easy to learn: Python has few keywords, a simple structure, and a clearly defined syntax.
• Easy to read: Python code is clearly laid out and easy on the eyes.
• Interactive mode: Python supports an interactive mode that allows interactive testing and
debugging of snippets of code.
• Portable: Python can run on a wide variety of hardware platforms and has the same interface
on all platforms.
• GUI programming: Python supports GUI applications that can be created and ported to
many system calls, libraries, and windowing systems, such as Windows MFC, Macintosh,
and the X Window system of Unix.
• Scalable: Python provides a better structure and support for large programs than shell
scripting. The support of NumPy makes such tasks easier; NumPy is a highly optimized library
for numerical operations.
NUMPY:
NumPy is a library for the Python programming language that provides tools for working with
large, multi-dimensional arrays and matrices. It also provides a variety of mathematical functions
for working with these arrays, including linear algebra, Fourier analysis, and random number
generation.
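A short illustration of the array operations described above. This is a minimal sketch using only standard NumPy calls; the values are arbitrary:

import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])              # a 2-D array (matrix)
print(a.mean())                                      # vectorized arithmetic -> 2.5
print(np.linalg.inv(a))                              # linear algebra: matrix inverse
print(np.fft.fft(np.array([1.0, 0.0, 1.0, 0.0])))    # Fourier analysis
print(np.random.default_rng(42).random(3))           # random number generation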
MATPLOTLIB:
Matplotlib is a plotting library for the Python programming language. It provides a variety
of plotting functions and tools for creating visualizations of data, including line plots, scatter plots,
bar charts, and histograms.
TIME:
Time is a module in the Python standard library that provides tools for working with time-
related functions. It can be used for measuring the performance of code, calculating time intervals,
and formatting timestamps.
OPENCV:
OpenCV supports various programming languages such as C++, Python, and Java, and is
available on various platforms such as Windows, Linux, OS X, Android, and iOS. Interfaces for
accelerated GPU operations based on CUDA and OpenCL are also actively developed. OpenCV
Python is a Python API for OpenCV that combines the best features of the OpenCV C++ API
and the Python language. OpenCV-Python uses NumPy, a highly optimized library for numerical
operations with MATLAB-style syntax. All OpenCV array structures are converted to and from
NumPy arrays, which also makes it easier to integrate with other libraries that use NumPy, such as
SciPy and Matplotlib. Typical OpenCV tasks include:
Image Segmentation
Feature Extraction
Object Detection
Pattern Recognition
Visualization
TENSORFLOW:
TensorFlow is an open-source machine learning framework developed by Google. It
provides tools and libraries for building and training machine learning models, including deep neural
networks. TensorFlow is known for its ease of use, scalability, and flexibility.
PYTORCH:
PyTorch is an open-source machine learning library developed by Facebook's AI Research lab. It is based
on the Torch library and provides a dynamic computational graph, making it easier to debug and
optimize machine learning models. PyTorch is known for its simplicity and flexibility.
KERAS:
Keras is a high-level neural networks API, written in Python and capable of running on top
of TensorFlow, Theano, or CNTK. It provides a user-friendly interface for building and training
deep learning models.
EFFICIENT NET:
EfficientNet is a series of convolutional neural networks (CNNs) that were designed to provide
an optimal balance between model size (number of parameters) and model performance. These
models are known for their efficiency, meaning they achieve high accuracy while requiring fewer
parameters and less computation than comparable networks.
CHAPTER 4
PROPOSED SYSTEM
The proposed system for fake news detection employs machine learning techniques to analyze
and classify news articles as either real or fake. It will utilize classifiers such as Logistic Regression,
Support Vector Machines (SVM), Random Forest, and Naive Bayes, trained on a large and diverse
dataset of labeled articles to ensure high accuracy in distinguishing factual reporting from
misinformation. The system will incorporate data collection and preprocessing, feature extraction
using methods like TF-IDF, and comprehensive model training and evaluation to optimize
performance. A user-friendly interface will allow users to input articles for real-time analysis,
providing feedback on their authenticity along with confidence scores. Additionally, the system will
aim for seamless integration with existing news platforms to automatically flag potentially
misleading content. Ultimately, this project seeks to combat misinformation and enhance public
trust in credible information.
MACHINE LEARNING:
Machine learning (ML) is a subset of artificial intelligence (AI) that focuses on developing
algorithms and statistical models that enable computers to perform tasks without explicit
programming, learning instead from data to identify patterns and classify information. Here's an
overview of the main techniques used in machine learning:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
NEURAL NETWORK:
Neural networks provide the ability to perform tasks such as classification and clustering.
They are a set of algorithms that copy the behavior of the human brain in recognizing relationships
between data. Neurons are information carriers. They use electrical impulses and chemical signals
to communicate information between themselves and other parts of the brain. A neuron in an
artificial neural network, by analogy, receives inputs, multiplies them by weights, and passes the
weighted sum to other neurons via an activation function; it is also called a node or unit. A network
is built from three types of layers:
1. Input Layer
2. Hidden Layer
3. Output Layer
The input layer is responsible for getting input into the system for further processing in subsequent
layers. These inputs can be read as vectors or from CSV files. Only one input layer can exist in
the network.
The hidden layer sits between the model's input and output layers, where the function applies
weights to the input values and passes them through the activation function. It is passed to
subsequent functions. They are very common in neural networks, but their usage and architecture
often vary from case to case. In some cases, weighted inputs are randomly assigned; in others, they
are calibrated and tuned during training so that the network produces the desired output. The output
layer takes input from the previous layer, performs its computations, and produces the final output.
As with the input layer, there can only be one output layer with any number of neurons.
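The three-layer structure described above can be sketched with Keras (introduced earlier in this chapter). This is an illustrative sketch rather than the model used in this project; the layer sizes are arbitrary:

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(100,)),               # input layer: one feature vector per article
    layers.Dense(16, activation='relu'),     # hidden layer: weighted sums + activation
    layers.Dense(1, activation='sigmoid'),   # output layer: probability of 'fake'
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()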
KEY TECHNOLOGIES:
NLP is a critical technology for understanding and processing human language. In the context
of fake news detection, NLP techniques are used to analyze text data, extract features, and interpret
the semantics of language. Key steps include (see the sketch below):
• Tokenization: Splitting text into individual words or phrases for analysis.
• Stop-word removal: Eliminating common words that do not contribute to the meaning (e.g.,
"and," "the").
• Stemming and lemmatization: Reducing words to their base or root form to unify variants.
Machine learning techniques are fundamental for building classifiers that can differentiate real
news from fake news. Commonly used algorithms include (a comparison sketch follows the list):
• Logistic Regression: A simple and effective classification algorithm for binary outcomes.
• Support Vector Machines (SVM): Effective for high-dimensional data and text classification.
• Random Forest: An ensemble method that combines multiple decision trees to improve
accuracy.
• Naive Bayes: Based on Bayes' theorem, this algorithm is particularly useful for text
classification.
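All four classifiers share scikit-learn's fit/predict interface, so they can be trained and compared in a loop. A sketch on a tiny illustrative dataset (real features would come from the extraction techniques described next; the multinomial Naive Bayes variant is used here because it suits sparse text features):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["official report confirms the figures",
        "shocking secret they don't want you to know",
        "council publishes annual budget statement",
        "miracle cure banned by doctors revealed"]
labels = [0, 1, 0, 1]  # 0 = real, 1 = fake (toy labels for illustration)

X = TfidfVectorizer().fit_transform(docs)
for clf in (LogisticRegression(), SVC(kernel='poly'),
            RandomForestClassifier(), MultinomialNB()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))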
FEATURE EXTRACTION TECHNIQUES:
Extracting meaningful features from text data is essential for improving model performance.
Techniques like Word2Vec or GloVe transform words into dense vector representations that capture
semantic relationships, while BERT, a pre-trained transformer model, captures contextual
information from text and can be fine-tuned for specific tasks.
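A brief Word2Vec sketch, assuming the gensim library; the tiny corpus and vector size are illustrative only:

from gensim.models import Word2Vec

sentences = [["fake", "news", "spreads", "fast"],
             ["real", "news", "is", "verified"],
             ["social", "media", "spreads", "news"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=42)
print(model.wv["news"].shape)                   # each word -> 50-dimensional dense vector
print(model.wv.most_similar("news", topn=2))    # nearest words in the embedding space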
DATA AUGMENTATION:
Data Augmentation is a technique used to increase the size of the training dataset by applying
random transformations to the data. This can improve the robustness and generalization of the
model, as it learns to recognize objects under different conditions. Common data augmentation
techniques include random scaling, cropping, and flipping, which can help the model learn to detect
content under varied conditions.
MODULE DESCRIPTION:
DATASET DESCRIPTION:
The dataset used for the fake news detection project consists of two primary categories: real
news and fake news, encompassing a total of 44,898 articles. The real news category contains 21,417
articles sourced from reputable news organizations, covering subjects such as world news and
politics, which ensures a comprehensive representation of legitimate reporting. In contrast, the fake
news category comprises 23,481 articles including various topics like government news, middle-
east news, and left-leaning news, reflecting the diverse nature of misinformation present in online
platforms. Each article in the dataset is structured with key attributes: the title, the full text of the
article, the subject category, and a binary label indicating whether the article is real (0) or fake (1).
The articles have been meticulously curated and annotated to maintain high quality and accuracy,
with human annotators verifying each piece to ensure reliability. This rich dataset is instrumental in
training machine learning models to detect fake news by enabling them to learn from the linguistic
patterns and contextual features within the text, ultimately contributing to more effective detection
of misinformation.
DATASET COLLECTION:
The dataset for the fake news detection project was collected from a variety of reputable sources
to ensure diversity and relevance in the training data. For the real news category, articles were sourced
from established news organizations, online publications, and trusted journalism websites, which
provide accurate and credible information on various topics such as politics, world events, and local
news. In contrast, the fake news category was compiled from known misinformation websites, social
media platforms, and content specifically labeled as unreliable or misleading by fact-checking
organizations. This collection process involved scraping publicly available articles, as well as
leveraging datasets from previous research efforts dedicated to fake news detection. Each article was
carefully vetted for authenticity and labeled accordingly, creating a balanced dataset that includes both
real and fake news examples. This approach not only enhances the dataset's quality but also ensures
that the model can learn to identify nuanced differences in writing styles, factual accuracy, and
contextual framing.
WORKING OF MODEL:
The working of fake news detection involves several key processes that leverage machine
learning and natural language processing techniques to identify and classify news articles as either
real or fake. Here’s an overview of how the system operates:
1. Data Collection:
The process begins with the collection of a dataset that includes both real and fake news
articles. This dataset typically consists of various features such as titles, article texts, subjects, and
labels indicating the authenticity of each article.
2. Data Preprocessing:
The collected data undergoes preprocessing to clean and prepare it for analysis. This step
involves removing unnecessary elements such as HTML tags, special characters, and stop words,
as well as converting the text to a uniform format (e.g., lowercasing). Additionally, the text is often
tokenized, where sentences are broken down into individual words or phrases for further analysis.
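A sketch of this cleaning step using the regular-expression approach common in such pipelines (the function name is illustrative):

import re
import string

def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags and punctuation, and normalize whitespace."""
    text = text.lower()
    text = re.sub(r'<[^>]+>', ' ', text)                               # drop HTML tags
    text = text.translate(str.maketrans('', '', string.punctuation))   # drop punctuation
    return re.sub(r'\s+', ' ', text).strip()                           # normalize whitespace

print(clean_text("<p>BREAKING: Read   this NOW!!!</p>"))
# -> "breaking read this now"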
3. Feature Extraction:
After preprocessing, relevant features are extracted from the text data. Techniques such as
Term Frequency-Inverse Document Frequency (TF-IDF) or word embeddings (like Word2Vec or
GloVe) are commonly used to convert textual data into numerical representations that machine
learning models can understand. This process captures the importance and context of words within
the articles.
4. Model Training:
The processed data is then split into training and testing sets. Various machine learning
algorithms, such as Logistic Regression, Support Vector Machines (SVM), Random Forests, and
Neural Networks, are trained on the training set. During this phase, the model learns to identify
patterns and relationships between the features and their corresponding labels (real or fake).
5. Model Evaluation:
After training, the model's performance is evaluated using the testing set. Metrics such as
accuracy, precision, recall, and F1-score are calculated to assess how well the model can correctly
classify unseen data. This evaluation helps to identify any weaknesses or biases in the model.
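These metrics are one-liners in scikit-learn. A sketch with toy labels standing in for the test set and a fitted model's predictions:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_test = [0, 1, 1, 0, 1]   # true labels (toy values for illustration)
y_pred = [0, 1, 0, 0, 1]   # model predictions
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1-score :", f1_score(y_test, y_pred))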
6. Prediction:
Once the model is deemed effective, it can be used to predict the authenticity of new, unseen
articles. When a new article is input into the system, it undergoes the same preprocessing and feature
extraction steps before being classified by the trained model.
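Classifying a new article reuses the fitted vectorizer and model. A sketch, where 'vectorizer' and 'model' are assumed to be the fitted TF-IDF vectorizer and classifier from the training phase:

new_article = "Government quietly confirms budget shortfall, officials say"
features = vectorizer.transform([new_article])        # same feature extraction as training
label = model.predict(features)[0]                    # 0 = real, 1 = fake
confidence = model.predict_proba(features)[0].max()   # confidence score
print("fake" if label == 1 else "real", f"(confidence {confidence:.2f})")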
7. Output:
The system provides an output indicating whether the article is classified as real or fake, often
with a confidence score. This information can be further enhanced with additional insights, such as
key features that contributed to the classification decision.
8. Continuous Improvement:
As new articles are published and more data becomes available, the model can be updated
and retrained to adapt to evolving patterns of misinformation, ensuring its accuracy and
effectiveness over time.
By combining these processes, the fake news detection system can efficiently analyze large
volumes of content and help users identify misleading or false information in the digital landscape.
Figure 4.2 Fake News Detection Architecture
LOGISTIC REGRESSION:
Logistic regression operates as a statistical method used for binary classification, determining
the probability that a given input belongs to one of two categories. Initially, the model takes input
features and computes a linear combination of these features, applying weights to each input. The
key component of logistic regression is the logistic (or sigmoid) function, which transforms this
linear output into a probability value ranging between 0 and 1. This transformation allows the model
to express the likelihood that an instance belongs to the positive class, such as identifying a news
article as fake.
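In symbols, for a feature vector x with weights w and bias b, the model described above computes:

p(\text{fake} \mid x) = \sigma(w^{\top}x + b) = \frac{1}{1 + e^{-(w^{\top}x + b)}}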
During the training phase, the model adjusts the weights assigned to each feature in order to
minimize the error between the predicted probabilities and the actual class labels in the training data.
This is typically achieved through an optimization process that uses algorithms like gradient descent,
iteratively updating the weights to reduce the loss function, which measures prediction accuracy.
Once trained, the logistic regression model can classify new instances by calculating the predicted
probability and applying a threshold (commonly set at 0.5) to assign a class label. For instance, if
the probability is above the threshold, the model predicts the instance as fake news; otherwise, it
classifies it as real news.
The performance of logistic regression is evaluated using metrics such as accuracy, precision,
recall, and F1-score, which help assess how well the model performs in distinguishing between the
two classes. Logistic regression is favored for its simplicity and interpretability, allowing users to
understand the impact of different features on the classification outcome. However, it may struggle
with complex relationships in the data, where more sophisticated models might be needed. Overall,
logistic regression remains a foundational technique in machine learning, particularly for tasks like
fake news detection, due to its effectiveness and ease of use.
PERFORMANCE EVALUATION
Performance evaluation is a crucial aspect of any machine learning model, including those
used for fake news detection, as it determines how well the model performs in classifying new,
unseen data. Several metrics are commonly employed to assess the effectiveness of the model,
providing insights into its accuracy and reliability. The most straightforward metric is accuracy,
which measures the proportion of correctly classified instances out of the total number of
instances; accuracy alone, however, can be misleading on imbalanced datasets where one class
significantly outnumbers the other. To gain a more nuanced
understanding of the model's performance, additional metrics such as precision, recall, and F1-
score are utilized. Precision calculates the ratio of true positive predictions to the total predicted
positives, indicating the model's accuracy when it predicts an article as fake. Recall, on the other
hand, measures the ratio of true positives to the actual positives in the dataset, providing insights
into the model's ability to identify all relevant instances of fake news. The F1-score, which is the
harmonic mean of precision and recall, offers a single metric that balances both, making it
particularly useful when the classes are imbalanced. Additionally, confusion matrices are
employed to visualize the model's performance, showcasing the counts of true positives, true
negatives, false positives, and false negatives. This visualization helps identify specific areas
where the model excels or struggles, guiding further improvements. ROC (Receiver Operating
Characteristic) curves and AUC (Area Under the Curve) are also valuable tools for evaluating the
model's discriminatory power across various threshold settings, illustrating the trade-offs between
the true positive rate and the false positive rate.
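With TP, TN, FP, and FN denoting true/false positives and negatives, the metrics above are:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP},
\text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}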
VALIDATION AND RESULTS:
Validation and results are critical components in the development and assessment of
fake news detection models. Validation refers to the process of evaluating the model's
performance on a separate dataset that was not used during the training phase. This ensures
that the model can generalize well to new, unseen data and helps prevent overfitting, where
a model performs exceptionally well on training data but fails to perform similarly on new
data. Common validation techniques include k-fold cross-validation, where the dataset is
divided into k subsets, and the model is trained and tested k times, each time using a different
subset as the test set while the others serve as the training set. This method provides a robust
estimate of the model's performance by averaging the results over multiple runs. Once
validation is completed, the results are analyzed to determine how effectively the model
classifies news articles as real or fake. The evaluation metrics (accuracy, precision, recall,
and F1-score) reveal the model's strengths and
weaknesses. For instance, a high accuracy rate might indicate good overall performance, but
low precision or recall could suggest that the model struggles with false positives or
negatives. Additionally, visual tools such as confusion matrices provide insights into the specific
types of errors the model makes. The
results from validation can guide further refinement of the model. Based on performance
insights, adjustments can be made to the feature set, hyperparameters, or even the choice of
algorithm to enhance accuracy and reliability. Ultimately, thorough validation and analysis
of results are essential for ensuring that the fake news detection model is not only effective
but also reliable in real-world use.
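k-fold cross-validation is a few lines in scikit-learn. A sketch, assuming TF-IDF features X and labels y from the pipeline described in Chapter 4:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# 5-fold cross-validation: train on 4 folds, test on the 5th, and rotate.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)          # one accuracy value per fold
print(scores.mean())   # averaged estimate of generalization performance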
CONFUSION MATRIX:
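The matrix is computed from true and predicted labels. A minimal sketch using scikit-learn and the 0 = real / 1 = fake convention of this project (toy labels for illustration):

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

y_test = [0, 1, 1, 0, 1, 0]   # true labels
y_pred = [0, 1, 0, 0, 1, 1]   # predictions
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
ConfusionMatrixDisplay(cm, display_labels=['Real', 'Fake']).plot(cmap='Blues')
plt.show()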
CONCLUSION AND FUTURE WORK
In conclusion, the development of a robust fake news detection model leveraging machine
learning techniques represents a significant step toward addressing the pervasive issue of misinformation
in today’s digital landscape. Through the application of various algorithms, such as Logistic Regression,
Support Vector Machines, and Random Forests, the model demonstrated a high degree of accuracy and
reliability in classifying news articles as real or fake. The validation process revealed the model's ability
to generalize well to unseen data, ensuring that it can effectively assist in the identification of misleading
information. Furthermore, the use of diverse evaluation metrics provided valuable insights into the
model's performance, highlighting areas for improvement and confirming its practical applicability in
real-world applications. Looking ahead, future work will focus on enhancing the model's capabilities
through several key avenues. One area of improvement involves expanding the dataset to include a
broader range of news sources and topics, which will help the model better understand the nuances of
language and context in different types of articles. Additionally, exploring advanced deep learning
techniques, such as neural networks, could yield even more accurate predictions by capturing complex
patterns in the data. Incorporating user feedback mechanisms may also refine the model further, enabling
continuous learning and adaptation to evolving trends in misinformation. Lastly, addressing ethical
considerations around bias and fairness in model predictions will be paramount, ensuring that the tool is
equitable and effective across diverse populations and news genres. Through these efforts, the goal is to
develop a more sophisticated and impactful fake news detection system that can play a vital role in
curbing the spread of misinformation online.
APPENDIX 1
CODING

import pandas as pd
import numpy as np
from google.colab import drive

# Mount Google Drive (the notebook was run on Google Colab)
drive.mount('/content/drive')

# File names are assumed from the variable names; adjust the paths to
# your copies of the real and fake news CSV files.
true_data = pd.read_csv('/content/True.csv', engine='python', on_bad_lines='skip')
fake_data = pd.read_csv('/content/Fake.csv', engine='python', on_bad_lines='skip')

# Inspect the data
print(true_data[0:10])
print(fake_data[0:4])
print(true_data['text'][0])
print(true_data.columns)
print(fake_data['text'][0])
print(fake_data.columns)
print(len(true_data), len(fake_data))

# Label the articles: 1 = fake, 0 = real
fake_data['label'] = 1
true_data['label'] = 0
print(true_data.head())
print(fake_data.head())

# Merge the two sets and shuffle the rows
all_data = pd.concat([fake_data, true_data])
random_permutation = np.random.permutation(len(all_data))
all_data = all_data.iloc[random_permutation]
print(all_data.columns)
print(all_data.head())

# Keep only the columns used for training
filtered_data = all_data[['title', 'text', 'subject', 'label']].copy()
print(filtered_data.head())
print(filtered_data.isnull().sum())

# Combine title, body text, and subject into a single training feature
filtered_data['training_feature'] = (filtered_data['title'] + ' '
                                     + filtered_data['text'] + ' '
                                     + filtered_data['subject'])

X = filtered_data['training_feature'].values
y = filtered_data['label'].values

# A 1,000-article subset for the slower models (SVM, RF, NB)
l_X = filtered_data['training_feature'].values[0:1000]
l_Y = filtered_data['label'].values[0:1000]
print(l_X.shape, l_Y.shape)
print(X[0:1])

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
import seaborn as sns

# TF-IDF features for the full set and for the subset
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(X)
l_vectorizer = TfidfVectorizer()
l_X = l_vectorizer.fit_transform(l_X)
print(type(X), X.shape)
print(type(l_X), l_X.shape)

# Train/test splits (the test size is assumed; it was not shown in the original)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=42)
l_X_train, l_X_test, l_Y_train, l_Y_test = train_test_split(l_X, l_Y, test_size=0.2, random_state=42)
print(X_train.shape)

# Logistic Regression on the full dataset
model = LogisticRegression()
model.fit(X_train, Y_train)
test_y_hat = model.predict(X_test)
print(accuracy_score(Y_test, test_y_hat))
train_y_hat = model.predict(X_train)
print(accuracy_score(Y_train, train_y_hat))

# SVM with a polynomial kernel on the subset
clf_poly = svm.SVC(kernel='poly')
clf_poly.fit(l_X_train, l_Y_train)
svm_pred = clf_poly.predict(l_X_test)
print(accuracy_score(l_Y_test, svm_pred))
print(f1_score(l_Y_test, svm_pred))

# Random Forest on the subset
rf_model = RandomForestClassifier()
rf_model.fit(l_X_train, l_Y_train)
rf_pred = rf_model.predict(l_X_test)
print(accuracy_score(l_Y_test, rf_pred))
print(f1_score(l_Y_test, rf_pred))

# Gaussian Naive Bayes requires dense arrays
nb_model = GaussianNB()
nb_model.fit(l_X_train.toarray(), l_Y_train)
nb_pred = nb_model.predict(l_X_test.toarray())
print(accuracy_score(l_Y_test, nb_pred))
print(f1_score(l_Y_test, nb_pred))

# Confusion-matrix heatmaps; labels=[1, 0] puts 'Fake' first to match the
# tick labels. Titles of the first and third plots are assumed.
def plot_cm(y_true, y_pred, title):
    cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=['Fake', 'True'], yticklabels=['Fake', 'True'])
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.title(title)
    plt.show()

plot_cm(Y_test, test_y_hat, 'Confusion Matrix (Logistic Regression)')
plot_cm(l_Y_test, svm_pred, 'Confusion Matrix (Poly Kernel)')
plot_cm(l_Y_test, rf_pred, 'Confusion Matrix (Random Forest)')
APPENDIX 2
SCREENSHOT
REFERENCES
[1] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, "Fake News Detection on Social Media: A
Data Mining Perspective," arXiv:1708.01967v3 [cs.SI], 3 Sep 2017.
[2] C. Castillo, M. Mendoza, and B. Poblete, "Information Credibility on Twitter," in Proceedings
of the 20th International Conference on World Wide Web (WWW), 2011, pp. 675-684.
[3] W. Y. Wang, "'Liar, Liar Pants on Fire': A New Benchmark Dataset for Fake News Detection,"
in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
(ACL), 2017.
[4] Z. Khanam, B. N. Alwasel, H. Sirafi, and M. Rashid, "Fake News Detection Using Machine
Learning Approaches," Journal of Physics: Conference Series, 1099(1), 012040.
[5] S. Pandey, S. Prabhakaran, N. V. Subba Reddy, and D. Acharya, "Fake News Detection from
Online Media Using Machine Learning Classifiers," Journal of Physics: Conference Series,
2161(1), 012027.
[6] H. Ahmed, I. Traore, and S. Saad, "Detection of Online Fake News Using N-Gram Analysis
and Machine Learning Techniques," in I. Traore, I. Woungang, and A. Awad (eds.), Intelligent,
Secure, and Dependable Systems in Distributed and Cloud Environments (ISDDC 2017),
Lecture Notes in Computer Science, vol. 10618, Springer, Cham, 2017, pp. 127-138.
[7] J. A. Nasir, O. S. Khan, and I. Varlamis, "Fake News Detection: A Hybrid CNN-RNN Based
Deep Learning Approach," International Journal of Information Management Data Insights,
1(1), 2021, Article 100007.
[8] M. D. Ibrishimova and K. F. Li, "A Machine Learning Approach to Fake News Detection Using
Knowledge Verification and Natural Language Processing," in Advances in Intelligent
Networking and Collaborative Systems, 2019, pp. 223-234.
[9] J. Y. Khan, M. T. Khondaker, S. Afroz, G. Uddin, and A. Iqbal, "A Benchmark Study of
Machine Learning Models for Online Fake News Detection," Machine Learning with
Applications, 4, 2021, Article 100032.
[10] S. Kumar, R. Asthana, S. Upadhyay, N. Upreti, and M. Akbar, "Fake News Detection Using
Deep Learning Models: A Novel Approach," Transactions on Emerging Telecommunications
Technologies, 31(2), 2020, e3767.