
Fake News Detection

The document discusses the prevalence and impact of fake news in the digital era, highlighting its influence on public opinion and the economy. It outlines a project aimed at developing a machine learning-based system for detecting fake news through text analysis, utilizing various methodologies and classifiers. The conclusion emphasizes the model's effectiveness and potential applications in combating misinformation.

Uploaded by priyanshu230404
Copyright © All Rights Reserved

PRESENTED BY:

Aadrika Barnwal
Anuj Kumar
Amit Kumar Sahani
Khushi Jaiswal
Manoj Kumar
Shiwangi
Introduction
In the digital era, information spreads rapidly, making it difficult to separate fact from fiction. Fake news is misleading or false content that can influence public opinion and impact society.

Social media and online platforms have amplified the reach of fake news. Unchecked fake news erodes trust, distorts public discourse, and harms individuals, communities, and economies.

Fake news detection uses algorithms and technologies to identify and filter misleading information. This presentation covers the methods, challenges, and importance of fake news detection.
62% of the data on the internet is unreliable, according to the Center Statistic Office (CSO).

Fake news costs the global economy $78 billion annually.

40% of the content shared on social media is false.

86% of people globally have been exposed to fake news.

Counter fake news' influence on elections, public health, and social cohesion.

Enhance critical thinking and media literacy, empowering users to distinguish fact from fiction.

Utilize NLP, deep learning, and data analytics to develop innovative solutions for complex problems.

Foster a trustworthy online environment, supporting informed decision-making and a knowledge-based economy.
Objective: Detecting Fake News Using Machine Learning

The objective of this project is to develop a robust and efficient ML-based system capable of accurately detecting fake news.
By analyzing the language used in news articles, the tool will classify each article as real or fake.
This will help people make informed decisions and reduce the spread of misinformation.
Data Collection
Fake News Challenge (FNC-1)
Kaggle's fake news dataset
BuzzFeed News's dataset

Importing Datasets
import pandas as pd

df_fake = pd.read_csv("../input/fake-news-detection/Fake.csv")
df_true = pd.read_csv("../input/fake-news-detection/True.csv")

Classification
# Label each article: 0 = fake, 1 = real
df_fake["class"] = 0
df_true["class"] = 1

Text Processing
Apply tokenization
Stop-word removal
Lowercasing
Stemming/lemmatization
Preprocessing can also include removing URLs, special characters, and handling contractions.
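The cleaning function applied to each article (called `wordopt` in the manual-testing code) is not shown on the slides. Below is a minimal, stdlib-only sketch of what such a step might look like, assuming it lowercases, strips URLs and special characters, expands a few contractions, tokenizes on whitespace, and drops stop words. The name `clean_text`, the tiny stop-word set, and the contraction map are hypothetical illustrations; stemming/lemmatization is left out here and would typically come from a library such as NLTK.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only;
# a real pipeline would use a fuller list (e.g. NLTK's or scikit-learn's).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "of", "to",
              "in", "on", "and", "or", "it", "this", "that"}

# Hypothetical contraction map; specific forms come first because
# dict order is preserved and "n't" would otherwise fire too early.
CONTRACTIONS = {"can't": "can not", "won't": "will not", "n't": " not",
                "'re": " are", "'ll": " will", "'ve": " have"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> list[str]:
    """Lowercase, strip URLs, expand contractions, drop special characters,
    tokenize on whitespace, and remove stop words."""
    text = text.lower()                          # lowercasing
    text = URL_RE.sub(" ", text)                 # remove URLs
    for short, full in CONTRACTIONS.items():     # handle contractions
        text = text.replace(short, full)
    text = NON_LETTER_RE.sub(" ", text)          # remove digits and punctuation
    tokens = text.split()                        # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word removal


print(clean_text("The senator can't deny it: https://example.com/story!"))
```

Returning a token list keeps each step inspectable; joining the tokens back with spaces would produce a string suitable for a vectorizer.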
WORKFLOW

TEXT COLLECTION → TEXT PRE-PROCESSING → FEATURE EXTRACTION

News to be Classified → CLASSIFIER → Classified News as Fake or Real
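The workflow stages map naturally onto a scikit-learn `Pipeline`. This is an illustrative sketch, not the project's code: the toy headlines and labels are invented placeholders, logistic regression stands in for whichever classifier is chosen, and preprocessing is reduced to the vectorizer's built-in lowercasing, tokenization, and stop-word removal.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# 1. Text collection: invented placeholder examples (0 = fake, 1 = real).
texts = [
    "shocking miracle cure doctors hate revealed",
    "you won't believe this secret celebrity scandal",
    "aliens secretly control the stock market",
    "government confirms budget figures for next year",
    "central bank holds interest rates steady",
    "city council approves new public transport plan",
]
labels = [0, 0, 0, 1, 1, 1]

# 2-4. Preprocessing + feature extraction (TF-IDF) + classifier in one object.
model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)

# News to be classified -> classified as fake (0) or real (1).
new_items = ["secret miracle cure revealed", "council approves budget plan"]
predictions = model.predict(new_items)
for text, pred in zip(new_items, predictions):
    print(f"{text!r}: {'Real' if pred == 1 else 'Fake'}")
```

Bundling the stages this way guarantees that new articles go through exactly the same transformations as the training data.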
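Feature Engineering
Convert the preprocessed data into numerical features that a machine learning model can learn from:

TF-IDF (Term Frequency-Inverse Document Frequency)
Bag of Words (BoW)
Word Embeddings

from sklearn.feature_extraction.text import TfidfVectorizer

# Drop English stop words and ignore terms that appear in more than 70% of documents
tfidf = TfidfVectorizer(stop_words='english', max_df=0.7)

# Fit on the training data only, then apply the same vocabulary to the testing data
X_train_tfidf = tfidf.fit_transform(x_train)
X_test_tfidf = tfidf.transform(x_test)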
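To make the difference between Bag of Words and TF-IDF concrete, here is a small stdlib-only sketch that builds BoW count vectors and then reweights them. It is intended to mirror scikit-learn's `TfidfVectorizer` defaults (raw term counts, smoothed IDF idf(t) = ln((1 + n) / (1 + df(t))) + 1, then L2 normalization of each document), but it is an illustration rather than a drop-in replacement; the function names and the pre-tokenized toy documents are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one raw term-count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[term] for term in vocab] for c in counts]


def tfidf(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Weight BoW counts by smoothed IDF, then L2-normalize each document."""
    vocab, bow = bag_of_words(docs)
    n = len(docs)
    # df(t): number of documents containing term t
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    vectors = []
    for row in bow:
        weighted = [count * w for count, w in zip(row, idf)]
        norm = math.sqrt(sum(x * x for x in weighted)) or 1.0
        vectors.append([x / norm for x in weighted])
    return vocab, vectors


docs = [["fake", "news", "spreads"], ["real", "news"], ["news", "fake", "fake"]]
vocab, bow = bag_of_words(docs)
vocab, weights = tfidf(docs)
print(vocab)  # ['fake', 'news', 'real', 'spreads']
print(bow)    # [[1, 1, 0, 1], [0, 1, 1, 0], [2, 1, 0, 0]]
```

Because "news" occurs in every document, its IDF is exactly 1, so TF-IDF gives it the smallest weight in the first document, while BoW treats it the same as any other word.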
Splitting The Data

Split the data into training and testing sets to evaluate model performance. A common split ratio is 80% for training and 20% for testing:

Splitting Training and Testing
from sklearn.model_selection import train_test_split

# x: article texts, y: class labels (0 = fake, 1 = real)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)
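The split call assumes `x` and `y` already exist. A minimal sketch of one way to build them, assuming the two labeled frames are concatenated and shuffled first; the placeholder DataFrames stand in for the real CSV files, and `stratify=y` and `random_state=42` are additions not present in the original call.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frames standing in for Fake.csv / True.csv (invented text).
df_fake = pd.DataFrame({"text": [f"fake story {i}" for i in range(5)]})
df_true = pd.DataFrame({"text": [f"real report {i}" for i in range(5)]})
df_fake["class"] = 0
df_true["class"] = 1

# Combine both sources and shuffle so the classes are interleaved.
df = pd.concat([df_fake, df_true], ignore_index=True)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)

x = df["text"]
y = df["class"]

# 80/20 split; stratify keeps the fake/real ratio equal in both sets,
# and random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, stratify=y, random_state=42
)
print(len(x_train), len(x_test))  # 8 2
```

Stratification matters most when one class dominates; without it, a small test set can end up with too few examples of the minority class to measure recall meaningfully.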
Methodologies
Different methodologies used to test the model on unseen data

Logistic Regression
A simple, efficient classifier for binary classification problems, providing probability estimates and easy-to-interpret coefficients.

Decision Tree Classifier
An easy-to-visualize classifier that handles both categorical and numerical features and is robust to outliers and missing values.

Gradient Boosting Classifier
A high-performance classifier that handles complex, non-linear relationships and high-dimensional data.

Random Forest Classifier
A robust ensemble classifier that reduces overfitting, handles missing values, and provides feature importance estimates.

import pandas as pd

def output_label(n):
    # Class labels follow the data collection step: 0 = fake, 1 = real
    return "Fake News" if n == 0 else "Real News"

def manual_testing(news):
    # Assumes wordopt (text cleaning), vectorization (the fitted vectorizer)
    # and the fitted models LR, DT, GBC and RFC are already defined
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    new_xv_test = vectorization.transform(new_x_test)
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)
    print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_label(pred_LR[0]),
        output_label(pred_DT[0]),
        output_label(pred_GBC[0]),
        output_label(pred_RFC[0])))
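Manual testing relies on four already-fitted classifiers and a fitted `vectorization` object, but the slides do not show the training step. A minimal sketch of how they might be produced, assuming TF-IDF features and default hyperparameters; the training texts are invented placeholders and `output_label` is a hypothetical helper playing the role of the label function the manual-testing code calls.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Invented placeholder training data (0 = fake, 1 = real).
x_train = [
    "miracle cure shocks doctors", "secret plot hidden by elites",
    "celebrity clone spotted again", "parliament passes annual budget",
    "central bank keeps rates unchanged", "council opens new library",
]
y_train = [0, 0, 0, 1, 1, 1]

vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)

# The four classifiers named on the slide, with default hyperparameters.
models = {
    "LR": LogisticRegression(),
    "DT": DecisionTreeClassifier(random_state=0),
    "GBC": GradientBoostingClassifier(random_state=0),
    "RFC": RandomForestClassifier(random_state=0),
}
for model in models.values():
    model.fit(xv_train, y_train)


def output_label(n):
    # Hypothetical mapping matching the labels above: 0 = fake, 1 = real
    return "Fake News" if n == 0 else "Real News"


# Classify one unseen headline with every model.
new_xv = vectorization.transform(["secret miracle cure"])
preds = {name: output_label(m.predict(new_xv)[0]) for name, m in models.items()}
print(preds)
```

Keeping the models in a dictionary lets the same prediction and evaluation code loop over all four classifiers instead of repeating one line per model.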
Result
We evaluated each classifier's performance using accuracy, precision, recall, and F1 score.
We used a random search technique to find optimal hyperparameters for our chosen model.

[Figure: Confusion matrix & classification report heatmaps for different classifiers (panels: Logistic Regression, Decision Tree Classifier, Gradient Boost Classifier, Random Forest Classifier)]
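The slides report evaluation metrics and a random search without showing the code. A hedged sketch with scikit-learn, assuming a TF-IDF + logistic regression pipeline and a small illustrative grid of regularization strengths `C`; the synthetic corpus is invented purely so the example runs, and its scores say nothing about the project's actual results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Invented placeholder corpus: 20 fake-style and 20 real-style snippets.
fake_words = ["shocking", "secret", "miracle", "hoax", "exposed"]
real_words = ["budget", "council", "report", "minister", "election"]
texts = [f"{fake_words[i % 5]} {fake_words[(i + 1) % 5]} claim" for i in range(20)]
texts += [f"{real_words[i % 5]} {real_words[(i + 2) % 5]} update" for i in range(20)]
labels = [0] * 20 + [1] * 20

x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression())])

# Random search over an illustrative set of C values, scored by F1.
search = RandomizedSearchCV(
    pipe,
    param_distributions={"clf__C": [0.01, 0.1, 1.0, 10.0, 100.0]},
    n_iter=5,
    cv=3,
    scoring="f1",
    random_state=0,
)
search.fit(x_train, y_train)
y_pred = search.predict(x_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
print("Best params:", search.best_params_)
print("Accuracy:", acc)
print(cm)
print(classification_report(y_test, y_pred, target_names=["Fake", "Real"]))
```

With a real dataset, passing a continuous distribution (for example from `scipy.stats`) instead of a short list lets random search explore values a fixed grid would miss. The confusion matrix array `cm` is what a heatmap like the one in the figure would visualize.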
Conclusion
Fake News Detection Model

Developed using Kaggle Fake News Dataset

Preprocessing + TF-IDF for feature extraction

Classifiers: Logistic Regression, Random Forest, Decision Tree, Gradient Boosting

High accuracy in detecting real vs. fake news

Applications: media, social platforms, combating misinformation

Future work: advanced NLP models for improved accuracy
