PRESENTED BY:
Aadrika Barnwal
Anuj Kumar
Amit Kumar Sahani
Khushi Jaiswal
Manoj Kumar
Shiwangi
Introduction
In the digital era, information spreads rapidly,
making it difficult to separate fact from fiction.
Fake news is misleading or false content that
can influence public opinion and impact
society.
Social media and online platforms have
amplified the reach of fake news. Unchecked
fake news erodes trust, distorts public
discourse, and harms individuals,
communities, and economies.
Fake news detection uses algorithms and
technologies to identify and filter misleading
information. This presentation will cover the
methods, challenges, and importance of
fake news detection.
62% of the data on the internet is unreliable,
according to the Central Statistics Office (CSO)
Fake news costs the global economy $78 billion annually
40% of the content shared on social media is false
86% of people globally have been exposed to fake news
Counter fake news' influence on elections, public health,
and social cohesion.
Enhance critical thinking and media literacy, empowering
users to distinguish fact from fiction.
Utilize NLP, deep learning, and data analytics to develop
innovative solutions for complex problems.
Foster a trustworthy online environment, supporting informed
decision-making and a knowledge-based economy.
Objective: Detecting Fake News Using Machine Learning
The objective of this project is to develop a robust
and efficient ML-based system capable of
accurately detecting fake news.
By analyzing the language used in news articles,
the tool will be able to determine whether the
information is true or false.
This will help people make informed decisions and
reduce the spread of misinformation.
Data Collection
Sources:
Fake News Challenge (FNC-1)
Kaggle's fake news dataset
BuzzFeed News's dataset

Importing Datasets:
import pandas as pd  # needed for read_csv

df_fake = pd.read_csv("../input/fake-news-detection/Fake.csv")
df_true = pd.read_csv("../input/fake-news-detection/True.csv")

Classification labels (0 = fake, 1 = true):
df_fake["class"] = 0
df_true["class"] = 1

Text Processing
Apply Tokenization
Stop-Word Removal
Lowercasing
Stemming/Lemmatization
Preprocessing can also include removing URLs, special characters,
and handling contractions (a sketch of such a cleaning function follows below).
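The deck later applies a cleaning function called wordopt (see the manual_testing code on the Methodologies slide) whose body is not shown. A minimal sketch, assuming it performs the lowercasing, URL/special-character removal, and whitespace cleanup listed above (stop-word removal is handled later by the TF-IDF vectorizer):

import re
import string

def wordopt(text):
    # Lowercasing
    text = text.lower()
    # Remove URLs
    text = re.sub(r"https?://\S+|www\.\S+", "", text)
    # Remove HTML tags left over from scraping
    text = re.sub(r"<.*?>", "", text)
    # Remove punctuation/special characters and digits
    text = re.sub("[%s]" % re.escape(string.punctuation), "", text)
    text = re.sub(r"\d", "", text)
    # Collapse repeated whitespace
    text = re.sub(r"\s+", " ", text).strip()
    return text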
WORK-FLOW
TEXT COLLECTION → TEXT PRE-PROCESSING → FEATURE EXTRACTION → CLASSIFIER
Input: News to be Classified
Output: News Classified as Fake or Real
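This workflow can be wired end-to-end with a scikit-learn Pipeline. The following is only a sketch of one possible wiring (the name pipe and the choice of Logistic Regression as the example classifier are assumptions, not the deck's code):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# FEATURE EXTRACTION + CLASSIFIER chained into one object
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", max_df=0.7)),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(x_train, y_train)                        # text collection and pre-processing assumed done upstream
print(pipe.predict(["news article text here"]))   # -> array([0]) fake or array([1]) real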
Feature Engineering
Turn the preprocessed text into numerical features that a
machine learning model can learn from:
TF-IDF (Term Frequency-Inverse Document Frequency)
Bag of Words (BoW)
Word Embeddings
A Bag-of-Words sketch follows below; the TF-IDF code used here is:

from sklearn.feature_extraction.text import TfidfVectorizer  # required import

tfidf = TfidfVectorizer(stop_words='english', max_df=0.7)
# Transform the training and testing data
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)
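Bag of Words and word embeddings are listed as alternatives but not shown in the deck. A minimal Bag-of-Words sketch using scikit-learn's CountVectorizer (an assumption, not the project's code):

from sklearn.feature_extraction.text import CountVectorizer

# Bag of Words: raw token counts instead of TF-IDF weights
bow = CountVectorizer(stop_words="english")
X_train_bow = bow.fit_transform(X_train)  # learn vocabulary from the training text
X_test_bow = bow.transform(X_test)        # reuse the same vocabulary for the test text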
Splitting The Data
Split the data into training and
testing sets to evaluate model
performance. A common split
ratio is 80% for training and 20%
for testing:
Splitting Training and Testing
from sklearn.model_selection import train_test_split  # required import
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)
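The deck never shows how x and y are built. A minimal sketch, assuming the df_fake and df_true frames from the Data Collection slide and the Kaggle dataset's "text" column:

import pandas as pd
from sklearn.model_selection import train_test_split

# Merge the two labelled frames and shuffle the rows
df = pd.concat([df_fake, df_true], axis=0).sample(frac=1, random_state=42)
x = df["text"]    # article text
y = df["class"]   # 0 = fake, 1 = true

# 80/20 train-test split, fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)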
Methodologies
Four classifiers were trained and then tested on unseen data
(the manual-testing code is shown after the list):

Logistic Regression
A simple, efficient classifier for binary classification problems,
providing probability estimates and easy-to-interpret coefficients.

Decision Tree Classifier
An easy-to-visualize classifier that handles categorical and numerical
features, providing robustness to outliers and missing values.

Gradient Boost Classifier
A high-performance classifier that handles complex, non-linear
relationships and high-dimensional data.

Random Forest Classifier
A robust ensemble classifier that reduces overfitting, handles
missing values, and provides feature importance estimates.

# Run a single news article through all four trained models
def manual_testing(news):
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    # Clean the text with the same preprocessing used for training
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    # Vectorize with the fitted TF-IDF vectorizer
    new_xv_test = vectorization.transform(new_x_test)
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)
    return print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_lable(pred_LR[0]),
        output_lable(pred_DT[0]),
        output_lable(pred_GBC[0]),
        output_lable(pred_RFC[0])))
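The training step itself is not shown in the deck. A minimal sketch, assuming the TF-IDF features from the Feature Engineering slide and the LR/DT/GBC/RFC names used in manual_testing:

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Fit each classifier on the TF-IDF training features
LR = LogisticRegression(max_iter=1000).fit(X_train_tfidf, y_train)
DT = DecisionTreeClassifier().fit(X_train_tfidf, y_train)
GBC = GradientBoostingClassifier().fit(X_train_tfidf, y_train)
RFC = RandomForestClassifier().fit(X_train_tfidf, y_train)

# Map a 0/1 prediction to a readable label (assumed helper mirroring output_lable above)
def output_lable(n):
    return "Fake News" if n == 0 else "Not A Fake News"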
Result
We evaluated each classifier's performance using metrics such as
Accuracy, Precision, Recall, and F1 score.
We used the Random Search technique to find optimal parameters
for our chosen model.
[Figures: Confusion Matrix & Classification Report Heatmap for the
Logistic Regression, Decision Tree, Gradient Boost, and Random Forest classifiers]
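The evaluation and tuning code is not included in the deck. A hedged sketch of how the metrics, heatmaps, and Random Search could be produced with scikit-learn and seaborn (library choice and parameter grid are assumptions):

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import seaborn as sns

# Accuracy, precision, recall and F1 for one fitted model
pred = LR.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(y_test, pred))
print(classification_report(y_test, pred))

# Confusion-matrix heatmap
sns.heatmap(confusion_matrix(y_test, pred), annot=True, fmt="d")
plt.title("Logistic Regression")
plt.show()

# Random Search over hyperparameters (illustrative grid, not the project's)
param_dist = {"n_estimators": [100, 200, 300], "max_depth": [None, 10, 20]}
search = RandomizedSearchCV(RandomForestClassifier(), param_dist, n_iter=5, cv=3)
search.fit(X_train_tfidf, y_train)
print(search.best_params_)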
Conclusion
Fake News Detection Model
Developed using Kaggle Fake News Dataset
Preprocessing + TF-IDF for feature extraction
Classifiers: LR, RF, Decision Tree, Gradient Boost
High accuracy in detecting real vs. fake news
Applications: media, social platforms, combating
misinformation
Future work: advanced NLP models for improved
accuracy
References
Simplilearn learning tutorials and resources
International Journal of Intelligent Systems and
Applications in Engineering
H. Ahmed, I. Traore, and S. Saad, "Detecting opinion spams
and fake news using text classification," Secur. Privacy,
vol. 1, no. 1, p. e9, Jan. 2018.