DETECTING FAKE NEWS
USING MACHINE LEARNING
GAURAV KUMAR CHOUBEY (21MCA1061)
GUIDE NAME : Dr RAJARAJESWARI S
CONTENTS
1. INTRODUCTION
2. OBJECTIVE
3. PROBLEM STATEMENT
4. LITERATURE REVIEW SUMMARY OF RESEARCH PAPER 1
5. LITERATURE REVIEW SUMMARY OF RESEARCH PAPER 2
6. LITERATURE REVIEW SUMMARY OF RESEARCH PAPER 3
7. LITERATURE REVIEW SUMMARY OF RESEARCH PAPER 4
8. LITERATURE REVIEW SUMMARY OF RESEARCH PAPER 5
9. LITERATURE REVIEW SUMMARY OF RESEARCH PAPER 6
10. CONCLUSION
INTRODUCTION
● In today’s world, information is available everywhere.
● The rise of platforms such as Facebook, Twitter, Instagram and many others has made
it very easy to share news and information.
● With just one click, the whole world can see or hear whatever you want it to.
● Research by 3 MIT scholars found that fake news has spread more than real news on
Twitter in recent years.
● A trendy phrase in India is “WhatsApp University”, a sarcastic way of saying that
news from WhatsApp cannot be trusted because most of it is fake.
SOME KEY POINTS
● Social media platforms like Facebook, Twitter and Instagram play a major role in
the spread of fake news.
● Chat apps like WhatsApp are also responsible for the spread of fake news.
● Fake news is 70 percent more likely to be shared than true news.
● True news takes 7 times longer to spread than fake news.
These points are taken from a study by 3 MIT scholars.
AGENDA BEHIND FAKE NEWS SPREADING
● Political agendas: some political parties or leaders want to defame their opponents
or create a good opinion of themselves.
● Pay per click: money is generated each time a link is visited, and fake news is more
likely to be clicked and seen.
● Clickbait on platforms like YouTube: clickbait lures you into clicking on something,
even though the video is not actually what the thumbnail shows.
OBJECTIVE
● The objective of this research is to develop a method for detecting fake news using
natural language processing.
● The main aim is to identify fake news, which is a classic text classification problem.
● The goal is to decide whether an article is fake or real.
PROBLEM STATEMENT
● News consumption from online platforms is a double-edged sword.
● On one hand, you get free and easily accessible news; on the other hand, it
enables the widespread circulation of fake news.
● Therefore, fake news detection has become an emerging field of research.
● Detecting fake news on social media poses several new and challenging research
problems.
LITERATURE REVIEW SUMMARY OF PAPER 1
Fake News Detection Using Machine Learning Ensemble Methods
By : Iftikhar Ahmad, Muhammad Yousaf, Suhail Yousaf and Muhammad Ovais Ahmad
INTRODUCTION
● The rise of the World Wide Web and the rapid adoption of social media have led to
widespread information dissemination.
● This brings huge profits to newspaper companies and news websites. 70% of the traffic
comes from social media sites.
● But most of the news does not conform to facts, leading to a rise in fake news in politics,
sports, science and finance.
● There are fact-checking websites such as “PolitiFact” and “Snopes”, but most news
needs human expertise to verify. Text news is the exception, as it can be checked easily
using NLP (natural language processing).
FRAMEWORK
● Introducing ensemble techniques with various linguistic feature sets to distinguish fake and
real news from multiple domains.
● Ensemble techniques are used with the Linguistic Inquiry and Word Count (LIWC) feature set.
● Datasets used: the ISOT fake news dataset and two datasets from Kaggle.
● For consistency of format and structure, the following steps are applied:
1) Unwanted variables such as author’s name, date and URL are filtered out.
2) Articles with a word count below 20 are also filtered out.
3) Multi-column articles are converted into single-column articles.
● Data cleaning is done in which textual content is converted into numerical form so that it
can be inserted into training models.
● Linguistic features are extracted using the LIWC2015 tool.
● Linguistic features include:
-> percentage of words expressing positive and negative emotions
-> percentage of stop words, punctuation and function words
-> informal words, adjectives, verbs etc.
● The LIWC2015 tool extracts 93 different linguistic features from the text.
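The cleaning steps and percentage-style features above can be sketched roughly as follows. This is only an illustration in plain Python, not the paper's pipeline: the field names (`title`, `text`), the tiny stop-word list and both function names are assumptions, and LIWC2015 itself is a proprietary tool with its own dictionaries.

```python
import string

# Hypothetical stop-word list for illustration; LIWC2015 uses its own dictionaries.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}


def clean_article(record, min_words=20):
    """Drop metadata, merge the text columns, and filter out short articles.

    Returns the merged text, or None if it has fewer than min_words words.
    """
    # Only the textual columns are kept; author, date and url are ignored.
    text = " ".join(record.get(key, "") for key in ("title", "text")).strip()
    if len(text.split()) < min_words:
        return None
    return text


def linguistic_features(text):
    """Compute a few percentage features loosely modelled on LIWC categories."""
    words = text.lower().split()
    stripped = [w.strip(string.punctuation) for w in words]
    n_words = len(words) or 1   # avoid division by zero on empty text
    n_chars = len(text) or 1
    return {
        "word_count": len(words),
        "pct_stop_words": 100.0 * sum(w in STOP_WORDS for w in stripped) / n_words,
        "pct_punctuation": 100.0 * sum(c in string.punctuation for c in text) / n_chars,
    }
```

A record that survives `clean_article` would then be passed to `linguistic_features`, and the resulting numbers form the numerical input for the training models.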
CONCLUSION
● Classification of news requires in-depth knowledge of the domain.
● We used machine learning models and ensemble techniques.
● The main aim was to identify patterns in the text that differentiate the fake articles
from real ones.
● Used LIWC2015 tool to extract the linguistic features.
● Learning models were trained to get the best accuracy.
● Used performance metrics to compare different algorithms.
● The ensemble learners showed the best results.
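The slide does not list which performance metrics were used, so the sketch below assumes four common ones (accuracy, precision, recall and F1) with "fake" treated as the positive class. The function name and label strings are illustrative only.

```python
def classification_metrics(y_true, y_pred, positive="fake"):
    """Compute accuracy, precision, recall and F1 for a binary fake/real task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # fake, flagged fake
    fp = sum(t != positive and p == positive for t, p in pairs)  # real, flagged fake
    fn = sum(t == positive and p != positive for t, p in pairs)  # fake, missed
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Running the same function on the predictions of each model gives directly comparable numbers, which is how individual learners and ensembles can be ranked against each other.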
LITERATURE REVIEW SUMMARY OF PAPER 2
Detecting Fake News in Social Media Networks
By : Monther Aldwairi, Ali Alwahedi
INTRODUCTION
➔ Advances in the web and technology have led to rapid dissemination of information.
➔ Information is faster and more convenient, right in the palm of your hand.
➔ But the accuracy of the information is questionable.
➔ Content publishers started using clickbait as a way to get traffic and eventually
money.
➔ Even short visits to these websites generated a huge amount of revenue for the owners.
RESEARCH PROBLEM AND SOLUTION
● Problem :
○ The problem is to build a solution that can be used to identify fake news from different platforms.
○ It should also help users stay away from clickbait, which benefits both the users and the
companies.
● Solution :
○ To identify and remove fake news from the results provided to users by search engines or
social media sites. The tool can be downloaded by the user and embedded in their device or
browser.
METHODOLOGY
➔ Locate a credible clickbait database.
➔ Focus on social media sites having more fake news or clickbait.
➔ After getting a URL in a file, a Python script computes the attributes from the file and
the content of the web pages.
➔ Features extracted from the web pages:
◆ Keywords in Arabic and English
◆ Titles that start with numbers
◆ Presence of exclamation and question marks
◆ Bounce rates
◆ User can block the websites
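A few of the title-based features listed above could be computed along these lines. The keyword set and the function name are placeholders invented for illustration; the paper's actual English and Arabic keyword lists are not given in these slides, and bounce rate is omitted because it needs analytics data rather than page text.

```python
import re

# Hypothetical English keywords; a real list would also include Arabic terms.
CLICKBAIT_KEYWORDS = {"shocking", "unbelievable", "secret"}


def title_features(title):
    """Extract simple clickbait indicators from a page title."""
    lowered = title.lower()
    return {
        # True when the first non-space character is a digit, e.g. "10 ways to ..."
        "starts_with_number": bool(re.match(r"\s*\d", title)),
        "has_exclamation": "!" in title,
        "has_question_mark": "?" in title,
        # Substring match, so "secrets" also counts as a hit for "secret".
        "keyword_hits": sum(keyword in lowered for keyword in CLICKBAIT_KEYWORDS),
    }
```

For example, `title_features("10 Shocking Secrets You Won't Believe!")` flags the leading number and the exclamation mark and counts two keyword hits.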
CONCLUSION
➔ Used a simple algorithm to find some features to detect fake news websites.
➔ Features like word count, question marks, exclamation marks etc.
➔ Providing a tool to the user which can be downloaded or attached with the browser.
➔ This process is simpler than the previous approach because fewer algorithms are
used.
➔ Results might not be that accurate.
LITERATURE REVIEW SUMMARY OF PAPER 3
Fake News Detection Using Machine Learning Approaches
By : Z Khanam, B N Alwasel, H Sirafi and M Rashid
INTRODUCTION
● This paper proposes a methodology to detect whether an article is fake or real. It
focuses primarily on political news.
● The method is based on counts of words, sources and phrases, using a
supervised machine learning algorithm.
● Models are created using different classification algorithms.
● The models are tested on unseen data and the results are plotted.
METHODOLOGY
● A tool is implemented.
● Dataset collection is done first, then preprocessing, followed by feature
selection, training and finally testing on the dataset.
● Finally running the classifiers.
● Algorithms used :
○ Random forest
○ Naive bayes
○ Majority voting and classifiers.
● Comparing all the algorithms to get the best results.
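Majority voting, mentioned in the algorithm list above, can be illustrated with a short sketch. The three rule-based "classifiers" here are toy stand-ins invented for the example; in the paper they would be trained models such as random forest and naive Bayes.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(classifiers, text):
    """Ask every classifier for a label and combine the answers by majority vote."""
    return majority_vote([classify(text) for classify in classifiers])


# Toy stand-in classifiers (assumptions for illustration, not trained models).
def by_exclamation(text):
    return "fake" if "!" in text else "real"


def by_capitals(text):
    return "fake" if text.isupper() else "real"


def by_keyword(text):
    return "fake" if "shocking" in text.lower() else "real"
```

Calling `ensemble_predict([by_exclamation, by_capitals, by_keyword], headline)` returns whichever label at least two of the three rules agree on. Using an odd number of voters avoids ties in the two-class case.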
CONCLUSIONS
● Detection of fake news is done in two stages: characterisation and disclosure.
● The 1st stage covers the basic principles of detecting fake news.
● The 2nd stage includes the use of supervised algorithms.
● Most of the research papers used the naive Bayes algorithm, and the prediction
was 70%-76% accurate.
LITERATURE REVIEW SUMMARY OF PAPER 4
Automatic detection of Fake News
By : Pontus Nordberg, Joakim Kävrestad and Marcus Nohlberg
INTRODUCTION AND METHODOLOGY
● Identifying key methods for the automation of fake news detection.
● Text classification is a typical task in supervised machine learning.
● It assigns categories to documents, which can be web pages, media articles,
galleries etc.
● Applications include spam filtering, email routing and sentiment analysis.
● Text classification is of 2 types :
○ Manual : done manually but is time consuming.
○ Automatic : NLP is used to automatically classify text much faster.
CONCLUSION
● This paper studied various text classification algorithms and language
processing machine learning algorithms.
● The XGB (gradient boosting) classifier was the most effective classification
algorithm.
LITERATURE REVIEW SUMMARY OF PAPER 5
Ternion: An Autonomous Model for Fake News Detection
By : Noman Islam, Asadullah Shaikh, Asma Qaiser, Yousef Asiri, Sultan Almakdi, Adel Sulaiman, Verdah Moazzam, Syeda Aiman Babar
This paper was based on 3 basic things:
● Stance detection : automatically detecting whether the author has a positive or
negative opinion about the article.
● Author’s credibility : how credible the author is with respect to their articles. The
paper explains 26 features that describe the credibility of the author and of the article.
● Machine learning based classification : machine learning algorithms are used to
classify the articles.
METHODOLOGY
1) First, data is collected.
2) After data collection, the data is pre-processed and then
converted into vector form.
3) Next comes feature extraction, where information such as the author’s name is
extracted and stance detection takes place.
4) Inference engine (applying logical rules to deduce new information).
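Step 2, converting pre-processed text into vector form, can be sketched with a simple bag-of-words count vector. The slides do not say which vectorisation the paper uses, so bag-of-words is an assumption here, and the function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index (alphabetical order)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn one document into a list of token counts aligned with the vocabulary."""
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokenize(text)).items():
        index = vocabulary.get(token)
        if index is not None:  # tokens unseen during vocabulary building are skipped
            vector[index] = count
    return vector
```

The resulting vectors can be extended with extra columns, for instance author or stance features, before being passed to a classifier.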
CONCLUSION AND RESULTS
● Based on the stance detection, author’s credibility and machine learning algorithms,
the process of fake news detection was done.
● A total of 4 algorithms were used: decision tree, random forest, logistic
regression and SVM (support vector machine).
● Of all the algorithms, SVM was found to be the most accurate.
LITERATURE REVIEW SUMMARY OF PAPER 6
Fake news detection in social media
By : Kelly Stahl
INTRODUCTION AND METHODOLOGY
● This paper focused on the naive Bayes algorithm for deciding whether a particular
text/article is real or fake.
● Methodology includes :
○ Enter the link to the text from YouTube or Twitter on the website.
○ The website collects data such as comments and reviews from the platforms.
○ Based on the created model, it classifies the particular text as fake or real.
RESULT AND CONCLUSION
● This model achieved an accuracy of 84%.
● This model classifies the news as fake or real on the basis of comments/reviews of
the users on the platform.
● The naive Bayes algorithm is used.
● According to this paper, the naive Bayes classifier is one of the most accurate algorithms.
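To make the naive Bayes idea concrete, here is a minimal multinomial naive Bayes text classifier with Laplace smoothing, written in plain Python. It is a sketch under assumptions (whitespace tokenisation, the class name `NaiveBayesText`, and the smoothing parameter `alpha` are all illustrative choices) and not the model from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is classic add-one smoothing

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on user comments labelled fake or real, `predict` returns the class with the higher log probability for a new comment. Working in log space avoids numerical underflow when many word probabilities are multiplied.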
Views and opinion
● Graph theory and other machine learning algorithms can be considered.
● Identification of the key steps involved in the spread of news is very important.
● Identifying fake news in videos on news channels and social media platforms
like Facebook, Twitter etc. is the future of fake news detection.
REFERENCES
● Ahmad, Iftikhar, Muhammad Yousaf, Suhail Yousaf, and Muhammad Ovais Ahmad. "Fake news detection using
machine learning ensemble methods." Complexity 2020 (2020).
● Aldwairi, Monther, and Ali Alwahedi. "Detecting fake news in social media networks." Procedia Computer Science
141 (2018): 215-222.
● Khanam, Z., B. N. Alwasel, H. Sirafi, and M. Rashid. "Fake News Detection Using Machine Learning Approaches."
In IOP Conference Series: Materials Science and Engineering, vol. 1099, no. 1, p. 012040. IOP Publishing, 2021.
● Nordberg, Pontus, Joakim Kävrestad, and Marcus Nohlberg. "Automatic detection of Fake News." In 6th
International Workshop on Socio-Technical Perspective in IS Development, virtual conference in Grenoble,
France, June 8-9, 2020, pp. 168-179. CEUR-WS, 2020.
● Islam, Noman, Asadullah Shaikh, Asma Qaiser, Yousef Asiri, Sultan Almakdi, Adel Sulaiman, Verdah Moazzam, and
Syeda Aiman Babar. "Ternion: An Autonomous Model for Fake News Detection." Applied Sciences 11, no. 19 (2021):
9292.
● Stahl, Kelly. "Fake news detection in social media." California State University Stanislaus 6 (2018): 4-15.