IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 11, NO. 3, JUNE 2024

A Review of Deep Learning Models for Twitter Sentiment Analysis: Challenges and Opportunities
Laxmi Chaudhary, Nancy Girdhar, Deepak Sharma, Javier Andreu-Perez, Antoine Doucet, and Matthias Renz
Abstract: Microblogging site Twitter (re-branded to X since July 2023) is one of the most influential online social media websites, which offers a platform for the masses to communicate, express their opinions, and share information on a wide range of subjects and products, resulting in the creation of a large amount of unstructured data. This has attracted significant attention from researchers who seek to understand and analyze the sentiments contained within this massive user-generated text. The task of sentiment analysis (SA) entails extracting and identifying user opinions from the text, and various lexicon- and machine learning-based methods have been developed over the years to accomplish this. However, deep learning (DL)-based approaches have recently become dominant due to their superior performance. This study outlines standard preprocessing techniques and various word embeddings for data preparation. It then delves into a taxonomy to provide a comprehensive summary of DL-based approaches. In addition, the work compiles popular benchmark datasets and highlights evaluation metrics employed for performance measures and the resources available in the public domain to aid SA tasks. Furthermore, the survey discusses domain-specific practical applications of SA tasks. Finally, the study concludes with various research challenges and outlines future outlooks for further investigation.

Index Terms: Deep learning (DL), natural language processing, opinion mining, sentiment analysis (SA), social network, Twitter.

Manuscript received 15 April 2023; revised 28 August 2023 and 21 September 2023; accepted 29 September 2023. Date of publication 27 October 2023; date of current version 31 May 2024. (Corresponding authors: Nancy Girdhar; Deepak Sharma.)
Laxmi Chaudhary is with the Department of Computer Science and Engineering, Jaypee Institute of Information Technology, Noida 201309, India.
Nancy Girdhar and Antoine Doucet are with the Laboratoire Informatique, Image et Interaction (L3i), University of La Rochelle, 17000 La Rochelle, France (e-mail: nancy.gr1991@gmail.com).
Deepak Sharma and Matthias Renz are with the Department of Computer Science, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany (e-mail: deepak.btg@gmail.com).
Javier Andreu-Perez is with the School of Computer Science and Electronic Engineering, University of Essex, CO4 3SQ Colchester, U.K., and also with Simbad Group, University of Jaen, 23071 Jaén, Spain.
Digital Object Identifier 10.1109/TCSS.2023.3322002
2329-924X © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

I. INTRODUCTION

OVER the past few years, social media platforms, such as Twitter, Instagram, Facebook, and various blogging sites, have experienced exponential growth in their user base. These venues allow users to be more vocal about their opinions, emotions, and thoughts on diverse topics and items of their interests, resulting in the generation of a vast amount of data [1], [2], [3], [4], [5], [6]. Moreover, besides textual content, the various aspects of multimodality include pictures, audio, and video, which have piqued the interest of the research community to identify, extract, and analyze user sentiments exhibited in the text, referred to as sentiment analysis (SA).

Among social networking sites, Twitter, with over 330 million active microblogging service users, has become a popular source of data for SA due to its real-time nature and the sheer volume of data generated [7]. The analysis of user-generated content is crucial for various business applications, as it provides insights into users’ daily lives and explains their behavior and activities, as well as how they are influenced by others’ opinions. The task of SA can yield valuable knowledge for further detailed analysis, including identifying trends or results of a particular topic based on sentiment [8], such as movie preferences [9], product proclivity in the market [10], [11], or political opinions [12].

Despite the growing interest in SA, classifying the sentiment polarity of tweets remains a challenging task due to several factors, including language and the lack of contextual cues. Such factors may contradict the well-formed language embodied in most corpora used for text analysis. Therefore, there is an increasing interest in improving sentiment classification methods to achieve more accurate, explainable, and traceable outcomes, as well as better performance in real-time applications. Numerous studies have been conducted to improve SA techniques, as evidenced by the recent SemEval challenges [13], and there is still much work to be done to enhance sentiment classification methods further [14].

Various SA techniques, including traditional ones such as lexicon-based methods [15], [16], [17], [18], [19], [20], [21], [22], [23], machine learning (ML) algorithms [24], [25], [26], [27], [28], [29], and hybrid approaches, have been employed for analyzing Twitter data. In addition, graph-based approaches have also been suggested to identify sentiment in Twitter datasets [30], [31]. However, these techniques have certain limitations, such as handling natural language complexities, short sequences of text, semantic relationships, feature selection, lack of validation results, and processing large amounts of data, which hinder their real-time applicability, especially with high-dimensional features.

To address these limitations, deep learning (DL), a cluster of multilayer neural network algorithms, has emerged as a promising subfield of ML for Twitter SA [32], [33], [34]. Several DL-based models, including deep (Vanilla) neural networks (DNNs) [32], [34], convolutional neural networks (CNNs) [35], [36], [37], [38], recurrent neural networks
(RNNs) [39], [40], and their variants such as long short-term memory (LSTM) [41], [42], [43], [44], gated recurrent units (GRUs), and hybrid techniques, have shown effectiveness in capturing the nuances of natural language and handling the noise and ambiguity present in Twitter data [35], [36], [37], [38], [39], [40], [41], [42], [43], [44]. These models offer flexible solutions that enhance SA performance by providing a better interpretation of the context and semantic meaning of text data.

Motivation: Twitter as a Unique Case for SA. Twitter presents a distinctive environment for SA, characterized by specific features that set it apart from other contexts. First, the stringent character limit, which ranges from 280 to 10,000 characters per tweet (depending on the subscription), leads to concise expressions. This can result in the loss of nuanced sentiment cues. In addition, Twitter users often employ informal language, slang, and abbreviations, posing challenges for SA algorithms to comprehend unconventional language usage accurately. Moreover, the widespread use of emojis and hashtags in tweets requires specialized techniques to effectively integrate these nontextual elements into SA, capturing their emotive context.

Furthermore, Twitter data are often noisy, with promotional content, news updates, and irrelevant information intermingled with sentiment expressions. This noise hampers SA performance and necessitates robust preprocessing techniques to filter out irrelevant content and enhance sentiment prediction accuracy. The challenges posed by Twitter SA, such as handling brevity, informality, nontextual cues, and noise, demand tailored preprocessing strategies and algorithms to ensure reliable SA results.

Given the significance of SA in a vast spectrum of applications and the plethora of work dedicated to SA within Twitter literature, researchers have presented various lines of review studies to highlight the advances achieved and the challenges that have yet to be addressed. Mittal and Patidar [45] focused on exploring lexicon- and ML-based methods for SA on Twitter. However, this study did not delve extensively into preprocessing methods and DL techniques. In contrast, Silva et al. [46] conducted a comprehensive survey primarily centered around semisupervised approaches, encompassing graph-, wrapper-, and topic-based methods for tweet classification. This survey featured a comparative analysis of three semisupervised techniques: self-training, cotraining, and topic modeling. Azzouza et al. [47] introduced a system aimed at discovering and tracking opinions on Twitter using Apache Storm. Through dynamic graphical visualizations, multiple opinions were represented, while an unsupervised ML technique was employed for SA and polarity detection, and the evaluation of the model’s performance was conducted using SemEval datasets. In addition, Ligthart et al. [48], Wankhade et al. [49], and Das and Singh [50] have conducted studies to gain insights into diverse tasks and approaches within SA.

Some studies have summarized the technical and theoretical aspects of SA, as done by Yadav and Vishwakarma [51] and Sharma and Jain [52]. Others have compiled literature to address the challenges posed by large data and the expansion of SA into domains such as marketing, finance, healthcare, and disaster analysis. Works by De Albornoz et al. [10], Soni and Sharaff [53], and Fadel and Cemil [27] contribute to this effort. Furthermore, a few researchers have investigated the impact of data quality on SA performance, considering factors such as readability, subjectivity, and informativeness. Kumar et al. [21] and Jain and Vaidya [54] examined online product reviews to analyze customer feedback for applications such as business monitoring and brand management.

In a different line of work, namely comparative studies, most reviews have focused on reliability metrics such as F1-score or overall accuracy, and performance evaluation of methods is often carried out on small datasets [55]. These studies have shed light on domain-specific past literature or compared the performances of different models on SA tasks. However, despite the recent surge in DL-based developments in Twitter SA, there is still a gap in the literature for an extensive analysis and outline of research progress over the years. To bridge this gap, this comprehensive study presents an objective overview of various SA methods, with a focus on DL approaches, in order to summarize existing research and identify research gaps, paving the way for researchers to fill those gaps.

Observing prevalent previous, current, and coming trends and developments, this research aims to achieve the following objectives.
1) Systematic Taxonomy: To present a systematic taxonomy that summarizes, compares, and reviews representative works for each type of approach. This provides new perspectives for future exploration and practices in SA.
2) Preprocessing Techniques: To provide a summary of various preprocessing techniques used to clean and process text data before applying DL models. To discuss the impact of these techniques on the accuracy of SA and their effectiveness in handling noisy and ambiguous data.
3) Overview of Traditional SA Techniques: To provide an overview of traditional SA techniques and their limitations in processing large volumes of Twitter data. To discuss the challenges faced by traditional techniques and the need for more advanced techniques to handle the complexities of natural language.
4) Analysis of DL-Based Approaches: To provide a detailed analysis of various DL-based approaches for SA on Twitter, including DNN, CNN, RNN, and their variants such as LSTM and GRU. To discuss their architectures, training methodologies, and their strengths and limitations.
5) Challenges Faced in SA on Twitter: To analyze the challenges faced in SA on Twitter, such as noisy data, sarcasm, and irony. To discuss the impact of these challenges on the accuracy of SA and the need for preprocessing techniques to clean text data.
6) Performance Measures: To present a detailed analysis of the evaluation metrics used to evaluate the performance of the models.
7) Real-World Case Studies: To discuss various dimensions of SA usage, its applicability, and its influence on various business domains.
8) Future Perspectives: Finally, to provide an overview of the future research directions in Twitter SA and the need for more robust models that can handle the complexities of natural languages and the challenges faced in processing large volumes of data.

TABLE I. DATA PREPROCESSING STEPS

The remainder of this article is structured as follows.
Section II provides the fundamental concepts of preprocessing
and word embedding. Next, Section III details the development
of Twitter SA and a review of existing literature. Section IV
summarizes various available data sources, evaluation metrics,
and tools. Then, Section V presents domain-specific case
studies and applications of Twitter SA. Section VI highlights
various research gaps and future perspectives, and finally,
Section VII concludes this study.

II. SA ON TWITTER
This section provides information about the standard preprocessing steps and different word embeddings used to perform SA tasks on the Twitter dataset.

A. Preprocessing of Twitter Data
The input data quality significantly impacts the performance of the SA models. The datasets that are used for SA are often unstructured or semistructured, containing a huge amount of irrelevant data that are not useful for predicting sentiments. For instance, when dealing with large datasets, the computational training time can be lengthy, and the presence of stop-words can negatively impact the accuracy of the model. Therefore, it is necessary to preprocess the data in order to save time during training and to increase efficiency [56]. As a consequence, preprocessing text plays a crucial role in noise reduction and data quality improvement, which further elevates the model performance. Based on our literature review, we have compiled the standard preprocessing steps adopted for Twitter SA in the state of the art, as illustrated in Table I.
1) Data Collection: To collect the relevant tweets using the Twitter API (footnote 1) or other tools (footnotes 2-4).
2) Data Cleaning: To remove any irrelevant information from the tweets, such as URLs, usernames, hashtags, special characters, and numbers.
3) Data Balancing: To have a balanced dataset with equal representation of each sentiment class. Therefore, techniques such as oversampling, undersampling, or data augmentation are used to balance the data.
4) Tokenization: To split the cleaned tweets into individual words or tokens. This involves breaking down a text into tokens such as words, numbers, and punctuation marks [56]. This is done to prepare the data for further processing.
5) Stop-Word Removal: Stop-words are frequently occurring words, such as “an,” “in,” “of,” “a,” “is,” “the,” and “to.” However, they do not add much value to text analysis. Hence, they are removed to reduce the noise in the data and to improve the efficiency of SA [56].
6) Stemming: Also termed text standardization, in which the tokens are truncated to their root form to reduce feature complexity and enhance the learning capability of classifiers [56].
7) Lemmatization: It is a similar process to stemming but with a predefined dictionary that retains the context of the word and ensures that the meaning is not lost [56].
8) Short-Word Removal: Remove words shorter than three characters to enhance the accuracy and robustness of classifiers [56].
9) Case Conversion: Convert text into lowercase to avoid any case-sensitivity issues that could affect the classifier’s performance [56].
10) Punctuation Removal: Remove punctuation marks from the text such as full stops, commas, and brackets [57].
11) URLs Removal: URLs are references to web locations that do not provide any additional details and are removed using regular expression matching operations [57].
12) Expanding Contractions: Contractions such as “can’t” and “don’t” are often used to fit within Twitter’s character limit of a tweet/post and are expanded to their full forms (“cannot” and “do not”) to improve the accuracy of SA [7].

1 https://developer.twitter.com/en/docs/twitter-api
2 https://docs.tweepy.org/en/stable/index.html
3 https://twarc-project.readthedocs.io/en/latest/
4 https://github.com/thepanacealab/SMMT

B. Word Embeddings
Unlike images, where the input vectors are directly generated based on pixel data (which are already numeric), it is more challenging to extract input vectors from textual data (which are strings/characters) for neural network models. To deal with this, word embedding, which is the conversion of the vocabulary of words into a vector representation, is used. One-hot encoding, which is a common representation, assigns a $|V|$-dimensional vector to each word, where $|V|$ represents the size of the vocabulary. The vector consists
of only one nonzero entry that corresponds to the word, while the rest of the entries are zeros. However, this method has some drawbacks, such as high computational requirements and the inability to handle context similarity, because each word is encoded as a sparse, high-dimensional vector. The other approach is the term frequency–inverse document frequency (TF-IDF) method, which assigns a score that reflects the relevance of a term in a document compared to the rest of the corpus. It is simple, effective, and computationally efficient for identifying important words in a document and can be used to rank documents based on their relevance to a query. Nevertheless, it does not take into account the order of words in a document or their semantic meaning and may not perform well on documents with highly specialized vocabulary or uncommon words [22], [25], [26]. An alternative approach is to use dense embedding vectors to obtain the context of words in terms of both syntax and semantics [74], [75], [76]. The dense vector mapping ensures that words with similar meanings are represented close to each other in the vector space. To improve the generalization, representation, and computational time of sentiment classification models, various word embeddings have been proposed. Word embeddings are a popular technique for representing textual data as numeric input vectors, which are easily processed by neural network models. Several types of word embeddings exist in the literature, and some of the commonly used word embeddings are summarized as follows.
1) Word2Vec [77]: It is a popular word embedding model that is based on neural networks that are designed to reconstruct the linguistic contexts of the words [74], [78], [79]. It employs a two-layer neural network architecture that takes text as input and generates a vector embedding for each word as output. There are two types of Word2Vec models: Skip-gram and continuous bag of words (CBOW). The Skip-gram model predicts a D-dimensional vector representation of each word in the corpus. The input and hidden layers have the same number of neurons as the vocabulary size and the word vector dimensions, respectively. The weights between these layers are represented by $W_{H \times D}$, where H is the size of the hidden layer. This weight matrix signifies the likelihood of each word’s occurrence for that input. The model learns the correlation between words in a vocabulary by computing the error at the output layer using a loss function and updating the weights (word embeddings) through backpropagation. In contrast, the CBOW model processes the context of a word as input and predicts the word based on that context [74], [79].
2) GloVe or Global Vectors [80]: GloVe is another popular word embedding model that is based on co-occurrence statistics. It is an unsupervised learning method that learns word embeddings by factorizing a matrix of word co-occurrence probabilities and combines the local context window and matrix factorization methods to analyze the local and global statistics of a corpus [80]. It performs better in apprehending the analogy of words than matrix factorization (a.k.a. latent semantic analysis), which only generates an efficient substructure of vector space. On the other hand, the Skip-gram (local context window approach) performs well on analogy tasks but does not make full use of the corpus statistics [74], [81].
3) FastText [82]: It is an extension of Word2Vec that represents each word using n-grams of characters instead of individual words [83]. It learns word embeddings by representing each word as a bag of character n-grams and then learning embeddings for these n-grams. This allows the model to generate efficient embeddings of rarely occurring words in the corpus. The n-grams are employed to train a Skip-gram model, and the embedding of a word is determined by summing up the embeddings of all its n-grams. However, it has high memory and system requirements, because embeddings are created for each character n-gram in the vocabulary [74], [84]. FastText has been shown to improve the performance of SA models on Twitter data, especially for out-of-vocabulary words.

Besides the aforementioned three popular word embedding schemes (Word2Vec, GloVe, and FastText), other word-embedding approaches have also been developed, such as Bidirectional Encoder Representations from Transformers (BERT), which is a pretrained language model that has been fine-tuned for SA on Twitter data and has achieved superior performance compared to other models. ELMo [85] is another pretrained language model that uses a bidirectional LSTM (Bi-LSTM) architecture to learn contextualized word embeddings and has shown state-of-the-art results, especially for sentiment classification at the sentence level.

III. DEVELOPMENT OF SA AND LITERATURE CLASSIFICATION
This section elaborates on the recent advances in the field of Twitter SA. For this study, we have considered publications of the Scopus database from 2010 to 2022. The Scopus document search string in this study was composed as follows: (TITLE-ABS-KEY(“twitter” AND “sentiment” AND “deep” AND “learning”) AND (LIMIT-TO (LANGUAGE, “English”))), which resulted in 1115 research papers. The section highlights the yearwise publications, top organizations, key researchers, and prominent source titles of this domain. Furthermore, a detailed survey is presented on Twitter SA, bifurcated into conventional and DL-based approaches along with hybrid techniques.

A. Recent Trends and Developments
1) Annual Trends: Fig. 1 displays the annual trend of research publications on SA using Twitter data from 2010 to 2022. The x-axis represents the publication years, while the y-axis indicates the publication count recorded in the Scopus database. The data
show a gradual increase in the number of publications from 2010 to 2016, followed by a dip in 2017. However, 2018 and 2019 witnessed a significant surge in publication counts compared to previous years, with a remarkable 19-fold increase in 2019 compared to 2011. Despite a decline in 2020, publication counts rose again from 2020 to 2022, suggesting a sustained interest in Twitter SA research over time. Specific publication counts for each year are mentioned above the corresponding bars in the chart.

Fig. 1. Annual publication count.

2) Key Organizations: Fig. 2 presents an analysis of the top ten organizations that published the most Twitter SA-related articles in the Scopus database from 2010 to 2022. The Universidad Rey Juan Carlos, Spain, published the highest number of articles on Twitter SA during this period (16), followed by Yeungnam University, South Korea, with 13 articles. The Université Sultan Moulay Slimane, Morocco, published a total of 12 articles, while the Vellore Institute of Technology, India, and the Sapienza Università di Roma, Italy, had 11 publications each. The researchers at the Faculté des Sciences et Techniques, France, contributed nine articles. King Abdulaziz University, Saudi Arabia, the Wessex Institute of Technology, U.K., the Instituto Politécnico Nacional, Mexico, and the Universidad de Granada, Spain, published eight articles each. This analysis provides a detailed overview of the affiliations of authors worldwide, who have made significant contributions to Twitter SA research in the past 12 years.

Fig. 2. Top organizations.

3) Key Authors: Table II presents the top ten researchers globally, who have published the most articles on Twitter SA in Scopus, along with their respective organizations. The leading author in terms of publication count is Mohammed Erritali from Sultan Moulay Slimane University, Beni Mellal, Morocco, with a total of nine articles. The subsequent five top authors come from organizations in Italy, the United Kingdom, and Spain, and have contributed eight articles each. Furthermore, Ana Reyes-Menendez, affiliated with Rey Juan Carlos University, Spain, has published seven articles. The remaining three authors, affiliated with organizations in South Korea, Morocco, and Spain, have published six articles each. This analysis highlights the presence of highly productive research groups focusing on Twitter SA in Spain.

Fig. 3. Top sources.

4) Key Sources: Fig. 3 presents an analysis of the top ten source titles that published articles on Twitter SA between 2010 and 2022. The Lecture Notes in Computer Science, which includes the subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, had the highest number of publications on Twitter SA (72). The second-highest number of publications, 37, came from Advances in Intelligent Systems and Computing, followed by Communications in Computer and Information Science with 35 articles, and the ACM International Conference Proceeding Series with 31 articles. The Ceur Workshop Proceedings and Technology and IEEE ACCESS had 28 and 22 publications, respectively. In addition, Social Network Analysis and Mining and
the International Journal of Advanced Computer Science and Applications published 15 and 12 articles, respectively. Finally, Lecture Notes in Networks and Systems and Procedia Computer Science were ranked ninth and tenth, with 11 publications each. This analysis provides detailed information on the significant source titles that have contributed to the research in this domain.

TABLE II. KEY AUTHORS (P: PUBLICATION; C: CITATION; AND AC: AVERAGE CITATION)

5) Global View: Fig. 4 displays the relative percentage of publications according to the article count per country. It is evident from the figure that a majority of the publications on Twitter SA have first authors from Asia, followed by the USA and Europe. India has the highest number of publications in this field, with 222 articles, indicating a considerable research interest in Twitter SA among Indian authors. United States researchers have the second-highest number of publications, with 145 articles. Other countries such as Spain, Italy, China, the United Kingdom, Brazil, Saudi Arabia, South Korea, and Morocco have also contributed a significant amount to this field, providing a decent level of diversity. Furthermore, researchers from Egypt, Mexico, Pakistan, Australia, Germany, Turkey, Japan, Iran, Malaysia, and Canada have also made a noteworthy contribution to this area of research.

Fig. 4. Top countries.

Fig. 5 showcases the visual representation of the trending keywords for SA. The word cloud is based on the author-supplied keywords in scientific publications and spotlights key themes and topics of interest to researchers in this field.

Fig. 5. Keywords word cloud.

B. Literature Survey
SA has drawn significant attention from the research community and emerged as a topic of interest, and thus a wealth of approaches and techniques have been proposed to address this task. In this section, we partition the diverse approaches proposed in the literature into two broad categories: conventional and DL. The first category, conventional approaches, includes methods based on lexicons and ML, and the latter is based on DNN models. Fig. 6 depicts the evolution of SA techniques, from lexicon-based methods to DL methods, and Fig. 7 categorizes the publication counts of trends from various approaches during the given survey period.
1) Lexicon-Based Approaches: Popularly known as rule-/corpus-based approaches, they rely on predefined dictionaries or word lists with assigned polarity scores to determine the sentiment of a given dataset without any training. One of the earliest and most widely used sentiment lexicons is SentiWordNet [86]. Other popular lexicons include AFINN [87], VADER [88], WordNet, and q-word, which are used by researchers to match words from the input statement [89].
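
To make the lexicon-matching idea concrete, the following is a minimal sketch of how a tweet could be scored against a polarity word list without any training. It is only an illustration under stated assumptions: the lexicon `TOY_LEXICON`, its scores, the negation list, and the function names `score_tweet` and `classify` are hypothetical and are not taken from SentiWordNet, AFINN, VADER, or any of the surveyed works.

```python
import re

# Hypothetical toy lexicon (word -> polarity score); the values are
# illustrative only and do not come from any published lexicon.
TOY_LEXICON = {
    "good": 2.0, "great": 3.0, "love": 3.0,
    "bad": -2.0, "terrible": -3.0, "hate": -3.0,
}

# Assumed, deliberately tiny set of negation cues.
NEGATIONS = {"not", "no", "never"}


def score_tweet(text, lexicon=TOY_LEXICON):
    """Sum the polarity scores of lexicon words found in the text.

    A negation cue flips the sign of the immediately following token
    only; this is a simplification, not a full negation-scope model.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in lexicon:
            total += -lexicon[tok] if negate else lexicon[tok]
        negate = False  # negation applies to one following token only
    return total


def classify(text, lexicon=TOY_LEXICON):
    """Map the summed score to a coarse polarity label."""
    score = score_tweet(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In this sketch a tweet with no lexicon hits is labeled neutral, which hints at why lexicon coverage matters so much for short, informal text such as tweets.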
Fig. 6. Taxonomy and objectives of DL-based Twitter SA.
Fig. 7. Publication counts of trends across various approaches (2010–2022).
Many studies have utilized lexicon-based approaches, such as Jurek et al. [90], who developed an SA algorithm that focuses on real-time analysis of Twitter content. This method includes two main components: a combination function based on evidence and sentiment normalization, which are used to estimate the sentiment intensity. Table III provides further details on lexicon-based state of the art.

Lexicon-based approaches can be divided into two subcategories: dictionary-based and corpus-based. Dictionary-based approaches use predefined dictionaries, such as SentiWordNet and WordNet, to perform SA [18], [19], [20], [21]. Corpus-based approaches leverage corpus data for sentiment classification and are further divided into statistical and semantic subcategories [24]. The statistical category comprises conditional random field (CRF) [91], K-nearest neighbors (KNNs) [92], hidden Markov models (HMMs) [53], and so on.

Al-Khalisy and Jehlol [19] proposed a dictionary-based approach for extracting significant information from terrorist propaganda such as account name, location, and supporter data. This method utilized a bag of words (BOW) to compute the overall scores for each tweet that represents the training data and to analyze the polarity; the created word list comprised antonyms and synonyms from the dictionary. Chalothorn and Ellman [15] suggested the use of lexical resources, such as the NLTK toolkit, SentiWordNet, and WordNet, for the analysis of online radical posts. The polarity and text intensity are calculated to analyze the sentiment. For this, the text corpus was initially acquired from various web platforms, such as Qawem and Montada, and after essential data preprocessing, various attribute-driven measures were employed to identify and manage extremist and religious content. Based on [15], Gitari et al. [16] built their hate verb lexicon, starting with a basic verb list, and expanding it iteratively by adding
synonyms and hypernyms of the seed verbs depending on WordNet relations. Simon et al. [23] developed a corpus-based approach that uses divergent behavior to analyze the sentiment of tweets during the Kenya Westgate Mall attack to find the radicalization time among the users of Twitter. The authors recommended that emergency organizations and communication centers minimize the use of negative sentiments when they communicate with the public. Another corpus-based method was proposed by Mansour [22] to analyze public sentiment polarity from Eastern and Western countries toward the Islamic State of Iraq and Syria (ISIS). This method employs text SA using TF-IDF for analyzing the frequency of words and word sentiment. The other lexicon-based method proposed by Kharde et al. [18] uses part-of-speech (POS) tagging, while the lexicon approaches presented by Ferrara et al. [20] and Kumar et al. [21] depend on a dictionary for feature extraction from the dataset.
Numerous techniques have utilized lexical approaches as they do not require annotated data, the scarcity of which is one of the key challenges in the SA task. However, these methods have certain limitations; for example, their accuracy is influenced by the size and quality of the lexicon. Moreover, these approaches cannot handle sarcasm and irony, which are common on Twitter. Furthermore, these methods cannot handle out-of-vocabulary words, which can lead to incorrect sentiment classification. This is particularly problematic for Twitter data, which is constantly evolving, requiring frequent updates to the lexicon. In addition, they rely on handcrafted features, which can be a laborious and time-consuming process. Another limitation of these methods is that they are not very effective at generalizing to different domains or context-specific orientations [23].
2) ML-Based Approaches: These techniques have been extensively employed for SA on Twitter in recent years. Leveraging statistical techniques, these methods automatically learn patterns and relationships from data, which are then used to classify the sentiment of the text.
ML approaches broadly come under the umbrella of conventional methods, which include popular techniques such as support vector machines (SVMs) [24], [98], [99], the Naïve Bayes (NB) classifier [24], [25], [99], and the maximum entropy classifier [55], [100]. These techniques have been used in several studies for SA, including studies on terrorism [25], [101], [102], hate speech detection [103], [104], customer satisfaction [105], and sentiment polarity detection [27]. For instance, Wei et al. [24] have used the NB algorithm to classify tweets as positive, negative, or neutral based on the presence of specific words in a tweet. Additional traditional ML approaches have also been utilized in various other studies [26], [27], [28], [29], [101], [103], [106], which are presented in detail in Table IV.
Wei et al. [24] proposed a KNN classifier-based approach for sentiment classification to identify extremist-related conversations in public tweets. Similarly, Azizan and Aziz [25] utilized the NB algorithm to detect extremist affiliations in social media communication. Their model classifies user reviews into positive and negative sentiments to reflect affiliations with extremist or nonextremist groups. However, this method does not consider the overall dependencies concerning a sentence in a given document. Rani and Singh [98] proposed an SVM model with features extracted using the TF-IDF method for SA, in which they detected sentiment polarity using two SVM methods and concluded that the linear SVM model outperformed the kernel SVM.
Omer [101] proposed an ML-based approach that collects and uses three different datasets: supporters of ISIS, antisupporters of ISIS, and random tweets that are unrelated to ISIS. The method employs three primary classifiers, namely, NB, AdaBoost, and SVM. Nouh et al. [103] developed a novel ML-based approach to analyze radical content and extremism propaganda in tweets. Kaati et al. [102] introduced a method for identifying the Twitter accounts of jihadist group supporters and online propaganda propagators using feature engineering, which involves analyzing data dependencies and classifying features as data-independent or data-dependent. Ferrara et al. [20] developed an SA technique that uses metadata as a feature, together with a greedy selection method, and applies random forest and logistic regression models to predict the extremists' sentiment polarity in the interactions.
Omar et al. [104] identified the relationship between hate speech and topics present on online social platforms based on an ML method. This approach utilizes multilabel classification by employing logistic regression, linear SVC, and random forest classifiers. To classify text sentiment into positive, neutral, or negative, the authors have utilized feature representations that include TF-IDF, N-gram, and BOW. Rehman et al. [107] have proposed a method to detect radical text on Twitter, where religious language plays a significant role in radicalization. The authors have utilized both radical and religious features for training the model and applied TF-IDF for feature engineering to feed into ML classifiers, including random forest, SVM, and NB, to detect the sentiment polarity.
In order to improve the accuracy of SA, researchers have dedicated efforts to developing ML-hybrid models [26], [29], [108], [109] that integrate multiple ML approaches to address the shortcomings of individual methods. While these hybrid models have led to better results, there is still potential for further enhancement of their outcomes.
Despite the success of ML-based Twitter SA approaches, there still exist challenges that need to be addressed. One of the major limitations is their dependence on the quality and size of the training dataset. If the training dataset is biased or too small, it may lead to poor performance of the model. Another limitation is their inability to handle the ambiguity and complexity of multilingual datasets and their inadequacy to efficiently capture relevant features from
TABLE III
LEXICON-BASED APPROACHES
short sequences of text (short text). For instance, sarcasm and irony in tweets can often be misinterpreted by these models, leading to incorrect sentiment classification. Also, their performance relies on the amount of annotated data available for training, making them highly data-dependent. However, annotating tweets is a costly process due to the dynamic nature of Twitter content. Moreover, these methods are domain-specific, which means that their effectiveness is limited to the domain in which they are trained. If they are used in a different domain, their efficacy decreases, and they need to be retrained in that domain to perform well [51]. Furthermore, the step of feature extraction and engineering in ML-based methods is computationally expensive and time-consuming.
Overall, the use of ML approaches for Twitter SA has shown promising results; nevertheless, there is still room for improvement. Further work is needed to develop more robust and efficient models for SA that can handle the complexities of natural languages.
3) DL-Based Approaches: Unlike handcrafted feature engineering in ML approaches, models based on the DL paradigm are capable of automatically extracting significant features from the text and have shown state-of-the-art results for SA tasks.
DL models offer several advantages over conventional methods for SA and, thus, have become an emerging research area for Twitter SA tasks. DL models are well-suited for handling the large amounts of data that are generated every day on social media platforms. For example, on Twitter, about 6000 tweets are produced per second on average, resulting in approximately 200 billion tweets per year. Traditional methods struggle with such surplus data, whereas DL models excel as they can learn more features while training on vast datasets, resulting in superior accuracy and performance efficiency. In addition, the multiple hidden layers of DL models enable them to capture complex and nonlinear patterns in the data [33], even in short-text data such as tweets. Therefore, a plethora of DL-based models have been developed over the past few decades to analyze the sentiment of posts on various social media platforms, including Twitter.
To gain better insights into recent years' advancements, the current survey divides the DL literature into a taxonomy broadly categorized as basic and transformer-based. Basic DL models consist of DNN [32], [33], [34], CNN [35], [36], [37], [38], RNN [40], and LSTM [110], whereas transformer-based models include BERT [111], RoBERTa [112], XLNet [113], GPT [114], and so on. Besides these two major categories, there are many DL-hybrid methods proposed by the research community for Twitter text SA, along with recent developments of graph-based methods that are classified under the "other" category in this study. The current section details the literature for each category of DL models. Fig. 8 presents the generic DL-based pipeline for Twitter SA.
1) DNN is a type of artificial neural network that consists of multiple hidden layers between the input and output layers (as shown in Fig. 9). These vanilla neural networks can handle complex nonlinear relationships between the layers more efficiently than conventional single-hidden-layer architectures. To perform the Twitter SA task, the DNN model implicitly learns
TABLE IV
TRADITIONAL ML- AND HYBRID-BASED METHODS
different features from the input data in a feedforward manner where each layer is fully connected with the next layer. During the training step, backpropagation is used to learn and adjust the weights among neurons. The weights are updated depending on the error obtained at the output layer. Ali et al. [32] developed a DL-based SA model using RapidMiner to predict the results of the 2018 general elections in Pakistan. Similarly, Yasir et al. [34] employed a DL model to forecast the interest rates of five countries, utilizing Twitter sentiments as an input. They have also integrated regression models, such as linear and support vector regression, in their analysis.
The DNN-based models have shown superior performance compared to traditional ML models due to their ability to learn complex features from the data. However, these models have a larger number of hidden layers, and they have a greater number of parameter values too, making them difficult to train [33].
2) CNN is one of the DNN-based model variations that typically consists of a sequence of convolutional and pooling layers, followed by one or more fully connected layers for sentiment classification. The convolutional layers use filters of varying sizes to extract local features from the input, while the pooling layers reduce the dimensionality of the data by downsampling the output of the convolutional layers. The extracted features are then fed into the fully connected layers for classification purposes. CNN-based models have been shown to be effective for Twitter SA and can learn complex features from the input data, allowing them to capture the context and meaning of the words. In addition, being computationally efficient and easy to train on large datasets, these models have been adopted for SA in various domains.
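The convolution, pooling, and fully connected stages described for CNN-based SA can be sketched as a single forward pass. This is a minimal illustration in NumPy, not the architecture of any cited study: the random weights, the embedding size, the filter widths (2, 3, 4), the ReLU activation, and the three-class output are all assumptions made for the example.

```python
import numpy as np

def conv1d_relu(x, filters, bias):
    """Slide each filter over token windows (cross-correlation), then apply ReLU.

    x:       (seq_len, emb_dim) embedded tweet
    filters: (n_filters, width, emb_dim)
    bias:    (n_filters,)
    returns: (seq_len - width + 1, n_filters) feature maps
    """
    n_filters, width, _ = filters.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + width]  # (width, emb_dim) local context
        # Sum over the window for every filter at once.
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def text_cnn_forward(x, conv_params, w_fc, b_fc):
    # Max-over-time pooling downsamples each feature map to a single value.
    pooled = [conv1d_relu(x, f, b).max(axis=0) for f, b in conv_params]
    features = np.concatenate(pooled)
    # Fully connected layer followed by softmax over sentiment classes.
    return softmax(w_fc @ features + b_fc)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
emb_dim, n_filters, n_classes = 8, 4, 3
tweet = rng.normal(size=(12, emb_dim))  # 12 tokens, already embedded
conv_params = [
    (rng.normal(size=(n_filters, width, emb_dim)) * 0.1, np.zeros(n_filters))
    for width in (2, 3, 4)  # filters of varying sizes
]
w_fc = rng.normal(size=(n_classes, n_filters * len(conv_params))) * 0.1
b_fc = np.zeros(n_classes)

probs = text_cnn_forward(tweet, conv_params, w_fc, b_fc)
print(probs)  # one probability per (assumed) class: negative, neutral, positive
```

In a trained model the filter weights and the fully connected layer would be learned by backpropagation; here they are random, so the output only demonstrates the data flow and shapes.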
TABLE V
DL-BASED MODELS
A generic architecture of CNN is illustrated by Fig. 10. The most important component in CNN is the convolution layer. The convolutional layer $h$ is formed by applying the activation function $f(\cdot)$ to the input matrix $X$, which is convolved with the weight matrix $W^{k}$ and added to the bias term $b^{k}$ for each layer. The elements in the $i$th row and the $j$th column of $W^{k}$ and $X$ are referred to as $w^{k}_{i,j}$ and $x_{i,j}$, respectively. The resulting $k$th feature map of layer $h$ has a dimension $C \times H \times W$, where $C$, $H$, and $W$ represent the channel, height, and width, respectively. One can create a convolutional layer, denoted as $h$, by using $k$ small filters (also known as kernels) of size $N_i \times N_j$, as shown in (1). These filters perform a cross-correlation operation, convolving the input pixel $x_{u,v}$ to obtain $h^{k}_{u,v}$
$$h^{k}_{u,v}(X_{u,v}) = f\left(\sum_{i=1}^{N_i}\sum_{j=1}^{N_j} w^{k}_{i,j}\, x_{u+i,v+j} + b^{k}\right). \tag{1}$$
Zola et al. [116] have developed a word-embedded CNN model to address cross-domain issues encountered while performing SA. The model employs web sources such as Amazon and TripAdvisor, which contain easily labeled reviews, for fitting a sentiment prediction model. This model is later reused to classify the sentiment polarity of two unlabeled social media platforms: Twitter and Facebook. The authors have also explored various techniques such as POS tagging, stemming, undersampling, oversampling, and handling unlabeled sentiment data to reduce word sparsity. Paredes-Valverde et al. [64] have proposed an approach based on Word2Vec for sentiment classification, which helps companies and organizations identify opportunities for improving the quality of their products and services.
Alharbi and de Doncker [67] have developed a CNN model that incorporates user behavioral details present in a document, such as a tweet, for SA. The authors have utilized two datasets provided by the SemEval-2016 Workshop to evaluate the model's performance. This approach suggests that considering the content of a document or a tweet beyond its availability is advantageous in SA, as it provides the model with an in-depth understanding of the classification task.
Overall, CNN-based models have been shown to be effective for SA on Twitter data. While these models are designed to extract local features from the input and may have limitations in capturing long-term dependencies between words, recent studies have shown that incorporating attention mechanisms can improve their performance.
3) RNNs: The main drawback of CNNs is their inability to understand the relationships between sequences. In addition, the effectiveness of the CNN technique largely depends on choosing an appropriate window size of kernels [39]. CNN models assume that each input is unrelated to the output, which means they do not help in dealing with contextual dependencies present in the dataset. To address this, RNNs [40] draw on previous state information to handle contextual relationships and capture the temporal dependencies between words. These models use a hidden state that is updated at each time step, allowing them to capture the context and meaning of the words. Each word in the text is considered as a separate input at a given time $t$, and the previous hidden state information is employed to process the current input, as presented in Fig. 11.
At a particular time step $t$, the input vector $X_t$ and the output vector $h_{t-1}$ from the preceding RNN layer are fed as inputs to the current RNN layer. The output for that time step is then computed from these two input vectors using (2)
$$h_t = \tanh(h_{t-1} W_h + X_t W_X + b). \tag{2}$$
While RNNs are great at learning sequential data, they cannot obtain local attributes in parallel. As a result, RNN models are complementary to CNN models since they maintain sequential information over time. LSTM and GRU are extensions of RNNs widely used for SA on Twitter.
4) LSTM: RNNs may suffer from exploding gradient and vanishing gradient issues, which make it difficult to handle long-term contextual dependencies and fine-tune their parameters. This can lead to difficulty in training and remembering long-distance correlations in a sequential manner [39]. To resolve these issues, an LSTM, a variant of RNN, restructures the RNN by introducing a memory cell and gates to retain information for further utilization and updates [110]. By modifying the RNN layer, the LSTM model addresses both the exploding and vanishing gradient problems occurring in RNN models. LSTM models are beneficial for sentiment classification since they can capture both long- and short-term dependencies and have obtained notable results in this task. Also, these models are capable of solving time-series and sequential problems with remarkable outcomes.
As shown in Fig. 12, an LSTM cell with input feature $x_t$ receives input data $x$ at time $t$, and an input gate $i_t$ regulates the input data's flow into the cell. The forget gate $f_t$ determines when to discard the contents of the cell's internal state, while the output gate $o_t$ governs the flow of information to the output. Equations (3)–(8) summarize the cell function
$$i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i) \tag{3}$$
$$f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f) \tag{4}$$
$$o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o) \tag{5}$$
$$g_t = \sigma(U_g x_t + W_g h_{t-1} + b_g) \tag{6}$$
$$c_t = g_t \odot i_t + f_t \odot c_{t-1} \tag{7}$$
$$h_t = o_t \odot \tanh(c_t). \tag{8}$$
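Equations (3)–(8) can be read as one update step that is applied once per token. The following NumPy sketch transcribes them directly; the dimensions, the random parameters, and the helper names (`init_lstm_params`, `lstm_step`) are hypothetical and chosen only for illustration. Equation (6) is implemented with the sigmoid as printed above, although many LSTM implementations use tanh for the candidate $g_t$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm_params(input_dim, hidden_dim, rng):
    """Random U (input), W (recurrent), and b (bias) for the four gates i, f, o, g."""
    params = {}
    for gate in "ifog":
        params["U_" + gate] = rng.normal(size=(hidden_dim, input_dim)) * 0.1
        params["W_" + gate] = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
        params["b_" + gate] = np.zeros(hidden_dim)
    return params

def lstm_step(x_t, h_prev, c_prev, p):
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ h_prev + p["b_i"])  # (3) input gate
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ h_prev + p["b_f"])  # (4) forget gate
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ h_prev + p["b_o"])  # (5) output gate
    # (6) candidate, sigmoid as printed; tanh is the more common choice.
    g_t = sigmoid(p["U_g"] @ x_t + p["W_g"] @ h_prev + p["b_g"])
    c_t = g_t * i_t + f_t * c_prev                                # (7) memory cell
    h_t = o_t * np.tanh(c_t)                                      # (8) hidden unit
    return h_t, c_t

# Hypothetical sizes: 6 tokens, 8-dimensional embeddings, 5 hidden units.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 8, 5, 6
params = init_lstm_params(input_dim, hidden_dim, rng)
tokens = rng.normal(size=(seq_len, input_dim))

h = np.zeros(hidden_dim)  # h_0 initialized to zero
c = np.zeros(hidden_dim)  # c_0 initialized to zero
for x_t in tokens:
    h, c = lstm_step(x_t, h, c, params)
print(h)  # final hidden state, usable as a tweet representation
```

For sentiment classification, the final hidden state would typically be passed to a classification layer; that layer is omitted here to keep the sketch focused on the cell equations.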
Fig. 8. Generic pipeline of Twitter SA.
Fig. 9. DNN architecture [115].
Fig. 10. CNN architecture [117].
Fig. 11. RNN architecture [118].
Fig. 12. LSTM architecture [119].
The logistic sigmoid function is denoted by $\sigma$, and the elementwise vector product operation is denoted by $\odot$. At any given time $t$, an LSTM architecture includes three gates, an input gate $i_t$, a forget gate $f_t$, and an output gate $o_t$, as well as a memory cell $c_t$ and a hidden unit $h_t$. The initial values $c_0$ and $h_0$ can be initialized to zero. LSTM model parameters consist of weight matrices $U$ and $W$, as well as a bias vector $b$.
Tam et al. [120] developed an LSTM-based model to learn the text sequence and find the relation between words or phrases for sentiment classification. This model also improves the semantic information of tweets and enhances the learning model's efficiency. Another LSTM model [110] was developed to overcome the limitations of RNNs; it can learn long sequences of data with time lags. The significant advantage of the LSTM model is its recurrent units, which allow long-range learning. A hidden state in augmented form is also included with nonlinearity, which permits updating the state, propagating it without any modification, or resetting it, by employing simple learned gating functions. Drif and Hadjoudj [42] have proposed two multilevel LSTM models: one based on user and content-specific features and the other based on user, content, and sentiment features. Drif and Hadjoudj [42] conducted a case study on social media platforms to gain insights into sentiment intensity and the influence of social networking platforms on political protests. They built an LSTM model to analyze the effects of sentiment, user, and content on the
dissemination of information, using the learning ability of the model to predict retweetability. Zhu et al. [43] proposed a sentiment index for the Chinese housing market by analyzing the sentiment expressed on social media regarding house prices. Imran et al. [44] presented a research study to analyze public reactions to the novel Coronavirus and the subsequent actions taken by different countries from different cultures. They leveraged LSTM to estimate the sentiment polarity and trained their model using emotions extracted from tweets to achieve higher accuracy on their dataset.
Fig. 13. Bi-LSTM architecture [121].
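Fig. 14. GRU architecture [124].
Unlike LSTM, where information flows in a single direction, from earlier to later time steps, Bi-LSTM allows information to flow in both directions through two hidden states, which avoids the need for decay in future data inclusion. Fig. 13 depicts an architecture of a Bi-LSTM model where the forward $\overrightarrow{h}$ and backward $\overleftarrow{h}$ sequences are represented by the red and green arrows, respectively, and the calculations are as follows:
$$\overrightarrow{h}_t = g\left(U_{\overrightarrow{h}}\, x_t + W_{\overrightarrow{h}}\, \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\right) \tag{9}$$
$$\overleftarrow{h}_t = g\left(U_{\overleftarrow{h}}\, x_t + W_{\overleftarrow{h}}\, \overleftarrow{h}_{t-1} + b_{\overleftarrow{h}}\right) \tag{10}$$
$$y_t = g\left(V_{\overrightarrow{h}}\, \overrightarrow{h}_t + V_{\overleftarrow{h}}\, \overleftarrow{h}_t + b_y\right). \tag{11}$$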
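The two recurrences in (9) and (10) and the combination in (11) can be sketched as follows. This is a simplified illustration, not a full Bi-LSTM: it uses plain recurrent updates with $g = \tanh$ (an assumption, since $g$ is left generic above) instead of gated LSTM cells, and all sizes, weights, and function names are hypothetical. The backward pass reads the tokens from last to first, so the "previous" state in (10) is the state computed for the following token in the original order.

```python
import numpy as np

def run_direction(xs, U, W, b, reverse=False):
    """Apply h_t = tanh(U x_t + W h_prev + b) along the sequence in one direction."""
    seq_len = len(xs)
    order = range(seq_len - 1, -1, -1) if reverse else range(seq_len)
    hidden = np.zeros(W.shape[0])
    states = [None] * seq_len
    for t in order:
        hidden = np.tanh(U @ xs[t] + W @ hidden + b)
        states[t] = hidden  # store at the token's original position
    return np.stack(states)

def bidirectional_forward(xs, p):
    h_fwd = run_direction(xs, p["U_f"], p["W_f"], p["b_f"])                # (9)
    h_bwd = run_direction(xs, p["U_b"], p["W_b"], p["b_b"], reverse=True)  # (10)
    ys = np.tanh(h_fwd @ p["V_f"].T + h_bwd @ p["V_b"].T + p["b_y"])       # (11)
    return h_fwd, h_bwd, ys

# Hypothetical sizes: 5 tokens, 6-dimensional embeddings, 4 hidden units, 3 outputs.
rng = np.random.default_rng(0)
seq_len, input_dim, hidden_dim, out_dim = 5, 6, 4, 3
params = {
    "U_f": rng.normal(size=(hidden_dim, input_dim)) * 0.5,
    "W_f": rng.normal(size=(hidden_dim, hidden_dim)) * 0.5,
    "b_f": np.zeros(hidden_dim),
    "U_b": rng.normal(size=(hidden_dim, input_dim)) * 0.5,
    "W_b": rng.normal(size=(hidden_dim, hidden_dim)) * 0.5,
    "b_b": np.zeros(hidden_dim),
    "V_f": rng.normal(size=(out_dim, hidden_dim)) * 0.5,
    "V_b": rng.normal(size=(out_dim, hidden_dim)) * 0.5,
    "b_y": np.zeros(out_dim),
}
xs = rng.normal(size=(seq_len, input_dim))
h_fwd, h_bwd, ys = bidirectional_forward(xs, params)
print(ys.shape)  # (5, 3): one output vector per token
```

Because each output combines both directions, the representation of the first token already reflects the last token of the tweet, which a unidirectional model cannot provide.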
In the study presented by Schuster and Paliwal [122], a Bi-LSTM model is utilized, which employs two independent recurrent networks to extract contextual relationships in both the forward and backward directions and addresses the limitations of the LSTM model in text sequence features. Feizollah et al. [37] proposed a sentiment classification method using CNNs, RNNs, and LSTM for tweets related to Halal cosmetics and Halal tourism, while Wang et al. [38] developed a sentiment prediction method using CNNs and Bi-LSTM to model multidimension and multilevel social media text to improve performance and textual semantic context. Blanco and Lourenço [123] have proposed an approach based on CNN and Bi-LSTM models for a better understanding of both optimistic and pessimistic sentiments related to COVID-19 discussions on Twitter. The authors have utilized a pretrained transformer embedding for extracting significant semantic features from the data.
5) GRU: It is a frequently employed variation of the RNN model, which was introduced to overcome the vanishing gradient problem [124]. This challenge is similar to the one addressed by the LSTM model. However, in various tasks, the GRU model has been found to outperform the LSTM model, with the exception of language modeling [110]. In contrast to the three gates in LSTM (input, forget, and output), the GRU architecture (as depicted in Fig. 14) is simpler, comprising two gates: update ($z_t$) and reset ($r_t$), making its calculation less complex while remaining effective in capturing long-term relationships between sequence elements [125].
The reset gate is responsible for reducing the significance of the past hidden state ($h_{t-1}$) if it is deemed unnecessary for computing the new state, while the update gate determines the proportion of the previous state ($h_{t-1}$) that should be incorporated into the next state ($h_t$). The output state ($h_t$) is determined by a combination of the candidate output state ($\tilde{h}_t$), the input vector ($x_t$), and the previous output state ($h_{t-1}$). The gates are computed using the sigmoid function ($\sigma$), and vector multiplication is accomplished through elementwise multiplication ($\odot$). During training, the parameters for the gates ($W_r$, $W_z$, and $W_h$) and biases ($b_r$, $b_z$, and $b_h$) are learned. The calculation can be expressed using the following equations:
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \tag{12}$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \tag{13}$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \tag{14}$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t. \tag{15}$$
Jabreel and Moreno [62] introduced a GRU-based model for the multiemotion classification of Twitter data. This model was designed to be entirely data-driven and did not require external resources, such as emotion lexicons or POS taggers. However, the model faced difficulties when classifying emotions in extended and complicated text sequences. To improve the model's ability to capture intricate linguistic features and context, researchers have explored the use of an attention-based mechanism in SA for Twitter.
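The GRU update in (12)–(15) can be sketched as a single step function. The sketch below is a direct NumPy transcription under assumed settings: the dimensions, the random parameters, and the helper names (`init_gru_params`, `gru_step`) are hypothetical, and the recurrent matrices are named `U_z`, `U_r`, and `U_h` for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_gru_params(input_dim, hidden_dim, rng):
    """Random W (input), U (recurrent), and b (bias) for z, r, and the candidate h."""
    params = {}
    for name in "zrh":
        params["W_" + name] = rng.normal(size=(hidden_dim, input_dim)) * 0.1
        params["U_" + name] = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
        params["b_" + name] = np.zeros(hidden_dim)
    return params

def gru_step(x_t, h_prev, p):
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # (12) update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # (13) reset gate
    # (14) candidate state: the reset gate scales how much of h_prev is used.
    h_cand = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])
    # (15) interpolate between the previous state and the candidate.
    return z_t * h_prev + (1.0 - z_t) * h_cand

# Hypothetical sizes: 6 tokens, 8-dimensional embeddings, 5 hidden units.
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 8, 5, 6
params = init_gru_params(input_dim, hidden_dim, rng)
tokens = rng.normal(size=(seq_len, input_dim))

h = np.zeros(hidden_dim)
for x_t in tokens:
    h = gru_step(x_t, h, params)
print(h)  # final hidden state after reading all tokens
```

With the convention of (15), an update gate close to 1 keeps the previous state almost unchanged, which is how the model carries information across long spans of a sequence.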
Fig. 15. BERT architecture [131].
6) Transformer-Based: Models such as BERT [126], [127], the Robustly Optimized BERT pretraining approach (RoBERTa) [128], XLNet [129], and the generative pretrained transformer (GPT) [130] have been fine-tuned for SA on Twitter data. These models use attention mechanisms to weigh the importance of different words in a text and can identify key patterns and relationships between words, making them particularly effective for analyzing short and noisy text, such as Twitter posts.
The BERT model, introduced by Google AI in 2018 [111], is a bidirectional language model consisting of several encoders, an attention head, and a large feedforward neural network. Each layer comprises a self-attention mechanism to process the input, which is then passed to the next layer via the feedforward network. Initially, the input to the model is a sequence of tokenized words, and the output is a vector representation of the sequence. This representation, obtained from the first token of the input sequence, is used for sentiment classification. The output combined with a softmax layer and a feedforward neural network is then used to determine the distribution of the target class. Fig. 15 illustrates the BERT architecture for Twitter SA. Bedi and Toshniwal [74] proposed BERT-based classification models for complaints and sentiment to improve classification accuracy on energy-related tweets. Chandra and Saini [132] developed a framework to model the U.S. general elections using two models, LSTM and BERT, to investigate if sentiment classification could predict election outcomes. Eke et al. [73] proposed a technique for sarcasm identification on IAC-v2 and Twitter data, using context-based features and employing three models: Bi-LSTM, BERT, and traditional ML.
Table V summarizes the literature on Twitter SA.
C. Hybrid Approaches
To combine the strengths of individual models, several researchers adopted hybrid models that integrate two or more approaches, for instance lexicon and ML or lexicon and DL. The combination of these methods can help to overcome the limitations of each approach [144]. The advantage of combining learning-based approaches and lexicon is that it eliminates the need for manual labeling of training data and allows the measurement and detection of polarity at the conceptual level. Ngoge [145] developed a hybrid approach that combines ML techniques with lexicon methods to classify sentiment for the identification of terrorist activities. This approach uses SVM, NB classifier, and maximum entropy methods in combination with lexicon methods to predict patterns in tweets related to Kenya terrorist attacks. Gupta and Joshi [146] proposed a hybrid model that extracts feature vectors from SentiWordNet to build an SVM classifier for Twitter SA. Du et al. [147] applied hierarchical ML to extract sentiment from opinions about HPV vaccines on Twitter and concluded the method to be highly efficient. Fadel and Cemil [27] presented a hybrid model to classify reviews on terrorist attacks posted on Twitter. The model utilizes a lexicon approach to generate a labeled training dataset and an ML approach to finally build the model.
Although the combination of lexicon and ML methods has shown promising outcomes, there are still limitations that must be addressed to improve SA efficiency. One such drawback is the reliance on the quality of the lexicon, which may not be adequate to handle complex semantic contexts, such as sarcasm, or filter out irrelevant words that add noise to reviews.
In order to overcome the dependence on lexicon quality, various studies are based on hybrid DL models that are capable of dealing with complex word patterns, thereby improving the performance of SA tasks. Numerous hybrid DL models have been suggested in the literature to improve the performance of DL models used for Twitter SA [61], [142]. Singh et al. [142] developed a hybrid DL model integrating LSTM and RNN models with attention layers to predict the sentiment of Twitter data related to COVID-19. Ahmad et al. [61] presented a joint approach of LSTM and CNN models to classify extremist-related tweets.
Salur and Aydin [141] proposed the amalgamation of various embeddings with multiple DL models, including LSTM, CNN, BiLSTM, and GRU, to extract features from word embeddings and then merge them for sentiment classification. Tam et al. [120] suggested a ConvBiLSTM model, which integrates Bi-LSTM and CNN to classify sentiment using Word2Vec and GloVe to obtain tweet embeddings. Shehu et al. [60] applied three data augmentation methods to increase the training size of stemmed Turkish Twitter data and subsequently used
RNN, hierarchical attention network (HAN), and CNN for SA.
Jalil et al. [65] applied a hybrid model to analyze tweets collected on COVID-19 using various classifiers and feature sets. Jain et al. [66] suggested a hybrid CNN-LSTM model that uses word embedding to convert texts into vectors to classify sentiments of the text. Wu et al. [136] proposed a hybrid approach to summarize opinions on Chinese microblogging systems using CNN and the Ortony, Clore, and Collins (OCC) model, which is a rule-based expert mechanism.
While the DL- and ML-hybrid models leverage the strengths of both DL and ML algorithms to achieve better SA performance, they still have limitations in terms of capturing nonlinear data complexity. Recently, new advancements have been made by exploring the field of knowledge graphs (KGs), graph neural networks (GNNs), capsule networks (CNs), and so on, which we have detailed in Section III-D.
D. Other Methods
The "other methods" category comprises KG-, GNN-, and CN-based approaches that utilize the Twitter graph's properties and characteristics. While the GNNs are a subset of DL, their distinct characteristics in handling graph-structured data necessitate their separate classification. GNNs leverage the graph-based learning paradigm, which fundamentally differs from the standard feedforward learning approach used in most traditional DL models. In GNNs, each node in the graph is associated with a feature vector, and learning involves updating node representations by aggregating information from their neighboring nodes iteratively, where nodes represent users, hashtags, or words, and edges represent the complex, nonlinear relationships among the nodes [148], [149]. Unlike traditional DL models, such as CNNs for images or RNNs for sequences, which process fixed-sized inputs, GNNs operate directly on irregular graph structures. This recursive information propagation mechanism enables GNNs to capture complex patterns and dependencies within the data, especially in scenarios where traditional DL models struggle due to their fixed-sized input representations. Furthermore, unlike other approaches, these methods do not require large amounts of manually annotated data as they automatically collect annotated data using links between users and tweets, such as replies, followers, and previous tweets. However, they are domain-specific since the relationships and sentiment lexicon they use are tailored to the domain. These methods assume that sentiment and rating are interdependent, and they address the issue of existing approaches where positive sentiment can be expressed using words with negative connotations.
Li et al. [30] proposed a graph-based technique, DWWP, which includes domain-specific word detection (DW) and word propagation (WP) methods. DW handles new words invented by users and converts the sentiment of words using assembled mutual information (AMI), while WP includes semantic, statistical similarity information and manually calibrated sentiment scores that enhance the sentiment lexicon quality. Hussain and Cambria [31] analyzed knowledge-based reasoning using a vector space and SVM model, which utilizes lexical and graph representations for SA. Aflakparast et al. [150] proposed a Bayesian graphical model to examine Twitter data. Demotte et al. [56] presented a CN-based model that utilizes GloVe embeddings and dynamic/static routing to analyze social media content. Lovera et al. [7] developed a hybrid DL model with a KG to analyze sentiment in short text, such as Twitter posts. Aflakparast et al. [150] concluded that the results obtained from graph-based methods are promising, but there is still some ambiguity regarding the relationship between identified clusters and actual ratings. Moreover, these methods can be computationally demanding and time-consuming, and may not necessarily lead to improved accuracy.
Table VI presents a summary of the advantages (pros) and disadvantages (cons) of Twitter SA methods based on different criteria.
IV. PUBLISHED DATA SOURCES AND TOOLS
This section details different versions of the Twitter dataset exploited in the existing literature and additional information about other similar datasets adopted for SA. Furthermore, it details the performance metrics used for the evaluation of the proposed approaches [51]. Moreover, the section highlights the diverse tools and libraries leveraged for SA tasks.
A. Dataset Description
In the presented work, we have broadly divided the various popular benchmark datasets used for SA into two categories: Twitter and other. Table VII summarizes the dataset details and provides information about the size, polarity, source, and publications that have utilized them. Twitter has emerged as a prominent platform for SA due to its large user base and the availability of real-time data. Researchers have used various versions of Twitter datasets for SA, ranging from general datasets to domain-specific datasets. One commonly used dataset is the Sentiment140 dataset [151], which contains 1.6 million tweets labeled as positive or negative. It has been widely used for SA research and benchmarking, and has been utilized to evaluate the performance of various DL models. Another common dataset is SemEval, which contains tweets related to specific events or topics. The dataset provides labels for three categories: positive, negative, and neutral. SemEval [13] datasets have been used for SA research and competition, providing a more challenging task due to the inclusion of neutral tweets. Several other domain-specific datasets include data related to politics, finance, and healthcare. These datasets provide a more targeted analysis of public sentiments within specific domains and can be useful for real-world applications.
TABLE VI. PROS AND CONS OF VARIOUS TWITTER SA MODELS
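The iterative neighborhood aggregation that GNNs perform, in which each node's feature vector is updated from the vectors of its neighbors, can be illustrated with a minimal sketch. This is not the method of any cited work: it uses plain, untrained mean aggregation (real GNNs apply learned weights and nonlinearities), and the function names `aggregate_step` and `propagate`, the toy graph, and the one-dimensional "sentiment" features are all assumptions made for illustration.

```python
def aggregate_step(features, neighbors):
    """One round of mean aggregation over each node and its neighbors.

    features:  dict mapping node name -> feature vector (list of floats)
    neighbors: dict mapping node name -> list of adjacent node names
    """
    updated = {}
    for node, own in features.items():
        # Messages are the neighbors' current vectors plus the node's own vector.
        messages = [features[m] for m in neighbors.get(node, [])] + [own]
        updated[node] = [
            sum(vec[k] for vec in messages) / len(messages)
            for k in range(len(own))
        ]
    return updated


def propagate(features, neighbors, rounds=2):
    """Repeat the aggregation so information travels `rounds` hops."""
    for _ in range(rounds):
        features = aggregate_step(features, neighbors)
    return features


# Hypothetical toy graph: a user linked to a hashtag, which is linked to a word.
toy_features = {"user": [1.0], "hashtag": [0.0], "word": [-1.0]}
toy_neighbors = {
    "user": ["hashtag"],
    "hashtag": ["user", "word"],
    "word": ["hashtag"],
}

if __name__ == "__main__":
    print(propagate(toy_features, toy_neighbors, rounds=2))
```

After two rounds, the user node has received information from the word node even though the two are not directly connected, which is the property that lets graph-based models exploit links such as replies and followers.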
It is standard practice for researchers to use Twitter datasets in SA tasks. However, these data have
limitations, such as short text (tweets have a character limit), noise, and the presence of sarcasm,
irony, and slang, which can affect the accuracy of SA models. In addition, prelabeled datasets may not
always accurately represent the sentiment of the tweets, as the interpretation of sentiments can be
very subjective.
To address these challenges, researchers have explored techniques for preprocessing and developing
more robust SA models that can handle noisy and ambiguous data. In addition, researchers have explored
the use of active and transfer learning techniques to improve the efficiency and accuracy of SA models
with limited labeled data.

B. Evaluation Metrics
In addition to the gold standard evaluation metrics, such as precision [32], [62], [74], [142], recall
[68], [72], [73], F-score [65], [69], [137], [143], area under the curve (AUC) [57], and receiver
operating characteristic (ROC) curve [60], other performance metrics have been utilized in the
literature to assess the performance of SA models. This section outlines some of the commonly used
evaluation metrics, along with their computation formulas. Table VIII presents an overview of various
performance metrics adopted in the literature to evaluate SA models.
1) Cohen’s kappa (CK) is a measure of interannotator agreement that accounts for chance agreement
[57], [152], [153]. It is defined as follows:
$$\mathrm{CK} = \frac{P_o - P_e}{1 - P_e} \tag{16}$$
where $P_o$ is the observed agreement and $P_e$ is the expected agreement. $P_o$ is calculated as the
proportion of times the annotators agree, while $P_e$ is calculated by multiplying, for each label,
the marginal proportions with which the annotators assign that label and summing these products over
all labels.
2) Mean absolute error (MAE) is the average absolute difference between predicted and actual
sentiment scores [34], [154]. It is defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \times \sum_{i=1}^{n} |y_i - \tilde{y}_i| \tag{17}$$
where $n$ is the number of instances, $y_i$ is the actual sentiment score, and $\tilde{y}_i$ is the
predicted sentiment score for instance $i$.
3) Root mean square error (RMSE) is the square root of the average squared difference between
predicted and actual sentiment scores [154], [155]. It is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \times \sum_{i=1}^{n} (y_i - \tilde{y}_i)^2} \tag{18}$$
where $n$ is the number of instances, $y_i$ is the actual sentiment score, and $\tilde{y}_i$ is the
predicted sentiment score for instance $i$.
4) Spearman’s rank correlation coefficient ($\rho$) measures the strength of the association between
predicted and actual sentiment scores, taking into account the rank order of the scores rather than
their absolute values [156]. It is defined as follows:
$$\rho = 1 - \frac{6 \times \sum d_i^2}{n \times (n^2 - 1)} \tag{19}$$
where $n$ is the number of instances and $d_i$ is the difference between the rank of the predicted
sentiment score and the rank of the actual sentiment score for the $i$th instance.
5) Kendall’s tau ($\tau$) is a measure of the strength of the association between predicted and actual
sentiment scores, taking into account the number of pairwise disagreements [157]. It is defined as
follows:
$$\tau = \frac{n_c - n_d}{n \times (n - 1)/2} \tag{20}$$
where $n$ is the number of instances, $n_c$ is the number of concordant pairs, and $n_d$ is the number
of discordant pairs.
6) Hamming loss (HL) measures the fraction of labels that are incorrectly predicted for a given set of
instances [57], [158]. It is defined as follows:
$$\mathrm{HL} = \frac{1}{n} \times \sum_{i=1}^{n} L(h(x_i), y_i) \tag{21}$$
where $n$ is the number of instances, $h(x_i)$ is the set of predicted labels for instance $i$, $y_i$
is the set of true labels for instance $i$, and $L$ is the loss function, which is typically defined
as the number of labels that differ between the predicted and true labels. In SA, each instance (e.g.,
a tweet or a review) can be associated with multiple sentiment labels, such as positive, negative,
neutral, or a combination of these. HL is used to evaluate the accuracy of a multilabel classifier in
predicting the correct sentiment labels for each instance.
7) Jaccard index (JI), also known as the Jaccard similarity coefficient or Jaccard similarity index,
is an evaluation metric used in SA and other natural language processing tasks to measure the
similarity between two sets of labels [62]. It is defined as follows:
$$\mathrm{JI} = \frac{|A \cap B|}{|A \cup B|} \tag{22}$$
where $A$ is the set of labels assigned by the model, $B$ is the set of true labels, and
$|A \cap B|$ and $|A \cup B|$ represent the size of the intersection and the union of $A$ and $B$,
respectively. In SA, the JI is used to evaluate the overlap between the predicted sentiment labels and
the true sentiment labels for each instance (e.g., a tweet or a review). Higher JI values indicate
better performance.
8) Matthews correlation coefficient (MCC) is an evaluation metric used in SA and other classification
tasks to measure the quality of the predictions made by a model. It takes into account true positives
(TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) [57], [159]. It is
defined as follows:
$$\mathrm{MCC} = \frac{(\mathrm{TP} \times \mathrm{TN}) - (\mathrm{FP} \times \mathrm{FN})}{\sqrt{P \times Q \times R \times S}} \tag{23}$$
where $P$, $Q$, $R$, and $S$ denote (TP + FP), (TP + FN), (TN + FP), and (TN + FN), respectively. MCC
ranges from −1 (total disagreement between the predictions and the true labels) to 1 (perfect
agreement between the predictions and the true labels), where higher values indicate better
performance. In SA, MCC is used to evaluate the overall performance of a binary classifier in
predicting the correct sentiment label for each instance (e.g., a tweet or a review).
9) Logarithmic loss (log loss) is used in binary classification tasks to measure the performance of a
probabilistic classifier in predicting the correct label for each instance. It calculates the
difference between the predicted probabilities and the true binary labels, and penalizes
high-confidence wrong predictions more than low-confidence ones [57]. It is defined as follows:
$$\mathrm{Log\ Loss} = -\frac{1}{n} \times \sum_{i=1}^{n} \left( y_i \times \log(p_i) + (1 - y_i) \times \log(1 - p_i) \right) \tag{24}$$
where $n$ is the total number of instances, $y_i$ is the true binary label (0 or 1) of instance $i$,
and $p_i$ is the predicted probability of the positive class (i.e., the sentiment label “positive”)
for that instance. The leading minus sign makes the loss nonnegative, so lower values indicate better
performance.

C. Resources and Tools for SA
Over the past few years, SA on Twitter using DL has gained substantial attention. This has led to the
development of various software tools and libraries that can be used to implement and evaluate these
models. In this section, the software tools and libraries commonly used in the state of the art are
compiled, as presented in Table IX. The majority of the implementations use Python 3.x, along with
popular DL libraries such as PyTorch, Keras, and TensorFlow. In addition, some implementations also
use the MATLAB platform.
1) Natural Language Toolkit (NLTK) is a Python library that offers a comprehensive suite of tools and
resources for natural language processing, including functions for tokenization, stemming, SA, and
text classification.
2) Scikit-learn is a Python-based ML library that offers a wide range of supervised and unsupervised
learning algorithms. It provides tools for text classification, SA, and feature extraction.
3) TensorFlow is an open-source ML library created by Google that provides a wide range of tools for
building and training DL models, such as CNNs, RNNs, and transformers.
4) PyTorch is an open-source ML library developed by Facebook. It offers tools for building and
training DL models, including CNNs, RNNs, and transformers. It has gained popularity due to its
user-friendliness and flexibility.
5) Keras is a high-level DL library that provides a user-friendly API for building and training DL
models. It is built on top of TensorFlow and simplifies the process of building complex models.
6) Gensim is a Python library used for topic modeling and natural language processing. It provides
tools for text preprocessing, topic modeling, and similarity calculation.
7) Word2vec is a well-known algorithm used for word embeddings, generating dense vector
representations of words that can be utilized as input to DL models for SA.
8) Global Vectors for Word Representation (GloVe) is another widely used algorithm for word
embeddings. It relies on co-occurrence statistics and produces dense vector representations of words
that capture semantic relationships.
The aforementioned software tools and libraries provide a range of functionalities and resources for
DL-based Twitter SA. They are typically employed for preprocessing, feature extraction, model
building, and evaluation. Nevertheless, the choice of tools or libraries should be carefully
considered based on the research question and the problem at hand.

V. PRACTICAL APPLICATIONS AND RELATED CASE STUDIES
SA has garnered significant attention from the research community due to its diverse use cases in
domains such as social media, business, politics, healthcare, and tourism, as illustrated in Fig. 16.
This section highlights the various practical business applications of SA through related real-world
case studies.
1) Brand Reputation Management: SA can be used to monitor the online reputation of brands. A company
can leverage it to monitor social media and determine how customers are reacting to its products or
services. This information can help the company improve its marketing strategies, create targeted
campaigns, and ultimately increase sales. For instance, a study conducted by Ahmed and
Rodríguez-Díaz [160] used SA to monitor the online reputation of various airlines. The study analyzed
tweets containing airline names and categorized the sentiments as positive, negative, or neutral. The
results provided insights to airlines for reputation management and service improvement. Another case
study is the analysis of customer reviews for online food delivery apps to identify areas of
improvement in their services [161].
2) Structuring Marketing Strategies: SA can be useful in formulating marketing strategies and
marketing forecasting. A study by Lehrer et al. [162] suggests a DL-based technique to evaluate the
polarity of sentiments on Twitter at an hourly rate. The proposed method considers mixed data
sampling, resulting in a lower reduction of past data, which makes it highly appropriate for this
novel source of data.
3) Political Opinion Mining: Recently, SA has been increasingly used in politics to monitor public
opinions and identify the sentiment behind political campaigns. It is useful for understanding and
modeling voter behavior during political campaigns or activism and can even indicate the outcome of an
election. In a case study by Chandra and Saini [132], conducted on the U.S.
TABLE VII. SUMMARY OF DATASETS
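The evaluation metrics of Section IV-B, Eqs. (16)–(24), can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the Spearman sketch assumes there are no tied scores, Hamming loss is normalized by the number of possible labels so that it is a fraction, the Jaccard index assumes a nonempty union, and log loss is written with the conventional leading minus sign so that it is nonnegative.

```python
import math
from collections import Counter


def cohen_kappa(a, b):
    """Eq. (16): agreement between two label sequences, corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement: sum over labels of the product of marginal proportions.
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def mae(y, y_pred):
    """Eq. (17): mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y, y_pred)) / len(y)


def rmse(y, y_pred):
    """Eq. (18): root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y, y_pred)) / len(y))


def _ranks(values):
    """Rank 1 is the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(y, y_pred):
    """Eq. (19): rank correlation from squared rank differences (no ties)."""
    n = len(y)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(y), _ranks(y_pred)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(y, y_pred):
    """Eq. (20): (concordant - discordant) pairs over all pairs."""
    n = len(y)
    n_c = n_d = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (y[i] - y[j]) * (y_pred[i] - y_pred[j])
            if sign > 0:
                n_c += 1
            elif sign < 0:
                n_d += 1
    return (n_c - n_d) / (n * (n - 1) / 2)


def hamming_loss(true_sets, pred_sets, labels):
    """Eq. (21): L counts labels that differ; normalized by len(labels)."""
    total = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return total / (len(true_sets) * len(labels))


def jaccard_index(pred, true):
    """Eq. (22): |A ∩ B| / |A ∪ B| for one instance's label sets."""
    return len(pred & true) / len(pred | true)


def mcc(tp, tn, fp, fn):
    """Eq. (23): Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


def log_loss(y, p):
    """Eq. (24): binary log loss; p holds probabilities strictly in (0, 1)."""
    total = sum(t * math.log(q) + (1 - t) * math.log(1 - q) for t, q in zip(y, p))
    return -total / len(y)


if __name__ == "__main__":
    # Two hypothetical annotators labeling four tweets.
    print(cohen_kappa(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))
    print(mcc(tp=5, tn=5, fp=0, fn=0))
```

In practice, library implementations of these metrics handle edge cases that this sketch leaves out, such as tied ranks, empty label sets, zero denominators, and probabilities equal to 0 or 1.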
TABLE VIII. SUMMARY OF COMMONLY USED PERFORMANCE METRICS IN SA
TABLE IX. SUMMARY OF TOOLS AND LIBRARIES

presidential election, SA was used to analyze Twitter data and identify the sentiment of the people
toward the candidates. The proposed framework for modeling U.S. general elections is based on LSTM and
BERT models to predict voter sentiment. Another study conducted by Ali et al. [32] proposed a DNN
model for SA to predict 2018 general election results in Pakistan using Twitter opinions.
4) Customer Feedback Analysis: In the realm of e-commerce and business intelligence, organizations can
analyze customer feedback and reviews to understand the strengths and weaknesses of a business. SA is
used to gain insights into the opinions of users about products or events and to gain a deep
understanding of customer interests and industry trends. Jain and Dandannavar [105] proposed a fast,
scalable, and flexible SA model on the Twitter dataset that uses Apache Spark and some ML models.
Yasir et al. [34] deployed a DNN model to forecast the interest rate of five countries.
5) Finance Management: SA can help investors make better decisions by providing insights into market
sentiment. For example, investors can use SA to analyze news articles and social media to understand
the sentiment behind market movements. One such case study, by Swathi et al. [57], analyzed the
sentiment of tweets related to the stock market to predict changes in stock prices.
6) Public Healthcare: SA can also be used to monitor public health. For instance, a study conducted by
Reshi et al. [59] analyzed tweets related to COVID-19 and identified the areas where the outbreak was
most severe. The results were used to improve public health policies.
7) Medical Services: Healthcare providers can use SA to analyze patient feedback and determine areas
for improvement in their services. Opinion mining in health-related contexts is explored in [163],
where the researcher offers new methods and a medical lexicon to assist patients and experts in
explaining diseases and symptoms. The study used text processing and traditional ML methods as well.
8) Disaster Assessment, Response, and Management: SA can be used to analyze social media data during
disasters to identify the areas affected, assess the public sentiment toward the disaster, and provide
real-time updates to the public. A deep sentiment and activity analyzer combined with a deep human
count tracker is proposed by Sadiq et al. [139] to track the number of people present in
disaster-related visual content and analyze their sentiments.
9) Crime Prediction: SA can be used for the identification and classification of potential criminal
activities or terrorist groups. Ahmad et al. [61] proposed a tweet classification system using LSTM
and CNN models to categorize tweets into extremist or nonextremist groups. Alotaibi et al. [68]
developed an automatic cyberbullying detection approach that identifies aggressive behavior on Twitter
by utilizing a bidirectional GRU, a CNN model, and a transformer block to classify tweet sentiment as
aggressive or not aggressive.
10) Tourism Planning: Tourism is an important industry that is greatly influenced by public opinion.
SA can be used to analyze user reviews and social media data to understand the satisfaction level of
tourists and identify areas for improvement. Combining geolocation information with SA can provide an
effective plan for tourist destinations. Paolanti et al. [164] proposed a DNN approach for finding the
sentiment of a widely known tourism venue, Cilento in Southern Italy.
11) Recommendation Systems: SA can also benefit recommendation systems to offer personalized user
recommendations. A hybrid CNN-LSTM model is suggested
by Jain et al. [66] to classify the sentiment of customer reviews to further recommend
user-personalized products. Preethi et al. [165] developed an RNN to analyze sentiments in reviews and
improve movie and restaurant recommendations. In addition, SA can also aid in behavioral analysis in
commodity markets [166].

VI. RESEARCH GAPS AND FUTURE PERSPECTIVES
Though DL models have shown significant evolution and excellent outcomes in the area of SA, there
exist several research gaps and open challenges that need further exploration. This section discusses
the current research gaps and potential future directions for SA research.
1) Decision-Making Tool: DL models find their usage in various industries, including marketing,
service, government, and academia, to analyze sentiment in decision-making problems. These models can
be modified and adopted to achieve high accuracy, taking into account the complexities of textual
analysis for practical applications. Numerous studies indicate that noisy features may negatively
impact classification outcomes; hence, DL methods can be designed to optimize features in an iterative
process [7], [38], [120]. In addition, the models can be improved to perform opinion mining, SA, and
topic detection simultaneously.
2) Processing Short Sequences: Dealing with short sequences of social media text that have varying
content and background information is a challenging task. When it comes to processing such short
sequences, dynamic routing is not as effective as static routing algorithms due to the variability of
background details. However, this issue can be addressed by using attention-based CNs [56] along with
dynamic routing algorithms to extract relations for text content processing and SA. Moreover,
integrating contextual embedding with capsule-based models can lead to better performance as this
technique has proven effective in most DL approaches.
3) Handling Large Datasets: One of the research gaps in SA on Twitter is the need to handle large
datasets. DL models have shown promising results in SA on Twitter, but they require large datasets for
effective optimization of the model parameters. The current state-of-the-art methods for SA on
Twitter, such as those presented in [68] and [69], can be improved by applying these models to larger
datasets. Therefore, future research could focus on developing new methods to handle large datasets
that can improve the performance of SA models on Twitter.
4) Handling Data Sparseness: Handling data sparseness refers to the challenge of building effective SA
models when the dataset is limited or incomplete. DL models have shown promising results in SA tasks,
but they require large datasets to perform well and optimize their parameters [116]. Developing
methods that remain effective with limited data may make it possible to address the issue of data
sparsity and improve the accuracy and generalization ability of SA models on Twitter.
5) Limited Attention to Domain-Specific SA: The focus of SA research has been predominantly on general
SA models, with limited attention paid to domain-specific SA. There is a pressing need for models that
can effectively analyze sentiment in specialized fields, such as medical [163], financial [141], or
legal data [68]. Future research should explore methods for developing domain-specific SA models that
can accurately capture the nuances of sentiment within these specific contexts.
6) Robustness and Reliability of Models: The robustness and reliability of SA models are an important
area of research that needs to be addressed. The existing models are not robust enough to handle
sarcasm, irony, and figurative language, which are prevalent on social media platforms, especially
Twitter. The models often misinterpret these nuances and produce inaccurate results. Therefore, future
research should focus on developing more robust models that can handle these language intricacies and
improve the reliability of SA results [64], [74]. In addition, research can also explore the impact of
linguistic and cultural differences on the accuracy of SA models.
7) Interpretability of Models: Interpretability refers to the ability to understand the reasoning
behind a model’s predictions. Most of the DL-based SA models are considered black-box models because
they operate on complex computations and are difficult to interpret [37], [139], [141]. This makes it
challenging for users to trust the model’s predictions and understand how they were generated. In
recent years, there has been a growing interest in developing more transparent models, also known as
explainable AI, which can provide insight into the reasoning behind their predictions. This approach
could help increase trust in the model and improve its usefulness by allowing users to understand and
potentially correct any biases or errors. Therefore, there is a need for research to develop more
transparent SA models that can provide explanations for their predictions.
8) Performance Measures: Current evaluation metrics used in SA research focus mainly on accuracy [32],
[38]. There is a need for more comprehensive evaluation metrics that take into account the nuances of
SA, other than gold standard performance measures, such as precision, recall, and F1-score [51].
9) Incorporating User Feedback: Developing SA models that can learn from user feedback by identifying
user patterns and can adapt to dynamic user preferences may improve the accuracy of SA models. In
addition, researchers can explore the use of interactive SA tools that allow users to provide feedback
in real time, enabling the model to adapt to changing sentiments and preferences [60], [69].
10) Integrating Multiple Modalities: SA has traditionally been limited to analyzing only textual data,
but the incorporation of multiple modalities such as audio, video, and images can provide richer
information for SA [137]. Future research can focus on developing more advanced multimodal SA models
that can integrate
Fig. 16. Twitter-based SA applications.
multiple modalities that could better capture the complexity and variability of human emotions and
potentially provide more accurate and nuanced results in real-world settings.

VII. CONCLUSION
To sum up, the potential of using DL for SA on Twitter has been widely recognized, and it has become
an important research field due to the vast amount of user-generated content. This work provides a
comprehensive overview of the latest advances in DL techniques for SA on Twitter. It outlines various
preprocessing steps and word embeddings required for this task. It presents a simplified taxonomy that
bifurcates the literature into two categories: conventional (lexicon and ML) and DL approaches, along
with their respective pros and cons. In addition, this article summarizes various practical
applications of SA and identifies research gaps and domain-specific challenges. Furthermore, various
metrics adopted by different studies to evaluate the models’ performance are also reviewed in this
work.
Overall, DL-based methods have shown great promise in SA on Twitter, as they can capture complex
language patterns and handle the noise and sparsity of data. Moreover, techniques such as fine-tuning
and transfer learning have proved effective in adapting pretrained models to Twitter-specific
datasets. However, there are still several challenges that need to be addressed in SA on Twitter. One
of the main challenges is dealing with the noise and complexity of Twitter data, such as short text,
spelling mistakes, abbreviations, slang, and emojis. Another challenge is the imbalance of sentiments
in data, with more instances of neutral or negative sentiments compared to positive sentiments. Also,
the integration of SA with other NLP techniques, such as entity recognition and summarization, could
provide further insights and improve the performance of these models. In essence, this survey
demonstrates that DL methods have advanced SA on Twitter, and there is still much room for further
improvement.

REFERENCES
[1] N. Girdhar and K. K. Bharadwaj, “Signed social networks: A survey,” in Advances in Computing and
Data Sciences. Ghaziabad, India: Springer, 2017, pp. 326–335.
[2] N. Girdhar and K. K. Bharadwaj, “Community detection in signed social networks using
multiobjective genetic algorithm,” J. Assoc. Inf. Sci. Technol., vol. 70, no. 8, pp. 788–804,
Aug. 2019.
[3] N. Girdhar, S. Minz, and K. K. Bharadwaj, “Link prediction in signed social networks based on
fuzzy computational model of trust and distrust,” Soft Comput., vol. 23, no. 22, pp. 12123–12138,
Nov. 2019.
[4] N. Girdhar and K. K. Bharadwaj, “Social status computation for nodes of overlapping communities in
directed signed social networks,” in Integrated Intelligent Computing, Communication and Security.
Singapore: Springer, 2019, pp. 49–57.
[5] N. Girdhar and K. K. Bharadwaj, “Friends recommender system based on status (StatusFRS) for users
of overlapping communities in directed,” in Applications of Artificial Intelligence Techniques in
Engineering: SIGMA, vol. 698. Singapore: Springer, 2018, p. 225.
[6] N. Girdhar and K. K. Bharadwaj, “Mining of influencers in signed social networks: A memetic
approach,” in Intelligent Human Computer Interaction. Allahabad, India: Springer, 2018, pp. 306–316.
[7] F. A. Lovera, Y. C. Cardinale, and M. N. Homsi, “Sentiment analysis in Twitter based on knowledge
graph and deep learning classification,” Electronics, vol. 10, no. 22, p. 2739, Nov. 2021.
[8] E. Cambria, “Affective computing and sentiment analysis,” IEEE Intell. Syst., vol. 31, no. 2,
pp. 102–107, Mar. 2016.
[9] E. Chu and D. Roy, “Audio-visual sentiment analysis for learning emotional arcs in movies,” in
Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2017, pp. 829–834.
[10] J. C. De Albornoz, L. Plaza, P. Gervás, and A. Díaz, “A joint model of feature mining and
sentiment analysis for product review rating,” in Advances in Information Retrieval. Dublin, Ireland:
Springer, 2011, pp. 55–66.
[11] M. A. Mirtalaie, O. K. Hussain, E. Chang, and F. K. Hussain, “Sentiment analysis of specific
product’s features using product tree for application in new product development,” in Advances in
Intelligent Networking and Collaborative Systems. Berlin, Germany: Springer, 2018, pp. 82–95.
[12] D. J. S. Oliveira, P. H. D. S. Bermejo, and P. A. dos Santos, “Can social media reveal the
preferences of voters? A comparison between sentiment analysis and traditional opinion polls,” J. Inf.
Technol. Politics, vol. 14, no. 1, pp. 34–45, Jan. 2017.
[13] S. Rosenthal, N. Farra, and P. Nakov, “SemEval-2017 task 4: Sentiment analysis in Twitter,” 2019,
arXiv:1912.00741.
[14] V. S. Pagolu, K. N. Reddy, G. Panda, and B. Majhi, “Sentiment analysis of Twitter data for
predicting stock market movements,” in Proc. Int. Conf. Signal Process., Commun., Power Embedded Syst.
(SCOPES), Oct. 2016, pp. 1345–1350.
[15] T. Chalothorn and J. Ellman, “Using SentiWordNet and sentiment analysis for detecting radical
content on web forums,” Northumbria Univ. Newcastle, Newcastle upon Tyne, U.K., Tech. Rep., 2012.
[Online]. Available: https://nrl.northumbria.ac.uk/id/eprint/13075/
[16] N. D. Gitari, Z. Zhang, H. Damien, and J. Long, “A lexicon-based approach for hate speech
detection,” Int. J. Multimedia Ubiquitous Eng., vol. 10, no. 4, pp. 215–230, Apr. 2015.
[17] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-based methods for sentiment
analysis,” Comput. Linguistics, vol. 37, no. 2, pp. 267–307, Jun. 2011.
[18] V. A. Kharde and P. Sheetal. Sonawane, “Sentiment analysis of Twitter data: A survey of
techniques,” 2016, arXiv:1601.06971.
[19] M. A. Al-Khalisy and H. B. Jehlol, “Terrorist affiliations identifying through Twitter social
media analysis using data mining and web mapping techniques,” J. Eng. Appl. Sci., vol. 13, no. 17,
pp. 7459–7464, 2018.
[20] E. Ferrara, W.-Q. Wang, O. Varol, A. Flammini, and A. Galstyan, “Predicting online extremism,
content adopters, and interaction reciprocity,” in Social Informatics. Bellevue, WA, USA: Springer,
2016, pp. 22–39.
[21] M. Kumar, R. Bhatia, and D. Rattan, “A survey of web crawlers for information retrieval,” WIREs
Data Mining Knowl. Discovery, vol. 7, no. 6, p. e1218, Nov. 2017.
[22] S. Mansour, “Social media analysis of user’s responses to terrorism using sentiment analysis and
text mining,” Proc. Comput. Sci., vol. 140, pp. 95–103, Jan. 2018.
[23] T. Simon, A. Goldberg, L. Aharonson-Daniel, D. Leykin, and B. Adini, “Twitter in the cross
fire—The use of social media in the westgate mall terror attack in Kenya,” PLoS ONE, vol. 9, no. 8,
Aug. 2014, Art. no. e104136.
[24] Y. Wei, L. Singh, and S. Martin, “Identification of extremism on Twitter,” in Proc. IEEE/ACM Int.
Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 1251–1255.
[25] B. S. Iskandar, “Terrorism detection based on sentiment analysis using machine learning,” J. Eng.
Appl. Sci., vol. 12, no. 3, pp. 691–698, 2017.
[26] T. B. Mirani and S. Sasi, “Sentiment analysis of ISIS related tweets using absolute location,” in
Proc. Int. Conf. Comput. Sci. Comput. Intell. (CSCI), Dec. 2016, pp. 1140–1145.
[27] I. Fadel and C. Öz, “A sentiment analysis model for terrorist attacks reviews on Twitter,”
Sakarya Univ. J. Sci., vol. 24, no. 6, pp. 1294–1302, Dec. 2020.
[28] L. G. E. Smith, L. Wakeford, T. F. Cribbin, J. Barnett, and W. K. Hou, “Detecting psychological
change through mobilizing interactions and changes in extremist linguistic style,” Comput. Hum.
Behav., vol. 108, Jul. 2020, Art. no. 106298.
[29] A. Aleroud, N. Abu-Alsheeh, and E. Al-Shawakfa, “A graph proximity feature augmentation approach
for identifying accounts of terrorists on Twitter,” Comput. Secur., vol. 99, Dec. 2020,
Art. no. 102056.
[32] H. Ali, H. Farman, H. Yar, Z. Khan, S. Habib, and A. Ammar, “Deep learning-based election results
prediction using Twitter activity,” Soft Comput., vol. 26, no. 16, pp. 7535–7543, Aug. 2022.
[33] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[34] M. Yasir et al., “An efficient deep learning based model to predict interest rate using Twitter
sentiment,” Sustainability, vol. 12, no. 4, p. 1660, Feb. 2020.
[35] F. Y. Zhou, L. P. Jin, and J. Dong, “A review of convolutional neural networks,” J. Comput.,
vol. 40, pp. 1229–1251, 2017. [Online]. Available:
https://www.scirp.org/(S(vtj3fa45qm1ean45%20vvffcz55))/reference/referencespapers.aspx?referenceid=3111121
[36] Y. Li and H. Dong, “Text emotion analysis based on CNN and BiLSTM network feature fusion,”
Comput. Appl., vol. 38, no. 11, pp. 29–34, 2018.
[37] A. Feizollah, S. Ainin, N. B. Anuar, N. A. B. Abdullah, and M. Hazim, “Halal products on Twitter:
Data extraction and sentiment analysis using stack of deep learning algorithms,” IEEE Access, vol. 7,
pp. 83354–83362, 2019.
[38] B. Wang, D. Shan, A. Fan, L. Liu, and J. Gao, “A sentiment classification method of web social
media based on multidimensional and multilevel modeling,” IEEE Trans. Ind. Informat., vol. 18, no. 2,
pp. 1240–1249, Feb. 2022.
[39] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, “Sentiment analysis of comment texts based on BiLSTM,”
IEEE Access, vol. 7, pp. 51522–51532, 2019.
[40] A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM)
network,” Phys. D, Nonlinear Phenomena, vol. 404, Mar. 2020, Art. no. 132306.
[41] G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text
classification,” Neurocomputing, vol. 337, pp. 325–338, Apr. 2019.
[42] A. Drif and K. Hadjoudj, “An opinion spread prediction model with Twitter emotion analysis during
Algeria’s Hirak,” Comput. J., vol. 64, no. 1, pp. 358–368, Nov. 2019.
[43] E. Zhu, J. Wu, H. Liu, and K. Li, “A sentiment index of the housing market: Text mining of
narratives on social media,” Tsinghua Univ., China, Apr. 2018. [Online]. Available:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3223566
[44] A. S. Imran, S. M. Daudpota, Z. Kastrati, and R. Batra, “Cross-cultural polarity and emotion
detection using sentiment analysis and deep learning on COVID-19 related tweets,” IEEE Access, vol. 8,
pp. 181074–181090, 2020.
[45] A. Mittal and S. Patidar, “Sentiment analysis on Twitter data: A survey,” in Proc. 7th Int. Conf.
Comput. Commun. Manag., 2019, pp. 91–95.
[46] N. F. F. D. Silva, L. F. S. Coletta, and E. R. Hruschka, “A survey and comparative study of tweet
sentiment analysis via semi-supervised learning,” ACM Comput. Surv., vol. 49, no. 1, pp. 1–26,
Mar. 2017.
[47] N. Azzouza, K. Akli-Astouati, A. Oussalah, and S. A. Bachir, “A real-time Twitter sentiment
analysis using an unsupervised method,” in Proc. 7th Int. Conf. Web Intell., Mining Semantics,
Jun. 2017, pp. 1–10.
[48] A. Ligthart, C. Catal, and B. Tekinerdogan, “Systematic reviews in sentiment analysis: A tertiary
study,” Artif. Intell. Rev., vol. 54, no. 7, pp. 4997–5053, Oct. 2021.
[49] M. Wankhade, A. C. S. Rao, and C. Kulkarni, “A survey on sentiment analysis methods,
applications, and challenges,” Artif. Intell. Rev., vol. 55, no. 7, pp. 5731–5780, Oct. 2022.
[50] R. Das and T. D. Singh, “Multimodal sentiment analysis: A survey of methods, trends, and
challenges,” ACM Comput. Surv., vol. 55, no. 13s, pp. 1–38, Dec. 2023.
[51] A. Yadav and D. K. Vishwakarma, “Sentiment analysis using deep learning architectures: A review,”
Artif. Intell. Rev., vol. 53, no. 6, pp. 4335–4385, Aug. 2020.
[52] S. Sharma and A. Jain, “Role of sentiment analysis in social media security and analytics,” WIREs
Data Mining Knowl. Discovery, vol. 10, no. 5, p. e1366, Sep. 2020.
[53] S. Soni and A. Sharaff, “Sentiment analysis of customer reviews based on hidden Markov model,” in
Proc. Int. Conf. Adv. Res. Comput. Sci. Eng. Technol. (ICARCSET), Mar. 2015, pp. 1–5.
[54] P. N. Jain and A. S. Vaidya, “Analysis of social media based on terrorism-a review,” Vietnam J.
Comput. Sci., vol. 8, no. 1, pp. 1–21, 2021.
[30] W. Li, K. Guo, Y. Shi, L. Zhu, and Y. Zheng, “DWWP: Domain-specific [55] N. Mehra, S. Khandelwal, and P. Patel, “Sentiment
new words detection and word propagation system for sentiment analy- identification using maximum entropy analysis of movie
sis in the tourism domain,” Knowl.-Based Syst., vol. 146, pp. 203–214, reviews,” Stanford Univ., Stanford, CA, USA, 2002. [Online].
Apr. 2018. Available: https://www.semanticscholar.org/paper/Sentiment-
[31] A. Hussain and E. Cambria, “Semi-supervised learning for big social Identification-Using-Maximum-Entropy-of-Mehra-
data analysis,” Neurocomputing, vol. 275, pp. 1662–1673, Jan. 2018. Khandelwal/bd5b49e12cdc1b922a28b122acf528199be4bf8d
[56] P. Demotte, K. Wijegunarathna, D. Meedeniya, and I. Perera, “Enhanced sentiment extraction architecture for social media content analysis using capsule networks,” Multimedia Tools Appl., vol. 82, pp. 8665–8690, Sep. 2021.
[57] T. Swathi, N. Kasiviswanath, and A. A. Rao, “An optimal deep learning-based LSTM for stock price prediction using Twitter sentiment analysis,” Int. J. Speech Technol., vol. 52, no. 12, pp. 13675–13688, Sep. 2022.
[58] P. Mehta, S. Pandya, and K. Kotecha, “Harvesting social media sentiment analysis to enhance stock market prediction using deep learning,” PeerJ Comput. Sci., vol. 7, p. e476, Apr. 2021.
[59] A. A. Reshi et al., “COVID-19 vaccination-related sentiments analysis: A case study using worldwide Twitter dataset,” Healthcare, vol. 10, no. 3, p. 411, 2022.
[60] H. A. Shehu et al., “Deep sentiment analysis: A case study on stemmed Turkish Twitter data,” IEEE Access, vol. 9, pp. 56836–56854, 2021.
[61] S. Ahmad, M. Z. Asghar, F. M. Alotaibi, and I. Awan, “Detection and classification of social media-based extremist affiliations using sentiment analysis techniques,” Hum.-centric Comput. Inf. Sci., vol. 9, no. 1, pp. 1–23, Dec. 2019.
[62] M. Jabreel and A. Moreno, “A deep learning-based approach for multi-label emotion classification in tweets,” Appl. Sci., vol. 9, no. 6, p. 1123, Mar. 2019.
[63] J. G. D. Harb, R. Ebeling, and K. Becker, “A framework to analyze the emotional reactions to mass violent events on Twitter and influential factors,” Inf. Process. Manag., vol. 57, no. 6, Nov. 2020, Art. no. 102372.
[64] M. A. Paredes-Valverde, R. Colomo-Palacios, M. D. P. Salas-Zarate, and R. Valencia-García, “Sentiment analysis in Spanish for improvement of products and services: A deep learning approach,” Sci. Programming, vol. 2017, Oct. 2017, Art. no. 1329281.
[65] Z. Jalil et al., “COVID-19 related sentiment analysis using state-of-the-art machine learning and deep learning techniques,” Frontiers Public Health, vol. 9, p. 2276, Jan. 2022.
[66] P. K. Jain, V. Saravanan, and R. Pamula, “A hybrid CNN-LSTM: A deep learning approach for consumer sentiment analysis using qualitative user-generated contents,” ACM Trans. Asian Low-Resource Lang. Inf. Process., vol. 20, no. 5, pp. 1–15, Sep. 2021.
[67] A. S. M. Alharbi and E. de Doncker, “Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information,” Cognit. Syst. Res., vol. 54, pp. 50–61, May 2019.
[68] M. Alotaibi, B. Alotaibi, and A. Razaque, “A multichannel deep learning framework for cyberbullying detection on social media,” Electronics, vol. 10, no. 21, p. 2664, Oct. 2021.
[69] G. Abdalla and F. Özyurt, “Sentiment analysis of fast food companies with deep learning models,” Comput. J., vol. 64, no. 1, pp. 383–390, Nov. 2019.
[70] M. Umer, I. Ashraf, A. Mehmood, S. Kumari, S. Ullah, and G. S. Choi, “Sentiment analysis of tweets using a unified convolutional neural network-long short-term memory network model,” Comput. Intell., vol. 37, no. 1, pp. 409–434, Feb. 2021.
[71] C. P. D. Cyril, J. R. Beulah, N. Subramani, P. Mohan, A. Harshavardhan, and D. Sivabalaselvamani, “An automated learning model for sentiment analysis and data classification of Twitter data using balanced CA-SVM,” Concurrent Eng., vol. 29, no. 4, pp. 386–395, Dec. 2021.
[72] A. R. Pathak, M. Pandey, and S. Rautaray, “Topic-level sentiment analysis of social media data using deep learning,” Appl. Soft Comput., vol. 108, Sep. 2021, Art. no. 107440.
[73] C. I. Eke, A. A. Norman, and L. Shuib, “Context-based feature technique for sarcasm identification in benchmark datasets using deep learning and BERT model,” IEEE Access, vol. 9, pp. 48501–48518, 2021.
[74] J. Bedi and D. Toshniwal, “CitEnergy: A BERT based model to analyse Citizens’ energy-tweets,” Sustain. Cities Soc., vol. 80, May 2022, Art. no. 103706.
[75] Y. Liu and M. Zhang, Neural Network Methods for Natural Language Processing. Cambridge, MA, USA: MIT Press, 2018.
[76] E. Rudkowsky, M. Haselmayer, M. Wastian, M. Jenny, S. Emrich, and M. Sedlmair, “More than bags of words: Sentiment analysis with word embeddings,” Commun. Methods Measures, vol. 12, nos. 2–3, pp. 140–157, Apr. 2018.
[77] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013, arXiv:1301.3781.
[78] Y. Goldberg and O. Levy, “word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method,” 2014, arXiv:1402.3722.
[79] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proc. Adv. Neural Inf. Process. Syst., vol. 26, 2013, pp. 1–9.
[80] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1532–1543.
[81] T. Shi and Z. Liu, “Linking GloVe with word2vec,” 2014, arXiv:1411.5595.
[82] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Trans. Assoc. Comput. Linguistics, vol. 5, pp. 135–146, Dec. 2017.
[83] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, “FastText.Zip: Compressing text classification models,” 2016, arXiv:1612.03651.
[84] J. Xu and Q. Du, “A deep investigation into fastText,” in Proc. IEEE 21st Int. Conf. High Perform. Comput. Commun., IEEE 17th Int. Conf. Smart City, IEEE 5th Int. Conf. Data Sci. Syst. (HPCC/SmartCity/DSS), Aug. 2019, pp. 1714–1719.
[85] M. E. Peters et al., “Deep contextualized word representations,” 2018, arXiv:1802.05365.
[86] F. Sebastiani and A. Esuli, “SentiWordNet: A publicly available lexical resource for opinion mining,” in Proc. 5th Int. Conf. Lang. Resour. Eval. Genoa, Italy: European Language Resources Association (ELRA), 2006, pp. 417–422.
[87] F. Å. Nielsen, “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs,” 2011, arXiv:1103.2903.
[88] C. Hutto and E. Gilbert, “VADER: A parsimonious rule-based model for sentiment analysis of social media text,” in Proc. Int. AAAI Conf. Web Social Media, 2014, vol. 8, no. 1, pp. 216–225.
[89] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, Dec. 2014.
[90] A. Jurek, M. D. Mulvenna, and Y. Bi, “Improved lexicon-based sentiment analysis for social media analytics,” Secur. Informat., vol. 4, no. 1, pp. 1–13, Dec. 2015.
[91] D. Pinto, A. McCallum, X. Wei, and W. B. Croft, “Table extraction using conditional random fields,” in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., Jul. 2003, pp. 235–242.
[92] M. Rezwanul, A. Ali, and A. Rahman, “Sentiment analysis on Twitter data using KNN and SVM,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 6, pp. 1–7, 2017.
[93] L. Zhang, R. Ghosh, M. Dekhil, M. Hsu, and B. Liu, “Combining lexicon-based and learning-based methods for Twitter sentiment analysis,” HP Lab., Tech. Rep. HPL-2011, 2011, pp. 1–8.
[94] P. Palanisamy, V. Yadav, and H. Elchuri, “Serendio: Simple and practical lexicon based approach to sentiment analysis,” in Proc. 2nd Joint Conf. Lexical Comput. Semantics, 7th Int. Workshop Semantic Eval., 2013, pp. 543–548.
[95] B. Agarwal, S. Poria, N. Mittal, A. Gelbukh, and A. Hussain, “Concept-level sentiment analysis with dependency-based semantic parsing: A novel approach,” Cognit. Comput., vol. 7, no. 4, pp. 487–499, Aug. 2015.
[96] R. D. Desai, “Sentiment analysis of Twitter data,” in Proc. 2nd Int. Conf. Intell. Comput. Control Syst. (ICICCS), Jun. 2018, pp. 114–117.
[97] M. Mashuri, “Sentiment analysis in Twitter using lexicon based and polarity multiplication,” in Proc. Int. Conf. Artif. Intell. Inf. Technol. (ICAIIT), Mar. 2019, pp. 365–368.
[98] S. Rani and J. Singh, “Sentiment analysis of tweets using support vector machine,” Int. J. Comput. Sci. Mobile Appl., vol. 5, no. 10, pp. 83–91, 2017.
[99] A. L. F. Alves, C. D. S. Baptista, A. A. Firmino, M. G. D. Oliveira, and A. C. D. Paiva, “A comparison of SVM versus naive-Bayes techniques for sentiment analysis in tweets: A case study with the 2013 FIFA Confederations Cup,” in Proc. 20th Brazilian Symp. Multimedia Web, Brazil, Nov. 2014, pp. 123–130.
[100] H. Wu, J. Li, and J. Xie, “Maximum entropy-based sentiment analysis of online product reviews in Chinese,” in Automotive, Mechanical and Electrical Engineering. Boca Raton, FL, USA: CRC Press, 201, pp. 559–562.
[101] E. Omer, “Using machine learning to identify Jihadist messages on Twitter,” Uppsala Univ., Uppsala, Sweden, 2015. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:846343/FULLTEXT01.pdf
[102] L. Kaati, E. Omer, N. Prucha, and A. Shrestha, “Detecting multipliers of jihadism on Twitter,” in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Nov. 2015, pp. 954–960.
[103] M. Nouh, J. R. C. Nurse, and M. Goldsmith, “Understanding the radical mind: Identifying signals to detect extremist content on Twitter,” in Proc. IEEE Int. Conf. Intell. Secur. Informat. (ISI), Jul. 2019, pp. 98–103.
[104] A. Omar, T. M. Mahmoud, T. Abd-El-Hafeez, and A. Mahfouz, “Multi-label Arabic text classification in online social networks,” Inf. Syst., vol. 100, Sep. 2021, Art. no. 101785.
[105] A. P. Jain and P. Dandannavar, “Application of machine learning techniques to sentiment analysis,” in Proc. 2nd Int. Conf. Appl. Theor. Comput. Commun. Technol. (iCATccT), Jul. 2016, pp. 628–632.
[106] M. Hartung, R. Klinger, F. Schmidtke, and L. Vogel, “Identifying right-wing extremism in German Twitter profiles: A classification approach,” in Natural Language Processing and Information Systems. Liège, Belgium: Springer, 2017, pp. 320–325.
[107] Z. Ul Rehman et al., “Understanding the language of ISIS: An empirical approach to detect radical content on Twitter using machine learning,” Comput., Mater. Continua, vol. 66, no. 2, pp. 1075–1090, 2021.
[108] W. Sharif et al., “An empirical approach for extreme behavior identification through tweets using machine learning,” Appl. Sci., vol. 9, no. 18, p. 3723, Sep. 2019.
[109] M. A. Masood and R. A. Abbasi, “Using graph embedding and machine learning to identify rebels on Twitter,” J. Informetrics, vol. 15, no. 1, Feb. 2021, Art. no. 101121.
[110] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[111] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” 2018, arXiv:1810.04805.
[112] Y. Liu et al., “RoBERTa: A robustly optimized BERT pretraining approach,” 2019, arXiv:1907.11692.
[113] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 1–11.
[114] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” OpenAI, San Francisco, CA, USA, 2018. [Online]. Available: https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035
[115] J. Feng, X. He, Q. Teng, C. Ren, H. Chen, and Y. Li, “Reconstruction of porous media from extremely limited information using conditional generative adversarial networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 100, no. 3, Sep. 2019, Art. no. 033308.
[116] P. Zola, P. Cortez, C. Ragno, and E. Brentari, “Social media cross-source and cross-domain sentiment classification,” Int. J. Inf. Technol. Decis. Making, vol. 18, no. 5, pp. 1469–1499, Sep. 2019.
[117] H. Gu, Y. Wang, S. Hong, and G. Gui, “Blind channel identification aided generalized automatic modulation recognition based on deep learning,” IEEE Access, vol. 7, pp. 110722–110729, 2019.
[118] W. Fang et al., “A LSTM algorithm estimating pseudo measurements for aiding INS during GNSS signal outages,” Remote Sens., vol. 12, no. 2, p. 256, Jan. 2020.
[119] X.-H. Le, H. V. Ho, G. Lee, and S. Jung, “Application of long short-term memory (LSTM) neural network for flood forecasting,” Water, vol. 11, no. 7, p. 1387, Jul. 2019.
[120] S. Tam, R. B. Said, and Ö. Ö. Tanriöver, “A ConvBiLSTM deep learning model-based approach for Twitter sentiment classification,” IEEE Access, vol. 9, pp. 41283–41293, 2021.
[121] I. K. Ihianle, A. O. Nwajana, S. H. Ebenuwa, R. I. Otuka, K. Owa, and M. O. Orisatoki, “A deep learning approach for human activities recognition from multimodal sensing devices,” IEEE Access, vol. 8, pp. 179028–179038, 2020.
[122] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, 1997.
[123] G. Blanco and A. Lourenco, “Optimism and pessimism analysis using deep learning on COVID-19 related Twitter conversations,” Inf. Process. Manag., vol. 59, no. 3, 2022, Art. no. 1029182.
[124] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv:1412.3555.
[125] F. Chollet, Deep Learning With Python. New York, NY, USA: Simon and Schuster, 2021.
[126] S. Mann, J. Arora, M. Bhatia, R. Sharma, and R. Taragi, “Twitter sentiment analysis using enhanced BERT,” in Intelligent Systems and Applications. Berlin, Germany: Springer, 2023, pp. 263–271.
[127] A. Bello, S.-C. Ng, and M.-F. Leung, “A BERT framework to sentiment analysis of tweets,” Sensors, vol. 23, no. 1, p. 506, Jan. 2023.
[128] F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, and L. T. Neves, “Unified benchmark and comparative evaluation for tweet classification,” in Proc. Findings Assoc. Comput. Linguistics, Nov. 2020, pp. 1644–1650. [Online]. Available: https://aclanthology.org/2020.findings-emnlp.148/
[129] Y. Cui, Y. Jiang, and H. Gu, “Novel sentiment analysis from Twitter for stock change prediction,” in Data Mining and Big Data. Beijing, China: Springer, 2023, pp. 160–172.
[130] M. Ul Haque, I. Dharmadasa, Z. T. Sworna, R. N. Rajapakse, and H. Ahmad, “‘I think this is the most disruptive technology’: Exploring sentiments of ChatGPT early adopters using Twitter data,” 2022, arXiv:2212.05856.
[131] F. K. Khattak, S. Jeblee, C. Pou-Prom, M. Abdalla, C. Meaney, and F. Rudzicz, “A survey of word embeddings for clinical text,” J. Biomed. Informat., vol. 100, Jan. 2019, Art. no. 100057.
[132] R. Chandra and R. Saini, “Biden vs Trump: Modeling US general elections using BERT language model,” IEEE Access, vol. 9, pp. 128494–128505, 2021.
[133] M. E. Basiri, S. Nemati, M. Abdar, S. Asadi, and U. R. Acharya, “A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets,” Knowl.-Based Syst., vol. 228, Sep. 2021, Art. no. 107242.
[134] J. Yang, X. Zou, W. Zhang, and H. Han, “Microblog sentiment analysis via embedding social contexts into an attentive LSTM,” Eng. Appl. Artif. Intell., vol. 97, Jan. 2021, Art. no. 104048.
[135] A. Alsayat, “Improving sentiment analysis for social media applications using an ensemble deep learning language model,” Arabian J. Sci. Eng., vol. 47, no. 2, pp. 2499–2511, Feb. 2022.
[136] P. Wu, X. Li, S. Shen, and D. He, “Social media opinion summarization using emotion cognition and convolutional neural networks,” Int. J. Inf. Manag., vol. 51, Apr. 2020, Art. no. 101978.
[137] G. Chandrasekaran, N. Antoanela, G. Andrei, C. Monica, and J. Hemanth, “Visual sentiment analysis using deep learning models with social media data,” Appl. Sci., vol. 12, no. 3, p. 1030, Jan. 2022.
[138] H.-T. Nguyen and L.-M. Nguyen, “ILWAANet: An interactive lexicon-aware word-aspect attention network for aspect-level sentiment classification on social networking,” Exp. Syst. Appl., vol. 146, May 2020, Art. no. 113065.
[139] A. M. Sadiq, H. Ahn, and Y. B. Choi, “Human sentiment and activity recognition in disaster situations using social media images based on deep learning,” Sensors, vol. 20, no. 24, p. 7115, Dec. 2020.
[140] S. Visweswaran et al., “Machine learning classifiers for Twitter surveillance of vaping: Comparative machine learning study,” J. Med. Internet Res., vol. 22, no. 8, Aug. 2020, Art. no. e17478.
[141] M. U. Salur and I. Aydin, “A novel hybrid deep learning model for sentiment classification,” IEEE Access, vol. 8, pp. 58080–58093, 2020.
[142] C. Singh, T. Imam, S. Wibowo, and S. Grandhi, “A deep learning approach for sentiment analysis of COVID-19 reviews,” Appl. Sci., vol. 12, no. 8, p. 3709, Apr. 2022.
[143] B. A. Galende, G. Hernández-Peñaloza, S. Uribe, and F. A. García, “Conspiracy or not? A deep learning approach to spot it on Twitter,” IEEE Access, vol. 10, pp. 38370–38378, 2022.
[144] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: A comparative study,” Electron., vol. 9, no. 3, p. 483, 2020.
[145] L. A. Ngoge, “Real-time sentiment analysis for detection of terrorist activities in Kenya,” Ph.D. dissertation, Fac. Inf. Technol., Strathmore Univ., Nairobi, Kenya, 2016.
[146] I. Gupta and N. Joshi, “Enhanced Twitter sentiment analysis using hybrid approach and by accounting local contextual semantic,” J. Intell. Syst., vol. 29, no. 1, pp. 1611–1625, Dec. 2019.
[147] J. Du, J. Xu, H.-Y. Song, and C. Tao, “Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data,” BMC Med. Informat. Decis. Making, vol. 17, no. S2, pp. 63–70, Jul. 2017.
[148] M. Wang and G. Hu, “A novel method for Twitter sentiment analysis based on attentional-graph neural network,” Information, vol. 11, no. 2, p. 92, Feb. 2020.
[149] W. Liao, B. Zeng, J. Liu, P. Wei, X. Cheng, and W. Zhang, “Multi-level graph neural network for text sentiment analysis,” Comput. Electr. Eng., vol. 92, Jun. 2021, Art. no. 107096.
[150] M. Aflakparast, M. de Gunst, and W. van Wieringen, “Analysis of Twitter data with the Bayesian fused graphical lasso,” PLoS ONE, vol. 15, no. 7, Jul. 2020, Art. no. e0235596.
[151] A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification using distant supervision,” Stanford, vol. 1, no. 12, p. 2009, 2009.
[152] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, no. 1, pp. 37–46, Apr. 1960.
[153] J. Carletta, “Assessing agreement on classification tasks: The Kappa statistic,” 1996, arXiv:cmp-lg/9602004.
[154] C. Willmott and K. Matsuura, “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance,” Climate Res., vol. 30, pp. 79–82, 2005.
[155] P. D. Mahendhiran and S. Kannimuthu, “Deep learning techniques for polarity classification in multimodal sentiment analysis,” Int. J. Inf. Technol. Decis. Making, vol. 17, no. 3, pp. 883–910, May 2018.
[156] C. Spearman, “The proof and measurement of association between two things,” Int. J. Epidemiology, vol. 39, no. 5, pp. 1137–1150, Oct. 2010.
[157] M. G. Kendall, “A new measure of rank correlation,” Biometrika, vol. 30, p. 81, Jun. 1938.
[158] M.-L. Zhang and Z.-H. Zhou, “A review on multi-label learning algorithms,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 8, pp. 1819–1837, Aug. 2014.
[159] B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA)-Protein Struct., vol. 405, no. 2, pp. 442–451, Oct. 1975.
[160] A. Z. Ahmed and M. Rodríguez-Díaz, “Significant labels in sentiment analysis of online customer reviews of airlines,” Sustainability, vol. 12, no. 20, p. 8683, 2020.
[161] S. K. Trivedi and A. Singh, “Twitter sentiment analysis of app based online food delivery companies,” Global Knowl., Memory Commun., vol. 70, pp. 891–910, Nov. 2021.
[162] S. Lehrer, T. Xie, and T. Zeng, “Does high-frequency social media data improve forecasts of low-frequency consumer confidence measures?” J. Financial Econ., vol. 19, no. 5, pp. 910–933, Nov. 2021.
[163] R. Satapathy, E. Cambria, and A. Hussain, Sentiment Analysis in the Bio-Medical Domain. Berlin, Germany: Springer, 2017.
[164] M. Paolanti et al., “Tourism destination management using sentiment analysis and geo-location information: A deep learning approach,” Inf. Technol. Tourism, vol. 23, no. 2, pp. 241–264, Jun. 2021.
[165] G. Preethi, P. V. Krishna, M. S. Obaidat, V. Saritha, and S. Yenduri, “Application of deep learning to sentiment analysis for recommender system on cloud,” in Proc. Int. Conf. Comput., Inf. Telecommun. Syst. (CITS), Jul. 2017, pp. 93–97.
[166] M. J. Keenan, Advanced Positioning, Flow, and Sentiment Analysis in Commodity Markets: Bridging Fundamental and Technical Analysis. Hoboken, NJ, USA: Wiley, 2020.
