A Review of Deep Learning Models for Twitter Sentiment Analysis: Challenges and Opportunities
IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 11, NO. 3, JUNE 2024
Abstract: Microblogging site Twitter (rebranded as X in July 2023) is one of the most influential online social media websites. It offers a platform for the masses to communicate, express their opinions, and share information on a wide range of subjects and products, resulting in the creation of a large amount of unstructured data. This has attracted significant attention from researchers who seek to understand and analyze the sentiments contained within this massive user-generated text. The task of sentiment analysis (SA) entails extracting and identifying user opinions from the text, and various lexicon- and machine learning-based methods have been developed over the years to accomplish this. However, deep learning (DL)-based approaches have recently become dominant due to their superior performance. This study briefly covers standard preprocessing techniques and various word embeddings for data preparation. It then delves into a taxonomy to provide a comprehensive summary of DL-based approaches. In addition, the work compiles popular benchmark datasets and highlights the evaluation metrics employed for performance measurement and the resources available in the public domain to aid SA tasks. Furthermore, the survey discusses domain-specific practical applications of SA tasks. Finally, the study concludes with various research challenges and outlines future directions for further investigation.
Index Terms: Deep learning (DL), natural language processing, opinion mining, sentiment analysis (SA), social network, Twitter.
I. INTRODUCTION
Over the past few years, social media platforms, such as Twitter, Instagram, Facebook, and various blogging sites, have experienced exponential growth in their user base. These venues allow users to be more vocal about their opinions, emotions, and thoughts on diverse topics and items of interest, resulting in the generation of a vast amount of data [1], [2], [3], [4], [5], [6]. Moreover, besides textual content, the various aspects of multimodality include pictures, audio, and video, which has piqued the interest of the research community to identify, extract, and analyze user sentiments exhibited in the text, referred to as sentiment analysis (SA).
Among social networking sites, Twitter, with over 330 million active microblogging service users, has become a popular source of data for SA due to its real-time nature and the sheer volume of data generated [7]. The analysis of user-generated content is crucial for various business applications, as it provides insights into users’ daily lives and explains their behavior and activities, as well as how they are influenced by others’ opinions. The task of SA can yield valuable knowledge for further detailed analysis, including identifying trends or results of a particular topic based on sentiment [8], such as movie preferences [9], product proclivity in the market [10], [11], or political opinions [12].
Despite the growing interest in SA, classifying the sentiment polarity of tweets remains a challenging task due to several factors, including the language used and the lack of contextual cues. Such factors contrast with the well-formed language embodied in most corpora used for text analysis. Therefore, there is an increasing interest in improving sentiment classification methods to achieve more accurate, explainable, and traceable outcomes, as well as better performance in real-time applications. Numerous studies have been conducted to improve SA techniques, as evidenced by the recent SemEval challenges [13], and there is still much work to be done to enhance sentiment classification methods further [14].
Various SA techniques, including traditional ones such as lexicon-based methods [15], [16], [17], [18], [19], [20], [21], [22], [23], machine learning (ML) algorithms [24], [25], [26], [27], [28], [29], and hybrid approaches, have been employed for analyzing Twitter data. In addition, graph-based approaches have been suggested to identify sentiment in Twitter datasets [30], [31]. However, these techniques have certain limitations, such as handling natural language complexities, short sequences of text, semantic relationships, feature selection, lack of validation results, and processing large amounts of data, which hinders their real-time applicability, especially with high-dimensional features.
To address these limitations, deep learning (DL), a cluster of multilayer neural network algorithms, has emerged as a promising subfield of ML for Twitter SA [32], [33], [34]. Several DL-based models, including deep (vanilla) neural networks (DNNs) [32], [34], convolutional neural networks (CNNs) [35], [36], [37], [38], recurrent neural networks
Manuscript received 15 April 2023; revised 28 August 2023 and 21 September 2023; accepted 29 September 2023. Date of publication 27 October 2023; date of current version 31 May 2024. (Corresponding authors: Nancy Girdhar; Deepak Sharma.)
Laxmi Chaudhary is with the Department of Computer Science and Engineering, Jaypee Institute of Information Technology, Noida 201309, India.
Nancy Girdhar and Antoine Doucet are with the Laboratoire Informatique, Image et Interaction (L3i), University of La Rochelle, 17000 La Rochelle, France (e-mail: nancy.gr1991@gmail.com).
Deepak Sharma and Matthias Renz are with the Department of Computer Science, Christian-Albrechts-Universität zu Kiel, 24118 Kiel, Germany (e-mail: deepak.btg@gmail.com).
Javier Andreu-Perez is with the School of Computer Science and Electronic Engineering, University of Essex, CO4 3SQ Colchester, U.K., and also with Simbad Group, University of Jaen, 23071 Jaén, Spain.
Digital Object Identifier 10.1109/TCSS.2023.3322002
2329-924X © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of the West of England. Downloaded on December 25,2024 at 12:06:56 UTC from IEEE Xplore. Restrictions apply.
CHAUDHARY et al.: REVIEW OF DL MODELS FOR TWITTER SA 3551
(RNNs) [39], [40], and their variants such as long short-term memory (LSTM) [41], [42], [43], [44], gated recurrent units (GRUs), and hybrid techniques, have shown effectiveness in capturing the nuances of natural language and handling the noise and ambiguity present in Twitter data [35], [36], [37], [38], [39], [40], [41], [42], [43], [44]. These models offer flexible solutions that enhance SA performance by providing a better interpretation of the context and semantic meaning of text data.
Motivation: Twitter as a Unique Case for SA. Twitter presents a distinctive environment for SA, characterized by specific features that set it apart from other contexts. First, the stringent character limit, which ranges from 280 to 10 000 characters per tweet (depending on the subscription), leads to concise expressions. This can result in the loss of nuanced sentiment cues. In addition, Twitter users often employ informal language, slang, and abbreviations, making it difficult for SA algorithms to interpret unconventional language usage accurately. Moreover, the widespread use of emojis and hashtags in tweets requires specialized techniques to effectively integrate these nontextual elements into SA, capturing their emotive context.
Furthermore, Twitter data are often noisy, with promotional content, news updates, and irrelevant information intermingled with sentiment expressions. This noise hampers SA performance and necessitates robust preprocessing techniques to filter out irrelevant content and enhance sentiment prediction accuracy. The challenges posed by Twitter SA, such as handling brevity, informality, nontextual cues, and noise, demand tailored preprocessing strategies and algorithms to ensure reliable SA results.
Given the significance of SA in a vast spectrum of applications and the plethora of work dedicated to SA within the Twitter literature, researchers have presented various lines of review studies to highlight the advances achieved and the challenges that have yet to be addressed. Mittal and Patidar [45] focused on exploring lexicon- and ML-based methods for SA on Twitter. However, this study did not delve extensively into preprocessing methods and DL techniques. In contrast, Silva et al. [46] conducted a comprehensive survey primarily centered around semisupervised approaches, encompassing graph-, wrapper-, and topic-based methods for tweet classification. This survey featured a comparative analysis of three semisupervised techniques: self-training, cotraining, and topic modeling. Azzouza et al. [47] introduced a system aimed at discovering and tracking opinions on Twitter using Apache Storm. Multiple opinions were represented through dynamic graphical visualizations, an unsupervised ML technique was employed for SA and polarity detection, and the model’s performance was evaluated using SemEval datasets. In addition, Ligthart et al. [48], Wankhade et al. [49], and Das and Singh [50] have conducted studies to gain insights into diverse tasks and approaches within SA.
Some studies have summarized the technical and theoretical aspects of SA, as done by Yadav and Vishwakarma [51] and Sharma and Jain [52]. Others have compiled literature to address the challenges posed by large data and the expansion of SA into domains such as marketing, finance, healthcare, and disaster analysis. Works by De Albornoz et al. [10], Soni and Sharaff [53], and Fadel and Cemil [27] contribute to this effort. Furthermore, a few researchers have investigated the impact of data quality on SA performance, considering factors such as readability, subjectivity, and informativeness. Kumar et al. [21] and Jain and Vaidya [54] examined online product reviews to analyze customer feedback for applications such as business monitoring and brand management.
In a different line of work, namely comparative studies, most reviews have focused on performance metrics such as F1-score or overall accuracy, and performance evaluation of methods is often carried out on small datasets [55]. These studies have shed light on domain-specific past literature or compared the performances of different models on SA tasks. However, despite the recent surge in DL-based developments in Twitter SA, the literature still lacks an extensive analysis and outline of research progress over the years. To bridge this gap, this study presents an objective overview of various SA methods, with a focus on DL approaches, to summarize existing research and identify research gaps, paving the way for researchers to fill those gaps.
Considering past, current, and emerging trends and developments, this research aims to achieve the following objectives.
1) Systematic Taxonomy: To present a systematic taxonomy that summarizes, compares, and reviews representative works for each type of approach. This provides new perspectives for future exploration and practices in SA.
2) Preprocessing Techniques: To provide a summary of various preprocessing techniques used to clean and process text data before applying DL models. To discuss the impact of these techniques on the accuracy of SA and their effectiveness in handling noisy and ambiguous data.
3) Overview of Traditional SA Techniques: To provide an overview of traditional SA techniques and their limitations in processing large volumes of Twitter data. To discuss the challenges faced by traditional techniques and the need for more advanced techniques to handle the complexities of natural language.
4) Analysis of DL-Based Approaches: To provide a detailed analysis of various DL-based approaches for SA on Twitter, including DNN, CNN, RNN, and their variants such as LSTM and GRU. To discuss their architectures, training methodologies, and their strengths and limitations.
5) Challenges Faced in SA on Twitter: To analyze the challenges faced in SA on Twitter, such as noisy data, sarcasm, and irony. To discuss the impact of these challenges on the accuracy of SA and the need for preprocessing techniques to clean text data.
6) Performance Measures: To present a detailed analysis of the evaluation metrics used to evaluate the performance of the models.
7) Real-World Case Studies: To discuss various dimensions of SA usage, its applicability, and its influence on various business domains.
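As a rough illustration of the tweet-specific preprocessing that the motivation and objective 2) refer to (informal language, hashtags, emojis, and noise such as links), the following is a minimal sketch only. The function name `normalize_tweet`, the placeholder tokens `<url>` and `<user>`, and the choice of rules are assumptions made for this example; they are not the pipeline of any surveyed work.

```python
import re

# Hypothetical normalization rules; real pipelines differ in which steps they apply.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # a character repeated three or more times


def normalize_tweet(text: str) -> str:
    """Lowercase a tweet and replace or simplify common noisy elements."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)        # links rarely carry sentiment
    text = MENTION_RE.sub(" <user> ", text)   # anonymize user mentions
    text = HASHTAG_RE.sub(r" \1 ", text)      # keep the hashtag word, drop '#'
    text = REPEAT_RE.sub(r"\1\1", text)       # "sooooo" -> "soo"
    return " ".join(text.split())             # collapse whitespace


print(normalize_tweet("@Bob LOVED it!!! sooooo good #MustSee https://t.co/xyz"))
```

Emojis are deliberately left untouched here, since the text above notes that they carry emotive context; whether to map them to words or keep them as tokens is a design choice that depends on the downstream model.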
3552 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 11, NO. 3, JUNE 2024
of only one nonzero entry that corresponds to the word, while the rest of the entries are zeros. However, this method has some drawbacks, such as high computational requirements and the inability to handle context similarity, because each word is encoded as a sparse, high-dimensional vector. The other approach is the term frequency–inverse document frequency (TF-IDF) method, which assigns a score that reflects the relevance of a term in a document compared to the rest of the corpus. It is simple, effective, and computationally efficient for identifying important words in a document and can be used to rank documents based on their relevance to a query. Nevertheless, it does not take into account the order of words in a document or their semantic meaning and may not perform well on documents with highly specialized vocabulary or uncommon words [22], [25], [26]. An alternative approach is to use dense embedding vectors to obtain the context of words in terms of both syntax and semantics [74], [75], [76]. The dense vector mapping ensures that words with similar meanings are represented close to each other in the vector space. To improve the generalization, representation, and computational time of sentiment classification models, various word embeddings have been proposed. Word embeddings are a popular technique for representing textual data as numeric input vectors, which are easily processed by neural network models. Several types of word embeddings exist in the literature, and some of the commonly used ones are summarized as follows.
1) Word2Vec [77]: It is a popular word embedding model based on neural networks that are designed to reconstruct the linguistic contexts of words [74], [78], [79]. It employs a two-layer neural network architecture that takes text as input and generates a vector embedding for each word as output. There are two types of Word2Vec models: Skip-gram and continuous bag of words (CBOW). The Skip-gram model learns a D-dimensional vector representation of each word in the corpus. The input and hidden layers have the same number of neurons as the vocabulary size and the word vector dimensions, respectively. The weights between these layers are represented by $W_{H \times D}$, where H is the size of the hidden layer. This weight matrix signifies the likelihood of each word’s occurrence for that input. The model learns the correlation between words in a vocabulary by computing the error at the output layer using a loss function and updating the weights (word embeddings) through backpropagation. In contrast, the CBOW model processes the context of a word as input and predicts the word based on that context [74], [79].
2) GloVe or Global Vectors [80]: GloVe is another popular word embedding model that is based on co-occurrence statistics. It is an unsupervised learning method that learns word embeddings by factorizing a matrix of word co-occurrence probabilities and combines the local context window and matrix factorization methods to analyze the local and global statistics of a corpus [80]. It captures word analogies better than matrix factorization (a.k.a. latent semantic analysis), which only generates an efficient substructure of vector space. On the other hand, the Skip-gram (local context window approach) performs well on analogy tasks but does not make full use of the corpus statistics [74], [81].
3) FastText [82]: It is an extension of Word2Vec that represents each word using n-grams of characters instead of individual words [83]. It learns word embeddings by representing each word as a bag of character n-grams and then learning embeddings for these n-grams. This allows the model to generate efficient embeddings of rarely occurring words in the corpus. The n-grams are employed to train a Skip-gram model, and the embedding of a word is determined by summing up the embeddings of all its n-grams. However, it requires high memory and system resources to create embeddings of each character n-gram in the vocabulary [74], [84]. FastText has been shown to improve the performance of SA models on Twitter data, especially for out-of-vocabulary words.
Besides the aforementioned three popular word embedding schemes (Word2Vec, GloVe, and FastText), other word-embedding approaches have also been developed, such as Bidirectional Encoder Representations from Transformers (BERT), which is a pretrained language model that has been fine-tuned for SA on Twitter data and has achieved superior performance compared to other models. ELMo [85] is another pretrained language model that uses a bidirectional LSTM (Bi-LSTM) architecture to learn contextualized word embeddings and has shown state-of-the-art results, especially for sentiment classification at the sentence level.
III. DEVELOPMENT OF SA AND LITERATURE CLASSIFICATION
This section elaborates on the recent advances in the field of Twitter SA. For this study, we have considered publications in the Scopus database from 2010 to 2022. The Scopus document search string in this study was composed as follows: (TITLE-ABS-KEY(“twitter” AND “sentiment” AND “deep” AND “learning”) AND (LIMIT-TO (LANGUAGE, “English”))), which resulted in 1115 research papers. The section highlights the yearwise publications, top organizations, key researchers, and prominent source titles of this domain. Furthermore, a detailed survey is presented on Twitter SA, bifurcated into conventional and DL-based approaches along with hybrid techniques.
A. Recent Trends and Developments
1) Annual Trends: Fig. 1 displays the annual trend of research publications on SA using Twitter data from 2010 to 2022. The x-axis represents the publication years, while the y-axis indicates the publication count recorded in the Scopus database. The data
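Returning briefly to the text representations summarized earlier, the TF-IDF weighting and the FastText idea of composing a word vector from character n-grams can be sketched as below. This is a simplified illustration under stated assumptions: the weighting uses tf = count / document length and idf = log(N / df) (other variants exist), the n-gram range 3 to 6 and the boundary markers `<` and `>` follow the commonly described FastText setup, and `zlib.crc32` is only a stand-in for the library's own hashing. All function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document (simple tf * idf variant)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in boundary markers '<' and '>'."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]


def subword_vector(word, table):
    """Sum the table rows selected by hashing each n-gram (table: list of equal-length rows)."""
    vec = [0.0] * len(table[0])
    for gram in char_ngrams(word):
        row = table[zlib.crc32(gram.encode("utf-8")) % len(table)]
        vec = [v + r for v, r in zip(vec, row)]
    return vec


docs = [["good", "movie"], ["bad", "movie"]]
print(tf_idf(docs)[0])            # "movie" occurs in every document, so its score is 0
print(char_ngrams("where", 3, 3))
```

Because `subword_vector` needs only the characters of a word, it returns a vector even for a word never seen during training, which is the property behind the out-of-vocabulary advantage noted for FastText.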
TABLE II
KEY AUTHORS (P: PUBLICATION; C: CITATION; AND AC: AVERAGE CITATION)
Many studies have utilized lexicon-based approaches. For example, Jurek et al. [90] developed an SA algorithm that focuses on real-time analysis of Twitter content. This method includes two main components, a combination function based on evidence and sentiment normalization, which are used to estimate the sentiment intensity. Table III provides further details on the lexicon-based state of the art.
Lexicon-based approaches can be divided into two subcategories: dictionary-based and corpus-based. Dictionary-based approaches use predefined dictionaries, for instance SentiWordNet and WordNet, to perform SA [18], [19], [20], [21]. Corpus-based approaches leverage corpus data for sentiment classification and are further divided into statistical and semantic subcategories [24]. The statistical category comprises conditional random field (CRF) [91], K-nearest neighbors (KNN) [92], hidden Markov models (HMMs) [53], and so on.
Al-Khalisy and Jehlol [19] proposed a dictionary-based approach for extracting significant information from terrorist propaganda, such as account name, location, and supporter data. This method utilized a bag of words (BOW) to compute the overall scores for each tweet that represents the training data and to analyze the polarity; the created word list comprised antonyms and synonyms from the dictionary. Chalothorn and Ellman [15] suggested the use of lexical resources, such as the NLTK toolkit, SentiWordNet, and WordNet, for the analysis of online radical posts. The polarity and text intensity are calculated to analyze the sentiment. For this, the text corpus was initially acquired from various web platforms, such as Qawem and Montada, and after essential data preprocessing, various attribute-driven measures were employed to identify and manage extremist and religious content. Based on [15], Gitari et al. [16] built their hate verb lexicon, starting with a basic verb list and expanding it iteratively by adding
synonyms and hypernyms of the seed verbs depending on WordNet relations. Simon et al. [23] developed a corpus-based approach that uses divergent behavior to analyze the sentiment of tweets during the Kenya Westgate Mall attack to find the radicalization time among the users of Twitter. The authors recommended that emergency organizations and communication centers minimize the use of negative sentiment when they communicate with the public. Another corpus-based method was proposed by Mansour [22] to analyze public sentiment polarity from Eastern and Western countries toward the Islamic State of Iraq and Syria (ISIS). This method employs text SA using TF-IDF for analyzing the frequency of words and word sentiment. The other lexicon-based method, proposed by Kharde et al. [18], uses part-of-speech (POS) tagging, while the lexicon approaches presented by Ferrara et al. [20] and Kumar et al. [21] depend on a dictionary for feature extraction from the dataset.
Numerous techniques have utilized lexical approaches as they do not require annotated data, the lack of which is one of the key challenges in the SA task. However, these methods have certain limitations, such as their accuracy being influenced by the size and quality of the lexicon. Moreover, these approaches cannot handle sarcasm and irony, which are common on Twitter. Furthermore, these methods cannot handle out-of-vocabulary words, which can lead to incorrect sentiment classification. This is particularly problematic for Twitter data, which is constantly evolving, requiring frequent updates to the lexicon. In addition, they rely on handcrafted features, whose construction can be a laborious and time-consuming process. Another limitation of these methods is that they are not very effective at generalizing to different domains or context-specific orientations [23].
2) ML-Based Approaches: These techniques have been extensively employed for SA on Twitter in recent years. Leveraging statistical techniques, these methods have the ability to automatically learn patterns and relationships from data, which are then used to classify the sentiment of the text.
ML approaches broadly come under the umbrella of conventional methods, which include popular techniques such as support vector machines (SVMs) [24], [98], [99], the Naïve Bayes (NB) classifier [24], [25], [99], and the maximum entropy classifier [55], [100]. These techniques have been used in several studies for SA, including studies on terrorism [25], [101], [102], hate speech detection [103], [104], customer satisfaction [105], and sentiment polarity detection [27]. For instance, Wei et al. [24] have used the NB algorithm to classify tweets as positive, negative, or neutral based on the presence of specific words in a tweet. Additional traditional ML approaches have also been utilized in various other studies [26], [27], [28], [29], [101], [103], [106], which are presented in detail in Table IV.
Wei et al. [24] proposed a KNN classifier-based approach for sentiment classification to identify extremist-related conversations in public tweets on Twitter. Similarly, Azizan and Aziz [25] utilized the NB algorithm to detect extremist affiliations in social media communication. Their model classifies user reviews into positive and negative sentiments to reflect affiliations with extremist or nonextremist groups. However, this method does not consider the overall dependencies concerning a sentence in a given document. Rani and Singh [98] proposed an SVM model with features extracted using the TF-IDF method for SA, in which they detected sentiment polarity using two SVM methods and concluded that the linear SVM model outperformed the kernel SVM.
Omer [101] proposed an ML-based approach that collects and uses three different datasets, including supporters of ISIS, antisupporters of ISIS, and random tweet datasets that are unrelated to ISIS. The method employs three primary classifiers, namely, NB, AdaBoost, and SVM. Nouh et al. [103] developed a novel ML-based approach to analyze radical content and extremism propaganda in tweets. Kaati et al. [102] introduced a method for identifying the Twitter accounts of jihadist group supporters and online propaganda propagators using feature engineering, which involves analyzing data dependencies and classifying features as data-independent or data-dependent. Ferrara et al. [20] developed an SA technique that uses metadata as a feature, together with a greedy selection method, and applies the random forest classifier and logistic regression models to predict the extremists’ sentiment polarity in interactions.
Omar et al. [104] identified the relationship between hate speech and topics present on online social platforms based on an ML method. This approach utilizes multilabel classification by employing logistic regression, linear SVC, and random forest classifiers. To classify text sentiment into positive, neutral, or negative, the authors have utilized feature representations that include TF-IDF, N-gram, and BOW. Rehman et al. [107] have proposed a method to detect radical text on Twitter, where religious language plays a significant role in radicalization. The authors have utilized both radical and religious features for training the model and applied TF-IDF for feature engineering to feed into ML classifiers, including random forest, SVM, and NB, to detect the sentiment polarity.
In order to improve the accuracy of SA, researchers have dedicated efforts to developing ML-hybrid models [26], [29], [108], [109] that integrate multiple ML approaches to address the shortcomings of individual methods. While these hybrid models have led to better results, there is still potential for further enhancement of their outcomes.
Despite the success of ML-based Twitter SA approaches, there still exist challenges that need to be addressed. One of the major limitations is their dependence on the quality and size of the training dataset. If the training dataset is biased or too small, it may lead to poor performance of the model. Another limitation is their inability to handle the ambiguity and complexity of multilingual datasets and their inadequacy to efficiently capture relevant features from
TABLE III
LEXICON-BASED APPROACHES
short sequences of text (short text). For instance, sarcasm and irony in tweets can often be misinterpreted by these models, leading to incorrect sentiment classification. Also, their performance relies on the amount of annotated data available for training, making them highly data-dependent. However, annotating tweets is a costly process due to the dynamic nature of Twitter content. Moreover, these methods are domain-specific, which means that their effectiveness is limited to the domain in which they are trained. If they are used in a different domain, their efficacy decreases, and they need to be retrained in that domain to perform well [51]. Furthermore, the step of feature extraction and engineering in ML-based methods is computationally expensive and time-consuming.
Overall, the use of ML approaches for Twitter SA has shown promising results; nevertheless, there is still room for improvement. Further research is needed to develop more robust and efficient models for SA that can handle the complexities of natural languages.
3) DL-Based Approaches: Unlike handcrafted feature engineering in ML approaches, models based on the DL paradigm are capable of automatically extracting significant features from the text and have shown state-of-the-art results for SA tasks.
DL models offer several advantages over conventional methods for SA and, thus, have become an emerging research area for Twitter SA tasks. DL models are well-suited for handling the large amounts of data that are generated every day on social media platforms. For example, on Twitter, about 6000 tweets are produced per second on average, resulting in approximately 200 billion tweets per year. Traditional methods struggle with such volumes of data, whereas DL models excel as they can learn more features while training on vast datasets, resulting in superior accuracy and performance efficiency. In addition, multiple hidden layers enable DL models to capture complex and nonlinear patterns in the data [33] easily, even in short-text data such as tweets. Therefore, a plethora of DL-based models have been developed over the past few decades to analyze text sentiments of posts on various social media platforms, including Twitter.
To gain better insights into recent years’ advancements, the current survey organizes the DL literature into a taxonomy broadly categorized as basic and transformer-based. Basic DL models consist of DNN [32], [33], [34], CNN [35], [36], [37], [38], RNN [40], and LSTM [110], whereas the transformer-based category includes BERT [111], RoBERTa [112], XLNet [113], GPT [114], and so on. Besides these two major categories, there are many DL-hybrid methods proposed by the research community for Twitter text SA, along with recent developments of graph-based methods, which are classified under the “other” category in this study. The current section details the literature for each category of DL models. Fig. 8 presents the generic DL-based pipeline for Twitter SA.
1) DNN is a type of artificial neural network that consists of multiple hidden layers between the input and output layers (as shown in Fig. 9). These vanilla neural networks can efficiently handle complex nonlinear relationships between the layers compared to conventional single-hidden-layer architectures. To perform the Twitter SA task, the DNN model implicitly learns
TABLE IV
TRADITIONAL ML- AND HYBRID-BASED METHODS
different features from the input data in a feedforward manner where each layer is fully connected with the next layer. During the training step, backpropagation is used to learn and adjust the weights among neurons. The weights are updated depending on the error obtained at the output layers. Ali et al. [32] developed a DL-based SA model using RapidMiner to predict the results of general elections in Pakistan in 2018. Similarly, Yasir et al. [34] employed a DL model to forecast the interest rates of five countries, utilizing Twitter sentiments as an input. They also integrated regression models, such as linear and support vector regression, in their analysis.
The DNN-based models have shown superior performance compared to traditional ML models due to their ability to learn complex features from the data. However, these models have a larger number of hidden layers, and they have a greater number of parameter values too, making them difficult to train [33].

2) CNN is one of the DNN-based model variations that typically consists of a sequence of convolutional and pooling layers, followed by one or more fully connected layers for sentiment classification. The convolutional layers use filters of varying sizes to extract local features from the input, while the pooling layers reduce the dimensionality of the data by downsampling the output of the convolutional layers. The extracted features are then fed into the fully connected layers for classification purposes. CNN-based models have been shown to be effective for Twitter SA and can learn complex features from the input data, allowing them to capture the context and meaning of the words. In addition, being computationally efficient and easy to train on large datasets, these models have been adopted for SA in various domains.
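The convolution, pooling, and fully connected stages described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation of any surveyed model: the function names (`conv1d_relu`, `text_cnn_forward`), the ReLU activation, the max-over-time pooling, the filter widths (2, 3, 4), and the random weights are all hypothetical choices made for the example.

```python
import math
import random


def conv1d_relu(embeddings, kernel, bias):
    """Slide one filter over consecutive token windows and apply ReLU."""
    width = len(kernel)
    feature_map = []
    for start in range(len(embeddings) - width + 1):
        total = bias
        for i in range(width):
            for j, weight in enumerate(kernel[i]):
                total += weight * embeddings[start + i][j]
        feature_map.append(max(total, 0.0))  # ReLU (assumed activation)
    return feature_map


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(embeddings, kernels, biases, fc_weights, fc_bias):
    """Convolution -> max-over-time pooling -> fully connected -> softmax."""
    # Pooling keeps only the strongest response of each filter.
    pooled = [max(conv1d_relu(embeddings, k, b)) for k, b in zip(kernels, biases)]
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(logits)


# Hypothetical toy input: a 10-token tweet with 4-dimensional embeddings.
random.seed(0)
EMB_DIM, SEQ_LEN, N_CLASSES = 4, 10, 3
tweet = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(SEQ_LEN)]
kernels = [
    [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(width)]
    for width in (2, 3, 4)  # filters of varying sizes
]
biases = [0.0] * len(kernels)
fc_weights = [[random.gauss(0, 1) for _ in kernels] for _ in range(N_CLASSES)]
fc_bias = [0.0] * N_CLASSES
probs = text_cnn_forward(tweet, kernels, biases, fc_weights, fc_bias)
```

Here `probs` holds one probability per sentiment class (for instance negative, neutral, and positive). In practice the weights would be learned by backpropagation in a DL library rather than drawn at random.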
3560 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 11, NO. 3, JUNE 2024
TABLE V
DL-BASED MODELS
A generic architecture of CNN is illustrated in Fig. 10. The most important component in CNN is the convolution layer. The convolutional layer $h$ is formed by applying the activation function $f(\cdot)$ to the input matrix $X$, which is convolved with the weight matrix $W^{k}$ and added to the bias term $b^{k}$ for each layer. The elements in the $i$th row and the $j$th column of $W^{k}$ and $X$ are referred to as $w^{k}_{i,j}$ and $x_{i,j}$, respectively. The resulting $k$th feature map of the layer $h$ has a dimension $C \times H \times W$, where $C$, $H$, and $W$ represent the channel, height, and width, respectively. One can create a convolutional layer, denoted as $h$, by using $k$ small filters (also known as kernels) of size $N_i \times N_j$, as shown in (1). These filters perform a cross-correlation operation, convolving the input pixel $x_{u,v}$ to obtain $h^{k}_{u,v}$

$$h^{k}_{u,v}(X_{u,v}) = f\left(\sum_{i=1}^{N_i}\sum_{j=1}^{N_j} w^{k}_{i,j}\, x_{u+i,v+j} + b^{k}\right). \tag{1}$$

Zola et al. [116] have developed a word-embedded CNN model to address cross-domain issues encountered while performing SA. The model employs web sources such as Amazon and TripAdvisor, which contain easily labeled reviews, for fitting a sentiment prediction model. This model is later reused to classify the sentiment polarity of two unlabeled social media platforms: Twitter and Facebook. The authors have also explored various techniques such as POS tagging, stemming, undersampling, oversampling, and handling unlabeled sentiment data to reduce word sparsity.
Paredes-Valverde et al. [64] have proposed an approach based on Word2Vec for sentiment classification, which helps companies and organizations identify opportunities for improving the quality of their products and services.
Alharbi and de Doncker [67] have developed a CNN model that incorporates user behavioral details present in a document, such as a tweet, for SA. The authors have utilized two datasets provided by the SemEval-2016 Workshop to evaluate the model’s performance. This approach suggests that considering the content of a document or a tweet beyond its availability is advantageous in SA, as it provides the model with an in-depth understanding of the classification task.
Overall, CNN-based models have been shown to be effective for SA on Twitter data. While these models are designed to extract local features from the input and may have limitations in capturing long-term dependencies between words, recent studies have shown that incorporating attention mechanisms can improve their performance.

3) RNNs: The main drawback of CNNs is their inability to understand the relationships between sequences. In addition, the effectiveness of the CNN technique largely depends on choosing an appropriate window size of kernels [39]. CNN models assume that each input is unrelated to the output, which means they do not help in dealing with contextual dependencies present in the dataset. To address this, RNNs [40] draw on previous state information to handle contextual relationships and capture the temporal dependencies between words. These models use a hidden state that is updated at each time step, allowing them to capture the context and meaning of the words. Each word in the text is considered as a separate input at a given time $t$, and the previous hidden state information is employed to process the current input, as presented in Fig. 11.
At a particular time step $t$, the input vector $X_t$ and the output vector $h_{t-1}$ from the preceding time step are fed as inputs to the current RNN layer. The output for that time step is then computed from these two input vectors using (2)

$$h_t = \tanh(h_{t-1} W_h + X_t W_X + b). \tag{2}$$

While RNNs are great at learning sequential data, they cannot obtain local attributes in parallel. As a result, RNN models are complementary to CNN models since they maintain sequential information over time. LSTM and GRU are extensions of RNNs widely used for SA on Twitter.

4) LSTM: RNNs may suffer from exploding and vanishing gradient issues, which make it difficult to handle long-term contextual dependencies and fine-tune their parameters. This can lead to difficulty in training and remembering long-distance correlations in a sequential manner [39]. To resolve these issues, an LSTM, a variant of RNN, restructures the RNN by introducing a memory cell and gates to retain information for further utilization and updates [110]. By modifying the RNN layer, the LSTM model addresses both the exploding and vanishing gradient problems occurring in RNN models. LSTM models are beneficial for sentiment classification since they can capture both long- and short-term dependencies and have obtained notable results in this task. Also, these models are capable of solving time-series and sequential problems with remarkable outcomes.
As shown in Fig. 12, an LSTM cell with input feature $x_t$ receives input data $x$ at time $t$, and an input gate $i_t$ regulates the input data’s flow into the cell. The forget gate $f_t$ determines when to discard the contents of the cell’s internal state, while the output gate $o_t$ governs the flow of information to the output. Equations (3)–(8) summarize the cell function

$$i_t = \sigma(U_i x_t + W_i h_{t-1} + b_i) \tag{3}$$
$$f_t = \sigma(U_f x_t + W_f h_{t-1} + b_f) \tag{4}$$
$$o_t = \sigma(U_o x_t + W_o h_{t-1} + b_o) \tag{5}$$
$$g_t = \sigma(U_g x_t + W_g h_{t-1} + b_g) \tag{6}$$
$$c_t = g_t \odot i_t + f_t \odot c_{t-1} \tag{7}$$
$$h_t = o_t \odot \tanh(c_t). \tag{8}$$
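To make the cell function concrete, the following is a minimal plain-Python sketch of a single LSTM time step that follows (3)–(8) term by term. The function names (`affine`, `lstm_step`), the dictionary layout of the parameters, and the all-zero toy weights are assumptions made for illustration only. The candidate $g_t$ uses the sigmoid exactly as written in (6); many library implementations use $\tanh$ for this term instead.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(U, x, W, h, b):
    """Return U x + W h + b for matrices given as lists of rows."""
    return [
        sum(u * xi for u, xi in zip(U_row, x))
        + sum(w * hi for w, hi in zip(W_row, h))
        + bias
        for U_row, W_row, bias in zip(U, W, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; `params` maps a gate name to its (U, W, b) triple."""
    gates = {
        name: [sigmoid(v) for v in affine(U, x_t, W, h_prev, b)]
        for name, (U, W, b) in params.items()
    }
    i_t, f_t, o_t, g_t = gates["i"], gates["f"], gates["o"], gates["g"]  # (3)-(6)
    c_t = [g * i + f * c for g, i, f, c in zip(g_t, i_t, f_t, c_prev)]   # (7)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                   # (8)
    return h_t, c_t


# Hypothetical toy setup: input size 3, hidden size 2, all-zero parameters,
# so that every gate evaluates to sigmoid(0) = 0.5.
INPUT_SIZE, HIDDEN_SIZE = 3, 2
zero_U = [[0.0] * INPUT_SIZE for _ in range(HIDDEN_SIZE)]
zero_W = [[0.0] * HIDDEN_SIZE for _ in range(HIDDEN_SIZE)]
zero_b = [0.0] * HIDDEN_SIZE
params = {name: (zero_U, zero_W, zero_b) for name in ("i", "f", "o", "g")}

h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
```

Processing a tweet then amounts to calling `lstm_step` once per token embedding, feeding the returned `h_t` and `c_t` back in as `h_prev` and `c_prev`, and passing the final hidden state to a classifier.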
RNN, hierarchical attention network (HAN), and CNN for SA.
Jalil et al. [65] applied a hybrid model to analyze tweets collected on COVID-19 using various classifiers and feature sets. Jain et al. [66] suggested a hybrid CNN-LSTM model that uses word embedding to convert texts into vectors to classify the sentiments of the text. Wu et al. [136] proposed a hybrid approach to summarize opinions on Chinese microblogging systems using CNN and the Ortony, Clore, and Collins (OCC) model, which is a rule-based expert mechanism.
While the DL- and ML-hybrid models leverage the strengths of both DL and ML algorithms to achieve better SA performance, they still have limitations in terms of capturing nonlinear data complexity. Recently, new advancements have been made by exploring the field of knowledge graphs (KGs), graph neural networks (GNNs), capsule networks (CNs), and so on, which we have detailed in Section III-D.

D. Other Methods
The “other methods” category comprises KG-, GNN-, and CN-based approaches that utilize the Twitter graph’s properties and characteristics. While the GNNs are a subset of DL, their distinct characteristics in handling graph-structured data necessitate their separate classification. GNNs leverage the graph-based learning paradigm, which fundamentally differs from the standard feedforward learning approach used in most traditional DL models. In GNNs, each node in the graph is associated with a feature vector, and learning involves updating node representations by aggregating information from their neighboring nodes iteratively, where nodes represent users, hashtags, or words, and edges represent the complex, nonlinear relationships among the nodes [148], [149]. Unlike traditional DL models, such as CNNs for images or RNNs for sequences, which process fixed-sized inputs, GNNs operate directly on irregular graph structures. This recursive information propagation mechanism enables GNNs to capture complex patterns and dependencies within the data, especially in scenarios where traditional DL models struggle due to their fixed-sized input representations. Furthermore, unlike other approaches, these methods do not require large amounts of manually annotated data as they automatically collect annotated data using links between users and tweets, such as replies, followers, and previous tweets. However, they are domain-specific since the relationships and sentiment lexicon they use are tailored to the domain. These methods assume that sentiment and rating are interdependent, and they address the issue of existing approaches where positive sentiment can be expressed using words with negative connotations.
Li et al. [30] proposed a graph-based technique, DWWP, which includes domain-specific word detection (DW) and word propagation (WP) methods. DW handles new words invented by users and converts the sentiment of words using assembled mutual information (AMI), while WP includes semantic, statistical similarity information and manually calibrated sentiment scores that enhance the sentiment lexicon quality. Hussain and Cambria [31] analyzed knowledge-based reasoning using a vector space and SVM model, which utilizes lexical and graph representations for SA. Aflakparast et al. [150] proposed a Bayesian graphical model to examine Twitter data. Demotte et al. [56] presented a CN-based model that utilizes GloVe embeddings and dynamic/static routing to analyze social media content. Lovera et al. [7] developed a hybrid DL model with a KG to analyze sentiment in short texts, such as Twitter posts. Aflakparast et al. [150] concluded that the results obtained from graph-based methods are promising, but there is still some ambiguity regarding the relationship between identified clusters and actual ratings. Moreover, these methods can be computationally demanding and time-consuming, and may not necessarily lead to improved accuracy.
Table VI presents a summary of the advantages (pros) and disadvantages (cons) of Twitter SA methods based on different criteria.

IV. PUBLISHED DATA SOURCES AND TOOLS
This section details different versions of the Twitter dataset exploited in the existing literature and additional information about other similar datasets adopted for SA. Furthermore, it details the performance metrics used for the evaluation of the proposed approaches [51]. Moreover, the section highlights the diverse tools and libraries leveraged for SA tasks.

A. Dataset Description
In the presented work, we have broadly divided the various popular benchmark datasets used for SA into two categories: Twitter and other. Table VII summarizes the dataset details and provides information about the size, polarity, source, and publications that have utilized them. Twitter has emerged as a prominent platform for SA due to its large user base and the availability of real-time data. Researchers have used various versions of Twitter datasets for SA, ranging from general datasets to domain-specific datasets. One commonly used dataset is the Sentiment140 dataset [151], which contains 1.6 million tweets labeled as positive or negative. It has been widely used for SA research and benchmarking, and has been utilized to evaluate the performance of various DL models. Another common dataset is SemEval, which contains tweets related to specific events or topics. The dataset provides labels for three categories: positive, negative, and neutral. SemEval [13] datasets have been used for SA research and competition, providing a more challenging task due to the inclusion of neutral tweets. Several other domain-specific datasets include data related to politics, finance, and healthcare. These datasets provide a more targeted analysis of public sentiments within specific domains and can be useful for real-world applications.
It is a standard practice for researchers to use the Twitter dataset in SA tasks. However, these data have limitations such as short text (tweets have a character limit), noise, and the presence of sarcasm, irony, and slang, which can affect the accuracy of SA models. In addition, the use of prelabeled datasets may not always accurately represent the sentiment of the tweets, as the interpretation of sentiments can be very subjective.
To address these challenges, researchers have explored techniques for preprocessing and developing more robust SA models that can handle noisy and ambiguous data. In addition, researchers have explored the use of active and transfer learning techniques to improve the efficiency and accuracy of SA models with limited labeled data.

B. Evaluation Metrics
In addition to the gold standard evaluation metrics, such as precision [32], [62], [74], [142], recall [68], [72], [73], F_score [65], [69], [137], [143], area under the curve (AUC) [57], and receiver operating characteristic (ROC) curve [60], other performance metrics have been utilized in the literature to assess the performance of SA models. This section outlines some of the commonly used evaluation metrics, along with their computation formulas. Table VIII presents an overview of various performance metrics adopted in the literature to evaluate SA models.

1) Cohen’s kappa (CK) is a measure of interannotator agreement that accounts for chance agreement [57], [152], [153]. It is defined as follows:

$$\mathrm{CK} = \frac{P_o - P_e}{1 - P_e} \tag{16}$$

where $P_o$ is the observed agreement and $P_e$ is the expected agreement. $P_o$ is calculated as the proportion of times the annotators agree, while $P_e$ is calculated by summing, over all labels, the product of the annotators’ marginal proportions for that label.

2) Mean absolute error (MAE) is the average absolute difference between predicted and actual sentiment scores [34], [154]. It is defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \tilde{y}_i \rvert \tag{17}$$

where $n$ is the number of instances, and $y_i$ and $\tilde{y}_i$ are the actual and predicted sentiment scores for instance $i$, respectively.

3) Root mean square error (RMSE) is the square root of the average squared difference between predicted and actual sentiment scores [154], [155]. It is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \tilde{y}_i)^{2}} \tag{18}$$

where $n$ is the number of instances, and $y_i$ and $\tilde{y}_i$ are the actual and predicted sentiment scores for instance $i$, respectively.

4) Spearman’s rank correlation coefficient ($\rho$) measures the strength of the association between predicted and actual sentiment scores, taking into account the rank order of the scores rather than their absolute values [156]. It is defined as follows:

$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)} \tag{19}$$

where $n$ is the number of instances and $d_i$ is the difference between the rank of the predicted sentiment score and the rank of the actual sentiment score for the $i$th instance.

5) Kendall’s tau ($\tau$) is a measure of the strength of the association between predicted and actual sentiment scores, taking into account the number of pairwise disagreements [157]. It is defined as follows:

$$\tau = \frac{n_c - n_d}{n\,(n - 1)/2} \tag{20}$$

where $n$ is the number of instances, $n_c$ is the number of concordant pairs, and $n_d$ is the number of discordant pairs.

6) Hamming loss (HL) measures the fraction of labels that are incorrectly predicted for a given set of instances [57], [158]. It is defined as follows:

$$\mathrm{HL} = \frac{1}{n}\sum_{i=1}^{n} L(h(x_i), y_i) \tag{21}$$

where $n$ is the number of instances, $h(x_i)$ is the set of predicted labels for instance $i$, $y_i$ is the set of true labels for instance $i$, and $L$ is the loss function, which is typically defined as the number of labels that differ between the predicted and true labels. In SA, each instance (e.g., a tweet or a review) can be associated with multiple sentiment labels, such as positive, negative, neutral, or a combination of these. HL is used to evaluate the accuracy of a multilabel classifier in predicting the correct sentiment labels for each instance.

7) Jaccard index (JI), also known as the Jaccard similarity coefficient or Jaccard similarity index, is an evaluation metric used in SA and other natural language processing tasks to measure the similarity between two sets of labels [62]. It is defined as follows:

$$\mathrm{JI} = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert} \tag{22}$$

where $A$ is the set of labels assigned by the model, $B$ is the set of true labels, and $\lvert A \cap B \rvert$ and $\lvert A \cup B \rvert$ represent the size of the intersection and the union of $A$ and $B$, respectively. In SA, the JI is used to evaluate the overlap between the predicted sentiment labels and the true sentiment labels for each instance (e.g., a tweet or a review). Higher JI values indicate better performance.

8) Matthews correlation coefficient (MCC) is an evaluation metric used in SA and other classification tasks to measure the quality of the predictions made by a model. It takes into account true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) [57], [159]. It is defined as follows:

$$\mathrm{MCC} = \frac{(\mathrm{TP} \times \mathrm{TN}) - (\mathrm{FP} \times \mathrm{FN})}{\sqrt{P \times Q \times R \times S}} \tag{23}$$
where $P$, $Q$, $R$, and $S$ denote (TP + FP), (TP + FN), (TN + FP), and (TN + FN), respectively. MCC ranges from −1 (total disagreement between the predictions and the true labels) to 1 (perfect agreement between the predictions and the true labels), where higher values indicate better performance. In SA, MCC is used to evaluate the overall performance of a binary classifier in predicting the correct sentiment label for each instance (e.g., a tweet or a review).

9) Logarithmic loss (log loss) is used in binary classification tasks to measure the performance of a probabilistic classifier in predicting the correct label for each instance. It calculates the difference between the predicted probabilities and the true binary labels, and penalizes high-confidence wrong predictions more than low-confidence ones [57]. It is defined as follows:

$$\text{Log Loss} = -\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\bigr) \tag{24}$$

where $n$ is the total number of instances, $y_i$ is the true binary label (0 or 1) of instance $i$, and $p_i$ is the predicted probability of the positive class (i.e., the sentiment label “positive”) for that instance. The leading minus sign makes the loss nonnegative, so lower values indicate better performance.

C. Resources and Tools for SA
Over the past few years, SA on Twitter using DL has gained substantial attention. This has led to the development of various software tools and libraries that can be used to implement and evaluate these models. In this section, the software tools and libraries commonly used in the state of the art are compiled, as presented in Table IX. The majority of the implementations use Python 3.x, along with popular DL libraries such as PyTorch, Keras, and TensorFlow. In addition, some implementations also use the MATLAB platform.

1) Natural Language Toolkit (NLTK) is a Python library that offers a comprehensive suite of tools and resources for natural language processing, including functions for tokenization, stemming, SA, and text classification.
2) Scikit-learn is a Python-based ML library that offers a wide range of supervised and unsupervised learning algorithms. It provides tools for text classification, SA, and feature extraction.
3) TensorFlow is an open-source ML library created by Google that provides a wide range of tools for building and training DL models, such as CNNs, RNNs, and transformers.
4) PyTorch is an open-source ML library developed by Facebook. It offers tools for building and training DL models, including CNNs, RNNs, and transformers. It has gained popularity due to its user-friendliness and flexibility.
5) Keras is a high-level DL library that provides a user-friendly API for building and training DL models. It is built on top of TensorFlow and simplifies the process of building complex models.
6) Gensim is a Python library used for topic modeling and natural language processing. It provides tools for text preprocessing, topic modeling, and similarity calculation.
7) Word2vec is a well-known algorithm used for word embeddings, generating dense vector representations of words that can be utilized as input to DL models for SA.
8) Global Vectors for Word Representation (GloVe) is another widely used algorithm for word embeddings. It relies on co-occurrence statistics and produces dense vector representations of words that capture semantic relationships.

The aforementioned software tools and libraries provide a range of functionalities and resources for DL-based Twitter SA. They are typically employed for preprocessing, feature extraction, model building, and evaluation. Nevertheless, tools and libraries should be selected carefully based on the research question and the problem at hand.

V. PRACTICAL APPLICATIONS AND RELATED CASE STUDIES
SA has garnered significant attention from the research community due to its diverse use cases, such as in the social media, business, politics, healthcare, and tourism domains, as illustrated in Fig. 16. This section highlights the various practical business applications of SA through related real-world case studies.

1) Brand Reputation Management: SA can be used to monitor the online reputation of brands. A company can leverage it to monitor social media and determine how customers are reacting to its products or services. This information can help the company improve its marketing strategies, create targeted campaigns, and ultimately increase sales. For instance, a study conducted by Ahmed and Rodríguez-Díaz [160] used SA to monitor the online reputation of various airlines. The study analyzed tweets containing airline names and categorized the sentiments as positive, negative, or neutral. The results provided insights to airlines for reputation management and for improving their services. Another case study is the analysis of customer reviews for online food delivery apps to identify areas of improvement in their services [161].
2) Structure Marketing Strategies: SA can be useful in formulating marketing strategies and marketing forecasting. A study by Lehrer et al. [162] suggests a DL-based technique to evaluate the polarity of sentiments on Twitter at an hourly rate. The proposed method considers mixed data sampling, resulting in a lower reduction of past data, which makes it highly appropriate for this novel source of data.
3) Political Opinion Mining: Recently, SA has been increasingly used in politics to monitor public opinions and identify the sentiment behind political campaigns. It is useful for understanding and modeling voter behavior during political campaigns or activism and can even indicate the outcome of an election. In a case study by Chandra and Saini [132], conducted on the U.S.
TABLE VII
SUMMARY OF DATASETS
by Jain et al. [66] to classify the sentiment of customer reviews to further recommend user-personalized products. Preethi et al. [165] developed an RNN to analyze sentiments in reviews and improve movie and restaurant recommendations. In addition, SA can also aid in behavioral analysis in commodity markets [166].

VI. RESEARCH GAPS AND FUTURE PERSPECTIVES
Though DL models have shown significant evolution and excellent outcomes in the area of SA, there exist several research gaps and open challenges that need further exploration. This section discusses the current research gaps and potential future directions for SA research.

1) Decision-Making Tool: DL models find their usage in various industries, including marketing, service, government, and academia, to analyze sentiment in decision-making problems. These models can be modified and adopted to achieve high accuracy, taking into account the complexities of textual analysis for practical applications. Numerous studies indicate that noisy features may negatively impact classification outcomes; hence, DL methods can be designed to optimize features in an iterative process [7], [38], [120]. In addition, the models can be improved to perform opinion mining, SA, and topic detection simultaneously.
2) Processing Short Sequences: Dealing with short sequences of social media text that have varying content and background information is a challenging task. When it comes to processing such short sequences, dynamic routing is not as effective as static routing algorithms due to the variability of background details. However, this issue can be addressed by using attention-based CNs [56] along with dynamic routing algorithms to extract relations for text content processing and SA. Moreover, integrating contextual embedding with capsule-based models can lead to better performance, as this technique has proven effective in most DL approaches.
3) Handling Large Datasets: One of the research gaps in SA on Twitter is the need to handle large datasets. DL models have shown promising results in SA on Twitter, but they require large datasets for effective optimization of the model parameters. The current state-of-the-art methods for SA on Twitter, such as those presented in [68] and [69], can be improved by applying these models to larger datasets. Therefore, future research could focus on developing new methods to handle large datasets that can improve the performance of SA models on Twitter.
4) Handling Data Sparseness: Handling data sparseness refers to the challenge of building effective SA models when the dataset is limited or incomplete. DL models have shown promising results in SA tasks, but they require large datasets to perform well and optimize their parameters [116]. Reducing this dependence on large datasets may make it possible to address the issue of data sparsity and improve the accuracy and generalization ability of SA models on Twitter.
5) Limited Attention to Domain-Specific SA: The focus of SA research has been predominantly on general SA models, with limited attention paid to domain-specific SA. There is a pressing need for models that can effectively analyze sentiment in specialized fields, such as medical [163], financial [141], or legal data [68]. Future research should explore methods for developing domain-specific SA models that can accurately capture the nuances of sentiment within these specific contexts.
6) Robustness and Reliability of Models: The robustness and reliability of SA models are an important area of research that needs to be addressed. The existing models are not robust enough to handle sarcasm, irony, and figurative language, which are prevalent on social media platforms, especially Twitter. The models often misinterpret these nuances and produce inaccurate results. Therefore, future research should focus on developing more robust models that can handle these language intricacies and improve the reliability of SA results [64], [74]. In addition, research can also explore the impact of linguistic and cultural differences on the accuracy of SA models.
7) Interpretability of Models: Interpretability refers to the ability to understand the reasoning behind a model’s predictions. Most of the DL-based SA models are considered black-box models because they operate on complex computations and are difficult to interpret [37], [139], [141]. This makes it challenging for users to trust the model’s predictions and understand how they were generated. In recent years, there has been a growing interest in developing more transparent models, also known as explainable AI, which can provide insight into the reasoning behind their predictions. This approach could help increase trust in the model and improve its usefulness by allowing users to understand and potentially correct any biases or errors. Therefore, there is a need for research to develop more transparent SA models that can provide explanations for their predictions.
8) Performance Measures: Current evaluation metrics used in SA research focus mainly on accuracy [32], [38]. There is a need for more comprehensive evaluation metrics that take into account the nuances of SA, other than gold standard performance measures, such as precision, recall, and F1-score [51].
9) Incorporating User Feedback: Developing SA models that can learn from user feedback by identifying user patterns and that can adapt to dynamic user preferences may improve the accuracy of SA models. In addition, researchers can explore the use of interactive SA tools that allow users to provide feedback in real time, enabling the model to adapt to changing sentiments and preferences [60], [69].
10) Integrating Multiple Modalities: SA has traditionally been limited to analyzing only textual data, but the incorporation of multiple modalities such as audio, video, and images can provide richer information for SA [137]. Future research can focus on developing more advanced multimodal SA models that can integrate
multiple modalities that could better capture the complexity and variability of human emotions and potentially provide more accurate and nuanced results in real-world settings.
VII. CONCLUSION
To sum up, the use of DL for SA on Twitter has been widely recognized as an important research field due to the vast amount of user-generated content. This work provides a comprehensive overview of the latest advances in DL techniques for SA on Twitter. It outlines various preprocessing steps and word embeddings required for this task and presents a simplified taxonomy that bifurcates the literature into two categories: conventional (lexicon and ML) and DL approaches, along with their respective pros and cons. In addition, this article summarizes various practical applications of SA and identifies research gaps and domain-specific challenges. Furthermore, the metrics adopted by different studies to evaluate the models’ performance are also reviewed.
Overall, DL-based methods have shown great promise in SA on Twitter, as they can capture complex language patterns and handle the noise and sparsity of data. Moreover, techniques such as fine-tuning and transfer learning have proved effective in adapting pretrained models to Twitter-specific datasets. However, there are still several challenges that need to be addressed in SA on Twitter. One of the main challenges is dealing with the noise and complexity of Twitter data, such as short text, spelling mistakes, abbreviations, slang, and emojis. Another challenge is the imbalance of sentiments in data, with more instances of neutral or negative sentiments than of positive sentiments. Also, the integration of SA with other NLP techniques, such as entity recognition and summarization, could provide further insights and improve the performance of these models. In essence, this survey demonstrates that DL methods have advanced SA on Twitter, and there is still much room for further improvement.
REFERENCES
[1] N. Girdhar and K. K. Bharadwaj, “Signed social networks: A survey,” in Advances in Computing and Data Sciences. Ghaziabad, India: Springer, 2017, pp. 326–335.
[2] N. Girdhar and K. K. Bharadwaj, “Community detection in signed social networks using multiobjective genetic algorithm,” J. Assoc. Inf. Sci. Technol., vol. 70, no. 8, pp. 788–804, Aug. 2019.
[3] N. Girdhar, S. Minz, and K. K. Bharadwaj, “Link prediction in signed social networks based on fuzzy computational model of trust and distrust,” Soft Comput., vol. 23, no. 22, pp. 12123–12138, Nov. 2019.
[4] N. Girdhar and K. K. Bharadwaj, “Social status computation for nodes of overlapping communities in directed signed social networks,” in Integrated Intelligent Computing, Communication and Security. Singapore: Springer, 2019, pp. 49–57.
[5] N. Girdhar and K. K. Bharadwaj, “Friends recommender system based on status (StatusFRS) for users of overlapping communities in directed,” in Applications of Artificial Intelligence Techniques in Engineering: SIGMA, vol. 698. Singapore: Springer, 2018, p. 225.
[6] N. Girdhar and K. K. Bharadwaj, “Mining of influencers in signed social networks: A memetic approach,” in Intelligent Human Computer Interaction. Allahabad, India: Springer, 2018, pp. 306–316.
[7] F. A. Lovera, Y. C. Cardinale, and M. N. Homsi, “Sentiment analysis in Twitter based on knowledge graph and deep learning classification,” Electronics, vol. 10, no. 22, p. 2739, Nov. 2021.
[8] E. Cambria, “Affective computing and sentiment analysis,” IEEE Intell. Syst., vol. 31, no. 2, pp. 102–107, Mar. 2016.
[9] E. Chu and D. Roy, “Audio-visual sentiment analysis for learning emotional arcs in movies,” in Proc. IEEE Int. Conf. Data Mining (ICDM), Nov. 2017, pp. 829–834.
[10] J. C. De Albornoz, L. Plaza, P. Gervás, and A. Díaz, “A joint model of feature mining and sentiment analysis for product review rating,” in Advances in Information Retrieval. Dublin, Ireland: Springer, 2011, pp. 55–66.
[11] M. A. Mirtalaie, O. K. Hussain, E. Chang, and F. K. Hussain, “Sentiment analysis of specific product’s features using product tree for application in new product development,” in Advances in Intelligent Networking and Collaborative Systems. Berlin, Germany: Springer, 2018, pp. 82–95.
[12] D. J. S. Oliveira, P. H. D. S. Bermejo, and P. A. dos Santos, “Can social media reveal the preferences of voters? A comparison between sentiment analysis and traditional opinion polls,” J. Inf. Technol. Politics, vol. 14, no. 1, pp. 34–45, Jan. 2017.
[13] S. Rosenthal, N. Farra, and P. Nakov, “SemEval-2017 task 4: Sentiment analysis in Twitter,” 2019, arXiv:1912.00741.
[14] V. S. Pagolu, K. N. Reddy, G. Panda, and B. Majhi, “Sentiment analysis of Twitter data for predicting stock market movements,” in Proc. Int. Conf. Signal Process., Commun., Power Embedded Syst. (SCOPES), Oct. 2016, pp. 1345–1350.
[15] T. Chalothorn and J. Ellman, “Using SentiWordNet and sentiment analysis for detecting radical content on web forums,” Northumbria Univ. Newcastle, Newcastle upon Tyne, U.K., Tech. Rep., 2012. [Online]. Available: https://nrl.northumbria.ac.uk/id/eprint/13075/
[16] N. D. Gitari, Z. Zhang, H. Damien, and J. Long, “A lexicon-based approach for hate speech detection,” Int. J. Multimedia Ubiquitous Eng., vol. 10, no. 4, pp. 215–230, Apr. 2015.
[17] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-based methods for sentiment analysis,” Comput. Linguistics, vol. 37, no. 2, pp. 267–307, Jun. 2011.
[18] V. A. Kharde and P. Sheetal. Sonawane, “Sentiment analysis of Twitter data: A survey of techniques,” 2016, arXiv:1601.06971.
[19] M. A. Al-Khalisy and H. B. Jehlol, “Terrorist affiliations identifying through Twitter social media analysis using data mining and web mapping techniques,” J. Eng. Appl. Sci., vol. 13, no. 17, pp. 7459–7464, 2018.
[20] E. Ferrara, W.-Q. Wang, O. Varol, A. Flammini, and A. Galstyan, “Predicting online extremism, content adopters, and interaction reciprocity,” in Social Informatics. Bellevue, WA, USA: Springer, 2016, pp. 22–39.
[21] M. Kumar, R. Bhatia, and D. Rattan, “A survey of web crawlers for information retrieval,” WIREs Data Mining Knowl. Discovery, vol. 7, no. 6, p. e1218, Nov. 2017.
[22] S. Mansour, “Social media analysis of user’s responses to terrorism using sentiment analysis and text mining,” Proc. Comput. Sci., vol. 140, pp. 95–103, Jan. 2018.
[23] T. Simon, A. Goldberg, L. Aharonson-Daniel, D. Leykin, and B. Adini, “Twitter in the cross fire—The use of social media in the westgate mall terror attack in Kenya,” PLoS ONE, vol. 9, no. 8, Aug. 2014, Art. no. e104136.
[24] Y. Wei, L. Singh, and S. Martin, “Identification of extremism on Twitter,” in Proc. IEEE/ACM Int. Conf. Adv. Social Netw. Anal. Mining (ASONAM), Aug. 2016, pp. 1251–1255.
[25] B. S. Iskandar, “Terrorism detection based on sentiment analysis using machine learning,” J. Eng. Appl. Sci., vol. 12, no. 3, pp. 691–698, 2017.
[26] T. B. Mirani and S. Sasi, “Sentiment analysis of ISIS related tweets using absolute location,” in Proc. Int. Conf. Comput. Sci. Comput. Intell. (CSCI), Dec. 2016, pp. 1140–1145.
[27] I. Fadel and C. Öz, “A sentiment analysis model for terrorist attacks reviews on Twitter,” Sakarya Univ. J. Sci., vol. 24, no. 6, pp. 1294–1302, Dec. 2020.
[28] L. G. E. Smith, L. Wakeford, T. F. Cribbin, J. Barnett, and W. K. Hou, “Detecting psychological change through mobilizing interactions and changes in extremist linguistic style,” Comput. Hum. Behav., vol. 108, Jul. 2020, Art. no. 106298.
[29] A. Aleroud, N. Abu-Alsheeh, and E. Al-Shawakfa, “A graph proximity feature augmentation approach for identifying accounts of terrorists on Twitter,” Comput. Secur., vol. 99, Dec. 2020, Art. no. 102056.
[30] W. Li, K. Guo, Y. Shi, L. Zhu, and Y. Zheng, “DWWP: Domain-specific new words detection and word propagation system for sentiment analysis in the tourism domain,” Knowl.-Based Syst., vol. 146, pp. 203–214, Apr. 2018.
[31] A. Hussain and E. Cambria, “Semi-supervised learning for big social data analysis,” Neurocomputing, vol. 275, pp. 1662–1673, Jan. 2018.
[32] H. Ali, H. Farman, H. Yar, Z. Khan, S. Habib, and A. Ammar, “Deep learning-based election results prediction using Twitter activity,” Soft Comput., vol. 26, no. 16, pp. 7535–7543, Aug. 2022.
[33] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[34] M. Yasir et al., “An efficient deep learning based model to predict interest rate using Twitter sentiment,” Sustainability, vol. 12, no. 4, p. 1660, Feb. 2020.
[35] F. Y. Zhou, L. P. Jin, and J. Dong, “A review of convolutional neural networks,” J. Comput., vol. 40, pp. 1229–1251, 2017. [Online]. Available: https://www.scirp.org/(S(vtj3fa45qm1ean45%20vvffcz55))/reference/referencespapers.aspx?referenceid=3111121
[36] Y. Li and H. Dong, “Text emotion analysis based on CNN and BiLSTM network feature fusion,” Comput. Appl., vol. 38, no. 11, pp. 29–34, 2018.
[37] A. Feizollah, S. Ainin, N. B. Anuar, N. A. B. Abdullah, and M. Hazim, “Halal products on Twitter: Data extraction and sentiment analysis using stack of deep learning algorithms,” IEEE Access, vol. 7, pp. 83354–83362, 2019.
[38] B. Wang, D. Shan, A. Fan, L. Liu, and J. Gao, “A sentiment classification method of web social media based on multidimensional and multilevel modeling,” IEEE Trans. Ind. Informat., vol. 18, no. 2, pp. 1240–1249, Feb. 2022.
[39] G. Xu, Y. Meng, X. Qiu, Z. Yu, and X. Wu, “Sentiment analysis of comment texts based on BiLSTM,” IEEE Access, vol. 7, pp. 51522–51532, 2019.
[40] A. Sherstinsky, “Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network,” Phys. D, Nonlinear Phenomena, vol. 404, Mar. 2020, Art. no. 132306.
[41] G. Liu and J. Guo, “Bidirectional LSTM with attention mechanism and convolutional layer for text classification,” Neurocomputing, vol. 337, pp. 325–338, Apr. 2019.
[42] A. Drif and K. Hadjoudj, “An opinion spread prediction model with Twitter emotion analysis during Algeria’s Hirak,” Comput. J., vol. 64, no. 1, pp. 358–368, Nov. 2019.
[43] E. Zhu, J. Wu, H. Liu, and K. Li, “A sentiment index of the housing market: Text mining of narratives on social media,” Tsinghua Univ., China, Apr. 2018. [Online]. Available: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3223566
[44] A. S. Imran, S. M. Daudpota, Z. Kastrati, and R. Batra, “Cross-cultural polarity and emotion detection using sentiment analysis and deep learning on COVID-19 related tweets,” IEEE Access, vol. 8, pp. 181074–181090, 2020.
[45] A. Mittal and S. Patidar, “Sentiment analysis on Twitter data: A survey,” in Proc. 7th Int. Conf. Comput. Commun. Manag., 2019, pp. 91–95.
[46] N. F. F. D. Silva, L. F. S. Coletta, and E. R. Hruschka, “A survey and comparative study of tweet sentiment analysis via semi-supervised learning,” ACM Comput. Surv., vol. 49, no. 1, pp. 1–26, Mar. 2017.
[47] N. Azzouza, K. Akli-Astouati, A. Oussalah, and S. A. Bachir, “A real-time Twitter sentiment analysis using an unsupervised method,” in Proc. 7th Int. Conf. Web Intell., Mining Semantics, Jun. 2017, pp. 1–10.
[48] A. Ligthart, C. Catal, and B. Tekinerdogan, “Systematic reviews in sentiment analysis: A tertiary study,” Artif. Intell. Rev., vol. 54, no. 7, pp. 4997–5053, Oct. 2021.
[49] M. Wankhade, A. C. S. Rao, and C. Kulkarni, “A survey on sentiment analysis methods, applications, and challenges,” Artif. Intell. Rev., vol. 55, no. 7, pp. 5731–5780, Oct. 2022.
[50] R. Das and T. D. Singh, “Multimodal sentiment analysis: A survey of methods, trends, and challenges,” ACM Comput. Surv., vol. 55, no. 13s, pp. 1–38, Dec. 2023.
[51] A. Yadav and D. K. Vishwakarma, “Sentiment analysis using deep learning architectures: A review,” Artif. Intell. Rev., vol. 53, no. 6, pp. 4335–4385, Aug. 2020.
[52] S. Sharma and A. Jain, “Role of sentiment analysis in social media security and analytics,” WIREs Data Mining Knowl. Discovery, vol. 10, no. 5, p. e1366, Sep. 2020.
[53] S. Soni and A. Sharaff, “Sentiment analysis of customer reviews based on hidden Markov model,” in Proc. Int. Conf. Adv. Res. Comput. Sci. Eng. Technol. (ICARCSET), Mar. 2015, pp. 1–5.
[54] P. N. Jain and A. S. Vaidya, “Analysis of social media based on terrorism-a review,” Vietnam J. Comput. Sci., vol. 8, no. 1, pp. 1–21, 2021.
[55] N. Mehra, S. Khandelwal, and P. Patel, “Sentiment identification using maximum entropy analysis of movie reviews,” Stanford Univ., Stanford, CA, USA, 2002. [Online]. Available: https://www.semanticscholar.org/paper/Sentiment-Identification-Using-Maximum-Entropy-of-Mehra-Khandelwal/bd5b49e12cdc1b922a28b122acf528199be4bf8d
[56] P. Demotte, K. Wijegunarathna, D. Meedeniya, and I. Perera, “Enhanced sentiment extraction architecture for social media content analysis using capsule networks,” Multimedia Tools Appl., vol. 82, pp. 8665–8690, Sep. 2021.
[57] T. Swathi, N. Kasiviswanath, and A. A. Rao, “An optimal deep learning-based LSTM for stock price prediction using Twitter sentiment analysis,” Int. J. Speech Technol., vol. 52, no. 12, pp. 13675–13688, Sep. 2022.
[58] P. Mehta, S. Pandya, and K. Kotecha, “Harvesting social media sentiment analysis to enhance stock market prediction using deep learning,” PeerJ Comput. Sci., vol. 7, p. e476, Apr. 2021.
[59] A. A. Reshi et al., “COVID-19 vaccination-related sentiments analysis: A case study using worldwide Twitter dataset,” Healthcare, vol. 10, no. 3, p. 411, 2022.
[60] H. A. Shehu et al., “Deep sentiment analysis: A case study on stemmed Turkish Twitter data,” IEEE Access, vol. 9, pp. 56836–56854, 2021.
[61] S. Ahmad, M. Z. Asghar, F. M. Alotaibi, and I. Awan, “Detection and classification of social media-based extremist affiliations using sentiment analysis techniques,” Hum.-centric Comput. Inf. Sci., vol. 9, no. 1, pp. 1–23, Dec. 2019.
[62] M. Jabreel and A. Moreno, “A deep learning-based approach for multi-label emotion classification in tweets,” Appl. Sci., vol. 9, no. 6, p. 1123, Mar. 2019.
[63] J. G. D. Harb, R. Ebeling, and K. Becker, “A framework to analyze the emotional reactions to mass violent events on Twitter and influential factors,” Inf. Process. Manag., vol. 57, no. 6, Nov. 2020, Art. no. 102372.
[64] M. A. Paredes-Valverde, R. Colomo-Palacios, M. D. P. Salas-Zarate, and R. Valencia-García, “Sentiment analysis in Spanish for improvement of products and services: A deep learning approach,” Sci. Programming, vol. 2017, Oct. 2017, Art. no. 1329281.
[65] Z. Jalil et al., “COVID-19 related sentiment analysis using state-of-the-art machine learning and deep learning techniques,” Frontiers Public Health, vol. 9, p. 2276, Jan. 2022.
[66] P. K. Jain, V. Saravanan, and R. Pamula, “A hybrid CNN-LSTM: A deep learning approach for consumer sentiment analysis using qualitative user-generated contents,” ACM Trans. Asian Low-Resource Lang. Inf. Process., vol. 20, no. 5, pp. 1–15, Sep. 2021.
[67] A. S. M. Alharbi and E. de Doncker, “Twitter sentiment analysis with a deep neural network: An enhanced approach using user behavioral information,” Cognit. Syst. Res., vol. 54, pp. 50–61, May 2019.
[68] M. Alotaibi, B. Alotaibi, and A. Razaque, “A multichannel deep learning framework for cyberbullying detection on social media,” Electronics, vol. 10, no. 21, p. 2664, Oct. 2021.
[69] G. Abdalla and F. Özyurt, “Sentiment analysis of fast food companies with deep learning models,” Comput. J., vol. 64, no. 1, pp. 383–390, Nov. 2019.
[70] M. Umer, I. Ashraf, A. Mehmood, S. Kumari, S. Ullah, and G. S. Choi, “Sentiment analysis of tweets using a unified convolutional neural network-long short-term memory network model,” Comput. Intell., vol. 37, no. 1, pp. 409–434, Feb. 2021.
[71] C. P. D. Cyril, J. R. Beulah, N. Subramani, P. Mohan, A. Harshavardhan, and D. Sivabalaselvamani, “An automated learning model for sentiment analysis and data classification of Twitter data using balanced CA-SVM,” Concurrent Eng., vol. 29, no. 4, pp. 386–395, Dec. 2021.
[72] A. R. Pathak, M. Pandey, and S. Rautaray, “Topic-level sentiment analysis of social media data using deep learning,” Appl. Soft Comput., vol. 108, Sep. 2021, Art. no. 107440.
[73] C. I. Eke, A. A. Norman, and L. Shuib, “Context-based feature technique for sarcasm identification in benchmark datasets using deep learning and BERT model,” IEEE Access, vol. 9, pp. 48501–48518, 2021.
[74] J. Bedi and D. Toshniwal, “CitEnergy: A BERT based model to analyse Citizens’ energy-tweets,” Sustain. Cities Soc., vol. 80, May 2022, Art. no. 103706.
[75] Y. Liu and M. Zhang, Neural Network Methods for Natural Language Processing. Cambridge, MA, USA: MIT Press, 2018.
[76] E. Rudkowsky, M. Haselmayer, M. Wastian, M. Jenny, S. Emrich, and M. Sedlmair, “More than bags of words: Sentiment analysis with word embeddings,” Commun. Methods Measures, vol. 12, nos. 2–3, pp. 140–157, Apr. 2018.
[77] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” 2013, arXiv:1301.3781.
[78] Y. Goldberg and O. Levy, “word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method,” 2014, arXiv:1402.3722.
[79] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Proc. Adv. Neural Inf. Process. Syst., vol. 26, 2013, pp. 1–9.
[80] J. Pennington, R. Socher, and C. Manning, “GloVe: Global vectors for word representation,” in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1532–1543.
[81] T. Shi and Z. Liu, “Linking GloVe with word2vec,” 2014, arXiv:1411.5595.
[82] P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Trans. Assoc. Comput. Linguistics, vol. 5, pp. 135–146, Dec. 2017.
[83] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, “FastText.Zip: Compressing text classification models,” 2016, arXiv:1612.03651.
[84] J. Xu and Q. Du, “A deep investigation into fastText,” in Proc. IEEE 21st Int. Conf. High Perform. Comput. Commun., IEEE 17th Int. Conf. Smart City, IEEE 5th Int. Conf. Data Sci. Syst. (HPCC/SmartCity/DSS), Aug. 2019, pp. 1714–1719.
[85] M. E. Peters et al., “Deep contextualized word representations,” 2018, arXiv:1802.05365.
[86] F. Sebastiani and A. Esuli, “SentiWordNet: A publicly available lexical resource for opinion mining,” in Proc. 5th Int. Conf. Lang. Resour. Eval. Genoa, Italy: European Language Resources Association (ELRA), 2006, pp. 417–422.
[87] F. Å. Nielsen, “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs,” 2011, arXiv:1103.2903.
[88] C. Hutto and E. Gilbert, “VADER: A parsimonious rule-based model for sentiment analysis of social media text,” in Proc. Int. AAAI Conf. Web Social Media, 2014, vol. 8, no. 1, pp. 216–225.
[89] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis algorithms and applications: A survey,” Ain Shams Eng. J., vol. 5, no. 4, pp. 1093–1113, Dec. 2014.
[90] A. Jurek, M. D. Mulvenna, and Y. Bi, “Improved lexicon-based sentiment analysis for social media analytics,” Secur. Informat., vol. 4, no. 1, pp. 1–13, Dec. 2015.
[91] D. Pinto, A. McCallum, X. Wei, and W. B. Croft, “Table extraction using conditional random fields,” in Proc. 26th Annu. Int. ACM SIGIR Conf. Res. Develop. Information Retr., Jul. 2003, pp. 235–242.
[92] M. Rezwanul, A. Ali, and A. Rahman, “Sentiment analysis on Twitter data using KNN and SVM,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 6, pp. 1–7, 2017.
[93] L. Zhang, R. Ghosh, M. Dekhil, M. Hsu, and B. Liu, “Combining lexicon-based and learning-based methods for Twitter sentiment analysis,” HP Lab., Tech. Rep. HPL-2011, 2011, pp. 1–8.
[94] P. Palanisamy, V. Yadav, and H. Elchuri, “Serendio: Simple and practical lexicon based approach to sentiment analysis,” in Proc. 2nd Joint Conf. Lexical Comput. Semantics, 7th Int. Workshop Semantic Eval., 2013, pp. 543–548.
[95] B. Agarwal, S. Poria, N. Mittal, A. Gelbukh, and A. Hussain, “Concept-level sentiment analysis with dependency-based semantic parsing: A novel approach,” Cognit. Comput., vol. 7, no. 4, pp. 487–499, Aug. 2015.
[96] R. D. Desai, “Sentiment analysis of Twitter data,” in Proc. 2nd Int. Conf. Intell. Comput. Control Syst. (ICICCS), Jun. 2018, pp. 114–117.
[97] M. Mashuri, “Sentiment analysis in Twitter using lexicon based and polarity multiplication,” in Proc. Int. Conf. Artif. Intell. Inf. Technol. (ICAIIT), Mar. 2019, pp. 365–368.
[98] S. Rani and J. Singh, “Sentiment analysis of tweets using support vector machine,” Int. J. Comput. Sci. Mobile Appl., vol. 5, no. 10, pp. 83–91, 2017.
[99] A. L. F. Alves, C. D. S. Baptista, A. A. Firmino, M. G. D. Oliveira, and A. C. D. Paiva, “A comparison of SVM versus naive-bayes techniques for sentiment analysis in tweets: A case study with the 2013 FIFA confederations cup,” in Proc. 20th Brazilian Symp. Multimedia Web, Brazil, Nov. 2014, pp. 123–130.
[100] H. Wu, J. Li, and J. Xie, “Maximum entropy-based sentiment analysis of online product reviews in Chinese,” in Automotive, Mechanical and Electrical Engineering. Boca Raton, FL, USA: CRC Press, 201, pp. 559–562.
[101] E. Omer, “Using machine learning to identify Jihadist messages on Twitter,” Uppsala Univ., Uppsala, Sweden, 2015. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:846343/FULLTEXT01.pdf
[102] L. Kaati, E. Omer, N. Prucha, and A. Shrestha, “Detecting multipliers of jihadism on Twitter,” in Proc. IEEE Int. Conf. Data Mining Workshop (ICDMW), Nov. 2015, pp. 954–960.
[103] M. Nouh, J. R. C. Nurse, and M. Goldsmith, “Understanding the radical mind: Identifying signals to detect extremist content on Twitter,” in Proc. IEEE Int. Conf. Intell. Secur. Informat. (ISI), Jul. 2019, pp. 98–103.
[104] A. Omar, T. M. Mahmoud, T. Abd-El-Hafeez, and A. Mahfouz, “Multi-label Arabic text classification in online social networks,” Inf. Syst., vol. 100, Sep. 2021, Art. no. 101785.
[105] A. P. Jain and P. Dandannavar, “Application of machine learning techniques to sentiment analysis,” in Proc. 2nd Int. Conf. Appl. Theor. Comput. Commun. Technol. (iCATccT), Jul. 2016, pp. 628–632.
[106] M. Hartung, R. Klinger, F. Schmidtke, and L. Vogel, “Identifying right-wing extremism in German Twitter profiles: A classification approach,” in Natural Language Processing and Information Systems. Liège, Belgium: Springer, 2017, pp. 320–325.
[107] Z. Ul Rehman et al., “Understanding the language of ISIS: An empirical approach to detect radical content on Twitter using machine learning,” Comput., Mater. Continua, vol. 66, no. 2, pp. 1075–1090, 2021.
[108] W. Sharif et al., “An empirical approach for extreme behavior identification through tweets using machine learning,” Appl. Sci., vol. 9, no. 18, p. 3723, Sep. 2019.
[109] M. A. Masood and R. A. Abbasi, “Using graph embedding and machine learning to identify rebels on Twitter,” J. Informetrics, vol. 15, no. 1, Feb. 2021, Art. no. 101121.
[110] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[111] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” 2018, arXiv:1810.04805.
[112] Y. Liu et al., “RoBERTa: A robustly optimized BERT pretraining approach,” 2019, arXiv:1907.11692.
[113] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 1–11.
[114] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” OpenAI, San Francisco, CA, USA, 2018. [Online]. Available: https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035
[115] J. Feng, X. He, Q. Teng, C. Ren, H. Chen, and Y. Li, “Reconstruction of porous media from extremely limited information using conditional generative adversarial networks,” Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 100, no. 3, Sep. 2019, Art. no. 033308.
[116] P. Zola, P. Cortez, C. Ragno, and E. Brentari, “Social media cross-source and cross-domain sentiment classification,” Int. J. Inf. Technol. Decis. Making, vol. 18, no. 5, pp. 1469–1499, Sep. 2019.
[117] H. Gu, Y. Wang, S. Hong, and G. Gui, “Blind channel identification aided generalized automatic modulation recognition based on deep learning,” IEEE Access, vol. 7, pp. 110722–110729, 2019.
[118] W. Fang et al., “A LSTM algorithm estimating pseudo measurements for aiding INS during GNSS signal outages,” Remote Sens., vol. 12, no. 2, p. 256, Jan. 2020.
[119] X.-H. Le, H. V. Ho, G. Lee, and S. Jung, “Application of long short-term memory (LSTM) neural network for flood forecasting,” Water, vol. 11, no. 7, p. 1387, Jul. 2019.
[120] S. Tam, R. B. Said, and Ö. Ö. Tanriöver, “A ConvBiLSTM deep learning model-based approach for Twitter sentiment classification,” IEEE Access, vol. 9, pp. 41283–41293, 2021.
[121] I. K. Ihianle, A. O. Nwajana, S. H. Ebenuwa, R. I. Otuka, K. Owa, and M. O. Orisatoki, “A deep learning approach for human activities recognition from multimodal sensing devices,” IEEE Access, vol. 8, pp. 179028–179038, 2020.
[122] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural networks,” IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673–2681, 1997.
[123] G. Blanco and A. Lourenco, “Optimism and pessimism analysis using deep learning on COVID-19 related Twitter conversations,” Inf. Process. Manag., vol. 59, no. 3, 2022, Art. no. 1029182.
[124] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv:1412.3555.
[125] F. Chollet, Deep Learning With Python. New York, NY, USA: Simon and Schuster, 2021.
[126] S. Mann, J. Arora, M. Bhatia, R. Sharma, and R. Taragi, “Twitter sentiment analysis using enhanced BERT,” in Intelligent Systems and Applications. Berlin, Germany: Springer, 2023, pp. 263–271.
[127] A. Bello, S.-C. Ng, and M.-F. Leung, “A BERT framework to sentiment analysis of tweets,” Sensors, vol. 23, no. 1, p. 506, Jan. 2023.
[128] F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, and L. T. Neves, “Unified benchmark and comparative evaluation for tweet classification,” in Proc. Findings Assoc. Comput. Linguistics, Nov. 2020, pp. 1644–1650. [Online]. Available: https://aclanthology.org/2020.findings-emnlp.148/
[129] Y. Cui, Y. Jiang, and H. Gu, “Novel sentiment analysis from Twitter for stock change prediction,” in Data Mining and Big Data. Beijing, China: Springer, 2023, pp. 160–172.
[130] M. Ul Haque, I. Dharmadasa, Z. T. Sworna, R. N. Rajapakse, and H. Ahmad, “‘I think this is the most disruptive technology’: Exploring sentiments of ChatGPT early adopters using Twitter data,” 2022, arXiv:2212.05856.
[131] F. K. Khattak, S. Jeblee, C. Pou-Prom, M. Abdalla, C. Meaney, and F. Rudzicz, “A survey of word embeddings for clinical text,” J. Biomed. Informat., vol. 100, Jan. 2019, Art. no. 100057.
[132] R. Chandra and R. Saini, “Biden vs Trump: Modeling US general elections using BERT language model,” IEEE Access, vol. 9, pp. 128494–128505, 2021.
[133] M. E. Basiri, S. Nemati, M. Abdar, S. Asadi, and U. R. Acharya, “A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets,” Knowl.-Based Syst., vol. 228, Sep. 2021, Art. no. 107242.
[134] J. Yang, X. Zou, W. Zhang, and H. Han, “Microblog sentiment analysis via embedding social contexts into an attentive LSTM,” Eng. Appl. Artif. Intell., vol. 97, Jan. 2021, Art. no. 104048.
[135] A. Alsayat, “Improving sentiment analysis for social media applications using an ensemble deep learning language model,” Arabian J. Sci. Eng., vol. 47, no. 2, pp. 2499–2511, Feb. 2022.
[136] P. Wu, X. Li, S. Shen, and D. He, “Social media opinion summarization using emotion cognition and convolutional neural networks,” Int. J. Inf. Manag., vol. 51, Apr. 2020, Art. no. 101978.
[137] G. Chandrasekaran, N. Antoanela, G. Andrei, C. Monica, and J. Hemanth, “Visual sentiment analysis using deep learning models with social media data,” Appl. Sci., vol. 12, no. 3, p. 1030, Jan. 2022.
[138] H.-T. Nguyen and L.-M. Nguyen, “ILWAANet: An interactive lexicon-aware word-aspect attention network for aspect-level sentiment classification on social networking,” Exp. Syst. Appl., vol. 146, May 2020, Art. no. 113065.
[139] A. M. Sadiq, H. Ahn, and Y. B. Choi, “Human sentiment and activity recognition in disaster situations using social media images based on deep learning,” Sensors, vol. 20, no. 24, p. 7115, Dec. 2020.
[140] S. Visweswaran et al., “Machine learning classifiers for Twitter surveillance of vaping: Comparative machine learning study,” J. Med. Internet Res., vol. 22, no. 8, Aug. 2020, Art. no. e17478.
[141] M. U. Salur and I. Aydin, “A novel hybrid deep learning model for sentiment classification,” IEEE Access, vol. 8, pp. 58080–58093, 2020.
[142] C. Singh, T. Imam, S. Wibowo, and S. Grandhi, “A deep learning approach for sentiment analysis of COVID-19 reviews,” Appl. Sci., vol. 12, no. 8, p. 3709, Apr. 2022.
[143] B. A. Galende, G. Hernández-Peñaloza, S. Uribe, and F. A. García, “Conspiracy or not? A deep learning approach to spot it on Twitter,” IEEE Access, vol. 10, pp. 38370–38378, 2022.
[144] N. C. Dang, M. N. Moreno-García, and F. De la Prieta, “Sentiment analysis based on deep learning: A comparative study,” Electron., vol. 9, no. 3, p. 483, 2020.
[145] L. A. Ngoge, “Real-time sentiment analysis for detection of terrorist activities in Kenya,” Ph.D. dissertation, Fac. Inf. Technol., Strathmore Univ., Nairobi, Kenya, 2016.
[146] I. Gupta and N. Joshi, “Enhanced Twitter sentiment analysis using hybrid approach and by accounting local contextual semantic,” J. Intell. Syst., vol. 29, no. 1, pp. 1611–1625, Dec. 2019.
[147] J. Du, J. Xu, H.-Y. Song, and C. Tao, “Leveraging machine learning-based approaches to assess human papillomavirus vaccination sentiment trends with Twitter data,” BMC Med. Informat. Decis. Making, vol. 17, no. S2, pp. 63–70, Jul. 2017.
[148] M. Wang and G. Hu, “A novel method for Twitter sentiment analysis based on attentional-graph neural network,” Information, vol. 11, no. 2, p. 92, Feb. 2020.
[149] W. Liao, B. Zeng, J. Liu, P. Wei, X. Cheng, and W. Zhang, “Multi-level graph neural network for text sentiment analysis,” Comput. Electr. Eng., vol. 92, Jun. 2021, Art. no. 107096.
[150] M. Aflakparast, M. de Gunst, and W. van Wieringen, “Analysis of Twitter data with the Bayesian fused graphical lasso,” PLoS ONE, vol. 15, no. 7, Jul. 2020, Art. no. e0235596.
[151] A. Go, R. Bhayani, and L. Huang, “Twitter sentiment classification using distant supervision,” Stanford, vol. 1, no. 12, p. 2009, 2009.
[152] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, no. 1, pp. 37–46, Apr. 1960.
[153] J. Carletta, “Assessing agreement on classification tasks: The Kappa statistic,” 1996, arXiv:preprint cmp-lg/9602004.
[154] C. Willmott and K. Matsuura, “Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance,” Climate Res., vol. 30, pp. 79–82, 2005.
[155] P. D. Mahendhiran and S. Kannimuthu, “Deep learning techniques for polarity classification in multimodal sentiment analysis,” Int. J. Inf. Technol. Decis. Making, vol. 17, no. 3, pp. 883–910, May 2018.
[156] C. Spearman, “The proof and measurement of association between two things,” Int. J. Epidemiology, vol. 39, no. 5, pp. 1137–1150, Oct. 2010.
[157] M. G. Kendall, “A new measure of rank correlation,” Biometrika, vol. 30, p. 81, Jun. 1938.
[158] M.-L. Zhang and Z.-H. Zhou, “A review on multi-label learning algorithms,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 8, pp. 1819–1837, Aug. 2014.
[159] B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA)-Protein Struct., vol. 405, no. 2, pp. 442–451, Oct. 1975.
[160] A. Z. Ahmed and M. Rodríguez-Díaz, “Significant labels in sentiment analysis of online customer reviews of airlines,” Sustainability, vol. 12, no. 20, p. 8683, 2020.
[161] S. K. Trivedi and A. Singh, “Twitter sentiment analysis of app based online food delivery companies,” Global Knowl., Memory Commun., vol. 70, pp. 891–910, Nov. 2021.
[162] S. Lehrer, T. Xie, and T. Zeng, “Does high-frequency social media data improve forecasts of low-frequency consumer confidence measures?” J. Financial Econ., vol. 19, no. 5, pp. 910–933, Nov. 2021.
[163] R. Satapathy, E. Cambria, and A. Hussain, Sentiment Analysis in the Bio-Medical Domain. Berlin, Germany: Springer, 2017.
[164] M. Paolanti et al., “Tourism destination management using sentiment analysis and geo-location information: A deep learning approach,” Inf. Technol. Tourism, vol. 23, no. 2, pp. 241–264, Jun. 2021.
[165] G. Preethi, P. V. Krishna, M. S. Obaidat, V. Saritha, and S. Yenduri, “Application of deep learning to sentiment analysis for recommender system on cloud,” in Proc. Int. Conf. Comput., Inf. Telecommun. Syst. (CITS), Jul. 2017, pp. 93–97.
[166] M. J. Keenan, Advanced Positioning, Flow, and Sentiment Analysis in Commodity Markets: Bridging Fundamental and Technical Analysis. Hoboken, NJ, USA: Wiley, 2020.