Arabic Sentiment Analysis Model
ABSTRACT The majority of research on Aspect-Based Sentiment Analysis (ABSA) tends to split the task into two subtasks: one for extracting aspects, Aspect Term Extraction (ATE), and another for identifying the sentiment expressed toward particular aspects, Aspect Sentiment Classification (ASC). Although these subtasks are closely related, they are usually performed independently; the Aspect Sentiment Classification task assumes that the aspect terms are pre-identified, which ignores the interaction between the two subtasks that is required to perform ABSA properly. This study addresses these limitations using a unified End-to-End (E2E) approach, which combines the two subtasks into a single sequence labeling task with a unified tagging schema. The proposed model was evaluated by fine-tuning the Arabic version of the Bidirectional Encoder Representations from Transformers (AraBERT) model with a Conditional Random Fields (CRF) classifier for enhanced target-polarity identification. The experimental results demonstrate the efficiency of the proposed fine-tuned AraBERT-CRF model, which achieved an overall F1 score of 95.11% on the SemEval-2016 Arabic Hotel Reviews dataset. The model's predictions were then subjected to additional processing, and the results indicate the superiority of the proposed model, achieving an F1 score of 97.78% for the ATE task and an accuracy of 98.34% for the ASC task, outperforming previous studies.
(E2E) approach, which can help overcome the limitations of previous studies. The E2E-ABSA can be carried out in one of two approaches [15]. The first approach is known as the Joint approach; it performs the two subtasks in parallel with two sets of labels: one set for the aspect boundaries (B, I, and O) [16], denoting the Beginning, Inside, and Outside of the aspect term, respectively, for the ATE task, and another set for the sentiment polarities (positive, negative, and neutral) for the ASC task. The outcomes of both tasks are combined to produce the final label. However, the lack of a correlation between the aspect boundaries and the corresponding sentiment polarities can cause this approach to suffer from error propagation [15]. The second approach is the unified approach, which combines the two subtasks into a single sequence labeling task. The aspect boundary labels and the sentiment polarity labels are combined to generate one set of unified labels (B-positive, I-positive, etc.). Although the unified approach preserves the dependency between the aspect boundaries and their sentiment polarities, it makes model prediction more challenging and can result in performance degradation [17]: the model must identify the aspect boundary and the sentiment polarity without being given any implicit prior information about the aspect terms. An example of an Arabic sentence that clarifies the differences among ABSA approaches is shown in TABLE 1.

Several techniques, from rule-based to traditional machine learning and deep learning techniques, have been used to handle Arabic ABSA. Rule-based techniques are static techniques with no learning models involved; they also rely on external resources, which are scarce in Arabic. Machine Learning (ML) techniques, on the other hand, rely on intensive feature engineering to adjust the data and select the appropriate features. Although Deep Learning (DL) techniques have overcome the intensive feature engineering limitation, they require large datasets for models to train on and produce accurate results [9].

Recently, pre-trained transformer-based language models [18] have attracted much attention due to their significant influence on various Natural Language Processing (NLP) applications, including ABSA. A large amount of unlabeled text was used to train these models, making them efficient in comprehending the input context. As a result, these models can be fine-tuned to handle a variety of tasks and deliver remarkable results without the need for large datasets [9, 19, 20]. AraBERT [21] is a pre-trained language model specifically designed to handle the complexities and ambiguity of the Arabic language and has achieved state-of-the-art performance in many Arabic NLP tasks.

Utilizing this model in our proposed approach can therefore have a great influence. The bi-directionality of BERT [22] allows it to learn the context of each word with respect to the entire sequence simultaneously, making it easier for the model to identify the aspect boundaries. Furthermore, the self-attention mechanism of BERT [18] allows opinion words to be associated with their relevant aspect terms in order to predict sentiment polarity.

The Conditional Random Fields (CRF) [23] classifier has also proven its efficiency in delivering accurate results in a variety of sequence labeling tasks [9]; it preserves the dependencies between tags/labels, ensuring the correctness of the predicted tag sequence and boosting overall performance.

Motivated by the aforementioned, the following is a summary of the main contributions of this study:
• Tackling the subtasks of ABSA, specifically ATE and ASC, by integrating them into a single sequence labeling task using a unified E2E approach, in order to overcome the previously mentioned limitations of two separate models for each subtask. To the best of our knowledge, this is the first study to apply a unified E2E-ABSA to the SemEval-2016 Arabic Hotel Reviews dataset [2].
• Preparing the Arabic Hotel Reviews dataset [2] so that it matches the desired classification task.
• Conducting several experiments to evaluate the proposed E2E approach, comparing a feature-based versus a fine-tuned AraBERT model, and a CRF versus a softmax [24] classifier, to assess the impact of the different implementations on the performance of the proposed model.
• Resolving the complexity and morphological ambiguity of the Arabic language using the AraBERT model.
• Preserving the tag/label dependencies using Conditional Random Fields.
• Experimental results demonstrate that the proposed fine-tuned AraBERT-CRF model outperforms single-task methods and yields a better ABSA task representation.
TABLE 1. Example to clarify the difference among ABSA approaches applied on the Arabic sentence "الطعام لذيذ ولكن الخدمة سيئة," i.e., the food is delicious, but the service is terrible.
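To make the unified tagging schema concrete, the following minimal Python sketch (an illustration, not the authors' code) shows how the two Joint-approach label sets collapse into a single unified label set, using an English gloss of the TABLE 1 sentence.

```python
def to_unified_tags(boundary_tags, polarity_tags):
    """Merge the Joint-approach boundary and polarity label sets into unified tags."""
    unified = []
    for boundary, polarity in zip(boundary_tags, polarity_tags):
        unified.append("O" if boundary == "O" else f"{boundary}-{polarity}")
    return unified

# English gloss of the TABLE 1 sentence: "the food is delicious, but the service is terrible"
tokens     = ["food", "delicious", "but", "service", "terrible"]
boundaries = ["B",    "O",         "O",   "B",       "O"]    # ATE label set (B/I/O)
polarities = ["POS",  "O",         "O",   "NEG",     "O"]    # ASC label set
print(to_unified_tags(boundaries, polarities))
# ['B-POS', 'O', 'O', 'B-NEG', 'O']
```

In the unified approach, the model predicts these combined tags directly, so the boundary-polarity dependency is carried inside a single label.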
The rest of the paper is structured as follows: Section II provides the related works in the ABSA field. The proposed model is presented in Section III. Section IV discusses the conducted experiments and compares the achieved results with other related works. Section V presents our conclusion and future directions.

II. RELATED WORKS
This section provides an overview of the works applied to the two subtasks of Arabic ABSA, Aspect Term Extraction and Aspect Sentiment Classification, showing their advantages and limitations. While some studies may have covered other ABSA subtasks, our focus in this study is entirely on ATE and ASC. In addition, some of the work on English E2E-ABSA is presented due to the lack of work on Arabic E2E-ABSA.

A. ASPECT TERM EXTRACTION (ATE) and ASPECT SENTIMENT CLASSIFICATION (ASC)
The Aspect Term Extraction task (or Opinion Target Expression (OTE) Extraction) extracts the explicit target opinionated words or phrases in each text. It is usually formulated as a sequence labeling task with a BIO tagging schema [16]. Consequently, the Aspect Sentiment Classification task identifies the sentiment polarities towards the given aspects [1]. This task is usually addressed under several naming conventions of aspect term/based polarity/sentiment identification/classification; however, for simplicity, we will refer to it as ASC.

A lot of research has been conducted regarding the two subtasks. In [25], the authors provided a benchmark annotated Arabic News Posts dataset with a lexicon-based approach to evaluate their work on aspect term extraction and aspect term polarity identification. The same authors then investigated enhancing their baseline work by utilizing a set of ML classifiers, including CRF, Naïve Bayes (NB), Decision Tree (J48: WEKA¹ implementation), and K-Nearest Neighbor (IBK: WEKA implementation), along with a set of morphological and word features including Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and N-Grams. Results demonstrated that the J48 classifier outperformed other classifiers regarding the ATE task, whereas the CRF classifier achieved the best performance regarding the aspect term polarity identification task [7]. The authors in [2] created a benchmark dataset of Arabic Hotel Reviews in SemEval-2016 for the ABSA task. They applied the Support Vector Machine (SVM) classifier as a baseline model. An enhanced study is introduced in [26]; the authors experimented with applying NB, Bayes Networks, J48, IBK, and SVM (SMO: WEKA implementation) classifiers along with the same set of features utilized in [7]. The SMO classifier outperformed the baseline work and achieved the best results regarding the OTE task and the sentiment polarity identification task, respectively. Although ML-based models perform well, they rely significantly on data preprocessing and intensive feature engineering.

Additionally, Deep Learning models have made significant contributions to the ABSA task. The authors in [14] proposed INSIGHT-1 at SemEval-2016; they applied a Convolutional Neural Network (CNN) model for the ABSA task on the Arabic Hotel Reviews dataset. The authors in [4], the same authors of [26], examined the use of the Recurrent Neural Network (RNN) model to address the OTE task as well as the aspect sentiment polarity identification task. They combined the word2vec [27] word embedding with the features presented in the previous experiment [26]. The results demonstrated that the SMO classifier outperformed the RNN model regarding the performance metrics; however, the RNN was faster in execution time. Other variations of RNN were then explored in many studies. The Bidirectional Long Short-Term Memory (BiLSTM) [28] and the Bidirectional Gated Recurrent Unit (BiGRU) were the most utilized techniques in combination with the CRF classifier. In [5], the authors utilized a BiLSTM-CRF model for the ATE task, whereas in [3], a BiGRU-CRF model was utilized. In [8], a BiLSTM-attention-LSTM-CRF model is utilized for the OTE task. As a feature representation, a combination of Continuous Bag-of-Words (CBOW) [27] and character-level embeddings generated via CNN is utilized in [3] and [8]. The fastText [29] character-level embedding is utilized in [5]. For the aspect-based sentiment polarity classification task, the authors in [3] proposed an interactive attention network (IAN) model combined with a BiGRU. In [5], they proposed the Aspect Based-LSTM-Polarity Classification (AB-LSTM-PC) model with an aspect attention-based vector. In [12], the authors used a combination of CBOW and skip-gram character-level embeddings. They applied a Stacked Bidirectional Independent LSTM (Bi-Indy-LSTM) with position-weighting and an attention mechanism combined with a GRU layer for the aspect sentiment classification task.

An improvement in performance over the previous experiments was observed after utilizing character-level embeddings and attention mechanisms. Consequently, pre-trained language models based on the transformer architecture [18] have achieved remarkable success in Arabic ABSA. The authors in [10] used a combination of AraBERT and Flair embeddings for aspect extraction. They compared attaching a BiLSTM-CRF and a BiGRU-CRF layer on top of the stacked embeddings. The results showed that fine-tuning AraBERT with a BiLSTM-CRF layer achieved better performance.

¹ https://www.cs.waikato.ac.nz/ml/weka/
In [11], the authors fine-tuned the pre-trained language model Arabic BERT [30] for the aspect sentiment polarity classification task. They used a sentence-pair classification approach where the aspect term is paired with the input sentence as an auxiliary sentence. In [13], the authors combined AraBERT and Arabic BERT and fine-tuned the generated Sequence-to-Sequence (Seq2Seq) model for the aspect term polarity task. In [6], the authors investigated the use of a Multilingual Universal Sentence Encoder (MUSE) [31] with a pooled BiGRU for the aspect extraction and aspect polarity classification tasks. The model achieved a state-of-the-art result, indicating the superiority of the pre-trained language models. TABLE 2 summarizes the work on both the ATE and ASC tasks, respectively.

… Aspect Sentiment Detection modules with a sentiment lexicon and an attention mechanism. The BiLSTM-CRF layer is utilized for predicting the final target sentiment label. The authors also experiment with attaching BERT embeddings, which eventually boosts the performance. In [35], the authors utilized a BERT-SAN model where the BERT model is fine-tuned along with a neural classification layer and a Self-Attention Network (SAN) for the unified E2E-ABSA. In [36, 15], the authors applied two stacked BiLSTM layers for a unified E2E-ABSA; the GloVe [37] embeddings and target-position information are used as features. In [17], the authors propose a CasNSA model that consists of several modules: a contextual semantic representation module, a target …
TABLE 2. Summary of Related Works for the Aspect Term Extraction (ATE) and Aspect Sentiment Classification (ASC) tasks.

Ref/Year | Task | Model | Dataset | Features | P(%) | R(%) | F1(%) | ACC(%)
[2]/2016 | ATE | SVM baseline | Arabic Hotel Reviews | N-Unigrams | - | - | 30.9 | -
[2]/2016 | ASC | SVM baseline | Arabic Hotel Reviews | N-Unigrams | - | - | - | 76.42
[7]/2016 | ATE | J48 | Gaza News Posts | POS, NER, and N-Grams | 81.3 | 82.5 | 81.7 | -
[7]/2016 | ASC | CRF | Gaza News Posts | POS, NER, and N-Grams | - | - | - | 87.9
[14]/2016 | ASC | CNN | Arabic Hotel Reviews | (Aspect + text) word embeddings randomly initialized | - | - | - | 82.7
[4]/2018 | ATE | RNN | Arabic Hotel Reviews | POS, NER, N-Grams, morphological and word features + word2vec word embedding | - | - | 48 | -
[4]/2018 | ASC | RNN | Arabic Hotel Reviews | POS, NER, N-Grams, morphological and word features + word2vec word embedding | - | - | - | 87
[26]/2019 | ATE | SVM | Arabic Hotel Reviews | POS, NER, N-Grams, morphological and word features | 89.8 | 90 | 89.8 | -
[26]/2019 | ASC | SVM | Arabic Hotel Reviews | POS, NER, N-Grams, morphological and word features | - | - | - | 95.4
[5]/2019 | ATE | BiLSTM-CRF | Arabic Hotel Reviews | fastText char-level embedding | - | - | 69.98 | -
[5]/2019 | ASC | AB-LSTM-PC + Soft Attention | Arabic Hotel Reviews | fastText char-level embedding | - | - | - | 82.6
[8]/2020 | ATE | BiLSTM-attention-LSTM-CRF | Arabic Hotel Reviews | CNN char-level + CBOW word-level embeddings | - | - | 72.83 | -
[12]/2021 | ASC | Bi-Indy-LSTM + recurrent attention | Arabic Hotel Reviews | (Aspect + text) word embeddings using skip-gram and CBOW | - | - | - | 87.31
[6]/2021 | ATE | BiGRU | Arabic Hotel Reviews | MUSE sentence-level embeddings | - | - | 93 | 92.82
[6]/2021 | ASC | BiGRU | Arabic Hotel Reviews | MUSE sentence-level embeddings | 90.8 | 90.5 | 90.86 | 91.40
[3]/2021 | ATE | BiGRU-CNN-CRF | Arabic Hotel Reviews | AraVec [32] word-level + CNN char-level embedding | - | - | 69.44 | -
[3]/2021 | ASC | IAN-BGRU | Arabic Hotel Reviews | AraVec [32] word-level + CNN char-level embedding | - | - | - | 83.98
[11]/2021 | ASC | fine-tune Arabic BERT | HAAD [33] | (Aspect + text) Arabic BERT word embeddings | - | - | - | 73
[11]/2021 | ASC | fine-tune Arabic BERT | Gaza News Posts | (Aspect + text) Arabic BERT word embeddings | - | - | - | 85.73
[11]/2021 | ASC | fine-tune Arabic BERT | Arabic Hotel Reviews | (Aspect + text) Arabic BERT word embeddings | - | - | - | 89.51
[9]/2022 | ATE | fine-tune AraBERT-BiGRU-CRF | Gaza News Posts | AraBERTv0.1 word embedding | 87.7 | 88.5 | 88.1 | -
[10]/2022 | ATE | fine-tune AraBERT-BiLSTM-CRF | Arabic Hotel Reviews | AraBERTv0.2 word embedding + Flair string embedding | - | - | 79.9 | -
[13]/2022 | ASC | fine-tune AraBERT | HAAD | (Aspect + text) word embeddings using (AraBERT + Arabic BERT) along with Seq2Seq dialect normalization | - | - | - | 74.85
[13]/2022 | ASC | fine-tune AraBERT | Arabic Hotel Reviews | (Aspect + text) word embeddings using (AraBERT + Arabic BERT) along with Seq2Seq dialect normalization | - | - | - | 84.65
10  last_end ← 0  (pointer)
11  For key ∈ Sort(Dict) do
12      target, from_, to_, polarity ← Dict[key]
13      Extract the text that precedes the current target:
            text_with_Os ← text[last_end : from_]
14      Update the pointer to point to the remaining text:
            last_end ← to_
15      If the current target consists of only one word:
16          t ← t + target + "B-" + polarity
17      Else
18          Do the same for the first word, then change "B-" to "I-" for the remaining words.
        End
19      Concatenate the text that precedes the current target with t:
            S ← text_with_Os + t
20      If the current target is the last target that appears in the sentence:
21          S ← S + text[to_ :]
22      Replace the white spaces in S with "O" followed by a new line
23      Write the sentence S to the .iob output file
    End
25  Return the .iob file

TABLE 4. Distribution of Classes in the Arabic Hotel Reviews Dataset After Preprocessing.

Tag   | O     | B-POS | B-NEG | B-NEU | I-POS | I-NEG | I-NEU
Train | 79081 | 5846  | 3151  | 662   | 1130  | 629   | 85
Test  | 19370 | 1430  | 786   | 163   | 274   | 194   | 26
Total | 98451 | 7276  | 3937  | 825   | 1404  | 823   | 111

TABLE 5. Samples of the Dataset After Preprocessing.

Sentence | Label Sequence
كانت الغرفه ممتازه وكذلك الموظفين (i.e., The room was excellent and so were the staff) | B-POS O O B-POS O
فريق العمل الودود والمتعاون على االطالق (i.e., Absolutely friendly and cooperative staff team) | O O O O I-POS B-POS
موقع االوتيل جيد واالكل جيد والحلويات مميزه (i.e., The hotel's location is good, the food is good, and the desserts are distinctive) | O B-POS O B-NEU O I-NEU B-NEU
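For concreteness, the following is a minimal Python sketch of the conversion performed by the algorithm above (not the authors' implementation); it assumes each opinion target is given as a (term, start, end, polarity) tuple with character offsets into the raw sentence.

```python
def sentence_to_iob(text, targets):
    """Convert one sentence and its opinion targets into (token, tag) pairs.

    targets: list of (term, start, end, polarity) tuples with character offsets
    into `text`; polarity is "POS", "NEG", or "NEU". Words outside any target get
    "O"; the first word of a target gets "B-<polarity>", the rest "I-<polarity>".
    """
    pairs, last_end = [], 0
    for term, start, end, polarity in sorted(targets, key=lambda t: t[1]):
        for word in text[last_end:start].split():   # words preceding the current target
            pairs.append((word, "O"))
        words = term.split()
        pairs.append((words[0], f"B-{polarity}"))
        for word in words[1:]:
            pairs.append((word, f"I-{polarity}"))
        last_end = end
    for word in text[last_end:].split():            # trailing words after the last target
        pairs.append((word, "O"))
    return pairs

# Example (English gloss; offsets apply to this illustrative string only):
# sentence_to_iob("the food is delicious but the service is terrible",
#                 [("food", 4, 8, "POS"), ("service", 30, 37, "NEG")])
# -> [('the', 'O'), ('food', 'B-POS'), ('is', 'O'), ('delicious', 'O'), ('but', 'O'),
#     ('the', 'O'), ('service', 'B-NEG'), ('is', 'O'), ('terrible', 'O')]
```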
D. FEATURE EXTRACTION
… class $i$, while $d_j$ denotes the representation with respect to all classes $C$.

Because the proposed E2E-ABSA is a multi-class classification task, the model is trained to minimize the categorical cross-entropy [39] between the predicted and true results as follows:

$L(\hat{y}, y) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{C} y_{ik}\,\log(\hat{y}_{ik})$   (4)

For prediction, the Viterbi algorithm is used to find the tag sequence with the highest score $Y^{*}$:

$Y^{*} = \operatorname*{argmax}_{Y' \in \hat{Y}} S(X, Y')$   (8)

The operational flow within the CRF is shown in FIGURE 7.
² https://huggingface.co/aubmindlab/bert-base-arabertv02
³ https://keras.io/api/
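As a side illustration, the categorical cross-entropy of (4) and the Viterbi decoding of (8) can be sketched in NumPy as follows; the snippet is illustrative only and independent of the implementation used in the paper.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred):
    """Equation (4): mean cross-entropy between one-hot targets and predicted distributions.

    y_true, y_pred: arrays of shape (N, C), where C is the number of unified tags.
    """
    eps = 1e-12                                    # numerical safety for log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

def viterbi_decode(emissions, transitions):
    """Equation (8): return the tag sequence maximizing the CRF score S(X, Y')."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()                    # best score ending in each tag so far
    backpointers = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # score[i] + transitions[i, j] + emissions[t, j] for every previous/current tag pair
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    best_tag = int(score.argmax())                 # backtrack from the best final tag
    path = [best_tag]
    for t in range(seq_len - 1, 0, -1):
        best_tag = int(backpointers[t, best_tag])
        path.append(best_tag)
    return path[::-1]
```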
… Google Colaboratory with a Tesla P100 GPU, 25 GB RAM, and 167 GB Disk Space.

B. EVALUATION METRICS
All experiments were evaluated with four versions of k-fold cross-validation [42]: 3, 5, 10, and 15. The entire dataset (train and test) is shuffled and divided into k smaller sets; for each k, the model is trained using k-1 of the folds as training data, and the model is then validated on the remaining test fold. This process is repeated k times with a new model and different testing folds in each case. The performance measure is then the average of the values computed in the loop, to ensure the model's resistance to overfitting.

The following metrics [43, 44] are used to evaluate our proposed model: Precision (P), Recall (R), F1 score, Accuracy (ACC), Area Under Curve (AUC), and Area Under the Precision-Recall curve (AUPR), defined by (9)-(15) as follows:

$Precision = \frac{TP}{TP + FP}$   (9)

$Recall = TPR = \frac{TP}{TP + FN}$   (10)

$F1\ score = \frac{2(Precision \times Recall)}{Precision + Recall}$   (11)

$Acc = \frac{TP + TN}{TP + TN + FP + FN}$   (12)

$FPR = \frac{FP}{FP + TN}$   (13)

$AUC = \int_{0}^{1} TPR\; d(FPR)$   (14)

$AUPR = \sum_{n} (Recall_{n} - Recall_{n-1})\, Precision_{n}$   (15)

The Precision (9) is the ratio of correctly predicted values for a class to all of its predictions, while the Recall (10), or the True Positive Rate (TPR), is the ratio of correctly predicted values for a class to the number of actual samples of that class in the dataset.

The F1 score (11) is the harmonic average of Precision and Recall and is used mainly for evaluating sequence labeling tasks [9, 13, 17, 34].

The Accuracy (12) is obtained by dividing the correctly classified labels by the total number of labels in the dataset.

The Receiver Operating Characteristic (ROC) [44] curve is summarized by the AUC (14), based on the TPR and the False Positive Rate (FPR) (13) at different classification thresholds. The higher the AUC, the better the model's performance in distinguishing between the positive and negative classes.

The Precision-Recall (PR) curve is summarized by the AUPR [44], defined by (15) as the weighted mean of the precisions achieved at each threshold n, where the weights are the increase in Recall from the previous threshold n-1.

We calculate the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts for each tag independently. For example, in terms of the B-POS tag:
• TP is the number of samples predicted as B-POS whose actual label is also B-POS.
• FP is the number of samples predicted as B-POS whose actual label is something else.
• FN is the number of B-POS samples predicted as something else.
• TN is the number of samples predicted as not B-POS whose actual label is also not B-POS.

The evaluation scores are computed token-wise [45], and an average value is then calculated as the proposed model's evaluation score (macro-average [43]). Furthermore, BERT's tokenizer generates new labels that are not defined in the dataset, created by the [CLS], [SEP], and [PAD] tokens discussed earlier. Those labels are ignored since they are irrelevant to the actual inference. Therefore, only the seven entities specified by B-NEG, B-POS, B-NEU, I-POS, I-NEG, I-NEU, and O are reported for the evaluation metrics.
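A minimal sketch of this token-wise, macro-averaged scoring is given below, assuming flat lists of gold and predicted tags aligned per token; representing the special-token positions ([CLS], [SEP], [PAD]) as literal placeholder labels is an assumption made only for this illustration.

```python
from sklearn.metrics import precision_recall_fscore_support

TAGS = ["O", "B-POS", "B-NEG", "B-NEU", "I-POS", "I-NEG", "I-NEU"]
IGNORE = {"[CLS]", "[SEP]", "[PAD]"}   # labels created for special tokens are excluded

def token_level_scores(true_tags, pred_tags):
    """Token-wise macro-averaged Precision/Recall/F1 over the seven dataset tags only."""
    y_true, y_pred = [], []
    for gold, pred in zip(true_tags, pred_tags):
        if gold in IGNORE:
            continue
        y_true.append(gold)
        y_pred.append(pred)
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=TAGS, average="macro", zero_division=0)
    return p, r, f1
```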
C. RESULTS AND DISCUSSION
This section presents the experiments that were carried out in this study, along with an analysis of the obtained results.

1) EXPERIMENT 1: FEATURE-BASED METHOD
In this experiment, we investigated the impact of utilizing the pre-trained AraBERT model as a feature-based model while keeping its parameters fixed during the training process.

As stated in TABLE 6 and TABLE 7, the performance of the feature-based AraBERT model on our E2E-ABSA task is not particularly outstanding in any fold, whether the CRF or the MLP with softmax is used as the classifier. The model does not appear to learn the required contextualized features.

This behavior is expected because the AraBERT model was pre-trained on two specific tasks: Next Sentence Prediction and Masked Language Modeling [21]. The representation of the model is obviously insufficient for the downstream task, and task-specific fine-tuning is required to take advantage of AraBERT's capabilities in enhancing performance. However, the best performance was achieved by the 15-fold AraBERT-softmax model with a Precision of 29.91%, Recall of 20.09%, F1 score of 21.65%, AUC of 94.75%, and AUPR of 26.85%.

Additionally, the average AUC value seems high compared to the other model performance metrics. This is often true for highly imbalanced datasets. As illustrated in FIGURE 8 (a), the ROC curve has two lines: one for how often the model correctly identifies positive cases (TPR) and another for how often it mistakenly identifies negative cases as positive (FPR). However, the false positive rate could be pulled down due to the large number of true negatives, resulting in a high-pointed ROC curve.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3342755
Label | P R F1 AUC AUPR (3-fold) | P R F1 AUC AUPR (5-fold) | P R F1 AUC AUPR (10-fold) | P R F1 AUC AUPR (15-fold)
B-POS 0.52 0.15 0.23 0.94 0.33 0.52 0.25 0.33 0.96 0.39 0.55 0.25 0.34 0.96 0.41 0.54 0.32 0.40 0.97 0.46
B-NEG 0.35 0.02 0.04 0.93 0.16 0.36 0.07 0.11 0.95 0.20 0.45 0.06 0.11 0.95 0.21 0.45 0.11 0.17 0.96 0.27
B-NEU 0.00 0.00 0.00 0.90 0.01 0.02 0.00 0.00 0.90 0.02 0.00 0.00 0.00 0.93 0.03 0.00 0.00 0.00 0.93 0.03
O 0.88 0.96 0.92 0.99 0.97 0.89 0.96 0.93 0.99 0.97 0.89 0.97 0.93 0.99 0.97 0.90 0.97 0.93 0.99 0.98
I-POS 0.17 0.00 0.00 0.92 0.05 0.09 0.00 0.00 0.93 0.08 0.06 0.00 0.00 0.93 0.06 0.17 0.01 0.02 0.94 0.09
I-NEG 0.00 0.00 0.00 0.89 0.02 0.07 0.00 0.00 0.92 0.04 0.07 0.00 0.00 0.92 0.03 0.07 0.00 0.00 0.93 0.05
I-NEU 0.00 0.00 0.00 0.84 0.00 0.00 0.00 0.00 0.86 0.00 0.00 0.00 0.00 0.87 0.00 0.03 0.00 0.00 0.90 0.00
Macro-average 27.40 16.12 16.95 91.57 22.01 27.87 18.28 19.65 93.00 24.30 28.79 18.26 19.68 93.57 24.42 29.91 20.09 21.65 94.57 26.85

Label | P R F1 AUC AUPR (3-fold) | P R F1 AUC AUPR (5-fold) | P R F1 AUC AUPR (10-fold) | P R F1 AUC AUPR (15-fold)
B-POS 0.52 0.23 0.31 0.95 0.34 0.50 0.24 0.32 0.95 0.35 0.53 0.24 0.33 0.96 0.39 0.51 0.27 0.35 0.96 0.39
B-NEG 0.29 0.03 0.05 0.91 0.14 0.42 0.05 0.09 0.94 0.19 0.39 0.07 0.11 0.95 0.22 0.36 0.08 0.13 0.95 0.20
B-NEU 0.00 0.00 0.00 0.89 0.02 0.00 0.00 0.00 0.88 0.01 0.00 0.00 0.00 0.91 0.02 0.02 0.00 0.00 0.90 0.02
O 0.89 0.96 0.92 0.99 0.96 0.89 0.96 0.92 0.99 0.97 0.89 0.96 0.93 0.99 0.97 0.89 0.96 0.92 0.99 0.97
I-POS 0.14 0.00 0.01 0.86 0.03 0.21 0.01 0.01 0.92 0.06 0.16 0.01 0.01 0.91 0.06 0.27 0.01 0.01 0.93 0.08
I-NEG 0.03 0.00 0.00 0.91 0.02 0.02 0.00 0.00 0.88 0.02 0.05 0.01 0.01 0.90 0.03 0.00 0.00 0.00 0.92 0.04
I-NEU 0.00 0.00 0.00 0.84 0.00 0.00 0.00 0.00 0.85 0.00 0.00 0.00 0.00 0.86 0.00 0.00 0.00 0.00 0.86 0.00
Macro-average 26.71 17.39 18.43 90.71 21.57 29.10 17.97 19.26 91.67 22.85 28.83 18.37 19.87 92.57 24.14 29.31 18.75 20.20 93.00 24.30
FIGURE 8. (a) ROC Curve and (b) PR Curve for 15-Fold Feature-Based AraBERT-Softmax Model in a One-vs-Rest Approach.

In our proposed feature-based model, it is apparent that the model confuses the tags B-POS, B-NEG, B-NEU, I-POS, I-NEG, and I-NEU with the tag O, and in some instances it predicts tag O more often than the correct tag (the current tag in a one-vs-rest setting). This implies that the model has an asymmetric error distribution, and the ROC curve fails to explicitly show this performance difference.

Concurrently, as all tags contribute equally to the classification task, the PR curve is used instead; this metric computes a weighted average precision value for each tag independently of the predictions of the other tags. As illustrated in FIGURE 8 (b), the model that is considered good under ROC-AUC performs poorly under the PR curve, which focuses on the positive labels (the current tag) and not the true negatives.

2) EXPERIMENT 2: FINE-TUNED METHOD
In this experiment, the AraBERT model's parameters are fine-tuned during the training process.

As demonstrated in TABLE 8 and TABLE 9, the results were significantly improved when the model's parameters were adjusted for our E2E-ABSA task rather than using the model as a feature extractor only.

With CRF as a classifier, the best performance was achieved by the 15-fold model with 95.41% Precision, 95.23% Recall, 95.16% F1 score, 100% AUC, and 97.37% AUPR; similarly, using the MLP with softmax as a classifier, the best performance was achieved by the 15-fold model with 94.14% Precision, 94.52% Recall, 94.18% F1 score, 100% AUC, and 98% AUPR.

Consequently, as illustrated in FIGURE 9 (a), the data point that is close to 1 on the TPR axis is actually the optimal threshold, which means that at this threshold the classifier is perfectly able to distinguish between the positive class (the current tag in a one-vs-rest setting) and the negative class (the rest of the tags). However, as AUC excels under imbalanced settings, the results could be misleading. For instance, the performance gap between AUC and the pointwise metrics (P, R, and F1 score) for the 3-fold AraBERT-Softmax model, presented in TABLE 8, is significant: the AUC is 99.85%, whereas P, R, and F1 score are 69.13%, 71.50%, and 69.23%, respectively, which means that even a perfect ROC-AUC does not mean that the predictions are well calibrated.

FIGURE 9 (b) illustrates the AUPR in a one-vs-rest approach for the 15-fold fine-tuned AraBERT-CRF model, where the average AUPR is 97.37%. As presented in TABLE 8, CRF outperformed softmax in all of the mentioned pointwise metrics; however, the average AUPR for the 15-fold AraBERT-Softmax model is 98%. It should be noted that the CRF employs the transition scores matrix to generate the prediction probabilities, whereas the PR curve and the ROC curve depend only on the emission scores of each token independently. However, the CRF still produces quite stable results.
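For readers who want to reproduce this kind of setup, the sketch below outlines a fine-tuned AraBERT-CRF token-classification model. It uses PyTorch, the Hugging Face checkpoint cited in the footnotes, and the pytorch-crf package, all of which are assumptions made for illustration; the experiments reported here were not necessarily implemented this way.

```python
import torch.nn as nn
from torchcrf import CRF                      # pip install pytorch-crf (assumed dependency)
from transformers import AutoModel

class AraBertCrfTagger(nn.Module):
    """Illustrative AraBERT encoder with a linear emission layer and a CRF on top."""

    def __init__(self, num_tags=7, model_name="aubmindlab/bert-base-arabertv02"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)   # fine-tuned end to end
        self.emission = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # training: negative log-likelihood of the gold tag sequence
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # prediction: Viterbi decoding of the best tag sequence per sentence
        return self.crf.decode(emissions, mask=mask)
```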
Label | P R F1 AUC AUPR (3-fold) | P R F1 AUC AUPR (5-fold) | P R F1 AUC AUPR (10-fold) | P R F1 AUC AUPR (15-fold)
B-POS 0.79 0.79 0.79 1.00 0.88 0.89 0.90 0.90 1.00 0.96 0.94 0.94 0.94 1.00 0.99 0.96 0.97 0.96 1.00 0.99
B-NEG 0.78 0.86 0.81 1.00 0.88 0.90 0.91 0.90 1.00 0.96 0.95 0.94 0.94 1.00 0.98 0.96 0.97 0.97 1.00 0.99
B-NEU 0.47 0.46 0.45 0.99 0.47 0.75 0.69 0.71 1.00 0.81 0.86 0.83 0.84 1.00 0.93 0.90 0.88 0.89 1.00 0.96
O 0.98 0.97 0.98 1.00 1.00 0.99 0.99 0.99 1.00 1.00 0.99 0.99 0.99 1.00 1.00 1.00 0.99 1.00 1.00 1.00
I-POS 0.69 0.74 0.71 1.00 0.77 0.83 0.84 0.83 1.00 0.90 0.91 0.91 0.91 1.00 0.96 0.93 0.96 0.94 1.00 0.97
I-NEG 0.71 0.83 0.76 1.00 0.82 0.85 0.91 0.88 1.00 0.94 0.93 0.92 0.92 1.00 0.97 0.93 0.97 0.95 1.00 0.99
I-NEU 0.42 0.35 0.33 1.00 0.37 0.64 0.61 0.61 1.00 0.74 0.86 0.80 0.81 1.00 0.90 0.91 0.88 0.89 1.00 0.96
Macro-average 69.13 71.50 69.23 99.85 74.14 83.53 83.54 83.19 1.00 90.14 91.93 90.50 90.79 1.00 96.14 94.14 94.52 94.18 1.00 98.00

Label | P R F1 AUC AUPR (3-fold) | P R F1 AUC AUPR (5-fold) | P R F1 AUC AUPR (10-fold) | P R F1 AUC AUPR (15-fold)
B-POS 0.84 0.87 0.85 0.99 0.91 0.90 0.92 0.91 1.00 0.96 0.95 0.95 0.95 1.00 0.97 0.96 0.97 0.97 1.00 0.99
B-NEG 0.86 0.88 0.87 1.00 0.91 0.90 0.92 0.91 1.00 0.95 0.96 0.94 0.95 1.00 0.96 0.97 0.97 0.97 1.00 0.98
B-NEU 0.69 0.63 0.66 0.99 0.74 0.79 0.71 0.74 0.99 0.82 0.89 0.87 0.88 1.00 0.94 0.92 0.90 0.91 1.00 0.95
O 0.98 0.98 0.98 1.00 1.00 0.99 0.99 0.99 1.00 1.00 0.99 0.99 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00
I-POS 0.79 0.79 0.78 0.99 0.80 0.89 0.84 0.86 0.99 0.89 0.92 0.92 0.92 1.00 0.94 0.93 0.96 0.95 1.00 0.97
I-NEG 0.80 0.84 0.82 1.00 0.86 0.90 0.88 0.89 1.00 0.92 0.95 0.92 0.93 1.00 0.95 0.96 0.95 0.96 1.00 0.97
I-NEU 0.57 0.55 0.55 1.00 0.67 0.78 0.71 0.73 1.00 0.81 0.89 0.85 0.87 1.00 0.94 0.94 0.92 0.92 1.00 0.95
Macro-average 79.03 79.24 78.78 99.57 84.14 87.97 85.20 86.13 99.85 90.71 93.60 92.16 92.75 1.00 95.88 95.41 95.23 95.16 1.00 97.37
FIGURE 9. (a) ROC Curve and (b) PR Curve for 15-Fold Fine-Tuned AraBERT-CRF Model in a One-vs-Rest
Approach.
TABLE 10. Example of Model Inference with Softmax as a Classification Layer.

Sentence   | مثيل لها ليس الطعام جودة و نظافة
Label      | O O O I-POS B-POS O B-POS
Prediction | O O O B-POS B-POS O O

TABLE 11. Example of Model Inference with CRF as a Classification Layer.

Sentence   | مثيل لها ليس الطعام جودة و نظافة
Label      | O O O I-POS B-POS O B-POS
Prediction | O O O I-POS B-POS O B-POS

TABLE 12. Experimental Results After Splitting the Prediction of the E2E Fine-tuned AraBERT Model for the Tasks of Aspect Term Extraction (ATE) and Aspect Sentiment Classification (ASC).

Model | Task | #Fold | P(%) | R(%) | F1(%) | ACC(%)
Fine-tuned AraBERT-Softmax | ATE | 3 | 82.98 | 90.23 | 86.29 | -
Fine-tuned AraBERT-Softmax | ATE | 5 | 90.78 | 92.86 | 91.79 | -
Fine-tuned AraBERT-Softmax | ATE | 10 | 95.25 | 96.29 | 95.75 | -
Fine-tuned AraBERT-Softmax | ATE | 15 | 96.93 | 97.11 | 97.02 | -
Fine-tuned AraBERT-Softmax | ASC | 3 | 75.99 | 80.81 | 78.23 | 90.20
Fine-tuned AraBERT-Softmax | ASC | 5 | 86.45 | 85.98 | 86.18 | 94.43
Fine-tuned AraBERT-Softmax | ASC | 10 | 92.01 | 93.73 | 92.85 | 97.12
Fine-tuned AraBERT-Softmax | ASC | 15 | 95.53 | 95.68 | 95.61 | 98.07
Fine-tuned AraBERT-CRF | ATE | 3 | 91.01 | 89.66 | 90.32 | -
Fine-tuned AraBERT-CRF | ATE | 5 | 93.78 | 92.97 | 93.36 | -
Fine-tuned AraBERT-CRF | ATE | 10 | 96.89 | 96.10 | 96.49 | -
Fine-tuned AraBERT-CRF | ATE | 15 | 97.80 | 97.05 | 97.78 | -
Fine-tuned AraBERT-CRF | ASC | 3 | 86.69 | 84.24 | 85.39 | 93.64
Fine-tuned AraBERT-CRF | ASC | 5 | 89.77 | 91.03 | 90.38 | 95.59
Fine-tuned AraBERT-CRF | ASC | 10 | 94.63 | 94.70 | 94.66 | 97.69
Fine-tuned AraBERT-CRF | ASC | 15 | 96.25 | 96.19 | 96.22 | 98.34

… and global context words and learning further interactive aspect-sentiment representations, which helps the proposed model in producing improved sequence representations. The difference will be more evident when utilizing context-free embedding models.
However, the best performance was achieved by the 15-fold fine-tuned AraBERT-CRF model with 95.41% Precision, 95.23% Recall, and 95.16% F1 score. The confusion matrix of this model is presented in FIGURE 11; based on the differences between the predicted and actual labels, it is demonstrated that the model can discriminate between the labels effectively.

FIGURE 11. Normalized Confusion Matrix of The 15-Fold Fine-Tuned AraBERT-CRF Model Showing All Entities in The Dataset.

3) COMPARISONS WITH EXISTING STUDIES
To evaluate the proposed E2E-ABSA approach, we further processed the predicted labels to separate them into two distinct categories: aspect term labels (B, I, and O) and sentiment polarity labels (positive, negative, and neutral), to be appropriate for comparisons with the previous single-task approaches. By reformulating the predictions into two separate tasks, we can maximize the evaluation scores, which results in a better classifier for each task and ultimately enhances the ABSA task.

We utilized the Precision, Recall, and F1 score metrics to evaluate the two tasks, in addition to the Accuracy for evaluating the ASC task. As shown in TABLE 12, the 15-fold fine-tuned AraBERT-CRF model achieved the best performance, with F1 scores of 97.78% for the ATE task and 96.22% for the ASC task, respectively, and an Accuracy of 98.34% for the ASC task.

Additionally, we observed that the ATE task consistently outperforms the ASC task and the E2E-ABSA task. This result indicates that the boundary information learned by the model enhances the evaluation scores of the overall E2E-ABSA task. Therefore, utilizing a model that can set constraints on the boundary information is crucial for improving the overall E2E-ABSA task, and the CRF model can be a straightforward and efficient solution.

Furthermore, we observed that k-fold cross-validation may have an impact on the model's performance. By employing k-fold cross-validation, all parts of the dataset can be used for training and testing, forcing the model to attend to a larger context and increasing the possibility of associating with relevant opinion words without overfitting. According to TABLE 12, the best results are obtained at k=15, which means that a small k is likely insufficient to involve the potential opinion words and does not offer an accurate evaluation of the model's performance. FIGURE 12 illustrates the train and test error of the 15-fold AraBERT-CRF model, which shows the model's resistance to overfitting. If the model overfits in a particular fold, the training error of that fold will be lower than the testing error; hence, when summing/averaging the errors over all folds, a model that overfits would have a low cross-validated performance.

FIGURE 12. Train and Test Error of The 15-Fold Fine-Tuned AraBERT-CRF Model.
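A minimal sketch of this post-processing (an illustrative helper, not the authors' code) splits each predicted unified tag back into a boundary label for the ATE evaluation and a per-aspect polarity for the ASC evaluation.

```python
def split_unified_predictions(tokens, unified_tags):
    """Split unified E2E tags into BIO boundary tags and per-aspect polarities."""
    boundary_tags, aspects = [], []
    for token, tag in zip(tokens, unified_tags):
        if tag == "O":
            boundary_tags.append("O")
            continue
        boundary, polarity = tag.split("-", 1)    # e.g. "B-POS" -> ("B", "POS")
        boundary_tags.append(boundary)
        if boundary == "B":
            aspects.append([token, polarity])     # start a new aspect term
        elif aspects:
            aspects[-1][0] += " " + token         # extend the current aspect term
    return boundary_tags, [tuple(a) for a in aspects]

# split_unified_predictions(["food", "delicious", "but", "service", "terrible"],
#                           ["B-POS", "O", "O", "B-NEG", "O"])
# -> (['B', 'O', 'O', 'B', 'O'], [('food', 'POS'), ('service', 'NEG')])
```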
ACKNOWLEDGMENT
The Department of Computer Science, Faculty of Computers and Information, Mansoura University, Egypt, supported this work.

REFERENCES
[1] R. Obiedat, D. Al-Darras, E. Alzaghoul and O. Harfoushi, "Arabic Aspect-Based Sentiment Analysis: A Systematic Literature Review," IEEE Access, vol. 9, pp. 152628--152645, 2021.
[2] M. AL-Smadi, O. Qwasmeh, B. Talafha, M. Al-Ayyoub, Y. Jararweh and E. Benkhelifa, "An enhanced framework for aspect-based sentiment analysis of Hotels' reviews: Arabic reviews case study," in 2016 11th International Conference for Internet Technology and Secured Transactions (ICITST), IEEE, 2016, pp. 98--103.
[3] M. M. Abdelgwad, T. H. A. Soliman, A. I. Taloba and M. F. Farghaly, "Arabic aspect based sentiment analysis using bidirectional GRU based models," Journal of King Saud University - Computer and Information Sciences, vol. 34, pp. 6652--6662, 2022.
[4] M. Al-Smadi, O. Qawasmeh, M. Al-Ayyoub, Y. Jararweh and B. Gupta, "Deep Recurrent neural network vs. support vector machine for aspect-based sentiment analysis of Arabic hotels' reviews," Journal of Computational Science, vol. 27, pp. 386--393, 2018.
[5] M. Al-Smadi, B. Talafha, M. Al-Ayyoub and Y. Jararweh, "Using long short-term memory deep neural networks for aspect-based sentiment analysis of Arabic reviews," International Journal of Machine Learning and Cybernetics, vol. 10, pp. 2163--2175, 2019.
[6] A.-S. Mohammad, M. M. Hammad, A. Sa'ad, A.-T. Saja and E. Cambria, "Gated Recurrent Unit with Multilingual Universal Sentence Encoder for Arabic Aspect-Based Sentiment Analysis," Knowledge-Based Systems, p. 107540, 2021.
[7] A. Mohammad, M. Al-Ayyoub, H. N. Al-Sarhan and Y. Jararweh, "An aspect-based sentiment analysis approach to evaluating arabic news affect on readers," Journal of Universal Computer Science, vol. 22, pp. 630--649, 2016.
[8] S. Al-Dabet, S. Tedmori and M. Al-Smadi, "Extracting opinion targets using attention-based neural model," SN Computer Science, vol. 1, pp. 1--10, 2020.
[9] R. Bensoltane and T. Zaki, "Towards Arabic aspect-based sentiment analysis: a transfer learning-based approach," Social Network Analysis and Mining, vol. 12, pp. 1--16, 2022.
[10] A. S. Fadel, M. E. Saleh and O. A. Abulnaja, "Arabic Aspect Extraction Based on Stacked Contextualized Embedding With Deep Learning," IEEE Access, vol. 10, pp. 30526--30535, 2022.
[11] M. Abdelgwad, "Arabic aspect based sentiment classification using BERT," arXiv preprint arXiv:2107.13290, 2021.
[12] S. Al-Dabet, S. Tedmori and A.-S. Mohammad, "Enhancing Arabic aspect-based sentiment analysis using deep learning models," Computer Speech & Language, vol. 69, p. 101224, 2021.
[13] M. E. Chennafi, H. Bedlaoui, A. Dahou and M. A. Al-qaness, "Arabic aspect-based sentiment classification using Seq2Seq dialect normalization and transformers," Knowledge, vol. 2, pp. 388--401, 2022.
[14] S. Ruder, P. Ghaffari and J. G. Breslin, "INSIGHT-1 at SemEval-2016 task 5: Deep learning for multilingual aspect-based sentiment analysis," arXiv preprint arXiv:1609.02748, 2016.
[15] X. Li, L. Bing, P. Li and W. Lam, "A unified model for opinion target extraction and target sentiment prediction," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
[16] B. Carpenter, "Coding chunkers as taggers: IO, BIO, BMEWO, and BMEWO+," LingPipe Blog, p. 14, 2009.
[17] H. Ding, S. Huang, W. Jin, Y. Shan and H. Yu, "A Novel Cascade Model for End-to-End Aspect-Based Social Comment Sentiment Analysis," Electronics, vol. 11, p. 1810, 2022.
[18] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[19] M. F. Abdelfattah, M. W. Fakhr and M. A. Rizka, "ArSentBERT: fine-tuned bidirectional encoder representations from transformers model for Arabic sentiment classification," Bulletin of Electrical Engineering and Informatics, vol. 12, pp. 1196--1202, 2023.
[20] R. Bensoltane and T. Zaki, "Combining BERT with TCN-BiGRU for enhancing Arabic aspect category detection," Journal of Intelligent & Fuzzy Systems, pp. 1--14, 2023.
[21] W. Antoun, F. Baly and H. Hajj, "AraBERT: Transformer-based model for Arabic language understanding," arXiv preprint arXiv:2003.00104, 2020.
[22] J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[23] J. Lafferty, A. McCallum and F. C. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," 2001.
[24] J. S. Bridle, "Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition," in Neurocomputing: Algorithms, Architectures and Applications, Springer, 1990, pp. 227--236.
[25] M. Al-Ayyoub, H. Al-Sarhan, M. Al-So'ud, M. Al-Smadi and Y. Jararweh, "Framework for Affective News Analysis of Arabic News: 2014 Gaza Attacks Case Study," J. Univers. Comput. Sci., vol. 23, pp. 327--352, 2016.
[26] M. Al-Smadi, M. Al-Ayyoub, Y. Jararweh and O. Qawasmeh, "Enhancing aspect-based sentiment analysis of Arabic hotels' reviews using morphological, syntactic and semantic features," Information Processing & Management, vol. 56, pp. 308--319, 2019.
[27] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado and J. Dean, "Distributed representations of words and phrases and their compositionality," Advances in Neural Information Processing Systems, vol. 26, 2013.
[28] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735--1780, 1997.
[29] A. Joulin, E. Grave, P. Bojanowski and T. Mikolov, "Bag of tricks for efficient text classification," arXiv preprint arXiv:1607.01759, 2016.
[30] A. Safaya, M. Abdullatif and D. Yuret, "KUISAIL at SemEval-2020 task 12: BERT-CNN for offensive speech identification in social media," in Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020, pp. 2054--2059.
[31] Y. Yang, D. Cer, A. Ahmad, M. Guo, J. Law, N. Constant, G. H. Abrego, S. Yuan, C. Tar and Y.-H. Sung, "Multilingual universal sentence encoder for semantic retrieval," arXiv preprint arXiv:1907.04307, 2019.
[32] A. B. Soliman, K. Eissa and S. R. El-Beltagy, "AraVec: A set of Arabic word embedding models for use in Arabic NLP," Procedia Computer Science, vol. 117, pp. 256--265, 2017.
[33] M. Al-Smadi, O. Qawasmeh, B. Talafha and M. Quwaider, "Human annotated Arabic dataset of book reviews for aspect based sentiment analysis," in 2015 3rd International Conference on Future Internet of Things and Cloud, IEEE, 2015, pp. 726--730.
[34] X. Wang, G. Xu, Z. Zhang, L. Jin and X. Sun, "End-to-end aspect-based sentiment analysis with hierarchical multi-task learning," Neurocomputing, vol. 455, pp. 178--188, 2021.
[35] X. Li, L. Bing, W. Zhang and W. Lam, "Exploiting BERT for end-to-end aspect-based sentiment analysis," arXiv preprint arXiv:1910.00883, 2019.
[36] B. Xu, X. Wang, B. Yang and Z. Kang, "Target embedding and position attention with LSTM for aspect based sentiment analysis," in Proceedings of the 2020 5th International Conference on Mathematics and Artificial Intelligence, 2020, pp. 93--97.
[37] J. Pennington, R. Socher and C. D. Manning, "GloVe: Global Vectors for Word Representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532--1543.
[38] V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in ICML, 2010.
[39] J. Terven, D. M. Cordova-Esparza, A. Ramirez-Pedraza and E. A. Chavez-Urbiola, "Loss Functions and Metrics in Deep Learning. A Review," arXiv preprint arXiv:2307.02694, 2023.
[40] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami and C. Dyer, "Neural architectures for named entity recognition," arXiv preprint arXiv:1603.01360, 2016.
[41] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[42] M. Stone, "Cross-validatory choice and assessment of statistical predictions," Journal of the Royal Statistical Society: Series B (Methodological), vol. 36, pp. 111--133, 1974.
[43] M. Hossin and M. N. Sulaiman, "A review on evaluation metrics for data classification evaluations," International Journal of Data Mining & Knowledge Management Process, vol. 5, p. 1, 2015.
[44] T. Saito and M. Rehmsmeier, "The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets," PLoS ONE, vol. 10, p. e0118432, 2015.
[45] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. AL-Smadi, M. Al-Ayyoub, Y. Zhao, B. Qin, O. De Clercq and others, "SemEval-2016 task 5: Aspect based sentiment analysis," in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, 2016, pp. 19--30.

… now, he is the acting head of the Computer Scientists Department at Mansoura University.

REEM EL-DEEB was born in El-Mahalla El-Kobra, Egypt, in 1987. She received the B.S. degree in Computer Science from Mansoura University, Faculty of Computers and Information, Egypt, in 2008, and the M.Sc. and Ph.D. degrees in Computer Science from Mansoura University, Faculty of Computers and Information, Egypt, in 2012 and 2019, respectively. In 2009, she joined the Computer Science Department, Mansoura University, as a Teaching Assistant, and in 2012 as an Assistant Lecturer, and she became an Assistant Professor in 2019. Her current research interests include Natural Language Processing, Artificial Intelligence applications, and Machine Learning for text semantic analysis and language understanding. Dr. Reem was the recipient of the Scientific Publishing Grant award (2019).