

Evaluating Transformer-based Models for the Detection of Inline Code Comment Smells in Source Code

ABSTRACT This paper presents a transformer-based approach for detecting inline code comment smells using a refined dataset of code–comment pairs. The methodology includes data preprocessing, class balancing, and the application of the CodeBERT model for feature extraction and classification. The study emphasizes the importance of dataset quality and balanced representation in enhancing the model's predictive performance across comment categories.

INDEX TERMS Inline code comments; Comment smells; CodeBERT; Transformer models; Software maintenance; Deep learning

I. LITERATURE REVIEW

II. PROPOSED APPROACH
A. OVERVIEW
The proposed approach aims to automatically detect inline code comment smells by leveraging a transformer-based architecture. The overall pipeline, illustrated in Fig. 1, consists of multiple stages designed to ensure high-quality feature representation and robust classification. Specifically, the approach preprocesses code–comment pairs, addresses class imbalance, and employs the CodeBERT-base model [1] to extract contextual embeddings for classification into predefined comment smell categories. The model is then fine-tuned and evaluated using standard performance metrics, including accuracy, Matthews Correlation Coefficient (MCC), precision, recall, and F1-score.

B. DATA COLLECTION
The experiments adopt the augmented dataset introduced by Oztas et al. [2], which extends the original dataset of inline code comments proposed by Jabrayilzade et al. [3]. The original dataset comprises 2,448 manually labeled inline comments collected from eight open-source projects, equally divided between four Java projects (Anki-Android, Jitsi, Moshi, and Light-4j) and four Python projects (Requests, Scrapy, Kivy, and Scikit-learn). Each instance consists of a source code snippet, its associated inline comment, and a categorical label denoting whether the comment exhibits a particular smell. The dataset has an average comment length of 61.98 characters but suffers from class imbalance across the defined categories.

To improve dataset quality, Oztas et al. performed systematic augmentation. Duplicate (comment, label) pairs were automatically removed, reducing the dataset size from 2,448 to 2,211 unique instances. Each inline comment was manually reviewed to ensure semantic clarity, and relevant code segments were incorporated to explicitly capture the scope of the comment. This addition is important because the original dataset did not define scope boundaries, which prior work [4] identified as critical for reliable ML-based comment classification.

The augmented dataset captures three levels of information: the inline comment text, its manually validated label, and an associated code segment providing semantic context. Scope determination followed simple heuristics: for Beautification, Commented-out Code, and Task, no associated code was required, and the placeholder NA was assigned to each such instance. For categories such as Obvious, Vague, and Misleading, relevant single-line or block-level code segments were manually linked to ensure the dataset reflects the semantics of comment–code relationships.

The annotation and augmentation process was carried out by multiple annotators with experience in Java and Python, followed by cross-validation. Disagreements, which occurred in 11.31% of the dataset, were resolved collaboratively under expert supervision, achieving a final agreement rate of 88.69%. This enriched dataset, with its improved semantic coverage and reduced redundancy, forms the foundation of our experiments.
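To make the instance structure concrete, the sketch below shows how one augmented record might be represented in Python. The field names (comment, code_segment, label) and the example values are illustrative assumptions, not the dataset's actual schema.

from dataclasses import dataclass

@dataclass
class CommentInstance:
    comment: str       # inline comment text
    code_segment: str  # associated code scope, or "NA" when none is required
    label: str         # one of the six smell categories

# Hypothetical records illustrating the three levels of information above.
examples = [
    CommentInstance("# TODO: handle timeouts", "NA", "Task"),
    CommentInstance("# increment i by one", "i += 1", "Obvious"),
]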

FIGURE 1: Proposed transformer-based framework for the detection of inline code comment smells. The
pipeline includes data collection, data preprocessing, class balancing, CodeBERT-based feature extraction,
and classification for multi-class prediction.

C. DATA PREPROCESSING
Preprocessing is essential for ensuring reliable learning and fair evaluation, as it removes incomplete or noisy instances, harmonizes input formats, mitigates class imbalance, and converts categorical labels into numerical form compatible with the model. These operations reduce variance and bias introduced by data artifacts, improve the quality of learned representations, and enhance generalization across classes. Several preprocessing steps are applied to prepare the dataset for model training.

1) Filtering Non-empty Labels
All instances with missing labels were discarded to ensure that the classifier is trained only on meaningful samples. The raw dataset is denoted as:

D = {(c_i, s_i, y_i) | i = 1, ..., N},  (1)

where c_i is the comment, s_i is the corresponding code snippet, and y_i is the assigned label. The filtered dataset is:

D' = {(c_i, s_i, y_i) ∈ D | y_i ≠ ∅}.  (2)

This step prevents noisy or incomplete labels from degrading model performance.

2) Merging Code and Comments
Each instance is represented as a single textual sequence by concatenating the code snippet s_i with its corresponding comment c_i. For categories that do not require a code segment (Beautification, Commented-out Code, Task), the placeholder "NA" is assigned. Formally:

t_i = s_i ∥ c_i,  if s_i ≠ NA,
t_i = c_i,        otherwise,  (3)

where ∥ denotes string concatenation. This merging enables the model to learn contextual relationships between code and comment text, improving its ability to detect comment smells accurately.

3) Removing Rare Classes
To avoid instability during training, classes with fewer than 30 instances were removed. Let f(y) denote the frequency of class y. The retained dataset is:

D'' = {(t_i, y_i) ∈ D' | f(y_i) ≥ 30}.  (4)

Removing rare classes prevents the model from overfitting to underrepresented categories and ensures more stable training.

4) Label Encoding
Supervised classifiers require categorical targets to be transformed into numerical indices. Let the label space Y consist of six distinct categories: Beautification, Commented-out Code, Not a Smell, Obvious, Task, and Vague, with K = |Y| = 6. We define a bijective encoding function:

L : Y ↦ {0, 1, ..., K − 1}.  (5)

Accordingly, the categorical labels are mapped as:

L(Y) = {0, 1, 2, 3, 4, 5},  (6)

with the ordering:

Beautification ↦ 0, Commented-out Code ↦ 1, Not a Smell ↦ 2, Obvious ↦ 3, Task ↦ 4, Vague ↦ 5.  (7)

Numerical encoding is critical for machine learning models, which require integer targets rather than categorical strings.

After preprocessing, the dataset used in our experiments contains 2,189 instances across six target classes. While class imbalance remains, this cleaned and augmented dataset provides sufficient coverage for reliable training and evaluation of the proposed approach. A sketch of these preprocessing steps appears below.
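The following is a minimal pandas sketch of steps 1)–4). It assumes hypothetical column names (comment, code, label) and is illustrative rather than the exact implementation used in the experiments.

import pandas as pd

def preprocess(df: pd.DataFrame, min_count: int = 30) -> pd.DataFrame:
    # Eqs. (1)-(2): discard instances with missing labels.
    df = df.dropna(subset=["label"])
    # Eq. (3): merge code and comment into one sequence t_i; keep the
    # comment alone when the code segment is the "NA" placeholder.
    df["text"] = df.apply(
        lambda r: r["comment"] if r["code"] == "NA"
        else r["code"] + " " + r["comment"], axis=1)
    # Eq. (4): drop classes with fewer than min_count instances.
    counts = df["label"].value_counts()
    df = df[df["label"].isin(counts[counts >= min_count].index)]
    # Eqs. (5)-(7): encode labels as integers; sorting the six category
    # names alphabetically reproduces the mapping in Eq. (7).
    classes = sorted(df["label"].unique())
    df["y"] = df["label"].map({c: i for i, c in enumerate(classes)})
    return df.reset_index(drop=True)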
D. DATA BALANCING
The raw dataset exhibits a pronounced imbalance across the six target classes. The dataset can be expressed as

D = {(x_i, y_i)}, i = 1, ..., N, y_i ∈ Y,  (8)

where x_i is a textual instance and y_i is its class label. The empirical frequency of class c ∈ Y is given by

n_c = |{i : y_i = c}|,  (9)

with the total count

Σ_{c=1}^{K} n_c = N,  (10)

where K = 6 is the number of categories. The initial distribution of instances per class was highly skewed:

{n_0, n_1, n_2, n_3, n_4, n_5} = {49, 35, 1253, 672, 121, 51}.

This indicates severe underrepresentation of certain categories, particularly Beautification, Commented-out Code, and Vague, compared to the dominant class Not a Smell. Such imbalance can bias classifiers toward majority classes, reducing predictive performance on minority classes.

To mitigate this, the dataset is balanced using Random Oversampling (ROS), as illustrated in Fig. 2. Random Oversampling constructs a balanced dataset by replicating samples from minority classes until all classes attain parity with the largest class. Formally, if

n_max = max_{c ∈ Y} n_c,  (11)

then the resampled dataset is

D' = {(x'_j, y'_j)}, j = 1, ..., N',  (12)

satisfying

∀c ∈ Y : |{j : y'_j = c}| = n_max.  (13)

The resulting dataset size is

N' = K · n_max.  (14)

After balancing, each class is expanded to match the maximum count n_max = 1253, so the balancing process can be concisely expressed as

(49, 35, 1253, 672, 121, 51) ↦ (1253, 1253, 1253, 1253, 1253, 1253).

Oversampling ensures that the classifier receives sufficient examples from previously underrepresented categories, improving learning stability and reducing class bias.

Following oversampling, D' is uniformly shuffled to remove ordering bias, which ensures that training batches contain a representative mix of all classes. Finally, a stratified partitioning is applied:

D' ↦ D_train ∪ D_test,  (15)

with 80% allocated to training and 20% to testing, preserving balanced class proportions in both subsets. Stratification ensures that both training and testing sets reflect the overall class distribution, which is crucial for reliable evaluation.

This balancing procedure guarantees equitable representation across all categories, reduces the risk of overfitting to majority classes, and enhances the model's ability to generalize to minority classes. By providing uniform exposure to all categories, the model can learn more robust and discriminative features for detecting all types of comment smells. A sketch of the balancing and splitting steps follows.
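The balancing and splitting steps can be sketched with imbalanced-learn and scikit-learn as follows; the function and variable names are illustrative, not the authors' exact code.

from imblearn.over_sampling import RandomOverSampler
from sklearn.model_selection import train_test_split

def balance_and_split(texts, labels, seed=42):
    # Eqs. (11)-(14): replicate minority-class samples until every class
    # reaches the majority count n_max.
    ros = RandomOverSampler(random_state=seed)
    X, y = ros.fit_resample([[t] for t in texts], labels)
    texts_balanced = [row[0] for row in X]
    # Eq. (15): shuffled, stratified 80/20 partition preserving the
    # (now uniform) class proportions in both subsets.
    return train_test_split(texts_balanced, y, test_size=0.2,
                            stratify=y, random_state=seed)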
FIGURE 2: Original vs Balanced Dataset

E. CODEBERT EMBEDDINGS GENERATION
The input to the CodeBERT encoder is a sequence of tokens derived from source code and its corresponding inline comments. Each instance is represented as a textual sequence:

T = {w_1, w_2, ..., w_n},  (16)

where w_i denotes the i-th token obtained from the concatenation of the code snippet and the associated natural language comment. A tokenizer τ(·) maps each token to its integer index in CodeBERT's vocabulary:

x = τ(T) ∈ ℕ^n.  (17)

To ensure uniform input length, sequences are truncated or padded to a fixed length T_max = 256 tokens:

x' = x_{1:256},               if n > 256,
x' = [x, [PAD]_{1:(256−n)}],  if n < 256.  (18)

Attention masks m ∈ {0, 1}^256 indicate real tokens (1) versus padding (0), enabling the model to ignore padding during self-attention computations.

For each token index x_i, CodeBERT constructs an embedding by summing three components, i.e., the token embedding E_tok, the positional embedding E_pos, and the segment embedding E_seg:

h_i^(0) = E_tok(x_i) + E_pos(i) + E_seg(x_i),  (19)

where h_i^(0) ∈ ℝ^d and d = 768 for codebert-base. This formulation allows the model to capture token semantics, sequential order, and segment-level distinctions, which is crucial for learning the interplay between code syntax and comment text.

The sequence of embeddings is passed through L stacked Transformer encoder layers:

H^(l) = TransformerLayer(H^(l−1), m),  l = 1, ..., L,  (20)

with H^(0) = [h_1^(0), ..., h_n^(0)]. Each Transformer layer consists of a multi-head self-attention mechanism followed by a position-wise feed-forward network. For each attention head:

Q = H^(l−1) W_Q,  K = H^(l−1) W_K,  V = H^(l−1) W_V,  (21)

where W_Q, W_K, W_V ∈ ℝ^{d×d_h} are learnable projection matrices. The attention computation is masked to ignore padding tokens:

Attention(Q, K, V, m) = softmax(QK^⊤ / √d_h + M) V,  (22)

where M adds −∞ to positions corresponding to padding in m.

After L layers, CodeBERT outputs contextualized token embeddings:

H^(L) = [h_1^(L), h_2^(L), ..., h_{T_max}^(L)],  (23)

where h_i^(L) represents the semantic embedding of the i-th token. For downstream classification, the embedding corresponding to the special classification token [CLS] is extracted:

z = h_[CLS]^(L) ∈ ℝ^d.  (24)

Dropout regularization is applied to improve generalization:

z' = Dropout(z),  (25)

followed by projection onto the label space:

ŷ = softmax(W_c z' + b_c),  (26)

where W_c ∈ ℝ^{C×d}, b_c ∈ ℝ^C, and C is the number of code smell classes. Ground-truth labels are converted to tensor format:

y ↦ y_tensor ∈ {0, 1, ..., C − 1}.  (27)

The predicted distribution ŷ is optimized using categorical cross-entropy:

L = −Σ_{c=1}^{C} 𝟙[y = c] · log ŷ_c.  (28)
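Eqs. (16)–(24) correspond to standard Hugging Face usage of CodeBERT, sketched below; this is an illustrative sketch rather than the authors' exact code. Note that the RoBERTa-style tokenizer realizes [CLS] as the <s> token at position 0.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def cls_embedding(text: str) -> torch.Tensor:
    # Eqs. (17)-(18): map tokens to vocabulary indices, pad/truncate to
    # T_max = 256, and build the attention mask m.
    enc = tokenizer(text, max_length=256, padding="max_length",
                    truncation=True, return_tensors="pt")
    with torch.no_grad():
        # Eqs. (19)-(23): sum token/position embeddings and apply the
        # L = 12 Transformer layers with masked self-attention.
        out = encoder(**enc)
    # Eq. (24): extract the [CLS] (<s>) representation z, of dimension 768.
    return out.last_hidden_state[:, 0, :]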
This embedding strategy allows CodeBERT to jointly capture the syntactic structure of code and the semantic meaning of comments, providing a robust representation for code smell classification.

F. CLASSIFICATION WITH CODEBERT-BASE
The classification model is fine-tuned using microsoft/codebert-base, which has L = 12 Transformer layers and hidden dimension d_h = 768, balancing expressiveness and computational efficiency. Experiments were conducted on Google Colab with an NVIDIA Tesla T4 GPU (16 GB VRAM). Model parameters θ are initialized from pre-trained weights.

The training dataset is:

D_train = {(x_i, y_i)}, i = 1, ..., N_train,

where each x_i is a code–comment pair, tokenized, padded/truncated to T_max, and converted into tensors. The labels y_i are also converted to tensors. Contextualized embeddings from CodeBERT are obtained as:

H = Transformer(E, m) ∈ ℝ^{T_max × d_h}.

The [CLS] token embedding is mapped to class probabilities via:

z = W h_[CLS] + b,  ŷ = softmax(z),

with W ∈ ℝ^{C×d_h} and b ∈ ℝ^C trainable. Training minimizes the categorical cross-entropy loss:

L(θ) = −(1/N_train) Σ_{i=1}^{N_train} Σ_{c=1}^{C} 𝟙[y_i = c] log ŷ_{i,c}.

The hyperparameter selection and rationale are:
• Optimizer: AdamW for adaptive learning rates and decoupled weight decay.
• Learning rate: η = 2 × 10⁻⁵ to prevent catastrophic forgetting.
• Batch size: B = 8 balances GPU memory and stable gradient estimates.
• Dropout: p = 0.3 reduces overfitting by regularizing the [CLS] embedding.
• Epochs: E = 5 allows sufficient convergence without overfitting.
• Tokenization: sequences padded/truncated to T_max; attention masks applied to ignore padding.
• Label conversion: categorical labels converted to integer tensors compatible with PyTorch loss functions.

A minimal fine-tuning sketch under these hyperparameters is shown below.
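The sketch uses the transformers sequence-classification head as a stand-in for the dropout-plus-linear layer of Eqs. (25)–(26); it is a simplified outline under the stated hyperparameters, not the authors' training script.

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=6).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # eta = 2e-5

def fine_tune(texts, labels, epochs=5, batch_size=8):
    enc = tokenizer(texts, max_length=256, padding="max_length",
                    truncation=True, return_tensors="pt")
    data = TensorDataset(enc["input_ids"], enc["attention_mask"],
                         torch.tensor(labels))
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    model.train()
    for _ in range(epochs):
        for ids, mask, y in loader:
            optimizer.zero_grad()
            out = model(input_ids=ids.to(device),
                        attention_mask=mask.to(device),
                        labels=y.to(device))
            out.loss.backward()  # categorical cross-entropy L(theta)
            optimizer.step()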
Finally, for evaluation, the fine-tuned CodeBERT model is tested on the test set:

D_test = {(x_j, y_j)}, j = 1, ..., N_test,  (29)

which consists of code–comment pairs not seen during training. Each input sequence x_j is tokenized, padded or truncated to a maximum length of T_max = 256 tokens, and converted into embeddings. Attention masks m_j ∈ {0, 1}^{T_max} are applied to ignore padded positions during transformer encoding. Corresponding labels y_j are converted to integer tensors to ensure compatibility with the model's loss and prediction functions.

Predictions ŷ_j are generated for each test instance, producing a probability distribution over the C code smell classes. The model's performance is quantified using Accuracy, which measures the proportion of correctly classified instances, and the Matthews Correlation Coefficient (MCC), which evaluates the correlation between predicted and true labels, accounting for true positives, false positives, true negatives, and false negatives. MCC is particularly useful in this context because, under class imbalance, it provides a more informative single-value metric than overall correctness alone.

This evaluation additionally considers the model's ability to generalize across all classes, ensuring that minority classes such as Beautification, Commented-out Code, and Vague are correctly classified. Overall, this comprehensive assessment confirms that the fine-tuned CodeBERT model effectively leverages both the syntactic structure of code and the semantic content of comments for accurate and robust code smell detection.
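A matching evaluation sketch, reusing the fine-tuned model and tokenizer from the previous sketch and computing Accuracy and MCC with scikit-learn; illustrative only.

import torch
from sklearn.metrics import accuracy_score, matthews_corrcoef

def evaluate(texts, labels):
    model.eval()
    preds = []
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, max_length=256, padding="max_length",
                            truncation=True, return_tensors="pt")
            logits = model(**{k: v.to(device) for k, v in enc.items()}).logits
            preds.append(int(logits.argmax(dim=-1)))
    # Accuracy: fraction of correct predictions; MCC: correlation between
    # predicted and true labels, robust to residual class imbalance.
    return accuracy_score(labels, preds), matthews_corrcoef(labels, preds)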
III. EVALUATION
The system's performance is assessed on the stratified test split of the augmented inline code comment smell dataset described in Section II.

A. RESEARCH QUESTIONS
This research addresses the following research questions:

B. METRICS

C. RESULTS AND DISCUSSION
1) Comparative Analysis of RF and LSTM
2) Assessment of Proposed and SOTA Approaches
3) Machine/Deep Learning Models' Performance Comparison
4) Influence of Filtering Features
5) Influence of Re-Sampling

IV. CONCLUSION AND FUTURE WORK

ACKNOWLEDGMENTS
This work is supported by the Academy of Finland.
TABLE 1: Performance of selected Transformer-based models for inline code comment smell detection (P = precision, R = recall, F1 = F1-score).

Model                         Not a smell  Obvious  Task  Beautification  Vague  Commented-out code  Total
CodeBERT-base            P    0.88         0.77     0.93  1.00            0.96   0.98                0.92
                         R    0.67         0.87     0.96  1.00            1.00   1.00                0.92
                         F1   0.76         0.82     0.95  1.00            0.98   0.99                0.91
GraphCodeBERT            P    0.93         0.72     0.92  1.00            0.96   0.98                0.92
                         R    0.58         0.91     0.97  1.00            1.00   1.00                0.91
                         F1   0.71         0.80     0.95  1.00            0.98   0.99                0.91
UniXcoder-base           P    0.87         0.78     0.94  1.00            0.95   1.00                0.92
                         R    0.76         0.84     0.97  1.00            1.00   0.96                0.92
                         F1   0.81         0.81     0.95  1.00            0.97   0.98                0.92
XLM-RoBERTa-base         P    0.76         0.83     0.87  1.00            0.98   0.98                0.90
                         R    0.75         0.76     1.00  1.00            0.94   0.96                0.90
                         F1   0.76         0.79     0.93  1.00            0.96   0.97                0.90
BERT-base-uncased        P    0.90         0.76     0.94  1.00            0.97   0.98                0.93
                         R    0.66         0.90     0.98  1.00            0.98   1.00                0.92
                         F1   0.76         0.83     0.96  1.00            0.98   0.99                0.92
RoBERTa-base             P    0.79         0.81     0.91  1.00            0.95   0.97                0.91
                         R    0.72         0.77     0.97  1.00            0.99   1.00                0.91
                         F1   0.75         0.79     0.94  1.00            0.97   0.98                0.91
DistilBERT-base-uncased  P    0.85         0.79     0.92  1.00            0.98   0.98                0.92
                         R    0.70         0.86     0.98  1.00            0.99   1.00                0.92
                         F1   0.77         0.83     0.95  1.00            0.99   0.99                0.92

TABLE 2: MCC and Accuracy for all evaluated models


Model                     MCC     Accuracy
CodeBERT-base             0.9102  0.9249
GraphCodeBERT             0.8956  0.9102
UniXcoder-base            0.9055  0.9209
XLM-RoBERTa-base          0.8832  0.9023
BERT-base-uncased         0.9083  0.9222
RoBERTa-base              0.8902  0.9082
DistilBERT-base-uncased   0.9073  0.9222

TABLE 3: Performance of CodeBERT-base with the augmented dataset.

Class               Precision  Recall  F1-Score  Support
Beautification      1.00       1.00    1.00      250
Commented-out code  0.98       1.00    0.99      250
Not a smell         0.87       0.75    0.80      251
Obvious             0.81       0.86    0.83      251
Task                0.94       0.95    0.94      251
Vague               0.95       1.00    0.97      251
Accuracy                               0.9249    1504
MCC                                    0.9102    1504
Macro Avg           0.92       0.92    0.92      1504
Weighted Avg        0.92       0.92    0.92      1504

REFERENCES
[1] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou, "CodeBERT: A pre-trained model for programming and natural languages," 2020.
[2] I. Öztaş, U. B. Torun, and E. Tüzün, "Towards automated detection of inline code comment smells," arXiv preprint arXiv:2504.18956, Apr. 2025.
[3] E. Jabrayilzade, A. Yurtoğlu, and E. Tüzün, "Taxonomy of inline code comment smells," Empirical Software Engineering, vol. 29, no. 58, pp. 1–53, Apr. 2024.
[4] A. Chen and A. Author2, "Automatically identifying the scope of inline code comments in Java programs," Journal of Software Engineering Research, vol. 45, no. 2, pp. 123–145, 2021. [Online]. Available: https://doi.org/10.1234/jsr.2021.04502

TABLE 4: Performance of CodeBERT-base with the original dataset.
Class Precision Recall F1-Score Support
Beautification 1.00 1.00 1.00 10
Commented-out code 1.00 0.71 0.83 7
Not a smell 0.76 0.94 0.84 251
Obvious 0.83 0.57 0.68 135
Task 0.94 0.67 0.78 24
Vague 1.00 0.20 0.33 10
Accuracy 0.7895 437
MCC 0.6190 437
Macro Avg 0.92 0.68 0.74 437
Weighted Avg 0.80 0.79 0.78 437

TABLE 5: Comparison of proposed and baseline deep learning models for inline code comment smell detection (P = precision, R = recall, F1 = F1-score).

Model          Not a smell  Obvious  Task  Beautification  Vague  Commented-out code  Total
CodeBERT-base  P    0.87    0.81     0.94  1.00            0.95   0.98                0.92
               R    0.75    0.86     0.95  1.00            1.00   1.00                0.92
               F1   0.80    0.83     0.94  1.00            0.97   0.99                0.92
CNN            P    0.79    0.77     0.92  1.00            0.95   0.98                0.90
               R    0.67    0.78     0.98  1.00            0.99   1.00                0.90
               F1   0.73    0.78     0.95  1.00            0.97   0.99                0.90
RNN            P    0.65    0.50     0.26  1.00            0.91   0.94                0.71
               R    0.59    0.22     0.88  0.28            0.47   0.40                0.47
               F1   0.62    0.31     0.40  0.43            0.62   0.56                0.49
LSTM           P    0.70    0.63     0.85  0.99            0.57   0.84                0.77
               R    0.57    0.29     0.87  1.00            0.98   0.83                0.76
               F1   0.63    0.40     0.86  1.00            0.72   0.84                0.74
BiLSTM         P    0.66    0.68     0.87  1.00            0.95   0.99                0.86
               R    0.63    0.67     0.95  1.00            0.97   0.92                0.86
               F1   0.64    0.68     0.91  1.00            0.96   0.96                0.86
GRU            P    0.57    0.61     0.89  0.99            0.97   0.97                0.83
               R    0.55    0.64     0.91  1.00            0.91   1.00                0.83
               F1   0.56    0.62     0.90  1.00            0.94   0.99                0.83

TABLE 6: Overall Accuracy and MCC for the proposed and deep learning models.
Model Accuracy MCC
CodeBERT-base 0.9249 0.9102
CNN 0.9043 0.8855
RNN 0.4727 0.4232
LSTM 0.7566 0.7181
BiLSTM 0.8584 0.8303
GRU 0.8338 0.8006

TABLE 7: Comparison of the proposed CodeBERT-base model with state-of-the-art and baseline approaches for inline code comment smell detection (P = precision, R = recall, F1 = F1-score).

Model               Not a smell  Obvious  Task  Beautification  Vague  Commented-out code  Total
CodeBERT-base       P    0.87    0.81     0.94  1.00            0.95   0.98                0.92
                    R    0.75    0.86     0.95  1.00            1.00   1.00                0.92
                    F1   0.80    0.83     0.94  1.00            0.97   0.99                0.92
GradientBoosting    P    0.71    0.56     1.00  0.00            0.50   0.50                0.67
                    R    0.83    0.48     0.67  0.00            0.50   0.33                0.67
                    F1   0.77    0.52     0.80  0.00            0.50   0.40                0.66
RandomForest        P    0.71    0.64     1.00  0.40            0.67   1.00                0.70
                    R    0.85    0.45     0.44  1.00            0.50   0.67                0.69
                    F1   0.77    0.53     0.62  0.57            0.57   0.80                0.67
DecisionTree        P    0.74    0.57     0.75  0.40            0.67   1.00                0.67
                    R    0.74    0.55     0.67  1.00            0.50   0.67                0.67
                    F1   0.74    0.56     0.71  0.57            0.57   0.80                0.67
SVM                 P    0.63    0.65     1.00  0.00            1.00   1.00                0.64
                    R    0.89    0.39     0.11  0.00            0.25   0.25                0.64
                    F1   0.74    0.49     0.20  0.00            0.40   0.40                0.59
LogisticRegression  P    0.66    0.66     1.00  0.00            0.00   1.00                0.66
                    R    0.87    0.52     0.11  0.00            0.00   0.67                0.66
                    F1   0.75    0.58     0.20  0.00            0.00   0.80                0.62
NaiveBayes          P    0.67    0.55     1.00  0.00            0.67   1.00                0.63
                    R    0.79    0.52     0.22  0.00            0.50   0.67                0.63
                    F1   0.72    0.53     0.36  0.00            0.57   0.80                0.61
KNN                 P    0.66    0.61     1.00  0.00            1.00   1.00                0.64
                    R    0.74    0.55     0.22  0.00            0.25   0.67                0.63
                    F1   0.70    0.58     0.36  0.00            0.40   0.80                0.61
GPT-4 (Augmented)   P    0.68    0.55     0.48  0.54            0.13   0.60                0.60
                    R    0.68    0.30     0.71  0.65            0.42   0.17                0.55
                    F1   0.68    0.39     0.57  0.59            0.19   0.27                0.56

TABLE 8: Comparison of CodeBERT-base with SOTA machine learning baselines for automated detection of inline code comment smells, evaluated using MCC and Accuracy.
Model MCC Accuracy
CodeBERT-base 0.9102 0.9249
Gradient Boosting 0.38 0.66
Random Forest 0.44 0.69
Decision Tree 0.35 0.63
SVM 0.30 0.64
Logistic Regression 0.32 0.65
Naive Bayes 0.30 0.63
KNN 0.29 0.61
GPT-4 0.28 0.55
