Arabic Handwritten Text for Person Biometric
Identification: A Deep Learning Approach
1st Mazen Balat
Computer Science and Information Technology
Egypt-Japan University of Science and Technology (E-JUST)
Alexandria, 21934, Egypt
[Link]@[Link]

1st Youssef Mohamed
Computer Science and Information Technology
Egypt-Japan University of Science and Technology (E-JUST)
Alexandria, 21934, Egypt
[Link]@[Link]

2nd Ahmed Heakl
Computer Engineering
Egypt-Japan University of Science and Technology (E-JUST)
Alexandria, 21934, Egypt
[Link]@[Link]

3rd Ahmed B. Zaky
Computer Science and Information Technology
Egypt-Japan University of Science and Technology (E-JUST)
Alexandria, 21934, Egypt
[Link]@[Link], [Link]@[Link]
Abstract—This study thoroughly investigates how well deep learning models can recognize Arabic handwritten text for person biometric identification. It compares three advanced architectures—ResNet50, MobileNetV2, and EfficientNetB7—using three widely recognized datasets: AHAWP, KHATT, and LAMIS-MSHD. Results show that EfficientNetB7 outperforms the others, achieving test accuracies of 98.57%, 99.15%, and 99.79% on the AHAWP, KHATT, and LAMIS-MSHD datasets, respectively. EfficientNetB7's exceptional performance is credited to its innovative techniques, including compound scaling, depth-wise separable convolutions, and squeeze-and-excitation blocks. These features allow the model to extract more abstract and distinctive features from handwritten text images. The study's findings hold significant implications for enhancing identity verification and authentication systems, highlighting the potential of deep learning in Arabic handwritten text recognition for person biometric identification.

Index Terms—Arabic handwritten text recognition, deep learning, handwriting, ResNet, MobileNet, EfficientNet

I. INTRODUCTION

Biometric identification has become a crucial aspect of modern security and authentication systems. Among various biometric modalities, handwritten text has gained significant attention due to its unique characteristics and potential applications. In this work, we focus on Arabic handwriting recognition, which presents a challenging task due to the cursive nature of the script and the limited availability of datasets.

Recent advances in deep learning have improved the accuracy of biometric identification systems, including fingerprint [1], face [2], iris [3], voice [4], and electroencephalography (EEG) [5] recognition. However, handwritten text recognition remains a relatively understudied area, particularly for Arabic script. Our research aims to bridge this gap by exploring the potential of Arabic handwriting recognition for biometric identification.

In this paper, we present a novel approach to Arabic handwriting recognition, leveraging advanced preprocessing and augmentation techniques to improve model accuracy and robustness. We also investigate the effectiveness of transfer learning in reducing training time and enhancing recognition accuracy. Our experimental results demonstrate the feasibility of Arabic handwriting recognition for biometric identification and its potential applications in forensic document examination, identity verification, and access control.

The main contributions of this work are:
1) New preprocessing and augmentation techniques for Arabic handwriting recognition that improve model accuracy and robustness.
2) A demonstration of the potential of transfer learning to improve Arabic handwriting recognition accuracy and reduce training time.
3) A study of the relationship between the number of writers and system accuracy.

Our approach has the potential to be extended to other languages and handwriting styles, making it a versatile method for biometric identification.

II. RELATED WORKS

In "GR-RNN: Global-Context Residual Recurrent Neural Networks for Writer Identification" [6], the proposed methodology involves a series of convolutional layers, max-pooling layers, and global average pooling, followed by the integration of the extracted features using a residual RNN to model spatial dependencies between fragments. The approach is extensively evaluated on four benchmark datasets: IAM (657 writers), CVL (310 writers), Firemaker (250 writers), and CERUG-EN (105 writers). The results demonstrate the efficacy of the proposed GR-RNN method, achieving top-1 accuracy ranging from 82.4% to 95.2% for word-level writer identification, 82.4% to 95.2% for line-level writer identification, and 82.6% to 96.6% for page-level writer identification. These findings underscore the effectiveness of the GR-RNN approach in capturing the global context and spatial dependencies inherent in handwritten images, thereby facilitating accurate writer identification.
In "Hybrid Trainable System for Writer Identification of Arabic Handwriting" [7], the writer identification scheme is evaluated on the KHATT dataset, comprising 4,000 Arabic handwritten documents from 1,000 authors. The cumulative match characteristic (CMC) curve is used to measure the performance of the identification algorithm. Experimental results show that the proposed hybrid trainable system, combining convolutional neural networks (CNNs) and support vector machines (SVMs), outperforms other models, including HOG and ANN, as well as ResNet50. The system achieves an accuracy of 94.2% when using whole paragraphs with augmentation, and 83.2% when using lines. The results demonstrate the effectiveness of the proposed system in writer identification tasks and highlight the importance of using augmentation to enlarge the training dataset. The authors note that future work will focus on identifying effective features that can facilitate writer identification based on text lines and subwords with high accuracy.

In "Writer identification using textural features" [8], the authors proposed a writer identification approach using local binary pattern (LBP) features and vector of locally aggregated descriptors (VLAD) encoding. They evaluated their method on the LAMIS-MSHD dataset, which consists of Arabic and French handwritten texts from 100 individuals. The experimental results showed that the proposed approach achieved high writer identification rates, with top-1 classification rates ranging from 95% to 100% for both Arabic and French texts, depending on the number of clusters used in the VLAD encoding. Notably, the authors observed that medium and small numbers of clusters (between 16 and 256) resulted in the highest identification rates, while larger numbers of clusters led to degraded performance. This study demonstrates the effectiveness of texture-based features and VLAD encoding for writer identification in handwritten Arabic and French texts.

In "Offline text-independent writer identification using a codebook with structural features" [9], the authors proposed a writer identification approach using contour-based features extracted from handwritten texts. They employed two feature extraction methods, CPCA and CON3, and encoded the features into a codebook using k-means clustering. The authors evaluated their approach on two datasets, KHATT and IAM, and achieved high identification rates of up to 96.3% and 88.2%, respectively, using multiclass SVM and nearest-neighbor classifiers. The authors analyzed the impact of various parameters, including contour fragment length and angular quantization, on the system's performance and found that they have a significant effect. The study demonstrates the effectiveness of using structural features and codebook-based encoding for offline text-independent writer identification.

In the realm of biometric identification, the efficacy of deep learning models is paramount, particularly in the challenging domain of Arabic handwritten text recognition. This study harnesses the robust capabilities of transfer learning, employing state-of-the-art convolutional neural networks: ResNet50, MobileNetV2, and EfficientNetB7. These models, pre-trained on extensive image datasets, have demonstrated exceptional adaptability in feature extraction and pattern recognition, making them ideal candidates for the task at hand. ResNet50 [10], with its deep residual learning framework, facilitates the training of networks that are substantially deeper than those used previously. MobileNetV2 [11], designed for mobile and embedded vision applications, offers an optimal balance between latency and accuracy. EfficientNetB7 [12], the largest variant in the EfficientNet family, achieves superior performance by scaling up the network in a structured manner. Integrating these models into our framework through transfer learning significantly advances Arabic handwritten text recognition, attesting to their versatility and power in deciphering the complex patterns inherent to Arabic script.

III. DATASETS

A. KHATT Dataset

We use the KHATT database [13], a comprehensive collection of handwritten Arabic text from 1,000 writers, with 2,000 unique paragraph images and 2,000 similar-text paragraph images (see Figure 1).

Fig. 1. KHATT dataset sample

B. AHAWP Dataset

The AHAWP dataset [14] provides a diverse collection of handwritten Arabic alphabets, words, and paragraphs from 82 individuals, with 53,199 alphabet images, 8,144 word images, and 241 paragraph images (see Figure 2).

Fig. 2. AHAWP dataset sample
C. LAMIS-MSHD Dataset

The LAMIS-MSHD dataset [15] is a multi-script offline handwritten database containing Arabic and French text samples, signatures, and digits from 100 writers (see Figure 3).

Fig. 3. LAMIS-MSHD dataset sample

TABLE I: COMPARISON OF DATASETS

Dataset      | Number of Writers | Paragraphs per User | Total Paragraphs | Words per User
KHATT        | 1,000             | 4                   | 4,000            | -
AHAWP        | 82                | 3                   | 246              | 100
LAMIS-MSHD   | 100               | 12                  | 1,200            | -
IV. PREPROCESSING

To advance person biometric identification using deep learning models for Arabic handwritten text recognition, a crucial preprocessing step is necessary to prepare the data for training and testing. Our proposed preprocessing pipeline consists of five stages: binarization, dilation, contour detection and sorting, filtering of small components, and ROI extraction.
A. Binarization

The first step in our preprocessing pipeline is binarization [16], which converts the grayscale images of Arabic handwritten paragraphs into binary images. This enhances the contrast between the text and the background, making it easier to extract features from the text. We applied Otsu's thresholding method to achieve binarization (see Figure 4).

Fig. 4. Binarization result
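As a minimal sketch of this step, the snippet below applies Otsu's method with OpenCV; the paper does not publish code, so the function name and the choice of inverse thresholding (white ink on a black background) are our assumptions.

```python
import cv2

def binarize(image_path):
    """Binarize a grayscale paragraph image with Otsu's method (illustrative sketch)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu's method selects the threshold automatically from the histogram;
    # THRESH_BINARY_INV makes the ink white on black, which suits dilation later.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return binary
```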
B. Dilation

After binarization, we applied dilation [17] to the binary images to fill in gaps between connected components. This step strengthens the connections between adjacent strokes in the Arabic handwritten text, making contours easier to detect (see Figure 5).

Fig. 5. Dilation result
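A minimal sketch of the dilation step is shown below; the kernel shape and size are assumptions, since the paper does not report them. A wide, short kernel tends to merge characters along the same line without merging neighboring lines.

```python
import cv2
import numpy as np

def dilate_text(binary):
    """Dilate the binarized image to bridge gaps between adjacent strokes (sketch)."""
    kernel = np.ones((3, 15), np.uint8)  # assumed kernel: wide to join characters on a line
    return cv2.dilate(binary, kernel, iterations=1)
```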
C. Contour Detection and Sorting

In the third stage, we detected contours in the dilated images using the Canny edge detection algorithm [18]. The detected contours were then sorted based on their size and orientation to identify the text lines. This step is crucial for separating the text into individual lines, which is essential for Arabic handwritten text recognition (see Figure 6).

Fig. 6. Contour Detection and Sorting result
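The sketch below illustrates one plausible realization of this stage with OpenCV; the Canny thresholds and the top-to-bottom sort key are assumptions, since the paper describes the sorting criteria (size and orientation) only at a high level.

```python
import cv2

def detect_text_lines(dilated):
    """Detect contours and sort them into reading order (illustrative sketch)."""
    edges = cv2.Canny(dilated, 50, 150)  # assumed thresholds
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Sorting by the bounding-box y-coordinate yields lines top-to-bottom.
    return sorted(contours, key=lambda c: cv2.boundingRect(c)[1])
```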
D. ROI Extraction with Filtering

Finally, we extracted the regions of interest (ROIs) [19] from the filtered images, which correspond to individual text lines. Each ROI represents a single line of Arabic handwritten text, which is then fed into the deep learning model for recognition. During ROI extraction, we also filter out small components that are unlikely to be part of the text (see Figure 7). Each extracted ROI is resized to a fixed size of 224x224 pixels to ensure uniformity for the deep learning model.

Fig. 7. ROI Extraction with Filtering result
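A compact sketch of this stage follows; the minimum-area threshold is an assumption (the paper does not state a value), and the crop coordinates come from the contour bounding boxes produced in the previous stage.

```python
import cv2

MIN_AREA = 500  # assumed threshold for discarding non-text specks

def extract_rois(gray, contours, size=(224, 224)):
    """Crop each detected line, drop tiny components, and resize for the model (sketch)."""
    rois = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h < MIN_AREA:  # filter components unlikely to be text
            continue
        rois.append(cv2.resize(gray[y:y + h, x:x + w], size))
    return rois
```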
E. Data Splitting

After preprocessing all the images, the data is split into training, validation, and testing sets. In this work, we adopt an 80/10/10 split, where 80% of the data is used for training the deep learning model, 10% is used for validation, and the remaining 10% is used for testing. The validation set allows for tuning the hyperparameters of the model, while the testing set provides an unbiased evaluation of the model's generalization performance on unseen data.

By adopting a rigorous data splitting approach, we ensure that our model is thoroughly trained, validated, and tested, thereby enhancing its reliability and effectiveness.
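The split could be realized as below. Whether the paper stratifies by writer is not stated; stratification is assumed here because every writer is a class and each split should contain samples from every writer. The variables images and writer_ids are assumed names for the preprocessed line ROIs and their matching writer labels.

```python
from sklearn.model_selection import train_test_split

# First hold out 20%, then split that holdout half-and-half into
# validation and test sets, giving an 80/10/10 split overall.
X_train, X_rest, y_train, y_rest = train_test_split(
    images, writer_ids, test_size=0.20, stratify=writer_ids, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=42)
```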
V. DATA AUGMENTATION

To increase the robustness of our deep learning models for Arabic handwritten text recognition, we employed several data augmentation techniques. These techniques artificially increase the size of the training dataset by applying transformations to the original images. (A code sketch of the three transformations follows the summary at the end of this section.)

A. Reduce Line Thickness

One of the augmentation techniques used is reducing the line thickness of the handwritten text. This technique helps to simulate the variability in writing styles and instruments. Figure 8 illustrates the effect of this technique on a sample image.

Fig. 8. Original image and after applying the reduce line thickness technique.

B. Apply Random Noise

Another technique used is adding random noise to the handwritten text. This technique helps to simulate the noise and distortions that may occur during image acquisition or transmission. Figure 9 shows the effect of this technique on a sample image.

Fig. 9. Original image and after applying the random noise technique.

C. Apply Random Stretch

We also applied random stretch to the handwritten text, horizontally stretching the image by a random factor between -0.9 and +0.1. This technique helps to simulate the variability in writing sizes and orientations. Figure 10 illustrates the effect of this technique on a sample image.

Fig. 10. Original image and after applying the random stretch technique.

D. Combined Augmentations (All Augmentations)

All three augmentation techniques were combined to further increase the diversity of the training dataset. The cumulative ratio of the final dataset size to the original dataset size, after applying each augmentation technique in turn, is summarized below:

Augmentation Technique   | Final Dataset Size Ratio
Reduce Line Thickness    | 2x
Apply Random Noise       | 3x
Apply Random Stretch     | 4x

These data augmentation techniques effectively quadrupled the dataset size when all techniques were combined. This increased diversity in the training data is expected to enhance the performance and robustness of our deep learning models across the datasets used for Arabic handwritten text recognition.
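A minimal sketch of the three transformations is given below. The paper does not specify implementations, so the erosion kernel, the noise distribution and strength, and the interpretation of the stretch factor (a width multiplier of 1 + f for f drawn from [-0.9, +0.1]) are all assumptions.

```python
import cv2
import numpy as np

def reduce_line_thickness(binary):
    """Thin the strokes slightly to mimic a finer writing instrument (sketch)."""
    return cv2.erode(binary, np.ones((2, 2), np.uint8), iterations=1)

def add_random_noise(gray, sigma=15.0):
    """Add Gaussian pixel noise to mimic acquisition artifacts (sigma assumed)."""
    noisy = gray.astype(np.float64) + np.random.normal(0.0, sigma, gray.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def random_stretch(gray, low=-0.9, high=0.1):
    """Rescale the width by a random factor 1 + f, with f ~ U(low, high) (sketch)."""
    factor = 1.0 + np.random.uniform(low, high)
    new_w = max(1, int(round(gray.shape[1] * factor)))
    return cv2.resize(gray, (new_w, gray.shape[0]))
```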
VI. METHODOLOGY

In this study, we leveraged the power of transfer learning to fine-tune three state-of-the-art deep learning models (ResNet50, MobileNetV2, and EfficientNetB7) for Arabic handwritten text recognition in the context of person biometric identification. By building upon the knowledge these models have already acquired, we aimed to create a robust and accurate recognition system that can effectively identify individuals.
A. Model Fine-Tuning

To adapt these pre-trained models to our specific task, we employed a fine-tuning approach. This involved modifying the final classification layer of each model to accommodate the number of classes in our dataset, allowing the models to learn features that are specific to our Arabic handwritten text dataset. As shown in Figure 11, we replaced the last fully connected layer of each model with a new one that matches the number of classes in our dataset. This enabled the models to produce predictions that are relevant to our task, rather than the generic predictions they were initially trained to make.

Fig. 11. Illustration of the fine-tuning process (Input → Pre-trained Model → Feature Maps → New FC Layer → Output Predictions).
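A sketch of the head replacement for one of the three backbones is shown below, using torchvision; the paper does not publish code, so this is one plausible realization under a recent torchvision version. For ResNet50 the analogous attribute is model.fc.

```python
import torch.nn as nn
from torchvision import models

def build_model(num_writers):
    """Load ImageNet-pretrained EfficientNet-B7 and replace its head (sketch)."""
    model = models.efficientnet_b7(weights=models.EfficientNet_B7_Weights.DEFAULT)
    # torchvision keeps EfficientNet's final fully connected layer at
    # classifier[1]; swap it so the output size equals the number of writers.
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, num_writers)
    return model
```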
B. Optimizing Model Performance

To optimize the performance of each model, we conducted a thorough training and hyperparameter tuning process. We trained each model on our augmented dataset with a batch size of 16 for a maximum of 10 epochs. We employed the Adam optimizer with an initial learning rate of 0.0001, decayed by a factor of 0.1 every 7 epochs using a step learning rate scheduler.
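These hyperparameters translate into the training setup sketched below. The loss function is not stated in the paper; cross-entropy is assumed, as is standard for multi-class writer identification. train_dataset and num_writers are assumed names, and build_model is the sketch from the previous subsection.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

loader = DataLoader(train_dataset, batch_size=16, shuffle=True)  # batch size 16
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_model(num_writers).to(device)

criterion = nn.CrossEntropyLoss()                      # assumed loss
optimizer = optim.Adam(model.parameters(), lr=1e-4)    # initial LR 0.0001
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)  # decay 0.1x every 7 epochs

for epoch in range(10):  # at most 10 epochs
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```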
VII. RESULTS

The initial experiments revealed overfitting across all datasets. To address this challenge, we implemented various data augmentation techniques, including adjusting line thickness, adding random noise, and randomly stretching the images. These techniques introduced diversity into the training data, which in turn facilitated better generalization. The combination of these augmentation methods significantly reduced overfitting, resulting in improved performance across all datasets.

A. AHAWP Dataset

TABLE II: COMPARISON OF MODELS WITH DIFFERENT AUGMENTATIONS FOR THE AHAWP DATASET

Without Augmentation
Model           | Train Acc | Val Acc | Test Acc
ResNet          | 100.00%   | 87.50%  | 87.84%
MobileNet       | 99.00%    | 82.81%  | 82.40%
EfficientNet B7 | 98.74%    | 75.00%  | 75.00%

All Augmentations
Model           | Train Acc | Val Acc | Test Acc
ResNet          | 99.35%    | 90.95%  | 90.82%
MobileNet       | 99.88%    | 94.82%  | 94.40%
EfficientNet B7 | 99.90%    | 98.57%  | 98.52%

EfficientNet B7 consistently outperformed the other models, exhibiting the highest test accuracy across all augmentation techniques (see Table II). Upon the application of all augmentations, a substantial improvement in performance was observed across all models. Notably, ResNet's test accuracy improved by 3.42%, MobileNet's by 14.56%, and EfficientNet B7's by an impressive 31.36%. These findings underscore the pivotal role of augmentations in bolstering the efficacy of image classification models, with EfficientNet B7 showing the most significant enhancement.

B. KHATT Dataset

TABLE III: COMPARISON OF MODELS WITH DIFFERENT AUGMENTATIONS FOR THE KHATT DATASET

No Augmentation
Model           | Train Acc | Test Acc | Val Acc
ResNet          | 94.48%    | 31.99%   | 29.64%
MobileNet       | 76.28%    | 37.94%   | 36.58%
EfficientNet B7 | 91.77%    | 32.45%   | 31.90%

All Augmentations
Model           | Train Acc | Test Acc | Val Acc
ResNet          | 99.84%    | 98.47%   | 98.71%
MobileNet       | 98.76%    | 97.08%   | 97.28%
EfficientNet B7 | 99.76%    | 99.15%   | 99.12%

EfficientNet B7 emerged as the preeminent model across all augmentation scenarios, achieving the highest test and validation accuracy (see Table III). Following augmentation, all models exhibited substantial performance enhancements. Specifically, ResNet's test accuracy surged by 66.48%, MobileNet's by 59.14%, and EfficientNet B7's by an impressive 66.70%. These results underscore the efficacy of augmentations in fortifying the performance of image classification models, with all models demonstrating substantial progress.

Our model demonstrates a significant enhancement, improving on the accuracy reported in [7] by approximately 4.95% and surpassing the accuracy reported in [9] by approximately 2.85%.
C. LAMIS-MSHD Dataset

TABLE IV: COMPARISON OF MODELS WITH DIFFERENT AUGMENTATIONS FOR THE LAMIS-MSHD DATASET

No Augmentation
Model           | Train Acc | Test Acc | Val Acc
ResNet          | 99.71%    | 79.96%   | 84.10%
MobileNet       | 98.93%    | 75.16%   | 75.10%
EfficientNet B7 | 99.52%    | 83.20%   | 82.15%

All Augmentations
Model           | Train Acc | Test Acc | Val Acc
ResNet          | 99.89%    | 99.37%   | 99.22%
MobileNet       | 99.80%    | 98.48%   | 99.37%
EfficientNet B7 | 99.52%    | 99.79%   | 99.63%

EfficientNet B7 emerged as the top-performing model for the LAMIS-MSHD dataset, maintaining the highest test accuracy both before and after augmentation (see Table IV). Following augmentation, ResNet's test accuracy improved by 19.41%, MobileNet's by 23.32%, and EfficientNet B7's by 17.48%. The consistently high performance of EfficientNet B7 across all datasets underscores its robustness and effectiveness in handwritten text recognition tasks.
VIII. CONCLUSION

In this work, we presented a comprehensive approach to Arabic handwritten text recognition for person biometric identification using deep learning. We addressed the challenge of accurately recognizing Arabic handwritten text, which is crucial for applications such as forensic document examination, identity verification, and access control. To solve this problem, we developed a robust preprocessing pipeline and employed data augmentation techniques to improve the accuracy and robustness of the recognition process. The combination of these techniques plays a crucial role in achieving significant improvements in recognition accuracy.

We also investigated the effectiveness of transfer learning, reducing training time and achieving recognition accuracy of up to 99.79%. Furthermore, our study explored the relationship between the number of writers and the system's accuracy, providing insights into the scalability of the proposed approach.

The results demonstrate the feasibility and potential of using Arabic handwritten text for person biometric identification, with our techniques leading to significant improvements in recognition accuracy. These findings pave the way for further advancements in Arabic handwriting recognition and its applications in biometric identification, and the proposed method can be extended to other languages and handwriting styles, making it a versatile solution for security and authentication systems.

REFERENCES

[1] S. A. Grosz, A. K. Jain, AFR-Net: Attention-driven fingerprint recognition network, IEEE Transactions on Biometrics, Behavior, and Identity Science 6 (1) (2024) 30-42. doi:10.1109/TBIOM.2023.3317303.
[2] F. Liu, D. Chen, F. Wang, Z. Li, F. Xu, Deep learning based single sample face recognition: a survey, Artificial Intelligence Review 56 (3) (2023) 2723-2748. doi:10.1007/s10462-022-10240-2.
[3] Y. Yin, S. He, R. Zhang, H. Chang, X. Han, J. Zhang, Deep learning for iris recognition: A review (2023). arXiv:2303.08514.
[4] B. Saritha, M. A. Laskar, A. M. Kirupakaran, R. H. Laskar, M. Choudhury, N. Shome, Deep learning-based end-to-end speaker identification using time-frequency representation of speech signal, Circuits, Systems, and Signal Processing 43 (3) (2024) 1839-1861. doi:10.1007/s00034-023-02542-9.
[5] Y. Mohamed, A. M. Anter, A. B. Zaky, Recurrent neural networks (RNNs) to improve EEG-based person identification, in: 2023 Intelligent Methods, Systems, and Applications (IMSA), 2023, pp. 616-621. doi:10.1109/IMSA58542.2023.10217750.
[6] S. He, L. Schomaker, GR-RNN: Global-context residual recurrent neural networks for writer identification (2021). arXiv:2104.05036.
[7] S. Saleem, A. Mohsin Abdulazeez, Hybrid trainable system for writer identification of Arabic handwriting, Computers, Materials and Continua 68 (05 2021). doi:10.32604/cmc.2021.016342.
[8] S. Lazrak, A. Semma, A. E. K. Noureddine, Y. Elkettani, D. Mentagui, Writer identification using textural features, ITM Web of Conferences 43 (2022) 01027. doi:10.1051/itmconf/20224301027.
[9] B. Q. Ahmed, Y. F. Hassan, A. S. Elsayed, Offline text-independent writer identification using a codebook with structural features, PLOS ONE 18 (4) (2023) 1-31. doi:10.1371/[Link].0284680.
[10] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition (2015). arXiv:1512.03385.
[11] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: Inverted residuals and linear bottlenecks (2019). arXiv:1801.04381.
[12] M. Tan, Q. V. Le, EfficientNet: Rethinking model scaling for convolutional neural networks (2020). arXiv:1905.11946.
[13] S. Mahmoud, I. Ahmad, W. Al-Khatib, M. Alshayeb, M. Parvez, V. Märgner, G. Fink, KHATT: An open Arabic offline handwritten text database, Pattern Recognition 47 (2014) 1096-1112. doi:10.1109/ICFHR.2012.224.
[14] M. Khan, Arabic handwritten alphabets, words and paragraphs per user (AHAWP) (2022). doi:10.17632/2h76672znt.2.
[15] C. Djeddi, A. Gattal, L. Souici-Meslati, I. Siddiqi, Y. Chibani, H. El Abed, LAMIS-MSHD: A multi-script offline handwriting database, in: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2015, pp. 1146-1150.
[16] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybern. 9 (1979) 62-66.
[17] T. Zhang, C. Suen, A fast parallel algorithm for thinning digital patterns, Commun. ACM 27 (1984) 236-239.
[18] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8 (6) (1986) 679-698. doi:10.1109/TPAMI.1986.4767851.
[19] R. Lienhart, J. Maydt, An extended set of Haar-like features for rapid object detection, Vol. 1, 2002, pp. I-900. doi:10.1109/ICIP.2002.1038171.