Lung Cancer Detection Using Digital Image Processing On CT Scan Images
Abstract— Biomedical image processing is an emerging tool in medical research used for the early detection of cancers. Artificial Intelligence can be used in the medical field to diagnose diseases at an early stage. Computed Tomography (CT) scans of the lungs of patients from the Lung Image Database Consortium (LIDC) are used as input data for image processing. In the pre-processing stage, RGB images are converted to gray-scale because RGB images are too complex to process. The gray-scale image is further converted to a binary image. After image processing, the input images are more refined and more efficient to process. These are the input for the Convolution Neural Network (CNN). Convolution filtering and max pooling are steps in the CNN, which is trained to predict whether a lung image is cancerous (malignant) or non-cancerous (benign). Deep Learning, a newer branch of Artificial Intelligence research, will help improve the performance of CNN based systems. The proposed system will also take into account the processing power and time delay of the cancer detection process for efficiency.

Keywords— CNN, Deep Learning, LIDC, image processing, CT scan, watershed segmentation.

I. INTRODUCTION

Lung cancer is the most common cancer among men and the third most common cancer in women. 85% of all lung cancer cases are related to smoking and consumption of tobacco. Around 20% of all cancer mortality is due to lung cancer. 2.1 million new cases were registered and 1.8 million deaths were recorded in 2018 alone. The five-year survival rate for tumors detected at an early stage is 56%, i.e., if the tumor is detected at an early stage, treatment can prevent mortality and spread of the tumor for at least five years, if not longer. Early detection of tumors can decrease the mortality rate by 20%. Cancer cells multiply as the disease spreads, and tumors increase in size gradually. A large proportion of cancer patients are from an economically poor background and might not be able to afford healthcare, expertise or imaging, let alone consultation and diagnosis. Artificial Intelligence will play an important role in a demographic where expensive health care can be replaced by a computerized system.

Image processing is a method of converting an image into digital form and performing operations on it in order to obtain an enhanced image or to extract useful information from it. It includes acquiring input images, noise removal, enhancement and segmentation. In image processing, the RGB images are converted to grayscale and then to binary. Images are enhanced and noise is removed using filters. Blurring, if any, is eliminated. This improves the quality of the input image [13].

The Image Processing Toolbox in MATLAB is used to perform the image processing stages. Many different algorithms are possible for each stage of image processing [14].

Deep Learning is used for the classification of CT scan images as cancerous or non-cancerous. In Convolution Neural Networks, features are defined and computed by the algorithm itself. During the training stage, an input and an output label are provided. Based on the given data, the algorithm analyses the features and patterns in the training data and forms a set of parameters for feature extraction [15]. Based on these computations, new data can be tested for prediction of the correct output. Convolution Neural Networks consist of an input layer, an output layer and multiple hidden layers. The input layer accepts the inputs, and the size of the output layer defines the number of outputs in the result. Convolution layers are used to define features and parameters. Pooling layers group together computations over neighbouring pixels. The convolution filter forms a spatially dense output by assigning a common value to a set of matrix pixels. These values decide the output for that image.
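The grayscale and binary conversion steps outlined in this introduction can be illustrated with a small sketch. This is not the paper's MATLAB code: the function names are hypothetical, the luma weights (0.299, 0.587, 0.114) are the common ITU-R BT.601 coefficients, and the threshold of 128 is an arbitrary assumption.

```python
# Illustrative sketch only; names, weights and threshold are assumptions,
# not values taken from the paper.

def rgb_to_gray(image):
    """Convert rows of (r, g, b) tuples (0-255) to rows of gray levels (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def binarize(gray, threshold=128):
    """Map gray levels above the threshold to 1 (object) and the rest to 0 (background)."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# A tiny 2x2 RGB image used only for demonstration.
rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(200, 180, 190), (30, 40, 20)],
]
gray = rgb_to_gray(rgb)
binary = binarize(gray)
print(gray)
print(binary)
```

Collapsing three channels into one gray level, and then into a single bit per pixel, is what reduces the amount of data the later stages have to process.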
II. RELATED WORK

In 2012, Mokhled S. Al Tarawneh published a paper comparing different image processing techniques and the algorithms they use in a CAD system for lung cancer detection. The main aim of the paper is to detect features for accurate comparisons between images with different processing techniques [1]. The three steps are image enhancement, segmentation and feature extraction. The aim of image enhancement is to improve the quality of the image to provide better input for classification. The Gabor filter, auto enhancement and the fast Fourier transform are used for enhancement. Thresholding and watershed methods are used for segmentation, of which watershed provides a better quality of segmentation. Feature extraction uses the binarization and masking approaches. Binarization and masking, when implemented together, give an optimal result.

In December 2017, Suren Makaju, P.W.C. Prasad, Abeer Alsadoon and A.K. Singh worked on a CAD system for lung cancer with CT scan images as the primary focus. They consider CT scan images the best input data for this research [2]. The proposed model uses noise removal algorithms before image processing. It uses the same segmentation as the present system, i.e., the watershed algorithm, and promotes a well-defined feature extraction before classification using SVM. The authors used images from the LIDC dataset, and the system gives 92% accuracy and 50% specificity.

In May 2015, Md. Badrul Alam Miah and Mohammad Abu Yousuf proposed a neural network based CAD system for early detection and diagnosis. ANN and fuzzy clustering, IP, Curvelet transform, multinomial Bayesian algorithm, back propagation, gray-coefficient mass estimation and SVM are the basis of these observations. The goal is to create a fast, robust and more accurate system having a rotation, scaling and translation variant feature extraction [3]. A dataset of 300 images acquired from hospitals is used. The steps in the proposed system are image acquisition, processing, binarization, segmentation using thresholding, feature extraction and neural network classification. The steps in image processing are grayscale conversion, normalization, noise reduction, binarization and removal of the unwanted portion of the image. Feature extraction uses features like the center of the image, the ratio of height to width, the average distance between black pixels and the center, etc. A neural network with two outputs is used for classification. The system gives an accuracy of 96.67%, higher than all existing systems.

In March 2014, Prof. Sanjeev N. Jain and Bhagyashri G. Patil proposed a few methods to detect cancerous cells from CT scans of lungs. The purpose of this paper is to find the cancerous cells and give a more accurate result by using various segmentation techniques such as thresholding and watershed segmentation. In thresholding, a threshold value is set to differentiate the object of interest from the background. If the pixel value is greater than the threshold value, the pixel belongs to the object; otherwise it belongs to the background. Thus the region of interest can be extracted using the thresholding approach. In watershed segmentation, the background and the object are separated using different markers: internal markers associated with the object of interest and external markers associated with the background. It is a simple, intuitive and fast method [4]. According to the research, the watershed segmentation approach has higher accuracy (85.27%) than the thresholding approach (81.24%).

In 2017, Pooja R. Katre and Anuradha Thakare described various image processing techniques for detecting lung cancer. In their proposed approach, a median filter is used for noise removal and enhancement. The main advantage of the median filter is that it removes noise without blurring the image. It preserves the edges of the regions [5]. It is used to remove salt-and-pepper noise from the image. In the enhancement stage, a Gabor filter is used as it gives a better result than fast Fourier and auto enhancement. The purpose of this paper is to detect the tumour at an early stage. CT scan images are taken as input. Image processing is followed by the feature extraction stage, in which the area, perimeter and eccentricity of the image are calculated. A Support Vector Machine algorithm is used to classify the data. The features mentioned above help to identify the size of the tumour and, from that, the stage of the cancer is detected.

In 2017, Lei Fan and his group of researchers from China made use of a deep learning algorithm for CAD lung cancer detection. In this paper, image processing is not applied to the CT scans of lungs. The images are fed directly as input to a convolutional neural network which consists of two convolutional layers, two max pooling layers, one fully connected layer and one output layer [6]. Rectified linear units (ReLU) are applied between the convolutional and max pooling layers. The system gives an overall accuracy of 67.7%. It concludes that support vector machines have lower classification accuracy than the 3D convolutional neural network for the same number of input samples.

In 2017, Qing Wu and Wenbing Zhao published a paper proposing a CAD based lung cancer detection system. In their system, a neural network based algorithm, EDM (Entropy Degradation Method), is proposed to detect SCLC (small cell lung cancer) from CT scan images [7]. The training and testing data are lung CT scan images provided by the National Cancer Institute. Five scans from each group are randomly selected to train the model. The images with SCLC are labelled as cluster 1 while the others are labelled as cluster 0. The paper proposes early stage detection of lung cancer.

In another paper, by Anita Chaudhary and Sonit Sukhraj Singh, the proposed CAD system has techniques to detect the tumour at an early stage using image processing. In the image enhancement stage, the Gabor filter enhancement technique, the auto enhancement technique and the fast Fourier transform technique are used, of which the Gabor filter enhancement technique is the most suitable. According to the research, the watershed segmentation approach is more accurate than the thresholding segmentation approach [8].
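Several of the systems surveyed above rely on a median filter to remove salt-and-pepper noise while preserving edges. The following is a minimal pure-Python sketch of the idea, not the implementation used in any of the cited papers. The function name, the 3 x 3 window and the border handling (using only the part of the window that lies inside the image) are assumptions made for illustration.

```python
# Illustrative sketch only; window size and border handling are assumptions.

def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood.

    Near the border, only the part of the window inside the image is used.
    For an even number of samples the upper of the two middle values is taken.
    """
    height, width = len(image), len(image[0])
    half = size // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = sorted(
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            )
            row.append(window[len(window) // 2])
        result.append(row)
    return result


# A flat gray patch with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
clean = median_filter(noisy)
print(clean)
```

Because the median ignores isolated extreme values rather than averaging them in, the two outliers disappear without smearing into their neighbours, which is why the filter does not blur edges the way a mean filter would.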
2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS)
median filter. The median filter is more efficient because it removes noise without distorting the image edges. The MATLAB function for the median filter is medfilt2. The result of pre-processing is a normalized (uniform) collection of binary images that forms a better-quality dataset. The processed images are fed as input to the CNN.

B. Convolution Neural Network

Previous implementations used a 3D CNN architecture consisting of two different convolution layers to obtain two sets of feature maps. The second layer consists of two different max-pooling layers applied to the feature maps. A convolution layer after the max-pooling layer gives a set of resized feature maps. A second max-pooling layer is applied to each feature map. This helps connect the convolution layer to multiple frames of data. The final layers are fully connected layers and the dropout layer. ReLU activation is needed in each layer of the architecture. The entire architecture is bound by a final fully-connected layer and a softmax layer with a constant learning rate. The architecture delivers an accuracy of 67.7%.

The CanNet architecture uses an input layer in which images are 3D concatenated to create a linear volume. The input layer is followed by two convolution layers, a max-pooling layer, a fully-connected layer and an output layer. The first convolution layer produces 78 features; the second convolution layer detects patterns in those features, so that feature extraction is learned hierarchically from the previous layer. Both layers are followed by a ReLU layer to rectify all negative activations to zero. The max-pooling layer reduces data size by reducing data dimensions. A dropout layer randomly skips neurons in the CNN so that newer data is tested on different neurons, minimizing the influence of previous testing on new testing data in case of similarities. The fully-connected layer reduces the architecture to two neurons, one for each of the desired outputs, benign and

architecture consists of 16 layers, as additional adjacent layers decrease the chances of errors in performance. The list of layers is as follows:

1) imageInputLayer([227 227 3])
2) convolution2dLayer(5,20)
3) reluLayer
4) maxPooling2dLayer(2,'stride',2)
5) convolution2dLayer(5,20)
6) reluLayer
7) maxPooling2dLayer(2,'stride',2)
8) convolution2dLayer(5,20)
9) reluLayer
10) maxPooling2dLayer(2,'stride',2)
11) fullyConnectedLayer(3)
12) softmaxLayer
13) classificationLayer()

Input Layer provides the input image to the CNN. The input to the network is a pre-processed image with zero-center normalization and no other transformations applied. The image size is reduced to 227 x 227 pixels with 3 channels.

Convolution Layer uses two parameters: filter size and number of filters. A 2-D convolution layer is applied, which convolves the filters vertically and horizontally through the image. It calculates the dot product of the weights and the input for each feature, and adds a bias term, which is set to its default value.

Rectified Linear Unit (ReLU) Layer applies the max function f(x)=max(x,0) to the matrix of the convolved image after convolution. It sets all the negative values in the dot products of the matrix to 0. All other values are unchanged. It increases the speed of training the network by removing negative activations, thus avoiding computations on negative values.
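The convolution and ReLU steps described for the proposed architecture can be illustrated on a tiny example. This is a minimal sketch, not the MATLAB layers themselves: the function names, the 4 x 4 input and the single 2 x 2 filter are hypothetical (the paper's layers use twenty 5 x 5 filters), and no padding with a stride of 1 is assumed. As in most deep learning libraries, the "convolution" here slides the filter without flipping it.

```python
# Illustrative sketch only; the input, filter and bias are made-up values.

def conv2d_valid(image, kernel, bias=0.0):
    """Slide a square kernel over the image (no padding, stride 1).

    Each output value is the dot product of the kernel weights with the
    image patch under it, plus a bias term.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                kernel[j][i] * image[y + j][x + i]
                for j in range(k)
                for i in range(k)
            ) + bias
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Apply f(x) = max(x, 0) element-wise: negatives become 0, others are unchanged."""
    return [[max(value, 0) for value in row] for row in feature_map]


image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, -1],
    [-1, 1],
]  # hypothetical 2x2 filter
features = conv2d_valid(image, kernel)
activated = relu(features)
print(features)
print(activated)
```

A 4 x 4 input and a 2 x 2 filter give a 3 x 3 feature map; in a trained network the filter weights and bias would be learned rather than fixed by hand.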
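The layer list configures each pooling step as maxPooling2dLayer(2,'stride',2), i.e. a 2 x 2 window moved two pixels at a time. A minimal sketch of that operation follows. The function name and the sample feature map are hypothetical, and windows that would run past the edge of the input are simply skipped, which is an assumption.

```python
# Illustrative sketch only; the feature map values are made up.

def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window, stepping by stride."""
    pooled = []
    for y in range(0, len(feature_map) - size + 1, stride):
        row = []
        for x in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(
                max(
                    feature_map[y + j][x + i]
                    for j in range(size)
                    for i in range(size)
                )
            )
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 3, 4],
]
pooled = max_pool(feature_map)
print(pooled)
```

With a window size equal to the stride, the windows do not overlap and each spatial dimension is halved, so a 4 x 4 map becomes 2 x 2.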
overlapping of sub-regions. The Convolution, ReLU and Max-Pooling layers are applied three times in the sequence: Convolution Layer -> ReLU Layer -> Max-Pooling Layer.

Fully Connected Layer is applied after the sequence of Convolution, ReLU and Max-Pooling layers. The input to the fully connected layer is a meaningful, low-dimensional invariant feature space. The fully connected layer is used to get a non-linear combination of the features. It holds a feature vector for the input, which is needed for classification or regression and categorization. The size passed to the layer, '3', specifies the three types of desired outputs. A transfer function 'tf' shows the input-output relation in the layer. The fully connected layer gives end-to-end training to the network.

Softmax Layer applies a softmax function to the output. The softmax function converts the feature vector produced by the fully connected layer into a probability distribution over the predicted output classes.

Classification Layer assigns a probability value obtained from the softmax layer to each of the mutually exclusive output

dataset is pre-processed and stored as a training dataset. Pre-processing involves grayscale conversion, noise removal and segmentation. This dataset is trained in the CNN with its labels. The remaining 30 percent of images are used as a testing dataset. Images from the testing dataset are pre-processed and sent into the neural network for classification. The end result is the detection of a normal, benign or malignant case, along with the tumor edge detected using the watershed algorithm.

IV. IMPLEMENTATION

A. Implementation in MATLAB

The proposed system is implemented in MATLAB. MATLAB is a high-performance computing environment and language for productivity, research, development and analysis in the fields of mathematics, science and computer technology. The LIDC database consists of images in DICOM format, which are converted to .jpeg using the open source software MicroDicom converter. These images were sorted and labels were generated. The image pre-processing in MATLAB takes the dataset directory, applies the pre-processing steps to the images and saves them in another folder. The Deep Learning Toolbox in MATLAB provides support for Deep Learning and CNNs. The user interface can be designed in MATLAB using GUIDE, the GUI Development Environment. Two-dimensional or three-dimensional graphs can be plotted in MATLAB with ease.
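The softmax step described for the Softmax Layer of the architecture turns the three outputs of the fully connected layer into class probabilities. A minimal sketch, assuming hypothetical scores and an assumed class order of normal, benign and malignant (the paper names these three outcomes but does not give their order):

```python
import math

# Illustrative sketch only; scores and class order are assumptions.

def softmax(scores):
    """Map raw scores to probabilities that are positive and sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["normal", "benign", "malignant"]  # assumed order
scores = [1.0, 2.0, 0.5]  # hypothetical fully connected outputs
probabilities = softmax(scores)
predicted = class_names[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```

The classification step then reports the class with the highest probability, since the three classes are mutually exclusive.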
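The cumulative results reported for the proposed system (256 true positives, 23 false positives, 512 true negatives and 23 false negatives) determine its summary metrics through the standard confusion-matrix definitions. The following sketch recomputes them. The function name is hypothetical, and the formulas are the usual textbook ones rather than code taken from the paper.

```python
# Illustrative sketch only; counts are the ones reported in the results.

def classification_metrics(tp, fp, tn, fn):
    """Return the standard metrics derived from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
        "precision": tp / (tp + fp),    # share of predicted positives that are correct
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


metrics = classification_metrics(tp=256, fp=23, tn=512, fn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Rounded to two decimals, this gives about 91.76%, 95.70%, 91.76% and 94.35%, which agrees with the reported figures to within 0.01 percentage points; the small differences are consistent with the reported values being truncated rather than rounded.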
True positives: 256     False positives: 23
True negatives: 512     False negatives: 23

Table 4. Cumulative Results

Sensitivity     91.75%
Specificity     95.70%
Precision       91.75%
Accuracy        94.34%

An overall accuracy of 94.34% is obtained from the proposed system.

VI. FUTURE SCOPE

The LIDC dataset is a static dataset. Maintaining a dynamic real-time database will help study the changes in lung cancer cases over a time period. Increasing the number of convolutions in deep learning will improve the results of the system to varying degrees, but will also degrade efficiency and increase the processing time delay exponentially. The batch size for training the neural network can be increased on a high-performance GPU. The system can be tested on a set of different databases like the RIDER or TCIA database. More accurate feature plotting is possible with boundary detection in tumor cells. The overall accuracy of the system can be further improved with Deep Learning.

VII. REFERENCES

[1] Md. Badrul Alam Miah et al., “Detection of lung cancer from CT Image using Image Processing and Neural Networks”, 2015 International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka, 2015, pp. 1-6.
[2] P. R. Katre and A. Thakare, “Detection of Lung Cancer Stages using Image Processing and Data Classification Techniques”, 2017 2nd International Conference for Convergence in Technology (I2CT), Mumbai, 2017, pp. 402-404.
[3] L. Fan et al., “Lung nodule detection based on 3D Convolutional Neural Networks”, 2017 International Conference on the Frontiers and Advances in Data Science (FADS), pp. 7-10.
[4] Vijay A. Gajdhane and L. M. Deshpande, “Detection of Lung Cancer Stages on CT scan Images by Using Various Image Processing Techniques”, IOSR Journal of Computer Engineering (IOSR-JCE), Volume 16, Issue 5 (Sept-Oct 2014), pp. 28-35, e-ISSN 2278-0661, p-ISSN 2278-8727.
[5] Mokhled S. Al Tarawneh, “Lung Cancer Detection Using Image Processing Techniques”, Leonardo Electronic Journal of Practices and Technologies, Issue 20, pp. 147-158, January-June 2012, ISSN 1583-1078.
[6] Suren Makaju et al., “Lung Cancer Detection using CT scan images”, 6th International Conference on Smart Computing and Communications, ICSCC 2017, 7-8 December 2017, Kurukshetra, India.
[7] Sanjeev Jain and Bhagyashri G. Patil, “Cancer cells detection using Digital Image Processing Methods”, International Journal of Latest Research in Science and Technology, Volume 3(4), March 2014, pp. 45-49.
[8] Q. Wu and W. Zhao, “Small-Cell Lung Cancer Detection Using a Supervised Machine Learning Algorithm”, 2017 International Symposium on Computer Science and Intelligent Controls (ISCSIC), Budapest, 2017, pp. 88-91.
[9] D. P. Kaucha et al., “Early Detection of Lung Cancer using SVM Classifier in Biomedical Image Processing”, IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017), Chennai, 2017, pp. 3143-3148.
[10] A. Chaudhary and S. S. Singh, “Lung Cancer Detection on CT images by using Image Processing”, 2012 International Conference on Computing Science, Phagwara, 2012, pp. 142-146.
[11] S. Kalaivani et al., “Lung Cancer Detection Using Digital Image Processing and Artificial Neural Networks”, International Conference on Electronic, Communication and Aerospace Technology (ICECA), Coimbatore, 2017, pp. 100-103.
[12] P. Rao et al., “Convolution neural networks for lung cancer screening in computed tomography (CT) scans”, 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), Noida, 2016, pp. 489-493.
[13] R. Golan et al., “Lung nodule detection in CT images using deep convolutional neural networks”, 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, 2016, pp. 243-250.
[14] W. Alakwaa et al., “Lung Cancer Detection and Classification with 3D Convolutional Neural Network (3D-CNN)”, International Journal of Advanced Computer Science and Applications (IJACSA), vol. 8, no. 8, 2017, pp. 109-117.
[15] S. Prakash et al., “An Automated System for the Detection of Lung Cancer in CT data at Early Stages: Review”, International Journal of Control Theory and Applications, 2017, pp. 133-144.
[16] https://en.wikipedia.org/wiki/Median_filter, last accessed on 01/10/2018.
[17] https://in.mathworks.com/help/images/marker-controlled-watershed-segmentation.html, last accessed on 19/08/2018.
[18] https://www.google.co.in/search?q=convolution+neural+networks&oq=convolution+neural+networks&gs_1=psy, last accessed on 05/10/2018.
[19] https://www.wikipedia.com/lung_cancer, last accessed on 29/01/2019.
[20] https://en.wikipedia.org/wiki/Sensitivity_and_specificity, last accessed on 14/02/2019.
[21] https://in.mathworks.com/help/deeplearning/index.html, last accessed on 14/02/2019.
[22] C.V. Arulkumar et al., “Secure Communication in Unstructured P2P Networks based on Reputation Management and Self Certification”, International Journal of Computer Applications, vol. 15, pp. 1-3, 2012.
[23] C.V. Arulkumar, G. Selvayinayagam and J. Vasuki, “Enhancement in face recognition using PFS using Matlab”, International Journal of Computer Science & Management Research, vol. 1(1), pp. 282-288, 2012.
[24] H. Anandakumar and K. Umamaheswari, “Supervised machine learning techniques in cognitive radio networks during cooperative spectrum handovers”, Cluster Computing, vol. 20, no. 2, pp. 1505–1515, Mar. 2017.
[25] V. Arulkumar, “An Intelligent Technique for Uniquely Recognising Face and Finger Image Using Learning Vector Quantisation (LVQ)-based Template Key Generation”, International Journal of Biomedical Engineering and Technology, vol. 26, no. 3/4, pp. 237-249, February 2, 2018.