FRAMEWORK FOR CLASSIFYING CLINICAL DATASETS USING GENETIC
ALGORITHM, COEVOLUTION AND NEURAL NETWORK
GUIDED BY DR. H. KHANNA NEHEMIAH
NAME: J. BRIGHTY
REG NO: 2015207029
SIGNATURE
PROBLEM STATEMENT OF PHASE 1
Clinical datasets are often found to contain missing values, which are among the most
problematic of data quality issues. Missing values can be due to data entry errors, unknown
values, loss of data or data collection limitations. For smaller datasets, each missing value is
imputed with the most frequent value among the five nearest neighbours, found using the
Euclidean distance. For larger datasets, listwise deletion is performed on instances that
contain missing values. Not all features and instances need to be considered at all times, and
larger datasets increase the computational cost and the search space. To increase the
efficiency of the classification process and to reduce the classification error rate, data
reduction techniques can be used. The data reduction subsystem uses cooperative coevolution,
which makes use of the CHC genetic algorithm, to perform feature selection and instance
selection together. The classification subsystem applies the extreme learning machine (ELM)
technique to the resulting dataset. The aim is to analyse clinical data in order to assist
physicians in diagnosis and to support clinical decision making in health care.
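The imputation step described above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function name knn_mode_impute is hypothetical, missing values are assumed to be marked with None, all features are assumed numeric, and distances are computed only over features present in both rows.

```python
import math
from collections import Counter


def knn_mode_impute(rows, k=5):
    """Return a copy of rows in which each None entry is replaced by the most
    frequent value of that feature among the k nearest rows (Euclidean distance)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            scored = []
            for r, other in enumerate(rows):
                # A neighbour must be a different row with feature j present.
                if r == i or other[j] is None:
                    continue
                # Assumption: compare only the features observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                scored.append((dist, r))
            scored.sort()  # nearest first; equal distances ordered by row index
            neighbours = [rows[r][j] for _, r in scored[:k]]
            if neighbours:
                # Ties between equally frequent values go to the nearer neighbour.
                filled[i][j] = Counter(neighbours).most_common(1)[0][0]
    return filled
```

Entries with no usable neighbour are left as None, so a caller can still fall back to listwise deletion for those instances.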
PROBLEM STATEMENT OF PHASE 2
In Phase 2, data reduction is carried out using the Lion's Algorithm and the Wolf
Algorithm instead of the coevolution technique, and the results are compared. The Lion's
Algorithm searches for an optimal solution in a way that resembles the social behaviour of
lions. The Wolf Algorithm searches in the way grey wolves diverge to find the prey and
converge to attack it. Classification is done using an ensemble method in which classifiers
built using a neural network, a decision tree and a Bayesian classifier are aggregated, and a
majority vote gives the final output.
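The diverge-and-converge search of the Wolf Algorithm can be illustrated with the position update of the Grey Wolf Optimizer. The sketch below minimises a continuous objective and is only an illustration of the search mechanism: the function name grey_wolf_optimize, the bounds, the population size and the iteration count are assumptions, and using it for feature or instance selection would additionally need a binary encoding that is not shown here.

```python
import random


def grey_wolf_optimize(objective, dim, low=-5.0, high=5.0,
                       n_wolves=20, n_iter=100, seed=0):
    """Minimise objective over [low, high]^dim; return (best_position, best_score)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_score = None, float("inf")
    for t in range(n_iter + 1):
        # The three best wolves (alpha, beta, delta) lead the pack.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]
        score = objective(leaders[0])
        if score < best_score:
            best_pos, best_score = list(leaders[0]), score
        if t == n_iter:
            break
        # a falls linearly from 2 to 0: |A| > 1 lets wolves diverge (search for
        # prey), |A| < 1 makes them converge on the leaders (attack the prey).
        a = 2.0 * (1.0 - t / n_iter)
        new_wolves = []
        for wolf in wolves:
            position = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    total += leader[d] - A * D
                # Average the three leader-guided moves and keep within bounds.
                position.append(min(high, max(low, total / 3.0)))
            new_wolves.append(position)
        wolves = new_wolves
    return best_pos, best_score
```

For example, grey_wolf_optimize(lambda x: sum(v * v for v in x), dim=3) searches for the minimum of a simple sphere function.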
BRIEF SUMMARY OF RELATED WORK
Jose Ramon Cano et al. showed that evolutionary instance selection algorithms
consistently outperform non-evolutionary ones, the main advantages being better instance
reduction rates, higher classification accuracy and models that are easier to interpret. Derrac
et al. proposed an algorithm that employs a divide and conquer strategy in which each
population can focus its efforts on solving a part of the problem. The approach increases the
collaborative fitness value shared among individuals. Kindie et al. reported that their hybrid
system of fuzzy sets and an extreme learning machine has better generalization performance
than a Back Propagation Neural Network and takes less time to train. [Link] proposed a new
nature-inspired Lion's Algorithm, based on the social behaviour of lions, that helps in
searching out highly optimal solutions from a huge solution space. Seyedali Mirjalili et al.
proposed the Grey Wolf Algorithm, which mimics the leadership hierarchy and hunting
mechanism of grey wolves in nature and is applicable to challenging problems with unknown
search spaces. [Link] et al. proposed improving the efficiency of classification on the
Arrhythmia dataset using an ensembling method.
BLOCK DIAGRAM OF PHASE 1
Clinical dataset (Arrhythmia)
  -> Pre-processing
       - Data cleaning using KNN
       - Data normalization using Min-Max normalization
       - Feature selection and instance selection using cooperative coevolution
  -> Pre-processed dataset
       - Training data -> ELM classifier -> Trained neural network
       - Test data -> Classification (trained neural network) -> Result
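The normalization and ELM stages of the Phase 1 pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes NumPy, a sigmoid hidden layer, 50 hidden nodes and one-hot targets, and the names min_max_normalize and ELMClassifier are hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each feature (column) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Constant columns would divide by zero, so they are mapped to 0 instead.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span


class ELMClassifier:
    """Single hidden layer network with random, untrained input weights.
    Only the output weights are fitted, in one least-squares step."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.seed = seed

    def _hidden(self, X):
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        # One-hot encode the class labels as regression targets.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights from the Moore-Penrose pseudo-inverse of H.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]
```

Because no iterative weight updates are involved, training reduces to a single pseudo-inverse computation, which is why ELM training is typically fast compared with back propagation.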
BLOCK DIAGRAM OF PHASE 2
Clinical dataset (Arrhythmia): normalized dataset of Phase 1
  -> Pre-processing
       - Data reduction using Lion Algorithm
       - Data reduction using Wolf Algorithm
  -> Pre-processed dataset
       - Training data -> Decision tree, Bayesian classifier, Neural network classifier
       - Test data -> Classification
  -> Ensemble result -> Result
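The ensemble stage combines the three classifiers' outputs by majority vote. A minimal sketch of that aggregation is given below; the function name majority_vote is hypothetical, and the tie-breaking rule (a tie goes to the label voted by the earliest-listed classifier) is an assumption, since the synopsis does not specify one.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list of majority labels.

    predictions: one list of predicted labels per classifier, all of equal length.
    """
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all classifiers must predict the same number of instances")
    combined = []
    for votes in zip(*predictions):
        # most_common keeps first-seen order among equal counts, so a tie is
        # resolved in favour of the earliest-listed classifier (assumption).
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

With three classifiers and more than two classes, a three-way disagreement is possible, so the order in which the neural network, decision tree and Bayesian predictions are passed decides such cases in this sketch.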
REFERENCES
1. Kindie Biredagn Nahato, Khanna H. Nehemiah and A. Kannan, "Hybrid Approach
using Fuzzy Sets and Extreme Learning Machine for Classifying Clinical Datasets",
Informatics in Medicine Unlocked, pp. 1-11, 2016
2. Jose Ramon Cano, Francisco Herrera and Manuel Lozano, "Using Evolutionary
Algorithms as Instance Selection for Data Reduction in KDD: An Experimental Study",
IEEE Transactions on Evolutionary Computation, pp. 561-575, 2003
3. J. Derrac, S. García, F. Herrera, "IFS-CoCo: Instance and Feature Selection Based on
Cooperative Coevolution with the Nearest Neighbour Rule", Pattern Recognition,
Elsevier, pp. 2082-2105, 2010
4. [Link], "The Lion's Algorithm: A New Nature-Inspired Search Algorithm",
Procedia Technology, 2nd International Conference on Communication, Computing
& Security, Elsevier, pp. 126-135, 2012
5. Seyedali Mirjalili, Seyed Mohammad Mirjalili, Andrew Lewis, "Grey Wolf
Optimizer", Advances in Engineering Software, Elsevier, pp. 46-61, 2013
6. [Link], [Link], [Link], "Efficient learning of Arrhythmia data set
with Multi class-cost sensitive classifiers", International Research Journal of
Engineering and Technology (IRJET), pp. 782-786, 2015
DEMONSTRATION
AT THE END OF PHASE 1
Classification results for clinical datasets using the Extreme Learning Machine, with data
reduction performed using coevolution.
AT THE END OF PHASE 2
Classification results for clinical datasets using ensemble classification of three classifiers,
with data reduction performed using the Lion's Algorithm and the Wolf Algorithm. The
results are compared.