Classification

Genetic

Uploaded by

Rishabh Sharma

Btech 6CE-2 Semester

Dr. Neelam Duhan

Associate Professor, CE Department
J. C Bose University of Science & Technology, YMCA,
Faridabad

July 30, 2022 1

Basics of GAs
◼ A Genetic Algorithm is a search heuristic that is inspired by
Charles Darwin’s theory of Survival of the fittest in natural
evolution.
◼ This algorithm reflects the process of natural selection where
the fittest individuals are selected for reproduction in order to
produce offspring of the next generation.
◼ If the parents have better fitness, their offspring tend to be fitter
than the parents and have a better chance of surviving.

Coding
Has a Fitness value

July 30, 2022 Data Mining: Concepts and Techniques 2

Basics
◼ GA is based on the genetic structure and behavior of the chromosomes in a
population.
◼ Each chromosome represents a possible solution. Thus the population is a
collection of chromosomes.
◼ Each individual in the population is scored by a fitness function. The greater
the fitness, the better the solution.
◼ Out of the available individuals in the population, the best individuals are used
to reproduce the next generation of offspring (crossover).
◼ The offspring produced have features of both parents (through crossover) and may
be further altered by mutation. A mutation is a small change in the gene structure.


The Process of GA working


Outline of the Basic Genetic Algorithm
1. [Start] Generate a random population of n chromosomes (suitable solutions for the
problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating steps 4 to 7 until the
new population is complete
4. [Selection] Select two parent chromosomes from the population according to their
fitness (the better the fitness, the bigger the chance of being selected)
5. [Crossover] With a crossover probability, cross over the parents to form new
offspring (children). If no crossover is performed, the offspring are exact copies of
the parents.
6. [Mutation] With a mutation probability, mutate the new offspring.
7. [Accepting] Place the new offspring in the new population
8. [Replace] Use the newly generated population for a further run of the algorithm
9. [Test] If the end condition is satisfied, stop, and return the best solution in the current
population
10. [Loop] Go to step 2
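The outline above can be sketched in Python roughly as follows. This is a minimal illustration and not the lecture's own code: the toy fitness function `one_max` (count of 1-bits), roulette-wheel selection, single-point crossover, per-bit mutation, and all parameter values are assumptions chosen to keep the example short.

```python
import random

def one_max(chromosome):
    """Toy fitness (an assumption): the number of 1-bits in the chromosome."""
    return sum(chromosome)

def select(population, fitnesses, rng):
    """Roulette-wheel selection: chance of being picked grows with fitness."""
    # The small constant keeps the weights valid if every fitness is zero.
    weights = [f + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]

def run_ga(n=20, length=16, p_cross=0.7, p_mut=0.01, max_generations=50, seed=0):
    rng = random.Random(seed)
    # [Start] random population of n bit-string chromosomes
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    for _ in range(max_generations):
        # [Fitness] evaluate every chromosome
        fitnesses = [one_max(c) for c in population]
        # [Test] assumed end condition: a perfect chromosome exists
        if max(fitnesses) == length:
            break
        # [New population] repeat selection, crossover, mutation, accepting
        new_population = []
        while len(new_population) < n:
            mum = select(population, fitnesses, rng)   # [Selection]
            dad = select(population, fitnesses, rng)
            if rng.random() < p_cross:                 # [Crossover]
                point = rng.randint(1, length - 1)
                children = [mum[:point] + dad[point:], dad[:point] + mum[point:]]
            else:                                      # no crossover: exact copies
                children = [mum[:], dad[:]]
            for child in children:
                # [Mutation] flip each bit with probability p_mut
                mutated = [1 - g if rng.random() < p_mut else g for g in child]
                if len(new_population) < n:            # [Accepting]
                    new_population.append(mutated)
        population = new_population                    # [Replace], then [Loop]
    return max(population, key=one_max)

best = run_ga(seed=1)
print(best, one_max(best))
```

The generation cap `max_generations` acts as a second end condition so the loop always terminates, even if no perfect chromosome is found.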
Crossover
During crossover, a pair of parents is mated to generate
offspring by exchanging genes at randomly selected points.
There are 3 major types of crossover.
Single-Point Crossover: A point on both
parents’ chromosomes is picked randomly
and designated the ‘crossover point’. Bits to the
right (or left) of that point are exchanged
between the two parent chromosomes.
Two-Point Crossover: Two crossover
points are picked randomly on the parent
chromosomes. The bits between the two
points are swapped between the parent
chromosomes.
Uniform Crossover: In a uniform
crossover, typically, each bit is chosen from
either parent with equal probability.

The new offspring are added to the population.
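The three crossover types can be sketched as small Python functions. This is an illustrative sketch only: the function names, the list-of-bits representation, and the choice to return both children are assumptions, and `two_point` assumes chromosomes of length 3 or more.

```python
import random

def single_point(a, b, rng):
    """Swap the tails of two parents after one random crossover point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def two_point(a, b, rng):
    """Swap the segment between two distinct random crossover points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform(a, b, rng):
    """For each position, swap the parents' bits with probability 0.5."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2

rng = random.Random(0)
parent_a, parent_b = [0] * 8, [1] * 8
for operator in (single_point, two_point, uniform):
    print(operator.__name__, operator(parent_a, parent_b, rng))
```

Using all-zero and all-one parents makes it easy to see which parent each offspring bit came from.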
Types of Crossover


Mutation
◼ In some of the new offspring, genes can be subjected to
a mutation with a low random probability. This means that some of
the bits in the bit-string chromosome can be flipped.
◼ Mutation maintains diversity within the population and
helps prevent premature convergence.
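A bit-flip mutation of this kind might look like the sketch below. The function name `mutate` and the decision to test every bit independently against the probability `p_mut` are assumptions; the slides only say that the probability is low.

```python
import random

def mutate(chromosome, p_mut, rng):
    """Return a copy in which each bit is flipped with probability p_mut."""
    return [1 - bit if rng.random() < p_mut else bit for bit in chromosome]

rng = random.Random(42)
original = [0, 1, 1, 0, 1, 0, 0, 1]
print(mutate(original, 0.1, rng))
```

With `p_mut = 0` the chromosome is returned unchanged, and with `p_mut = 1` every bit is flipped, which is a quick way to sanity-check the operator.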

Process in brief
Termination
◼ The algorithm terminates if the population has converged (does not produce
offspring which are significantly different from the previous generation). Then it
is said that the genetic algorithm has provided a set of solutions to our
problem.
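One simple way to approximate this convergence test in code is to watch the best fitness per generation and stop when it has stopped improving. This is a stand-in for the slide's criterion (offspring not significantly different from the previous generation), not the same test; the function name, `window`, and `tolerance` are assumptions.

```python
def has_converged(best_history, window=5, tolerance=1e-6):
    """True if the best fitness improved by at most `tolerance`
    over the last `window` generations."""
    if len(best_history) <= window:
        return False  # not enough generations observed yet
    return best_history[-1] - best_history[-1 - window] <= tolerance

print(has_converged([1, 2, 3, 4, 5, 6, 7]))     # still improving
print(has_converged([1, 2, 3, 3, 3, 3, 3, 3]))  # flat for 5 generations
```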


Genetic Algorithms in Classification process
◼ An initial population is created consisting of randomly
generated rules
◼ Each rule is represented by a string of bits
◼ E.g., the rule (IF A1 AND ¬A2 THEN C2) can be encoded as 100, where the two
leftmost bits represent A1 and A2 and the rightmost bit represents the class
◼ The rule (IF ‘age=medium’ AND ‘student=yes’ THEN
‘buys_comp=yes’) can be encoded as 01011 (here, 010
for medium, 1 for yes, 1 for yes)
◼ If an attribute has k > 2 values, k bits can be used

Exercise: encode the rule (IF ‘age=low’ AND ‘student=yes’ AND ‘income=high’ AND
‘credit_rating=fair’ THEN ‘risk=medium’)
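The bit-string encoding of the age/student/buys_comp rule can be reproduced with a short sketch. The value ordering (low, medium, high) for age is an assumption inferred from "010 for medium", and the helper names are hypothetical.

```python
# Assumed ordering of the age values; the slide only fixes "010" for medium.
AGE_VALUES = ["low", "medium", "high"]

def one_hot(value, values):
    """Encode a k-valued attribute with k bits, one bit per possible value."""
    return "".join("1" if v == value else "0" for v in values)

def yes_no(value):
    """Encode a two-valued attribute with a single bit."""
    return "1" if value == "yes" else "0"

def encode_rule(age, student, buys_comp):
    """Concatenate the attribute bits followed by the class bit."""
    return one_hot(age, AGE_VALUES) + yes_no(student) + yes_no(buys_comp)

print(encode_rule("medium", "yes", "yes"))  # 01011
```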


Continued…
◼ Based on the notion of survival of the fittest, a new population is formed
consisting of the fittest rules and their offspring
◼ The fitness of a rule is represented by its classification accuracy on a set of
training examples
◼ Offspring are generated by crossover and mutation
◼ The process continues until a population P evolves in which each rule satisfies
a prespecified fitness threshold
◼ Slow but easily parallelizable
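As a rough illustration of rule fitness, the sketch below scores a rule by the fraction of training examples it covers that carry the class it predicts. This is one plausible reading of "classification accuracy", not necessarily the one the lecture intends; the function name, the dictionary representation, and the four training examples are all hypothetical.

```python
def rule_fitness(conditions, predicted_class, examples):
    """Fraction of covered examples whose label equals the rule's class.

    conditions: dict of attribute -> required value (the rule's IF part).
    examples:   list of (attributes_dict, label) pairs.
    """
    covered = [label for attrs, label in examples
               if all(attrs.get(k) == v for k, v in conditions.items())]
    if not covered:
        return 0.0  # a rule that covers nothing gets the lowest fitness
    return sum(label == predicted_class for label in covered) / len(covered)

# Hypothetical training examples: (attributes, buys_comp label)
examples = [
    ({"age": "medium", "student": "yes"}, "yes"),
    ({"age": "medium", "student": "yes"}, "no"),
    ({"age": "low", "student": "yes"}, "yes"),
    ({"age": "medium", "student": "no"}, "no"),
]
print(rule_fitness({"age": "medium", "student": "yes"}, "yes", examples))  # 0.5
```

Because each rule is scored independently of the others, the fitness evaluations can run in parallel, which fits the remark that the method is slow but easily parallelizable.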


Regression/ Prediction


What Is Prediction?
◼ (Numerical) prediction is similar to classification
  ◼ construct a model
  ◼ use the model to predict a continuous or ordered value for a given input
◼ Prediction is different from classification
  ◼ Classification predicts categorical class labels
  ◼ Prediction models continuous-valued functions
◼ Major method for prediction: regression
  ◼ model the relationship between one or more independent or
    predictor variables and a dependent or response variable
◼ Regression analysis
  ◼ Linear and multiple regression
  ◼ Non-linear regression
  ◼ Other regression methods: generalized linear model, Poisson
    regression, log-linear models, regression trees

Linear Regression
◼ Linear regression: involves a response variable y and a single
predictor variable x (e.g. x is experience, y is salary)
y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients
◼ Method of least squares: estimates the best-fitting straight line

$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}$$

$$w_0 = \bar{y} - w_1 \bar{x}$$


Solved example

Here mean(x) = 9.1 and mean(y) = 55.4, with salary y measured in thousands of dollars.

Thus the linear equation becomes: y = 3.5x + 23.6

Say a person has 10 years of experience. What is the predicted salary?
Salary = 3.5(10) + 23.6 = 58.6, i.e. $58,600
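The least-squares estimates can be computed directly from their definitions, as in the sketch below. The slide's data table did not survive, so the ten (experience, salary) pairs here are an illustrative sample chosen to match the stated means of 9.1 and 55.4; they are an assumption, as are the function names.

```python
def least_squares(xs, ys):
    """Return (w0, w1) for the best-fitting line y = w0 + w1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx              # slope
    w0 = mean_y - w1 * mean_x   # intercept
    return w0, w1

def predict(x, w0, w1):
    return w0 + w1 * x

# Illustrative sample (an assumption): years of experience, salary in $1000s
years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]

w0, w1 = least_squares(years, salary)
print(round(w1, 1))              # slope rounds to 3.5
print(predict(10, 23.6, 3.5))    # prediction with the slide's coefficients
```

On this sample the unrounded slope is about 3.54. Rounding it to 3.5 before computing the intercept gives 55.4 - 3.5 × 9.1 = 23.55, which is close to the 23.6 quoted on the slide.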
Multiple linear regression
◼ involves more than one predictor variable
◼ Training data is of the form (X1, y1), (X2, y2),…, (X|D|, y|D|)
◼ Ex. For 2-D data, we may have: y = w0 + w1 x1+ w2 x2
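For more than one predictor, the same least-squares idea can be sketched by solving the normal equations. The slides do not say how the weights are computed, so this solver is an assumption, and the five data points are hypothetical values generated from y = 1 + 2·x1 + 3·x2 so that the recovered weights are easy to check.

```python
def solve(a, b):
    """Solve the linear system a . w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w

def fit_multiple(rows, ys):
    """Least-squares weights [w0, w1, ..., wk] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept w0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
w0, w1, w2 = fit_multiple(rows, ys)
print(round(w0, 6), round(w1, 6), round(w2, 6))
```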


Non-linear Regression
◼ Some nonlinear models can be modeled by a polynomial
function
◼ A polynomial regression model can be transformed into
linear regression model. For example,
y = w0 + w1 x + w2 x^2 + w3 x^3
convertible to linear with new variables: x2 = x^2, x3 = x^3
y = w0 + w1 x + w2 x2 + w3 x3
◼ Other functions, such as power function, can also be
transformed to linear model
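As a worked illustration of the power-function case, taking logarithms of both sides turns the model into a straight line. The symbols a and b are hypothetical names for the power function's constants, and the derivation assumes a > 0, x > 0 and y > 0 so that the logarithms exist.

```latex
\begin{align*}
y &= a\,x^{b} \\
\ln y &= \ln a + b \ln x \\
y' &= w_0 + w_1 x', \qquad y' = \ln y,\quad x' = \ln x,\quad w_0 = \ln a,\quad w_1 = b
\end{align*}
```

The transformed variables can then be fitted by ordinary linear regression, and the original constants recovered as a = e^{w_0} and b = w_1.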


Classifier Accuracy Measures
Here,
Class 1 : Positive
Class 2 : Negative

Definition of the Terms:

Positive (P) : Observation is positive (for example: is an apple).
Negative (N) : Observation is not positive (for example: is not an apple).

True Positive (TP) : Observation is positive, and is predicted to be positive.

False Positive (FP) : Observation is negative, but is predicted positive.

True Negative (TN) : Observation is negative, and is predicted to be negative.

False Negative (FN) : Observation is positive, but is predicted negative.
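These four definitions can be turned into a small counting routine. The sketch below is illustrative only; the function name, the label values, and the five sample observations are hypothetical.

```python
def confusion_counts(actual, predicted, positive="apple"):
    """Count (TP, FP, TN, FN) for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # positive, predicted positive
            else:
                fp += 1   # negative, predicted positive
        else:
            if a == positive:
                fn += 1   # positive, predicted negative
            else:
                tn += 1   # negative, predicted negative
    return tp, fp, tn, fn

actual = ["apple", "apple", "pear", "pear", "apple"]
predicted = ["apple", "pear", "apple", "pear", "apple"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```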


Classifier Accuracy Measures
classes (rows: actual, columns: predicted)

                      buy_computer = yes   buy_computer = no   total   recognition (%)
buy_computer = yes    6954                 46                  7000    99.34
buy_computer = no     412                  2588                3000    86.27
total                 7366                 2634                10000   95.42

◼ Accuracy of a classifier M, acc(M): percentage of test set tuples that are
correctly classified by the model M
◼ Error rate (misclassification rate) of M = 1 - acc(M)
◼ Given m classes, CM_{i,j}, an entry in a confusion matrix, indicates the number of
tuples in class i that are labeled by the classifier as class j
◼ Alternative accuracy measures (e.g., for cancer diagnosis)
sensitivity = TP/P /* true positive recognition rate */
specificity = TN/N /* true negative recognition rate */
precision = TP/(TP+FP)
recall = TP/(TP+FN)
accuracy = sensitivity * P/(P+N) + specificity * N/(P+N) = (TP+TN)/(TP+TN+FP+FN)
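The measures above can be checked against the buy_computer confusion matrix with a short sketch. The function name and dictionary keys are assumptions; the four counts are read from the matrix, treating buy_computer = yes as the positive class.

```python
def metrics(tp, fn, fp, tn):
    """Compute the accuracy measures from the four confusion-matrix counts."""
    p = tp + fn  # all actual positives
    n = fp + tn  # all actual negatives
    sensitivity = tp / p
    specificity = tn / n
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (p + n),
        # The weighted form of accuracy given in the formula list
        "weighted": sensitivity * p / (p + n) + specificity * n / (p + n),
    }

m = metrics(tp=6954, fn=46, fp=412, tn=2588)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

The two accuracy expressions agree, and both come to (6954 + 2588) / 10000. Recall and sensitivity are the same quantity, since P = TP + FN.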

Summary
◼ Classification and prediction are two forms of data analysis that can
be used to extract models describing important data classes or to
predict future data trends.
◼ Effective and scalable methods have been developed for decision
tree induction, naive Bayesian classification, Bayesian belief
networks, rule-based classifiers, backpropagation, support vector
machines (SVM), associative classification, nearest-neighbor classifiers,
and case-based reasoning, as well as other classification methods such as
genetic algorithms and rough set and fuzzy set approaches.
◼ Linear, nonlinear, and generalized linear models of regression can be
used for prediction. Many nonlinear problems can be converted to
linear problems by performing transformations on the predictor
variables. Regression trees and model trees are also used for
prediction.