CLASSIFICATION & PREDICTION
- Shailesh Yadav
Central University of Rajasthan
CONTENTS
Classification & Prediction
Methods Of Classification
Other Classification Methods
Prediction
Conclusion
Classification vs. Prediction
Classification
predicts categorical class labels (discrete or nominal)
classifies data (constructs a model) based on the training set and the
values (class labels) in a classifying attribute and uses it in classifying
new data
Prediction
models continuous-valued functions, i.e., predicts unknown or missing
values
Typical applications:
1- Credit/loan approval
2- Medical diagnosis: if a tumor is cancerous or benign
3- Fraud detection: if a transaction is fraudulent
4- Web page categorization: which category it is
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
Each tuple /sample is assumed to belong to a predefined class, as determined
by the class label attribute
The set of tuples used for model construction is training set
The model is represented as classification rules, decision trees, or
mathematical formulae
Model usage: for classifying future or unknown objects
Estimate accuracy of the model
The known label of test sample is compared with the classified result
from the model
Accuracy rate is the percentage of test set samples that are correctly
classified by the model
Test set is independent of training set, otherwise over-fitting will occur
If the accuracy is acceptable, use the model to classify data tuples whose class
labels are not known
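The two-step process can be seen end to end in a short sketch. This is a minimal illustration, not part of the original slides: it assumes scikit-learn is available and uses its bundled iris data as a stand-in for any labeled training set.

```python
# Minimal sketch of the two-step process, assuming scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Keep the test set independent of the training set to avoid over-fitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)   # step 1: model construction
acc = accuracy_score(y_test, model.predict(X_test))      # step 2: model usage / accuracy
print(f"accuracy = {acc:.2f}")
```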
Process (1): Model Construction
A classification algorithm is applied to the training data and outputs the classifier (model):

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (model): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the Model in Prediction
The classifier is first checked against the testing data, then applied to unseen data.

Testing data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) -> Tenured? yes (rank = 'professor')
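As a small illustration (the function name is mine, not from the slides), the learned rule can be written as a one-line test and applied to the unseen tuple:

```python
# The learned rule from Process (1) as a plain Python function.
def tenured(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

print(tenured("Professor", 4))        # Jeff -> "yes"
print(tenured("Assistant Prof", 2))   # Tom  -> "no"
```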
Issues: Data Preparation
Data cleaning
Preprocess data in order to reduce noise and handle
missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
Issues: Evaluating Classification Methods
Accuracy
classifier accuracy: predicting class label
predictor accuracy: estimating the value of the predicted attribute
Speed
time to construct the model (training time)
time to use the model (classification/prediction time)
Robustness: handling noise and missing values
Scalability: efficiency in disk-resident databases
Interpretability
understanding and insight provided by the model
Other measures, e.g., goodness of rules, such as decision tree size or
compactness of classification rules
Methods Of Classification
By Decision Tree Induction
Bayesian Classification
Rule Based Classification
Decision Tree Induction
Decision tree induction is the learning of
decision trees from class-labeled training
tuples. A decision tree is a flowchart-like tree
structure, where each internal node (non-leaf
node) denotes a test on an attribute, each
branch represents an outcome of the test, and
each leaf node (or terminal node) holds a class
label. The topmost node in a tree is the root
node.
Decision Tree Induction: Training Dataset
This follows an example of Quinlan's ID3 (Playing Tennis):

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
Output: A Decision Tree for “buys_computer”
age?
├─ <=30   → student?
│            ├─ no  → no
│            └─ yes → yes
├─ 31..40 → yes
└─ >40    → credit rating?
             ├─ excellent → no
             └─ fair      → yes
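The output tree can be read directly as nested attribute tests. A minimal transcription (the function name is mine, chosen for illustration):

```python
# Direct transcription of the tree above: age is tested at the root,
# student on the <=30 branch, credit_rating on the >40 branch.
def buys_computer(age, student, credit_rating):
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31..40":
        return "yes"
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))   # -> "yes"
```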
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g.,
information gain; a worked sketch follows this list)
Conditions for stopping partitioning
All samples for a given node belong to the same class
There are no remaining attributes for further partitioning – majority voting is
employed for classifying the leaf
There are no samples left
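The attribute-selection heuristic can be made concrete. This is a minimal sketch assuming categorical attributes; entropy and info_gain are my own helper names, following the usual Info(D) and Gain(A) definitions:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Info(D) = -sum_i p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, labels):
    """Gain(A) = Info(D) - sum_v |D_v|/|D| * Info(D_v) for attribute A."""
    split = defaultdict(list)
    for row, y in zip(rows, labels):
        split[row[attr]].append(y)
    after = sum(len(p) / len(labels) * entropy(p) for p in split.values())
    return entropy(labels) - after
```

On the buys_computer table, age yields the highest gain, which is why it is the root test in the tree shown earlier.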
Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction, i.e.,
predicts class membership probabilities
Foundation: Based on Bayes' theorem.
Performance: A simple Bayesian classifier, naïve Bayesian
classifier, has comparable performance with decision tree and
selected neural network classifiers
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct —
prior knowledge can be combined with observed data
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision
making against which other methods can be measured
Bayes' Theorem: Basics
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), the probability that the hypothesis
holds given the observed data sample X
P(H) (prior probability), the initial probability
E.g., X will buy computer, regardless of age, income, …
P(X): probability that sample data is observed
P(X|H) (likelihood), the probability of observing the sample X,
given that the hypothesis holds
E.g., Given that X will buy computer, the prob. that X is 31..40, medium
income
Bayes' Theorem
Given training data X, the posterior probability of a hypothesis H, P(H|X),
follows Bayes' theorem:

P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}
Informally, this can be written as
posteriori = likelihood x prior/evidence
Predicts X belongs to Ci iff the probability P(Ci|X) is the highest among
all the P(Ck|X) for the k classes
Practical difficulty: require initial knowledge of many probabilities,
significant computational cost
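A minimal naïve Bayesian sketch over the buys_computer training table from earlier, assuming conditional independence of the attributes; posterior_score is my own helper name:

```python
# score(C) = P(C) * prod_k P(x_k | C): the numerator of Bayes' theorem
# under the naive conditional-independence assumption.
data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]

def posterior_score(x, label):
    in_class = [r for r in data if r[-1] == label]
    score = len(in_class) / len(data)                 # prior P(C)
    for k, value in enumerate(x):
        match = sum(1 for r in in_class if r[k] == value)
        score *= match / len(in_class)                # likelihood P(x_k | C)
    return score

x = ("<=30", "medium", "yes", "fair")
print(max(["yes", "no"], key=lambda c: posterior_score(x, c)))  # -> "yes"
```

For this tuple the scores come out to roughly 0.028 for yes and 0.007 for no, so yes is predicted.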
Using IF-THEN Rules for Classification
Represent the knowledge in the form of IF-THEN rules
R: IF age = youth AND student = yes THEN buys_computer = yes
Rule antecedent/precondition vs. rule consequent
Assessment of a rule: coverage and accuracy
ncovers = # of tuples covered by R
ncorrect = # of tuples correctly classified by R
coverage(R) = ncovers /|D| /* D: training data set */
accuracy(R) = ncorrect / ncovers
If more than one rule is triggered, need conflict resolution
Size ordering: assign the highest priority to the triggering rule that has the "toughest"
requirement (i.e., with the most attribute tests)
Class-based ordering: decreasing order of prevalence or misclassification cost per class
Rule-based ordering (decision list): rules are organized into one long priority list, according
to some measure of rule quality or by experts
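A small sketch of the coverage and accuracy measures above, applied to rule R on a few buys_computer tuples (coverage_and_accuracy and the rule lambda are my own illustrative names; on the full 14-tuple table, coverage is 2/14 and accuracy 2/2):

```python
rows = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "high",   "no",  "fair",      "yes"),
]

def coverage_and_accuracy(rule_matches, rule_label, rows):
    # coverage(R) = n_covers / |D|;  accuracy(R) = n_correct / n_covers
    covered = [r for r in rows if rule_matches(r)]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for r in covered if r[-1] == rule_label)
    return len(covered) / len(rows), correct / len(covered)

# R: IF age = youth (<=30) AND student = yes THEN buys_computer = yes
r_matches = lambda r: r[0] == "<=30" and r[2] == "yes"
print(coverage_and_accuracy(r_matches, "yes", rows))   # (0.5, 1.0) here
```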
Other Classification Methods
Genetic Algorithms
Rough Set Approach
Fuzzy Set Approach
What Is Prediction?
(Numerical) prediction is similar to classification
construct a model
use model to predict continuous or ordered value for a given input
Prediction is different from classification
Classification refers to predicting categorical class labels
Prediction models continuous-valued functions
Major method for prediction: regression
model the relationship between one or more independent or predictor variables
and a dependent or response variable
Regression analysis
Linear and multiple regression
Non-linear regression
Other regression methods: generalized linear model, Poisson regression, log-linear
models, regression trees
Linear Regression
Linear regression: involves a response variable y and a single predictor variable x
y = w0 + w1 x
where w0 (y-intercept) and w1 (slope) are regression coefficients
Method of least squares: estimates the best-fitting straight line
w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}
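A minimal sketch of these closed-form estimates in plain Python (fit_line is an illustrative name):

```python
def fit_line(xs, ys):
    # Least-squares estimates: w1 from the formula above, w0 = y_bar - w1*x_bar.
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - w1 * x_bar, w1   # (w0, w1)

w0, w1 = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(w0, w1)   # w0 = 0.15, w1 = 1.94 for this toy data
```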
Multiple linear regression: involves more than one predictor variable
Training data is of the form (X1, y1), (X2, y2),…, (X|D|, y|D|)
Ex. For 2-D data, we may have: y = w0 + w1 x1+ w2 x2
Solvable by extension of least square method or using SAS, S-Plus
Many nonlinear functions can be transformed into the above
Nonlinear Regression
Some nonlinear models can be modeled by a polynomial function
A polynomial regression model can be transformed into linear regression
model. For example,
y = w0 + w1 x + w2 x² + w3 x³
is convertible to linear with new variables x2 = x², x3 = x³:
y = w0 + w1 x + w2 x2 + w3 x3
Other functions, such as power function, can also be transformed to linear
model
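A minimal sketch of this transformation, assuming NumPy is available: x, x², and x³ are treated as separate predictors and ordinary least squares is solved on the resulting design matrix (fit_cubic is my own name):

```python
import numpy as np

def fit_cubic(x, y):
    # Design matrix with columns 1, x, x^2, x^3: the new "linear" predictors.
    X = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w   # [w0, w1, w2, w3]

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(fit_cubic(x, 1 + 2 * x + 0.5 * x ** 3))   # ≈ [1, 2, 0, 0.5]
```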
Some models are intractably nonlinear (e.g., sum of exponential terms)
possible to obtain least square estimates through extensive calculation on
more complex formulae
Other Regression-Based Models
Generalized linear model:
Foundation on which linear regression can be applied to modeling
categorical response variables
Variance of y is a function of the mean value of y, not a constant
Logistic regression: models the prob. of some event occurring as a linear
function of a set of predictor variables
Poisson regression: models the data that exhibit a Poisson distribution
Log-linear models: (for categorical data)
Approximate discrete multidimensional prob. distributions
Also useful for data compression and smoothing
Regression trees and model trees
Trees to predict continuous values rather than class labels
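A minimal regression-tree sketch, assuming scikit-learn is available; the tree's leaves hold continuous values rather than class labels:

```python
from sklearn.tree import DecisionTreeRegressor

X = [[1], [2], [3], [4], [5], [6]]       # one predictor variable
y = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]       # continuous response
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(tree.predict([[2.5]]))             # a continuous prediction
```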
Predictor Error Measures
Measure predictor accuracy: measure how far off the predicted value is from the
actual known value
Loss function: measures the error between yi and the predicted value yi'
Absolute error: | yi – yi’|
Squared error: (yi – yi’)2
Test error (generalization error): the average loss over the test set
Mean absolute error:  \mathrm{MAE} = \frac{1}{d} \sum_{i=1}^{d} \lvert y_i - y_i' \rvert
Mean squared error:  \mathrm{MSE} = \frac{1}{d} \sum_{i=1}^{d} (y_i - y_i')^2
Relative absolute error:  \mathrm{RAE} = \frac{\sum_{i=1}^{d} \lvert y_i - y_i' \rvert}{\sum_{i=1}^{d} \lvert y_i - \bar{y} \rvert}
Relative squared error:  \mathrm{RSE} = \frac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}
The mean squared error exaggerates the presence of outliers; the (square) root mean squared error and, similarly, the root relative squared error are therefore popularly used.
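A minimal sketch computing all of these measures in one pass (the dictionary keys are my own labels):

```python
import math

def error_measures(y_true, y_pred):
    d = len(y_true)
    y_bar = sum(y_true) / d
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(sq_err) / d
    return {
        "MAE":  sum(abs_err) / d,
        "MSE":  mse,
        "RMSE": math.sqrt(mse),   # root mean squared error
        "RAE":  sum(abs_err) / sum(abs(t - y_bar) for t in y_true),
        "RSE":  sum(sq_err) / sum((t - y_bar) ** 2 for t in y_true),
    }

print(error_measures([3.0, 5.0, 7.0], [2.5, 5.5, 6.0]))
```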
Conclusion
Classification and prediction are two forms of data analysis that can be used to
extract models describing important data classes or to predict future data trends.
Effective and scalable methods have been developed for decision tree
induction, naïve Bayesian classification, Bayesian belief networks, rule-based
classifiers, backpropagation, Support Vector Machines (SVM), associative
classification, nearest-neighbor classifiers, and case-based reasoning, as well as
other classification methods such as genetic algorithms and rough set and fuzzy
set approaches.
Linear, nonlinear, and generalized linear models of regression can be used for
prediction. Many nonlinear problems can be converted to linear problems by
performing transformations on the predictor variables. Regression trees and
model trees are also used for prediction.
Conclusion (cont.)
Stratified k-fold cross-validation is a recommended method for accuracy
estimation. Bagging and boosting can be used to increase overall accuracy
by learning and combining a series of individual models.
There have been numerous comparisons of the different classification and
prediction methods, and the matter remains a research topic
No single method has been found to be superior over all others for all data
sets
Issues such as accuracy, training time, robustness, interpretability, and
scalability must be considered and can involve trade-offs, further
complicating the quest for an overall superior method
References
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann.
ANY QUERIES?
THANK YOU!