Lecture 9 Statistical Learning

statistical learning in data mining

Uploaded by

Ifra Luqman

CIT-652. DATA MINING.

COURSE INSTRUCTOR: Sheza Naeem

Lecture #12

Statistical methods in data mining

Statistical Inference:
The totality of observations with which we are concerned, whether their number is finite or infinite, constitutes what we call a population. The term
refers to anything of statistical interest, whether it is a group of people, objects, or events. The
number of observations in the population is defined as the size of the population.
In general, populations may be finite or infinite, but some finite populations are so
large that, in theory, we assume them to be infinite. In the field of statistical inference, we are
interested in arriving at conclusions concerning a population when it is impossible or impractical to
observe the entire set of observations that make up the population. For example, in attempting to
determine the average length of the life of a certain brand of light bulbs, it would be practically
impossible to test all such bulbs. Therefore, we must depend on a subset of observations from the
population for most statistical-analysis applications. In statistics, a subset of a population is called a
sample, or data set.
From a given data set, we build a statistical model of the population that will help us to
make inferences concerning that same population. If our inferences from the data set are to be
valid, we must obtain samples that are representative of the population. Very often, we are
tempted to choose a data set by selecting the most convenient members of the population. But
such an approach may lead to erroneous inferences concerning the population. Any sampling
procedure that produces inferences that consistently overestimate or underestimate some
characteristics of the population is said to be biased. To eliminate any possibility of bias in the
sampling procedure, it is desirable to choose a random data set in the sense that the observations
are made independently and at random. The main purpose of selecting random samples is to elicit
information about unknown population parameters.
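The bias introduced by a convenience sample can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the bulb lifetimes are synthetic, and "convenience" is modeled hypothetically as taking the first bulbs produced, which in this made-up population happen to be shorter-lived. It needs Python 3.8+ for `statistics.fmean`.

```python
import random
import statistics

# Hypothetical population of 10,000 bulb lifetimes (hours). The values are
# synthetic and rise with production order, so the first bulbs produced are
# systematically shorter-lived than the population as a whole.
N = 10_000
population = [800 + (i * 400) // N for i in range(N)]
true_mean = statistics.fmean(population)

# Convenience sample: just take the first 100 bulbs that are at hand.
convenience_sample = population[:100]

# Simple random sample: 100 bulbs chosen at random without replacement.
rng = random.Random(42)  # fixed seed so the run is reproducible
random_sample = rng.sample(population, k=100)

conv_error = statistics.fmean(convenience_sample) - true_mean
rand_error = statistics.fmean(random_sample) - true_mean

print(f"true mean:           {true_mean:.1f} h")
print(f"convenience error:   {conv_error:+.1f} h")
print(f"random-sample error: {rand_error:+.1f} h")
```

With this construction the convenience sample underestimates the true mean of 999.5 hours by 198 hours every time, while the random sample typically lands within a few tens of hours of it.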
Statistical inference is the main form of reasoning relevant to data analysis. The
theory of statistical inference consists of those methods by which one makes inferences or
generalizations about a population. These methods may be categorized into two major areas:
estimation and tests of hypotheses.
In estimation, one wants to come up with a plausible value or a range of plausible values
for the unknown parameters of the system. The goal is to gain information from a data set T in
order to estimate one or more parameters w belonging to the model of the real-world system.
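For the simplest parameter, a population mean, estimation can be sketched as a point estimate (the sample mean) plus an approximate range of plausible values. This sketch assumes a normal approximation via `statistics.NormalDist` (Python 3.8+); the bulb lifetimes are hypothetical, and the function name `estimate_mean` is illustrative only.

```python
import math
from statistics import NormalDist, fmean, stdev

def estimate_mean(sample, confidence=0.95):
    """Point estimate and approximate confidence interval for a population mean.

    Uses the normal approximation, which is reasonable for large samples;
    small samples would normally call for a Student t quantile instead.
    """
    n = len(sample)
    point = fmean(sample)                           # single plausible value
    std_err = stdev(sample) / math.sqrt(n)          # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point, point - z * std_err, point + z * std_err

# Hypothetical lifetimes (hours) of ten tested bulbs.
lifetimes = [1010, 985, 1002, 968, 1031, 994, 1007, 979, 1015, 990]
mean, low, high = estimate_mean(lifetimes)
print(f"estimate {mean:.1f} h, 95% interval [{low:.1f}, {high:.1f}]")
```

For a sample as small as ten, a t quantile would give a somewhat wider and more appropriate interval; the normal quantile is used here only to keep the sketch within the standard library.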
In statistical testing, on the other hand, one has to decide whether a hypothesis
concerning the value of the population characteristic should be accepted or rejected in the light of
an analysis of the data set. A statistical hypothesis is an assertion or conjecture concerning one or
more populations. The truth or falsity of a statistical hypothesis can never be known with absolute
certainty, unless we examine the entire population. This, of course, would be impractical in most
situations, sometimes even impossible. Instead, we test a hypothesis on a randomly selected data
set. Evidence from the data set that is inconsistent with the stated hypothesis leads to its rejection,
whereas evidence consistent with the hypothesis means only that we fail to reject it, not that it has been proved true.
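A hypothesis test about a mean can be sketched as a two-sided large-sample z-test. This is an illustration under assumptions: the claim "mean bulb life is 1,000 hours", the data, the significance level of 0.05, and the name `z_test_mean` are all hypothetical, and the normal approximation is used in place of a t-test.

```python
import math
from statistics import NormalDist, fmean, stdev

def z_test_mean(sample, mu0, alpha=0.05):
    """Two-sided large-sample z-test of H0: population mean == mu0."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    # Data inconsistent with H0 (small p) lead to rejection; otherwise we
    # only "fail to reject", which does not prove H0 true.
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# Hypothetical claim: the mean bulb life is 1,000 hours.
lifetimes = [1010, 985, 1002, 968, 1031, 994, 1007, 979, 1015, 990]
z, p, decision = z_test_mean(lifetimes, mu0=1000)
print(f"z = {z:.2f}, p = {p:.3f} -> {decision}")
```

Testing the same data against a claim of 1,100 hours instead gives a very large |z| and a rejection.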

ASSESSING DIFFERENCES IN DATA SETS:

For many data-mining tasks, it is useful to learn the general characteristics of a given data set,
regarding both central tendency and data dispersion. These simple parameters
of data sets are obvious descriptors for assessing differences between different data sets. Typical
measures of central tendency include the mean, median, and mode, while measures of data dispersion
include the variance and standard deviation. The most common and effective numeric measure of the center
of a data set is the mean value (also called the arithmetic mean). For a set of n numeric values x1, x2,
…, xn of a given feature X, the mean is

$$\text{mean} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

and it is a built-in function (like all other descriptive statistical measures) in most modern statistical
software tools. For each numeric feature in an n-dimensional set of samples, it is possible to calculate the
mean value as a central-tendency characteristic for that feature. Sometimes each value xi in a set may be
associated with a weight wi, which reflects the frequency of occurrence, significance, or importance
attached to the value. In this case, the weighted arithmetic mean (or weighted average) is

$$\text{mean} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

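Both means can be written out in a few lines. This is a minimal sketch for illustration; in practice Python's standard `statistics` module already provides `mean` and `fmean`. The function names and sample values here are hypothetical.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

x = [2, 4, 4, 5, 7, 9]
print(mean(x))                                 # 5.166666666666667
print(weighted_mean([10, 20, 30], [1, 1, 2]))  # 22.5
```

Giving the value 30 a weight of 2 pulls the weighted mean above the plain mean of 20 for the same three values.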
Although the mean is the most useful quantity that we use to describe a set of data, it is not the only one.
For skewed data sets, a better measure of the center of data is the median. It is the middle value of the
ordered set of feature values if the set consists of an odd number of elements, and it is the average of the
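middle two values if the number of elements in the set is even. If x1, x2, …, xn represents a data set of size
n, arranged in ascending order of magnitude, then the median is defined by

$$\text{median} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \tfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}$$
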
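The odd/even rule for the median translates directly into code. This is a minimal sketch (the function name is illustrative; `statistics.median` in the standard library does the same job), with a last example showing why the median is preferred for skewed data.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # odd n: the single middle element
    return (xs[mid - 1] + xs[mid]) / 2   # even n: average the two middle elements

print(median([7, 1, 5]))          # 5
print(median([7, 1, 5, 3]))       # 4.0
print(median([1, 2, 3, 4, 100]))  # 3, while the mean of this skewed set is 22.0
```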
Another measure of the central tendency of a data set is the mode: the value that occurs most
frequently in the set.
We classify data sets as unimodal (with only one mode) or multimodal (with two or more modes).
Multimodal data sets may be described more precisely as bimodal, trimodal, and so on. For unimodal
frequency curves that are moderately asymmetrical, we have the following useful empirical relation for
numeric data sets:

$$\text{mean} - \text{mode} \approx 3 \times (\text{mean} - \text{median})$$

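Finding every mode (so that unimodal and multimodal sets can be told apart) and applying the empirical relation mean − mode ≈ 3 × (mean − median) can be sketched as follows. The function names and the mean and median values in the example are hypothetical; the relation is only a rough estimate for moderately skewed unimodal data.

```python
from collections import Counter

def modes(values):
    """All values that share the highest frequency (one value => unimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

def approx_mode(mean, median):
    """Empirical estimate: mean - mode ≈ 3 * (mean - median),
    rearranged to mode ≈ 3 * median - 2 * mean."""
    return 3 * median - 2 * mean

print(modes([1, 2, 2, 3, 4]))             # [2]     unimodal
print(modes([1, 1, 2, 3, 3, 4]))          # [1, 3]  bimodal
print(approx_mode(mean=50.0, median=48.0))  # 44.0
```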
The degree to which numeric data tend to spread is called the dispersion of the data, and the most common
measures of dispersion are the standard deviation σ and the variance σ². The variance of n numeric values
x1, x2, …, xn is

$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \text{mean}\right)^2$$

The standard deviation σ is the square root of the variance σ².
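Variance and standard deviation can be computed straight from their definitions. This minimal sketch assumes the sample variance, which divides the sum of squared deviations by n − 1; Python's `statistics.variance` uses the same convention, while `statistics.pvariance` divides by n. The data values are hypothetical.

```python
import math
import statistics

def variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # 32/7, about 4.571
print(std_dev(data))   # about 2.138
# Cross-check against the standard library, which also divides by n - 1.
print(math.isclose(variance(data), statistics.variance(data)))  # True
```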
