Process Data Analysis
Dr M. A. A. Shoukat Choudhury
Department of Chemical Engineering
BUET, Dhaka - 1000
DATA MINING
• Data explosion problem
• In most cases, we cannot know what the “true” value is unless there
is an independent determination (i.e. a different measurement
technique).
            Population             Sample
mean        μ (mu)                 X̄ (X-bar)
variance    σ² (sigma squared)     s²
Statistics
• Median
– the middle number of a dataset arranged in numerical
order: for 0, 1, 2, 5, 1000 the median is 2
(the average of the middle two numbers when there is an even
number of scores)
– relatively uninfluenced by outliers
• Mean = (sum of all observations) / (number of observations): $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
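The contrast between the two measures can be checked with a minimal Python sketch. It reuses the slide's dataset 0, 1, 2, 5, 1000; the second, even-length list and all variable names are assumptions added for illustration.

```python
from statistics import mean, median

# Dataset from the slide: one extreme value (1000) among small numbers.
data = [0, 1, 2, 5, 1000]

data_mean = mean(data)      # sum of the values divided by their count
data_median = median(data)  # middle value of the sorted data

print("mean   =", data_mean)    # 201.6, pulled upward by the outlier
print("median =", data_median)  # 2, barely affected by the outlier

# Even number of scores: the median is the average of the middle two.
even_data = [0, 1, 2, 5]  # hypothetical example, not from the slide
even_median = median(even_data)
print("median of even-length data =", even_median)  # 1.5
```

The mean of 201.6 describes none of the five observations well, while the median of 2 stays close to the bulk of the data.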
Measures of dispersion
• Several ways to measure spread of data:
– Range (max-min), IQR or Inter-Quartile Range (middle 50%),
Mean Absolute Deviation
– Standard deviation (std dev), the square root of the variance:
  small std dev: observations are clustered tightly around the mean
  large std dev: observations are scattered widely around the mean
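The spread measures named above can be computed with Python's standard library, as in the sketch below. The dataset is hypothetical, and the quartiles use the default "exclusive" method of `statistics.quantiles`; other quartile conventions give slightly different IQR values.

```python
from statistics import mean, quantiles, stdev

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# IQR: width of the middle 50% of the data (Q3 - Q1).
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Mean absolute deviation: average distance from the mean.
center = mean(data)
mad = mean([abs(x - center) for x in data])

# Sample standard deviation s (divides by n - 1).
s = stdev(data)

print("range =", data_range)
print("IQR   =", iqr)
print("MAD   =", mad)
print("s     =", round(s, 3))
```

Range depends only on the two extreme observations, so a single outlier changes it completely; IQR ignores the tails and is therefore more robust.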
Data Distribution
A histogram is a useful graphical representation of the information
content of a sample or parent population
• A mathematical equation describes this normal (or Gaussian)
distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$
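A histogram can be built by counting how many observations fall into each equal-width bin. The sketch below is only an illustration: the data are simulated from a normal distribution with an assumed mean of 50 and standard deviation of 5, and the bin width of 2.5 is an arbitrary choice.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the run is repeatable

# Simulated measurements (assumed mean 50, std dev 5); not real process data.
samples = [random.gauss(50.0, 5.0) for _ in range(1000)]

bin_width = 2.5  # arbitrary bin width chosen for this sketch

# Map each observation to the index of the bin that contains it.
counts = Counter(int(x // bin_width) for x in samples)

# Text histogram: one '#' per 5 observations in the bin.
for b in sorted(counts):
    left = b * bin_width
    bar = "#" * (counts[b] // 5)
    print(f"{left:6.1f} to {left + bin_width:6.1f} | {bar}")
```

With normally distributed data, the bars are tallest near the mean and shrink roughly symmetrically on both sides.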
Normal Distribution
• The mathematical normal distribution is useful because
its known mathematical properties tell us about
our real-life variable (assuming that variable
is normally distributed)
• Z-scores: $Z = \frac{X - \mu}{\sigma}$ (the number of standard deviations X lies from the mean)
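A z-score expresses an observation as its distance from the mean in units of standard deviation, z = (x - mean) / std dev. The sketch below standardizes a hypothetical sample using the sample mean and sample standard deviation as stand-ins for μ and σ, then uses `statistics.NormalDist` (Python 3.8+) to show one known property of the normal distribution.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

xbar = mean(data)  # sample mean, used here in place of mu
s = stdev(data)    # sample standard deviation, used in place of sigma

# Standardize each observation.
z_scores = [(x - xbar) / s for x in data]
for x, z in zip(data, z_scores):
    print(f"x = {x}  z = {z:+.2f}")

# Known property of a normal variable: the fraction of values
# within one standard deviation of the mean (about 68%).
standard_normal = NormalDist(mu=0.0, sigma=1.0)
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
print(f"P(-1 < Z < 1) = {within_one_sd:.4f}")
```

After standardization the z-scores have mean 0 and standard deviation 1, which is what allows different variables to be compared against the same standard normal curve.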