EMAE 285
The sample mean d can change if we take another sample of 20 pieces.
Therefore, D – d is a random variable.
What we should learn in this part of the course:
• Quantify the statistical characteristics of a data set based on
the sample taken.
• Use a probability density function to describe the behavior
of the variable (example: the roller diameter).
• Create a histogram of measured data.
• Quantify a confidence interval about the measured mean
(d, in the roller diameter example) at a given probability.
• Perform Regression analysis.
• Identify the outliers in a data set.
• Specify the number of measurements required to achieve a
desired Confidence Interval.
Statistical Measurement Theory
Random Error:
• Manifested through data scatter,
• How to quantify it.
Systematic Error:
➢ Does not vary with repeated measurements.
➢ Does not affect the statistics of measurements.
For now, let’s assume Systematic Error is negligible.
Now let us:
• Estimate the value of x’ (the mean of the entire population)
based on repeated measurements of the variable x, summarized by
the sample mean x̄, in the absence of systematic error.
• The true value of x, namely x’, is the mean value of all
possible values of x.
• If the number of measurements in the sample, N, is small, then
the estimate of x’ from the data set is heavily
affected by the value of any one data point.
• As N → ∞, or approaches the total size of the
population, all possible variations in x become
included in the data set.
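The sensitivity of a small sample to one data point can be illustrated with a short sketch. The values below (a typical reading of 1.0 and one stray reading of 1.5) are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
from statistics import fmean

def mean_with_one_stray_point(n, typical=1.0, stray=1.5):
    """Sample mean of n readings where n-1 equal `typical` and one equals `stray`.

    The values are illustrative placeholders, not measured data.
    """
    data = [typical] * (n - 1) + [stray]
    return fmean(data)

if __name__ == "__main__":
    # The single stray point shifts the mean by (stray - typical) / n,
    # so its influence shrinks as the sample size grows.
    for n in (5, 20, 1000):
        print(n, mean_with_one_stray_point(n))
```

With these placeholder values, the stray point moves the mean by 0.5/N, which is large for N = 5 and negligible for N = 1000.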
From Statistical analysis:
x’ = x̄ ± u   (P%)
where u is the uncertainty interval at some probability level P%.
Uncertainty: Numbers that quantify the possible range of the
effects of errors.
Probability Density Function
Random scatter is always present in measurements.
Two types of random variables:
1. Continuous (motor speed)
2. Discrete (diameter of rollers in the 10,000,000 population example)
Central tendency of a random variable: repeated measurements of x
tend to cluster around a preferred value.
Example:
The interval 0.95 to 1.05, where the data cluster most densely, is
expected to contain the true mean.
The data in the line figure above can also be shown as a histogram:
The histogram above has 7 intervals, based on the 20 data points (N) we
have.
An estimate of the number of intervals K (7 in this case) is derived
from the suggestions in Bendat and Piersol as:
$$K = 1.87\,(N - 1)^{0.40} + 1$$
For N = 20 this gives K ≈ 7.
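As a quick numerical check, the interval count for the 20-point data set can be computed directly. This sketch assumes the estimate takes the commonly quoted form K = 1.87(N − 1)^0.40 + 1; the function name is hypothetical.

```python
def number_of_intervals(n):
    """Estimated number of histogram intervals K for n data points.

    Assumes the form K = 1.87 * (n - 1)**0.40 + 1 (an assumption stated
    in the lead-in). Returns a float; round it to get a usable count.
    """
    return 1.87 * (n - 1) ** 0.40 + 1

if __name__ == "__main__":
    k = number_of_intervals(20)
    print(k, round(k))
```

Rounding the result for N = 20 gives the 7 intervals used in the histogram.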
Example: construct the Histogram of the data given below:
Known: Data of Table 4.1
N = 20
Assumption: Fixed operating condition
Construct: The Histogram and frequency distribution
Solution:
Compute a reasonable number of intervals, K, for the given data set.
Next, determine the maximum and minimum values of the data set
and divide this range into K intervals. For a minimum of 0.68 and a
maximum of 1.34, a value of δx = 0.05 is chosen, so each interval
spans 2δx = 0.10. The intervals are as shown below:
The results are plotted in Figure 4.2. The plot displays a definite
central tendency, seen as the maximum frequency of occurrence
falling within the interval 0.95 to 1.05.
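The binning procedure can be sketched in a few lines. The sample values below are illustrative placeholders chosen to match the stated N = 20, minimum of 0.68 and maximum of 1.34; they are an assumption, since Table 4.1 is not reproduced in these notes. The first interval edge of 0.65 and the function name are also assumptions.

```python
# Placeholder sample (assumed values, NOT taken from Table 4.1 of the notes).
data = [0.98, 1.07, 0.86, 1.16, 0.96, 0.68, 1.34, 1.04, 1.21, 0.86,
        1.02, 1.26, 1.08, 1.02, 0.94, 1.11, 0.99, 0.78, 1.06, 0.96]

def histogram_counts(values, first_edge=65, width=10, k=7):
    """Count values per interval, working in integer hundredths.

    first_edge=65 and width=10 correspond to intervals 0.65-0.75,
    0.75-0.85, ..., 1.25-1.35 (interval width 2*dx = 0.10).
    Integer arithmetic avoids floating-point trouble at interval edges.
    """
    counts = [0] * k
    for x in values:
        idx = (round(x * 100) - first_edge) // width
        if 0 <= idx < k:
            counts[idx] += 1
    return counts

counts = histogram_counts(data)                     # occurrences n_j
frequencies = [n_j / len(data) for n_j in counts]   # f_j = n_j / N

if __name__ == "__main__":
    for j, (n_j, f_j) in enumerate(zip(counts, frequencies)):
        low = 0.65 + 0.10 * j
        print(f"{low:.2f} to {low + 0.10:.2f}: n = {n_j}, f = {f_j:.2f}")
```

For this placeholder sample the largest count falls in the 0.95 to 1.05 interval, which is the central tendency described above.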
Behavior of a Population
Relationship between probability and statistics:
• Assume one of the distributions for p(x) from the previous
table, say the normal distribution.
• Normal distribution describes the behavior of a continuous
random variable common in engineering.
• Normal distribution predicts that the scatter seen in a
measured data set will distribute symmetrically about some
central tendency.
The probability density function for a random variable, x, having
a normal distribution is defined as:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\,\frac{(x - x')^2}{\sigma^2}\right]$$
where x’ is the true mean value of x and σ² is the true variance of x.
The standard deviation, σ, is the square root of the variance.
Thus, two parameters, namely x’ and σ², define the normal
distribution p(x).
• The maximum of p(x) occurs at x = x’.
• The variance σ² defines the width or range of variation of x.
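A minimal sketch of the normal density follows, using the standard form p(x) = exp[−(x − x’)²/(2σ²)] / (σ√(2π)). The function name and the numerical values in the usage lines (mean 8.5, standard deviation 1.5) are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, sigma):
    """Normal probability density p(x) with true mean `mean` and
    standard deviation `sigma` (sigma > 0)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / sigma) ** 2)

if __name__ == "__main__":
    mean, sigma = 8.5, 1.5  # illustrative values only
    # The density peaks at the mean and is symmetric about it.
    print(normal_pdf(mean, mean, sigma))
    print(normal_pdf(mean - 1.0, mean, sigma), normal_pdf(mean + 1.0, mean, sigma))
```

A larger sigma lowers the peak and spreads the curve, which is the sense in which the variance sets the width of the distribution.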
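The probability that x will assume a value within the interval
x’ ± δx is given by the area under p(x), which is found by
integrating over the interval. Thus, this probability is given by:
$$P(x' - \delta x \le x \le x' + \delta x) = \int_{x' - \delta x}^{x' + \delta x} p(x)\,dx$$
A change of variable makes this integration easier.
With the new variable β = (x – x’)/σ, so that dx = σ dβ, and with
z1 = (x1 – x’)/σ at the interval limit x1 = x’ + δx, we have:
$$P(-z_1 \le \beta \le z_1) = \frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\,d\beta$$
Because of symmetry:
$$\frac{1}{\sqrt{2\pi}} \int_{-z_1}^{z_1} e^{-\beta^2/2}\,d\beta = 2\left[\frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\,d\beta\right]$$
The term in brackets is the normal error function.
The value of
$$P(z_1) = \frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\beta^2/2}\,d\beta$$
is tabulated in Table 4.3 for the interval defined by z1 shown in
Figure 4.3.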
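When the table is not at hand, the one-sided area P(z1) can be evaluated numerically. The sketch below relies on the identity that the area under the standard normal density from 0 to z1 equals 0.5·erf(z1/√2), where erf is the error function in Python's standard math module. The function names are hypothetical.

```python
import math

def p_one_sided(z1):
    """P(z1): area under the standard normal density between 0 and z1."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))

def p_two_sided(z1):
    """Probability of falling within -z1 <= beta <= z1 (by symmetry, 2 * P(z1))."""
    return 2.0 * p_one_sided(z1)

if __name__ == "__main__":
    for z1 in (1.0, 2.0, 3.0):
        print(z1, p_one_sided(z1), p_two_sided(z1))
```

The one-sided values for z1 = 1 and z1 = 2 agree with the tabulated 0.3413 and 0.4772 to four decimal places.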
Example 1:
Example 2:
The statistics of a well-defined varying voltage signal are given by
x’ = 8.5 V and σ² = 2.25 V². If a single measurement of the voltage
signal is made, determine the probability that the measured value
indicated will be between 10.0 and 11.5 V.
Solution:
To find the probability that x will fall into the interval 10.0 < x < 11.5
requires finding the area under p(x) bounded by this interval. The
standard deviation of the variable is σ = √(σ²) = √2.25 = 1.5 V.
Therefore, our interval falls under the portion of the p(x) curve bounded
by z1 = (10.0 - 8.5)/1.5 = 1 and z1 = (11.5 - 8.5)/1.5 = 2.
From Table 4.3, the probability that a value will fall between
8.5 < x < 10.0 is P(8.5 < x < 10.0) = P(z1 = 1) = 0.3413.
For the interval defined by 8.5 < x < 11.5, P(8.5 < x < 11.5) = P(z1 = 2) =
0.4772. The area we need is the difference between these two areas, so:
$$P(10.0 < x < 11.5) = P(8.5 < x < 11.5) - P(8.5 < x < 10.0) = 0.4772 - 0.3413 = 0.1359$$
Therefore, there is a 13.59% probability that the measurement will yield
a value between 10.0 and 11.5 V.
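The result of Example 2 can be cross-checked numerically. This sketch evaluates the one-sided area with math.erf instead of reading Table 4.3; the helper names are hypothetical, and the inputs are the values stated in the example.

```python
import math

def p_one_sided(z):
    """Area under the standard normal density between 0 and z."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def interval_probability(x_low, x_high, mean, sigma):
    """Probability that a normally distributed x lies between x_low and x_high."""
    z_a = (x_low - mean) / sigma
    z_b = (x_high - mean) / sigma
    return p_one_sided(z_b) - p_one_sided(z_a)

if __name__ == "__main__":
    sigma = math.sqrt(2.25)  # variance 2.25 V^2 gives sigma = 1.5 V
    prob = interval_probability(10.0, 11.5, mean=8.5, sigma=sigma)
    print(f"P(10.0 < x < 11.5) = {prob:.4f}")
```

Because erf is an odd function, the same helper also handles limits below the mean, where z is negative.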
COMMENT In general, the probability that a measured
value will lie within an interval defined by any two values
of z1, such as za and zb, is found by integrating p(x)
between za and zb. For a normal density function, this
probability is identical to the operation, P(zb) - P(za).