Quality Notes Part II

3. Basic Seven QC Tools

In this part, we introduce the basic seven tools, or the Ishikawa seven tools, as they were
first summarized in the book by the late Japanese Professor Ishikawa. These techniques
are applicable to most engineering processes and serve as basic techniques in analysing and
monitoring processes. It should be mentioned that there are other quality control tools as
well, and some of them will be discussed later. The basic tools, however, serve the purpose
of preliminary analysis in most practical situations.

3.1. Check Sheet

In order to carry out statistical analysis of a process, we need some information about the
process in the form of process characteristic data. The check sheet is a very useful
approach to data collection and it can be used by everyone to collect data in a
reliable and organized way.

Check sheets are also known as Data Collection Sheets. Where they are used for counting,
a commonly used term is Tally Sheet. Some specific names are process distribution check
sheet, defective count check sheet, checklist, etc. They are all useful in the data collection
process. With the help of a computer, a check sheet can also be used for keying in
measurement data.

An Example

Usually, a check sheet carries several important pieces of information, and there are
different types of check sheets depending on the objectives of the data collection. The
most important information in a check sheet is the measurement data. A sample check
sheet is displayed in Figure 3.1.

The form of the check sheet is usually individualized for each situation and it is designed
by the project team. Whenever possible, check sheets should be designed to show the
location and time of the data collection. A check sheet should be informative and
user-friendly. Creativity usually plays an important role in designing a check sheet.

Uses of Check Sheet


Quality Engineering (SYE3102)

A check sheet should be used to ensure that the data are accurately recorded and easy to use
later, either for direct interpretation or for transcription, for example into a computer. It is
especially useful when the recording involves counting, classifying, checking or locating.
A suitably designed check sheet also makes it possible to see the distribution of the
measurements as they build up.

If data are collected in a disorganized way, they are likely to end up as a jumble of numbers
on a convenient scrap of paper. The numbers are easily misunderstood and the paper may
even be lost. By collecting data in an organized way, fewer mistakes are likely in the
collection, transcription, understanding and storage of the data.

Figure 3.1. A simple check sheet for measurement of certain percentage values
(tally counts 1, 9, 18, 16, 12, 7 and 2 over the measurement intervals from 10 to 80).

Steps in using a check sheet

Step 1: Identify the objective of the measurement and the type of information that is
needed for the analysis
Step 2: Identify the period of data collection and the number of measurements needed
Step 3: Design a check sheet that is easy to use and contains the items needed for the
data collection
Step 4: Ensure that data entry is done correctly, especially when interpretation is
needed to enter data into the correct category

Step 5: Implement and collect the data


Step 6: Interpret the data and prepare for further analysis
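The steps above can be sketched in code. The following is a minimal illustration of a computerized tally-type check sheet, using a hypothetical set of keyed-in measurements; the values and the text layout are assumptions for the example only.

```python
from collections import Counter

# Hypothetical measurements keyed in during the collection period (Step 5)
measurements = [30, 40, 40, 50, 30, 60, 40, 50, 50, 40]

# Tally the occurrences of each value (Step 6: prepare for further analysis)
tally = Counter(measurements)

# Print a simple text tally sheet, one row per observed value
for value in sorted(tally):
    print(f"{value:>4} | {'X' * tally[value]}")
```

The `Counter` keeps the data organized by value, which makes the later transcription into a histogram or Pareto analysis straightforward.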

3.2. Pareto Diagram

Background and uses

This is a technique to identify the most critical quality problems. It was
advocated by J. M. Juran, one of the most famous quality gurus. It is named after an
Italian economist, Vilfredo Pareto, who determined that 85% of the world's wealth was
owned by 15% of the people. In terms of quality, we can usually say that a large
percentage of the problems are caused by a small percentage of the causes.
Identifying these "vital few" causes and correcting them will give us the most
improvement for our efforts. We should not worry about the "trivial many" causes that
account for only a few quality problems at this time.

Pareto charts provide the vital information needed to identify and prioritize problems in
any process. Once the frequency of defects or errors has been prioritized, the cost
associated with each category should be examined before corrective action is taken. That
is, the defects or problems carrying the highest cost should be addressed first. This
procedure will enable you to improve the process and reduce costs in a more efficient
manner.

An example of Pareto diagram

To conduct a Pareto analysis, we collect data on the frequency of different causes, or types
of quality problems. Then we sort these problems in order of frequency and calculate the
percentage of each as well as the cumulative percentage. Often, we also express the result
using a graphical diagram called a Pareto diagram. This method is best seen using an
example.

Assume that for a particular production process, defectives for the past weeks are
classified as far as possible into categories A-G, with the rest in category O. The
numbers of counts are 8, 24, 4, 26, 6, 13, 1 and 3, respectively. The Pareto diagram is then
produced as follows. It can be seen that the most important problem categories are the B and
D types of problems. Together they account for about 60 percent of all problems. These
two types of problems should be dealt with as early as possible.

Figure 3.2. An example of a Pareto plot (categories ordered by decreasing count,
with the cumulative percentage scale on the right).

Construction of a Pareto Chart

Step 1: Classify the data


A decision must be made as to how the data should be classified.

Step 2: Define the period of time


Identify an appropriate period of time to conduct the study. Any period of time can be
used; however, it should be convenient and easily implemented within the normal work
environment. Once the classification of data and period of time have been defined, a check
sheet needs to be constructed to collect the data.

Step 3: Collect the data by classification



Summarize the data collected on the check sheet by counting the number of times that
each event occurred. Then rearrange the data in decreasing order of occurrence.

Step 4: Draw a graph


Draw a vertical line to indicate the frequency of occurrence and a horizontal line
indicating all of the events or problems detected. Divide the vertical axis into a scale that
reflects the totals of the data collected. Then divide the horizontal axis into equal
segments, one for each category of events or problems recorded.

Step 5: List each event in its order of frequency


At the far left side of the horizontal axis, list the event or problem that occurred most
frequently, then the next most frequent, the next, and so on, until all of the events have
been listed.

Step 6: Draw the bars on the graph


Draw a bar for the total number of times each event or problem occurred, for the
categories listed on the horizontal axis, with the corresponding values on the vertical axis.

Step 7: Add a legend


Indicate the title of the chart, the period of time covered, the person who prepared the
chart, the sources of the data, the date prepared, and any other relevant information
necessary to identify the data.
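The computational part of Steps 3-5 can be sketched in code. The following minimal sketch sorts the defect counts from the earlier example in decreasing order and computes the cumulative percentage; the printing format is an assumption for illustration.

```python
# Defect counts from the example (categories A-G plus "O" for others)
counts = {"A": 8, "B": 24, "C": 4, "D": 26, "E": 6, "F": 13, "G": 1, "O": 3}

# Steps 3 and 5: rearrange the categories in decreasing order of occurrence
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

total = sum(counts.values())
cumulative = 0
cum_pct = {}
for category, count in ordered:
    cumulative += count
    cum_pct[category] = 100.0 * cumulative / total
    print(f"{category}: {count:3d}  cumulative {cum_pct[category]:5.1f}%")
```

Running this confirms the conclusion in the text: D and B head the list, and together they account for about 59% of all problems.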

3.3. Histogram

Uses of histogram

A histogram is also known as a frequency distribution chart. A histogram is a graphical
representation of a frequency distribution and it provides a useful summary for a set of
data. We can construct a histogram by placing the values of the individual observations on
the horizontal axis of a graph and the frequencies on the vertical axis. Then, for each
observation, we use a rectangle or bar whose base is centred on the value of the observation
and whose height corresponds to the frequency. Once this is done for every observation,
the complete histogram has been constructed.

In quality control applications, a histogram is sometimes called a lot plot. If the
specifications of the quality measurement are drawn on the lot plot, it can tell quite a bit
about the capability of the process to meet the specification.

An example

For the data set given before, we can plot the frequency values using a bar graph, as shown
below.

Measurement: 2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4
Frequency:     2    6   10   12    7    8    4    1

(Data from "Data set 1".)


Figure 3.3. An example of a histogram.

Interpretation of histogram

A histogram provides a convenient picture of the data. From the histogram, it is easy to
see the range and the centre of the data. Usually, the frequencies become smaller the
further we move from the centre of the histogram. However, this is not always the
case.

A symmetric distribution means that both large and small measurements fall equally
around a central value. This is typical of the output of most industrial processes. From the
histogram, we can easily identify whether the data are equally distributed on each side of
the centre or the data are skewed to the right or to the left. Statistical distribution can then
be selected based on this characteristic.

Another characteristic is the peakedness of the data. Sometimes more than one peak is
possible. A bi-modal histogram has two distinct peaks and it indicates that there are two
frequently occurring measurements. Often this results from a mixture of two different
populations in the sample data.

Steps in drawing a histogram

Step 1: Collect the data. The more the better, subject to practical restrictions. The
sample size is the number of data points collected;
Step 2: Determine the largest value and the smallest value, in order to calculate the
range of the data;
Step 3: Based on the difference between the largest and the smallest values, determine
the number of intervals to be used;
Step 4: Determine the intervals to be used;
Step 5: Check which data belong to each class. Enter and count the totals. To avoid
mistakes, double-check;
Step 6: On graph paper, draw the vertical and horizontal axes. Plot the frequency on
the vertical axis and draw a bar graph.
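Steps 2-5 can be sketched in code. The following minimal illustration uses a small hypothetical sample and an assumed choice of four intervals; in practice the number of intervals in Step 3 is a judgment call based on the sample size.

```python
# Hypothetical sample of measurements (Step 1)
data = [2.7, 2.8, 2.8, 2.9, 3.0, 3.0, 3.1, 3.1, 3.2, 3.4]

lo, hi = min(data), max(data)   # Step 2: smallest and largest values
r = hi - lo                     # range of the data
k = 4                           # Step 3: chosen number of intervals (assumed)
width = r / k                   # Step 4: equal interval width

# Step 5: count how many data points fall into each class
counts = [0] * k
for x in data:
    idx = min(int((x - lo) / width), k - 1)   # clamp the maximum into the last cell
    counts[idx] += 1
```

With the counts in hand, Step 6 is simply drawing one bar per interval with height equal to its count.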

3.4. Run Chart

Background and the uses of run chart

A run chart is a plot of a sequence of measurements, used to discover any significant
patterns of change. It is useful when we have repeated measurements of a certain process
characteristic and it is important to identify how it changes over time. The control chart,
which will be discussed later, can be considered a special case of the run chart, although
the run chart is more general in the sense that it does not focus on measuring the degree
of control of the process.

An Example

Figure 3.4 is a run chart showing the sales per week for a certain product. It can be seen
that sales have increased over the past couple of months and have probably stabilized.

Figure 3.4. An example of a run chart (sales in $ per week over 14 weeks).

Interpretations of run chart

A trend shows an overall movement in one direction which may be masked by the ups and
downs of individual points; for example, daily sales figures go up and down, but overall
sales slowly increase over the longer term.

A spike is a short-term change which may be caused by some unusual event, such as a
weak batch of yeast causing a large number of rejections in a bakery.

A step is a sudden and persistent change.



3.5. Control Chart (Brief Introduction)

Statistical process control techniques originated in the early 1920s when Professor Shewhart,
in his famous book, Shewhart (1931), presented control charts for monitoring process
characteristics. The basic idea is that processes are always subject to random variation,
because of variation in the quality of incoming material, random changes in environmental
factors such as temperature, humidity and vibration, and differences in the handling of the
product by different people.

The control chart is useful for identifying special causes of variation in process
characteristics. It is probably the most important statistical technique that has been
applied in industry. It is also the main part of this subject and will be discussed in a
separate part.

A control chart should be used when investigating a process to determine whether it is in a
state of statistical control and thus whether actions are required to bring the process under
control. When we have to differentiate between special and common causes of variation,
the control chart can be used for this purpose.

Figure 3.5. An example of a control chart (individuals chart of % defectives over
44 observations, with 3-sigma control limits: UCL = 2.641, centre = 1.493, LCL = 0.345).

Although there are many types of control charts and there are also different ways to
construct them, the common steps in setting up a control chart in practice can be
summarized as follows.

Step 1: Obtain a series of process characteristics through observation or calculation

Step 2: Calculate the process mean and use it as the centre line (CL)

Step 3: Calculate the standard deviation

Step 4: Calculate the upper control limit (UCL) and the lower control limit (LCL) as
the mean plus and minus three standard deviations, respectively

Step 5: Plot the process characteristics on the chart and connect the consecutive points

Step 6: If there are points that fall outside the limits, check the reason

Step 7: Continue plotting whenever a new measurement is obtained
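Steps 1-4 and 6 can be sketched in code. The following minimal illustration uses a hypothetical series of process measurements; the sample standard deviation is used directly in the limits, which is a simplification of the more refined estimators discussed later for specific chart types.

```python
import statistics

# Step 1: a hypothetical series of process characteristic values
values = [1.2, 1.5, 1.4, 1.6, 1.3, 1.7, 1.5, 1.4, 1.6, 1.5]

cl = statistics.mean(values)     # Step 2: centre line
s = statistics.stdev(values)     # Step 3: sample standard deviation
ucl = cl + 3 * s                 # Step 4: upper control limit
lcl = cl - 3 * s                 # Step 4: lower control limit

# Step 6: flag any points that fall outside the limits
out_of_control = [x for x in values if x > ucl or x < lcl]
```

For this sample all points fall within the three-sigma limits, so no special-cause investigation would be triggered.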

3.6. Scatter Diagram

Background and uses of scatter diagram

When investigating problems, typically when searching for their causes, it may be
suspected that two items are related in some way. The scatter diagram helps to identify the
existence of a measurable relationship between two such items by measuring them in pairs
and plotting them in a graph.

A scatter diagram can be used to show the type and degree of any causal relationship
between two factors, especially when it is suspected that the variation of the two items is
connected in some way, to reveal any actual correlation between the two.

An example of scatter diagram



The following is a set of data collected for a type of soft drink. The percentage of
defectives after a certain storage time is measured. The aim is to study the relationship
between the storage time and the percentage of defectives.

Storage time: 192  305  190  239  385   30  175  273  220  121
% defectives: 0.06 0.08 0.06 0.08 0.04 0.01 0.04 0.03 0.02 0.01

Storage time:  68  402  320  180   86  330  340  331  545  105
% defectives: 0.01 0.07 0.09 0.02 0.00 0.07 0.06 0.03 0.11 0.05

Storage time: 291  232  495  437  503  488  516  598  485  171
% defectives: 0.09 0.06 0.09 0.12 0.08 0.08 0.09 0.13 0.14 0.05

Storage time: 100  418  377  230  294  403  342  378  380  339
% defectives: 0.03 0.06 0.08 0.05 0.08 0.03 0.12 0.10 0.07 0.10

Storage time: 138  431  388  370  284  171  302  413  140  141
% defectives: 0.06 0.11 0.08 0.04 0.05 0.03 0.09 0.06 0.05 0.05

It is very difficult to draw any conclusion from the tabulated values. However, a scatter
plot can be made, and it clearly indicates the strong relationship between the storage time
and the percentage of defectives.

Figure 3.6. An example of a scatter diagram (% defectives against storage time).

Interpretation of scatter diagram

If two variables show a relationship, they are said to be correlated. A correlation exists
between them. This correlation may be linear or non-linear.

If the scatter plot shows an increasing trend, the variables are positively correlated. On the
other hand, if it shows a decreasing trend, they are negatively correlated.

After having identified a potential relationship, statistical analysis such as regression
modelling can be used to determine the exact relationship. The scatter plot, however, is
usually a good starting point for determining whether there is a relationship and
what type of relationship it could be.
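As a first quantitative check after inspecting the scatter plot, the sample correlation coefficient can be computed. The following minimal sketch uses a small hypothetical subset of storage-time/defective pairs, not the full table above.

```python
import math

# Hypothetical paired data in the spirit of the example
x = [30, 100, 200, 300, 400, 500, 600]              # storage time
y = [0.01, 0.03, 0.04, 0.07, 0.08, 0.10, 0.13]      # % defectives

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sums of cross-products and squared deviations about the means
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # r > 0: positive correlation; r < 0: negative
```

A value of r close to +1, as here, confirms the strong positive (and roughly linear) relationship suggested by the plot.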

3.7. Cause-Effect Diagram

Background and uses of the cause-effect diagram

Processes are always subject to variation from many potential causes: the quality of
incoming material, environmental factors such as temperature, humidity and vibration, and
differences in the handling of the product by different people. The cause-effect diagram
organizes such potential causes of a problem in a structured way. An example of a
cause-effect diagram is given below.

Figure 3.7. Common factors affecting process variation: Man (low skill, lack of
attention, poor inspection), Machine (different system, old machine), Material (bad
parts) and Environment (vibration, humidity, temperature).

Construction of cause-effect diagram

Common steps in the construction of a cause-effect diagram are as follows:

Step 1: Define the problem to be analysed;

Step 2: Form the team to perform the analysis;

Step 3: Draw the effect box and center line;

Step 4: Specify the major potential cause categories and join them as boxes connected
to the centre line;

Step 5: Identify possible causes through brainstorming;

Step 6: Classify them into the categories;

Step 7: Rank order the causes to identify those that seem most likely to impact the
problem;

Step 8: Take corrective action.

More about cause-effect diagram



The cause-effect diagram was developed by Dr Kaoru Ishikawa and is sometimes called an
Ishikawa diagram. Because of its shape, it is also well known as a fishbone diagram.

The cause-effect diagram is a very powerful tool for problem analysis. A highly detailed
cause-effect diagram can serve as an effective trouble-shooting aid. It has nearly unlimited
applications in various industries.

Furthermore, the construction of a cause-effect diagram relies heavily on participation
by every member of the team. Each member takes a turn giving one idea at a time, so that
no one dominates the brainstorming session. As a team experience, it tends to get people
more involved in attacking the problem rather than in affixing blame.

4. Probability and Statistics Background

Modern quality control techniques are all based on sound statistical principles. In this
chapter, we summarise some basic concepts related to statistical distributions which are
essential for the understanding of statistical quality control theory. In practice, the lack of
a clear understanding of these principles may lead to certain costly mistakes that otherwise
could be avoided.

4.1. Frequency Distribution

Products produced can never be identical, even if they are similar. We always have
so-called variation, that is, the difference between similar products. In order to describe
the variability of a product characteristic, we need quantitative measures. The frequency
function is the most common type of description. A frequency distribution is an
arrangement of the data by magnitude and it measures how often a certain value occurs.

Example: a set of measurement data

3.0 3.1 3.0 2.9 3.3 3.2 2.9 2.8 3.0 3.2
3.1 2.7 3.0 2.8 2.9 3.2 3.1 3.0 2.8 3.0
2.7 3.2 3.1 3.4 2.8 2.9 2.8 3.0 3.3 2.9
2.9 3.3 3.0 2.8 2.9 3.2 3.1 3.0 2.9 3.2
3.1 2.9 3.2 3.3 2.8 3.0 3.2 3.1 2.9 3.0

Table 4.1. A set of ungrouped data.

Based on the data set in Table 4.1, it is difficult to draw any conclusion. We can, however,
obtain the frequency of occurrence of the data and arrive at the set of grouped data in
Table 4.2. It is clear that the measurement values seem to be centred around 3.0.

To get a clearer picture, we can construct a frequency histogram as shown in Figure 4.1.

Measurement: 2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4
Frequency:     2    6   10   12    7    8    4    1

Table 4.2. Measurement frequency (grouped data).

Figure 4.1. An example of a frequency distribution in the form of a histogram.

Sometimes it is more useful to have the so-called cumulative frequency, which is the
accumulated value at each point. Another common type of frequency histogram is the
relative frequency histogram, and we also have the corresponding relative cumulative
frequency histogram. These show the relative values rather than the actual numbers of
counts. The shape, however, remains the same.
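The relative and cumulative frequencies are simple running computations over the grouped data of Table 4.2, as the following minimal sketch shows.

```python
# Grouped data from Table 4.2
values      = [2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4]
frequencies = [2, 6, 10, 12, 7, 8, 4, 1]

n = sum(frequencies)                       # total number of observations
relative = [f / n for f in frequencies]    # relative frequency of each cell

# Cumulative frequency: running total of the cell counts
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)
```

The final cumulative value equals the sample size, and the relative frequencies sum to 1, which is a useful consistency check on the tallying.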

An example of the cumulative frequency histogram is given below.



Figure 4.2. An example of a cumulative frequency histogram.

A frequency histogram presents a visual display of the data in which one may easily see
the shape, location and spread. This is very useful in selecting the appropriate distribution
for statistical analysis. However, to provide better summary information and to compare
different data sets, we need numerical values for the average and the spread. These will be
discussed in the following.

4.2. Mean/Mode/Median

Normally, the mean is used to describe the location of the distribution of a population or a
sample. The mean is the arithmetic average of the data. It is calculated by summing the
data values and dividing this total by the number of data values. That is

\bar{X} = \sum_{i=1}^{n} X_i / n

where X_i is the data value for data point i and n is the number of data points.

Usually, the Greek symbol \mu is used to represent the mean value of a population. The
value of \mu is rarely known because of the difficulty in measuring all parts of a population.

In most process industries, it is impossible to measure all of the output from a process to
calculate \mu.

When we have a set of grouped data, we can calculate the mean as

\bar{X} = \sum_{i=1}^{k} n_i X_i / n

where X_i is the midpoint of cell i and n_i is the frequency for that cell. The number
of cells is k and n is the sum of the frequencies.

For the data set in Table 4.1, the sum is 150.9 and the number of points is 50, so the mean
is 150.9/50 = 3.02. This can be compared with the mean calculated using the grouped data,
which is also 3.02, since each cell's midpoint is the same as the actual measurement value.
In most cases, however, when data are grouped, some information is lost because the
measurement data do not always fall on the cell's midpoint. Indeed, we do not know
the actual values for grouped data.
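The grouped-data formula above can be sketched directly, using the midpoints and frequencies from Table 4.2.

```python
# Grouped data from Table 4.2: cell midpoints and their frequencies
midpoints   = [2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4]
frequencies = [2, 6, 10, 12, 7, 8, 4, 1]

n = sum(frequencies)

# Grouped mean: sum of (frequency * midpoint) divided by the total count
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n
```

To two decimal places the result agrees with the value 3.02 quoted in the text for the ungrouped data.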

The median, which is defined as the middle value in a set of numbers when that data set is
arranged from the lowest to the highest value, is also used in describing the location of a
distribution. The median is not as common as the mean, and it is not as good a measure of
location as the mean. However, because it is easy to compute, especially when the
measurement values are already ranked, some companies still prefer it as a measure of
the location of the population.

For the data set in Table 4.1, the median is equal to 3.0: if we order the series from the
smallest to the largest value, the middle point has the value 3.0. In fact, this can be
obtained from the grouped data as well.

The mode is simply the value that appears most frequently. Sometimes there is no mode, if
no value appears more than once. The mode can be used for grouped data and when a
large number of data points are available. The mode is most suitable for describing the
location of a severely skewed distribution. It should not be used for small samples. The
mode for the data set in Table 4.1 is equal to 3.0. This is easily seen from the frequency
histogram.

For a symmetric distribution, the mode gives a similar value to the mean and the median.
For other distributions, they can be different, although the mean is usually preferred.

Figure 4.3. An example of the difference between the mean, median and mode (for a
right-skewed distribution, mode < median < mean).

Although the mean seems to be the most reasonable measure of the central value of the
distribution, a problem associated with it is that it is not as robust as the others. For
example, for the values 1, 2, 3, 4, 5, 6, 7, 8, both the median and the mean are 4.5.
However, suppose that for some reason, such as a typing error, the data used are
1, 2, 3, 4, 5, 6, 7, 80. In this case, the mean changes to 13.5 while the median remains 4.5.
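The robustness example above is easily verified: one typing error (8 becomes 80) moves the mean from 4.5 to 13.5 while leaving the median untouched.

```python
import statistics

clean = [1, 2, 3, 4, 5, 6, 7, 8]
typo  = [1, 2, 3, 4, 5, 6, 7, 80]   # the same data with 8 mistyped as 80

# Mean shifts dramatically; the median does not
print(statistics.mean(clean), statistics.median(clean))   # 4.5 4.5
print(statistics.mean(typo), statistics.median(typo))     # 13.5 4.5
```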

4.3. Measure of dispersion

The range is a simple measure of dispersion, or the variation of the distribution. It is
defined as the difference between the largest value and the smallest value in a data set. It
is very easy to compute:

Range = the largest value - the smallest value

The common notation for the range is R and it describes only the absolute spread of the
data. It tells nothing about how much the data values vary from the mean.

For the data set in Table 4.1, the maximum value is 3.4 and the minimum value is 2.7, so
the range is 3.4-2.7=0.7. This is the absolute spread of the data and it is for example useful
when constructing the histogram.

The standard deviation is a more useful measure of variation because it is based on the
differences of the data values from their mean. The symbol \sigma is usually used to denote
the standard deviation of a population and s is usually used to denote the standard deviation
of a sample. The most common formula for the calculation of the standard deviation is

s = \sqrt{ \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) }

It should be noted that the denominator is (n-1) instead of n, as is the case for the
calculation of the mean. This can be explained by the fact that when n = 1, it is more
reasonable to say that we cannot compute the standard deviation rather than saying that it
is zero.

For the data set in Table 4.1, the standard deviation is equal to 0.17.

It can be noted that the formula for the calculation of the standard deviation can be
rewritten as

s = \sqrt{ \left( n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right) / \left( n(n-1) \right) }

and the similar form for grouped data is

s = \sqrt{ \left( n \sum_{i=1}^{k} f_i X_i^2 - \left( \sum_{i=1}^{k} f_i X_i \right)^2 \right) / \left( n(n-1) \right) }

Some other measures related to a statistical distribution are the skewness and kurtosis,
which are defined as

Skewness = \sum_{i=1}^{k} f_i (X_i - \bar{X})^3 / (n s^3)

and

Kurtosis = \sum_{i=1}^{k} f_i (X_i - \bar{X})^4 / (n s^4)

respectively. The skewness is a measure of the lack of symmetry of the data, while the
kurtosis indicates the peakedness of the data. They are commonly used for grouped data
and for the assessment of the shape of a histogram.

4.4. Normal Distribution

The normal distribution is the most widely used statistical distribution for the description of
measurement data. The normal distribution with mean \mu and standard deviation \sigma,
N(\mu, \sigma), has frequency distribution function

f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(X - \mu)^2}{2\sigma^2} \right)

A typical frequency distribution curve for the normal distribution is shown in Figure 4.4.

The cumulative distribution function is commonly denoted as

F(t) = \Phi\left( \frac{t - \mu}{\sigma} \right),

where \Phi(t) is the standard normal distribution function. The standard normal distribution
has a mean of zero and a standard deviation of 1. Through the transformation

Z = \frac{X - \mu}{\sigma}

all normal distributions can be converted to the standard normal distribution.



Figure 4.4. A typical normal distribution curve.

In the statistical analysis of data, the normal distribution is very useful because of its
statistical properties and the existing results that can be found in most standard statistical
texts. Statistical process control techniques, especially those using control chart, are
mostly based on normal distribution.

The normal distribution is also known as the Gaussian distribution and it is characterised
by the fact that it is unimodal, symmetrical and bell-shaped. The mean, the median and
the mode are all the same.

If a data set is from a normal distribution, we can plot the data on so-called Normal
Probability paper. We first estimate the empirical distribution function. It is then plotted
on the vertical axis, while the data are plotted on the horizontal axis of the Normal
Probability paper. The plotted points are then fitted by a straight line. The mean is
estimated as the point at which the fitted line cuts 0.5 on the vertical axis. The slope of the
fitted line then gives the standard deviation of the normal distribution.

Given the values of mean and standard deviation, which are the parameters in the normal
distribution, we can calculate the probability that a measurement value will fall into any
certain interval. The most common application of normal distribution is to determine the
probability that a value will be less than a prescribed value x. After the transformation to
standard normal distribution, statistical tables can be used to obtain this probability.
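Instead of printed tables, the standard normal probability can be computed from the error function. The following minimal sketch evaluates P(X < x) after the standard normal transformation; the mean and standard deviation used here are taken as illustrative values in the spirit of the earlier data set.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative parameters (e.g. estimates from the earlier measurement data)
mu, sigma = 3.0, 0.17

# Probability that a measurement falls below x = 3.2
x = 3.2
p = phi((x - mu) / sigma)
```

The identity used is Phi(z) = (1 + erf(z / sqrt(2))) / 2, which reproduces the familiar table values, e.g. Phi(1.96) ≈ 0.975.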

The normal distribution has many useful properties, and that is why it is commonly used in
statistical data analysis. One of these is related to the sum of normally distributed random
variables: if X1, X2, ..., Xn are normally and independently distributed random variables,
then Y = X1 + X2 + ... + Xn is also normally distributed.

Another important property of the normal distribution is that it is a good approximation to
the distribution of the average of independent random variables: the distribution of sample
means tends to be normal when the sample size is large enough. This property follows
from the central limit theorem, which states that the sum of n independent random
variables is approximately normally distributed, regardless of the distributions of the
individual variables. The approximation improves as n increases. This has often been
used as a justification of approximate normality in practice.

4.5. Binomial distribution

The most useful discrete probability distribution is the binomial distribution. Consider a
process that consists of a sequence of n independent trials, where the outcome of each trial
is either a success or a failure. If the probability of success on any trial is p, then the
number of successes in n trials follows the binomial distribution.

If a random variable Z follows the binomial distribution, the probability that Z = d is given
as

P(Z = d) = \frac{n!}{d!(n-d)!} p^d (1-p)^{n-d}

where p is commonly known as the proportion nonconforming in the population and n is
the sample size. The parameter p can generally be interpreted as the probability of a
specified event and n is the number of trials.

The binomial distribution has been widely used in quality control. It is the appropriate
model for sampling from a large population. Tables for the binomial distribution are
available in many statistical textbooks and statistical tables.

The parameters of the binomial distribution are n and p. The mean and variance of the
binomial distribution are

Mean = np
and
Variance = np(1-p).
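The probability formula and the mean/variance expressions above can be sketched directly; the values of n and p below are hypothetical, chosen only for illustration.

```python
import math

def binom_pmf(d, n, p):
    """P(Z = d) for a binomial random variable: n!/(d!(n-d)!) p^d (1-p)^(n-d)."""
    return math.comb(n, d) * p**d * (1 - p)**(n - d)

# Hypothetical sample size and proportion nonconforming
n, p = 5, 0.2

prob = binom_pmf(2, n, p)        # probability of exactly 2 nonconforming items
mean = n * p                     # Mean = np
var = n * p * (1 - p)            # Variance = np(1-p)
```

For n = 5 and p = 0.2 this gives P(Z = 2) = 10 × 0.2² × 0.8³ = 0.2048.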

Figure 4.5. Frequency distribution for 1000 simulated Binomial (n=20, p=0.2) variables.

4.6. Poisson distribution

If a random variable Z follows the Poisson distribution, the probability that Z = c is given
as

P(Z = c) = e^{-\lambda} \frac{\lambda^c}{c!}

where \lambda is the average count in a sample.

Poisson distribution has also been commonly used in quality control. A typical application
of the Poisson distribution is as a model of the number of defects or nonconformities that
occur in a unit of product. In fact, any random phenomenon that occurs on a per unit basis
is often well approximated by the Poisson distribution.

Figure 4.6. Frequency distribution for 1000 simulated Poisson (λ = 4) variables.

Tables for the Poisson distribution are available in many statistical textbooks and
statistical tables.

There is only one parameter in the Poisson distribution, namely \lambda. The mean and
variance of the Poisson distribution are given by:

Mean = Variance = \lambda

That is, the mean and variance of the Poisson distribution are both equal to the parameter
\lambda.

It is possible to derive the Poisson distribution as a limiting form of the binomial
distribution. In a binomial distribution with parameters n and p, if we let n approach
infinity and p approach zero in such a way that np = \lambda, then we obtain the Poisson
distribution.
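This limiting behaviour can be checked numerically: with n large, p small and np held fixed at λ, the binomial probability is very close to the Poisson probability. The values of λ, k and n below are hypothetical choices for the illustration.

```python
import math

lam, k = 4.0, 2   # Poisson mean and the count of interest

# Poisson probability P(Z = k) = e^(-lambda) * lambda^k / k!
poisson = math.exp(-lam) * lam**k / math.factorial(k)

# Binomial probability with large n, small p, and n * p = lambda
n = 1000
p = lam / n
binomial = math.comb(n, k) * p**k * (1 - p)**(n - k)
```

Already at n = 1000 the two probabilities agree to about three decimal places.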

4.7. Some other distributions and approximations

When the population is finite, the hypergeometric probability distribution can be useful.
Suppose that a random sample of n items is selected from the population without
replacement and the number of items in the sample that fall into the class of interest is Y.
Then Y follows the hypergeometric distribution and the probability that Y = d is given by

P(Y = d) = \binom{D}{d} \binom{N-D}{n-d} \Big/ \binom{N}{n}

where N is the population size and D is the number of items in the whole population
that fall into the class of interest. Also,

\binom{a}{b} = \frac{a!}{b!(a-b)!}

In certain quality control problems it is sometimes useful to approximate one probability
distribution with another. This is particularly helpful in situations where the original
distribution is difficult to deal with analytically.

The binomial distribution can be considered as an approximation to the hypergeometric
distribution. In the hypergeometric distribution, when N is very large and the sample size
is small, i.e. n << N, the binomial distribution with parameters n and p = D/N is a good
approximation to the hypergeometric distribution. That the sample size is much smaller
than the population is commonly the case in quality control problems.
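The quality of this approximation is easy to verify numerically. The following minimal sketch compares the exact hypergeometric probability with its binomial approximation for hypothetical values of N, D, n and d.

```python
import math

# Hypothetical finite population: N items, D of them in the class of interest,
# a sample of n drawn without replacement, and d the count of interest.
N, D, n, d = 1000, 100, 10, 1

# Exact hypergeometric probability: C(D,d) * C(N-D, n-d) / C(N, n)
hyper = (math.comb(D, d) * math.comb(N - D, n - d)) / math.comb(N, n)

# Binomial approximation with p = D / N
p = D / N
binom = math.comb(n, d) * p**d * (1 - p)**(n - d)
```

With n << N (here 10 << 1000) the two probabilities differ only in the third decimal place.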

For the binomial distribution itself, if the number of trials is large, then we may use the
central limit theorem to justify the normal distribution with mean np and variance np(1-p)
as an approximation to the binomial. The normal approximation to the binomial
distribution is good for p close to 0.5 and n > 10. For other values of p, larger values of n
are needed.
