Quality Notes Part II
In this part, we introduce the basic seven tools, or the Ishikawa seven tools, as they were
first summarized in the book by the late Japanese Professor Ishikawa. These techniques
are applicable to most engineering processes and serve as basic techniques for analysing and
monitoring processes. It should be mentioned that there are also other quality control tools,
and some of them will be discussed later. The basic tools, however, serve the purpose of
preliminary analysis in most practical situations.
In order to carry out statistical analysis of a process, we need some information about the
process in the form of process characteristic data. A check sheet is a very useful
approach to data collection, and it can be used by everyone to collect data in a
reliable and organized way.
Check sheets are also known as Data Collection Sheets. Where they are used for counting,
a commonly used term is Tally Sheet. Some specific names are process distribution check
sheet, defective count check sheet, checklist, etc. They are all useful in the data collection
process. With the help of a computer, a check sheet can also be used for keying in
measurement data.
An Example
Usually, a check sheet contains several pieces of important information, and there are
different types of check sheets depending on the objectives of the data collection. The
most important information in a check sheet is the measurement data. A sample check
sheet is displayed in Figure 3.1.
The form of the check sheet is usually individualized for each situation and it is designed
by the project team. Whenever it is possible, check sheets should be designed to show
location and time of the data collection. It should be informative and user-friendly.
Creativity usually plays an important role in designing a check sheet.
A check sheet should be used to ensure that the data are accurately recorded and easy to use
later, either for direct interpretation or for transcription, for example into a computer. It is
especially useful when the recording involves counting, classifying, checking or locating.
A suitably designed check sheet also makes it possible to see the distribution of the
measurements as they build up.
Value of the measurement:  10   20   30   40   50   60   70   80
Number of observations:     1    9   18   16   12    7    2

Figure 3.1. A simple check sheet for measurement of certain percentage values.
Step 1: Identify the objective of the measurement and the type of information that is
needed for the analysis
Step 2: Identify the period of data collection and the number of measurements needed
Step 3: Design a check sheet that is easy to use and contains the items needed for the
data collection
Step 4: Ensure that data entry is done correctly, especially when interpretation is
needed to enter the data into the correct category
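As a sketch of how such a tally can be kept electronically, the following Python fragment groups keyed-in measurements into intervals and counts them, mimicking a tally-style check sheet. The bin width and the sample values are hypothetical, chosen only for illustration.

```python
from collections import Counter

def tally(measurements, bin_width):
    """Group measurements into intervals of the given width and count
    them, mimicking a tally-style check sheet."""
    counts = Counter(int(m // bin_width) * bin_width for m in measurements)
    return dict(sorted(counts.items()))

# Hypothetical percentage values keyed in during data collection:
data = [34, 37, 41, 48, 52, 36, 44, 59, 31, 47]
print(tally(data, 10))   # {30: 4, 40: 4, 50: 2}
```

The sorted interval counts can then be copied directly into a check sheet of the kind shown in Figure 3.1.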
Quality Engineering (SYE3102) 17
[Link]@[Link]
3.2. Pareto Analysis

This is a technique to identify the most critical quality problems. The technique was
advocated by J. M. Juran, one of the most famous quality gurus. It is named after an
Italian economist, Vilfredo Pareto, who determined that 85% of the world's wealth was
owned by 15% of the people. In terms of quality, we can usually say that a large
percentage of the problems are caused by a small percentage of the causes.
Identifying these "vital few" causes and correcting them will give us the most
improvement for our efforts. We should not worry about the "trivial many" causes that
account for only a few quality problems at this time.
Pareto charts provide the vital information needed to identify and prioritize problems in
any process. Once the frequency of defects or errors has been prioritized, the cost
associated with each category should be examined before corrective action is taken. That
is, the defect or problem category carrying the highest cost should be addressed first. This
procedure will enable you to improve the process and reduce costs in a more efficient
manner.
To conduct a Pareto analysis, we collect data on the frequency of different causes, or types
of quality problems. Then we sort these problems in order of frequency and calculate the
percentage of each as well as the cumulative percentage. Often, we also express the result
using a graphical diagram called a Pareto diagram. This method is best seen using an
example.
Assume that for a particular production process, the defectives for the past weeks are
classified as far as possible into categories A-G, with the rest placed in category O. The
numbers of counts are 8, 24, 4, 26, 6, 13, 1 and 3, respectively. The Pareto diagram is then
produced as follows. It can be seen that the most important problem categories are the B and
D types of problems. Together they account for about 60 percent of the overall problems. These
two types of problems should be dealt with as early as possible.
[Figure 3.2. Pareto diagram of the defect counts: bars in decreasing order of frequency by category, with the number of counts on the left axis and the cumulative percentage on the right axis.]
Summarize the data collected on the check sheet by counting the number of times that
each event occurred. Then rearrange the data in decreasing order of occurrence.
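The sorting and cumulative-percentage calculation for the example above can be sketched in a few lines of Python, using the category counts given in the text:

```python
counts = {"A": 8, "B": 24, "C": 4, "D": 26, "E": 6, "F": 13, "G": 1, "O": 3}
total = sum(counts.values())                        # 85 defectives in total

# Sort categories by decreasing count, then accumulate the percentages.
ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
cumulative = 0
for category, n in ordered:
    cumulative += n
    print(f"{category}: {n:2d}  {100 * n / total:5.1f}%  cum. {100 * cumulative / total:5.1f}%")
```

Running this confirms that D and B head the list and together account for 50/85, about 59%, of all defectives.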
3.3. Histogram
Uses of histogram
An Example
For the data set given before, we can plot the frequency values using a bar graph, as shown
below.
[Figure 3.3. Frequency bar graph of the data: frequency on the vertical axis against the categories 2.7-3.4 on the horizontal axis.]
Interpretation of histogram
A histogram provides a convenient picture of the data. From the histogram, it is easy to
see the range and the centre of the data. Usually, the frequencies become smaller the
further away we move from the centre of the histogram. However, this is not always the
case.
A symmetric distribution means that both large and small measurements fall equally
around a central value. This is typical of the output of most industrial processes. From the
histogram, we can easily identify whether the data are equally distributed on each side of
the centre or are skewed to the right or to the left. A statistical distribution can then
be selected based on this characteristic.
Another characteristic is the peakedness of the data. Sometimes more than one peak is
possible. A bi-modal histogram has two distinct peaks and it indicates that there are two
frequently occurring measurements. Often this results from a mixture of two different
populations in the sample data.
Step 1: Collect the data. The more the better, subject to practical restrictions. The
sample size is then the number of data collected;
Step 2: Determine the largest value and the smallest value. This is to calculate the
range of the data;
Step 3: Based on the difference between the largest and the smallest values, determine
the number of intervals to be used.
Step 4: Determine the intervals to be used;
Step 5: Check which data belong to each class. Enter and count the total. To avoid
mistakes, double-check;
Step 6: On graph paper, draw the vertical and horizontal axes. Plot the frequency on
the vertical axis. Draw a bar graph.
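Steps 2 to 5 above can be sketched as a small Python function; the choice of k, the number of intervals, is left to the user as in Step 3:

```python
def histogram(data, k):
    """Split the range of the data into k equal intervals (Steps 2-4)
    and count how many observations fall in each (Step 5)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The largest value is placed in the last interval.
        i = min(int((x - lo) / width), k - 1)
        counts[i] += 1
    return lo, width, counts
```

Plotting the returned counts as a bar graph (Step 6) then gives the histogram.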
3.4. Run Chart

An Example
Figure 3.4 is a run chart showing the sales per week for a certain product. It can be seen
that the sales have increased over the past couple of months and have probably stabilized.
[Figure 3.4. Run chart of the weekly sales: sales in $ per week (0-1000) on the vertical axis against the week number on the horizontal axis.]
A trend shows an overall movement in one direction which may be masked by the ups and
downs of individual points. For example, daily sales figures may go up and down while the
general sales slowly increase over the longer term.
A spike is a short-term change which may be caused by some unusual event, such as a
weak batch of yeast causing a large number of rejections in a bakery.
3.5. Control Chart

Statistical process control techniques originated in the early 1920s, when Professor Shewhart,
in his famous book, Shewhart (1931), presented control charts for monitoring process
characteristics. The basic idea is that processes are always subject to random variation,
because of the variation in the quality of incoming material, the random changes of
environmental factors such as temperature, humidity and vibration, and the differences in
handling of the product by different people.
[Figure 3.5. An individuals control chart of % defectives for 44 observations, with 3-sigma control limits UCL = 2.641 and LCL = 0.345.]
Although there are many types of control charts and there are also different ways to construct
them, the common steps in setting up a control chart in practice can be summarized as
follows.
Step 1: Collect data on the process characteristic to be monitored
Step 2: Calculate the process mean and use it as the centre line (CL)
Step 3: Estimate the process standard deviation
Step 4: Calculate the upper control limit (UCL) and the lower control limit (LCL) as
the mean plus and minus three standard deviations, respectively
Step 5: Plot the process characteristics on the chart and connect the consecutive points
Step 6: If there are points that fall outside the limits, check the reason
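The limit calculation in these steps can be sketched as follows. Note that using the overall sample standard deviation as the sigma estimate is a simplification assumed here for illustration; in practice the estimate often comes from subgroup ranges or moving ranges, depending on the chart type.

```python
from statistics import mean, stdev

def control_limits(observations):
    """Centre line at the process mean, limits at the mean plus and
    minus three standard deviations."""
    cl = mean(observations)
    sigma = stdev(observations)   # simple sigma estimate (assumption)
    return cl - 3 * sigma, cl, cl + 3 * sigma

def out_of_control(observations):
    """List the points falling outside the control limits."""
    lcl, _, ucl = control_limits(observations)
    return [x for x in observations if x < lcl or x > ucl]
```

A point returned by `out_of_control` is a signal that an assignable cause may be present and should be investigated.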
3.6. Scatter Diagram

When investigating problems, typically when searching for their causes, it may be
suspected that two items are related in some way. The scatter diagram helps to identify the
existence of a measurable relationship between two such items by measuring them in pairs
and plotting them in a graph.
A scatter diagram can be used to show the type and degree of any causal relationship
between two factors, especially when it is suspected that the variation of the two items is
connected in some way, so as to show any actual correlation between the two.
The following is a set of data collected for a type of soft drink. The percentage of
defectives after a certain storage time is measured. The aim is to study the relationship
between the storage time and the percentage.
Storage time:   291  232  495  437  503  488  516  598  485  171
% defectives:  0.09 0.06 0.09 0.12 0.08 0.08 0.09 0.13 0.14 0.05

Storage time:   100  418  377  230  294  403  342  378  380  339
% defectives:  0.03 0.06 0.08 0.05 0.08 0.03 0.12 0.10 0.07 0.10

Storage time:   138  431  388  370  284  171  302  413  140  141
% defectives:  0.06 0.11 0.08 0.04 0.05 0.03 0.09 0.06 0.05 0.05
It is very difficult to draw any conclusion from the tabulated values. However, a scatter
plot can be made and it clearly indicates the strong relationship between the storage time
and the percentage of defectives.
[Figure 3.6. Scatter plot of the data: % defectives (0.00-0.15) on the vertical axis against storage time (0-800) on the horizontal axis.]
If two variables show a relationship, they are said to be correlated. A correlation exists
between them. This correlation may be linear or non-linear.
If the scatter plot shows an increasing trend, the variables are positively correlated. On the
other hand, if it shows a decreasing trend, they are negatively correlated.
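For the soft-drink data above, the degree of linear correlation can be quantified with the Pearson correlation coefficient; a minimal sketch:

```python
from math import sqrt

def correlation(x, y):
    """Pearson correlation: +1 for a perfect increasing linear relationship,
    -1 for a perfect decreasing one, near 0 when there is none."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

storage_time = [291, 232, 495, 437, 503, 488, 516, 598, 485, 171,
                100, 418, 377, 230, 294, 403, 342, 378, 380, 339,
                138, 431, 388, 370, 284, 171, 302, 413, 140, 141]
pct_defective = [0.09, 0.06, 0.09, 0.12, 0.08, 0.08, 0.09, 0.13, 0.14, 0.05,
                 0.03, 0.06, 0.08, 0.05, 0.08, 0.03, 0.12, 0.10, 0.07, 0.10,
                 0.06, 0.11, 0.08, 0.04, 0.05, 0.03, 0.09, 0.06, 0.05, 0.05]
print(correlation(storage_time, pct_defective))   # positive, consistent with the plot
```

A positive coefficient here quantifies what the scatter plot shows visually: longer storage times tend to go with higher percentages of defectives.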
3.7. Cause-Effect Diagram

A cause-effect diagram, also known as an Ishikawa or fishbone diagram, organizes the
potential causes of a quality problem into major categories, such as man, machine, material
and environment, so that the root cause can be traced systematically. An example of a
cause-effect diagram is given below.
[Figure 3.7. Cause-effect diagram with major cause categories Man (low skill, lack of attention), Machine (old machine, different system), Material, and Environment (temperature).]
Step 4: Specify the major potential cause categories and join them as boxes connected
to the centre line;
Step 7: Rank order the causes to identify those that seem most likely to impact the
problem;
The cause-effect diagram is a very powerful tool for problem analysis. A highly detailed
cause-effect diagram can serve as an effective trouble-shooting aid. It has nearly unlimited
application in various industries.
Modern quality control techniques are all based on sound statistical principles. In this
chapter, we summarise some basic concepts related to statistical distributions which are
essential for the understanding of statistical quality control theory. In practice, the lack of
a clear understanding of these principles may lead to certain costly mistakes that otherwise
could be avoided.
4.1. Frequency Distribution

Products produced can never be identical, even if they are similar. We always have so-called
variation, that is, the difference between similar products. In order to describe the
variability of the product characteristic, we need quantitative measures. The frequency
function is the most common type of description. A frequency distribution is an
arrangement of the data by magnitude, and it measures how often a certain value occurs.
Table 4.1. A set of 50 measurement values.

3.0 3.1 3.0 2.9 3.3 3.2 2.9 2.8 3.0 3.2
3.1 2.7 3.0 2.8 2.9 3.2 3.1 3.0 2.8 3.0
2.7 3.2 3.1 3.4 2.8 2.9 2.8 3.0 3.3 2.9
2.9 3.3 3.0 2.8 2.9 3.2 3.1 3.0 2.9 3.2
3.1 2.9 3.2 3.3 2.8 3.0 3.2 3.1 2.9 3.0
Based on the data set in Table 4.1, it is difficult to draw any conclusion. We can, however,
obtain the frequency of the occurrence of the data and obtain a set of grouped data as in
Table 4.2. It is clear that the measurement values seem to be centred around 3.0.
To get a clearer picture, we can construct a frequency histogram as shown in Figure 4.1.
[Figure 4.1. Frequency histogram of the data in Table 4.1: frequency (0-11) on the vertical axis against the measurement value (2.40-3.30) on the horizontal axis.]
Sometimes it is more useful to have the so-called cumulative frequency which is the
accumulated values at each point. Another type of frequency histogram which is common
is the relative frequency and we also have the corresponding relative cumulative
frequency histogram. They show the relative values rather than the actual number of
counts. The shape, however, remains the same.
[Figure 4.2. Cumulative frequency histogram of the same data, rising to the total count of 50 observations.]
A frequency histogram presents a visual display of the data in which one may easily see
the shape, location and spread. This is very useful in selecting the appropriate distribution
for statistical analysis. However, to provide better summary information and to compare
different data sets, we need numerical values for the average and the spread. These will be
discussed in the following.
4.2. Mean/Mode/Median
Normally, the mean is used to describe the location of the distribution of a population or a
sample. The mean is the arithmetic average of the data. It is calculated by summing the
data values and dividing this total by the number of data values. That is,

$$\bar{X} = \sum_{i=1}^{n} X_i / n$$

where $X_i$ is the data value for data point $i$ and $n$ is the number of data points.
Usually, the Greek symbol μ is used to represent the mean value of a population. The
value of μ is rarely known because of the difficulty in measuring all parts of a population.
In most process industries, it is impossible to measure all of the output from a process to
calculate μ.
For grouped data, the mean can be calculated as

$$\bar{X} = \sum_{i=1}^{k} n_i X_i / n$$

where $X_i$ is the midpoint of cell $i$ and $n_i$ is the frequency for that cell. The number
of cells is $k$ and $n$ is the sum of the frequencies.
For the data set in Table 4.1, the sum is 150.9 and the number of points is 50, so the mean
is 150.9/50 = 3.02. This can be compared with the mean calculated using the grouped data,
which is also 3.02, as the cells' midpoints coincide with the actual measurement values.
In most cases, however, when data are grouped, some information is lost because the
measurement data do not always fall on the cell's midpoint. In fact, we do not know
the actual values for grouped data.
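The two calculations can be checked with a short Python fragment; the frequencies below are tallied from the 50 values in Table 4.1:

```python
# Frequencies tallied from the 50 values in Table 4.1.
freq = {2.7: 2, 2.8: 7, 2.9: 10, 3.0: 11, 3.1: 7, 3.2: 8, 3.3: 4, 3.4: 1}

# Mean from the raw data: sum X_i / n.
raw = [x for x, f in freq.items() for _ in range(f)]
mean_raw = sum(raw) / len(raw)

# Mean from the grouped data: sum n_i X_i / n.
n = sum(freq.values())
mean_grouped = sum(x * f for x, f in freq.items()) / n

print(round(mean_raw, 2), round(mean_grouped, 2))   # 3.02 3.02
```

Both forms give 3.02 here precisely because each cell's midpoint equals the recorded value; with wider cells the grouped mean would only approximate the raw mean.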
The median, which is defined as the middle value in a set of numbers when that data set is
arranged from the lowest to the highest value, is also used in describing the location of a
distribution. The median is not as common as the mean, and it is not as good a measure of
location as the mean. However, because it is easy to compute, especially when the
measurement values are already ranked, some companies still prefer it as a
measure of the location of the population.
For the data set in Table 4.1, the median is equal to 3.0: if we order the values from the
smallest to the largest, the middle points have the value 3.0. In fact, the median can be
used for grouped data as well.
The mode is simply the value that appears most frequently. Sometimes there is no mode if
no value appears more than once. The mode can be used for grouped data and when a
large number of data points are available. The mode is most suitable to describe the
location of a severely skewed distribution. It should not be used for small samples. The
mode for the data set in Table 4.1 is equal to 3. This is easily seen from the frequency
histogram.
For a symmetric distribution, the mode gives a similar value to the mean and the median. For
other distributions, they can be different, although the mean is usually preferred.
[Figure 4.3 shows a right-skewed frequency curve on which the mode lies at the peak, with the median and then the mean further to the right.]
Figure 4.3. An example of the difference between the mean, median and mode.
Although the mean seems to be the most reasonable measure for the central value of a
distribution, a problem associated with it is that it is not as robust as the others. For
example, for the values 1, 2, 3, 4, 5, 6, 7, 8, both the median and the mean are 4.5.
However, suppose that, because of an error in typing, the data used are 1, 2, 3, 4, 5, 6, 7, 80.
In this case, the mean changes to 13.5 while the median remains 4.5.
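This robustness argument is easy to verify directly:

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5, 6, 7, 8]
typo  = [1, 2, 3, 4, 5, 6, 7, 80]   # the 8 mistyped as 80

print(mean(clean), median(clean))   # 4.5 4.5
print(mean(typo),  median(typo))    # 13.5 4.5: the mean moves, the median does not
```

A single gross error drags the mean far from the bulk of the data, while the median, which depends only on the ordering, is unaffected.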
4.3. Range/Standard Deviation

The range is the difference between the largest and the smallest values in a data set. The
common notation for the range is R, and it describes only the absolute spread of the
data. It tells nothing about how much the data values vary from the mean.
For the data set in Table 4.1, the maximum value is 3.4 and the minimum value is 2.7, so
the range is 3.4 - 2.7 = 0.7. This is the absolute spread of the data, and it is useful, for
example, when constructing the histogram.
The standard deviation is a more useful measure of variation because it is based on the
differences of the data values from their mean. The symbol σ is usually used to denote the
standard deviation of a population, and s is usually used to denote the standard deviation
of a sample. The most common formula for the calculation of the standard deviation is

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$$
It should be noted that the denominator is (n-1) instead of n, as is the case in the calculation
of the mean. This can be explained by the fact that when n = 1, it is more reasonable to say
that we cannot compute the standard deviation rather than saying that it is zero.
For the data set in Table 4.1, the standard deviation is equal to 0.17.
It can be noted that the formula for the calculation of the standard deviation can be
rewritten as

$$s = \sqrt{\frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}}$$

and, for grouped data,

$$s = \sqrt{\frac{n\sum_{i=1}^{k} f_i X_i^2 - \left(\sum_{i=1}^{k} f_i X_i\right)^2}{n(n-1)}}$$
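As a check, the following sketch computes s for the Table 4.1 data (frequencies tallied from the table) with both the definitional formula and the rewritten computational form:

```python
from math import sqrt

freq = {2.7: 2, 2.8: 7, 2.9: 10, 3.0: 11, 3.1: 7, 3.2: 8, 3.3: 4, 3.4: 1}
data = [x for x, f in freq.items() for _ in range(f)]
n = len(data)

# Definitional form: squared deviations from the mean, divided by n - 1.
xbar = sum(data) / n
s_def = sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

# Rewritten form: (n * sum X_i^2 - (sum X_i)^2) / (n(n - 1)).
s_alt = sqrt((n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1)))

print(round(s_def, 2))   # 0.17
```

The rewritten form needs only running totals of the values and their squares, which is why it was popular for hand and calculator computation.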
Some other measures related to a statistical distribution are the skewness and the kurtosis,
which are defined as

$$\text{Skewness} = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^3 / n}{s^3}$$

and

$$\text{Kurtosis} = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^4 / n}{s^4}$$
respectively. The skewness is a measure of the lack of symmetry of the data while kurtosis
indicates the peakedness of the data. They are commonly used for grouped data and for
the assessment of the shape of histogram.
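A sketch of both measures for grouped data, reusing the frequencies tallied from Table 4.1 (a perfectly symmetric data set would give a skewness near zero):

```python
from math import sqrt

freq = {2.7: 2, 2.8: 7, 2.9: 10, 3.0: 11, 3.1: 7, 3.2: 8, 3.3: 4, 3.4: 1}
n = sum(freq.values())

xbar = sum(x * f for x, f in freq.items()) / n
s = sqrt(sum(f * (x - xbar) ** 2 for x, f in freq.items()) / (n - 1))

# Third and fourth standardized moments, as in the definitions above.
skewness = sum(f * (x - xbar) ** 3 for x, f in freq.items()) / n / s ** 3
kurtosis = sum(f * (x - xbar) ** 4 for x, f in freq.items()) / n / s ** 4

print(round(skewness, 2), round(kurtosis, 2))
```

The small positive skewness found here matches the mild right tail visible in the histogram of Figure 4.1.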
4.4. Normal Distribution

The normal distribution is the most widely used statistical distribution for the description of
measurement data. The normal distribution with mean μ and standard deviation σ, N(μ, σ),
has frequency distribution function

$$f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(X-\mu)^2}{2\sigma^2}\right)$$

A typical frequency distribution curve for the normal distribution is shown in Figure 4.4.
The corresponding distribution function is

$$F(t) = \Phi\left(\frac{t-\mu}{\sigma}\right),$$

where Φ(t) is the standard normal distribution function. The standard normal distribution
has a mean of zero and a standard deviation of 1. Through the transformation

$$Z = \frac{X-\mu}{\sigma}$$

any normal random variable can be converted into a standard normal one.
In the statistical analysis of data, the normal distribution is very useful because of its
statistical properties and the existing results that can be found in most standard statistical
texts. Statistical process control techniques, especially those using control chart, are
mostly based on normal distribution.
If a data set is from the normal distribution, we can plot the data on so-called
Normal Probability paper. We first estimate the empirical distribution function. Then
it is plotted on the vertical axis, while the data are plotted on the horizontal axis of the
Normal Probability paper. The plotted points are then fitted by a straight line. The mean is
estimated as the point at which the fitted line cuts 0.5 on the vertical axis. The standard
deviation of the normal distribution can then be estimated from the slope of the fitted line.
Given the values of mean and standard deviation, which are the parameters in the normal
distribution, we can calculate the probability that a measurement value will fall into any
certain interval. The most common application of normal distribution is to determine the
probability that a value will be less than a prescribed value x. After the transformation to
standard normal distribution, statistical tables can be used to obtain this probability.
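Instead of a printed table, the standard normal distribution function can be evaluated through the error function in Python's math module; a sketch, where the example values 3.02 and 0.17 are the mean and standard deviation estimated earlier from Table 4.1:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma): transform to standard normal,
    then evaluate Phi via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

# Probability that a measurement falls below 3.2:
print(normal_cdf(3.2, 3.02, 0.17))
```

The identity Phi(z) = (1 + erf(z / sqrt(2))) / 2 is a standard one, so the function reproduces the tabulated values to full floating-point precision.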
The normal distribution has many useful properties and that is why it is commonly used in
statistical data analysis. One of these is related to the sum of normally distributed random
variables. If X1, X2, ..., Xn are normally and independently distributed random variables,
then Y=X1+X2+...+Xn is also normally distributed.
4.5. Binomial Distribution

The most useful discrete probability distribution is the binomial distribution. Consider a
process that consists of a sequence of n independent trials, where the outcome of each trial
is either a success or a failure. If the probability of success on any trial is p, then the
number of successes in n trials follows the binomial distribution.
If a random variable Z follows the binomial distribution, the probability that Z = d is given as

$$P(Z = d) = \frac{n!}{d!(n-d)!}\, p^d (1-p)^{n-d}$$

The binomial distribution has been widely used in quality control. It is the appropriate model
for sampling from a large population. Tables for the binomial distribution are available in
many statistical textbooks and statistical tables.
The parameters of the binomial distribution are n and p. The mean and variance of the
binomial distribution are
Mean = np
and
Variance = np(1-p).
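The probability formula and these moments can be checked directly; math.comb gives the binomial coefficient n!/(d!(n-d)!):

```python
from math import comb

def binomial_pmf(d, n, p):
    """P(Z = d) = n! / (d!(n-d)!) * p^d * (1-p)^(n-d)."""
    return comb(n, d) * p ** d * (1 - p) ** (n - d)

# e.g. the chance of exactly 2 defectives in a sample of 10 with p = 0.1:
print(round(binomial_pmf(2, 10, 0.1), 4))   # 0.1937

# The probabilities sum to one and the mean equals np:
total = sum(binomial_pmf(d, 10, 0.1) for d in range(11))
mean = sum(d * binomial_pmf(d, 10, 0.1) for d in range(11))
print(round(total, 6), round(mean, 6))   # 1.0 1.0
```

The sample size 10 and defective rate 0.1 are hypothetical values chosen for illustration.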
[Figure 4.5. Frequency distribution for simulated binomial variables: counts on the vertical axis against the values 0-9 on the horizontal axis.]
4.6. Poisson Distribution

If a random variable Z follows the Poisson distribution with parameter λ, the probability
that Z = c is given as

$$P(Z = c) = \frac{\lambda^c e^{-\lambda}}{c!}$$
The Poisson distribution has also been commonly used in quality control. A typical application
of the Poisson distribution is as a model of the number of defects or nonconformities that
occur in a unit of product. In fact, any random phenomenon that occurs on a per-unit basis
is often well approximated by the Poisson distribution.
Figure 4.6. Frequency distribution for 1000 simulated Poisson (λ = 4) variables.
Tables for the Poisson distribution are available in many statistical textbooks and
statistical tables.
There is only one parameter in the Poisson distribution, and it is λ. The mean and variance
of the Poisson distribution are given by:

Mean = Variance = λ

That is, the mean and variance of the Poisson distribution are both equal to the parameter λ.
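This property, and the probability formula itself, can be checked numerically; the truncation of the sums at 100 terms is an assumption that is harmless for λ = 4, where essentially all probability mass lies far below that point:

```python
from math import exp, factorial

def poisson_pmf(c, lam):
    """P(Z = c) = lam^c * exp(-lam) / c!"""
    return lam ** c * exp(-lam) / factorial(c)

# With lam = 4, as in Figure 4.6:
print(round(poisson_pmf(4, 4), 4))   # 0.1954

# Mean and variance both come out equal to lam:
mean = sum(c * poisson_pmf(c, 4) for c in range(100))
var = sum((c - mean) ** 2 * poisson_pmf(c, 4) for c in range(100))
print(round(mean, 6), round(var, 6))   # 4.0 4.0
```

The peak of the simulated histogram in Figure 4.6 near c = 4 is consistent with this probability mass function.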
If a random sample of n items is drawn without replacement from a finite population, the
number Y of items in the sample that fall into the class of interest follows the
hypergeometric distribution:

$$P(Y = d) = \frac{\binom{D}{d}\binom{N-D}{n-d}}{\binom{N}{n}}$$

where N is the population size and D is the number of the items in the whole population
that fall into the class of interest. Also,

$$\binom{a}{b} = \frac{a!}{b!(a-b)!}$$
For the binomial distribution itself, if the number of trials is large, then we may use the
central limit theorem to justify the normal distribution with mean np and variance np(1-p)
as an approximation of the binomial. The normal approximation to the binomial distribution
is good for p close to 0.5 and n > 10. For other values of p, larger values of n are needed.