Proc Capability - Sas User Guide
SAS/QC® 15.3 User’s Guide
The CAPABILITY Procedure
SAS Documentation
January 31, 2023
This document is an individual chapter from SAS/QC® 15.3 User’s Guide.
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2023. SAS/QC® 15.3 User’s Guide. Cary, NC:
SAS Institute Inc.
SAS/QC® 15.3 User’s Guide
Copyright © 2023, SAS Institute Inc., Cary, NC, USA
All Rights Reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by
any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute
Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time
you acquire this publication.
The scanning, uploading, and distribution of this book via the internet or any other means without the permission of the publisher is
illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic
piracy of copyrighted materials. Your support of others’ rights is appreciated.
January 2023
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the
USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
SAS software may be provided with certain third-party software, including but not limited to open source software, which is licensed
under its applicable third-party software license agreement. For license information about third-party software distributed with SAS
software, refer to Third-Party Software Reference | SAS Support.
Chapter 6
The CAPABILITY Procedure
Contents
Introduction: CAPABILITY Procedure  203
   Learning about the CAPABILITY Procedure  205
PROC CAPABILITY and General Statements  205
   Overview: CAPABILITY Procedure  205
   Getting Started: CAPABILITY Procedure  207
      Computing Descriptive Statistics  207
      Computing Capability Indices  209
   Syntax: CAPABILITY Procedure  211
      PROC CAPABILITY Statement  212
      BY Statement  222
      CLASS Statement  223
      FREQ Statement  225
      ID Statement  225
      SPEC Statement  225
      VAR Statement  229
      WEIGHT Statement  229
      Graphical Enhancement Statements  230
   Details: CAPABILITY Procedure  230
      Input Data Sets  230
      Output Data Set  232
      Descriptive Statistics  235
      Signed Rank Statistic  237
      Tests for Normality  238
      Percentile Computations  240
      Robust Estimators  243
      Computing the Mode  245
      Assumptions and Terminology for Capability Indices  246
      Standard Capability Indices  246
      Specialized Capability Indices  250
      Missing Values  257
      ODS Tables  257
   Examples: CAPABILITY Procedure  260
      Example 6.1: Reading Specification Limits  260
      Example 6.2: Enhancing Reference Lines  262
      Example 6.3: Displaying a Confidence Interval for Cpk  264
CDFPLOT Statement: CAPABILITY Procedure  266
200 F Chapter 6: The CAPABILITY Procedure
• descriptive statistics based on moments, including skewness and kurtosis. Other descriptive information provided includes quantiles or percentiles (such as the median), frequency tables, and details on extreme values.
• histograms. Optionally, these can be superimposed with specification limits, fitted probability density curves for various distributions, and kernel density estimates.
• cumulative distribution function plots (cdf plots). Optionally, these can be superimposed with specification limits and probability distribution curves for various distributions.
• quantile-quantile plots (Q-Q plots), probability plots, and probability-probability plots (P-P plots). These plots facilitate the comparison of a data distribution with various theoretical distributions. Optionally, Q-Q plots and probability plots can be superimposed with specification limits.
• comparative histograms, cdf plots, Q-Q plots, probability plots, and P-P plots. These are composite graphs composed of plots that correspond to the different levels of specified CLASS variables.
• goodness-of-fit tests for a variety of distributions, including the normal. The assumption of normality is critical to the interpretation of capability indices.
• statistical intervals (prediction, tolerance, and confidence intervals) for a normal population.
• the ability to produce plots as traditional graphics, ODS Graphics output, or legacy line printer plots. Traditional graphics can be saved, replayed, and annotated.
• the ability to inset summary statistics and capability indices in graphical output.
• the ability to create output data sets containing summary statistics, capability indices, histogram intervals, parameters of fitted curves, and statistical intervals.
You can use the PROC CAPABILITY statement, together with the VAR and SPEC statements, to compute
summary statistics and process capability indices. See “Getting Started: CAPABILITY Procedure” on
page 207 for introductory examples. In addition, you can use the statements summarized in Table 6.1 to
request plots and specialized analyses:
Statement        Result
CDFPLOT          Cumulative distribution function plot
COMPHISTOGRAM    Comparative histogram
HISTOGRAM        Histogram
INSET            Inset table on plot
INTERVALS        Statistical intervals
OUTPUT           Output data set with summary statistics and capability indices
PPPLOT           Probability-probability plot
PROBPLOT         Probability plot
QQPLOT           Quantile-quantile plot
You have three alternatives for producing plots with the CAPABILITY procedure:
• ODS Graphics output is produced if ODS Graphics is enabled, for example by specifying the ODS GRAPHICS ON statement prior to the PROC statement.
• Otherwise, traditional graphics are produced by default if SAS/GRAPH is licensed.
• Legacy line printer charts are produced when you specify the LINEPRINTER option in the PROC statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
You can use the INSET statement with any of the plot statements to enhance the plot with an inset table of
summary statistics. The INSET statement is not applicable when you produce line printer plots.
Learning about the CAPABILITY Procedure
To broaden your knowledge of the procedure, read “PROC CAPABILITY and General Statements” on
page 205, which summarizes the syntax for the entire procedure and describes the PROC CAPABILITY
statement, the VAR statement, the CLASS statement, and the SPEC statement. Subsequent chapters describe
the statements listed in Table 6.1. In addition to introductory examples, each chapter provides syntax
summaries, descriptions of options, computational details, and advanced examples. Although the chapters
are self-contained, much of what you learn about one plot statement, including the syntax, is transferable to
other plot statements.
• The PROC CAPABILITY statement is required to invoke the CAPABILITY procedure. You can use this statement by itself to compute summary statistics.
• The VAR statement, which is optional, specifies the variables in the input data set that are to be analyzed. These are called the analysis or process variables. By default, all of the numeric variables are analyzed.
• The CLASS statement, which is optional, specifies one or two variables that group the data into classification levels. A separate analysis is carried out for each combination of levels, and you can use the CLASS statement with plot statements (such as HISTOGRAM) to create comparative displays.
• The SPEC statement, which is optional, provides specification limits for the variables that are to be analyzed. When you use a SPEC statement, the procedure computes process capability indices in addition to summary statistics. Furthermore, the specification limits are displayed in plots created with plot statements that are described in subsequent chapters.
You can use the PROC CAPABILITY statement to request a variety of statistics for summarizing the data
distribution of each analysis variable:
• sample moments
• missing values
You can use the PROC CAPABILITY and SPEC statements together to request a variety of statistics for
process capability analysis:
overall analysis.
• request legacy line printer plots and define special printing characters used for features
• suppress tables
• control the appearance of the areas under a histogram outside the specification limits
data Cans;
label Weight = "Fluid Weight (ounces)";
input Weight @@;
datalines;
12.07 12.02 12.00 12.01 11.98 11.96 12.04 12.05 12.01 11.97
12.03 12.03 12.00 12.04 11.96 12.02 12.06 12.00 12.02 11.91
12.05 11.98 11.91 12.01 12.06 12.02 12.05 11.90 12.07 11.98
12.02 12.11 12.00 11.99 11.95 11.98 12.05 12.00 12.10 12.04
12.06 12.04 11.99 12.06 11.99 12.07 11.96 11.97 12.00 11.97
12.09 11.99 11.95 11.99 11.99 11.96 11.94 12.03 12.09 12.03
11.99 12.00 12.05 12.04 12.05 12.01 11.97 11.93 12.00 11.97
12.13 12.07 12.00 11.96 11.99 11.97 12.05 11.94 11.99 12.02
11.95 11.99 11.91 12.06 12.03 12.06 12.05 12.04 12.03 11.98
12.05 12.05 12.11 11.96 12.00 11.96 11.96 12.00 12.01 11.98
;
You can use the PROC CAPABILITY and VAR statements to compute summary statistics for the weights.
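A minimal sketch of these statements, assuming the Cans data set created by the preceding DATA step. The PROC CAPABILITY statement computes the default descriptive statistics, and the VAR statement restricts the analysis to Weight.

```sas
/* Default descriptive statistics for the fluid weights */
proc capability data=Cans;
   var Weight;
run;
```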
Moments
N                        100   Sum Weights                100
Mean                 12.0093   Sum Observations       1200.93
Std Deviation     0.04695269   Variance            0.00220456
Skewness          0.05928405   Kurtosis            -0.1717404
Uncorrected SS    14422.5469   Corrected SS          0.218251
Coeff Variation   0.39096946   Std Error Mean      0.00469527

Quantiles (Definition 5)
Level          Quantile
100% Max         12.130
99%              12.120
95%              12.090
90%              12.065
75% Q3           12.050
50% Median       12.000
25% Q1           11.980
10%              11.955
5%               11.935
1%               11.905
0% Min           11.900

Extreme Observations
-----Lowest-----    -----Highest-----
Value      Obs      Value      Obs
11.90       28      12.09       59
11.91       83      12.10       39
11.91       23      12.11       32
11.91       20      12.11       93
11.93       68      12.13       71

Specification Limits
               Limit                     Percent
Lower (LSL)    11.95000   % < LSL        7.00000
Target         12.00000   % Between     77.00000
Upper (USL)    12.05000   % > USL       16.00000

Frequency Counts
                     Percents
Value    Count     Cell      Cum
11.90        1      1.0      1.0
11.91        3      3.0      4.0
11.93        1      1.0      5.0
11.94        2      2.0      7.0
11.95        3      3.0     10.0
11.96        8      8.0     18.0
11.97        6      6.0     24.0
11.98        6      6.0     30.0
11.99       10     10.0     40.0
12.00       11     11.0     51.0
12.01        5      5.0     56.0
12.02        6      6.0     62.0
12.03        6      6.0     68.0
12.04        6      6.0     74.0
12.05       10     10.0     84.0
12.06        6      6.0     90.0
12.07        4      4.0     94.0
12.09        2      2.0     96.0
12.10        1      1.0     97.0
12.11        2      2.0     99.0
12.13        1      1.0    100.0
In Figure 6.2, the table labeled Specification Limits lists the specification limits and target value, together with
the percents of observations outside and between the limits. The table labeled Process Capability Indices
lists estimates for the standard process capability indices Cp, CPL, CPU, Cpk, and Cpm, along with 95%
confidence limits. The index Cpm is not computed unless you specify a TARGET= value. See “Standard
Capability Indices” on page 246 for formulas used to compute the indices.
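As a rough cross-check, the following DATA step applies the usual textbook point-estimate formulas to the mean (12.0093) and standard deviation (0.04695269) from the Moments table. It uses LSL=11.95, TARGET=12, and USL=12.05. This is a sketch under those assumptions, and the variable names are illustrative. The procedure's exact estimators, including the effect of VARDEF=, are those defined in “Standard Capability Indices.”

```sas
data _null_;
   /* Sample estimates from the Moments table; limits from the SPEC statement */
   mean   = 12.0093;
   std    = 0.04695269;
   lsl    = 11.95;
   target = 12;
   usl    = 12.05;

   cp  = (usl - lsl) / (6*std);    /* spread of the limits vs. 6 sigma        */
   cpl = (mean - lsl) / (3*std);   /* one-sided capability relative to LSL    */
   cpu = (usl - mean) / (3*std);   /* one-sided capability relative to USL    */
   cpk = min(cpl, cpu);            /* the worse of the two one-sided indices  */
   /* Cpm also penalizes the distance between the mean and the target */
   cpm = (usl - lsl) / (6*sqrt(std**2 + (mean - target)**2));

   put cp= 8.4 cpl= 8.4 cpu= 8.4 cpk= 8.4 cpm= 8.4;
run;
```

With these inputs, Cp is roughly 0.355 and Cpk is roughly 0.289. Cpk is the smaller value because the mean lies closer to the USL than to the LSL.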
If you specify more than one variable in the VAR statement, you can provide corresponding specification
limits and target values by specifying lists of values for the LSL=, USL=, and TARGET= options. As an
alternative to the SPEC statement, you can read specification limits and target values from a data set specified
with the SPEC= option in the PROC CAPABILITY statement. This is illustrated in Example 6.1.
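A minimal sketch of the SPEC= alternative. It assumes the conventional SPEC= data set variables _VAR_, _LSL_, _TARGET_, and _USL_; see “SPEC= Data Set” on page 230 for the authoritative list. The data set name Limits is hypothetical.

```sas
/* Hypothetical SPEC= data set: one observation per analysis variable */
data Limits;
   length _VAR_ $ 8;
   _VAR_    = 'Weight';   /* name of the analysis variable */
   _LSL_    = 11.95;
   _TARGET_ = 12;
   _USL_    = 12.05;
run;

proc capability data=Cans spec=Limits;
   var Weight;
run;
```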
The FREQ option in the PROC CAPABILITY statement requests the table labeled Frequency Counts in
Figure 6.2.
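Statements that request tables like those in Figure 6.2 might look like the following sketch. The limits match the Specification Limits table, and FREQ adds the Frequency Counts table. Treat it as an illustration, not as the exact program used to produce the figure.

```sas
proc capability data=Cans freq;
   spec lsl=11.95 target=12 usl=12.05;   /* limits and target for Weight */
   var Weight;
run;
```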
The following section lists all options. See the section “Dictionary of Options” on page 214 for detailed
information.
Summary of Options
Table 6.2 lists all the PROC CAPABILITY options by function.
Option            Description
Input Data Set Options
  ANNOTATE=       Specifies input data set containing annotation information
  DATA=           Specifies input data set
  EXCLNPWGT       Specifies that non-positive weights are to be excluded
  NOBYSPECS       Specifies that specification limits in SPEC= data set are to be applied to all BY groups
  SPEC=           Specifies input data set with specification limits
Computational Options
  FORCEQN         Forces calculation of the robust estimator of scale Qn
  FORCESN         Forces calculation of the robust estimator of scale Sn
  PCTLDEF=        Specifies definition used to calculate percentiles
  ROUND=          Specifies units used to round variable values
  VARDEF=         Specifies divisor used to calculate variances and standard deviations
Output Options
  NOPRINT         Suppresses printed output
  OUTTABLE=       Creates an output data set containing univariate statistics and capability indices in tabular form
Hypothesis Testing Options
  MU0=            Specifies mean for null hypothesis in tests for location
  LOCCOUNT        Requests table of counts used in sign test and signed rank test
  NORMALTEST      Performs tests for normality
TRIMMED-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
WINSORIZED-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
Capability Index Options
  CPMA=           (obsolete) Specifies a for Cpm(a)
  CHECKINDICES    Requests test of normality in conjunction with standard indices
  SPECIALINDICES  Requests table of specialized indices including Boyles’ Cpm, Cjkp, Cpmk, Cpm(a), and Wright’s Cs
CHECKINDICES-Options
  ALPHA=          Specifies cutoff probability for p-values for test for normality used in conjunction with process capability indices
  TEST=           Specifies test for normality (Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling, Cramér–von Mises, or no test)
CIBASIC-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
CIINDICES-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
CIPCTLDF-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
CIPCTLNORMAL-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
CIPROBEX-Options
  ALPHA=          Specifies confidence level
  TYPE=           Specifies type of confidence limit
Dictionary of Options
The following entries provide detailed descriptions of the options in the PROC CAPABILITY statement.
ALL
requests all of the tables generated by the FREQ, MODES, NEXTRVAL=5, CIBASIC, CIPCTLDF,
and CIPCTLNORMAL options. If a WEIGHT statement is not used, the ALL option also requests
the tables generated by the LOCCOUNT, NORMALTEST, ROBUSTSCALE, TRIMMED=.25, and
WINSORIZED=.25 options. PROC CAPABILITY uses any values that you specify with the ALPHA=,
MU0=, NEXTRVAL=, CIBASIC, CIPCTLDF, CIPCTLNORMAL, TRIMMED=, or WINSORIZED=
options in conjunction with the ALL option.
ALPHA=value
specifies the default confidence level for all confidence limits computed by the CAPABILITY procedure.
The coverage percent for the confidence limits is 100(1 - value). For example, ALPHA=0.10 results
in 90% confidence limits. The default value is 0.05.
Note that specialized ALPHA= options are available for a number of confidence interval options. For
example, you can specify CIBASIC(ALPHA=0.10) to request a table of Basic Confidence Limits at
the 90% level. These specialized ALPHA= options default to the value of the general ALPHA= option.
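A sketch combining the general and a specialized ALPHA= option, assuming the Cans data. The general option sets 90% limits, and CIBASIC(ALPHA=0.01) overrides it for the basic confidence limits only.

```sas
/* 90% limits by default; basic confidence limits at the 99% level */
proc capability data=Cans alpha=0.10 cibasic(alpha=0.01);
   spec lsl=11.95 target=12 usl=12.05;
   var Weight;
run;
```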
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
specifies an input data set containing annotate variables as described in SAS/GRAPH documentation.
You can use this data set to add features to traditional graphics. Use this data set only when creating
traditional graphics; it is ignored when the LINEPRINTER option is specified and when ODS Graphics
is in effect. Features provided in this data set are added to every plot produced in the current run of the
procedure.
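CHECKINDICES < (options) >
requests a test of normality in conjunction with the standard process capability indices. You can specify the following options in parentheses after CHECKINDICES:
ALPHA=value
specifies the cutoff probability for p-values for a test for normality used in conjunction with
process capability indices. The value must be between zero and 0.5. The default value is 0.05.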
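CIBASIC < (options) >
requests a table of Basic Confidence Limits. You can specify the following options in parentheses after CIBASIC:
ALPHA=value
specifies the confidence level. The coverage percent for the confidence limits is 100(1 - value).
For example, ALPHA=0.10 requests 90% confidence limits. The default value is 0.05.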
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default value is TWOSIDED.
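CIINDICES < (options) >
specifies the type and level of the confidence limits for the standard process capability indices. You can specify the following options in parentheses after CIINDICES:
ALPHA=value
specifies the confidence level. The coverage percent for the confidence limits is 100(1 - value).
For example, ALPHA=0.10 requests 90% confidence limits. The default value is 0.05.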
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default value is TWOSIDED.
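CIPCTLDF < (options) >
requests distribution-free confidence limits for percentiles. You can specify the following options in parentheses after CIPCTLDF:
ALPHA=value
specifies the confidence level. The coverage percent for the confidence limits is 100(1 - value).
For example, ALPHA=0.10 requests 90% confidence limits. The default value is 0.05.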
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, SYMMETRIC, or
ASYMMETRIC. The default value is SYMMETRIC.
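CIPCTLNORMAL < (options) >
requests confidence limits for percentiles based on the assumption that the data are normally distributed. You can specify the following options in parentheses after CIPCTLNORMAL:
ALPHA=value
specifies the confidence level. The coverage percent for the confidence limits is 100(1 - value).
For example, ALPHA=0.10 requests 90% confidence limits. The default value is 0.05.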
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default value is TWOSIDED.
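CIPROBEX < (options) >
requests confidence limits for the probabilities that an observation falls below the lower specification limit or above the upper specification limit. You can specify the following options in parentheses after CIPROBEX:
ALPHA=value
specifies the confidence level. The coverage percent for the confidence limits is 100(1 - value).
For example, ALPHA=0.10 requests 90% confidence limits. The default value is 0.05.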
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default value is TWOSIDED.
CPMA=value
specifies the value of the parameter a for the capability index Cpm(a). This option has been superseded
by the SPECIALINDICES(CPMA=) option.
DATA=SAS-data-set
specifies the input data set containing the observations to be analyzed. If the DATA= option is omitted,
the procedure uses the most recently created SAS data set.
DEF=index
is an alias for the PCTLDEF= option. See the entry for the PCTLDEF= option.
EXCLNPWGT
excludes observations with non-positive weight values (zero or negative) from the analysis. By
default, PROC CAPABILITY treats observations with negative weights like those with zero weights
and counts them in the total number of observations. This option is applicable only if you specify a
WEIGHT statement.
FORCEQN
forces calculation of the robust estimate of scale Qn . Because this calculation is very computationally
intensive, by default Qn is not computed for a variable that has more than 65,526 nonmissing obser-
vations. On some hosts, Qn cannot be computed at all when there are more than 65,526 nonmissing
observations.
FORCESN
forces calculation of the robust estimate of scale Sn . Because this calculation is computationally
intensive, by default Sn is not computed for a variable that has more than 1 million nonmissing
observations.
FORMCHAR(index)='string'
defines characters used for features on legacy line printer plots, where index is a number ranging from
1 to 11, and string is a character or hexadecimal string. This option is ignored unless you specify the
LINEPRINTER option in the PROC CAPABILITY statement.
The index identifies which features are controlled with the string characters, as discussed in the table
that follows. If you specify the FORMCHAR= option omitting the index , the string controls all 11
features.
By default, the form character list specified with the SAS system option FORMCHAR= is used;
otherwise, the default is FORMCHAR='|----|+|---'. If you print to a PC screen or your device
supports the ASCII symbol set (1 or 2), the following is recommended:
formchar='B3,C4,DA,C2,BF,C3,C5,B4,C0,C1,D9'X
As an example, suppose you want to plot the data values of the empirical cumulative distribution
function with asterisks (*). You can change the appropriate character by using the following:
formchar(2)='*'
Note that the FORMCHAR= option in the PROC CAPABILITY statement enables you to temporarily
override the values of the SAS system option with the same name. The values of the SAS system
option are not altered by using the FORMCHAR= option in PROC CAPABILITY statement.
The features associated with values of index are shown in Table 6.3.
Value of
index    Description of Character         Chart Feature
1        Vertical bar                     Frame, ecdf line, HREF= lines
2        Horizontal bar                   Frame, ecdf line, VREF= lines
3        Box character (upper left)       Frame, ecdf line, histogram bars
4        Box character (upper middle)     Histogram bars, tick marks (horizontal axis)
5        Box character (upper right)      Frame, histogram bars
6        Box character (middle left)      Histogram bars
7        Box character (middle middle)    Not used
8        Box character (middle right)     Histogram bars, tick marks (vertical axis)
9        Box character (lower left)       Frame
10       Box character (lower middle)     Histogram bars
11       Box character (lower right)      Frame, ecdf line
FREQ
requests a frequency table in the printed output that contains the variable values, frequencies, percent-
ages, and cumulative percentages. See Figure 6.2 for an example.
GOUT=graphics-catalog
specifies a graphics catalog in which to save traditional graphics output. This option is ignored unless
you are producing traditional graphics.
LINEPRINTER
requests that legacy line printer plots be produced by the CDFPLOT, HISTOGRAM, PROBPLOT,
PPPLOT, and QQPLOT statements. The CLASS and COMPHISTOGRAM statements cannot be used
when the LINEPRINTER option is specified.
LOCCOUNT
requests a table with the number of observations greater than, not equal to, and less than the value of
MU0=. PROC CAPABILITY uses these values to construct the sign test and signed rank test. This
option is not available if you specify a WEIGHT statement.
MODES
MODE
requests a table of all possible modes. By default, when the data contains multiple modes, PROC
CAPABILITY displays the lowest mode in the table of basic statistical measures. When all values are
unique, PROC CAPABILITY does not produce a table of modes.
MU0=value(s)
LOCATION=value(s)
specifies the value of the mean or location parameter (μ0) in the null hypothesis for the tests summarized
in the table labeled Tests for Location: Mu0=value. If you specify a single value, PROC CAPABILITY
tests the same null hypothesis for all analysis variables. If you specify multiple values, a VAR statement
is required, and PROC CAPABILITY tests a different null hypothesis for each analysis variable by
matching the VAR variables with the values in the corresponding order. The default value is 0.
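For example, the following sketch tests whether the mean fill weight equals the 12-ounce target rather than the default value of 0, assuming the Cans data. LOCCOUNT adds the table of counts used by the sign test and signed rank test.

```sas
proc capability data=Cans mu0=12 loccount;
   var Weight;
run;
```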
NEXTROBS=n
specifies the number of extreme observations in the table labeled Extreme Observations. The table
lists the n lowest observations and the n highest observations. The default value is 5. The value of n
must be an integer between 0 and half the number of observations. You can specify NEXTROBS=0 to
suppress the table.
NEXTRVAL=n
requests the table labeled Extreme Values and specifies the number of extreme values in the table. The
table lists the n lowest unique values and the n highest unique values. The value of n must be an integer
between 0 and half the maximum number of observations. By default, n = 0 and no table is displayed.
NOBYSPECS
specifies that specification limits in SPEC= data set be applied to all BY groups. If you use a BY
statement and specify a SPEC= data set that does not contain the BY variables, you must specify the
NOBYSPECS option.
NOPRINT
suppresses the tables of descriptive statistics and capability indices that are created by the PROC CAPABILITY
statement. The NOPRINT option does not suppress the tables created by the INTERVALS
or plot statements. You can use the NOPRINT options in these statements to suppress the creation of
their tables.
NORMALTEST
NORMAL
requests a table of Tests for Normality for each of the analysis variables. The table provides test
statistics and p-values for the Shapiro-Wilk test (provided the sample size is less than or equal to 2000),
the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér–von Mises test. See “Tests
for Normality” on page 238 for details. If specification limits are provided, the NORMALTEST option
is assumed.
OUTTABLE=SAS-data-set
specifies an output data set that contains univariate statistics and capability indices arranged in tabular
form. See “OUTTABLE= Data Set” on page 232 for details.
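A minimal sketch, assuming the Cans data. The output data set name CansTable is hypothetical, and its variables are documented in “OUTTABLE= Data Set.”

```sas
/* Write univariate statistics and capability indices to a data set */
proc capability data=Cans outtable=CansTable;
   spec lsl=11.95 target=12 usl=12.05;
   var Weight;
run;

/* List the tabular results */
proc print data=CansTable;
run;
```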
PCTLDEF=index
DEF=index
specifies one of five definitions used to calculate percentiles. The value of index can be 1, 2, 3, 4, or 5.
See “Percentile Computations” on page 240 for details. By default, PCTLDEF=5.
ROBUSTSCALE
requests a table of robust measures of scale. These measures include the interquartile range, Gini’s
mean difference, the median absolute deviation about the median (MAD), and two statistics proposed
by Rousseeuw and Croux (1993), Qn and Sn. This option is not available if you specify a WEIGHT
statement.
ROUND=value-list
specifies units used to round variable values. The ROUND= option reduces the number of unique
values for each variable and hence reduces the memory required for temporary storage. Values must be
greater than 0 for rounding to occur.
If you use only one value, the procedure uses this unit for all variables. If you use a list of values, you
must also use a VAR statement. The procedure then uses the roundoff values for variables in the order
given in the VAR statement. For example, the following statements specify a roundoff value of 1 for
Yieldstrength and a roundoff value of 0.5 for TENSTREN.
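The statements referred to above might look like the following sketch. It assumes that the most recently created data set contains Yieldstrength and TENSTREN, because no DATA= option is given. It also assumes the space-separated list form of ROUND=.

```sas
/* Round Yieldstrength to units of 1 and TENSTREN to units of 0.5 */
proc capability round=1 0.5;
   var Yieldstrength TENSTREN;
run;
```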
When a variable value is midway between the two nearest rounded points, the value is rounded to the
nearest even multiple of the roundoff value. For example, with a roundoff value of 1, the variable
values of –2.5, –2.2, and –1.5 are rounded to –2; the values of –0.5, 0.2, and 0.5 are rounded to 0; and
the values of 0.6, 1.2, and 1.4 are rounded to 1.
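You can check the round-half-to-even rule with the SAS ROUNDE function. ROUNDE rounds to the nearest multiple of its second argument and returns the even multiple at a midpoint. This sketch only illustrates the rule with the values from the text. It does not claim that PROC CAPABILITY calls ROUNDE internally.

```sas
data _null_;
   /* Values from the example above, rounded with a roundoff unit of 1 */
   do x = -2.5, -2.2, -1.5, -0.5, 0.2, 0.5, 0.6, 1.2, 1.4;
      r = rounde(x, 1);
      put x= r=;
   end;
run;
```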
SPECIALINDICES
requests a table of specialized process capability indices. These indices include k, Boyles’ modified
Cpm (also denoted as Cpm+), Cjkp, Cpm(a), Cp(5.15), Cpk(5.15), Cpmk, Wright’s Cs, Boyles’ Sjkp,
Cpp, C''pp, Cpg, Cpq, Cp^W, Cpk^W, Cpm^W, Cpc, and Vännman’s Cp(u,v) and Cp(v).
You can provide values for the parameter a for Cpm(a), the parameters u and v for Cp(u,v) and Cp(v), and the
multiplier for Cs by specifying the following options in parentheses after the SPECIALINDICES
option.
CPMA=value
specifies the value of the parameter a for the capability index Cpm(a) described in Section 3.7
of Kotz and Johnson (1993). The value must be positive. The default value is 0.5. The existing
CPMA= option in the PROC CAPABILITY statement is considered obsolete but still works.
CPU=value
specifies the value of the parameter u for Vännman’s capability index Cp(u,v). The value must
be greater than or equal to zero. The default value is zero.
CPV=value
specifies the value of the parameter v for Vännman’s capability indices Cp(u,v) and Cp(v).
The value must be greater than or equal to zero. The default value is 4.
CSGAMMA=value
specifies the value of the multiplier suggested by Chen and Kotz (1996) for Wright’s capability
index Cs . The value must be greater than zero. The default value is 1.
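A sketch of SPECIALINDICES with two of its parameters overridden, assuming the Cans data. The values a = 1 and v = 2 are arbitrary and chosen only for illustration. The remaining parameters keep their defaults (u = 0 and a Cs multiplier of 1).

```sas
/* Specialized indices with a = 1 for Cpm(a) and v = 2 for Cp(u,v) and Cp(v) */
proc capability data=Cans specialindices(cpma=1 cpv=2);
   spec lsl=11.95 target=12 usl=12.05;
   var Weight;
run;
```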
SPEC=SAS-data-set
SPECS=SAS-data-set
specifies an input data set containing specification limits for each of the variables in the VAR statement.
This option is an alternative to the SPEC statement, which also provides specification limits. See
“SPEC= Data Set” on page 230 for details on SPEC= data sets, and Example 6.1 for an example. If you
use both the SPEC= option and a SPEC statement, the SPEC= option is ignored.
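TRIMMED=value < (options) >
requests a table of trimmed means. You can specify the following options in parentheses after TRIMMED=:
ALPHA=value
specifies the confidence level. The coverage percent is 100(1 - value). For example, ALPHA=0.10 requests a 90% confidence limit. The default value is 0.05.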
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default value is TWOSIDED.
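WINSORIZED=value < (options) >
requests a table of Winsorized means. You can specify the following options in parentheses after WINSORIZED=:
ALPHA=value
specifies the confidence level. The coverage percent is 100(1 - value). For example, ALPHA=0.10 results in a 90% confidence limit. The default value is 0.05.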
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default value is TWOSIDED.
BY Statement
BY variables ;
You can specify a BY statement in PROC CAPABILITY to obtain separate analyses of observations in groups
that are defined by the BY variables. When a BY statement appears, the procedure expects the input data
set to be sorted in order of the BY variables. If you specify more than one BY statement, only the last one
specified is used.
If your input data set is not sorted in ascending order, use one of the following alternatives:
• Sort the data by using the SORT procedure with a similar BY statement.
• Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).
For more information about BY-group processing, see the discussion in SAS Language Reference: Concepts.
For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.
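A minimal sketch of BY-group processing. The data set CansByLine and the BY variable Line are hypothetical. PROC SORT puts the observations in the order that the BY statement expects.

```sas
/* Hypothetical: CansByLine holds Weight plus a BY variable Line */
proc sort data=CansByLine;
   by Line;
run;

/* One analysis, with the same limits, for each production line */
proc capability data=CansByLine;
   by Line;
   spec lsl=11.95 target=12 usl=12.05;
   var Weight;
run;
```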
CLASS Statement
CLASS variable-1 < (v-options) > < variable-2 < (v-options) > >
< / KEYLEVEL= value1 | (value1 value2) > ;
The CLASS statement specifies one or two variables used to group the data into classification levels. Variables
in a CLASS statement are referred to as CLASS variables. CLASS variables can be numeric or character.
Class variables can have floating point values, but they typically have a few discrete values that define
levels of the variable. You do not have to sort the data by CLASS variables. PROC CAPABILITY uses the
formatted values of the CLASS variables to determine the classification levels.
NOTE: You cannot specify a COMPHISTOGRAM statement together with a CLASS statement.
You can specify the following v-options enclosed in parentheses after a CLASS variable:
MISSING
specifies that missing values for the CLASS variable are to be treated as valid classification levels.
Special missing values that represent numeric values (‘.A’ through ‘.Z’ and ‘._’) are each considered
as a separate value. If you omit MISSING, PROC CAPABILITY excludes the observations with a
missing CLASS variable value from the analysis. Enclose this option in parentheses after the CLASS
variable.
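ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the display order for the CLASS variable values. You can specify the following keywords:
DATA orders values according to their order in the input data set. When you use a plot
statement, PROC CAPABILITY displays the rows (columns) of the comparative
plot from top to bottom (left to right) in the order that the CLASS variable values
first appear in the input data set.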
FORMATTED orders values by their ascending formatted values. This order might depend on
your operating environment. When you use a plot statement, PROC CAPABILITY
displays the rows (columns) of the comparative plot from top to bottom (left to
right) in increasing order of the formatted CLASS variable values. For example,
suppose a numeric CLASS variable DAY (with values 1, 2, and 3) has a user-defined
format that assigns Wednesday to the value 1, Thursday to the value 2, and Friday
to the value 3. The rows of the comparative plot will appear in alphabetical order
(Friday, Thursday, Wednesday) from top to bottom.
If there are two or more distinct internal values with the same formatted value, then
PROC CAPABILITY determines the order by the internal value that occurs first in
the input data set. For numeric variables without an explicit format, the levels are
ordered by their internal values.
FREQ orders values by descending frequency count so that levels with the most
observations are listed first. If two or more values have the same frequency count, PROC
CAPABILITY uses the formatted values to determine the order.
When you use a plot statement, PROC CAPABILITY displays the rows (columns)
of the comparative plot from top to bottom (left to right) in order of decreasing
frequency count for the CLASS variable values.
INTERNAL orders values by their unformatted values, which yields the same order as PROC
SORT. This order may depend on your operating environment.
When you use a plot statement, PROC CAPABILITY displays the rows (columns)
of the comparative plot from top to bottom (left to right) in increasing order of the
internal (unformatted) values of the CLASS variable. The first CLASS variable
is used to label the rows of the comparative plots (top to bottom). The second
CLASS variable is used to label the columns of the comparative plots (left to right).
For example, suppose a numeric CLASS variable DAY (with values 1, 2, and 3)
has a user-defined format that assigns Wednesday to the value 1, Thursday to the
value 2, and Friday to the value 3. The rows of the comparative plot will appear in
day-of-the-week order (Wednesday, Thursday, Friday) from top to bottom.
You can specify the following options after the slash (/) in the CLASS statement.
NOKEYMOVE
specifies that the location of the key cell in a comparative plot be unchanged by the CLASS statement
KEYLEVEL= option. By default, the key cell is positioned as the first cell in a comparative plot.
The NOKEYMOVE option has no effect unless you specify a plot statement.
FREQ Statement
FREQ variable ;
The FREQ statement names a variable that provides frequencies for each observation in the input data set. If
n is the value of the FREQ variable for a particular observation, then that observation is used n times. If the
value of the FREQ variable is missing or is less than one, the observation is not used in the analysis. If the
value is not an integer, only the integer portion is used.
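The FREQ rules above can be illustrated with a short Python sketch. It is not how PROC CAPABILITY is implemented; the helper name `expand_by_freq` is hypothetical, and representing a missing FREQ value as `None` or NaN is an assumption of the sketch.

```python
import math


def expand_by_freq(values, freqs):
    """Hypothetical helper: replicate each observation according to its FREQ value."""
    expanded = []
    for x, f in zip(values, freqs):
        # A missing FREQ value (None or NaN) or a value less than one drops the observation.
        if f is None or (isinstance(f, float) and math.isnan(f)) or f < 1:
            continue
        # Only the integer portion of a noninteger frequency is used.
        expanded.extend([x] * int(f))
    return expanded


print(expand_by_freq([10.0, 11.0, 12.0, 13.0], [2, 0.5, 3.7, None]))
# [10.0, 10.0, 12.0, 12.0, 12.0]
```

The observation with frequency 0.5 is dropped because its value is less than one, and 3.7 contributes three copies because only its integer portion is used.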
ID Statement
ID variables ;
The ID statement specifies one or more variables to include in the table of extreme observations. The
corresponding values of the ID variables appear beside the n largest and n smallest observations, where n is
the value of the NEXTROBS= option.
SPEC Statement
The syntax for the SPEC statement is as follows:
SPEC < options > ;
You can use at most one SPEC statement in the CAPABILITY procedure. When you provide specification
limits and target values in a SPEC statement, the tabular output produced by the PROC CAPABILITY
statement includes process capability indices as well as summary statistics. You can use the SPEC statement
in conjunction with the CDFPLOT, COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT
statements to add specification limit and target lines to the plots produced with these statements.
options
control features of the specification limits and target values. Table 6.4 lists all options by function.
Summary of Options
Option           Description
Lower Specification Limit Options
CLEFT=           Color used to fill area left of lower specification limit (histograms only)
CLSL=            Color of lower specification limit line
LLSL=            Line type of lower specification limit line
LSL=             Lower specification limit values
LSLSYMBOL=       Character used for lower specification limit line in line printer plots
PLEFT=           Pattern type used to fill area left of lower specification limit (histograms only)
WLSL=            Width of lower specification limit line
Target Options
CTARGET=         Color of target line
LTARGET=         Line type of target line
TARGET=          Target value
TARGETSYMBOL=    Character used for target in line printer plots
WTARGET=         Width of target line
General Options
You can specify the following options whether you are producing ODS Graphics output or traditional
graphics:
CLEFT=color
CLEFT
determines the color used to fill the area under a histogram to the left of the lower specification limit.
You can specify the CLEFT option without an argument to fill this area with an appropriate color from
the ODS style. If you are producing ODS Graphics output, an explicit color specification is ignored.
This option is applicable only when the SPEC statement is used in conjunction with a HISTOGRAM
or COMPHISTOGRAM statement. See Output 6.2.1 for an example. The CLEFT= option also applies
to the area under a fitted curve; for an example, see Output 6.8.1.
CRIGHT=color
CRIGHT
determines the color used to fill the area under a histogram to the right of the upper specification
limit. You can specify the CRIGHT option without an argument to fill this area with an appropriate
color from the ODS style. If you are producing ODS Graphics output, an explicit color specification
is ignored. This option is applicable only when the SPEC statement is used in conjunction with a
HISTOGRAM or COMPHISTOGRAM statement. See Output 6.2.1 for an example. The CRIGHT=
option also applies to the area under a fitted curve; for an example, see Output 6.8.1.
LSL=value-list
specifies the lower specification limits for the variables listed in the VAR statement, or for all numeric
variables in the input data set if no VAR statement is used. If you specify only one lower limit, it is
used for all of the variables; otherwise, the number of limits must match the number of variables. See
the section “Computing Capability Indices” on page 209 for an example.
TARGET=value-list
specifies target values for the variables listed in the VAR statement, or for all numeric variables in the
input data set if no VAR statement is used. If you specify only one target value, it is used for all of
the variables; otherwise, the number of values must match the number of variables. See the section
“Computing Capability Indices” on page 209 for an example.
USL=value-list
specifies the upper specification limits for the variables listed in the VAR statement, or for all numeric
variables in the input data set if no VAR statement is used. If you specify only one upper limit, it is
used for all of the variables; otherwise, the number of limits must match the number of variables. See
the section “Computing Capability Indices” on page 209 for an example.
CLSL=color
specifies the color of the lower specification line displayed in plots created with the CDFPLOT,
COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT statements.
CTARGET=color
specifies the color of the target line displayed in plots created with the CDFPLOT, COMPHISTOGRAM,
HISTOGRAM, PROBPLOT, and QQPLOT statements.
CUSL=color
specifies the color of the upper specification line displayed in plots created with the CDFPLOT,
COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT statements.
LLSL=linetype
specifies the line type for the lower specification line displayed in plots created with the CDFPLOT,
COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT statements. See Output 6.2.1 for an
example. The default is 1, which produces a solid line.
LTARGET=linetype
specifies the line type for the target line in plots created with the CDFPLOT, COMPHISTOGRAM,
HISTOGRAM, PROBPLOT, and QQPLOT statements. See Output 6.2.1 for an example. The default
is 1, which produces a solid line.
LUSL=linetype
specifies the line type for the upper specification line displayed in plots created with the CDFPLOT,
COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT statements. See Output 6.2.1 for an
example. The default is 1, which produces a solid line.
PLEFT=pattern
specifies the pattern used to fill the area under a histogram to the left of the lower specification limit.
This option is applicable only when the SPEC statement is used in conjunction with a HISTOGRAM or
COMPHISTOGRAM statement. For an example, see Output 6.2.1. The PLEFT= option also applies
to the area under a fitted curve; for an example, see Output 6.8.1. The default pattern is a solid fill.
PRIGHT=pattern
specifies the pattern used to fill the area under a histogram to the right of the upper specification limit.
This option is applicable only when the SPEC statement is used in conjunction with a HISTOGRAM or
COMPHISTOGRAM statement. For an example, see Output 6.2.1. The PRIGHT= option also applies
to the area under a fitted curve; for an example, see Output 6.8.1. The default pattern is a solid fill.
WLSL=n
specifies the width in pixels of the lower specification line in plots created with the CDFPLOT,
COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT statements. See Output 6.2.1 for an
illustration. The default is 1.
WTARGET=n
specifies the width in pixels of the target line in plots created with the CDFPLOT, COMPHISTOGRAM,
HISTOGRAM, PROBPLOT, and QQPLOT statements. See Output 6.2.1 for an illustration. The
default is 1.
WUSL=n
specifies the width in pixels of the upper specification line in plots created with the CDFPLOT,
COMPHISTOGRAM, HISTOGRAM, PROBPLOT, and QQPLOT statements. See Output 6.2.1 for an
illustration. The default is 1.
LSLSYMBOL=‘character’
specifies the character used to display the lower specification line in line printer plots created with the
CDFPLOT, HISTOGRAM, PROBPLOT, and QQPLOT statements. The default character is ‘L’.
TARGETSYMBOL=‘character’
TARGETSYM=‘character’
specifies the character used to display the target line in line printer plots created with the CDFPLOT,
HISTOGRAM, PROBPLOT, and QQPLOT statements. The default character is ‘T’.
USLSYMBOL=‘character’
specifies the character used to display the upper specification line in line printer plots created with the
CDFPLOT, HISTOGRAM, PROBPLOT, and QQPLOT statements. The default character is ‘U’.
VAR Statement
VAR variables ;
The VAR statement specifies the analysis variables and their order in the results. By default, if you omit
the VAR statement, PROC CAPABILITY analyzes all numeric variables that are not listed in the other
statements.
You must provide a VAR statement when you use an OUTPUT statement. To store the same statistic for
several analysis variables in the OUT= data set, you specify a list of names in the OUTPUT statement. PROC
CAPABILITY makes a one-to-one correspondence between the order of the analysis variables in the VAR
statement and the list of names that follow a statistic keyword.
WEIGHT Statement
WEIGHT variable ;
The WEIGHT statement names a variable that provides weights for each observation in the input data set. The
CAPABILITY procedure uses the values $w_i$ of the WEIGHT variable to modify the computation of a number
of summary statistics by assuming that the variance of the ith value $X_i$ of the analysis variable is equal to
$\sigma^2/w_i$, where $\sigma$ is an unknown parameter. This assumption is rarely applicable in process capability analysis,
and the purpose of the WEIGHT statement is simply to make the CAPABILITY procedure consistent with
other data summarization procedures, such as the UNIVARIATE procedure.
The values of the WEIGHT variable do not have to be integers and are typically positive. By default,
observations with non-positive or missing values of the WEIGHT variable are handled as follows:
If the value is zero, the observation is counted in the total number of observations.
If the value is negative, it is converted to zero, and the observation is counted in the total number of
observations.
To exclude observations that contain negative and zero weights from the analysis, specify the option
EXCLNPWGT in the PROC statement. Note that most SAS/STAT procedures, such as PROC GLM, exclude
negative and zero weights by default.
When you specify a WEIGHT variable, the procedure uses its values, $w_i$, to compute weighted versions of
the statistics provided in the Moments table. For example, the procedure computes a weighted mean $\bar{X}_w$ and
a weighted variance $s_w^2$ as
$$\bar{X}_w = \frac{\sum_i w_i x_i}{\sum_i w_i} \qquad \text{and} \qquad s_w^2 = \frac{1}{d}\sum_i w_i (x_i - \bar{X}_w)^2$$
where $x_i$ is the ith variable value. The divisor d is controlled by the VARDEF= option in the PROC CAPABILITY statement.
When you use both the WEIGHT and SPEC statements, capability indices are computed using $\bar{X}_w$ and $s_w$ in
place of $\bar{X}$ and $s$. Again, note that weighted capability indices are seldom needed in practice.
When you specify a WEIGHT statement, the procedure also computes a weighted standard error and a
weighted version of Student’s t test. This test is the only test of location that is provided when weights are
specified.
The WEIGHT statement does not affect the determination of the mode, extreme values, extreme observations,
or the number of missing values of the analysis variables. However, the weights wi are used to compute
weighted percentiles.
The WEIGHT variable has no effect on the calculation of extreme values, and it has no effect on graphical
displays produced with the plot statements.
input data sets specified with the DATA= option, the SPEC= option, and the ANNOTATE= option
descriptive statistics
robust estimators
the same specification limits are referred to in more than one analysis
a BY statement is used
Variable Description
_LSL_ Lower specification limit
_TARGET_ Target value
_USL_ Upper specification limit
_VAR_ Name of the variable
You may omit either _LSL_ or _USL_ but not both. _TARGET_ is optional. If the SPEC= data set
contains both _LSL_ and _USL_, you can assign missing values to _LSL_ or _USL_ to indicate one-sided
specifications. You can assign missing values to _TARGET_ when the variable does not use a target value.
_LSL_, _USL_, and _TARGET_ must be numeric variables. _VAR_ must be a character variable.
You can include the optional variables listed in Table 6.5 in a SPEC= data set to control the appearance of
specification limits on charts.
Variable Description
_CLEFT_ Color used to fill area left of LSL (histograms only)
_CLSL_ Color of LSL line
_CRIGHT_ Color used to fill area right of USL (histograms only)
_CTARGET_ Color of target line
_CUSL_ Color of USL line
_LLSL_ Line type of LSL line
_LSLSYM_ Character used for LSL line in line printer plots
_LTARGET_ Line type of target line
_LUSL_ Line type of USL line
_PLEFT_ Pattern type used to fill area left of LSL (histograms only)
_PRIGHT_ Pattern type used to fill area right of USL (histograms only)
_TARGETSYM_ Character used for target in line printer plots
_USLSYM_ Character used for USL line in line printer plots
_WLSL_ Width of LSL line
_WTARGET_ Width of target line
_WUSL_ Width of USL line
If you are using the HISTOGRAM statement to create “clickable” histograms in HTML, you can also provide
the following variables in the SPEC= data set:
Variable Description
_LOURL_ URL associated with area to left of lower specification limit
_HIURL_ URL associated with area to right of upper specification limit
_URL_ URL associated with area between specification limits
These are character variables whose values are Uniform Resource Locators (URLs) linked to areas on a
histogram. When you view the ODS HTML output with a browser, you can click on an area, and the browser
will bring up the page specified by the corresponding URL.
If you use a BY statement, the SPEC= data set must also contain the BY variables. The SPEC= data set must
be sorted in the same order as the DATA= data set. Within a BY group, specification limits for each variable
plotted are read from the first observation where _VAR_ matches the variable name.
See the section “Examples: CAPABILITY Procedure” on page 260 for an example of reading specification
limits from a SPEC= data set.
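The lookup rule for a SPEC= data set used with a BY statement can be sketched in Python. Rows are modeled as dictionaries and `None` stands for a SAS missing value. The function name, the BY variable `Batch`, and the analysis variable `Width` are hypothetical. The sketch covers only the first-matching-observation rule and does not check the sort-order requirement.

```python
def lookup_spec_limits(spec_rows, by_values, var_name):
    """Return (lsl, target, usl) from the first row in the BY group whose _VAR_ matches."""
    for row in spec_rows:
        same_group = all(row.get(k) == v for k, v in by_values.items())
        if same_group and row.get("_VAR_") == var_name:
            # First match wins; later rows for the same variable and group are ignored.
            return row.get("_LSL_"), row.get("_TARGET_"), row.get("_USL_")
    return None  # no specification limits for this variable in this BY group


spec = [
    {"Batch": 1, "_VAR_": "Width", "_LSL_": 9.5, "_TARGET_": 10.0, "_USL_": 10.5},
    {"Batch": 1, "_VAR_": "Width", "_LSL_": 9.0, "_TARGET_": None, "_USL_": 11.0},
    {"Batch": 2, "_VAR_": "Width", "_LSL_": None, "_TARGET_": None, "_USL_": 10.8},
]
print(lookup_spec_limits(spec, {"Batch": 1}, "Width"))  # (9.5, 10.0, 10.5)
print(lookup_spec_limits(spec, {"Batch": 2}, "Width"))  # (None, None, 10.8)
```

In this hypothetical data set, Batch 2 has a missing _LSL_, which corresponds to a one-sided (upper-only) specification.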
Variable Description
_CP_ Capability index Cp
_CPLCL_ Lower confidence limit for Cp
_CPUCL_ Upper confidence limit for Cp
_CPK_ Capability index Cpk
_CPKLCL_ Lower confidence limit for Cpk
_CPKUCL_ Upper confidence limit for Cpk
_CPL_ Capability index CPL
_CPLLCL_ Lower confidence limit for CPL
_CPLUCL_ Upper confidence limit for CPL
_CPM_ Capability index Cpm
_CPMLCL_ Lower confidence limit for Cpm
_CPMUCL_ Upper confidence limit for Cpm
_CPU_ Capability index CPU
_CPULCL_ Lower confidence limit for CPU
_CPUUCL_ Upper confidence limit for CPU
_CSS_ Corrected sum of squares
_CV_ Coefficient of variation
_GEOMEAN_ Geometric mean
_GINI_ Gini’s mean difference
_HARMEAN_ Harmonic mean
_K_ Capability index K
_KURT_ Kurtosis
_LSL_ Lower specification limit
_MAD_ Median absolute difference about the median
_MAX_ Maximum
_MEAN_ Mean
_MEDIAN_ Median
_MIN_ Minimum
_MODE_ Mode
_MSIGN_ Sign statistic
_NMISS_ Number of missing observations
_NOBS_ Number of nonmissing observations
_P1_ 1st percentile
_P5_ 5th percentile
_P10_ 10th percentile
_P90_ 90th percentile
_P95_ 95th percentile
_P99_ 99th percentile
_PCTGTR_ Percentage of observations greater than upper specification limit
_PCTLSS_ Percentage of observations less than lower specification limit
_PROBM_ p-value of sign statistic
_PROBN_ p-value of test for normality
_PROBS_ p-value of signed rank test
_PROBT_ p-value of t statistic
_Q1_ 25th percentile (lower quartile)
_Q3_ 75th percentile (upper quartile)
_QN_ Qn (see “Robust Estimates of Scale” on page 244)
_QRANGE_ Interquartile range (upper quartile minus lower quartile)
_RANGE_ Range
_SGNRNK_ Centered sign rank
_SKEW_ Skewness
_SN_ Sn (see “Robust Estimates of Scale” on page 244)
_STD_ Standard deviation
_STDGINI_ Gini’s standard deviation
_STDMAD_ MAD standard deviation
_STDMEAN_ Standard error of the mean
_STDQN_ Qn standard deviation
_STDQRANGE_ Interquartile range standard deviation
_STDSN_ Sn standard deviation
_SUMWGT_ Sum of the weights
_SUM_ Sum
_TARGET_ Target value
_USL_ Upper specification limit
_USS_ Uncorrected sum of squares
_VARI_ Variance
_VAR_ Variable name
NOTE: The variables _CP_, _CPLCL_, _CPUCL_, _CPK_, _CPKLCL_, _CPKUCL_, _CPL_, _CPLLCL_,
_CPLUCL_, _CPM_, _CPMLCL_, _CPMUCL_, _CPU_, _CPULCL_, _CPUUCL_, _K_, _LSL_, _PCTGTR_,
_PCTLSS_, _TARGET_, and _USL_ are included if you provide specification limits.
The OUTTABLE= data set and the OUT= data set (see “OUTPUT Statement: CAPABILITY Procedure” on page 439
for details on the OUT= data set) contain essentially the same information. However,
the structure of the OUTTABLE= data set may be more appropriate when you are computing summary
statistics or capability indices for more than one process variable in the same invocation of the CAPABILITY
procedure. Each observation in the OUTTABLE= data set corresponds to a different process variable, and
the variables in the data set correspond to summary statistics and indices.
NOTE: See Tabulating Results for Multiple Variables in the SAS/QC Sample Library.
For example, suppose you have ten process variables (P1-P10). The following statements create an
OUTTABLE= data set named Table, which contains summary statistics and capability indices for each of these
variables:
The following statements create the table shown in Figure 6.4, which contains the mean, standard deviation,
lower and upper specification limits, and capability index Cpk for each process variable:
Descriptive Statistics
This section provides computational details for the descriptive statistics that are computed with the PROC
CAPABILITY statement. These statistics can also be saved in the OUT= data set by specifying the keywords
listed in Table 6.59 in the OUTPUT statement.
Standard algorithms (Fisher 1973) are used to compute the moment statistics. The computational methods
used by the CAPABILITY procedure are consistent with those used by other SAS procedures for calculating
descriptive statistics. For details on statistics also calculated by Base SAS software, see the Base SAS Procedures
Guide.
The following sections give specific details on several statistics calculated by the CAPABILITY procedure.
Mean
The sample mean is calculated as
$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the variable, and $w_i$ is the
weight associated with the ith value of the variable. If there is no WEIGHT variable, the formula reduces to
$\frac{1}{n}\sum_{i=1}^{n} x_i$.
Sum
The sum is calculated as $\sum_{i=1}^{n} w_i x_i$, where n is the number of nonmissing values for a variable, $x_i$ is the
ith value of the variable, and $w_i$ is the weight associated with the ith value of the variable. If there is no
WEIGHT variable, the formula reduces to $\sum_{i=1}^{n} x_i$.
Variance
The variance is calculated as
$$\frac{1}{d}\sum_{i=1}^{n} w_i (x_i - \bar{X}_w)^2$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the variable, $\bar{X}_w$ is the
weighted mean, $w_i$ is the weight associated with the ith value of the variable, and d is the divisor controlled
by the VARDEF= option in the PROC CAPABILITY statement. If there is no WEIGHT variable, the
formula reduces to
$$\frac{1}{d}\sum_{i=1}^{n} (x_i - \bar{X}_w)^2$$
Standard Deviation
The standard deviation is calculated as
$$\sqrt{\frac{1}{d}\sum_{i=1}^{n} w_i (x_i - \bar{X}_w)^2}$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the variable, $\bar{X}_w$ is the
weighted mean, $w_i$ is the weight associated with the ith value of the variable, and d is the divisor controlled
by the VARDEF= option in the PROC CAPABILITY statement. If there is no WEIGHT variable, the
formula reduces to
$$\sqrt{\frac{1}{d}\sum_{i=1}^{n} (x_i - \bar{X}_w)^2}$$
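The Sum, Mean, Variance, and Standard Deviation formulas above can be checked with a small Python sketch. Passing no weights corresponds to having no WEIGHT variable. The mapping from `vardef` to the divisor d (DF gives n − 1, N gives n, WDF gives the sum of the weights minus 1, WEIGHT gives the sum of the weights) is an assumption modeled on the VARDEF= option of Base SAS procedures. Confirm it against the VARDEF= description for the PROC CAPABILITY statement.

```python
import math


def weighted_moments(x, w=None, vardef="DF"):
    """Sketch of the sum, mean, variance, and standard deviation formulas."""
    n = len(x)
    if w is None:
        w = [1.0] * n  # no WEIGHT variable: every weight is one
    sum_w = sum(w)
    total = sum(wi * xi for wi, xi in zip(w, x))  # weighted sum
    mean = total / sum_w                          # weighted mean
    # Assumed divisor mapping, modeled on VARDEF=DF | N | WDF | WEIGHT.
    d = {"DF": n - 1, "N": n, "WDF": sum_w - 1, "WEIGHT": sum_w}[vardef]
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / d
    return {"sum": total, "mean": mean, "var": var, "std": math.sqrt(var)}


m = weighted_moments([2.0, 4.0, 6.0, 8.0])
print(m["mean"], m["var"])
```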
Skewness
The sample skewness is calculated as
$$\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{X}}{s}\right)^3$$
where n is the number of nonmissing values for a variable and must be greater than 2, $x_i$ is the ith value of
the variable, $\bar{X}$ is the sample average, and s is the sample standard deviation.
The sample skewness can be positive or negative; it measures the asymmetry of the data distribution and
estimates the theoretical skewness $\sqrt{\beta_1} = \mu_3 \mu_2^{-3/2}$, where $\mu_2$ and $\mu_3$ are the second and third central
moments. Observations that are normally distributed should have a skewness near zero.
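The following is a direct Python transcription of the skewness formula. It assumes that s is computed with divisor n − 1, which the text does not state explicitly. The function name is illustrative only.

```python
import math


def sample_skewness(x):
    """Sample skewness as in the formula above (requires n > 2)."""
    n = len(x)
    if n <= 2:
        raise ValueError("skewness requires more than two values")
    mean = sum(x) / n
    # Assumption: s is the sample standard deviation with divisor n - 1.
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((xi - mean) / s) ** 3 for xi in x)


print(sample_skewness([1.0, 1.0, 4.0]))  # positive: the long tail is on the right
```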
Kurtosis
The sample kurtosis is calculated as
$$\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{X}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$
where n > 3. The sample kurtosis measures the heaviness of the tails of the data distribution. It estimates
the adjusted theoretical kurtosis denoted as $\beta_2 - 3$, where $\beta_2 = \mu_4/\mu_2^2$, and $\mu_4$ is the fourth central moment.
Observations that are normally distributed should have a kurtosis near zero.
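The kurtosis formula translates into Python in the same way. As with skewness, the sketch assumes that s uses divisor n − 1, and the function name is illustrative.

```python
import math


def sample_kurtosis(x):
    """Sample (excess) kurtosis as in the formula above (requires n > 3)."""
    n = len(x)
    if n <= 3:
        raise ValueError("kurtosis requires more than three values")
    mean = sum(x) / n
    # Assumption: s is the sample standard deviation with divisor n - 1.
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    fourth = sum(((xi - mean) / s) ** 4 for xi in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


print(sample_kurtosis([1.0, 2.0, 3.0, 4.0]))  # negative: flatter than a normal sample
```

Evenly spaced values have light tails relative to the normal distribution, so the estimate falls below zero.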
Geometric Mean
The geometric mean is calculated as
$$\left(\prod_{i=1}^{n} x_i^{w_i}\right)^{1/\sum_{i=1}^{n} w_i}$$
where n is the number of nonmissing values for a variable, $x_i$ is the ith value of the variable, and $w_i$ is the
weight associated with the ith value of the variable.
If there is no WEIGHT variable, the formula reduces to
$$\left(\prod_{i=1}^{n} x_i\right)^{1/n}$$
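Evaluating the product of powers directly can overflow for long series. The Python sketch below therefore computes the same quantity on the log scale, $\exp\left(\sum w_i \log x_i / \sum w_i\right)$. The log-scale rewrite and the rejection of nonpositive values are choices made for the sketch. This section does not say how PROC CAPABILITY handles zero or negative values.

```python
import math


def geometric_mean(x, w=None):
    """Weighted geometric mean computed on the log scale; w=None means unit weights."""
    if w is None:
        w = [1.0] * len(x)
    if any(xi <= 0 for xi in x):
        # The log form needs positive values; this sketch simply refuses others.
        raise ValueError("all values must be positive")
    log_sum = sum(wi * math.log(xi) for wi, xi in zip(w, x))
    return math.exp(log_sum / sum(w))


print(geometric_mean([2.0, 8.0], [1.0, 2.0]))  # (2 * 8**2) ** (1/3)
```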
where $r_i^+$ is the rank of $|x_i - \mu_0|$ after discarding values of $x_i = \mu_0$, and n is the number of $x_i$ values not
equal to $\mu_0$. Average ranks are used for tied values.
If $n \le 20$, the significance of S is computed from the exact distribution of S, where the distribution is a
convolution of scaled binomial distributions. When n > 20, the significance of S is computed by treating
$$S\sqrt{\frac{n-1}{nV - S^2}}$$
as a Student t variate with n − 1 degrees of freedom. V is computed as
$$V = \frac{1}{24}n(n+1)(2n+1) - \frac{1}{48}\sum t_i(t_i+1)(t_i-1)$$
where the sum is over groups tied in absolute value and where $t_i$ is the number of values in the ith group
(Iman 1974, Conover 1980). The null hypothesis tested is that the mean (or median) is $\mu_0$, assuming that the
distribution is symmetric. Refer to Lehmann and D’Abrera (1975).
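The large-sample computation can be sketched in Python. The definition of S itself is not shown in this part of the text. The sketch assumes the centered form S = (sum of the ranks $r_i^+$ for $x_i > \mu_0$) − n(n + 1)/4; treat this as an assumption and verify it against the definition of S earlier in the chapter. Only the t approximation used for n > 20 is shown. The exact distribution used for n ≤ 20 is not implemented, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks 1..n in which tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_t(x, mu0=0.0):
    """Return (S, V, t), where t = S * sqrt((n - 1) / (n V - S^2))."""
    d = [xi - mu0 for xi in x if xi != mu0]  # discard values equal to mu0
    n = len(d)
    r = average_ranks([abs(di) for di in d])
    # ASSUMPTION: centered signed rank S = sum of ranks of positive d - n(n+1)/4.
    s = sum(ri for ri, di in zip(r, d) if di > 0) - n * (n + 1) / 4
    # Sizes t_i of the groups tied in absolute value.
    counts = {}
    for di in d:
        counts[abs(di)] = counts.get(abs(di), 0) + 1
    ties = sum(t * (t + 1) * (t - 1) for t in counts.values())
    v = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    t_stat = s * math.sqrt((n - 1) / (n * v - s * s))
    return s, v, t_stat


print(signed_rank_t([1.0, 2.0, 3.0, -4.0, 5.0]))
```

In practice the t value would be referred to a t distribution with n − 1 degrees of freedom. The sketch stops at the statistic to stay within the standard library.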
Shapiro-Wilk test
Kolmogorov-Smirnov test
Anderson-Darling test
Tests for normality are particularly important in process capability analysis because the commonly used
capability indices are difficult to interpret unless the data are at least approximately normally distributed.
Furthermore, the confidence limits for capability indices displayed in the table labeled Process Capability
Indices require the assumption of normality. Consequently, the tests of normality are always computed when
you specify the SPEC statement, and a note is added to the table when the hypothesis of normality is rejected.
You can specify the particular test and the significance level with the CHECKINDICES option.
Shapiro-Wilk Test
If the sample size is 2000 or less, the procedure computes the Shapiro-Wilk statistic W (also denoted as
$W_n$ to emphasize its dependence on the sample size n). The statistic $W_n$ is the ratio of the best estimator
of the variance (based on the square of a linear combination of the order statistics) to the usual corrected
sum of squares estimator of the variance. When n is greater than three, the coefficients to compute the linear
combination of the order statistics are approximated by the method of Royston (1992). The statistic $W_n$ is
always greater than zero and less than or equal to one $(0 < W \le 1)$.
Small values of W lead to rejection of the null hypothesis. The method for computing the p-value (the
probability of obtaining a W statistic less than or equal to the observed value) depends on n. For n = 3,
the probability distribution of W is known and is used to determine the p-value. For $n \ge 4$, a normalizing
transformation is computed:
$$Z_n = \begin{cases} \left(-\log(\gamma - \log(1 - W_n)) - \mu\right)/\sigma & \text{if } 4 \le n \le 11 \\ \left(\log(1 - W_n) - \mu\right)/\sigma & \text{if } 12 \le n \le 2000 \end{cases}$$
The values of $\gamma$, $\mu$, and $\sigma$ are functions of n obtained from simulation results. Large values of $Z_n$ indicate
departure from normality, and because the statistic $Z_n$ has an approximately standard normal distribution,
this distribution is used to determine the p-values for $n \ge 4$.
Note that $F_n(x)$ is a step function that takes a step of height $\frac{1}{n}$ at each observation. This function estimates
the distribution function $F(x)$. At any value x, $F_n(x)$ is the proportion of observations less than or equal to x,
while $F(x)$ is the probability of an observation less than or equal to x. EDF statistics measure the discrepancy
between $F_n(x)$ and $F(x)$.
The EDF tests make use of the probability integral transformation $U = F(X)$. If $F(X)$ is the distribution
function of X, the random variable U is uniformly distributed between 0 and 1. Given n observations
$X_{(1)}, \ldots, X_{(n)}$, the values $U_{(i)} = F(X_{(i)})$ are computed. These values are used to compute the EDF test
statistics, as described in the next three sections. The CAPABILITY procedure computes the associated
p-values by interpolating internal tables of probability levels similar to those given by D’Agostino and
Stephens (1986).
Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov statistic (D) is defined as
$$D = \sup_x |F_n(x) - F(x)|$$
The Kolmogorov-Smirnov statistic belongs to the supremum class of EDF statistics. This class of statistics is
based on the largest vertical difference between $F(x)$ and $F_n(x)$.
The Kolmogorov-Smirnov statistic is computed as the maximum of $D^+$ and $D^-$, where $D^+$ is the largest
vertical distance between the EDF and the distribution function when the EDF is greater than the distribution
function, and $D^-$ is the largest vertical distance when the EDF is less than the distribution function.
$$D^+ = \max_i\left(\frac{i}{n} - U_{(i)}\right), \qquad D^- = \max_i\left(U_{(i)} - \frac{i-1}{n}\right), \qquad D = \max\left(D^+, D^-\right)$$
PROC CAPABILITY uses a modified Kolmogorov D statistic to test the data against a normal distribution
with mean and variance equal to the sample mean and variance.
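The following Python sketch computes $U_{(i)}$ from a normal distribution whose mean and variance are the sample mean and variance, and then evaluates $D^+$, $D^-$, and D as defined above. Using divisor n − 1 for the sample variance is an assumption. The modification that the procedure applies to D is not described here and is not reproduced, nor are the table-based p-values. The function names and the data values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def edf_uniforms(x):
    """U_(i) = F(X_(i)) for a normal F fitted with the sample mean and standard deviation."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))  # assumed divisor n - 1
    return [normal_cdf((xi - mean) / s) for xi in sorted(x)]


def ks_statistic(u):
    """D = max(D+, D-) for sorted values u[0] <= ... <= u[n-1]."""
    n = len(u)
    # enumerate is 0-based, so the 1-based index i of the text is k + 1.
    d_plus = max((k + 1) / n - uk for k, uk in enumerate(u))
    d_minus = max(uk - k / n for k, uk in enumerate(u))
    return max(d_plus, d_minus)


data = [9.8, 10.1, 10.0, 10.4, 9.7, 10.2]
print(ks_statistic(edf_uniforms(data)))
```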
Anderson-Darling Test
The Anderson-Darling statistic and the Cramér–von Mises statistic belong to the quadratic class of EDF
statistics. This class of statistics is based on the squared difference $(F_n(x) - F(x))^2$. Quadratic statistics
have the following general form:
$$Q = n\int_{-\infty}^{+\infty} (F_n(x) - F(x))^2\, \psi(x)\, dF(x)$$
The function $\psi(x)$ weights the squared difference $(F_n(x) - F(x))^2$.
The Anderson-Darling statistic ($A^2$) is defined as
$$A^2 = n\int_{-\infty}^{+\infty} (F_n(x) - F(x))^2 \left[F(x)(1 - F(x))\right]^{-1} dF(x)$$
Here the weight function is $\psi(x) = \left[F(x)(1 - F(x))\right]^{-1}$.
The Anderson-Darling statistic is computed as
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}\left[(2i - 1)\log U_{(i)} + (2n + 1 - 2i)\log\left(1 - U_{(i)}\right)\right]$$
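Given the sorted values $U_{(i)}$, for example from a fitted normal distribution as described in the EDF discussion above, the computational formula for $A^2$ translates directly into Python. This is an illustrative transcription only. The p-value interpolation described earlier is not included, and the function name is hypothetical.

```python
import math


def anderson_darling(u):
    """A^2 from sorted probabilities 0 < u[0] <= ... <= u[n-1] < 1."""
    n = len(u)
    total = 0.0
    for i in range(1, n + 1):  # 1-based index, as in the formula
        ui = u[i - 1]
        total += (2 * i - 1) * math.log(ui) + (2 * n + 1 - 2 * i) * math.log(1.0 - ui)
    return -n - total / n


print(anderson_darling([0.5]))  # equals 2*log(2) - 1 for this single value
```

Writing the second term with the coefficient (2n + 1 − 2i) avoids reversing the array. It is algebraically the same as the common form that pairs $\log U_{(i)}$ with $\log(1 - U_{(n+1-i)})$.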
Percentile Computations
The CAPABILITY procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and
99th percentiles (quantiles), as well as the minimum and maximum of each analysis variable. To compute
percentiles other than these default percentiles, use the PCTLPTS= and PCTLPRE= options in the OUTPUT
statement.
You can specify one of five definitions for computing the percentiles with the PCTLDEF= option. Let n be
the number of nonmissing values for a variable, and let $x_1, x_2, \ldots, x_n$ represent the ordered values of the
variable. Let the tth percentile be y, set $p = \frac{t}{100}$, and let
$np = j + g$ when PCTLDEF=1, 2, 3, or 5
$(n+1)p = j + g$ when PCTLDEF=4
where j is the integer part of np (of $(n+1)p$ when PCTLDEF=4), and g is the corresponding fractional part. Then the PCTLDEF= option defines the
tth percentile, y, as described in Table 6.7.
PCTLDEF=   Description                         Formula
2          Observation numbered closest        $y = x_i$ if $g \ne \frac{1}{2}$
           to np                               $y = x_j$ if $g = \frac{1}{2}$ and j is even
                                               $y = x_{j+1}$ if $g = \frac{1}{2}$ and j is odd
                                               where i is the integer part of $np + \frac{1}{2}$
3          Empirical distribution function     $y = x_j$ if $g = 0$
                                               $y = x_{j+1}$ if $g > 0$
5          Empirical distribution function     $y = \frac{1}{2}(x_j + x_{j+1})$ if $g = 0$
           with averaging                      $y = x_{j+1}$ if $g > 0$
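The three definitions shown in the table can be sketched in Python. Definitions 1 and 4 are omitted because their formulas are not reproduced here. Two choices are assumptions of the sketch: the index is clamped to the range 1 to n (so $x_0$ and $x_{n+1}$ fall back to $x_1$ and $x_n$), and np is computed as n·t/100 to limit rounding error. The function name is hypothetical.

```python
import math


def percentile(x, t, pctldef=5):
    """tth percentile by PCTLDEF=2, 3, or 5 as tabulated above."""
    xs = sorted(x)
    n = len(xs)
    np_ = n * t / 100.0          # np = j + g
    j = math.floor(np_)
    g = np_ - j

    def xk(k):
        # 1-based order statistic; clamping to [1, n] is an assumption of this sketch.
        return xs[min(max(k, 1), n) - 1]

    if pctldef == 2:
        if g != 0.5:
            return xk(math.floor(np_ + 0.5))  # observation numbered closest to np
        return xk(j) if j % 2 == 0 else xk(j + 1)
    if pctldef == 3:
        return xk(j) if g == 0 else xk(j + 1)
    if pctldef == 5:
        return (xk(j) + xk(j + 1)) / 2 if g == 0 else xk(j + 1)
    raise ValueError("only PCTLDEF=2, 3, and 5 are sketched")


data = list(range(1, 11))
print(percentile(data, 50, 5), percentile(data, 50, 3))
```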
Weighted Percentiles
When you use a WEIGHT statement, the percentiles are computed differently. The 100pth weighted percentile
y is computed from the empirical distribution function with averaging
$$y = \begin{cases} x_1 & \text{if } w_1 > pW \\ \frac{1}{2}(x_i + x_{i+1}) & \text{if } \sum_{j=1}^{i} w_j = pW \\ x_{i+1} & \text{if } \sum_{j=1}^{i} w_j < pW < \sum_{j=1}^{i+1} w_j \end{cases}$$
where $w_i$ is the weight associated with $x_i$, and where $W = \sum_{i=1}^{n} w_i$ is the sum of the weights.
Note that the PCTLDEF= option is not applicable when a WEIGHT statement is used. However, in this
case, if all the weights are identical, the weighted percentiles are the same as the percentiles that would be
computed without a WEIGHT statement and with PCTLDEF=5.
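The weighted-percentile rule can be sketched in Python. The exact comparison of the running weight with pW mirrors the equality case in the formula. In floating-point arithmetic, however, it is reliable only for weights that are exactly representable, so a production implementation would likely need a tolerance. Returning the largest value when the cumulative weight reaches pW at the last observation is an assumption of the sketch, and the function name is hypothetical.

```python
def weighted_percentile(x, w, t):
    """100p-th weighted percentile (p = t/100) by the EDF-with-averaging rule."""
    pairs = sorted(zip(x, w))  # order observations by value, keeping weights attached
    target = t / 100.0 * sum(wi for _, wi in pairs)  # pW
    cum = 0.0
    for k, (xk, wk) in enumerate(pairs):
        cum += wk  # running sum of the first k + 1 weights
        if cum == target:
            # Cumulative weight equals pW: average this value with the next one.
            nxt = pairs[k + 1][0] if k + 1 < len(pairs) else xk
            return (xk + nxt) / 2
        if cum > target:
            # Covers both w_1 > pW (first value) and the strictly-between case.
            return xk
    return pairs[-1][0]


print(weighted_percentile([1, 2, 3], [1, 2, 1], 50))  # 2
```

With identical weights, the result matches PCTLDEF=5. For example, the median of 1, 2, 3, 4 with unit weights is 2.5, consistent with the note above.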
where n is the sample size. When $0.5 \le p < 1.0$, the two-sided $100(1-\alpha)\%$ confidence limits for the
100p-th percentile are
One-sided $100(1-\alpha)\%$ confidence bounds are computed by replacing $\alpha/2$ by $\alpha$ in the appropriate preceding
equation. The factor $g'(\gamma; p, n)$ is related to the noncentral t distribution and is described in Owen and Hua
(1977) and Odeh and Owen (1980).
You can use the CIPCTLDF option to request confidence limits for percentiles that are distribution free (in particular, it is not necessary to assume that the data are normally distributed). These limits are described in Section 5.2 of Hahn and Meeker (1991). The two-sided $100(1-\alpha)\%$ confidence limits for the $100p$th percentile are
$$\text{LCL} = X_{(l)}, \qquad \text{UCL} = X_{(u)}$$
where $X_{(j)}$ is the $j$th order statistic when the data values are arranged in increasing order:
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$$
The lower rank $l$ and upper rank $u$ are integers that are symmetric (or nearly symmetric) around $\lfloor np \rfloor + 1$, where $\lfloor np \rfloor$ is the integer part of $np$ and $n$ is the sample size. Furthermore, $l$ and $u$ are chosen so that $X_{(l)}$ and $X_{(u)}$ are as close to $X_{(\lfloor np \rfloor + 1)}$ as possible while satisfying the coverage probability requirement
$$Q(u-1; n, p) - Q(l-1; n, p) \ge 1 - \alpha$$
where $Q(k; n, p) = \sum_{i=0}^{k}\binom{n}{i}p^i(1-p)^{n-i}$ is the cumulative binomial probability.
In some cases, the coverage requirement cannot be met, particularly when n is small and p is near 0 or 1. To
relax the requirement of symmetry, you can specify CIPCTLDF( TYPE = ASYMMETRIC ). This option
requests symmetric limits when the coverage requirement can be met, and asymmetric limits otherwise.
If you specify CIPCTLDF( TYPE = LOWER ), a one-sided $100(1-\alpha)\%$ lower confidence bound is computed as $X_{(l)}$, where $l$ is the largest integer that satisfies the inequality
$$1 - Q(l-1; n, p) \ge 1 - \alpha$$
with $0 < l \le n$. If you specify CIPCTLDF( TYPE = UPPER ), a one-sided $100(1-\alpha)\%$ upper confidence bound is computed as $X_{(u)}$, where $u$ is the smallest integer that satisfies the inequality
$$Q(u-1; n, p) \ge 1 - \alpha$$
where $0 < u \le n$.
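The one-sided distribution-free bounds can be sketched in Python with an exact binomial CDF built from `math.comb`. The helper names are hypothetical, and the search for symmetric two-sided ranks is not reproduced, because its tie-breaking rules are not fully specified above.

```python
from math import comb


def binom_cdf(k, n, p):
    """Q(k; n, p): probability of at most k successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def df_lower_bound(data, p, alpha=0.05):
    """X_(l) for the largest l in 1..n with 1 - Q(l-1; n, p) >= 1 - alpha."""
    x = sorted(data)
    n = len(x)
    for l in range(n, 0, -1):
        if 1 - binom_cdf(l - 1, n, p) >= 1 - alpha:
            return x[l - 1]
    return None  # the coverage requirement cannot be met


def df_upper_bound(data, p, alpha=0.05):
    """X_(u) for the smallest u in 1..n with Q(u-1; n, p) >= 1 - alpha."""
    x = sorted(data)
    n = len(x)
    for u in range(1, n + 1):
        if binom_cdf(u - 1, n, p) >= 1 - alpha:
            return x[u - 1]
    return None
```

For the median ($p = 0.5$) of 20 observations, this gives $X_{(6)}$ as the 95% lower bound and $X_{(15)}$ as the 95% upper bound. With only three observations, no lower bound meets the 95% requirement, which illustrates the small-$n$ case mentioned above.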
Note that confidence limits for percentiles are not computed when a WEIGHT statement is specified.
Robust Estimators
The CAPABILITY procedure provides several methods for computing robust estimates of location and scale that are relatively insensitive to outliers in the data.
Winsorized Means
The k-times Winsorized mean is a robust estimator of location which is computed as
$$\bar{x}_{wk} = \frac{1}{n}\left((k+1)x_{(k+1)} + \sum_{i=k+2}^{n-k-1} x_{(i)} + (k+1)x_{(n-k)}\right)$$
where $n$ is the number of observations, and $x_{(i)}$ is the $i$th order statistic when the observations are arranged in increasing order:
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$$
The Winsorized mean is the mean computed after replacing the k smallest observations with the (k + 1)st
smallest observation, and the k largest observations with the (k + 1)st largest observation.
For data from a symmetric distribution, the Winsorized mean is an unbiased estimate of the population mean.
However, the Winsorized mean does not have a normal distribution even if the data are normally distributed.
The Winsorized sum of squared deviations is defined as
$$s_{wk}^2 = (k+1)\left(x_{(k+1)} - \bar{x}_{wk}\right)^2 + \sum_{i=k+2}^{n-k-1}\left(x_{(i)} - \bar{x}_{wk}\right)^2 + (k+1)\left(x_{(n-k)} - \bar{x}_{wk}\right)^2$$
When the data are from a symmetric distribution, the distribution of $t_{wk}$ is approximated by a Student’s $t$ distribution with $n - 2k - 1$ degrees of freedom. Refer to Tukey and McLaughlin (1963) and Dixon and Tukey (1968).
A $100(1-\alpha)\%$ Winsorized confidence interval for the mean has upper and lower limits
$$\bar{x}_{wk} \pm t_{1-\alpha/2}\,\text{STDERR}(\bar{x}_{wk})$$
where $t_{1-\alpha/2}$ is the $(1-\alpha/2)100$th percentile of the Student’s $t$ distribution with $n - 2k - 1$ degrees of freedom.
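A minimal Python sketch of the $k$-times Winsorized mean and the Winsorized sum of squared deviations (the function name is hypothetical). It builds the Winsorized sample explicitly, which is algebraically the same as the weighted sums in the formulas above.

```python
def winsorized_mean_ss(data, k):
    """Return the k-times Winsorized mean and Winsorized sum of squared deviations."""
    x = sorted(data)
    n = len(x)
    if n < 2 * k + 1:
        raise ValueError("need at least 2k + 1 observations")
    # Replace the k smallest values with x_(k+1) and the k largest with x_(n-k).
    w = [x[k]] * k + x[k:n - k] + [x[n - k - 1]] * k
    mean = sum(w) / n
    ss = sum((v - mean) ** 2 for v in w)
    return mean, ss
```

For the sample 1, 2, 3, 4, 100 with $k = 1$, the Winsorized sample is 2, 2, 3, 4, 4. This gives $\bar{x}_{w1} = 3$ and $s_{w1}^2 = 4$, so the outlier no longer dominates the estimate.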
Trimmed Means
The k-times trimmed mean is a robust estimator of location which is computed as
$$\bar{x}_{tk} = \frac{1}{n-2k}\sum_{i=k+1}^{n-k} x_{(i)}$$
where $n$ is the number of observations, and $x_{(i)}$ is the $i$th order statistic when the observations are arranged in increasing order:
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$$
The trimmed mean is the mean computed after the k smallest observations and the k largest observations in
the sample are deleted.
For data from a symmetric distribution, the trimmed mean is an unbiased estimate of the population mean.
However, the trimmed mean does not have a normal distribution even if the data are normally distributed.
A robust estimate of the variance of the trimmed mean $\bar{x}_{tk}$ can be obtained from the Winsorized sum of squared deviations; refer to Tukey and McLaughlin (1963). The corresponding trimmed $t$ test of the hypothesized mean $\mu_0$ is given by
$$t_{tk} = \frac{\bar{x}_{tk} - \mu_0}{\text{STDERR}(\bar{x}_{tk})}$$
where the standard error of the trimmed mean is
$$\text{STDERR}(\bar{x}_{tk}) = \frac{s_{wk}}{\sqrt{(n-2k)(n-2k-1)}}$$
and $s_{wk}$ is the square root of the Winsorized sum of squared deviations.
When the data are from a symmetric distribution, the distribution of $t_{tk}$ is approximated by a Student’s $t$ distribution with $n - 2k - 1$ degrees of freedom. Refer to Tukey and McLaughlin (1963) and Dixon and Tukey (1968).
A $100(1-\alpha)\%$ trimmed confidence interval for the mean has upper and lower limits
$$\bar{x}_{tk} \pm t_{1-\alpha/2}\,\text{STDERR}(\bar{x}_{tk})$$
where $t_{1-\alpha/2}$ is the $(1-\alpha/2)100$th percentile of the Student’s $t$ distribution with $n - 2k - 1$ degrees of freedom.
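The trimmed-mean calculation can be sketched in Python as follows. The function name and the default hypothesized mean `mu0=0.0` are assumptions of this illustration. Student's $t$ quantiles for the confidence interval are omitted because the Python standard library does not provide them.

```python
import math


def trimmed_mean_t(data, k, mu0=0.0):
    """Return (trimmed mean, STDERR of the trimmed mean, trimmed t statistic)."""
    x = sorted(data)
    n = len(x)
    if n - 2 * k - 1 < 1:
        raise ValueError("need n - 2k - 1 >= 1")
    tmean = sum(x[k:n - k]) / (n - 2 * k)
    # Winsorized sum of squared deviations, taken about the Winsorized mean
    w = [x[k]] * k + x[k:n - k] + [x[n - k - 1]] * k
    wmean = sum(w) / n
    s_wk = math.sqrt(sum((v - wmean) ** 2 for v in w))
    stderr = s_wk / math.sqrt((n - 2 * k) * (n - 2 * k - 1))
    return tmean, stderr, (tmean - mu0) / stderr
```

For 1, 2, 3, 4, 100 with $k = 1$, the trimmed mean is 3, $s_{w1} = 2$, and $\text{STDERR} = 2/\sqrt{6}$.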
$$\text{MAD} = \text{med}_i\left(\left|x_i - \text{med}_j(x_j)\right|\right)$$
where the inner median, $\text{med}_j(x_j)$, is the median of the $n$ observations, and the outer median (taken over $i$) is the median of the $n$ absolute values of the deviations about the inner median. For a normal population, $1.4826\,\text{MAD}$ is an estimator of $\sigma$.
The MAD has low efficiency for normal distributions, and it may not always be appropriate for symmetric distributions. Rousseeuw and Croux (1993) proposed two statistics as alternatives to the MAD. The first is
$$S_n = 1.1926\,\text{med}_i\left(\text{med}_j\,|x_i - x_j|\right)$$
where the outer median (taken over $i$) is the median of the $n$ medians of $|x_i - x_j|$, $j = 1, 2, \ldots, n$. To reduce small-sample bias, $c_{sn}S_n$ is used to estimate $\sigma$, where $c_{sn}$ is a correction factor; refer to Croux and Rousseeuw (1992).
The second statistic is
$$Q_n = 2.2219\,\{|x_i - x_j|;\ i < j\}_{(k)}$$
where
$$k = \binom{h}{2}$$
and $h = \lfloor n/2 \rfloor + 1$. In other words, $Q_n$ is 2.2219 times the $k$th order statistic of the $\binom{n}{2}$ distances between the data points. The bias-corrected statistic $c_{qn}Q_n$ is used to estimate $\sigma$, where $c_{qn}$ is a correction factor; refer to Croux and Rousseeuw (1992).
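A minimal Python sketch of the MAD, $S_n$, and $Q_n$ scale statistics, without the small-sample factors $c_{sn}$ and $c_{qn}$ (the function names are hypothetical). The sketch uses `statistics.median`, which averages the two middle values for an even count. Rousseeuw and Croux define $S_n$ with low and high medians, so results can differ slightly for even $n$.

```python
import math
from statistics import median


def mad(x):
    """Median absolute deviation about the median."""
    m = median(x)
    return median([abs(v - m) for v in x])


def s_n(x):
    """S_n without the correction factor c_sn."""
    return 1.1926 * median([median([abs(a - b) for b in x]) for a in x])


def q_n(x):
    """Q_n without the correction factor c_qn (requires n >= 2)."""
    n = len(x)
    h = n // 2 + 1
    k = math.comb(h, 2)
    dists = sorted(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return 2.2219 * dists[k - 1]
```

For the data 1, 2, 3, 4, 5, all three underlying medians or order statistics equal 1, so MAD = 1, $S_n = 1.1926$, and $Q_n = 2.2219$.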
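are computed. The control chart analysis yields estimates for the process mean $\mu$ and standard deviation $\sigma$, which are based on subgrouped data and can be used to estimate $C_{pk}$. In particular, $\sigma$ can be estimated by
$$s_R = \bar{R}/d_2$$
where $\bar{R}$ is the average of the subgroup ranges and $d_2$ is an unbiasing constant that depends on the subgroup size.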
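To show the difference between $s_R$ and $s$ concretely, here is a small Python sketch (`s_r` is a hypothetical helper). The $d_2$ constants for subgroup sizes 2 through 5 (1.128, 1.693, 2.059, 2.326) are the standard tabulated values, and larger subgroup sizes are omitted for brevity.

```python
from statistics import stdev

# Tabulated d2 constants for subgroup sizes 2-5 (larger sizes omitted here).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def s_r(subgroups):
    """Estimate sigma as R-bar / d2 from equal-size subgroups."""
    sizes = {len(g) for g in subgroups}
    if len(sizes) != 1 or next(iter(sizes)) not in D2:
        raise ValueError("subgroups must share one size between 2 and 5")
    rbar = sum(max(g) - min(g) for g in subgroups) / len(subgroups)
    return rbar / D2[next(iter(sizes))]


subgroups = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]]
pooled = [v for g in subgroups for v in g]
print(s_r(subgroups), stdev(pooled))
```

Here, $s_R = 4/2.326 \approx 1.72$, while $s$ computed from the pooled observations is about 1.58. This illustrates that the two estimators generally differ on a given sample.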
You can use the SHEWHART procedure to carry out the control chart analysis and to compute capability indices based on $s_R$. On the other hand, the CAPABILITY procedure computes indices based on $s$.
Some industry manuals distinguish these two approaches. For instance, the ASQC/AIAG manual Fundamental Process Control uses the notation $C_{pk}$ for the estimate based on $s_R$, and it uses the notation $P_{pk}$ for the estimate based on $s$. However, assuming that the process is in control and only common cause variation is present, both $s_R$ and $s$ are estimates of the same parameter $\sigma$, and so there is fundamentally no difference between the two approaches.²
Once control has been established, attention should focus on the distribution of the process measurements,
and at this point there is no practical or statistical advantage to working with subgrouped measurements.
In fact, the use of s is closely associated with a wide variety of methods that are highly useful for process
capability analysis, including tests for normality, graphical displays such as histograms and probability plots,
and confidence intervals for parameters and capability indices.
The Index Cp
The process capability index Cp , sometimes called the “process potential index,” the “process capability
ratio,” or the “inherent capability index,” is estimated as
$$\hat{C}_p = \frac{\text{USL} - \text{LSL}}{6s}$$
² Statistically, $s$ is a more efficient estimator of $\sigma$ than $s_R$.
where USL is the upper specification limit, LSL is the lower specification limit, and s is the sample standard
deviation. If you do not specify both the upper and the lower specification limits in the SPEC statement or
the SPEC= data set, then Cp is assigned a missing value.
The interpretation of Cp can depend on the application, on past experience, and on local practice. However,
broad guidelines for interpretation have been proposed by several authors. Ekvall and Juran (1974) classify
Cp values as
1.50 for new processes or for existing processes when the variable is critical (for example, related to
safety or strength)
Exact $100(1-\alpha)\%$ lower and upper confidence limits for $C_p$ (denoted by LCL and UCL) are computed using percentiles of the chi-square distribution, as indicated by the following equations:
$$\text{lower limit} = \hat{C}_p\sqrt{\chi^2_{\alpha/2,\,n-1}/(n-1)}$$
$$\text{upper limit} = \hat{C}_p\sqrt{\chi^2_{1-\alpha/2,\,n-1}/(n-1)}$$
Here, $\chi^2_{\alpha,\nu}$ denotes the lower $100\alpha$th percentile of the chi-square distribution with $\nu$ degrees of freedom. Refer to Chou, Owen, and Borrego (1990) and Kushler and Hurley (1992).
You can specify $\alpha$ with the ALPHA= option in the PROC CAPABILITY statement or with the CIINDICES(ALPHA=value) option in the PROC CAPABILITY statement. The default value is 0.05. You can save these limits in the OUT= data set by specifying the keywords CPLCL and CPUCL in the OUTPUT statement. In addition, you can display these limits on plots produced by the CAPABILITY procedure by specifying these keywords in the INSET statement.
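The $C_p$ estimate and its chi-square limits can be sketched in Python as follows. The function name is hypothetical, and the caller supplies the chi-square quantiles because the standard library has no chi-square inverse. With $n - 1 = 2$ degrees of freedom, the quantile has the closed form $-2\ln(1-q)$, which the example uses.

```python
import math
from statistics import stdev


def cp_with_limits(data, lsl, usl, chi2_lo, chi2_hi):
    """Return (Cp-hat, lower limit, upper limit).

    chi2_lo and chi2_hi are the alpha/2 and 1 - alpha/2 quantiles of the
    chi-square distribution with n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(data)
    cp = (usl - lsl) / (6 * stdev(data))
    return cp, cp * math.sqrt(chi2_lo / (n - 1)), cp * math.sqrt(chi2_hi / (n - 1))


# With n - 1 = 2 degrees of freedom, the chi-square quantile is -2*ln(1 - q).
alpha = 0.05
lo_q = -2 * math.log(1 - alpha / 2)
hi_q = -2 * math.log(alpha / 2)
print(cp_with_limits([11.99, 12.00, 12.01], 11.95, 12.05, lo_q, hi_q))
```

For general $n$, if SciPy is available, `scipy.stats.chi2.ppf(alpha / 2, n - 1)` and `scipy.stats.chi2.ppf(1 - alpha / 2, n - 1)` can supply the two quantiles.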
$$\widehat{\text{CPL}} = \frac{\bar{X} - \text{LSL}}{3s}$$
where XN is the sample mean, LSL is the lower specification limit, and s is the sample standard deviation. If
you do not specify the lower specification limit in the SPEC statement or the SPEC= data set, then CPL is
assigned a missing value.
Montgomery (1996) refers to CPL as the “process capability ratio” in the case of one-sided lower specifications and recommends minimum values as follows:
Exact $100(1-\alpha)\%$ lower and upper confidence limits for CPL are computed using a generalization of the method of Chou, Owen, and Borrego (1990), who point out that the $100(1-\alpha)\%$ lower confidence limit for CPL (denoted by CPLLCL) satisfies the equation
$$\Pr\left\{T_{n-1}\left(\delta = 3\sqrt{n}\,\text{CPLLCL}\right) \le 3\,\widehat{\text{CPL}}\sqrt{n}\right\} = 1 - \alpha$$
where $T_{n-1}(\delta)$ has a noncentral $t$ distribution with $n - 1$ degrees of freedom and noncentrality parameter $\delta$.
You can specify ˛ with the ALPHA= option in the PROC CAPABILITY statement. The default value is 0.05.
The confidence limits can be saved in an output data set by specifying the keywords CPLLCL and CPLUCL
in the OUTPUT statement. In addition, you can display these limits on plots produced by the CAPABILITY
procedure by specifying these keywords in the INSET statement.
$$\widehat{\text{CPU}} = \frac{\text{USL} - \bar{X}}{3s}$$
where USL is the upper specification limit, XN is the sample mean, and s is the sample standard deviation. If
you do not specify the upper specification limit in the SPEC statement or the SPEC= data set, then CPU is
assigned a missing value.
Montgomery (1996) refers to CPU as the “process capability ratio” in the case of one-sided upper specifications and recommends minimum values that are the same as those specified previously for CPL.
Exact $100(1-\alpha)\%$ lower and upper confidence limits for CPU are computed using a generalization of the method of Chou, Owen, and Borrego (1990), who point out that the $100(1-\alpha)\%$ lower confidence limit for CPU (denoted by CPULCL) satisfies the equation
$$\Pr\left\{T_{n-1}\left(\delta = 3\sqrt{n}\,\text{CPULCL}\right) \le 3\,\widehat{\text{CPU}}\sqrt{n}\right\} = 1 - \alpha$$
where $T_{n-1}(\delta)$ has a noncentral $t$ distribution with $n - 1$ degrees of freedom and noncentrality parameter $\delta$.
You can specify ˛ with the ALPHA= option in the PROC CAPABILITY statement. The default value is 0.05.
The confidence limits can be saved in an output data set by specifying the keywords CPULCL and CPUUCL
in the OUTPUT statement. In addition, you can display these limits on plots produced by the CAPABILITY
procedure by specifying these keywords in the INSET statement.
$$\hat{C}_{pk} = \frac{\min(\text{USL} - \bar{X},\ \bar{X} - \text{LSL})}{3s} = \min(\widehat{\text{CPU}}, \widehat{\text{CPL}})$$
where USL is the upper specification limit, LSL is the lower specification limit, XN is the sample mean, and s
is the sample standard deviation.
If you specify only the upper limit in the SPEC statement or the SPEC= data set, then Cpk is computed as
CPU , and if you specify only the lower limit in the SPEC statement or the SPEC= data set, then Cpk is
computed as CPL.
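A minimal Python sketch of the CPL, CPU, and $C_{pk}$ point estimates, including the single-limit cases just described (the function name is hypothetical). A missing limit is represented by `None`, mirroring the missing value that the procedure assigns.

```python
from statistics import mean, stdev


def cpl_cpu_cpk(data, lsl=None, usl=None):
    """Return (CPL, CPU, Cpk); an index that needs a missing limit is None."""
    xbar, s = mean(data), stdev(data)
    cpl = (xbar - lsl) / (3 * s) if lsl is not None else None
    cpu = (usl - xbar) / (3 * s) if usl is not None else None
    available = [v for v in (cpl, cpu) if v is not None]
    cpk = min(available) if available else None
    return cpl, cpu, cpk
```

For the values 12.00, 12.01, 12.02 with LSL = 11.95 and USL = 12.05, $\bar{X} = 12.01$ and $s = 0.01$. This gives CPL = 2 and CPU = 4/3, so $C_{pk}$ equals CPU because the mean sits closer to the upper limit.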
Bissell (1990) derived approximate two-sided 95% confidence limits for $C_{pk}$ by assuming that the distribution of $\hat{C}_{pk}$ is normal. Using Bissell’s approach, $100(1-\alpha)\%$ lower and upper confidence limits can be computed as
$$\text{lower limit} = \hat{C}_{pk}\left[1 - \Phi^{-1}(1-\alpha/2)\sqrt{\frac{1}{9n\hat{C}_{pk}^2} + \frac{1}{2(n-1)}}\right]$$
$$\text{upper limit} = \hat{C}_{pk}\left[1 + \Phi^{-1}(1-\alpha/2)\sqrt{\frac{1}{9n\hat{C}_{pk}^2} + \frac{1}{2(n-1)}}\right]$$
where $\Phi$ denotes the cumulative standard normal distribution function. Kushler and Hurley (1992) concluded that Bissell’s method gives reasonably accurate results.
You can specify ˛ with the ALPHA= option in the PROC CAPABILITY statement. The default value is
0.05. These limits can be saved in an output data set by specifying the keywords CPKLCL and CPKUCL in
the OUTPUT statement. In addition, you can display these limits on plots produced by the CAPABILITY
procedure by specifying these same keywords in the INSET statement.
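Bissell's limits only need the standard normal quantile, which Python's `statistics.NormalDist().inv_cdf` provides. The following sketch uses a hypothetical function name and takes $\hat{C}_{pk}$ and $n$ as inputs.

```python
import math
from statistics import NormalDist


def bissell_limits(cpk_hat, n, alpha=0.05):
    """Approximate two-sided 100(1 - alpha)% limits for Cpk (Bissell's method)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(1 / (9 * n * cpk_hat ** 2) + 1 / (2 * (n - 1)))
    return cpk_hat * (1 - half_width), cpk_hat * (1 + half_width)
```

With $\hat{C}_{pk} = 1.0$ and $n = 50$, the default 95% limits are roughly 0.78 and 1.22. The limits are symmetric about the estimate because the method assumes $\hat{C}_{pk}$ is normally distributed.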
$$C_{pm} = \frac{\text{USL} - \text{LSL}}{6\sqrt{\sigma^2 + (\mu - T)^2}}$$
where $d = (\text{USL} - \text{LSL})/2$ and $m = (\text{USL} + \text{LSL})/2$. If $T = m$, then the two forms of $C_{pm}$ are equal. However, if $T \ne m$, then both indices suffer from problems of interpretation, as pointed out by Kotz and Johnson (1993), and their use should be avoided in this case.
The CAPABILITY procedure computes an estimator of $C_{pm}$ as
$$\hat{C}_{pm} = \frac{\min(\text{USL} - T,\ T - \text{LSL})}{3\sqrt{s^2 + (\bar{X} - T)^2}}$$
where $s$ is the sample standard deviation.
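If you specify only a single specification limit SL in the SPEC statement or the SPEC= data set, then $C_{pm}$ is estimated as
$$\hat{C}_{pm} = \frac{|T - \text{SL}|}{3\sqrt{s^2 + (\bar{X} - T)^2}}$$
Boyles (1991) proposed a slightly modified point estimate for $C_{pm}$ computed as
$$\tilde{C}_{pm} = \frac{(\text{USL} - \text{LSL})/2}{3\sqrt{\frac{n-1}{n}s^2 + (\bar{X} - T)^2}}$$
Boyles also suggested approximate two-sided $100(1-\alpha)\%$ confidence limits for $C_{pm}$, which are computed as
$$\text{lower limit} = \tilde{C}_{pm}\sqrt{\chi^2_{\alpha/2,\nu}/\nu}$$
$$\text{upper limit} = \tilde{C}_{pm}\sqrt{\chi^2_{1-\alpha/2,\nu}/\nu}$$
Here $\chi^2_{\alpha,\nu}$ denotes the lower $100\alpha$th percentile of the chi-square distribution with $\nu$ degrees of freedom, where $\nu$ equals
$$\nu = \frac{n\left(1 + \left(\frac{\bar{X} - T}{s}\right)^2\right)^2}{1 + 2\left(\frac{\bar{X} - T}{s}\right)^2}$$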
You can specify ˛ with the ALPHA= option in the PROC CAPABILITY statement. The default value is
0.05. These confidence limits can be saved in an output data set by specifying the keywords CPMLCL and
CPMUCL in the OUTPUT statement. In addition, you can display these limits on plots produced by the
CAPABILITY procedure by specifying these keywords in the INSET statement.
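The following Python sketch computes $\hat{C}_{pm}$, Boyles' $\tilde{C}_{pm}$, and the degrees of freedom $\nu$ from the formulas above (the function name is hypothetical). The confidence limits then follow by multiplying $\tilde{C}_{pm}$ by $\sqrt{\chi^2/\nu}$ with chi-square quantiles that the caller supplies.

```python
import math
from statistics import mean, stdev


def cpm_estimates(data, lsl, usl, target):
    """Return (Cpm-hat, Boyles' Cpm-tilde, Boyles' degrees of freedom nu)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    dev2 = (xbar - target) ** 2
    cpm_hat = min(usl - target, target - lsl) / (3 * math.sqrt(s ** 2 + dev2))
    cpm_tilde = ((usl - lsl) / 2) / (3 * math.sqrt((n - 1) / n * s ** 2 + dev2))
    ratio = dev2 / s ** 2                      # ((xbar - T) / s) ** 2
    nu = n * (1 + ratio) ** 2 / (1 + 2 * ratio)
    return cpm_hat, cpm_tilde, nu
```

For 12.00, 12.01, 12.02 with $T = 12$, the mean sits exactly one standard deviation off target. This gives $\hat{C}_{pm} \approx 1.179$, $\tilde{C}_{pm} \approx 1.291$, and $\nu = 4$.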
The Index k
The process capability index $k$ (also denoted by $K$) is computed as
$$k = \frac{2|m - \bar{X}|}{\text{USL} - \text{LSL}}$$
where $m = \frac{1}{2}(\text{USL} + \text{LSL})$ is the midpoint of the specification limits, $\bar{X}$ is the sample mean, USL is the upper specification limit, and LSL is the lower specification limit.
The formula for $k$ used here is given by Kane (1986). Note that $k$ is sometimes computed without taking the absolute value of $m - \bar{X}$ in the numerator. See Wadsworth, Stephens, and Godfrey (1986).
If you do not specify the upper and lower limits in the SPEC statement or the SPEC= data set, then k is
assigned a missing value.
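As a one-line illustration in Python (hypothetical function name), $k$ measures how far the sample mean sits from the specification midpoint, relative to half the specification width.

```python
from statistics import mean


def k_index(data, lsl, usl):
    """Kane's k = 2|m - xbar| / (USL - LSL), with m the specification midpoint."""
    m = (usl + lsl) / 2
    return 2 * abs(m - mean(data)) / (usl - lsl)
```

For a mean of 12.01 with limits 11.95 and 12.05, $k = 2(0.01)/0.1 = 0.2$. In other words, the mean is off center by 20% of the half-width.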
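Boyles’ Index $C^{+}_{pm}$
Boyles (1992) proposed the process capability index $C^{+}_{pm}$, which is defined as
$$C^{+}_{pm} = \frac{1}{3}\left[\frac{E_{X<T}\left[(X-T)^2\right]}{(T-\text{LSL})^2} + \frac{E_{X>T}\left[(X-T)^2\right]}{(\text{USL}-T)^2}\right]^{-1/2}$$
He proposed this index as a modification of $C_{pm}$ for use when $\mu \ne T$. The quantities
$$E_{X<T}\left[(X-T)^2\right] = E\left[(X-T)^2 \mid X < T\right]\Pr[X < T]$$
and
$$E_{X>T}\left[(X-T)^2\right] = E\left[(X-T)^2 \mid X > T\right]\Pr[X > T]$$
are referred to as semivariances. Kotz and Johnson (1993) point out that if $T = (\text{LSL} + \text{USL})/2$, then $C^{+}_{pm} = C_{pm}$.
Kotz and Johnson (1993) suggest that a natural estimator for $C^{+}_{pm}$ is
$$\hat{C}^{+}_{pm} = \frac{1}{3}\left[\frac{1}{n}\left\{\frac{\sum_{X_i<T}(X_i-T)^2}{(T-\text{LSL})^2} + \frac{\sum_{X_i>T}(X_i-T)^2}{(\text{USL}-T)^2}\right\}\right]^{-1/2}$$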
Note that this index is not defined when either of the specification limits is equal to the target T. Refer to
Section 3.5 of Kotz and Johnson (1993) for further details.
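A minimal Python sketch of the Kotz and Johnson estimator $\hat{C}^{+}_{pm}$ (hypothetical function name). Observations below and above the target contribute to separate semivariance sums, each scaled by its own distance to the target.

```python
def cpm_plus_hat(data, lsl, usl, target):
    """Kotz-Johnson estimator of Boyles' C+pm (undefined if T equals LSL or USL)."""
    if target in (lsl, usl):
        raise ValueError("index is not defined when a specification limit equals T")
    n = len(data)
    below = sum((x - target) ** 2 for x in data if x < target)
    above = sum((x - target) ** 2 for x in data if x > target)
    inner = (below / (target - lsl) ** 2 + above / (usl - target) ** 2) / n
    return inner ** -0.5 / 3
```

For the observations 11.98 and 12.02 with $T = 12$ and limits 11.95 and 12.05, each semivariance term is $0.0004/0.0025 = 0.16$. The bracket then equals 0.16, and $\hat{C}^{+}_{pm} = 2.5/3 \approx 0.833$.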
For further details, refer to Section 4.4 of Kotz and Johnson (1993).
where $\xi = (\mu - T)/\sigma$. The motivation for this definition is that if $|\xi|$ is small, then
$$C_{pm} \approx \left(1 - \frac{1}{2}\xi^2\right)C_p$$
A natural estimator of $C_{pm}(a)$ is
$$\hat{C}_{pm}(a) = \frac{d}{3s}\left\{1 - a\left(\frac{\bar{X} - T}{s}\right)^2\right\}$$
where $d = (\text{USL} - \text{LSL})/2$. You can specify the value of $a$ with the SPECIALINDICES(CPMA=) option in the PROC CAPABILITY statement. By default, $a = 0.5$.
This index is not recommended for situations in which the target $T$ is not equal to the midpoint of the specification limits.
For additional details, refer to Section 3.7 of Kotz and Johnson (1993).
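The estimator $\hat{C}_{pm}(a)$ can be sketched in Python as follows (hypothetical function name; the default `a=0.5` matches the procedure default stated above).

```python
from statistics import mean, stdev


def cpm_a_hat(data, lsl, usl, target, a=0.5):
    """Estimate Cpm(a) = d/(3s) * (1 - a*((xbar - T)/s)**2), with d = (USL - LSL)/2."""
    xbar, s = mean(data), stdev(data)
    d = (usl - lsl) / 2
    return d / (3 * s) * (1 - a * ((xbar - target) / s) ** 2)
```

With $d/(3s) = 5/3$ and the mean one standard deviation off target, the default $a = 0.5$ halves the index to about 0.833. Setting $a = 1$ drives it to 0.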
where $\theta$ is chosen so that the proportion of conforming items is robust with respect to the shape of the process distribution. In particular, Kotz and Johnson (1993) recommend use of
$$C_p(5.15) = \frac{\text{USL} - \text{LSL}}{5.15\,\sigma}$$
which is estimated as
$$C_{pk}(5.15) = \frac{d - |\mu - (\text{USL} + \text{LSL})/2|}{2.575\,\sigma}$$
$$C_{pmk} = \frac{(\text{USL} - \text{LSL})/2 - |\mu - m|}{3\sqrt{\sigma^2 + (\mu - T)^2}}$$
where $m = (\text{LSL} + \text{USL})/2$. A natural estimator for $C_{pmk}$ is
Wright’s Index Cs
Wright (1995) defines the capability index
$$C_s = \frac{\min(\text{USL} - \mu,\ \mu - \text{LSL})}{3\sqrt{\sigma^2 + (\mu - T)^2 + |\mu_3/\sigma|}}$$
where c4 is an unbiasing constant for the sample standard deviation, and b3 is a measure of skewness. Wright
(1995) shows that Cs compares favorably with Cpmk even when skewness is not present, and he advocates
the use of Cs for monitoring near-normal processes when loss of capability typically leads to asymmetry.
Chen and Kotz (1996) proposed a modification to Wright’s $C_s$ index that introduces a multiplier $\gamma > 0$ and is estimated as
$$\hat{C}_s = \frac{(\text{USL} - \text{LSL})/2 - |\bar{X} - m|}{3\sqrt{\frac{n-1}{n}s^2 + (\bar{X} - T)^2 + |\gamma c_4 s^2 b_3|}}$$
If you specify a value for $\gamma$ with the SPECIALINDICES(CSGAMMA=) option, the index $C_s$ is computed with this modification. Otherwise, it is computed using Wright’s original definition.
$$S_{jkp} = S\left(\frac{\text{USL} - T}{\sqrt{2E_{X>T}\left[(X-T)^2\right]}},\ \frac{T - \text{LSL}}{\sqrt{2E_{X<T}\left[(X-T)^2\right]}}\right)$$
$$\hat{S}_{jkp} = S\left(\frac{\text{USL} - T}{\sqrt{2\sum_{X_i>T}(X_i-T)^2/n}},\ \frac{T - \text{LSL}}{\sqrt{2\sum_{X_i<T}(X_i-T)^2/n}}\right)$$
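$$\hat{C}_{pp} = \left(\frac{\bar{X} - T}{d/3}\right)^2 + \left(\frac{s}{d/3}\right)^2$$
$$\hat{C}''_{pp} = \left(\frac{\hat{A}}{d/3}\right)^2 + \left(\frac{s}{d/3}\right)^2$$
where
$$\hat{A} = \max\left\{\frac{(\bar{X} - T)d}{T - \text{LSL}},\ \frac{(T - \bar{X})d}{\text{USL} - T}\right\}$$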
$$C_{pg} = \frac{1}{C_{pm}^2}$$
which is estimated as
$$\hat{C}_{pg} = \frac{1}{\hat{C}_{pm}^2}$$
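$$\hat{C}_{pq} = \hat{C}_p\left[1 - \frac{1}{2}\left(\frac{\bar{X} - T}{s}\right)^2\right]$$
$$C_p^{W} = \frac{C_p}{\sqrt{1 + |1 - 2P_x|}}$$
$$\hat{C}_p^{W} = \frac{\hat{C}_p}{\sqrt{1 + |1 - 2\hat{P}_x|}}$$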
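The Index $C_{pk}^{W}$
Bai and Choi (1997) also proposed the index
$$C_{pk}^{W} = \min\left\{\frac{\text{USL} - \mu}{3\sigma\sqrt{2P_x}},\ \frac{\mu - \text{LSL}}{3\sigma\sqrt{2(1 - P_x)}}\right\}$$
It is estimated by
$$\hat{C}_{pk}^{W} = \min\left\{\frac{\text{USL} - \bar{X}}{3s\sqrt{2\hat{P}_x}},\ \frac{\bar{X} - \text{LSL}}{3s\sqrt{2(1 - \hat{P}_x)}}\right\}$$
The Index $C_{pm}^{W}$
The index $C_{pm}^{W}$, also introduced by Bai and Choi (1997), is defined as
$$C_{pm}^{W} = \frac{C_{pm}}{\sqrt{1 + |1 - 2P_T|}}$$
$$\hat{C}_{pm}^{W} = \frac{\hat{C}_{pm}}{\sqrt{1 + |1 - 2\hat{P}_T|}}$$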
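$$C_{pc} = \frac{\text{USL} - \text{LSL}}{6\sqrt{\frac{\pi}{2}}\,E|X - M|}$$
$$\hat{C}_{pc} = \frac{\text{USL} - \text{LSL}}{6\sqrt{\frac{\pi}{2}}\,c}$$
where
$$c = \frac{1}{n}\sum_{i=1}^{n}|X_i - M|$$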
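A minimal Python sketch of $\hat{C}_{pc}$ (hypothetical function name). It assumes that $M$ is the midpoint of the specification limits, which this excerpt does not define explicitly. For normal data centered at $M$, $E|X - M| = \sigma\sqrt{2/\pi}$, so the $\sqrt{\pi/2}$ factor makes $C_{pc}$ comparable to $C_p$.

```python
import math


def cpc_hat(data, lsl, usl):
    """Estimate Cpc = (USL - LSL) / (6 * sqrt(pi/2) * c), c = mean |X_i - M|.

    M is taken as the midpoint of the specification limits (an assumption).
    """
    m = (usl + lsl) / 2
    c = sum(abs(x - m) for x in data) / len(data)
    return (usl - lsl) / (6 * math.sqrt(math.pi / 2) * c)
```

For the observations 11.98 and 12.02 with limits 11.95 and 12.05, $c = 0.02$ and $\hat{C}_{pc} \approx 0.665$.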
$C_p(0, 0) = C_p$
$C_p(1, 0) = C_{pk}$
$C_p(0, 1) = C_{pm}$
$C_p(1, 1) = C_{pmk}$
$C_p(u, v)$ is defined as
$$C_p(u, v) = \frac{d - u|\mu - M|}{3\sqrt{\sigma^2 + v(\mu - T)^2}}$$
and estimated by
$$\hat{C}_p(u, v) = \frac{d - u|\bar{X} - M|}{3\sqrt{\frac{n-1}{n}s^2 + v(\bar{X} - T)^2}}$$
You can specify $u$ with the SPECIALINDICES(CPU=) option and $v$ with the SPECIALINDICES(CPV=) option. By default, $u = 0$ and $v = 4$.
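The family $\hat{C}_p(u, v)$ can be sketched in Python as a single function (hypothetical name). As in the $C_{pc}$ example, $M$ is assumed to be the specification midpoint. Setting $u$ and $v$ to 0 or 1 selects the special cases listed above, up to the $(n-1)/n$ variance factor that the estimator uses.

```python
import math
from statistics import mean, stdev


def cp_uv_hat(data, lsl, usl, target, u=0.0, v=4.0):
    """Estimate Cp(u, v); M is taken as the specification midpoint (an assumption)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    d = (usl - lsl) / 2
    m = (usl + lsl) / 2
    num = d - u * abs(xbar - m)
    den = 3 * math.sqrt((n - 1) / n * s ** 2 + v * (xbar - target) ** 2)
    return num / den
```

For 12.00, 12.01, 12.02 with limits 11.95 and 12.05 and $T = 12$, the defaults ($u = 0$, $v = 4$) give about 0.772, while $u = v = 1$ gives about 1.033.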
$$\hat{C}_p(v) = \frac{d - |\bar{X} - M|}{3\sqrt{\frac{n-1}{n}s^2 + v(\bar{X} - T)^2}}$$
Missing Values
If a variable for which statistics are calculated has a missing value, that value is ignored in the calculation of
statistics, and the missing values are tabulated separately. A missing value for one such variable does not
affect the treatment of other variables in the same observation.
If the WEIGHT variable has a missing value, the observation is excluded from the analysis. If the FREQ
variable has a missing value, the observation is excluded from the analysis. If a variable in a BY or ID
statement has a missing value, the procedure treats it as it would treat any other value of a BY or ID variable.
ODS Tables
This section describes the ODS tables produced by the CAPABILITY procedure.
Table 6.8 summarizes the ODS tables that you can request with options in the PROC CAPABILITY statement.
Table 6.9 summarizes the ODS tables related to capability indices that you can request with options in the
PROC CAPABILITY statement when you provide specification limits with a SPEC statement or with a
SPEC= data set.
Table 6.10 summarizes the ODS tables related to fitted distributions that you can request with options in the
HISTOGRAM statement.
Table 6.11 summarizes the ODS tables that you can request with options in the INTERVALS statement.
The following DATA step creates a data set named Limits containing specification limits for the fluid weight
and the can weight. Limits has 4 variables (_VAR_, _LSL_, _USL_, and _TARGET_) and 2 observations. The
first observation contains the specification limit information for the variable Weight, and the second contains
the specification limit information for the variable Cweight.
data Limits;
length _var_ $8;
_var_ = 'Weight';
_lsl_ = 11.95;
_target_ = 12;
_usl_ = 12.05;
output;
_var_ = 'Cweight';
_lsl_ = 0.90;
_target_ = 1;
_usl_ = 1.10;
output;
run;
The following statements read the specification information from the Limits data set into the CAPABILITY
procedure by using the SPEC= option. These statements print summary statistics, capability indices, and
specification limit information for Weight and Cweight. Figure 6.1 and Figure 6.2 display the output for
Weight. Output 6.1.2 displays the output for Cweight.
Moments
N                        100    Sum Weights                100
Mean                   1.004    Sum Observations         100.4
Std Deviation     0.06330941    Variance            0.00400808
Skewness           -0.074821    Kurtosis            -0.5433858
Uncorrected SS      101.1984    Corrected SS            0.3968
Coeff Variation   6.30571767    Std Error Mean      0.00633094
Quantiles (Definition 5)
Level          Quantile
100% Max          1.150
99%               1.140
95%               1.105
90%               1.080
75% Q3            1.045
50% Median        1.000
25% Q1            0.960
10%               0.910
5%                0.900
1%                0.870
0% Min            0.860
Extreme Observations
-----Lowest-----    -----Highest-----
Value      Obs      Value       Obs
0.86         2      1.11         42
0.88        89      1.12         28
0.88        64      1.12         34
0.90        68      1.13         48
0.90        59      1.15         52
Specification Limits
--------Limit--------    -------Percent-------
Lower (LSL)  0.900000    % < LSL       3.00000
Target       1.000000    % Between    92.00000
Upper (USL)  1.100000    % > USL       5.00000
data Amps;
label Decibels = 'Amplification in Decibels (dB)';
input Decibels @@;
datalines;
4.54 4.87 4.66 4.90 4.68 5.22 4.43 5.14 3.07 4.22
5.09 3.41 5.75 5.16 3.96 5.37 5.70 4.11 4.83 4.51
4.57 4.16 5.73 3.64 5.48 4.95 4.57 4.46 4.75 5.38
5.19 4.35 4.98 4.87 3.53 4.46 4.57 4.69 5.27 4.67
5.03 4.50 5.35 4.55 4.05 6.63 5.32 5.24 5.73 5.08
5.07 5.42 5.05 5.70 4.79 4.34 5.06 4.64 4.82 3.24
4.79 4.46 3.84 5.05 5.46 4.64 6.13 4.31 4.81 4.98
4.95 5.57 4.11 4.15 5.95
;
The SPEC statement provides several options to control the appearance of reference lines for the specification
limits and the target value. The following statements use the data set Amps to create a histogram that
demonstrates some of these options:
be displayed using the INSET statement (as shown in Output 6.3.1) or saved in an output data set by using
the OUTPUT statement. For formulas and details about capability indices, see the section “Specialized
Capability Indices” on page 250. For more information about the INSET statement, see “INSET Statement:
CAPABILITY Procedure” on page 398.
The following statements can be used to produce a table of process capability indices including the index
Cpk :
where $N$ is the number of nonmissing observations. The cdf is an increasing step function that has a vertical jump of $\frac{1}{N}$ at each value of $x$ equal to an observed value. The cdf is also referred to as the empirical cumulative distribution function (ecdf).
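The empirical cdf plotted by CDFPLOT can be sketched in Python with a binary search over the sorted observations (`ecdf` is a hypothetical helper). Tied observations produce a combined jump, here $2/N$ at the repeated value 2.

```python
from bisect import bisect_right


def ecdf(data):
    """Return F_N(x) = (number of observations <= x) / N."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


F = ecdf([3, 1, 2, 2])
print([F(t) for t in (0, 1, 1.5, 2, 3)])  # [0.0, 0.25, 0.25, 0.75, 1.0]
```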
You can use options in the CDFPLOT statement to do the following:
You can also create a comparative cdf plot by using the CDFPLOT statement in conjunction with a CLASS
statement.
You have three alternatives for producing cdf plots with the CDFPLOT statement:
ODS Graphics output is produced if ODS Graphics is enabled, for example by specifying the ODS
GRAPHICS ON statement prior to the PROC statement.
Legacy line printer charts are produced when you specify the LINEPRINTER option in the PROC
statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
data Cord;
label Strength="Breaking Strength (psi)";
input Strength @@;
datalines;
6.94 6.97 7.11 6.95 7.12 6.70 7.13 7.34 6.90 6.83
7.06 6.89 7.28 6.93 7.05 7.00 7.04 7.21 7.08 7.01
7.05 7.11 7.03 6.98 7.04 7.08 6.87 6.81 7.11 6.74
6.95 7.05 6.98 6.94 7.06 7.12 7.19 7.12 7.01 6.84
6.91 6.89 7.23 6.98 6.93 6.83 6.99 7.00 6.97 7.01
;
variables
specify variables for which to create cdf plots. If you specify a VAR statement, the variables must also
be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the input
data set. If you do not specify variables in a CDFPLOT statement, then a cdf plot is created for each
variable listed in the VAR statement, or for each numeric variable in the input data set if you do not use
a VAR statement.
For example, suppose a data set named steel contains exactly three numeric variables: length, width, and height. The following statements create a cdf plot for each of the three variables:
The following statements create a cdf plot for length and a cdf plot for width:
By default, the horizontal axis of a cdf plot is labeled with the variable name. If you specify a label for
a variable, however, the label is used. The default vertical axis label is Cumulative Percent, and the
axis is scaled in percent of observations.
If you specify a SPEC statement or a SPEC= data set in addition to the CDFPLOT statement, then the
specification limits for each variable are displayed as reference lines and are identified in a legend.
options
add features to plots. All options appear after the slash (/) in the CDFPLOT statement. In the following
example, the NORMAL option superimposes a normal cdf on the plot, and the CTEXT= option
specifies the color of the text.
Summary of Options
The following tables list all options by function. The section “Dictionary of Options” on page 274 describes
each option in detail.
Distribution Options
You can use the options listed in Table 6.12 to superimpose a fitted theoretical distribution function on your
cdf plot.
Option Description
BETA(beta-options) Plots beta distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameters $\alpha$ and $\beta$
EXPONENTIAL(exponential-options) Plots exponential distribution with threshold parameter $\theta$ and scale parameter $\sigma$
GAMMA(gamma-options) Plots gamma distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$
GUMBEL(Gumbel-options) Plots Gumbel distribution with location parameter $\mu$ and scale parameter $\sigma$
IGAUSS(iGauss-options) Plots inverse Gaussian distribution with mean $\mu$ and shape parameter $\lambda$
LOGNORMAL(lognormal-options) Plots lognormal distribution with threshold parameter $\theta$, scale parameter $\zeta$, and shape parameter $\sigma$
NORMAL(normal-options) Plots normal distribution with mean $\mu$ and standard deviation $\sigma$
PARETO(Pareto-options) Plots generalized Pareto distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$
POWER(power-options) Plots power function distribution with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $\alpha$
RAYLEIGH(Rayleigh-options) Plots Rayleigh distribution with threshold parameter $\theta$ and scale parameter $\sigma$
WEIBULL(Weibull-options) Plots Weibull distribution function with threshold parameter $\theta$, scale parameter $\sigma$, and shape parameter $c$
Table 6.13 summarizes options that specify distribution parameters and control the display of the theoretical
distribution curve. You can specify these options in parentheses after the distribution option. For example,
the following statements use the NORMAL option to superimpose a normal distribution:
proc capability;
cdfplot / normal(mu=10 sigma=0.5 color=red);
run;
The COLOR= option specifies the color for the curve, and the normal-options MU= and SIGMA= specify the parameters $\mu = 10$ and $\sigma = 0.5$ for the distribution function. If you do not specify these parameters, maximum likelihood estimates are computed.
Option Description
Options Used with All Distributions
COLOR= Specifies color of theoretical distribution function
L= Specifies line type of theoretical distribution function
SYMBOL= Specifies character used to plot theoretical distribution function on
line printer plots
W= Specifies width of theoretical distribution function
Beta-Options
ALPHA= Specifies first shape parameter $\alpha$ for beta distribution function
BETA= Specifies second shape parameter $\beta$ for beta distribution function
SIGMA= Specifies scale parameter $\sigma$ for beta distribution function
THETA= Specifies lower threshold parameter $\theta$ for beta distribution function
Exponential-Options
SIGMA= Specifies scale parameter $\sigma$ for exponential distribution function
THETA= Specifies threshold parameter $\theta$ for exponential distribution function
Gamma-Options
ALPHA= Specifies shape parameter $\alpha$ for gamma distribution function
ALPHADELTA= Specifies change in successive estimates of $\alpha$ at which the Newton-Raphson approximation of $\hat{\alpha}$ terminates
ALPHAINITIAL= Specifies initial value for $\alpha$ in the Newton-Raphson approximation of $\hat{\alpha}$
MAXITER= Specifies maximum number of iterations in the Newton-Raphson approximation of $\hat{\alpha}$
SIGMA= Specifies scale parameter $\sigma$ for gamma distribution function
THETA= Specifies threshold parameter $\theta$ for gamma distribution function
Gumbel-Options
MU= Specifies location parameter $\mu$ for Gumbel distribution function
SIGMA= Specifies scale parameter $\sigma$ for Gumbel distribution function
IGauss-Options
LAMBDA= Specifies shape parameter $\lambda$ for inverse Gaussian distribution function
MU= Specifies mean $\mu$ for inverse Gaussian distribution function
Lognormal-Options
SIGMA= Specifies shape parameter $\sigma$ for lognormal distribution function
THETA= Specifies threshold parameter $\theta$ for lognormal distribution function
ZETA= Specifies scale parameter $\zeta$ for lognormal distribution function
Option Description
Normal-Options
MU= Specifies mean $\mu$ for normal distribution function
SIGMA= Specifies standard deviation $\sigma$ for normal distribution function
Pareto-Options
ALPHA= Specifies shape parameter $\alpha$ for generalized Pareto distribution function
SIGMA= Specifies scale parameter $\sigma$ for generalized Pareto distribution function
THETA= Specifies threshold parameter $\theta$ for generalized Pareto distribution function
Power-Options
ALPHA= Specifies shape parameter $\alpha$ for power function distribution
SIGMA= Specifies scale parameter $\sigma$ for power function distribution
THETA= Specifies threshold parameter $\theta$ for power function distribution
Rayleigh-Options
SIGMA= Specifies scale parameter $\sigma$ for Rayleigh distribution function
THETA= Specifies threshold parameter $\theta$ for Rayleigh distribution function
Weibull-Options
C= Specifies shape parameter $c$ for Weibull distribution function
CDELTA= Specifies change in successive estimates of $c$ at which the Newton-Raphson approximation of $\hat{c}$ terminates
CINITIAL= Specifies initial value for $c$ in the Newton-Raphson approximation of $\hat{c}$
MAXITER= Specifies maximum number of iterations in the Newton-Raphson approximation of $\hat{c}$
SIGMA= Specifies scale parameter $\sigma$ for Weibull distribution function
THETA= Specifies threshold parameter $\theta$ for Weibull distribution function
General Options
Option Description
General Plot Layout Options
CONTENTS= Specifies table of contents entry for cdf plot grouping
HREF= Specifies reference lines perpendicular to the horizontal axis
HREFLABELS= Specifies labels for HREF= lines
NOCDFLEGEND Suppresses legend for superimposed theoretical cdf
NOECDF Suppresses plot of empirical (observed) distribution function
Option Description
NOFRAME Suppresses frame around plotting area
NOLEGEND Suppresses legend
NOSPECLEGEND Suppresses specifications legend
VREF= Specifies reference lines perpendicular to the vertical axis
VREFLABELS= Specifies labels for VREF= lines
VSCALE= Specifies scale for vertical axis
Graphics Options
ANNOTATE= Specifies annotate data set
CAXIS= Specifies color for axis
CFRAME= Specifies color for frame
CHREF= Specifies colors for HREF= lines
CSTATREF= Specifies colors for STATREF= lines
CTEXT= Specifies color for text
CVREF= Specifies colors for VREF= lines
DESCRIPTION= Specifies description for graphics catalog member
FONT= Specifies text font
HAXIS= Specifies AXIS statement for horizontal axis
HEIGHT= Specifies height of text used outside framed areas
HMINOR= Specifies number of horizontal axis minor tick marks
HREFLABPOS= Specifies position for HREF= line labels
INFONT= Specifies software font for text inside framed areas
INHEIGHT= Specifies height of text inside framed areas
LHREF= Specifies line styles for HREF= lines
LSTATREF= Specifies line styles for STATREF= lines
LVREF= Specifies line styles for VREF= lines
NAME= Specifies name for plot in graphics catalog
NOHLABEL Suppresses label for horizontal axis
NOVLABEL Suppresses label for vertical axis
NOVTICK Suppresses tick marks and tick mark labels for vertical axis
STATREF= Specifies reference lines at values of summary statistics
STATREFLABELS= Specifies labels for STATREF= lines
STATREFSUBCHAR= Specifies substitution character for displaying statistic values in STATREFLABELS= labels
TURNVLABELS Turns and vertically strings out characters in labels for vertical axis
VAXIS= Specifies AXIS statement for vertical axis
VAXISLABEL= Specifies label for vertical axis
VMINOR= Specifies number of vertical axis minor tick marks
VREFLABPOS= Specifies position for VREF= line labels
WAXIS= Specifies line thickness for axes and frame
Option Description
ODSTITLE= Specifies title displayed on cdf plot
ODSTITLE2= Specifies secondary title displayed on cdf plot
Dictionary of Options
The following entries provide detailed descriptions of the options specific to the CDFPLOT statement. See
“Dictionary of Common Options: CAPABILITY Procedure” on page 550 for detailed descriptions of options
common to all the plot statements.
ALPHA=value
specifies the shape parameter α for distribution functions requested with the BETA, GAMMA, PARETO, and POWER options. Enclose the ALPHA= option in parentheses after the distribution keyword. If you do not specify a value for α, the procedure calculates a maximum likelihood estimate. For examples, see the entries for the distribution options.
BETA< (beta-options) >
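displays a fitted beta distribution function on the cdf plot. The equation of the fitted cdf is
$$
F(x) = \begin{cases}
0 & \text{for } x \le \theta \\
I_{\frac{x-\theta}{\sigma}}(\alpha, \beta) & \text{for } \theta < x < \theta + \sigma \\
1 & \text{for } x \ge \theta + \sigma
\end{cases}
$$
where I_y(α, β) denotes the incomplete beta function, θ is the lower threshold parameter, and σ is the scale parameter (σ > 0), so the fitted distribution is bounded below by θ and above by θ + σ. For example, the following statements fit a beta distribution function with θ = 50 and σ = 25, that is, on the interval from 50 to 75: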
proc capability;
cdfplot / beta(theta=50 sigma=25);
run;
The beta distribution has two shape parameters, α and β. If these parameters are known, you can specify their values with the ALPHA= and BETA= beta-options. If you do not specify values for α and β, the procedure calculates maximum likelihood estimates.
The BETA option can appear only once in a CDFPLOT statement. See Table 6.13 for a list of secondary
options you can specify with the BETA distribution option.
BETA=value
B=value
specifies the second shape parameter β for beta distribution functions requested by the BETA option. Enclose the BETA= option in parentheses after the BETA keyword. If you do not specify a value for β, the procedure calculates a maximum likelihood estimate. For examples, see the preceding entry for the BETA option.
C=value
specifies the shape parameter c for Weibull distribution functions requested with the WEIBULL option.
Enclose the C= option in parentheses after the WEIBULL keyword. If you do not specify a value for c,
the procedure calculates a maximum likelihood estimate. You can specify the SHAPE= option as an
alias for the C= option.
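A minimal sketch of a Weibull fit with the shape parameter supplied instead of estimated. The values theta=15, sigma=10, and c=2 are hypothetical, and, as in the other examples in this section, no analysis variable or input data set is named:

```sas
proc capability;
   /* fixed Weibull threshold, scale, and shape (illustrative values);
      drop C= to let the procedure estimate c by maximum likelihood */
   cdfplot / weibull(theta=15 sigma=10 c=2);
run;
```

Per the alias described in this entry, shape=2 could be written in place of c=2.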
CDFSYMBOL=‘character’
specifies the character used to plot the points on legacy line printer cdf plots. The default is the plus sign
(+). This option is ignored unless you specify the LINEPRINTER option in the PROC CAPABILITY
statement. Use the SYMBOL statement to control the plotting symbol in traditional graphics output.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
The parameter θ must be less than or equal to the minimum data value. You can specify θ with the THETA= exponential-option. The default value for θ is 0. You can specify σ with the SIGMA= exponential-option. By default, a maximum likelihood estimate is computed for σ. For example, the following statements fit an exponential distribution with θ = 10 and a maximum likelihood estimate for σ:
proc capability;
cdfplot / exponential(theta=10 l=2 color=green);
run;
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
The parameter θ for the gamma distribution must be less than the minimum data value. You can specify θ with the THETA= gamma-option. The default value for θ is 0. In addition, the gamma distribution has a shape parameter α and a scale parameter σ. You can specify these parameters with the ALPHA= and SIGMA= gamma-options. By default, maximum likelihood estimates are computed for α and σ. For example, the following statements fit a gamma distribution function with θ = 4 and maximum likelihood estimates for α and σ:
proc capability;
cdfplot / gamma(theta=4);
run;
Note that the maximum likelihood estimate of α is calculated iteratively using the Newton-Raphson
approximation. The gamma-options ALPHADELTA=, ALPHAINITIAL=, and MAXITER= control
the approximation.
The GAMMA option can appear only once in a CDFPLOT statement. See Table 6.13 for a list of
secondary options you can specify with the GAMMA option.
where
μ = location parameter
σ = scale parameter (σ > 0)
You can specify known values for μ and σ with the MU= and SIGMA= Gumbel-options. By default, maximum likelihood estimates are computed for μ and σ.
The GUMBEL option can appear only once in a CDFPLOT statement. See Table 6.13 for a list of
secondary options you can specify with the GUMBEL option.
LAMBDA=value
specifies the shape parameter λ for distribution functions requested with the IGAUSS option. Enclose the LAMBDA= option in parentheses after the IGAUSS distribution keyword. If you do not specify a value for λ, the procedure calculates a maximum likelihood estimate.
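A sketch of an inverse Gaussian cdf with both parameters supplied. The values mu=5 and lambda=20 are hypothetical. If LAMBDA= is omitted, λ is estimated by maximum likelihood; if MU= is omitted, μ defaults to the sample mean (see the MU= entry):

```sas
proc capability;
   /* inverse Gaussian cdf with known mean (mu) and shape (lambda);
      both values are illustrative */
   cdfplot / igauss(mu=5 lambda=20);
run;
```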
LEGEND=name | NONE
specifies the name of a LEGEND statement describing the legend for specification limit reference lines
and superimposed distribution functions. Specifying LEGEND=NONE, which suppresses all legend
information, is equivalent to specifying the NOLEGEND option. This option is ignored unless you are
producing traditional graphics.
θ = threshold parameter
ζ = scale parameter
σ = shape parameter (σ > 0)
The parameter θ for the lognormal distribution must be less than the minimum data value. You can specify θ with the THETA= lognormal-option. The default value for θ is 0. In addition, the lognormal distribution has a shape parameter σ and a scale parameter ζ. You can specify these parameters with the SIGMA= and ZETA= lognormal-options. By default, estimates of σ and ζ are computed as described in “Lognormal Distribution” on page 353. For example, the following statements fit a lognormal distribution function with θ = 10 and estimates for σ and ζ:
proc capability;
cdfplot / lognormal(theta = 10);
run;
The LOGNORMAL option can appear only once in a CDFPLOT statement. See Table 6.13 for a list
of secondary options you can specify with the LOGNORMAL option.
MU=value
specifies the parameter μ for distribution functions requested with the GUMBEL, IGAUSS, and NORMAL options. Enclose the MU= option in parentheses after the distribution keyword. For the normal and inverse Gaussian distributions, the default value of μ is the sample mean. If you do not specify a value for μ for the Gumbel distribution, the procedure calculates a maximum likelihood
estimate.
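For the Gumbel distribution, MU= and SIGMA= together fix the location and scale. A minimal sketch with the hypothetical values mu=10 and sigma=2:

```sas
proc capability;
   /* Gumbel cdf with known location (mu) and scale (sigma);
      omit either option to have it estimated by maximum likelihood */
   cdfplot / gumbel(mu=10 sigma=2);
run;
```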
NOCDFLEGEND
suppresses the legend for the superimposed theoretical cumulative distribution function.
NOECDF
suppresses the observed distribution function (the empirical cumulative distribution function) of the
variable, which is drawn by default. This option enables you to create theoretical cdf plots without
displaying the data distribution. The NOECDF option can be used only with a theoretical distribution
(such as the NORMAL option).
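A sketch that draws only the theoretical normal cdf, reusing the parameter values from the NORMAL example in this section (μ = 14, σ = 0.05). NOECDF is a general option, so it follows the slash together with the distribution option:

```sas
proc capability;
   /* NORMAL supplies the theoretical cdf; NOECDF suppresses the
      empirical cdf that would otherwise be drawn */
   cdfplot / normal(mu=14 sigma=.05) noecdf;
run;
```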
NOLEGEND
suppresses legends for specification limits, theoretical distribution functions, and hidden observations.
Specifying the NOLEGEND option is equivalent to specifying LEGEND=NONE.
μ = mean
σ = standard deviation (σ > 0)
You can specify known values for μ and σ with the MU= and SIGMA= normal-options, as shown in
the following statements:
proc capability;
cdfplot / normal(mu=14 sigma=.05);
run;
By default, the sample mean and sample standard deviation are calculated for μ and σ. The NORMAL
option can appear only once in a CDFPLOT statement. For an example, see Output 6.4.1. See
Table 6.13 for a list of secondary options you can specify with the NORMAL option.
NOSPECLEGEND
NOSPECL
suppresses the portion of the legend for specification limit reference lines.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter
The parameter θ for the generalized Pareto distribution must be less than the minimum data value. You can specify θ with the THETA= Pareto-option. The default value for θ is 0. In addition, the generalized Pareto distribution has a shape parameter α and a scale parameter σ. You can specify these parameters with the ALPHA= and SIGMA= Pareto-options. By default, maximum likelihood estimates are computed for α and σ.
The PARETO option can appear only once in a CDFPLOT statement. See Table 6.13 for a list of
secondary options you can specify with the PARETO option.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
The parameter θ for the Rayleigh distribution must be less than the minimum data value. You can specify θ with the THETA= Rayleigh-option. The default value for θ is 0. You can specify σ with the SIGMA= Rayleigh-option. By default, a maximum likelihood estimate is computed for σ.
The RAYLEIGH option can appear only once in a CDFPLOT statement. See Table 6.13 for a list of
secondary options you can specify with the RAYLEIGH option.
SIGMA=value
specifies the parameter σ for distribution functions requested by the BETA, EXPONENTIAL,
GAMMA, GUMBEL, LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, and WEIBULL
options. Enclose the SIGMA= option in parentheses after the distribution keyword. Table 6.15
summarizes the use of the SIGMA= option.
SYMBOL=‘character’
specifies the character used to plot the theoretical distribution function on legacy line printer plots.
Enclose the SYMBOL= option in parentheses after the distribution option. The default character
is the first letter of the distribution option keyword. This option is ignored unless you specify the
LINEPRINTER option in the PROC CAPABILITY statement.
THETA=value
THRESHOLD=value
specifies the lower threshold parameter θ for theoretical cumulative distribution functions requested
with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, and
WEIBULL options. Enclose the THETA= option in parentheses after the distribution keyword. The
default value is 0.
VSCALE=PERCENT | PROPORTION
specifies the scale of the vertical axis. The value PERCENT scales the data in units of percent
of observations per data unit. The value PROPORTION scales the data in units of proportion of
observations per data unit. The default is PERCENT.
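A minimal sketch that switches the vertical axis from the default percent scale to proportions; the NORMAL option is included only so that a fitted curve appears alongside the empirical cdf:

```sas
proc capability;
   /* vertical axis in proportions (0 to 1) instead of percent */
   cdfplot / normal vscale=proportion;
run;
```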
where
θ = threshold parameter
σ = scale parameter (σ > 0)
c = shape parameter (c > 0)
The parameter θ must be less than the minimum data value. You can specify θ with the THETA= Weibull-option. The default value for θ is 0. In addition, the Weibull distribution has a shape parameter c and a scale parameter σ. You can specify these parameters with the SIGMA= and C= Weibull-options. By default, maximum likelihood estimates are computed for c and σ. For example, the following statements fit a Weibull distribution function with θ = 15 and maximum likelihood estimates for σ and c:
proc capability;
cdfplot / weibull(theta=15);
run;
Note that the maximum likelihood estimate of c is calculated iteratively using the Newton-Raphson
approximation. The Weibull-options CDELTA=, CINITIAL=, and MAXITER= control the approxima-
tion.
The WEIBULL option can appear only once in a CDFPLOT statement. See Table 6.13 for a list of
secondary options you can specify with the WEIBULL option.
ZETA=value
specifies a value for the scale parameter ζ for a lognormal distribution function requested with the LOGNORMAL option. Enclose the ZETA= option in parentheses after the LOGNORMAL keyword. If you do not specify a value for ζ, a maximum likelihood estimate is computed. You can specify the
SCALE= option as an alias for the ZETA= option.
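A sketch that fixes all three lognormal parameters. θ = 10 matches the LOGNORMAL example above, while zeta=2 and sigma=0.5 are hypothetical values. Note that for the lognormal distribution ZETA= is the scale parameter and SIGMA= is the shape parameter:

```sas
proc capability;
   /* theta = threshold, zeta = scale, sigma = shape (lognormal only) */
   cdfplot / lognormal(theta=10 zeta=2 sigma=0.5);
run;
```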
See Chapter 4, “SAS/QC Graphics,” for more information about ODS Graphics and other methods for
producing charts.
The NORMAL option requests the fitted curve. The INSET statement requests an inset containing the
mean, the standard deviation, and the percent of observations below the lower specification limit. For more
information about the INSET statement, see “INSET Statement: CAPABILITY Procedure” on page 398. The
SPEC statement requests a lower specification limit at 6.8. For more information about the SPEC statement,
see “SPEC Statement” on page 225.
The agreement between the empirical and the normal distribution functions in Output 6.4.1 is evidence that
the normal distribution is an appropriate model for the distribution of breaking strengths.
The CAPABILITY procedure provides a variety of other tools for assessing goodness of fit. Goodness-of-
fit tests (see “Printed Output” on page 362) provide a quantitative assessment of a proposed distribution.
Probability and Q-Q plots, created with the PROBPLOT (“PROBPLOT Statement: CAPABILITY Procedure”
on page 477), QQPLOT (“QQPLOT Statement: CAPABILITY Procedure” on page 508), and PPPLOT
(“PPPLOT Statement: CAPABILITY Procedure” on page 454) statements, provide effective graphical
diagnostics.
exploring stratification in process data due to different lots, machines, manufacturing methods, and so
forth
inset summary statistics and process capability indices on the component histograms
You have two alternatives for producing comparative histograms with the COMPHISTOGRAM statement:
ODS Graphics output is produced if ODS Graphics is enabled, for example by specifying the ODS
GRAPHICS ON statement prior to the PROC statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
NOTE: You cannot use the COMPHISTOGRAM statement together with the CLASS statement.
data Channel;
length Lot $ 16;
input Length @@;
select;
when (_n_ <= 425) Lot='Lot 1';
when (_n_ >= 926) Lot='Lot 3';
otherwise Lot='Lot 2';
end;
datalines;
0.91 1.01 0.95 1.13 1.12 0.86 0.96 1.17 1.36 1.10
0.98 1.27 1.13 0.92 1.15 1.26 1.14 0.88 1.03 1.00
0.98 0.94 1.09 0.92 1.10 0.95 1.05 1.05 1.11 1.15
1.11 0.98 0.78 1.09 0.94 1.05 0.89 1.16 0.88 1.19
1.01 1.08 1.19 0.94 0.92 1.27 0.90 0.88 1.38 1.02
2.13 2.05 1.90 2.07 2.15 1.96 2.15 1.89 2.15 2.04
1.95 1.93 2.22 1.74 1.91
;
The data set Channel is also used in Example 6.12, where a kernel density estimate is superimposed on the
histogram of channel lengths. The display in Output 6.12.1 reveals that there are three distinct peaks in the
process distribution. To investigate whether these peaks (modes) in the histogram are related to the lot source,
you can create a comparative histogram that uses Lot as a classification variable. The following statements
create the comparative histogram shown in Figure 6.6:
run;
The comparative histogram is displayed in Figure 6.7.
Specifying INTERTILE=1 inserts a space of one percent screen unit between the framed areas, which are
referred to as tiles. The shaded bars, added with the CPROP= option, represent the relative frequency of
observations in each cell. See “Dictionary of Options” on page 295 for details concerning these options.
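The statements that produce Figure 6.7 are not part of this excerpt. A hedged sketch of a request that uses the two options just described, assuming the Channel data set created above; NROWS=3 (one row per lot) and the color orange for CPROP= are illustrative choices, not values taken from the source:

```sas
proc capability data=Channel;
   comphist Length / class     = Lot
                     nrows     = 3       /* one tile per lot        */
                     intertile = 1       /* 1% gap between tiles    */
                     cprop     = orange; /* bars show cell share    */
run;
```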
You can specify the keyword COMPHIST as an alias for COMPHISTOGRAM. You can use any number of
COMPHISTOGRAM statements after a PROC CAPABILITY statement.
To create a comparative histogram, you must specify at least one variable and either one or two class-variables
(also referred to as classification variables). The COMPHISTOGRAM statement displays a component
histogram of the values of the variable for each level of the class-variables. The observations in a particular
level are referred to as a cell.
The components of the COMPHISTOGRAM statement are described as follows:
variables
are the process variables for which comparative histograms are to be created. If you specify a VAR
statement, the variables must also be listed in the VAR statement. Otherwise, variables can be any
numeric variables in the input data set that are not also listed as class-variables. If you do not specify
variables in a COMPHISTOGRAM statement or a VAR statement, then by default a comparative
histogram is created for each numeric variable in the DATA= data set that is not used as a class-variable.
If you use a VAR statement and do not specify variables in the COMPHISTOGRAM statement, then
by default a comparative histogram is created for each variable listed in the VAR statement.
For example, suppose a data set named steel contains two process variables named length and width,
a numeric classification variable named lot, and a character classification variable named day. The
following statements create two comparative histograms, one for length and one for width:
Likewise, the following statements create comparative histograms for length and width:
The following statements create three comparative histograms (for length, width, and lot):
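The statements for these three cases are not included in this excerpt. A sketch consistent with the rules for variables stated above, assuming the hypothetical steel data set just described (lot numeric, day character):

```sas
/* Two comparative histograms: variables listed explicitly */
proc capability data=steel;
   comphist length width / class=lot;
run;

/* Same two histograms: variables taken from the VAR statement */
proc capability data=steel;
   var length width;
   comphist / class=lot;
run;

/* Three comparative histograms: with no variables listed, every
   numeric variable not used as a class-variable is analyzed, so
   length, width, and the numeric variable lot are all plotted */
proc capability data=steel;
   comphist / class=day;
run;
```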
class-variables
are one or two required classification variables. For example, the following statements create a one-way
comparative histogram for width by using the classification variable lot:
The following statements create a two-way comparative histogram for width classified by lot and day:
Note that the parentheses surrounding the class-variables are needed only if two classification variables
are specified. See Output 6.6.1 and Output 6.7.1 for further examples.
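A sketch of the one-way and two-way requests described in this entry, again assuming the hypothetical steel data set:

```sas
/* One-way: a single class-variable, parentheses optional */
proc capability data=steel;
   comphist width / class=lot;
run;

/* Two-way: levels of lot label the rows and levels of day label
   the columns; parentheses are required around the two variables */
proc capability data=steel;
   comphist width / class=(lot day);
run;
```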
options
control the features of the comparative histogram. All options are specified after the slash (/) in the
COMPHIST statement. In the following example, the CLASS= option specifies the classification
variable, the NORMAL option fits a normal density curve in each cell, and the CTEXT= option
specifies the color of the text:
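A minimal sketch of such a statement for the hypothetical steel data set; the text color blue is an illustrative choice:

```sas
proc capability data=steel;
   comphist width / class = lot    /* classification variable   */
                    normal         /* normal curve in each cell */
                    ctext = blue;  /* color for text            */
run;
```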
Summary of Options
The following tables list the COMPHIST statement options by function. For complete descriptions, see
“Dictionary of Options” on page 295.
Distribution Options
Table 6.17 lists the options for requesting that a fitted normal distribution or a kernel density estimate be
overlaid on the comparative histogram.
Option Description
KERNEL(kernel-options) Fits kernel density estimates
NORMAL(normal-options) Fits normal distribution with mean μ and standard deviation σ
You can specify the secondary options listed in Table 6.18 in parentheses after the KERNEL option to control
features of kernel density estimates.
Option Description
C= Specifies standardized bandwidth parameter c for kernel density estimate
COLOR= Specifies color of the kernel density curve
FILL Fills area under kernel density curve
K= Specifies NORMAL, TRIANGULAR, or QUADRATIC kernel
L= Specifies line type used for kernel density curve
LOWER= Specifies lower bound for kernel density curve
UPPER= Specifies upper bound for kernel density curve
W= Specifies line width for kernel density curve
You can specify the secondary options listed in Table 6.19 in parentheses after the NORMAL option to
control features of fitted normal distributions.
Option Description
COLOR= Specifies color of normal curve
FILL Fills area under normal curve
L= Specifies line type of normal curve
MU= Specifies mean for fitted normal curve
SIGMA= Specifies standard deviation for fitted normal curve
W= Specifies width of normal curve
For example, the following statements use the NORMAL option to fit a normal curve in each cell of the
comparative histogram:
proc capability;
comphistogram / class = machine
normal(color=red l=2);
run;
The COLOR= normal-option draws the curve in red, and the L= normal-option specifies a line style of 2
(a dashed line) for the curve. In this example, maximum likelihood estimates are computed for the normal
parameters μ and σ for each cell because these parameters are not specified.
General Options
Option Description
Classification Options
CLASS= Specifies classification variables
CLASSKEY= Specifies key cell
MISSING1 Requests that missing values of first CLASS= variable be treated as a level of that CLASS= variable
MISSING2 Requests that missing values of second CLASS= variable be treated as a level of that CLASS= variable
ORDER1= Specifies display order for values of the first CLASS= variable
ORDER2= Specifies display order for values of the second CLASS= variable
Layout Options
BARLABEL= Produces labels above histogram bars
BARWIDTH= Specifies width for the bars
CLIPSPEC= Clips histogram bars at specification limits if there are no observations beyond the limits
ENDPOINTS= Labels interval endpoints and specifies how they are determined
HOFFSET= Specifies offset for horizontal axis
INTERTILE= Specifies distance between tiles
MAXNBIN= Specifies maximum number of bins displayed
MAXSIGMAS= Limits number of bins displayed to range of value standard deviations above and below mean of data in key cell
MIDPOINTS= Specifies how midpoints are determined
NCOLS= Specifies number of columns in comparative histogram
NOBARS Suppresses histogram bars
NOFRAME Suppresses frame around plotting area
NOKEYMOVE Suppresses rearrangement of cells that occurs by default with the CLASSKEY= option
NOPLOT Suppresses plot
NROWS= Specifies number of rows in comparative histogram
RTINCLUDE Includes right endpoint in interval
WBARLINE= Specifies line thickness for bar outlines
Option Description
NOVTICK Suppresses tick marks and tick mark labels for vertical axis
TILELEGLABEL= Specifies label displayed when _CTILE_ and _TILELG_ variables are provided in the CLASSSPEC= data set
TURNVLABELS Turns and vertically strings out characters in labels for vertical axis
VAXIS= Specifies tick mark values for vertical axis
VAXISLABEL= Specifies label for vertical axis
VOFFSET= Specifies length of offset at upper end of vertical axis
VSCALE= Specifies scale for vertical axis
WAXIS= Specifies line thickness for axes and frame
WGRID= Specifies line thickness for grid
Option Description
PFILL= Specifies pattern used to fill bars
Dictionary of Options
The following sections describe in detail the options specific to the COMPHISTOGRAM statement. See
“Dictionary of Common Options: CAPABILITY Procedure” on page 550 for detailed descriptions of options
common to all the plot statements.
General Options
You can specify the following options whether you are producing ODS Graphics output or traditional
graphics:
C=value-list | MISE
specifies the standardized bandwidth parameter c for kernel density estimates requested with the
KERNEL option. You can specify up to five values to display multiple estimates in each cell. You can
also specify the keyword MISE to request the bandwidth parameter that minimizes the estimated mean
integrated square error (MISE). For example, consider the following statements (for more information,
see “Kernel Density Estimates” on page 360):
proc capability;
comphist length / class=batch kernel(c = 0.5 1.0 mise);
run;
The KERNEL option displays three density estimates. The first two have standardized bandwidths
of 0.5 and 1.0, respectively. The third has a bandwidth parameter that minimizes the MISE. You can
also use the C= and K= options (K= specifies kernel type) to display multiple estimates. For example,
consider the following statements:
proc capability;
comphist length / class = batch
kernel(c = 0.75 k = normal triangular);
run;
Here two estimates are displayed. The first uses a normal kernel and bandwidth parameter of 0.75, and
the second uses a triangular kernel and a bandwidth parameter of 0.75. In general, if more kernel types
are specified than bandwidth parameters, the last bandwidth parameter in the list will be repeated for
the remaining estimates. Likewise, if more bandwidth parameters are specified than kernel types, the
last kernel type will be repeated for the remaining estimates. The default is MISE.
CLASS=variable
CLASS=(variable1 variable2)
specifies that a comparative histogram is to be created using the levels of the variables (also referred to
as class-variables or classification variables).
If you specify a single variable, a one-way comparative histogram is created. The observations in
the input data set are sorted by the formatted values (levels) of the variable. A separate histogram is
created for the process variable values in each level, and these component histograms are arranged in
an array to form the comparative histogram. Uniform horizontal and vertical axes are used to facilitate
comparisons. For an example, see Figure 6.6.
If you specify two classification variables, a two-way comparative histogram is created. The obser-
vations in the input data set are cross-classified according to the values (levels) of these variables. A
separate histogram is created for the process variable values in each cell of the cross-classification, and
these component histograms are arranged in a matrix to form the comparative histogram. The levels
of variable1 are used to label the rows of the matrix, and the levels of variable2 are used to label the
columns of the matrix. Uniform horizontal and vertical axes are used to facilitate comparisons. For an
example, see Output 6.7.1.
Classification variables can be numeric or character. Formatted values are used to determine the levels.
You can specify whether missing values are to be treated as a level with the MISSING1 and MISSING2
options.
If a label is associated with a classification variable, the label is displayed on the comparative histogram.
The variable label is displayed parallel to the column (or row) labels. For an example, see Figure 6.6.
CLASSKEY=‘value’
CLASSKEY=(‘value1’ ‘value2’)
specifies the key cell in a comparative histogram requested with the CLASS= option. The bin size and
midpoints are first determined for the key cell, and then the midpoint list is extended to accommodate
the data ranges for the remaining cells. Thus, the choice of the key cell determines the uniform
horizontal axis used for all cells.
If you specify CLASS=variable, you can specify CLASSKEY=‘value’ to identify the key cell as the
level for which variable is equal to value. You must specify a formatted value. By default, the levels
are sorted in the order determined by the ORDER1= option, and the key cell is the level that occurs
first in this order. The cells are displayed in this order from top to bottom (or left to right), and,
consequently, the key cell is displayed at the top or at the left. If you specify a different key cell with
the CLASSKEY= option, this cell is displayed at the top or at the left unless you also specify the
NOKEYMOVE option.
If you specify CLASS=(variable1 variable2), you can specify CLASSKEY=(‘value1’ ‘value2’) to
identify the key cell as the level for which variable1 is equal to value1 and variable2 is equal to
value2. Here, value1 and value2 must be formatted values, and they must be enclosed in quotes. For
an example of the CLASSKEY= option with a two-way comparative histogram, see Output 6.7.1. By
default, the levels of variable1 are sorted in the order determined by the ORDER1= option, and within
each of these levels, the levels of variable2 are sorted in the order determined by the ORDER2= option.
The default key cell is the combination of levels of variable1 and variable2 that occurs first in this
order. The cells are displayed in order of variable1 from top to bottom and in order of variable2 from
left to right. Consequently, the default key cell is displayed in the upper left corner. If you specify a
different key cell with the CLASSKEY= option, this cell is displayed in the upper left corner unless
you also specify the NOKEYMOVE option.
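A sketch of both behaviors, assuming the Channel data set from earlier in this section; choosing 'Lot 2' as the key cell is illustrative. The two COMPHIST statements share one PROC step, which the procedure permits:

```sas
proc capability data=Channel;
   /* bins and midpoints are set from 'Lot 2'; by default that cell
      is then moved to the top or left of the display */
   comphist Length / class=Lot classkey='Lot 2';
   /* same key cell, but cells keep their ORDER1= positions */
   comphist Length / class=Lot classkey='Lot 2' nokeymove;
run;
```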
CLASSSPEC=SAS-data-set
CLASSSPECS=SAS-data-set
specifies a data set that provides distinct specification limits for each cell, as well as a color, legend,
and label for the corresponding tile. Table 6.21 lists the variables that are read from a CLASSSPEC=
data set.
If you specify a CLASSSPEC= data set, you cannot use the SPEC statement or a SPEC= data set. If
you use a BY statement, the CLASSSPEC= data set must contain one observation for each unique
combination of process and classification variables within each BY group. See Example 6.6 for an
example of a CLASSSPEC= data set.
Also note that
you can suppress the background color for a tile by assigning the value ‘EMPTY’ or a blank
value to the variable _CTILE_
you can use the NLEGENDPOS= option to specify the corner of the tile in which the _TILELB_
label is displayed. You can frame the label with the CFRAMENLEG= option.
you cannot use the variable _TILELG_ unless you specify the variable _CTILE_
the variable _TILELB_ takes precedence over the NLEGEND option
FILL
fills areas under a fitted density curve with colors and patterns. Enclose the FILL option in parentheses
after the keyword NORMAL or KERNEL. Depending on the area to be filled (outside or between the
specification limits), you can specify the color and pattern with options in the SPEC statement and the
COMPHISTOGRAM statement, as summarized in the following table:
If you do not display specification limits, you can use the CFILL= and PFILL= options to specify
the color and pattern for the entire area under the curve. Solid fills are used by default if patterns
are not specified. You can specify the FILL option with only one fitted curve. For an example, see
Output 6.6.1. Refer to SAS/GRAPH: Reference for a list of available patterns and colors. If you do not
specify the FILL option but you do specify the options in the preceding table, the colors and patterns
are applied to the corresponding areas under the histogram.
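A minimal sketch of filling the area under fitted normal curves, assuming the Channel data set and no SPEC statement, so that CFILL= applies to the entire area under each curve; the color yellow is an illustrative choice:

```sas
proc capability data=Channel;
   /* FILL goes inside the NORMAL parentheses; CFILL= is a
      COMPHISTOGRAM option giving the fill color */
   comphist Length / class=Lot normal(fill) cfill=yellow;
run;
```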
GRID
adds a grid to the comparative histogram. Grid lines are horizontal lines positioned at major tick marks
on the vertical axis.
INTERTILE=value
specifies the distance in horizontal percent screen units between tiles. For an example, see Figure 6.7.
By default, the tiles are contiguous.
Option Description
FILL Specifies that the area under the curve is to be filled
COLOR= Specifies the color of the curve
L= Specifies the line style for the curve
W= Specifies the width of the curve
K= Specifies the type of kernel
C= Specifies the smoothing parameter
LOWER= Specifies the lower bound for the curve
UPPER= Specifies the upper bound for the curve
See Output 6.6.1 for an example. By default, the estimate is based on the AMISE method. For more
information, see “Kernel Density Estimates” on page 360.
LOWER=value
specifies the lower bound for a kernel density estimate curve. Enclose the LOWER= option in
parentheses after the KERNEL option. You can specify a single lower bound or a list of lower bounds.
By default, a kernel density estimate curve has no lower bound.
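A sketch that bounds the kernel curves below at zero, assuming the Channel data set and assuming, for illustration, that channel lengths cannot be negative:

```sas
proc capability data=Channel;
   /* LOWER= goes inside the KERNEL parentheses */
   comphist Length / class=Lot kernel(lower=0);
run;
```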
MAXNBIN=n
specifies the maximum number of bins to be displayed. This option is useful in situations where the
scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and
midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the
data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number
of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number
of bins with the MAXNBIN= option, you can narrow the window about the data distribution in the key
cell. Note that the MAXNBIN= option provides an alternative to the MAXSIGMAS= option.
MAXSIGMAS=value
limits the number of bins to be displayed to a range of value standard deviations (of the data in the key
cell) above and below the mean of the data in the key cell. This option is useful in situations where the
scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and
midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the
data ranges for the remaining cells. If the cell scales differ considerably, however, the resulting number
of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number
of bins with the MAXSIGMAS= option, you narrow the window about the data distribution in the key
cell. Note that the MAXSIGMAS= option provides an alternative to the MAXNBIN= option.
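Because MAXNBIN= and MAXSIGMAS= are alternatives, a sketch showing each in its own COMPHIST statement, assuming the Channel data set; the limits 20 bins and 3 standard deviations are illustrative:

```sas
proc capability data=Channel;
   /* cap the display at 20 bins around the key cell */
   comphist Length / class=Lot maxnbin=20;
   /* or: keep bins within 3 standard deviations of the
      key-cell mean */
   comphist Length / class=Lot maxsigmas=3;
run;
```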
MISSING1
specifies that missing values of the first CLASS= variable are to be treated as a level of the CLASS=
variable. If the first CLASS= variable is a character variable, a missing value is defined as a blank
internal (unformatted) value. If the first CLASS= variable is numeric, a missing value is defined as any of
the SAS System missing values. If you do not specify MISSING1, observations for which the first
CLASS= variable is missing are excluded from the analysis.
MISSING2
specifies that missing values of the second CLASS= variable are to be treated as a level of the CLASS=
variable. If the second CLASS= variable is a character variable, a missing value is defined as a blank
internal (unformatted) value. If the second CLASS= variable is numeric, a missing value is defined as any of
the SAS System missing values. If you do not specify MISSING2, observations for which the second
CLASS= variable is missing are excluded from the analysis.
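Here is a minimal sketch. It assumes a hypothetical data set Parts in which one third of the observations have a blank value of the character CLASS= variable Supplier. With MISSING1, those observations form a separate cell instead of being dropped.

```sas
/* Hypothetical data: every third observation has a blank (missing) Supplier */
data Parts;
   call streaminit(4321);
   length Supplier $ 10;
   do i = 1 to 150;
      select (mod(i, 3));
         when (0) Supplier = 'Supplier A';
         when (1) Supplier = 'Supplier B';
         otherwise Supplier = ' ';
      end;
      Width = rand('normal', 20, 0.5);
      output;
   end;
   drop i;
run;

/* With MISSING1, the 50 observations with a blank Supplier form a third cell;
   without it, they would be excluded from the analysis */
proc capability data=Parts noprint;
   comphistogram Width / class=Supplier missing1 nrows=3;
run;
```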
MU=value
specifies the parameter μ for the normal density curves requested with the NORMAL option. Enclose
the MU= option in parentheses after the NORMAL option. The default value is the sample mean of
the observations in the cell.
NOBARS
suppresses the display of the bars in a comparative histogram.
NOCHART
suppresses the creation of a comparative histogram. This is an alias for NOPLOT.
NOKEYMOVE
suppresses the rearrangement of cells that occurs by default when you use the CLASSKEY= option to
specify the key cell. For details, see the entry for the CLASSKEY= option.
NOPLOT
suppresses the creation of a comparative histogram. This option is useful when you are using the
COMPHISTOGRAM statement solely to create an output data set.
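NORMAL <(normal-options)>
displays normal density curves on a comparative histogram. The equation of the normal density curve is
$$p(x) = \frac{hv}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right) \qquad \text{for } -\infty < x < \infty$$
where
μ = mean
σ = standard deviation (σ > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n & \text{the sample size, for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$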
If you specify values for μ and σ with the MU= and SIGMA= normal-options, the same curve is
displayed for each cell. By default, a distinct curve is displayed for each cell based on the sample mean
and standard deviation for that cell. For example, the following statements display a distinct curve for
each level of the variable Supplier:
The curves are drawn in red with a line style of 2 (a dashed line). See Figure 6.7 for another illustration.
Table 6.19 lists options that can be specified in parentheses after the NORMAL option.
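This minimal sketch contrasts the two behaviors. It assumes a hypothetical data set Parts with variables Width and Supplier, and the parameter values are illustrative. ODS Graphics is turned off here on the assumption that COLOR= and L= are intended for traditional graphics output.

```sas
ods graphics off;

/* Hypothetical data: Supplier B is shifted up by one unit */
data Parts;
   call streaminit(8810);
   length Supplier $ 10;
   do Supplier = 'Supplier A', 'Supplier B';
      do i = 1 to 100;
         Width = rand('normal', 20 + (Supplier = 'Supplier B'), 0.5);
         output;
      end;
   end;
   drop i;
run;

/* Default: each cell gets its own curve, based on that cell's
   sample mean and standard deviation (red, line style 2) */
proc capability data=Parts noprint;
   comphistogram Width / class=Supplier nrows=2
                         normal(color=red l=2);
run;

/* MU= and SIGMA= given: the same curve appears in every cell */
proc capability data=Parts noprint;
   comphistogram Width / class=Supplier nrows=2
                         normal(mu=20 sigma=0.5);
run;
```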
ORDER1=INTERNAL | FORMATTED | DATA | FREQ
specifies the display order for the values of the first CLASS= variable.
If you specify ORDER1=INTERNAL, the rows (columns) are displayed from top to bottom
(left to right) in increasing order of the internal (unformatted) values of the first CLASS= variable.
If there are two or more distinct internal values with the same formatted value, then the order is
determined by the internal value that occurs first in the input data set.
For example, suppose that you specify a numeric CLASS= variable called Day (with values 1, 2,
and 3). Suppose also that a format (created with the FORMAT procedure) is associated with Day
and that the formatted values are as follows: 1 = ’Wednesday’, 2 = ’Thursday’, and 3 = ’Friday’.
If you specify ORDER1=INTERNAL, the rows of the comparative histogram will appear in
day-of-the-week order (Wednesday, Thursday, Friday) from top to bottom.
If you specify ORDER1=FORMATTED, the rows (columns) are displayed from top to bottom
(left to right) in increasing order of the formatted values of the first CLASS= variable. In
the preceding illustration, if you specify ORDER1=FORMATTED, the rows will appear in
alphabetical order (Friday, Thursday, Wednesday) from top to bottom.
If you specify ORDER1=DATA, the rows (columns) are displayed from top to bottom (left to
right) in the order in which the values of the first CLASS= variable first appear in the input data
set.
If you specify ORDER1=FREQ, the rows (columns) are displayed from top to bottom (left to
right) in order of decreasing frequency count. If two or more classes have the same frequency
count, the order is determined by the formatted values.
By default, ORDER1=INTERNAL.
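This minimal sketch implements the Day illustration above. It assumes a hypothetical data set Shifts in which the Friday observations (Day=3) happen to come first in the input data set.

```sas
proc format;
   value dayfmt 1 = 'Wednesday'
                2 = 'Thursday'
                3 = 'Friday';
run;

/* Hypothetical data, entered in the order Friday, Wednesday, Thursday */
data Shifts;
   call streaminit(1357);
   do Day = 3, 1, 2;
      do i = 1 to 50;
         Diam = rand('normal', 5, 0.05);
         output;
      end;
   end;
   format Day dayfmt.;
   drop i;
run;

/* Rows: Wednesday, Thursday, Friday (internal values 1, 2, 3) */
proc capability data=Shifts noprint;
   comphistogram Diam / class=Day nrows=3 order1=internal;
run;

/* Rows: Friday, Thursday, Wednesday (alphabetical formatted values) */
proc capability data=Shifts noprint;
   comphistogram Diam / class=Day nrows=3 order1=formatted;
run;

/* Rows: Friday, Wednesday, Thursday (order of first appearance) */
proc capability data=Shifts noprint;
   comphistogram Diam / class=Day nrows=3 order1=data;
run;
```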
OUTHISTOGRAM=SAS-data-set
creates a SAS data set that saves the midpoints or endpoints of the histogram intervals, the observed
percent of observations in each interval, and (optionally) the percent of observations in each interval
estimated from a fitted normal distribution. By default, interval midpoint values are saved in the
variable _MIDPT_. If the ENDPOINTS= option is specified, intervals are identified by endpoint
values instead. If RTINCLUDE is specified, the _MAXPT_ variable contains upper endpoint values.
Otherwise, lower endpoint values are saved in the _MINPT_ variable.
RTINCLUDE
includes the right endpoint of each histogram interval in that interval. The left endpoint is included by
default.
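This minimal sketch assumes a hypothetical data set Rods whose values are rounded to 0.1, so many of them fall on interval boundaries. ENDPOINTS= fixes the interval endpoints. RTINCLUDE assigns a value that lies exactly on a boundary to the interval on its left. NOPLOT writes the OUTHISTOGRAM= data set without drawing the chart. The endpoint list is an illustrative choice.

```sas
/* Hypothetical data rounded to 0.1 so that values land on bin boundaries */
data Rods;
   call streaminit(2468);
   length Line $ 6;
   do Line = 'Line 1', 'Line 2';
      do i = 1 to 80;
         Diam = round(rand('normal', 10, 0.2), 0.1);
         output;
      end;
   end;
   drop i;
run;

/* Intervals (9.0,9.2], (9.2,9.4], ...; upper endpoints are saved in _MAXPT_ */
proc capability data=Rods noprint;
   comphistogram Diam / class=Line
                        endpoints=9 to 11 by 0.2
                        rtinclude
                        noplot
                        outhistogram=RodBins;
run;
```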
SIGMA=value
specifies the parameter σ for normal density curves requested with the NORMAL option. Enclose the
SIGMA= option in parentheses after the NORMAL option. The default value is the sample standard
deviation of the observations in the cell.
UPPER=value
specifies the upper bound for a kernel density estimate curve. Enclose the UPPER= option in paren-
theses after the KERNEL option. You can specify a single upper bound or a list of upper bounds. By
default, a kernel density estimate curve has no upper bound.
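This minimal sketch assumes a hypothetical data set Fill whose values are percentages, so they cannot fall outside the range 0 to 100. Specifying LOWER=0 and UPPER=100 bounds the kernel density curve at those values.

```sas
/* Hypothetical fill percentages, always between 0 and 100 */
data Fill;
   call streaminit(97531);
   length Nozzle $ 8;
   do Nozzle = 'Nozzle 1', 'Nozzle 2';
      do i = 1 to 100;
         Pct = 100 * rand('beta', 8, 2);
         output;
      end;
   end;
   drop i;
run;

/* A single bound applies to every cell; lists of bounds are also allowed */
proc capability data=Fill noprint;
   comphistogram Pct / class=Nozzle nrows=2
                       kernel(lower=0 upper=100);
run;
```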
BARWIDTH=value
specifies the width of the histogram bars in screen percent units.
CBARLINE=color
specifies the color of the outline of the histogram bars. This option overrides the C= option in the
SYMBOL1 statement.
CFILL=color
specifies a color used to fill the bars of the histograms (or the areas under a fitted curve if you also
specify the FILL option). See the entry for the FILL option for additional details. See Output 6.6.1 and
Example 6.7 for examples. Refer to SAS/GRAPH: Reference for a list of colors. By default, bars and
curve areas are not filled.
CFRAMENLEG=color | EMPTY
CFRAMENLEG
specifies that the legend requested with the NLEGEND option (or the variable _TILELB_ in a
CLASSSPEC= data set) is to be framed and that the frame is to be filled with the color indicated. If
you specify CFRAMENLEG=EMPTY, a frame is drawn but not filled with a color.
CGRID=color
specifies the color for grid lines requested with the GRID option. By default, grid lines are the same
color as the axes. If you use CGRID=, you do not need to specify the GRID option.
CLIPSPEC=CLIP | NOFILL
specifies that histogram bars are clipped at the upper and lower specification limit lines when there are
no observations outside the specification limits. The bar intersecting the lower specification limit is
clipped if there are no observations less than the lower limit; the bar intersecting the upper specification
limit is clipped if there are no observations greater than the upper limit. If you specify CLIPSPEC=CLIP,
the histogram bar is truncated at the specification limit. If you specify CLIPSPEC=NOFILL, the portion
of a filled histogram bar outside the specification limit is left unfilled. Specifying CLIPSPEC=NOFILL
when histogram bars are not filled has no effect.
FRONTREF
draws reference lines requested with the HREF= and VREF= options in front of the histogram bars.
By default, reference lines are drawn behind the histogram bars and can be obscured by them.
HOFFSET=value
specifies the offset in percent screen units at both ends of the horizontal axis. Specify HOFFSET=0 to
eliminate the default offset.
LGRID=n
specifies the line type for the grid requested with the GRID option. If you use the LGRID= option, you
do not need to specify the GRID option. The default is 1, which produces a solid line.
If you specify the NLEGEND option, the form is N = n where n is the cell sample size.
If you specify the NLEGEND=’label’ option, the form is label = n where n is the cell sample
size. The label can be up to 16 characters and must be enclosed in quotes. For instance, you
might specify NLEGEND=’Number of Parts’ to request a label of the form Number of Parts = n.
See Figure 6.6 for an example. You can use the CFRAMENLEG= option to frame the sample size
legend. The variable _TILELB_ in a CLASSSPEC= data set overrides the NLEGEND option. By
default, no legend is displayed.
NLEGENDPOS=NW | NE
specifies the position of the legend requested with the NLEGEND option or the variable _TILELB_ in
a CLASSSPEC= data set. If NLEGENDPOS=NW, the legend is displayed in the northwest corner
of the tile; if NLEGENDPOS=NE, the legend is displayed in the northeast corner of the tile. See
Figure 6.6 for an illustration. The default is NE.
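This minimal sketch combines several of these traditional graphics options. It assumes a hypothetical data set Parts whose values all lie strictly inside the specification limits given in the SPEC statement. Because no observations fall outside the limits, CLIPSPEC=CLIP truncates any bar that straddles a limit line at that line. The color and limit values are illustrative.

```sas
ods graphics off;

/* Hypothetical data: every value lies strictly between 19 and 21 */
data Parts;
   call streaminit(5150);
   length Supplier $ 10;
   do Supplier = 'Supplier A', 'Supplier B';
      do i = 1 to 60;
         Width = 19 + 2 * rand('uniform');
         output;
      end;
   end;
   drop i;
run;

/* Filled bars clipped at the limits, with a framed sample size legend at NW */
proc capability data=Parts noprint;
   spec lsl=19 usl=21;
   comphistogram Width / class=Supplier nrows=2
                         cfill=yellow
                         clipspec=clip
                         nlegend='Number of Parts'
                         nlegendpos=nw
                         cframenleg=empty;
run;
```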
PFILL=pattern
specifies a pattern used to fill the bars of the histograms (or the areas under a fitted curve if you also
specify the FILL option). See the entries for the CFILL= and FILL options for additional details. Refer
to SAS/GRAPH: Reference for a list of pattern values. By default, the bars and curve areas are not
filled.
TILELEGLABEL=’label’
specifies a label displayed to the left of the legend that is created when you provide _CTILE_ and
_TILELG_ variables in a CLASSSPEC= data set. The label can be up to 16 characters and must be
enclosed in quotes. The default label is Tiles:.
VOFFSET=value
specifies the offset in percent screen units at the upper end of the vertical axis.
WBARLINE=n
specifies the width of bar outlines. By default, n = 1.
WGRID=n
specifies the width of the grid lines requested with the GRID option. By default, grid lines are the
same width as the axes. If you use the WGRID= option, you do not need to specify the GRID option.
See Chapter 4, “SAS/QC Graphics,” for more information about ODS Graphics and other methods for
producing charts.
data Machines;
   input position @@;
   label position='Position in Millimeters';
   if (_n_ <= 100) then Machine = 'Machine 1';
   else if (_n_ <= 200) then Machine = 'Machine 2';
   else Machine = 'Machine 3';
   datalines;
-0.17 -0.19 -0.24 -0.24 -0.12 0.07 -0.61 0.22 1.91 -0.08
-0.59 0.05 -0.38 0.82 -0.14 0.32 0.12 -0.02 0.26 0.19
-0.07 0.13 -0.49 0.07 0.65 0.94 -0.51 -0.61 -0.57 -0.51
0.01 -0.51 0.07 -0.16 -0.32 -0.42 -0.42 -0.34 -0.34 -0.35
-0.49 0.11 -0.42 0.76 0.02 -0.59 -0.28 1.12 -0.02 -0.60
-0.64 0.13 -0.32 -0.77 -0.02 -0.07 -0.49 -0.53 -0.22 0.61
-0.23 0.02 0.53 0.23 -0.44 -0.05 0.37 -0.42 0.70 -0.35
0.58 0.46 0.58 0.92 0.70 0.81 0.07 0.33 0.82 0.62
0.48 0.41 0.78 0.58 0.43 0.07 0.27 0.49 0.79 0.92
0.79 0.66 0.22 0.71 0.53 0.57 0.90 0.48 1.17 1.03
;
Distinct specification limits for the three machines are provided in a data set named speclims.
data speclims;
   input Machine $9. _lsl_ _usl_;
   _var_ = 'position';
   datalines;
Machine 1 -0.5 0.5
Machine 2 0.0 1.0
Machine 3 0.0 1.0
;
The following statements create a comparative histogram for the measurements in Machines that displays the
specification limits in speclims.
The INSET statement is used to inset the sample mean and standard deviation for each machine in the
corresponding tile. The MIDPOINTS= option specifies the midpoints of the histogram bins. Kernel density
estimates are displayed using the KERNEL option. The curve areas outside the specification limits are filled
using the CLEFT and CRIGHT options in the SPEC statement, and the area between the limits is filled using
the CFILL= option in the COMPHISTOGRAM statement.
proc format;
   value mytime 1 = '1992'
                2 = '1993';
run;

data disk;
   input @1 supplier $10. year width;
   label width = 'Opening Width (inches)';
   format year mytime.;
   datalines;
Supplier A 1 1.8932
Supplier A 1 1.8952
. . .
. . .
Supplier B 1 1.8980
Supplier B 1 1.8986
Supplier A 2 1.8978
Supplier A 2 1.8966
. . .
. . .
Supplier B 2 1.8967
Supplier B 2 1.8997
;
The following statements create the comparative histogram in Output 6.7.1:
   end;
   Supplier = 'Supplier B';
   avg = 1.8983;
   std = 0.0024;
   do i = 1 to 260;
      Width = avg + std * rannor(15535); output;
   end;
   Year = 2;
   Supplier = 'Supplier A';
   avg = 1.8970;
   std = 0.0013;
   do i = 1 to 260;
      Width = avg + std * rannor(15535); output;
   end;
   Supplier = 'Supplier B';
   avg = 1.8980;
   std = 0.0013;
   do i = 1 to 260;
      Width = avg + std * rannor(15535); output;
   end;
run;
The CLASSKEY= option specifies the key cell as the observations for which Supplier is equal to ‘SUPPLIER
A’ and Year is equal to 2. This cell determines the binning for the other cells, and (because the NOKEYMOVE
option is not specified) the columns are interchanged so that this cell is displayed in the upper left corner.
Note that if the CLASSKEY= option were not specified, the default key cell would be the observations
for which Supplier is equal to ‘SUPPLIER A’ and Year is equal to 1. If the CLASSKEY= option were not
specified (or if the NOKEYMOVE option were specified), the column labeled 1992 would be displayed to
the left of the column labeled 1993. See the entry for the CLASSKEY= option on page 296 for details.
The VAXIS= option specifies the tick mark labels for the vertical axis, while NROWS=2 and NCOLS=2
specify a 2 × 2 arrangement for the tiles. The INSET statement is used to display the capability index Cpk
for each cell. Output 6.7.1 provides evidence that both suppliers have reduced variability from 1992 to 1993.
HISTOGRAM Statement: CAPABILITY Procedure
save histogram intervals and parameters of fitted distributions in output data sets
create comparative histograms by using the HISTOGRAM statement together with a CLASS statement
You have three alternatives for producing histograms with the HISTOGRAM statement:
ODS Graphics output is produced if ODS Graphics is enabled, for example, by specifying the ODS
GRAPHICS ON statement prior to the PROC statement.
Legacy line printer charts are produced when you specify the LINEPRINTER option in the PROC
statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
data Trans;
   input Thick @@;
   label Thick='Plating Thickness (mils)';
   datalines;
3.468 3.428 3.509 3.516 3.461 3.492 3.478 3.556 3.482 3.512
3.490 3.467 3.498 3.519 3.504 3.469 3.497 3.495 3.518 3.523
3.458 3.478 3.443 3.500 3.449 3.525 3.461 3.489 3.514 3.470
3.561 3.506 3.444 3.479 3.524 3.531 3.501 3.495 3.443 3.458
3.481 3.497 3.461 3.513 3.528 3.496 3.533 3.450 3.516 3.476
3.512 3.550 3.441 3.541 3.569 3.531 3.468 3.564 3.522 3.520
3.505 3.523 3.475 3.470 3.457 3.536 3.528 3.477 3.536 3.491
3.510 3.461 3.431 3.502 3.491 3.506 3.439 3.513 3.496 3.539
3.469 3.481 3.515 3.535 3.460 3.575 3.488 3.515 3.484 3.482
3.517 3.483 3.467 3.467 3.502 3.471 3.516 3.474 3.500 3.466
;
The following statements create the histogram shown in Figure 6.8:
parameters for the normal curve. The normal parameters μ and σ are estimated by the sample mean
(μ̂ = 3.49533) and the sample standard deviation (σ̂ = 0.032117).
goodness-of-fit tests based on the empirical distribution function (EDF): the Anderson-Darling, Cramér–
von Mises, and Kolmogorov-Smirnov tests. The p-values for these tests are greater than the usual
cutoff values of 0.05 and 0.10, so the hypothesis that the thicknesses are normally distributed is not rejected.
a chi-square goodness-of-fit test. The p-value of 0.223 for this test likewise gives no evidence against
normality of the thicknesses. In general, EDF tests (when available) are preferable to chi-square tests. See the
section “EDF Goodness-of-Fit Tests” on page 364 for details.
For details, including formulas for the goodness-of-fit tests, see “Printed Output” on page 362. Note that the
NOPRINT option in the PROC CAPABILITY statement suppresses only the printed output with summary
statistics for the variable Thick. To suppress the printed output in Figure 6.9, specify the NOPRINT option
enclosed in parentheses after the NORMAL option as in “Customizing a Histogram” on page 316.
The NORMAL option is one of many options that you can specify in the HISTOGRAM statement. See
the section “Syntax: HISTOGRAM Statement” on page 318 for a complete list of options or the section
“Dictionary of Options” on page 326 for detailed descriptions of options.
Customizing a Histogram
NOTE: See Histogram with Fitted Normal Curve in the SAS/QC Sample Library.
This example is a continuation of the preceding example. The following statements show how you can use
HISTOGRAM statement options and INSET statements to customize a histogram:
The MIDPOINTS= option specifies a list of values to use as bin midpoints. The VSCALE=COUNT option
requests a vertical axis scaled in counts rather than percents. The INSET statements inset the specification
limits and summary statistics. The NOSPECLEGEND option suppresses the default legend for the specification
limits that is shown in Figure 6.8.
For more information about HISTOGRAM statement options, see the section “Dictionary of Options” on
page 326. For details on the INSET statement, see “INSET Statement: CAPABILITY Procedure” on
page 398.
You can specify the keyword HIST as an alias for HISTOGRAM. You can use any number of HISTOGRAM
statements after a PROC CAPABILITY statement. The components of the HISTOGRAM statement are
described as follows.
variables
are the process variables for which histograms are to be created. If you specify a VAR statement,
the variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric
variables in the input data set. If you do not specify variables in a VAR statement or in the HISTOGRAM
statement, then by default, a histogram is created for each numeric variable in the DATA= data set. If
you use a VAR statement and do not specify any variables in the HISTOGRAM statement, then by
default, a histogram is created for each variable listed in the VAR statement.
For example, suppose a data set named steel contains exactly two numeric variables named length and
width. The following statements create two histograms, one for length and one for width:
The following statements also create histograms for length and width:
options
add features to the histogram. Specify all options after the slash (/) in the HISTOGRAM statement.
For example, in the following statements, the NORMAL option displays a fitted normal curve on the
histogram, the MIDPOINTS= option specifies midpoints for the histogram, and the CTEXT= option
specifies the color of the text:
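Here is a minimal sketch of these statements. It assumes a hypothetical data set steel with exactly two numeric variables, length and width, and the values are illustrative. The first two PROC steps produce the same pair of histograms. The third adds options after the slash, and its midpoint list is an arbitrary choice.

```sas
/* Hypothetical data set with exactly two numeric variables */
data steel;
   input length width @@;
   datalines;
5.98 2.01 6.02 1.99 6.10 2.03 5.95 1.97 6.04 2.00
6.01 2.02 5.97 1.98 6.06 2.01 5.99 2.00 6.03 1.99
;

/* No variables listed: one histogram per numeric variable in DATA= */
proc capability data=steel noprint;
   histogram;
run;

/* VAR statement only: one histogram per VAR variable */
proc capability data=steel noprint;
   var length width;
   histogram;
run;

/* Options follow the slash */
proc capability data=steel noprint;
   histogram length / normal
                      midpoints=5.90 to 6.10 by 0.05
                      ctext=blue;
run;
```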
Summary of Options
The following tables list the HISTOGRAM statement options by function. For detailed descriptions, see
“Dictionary of Options” on page 326.
Option Description
BETA(beta-options) Fits beta distribution with threshold parameter θ, scale parameter σ, and shape parameters α and β
EXPONENTIAL(exponential-options) Fits exponential distribution with threshold parameter θ and scale parameter σ
GAMMA(gamma-options) Fits gamma distribution with threshold parameter θ, scale parameter σ, and shape parameter α
GUMBEL(Gumbel-options) Fits Gumbel distribution with location parameter μ and scale parameter σ
IGAUSS(iGauss-options) Fits inverse Gaussian distribution with mean μ and shape parameter λ
LOGNORMAL(lognormal-options) Fits lognormal distribution with threshold parameter θ, scale parameter ζ, and shape parameter σ
NORMAL(normal-options) Fits normal distribution with mean μ and standard deviation σ
PARETO(Pareto-options) Fits Pareto distribution with threshold parameter θ, scale parameter σ, and shape parameter α
POWER(power-options) Fits power function distribution with threshold parameter θ, scale parameter σ, and shape parameter α
RAYLEIGH(Rayleigh-options) Fits Rayleigh distribution with threshold parameter θ and scale parameter σ
SB(SB-options) Fits Johnson SB distribution with threshold parameter θ, scale parameter σ, and shape parameters δ and γ
SU(SU-options) Fits Johnson SU distribution with location parameter θ, scale parameter σ, and shape parameters δ and γ
WEIBULL(Weibull-options) Fits Weibull distribution with threshold parameter θ, scale parameter σ, and shape parameter c
Table 6.24 lists secondary options that specify parameters for fitted parametric distributions and that control
the display of fitted curves. Specify these secondary options in parentheses after the distribution keyword.
For example, the following statements fit a normal curve by using the NORMAL option:
proc capability;
histogram / normal(color=red mu=10 sigma=0.5);
run;
The COLOR= normal-option draws the curve in red, and the MU= and SIGMA= normal-options specify the
parameters μ = 10 and σ = 0.5 for the curve. Note that the sample mean and sample standard deviation are
used to estimate μ and σ, respectively, when the MU= and SIGMA= options are not specified.
You can specify lists of values for distribution parameters to display more than one fitted curve from the same
distribution family on a histogram. Option values are matched by list position. You can specify the value
EST in a list of distribution parameter values to use an estimate of the parameter.
For example, the following code displays two normal curves on a histogram:
proc capability;
histogram / normal(color=(red blue) mu=10 est sigma=0.5 est);
run;
The first curve is red, with μ = 10 and σ = 0.5. The second curve is blue, with μ equal to the sample mean
and σ equal to the sample standard deviation.
See the section “Formulas for Fitted Curves” on page 349 for detailed information about the families of
parametric distributions that you can fit with the HISTOGRAM statement.
Option Description
Options Used with All Parametric Distributions
COLOR= Specifies color of fitted density curve
FILL Fills area under fitted density curve
INDICES Calculates capability indices based on fitted distribution
L= Specifies line type of fitted curve
MIDPERCENTS Prints table of midpoints of histogram intervals
NOPRINT Suppresses printed output summarizing fitted curve
PERCENTS= Lists percents for which quantiles calculated from data and
quantiles estimated from fitted curve are tabulated
SYMBOL= Specifies character used for fitted density curve in line printer plots
W= Specifies width of fitted density curve
Beta-Options
ALPHA= Specifies first shape parameter α for fitted beta curve
BETA= Specifies second shape parameter β for fitted beta curve
SIGMA= Specifies scale parameter σ for fitted beta curve
THETA= Specifies lower threshold parameter θ for fitted beta curve
Exponential-Options
SIGMA= Specifies scale parameter σ for fitted exponential curve
THETA= Specifies threshold parameter θ for fitted exponential curve
Gamma-Options
ALPHA= Specifies shape parameter α for fitted gamma curve
ALPHADELTA= Specifies change in successive estimates of α at which the Newton-Raphson approximation of α̂ terminates
ALPHAINITIAL= Specifies initial value for α in Newton-Raphson approximation of α̂
MAXITER= Specifies maximum number of iterations in Newton-Raphson approximation of α̂
SIGMA= Specifies scale parameter σ for fitted gamma curve
THETA= Specifies threshold parameter θ for fitted gamma curve
Gumbel-Options
EDFNSAMPLES= Specifies number of samples for EDF goodness-of-fit simulation
EDFSEED= Specifies seed value for EDF goodness-of-fit simulation
MU= Specifies location parameter μ for fitted Gumbel curve
SIGMA= Specifies scale parameter σ for fitted Gumbel curve
IGauss-Options
EDFNSAMPLES= Specifies number of samples for EDF goodness-of-fit simulation
EDFSEED= Specifies seed value for EDF goodness-of-fit simulation
LAMBDA= Specifies shape parameter λ for fitted inverse Gaussian curve
MU= Specifies mean μ for fitted inverse Gaussian curve
Lognormal-Options
SIGMA= Specifies shape parameter σ for fitted lognormal curve
THETA= Specifies threshold parameter θ for fitted lognormal curve
ZETA= Specifies scale parameter ζ for fitted lognormal curve
Normal-Options
MU= Specifies mean μ for fitted normal curve
SIGMA= Specifies standard deviation σ for fitted normal curve
Pareto-Options
ALPHA= Specifies shape parameter α for fitted Pareto curve
EDFNSAMPLES= Specifies number of samples for EDF goodness-of-fit simulation
EDFSEED= Specifies seed value for EDF goodness-of-fit simulation
SIGMA= Specifies scale parameter σ for fitted Pareto curve
THETA= Specifies threshold parameter θ for fitted Pareto curve
Power-Options
ALPHA= Specifies shape parameter α for fitted power function curve
SIGMA= Specifies scale parameter σ for fitted power function curve
THETA= Specifies threshold parameter θ for fitted power function curve
Rayleigh-Options
EDFNSAMPLES= Specifies number of samples for EDF goodness-of-fit simulation
EDFSEED= Specifies seed value for EDF goodness-of-fit simulation
SIGMA= Specifies scale parameter σ for fitted Rayleigh curve
THETA= Specifies threshold parameter θ for fitted Rayleigh curve
SB-Options
DELTA= Specifies first shape parameter δ for fitted SB curve
FITINTERVAL= Specifies z-value for method of percentiles
FITMETHOD= Specifies method of parameter estimation
FITTOLERANCE= Specifies tolerance for method of percentiles
GAMMA= Specifies second shape parameter γ for fitted SB curve
SIGMA= Specifies scale parameter σ for fitted SB curve
THETA= Specifies lower threshold parameter θ for fitted SB curve
SU-Options
DELTA= Specifies first shape parameter δ for fitted SU curve
FITINTERVAL= Specifies z-value for method of percentiles
FITMETHOD= Specifies method of parameter estimation
FITTOLERANCE= Specifies tolerance for method of percentiles
GAMMA= Specifies second shape parameter γ for fitted SU curve
OPTBOUNDRANGE= Specifies the sampling range for parameter starting values in MLE optimization
OPTMAXITER= Specifies an iteration limit for MLE optimization
OPTMAXSTARTS= Specifies the maximum number of starting points to be used for MLE optimization
OPTPRINT Prints an iteration history for MLE optimization
OPTSEED= Specifies a seed value for MLE optimization
OPTTOLERANCE= Specifies the optimality tolerance for MLE optimization
SIGMA= Specifies scale parameter σ for fitted SU curve
THETA= Specifies location parameter θ for fitted SU curve
Weibull-Options
C= Specifies shape parameter c for fitted Weibull curve
CDELTA= Specifies change in successive estimates of c at which the Newton-Raphson approximation of ĉ terminates
CINITIAL= Specifies initial value for c in Newton-Raphson approximation of ĉ
MAXITER= Specifies maximum number of iterations in Newton-Raphson approximation of ĉ
SIGMA= Specifies scale parameter σ for fitted Weibull curve
THETA= Specifies threshold parameter θ for fitted Weibull curve
Option Description
KERNEL(kernel-options) Fits kernel density estimates
Specify the options listed in Table 6.26 in parentheses after the keyword KERNEL to control features of
kernel density estimates requested with the KERNEL option.
Option Description
C= Specifies standardized bandwidth parameter c for fitted kernel
density estimate
COLOR= Specifies color of the fitted kernel density curve
FILL Fills area under fitted kernel density curve
K= Specifies type of kernel function
L= Specifies line type used for fitted kernel density curve
LOWER= Specifies lower bound for fitted kernel density curve
SYMBOL= Specifies character used for fitted kernel density curve in line
printer plots
UPPER= Specifies upper bound for fitted kernel density curve
W= Specifies line width for fitted kernel density curve
General Options
Table 6.27 summarizes general options for the HISTOGRAM statement, including options for enhancing
charts and producing output data sets.
Option Description
Options to Create Output Data Sets
OUTFIT= Requests information about fitted curves
OUTHISTOGRAM= Requests information about histogram intervals
OUTKERNEL= Creates a data set containing kernel density estimates
Option Description
CGRID= Specifies color for grid lines
CHREF= Specifies colors for HREF= lines
CLIPREF Draws reference lines behind histogram bars
CLIPSPEC= Clips histogram bars at specification limits
CSTATREF= Specifies colors for STATREF= lines
CTEXT= Specifies color for text
CVREF= Specifies colors for VREF= lines
DESCRIPTION= Specifies description for plot in graphics catalog
FONT= Specifies software font for text
FRONTREF Draws reference lines in front of histogram bars
GRID Creates a grid
HAXIS= Specifies AXIS statement for horizontal axis
HEIGHT= Specifies height of text used outside framed areas
HMINOR= Specifies number of horizontal minor tick marks
HOFFSET= Specifies offset for horizontal axis
HREFLABPOS= Specifies vertical position of labels for HREF= lines
INFONT= Specifies software font for text inside framed areas
INHEIGHT= Specifies height of text inside framed areas
INTERBAR= Specifies space between histogram bars
LEGEND= Identifies LEGEND statement
LGRID= Specifies a line type for grid lines
LHREF= Specifies line styles for HREF= lines
LSTATREF= Specifies line styles for STATREF= lines
LVREF= Specifies line styles for VREF= lines
MAXNBIN= Specifies maximum number of bins to display
MAXSIGMAS= Limits the number of bins that display to within a specified
number of standard deviations above and below mean of data
in key cell
MIDPOINTS= Specifies midpoints for histogram intervals
NAME= Specifies name for plot in graphics catalog
NOHLABEL Suppresses label for horizontal axis
NOVLABEL Suppresses label for vertical axis
NOVTICK Suppresses tick marks and tick mark labels for vertical axis
PFILL= Specifies pattern for filling under curve
STATREF= Specifies reference lines at values of summary statistics
STATREFLABELS= Specifies labels for STATREF= lines
STATREFSUBCHAR= Specifies substitution character for displaying statistic values
in STATREFLABELS= labels
TURNVLABELS Turns and vertically strings out characters in labels for vertical
axis
VAXIS= Specifies AXIS statement or values for vertical axis
VAXISLABEL= Specifies label for vertical axis
VMINOR= Specifies number of vertical minor tick marks
VOFFSET= Specifies length of offset at upper end of vertical axis
VREFLABPOS= Specifies horizontal position of labels for VREF= lines
WAXIS= Specifies line thickness for axes and frame
WBARLINE= Specifies line thickness for bar outlines
WGRID= Specifies line thickness for grid
Dictionary of Options
The following sections provide detailed descriptions of options specific to the HISTOGRAM statement. See
“Dictionary of Common Options: CAPABILITY Procedure” on page 550 for detailed descriptions of options
common to all the plot statements.
General Options
ALPHA=value-list
specifies the shape parameter α for fitted curves requested with the BETA, GAMMA, PARETO, and
POWER options. Enclose the ALPHA= option in parentheses after the distribution keyword. If you do
not specify a value for α, the procedure calculates a maximum likelihood estimate. See Example 6.8.
You can specify A= as an alias for ALPHA= if you use it as a beta-option. You can specify SHAPE=
as an alias for ALPHA= if you use it as a gamma-option.
BARFILL=variable-list
specifies one or more variables whose values determine the colors of the bars in the cells of a
comparative histogram. Cells that are associated with a particular value of a variable in the variable-list
are the same color. The colors that are used are determined by the ODS style. If the HISTOGRAM
statement applies to more than one analysis variable, you can specify more than one variable in the
variable-list , and those variables are matched with analysis variables by their positions in the list.
NOTE: This option applies only when ODS Graphics is enabled.
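BETA <(beta-options)>
displays fitted beta density curves on the histogram. The equation of the fitted beta density curve is
$$p(x) = hv\,\frac{(x-\theta)^{\alpha-1}(\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}} \qquad \text{for } \theta < x < \theta+\sigma$$
where $B(\alpha,\beta) = \dfrac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and
θ = lower threshold parameter
σ = scale parameter (σ > 0)
α = first shape parameter (α > 0)
β = second shape parameter (β > 0)
h = width of histogram interval
v = vertical scaling factor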
The beta distribution is bounded below by the parameter θ and above by the value θ + σ. You can
specify θ and σ by using the THETA= and SIGMA= beta-options. The following statements fit a beta
distribution bounded between 50 and 75 by using maximum likelihood estimates for α and β:
proc capability;
histogram length / beta(theta=50 sigma=25);
run;
In general, the default values for THETA= and SIGMA= are 0 and 1, respectively. You can specify
THETA=EST and SIGMA=EST to request maximum likelihood estimates for θ and σ.
The beta distribution has two shape parameters, α and β. If these parameters are known, you can
specify their values with the ALPHA= and BETA= beta-options. If you do not specify values, the
procedure calculates maximum likelihood estimates for α and β.
The BETA option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the BETA option. See Example 6.8. Also see “Formulas for Fitted Curves”
on page 349.
BETA=value-list
B=value-list
specifies the second shape parameter β for beta density curves requested with the BETA option.
Enclose the BETA= option in parentheses after the BETA option. If you do not specify a value for β,
the procedure calculates a maximum likelihood estimate. See Example 6.8.
C=value-list
specifies the shape parameter c for Weibull density curves requested with the WEIBULL option.
Enclose the C= option in parentheses after the WEIBULL option. If you do not specify a value for
c, the procedure calculates a maximum likelihood estimate. See Example 6.9. You can specify the
SHAPE= option as an alias for the C= option.
C=value-list | MISE
specifies the standardized bandwidth parameter c for kernel density estimates requested with the
KERNEL option. Enclose the C= option in parentheses after the KERNEL option. You can specify up
to five values to request multiple estimates. You can also specify the C=MISE option, which produces
the estimate with a bandwidth that minimizes the approximate mean integrated square error (MISE).
For example, the following statements compute three density estimates:
proc capability;
histogram length / kernel(c=0.5 1.0 mise);
run;
The first two estimates have standardized bandwidths of 0.5 and 1.0, respectively, and the third has a
bandwidth that minimizes the approximate MISE.
You can also use the C= option with the K= option, which specifies the kernel function, to compute
multiple estimates. If you specify more kernel functions than bandwidths, the last bandwidth in the
list is repeated for the remaining estimates. Likewise, if you specify more bandwidths than kernel
functions, the last kernel function is repeated for the remaining estimates. For example, the following
statements compute three density estimates:
proc capability;
histogram length / kernel(c=1 2 3 k=normal quadratic);
run;
The first uses a normal kernel and a bandwidth of 1, the second uses a quadratic kernel and a bandwidth
of 2, and the third uses a quadratic kernel and a bandwidth of 3. See Example 6.12.
If you do not specify a value for c, the bandwidth that minimizes the approximate MISE is used for all
the estimates.
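The pairing rule for C= and K= lists can be sketched in Python. This illustrates the documented matching behavior only and is not SAS syntax; the helper name `pair_kernel_requests` and the use of the string "MISE" to stand for the default bandwidth are hypothetical, while the five-estimate limit and the defaults come from the text.

```python
def pair_kernel_requests(bandwidths, kernels):
    """Match C= bandwidths with K= kernel functions (hypothetical helper).

    The shorter list is extended by repeating its last entry. An empty
    bandwidth list means every estimate uses the MISE bandwidth, and an
    empty kernel list means every estimate uses the default normal kernel.
    """
    bandwidths = list(bandwidths) or ["MISE"]
    kernels = list(kernels) or ["normal"]
    count = max(len(bandwidths), len(kernels))
    if count > 5:
        raise ValueError("at most five kernel density estimates can be requested")
    pairs = []
    for i in range(count):
        # Once a list is exhausted, keep reusing its last entry.
        c = bandwidths[min(i, len(bandwidths) - 1)]
        k = kernels[min(i, len(kernels) - 1)]
        pairs.append((k, c))
    return pairs


# Mirrors kernel(c=1 2 3 k=normal quadratic)
print(pair_kernel_requests([1, 2, 3], ["normal", "quadratic"]))
# [('normal', 1), ('quadratic', 2), ('quadratic', 3)]
```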
CLIPCURVES
scales the vertical axis without taking fitted curves into consideration. Curves that extend above the
tallest histogram bar may be clipped. You can use this option to avoid compression of the histogram
bars due to extremely high fitted curve peaks.
DELTA=value-list
specifies the first shape parameter δ for Johnson SB and Johnson SU density curves requested with the
SB and SU options. Enclose the DELTA= option in parentheses after the SB or SU option. If you do
not specify a value for δ, the procedure calculates an estimate.
EDFNSAMPLES=value
specifies the number of simulation samples used to compute p-values for EDF goodness-of-fit statistics
for density curves requested with the GUMBEL, IGAUSS, PARETO, and RAYLEIGH options. Enclose
the EDFNSAMPLES= option in parentheses after the distribution option. The default value is 500.
EDFSEED=value
specifies an integer value used to start the pseudo-random number generator when creating simulation
samples for computing EDF goodness-of-fit statistic p-values for density curves requested with
the GUMBEL, IGAUSS, PARETO, and RAYLEIGH options. Enclose the EDFSEED= option in
parentheses after the distribution option. By default, the procedure uses a random number seed
generated from reading the time of day from the computer’s clock.
ENDPOINTS
ENDPOINTS=value-list
specifies that histogram interval endpoints, rather than midpoints, are aligned with horizontal axis tick
marks. If you specify ENDPOINTS, the number of histogram intervals is based on the number of
observations by using the method of Terrell and Scott (1985). If you specify ENDPOINTS=value-list,
the values must be listed in increasing order and must be evenly spaced. All observations in the input
data set, as well as any specification limits, must lie between the first and last values specified. The
same value-list is used for all variables.
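A Python sketch of the checks that an ENDPOINTS=value-list must pass follows. The helper `check_endpoints` is hypothetical, and reading "between the first and last values" as inclusive is an assumption.

```python
import math


def check_endpoints(endpoints, data, spec_limits=()):
    """Return True if an ENDPOINTS= list meets the stated rules (hypothetical helper)."""
    if len(endpoints) < 2:
        return False
    steps = [b - a for a, b in zip(endpoints, endpoints[1:])]
    # Values must be listed in increasing order ...
    if any(step <= 0 for step in steps):
        return False
    # ... and must be evenly spaced (compared with a floating-point tolerance).
    if any(not math.isclose(step, steps[0], rel_tol=1e-9) for step in steps):
        return False
    # Every observation and specification limit must lie between the first
    # and last values (inclusive bounds are an assumption).
    lo, hi = endpoints[0], endpoints[-1]
    return all(lo <= value <= hi for value in list(data) + list(spec_limits))


print(check_endpoints([0, 5, 10, 15], data=[1.2, 7.5, 14.9]))  # True
```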
where
θ = threshold parameter
σ = scale parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
The parameter θ must be less than or equal to the minimum data value. You can specify θ with the
THETA= exponential-option. The default value for θ is zero. If you specify THETA=EST, a maximum
likelihood estimate is computed for θ. You can specify σ with the SIGMA= exponential-option. By
default, a maximum likelihood estimate is computed for σ. For example, the following statements fit
an exponential curve with θ = 10 and with a maximum likelihood estimate for σ:
proc capability;
histogram / exponential(theta=10 l=2 color=red);
run;
The curve is red and has a line type of 2. The EXPONENTIAL option can appear only once in a
HISTOGRAM statement. Table 6.24 lists secondary options you can specify with the EXPONENTIAL
option. See “Formulas for Fitted Curves” on page 349.
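The EXPONENTIAL curve equation is not reproduced in this excerpt. The Python sketch below assumes the standard two-parameter exponential density, multiplied by the interval width h and the vertical scaling factor v defined above; both helper names are hypothetical and this is not the procedure's own code.

```python
import math


def vertical_scale(vscale, n):
    """Vertical scaling factor v for the VSCALE= settings (hypothetical helper)."""
    factors = {"COUNT": n, "PERCENT": 100.0, "PROPORTION": 1.0}
    return factors[vscale.upper()]


def exponential_curve(x, theta, sigma, h, v):
    """Height of the fitted exponential curve at x.

    Assumes p(x) = (h*v/sigma) * exp(-(x - theta)/sigma) for x >= theta, else 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < theta:
        return 0.0
    return h * v / sigma * math.exp(-(x - theta) / sigma)


v = vertical_scale("PERCENT", n=100)
print(exponential_curve(10.0, theta=10.0, sigma=2.0, h=0.5, v=v))  # 25.0
```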
FILL
fills areas under a parametric density curve or kernel density estimate with colors and patterns. Enclose
the FILL option in parentheses after a curve option or the KERNEL option, as in the following
statements:
proc capability;
histogram length / normal(fill) cfill=green pfill=solid;
run;
Depending on the area to be filled (outside or between the specification limits), you can specify the
color and pattern with options in the SPEC statement and HISTOGRAM statement, as summarized in
the following table:
If you do not display specification limits, the CFILL= and PFILL= options specify the color and pattern
for the entire area under the curve. Solid fills are used by default if patterns are not specified. You
can specify the FILL option with only one fitted curve. For an example, see Output 6.8.1. Refer to
SAS/GRAPH: Reference for a list of available patterns and colors. If you do not specify the FILL option
but specify the options in the preceding table, the colors and patterns are applied to the corresponding
areas under the histogram.
FITINTERVAL=value
specifies the value of z for the method of percentiles when this method is used to fit a Johnson SB or
Johnson SU distribution. The FITINTERVAL= option is specified in parentheses after the SB or SU
option. The default value of z is 0.524.
FITTOLERANCE=value
specifies the tolerance value for the ratio criterion when the method of percentiles is used to fit a
Johnson SB or Johnson SU distribution. The FITTOLERANCE= option is specified in parentheses
after the SB or SU option. The default value is 0.01.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
The parameter θ for the gamma distribution must be less than the minimum data value. You can
specify θ with the THETA= gamma-option. The default value for θ is 0. If you specify THETA=EST,
a maximum likelihood estimate is computed for θ. In addition, the gamma distribution has a shape
parameter α and a scale parameter σ. You can specify these parameters with the ALPHA= and
SIGMA= gamma-options. By default, maximum likelihood estimates are computed for α and σ.
For example, the following statements fit a gamma curve with θ = 4 and with maximum likelihood
estimates for α and σ:
proc capability;
histogram length / gamma(theta=4);
run;
Note that the maximum likelihood estimate of α is calculated iteratively using the Newton-Raphson
approximation. The ALPHADELTA=, ALPHAINITIAL=, and MAXITER= gamma-options control
the approximation.
The GAMMA option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the GAMMA option. See Example 6.9 and “Formulas for Fitted Curves”
on page 349.
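A Python sketch of the fitted gamma curve height follows. It assumes the standard three-parameter gamma density with threshold θ, scale σ, and shape α, scaled by h and v; the function name is hypothetical and the sketch is not the procedure's own code.

```python
import math


def gamma_curve(x, theta, sigma, alpha, h, v):
    """Assumed scaled gamma density (hypothetical helper).

    p(x) = h*v / (Gamma(alpha) * sigma) * z**(alpha - 1) * exp(-z),
    with z = (x - theta) / sigma, for x > theta; 0 otherwise.
    """
    if sigma <= 0 or alpha <= 0:
        raise ValueError("sigma and alpha must be positive")
    if x <= theta:
        return 0.0
    z = (x - theta) / sigma
    return h * v / (math.gamma(alpha) * sigma) * z ** (alpha - 1) * math.exp(-z)


# Height at x = 6 for theta = 4 with illustrative sigma and alpha values
print(gamma_curve(6.0, theta=4.0, sigma=1.5, alpha=2.0, h=0.5, v=100.0))
```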
GAMMA=value-list
specifies the second shape parameter γ for Johnson SB and Johnson SU density curves requested with
the SB and SU options. Enclose the GAMMA= option in parentheses after the SB or SU option. If
you do not specify a value for γ, the procedure calculates an estimate.
GRID
adds a grid to the histogram. Grid lines are horizontal lines positioned at major tick marks on the
vertical axis.
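$$p(x) = \frac{hv}{\sigma}\, e^{-(x-\mu)/\sigma} \exp\left(-e^{-(x-\mu)/\sigma}\right)$$
where
μ = location parameter
σ = scale parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$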
You can specify values for μ and σ with the MU= and SIGMA= Gumbel-options. By default, maximum
likelihood estimates are computed for μ and σ.
The GUMBEL option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the GUMBEL option. See “Formulas for Fitted Curves” on page 349.
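The Gumbel curve equation can be evaluated directly. The Python sketch below is a hypothetical helper that returns the height of the fitted curve at x for given μ, σ, h, and v; the early return far below μ is a numerical convenience, not part of the equation.

```python
import math


def gumbel_curve(x, mu, sigma, h, v):
    """p(x) = (h*v/sigma) * e^{-(x-mu)/sigma} * exp(-e^{-(x-mu)/sigma}) (hypothetical helper)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    if z < -50:
        # exp(-e^{-z}) is already zero in double precision here; returning
        # early also avoids overflow in math.exp for very negative z.
        return 0.0
    t = math.exp(-z)
    return h * v / sigma * t * math.exp(-t)


# The curve peaks at x = mu with height h*v/(sigma*e).
print(gumbel_curve(5.0, mu=5.0, sigma=2.0, h=1.0, v=100.0))
```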
HANGING
HANG
requests a hanging histogram, as illustrated in Figure 6.12.
You can use the HANGING option with only one fitted density curve. A hanging histogram aligns the
tops of the histogram bars (displayed as lines) with the fitted curve. The lines are positioned at the
midpoints of the histogram bins. A hanging histogram is a goodness-of-fit diagnostic in the sense that
the closer the lines are to the horizontal axis, the better the fit. Hanging histograms are discussed by
Tukey (1977), Wainer (1974), and Velleman and Hoaglin (1981).
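One way to read a hanging histogram numerically is sketched below in Python. It assumes that each bar hangs from the fitted curve at its interval midpoint, so the bottom of the bar sits at the curve height minus the observed bar height; values near zero then indicate good agreement. The helper name and the sample heights are hypothetical.

```python
def hanging_bar_bottoms(midpoints, observed_heights, curve):
    """Bottom of each hanging bar: curve height at the midpoint minus the observed height.

    curve is any callable returning the fitted curve height at a midpoint.
    """
    return [curve(m) - obs for m, obs in zip(midpoints, observed_heights)]


# Illustrative observed bar heights and fitted-curve heights at four midpoints
mids = [1.0, 2.0, 3.0, 4.0]
observed = [2.0, 9.0, 15.0, 8.0]
fitted = {1.0: 3.0, 2.0: 8.5, 3.0: 14.0, 4.0: 9.5}
bottoms = hanging_bar_bottoms(mids, observed, fitted.get)
print(bottoms)                         # [1.0, -0.5, -1.0, 1.5]
print(max(abs(b) for b in bottoms))    # largest departure from the axis
```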
You can specify values for μ and λ with the MU= and LAMBDA= iGauss-options. By default, the
sample mean is used for μ and a maximum likelihood estimate is computed for λ.
The IGAUSS option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the IGAUSS option. See “Formulas for Fitted Curves” on page 349.
INDICES
requests capability indices based on the fitted distribution. Enclose the keyword INDICES in
parentheses after the distribution keyword. See “Indices Using Fitted Curves” on page 367 for
computational details and see Output 6.11.2.
proc capability;
histogram length / kernel(k=quadratic);
run;
You can specify kernel functions for up to five estimates. You can also use the K= option together with
the C= option, which specifies standardized bandwidths. If you specify more kernel functions than
bandwidths, the last bandwidth in the list is repeated for the remaining estimates. Likewise, if you
specify more bandwidths than kernel functions, the last kernel function is repeated for the remaining
estimates. For example, the following statements compute three estimates with bandwidths of 0.5, 1.0,
and 1.5:
proc capability;
histogram length / kernel(c=0.5 1.0 1.5 k=normal quadratic);
run;
The first estimate uses a normal kernel, and the last two estimates use a quadratic kernel. By default, a
normal kernel is used.
Option Description
C= Specifies the smoothing parameter
COLOR= Specifies the color of the curve
FILL Specifies that the area under the curve is to be filled
K= Specifies the type of kernel function
L= Specifies the line style for the curve
LOWER= Specifies the lower bound for the curve
SYMBOL= Specifies the character used for the kernel density curve in line printer plots
UPPER= Specifies the upper bound for the curve
W= Specifies the width of the curve
You can request multiple kernel density estimates on the same histogram by specifying a list of values
for either the C= or K= option. For more information, see the entries for these options. Also see
Output 6.6.1 and “Kernel Density Estimates” on page 360. By default, kernel density estimates are
computed using the AMISE method.
LAMBDA=value
specifies the shape parameter λ for fitted curves requested with the IGAUSS option. Enclose the
LAMBDA= option in parentheses after the IGAUSS distribution keyword. If you do not specify a
value for λ, the procedure calculates a maximum likelihood estimate.
LOGNORMAL< (lognormal-options) >
displays a fitted lognormal density curve on the histogram. The curve equation is
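$$p(x) = \begin{cases} \dfrac{hv}{\sigma\sqrt{2\pi}\,(x-\theta)} \exp\left(-\dfrac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases}$$
where
θ = threshold parameter
ζ = scale parameter
σ = shape parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$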
Note that the lognormal distribution is also referred to as the SL distribution in the Johnson system of
distributions.
The parameter θ for the lognormal distribution must be less than the minimum data value. You
can specify θ with the THETA= lognormal-option. The default value for θ is zero. If you specify
THETA=EST, a maximum likelihood estimate is computed for θ. You can specify the parameters
σ and ζ with the SIGMA= and ZETA= lognormal-options. By default, estimates of σ and ζ are computed
as described in “Lognormal Distribution” on page 353. For example, the following statements fit a
lognormal distribution function with θ = 0 and estimates for σ and ζ:
proc capability;
histogram length / lognormal;
run;
The LOGNORMAL option can appear only once in a HISTOGRAM statement. Table 6.24 lists
secondary options that you can specify with the LOGNORMAL option. See Example 6.9 and
“Formulas for Fitted Curves” on page 349.
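A Python sketch of the fitted lognormal curve height, following the lognormal curve equation for this option (θ threshold, ζ scale, σ shape), is shown below. The helper name and the parameter values in the usage line are hypothetical.

```python
import math


def lognormal_curve(x, theta, zeta, sigma, h, v):
    """Scaled lognormal density (hypothetical helper).

    p(x) = h*v / (sigma*sqrt(2*pi)*(x - theta)) * exp(-(log(x - theta) - zeta)**2 / (2*sigma**2))
    for x > theta; 0 otherwise.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= theta:
        return 0.0
    y = math.log(x - theta)
    return (h * v / (sigma * math.sqrt(2 * math.pi) * (x - theta))
            * math.exp(-((y - zeta) ** 2) / (2 * sigma ** 2)))


print(lognormal_curve(12.0, theta=0.0, zeta=2.5, sigma=0.2, h=0.5, v=100.0))
```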
LOWER=value-list
specifies lower bounds for kernel density estimates requested with the KERNEL option. Enclose the
LOWER= option in parentheses after the KERNEL option. You can specify up to five lower bounds
for multiple kernel density estimates. If you specify more kernel estimates than lower bounds, the last
lower bound is repeated for the remaining estimates.
MAXNBIN=n
specifies the maximum number of bins to be displayed in a comparative histogram. This option is
useful in situations where the scales or ranges of the data distributions differ greatly from cell to cell.
By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is
extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ
considerably, the resulting number of bins may be so great that each cell histogram is scaled into a
narrow region. By limiting the number of bins with the MAXNBIN= option, you can narrow the
window about the data distribution in the key cell. Note that the MAXNBIN= option provides an
alternative to the MAXSIGMAS= option.
MAXSIGMAS=value
limits the number of bins to be displayed to a range of value standard deviations (of the data in the key
cell) above and below the mean of the data in the key cell. This option is useful in situations where the
scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and
midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the
data ranges for the remaining cells. If the cell scales differ considerably, however, the resulting number
of bins may be so great that each cell histogram is scaled into a narrow region. By limiting the number
of bins with the MAXSIGMAS= option, you narrow the window about the data distribution in the key
cell. Note that the MAXSIGMAS= option provides an alternative to the MAXNBIN= option.
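The MAXSIGMAS= window can be sketched in Python as the mean of the key-cell data plus or minus value standard deviations. Using the sample standard deviation, and keeping only bins whose midpoints fall inside the window, are assumptions; the procedure's exact bin-selection rule may differ. Both helper names are hypothetical.

```python
import statistics


def maxsigmas_window(key_cell_data, k):
    """Interval mean +/- k sample standard deviations of the key-cell data."""
    mean = statistics.mean(key_cell_data)
    sd = statistics.stdev(key_cell_data)  # sample standard deviation (assumption)
    return mean - k * sd, mean + k * sd


def midpoints_in_window(midpoints, window):
    """Keep only the bin midpoints that fall inside the window (assumed rule)."""
    lo, hi = window
    return [m for m in midpoints if lo <= m <= hi]


window = maxsigmas_window([8, 10, 12], 3)
print(window)                                        # (4.0, 16.0)
print(midpoints_in_window(range(0, 21, 2), window))  # [4, 6, 8, 10, 12, 14, 16]
```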
MIDPERCENTS
requests a table listing the midpoints and percent of observations in each histogram interval. For
example, the following statements create the table in Figure 6.13:
proc capability;
histogram Length / midpercents;
run;
If you specify the MIDPERCENTS option in parentheses after a density estimate option, a table listing
the midpoints, observed percent of observations, and the estimated percent of the population in each
interval (estimated from the fitted distribution) is printed.
The following statements create the table shown in Figure 6.14:
proc capability;
histogram Length / gamma(theta=3 midpercents);
run;
If you specify
midpoints=2 to 10 by 0.5
then all of the observations and specification limits should fall between 1.75 and 10.25. (Otherwise, a
default list of midpoints is used.) You must use evenly spaced midpoints listed in increasing order.
KEY determines the midpoints for the data in the key cell. The initial number of midpoints
is based on the number of observations in the key cell, using the method of Terrell
and Scott (1985). The procedure extends the midpoint list for the key cell in either
direction as necessary until it spans the data in the remaining cells.
UNIFORM determines the midpoints by using all the observations as if there were no cells. In
other words, the number of midpoints is based on the total sample size by using the
method of Terrell and Scott (1985).
Neither KEY nor UNIFORM applies unless you use the CLASS statement. By default, if you
use a CLASS statement, MIDPOINTS=KEY; however, if the key cell is empty, MIDPOINTS=UNIFORM.
If you do not use a CLASS statement, the procedure computes the midpoints by using the algorithm
described in Terrell and Scott (1985). The default midpoints are primarily applicable to continuous
data that are approximately normally distributed.
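The 1.75 to 10.25 range in the MIDPOINTS= example comes from extending the first and last midpoints by half an interval width (0.5 / 2 = 0.25). A Python sketch of that check follows; the helper names are hypothetical, and treating the range as inclusive is an assumption.

```python
def midpoint_span(midpoints):
    """Lowest and highest values covered by evenly spaced interval midpoints."""
    width = midpoints[1] - midpoints[0]
    return midpoints[0] - width / 2, midpoints[-1] + width / 2


def midpoints_usable(midpoints, data, spec_limits=()):
    """True if all observations and specification limits fall inside the span."""
    lo, hi = midpoint_span(midpoints)
    return all(lo <= x <= hi for x in list(data) + list(spec_limits))


# midpoints=2 to 10 by 0.5
mids = [2 + 0.5 * i for i in range(17)]
print(midpoint_span(mids))  # (1.75, 10.25)
```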
If you produce traditional graphics and use the MIDPOINTS= and HAXIS= options, you can use the
ORDER= option in the AXIS statement you specified with the HAXIS= option. However, for the tick
mark labels to coincide with the histogram interval midpoints, the range of the ORDER= list must
encompass the range of the MIDPOINTS= list, as illustrated in the following statements:
proc capability;
histogram length / midpoints=20 to 80 by 10
haxis=axis1;
axis1 length=6 in order=10 20 30 40 50 60 70 80 90;
run;
MIDPTAXIS=name
is an alias for the HAXIS= option.
MU=value-list
specifies the parameter μ for fitted curves requested with the GUMBEL, IGAUSS, and NORMAL
options. Enclose the MU= option in parentheses after the distribution keyword. For the normal and
inverse Gaussian distributions, the default value of μ is the sample mean. If you do not specify a value
for μ for the Gumbel distribution, the procedure calculates a maximum likelihood estimate.
NENDPOINTS=n
specifies the number of histogram interval endpoints and causes the endpoints, rather than interval
midpoints, to be aligned with horizontal axis tick marks.
NMIDPOINTS=n
specifies the number of histogram intervals.
NOBARS
suppresses drawing of histogram bars. This option is useful when you want to display fitted curves
only.
NOCURVELEGEND
NOCURVEL
suppresses the portion of the legend for fitted curves. If you use the INSET statement to display
information about the fitted curve on the histogram, you can use the NOCURVELEGEND option to
prevent the information about the fitted curve from being repeated in a legend at the bottom of the
histogram. See Output 6.15.1.
NOLEGEND
suppresses legends for specification limits, fitted curves, and hidden observations. See Example 6.13.
Specifying the NOLEGEND option is equivalent to specifying LEGEND=NONE.
NOPLOT
suppresses the creation of a plot. Use the NOPLOT option when you want only to print summary
statistics for a fitted density or create either an OUTFIT= or an OUTHISTOGRAM= data set. See
Example 6.11.
NOPRINT
suppresses printed output summarizing the fitted curve. Enclose the NOPRINT option in parentheses
following the distribution option. See “Customizing a Histogram” on page 316 for an example.
where
μ = mean
σ = standard deviation (σ > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
Note that the normal distribution is also referred to as the SN distribution in the Johnson system of
distributions.
You can specify values for μ and σ with the MU= and SIGMA= normal-options, as shown in the
following statements:
proc capability;
histogram length / normal(mu=14 sigma=0.05);
run;
By default, the sample mean and sample standard deviation are used for μ and σ. The NORMAL
option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary options that you
can specify with the NORMAL option. See Figure 6.10 and “Formulas for Fitted Curves” on page 349.
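For reference, a Python sketch of the fitted normal curve height follows, assuming the usual normal density scaled by h and v. The helper name is hypothetical, and the interval width h = 0.02 in the usage line is an illustrative value rather than one computed by the procedure.

```python
import math


def normal_curve(x, mu, sigma, h, v):
    """Assumed scaled normal density (hypothetical helper).

    p(x) = h*v / (sigma*sqrt(2*pi)) * exp(-(x - mu)**2 / (2*sigma**2))
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return h * v / (sigma * math.sqrt(2 * math.pi)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


# Mirrors normal(mu=14 sigma=0.05) with VSCALE=PERCENT and an illustrative h
print(normal_curve(14.0, mu=14.0, sigma=0.05, h=0.02, v=100.0))
```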
NOSPECLEGEND
NOSPECL
suppresses the portion of the legend for specification limit reference lines. See Figure 6.11.
NOTABCONTENTS
suppresses the table of contents entries for tables produced by the HISTOGRAM statement. See
the section “ODS Tables” on page 372 for descriptions of the tables produced by the HISTOGRAM
statement.
OPTBOUNDRANGE=value
defines the sampling range for each parameter during maximum likelihood estimation for the Johnson
SU distribution. PROC CAPABILITY computes initial estimates for each parameter by using the
method of percentiles. The value determines the range of parameter values around the initial estimate
that can be sampled for local optimization starting values. The default is 100.
OPTMAXITER=value
limits the number of iterations that are used by the optimizer in maximum likelihood estimation for the
Johnson SU distribution. The default is 500.
OPTMAXSTARTS=N
defines the maximum number of starting points to be used for local optimization in maximum likelihood
estimation for the Johnson SU distribution. That is, no more than N local optimizations are used in the
multistart algorithm. The default value is 100.
OPTPRINT
prints the iteration history for the Johnson SU distribution maximum likelihood estimation.
OPTSEED=value
specifies a positive integer seed for generating random number sequences in Johnson SU distribution
maximum likelihood estimation. You can use this option to replicate results from different runs.
OPTTOLERANCE=value
specifies the tolerance for declaring optimality in maximum likelihood estimation for the Johnson SU
distribution. The default value is 1E–8.
OUTFIT=SAS-data-set
creates a SAS data set that contains parameter estimates for fitted curves and related goodness-of-fit
information. See “Output Data Sets” on page 369.
OUTHISTOGRAM=SAS-data-set
OUTHIST=SAS-data-set
creates a SAS data set that contains information about histogram intervals. Specifically, the data set
contains the midpoints of the histogram intervals, the observed percent of observations in each interval,
and the estimated percent of observations in each interval (estimated from each of the specified fitted
curves). See “Output Data Sets” on page 369.
OUTKERNEL=SAS-data-set
creates a SAS data set containing information about kernel density estimates requested with the
KERNEL option. See “OUTKERNEL= Output Data Set” on page 372 for details.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
The parameter θ must be less than the minimum data value. You can specify θ with the THETA=
Pareto-option. The default value for θ is zero. If you specify THETA=EST, a maximum likelihood
estimate is computed for θ. In addition, the generalized Pareto distribution has a shape parameter α and
a scale parameter σ. You can specify these parameters with the ALPHA= and SIGMA= Pareto-options.
By default, maximum likelihood estimates are computed for α and σ.
The PARETO option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the PARETO option. See “Formulas for Fitted Curves” on page 349.
PCTAXIS=name|value-list
is an alias for the VAXIS= option.
PERCENTS=value-list
PERCENT=value-list
specifies a list of percents for which quantiles calculated from the data and quantiles estimated from
the fitted curve are tabulated. The percents must be between 0 and 100. Enclose the PERCENTS=
option in parentheses after the curve option. The default percents are 1, 5, 10, 25, 50, 75, 90, 95, and
99.
For example, the following statements create the table shown in Figure 6.15:
proc capability;
histogram Length / lognormal(percents=1 3 5 95 97 99);
run;
Figure 6.15 Estimated and Observed Quantiles for the Lognormal Curve
[Table: Fitted Lognormal Distribution for Length (Attachment Point Offset in mm)]
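Quantiles estimated from a fitted lognormal curve, like those tabulated by PERCENTS=, can be sketched in Python under the assumption that log(X − θ) is normally distributed with mean ζ and standard deviation σ. The helper name is hypothetical, and the parameter values in the usage loop are made up for illustration.

```python
import math
from statistics import NormalDist


def lognormal_quantile(percent, theta, zeta, sigma):
    """Quantile of the fitted lognormal curve: theta + exp(zeta + sigma * z_p)."""
    if not 0 < percent < 100:
        raise ValueError("percent must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(percent / 100)
    return theta + math.exp(zeta + sigma * z)


# percents=1 3 5 95 97 99 with illustrative parameter values
for p in (1, 3, 5, 95, 97, 99):
    print(p, round(lognormal_quantile(p, theta=0.0, zeta=2.0, sigma=0.25), 3))
```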
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
The parameter θ must be less than or equal to the minimum data value. You can specify θ and σ with
the THETA= and the SIGMA= power-options. The default values for θ and σ are 0 and 1, respectively.
You can specify THETA=EST and SIGMA=EST to request maximum likelihood estimates for θ and
σ.
In addition, the power function distribution has a shape parameter α. You can specify α with the
ALPHA= power-option. By default, a maximum likelihood estimate is computed for α.
The POWER option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the POWER option. See “Formulas for Fitted Curves” on page 349.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
The parameter θ must be less than or equal to the minimum data value. You can specify θ with the
THETA= Rayleigh-option. The default value for θ is zero. If you specify THETA=EST, a maximum
likelihood estimate is computed for θ. You can specify σ with the SIGMA= Rayleigh-option. By
default, a maximum likelihood estimate is computed for σ.
The RAYLEIGH option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options you can specify with the RAYLEIGH option. See “Formulas for Fitted Curves” on page 349.
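The RAYLEIGH curve equation is not reproduced in this excerpt. The Python sketch below assumes the standard two-parameter Rayleigh density with threshold θ and scale σ, scaled by h and v; the helper name is hypothetical.

```python
import math


def rayleigh_curve(x, theta, sigma, h, v):
    """Assumed scaled Rayleigh density (hypothetical helper).

    p(x) = h*v * (x - theta)/sigma**2 * exp(-(x - theta)**2 / (2*sigma**2)) for x >= theta; 0 otherwise.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < theta:
        return 0.0
    d = x - theta
    return h * v * d / sigma ** 2 * math.exp(-(d ** 2) / (2 * sigma ** 2))


# The density peaks at x = theta + sigma.
print(rayleigh_curve(3.0, theta=1.0, sigma=2.0, h=0.5, v=100.0))
```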
RTINCLUDE
includes the right endpoint of each histogram interval in that interval. By default, the left endpoint is
included in the histogram interval.
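The effect of RTINCLUDE on which interval receives a value lying exactly on an endpoint can be sketched in Python with the standard `bisect` module. The helper is hypothetical and does not model how the procedure treats the outermost endpoints; values that fall outside every interval simply raise an error.

```python
import bisect


def bin_index(value, endpoints, rtinclude=False):
    """Index of the histogram interval that contains value (hypothetical helper)."""
    if rtinclude:
        # Intervals (a, b]: the right endpoint belongs to the interval.
        i = bisect.bisect_left(endpoints, value) - 1
    else:
        # Intervals [a, b): the left endpoint belongs to the interval (default).
        i = bisect.bisect_right(endpoints, value) - 1
    if not 0 <= i < len(endpoints) - 1:
        raise ValueError("value lies outside the histogram intervals")
    return i


edges = [0, 5, 10, 15]
print(bin_index(5, edges))                  # 1, the interval [5, 10)
print(bin_index(5, edges, rtinclude=True))  # 0, the interval (0, 5]
```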
where
The SB distribution is bounded below by the parameter θ and above by the value θ + σ. The parameter θ
must be less than the minimum data value. You can specify θ with the THETA= SB-option, or you
can request that θ be estimated with the THETA=EST SB-option. The default value for θ is zero.
The sum θ + σ must be greater than the maximum data value. The default value for σ is one. You
can specify σ with the SIGMA= SB-option, or you can request that σ be estimated with the
SIGMA=EST SB-option. You can specify δ with the DELTA= SB-option, and you can specify γ with the
GAMMA= SB-option. Note that the SB-options are given in parentheses after the SB option.
By default, the method of percentiles is used to estimate the parameters of the SB distribution.
Alternatively, you can request the method of moments or the method of maximum likelihood with the
FITMETHOD = MOMENTS or FITMETHOD = MLE options, respectively. Consider the following
example:
proc capability;
histogram length / sb;
histogram length / sb( theta=est sigma=est );
histogram length / sb( theta=0.5 sigma=8.4
delta=0.8 gamma=-0.6 );
run;
The first HISTOGRAM statement fits an SB distribution with default values of θ = 0 and σ = 1 and
with percentile-based estimates for δ and γ. The second HISTOGRAM statement estimates all four
parameters with the method of percentiles. The third HISTOGRAM statement displays an SB curve
with specified values for all four parameters.
The SB option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary options
you can specify with the SB option.
SIGMA=value-list
specifies the parameter σ for fitted curves requested with the BETA, EXPONENTIAL, GAMMA,
GUMBEL, LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, SB, SU, and WEIBULL
options. Enclose the SIGMA= option in parentheses after the distribution keyword. Table 6.28
summarizes the use of the SIGMA= option.
If you specify SIGMA=EST, an estimate is computed for σ. For syntax examples, see the entries for
the distribution options.
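$$p(x) = \begin{cases} \dfrac{\delta hv}{\sigma\sqrt{2\pi}}\, \dfrac{1}{\sqrt{1+((x-\theta)/\sigma)^2}} \exp\left(-\dfrac{1}{2}\left[\gamma + \delta \sinh^{-1}\left(\dfrac{x-\theta}{\sigma}\right)\right]^2\right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases}$$
where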
You can specify the parameters with the THETA=, SIGMA=, DELTA=, and GAMMA= SU -options,
which are enclosed in parentheses after the SU option. If you do not specify these parameters, they are
estimated.
By default, the method of percentiles is used to estimate the parameters of the SU distribution.
Alternatively, you can request the method of moments or the method of maximum likelihood with the
FITMETHOD = MOMENTS or FITMETHOD = MLE options, respectively. Consider the following
example:
proc capability;
histogram length / su;
histogram length / su( theta=0.5 sigma=8.4
delta=0.8 gamma=-0.6 );
run;
The first HISTOGRAM statement estimates all four parameters with the method of percentiles. The
second HISTOGRAM statement displays an SU curve with specified values for all four parameters.
The SU option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary options
you can specify with the SU option.
THETA=value-list
THRESHOLD=value-list
specifies the lower threshold parameter θ for curves requested with the BETA, EXPONENTIAL,
GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, SB, and WEIBULL options, and the
location parameter θ for curves requested with the SU option. Enclose the THETA= option in
parentheses after the curve option. See Example 6.8. The default value is zero. If you specify
THETA=EST, an estimate is computed for θ.
UPPER=value-list
specifies upper bounds for kernel density estimates requested with the KERNEL option. Enclose the
UPPER= option in parentheses after the KERNEL option. You can specify up to five upper bounds for
multiple kernel density estimates. If you specify more kernel estimates than upper bounds, the last
upper bound is repeated for the remaining estimates.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
c = shape parameter (c > 0)
h = width of histogram interval
v = vertical scaling factor
and
$$v = \begin{cases} n \text{ (the sample size)} & \text{for VSCALE=COUNT} \\ 100 & \text{for VSCALE=PERCENT} \\ 1 & \text{for VSCALE=PROPORTION} \end{cases}$$
The parameter θ must be less than the minimum data value. You can specify θ with the THETA=
Weibull-option. The default value for θ is zero. If you specify THETA=EST, a maximum likelihood
estimate is computed for θ. You can specify σ and c with the SIGMA= and C= Weibull-options.
By default, maximum likelihood estimates are computed for c and σ. For example, the following
statements fit a Weibull distribution with θ = 15 and with maximum likelihood estimates for σ and c:
proc capability;
histogram length / weibull(theta=15);
run;
Note that the maximum likelihood estimate of c is calculated iteratively using the Newton-Raphson
approximation. The CDELTA=, CINITIAL=, and MAXITER= Weibull-options control the approximation.
The WEIBULL option can appear only once in a HISTOGRAM statement. Table 6.24 lists secondary
options that you can specify with the WEIBULL option. See Example 6.9 and “Formulas for Fitted
Curves” on page 349.
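The WEIBULL curve equation is not reproduced in this excerpt. The Python sketch below assumes the standard three-parameter Weibull density with threshold θ, scale σ, and shape c, scaled by h and v; the helper name is hypothetical. With c = 1 it reduces to the exponential curve.

```python
import math


def weibull_curve(x, theta, sigma, c, h, v):
    """Assumed scaled Weibull density (hypothetical helper).

    p(x) = h*v * (c/sigma) * z**(c - 1) * exp(-z**c), z = (x - theta)/sigma,
    for x > theta; 0 otherwise.
    """
    if sigma <= 0 or c <= 0:
        raise ValueError("sigma and c must be positive")
    if x <= theta:
        return 0.0
    z = (x - theta) / sigma
    return h * v * c / sigma * z ** (c - 1) * math.exp(-(z ** c))


# Mirrors weibull(theta=15) with illustrative sigma and c values
print(weibull_curve(17.0, theta=15.0, sigma=2.0, c=1.5, h=0.5, v=100.0))
```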
ZETA=value-list
specifies a value for the scale parameter ζ for lognormal density curves requested with the
LOGNORMAL option. Enclose the ZETA= option in parentheses after the LOGNORMAL option. By default,
the procedure calculates a maximum likelihood estimate for ζ. You can specify the SCALE= option as
an alias for the ZETA= option.
BMCFILL=color
specifies the fill color for a box-and-whisker plot in a bottom margin requested with the BMPLOT=
option. By default, the box-and-whisker plot is not filled.
BMCFRAME=color
specifies the color for filling the frame of a bottom margin plot requested with the BMPLOT= option.
By default, this area is not filled.
BMCOLOR=color
specifies the color of a carpet plot, or the outline color of a box-and-whisker plot, in a bottom margin
plot requested with the BMPLOT= option.
BMMARGIN=height
specifies the height in screen percentage units of a bottom margin plot requested with the BMPLOT=
option. By default, a bottom margin plot occupies 15 percent of the vertical display space.
CBARLINE=color
specifies the color of the outline of histogram bars. This option overrides the C= option in the
SYMBOL1 statement.
CFILL=color
specifies a color used to fill the bars of the histogram (or the area under a fitted curve if you also
specify the FILL option). See the entries for the FILL and PFILL= options for additional details. See
Figure 6.11 and Output 6.8.1. Refer to SAS/GRAPH: Reference for a list of colors. By default, bars are
filled with an appropriate color from the ODS style.
CGRID=color
specifies the color for grid lines requested with the GRID option. By default, grid lines are the same
color as the axes. If you use CGRID=, you do not need to specify the GRID option.
CLIPREF
draws reference lines requested with the HREF= and VREF= options behind the histogram bars. By
default, reference lines are drawn in front of the histogram bars.
CLIPSPEC=CLIP | NOFILL
specifies that histogram bars are clipped at the upper and lower specification limit lines when there are
no observations outside the specification limits. The bar intersecting the lower specification limit is
clipped if there are no observations less than the lower limit; the bar intersecting the upper specification
limit is clipped if there are no observations greater than the upper limit. If you specify CLIPSPEC=CLIP,
the histogram bar is truncated at the specification limit. If you specify CLIPSPEC=NOFILL, the portion
of a filled histogram bar outside the specification limit is left unfilled. Specifying CLIPSPEC=NOFILL
when histogram bars are not filled has no effect.
CURVELEGEND=name | NONE
specifies the name of a LEGEND statement describing the legend for specification limits and fitted
curves. Specifying CURVELEGEND=NONE suppresses the legend for fitted curves; this is equivalent
to specifying the NOCURVELEGEND option.
FRONTREF
draws reference lines requested with the HREF= and VREF= options in front of the histogram bars.
When the NOGSTYLE system option is specified, reference lines are drawn behind the histogram bars
by default, and can be obscured by them.
HOFFSET=value
specifies the offset in percent screen units at both ends of the horizontal axis. Specify HOFFSET=0 to
eliminate the default offset.
INTERBAR=value
specifies the horizontal space in percent screen units between histogram bars. By default, the bars are
contiguous.
LEGEND=name | NONE
specifies the name of a LEGEND statement describing the legend for specification limit reference lines
and fitted curves. Specifying LEGEND=NONE suppresses all legend information and is equivalent to
specifying the NOLEGEND option.
LGRID=n
specifies the line type for the grid requested with the GRID option. If you use the LGRID= option, you
do not need to specify the GRID option. The default is 1, which produces a solid line.
PFILL=pattern
specifies a pattern used to fill the bars of the histograms (or the areas under a fitted curve if you also
specify the FILL option). See the entries for the CFILL= and FILL options for additional details. Refer
to SAS/GRAPH: Reference for a list of pattern values. By default, the bars and curve areas are not
filled.
SPECLEGEND=name | NONE
specifies the name of a LEGEND statement describing the legend for specification limits and fitted curves. Specifying SPECLEGEND=NONE, which suppresses the portion of the legend for specification limit reference lines, is equivalent to specifying the NOSPECLEGEND option.
VOFFSET=value
specifies the offset in percent screen units at the upper end of the vertical axis.
WBARLINE=n
specifies the width of bar outlines. By default, n = 1.
WGRID=n
specifies the width of the grid lines requested with the GRID option. By default, grid lines are the
same width as the axes. If you use the WGRID= option, you do not need to specify the GRID option.
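Beta Distribution
The fitted density function is

$$
p(x) = \begin{cases}
hv\,\dfrac{(x-\theta)^{\alpha-1}(\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}} & \text{for } \theta < x < \theta+\sigma \\
0 & \text{for } x \le \theta \text{ or } x \ge \theta+\sigma
\end{cases}
$$

where

$$
B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}
$$

and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$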
NOTE: This notation is consistent with that of other distributions that you can fit with the HISTOGRAM statement. However, many texts, including Johnson, Kotz, and Balakrishnan (1995), write the beta density function as

$$
p(x) = \begin{cases}
\dfrac{(x-a)^{p-1}(b-x)^{q-1}}{B(p,q)\,(b-a)^{p+q-1}} & \text{for } a < x < b \\
0 & \text{for } x \le a \text{ or } x \ge b
\end{cases}
$$

The two notations are related as follows:

$$
\sigma = b - a, \qquad \theta = a, \qquad \alpha = p, \qquad \beta = q
$$
The range of the beta distribution is bounded below by a threshold parameter θ = a and above by θ + σ = b. If you specify a fitted beta curve by using the BETA option, θ must be less than the minimum data value, and θ + σ must be greater than the maximum data value. You can specify θ and σ with the THETA= and SIGMA= beta-options in parentheses after the keyword BETA. By default, σ = 1 and θ = 0. If you specify THETA=EST and SIGMA=EST, maximum likelihood estimates are computed for θ and σ.
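The following Python sketch evaluates the fitted beta curve in both notations, to make the two parameterizations concrete. The function names are hypothetical. The curve includes the h·v factor from the density formula, so it integrates to h·v instead of 1. The example values θ = 32 and σ = 180 correspond to a = 32 and b = 212.

```python
import math


def beta_curve(x, theta, sigma, alpha, beta, h=1.0, v=1.0):
    """Fitted beta curve p(x) in the (theta, sigma, alpha, beta) notation."""
    if x <= theta or x >= theta + sigma:
        return 0.0
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_density = ((alpha - 1.0) * math.log(x - theta)
                   + (beta - 1.0) * math.log(theta + sigma - x)
                   - log_b
                   - (alpha + beta - 1.0) * math.log(sigma))
    return h * v * math.exp(log_density)


def beta_jkb(x, a, b, p, q):
    """Same density in the Johnson, Kotz, and Balakrishnan (a, b, p, q) notation."""
    if x <= a or x >= b:
        return 0.0
    log_b = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.exp((p - 1.0) * math.log(x - a) + (q - 1.0) * math.log(b - x)
                    - log_b - (p + q - 1.0) * math.log(b - a))


# sigma = b - a, theta = a, alpha = p, beta = q
print(beta_curve(100.0, 32.0, 180.0, 2.5, 3.0), beta_jkb(100.0, 32.0, 212.0, 2.5, 3.0))
```

The code works with logarithms of the gamma function (math.lgamma) so that B(α, β) does not overflow for large shape parameters.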
In addition, you can specify α and β with the ALPHA= and BETA= beta-options, respectively. By default, the procedure calculates maximum likelihood estimates for α and β. For example, to fit a beta density curve to a set of data bounded below by 32 and above by 212 with maximum likelihood estimates for α and β, use the following statement (here σ = 212 − 32 = 180):

histogram / beta(theta=32 sigma=180);
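Exponential Distribution
The fitted density function is

$$
p(x) = \begin{cases}
\dfrac{hv}{\sigma}\exp\left(-\dfrac{x-\theta}{\sigma}\right) & \text{for } x \ge \theta \\
0 & \text{for } x < \theta
\end{cases}
$$

where
θ = threshold parameter
σ = scale parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor, and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$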
The threshold parameter θ must be less than or equal to the minimum data value. You can specify θ with the THRESHOLD= exponential-option. By default, θ = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for θ. In addition, you can specify σ with the SCALE= exponential-option. By default, the procedure calculates a maximum likelihood estimate for σ. Note that some authors define the scale parameter as 1/σ.
The exponential distribution is a special case of both the gamma distribution (with α = 1) and the Weibull distribution (with c = 1). A related distribution is the extreme value distribution. If Y = exp(−X) has an exponential distribution, then X has an extreme value distribution.
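A short Python sketch can confirm the special-case relationships numerically. The density functions are hypothetical names and omit the h·v factor. The scale estimate assumes θ is known, in which case the maximum likelihood estimate of σ is the mean excess over θ. That is a standard result, not a description of the procedure's code.

```python
import math


def exponential_pdf(x, theta, sigma):
    return math.exp(-(x - theta) / sigma) / sigma if x >= theta else 0.0


def gamma_pdf(x, theta, sigma, alpha):
    if x <= theta:
        return 0.0
    z = (x - theta) / sigma
    return math.exp((alpha - 1.0) * math.log(z) - z - math.lgamma(alpha)) / sigma


def weibull_pdf(x, theta, sigma, c):
    if x <= theta:
        return 0.0
    z = (x - theta) / sigma
    return (c / sigma) * z ** (c - 1.0) * math.exp(-z ** c)


def exponential_sigma_mle(data, theta=0.0):
    """MLE of sigma when theta is known: the mean of x - theta."""
    return sum(x - theta for x in data) / len(data)


# With alpha = 1 (gamma) or c = 1 (Weibull), all three curves coincide.
for x in (2.5, 3.0, 6.7):
    print(exponential_pdf(x, 2.0, 1.5), gamma_pdf(x, 2.0, 1.5, 1.0), weibull_pdf(x, 2.0, 1.5, 1.0))
```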
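Gamma Distribution
The fitted density function is

$$
p(x) = \begin{cases}
\dfrac{hv}{\Gamma(\alpha)\,\sigma}\left(\dfrac{x-\theta}{\sigma}\right)^{\alpha-1}\exp\left(-\dfrac{x-\theta}{\sigma}\right) & \text{for } x > \theta \\
0 & \text{for } x \le \theta
\end{cases}
$$

where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
h = width of histogram interval
v = vertical scaling factor, and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$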
The threshold parameter θ must be less than the minimum data value. You can specify θ with the THRESHOLD= gamma-option. By default, θ = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for θ. In addition, you can specify σ and α with the SCALE= and ALPHA= gamma-options. By default, the procedure calculates maximum likelihood estimates for σ and α.
The gamma distributions are also referred to as Pearson Type III distributions, and they include the chi-square, exponential, and Erlang distributions. The probability density function for the chi-square distribution is

$$
p(x) = \begin{cases}
\dfrac{1}{2\,\Gamma(\nu/2)}\left(\dfrac{x}{2}\right)^{\nu/2-1}\exp\left(-\dfrac{x}{2}\right) & \text{for } x > 0 \\
0 & \text{for } x \le 0
\end{cases}
$$

Notice that this is a gamma distribution with α = ν/2, σ = 2, and θ = 0. The exponential distribution is a gamma distribution with α = 1, and the Erlang distribution is a gamma distribution with α being a positive integer. A related distribution is the Rayleigh distribution. If

$$
R = \frac{\max(X_1,\ldots,X_n)}{\min(X_1,\ldots,X_n)}
$$

where the $X_i$ are independent $\chi^2_\nu$ variables, then log R is distributed with a $\chi_\nu$ distribution having a probability density function of

$$
p(x) = \begin{cases}
\left[2^{\nu/2-1}\,\Gamma(\nu/2)\right]^{-1} x^{\nu-1}\exp\left(-\dfrac{x^2}{2}\right) & \text{for } x > 0 \\
0 & \text{for } x \le 0
\end{cases}
$$
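The chi-square case can be checked numerically with a short Python sketch. The function names are hypothetical and the h·v factor is omitted. The gamma density with α = ν/2, σ = 2, and θ = 0 should match the chi-square density for every ν.

```python
import math


def gamma_pdf(x, theta, sigma, alpha):
    """Gamma density with threshold theta, scale sigma, and shape alpha."""
    if x <= theta:
        return 0.0
    z = (x - theta) / sigma
    return z ** (alpha - 1.0) * math.exp(-z) / (math.gamma(alpha) * sigma)


def chi_square_pdf(x, nu):
    """Chi-square density with nu degrees of freedom, as written above."""
    if x <= 0.0:
        return 0.0
    return (x / 2.0) ** (nu / 2.0 - 1.0) * math.exp(-x / 2.0) / (2.0 * math.gamma(nu / 2.0))


for nu in (1, 3, 6):
    for x in (0.3, 2.0, 7.5):
        print(nu, x, gamma_pdf(x, 0.0, 2.0, nu / 2.0), chi_square_pdf(x, nu))
```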
Gumbel Distribution
The fitted density function is

$$
p(x) = \frac{hv}{\sigma}\,e^{-(x-\mu)/\sigma}\exp\left(-e^{-(x-\mu)/\sigma}\right)
$$

where
μ = location parameter
σ = scale parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor, and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$
You can specify μ and σ with the MU= and SIGMA= Gumbel-options, respectively. By default, the procedure calculates maximum likelihood estimates for these parameters.
NOTE: The Gumbel distribution is also referred to as the Type 1 extreme value distribution.
NOTE: The random variable X has the Gumbel (Type 1 extreme value) distribution if and only if exp(−X) has a Weibull distribution and exp(−(X − μ)/σ) has the standard exponential distribution.
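The following Python sketch, with hypothetical names, illustrates the second note. Drawing X by inverting the Gumbel distribution function and then transforming it with exp(−(X − μ)/σ) gives −log u. That is a standard exponential variate when u is uniform on (0, 1).

```python
import math


def gumbel_pdf(x, mu, sigma, h=1.0, v=1.0):
    """Fitted Gumbel curve p(x)."""
    z = math.exp(-(x - mu) / sigma)
    return h * v / sigma * z * math.exp(-z)


def gumbel_cdf(x, mu, sigma):
    return math.exp(-math.exp(-(x - mu) / sigma))


def gumbel_from_uniform(u, mu, sigma):
    """Inverse of gumbel_cdf: maps u in (0, 1) to a Gumbel(mu, sigma) value."""
    return mu - sigma * math.log(-math.log(u))


mu, sigma = 2.0, 1.5
grid = [(k + 0.5) / 20000 for k in range(20000)]  # evenly spaced u in (0, 1)
w = [math.exp(-(gumbel_from_uniform(u, mu, sigma) - mu) / sigma) for u in grid]
print(sum(w) / len(w))  # average of -log(u) over the grid, close to the exponential mean 1
```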
$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$

The location parameter μ has to be greater than zero. You can specify μ with the MU= iGauss-option. In addition, you can specify the shape parameter λ with the LAMBDA= iGauss-option. By default, the procedure uses the sample mean for μ and calculates a maximum likelihood estimate for λ.
NOTE: The special case where μ = 1 and λ = φ corresponds to the Wald distribution.
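Lognormal Distribution
The fitted density function is

$$
p(x) = \begin{cases}
\dfrac{hv}{\sigma\sqrt{2\pi}\,(x-\theta)}\exp\left(-\dfrac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right) & \text{for } x > \theta \\
0 & \text{for } x \le \theta
\end{cases}
$$

where
θ = threshold parameter
ζ = scale parameter (−∞ < ζ < ∞)
σ = shape parameter (σ > 0)
h = width of histogram interval
v = vertical scaling factor, and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$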
The threshold parameter θ must be less than the minimum data value. You can specify θ with the THRESHOLD= lognormal-option. By default, θ = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for θ. You can specify ζ and σ with the SCALE= and SHAPE= lognormal-options, respectively. By default, the procedure calculates an estimate of ζ as

$$
\hat{\zeta} = \frac{1}{n}\sum_{i=1}^{n}\log(x_i - \theta)
$$

and an estimate of σ as

$$
\hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}\left(\log(x_i - \theta) - \hat{\zeta}\right)^2}{n-1}}
$$
NOTE: The lognormal distribution is also referred to as the SL distribution in the Johnson system of distributions.
NOTE: This book uses σ to denote the shape parameter of the lognormal distribution, whereas σ is used to denote the scale parameter of the beta, exponential, gamma, Gumbel, inverse Gaussian, normal, generalized Pareto, power function, Rayleigh, and Weibull distributions. The use of σ to denote the lognormal shape parameter is based on the fact that (1/σ)(log(X − θ) − ζ) has a standard normal distribution if X is lognormally distributed.
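The estimates and the note can be illustrated with a Python sketch. The names are hypothetical and the h·v factor is omitted from the density. The sketch computes ζ̂ and σ̂ exactly as written above, including the n − 1 divisor for σ̂. It also checks that the lognormal density equals a normal density in log(x − θ) divided by x − θ. That identity is the change-of-variables form of the statement that (1/σ)(log(X − θ) − ζ) is standard normal.

```python
import math
from statistics import NormalDist


def lognormal_estimates(data, theta=0.0):
    """zeta-hat and sigma-hat from the formulas above (n - 1 divisor for sigma)."""
    logs = [math.log(x - theta) for x in data]
    n = len(logs)
    zeta = sum(logs) / n
    sigma = math.sqrt(sum((t - zeta) ** 2 for t in logs) / (n - 1))
    return zeta, sigma


def lognormal_pdf(x, theta, zeta, sigma):
    if x <= theta:
        return 0.0
    y = math.log(x - theta) - zeta
    return math.exp(-y * y / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi) * (x - theta))


data = [5.0 + math.exp(k) for k in (0.0, 1.0, 2.0)]  # logs of x - 5 are 0, 1, 2
zeta_hat, sigma_hat = lognormal_estimates(data, theta=5.0)
print(zeta_hat, sigma_hat)
```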
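Normal Distribution
The fitted density function is

$$
p(x) = \frac{hv}{\sigma\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right) \quad \text{for } -\infty < x < \infty
$$

where
μ = mean
σ = standard deviation (σ > 0)
h = width of histogram interval
v = vertical scaling factor, and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$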
You can specify μ and σ with the MU= and SIGMA= normal-options, respectively. By default, the procedure estimates μ with the sample mean and σ with the sample standard deviation.
You can use the DATA step function PROBIT to compute normal quantiles and the DATA step function PROBNORM to compute probabilities.
NOTE: The normal distribution is also referred to as the SN distribution in the Johnson system of distributions.
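Outside SAS, Python's standard statistics.NormalDist class offers analogues of these functions: inv_cdf plays the role of PROBIT, and cdf plays the role of PROBNORM. The sketch below also evaluates the fitted curve with the h·v scaling, using the sample mean and the n − 1 sample standard deviation, as in the default estimates. The function name normal_curve and the data values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_curve(x, mu, sigma, h, v):
    """Fitted normal curve p(x) = h * v * (normal density at x)."""
    return h * v * NormalDist(mu, sigma).pdf(x)


data = [9.6, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]
mu_hat, sigma_hat = mean(data), stdev(data)

z = NormalDist().inv_cdf(0.975)   # analogue of PROBIT(0.975)
p = NormalDist().cdf(z)           # analogue of PROBNORM(z), back to 0.975
# With VSCALE=COUNT, v = n, so the curve is on the same scale as the bar counts.
peak = normal_curve(mu_hat, mu_hat, sigma_hat, h=0.2, v=len(data))
print(round(z, 4), round(p, 4), round(peak, 3))
```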
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$
The support of the distribution is x > θ for α ≤ 0 and θ < x < θ + σ/α for α > 0.
NOTE: Special cases of the generalized Pareto distribution with α = 0 and α = 1 correspond respectively to the exponential distribution with scale parameter σ and the uniform distribution on the interval (θ, θ + σ).
The threshold parameter θ must be less than the minimum data value. You can specify θ with the THETA= Pareto-option. By default, θ = 0. You can also specify α and σ with the ALPHA= and SIGMA= Pareto-options, respectively. By default, the procedure calculates maximum likelihood estimates for these parameters.
NOTE: Maximum likelihood estimation of the parameters works well if α < 1/2, but not otherwise. When α < 1/2, the estimators are asymptotically normal and asymptotically efficient. The asymptotic normal distribution of the maximum likelihood estimates has mean (α, σ) and variance-covariance matrix

$$
\frac{1}{n}\begin{pmatrix}
(1-\alpha)^2 & \sigma(1-\alpha) \\
\sigma(1-\alpha) & 2\sigma^2(1-\alpha)
\end{pmatrix}
$$
there is no maximum likelihood estimator. More details on how to find maximum likelihood estimators and a
suggested algorithm can be found in Grimshaw (1993).
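The asymptotic covariance matrix translates directly into approximate standard errors. The following minimal Python sketch uses hypothetical function names. It applies only in the α < 1/2 region, where the note says the normal approximation holds.

```python
import math


def gpd_asymptotic_cov(alpha, sigma, n):
    """Asymptotic covariance of (alpha_hat, sigma_hat) from the matrix above."""
    if alpha >= 0.5:
        raise ValueError("the normal approximation is stated for alpha < 1/2")
    f = 1.0 - alpha
    return [[f * f / n, sigma * f / n],
            [sigma * f / n, 2.0 * sigma * sigma * f / n]]


def gpd_standard_errors(alpha, sigma, n):
    cov = gpd_asymptotic_cov(alpha, sigma, n)
    return math.sqrt(cov[0][0]), math.sqrt(cov[1][1])


# For example, alpha = 0, sigma = 2, n = 100 gives standard errors 0.1 and about 0.283.
se_alpha, se_sigma = gpd_standard_errors(0.0, 2.0, 100)
print(se_alpha, se_sigma)
```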
where

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$

NOTE: This notation is consistent with that of other distributions that you can fit with the HISTOGRAM statement. However, many texts, including Johnson, Kotz, and Balakrishnan (1995), write the density function of the power function distribution as

$$
p(x) = \begin{cases}
\dfrac{p}{b-a}\left(\dfrac{x-a}{b-a}\right)^{p-1} & \text{for } a < x < b \\
0 & \text{for } x \le a \text{ or } x \ge b
\end{cases}
$$

The two notations are related as follows:

$$
\sigma = b - a, \qquad \theta = a, \qquad \alpha = p
$$
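NOTE: The family of power function distributions is a subclass of the beta distribution with density function

$$
p(x) = \begin{cases}
hv\,\dfrac{(x-\theta)^{\alpha-1}(\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}} & \text{for } \theta < x < \theta+\sigma \\
0 & \text{for } x \le \theta \text{ or } x \ge \theta+\sigma
\end{cases}
$$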
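Setting β = 1 in the beta density gives the power function density, because B(α, 1) = 1/α. The following Python sketch checks this numerically. It compares the Johnson, Kotz, and Balakrishnan form with the beta form under the mapping σ = b − a, θ = a, α = p. The names are hypothetical and the h·v factor is omitted.

```python
import math


def power_jkb(x, a, b, p):
    """Power function density in the (a, b, p) notation."""
    if x <= a or x >= b:
        return 0.0
    return p / (b - a) * ((x - a) / (b - a)) ** (p - 1.0)


def beta_pdf(x, theta, sigma, alpha, beta):
    """Beta density in the (theta, sigma, alpha, beta) notation."""
    if x <= theta or x >= theta + sigma:
        return 0.0
    b_ab = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return ((x - theta) ** (alpha - 1.0) * (theta + sigma - x) ** (beta - 1.0)
            / (b_ab * sigma ** (alpha + beta - 1.0)))


a, b, p = 1.0, 4.0, 2.5
for x in (1.2, 2.0, 3.9):
    print(power_jkb(x, a, b, p), beta_pdf(x, a, b - a, p, 1.0))
```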
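Rayleigh Distribution
The fitted density function is

$$
p(x) = \begin{cases}
hv\,\dfrac{x-\theta}{\sigma^2}\,e^{-(x-\theta)^2/(2\sigma^2)} & \text{for } x \ge \theta \\
0 & \text{for } x < \theta
\end{cases}
$$

where

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$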
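For reference, here is a Python sketch of the Rayleigh curve and of the textbook maximum likelihood estimate of σ when θ is known: σ̂² = Σ(xᵢ − θ)²/(2n). This section does not state how the procedure estimates σ, so treat this estimator as an assumption. The names are hypothetical.

```python
import math
import random


def rayleigh_curve(x, theta, sigma, h=1.0, v=1.0):
    """Fitted Rayleigh curve p(x)."""
    if x < theta:
        return 0.0
    z = x - theta
    return h * v * z / sigma ** 2 * math.exp(-z * z / (2.0 * sigma ** 2))


def rayleigh_sigma_mle(data, theta=0.0):
    """Textbook MLE of sigma for a known threshold theta."""
    return math.sqrt(sum((x - theta) ** 2 for x in data) / (2.0 * len(data)))


# Rayleigh draws by inversion: theta + sigma * sqrt(-2 log U), with U uniform on (0, 1].
rng = random.Random(3)
sample = [1.0 + 2.0 * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(20000)]
print(rayleigh_sigma_mle(sample, theta=1.0))
```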
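Johnson SB Distribution
The fitted density function is

$$
p(x) = \begin{cases}
\dfrac{\delta hv}{\sigma\sqrt{2\pi}}\left[\left(\dfrac{x-\theta}{\sigma}\right)\left(1-\dfrac{x-\theta}{\sigma}\right)\right]^{-1}\exp\left[-\dfrac{1}{2}\left(\gamma+\delta\log\left(\dfrac{x-\theta}{\theta+\sigma-x}\right)\right)^2\right] & \text{for } \theta < x < \theta+\sigma \\
0 & \text{for } x \le \theta \text{ or } x \ge \theta+\sigma
\end{cases}
$$

where

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$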
The SB distribution is bounded below by the parameter θ and above by the value θ + σ. The parameter θ must be less than the minimum data value. You can specify θ with the THETA= SB-option, or you can request that θ be estimated with the THETA=EST SB-option. The default value for θ is zero. The sum θ + σ must be greater than the maximum data value. The default value for σ is one. You can specify σ with the SIGMA= SB-option, or you can request that σ be estimated with the SIGMA=EST SB-option.
By default, the method of percentiles given by Slifker and Shapiro (1980) is used to estimate the parameters. This method is based on four data percentiles, denoted by $x_{-3z}$, $x_{-z}$, $x_z$, and $x_{3z}$, which correspond to the four equally spaced percentiles of a standard normal distribution, denoted by −3z, −z, z, and 3z, under the transformation

$$
z = \gamma + \delta\log\left(\frac{x-\theta}{\theta+\sigma-x}\right)
$$

The default value of z is 0.524. The results of the fit are dependent on the choice of z, and you can specify other values with the FITINTERVAL= option (specified in parentheses after the SB option). If you use the method of percentiles, you should select a value of z that corresponds to percentiles that are critical to your application.
The following values are computed from the data percentiles:

$$
m = x_{3z} - x_z, \qquad n = x_{-z} - x_{-3z}, \qquad p = x_z - x_{-z}
$$
A tolerance interval around one is used to discriminate among the three families with the ratio criterion mn/p². You can specify the tolerance with the FITTOLERANCE= option (specified in parentheses after the SB option). The default tolerance is 0.01. Assuming that the criterion satisfies the inequality

$$
\frac{mn}{p^2} < 1 - \text{tolerance}
$$

the parameters of the SB distribution are computed using the explicit formulas derived by Slifker and Shapiro (1980).
If you specify FITMETHOD = MOMENTS (in parentheses after the SB option) the method of moments is
used to estimate the parameters. If you specify FITMETHOD = MLE (in parentheses after the SB option) the
method of maximum likelihood is used to estimate the parameters. Note that maximum likelihood estimates
may not always exist. Refer to Bowman and Shenton (1983) for discussion of methods for fitting Johnson
distributions.
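The SB density is the standard normal density carried through the transformation z = γ + δ log((x − θ)/(θ + σ − x)). The following Python sketch evaluates the density as written and inverts the transformation, which is one way to generate SB values from standard normal values. The names are hypothetical and the h·v factor is omitted.

```python
import math


def sb_pdf(x, theta, sigma, delta, gamma):
    """Johnson SB density on theta < x < theta + sigma."""
    if x <= theta or x >= theta + sigma:
        return 0.0
    u = (x - theta) / sigma
    z = gamma + delta * math.log((x - theta) / (theta + sigma - x))
    return delta / (sigma * math.sqrt(2.0 * math.pi) * u * (1.0 - u)) * math.exp(-0.5 * z * z)


def sb_from_normal(z, theta, sigma, delta, gamma):
    """Solve z = gamma + delta * log((x - theta) / (theta + sigma - x)) for x."""
    return theta + sigma / (1.0 + math.exp(-(z - gamma) / delta))


theta, sigma, delta, gamma = 0.0, 1.0, 1.2, 0.5
N = 20000
area = sum(sb_pdf((i + 0.5) / N, theta, sigma, delta, gamma) for i in range(N)) / N
print(area)  # midpoint-rule integral over (theta, theta + sigma)
```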
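Johnson SU Distribution
The fitted density function is

$$
p(x) = \frac{\delta hv}{\sigma\sqrt{2\pi}}\,\frac{1}{\sqrt{1+\left((x-\theta)/\sigma\right)^2}}\exp\left[-\frac{1}{2}\left(\gamma+\delta\sinh^{-1}\left(\frac{x-\theta}{\sigma}\right)\right)^2\right] \quad \text{for } -\infty < x < \infty
$$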
where

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$
You can specify the parameters with the THETA=, SIGMA=, DELTA=, and GAMMA= SU -options, which
are enclosed in parentheses after the SU option. If you do not specify these parameters, they are estimated.
By default, the method of percentiles given by Slifker and Shapiro (1980) is used to estimate the parameters. This method is based on four data percentiles, denoted by $x_{-3z}$, $x_{-z}$, $x_z$, and $x_{3z}$, which correspond to the four equally spaced percentiles of a standard normal distribution, denoted by −3z, −z, z, and 3z, under the transformation

$$
z = \gamma + \delta\sinh^{-1}\left(\frac{x-\theta}{\sigma}\right)
$$

The default value of z is 0.524. The results of the fit are dependent on the choice of z, and you can specify other values with the FITINTERVAL= option (specified in parentheses after the SU option). If you use the method of percentiles, you should select a value of z that corresponds to percentiles that are critical to your application.
The following values are computed from the data percentiles:

$$
m = x_{3z} - x_z, \qquad n = x_{-z} - x_{-3z}, \qquad p = x_z - x_{-z}
$$
It was demonstrated by Slifker and Shapiro (1980) that

$$
\frac{mn}{p^2}\;\begin{cases}
> 1 & \text{for any SU distribution} \\
< 1 & \text{for any SB distribution} \\
= 1 & \text{for any SL (lognormal) distribution}
\end{cases}
$$

A tolerance interval around one is used to discriminate among the three families with this ratio criterion. You can specify the tolerance with the FITTOLERANCE= option (specified in parentheses after the SU option). The default tolerance is 0.01. Assuming that the criterion satisfies the inequality

$$
\frac{mn}{p^2} > 1 + \text{tolerance}
$$

the parameters of the SU distribution are computed using the explicit formulas derived by Slifker and Shapiro (1980).
If you specify FITMETHOD = MOMENTS (in parentheses after the SU option) the method of moments is
used to estimate the parameters. If you specify FITMETHOD = MLE (in parentheses after the SU option) the
method of maximum likelihood is used to estimate the parameters. Note that maximum likelihood estimates
may not always exist. Refer to Bowman and Shenton (1983) for discussion of methods for fitting Johnson
distributions.
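The ratio criterion can be sketched in Python. This is an illustration under stated assumptions, not the procedure's code. The function names are hypothetical. The data percentiles are taken by linear interpolation between order statistics, which may differ from the percentile definition the procedure uses. The defaults z = 0.524 and tolerance = 0.01 follow the text.

```python
import math
from statistics import NormalDist


def interpolated_percentile(sorted_data, prob):
    """Linear interpolation between order statistics (one of several conventions)."""
    pos = prob * (len(sorted_data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_data) - 1)
    return sorted_data[lo] + (pos - lo) * (sorted_data[hi] - sorted_data[lo])


def johnson_family(data, z=0.524, tolerance=0.01):
    """Return ('SU' | 'SB' | 'SL', mn/p^2) using the Slifker-Shapiro ratio."""
    xs = sorted(data)
    phi = NormalDist().cdf
    x_m3z, x_mz, x_z, x_3z = (interpolated_percentile(xs, phi(k * z)) for k in (-3, -1, 1, 3))
    m = x_3z - x_z
    n = x_mz - x_m3z
    p = x_z - x_mz
    ratio = m * n / (p * p)
    if ratio > 1.0 + tolerance:
        return "SU", ratio
    if ratio < 1.0 - tolerance:
        return "SB", ratio
    return "SL", ratio


# Evenly spaced quantiles of three reference shapes stand in for large samples.
grid = [(i + 0.5) / 20001 for i in range(20001)]
uniform_like = grid                                        # bounded
logistic_like = [math.log(q / (1.0 - q)) for q in grid]    # unbounded, heavier tails
lognormal_like = [math.exp(NormalDist().inv_cdf(q)) for q in grid]
for name, sample in (("uniform", uniform_like), ("logistic", logistic_like), ("lognormal", lognormal_like)):
    print(name, johnson_family(sample))
```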
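Weibull Distribution
The fitted density function is

$$
p(x) = \begin{cases}
\dfrac{c\,hv}{\sigma}\left(\dfrac{x-\theta}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)^{c}\right) & \text{for } x > \theta \\
0 & \text{for } x \le \theta
\end{cases}
$$

where
θ = threshold parameter
σ = scale parameter (σ > 0)
c = shape parameter (c > 0)
h = width of histogram interval
v = vertical scaling factor, and

$$
v = \begin{cases}
n & \text{the sample size, for VSCALE=COUNT} \\
100 & \text{for VSCALE=PERCENT} \\
1 & \text{for VSCALE=PROPORTION}
\end{cases}
$$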
The threshold parameter θ must be less than the minimum data value. You can specify θ with the THRESHOLD= Weibull-option. By default, θ = 0. If you specify THETA=EST, a maximum likelihood estimate is computed for θ. You can specify σ and c with the SCALE= and SHAPE= Weibull-options, respectively. By default, the procedure calculates maximum likelihood estimates for σ and c.
The exponential distribution is a special case of the Weibull distribution where c = 1.
$$
\hat{f}(x) = \frac{1}{n\lambda}\sum_{i=1}^{n} K_0\left(\frac{x-x_i}{\lambda}\right)
$$

where $K_0(\cdot)$ is a kernel function, λ is the bandwidth, n is the sample size, and $x_i$ is the ith observation.
The KERNEL option provides three kernel functions ($K_0$): normal, quadratic, and triangular. You can specify the function with the K= kernel-option in parentheses after the KERNEL option. Values for the K= option are NORMAL, QUADRATIC, and TRIANGULAR (with aliases of N, Q, and T, respectively). By default, a normal kernel is used. The formulas for the kernel functions are

$$
\begin{aligned}
&\text{Normal} && K_0(t) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right) && \text{for } -\infty < t < \infty \\
&\text{Quadratic} && K_0(t) = \frac{3}{4}\left(1-t^2\right) && \text{for } |t| \le 1 \\
&\text{Triangular} && K_0(t) = 1-|t| && \text{for } |t| \le 1
\end{aligned}
$$
The value of λ, referred to as the bandwidth parameter, determines the degree of smoothness in the estimated density function. You specify λ indirectly by specifying a standardized bandwidth c with the C= kernel-option. If Q is the interquartile range and n is the sample size, then c is related to λ by the formula

$$
\lambda = c\,Q\,n^{-1/5}
$$
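The following Python sketch transcribes the estimator, the three kernels, and the relation λ = cQn^(−1/5) directly. The function names are hypothetical. The interquartile range is computed with statistics.quantiles using its default 'exclusive' method, which is an assumption; the procedure's quartile definition may differ.

```python
import math
from statistics import quantiles

KERNELS = {
    "NORMAL": lambda t: math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi),
    "QUADRATIC": lambda t: 0.75 * (1.0 - t * t) if abs(t) <= 1.0 else 0.0,
    "TRIANGULAR": lambda t: 1.0 - abs(t) if abs(t) <= 1.0 else 0.0,
}


def bandwidth_from_c(data, c):
    """lambda = c * Q * n**(-1/5), with Q the interquartile range."""
    q1, _, q3 = quantiles(data, n=4)
    return c * (q3 - q1) * len(data) ** (-0.2)


def kde(x, data, lam, kernel="NORMAL"):
    """Kernel density estimate f_hat(x) = (1 / (n * lam)) * sum K0((x - x_i) / lam)."""
    k0 = KERNELS[kernel]
    return sum(k0((x - xi) / lam) for xi in data) / (len(data) * lam)


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
lam = bandwidth_from_c(data, c=0.5)
step = 0.005
for name in KERNELS:
    # Midpoint-rule integral over [-10, 20]; each estimate should integrate to about 1.
    area = sum(kde(-10.0 + (i + 0.5) * step, data, lam, name) for i in range(6000)) * step
    print(name, round(area, 4))
```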
For a specific kernel function, the discrepancy between the density estimator $\hat{f}(x)$ and the true density f(x) is measured by the mean integrated square error (MISE):

$$
\mathrm{MISE}(\lambda) = \int_x \left\{E\left(\hat{f}(x)\right) - f(x)\right\}^2 dx + \int_x \mathrm{var}\left(\hat{f}(x)\right) dx
$$

The MISE is the sum of the integrated squared bias and the variance. An approximate mean integrated square error (AMISE) is

$$
\mathrm{AMISE}(\lambda) = \frac{1}{4}\lambda^4\left(\int_t t^2 K(t)\,dt\right)^2 \int_x \left(f''(x)\right)^2 dx + \frac{1}{n\lambda}\int_t K(t)^2\,dt
$$
A bandwidth that minimizes AMISE can be derived by treating f(x) as the normal density having parameters μ and σ estimated by the sample mean and standard deviation. If you do not specify a bandwidth parameter or if you specify C=MISE, the bandwidth that minimizes AMISE is used. The value of AMISE can be used to compare different density estimates. For each estimate, the bandwidth parameter c, the kernel function type, and the value of AMISE are reported in the SAS log.
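Under the normal-reference assumption described here, AMISE has a closed-form minimizer. Setting its derivative with respect to λ to zero gives λ* = [∫K(t)²dt / ((∫t²K(t)dt)² · ∫f''(x)²dx · n)]^(1/5), where ∫f''(x)²dx = 3/(8√π σ⁵) for a normal density with standard deviation σ. The Python sketch below evaluates AMISE, its minimizer, and the corresponding standardized bandwidth c = λ* n^(1/5)/Q. The names are hypothetical, and the kernel constants are computed from the three kernel formulas. The sketch illustrates the calculation and does not reproduce the procedure's code.

```python
import math

# (integral of t^2 K(t) dt, integral of K(t)^2 dt) for each kernel formula above
KERNEL_CONSTANTS = {
    "NORMAL": (1.0, 1.0 / (2.0 * math.sqrt(math.pi))),
    "QUADRATIC": (1.0 / 5.0, 3.0 / 5.0),
    "TRIANGULAR": (1.0 / 6.0, 2.0 / 3.0),
}


def normal_roughness(sigma):
    """Integral of f''(x)^2 dx when f is a normal density with std. dev. sigma."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)


def amise(lam, n, sigma, kernel="NORMAL"):
    mu2, rk = KERNEL_CONSTANTS[kernel]
    return 0.25 * lam ** 4 * mu2 ** 2 * normal_roughness(sigma) + rk / (n * lam)


def amise_bandwidth(n, sigma, kernel="NORMAL"):
    """Set d AMISE / d lambda = 0 and solve for lambda."""
    mu2, rk = KERNEL_CONSTANTS[kernel]
    return (rk / (mu2 ** 2 * normal_roughness(sigma) * n)) ** 0.2


def standardized_c(lam, n, iqr):
    """Invert lambda = c * Q * n**(-1/5)."""
    return lam * n ** 0.2 / iqr


n, sigma, iqr = 100, 1.0, 1.349  # approximate IQR of a standard normal, for illustration
for kernel in KERNEL_CONSTANTS:
    lam = amise_bandwidth(n, sigma, kernel)
    print(kernel, round(lam, 4), round(standardized_c(lam, n, iqr), 4))
```

For the normal kernel, this reduces to the familiar rule λ* = (4/3)^(1/5) σ n^(−1/5), which is about 1.06σn^(−1/5).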
The general kernel density estimates assume that the domain of the density to estimate can take on all values
on a real line. However, sometimes the domain of a density is an interval bounded on one or both sides. For
example, if a variable Y is a measurement of only positive values, then the kernel density curve should be
bounded so that it is zero for negative Y values.
The CAPABILITY procedure uses a reflection technique to create the bounded kernel density curve, as described in Silverman (1986, pp. 30-31). It adds the reflections of the kernel density that lie outside the boundary to the bounded kernel estimates. The general form of the bounded kernel density estimator is computed by replacing $K_0\left(\frac{x-x_i}{\lambda}\right)$ in the original equation with

$$
K_0\left(\frac{x-x_i}{\lambda}\right) + K_0\left(\frac{(x-x_l)+(x_i-x_l)}{\lambda}\right) + K_0\left(\frac{(x_u-x)+(x_u-x_i)}{\lambda}\right)
$$

where $x_l$ and $x_u$ are the lower and upper bounds of the domain.
When C=MISE is used with a bounded kernel density, the CAPABILITY procedure uses a bandwidth that
minimizes the AMISE for its corresponding unbounded kernel.
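The following Python sketch implements the reflection with a normal kernel. The names are hypothetical, and either bound may be omitted. Each reflected term folds the part of a kernel that spills past a boundary back inside it. With a lower bound only, the estimate therefore integrates to 1 over [x_l, ∞).

```python
import math


def normal_kernel(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def bounded_kde(x, data, lam, lower=None, upper=None, k0=normal_kernel):
    """Reflection-corrected kernel density estimate; zero outside [lower, upper]."""
    if (lower is not None and x < lower) or (upper is not None and x > upper):
        return 0.0
    total = 0.0
    for xi in data:
        term = k0((x - xi) / lam)
        if lower is not None:
            term += k0(((x - lower) + (xi - lower)) / lam)
        if upper is not None:
            term += k0(((upper - x) + (upper - xi)) / lam)
        total += term
    return total / (len(data) * lam)


data = [0.1, 0.3, 0.5, 1.0, 1.5]   # a measurement that cannot be negative
lam, step = 0.4, 0.001
bounded = sum(bounded_kde((i + 0.5) * step, data, lam, lower=0.0) for i in range(10000)) * step
unbounded = sum(bounded_kde((i + 0.5) * step, data, lam) for i in range(10000)) * step
print(round(bounded, 4), round(unbounded, 4))  # mass on [0, 10] with and without reflection
```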
Printed Output
If you request a fitted parametric distribution, printed output summarizing the fit is produced in addition to
the graphical display. Figure 6.16 shows the printed output for a fitted lognormal distribution requested by
the following statements:
Capability Indices Based on Lognormal Distribution
Cp    0.795463
CPL   0.776822
CPU   0.814021
Cpk   0.776822
Cpm   0.792237
(The output in Figure 6.16 also includes Parameters, Specifications, Histogram Intervals, and Quantiles tables.)
Parameters
This section lists the parameters for the fitted curve as well as the estimated mean and estimated standard
deviation. See “Formulas for Fitted Curves” on page 349.
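$$
\chi^2 = \sum_{i=1}^{m}\frac{(O_i - E_i)^2}{E_i}
$$

where $O_i$ and $E_i$ are the observed and expected values in the ith of the m histogram intervals used for the test.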
Note that $F_n(x)$ is a step function that takes a step of height 1/n at each observation. This function estimates the distribution function F(x). At any value x, $F_n(x)$ is the proportion of observations less than or equal to x, while F(x) is the probability of an observation less than or equal to x. EDF statistics measure the discrepancy between $F_n(x)$ and F(x).
The computational formulas for the EDF statistics make use of the probability integral transformation U = F(X). If F(X) is the distribution function of X, the random variable U is uniformly distributed between 0 and 1.
Given n observations $X_{(1)}, \ldots, X_{(n)}$, the values $U_{(i)} = F(X_{(i)})$ are computed by applying the transformation, as shown in the following sections.
The HISTOGRAM statement provides three EDF tests:
Kolmogorov-Smirnov
Anderson-Darling
Cramér–von Mises
These tests are based on various measures of the discrepancy between the empirical distribution function $F_n(x)$ and the proposed parametric cumulative distribution function F(x).
The following sections provide formal definitions of the EDF statistics.
Anderson-Darling Statistic The Anderson-Darling statistic and the Cramér–von Mises statistic belong to the quadratic class of EDF statistics. This class of statistics is based on the squared difference $(F_n(x) - F(x))^2$. Quadratic statistics have the following general form:

$$
Q = n\int_{-\infty}^{+\infty}\left(F_n(x) - F(x)\right)^2 \psi(x)\,dF(x)
$$

The function ψ(x) weights the squared difference $(F_n(x) - F(x))^2$.
The Anderson-Darling statistic ($A^2$) is defined as

$$
A^2 = n\int_{-\infty}^{+\infty}\left(F_n(x) - F(x)\right)^2 \left[F(x)\left(1 - F(x)\right)\right]^{-1} dF(x)
$$

Here the weight function is $\psi(x) = \left[F(x)\left(1 - F(x)\right)\right]^{-1}$.
The Anderson-Darling statistic is computed as

$$
A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}\left[(2i-1)\log U_{(i)} + (2n+1-2i)\log\left\{1 - U_{(i)}\right\}\right]
$$
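The computing formula translates directly into code. In the following Python sketch, the names are hypothetical and the distribution function is passed in fully specified. The alternative form with (2i − 1)[log U₍ᵢ₎ + log(1 − U₍ₙ₊₁₋ᵢ₎)] reorders the same sum, so it serves as a check.

```python
import math
from statistics import NormalDist


def anderson_darling(data, cdf):
    """A^2 from the computing formula, using U_(i) = F(X_(i))."""
    u = sorted(cdf(x) for x in data)
    n = len(u)
    s = sum((2 * i - 1) * math.log(u[i - 1]) + (2 * n + 1 - 2 * i) * math.log(1.0 - u[i - 1])
            for i in range(1, n + 1))
    return -n - s / n


def anderson_darling_alt(data, cdf):
    """Equivalent form: (2i - 1) * [log U_(i) + log(1 - U_(n+1-i))]."""
    u = sorted(cdf(x) for x in data)
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i])) for i in range(1, n + 1))
    return -n - s / n


data = [0.1, -0.5, 1.2, 2.0, -1.3, 0.4]
a2 = anderson_darling(data, NormalDist().cdf)
print(round(a2, 4))
```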
Probability Values for EDF Tests Once the EDF test statistics are computed, the associated probability
values (p-values) must be calculated.
For the Gumbel, inverse Gaussian, generalized Pareto, and Rayleigh distributions, the procedure computes
associated probability values (p-values) by resampling from the estimated distribution. It generates k random
samples of size n, where k is specified by the EDFNSAMPLES= option and n is the number of observations
in the original data. EDF test statistics are computed for each sample, and the p-value is the proportion of
samples whose EDF statistic is greater than or equal to the statistic computed for the original data. You can
use the EDFSEED= option to specify a seed value for generating the sample values.
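The following Python sketch illustrates the resampling scheme for the Rayleigh case with θ fixed at 0. Several details are assumptions made for illustration rather than documented behavior. σ is refitted to every simulated sample with the textbook maximum likelihood estimate, only the Anderson-Darling statistic is used, and n_samples and seed are hypothetical stand-ins for the roles of the EDFNSAMPLES= and EDFSEED= options.

```python
import math
import random


def anderson_darling_from_u(u):
    """A^2 computed from probability-integral-transformed values."""
    u = sorted(min(max(v, 1e-12), 1.0 - 1e-12) for v in u)  # guard the logarithms
    n = len(u)
    s = sum((2 * i - 1) * math.log(u[i - 1]) + (2 * n + 1 - 2 * i) * math.log(1.0 - u[i - 1])
            for i in range(1, n + 1))
    return -n - s / n


def fit_rayleigh_sigma(data):
    return math.sqrt(sum(x * x for x in data) / (2.0 * len(data)))


def rayleigh_cdf(x, sigma):
    return 1.0 - math.exp(-x * x / (2.0 * sigma * sigma)) if x > 0.0 else 0.0


def ad_for_fitted_rayleigh(data):
    sigma = fit_rayleigh_sigma(data)
    return anderson_darling_from_u([rayleigh_cdf(x, sigma) for x in data])


def resampled_p_value(data, n_samples=500, seed=None):
    """Share of simulated samples whose A^2 is at least the observed A^2."""
    rng = random.Random(seed)
    observed = ad_for_fitted_rayleigh(data)
    sigma = fit_rayleigh_sigma(data)
    exceed = 0
    for _ in range(n_samples):
        sample = [sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in data]
        if ad_for_fitted_rayleigh(sample) >= observed:
            exceed += 1
    return exceed / n_samples


clustered = [10.0 + 0.01 * i for i in range(50)]   # far from Rayleigh-shaped
print(resampled_p_value(clustered, n_samples=200, seed=2023))
```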
For the beta, exponential, gamma, lognormal, normal, power function, and Weibull distributions, the
CAPABILITY procedure uses internal tables of probability levels similar to those given by D’Agostino and
Stephens (1986). If the value is between two probability levels, then linear interpolation is used to estimate
the probability value. The probability value depends upon the parameters that are known and the parameters
that are estimated for the distribution you are fitting. Table 6.29 summarizes different combinations of
estimated parameters for which EDF tests are available.
Specifications
This section is included in the summary only if you provide specification limits, and it tabulates the limits as
well as the observed percentages and estimated percentages outside the limits.
The estimated percentages are computed only if fitted distributions are requested and are based on the
probability that an observed value exceeds the specification limits, assuming the fitted distribution. The
observed percentages are the percents of observations outside the specification limits.
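$$
\mathrm{CPL} = \frac{P_{0.5} - \mathrm{LSL}}{P_{0.5} - P_{0.00135}}
$$

$$
\mathrm{CPU} = \frac{\mathrm{USL} - P_{0.5}}{P_{0.99865} - P_{0.5}}
$$

$$
C_p = \frac{\mathrm{USL} - \mathrm{LSL}}{P_{0.99865} - P_{0.00135}}
$$

$$
C_{pk} = \min\left(\frac{P_{0.5} - \mathrm{LSL}}{P_{0.5} - P_{0.00135}},\ \frac{\mathrm{USL} - P_{0.5}}{P_{0.99865} - P_{0.5}}\right)
$$

$$
K = 2\,\frac{\left|\frac{1}{2}(\mathrm{USL} + \mathrm{LSL}) - P_{0.5}\right|}{\mathrm{USL} - \mathrm{LSL}}
$$

$$
C_{pm} = \frac{\min\left(\dfrac{T - \mathrm{LSL}}{P_{0.5} - P_{0.00135}},\ \dfrac{\mathrm{USL} - T}{P_{0.99865} - P_{0.5}}\right)}{\sqrt{1 + \left(\dfrac{\mu - T}{\sigma}\right)^2}}
$$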
If the data are normally distributed, these formulas reduce to the formulas for the standard capability indices,
which are given in the section “Standard Capability Indices” on page 246.
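The generalized indices depend on the fitted distribution only through its percentiles. They can therefore be sketched in Python given any percentile (inverse distribution) function. The names below are hypothetical. Treating μ and σ in the Cpm denominator as the mean and standard deviation of the fitted distribution is an assumption. With a normal fit, the results reproduce the standard indices, because P_0.99865 − P_0.00135 is very nearly 6σ.

```python
import math
from statistics import NormalDist


def generalized_indices(ppf, lsl, usl, target, mean, sd):
    """Generalized capability indices from percentiles of a fitted distribution."""
    p_lo, p_med, p_hi = ppf(0.00135), ppf(0.5), ppf(0.99865)
    cpl = (p_med - lsl) / (p_med - p_lo)
    cpu = (usl - p_med) / (p_hi - p_med)
    cp = (usl - lsl) / (p_hi - p_lo)
    k = 2.0 * abs(0.5 * (usl + lsl) - p_med) / (usl - lsl)
    cpm = (min((target - lsl) / (p_med - p_lo), (usl - target) / (p_hi - p_med))
           / math.sqrt(1.0 + ((mean - target) / sd) ** 2))
    return {"Cp": cp, "CPL": cpl, "CPU": cpu, "Cpk": min(cpl, cpu), "K": k, "Cpm": cpm}


fit = NormalDist(10.0, 0.5)   # hypothetical fitted distribution
idx = generalized_indices(fit.inv_cdf, lsl=8.5, usl=12.0, target=10.0, mean=fit.mean, sd=fit.stdev)
print({name: round(value, 4) for name, value in idx.items()})
```

For a nonnormal fit, pass that distribution's percentile function instead of fit.inv_cdf. The formulas themselves do not change.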
The following guidelines apply to the use of generalized capability indices requested with the INDICES
option:
When you choose the family of parametric distributions for the fitted curve, consider whether an
appropriate family can be derived from assumptions about the process.
Whenever possible, examine the data distribution with a histogram, probability plot, or quantile-quantile
plot.
Apply goodness-of-fit tests to assess how well the parametric distribution models the data.
Consider whether a generalized index has a meaningful practical interpretation in your application.
At the time of this writing, there is ongoing research concerning the application of generalized capability
indices, and it is important to note that other approaches can be used with nonnormal data:
Transform the data to normality, then compute and report standard capability indices on the transformed
scale.
Report the proportion of nonconforming output estimated from the fitted distribution.
If it is not possible to adequately model the data distribution with a parametric density, smooth the data
distribution with a kernel density estimate and simply report the proportion of nonconforming output.
Histogram Intervals
This section is included in the summary only if you specify the MIDPERCENTS option in parentheses after
the distribution option, as in the statements that produce Figure 6.16. This table lists the interval midpoints
along with the observed and estimated percentages of the observations that lie in the interval. The estimated
percentages are based on the fitted distribution.
In addition, you can specify the MIDPERCENTS option to request a table of interval midpoints with the
observed percent of observations that lie in the interval. See the entry for the MIDPERCENTS option on
page 336.
Quantiles
This table lists observed and estimated quantiles. You can use the PERCENTS= option to specify the list of
quantiles to appear in this list. The list in Figure 6.16 is the default list. See the entry for the PERCENTS=
option on page 341.
Variable Description
_ADASQ_ Anderson-Darling EDF goodness-of-fit statistic
_ADP_ p-value for Anderson-Darling EDF goodness-of-fit test
_CHISQ_ Chi-square goodness-of-fit statistic
_CP_ Generalized capability index Cp based on the fitted curve
_CPK_ Generalized capability index Cpk based on the fitted curve
_CPL_ Generalized capability index CPL based on the fitted curve
_CPM_ Generalized capability index Cpm based on the fitted curve
_CPU_ Generalized capability index CPU based on the fitted curve
_CURVE_ Name of fitted distribution (abbreviated to 8 characters)
_CVMWSQ_ Cramér–von Mises EDF goodness-of-fit statistic
_CVMP_ p-value for Cramér–von Mises EDF goodness-of-fit test
_DF_ Degrees of freedom for chi-square goodness-of-fit test
_ESTGTR_ Estimated percent of population greater than upper specification
limit
_ESTLSS_ Estimated percent of population less than lower specification limit
_ESTSTD_ Estimated standard deviation
_EXPECT_ Estimated mean
_K_ Generalized capability index K based on the fitted curve
_KSD_ Kolmogorov-Smirnov EDF goodness-of-fit statistic
_KSP_ p-value for Kolmogorov-Smirnov EDF goodness-of-fit test
_LOCATN_ Location parameter for fitted distribution. For the Gumbel, inverse
Gaussian, and normal distributions, this is either the value of μ
specified with the MU= option or the value estimated by the
procedure. For all other distributions, this is either the value
specified or estimated according to the THETA= option, or zero.
_LSL_ Lower specification limit
_MAXPT1_ Upper endpoint of first interval used to calculate the value of the
chi-square statistic.
_MAXPTN_ Upper endpoint of last interval used to calculate the value of the
chi-square statistic.
_MIDPT1_ Midpoint of first interval used to calculate the value of the
chi-square statistic. This is the leftmost interval that contains at
least one value of the variable.
_MIDPTN_ Midpoint of last interval used to calculate the value of the
chi-square statistic. This is the rightmost interval that contains at
least one value of the variable.
_MINPT1_ Lower endpoint of first interval used to calculate the value of the
chi-square statistic.
_MINPTN_ Lower endpoint of last interval used to calculate the value of the
chi-square statistic.
_OBSGTR_ Observed percent of data greater than upper specification limit
_OBSLSS_ Observed percent of data less than the lower specification limit
_PCHISQ_ p-value for chi-square goodness-of-fit test
_SCALE_ Value of scale parameter for fitted distribution. For the lognormal
distribution, this is the value of ζ specified or estimated according
to the ZETA= option. For all other distributions, this is the value
specified or estimated according to the SIGMA= option.
_SHAPE1_ Value of shape parameter for fitted distribution. For the beta,
gamma, generalized Pareto, and power function distributions, this
is the value of α, either specified with the ALPHA= option or
estimated by the procedure. For the lognormal distribution, this is
the value of σ, either specified with the SIGMA= option or
estimated by the procedure. For the Weibull distribution, this is the
value of c, either specified with the C= option or estimated by the
procedure. For the Johnson SB and SU distributions, this is the
value of δ, either specified with the DELTA= option or estimated
by the procedure. For distributions without a shape parameter
(Gumbel, normal, exponential, and Rayleigh distributions),
_SHAPE1_ is set to missing.
_SHAPE2_ Value of shape parameter for fitted distribution. For the beta distribution, this is the value of β, either specified with the BETA= option or estimated by the procedure. For the Johnson SB and SU distributions, this is the value of γ, either specified with the GAMMA= option or estimated by the procedure. For all other distributions, _SHAPE2_ is set to missing.
_TARGET_ Target value
_USL_ Upper specification limit
_VAR_ Variable name
_WIDTH_ Width of histogram interval
Variable Description
_COUNT_ Number of variable values in histogram interval
_CURVE_ Name of fitted distribution (if requested in HISTOGRAM statement)
_EXPPCT_ Estimated percent of population in histogram interval determined from optional fitted distribution
_MAXPT_ Upper endpoint of histogram interval
_MIDPT_ Midpoint of histogram interval
_MINPT_ Lower endpoint of histogram interval
_OBSPCT_ Percent of variable values in histogram interval
_VAR_ Variable name
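The two tables above describe the variables saved by the OUTFIT= and OUTHISTOGRAM= options of the HISTOGRAM statement. As a minimal sketch (the output data set names GapFit and GapBins are arbitrary choices), both data sets can be created for the Plates data from Example 6.9 and inspected with PROC PRINT. The SPEC statement is included so that the specification-limit variables such as _ESTGTR_ and _OBSGTR_ are populated.

```sas
/* Sketch: save the fitted-curve summary (OUTFIT=) and the histogram  */
/* intervals (OUTHISTOGRAM=) for a lognormal fit to Gap.              */
/* GapFit and GapBins are illustrative data set names.                */
proc capability data=Plates noprint;
   spec lsl=0.3 usl=0.8;
   histogram Gap / lognormal
                   outfit=GapFit
                   outhistogram=GapBins
                   noplot;                 /* tables and data only */
run;

/* One observation per fitted curve */
proc print data=GapFit noobs;
   var _VAR_ _CURVE_ _LOCATN_ _SCALE_ _SHAPE1_ _PCHISQ_ _OBSGTR_ _ESTGTR_;
run;

/* One observation per histogram interval */
proc print data=GapBins noobs;
   var _MIDPT_ _COUNT_ _OBSPCT_ _EXPPCT_;
run;
```

For the lognormal fit, _LOCATN_ holds the threshold θ (zero by default), _SCALE_ holds ζ, and _SHAPE1_ holds σ, as described in the table.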
Variable Description
_C_ Standardized bandwidth parameter
_COUNT_ Kernel density scaled for VSCALE=COUNT
_DENSITY_ Kernel density
_PERCENT_ Kernel density scaled for VSCALE=PERCENT (default)
_PROPORTION_ Kernel density scaled for VSCALE=PROPORTION
_TYPE_ Kernel function
_VALUE_ Variable value at which kernel function is calculated
_VAR_ Variable name
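The preceding table lists the variables in the kernel density output data set. The following sketch assumes that this data set is requested with the HISTOGRAM option OUTKERNEL=, and it uses the Channel data from Example 6.12. The data set name KDens and the three bandwidth values are illustrative choices.

```sas
/* Sketch: kernel density estimates for three standardized bandwidths, */
/* saved in an OUTKERNEL= data set (KDens is an arbitrary name).       */
proc capability data=Channel noprint;
   histogram Length / kernel(c = 0.25 0.50 0.75)
                      outkernel=KDens
                      noplot;
run;

/* First few evaluation points; _C_ identifies the bandwidth */
proc print data=KDens(obs=5) noobs;
   var _VAR_ _C_ _TYPE_ _VALUE_ _DENSITY_ _PERCENT_;
run;
```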
ODS Tables
Table 6.33 summarizes the ODS tables related to fitted distributions that you can request with the HISTOGRAM statement.
ODS Graphics
Before you create ODS Graphics output, ODS Graphics must be enabled (for example, by using the ODS
GRAPHICS ON statement). For more information about enabling and disabling ODS Graphics, see the
section “Enabling and Disabling ODS Graphics” (Chapter 24, SAS/STAT User’s Guide).
The appearance of a graph produced with ODS Graphics is determined by the style associated with the ODS
destination where the graph is produced. HISTOGRAM options used to control the appearance of traditional
graphics are ignored for ODS Graphics output.
When ODS Graphics is in effect, the HISTOGRAM statement assigns a name to the graph it creates. You
can use this name to reference the graph when using ODS. The name is listed in Table 6.34.
See Chapter 4, “SAS/QC Graphics,” for more information about ODS Graphics and other methods for
producing charts.
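As a minimal sketch, the following statements enable ODS Graphics, produce a histogram with a fitted lognormal curve for the Plates data from Example 6.9, and then disable ODS Graphics again. Because ODS Graphics is in effect, options that control only traditional graphics would be ignored.

```sas
ods graphics on;                    /* histogram uses ODS Graphics */

proc capability data=Plates noprint;
   spec lsl=0.3 usl=0.8;
   histogram Gap / lognormal;
run;

ods graphics off;                   /* return to traditional graphics */
```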
data Measures;
input Length @@;
label Length = 'Attachment Point Offset in mm';
datalines;
10.147 10.070 10.032 10.042 10.102
10.034 10.143 10.278 10.114 10.127
10.122 10.018 10.271 10.293 10.136
10.240 10.205 10.186 10.186 10.080
10.158 10.114 10.018 10.201 10.065
10.061 10.133 10.153 10.201 10.109
10.122 10.139 10.090 10.136 10.066
The following statements create a histogram with a fitted beta density curve:
The FILL beta-option specifies that the area under the curve is to be filled with the CFILL= color. (If FILL
were omitted, the CFILL= color would be used to fill the histogram bars instead.) The CRIGHT= option in
the SPEC statement specifies the color under the curve to the right of the upper specification limit. If the
CRIGHT= option were not specified, the entire area under the curve would be filled with the CFILL= color.
When a lower specification limit is available, you can use the CLEFT= option in the SPEC statement to
specify the color under the curve to the left of this limit.
The HREF= option draws a reference line at the lower bound, and the HREFLABEL= option adds the label
Lower Bound. The option LHREF=2 specifies a dashed line type. The INSET statement adds an inset with
the sample size and the p-value for a chi-square goodness-of-fit test.
In addition to displaying the beta curve, the BETA option summarizes the curve fit, as shown in Output 6.8.2.
The output tabulates the parameters for the curve, the chi-square goodness-of-fit test whose p-value is shown
in Output 6.8.1, the observed and estimated percents above the upper specification limit, and the observed
and estimated quantiles. For instance, based on the beta model, the percent of offsets greater than the upper
specification limit is 6.6%. For computational details, see the section “Formulas for Fitted Curves” on
page 349.
Percent Outside Specifications for Beta Distribution
Upper Limit
USL              10.250000
Obs Pct > USL     8.000000
Est Pct > USL     6.618103
the output of a welding process assumed to be in statistical control. The lower and upper specification limits
for the gap are 0.3 cm and 0.8 cm, respectively. The measurements are saved in a data set named Plates.
data Plates;
label Gap='Plate Gap in cm';
input Gap @@;
datalines;
0.746 0.357 0.376 0.327 0.485 1.741 0.241 0.777 0.768 0.409
0.252 0.512 0.534 1.656 0.742 0.378 0.714 1.121 0.597 0.231
0.541 0.805 0.682 0.418 0.506 0.501 0.247 0.922 0.880 0.344
0.519 1.302 0.275 0.601 0.388 0.450 0.845 0.319 0.486 0.529
1.547 0.690 0.676 0.314 0.736 0.643 0.483 0.352 0.636 1.080
;
The following statements fit three distributions (lognormal, Weibull, and gamma) and display their density
curves on a single histogram:
The LOGNORMAL, WEIBULL, and GAMMA options also produce the summaries for the fitted distributions
shown in Output 6.9.2, Output 6.9.3, and Output 6.9.4.
Output 6.9.2 provides four goodness-of-fit tests for the lognormal distribution: the chi-square test and three
tests based on the EDF (Anderson-Darling, Cramér–von Mises, and Kolmogorov-Smirnov). See “Chi-Square
Goodness-of-Fit Test” on page 364 and “EDF Goodness-of-Fit Tests” on page 364 for more information.
The EDF tests are superior to the chi-square test because they are not dependent on the set of midpoints used
for the histogram.
At the α = 0.10 significance level, all four tests support the conclusion that the two-parameter lognormal distribution with scale parameter ζ̂ = −0.58 and shape parameter σ̂ = 0.50 provides a good model for the distribution of plate gaps.
Output 6.9.3 provides two EDF goodness-of-fit tests for the Weibull distribution: the Anderson-Darling
and the Cramér–von Mises tests. (See Table 6.29 for a complete list of the EDF tests available in the
HISTOGRAM statement.) The probability values for the chi-square and EDF tests are all less than 0.10,
indicating that the data do not support a Weibull model.
Output 6.9.4 provides four goodness-of-fit tests for the gamma distribution. The probability value for the
chi-square test is less than 0.10, indicating that the data do not support a gamma model.
Based on this analysis, the fitted lognormal distribution is the best model for the distribution of plate gaps.
You can use this distribution to calculate useful quantities. For instance, you can compute the probability that
the gap of a randomly sampled plate exceeds the upper specification limit, as follows:
$$\Pr[\text{gap} > \text{USL}] = \Pr\left[Z > \frac{1}{\sigma}\left(\log(\text{USL}) - \zeta\right)\right] = 1 - \Phi\left(\frac{1}{\sigma}\left(\log(\text{USL}) - \zeta\right)\right)$$
where Z has a standard normal distribution, and Φ(·) is the standard normal cumulative distribution function. Note that Φ(·) can be computed with the DATA step function PROBNORM. In this example, USL = 0.8 and Pr[gap > 0.8] = 0.2352. This value is expressed as a percent (Est Pct > USL) in Output 6.9.2.
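The calculation can be sketched in a DATA step with PROBNORM. This sketch assumes the rounded lognormal estimates reported for the plate gaps (scale ζ̂ = −0.58, shape σ̂ = 0.50, threshold θ = 0) and uses the natural logarithm, as the SAS LOG function does.

```sas
/* Pr[Gap > USL] for a two-parameter lognormal (theta = 0),   */
/* using rounded estimates zeta = -0.58 and sigma = 0.50       */
data _null_;
   zeta  = -0.58;
   sigma = 0.50;
   usl   = 0.8;
   z = (log(usl) - zeta) / sigma;   /* standardized log of the USL */
   p = 1 - probnorm(z);             /* upper-tail probability      */
   put z= p=;
run;
```

With these rounded estimates, p comes out near 0.24 rather than exactly 0.2352, because the procedure uses the unrounded estimates.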
data Plates;
label Gap='Plate Gap in cm';
input Gap @@;
datalines;
0.746 0.357 0.376 0.327 0.485 1.741 0.241 0.777 0.768 0.409
0.252 0.512 0.534 1.656 0.742 0.378 0.714 1.121 0.597 0.231
0.541 0.805 0.682 0.418 0.506 0.501 0.247 0.922 0.880 0.344
0.519 1.302 0.275 0.601 0.388 0.450 0.845 0.319 0.486 0.529
1.547 0.690 0.676 0.314 0.736 0.643 0.483 0.352 0.636 1.080
;
A summary of the lognormal fit is shown in Output 6.10.2. The p-value for the chi-square goodness-of-fit
test is 0.082. Because this value is less than 0.10 (a typical cutoff level), the conclusion is that the lognormal
distribution is not an appropriate model for the data. This is the opposite of the conclusion drawn from the chi-square test in Example 6.9, which is based on a different set of midpoints and has a p-value of 0.2756 (see Output 6.9.2). In contrast, the results of the EDF goodness-of-fit tests are unchanged, because these tests do not depend on the midpoints. When available, the EDF tests provide more powerful alternatives to the chi-square test. For a thorough discussion of EDF tests, refer to D’Agostino and Stephens (1986).
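The dependence of the chi-square test on the histogram intervals can be checked directly by fitting the same lognormal model with two different midpoint lists. In the following sketch, the two MIDPOINTS= lists are illustrative choices that cover the range of Gap; they are not the midpoints used in Examples 6.9 and 6.10. The chi-square p-values in the two fit summaries would be expected to differ, while the EDF test results should agree.

```sas
/* Sketch: same lognormal fit, two sets of histogram intervals. */
/* The midpoint lists are illustrative choices.                 */
proc capability data=Plates;
   histogram Gap / lognormal midpoints=0.2 to 1.8 by 0.2 noplot;
   histogram Gap / lognormal midpoints=0.1 to 1.9 by 0.1 noplot;
run;
```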
data Plates;
label Gap='Plate Gap in cm';
input Gap @@;
datalines;
0.746 0.357 0.376 0.327 0.485 1.741 0.241 0.777 0.768 0.409
0.252 0.512 0.534 1.656 0.742 0.378 0.714 1.121 0.597 0.231
0.541 0.805 0.682 0.418 0.506 0.501 0.247 0.922 0.880 0.344
0.519 1.302 0.275 0.601 0.388 0.450 0.845 0.319 0.486 0.529
1.547 0.690 0.676 0.314 0.736 0.643 0.483 0.352 0.636 1.080
;
The CHECKINDICES option in the PROC statement requests a goodness-of-fit test for normality in conjunction with the indices and displays the warning that normality is rejected at the significance level α = 0.05.
Example 6.9 concluded that the fitted lognormal distribution summarized in Output 6.9.2 is a good model, so
one might consider computing generalized capability indices based on this distribution. These indices are
requested with the INDICES option and are shown in Output 6.11.2. Formulas and recommendations for
these indices are given in “Indices Using Fitted Curves” on page 367.
Capability Indices Based on Lognormal Distribution
Cp 0.210804
CPL 0.595156
CPU 0.124927
Cpk 0.124927
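To see where these values come from, the following DATA step sketch computes percentile-based indices from the fitted lognormal curve. It assumes the definitions Cp = (USL − LSL)/(P99.865 − P0.135), CPL = (P50 − LSL)/(P50 − P0.135), CPU = (USL − P50)/(P99.865 − P50), and Cpk = min(CPL, CPU), where the P values are quantiles of the fitted distribution; see “Indices Using Fitted Curves” for the definitions the procedure actually uses. It also assumes the rounded estimates ζ̂ = −0.58 and σ̂ = 0.50 with θ = 0, and the limits LSL = 0.3 and USL = 0.8.

```sas
/* Sketch: percentile-based indices for a lognormal curve,   */
/* assuming rounded estimates zeta = -0.58, sigma = 0.50      */
data _null_;
   zeta = -0.58;  sigma = 0.50;
   lsl  = 0.3;    usl   = 0.8;
   p_lo  = exp(zeta + sigma*probit(0.00135));  /* 0.135th percentile  */
   p_med = exp(zeta);                           /* median (50th)       */
   p_hi  = exp(zeta + sigma*probit(0.99865));  /* 99.865th percentile */
   cp  = (usl - lsl)   / (p_hi - p_lo);
   cpl = (p_med - lsl) / (p_med - p_lo);
   cpu = (usl - p_med) / (p_hi - p_med);
   cpk = min(cpl, cpu);
   put cp= cpl= cpu= cpk=;
run;
```

With the rounded estimates, these come out near 0.21, 0.60, 0.12, and 0.12, close to the values in the table; the small differences reflect rounding of ζ̂ and σ̂.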
data Channel;
length Lot $ 16;
input Length @@;
select;
when (_n_ <= 425) Lot='Lot 1';
when (_n_ >= 926) Lot='Lot 3';
otherwise Lot='Lot 2';
end;
datalines;
0.91 1.01 0.95 1.13 1.12 0.86 0.96 1.17 1.36 1.10
0.98 1.27 1.13 0.92 1.15 1.26 1.14 0.88 1.03 1.00
0.98 0.94 1.09 0.92 1.10 0.95 1.05 1.05 1.11 1.15
1.11 0.98 0.78 1.09 0.94 1.05 0.89 1.16 0.88 1.19
1.01 1.08 1.19 0.94 0.92 1.27 0.90 0.88 1.38 1.02
2.13 2.05 1.90 2.07 2.15 1.96 2.15 1.89 2.15 2.04
1.95 1.93 2.22 1.74 1.91
;
When you use kernel density estimates to explore a data distribution, you should try several choices for the
bandwidth parameter c because this determines the smoothness and closeness of the fit. You can specify a
list of C= values with the KERNEL option to request multiple density estimates, as shown in the following
statements:
Output 6.12.1 reveals strong trimodality in the data, which are explored further in “Creating a One-Way
Comparative Histogram” on page 286.
data Plastic;
label Strength='Strength in psi';
input Strength @@;
datalines;
30.26 31.23 71.96 47.39 33.93 76.15 42.21
81.37 78.48 72.65 61.63 34.90 24.83 68.93
43.27 41.76 57.24 23.80 34.03 33.38 21.87
31.29 32.48 51.54 44.06 42.66 47.98 33.73
25.80 29.95 60.89 55.33 39.44 34.50 73.51
43.41 54.67 99.43 50.76 48.81 31.86 33.88
35.57 60.41 54.92 35.66 59.30 41.96 45.32
;
The following statements use the LOGNORMAL option in the HISTOGRAM statement to display the fitted
three-parameter lognormal curve shown in Output 6.13.1:
data Assembly;
label Offset = 'Offset (in mm)';
input Offset @@;
datalines;
11.11 13.07 11.42 3.92 11.08 5.40 11.22 14.69 6.27 9.76
9.18 5.07 3.51 16.65 14.10 9.69 16.61 5.67 2.89 8.13
9.97 3.28 13.03 13.78 3.13 9.53 4.58 7.94 13.51 11.43
11.98 3.90 7.67 4.32 12.69 6.17 11.48 2.82 20.42 1.01
3.18 6.02 6.63 1.72 2.42 11.32 16.49 1.22 9.13 3.34
1.29 1.70 0.65 2.62 2.04 11.08 18.85 11.94 8.34 2.07
0.31 8.91 13.62 14.94 4.83 16.84 7.09 3.37 0.49 15.19
5.16 4.14 1.92 12.70 1.97 2.10 9.38 3.18 4.18 7.22
15.84 10.85 2.35 1.93 9.19 1.39 11.40 12.20 16.07 9.23
0.05 2.15 1.95 4.39 0.48 10.16 4.81 8.28 5.68 22.81
0.23 0.38 12.71 0.06 10.11 18.38 5.53 9.36 9.32 3.63
12.93 10.39 2.05 15.49 8.12 9.52 7.77 10.70 6.37 1.91
8.60 22.22 1.74 5.84 12.90 13.06 5.08 2.09 6.41 1.40
15.60 2.36 3.97 6.17 0.62 8.56 9.36 10.19 7.16 2.37
12.91 0.95 0.89 3.82 7.86 5.33 12.92 2.64 7.92 14.06
;
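The assembly process is in statistical control, and it is decided to fit a folded normal distribution to the offset measurements. A variable X has a folded normal distribution if X = |Y|, where Y is normally distributed with mean μ and standard deviation σ. The fitted density is
$$h(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\left[\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) + \exp\left(-\frac{(x+\mu)^2}{2\sigma^2}\right)\right], \quad x \ge 0$$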
You can use SAS/IML software to compute preliminary estimates of μ and σ based on a method of moments given by Elandt (1961). These estimates are computed by solving equation (19) of Elandt (1961), which is given by
$$f(\theta) = \frac{\left(\frac{2}{\sqrt{2\pi}}\,e^{-\theta^2/2} - \theta\left[1 - 2\Phi(\theta)\right]\right)^2}{1 + \theta^2} = A$$
where θ = μ/σ, Φ(·) is the standard normal distribution function, and
$$A = \frac{\bar{x}^2}{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$$
Begin by using the MEANS procedure to compute the first and second moments and using the DATA step to
compute the constant A.
proc iml;
use Stat;
read all var {m2} into m2;
read all var {a} into a;
read all var {min} into min;
emu = xr[1];
esig = sqrt(xr[2]);
etheta = emu/esig;
create Parmest var{emu esig etheta};
append;
close Parmest;
quit;
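As a self-contained alternative, the moment equation can also be solved in a DATA step. The following sketch uses bisection, a technique chosen here for simplicity and not necessarily the method used in the IML program. It assumes that f(θ) increases with θ on the bracket [0, 10], where f(0) = 2/π and f(θ) approaches 1 as θ grows. It then uses E[X²] = σ²(1 + θ²) to recover σ and sets μ = θσ. The data set names Moments and MomEst are illustrative; the variable names emu, esig, and etheta mirror those in the IML step.

```sas
/* First moment and uncorrected sum of squares of Offset */
proc means data=Assembly noprint;
   var Offset;
   output out=Moments mean=xbar uss=uss n=n;
run;

/* Solve f(theta) = A by bisection, assuming f increases on [0, 10] */
data MomEst;
   set Moments;
   m2 = uss / n;                 /* second raw moment (1/n)*sum(x**2) */
   a  = xbar**2 / m2;            /* the constant A                    */
   lo = 0;  hi = 10;             /* if A <= 2/pi, t converges to 0    */
   do iter = 1 to 60;
      t  = (lo + hi) / 2;
      ft = (2*exp(-t*t/2)/sqrt(2*constant('pi'))
            - t*(1 - 2*probnorm(t)))**2 / (1 + t*t);
      if ft < a then lo = t;     /* root lies above t */
      else hi = t;               /* root lies at or below t */
   end;
   etheta = t;
   esig   = sqrt(m2 / (1 + t*t));   /* E[X**2] = sigma**2 (1 + theta**2) */
   emu    = etheta * esig;
   keep emu esig etheta;
run;
```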
Now, using μ̂₀ and σ̂₀ as initial estimates, call the NLPDD subroutine to maximize the log likelihood, l(μ, σ), of the folded normal distribution, where, up to a constant,
$$l(\mu, \sigma) = -n\log\sigma + \sum_{i=1}^{n}\log\left[\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) + \exp\left(-\frac{(x_i+\mu)^2}{2\sigma^2}\right)\right]$$
close parmest;
quit;
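If SAS/STAT software is available, the same log likelihood can also be maximized with PROC NLMIXED instead of the SAS/IML NLPDD call described above. This is a swapped-in technique, shown as a sketch only. It uses the sample mean and standard deviation as crude starting values (the data set name Crude and the macro variable names are illustrative), and it omits the constant −log√(2π) from each term, which does not change the maximizer.

```sas
/* Crude starting values: sample mean and standard deviation */
proc means data=Assembly noprint;
   var Offset;
   output out=Crude mean=m std=s;
run;

data _null_;
   set Crude;
   call symputx('mu0',  m);
   call symputx('sig0', s);
run;

/* Folded normal log likelihood, one term per observation */
proc nlmixed data=Assembly;
   parms mu=&mu0 sigma=&sig0;
   bounds sigma > 0;
   z1 = (Offset - mu) / sigma;
   z2 = (Offset + mu) / sigma;
   ll = -log(sigma) + log(exp(-0.5*z1*z1) + exp(-0.5*z2*z2));
   model Offset ~ general(ll);
run;
```

Because the likelihood is unchanged when μ is replaced by −μ, starting from a positive μ keeps the solution on the positive side.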
To annotate the curve on a histogram, begin by computing the width and endpoints of the histogram intervals.
The following statements save these values in an OUTFIT= data set called OUT. Note that a plot is not
produced at this point.
_VAR_ _CURVE_ _LOCATN_ _SCALE_ _CHISQ_ _DF_ _PCHISQ_ _MIDPT1_ _WIDTH_ _MIDPTN_
Offset NORMAL 7.62 5.24 31.17 5 0 1.5 3 22.5
The following statements create an annotate data set named Anno, which contains the coordinates of the
fitted curve:
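data Anno;
   merge Parmest Out;          /* fitted mu and sigma plus histogram layout */
   length function color $ 8;
   function = 'point';         /* first observation plots the starting point */
   color    = 'black';
   size     = 2;
   xsys     = '2';             /* x in data units */
   ysys     = '2';             /* y in data units */
   when     = 'a';             /* draw after the histogram */
   /* 39.894 = 100/sqrt(2*pi); multiplying by the bar width puts */
   /* the density on the histogram's percent scale               */
   constant = 39.894*_width_;
   left  = _midpt1_ - .5*_width_;   /* left edge of first interval */
   right = _midptn_ + .5*_width_;   /* right edge of last interval */
   inc   = (right-left)/100;        /* 100 steps across the axis   */
   do x = left to right by inc;
      z1 = (x-emu)/esig;
      z2 = (x+emu)/esig;
      /* y = 100*_width_*h(x), the folded normal density in percent */
      y = (constant/esig)*(exp(-0.5*z1*z1)+exp(-0.5*z2*z2));
      output;
      function = 'draw';       /* later points are joined by line segments */
   end;
run;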
The following statements read the ANNOTATE= data set and display the histogram and fitted curve, as
shown in Output 6.14.4:
specify graphical enhancements, such as background colors, text colors, text height, text font, and drop
shadows
The INSET statement is not applicable when you produce line printer plots by specifying the LINEPRINTER
option in the PROC CAPABILITY statement.
data Wire;
label Strength='Torsion Strength in lb/in';
input Strength @@;
datalines;
25 25 36 31 26 36 29 37 37 20
34 27 21 35 30 41 33 21 26 26
19 25 14 32 30 29 31 26 22 24
34 33 28 26 43 30 40 32 32 31
25 26 27 34 33 27 33 29 30 31
;
A histogram is used to examine the data distribution. For a more complete report, the sample size, minimum
value, maximum value, mean, and standard deviation are displayed on the histogram. The following
statements illustrate how to inset these statistics:
A complete list of keywords that you can use with the INSET statement is provided in “Summary of INSET
Keywords” on page 405. Note that the set of keywords available for a particular display depends on both the
plot statement that precedes the INSET statement and the options that you specify in the plot statement.
The following examples illustrate options commonly used for enhancing the appearance of an inset.
The ODS GRAPHICS ON statement specified before the PROC CAPABILITY statement enables ODS
Graphics, so the histogram is created using ODS Graphics instead of traditional graphics.
The resulting histogram is displayed in Figure 6.18. You can provide your own label by specifying the
keyword for that statistic followed by an equal sign (=) and the label in quotes. The label can have up to 24
characters.
The format 5.2 specified in parentheses after the keyword STD displays the standard deviation with a field
width of five and two decimal places. In general, you can specify any numeric SAS format in parentheses
after an inset keyword. You can also specify a format to be used for all the statistics in the INSET statement
with the FORMAT= option (see the next section, “Adding a Header and Positioning the Inset” on page 402).
For more information about SAS formats, refer to SAS Formats and Informats: Reference.
Note that if you specify both a label and a format for a statistic, the label must appear before the format, as
with the keyword STD in the previous statements.
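A minimal sketch of the label and format rules, using the Wire data: MEAN receives only a format, while STD receives a label followed by a format, in that order. The particular statistics and formats chosen here are illustrative.

```sas
proc capability data=Wire noprint;
   histogram Strength;
   /* label precedes format for STD; MEAN gets only a format */
   inset n min max mean (6.2) std='Std Dev' (5.2);
run;
```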
You can use any number of INSET statements in the CAPABILITY procedure. Each INSET statement produces an inset and must follow one of the plot statements: CDFPLOT, COMPHISTOGRAM, HISTOGRAM,
PPPLOT, PROBPLOT, or QQPLOT. The inset appears in all displays produced by the plot statement that
immediately precedes it. The statistics are displayed in the order in which they are specified. For example,
the following statements produce a cumulative distribution plot with two insets and a histogram with one
inset:
The statistics displayed in an inset are computed for a specific process variable from observations for the
current BY group. For example, in the following statements, there are two process variables (Strength and
Diameter) and a BY variable (Batch). If there are three different batches (levels of Batch), then a total of six
histograms are produced. The statistics in each inset are computed for a particular variable and batch. The
labels in the inset are the same for each histogram.
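The placement and BY-group rules can be sketched as follows. The data set Parts is hypothetical; it is assumed to contain numeric variables Strength and Diameter and a BY variable Batch. The first two INSET statements follow CDFPLOT, so both insets appear on every cumulative distribution plot; the third follows HISTOGRAM, so it appears on every histogram. With three batches, this step would produce three CDF plots and six histograms.

```sas
/* Hypothetical data set Parts; BY processing needs sorted input */
proc sort data=Parts;
   by Batch;
run;

proc capability data=Parts noprint;
   by Batch;
   cdfplot Strength;
   inset n mean;               /* first inset on each CDF plot   */
   inset min max / pos=se;     /* second inset on each CDF plot  */
   histogram Strength Diameter;
   inset n mean std / pos=ne;  /* one inset on every histogram   */
run;
```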
keyword-list
can include any of the keywords listed in “Summary of INSET Keywords” on page 405. Some
keywords allow secondary keywords to be specified in parentheses immediately after the primary
keyword. Also, some inset statistics are available only if you request plot statements and options for
which those statistics are calculated. For example, consider the following statements:
The keywords MEAN and STD display the sample mean and standard deviation of Strength. The
primary keyword NORMAL with the secondary keywords AD and ADPVAL display the Anderson-
Darling goodness-of-fit test statistic and p-value in the inset as well. The statistics specified with the
NORMAL keyword are available only because a normal distribution has been fit to the data by using
the NORMAL option in the HISTOGRAM statement. See the section “Summary of INSET Keywords”
for a list of available keywords.
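Assembled into a complete step with the Wire data, statements of the kind just described might look like the following sketch: the NORMAL option in the HISTOGRAM statement fits the normal curve, which makes the secondary keywords AD and ADPVAL available to the NORMAL inset keyword.

```sas
proc capability data=Wire noprint;
   histogram Strength / normal;              /* fit a normal curve */
   inset mean std normal(ad adpval);         /* sample stats + A-D test */
run;
```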
Typically, you specify keywords to display statistics computed by the CAPABILITY procedure. However, you can also specify the keyword DATA= followed by the name of a SAS data set to display customized statistics. This data set must contain two variables:
a character variable named _LABEL_ whose values provide labels for inset entries.
a variable named _VALUE_, which can be either character or numeric, and whose values provide
values for inset entries.
The label and value from each observation in the DATA= data set occupy one line in the inset. The
position of the DATA= keyword in the keyword list determines the position of its lines in the inset.
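A minimal sketch of the DATA= keyword: the data set InsetInfo and its label and value strings are hypothetical. Because DATA= appears after N and MEAN in the keyword list, its two lines would follow those statistics in the inset.

```sas
/* Hypothetical custom entries: one inset line per observation */
data InsetInfo;
   length _LABEL_ $ 24 _VALUE_ $ 12;
   _LABEL_ = 'Supplier';   _VALUE_ = 'Vendor A';  output;
   _LABEL_ = 'Test Shift'; _VALUE_ = 'Second';    output;
run;

proc capability data=Wire noprint;
   histogram Strength;
   inset n mean data=InsetInfo;
run;
```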
By default, inset statistics are identified with appropriate labels, and numeric values are printed using
appropriate formats. However, you can provide customized labels and formats. You provide the
customized label by specifying the keyword for that statistic followed by an equal sign (=) and the label
in quotes. Labels can have up to 24 characters. You provide the numeric format in parentheses after
the keyword. Note that if you specify both a label and a format for a statistic, the label must appear
before the format. For an example, see “Formatting Values and Customizing Labels” on page 400.
options
appear after the slash (/) and control the appearance of the inset. For example, the following INSET
statement uses two appearance options (POSITION= and CTEXT=):
The POSITION= option determines the location of the inset, and the CTEXT= option specifies the
color of the text of the inset.
See “Summary of Options” on page 414 for a list of all available options, and “Dictionary of Options”
on page 415 for detailed descriptions. Note the difference between keywords and options; keywords
specify the information to be displayed in an inset, whereas options control the appearance of the inset.
Keyword Description
CSS Corrected sum of squares
CV Coefficient of variation
GEOMEAN Geometric mean
HARMEAN Harmonic mean
KURTOSIS | KURT Kurtosis
MAX Largest value
MEAN Sample mean
MIN Smallest value
MODE Most frequent value
N Sample size
NEXCL Number of observations excluded by MAXNBIN= or MAXSIGMAS= option
NMISS Number of missing values
NOBS Number of observations
RANGE Range
SKEWNESS | SKEW Skewness
STD | STDDEV Standard deviation
STDMEAN | STDERR Standard error of the mean
SUM Sum of the observations
SUMWGT Sum of the weights
USS Uncorrected sum of squares
VAR Variance
Keyword Description
P1 1st percentile
P5 5th percentile
P10 10th percentile
Q1 | P25 Lower quartile (25th percentile)
MEDIAN | Q2 | P50 Median (50th percentile)
Q3 | P75 Upper quartile (75th percentile)
P90 90th percentile
P95 95th percentile
P99 99th percentile
QRANGE Interquartile range (Q3–Q1)
Table 6.40 lists keywords for distribution-free confidence limits for percentiles requested with the CIPCTLDF
option.
Keyword Description
P1_LCL_DF 1st percentile lower confidence limit
P1_UCL_DF 1st percentile upper confidence limit
P5_LCL_DF 5th percentile lower confidence limit
P5_UCL_DF 5th percentile upper confidence limit
P10_LCL_DF 10th percentile lower confidence limit
P10_UCL_DF 10th percentile upper confidence limit
Q1_LCL_DF | P25_LCL_DF Lower quartile (25th percentile) lower confidence limit
Q1_UCL_DF | P25_UCL_DF Lower quartile (25th percentile) upper confidence limit
MEDIAN_LCL_DF | Q2_LCL_DF | P50_LCL_DF Median (50th percentile) lower confidence limit
MEDIAN_UCL_DF | Q2_UCL_DF | P50_UCL_DF Median (50th percentile) upper confidence limit
Q3_LCL_DF | P75_LCL_DF Upper quartile (75th percentile) lower confidence limit
Q3_UCL_DF | P75_UCL_DF Upper quartile (75th percentile) upper confidence limit
P90_LCL_DF 90th percentile lower confidence limit
P90_UCL_DF 90th percentile upper confidence limit
P95_LCL_DF 95th percentile lower confidence limit
P95_UCL_DF 95th percentile upper confidence limit
P99_LCL_DF 99th percentile lower confidence limit
P99_UCL_DF 99th percentile upper confidence limit
Table 6.41 lists keywords for percentile confidence limits computed assuming normality requested with the CIPCTLNORMAL option.
Keyword Description
P1_LCL 1st percentile lower confidence limit
P1_UCL 1st percentile upper confidence limit
P5_LCL 5th percentile lower confidence limit
P5_UCL 5th percentile upper confidence limit
P10_LCL 10th percentile lower confidence limit
P10_UCL 10th percentile upper confidence limit
Q1_LCL | P25_LCL Lower quartile (25th percentile) lower confidence limit
Q1_UCL | P25_UCL Lower quartile (25th percentile) upper confidence limit
MEDIAN_LCL | Q2_LCL | P50_LCL Median (50th percentile) lower confidence limit
MEDIAN_UCL | Q2_UCL | P50_UCL Median (50th percentile) upper confidence limit
Q3_LCL | P75_LCL Upper quartile (75th percentile) lower confidence limit
Q3_UCL | P75_UCL Upper quartile (75th percentile) upper confidence limit
P90_LCL 90th percentile lower confidence limit
P90_UCL 90th percentile upper confidence limit
P95_LCL 95th percentile lower confidence limit
P95_UCL 95th percentile upper confidence limit
P99_LCL 99th percentile lower confidence limit
P99_UCL 99th percentile upper confidence limit
Keyword Description
GINI Gini’s mean difference
MAD Median absolute difference about the median
QN Qn, alternative to MAD
SN Sn, alternative to MAD
STD_GINI Gini’s standard deviation
STD_MAD MAD standard deviation
STD_QN Qn standard deviation
STD_QRANGE Interquartile range standard deviation
STD_SN Sn standard deviation
Keyword Description
MSIGN Sign statistic
NORMALTEST Test statistic for normality
PNORMAL Probability value for the test of normality
SIGNRANK Signed rank statistic
PROBM Probability of greater absolute value for the sign statistic
PROBN Probability value for the test of normality
PROBS Probability value for the signed rank test
PROBT Probability value for the Student’s t test
T Statistic for Student’s t test
Keyword Description
DATA= (label, value) pairs from input data set
Keyword Description
CP Capability index Cp
CPLCL Lower confidence limit for Cp
CPUCL Upper confidence limit for Cp
CPK Capability index Cpk
CPKLCL Lower confidence limit for Cpk
CPKUCL Upper confidence limit for Cpk
CPL Capability index CPL
CPM Capability index Cpm
CPMLCL Lower confidence limit for Cpm
CPMUCL Upper confidence limit for Cpm
CPU Capability index CPU
K Capability index K
Keyword Description
LSL Lower specification limit
USL Upper specification limit
TARGET Target value
PCTGTR Percent of nonmissing observations that exceed the upper specification limit
PCTLSS Percent of nonmissing observations that are less than the lower specification limit
PCTBET Percent of nonmissing observations between the upper and lower specification limits (inclusive)
Table 6.48 lists the secondary keywords available with each distribution keyword listed in Table 6.47. In
many cases, aliases can be used (for example, ALPHA in place of SHAPE1).
Secondary Keyword Alias Description
Secondary Keywords Available with the BETA Keyword
ALPHA SHAPE1 First shape parameter ˛
BETA SHAPE2 Second shape parameter ˇ
SIGMA SCALE Scale parameter
THETA THRESHOLD Lower threshold parameter
MEAN Mean of the fitted distribution
STD Standard deviation of the fitted distribution
Secondary Keyword Alias Description
Secondary Keywords Available with the NORMAL Keyword
MU MEAN Mean parameter
SIGMA STD Scale parameter
Secondary Keyword Alias Description
THETA THRESHOLD Threshold parameter
MEAN Mean of the fitted distribution
STD Standard deviation of the fitted distribution
The secondary keywords listed in Table 6.49 can be used with any distribution keyword but only with the
HISTOGRAM and COMPHISTOGRAM plot statements.
Secondary Keyword Description
CP Capability index Cp
CPK Capability index Cpk
CPL Capability index CPL
CPM Capability index Cpm
CPU Capability index CPU
ESTPCTLSS Estimated percentage less than the lower specification limit
ESTPCTGTR Estimated percentage greater than the upper specification limit
K Capability index K
The secondary keywords listed in Table 6.50 can be used with any distribution keyword but only with the
HISTOGRAM plot statement (see Example 6.15).
Secondary Keyword Description
CHISQ Chi-square statistic
DF Degrees of freedom for the chi-square test
PCHISQ Probability value for the chi-square test
AD Anderson-Darling EDF test statistic
ADPVAL Anderson-Darling EDF test p-value
CVM Cramér–von Mises EDF test statistic
CVMPVAL Cramér–von Mises EDF test p-value
KSD Kolmogorov-Smirnov EDF test statistic
KSDPVAL Kolmogorov-Smirnov EDF test p-value
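As a sketch of how the secondary keywords in Tables 6.49 and 6.50 combine with a distribution keyword, the following statements display goodness-of-fit results and the estimated percent above the upper specification limit for the lognormal fit to the Plates data. The SPEC statement supplies the USL that ESTPCTGTR needs; the choice of keywords and inset position is illustrative.

```sas
proc capability data=Plates noprint;
   spec lsl=0.3 usl=0.8;
   histogram Gap / lognormal;
   /* GOF statistics (HISTOGRAM only) and estimated pct > USL */
   inset lognormal(chisq df pchisq ad adpval estpctgtr) / pos=ne;
run;
```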
Table 6.51 lists primary keywords available only with the HISTOGRAM and COMPHISTOGRAM plot
statements. These keywords display fill areas on a histogram. If you fit a parametric density on a histogram
and request that the area under the curve be filled, these keywords display the percentage of the distribution
area that lies below the lower specification limit, between the specification limits, or above the upper
specification limit. If you do not fill the area beneath a parametric density estimate, these keywords display
the observed proportion of observations (that is, the area in the bars of the histogram).
You should use these options with the FILL, CFILL=, and PFILL= options in the HISTOGRAM and
COMPHISTOGRAM statements and with the CLEFT=, CRIGHT=, PLEFT=, and PRIGHT= options in the
SPEC statements. See Output 6.16.1 for an example.
Keyword Description
KERNEL Displays statistics for all kernel estimates
KERNELn Displays statistics for only the nth kernel density estimate, n = 1, 2, 3, 4, or 5
Summary of Options
The following table lists the INSET statement options. See the section “Dictionary of Options” for complete
descriptions of the options.
Option Description
CFILL= Specifies color of inset background
CFILLH= Specifies color of header background
CFRAME= Specifies color of frame
CHEADER= Specifies color of header text
CSHADOW= Specifies color of drop shadow
CTEXT= Specifies color of inset text
DATA Specifies data units for POSITION=(x, y) coordinates
FONT= Specifies font of text
FORMAT= Specifies format of values in inset
GUTTER= Specifies gutter width for inset in top or bottom margin
HEADER= Specifies header text
HEIGHT= Specifies height of inset text
NCOLS= Specifies number of columns for inset in top or bottom margin
NOFRAME Suppresses frame around inset
POSITION= Specifies position of inset
REFPOINT= Specifies reference point of inset positioned with POSITION=(x, y) coordinates
Dictionary of Options
The following sections provide detailed descriptions of options for the INSET statement. Terms used in this
section are illustrated in Figure 6.20.
General Options
You can specify the following general options:
DATA
specifies that data coordinates are to be used in positioning the inset with the POSITION= option.
The DATA option is available only when you specify POSITION=(x, y), and it must be placed immediately after the coordinates (x, y). For details, see the entry for the POSITION= option or “Positioning the Inset Using Coordinates” on page 420. See Figure 6.23 for an example.
FORMAT=format
specifies a format for all the values displayed in an inset. If you specify a format for a particular statistic,
then this format overrides the format you specified with the FORMAT= option. See Figure 6.19 or
Output 6.15.1 for an example.
GUTTER=value
specifies the gutter width in percent screen units for an inset located in the top or bottom margin of
ODS Graphics output. The gutter is the space between columns of (label, value) pairs in an inset. The
default value is four. This option is ignored if ODS Graphics is disabled.
HEADER= ‘string’
specifies the header text. The string cannot exceed 40 characters. If you do not specify the HEADER=
option, no header line appears in the inset. If all the keywords listed in the INSET statement are
secondary keywords corresponding to a fitted curve on a histogram, a default header is displayed that
indicates the distribution and identifies the curve. See Figure 6.19 for an example of a specified header
and Output 6.15.1 for an example of the default header for a fitted normal curve.
NCOLS=n
specifies the number of columns of (label, value) pairs displayed in an inset located in the top or bottom
margin of ODS Graphics output. The default value is three. This option is ignored if ODS Graphics is
disabled.
NOFRAME
suppresses the frame drawn around the text.
POSITION=position
POS=position
determines the position of the inset. The position can be a compass point keyword, a margin keyword, or a pair of coordinates (x, y). You can specify coordinates in axis percent units or axis data units. For more information, see “Details: INSET Statement” on page 418. By default, POSITION=NW, which positions the inset in the upper left (northwest) corner of the display.
NOTE: In this release of the CAPABILITY procedure, you cannot specify coordinates with the POSITION= option when producing ODS Graphics output.
CFILL=color | BLANK
specifies the color of the background (including the header background if you do not specify the
CFILLH= option). See Output 6.15.1 for an example.
If you do not specify the CFILL= option, then by default, the background is empty. This means that
items that overlap the inset (such as curves, histogram bars, or specification limits) show through the
inset. If you specify any value for the CFILL= option, then overlapping items no longer show through
the inset. Specify CFILL=BLANK to leave the background uncolored and also to prevent items from
showing through the inset.
CFILLH=color
specifies the color of the header background. By default, if you do not specify a CFILLH= color, the
CFILL= color is used.
CFRAME=color
specifies the color of the frame. By default, the frame is the same color as the axis of the plot.
CHEADER=color
specifies the color of the header text. By default, if you do not specify a CHEADER= color, the
CTEXT= color is used.
CSHADOW=color
CS=color
specifies the color of the drop shadow. See Output 6.16.1 for an example. By default, if you do not
specify the CSHADOW= option, a drop shadow is not displayed.
CTEXT=color
CT=color
specifies the color of the text. By default, the inset text color is the same as the other text on the plot.
FONT=font
specifies the font of the text. By default, the font is SIMPLEX if the inset is located in the interior of
the plot, and the font is the same as the other text displayed on the plot if the inset is located in the
exterior of the plot.
HEIGHT=value
specifies the height of the text.
REFPOINT=BR | BL | TR | TL
RP=BR | BL | TR | TL
specifies the reference point for an inset that is positioned by a pair of coordinates with the POSITION=
option. Use the REFPOINT= option with POSITION= coordinates. The REFPOINT= option specifies
which corner of the inset frame you want positioned at coordinates (x, y). The keywords BL, BR, TL,
and TR represent bottom left, bottom right, top left, and top right, respectively. See Figure 6.24 for an
example. The default is REFPOINT=BL.
If you specify the position of the inset as a compass point or margin keyword, the REFPOINT= option
is ignored. For more information, see “Positioning the Inset Using Coordinates” on page 420.
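The REFPOINT= rule only says which corner of the frame is placed at (x, y). The Python sketch below (not SAS code) turns that rule into frame edges. It assumes three things: the inset's width and height are already known, they are expressed in the same units as (x, y), and y increases upward. The names `inset_frame` and `Box` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    bottom: float
    right: float
    top: float


def inset_frame(x, y, width, height, refpoint="BL"):
    """Return the inset frame when the REFPOINT= corner is placed at (x, y).

    Assumes y grows upward and that width/height share the units of (x, y).
    """
    refpoint = refpoint.upper()
    if refpoint not in {"BL", "BR", "TL", "TR"}:
        raise ValueError(f"unknown REFPOINT= value: {refpoint}")
    # Second letter: L = left edge at x, R = right edge at x.
    left = x if refpoint[1] == "L" else x - width
    # First letter: B = bottom edge at y, T = top edge at y.
    bottom = y if refpoint[0] == "B" else y - height
    return Box(left, bottom, left + width, bottom + height)
```

With the default REFPOINT=BL, the inset extends up and to the right of (x, y). With TR, it extends down and to the left.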
For an example of an inset placed in the right margin, see Figure 6.19. Margin positions are recommended if
a large number of statistics are listed in the INSET statement. If you attempt to display a lengthy inset in the
interior of the plot, it is likely that the inset will collide with the data display.
The histogram is displayed in Output 6.16.1. The LSLPCT keyword in the INSET statement requests a
legend for the area under the curve to the left of the lower specification limit. The CLEFT= option is used to
fill the area under the normal curve to the left of the line, and the CFILL= color is used to fill the remaining
area. If the FILL normal-option were not specified, the CLEFT= and CFILL= colors would be applied to the
corresponding areas under the histogram, not the normal curve, and the inset box would reflect the area under
the histogram bars.
You can use the USLPCT keyword in the INSET statement to request a legend for the area to the right of
an upper specification limit, and you can use the BETWEENPCT keyword to request a legend for the area
between the lower and upper limits. By default, the legend requested with each of the keywords LSLPCT,
USLPCT, and BETWEENPCT displays a rectangle that matches the color of the corresponding area. You
can substitute a customized label for each rectangle by specifying the keyword followed by an equal sign (=)
and the label in quotes.
• intervals that are computed assuming the data are sampled from a normal population (see Hahn and
  Meeker (1991) for a detailed discussion of these intervals)
• intervals that are computed without any assumption about the distribution of the population (see
  Krishnamoorthy and Mathew (2009) for a detailed discussion of these intervals):
  – nonparametric statistical tolerance intervals that contain at least a specified proportion of the
    population
data Cans;
label Weight = "Fluid Weight (ounces)";
input Weight @@;
datalines;
12.07 12.02 12.00 12.01 11.98 11.96 12.04 12.05 12.01 11.97
12.03 12.03 12.00 12.04 11.96 12.02 12.06 12.00 12.02 11.91
12.05 11.98 11.91 12.01 12.06 12.02 12.05 11.90 12.07 11.98
12.02 12.11 12.00 11.99 11.95 11.98 12.05 12.00 12.10 12.04
12.06 12.04 11.99 12.06 11.99 12.07 11.96 11.97 12.00 11.97
12.09 11.99 11.95 11.99 11.99 11.96 11.94 12.03 12.09 12.03
11.99 12.00 12.05 12.04 12.05 12.01 11.97 11.93 12.00 11.97
12.13 12.07 12.00 11.96 11.99 11.97 12.05 11.94 11.99 12.02
11.95 11.99 11.91 12.06 12.03 12.06 12.05 12.04 12.03 11.98
12.05 12.05 12.11 11.96 12.00 11.96 11.96 12.00 12.01 11.98
;
Note that this data set is introduced in “Computing Descriptive Statistics” on page 207 of “PROC CAPABILITY
and General Statements” on page 205. The analysis in that section provides evidence that the weight
measurements are normally distributed.
By default, the INTERVALS statement computes and prints the six intervals described in the entry for the
METHODS= option. The following statements tabulate these intervals for the variable Weight:
Approximate Prediction Interval Containing All of k Future Observations

Prediction Confidence   k   Lower Limit   Upper Limit
99.00%                  1   11.89         12.13
99.00%                  2   11.87         12.14
99.00%                  3   11.87         12.15
95.00%                  1   11.92         12.10
95.00%                  2   11.90         12.12
95.00%                  3   11.89         12.12
90.00%                  1   11.93         12.09
90.00%                  2   11.92         12.10
90.00%                  3   11.91         12.11

Prediction Interval Containing the Mean of k Future Observations

Prediction Confidence   k   Lower Limit   Upper Limit
99.00%                  1   11.89         12.13
99.00%                  2   11.92         12.10
99.00%                  3   11.94         12.08
95.00%                  1   11.92         12.10
95.00%                  2   11.94         12.08
95.00%                  3   11.95         12.06
90.00%                  1   11.93         12.09
90.00%                  2   11.95         12.06
90.00%                  3   11.96         12.05

Confidence Limits Containing the Mean

Confidence   Lower Limit   Upper Limit
99.00%       11.997        12.022
95.00%       12.000        12.019
90.00%       12.002        12.017

Confidence Limits Containing the Standard Deviation

Confidence   Lower Limit   Upper Limit
99.00%       0.040         0.057
95.00%       0.041         0.055
90.00%       0.042         0.053
You can specify INTERVAL as an alias for INTERVALS. You can use any number of INTERVALS statements
in the CAPABILITY procedure. The components of the INTERVALS statement are described as follows.
variables
gives a list of variables for which to compute intervals. If you specify a VAR statement, the variables
must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the
input data set. If you do not specify a list of variables, then by default the INTERVALS statement
computes intervals for all variables in the VAR statement (or all numeric variables in the input data set
if you do not use a VAR statement).
options
alter the defaults for computing and printing intervals and for creating output data sets.
Summary of Options
The following tables list the INTERVALS statement options by function. For complete descriptions, see
“Dictionary of Options” on page 430.
Option          Description
ALPHA=          Specifies probability or confidence levels associated with the intervals
K=              Specifies values of k for prediction intervals
METHODS=        Specifies which intervals are computed
NOPRINT         Suppresses the output tables
OUTINTERVALS=   Specifies an output data set containing interval information
P=              Specifies values of p for tolerance intervals
TYPE=           Specifies the type of intervals (one-sided lower, one-sided upper, or two-sided)
Dictionary of Options
The following entries provide detailed descriptions of options in the INTERVALS statement.
ALPHA=value-list
specifies values of α, the probability or confidence associated with the interval. For example, the
following statements tabulate the default intervals at probability or confidence levels of α = 0.05,
α = 0.10, α = 0.15, and α = 0.20:
Note that some references express probability or confidence levels as 1 − α instead. Values for the
ALPHA= option must be between 0.00001 and 0.99999. By default, values of 0.01, 0.05, and 0.10 are
used.
K=value-list
specifies values of k for prediction intervals. Default values of 1, 2, and 3 are used for the prediction
interval for k future observations and for the prediction interval for the mean of k future observations.
Default values of 2 and 3 are used for the prediction interval for the standard deviation of k future
observations. The values must be integers.
METHODS=indices
METHOD=indices
specifies which intervals are computed. The indices can range from 1 to 7, and they correspond to the
intervals described in Table 6.56.
Index   Description
1       Approximate simultaneous prediction interval for k future observations
2       Prediction interval for the mean of k future observations
3       Statistical tolerance interval that contains at least proportion p of the population
4       Confidence interval for the population mean
5       Prediction interval for the standard deviation of k future observations
6       Confidence interval for the population standard deviation
7       Nonparametric statistical tolerance interval that contains at least proportion p of the population
For example, the following statements tabulate confidence limits for the population mean
(METHOD=4) and confidence limits for the population standard deviation (METHOD=6):
Formulas for the intervals are given in “Methods for Computing Statistical Intervals” on page 432. By
default, the procedure computes the first six intervals, for which it assumes that the data are sampled
from a normal population.
NOPRINT
suppresses the tables produced by default. This option is useful when you only want to save the interval
information in an OUTINTERVALS= data set.
OUTINTERVALS=SAS-data-set
OUTINTERVAL=SAS-data-set
OUTINT=SAS-data-set
specifies an output SAS data set containing the intervals and related information. For example, the
following statements create a data set named ints containing intervals for the variable width:
P=value-list
specifies values of p for the tolerance intervals. These values must be between 0.00001 and 0.99999.
Note that the P= option applies only to the tolerance intervals (a value of 3 or 7, or both, for the
METHODS= option). By default, values of 0.90, 0.95, and 0.99 are used.
Notation                       Definition
$n$                            Number of nonmissing values for a variable
$\bar{X}$                      Mean of variable
$s$                            Standard deviation of variable
$z_\alpha$                     100αth percentile of the standard normal distribution
$t_\alpha(\nu)$                100αth percentile of the central t distribution with ν degrees of freedom
$t'_\alpha(\delta, \nu)$       100αth percentile of the noncentral t distribution with noncentrality
                               parameter δ and ν degrees of freedom
$F_\alpha(\nu_1, \nu_2)$       100αth percentile of the F distribution with ν1 degrees of freedom in
                               the numerator and ν2 degrees of freedom in the denominator
$\chi^2_\alpha(\nu)$           100αth percentile of the χ² distribution with ν degrees of freedom
$\chi^2_\alpha(\delta, \nu)$   100αth percentile of the noncentral χ² distribution with noncentrality
                               parameter δ and ν degrees of freedom
$X_{(i)}$                      ith order statistic from a sample $X = (X_1, X_2, \ldots, X_n)$
For the normal-theory intervals (METHODS=1 through 6), the values of the variable are assumed to be
independent and normally distributed. The intervals are computed using the degrees of freedom as the
divisor for the standard deviation s. This divisor corresponds to the default of VARDEF=DF in the PROC
CAPABILITY statement. If you specify another value for the VARDEF= option, intervals are not computed.
You select the intervals to be computed with the METHODS= option. The following sections give computational
details for each of the METHODS= options.
METHODS=1
This requests an approximate simultaneous prediction interval for k future observations. Two-sided intervals
are computed using the conservative approximations
$$\text{Lower Limit} = \bar{X} - t_{1-\frac{\alpha}{2k}}(n-1)\, s\sqrt{1 + \frac{1}{n}}$$
$$\text{Upper Limit} = \bar{X} + t_{1-\frac{\alpha}{2k}}(n-1)\, s\sqrt{1 + \frac{1}{n}}$$
One-sided limits are computed as
$$\text{Lower Limit} = \bar{X} - t_{1-\frac{\alpha}{k}}(n-1)\, s\sqrt{1 + \frac{1}{n}}$$
$$\text{Upper Limit} = \bar{X} + t_{1-\frac{\alpha}{k}}(n-1)\, s\sqrt{1 + \frac{1}{n}}$$
Hahn (1970b) states that these approximations are satisfactory except for combinations of small n, large k,
and large α. Refer also to Hahn (1969, 1970a) and Hahn and Meeker (1991).
METHODS=2
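This requests a prediction interval for the mean of k future observations. Two-sided intervals are computed as
$$\text{Lower Limit} = \bar{X} - t_{1-\frac{\alpha}{2}}(n-1)\, s\sqrt{\frac{1}{k} + \frac{1}{n}}$$
$$\text{Upper Limit} = \bar{X} + t_{1-\frac{\alpha}{2}}(n-1)\, s\sqrt{\frac{1}{k} + \frac{1}{n}}$$
One-sided limits are computed as
$$\text{Lower Limit} = \bar{X} - t_{1-\alpha}(n-1)\, s\sqrt{\frac{1}{k} + \frac{1}{n}}$$
$$\text{Upper Limit} = \bar{X} + t_{1-\alpha}(n-1)\, s\sqrt{\frac{1}{k} + \frac{1}{n}}$$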
METHODS=3
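This requests a statistical tolerance interval that contains at least proportion p of the population. Two-sided
intervals are computed as
$$\text{Lower Limit} = \bar{X} - ks$$
$$\text{Upper Limit} = \bar{X} + ks$$
where k is the solution of the integral equation
$$\sqrt{\frac{2n}{\pi}} \int_{0}^{\infty} P\left(\chi^2_{n-1} > \frac{(n-1)\,\chi^2_{p}(z^2, 1)}{k^2}\right) e^{-\frac{1}{2} n z^2}\, dz = 1 - \alpha$$
Here $\chi^2_{n-1}$ denotes a χ² random variable with n − 1 degrees of freedom, and $\chi^2_p(z^2, 1)$ is the
100pth percentile of the noncentral χ² distribution with noncentrality parameter z² and 1 degree of freedom.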
METHODS=4
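This requests a confidence interval for the population mean. Two-sided intervals are computed as
$$\text{Lower Limit} = \bar{X} - t_{1-\frac{\alpha}{2}}(n-1)\,\frac{s}{\sqrt{n}}$$
$$\text{Upper Limit} = \bar{X} + t_{1-\frac{\alpha}{2}}(n-1)\,\frac{s}{\sqrt{n}}$$
One-sided limits are computed as
$$\text{Lower Limit} = \bar{X} - t_{1-\alpha}(n-1)\,\frac{s}{\sqrt{n}}$$
$$\text{Upper Limit} = \bar{X} + t_{1-\alpha}(n-1)\,\frac{s}{\sqrt{n}}$$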
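The normal-theory limits for METHODS=1, 2, and 4 have the same form: X̄ ± (t percentile) × (multiple of s). The methods differ only in two places. The first is the probability passed to the t percentile: 1 − α/(2k), 1 − α/2, or 1 − α. The second is the factor multiplying s: √(1 + 1/n), √(1/k + 1/n), or 1/√n. The Python sketch below (not SAS code) makes this explicit. The caller must supply the t percentile function, for example `scipy.stats.t.ppf(q, df)`. The function names are hypothetical. With `one_sided=True`, the two values returned are the separate one-sided lower and upper limits, as the text lists them.

```python
import math
from typing import Callable, Tuple

# t_ppf(prob, df): 100*prob-th percentile of Student's t with df degrees of
# freedom (for example scipy.stats.t.ppf). Supplied by the caller.
TQuantile = Callable[[float, int], float]


def _pair(center: float, half_width: float) -> Tuple[float, float]:
    return center - half_width, center + half_width


def method1(xbar, s, n, k, alpha, t_ppf: TQuantile, one_sided=False):
    """Approximate simultaneous prediction limits for all of k future observations."""
    a = alpha / k if one_sided else alpha / (2 * k)
    return _pair(xbar, t_ppf(1 - a, n - 1) * s * math.sqrt(1 + 1 / n))


def method2(xbar, s, n, k, alpha, t_ppf: TQuantile, one_sided=False):
    """Prediction limits for the mean of k future observations."""
    a = alpha if one_sided else alpha / 2
    return _pair(xbar, t_ppf(1 - a, n - 1) * s * math.sqrt(1 / k + 1 / n))


def method4(xbar, s, n, alpha, t_ppf: TQuantile, one_sided=False):
    """Confidence limits for the population mean."""
    a = alpha if one_sided else alpha / 2
    return _pair(xbar, t_ppf(1 - a, n - 1) * s / math.sqrt(n))
```

For k = 1, `method1` and `method2` return identical two-sided limits, because both reduce to t₁₋α/₂(n − 1)·s·√(1 + 1/n). This is why the k = 1 rows of the two prediction-interval tables for Weight agree.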
METHODS=5
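This requests a prediction interval for the standard deviation of k future observations. Two-sided intervals are
computed as
$$\text{Lower Limit} = s\left(F_{1-\frac{\alpha}{2}}(n-1, k-1)\right)^{-\frac{1}{2}}$$
$$\text{Upper Limit} = s\left(F_{1-\frac{\alpha}{2}}(k-1, n-1)\right)^{\frac{1}{2}}$$
One-sided limits are computed as
$$\text{Lower Limit} = s\left(F_{1-\alpha}(n-1, k-1)\right)^{-\frac{1}{2}}$$
$$\text{Upper Limit} = s\left(F_{1-\alpha}(k-1, n-1)\right)^{\frac{1}{2}}$$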
METHODS=6
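This requests a confidence interval for the population standard deviation. Two-sided intervals are computed
as
$$\text{Lower Limit} = s\sqrt{\frac{n-1}{\chi^2_{1-\frac{\alpha}{2}}(n-1)}}$$
$$\text{Upper Limit} = s\sqrt{\frac{n-1}{\chi^2_{\frac{\alpha}{2}}(n-1)}}$$
One-sided limits are computed as
$$\text{Lower Limit} = s\sqrt{\frac{n-1}{\chi^2_{1-\alpha}(n-1)}}$$
$$\text{Upper Limit} = s\sqrt{\frac{n-1}{\chi^2_{\alpha}(n-1)}}$$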
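The limits for METHODS=5 and METHODS=6 scale s by a factor built from an F or χ² percentile. The Python sketch below (not SAS code) spells out that arithmetic. The caller must pass the percentile functions, for example `scipy.stats.f.ppf(q, dfn, dfd)` and `scipy.stats.chi2.ppf(q, df)`. The function names are hypothetical, and `one_sided=True` returns the separate one-sided lower and upper limits.

```python
import math
from typing import Callable, Tuple


def method5(s, n, k, alpha, f_ppf: Callable[[float, int, int], float],
            one_sided=False) -> Tuple[float, float]:
    """Prediction limits for the standard deviation of k future observations.

    f_ppf(prob, dfn, dfd) is the F percentile function supplied by the caller.
    """
    a = alpha if one_sided else alpha / 2
    lower = s * f_ppf(1 - a, n - 1, k - 1) ** -0.5
    upper = s * f_ppf(1 - a, k - 1, n - 1) ** 0.5
    return lower, upper


def method6(s, n, alpha, chi2_ppf: Callable[[float, int], float],
            one_sided=False) -> Tuple[float, float]:
    """Confidence limits for the population standard deviation.

    chi2_ppf(prob, df) is the chi-square percentile function supplied by the caller.
    """
    a = alpha if one_sided else alpha / 2
    lower = s * math.sqrt((n - 1) / chi2_ppf(1 - a, n - 1))
    upper = s * math.sqrt((n - 1) / chi2_ppf(a, n - 1))
    return lower, upper
```

Passing the percentile functions as arguments keeps the sketch free of third-party imports. It also keeps the formulas separate from how the percentiles are computed.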
METHODS=7
This requests a nonparametric statistical tolerance interval that contains at least proportion p of the population.
Nonparametric tolerance intervals are based on order statistics, as described by Krishnamoorthy and Mathew
(2009).
Two-sided intervals are computed as
where Y is a random variable from a binomial(n, 1 − p) distribution and k is the largest integer for which
P(Y ≥ k) ≥ 1 − α.
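The integer k in the METHODS=7 definition can be found with a direct search. Because P(Y ≥ k) decreases as k grows, you can step k upward until the tail probability drops below 1 − α. The Python sketch below (not SAS code) does this with binomial probabilities computed on the log scale. It computes only k; the order statistics that the procedure then uses as limits are not reproduced here. The function names are hypothetical. Returning 0 when no positive k qualifies is a choice made for this sketch.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(y, n, q):
    """P(Y = y) for Y ~ binomial(n, q) with 0 < q < 1, computed on the log scale."""
    return exp(lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
               + y * log(q) + (n - y) * log1p(-q))


def tolerance_k(n, p, alpha):
    """Largest integer k with P(Y >= k) >= 1 - alpha, where Y ~ binomial(n, 1 - p).

    Returns 0 if even P(Y >= 1) is smaller than 1 - alpha.
    """
    q = 1.0 - p
    tail = 1.0  # P(Y >= 0)
    k = 0
    for y in range(n):
        tail -= binom_pmf(y, n, q)  # tail is now P(Y >= y + 1)
        if tail >= 1.0 - alpha:
            k = y + 1
        else:
            break  # the tail only shrinks from here on
    return k
```

For example, with n = 100, p = 0.90, and α = 0.05, P(Y ≥ 5) ≈ 0.976 but P(Y ≥ 6) ≈ 0.942, so k = 5.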
The OUTINTERVALS= data set contains a group of observations for each variable analyzed.
Each group contains one or more observations for each interval you specify with the METHODS=
option. The actual number depends upon the number of combinations of the ALPHA=, K=, and P=
values.
Variable Description
_ALPHA_ Value of α associated with the intervals
_K_ Value of K= for the prediction intervals
_LOWER_ Lower endpoint of interval
_METHOD_ Interval index (1–7)
_P_ Value of P= for the tolerance intervals
_TYPE_ Type of interval (ONESIDED or TWOSIDED)
_UPPER_ Upper endpoint of interval
_VAR_ Variable name
If you use a BY statement, the BY variables are also saved in the OUTINTERVALS= data set.
ODS Tables
Table 6.58 summarizes the ODS tables that you can request with the INTERVALS statement.
Approximate Prediction Limit For All of k Future Observations

Confidence   k   Lower Limit
99.00%       1   11.90
99.00%       2   11.89
99.00%       3   11.88
95.00%       1   11.93
95.00%       2   11.92
95.00%       3   11.91
90.00%       1   11.95
90.00%       2   11.93
90.00%       3   11.92
compute and save percentiles not automatically computed by the CAPABILITY procedure
data Belts;
label Strength = 'Breaking Strength (lb/in)'
Width = 'Width in Inches';
input Strength Width @@;
datalines;
1243.51 3.036 1221.95 2.995 1131.67 2.983 1129.70 3.019
1198.08 3.106 1273.31 2.947 1250.24 3.018 1225.47 2.980
1126.78 2.965 1174.62 3.033 1250.79 2.941 1216.75 3.037
1285.30 2.893 1214.14 3.035 1270.24 2.957 1249.55 2.958
1166.02 3.067 1278.85 3.037 1280.74 2.984 1201.96 3.002
1101.73 2.961 1165.79 3.075 1186.19 3.058 1124.46 2.929
1213.62 2.984 1213.93 3.029 1289.59 2.956 1208.27 3.029
1247.48 3.027 1284.34 3.073 1209.09 3.004 1146.78 3.061
1224.03 2.915 1200.43 2.974 1183.42 3.033 1195.66 2.995
1258.31 2.958 1136.05 3.022 1177.44 3.090 1246.13 3.022
1183.67 3.045 1206.50 3.024 1195.69 3.005 1223.49 2.971
1147.47 2.944 1171.76 3.005 1207.28 3.065 1131.33 2.984
1215.92 3.003 1202.17 3.058
;
The following statements produce two output data sets containing summary statistics:
You can use the PCTLPTS=, PCTLPRE=, and PCTLNAME= options to save percentiles not automatically
computed by the CAPABILITY procedure. For example, the following statements create an output data set
named Pctls containing the 20th and 40th percentiles of the variables Strength and Width:
You can use any number of OUTPUT statements in the CAPABILITY procedure. Each OUTPUT statement
creates a new data set containing the statistics specified in that statement. When you use the OUTPUT
statement, you must also use the VAR statement. In addition, the OUTPUT statement must contain at least
one of the following:
You can use the OUT= option to specify the name of the output data set:
OUT=SAS-data-set
specifies the name of the output data set. To create a permanent SAS data set, specify a two-level name.
See SAS DATA Step Statements: Reference for more information on permanent SAS data sets. For
example, the previous statements create an output data set named Summary. If the OUT= option is
omitted, then by default the new data set is named using the DATAn convention.
A keyword=names specification selects a statistic to be included in the output data set and gives names to the
new variables that contain the statistics. Specify a keyword for each desired statistic, an equal sign, and the
names of the variables to contain the statistic.
In the output data set, the first variable listed after a keyword in the OUTPUT statement contains the statistic
for the first variable listed in the VAR statement; the second variable contains the statistic for the second
variable in the VAR statement, and so on. The list of names following the equal sign can be shorter than the
list of variables in the VAR statement. In this case, the procedure uses the names in the order in which the
variables are listed in the VAR statement. Consider the following example:
Keyword Description
Descriptive Statistics
CSS Sum of squares corrected for the mean
CV Percent coefficient of variation
GEOMEAN Geometric mean
HARMEAN Harmonic mean
KURTOSIS | KURT Kurtosis
MAX Largest (maximum) value
MEAN Mean
MIN Smallest (minimum) value
MODE Most frequent value (if not unique, the smallest mode)
N Number of observations on which calculations are based
NMISS Number of missing values
NOBS Number of observations
RANGE Range
SKEWNESS | SKEW Skewness
STD | STDDEV Standard deviation
STDMEAN | STDERR Standard error of the mean
SUM Sum
SUMWGT Sum of weights
USS Uncorrected sum of squares
VAR Variance
Quantile Statistics
MEDIAN | P50 | Q2 Median (50th percentile)
P1 1st percentile
P5 5th percentile
P10 10th percentile
P90 90th percentile
P95 95th percentile
P99 99th percentile
Q1 | P25 Lower quartile (25th percentile)
Q3 | P75 Upper quartile (75th percentile)
QRANGE Interquartile range (Q3–Q1)
Robust Statistics
GINI Gini’s mean difference
MAD Median absolute difference
QN 2nd variation of median absolute difference
SN 1st variation of median absolute difference
STD_GINI Standard deviation for Gini’s mean difference
STD_MAD Standard deviation for median absolute difference
STD_QN Standard deviation for the second variation of the median absolute difference
STD_QRANGE Estimate of the standard deviation, based on interquartile range
STD_SN Standard deviation for the first variation of the median absolute difference
Hypothesis Test Statistics
MSIGN Sign statistic
NORMAL Test statistic for normality. If the sample size is less than or equal to 2000, this is the
Shapiro-Wilk W statistic. Otherwise, it is the Kolmogorov D statistic.
PNORMAL | PROBN p-value for normality test
PROBM Probability of a greater absolute value for the sign statistic
PROBS Probability of a greater absolute value for the signed rank statistic
PROBT Two-tailed p-value for Student’s t statistic with n − 1 degrees of freedom
SIGNRANK Signed rank statistic
T Student’s t statistic to test the null hypothesis that the population mean is equal to 0
The CAPABILITY procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and
99th percentiles for the data. You can save these statistics in an output data set by using keyword=names
specifications. You can request additional percentiles by using the PCTLPTS= option. The following
percentile-options are related to these additional percentiles:
CIPCTLDF=(cipctl-options)
CIQUANTDF=(cipctl-options)
requests distribution-free confidence limits for percentiles that are requested with the PCTLPTS=
option. In other words, no specific parametric distribution such as the normal is assumed for the data.
PROC CAPABILITY uses order statistics (ranks) to compute the confidence limits as described by
Hahn and Meeker (1991). This option does not apply if you use a WEIGHT statement. You can specify
the following cipctl-options:
ALPHA=α
specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be
between 0 and 1. The default value is the value of the ALPHA= option in the PROC statement; if that
option is not specified, the default is 0.05, which results in 95% confidence intervals.
LOWERPRE=prefixes
specifies one or more prefixes that are used to create names for variables that contain the lower
confidence limits. To save lower confidence limits for more than one analysis variable, specify a
list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the
VAR statement.
LOWERNAME=suffixes
specifies one or more suffixes that are used to create names for variables that contain the lower
confidence limits. PROC CAPABILITY creates a variable name by combining the LOWERPRE=
value and suffix name. Because the suffixes are associated with the requested percentiles, list the
suffixes in the same order as the PCTLPTS= percentiles.
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, SYMMETRIC, or
ASYMMETRIC. The default value is SYMMETRIC.
UPPERPRE=prefixes
specifies one or more prefixes that are used to create names for variables that contain the upper
confidence limits. To save upper confidence limits for more than one analysis variable, specify a
list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the
VAR statement.
UPPERNAME=suffixes
specifies one or more suffixes that are used to create names for variables that contain the upper
confidence limits. PROC CAPABILITY creates a variable name by combining the UPPERPRE=
value and suffix name. Because the suffixes are associated with the requested percentiles, list the
suffixes in the same order as the PCTLPTS= percentiles.
NOTE: See the entries for the PCTLPTS=, PCTLPRE=, and PCTLNAME= options for a detailed
description of how variable names are created using prefixes, percentile values, and suffixes.
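The rank-based idea behind distribution-free percentile limits works as follows. Let B be the number of observations below the 100pth percentile, so B ~ binomial(n, p). The interval [X₍ₗ₎, X₍ᵤ₎] then covers that percentile when l ≤ B ≤ u − 1. The Python sketch below (not SAS code) picks equal-tailed ranks so that each tail has probability at most α/2. This is one common rule from the order-statistics literature. It is an assumption here, and it may not match exactly how PROC CAPABILITY chooses ranks for TYPE=SYMMETRIC or TYPE=ASYMMETRIC. The function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ binomial(n, p) with 0 < p < 1."""
    if k < 0:
        return 0.0
    return sum(exp(lgamma(n + 1) - lgamma(y + 1) - lgamma(n - y + 1)
                   + y * log(p) + (n - y) * log1p(-p))
               for y in range(min(k, n) + 1))


def percentile_ci_ranks(n, p, alpha=0.05):
    """Equal-tailed ranks (l, u) so that [X_(l), X_(u)] covers the 100p-th percentile
    with probability at least 1 - alpha; None if n is too small for such ranks."""
    half = alpha / 2
    # Largest l with P(B <= l - 1) <= alpha/2 (chance the interval lies above).
    l = 0
    while l + 1 <= n and binom_cdf(l, n, p) <= half:
        l += 1
    # Smallest u with P(B >= u) <= alpha/2 (chance the interval lies below).
    u = n + 1
    while u - 1 >= 1 and 1.0 - binom_cdf(u - 2, n, p) <= half:
        u -= 1
    if l < 1 or u > n:
        return None
    return l, u
```

For the median of 100 observations at α = 0.05, this rule gives ranks 40 and 61. The resulting coverage is about 0.965, slightly above 95% because the ranks are integers.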
CIPCTLNORMAL=(cipctl-options)
CIQUANTNORMAL=(cipctl-options)
requests confidence limits based on the assumption that the data are normally distributed for percentiles
that are requested with the PCTLPTS= option. The computational method is described in Section
4.4.1 of Hahn and Meeker (1991) and uses the noncentral t distribution as given by Odeh and Owen
(1980). This option does not apply if you use a WEIGHT statement. You can specify the following
cipctl-options:
ALPHA=α
specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be
between 0 and 1. The default value is the value of the ALPHA= option in the PROC statement; if that
option is not specified, the default is 0.05, which results in 95% confidence intervals.
LOWERPRE=prefixes
specifies one or more prefixes that are used to create names for variables that contain the lower
confidence limits. To save lower confidence limits for more than one analysis variable, specify a
list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the
VAR statement.
LOWERNAME=suffixes
specifies one or more suffixes that are used to create names for variables that contain the lower
confidence limits. PROC CAPABILITY creates a variable name by combining the LOWERPRE=
value and suffix name. Because the suffixes are associated with the requested percentiles, list the
suffixes in the same order as the PCTLPTS= percentiles.
TYPE=keyword
specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The
default is TWOSIDED.
UPPERPRE=prefixes
specifies one or more prefixes that are used to create names for variables that contain the upper
confidence limits. To save upper confidence limits for more than one analysis variable, specify a
list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the
VAR statement.
UPPERNAME=suffixes
specifies one or more suffixes that are used to create names for variables that contain the upper
confidence limits. PROC CAPABILITY creates a variable name by combining the UPPERPRE=
value and suffix name. Because the suffixes are associated with the requested percentiles, list the
suffixes in the same order as the PCTLPTS= percentiles.
NOTE: See the entries for the PCTLPTS=, PCTLPRE=, and PCTLNAME= options for a detailed
description of how variable names are created using prefixes, percentile values, and suffixes.
PCTLGROUP=BYSTAT | BYVAR
specifies the order in which variables that you request with the PCTLPTS= option are added to the
OUT= data set when the VAR statement lists more than one analysis variable. By default (or if you
specify PCTLGROUP=BYSTAT), all variables that are associated with a percentile value are created
consecutively. If you specify PCTLGROUP=BYVAR, all variables that are associated with an analysis
variable are created consecutively.
Consider the following statements:
The order of variables in the data set ByStat is Pre_20, Post_20, Pre_40, Post_40. The order of
variables in the data set ByVar is Pre_20, Pre_40, Post_20, Post_40.
PCTLNAME=suffixes
provides name suffixes for the new variables created by the PCTLPTS= option. These suffixes are
appended to the prefixes you specify with the PCTLPRE= option, replacing the percentile values that
are used as suffixes by default. List the suffixes in the same order in which you specify the percentiles.
If you specify n suffixes with the PCTLNAME= option and m percentile values with the PCTLPTS=
option, where m > n, the suffixes are used to name the first n percentiles, and the default names are
used for the remaining m − n percentiles. For example, consider the following statements:
proc capability;
var length width height;
output pctlpts = 20 40
pctlpre = pl pw ph
pctlname = twenty;
run;
The value “twenty” in the PCTLNAME= option is used for only the first percentile in the PCTLPTS=
list. This suffix is appended to the values in the PCTLPRE= option to generate the new variable names
pltwenty, pwtwenty, and phtwenty, which contain the 20th percentiles for length, width, and height,
respectively. Because a second PCTLNAME= suffix is not specified, variable names for the 40th
percentiles for length, width, and height are generated using the prefixes and percentile values. Thus,
the output data set contains the variables pltwenty, pl40, pwtwenty, pw40, phtwenty, and ph40.
PCTLNDEC=value
specifies the number of decimal places in percentile values that are incorporated into percentile variable
names. The default value is 1. For example, the following statements create two output data sets, each
containing one percentile variable. The variable in data set short is named pwid85_1, while the one in
data set long is named pwid85_125.
proc capability;
var width;
output out=short pctlpts=85.125 pctlpre=pwid;
output out=long pctlpts=85.125 pctlpre=pwid pctlndec=3;
run;
PCTLPRE=prefixes
specifies prefixes used to create variable names for percentiles requested with the PCTLPTS= option.
The PCTLPRE= and PCTLPTS= options must be used together.
The procedure generates new variable names by using the prefix and the percentile values. If the
specified percentile is an integer, the variable name is simply the prefix followed by the value. For
noninteger percentiles, an underscore replaces the decimal point in the variable name, and by default
decimal values are truncated to one decimal place (see the PCTLNDEC= option). For example, the
following statements create the variables
pwid20, pwid33_3, pwid66_6, and pwid80 for the 20th, 33.33rd, 66.67th, and 80th percentiles of width,
respectively:
If you request percentiles for more than one variable, you should list prefixes in the same order in
which the variables appear in the VAR statement. For example, the following statements compute the
80th and 87.5th percentiles for length and width and save the new variables plength80, plength87_5,
pwidth80, and pwidth87_5 in the output data set:
PCTLPTS=percentiles
specifies percentiles that are not automatically computed by the procedure. The CAPABILITY
procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles
for the data. These can be saved in an output data set by using keyword=names specifications. The
PCTLPTS= option generates additional percentiles and outputs them to a data set; these additional
percentiles are not printed.
If you use the PCTLPTS= option, you must also use the PCTLPRE= option to provide a prefix for
the new variable names. For example, to create variables that contain the 20th, 40th, 60th, and 80th
percentiles of length, use the following statements:
This creates the variables plen20, plen40, plen60, and plen80, whose values are the corresponding
percentiles of length. In addition to specifying name prefixes with the PCTLPRE= option, you can also
use the PCTLNAME= option to create name suffixes for the new variables created by the PCTLPTS=
option.
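The naming rules for PCTLPRE=, PCTLNAME=, PCTLNDEC=, and PCTLGROUP= fit together as follows: each name is a prefix plus a suffix. The suffix is either a PCTLNAME= value or the percentile value, truncated (not rounded) to PCTLNDEC= decimals, with "_" in place of the decimal point. The Python sketch below (not SAS code) reproduces the names quoted in these entries. The function names are hypothetical. One rule is assumed rather than stated in the text: trailing zeros left after truncation are dropped, so 20 yields "20" rather than "20_0".

```python
from decimal import ROUND_DOWN, Decimal


def pctl_suffix(pctl, ndec=1):
    """Default suffix: truncate to ndec decimals, then replace '.' with '_'."""
    d = Decimal(str(pctl)).quantize(Decimal(1).scaleb(-ndec), rounding=ROUND_DOWN)
    text = format(d, "f")
    if "." in text:
        # Assumption: drop trailing zeros and a bare decimal point.
        text = text.rstrip("0").rstrip(".")
    return text.replace(".", "_")


def pctl_var_names(prefixes, pctlpts, names=(), ndec=1, group="BYSTAT"):
    """Variable names for PCTLPTS= percentiles, in creation order.

    prefixes: PCTLPRE= values, one per VAR statement variable.
    names: optional PCTLNAME= suffixes for the first len(names) percentiles.
    """
    def name(prefix, i, pctl):
        suffix = names[i] if i < len(names) else pctl_suffix(pctl, ndec)
        return prefix + suffix

    group = group.upper()
    if group == "BYSTAT":  # all variables for one percentile are adjacent
        return [name(pre, i, pt) for i, pt in enumerate(pctlpts) for pre in prefixes]
    if group == "BYVAR":   # all variables for one analysis variable are adjacent
        return [name(pre, i, pt) for pre in prefixes for i, pt in enumerate(pctlpts)]
    raise ValueError(f"unknown PCTLGROUP= value: {group}")
```

For example, `pctl_var_names(["pwid"], [20, 33.33, 66.67, 80])` gives pwid20, pwid33_3, pwid66_6, and pwid80, the names listed in the PCTLPRE= entry. With prefixes Pre_ and Post_ (assumed here), the two orderings match the ByStat and ByVar orders in the PCTLGROUP= entry.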
variables in the BY statement. The values of these variables match the values in the corresponding BY
group in the DATA= data set.
variables in the CLASS statement. The values of these variables identify the CLASS level within a BY
group in the DATA= data set from which statistics are computed.
variables in the ID statement. The values of these variables match those for the first observation in
each BY group, or for the first observation in the data set if you do not specify a BY statement.
variables created by selecting statistics in the OUTPUT statement. The values of the statistics are
computed using all the nonmissing data, or statistics are computed for each BY group if you use a BY
statement.
variables created by requesting new percentiles with the PCTLPTS= option. The names of these new
variables depend on the values of the PCTLPRE= and PCTLNAME= options.
If the output data set contains a percentile variable or a quartile variable, the percentile definition assigned
with the PCTLDEF= option in the PROC CAPABILITY statement is recorded on the output data set label.
The values of variables requested with the statistics keywords CP, CPK, CPL, CPM, CPU, K, PCTGTR, and
PCTLSS are missing unless you identify specification limits in a SPEC statement or in a SPEC= data set.
As an alternative to OUT= data sets, you can create an OUTTABLE= data set. The structure of the
OUTTABLE= data set may be more appropriate when you are computing summary statistics and capability
indices for multiple process variables. See “OUTTABLE= Data Set” on page 232.
data Titanium;
label Hardness = 'Hardness Measurement';
input Hardness @@;
datalines;
1.38 1.49 1.43 1.60 1.59
1.34 1.44 1.64 1.83 1.57
1.45 1.74 1.61 1.39 1.63
1.73 1.61 1.35 1.51 1.47
1.46 1.41 1.56 1.40 1.58
1.43 1.53 1.53 1.58 1.62
1.58 1.46 1.26 1.57 1.41
1.53 1.36 1.63 1.36 1.66
1.49 1.55 1.67 1.41 1.39
1.75 1.37 1.36 1.86 1.49
;
The target value for hardness is 1.6, and the lower and upper specification limits are 0.8 and 2.4, respectively.
The samples are produced by an in-control process, and the measurements are assumed to be normally
distributed.
The following statements use the OUTPUT statement to save various descriptive statistics and an estimate of
the index Cpm in a data set named Indices:
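A sketch of such statements follows. The OUTPUT statement keywords shown (N=, MEAN=, VAR=, STD=, LSL=, USL=, TARGET=, and CPM=) are assumed, and the output variable names are chosen to match the DATA step that later reads Indices to compute Cpmk.

```sas
proc capability data=Titanium noprint;
   /* Specification limits and target value stated in the text */
   spec lsl=0.8 target=1.6 usl=2.4;
   var Hardness;
   /* Names match the later DATA step: n, avg, var, lsl, usl, t */
   output out=Indices
          n      = n
          mean   = avg
          var    = var
          std    = std
          lsl    = lsl
          usl    = usl
          target = t
          cpm    = cpm;
run;
```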
$$C_{pmk} = \frac{d - |\mu - m|}{3\sqrt{\sigma^2 + (\mu - T)^2}}$$
where d = (USL − LSL)/2, m = (USL + LSL)/2, T is the target value, and μ and σ are the mean and standard deviation of the
normal distribution. Refer to Section 3.6 of Kotz and Johnson (1993). A natural estimator for Cpmk is
$$\hat{C}_{pmk} = \frac{d - |\bar{X} - m|}{3\sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - T)^2}}$$
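data Indices;
   set Indices;
   d    = 0.5*( USL - LSL );    /* half-width of the specification interval */
   m    = 0.5*( USL + LSL );    /* midpoint of the specification interval   */
   num  = d - abs( avg - m );
   /* 3*sqrt((1/n)*sum of (X_i - T)**2), expressed with var, avg, and n */
   den  = 3 * sqrt( (n-1)*var/n + (avg-t)*(avg-t) );
   cpmk = num/den;
run;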
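The denominator in the DATA step comes from expanding the sum in the estimator. Assuming that var holds the sample variance s² with divisor n − 1 (the default variance divisor is an assumption here), the cross term 2(X̄ − T)Σ(Xi − X̄) vanishes, which gives

$$
\frac{1}{n}\sum_{i=1}^{n}(X_i - T)^2
= \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2 + (\bar{X} - T)^2
= \frac{n-1}{n}\,s^2 + (\bar{X} - T)^2
$$

This is the expression (n-1)*var/n + (avg-t)*(avg-t) inside the square root in the program.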
Note that the p-value for the Kolmogorov-Smirnov test of normality is 0.25111, so the test gives no evidence
against the assumption of normality.
The following statements also compute an estimate of the index Cpm by using the SPECIALINDICES option:
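A minimal sketch, assuming that SPECIALINDICES is specified in the PROC CAPABILITY statement, and reusing the specification limits and target value given for the Titanium data:

```sas
/* SPECIALINDICES requests the table of special capability indices */
proc capability data=Titanium specialindices;
   spec lsl=0.8 target=1.6 usl=2.4;
   var Hardness;
run;
```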
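$$\hat{C}_{pk} \pm k\,\hat{\sigma}_{\hat{C}_{pk}}$$
$$\mathrm{LCL} = \hat{C}_{pk}\left[1 + \Phi^{-1}\left(\frac{1-\gamma}{2}\right)\sqrt{\frac{n-1}{n-3} - \frac{(n-1)\,\Gamma^{2}\left(\frac{n-2}{2}\right)}{2\,\Gamma^{2}\left(\frac{n-1}{2}\right)}}\,\right]$$
$$\mathrm{UCL} = \hat{C}_{pk}\left[1 + \Phi^{-1}\left(1-\frac{1-\gamma}{2}\right)\sqrt{\frac{n-1}{n-3} - \frac{(n-1)\,\Gamma^{2}\left(\frac{n-2}{2}\right)}{2\,\Gamma^{2}\left(\frac{n-1}{2}\right)}}\,\right]$$
where γ is the confidence coefficient (γ = 0.95 for 95% limits).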
This assumes that Ĉpk is normally distributed. You can also compute approximate confidence limits based
on equation (6) of ZSW, which provides an exact expression for the variance of Ĉpk.
The following program uses the methods of Bissell (1990) and ZSW to compute approximate confidence
limits for Cpk for the variable Hardness in the data set Titanium (see Example 6.19).
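The DATA step below reads a data set named Summary that contains the variables cpklcl and cpkucl. One way to create such a data set is sketched here; the CPK=, CPKLCL=, and CPKUCL= OUTPUT keywords are assumed, and the remaining variable names are illustrative.

```sas
proc capability data=Titanium noprint;
   spec lsl=0.8 target=1.6 usl=2.4;
   var Hardness;
   /* Save the sample size, summary statistics, Cpk, and its confidence limits */
   output out=Summary
          n      = n
          mean   = mean
          std    = std
          cpk    = cpk
          cpklcl = cpklcl
          cpkucl = cpkucl;
run;
```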
data Summary;
set Summary;
length Method $ 16;
Method = "Bissell";
lcl = cpklcl;
ucl = cpkucl;
output;
Output 6.20.2 Approximate Confidence Limits for Cpk using the SPECIALINDICES option
Approximate 95% Confidence Limits for Cpk
beta
exponential
gamma
Gumbel
inverse Gaussian
lognormal
normal
generalized Pareto
power function
Rayleigh
Weibull
You can also create a comparative P-P plot by using the PPPLOT statement in conjunction with a CLASS
statement.
You have three alternatives for producing P-P plots with the PPPLOT statement:
ODS Graphics output is produced if ODS Graphics is enabled, for example by specifying the ODS
GRAPHICS ON statement prior to the PROC statement.
Legacy line printer charts are produced when you specify the LINEPRINTER option in the PROC
statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
NOTE: Probability-probability plots should not be confused with probability plots, which compare a set of
ordered measurements with percentiles from a specified distribution. You can create probability plots with
the PROBPLOT statement.
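data Sheets;
input Distance @@;
label Distance='Hole Distance in cm';
datalines;
9.80 10.20 10.27 9.70 9.76
10.11 10.24 10.20 10.24 9.63
9.99 9.78 10.10 10.21 10.00
9.96 9.79 10.08 9.79 10.06
10.10 9.95 9.84 10.11 9.93
10.56 10.47 9.42 10.44 10.16
10.11 10.36 9.94 9.77 9.36
9.89 9.62 10.05 9.72 9.82
9.99 10.16 10.58 10.70 9.54
10.31 10.07 10.33 9.98 10.15
;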
3 These data are also used to create Q-Q plots in “QQPLOT Statement: CAPABILITY Procedure” on page 508.
The cutting process is in statistical control. As a preliminary step in a capability analysis of the process, it is
decided to check whether the distances are normally distributed. The following statements create a P-P plot,
shown in Figure 6.30, which is based on the normal distribution with mean μ = 10 and standard deviation
σ = 0.3:
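A minimal sketch of such statements, using the Sheets data and the parameter values stated in the text (the NOPRINT option, which suppresses tabular output, is an added choice):

```sas
proc capability data=Sheets noprint;
   /* Normal P-P plot with mu=10 and sigma=0.3; SQUARE gives a square frame */
   ppplot Distance / normal(mu=10 sigma=0.3)
                     square;
run;
```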
The linearity of the pattern in Figure 6.30 is evidence that the measurements are normally distributed with
mean 10 and standard deviation 0.3. The SQUARE option displays the plot in a square format.
options
specify the theoretical distribution for the plot or add features to the plot. If you specify more than
one variable, the options apply equally to each variable. Specify all options after the slash (/) in the
PPPLOT statement. You can specify only one option naming a distribution, but you can specify any
number of other options. The distributions available are the beta, exponential, gamma, Gumbel, inverse
Gaussian, lognormal, normal, generalized Pareto, power function, Rayleigh, and Weibull. By default,
the procedure produces a P-P plot based on the normal distribution.
In the following example, the NORMAL, MU= and SIGMA= options request a P-P plot based on the
normal distribution with mean 10 and standard deviation 0.3. The SQUARE option displays the plot in
a square frame, and the CTEXT= option specifies the text color.
Summary of Options
The following tables list the PPPLOT statement options by function. For complete descriptions, see the
section “Dictionary of Options” on page 462.
Distribution Options
Table 6.60 summarizes the options for requesting a specific theoretical distribution.
Option Description
BETA(beta-options) Specifies beta P-P plot
EXPONENTIAL(exponential-options) Specifies exponential P-P plot
GAMMA(gamma-options) Specifies gamma P-P plot
GUMBEL(Gumbel-options) Specifies Gumbel P-P plot
IGAUSS(iGauss-options) Specifies inverse Gaussian P-P plot
LOGNORMAL(lognormal-options) Specifies lognormal P-P plot
NORMAL(normal-options) Specifies normal P-P plot
PARETO(Pareto-options) Specifies generalized Pareto P-P plot
POWER(power-options) Specifies power function P-P plot
RAYLEIGH(Rayleigh-options) Specifies Rayleigh P-P plot
WEIBULL(Weibull-options) Specifies Weibull P-P plot
Table 6.61 summarizes options that specify distribution parameters and control the display of the diagonal
distribution reference line. Specify these options in parentheses after the distribution option. For example,
the following statements use the NORMAL option to request a normal P-P plot:
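For instance, the following sketch requests a normal P-P plot for the Sheets data and styles the diagonal reference line with the L= and W= options listed in Table 6.61. The option values are illustrative, and appearance options of this kind apply to traditional graphics rather than to ODS Graphics output.

```sas
proc capability data=Sheets noprint;
   /* Reference line options go inside the parentheses after NORMAL: */
   /* L= sets the line type and W= the line width                    */
   ppplot Distance / normal(mu=10 sigma=0.3 l=2 w=2);
run;
```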
Option Description
Distribution Reference Line Options
COLOR= Specifies color of distribution reference line
L= Specifies line type of distribution reference line
NOLINE Suppresses the distribution reference line
SYMBOL= Specifies plotting character for line printer plots
W= Specifies width of distribution reference line
Beta-Options
ALPHA= Specifies shape parameter α
BETA= Specifies shape parameter β
SIGMA= Specifies scale parameter σ
THETA= Specifies lower threshold parameter θ
Exponential-Options
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Gamma-Options
ALPHA= Specifies shape parameter α
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Gumbel-Options
MU= Specifies location parameter μ
SIGMA= Specifies scale parameter σ
IGauss-Options
LAMBDA= Specifies shape parameter λ
MU= Specifies mean μ
Lognormal-Options
SIGMA= Specifies shape parameter σ
THETA= Specifies threshold parameter θ
ZETA= Specifies scale parameter ζ
Normal-Options
MU= Specifies mean μ
SIGMA= Specifies standard deviation σ
Pareto-Options
ALPHA= Specifies shape parameter α
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Power-Options
ALPHA= Specifies shape parameter α
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Rayleigh-Options
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Weibull-Options
C= Specifies shape parameter c
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
General Options
Table 6.62 lists options that control the appearance of the plots.
Option Description
General Plot Layout Options
CONTENTS= Specifies table of contents entry for P-P plot grouping
HREF= Specifies reference lines perpendicular to the horizontal axis
HREFLABELS= Specifies line labels for HREF= lines
NOFRAME Suppresses frame around plotting area
SQUARE Displays P-P plot in square format
VREF= Specifies reference lines perpendicular to the vertical axis
VREFLABELS= Specifies line labels for VREF= lines
Graphics Options
ANNOTATE= Provides an annotate data set
CAXIS= Specifies color for axis
CFRAME= Specifies color for frame
CHREF= Specifies colors for HREF= lines
CTEXT= Specifies color for text
CVREF= Specifies colors for VREF= lines
DESCRIPTION= Specifies description for plot in graphics catalog
FONT= Specifies software font for text
HAXIS= Specifies AXIS statement for horizontal axis
HEIGHT= Specifies height of text used outside framed areas
HMINOR= Specifies number of minor tick marks on horizontal axis
HREFLABPOS= Specifies position for HREF= line labels
INFONT= Specifies software font for text inside framed areas
INHEIGHT= Specifies height of text inside framed areas
LHREF= Specifies line styles for HREF= lines
LVREF= Specifies line styles for VREF= lines
NAME= Specifies name for plot in graphics catalog
NOHLABEL Suppresses label for horizontal axis
NOVLABEL Suppresses label for vertical axis
NOVTICK Suppresses tick marks and tick mark labels for vertical axis
TURNVLABELS Turns and vertically strings out characters in labels for vertical
axis
VAXIS= Specifies AXIS statement for vertical axis
VAXISLABEL= Specifies label for vertical axis
VMINOR= Specifies number of minor tick marks on vertical axis
VREFLABPOS= Specifies position for VREF= line labels
WAXIS= Specifies line thickness for axes and frame
Option Description
CTEXTTOP= Specifies color for column labels
INTERTILE= Specifies distance between tiles in comparative plot
NCOLS= Specifies number of columns in comparative plot
NROWS= Specifies number of rows in comparative plot
OVERLAY Overlays plots for different class levels (ODS Graphics only)
Dictionary of Options
The following entries provide detailed descriptions of the options specific to the PPPLOT statement. See
“Dictionary of Common Options: CAPABILITY Procedure” on page 550 for detailed descriptions of options
common to all the plot statements.
ALPHA=value
specifies the shape parameter α (α > 0) for P-P plots requested with the BETA, GAMMA, PARETO,
and POWER options. For examples, see the entries for the distribution options.
BETA< (beta-options) >
creates a beta P-P plot. To create the plot, the n nonmissing observations are ordered from smallest to
largest:
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$$
The y-coordinate of the ith point is the empirical cdf value i/n. The x-coordinate is the theoretical beta
cdf value
$$B_{\alpha\beta}\left(\frac{x_{(i)}-\theta}{\sigma}\right) = \int_{\theta}^{x_{(i)}} \frac{(t-\theta)^{\alpha-1}(\theta+\sigma-t)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}}\,dt$$
where B_αβ(·) is the normalized incomplete beta function, B(α, β) = Γ(α)Γ(β)/Γ(α+β), and
θ = lower threshold parameter
σ = scale parameter (σ > 0)
α = first shape parameter (α > 0)
β = second shape parameter (β > 0)
If you do not specify values for these parameters, then by default, θ = 0, σ = 1, and maximum
likelihood estimates are calculated for α and β.
IMPORTANT: If the default unit interval (0,1) does not adequately describe the range of your data,
then you should specify THETA= and SIGMA= so that your data fall in the interval (θ, θ + σ).
If the data are beta distributed with parameters α, β, σ, and θ, then the points on the plot for ALPHA=α,
BETA=β, SIGMA=σ, and THETA=θ tend to fall on or near the diagonal line y = x, which is displayed
by default. Agreement between the diagonal line and the point pattern is evidence that the specified
beta distribution is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option
and the THRESHOLD= option as an alias for the THETA= option.
BETA=value
specifies the shape parameter β (β > 0) for P-P plots requested with the BETA distribution option. See
the preceding entry for the BETA distribution option for an example.
C=value
specifies the shape parameter c (c > 0) for P-P plots requested with the WEIBULL option. See the
entry for the WEIBULL option for examples.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
You can specify σ and θ with the SIGMA= and THETA= exponential-options, as illustrated in the
following example:
If you do not specify values for these parameters, then by default, θ = 0 and a maximum likelihood
estimate is calculated for σ.
IMPORTANT: Your data must be greater than or equal to the lower threshold θ. If the default θ = 0
is not an adequate lower bound for your data, specify θ with the THETA= option.
If the data are exponentially distributed with parameters σ and θ, the points on the plot for SIGMA=σ
and THETA=θ tend to fall on or near the diagonal line y = x, which is displayed by default.
Agreement between the diagonal line and the point pattern is evidence that the specified exponential
distribution is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and
the THRESHOLD= option as an alias for the THETA= option.
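A sketch of the SIGMA= and THETA= exponential-options follows. The data set Failures, the variable Lifetime, and the parameter values are hypothetical; THETA= should not exceed the smallest observation.

```sas
/* Failures and Lifetime are hypothetical names; values are illustrative */
proc capability data=Failures noprint;
   ppplot Lifetime / exponential(theta=0 sigma=100);
run;
```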
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
You can specify α, σ, and θ with the ALPHA=, SIGMA=, and THETA= gamma-options, as illustrated
in the following example:
If you do not specify values for these parameters, then by default, θ = 0 and maximum likelihood
estimates are calculated for α and σ.
IMPORTANT: Your data must be greater than or equal to the lower threshold θ. If the default θ = 0
is not an adequate lower bound for your data, specify θ with the THETA= option.
If the data are gamma distributed with parameters α, σ, and θ, the points on the plot for ALPHA=α,
SIGMA=σ, and THETA=θ tend to fall on or near the diagonal line y = x, which is displayed by
default. Agreement between the diagonal line and the point pattern is evidence that the specified
gamma distribution is a good fit. You can specify the SHAPE= option as an alias for the ALPHA=
option, the SCALE= option as an alias for the SIGMA= option, and the THRESHOLD= option as an
alias for the THETA= option.
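A sketch of the gamma-options, written with the SHAPE=, SCALE=, and THRESHOLD= aliases described above; the data set, variable, and parameter values are hypothetical:

```sas
/* SHAPE=, SCALE=, THRESHOLD= are aliases for ALPHA=, SIGMA=, THETA= */
proc capability data=Failures noprint;
   ppplot Lifetime / gamma(shape=2 scale=50 threshold=0);
run;
```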
where
μ = location parameter
σ = scale parameter (σ > 0)
You can specify μ and σ with the MU= and SIGMA= Gumbel-options. By default, maximum
likelihood estimates are computed for μ and σ.
If the data are Gumbel distributed with parameters μ and σ, the points on the plot for MU=μ and
SIGMA=σ tend to fall on or near the diagonal line y = x, which is displayed by default. Agreement
between the diagonal line and the point pattern is evidence that the specified Gumbel distribution is a
good fit.
LAMBDA=value
specifies the shape parameter λ (λ > 0) for P-P plots requested with the IGAUSS option. Enclose the
LAMBDA= option in parentheses after the IGAUSS distribution keyword. If you do not specify a
value for λ, the procedure calculates a maximum likelihood estimate.
$$\Phi\left(\frac{\log(x_{(i)}-\theta)-\zeta}{\sigma}\right)$$
where Φ(·) is the standard normal cdf and
θ = threshold parameter
ζ = scale parameter
σ = shape parameter (σ > 0)
You can specify θ, ζ, and σ with the THETA=, ZETA=, and SIGMA= lognormal-options, as illustrated
in the following example:
If you do not specify values for these parameters, then by default, θ = 0 and estimates of σ and ζ are
computed as described in “Lognormal Distribution” on page 353.
IMPORTANT: Your data must be greater than the lower threshold θ. If the default θ = 0 is not an
adequate lower bound for your data, specify θ with the THETA= option.
If the data are lognormally distributed with parameters σ, θ, and ζ, the points on the plot for SIGMA=σ,
THETA=θ, and ZETA=ζ tend to fall on or near the diagonal line y = x, which is displayed by default.
Agreement between the diagonal line and the point pattern is evidence that the specified lognormal
distribution is a good fit. You can specify the SHAPE= option as an alias for the SIGMA= option, the
SCALE= option as an alias for the ZETA= option, and the THRESHOLD= option as an alias for the
THETA= option.
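A sketch in which only THETA= is given, so that, as described above, σ and ζ are estimated from the data; the data set and variable names are hypothetical:

```sas
/* Only the threshold is fixed; sigma and zeta are estimated */
proc capability data=Failures noprint;
   ppplot Lifetime / lognormal(theta=0);
run;
```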
MU=value
specifies the parameter μ for a P-P plot requested with the GUMBEL, IGAUSS, and NORMAL options.
For examples, see Figure 6.30, or Figure 6.31 and Figure 6.32. For the normal and inverse Gaussian
distributions, the default value of μ is the sample mean. If you do not specify a value for μ for the
Gumbel distribution, the procedure calculates a maximum likelihood estimate.
NOLINE
suppresses the diagonal reference line.
NOOBSLEGEND
NOOBSL
suppresses the legend that indicates the number of hidden observations in a legacy line printer plot.
This option is ignored unless you specify the LINEPRINTER option in the PROC CAPABILITY
statement.
The y-coordinate of the ith point is the empirical cdf value i/n. The x-coordinate is the theoretical
normal cdf value
$$\Phi\left(\frac{x_{(i)}-\mu}{\sigma}\right) = \int_{-\infty}^{x_{(i)}} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt$$
By default, the sample mean and sample standard deviation are used for μ and σ.
If the data are normally distributed with parameters μ and σ, the points on the plot for MU=μ and
SIGMA=σ tend to fall on or near the diagonal line y = x, which is displayed by default. Agreement
between the diagonal line and the point pattern is evidence that the specified normal distribution is a
good fit. For an example, see Figure 6.30.
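As a worked illustration, take the Sheets data listed in this chapter (n = 50, smallest value x₍₁₎ = 9.36) and the parameters μ = 10 and σ = 0.3 used for Figure 6.30. The first plotted point is

$$
\left(\Phi\!\left(\frac{9.36 - 10}{0.3}\right),\ \frac{1}{50}\right)
= \left(\Phi(-2.133),\ 0.02\right)
\approx (0.016,\ 0.02)
$$

The two coordinates are close, so this point lies near the diagonal line y = x.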
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter
The parameter θ for the generalized Pareto distribution must be less than the minimum data value.
You can specify θ with the THETA= Pareto-option. The default value for θ is 0. In addition, the
generalized Pareto distribution has a shape parameter α and a scale parameter σ. You can specify
these parameters with the ALPHA= and SIGMA= Pareto-options. By default, maximum likelihood
estimates are computed for α and σ.
If the data are generalized Pareto distributed with parameters θ, σ, and α, the points on the plot for
THETA=θ, SIGMA=σ, and ALPHA=α tend to fall on or near the diagonal line y = x, which is
displayed by default. Agreement between the diagonal line and the point pattern is evidence that the
specified generalized Pareto distribution is a good fit.
PPSYMBOL='character'
specifies the character used to plot the points in a legacy line printer plot. The default is the plus sign
(+). This option is ignored unless you specify the LINEPRINTER option in the PROC CAPABILITY
statement.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
The parameter θ for the Rayleigh distribution must be less than the minimum data value. You can
specify θ with the THETA= Rayleigh-option. The default value for θ is 0. You can specify σ with the
SIGMA= Rayleigh-option. By default, a maximum likelihood estimate is computed for σ.
If the data are Rayleigh distributed with parameters θ and σ, the points on the plot for THETA=θ and
SIGMA=σ tend to fall on or near the diagonal line y = x, which is displayed by default. Agreement
between the diagonal line and the point pattern is evidence that the specified Rayleigh distribution is a
good fit.
SIGMA=value
specifies the parameter σ, where σ > 0. When used with the BETA, EXPONENTIAL, GAMMA,
GUMBEL, NORMAL, PARETO, POWER, RAYLEIGH, and WEIBULL options, the SIGMA= option
specifies the scale parameter. When used with the LOGNORMAL option, the SIGMA= option specifies
the shape parameter. Enclose the SIGMA= option in parentheses after the distribution keyword. For an
example of the SIGMA= option used with the NORMAL option, see Figure 6.30.
SQUARE
displays the P-P plot in a square frame. The default is a rectangular frame. See Figure 6.30 for an
example.
SYMBOL='character'
specifies the character used for the diagonal reference line in legacy line printer plots. The default
character is the first letter of the distribution option keyword. This option is ignored unless you specify
the LINEPRINTER option in the PROC CAPABILITY statement.
THETA=value
THRESHOLD=value
specifies the lower threshold parameter θ for plots requested with the BETA, EXPONENTIAL,
GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, and WEIBULL options.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
c = shape parameter (c > 0)
You can specify c, σ, and θ with the C=, SIGMA=, and THETA= Weibull-options, as illustrated in the
following example:
If you do not specify values for these parameters, then by default θ = 0 and maximum likelihood
estimates are calculated for σ and c.
IMPORTANT: Your data must be greater than or equal to the lower threshold θ. If the default θ = 0
is not an adequate lower bound for your data, you should specify θ with the THETA= option.
If the data are Weibull distributed with parameters c, σ, and θ, the points on the plot for C=c,
SIGMA=σ, and THETA=θ tend to fall on or near the diagonal line y = x, which is displayed by
default. Agreement between the diagonal line and the point pattern is evidence that the specified
Weibull distribution is a good fit. You can specify the SHAPE= option as an alias for the C= option,
the SCALE= option as an alias for the SIGMA= option, and the THRESHOLD= option as an alias for
the THETA= option.
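A sketch of the C=, SIGMA=, and THETA= Weibull-options; the data set, variable, and parameter values are hypothetical:

```sas
/* Weibull P-P plot with shape c=1.5, scale 120, and threshold 0 (illustrative) */
proc capability data=Failures noprint;
   ppplot Lifetime / weibull(c=1.5 sigma=120 theta=0);
run;
```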
ZETA=value
specifies a value for the scale parameter ζ for lognormal P-P plots requested with the LOGNORMAL
option.
data Sheets;
input Distance @@;
label Distance='Hole Distance in cm';
datalines;
9.80 10.20 10.27 9.70 9.76
10.11 10.24 10.20 10.24 9.63
9.99 9.78 10.10 10.21 10.00
9.96 9.79 10.08 9.79 10.06
10.10 9.95 9.84 10.11 9.93
10.56 10.47 9.42 10.44 10.16
10.11 10.36 9.94 9.77 9.36
9.89 9.62 10.05 9.72 9.82
9.99 10.16 10.58 10.70 9.54
10.31 10.07 10.33 9.98 10.15
;
Figure 6.32 Normal P-P Plot with Standard Deviation Specified Incorrectly
Specifying a mean of 9.5 instead of 10 results in the plot shown in Figure 6.31, while specifying a standard
deviation of 0.5 instead of 0.3 results in the plot shown in Figure 6.32. Both plots clearly reveal the model
misspecification.
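A sketch of statements that produce the two misspecified plots, using separate PROC CAPABILITY steps; only the MU= and SIGMA= values come from the text.

```sas
/* Mean misspecified as 9.5 (compare Figure 6.31) */
proc capability data=Sheets noprint;
   ppplot Distance / normal(mu=9.5 sigma=0.3) square;
run;

/* Standard deviation misspecified as 0.5 (compare Figure 6.32) */
proc capability data=Sheets noprint;
   ppplot Distance / normal(mu=10 sigma=0.5) square;
run;
```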
The construction of a Q-Q plot does not require that the location or scale parameters of F(·) be specified.
The theoretical quantiles are computed from a standard distribution within the specified family. A
linear point pattern indicates that the specified family reasonably describes the data distribution, and
the location and scale parameters can be estimated visually as the intercept and slope of the linear
pattern. In contrast, the construction of a P-P plot requires the location and scale parameters of F(·) to
evaluate the cdf at the ordered data values.
The linearity of the point pattern on a Q-Q plot is unaffected by changes in location or scale. On a P-P
plot, changes in location or scale do not necessarily preserve linearity.
On a Q-Q plot, the reference line representing a particular theoretical distribution depends on the
location and scale parameters of that distribution, having intercept and slope equal to the location and
scale parameters. On a P-P plot, the reference line for any distribution is always the diagonal line
y = x.
Consequently, you should use a Q-Q plot if your objective is to compare the data distribution with a family of
distributions that vary only in location and scale, particularly if you want to estimate the location and scale
parameters from the plot.
An advantage of P-P plots is that they are discriminating in regions of high probability density, because in
these regions the empirical and theoretical cumulative distributions change more rapidly than in regions of
low probability density. For example, if you compare a data distribution with a particular normal distribution,
differences in the middle of the two distributions are more apparent on a P-P plot than on a Q-Q plot.
For further details on P-P plots, refer to Gnanadesikan (1997) and Wilk and Gnanadesikan (1968).
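| Family | Distribution Function F(x) | Range | Location | Scale | Shape |
|---|---|---|---|---|---|
| Beta | $\int_{\theta}^{x}\frac{(t-\theta)^{\alpha-1}(\theta+\sigma-t)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}}\,dt$ | $\theta < x < \theta+\sigma$ | θ | σ | α, β |
| Exponential | $1-\exp\left(-\frac{x-\theta}{\sigma}\right)$ | $x \ge \theta$ | θ | σ | |
| Gamma | $\frac{1}{\sigma\Gamma(\alpha)}\int_{\theta}^{x}\left(\frac{t-\theta}{\sigma}\right)^{\alpha-1}\exp\left(-\frac{t-\theta}{\sigma}\right)dt$ | $x > \theta$ | θ | σ | α |
| Gumbel | $\exp\left(-e^{-(x-\mu)/\sigma}\right)$ | All x | μ | σ | |
| Inverse Gaussian | $\Phi\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu}-1\right)\right)+e^{2\lambda/\mu}\,\Phi\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu}+1\right)\right)$ | $x > 0$ | μ | | λ |
| Lognormal | $\int_{\theta}^{x}\frac{1}{\sigma\sqrt{2\pi}\,(t-\theta)}\exp\left(-\frac{(\log(t-\theta)-\zeta)^2}{2\sigma^2}\right)dt$ | $x > \theta$ | θ | ζ | σ |
| Normal | $\int_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)dt$ | All x | μ | σ | |
| Generalized Pareto | $1-\left(1-\frac{\alpha(x-\theta)}{\sigma}\right)^{1/\alpha}$ | $x > \theta$ | θ | σ | α |
| Power function | $\left(\frac{x-\theta}{\sigma}\right)^{\alpha}$ | $\theta < x < \theta+\sigma$ | θ | σ | α |
| Rayleigh | $1-e^{-(x-\theta)^2/(2\sigma^2)}$ | $x \ge \theta$ | θ | σ | |
| Weibull | $1-\exp\left(-\left(\frac{x-\theta}{\sigma}\right)^{c}\right)$ | $x > \theta$ | θ | σ | c |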
You can request these distributions with the BETA, EXPONENTIAL, GAMMA, GUMBEL, IGAUSS,
LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, and WEIBULL options, respectively. If you
do not specify a distribution option, a normal P-P plot is created.
To create a P-P plot, you must provide all of the parameters for the theoretical distribution. If you do not
specify parameters, then default values or estimates are substituted, as summarized by Table 6.64.
ODS Graphics
Before you create ODS Graphics output, ODS Graphics must be enabled (for example, by using the ODS
GRAPHICS ON statement). For more information about enabling and disabling ODS Graphics, see the
section “Enabling and Disabling ODS Graphics” (Chapter 24, SAS/STAT User’s Guide).
The appearance of a graph produced with ODS Graphics is determined by the style associated with the ODS
destination where the graph is produced. PPPLOT options used to control the appearance of traditional
graphics are ignored for ODS Graphics output.
When ODS Graphics is in effect, the PPPLOT statement assigns a name to the graph it creates. You can use
this name to reference the graph when using ODS. The name is listed in Table 6.65.
See Chapter 4, “SAS/QC Graphics,” for more information about ODS Graphics and other methods for
producing charts.
PROBPLOT Statement: CAPABILITY Procedure
beta
exponential
gamma
Gumbel
three-parameter lognormal
normal
generalized Pareto
power function
Rayleigh
two-parameter Weibull
three-parameter Weibull
display a reference line corresponding to specified or estimated location and scale parameters for the
theoretical distribution
You can also create a comparative probability plot by using the PROBPLOT statement in conjunction with a
CLASS statement.
You have three alternatives for producing probability plots with the PROBPLOT statement:
ODS Graphics output is produced if ODS Graphics is enabled, for example by specifying the ODS
GRAPHICS ON statement prior to the PROC statement.
Legacy line printer charts are produced when you specify the LINEPRINTER option in the PROC
statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
NOTE: Probability plots are similar to Q-Q plots, which you can create with the QQPLOT statement (see
“QQPLOT Statement: CAPABILITY Procedure” on page 508). Probability plots are preferable for graphical
estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters
and capability indices.
data Rods;
input Diameter @@;
label Diameter='Diameter in mm';
datalines;
5.501 5.251 5.404 5.366 5.445
5.576 5.607 5.200 5.977 5.177
5.332 5.399 5.661 5.512 5.252
5.404 5.739 5.525 5.160 5.410
5.823 5.376 5.202 5.470 5.410
5.394 5.146 5.244 5.309 5.480
5.388 5.399 5.360 5.368 5.394
5.248 5.409 5.304 6.239 5.781
5.247 5.907 5.208 5.143 5.304
5.603 5.164 5.209 5.475 5.223
;
The process producing the rods is in statistical control, and as a preliminary step in a capability analysis of
the process, you decide to check whether the diameters are normally distributed. The following statements
create the normal probability plot shown in Figure 6.33:
4 This data set is analyzed using quantile-quantile plots in Example 6.23 and Example 6.24.
proc capability data=Rods noprint;
probplot Diameter;
run;
Note that the PROBPLOT statement creates a normal probability plot for Diameter by default.
The nonlinearity of the point pattern indicates a departure from normality. Because the point pattern is
curved with slope increasing from left to right, a theoretical distribution that is skewed to the right, such as a
lognormal distribution, should provide a better fit than the normal distribution. This possibility is explored in
the next example.
Based on Figure 6.35, the 95th percentile of the diameter distribution is approximately 5.9 mm, because this
is the value corresponding to the intersection of the point pattern with the reference line.
The following statements illustrate how you can create a lognormal probability plot for Diameter by using an
estimate of σ. (See “Lognormal Distribution” on page 353 for a description of how the estimate is computed.)
You can specify the keyword PROB as an alias for PROBPLOT, and you can use any number of PROBPLOT
statements in the CAPABILITY procedure. The components of the PROBPLOT statement are described as
follows.
variables
are the process variables for which to create probability plots. If you specify a VAR statement, the
variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric
variables in the input data set. If you do not specify a list of variables, then by default the procedure
creates a probability plot for each variable listed in the VAR statement, or for each numeric variable
in the DATA= data set if you do not specify a VAR statement. For example, each of the following
PROBPLOT statements produces two probability plots, one for length and one for width:
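A sketch of two equivalent forms follows. It assumes a hypothetical data set Parts that contains the numeric variables length and width.

```sas
/* Variables named explicitly in the PROBPLOT statement */
proc capability data=Parts noprint;
   var length width;
   probplot length width;
run;

/* No variable list: PROBPLOT uses the variables in the VAR statement */
proc capability data=Parts noprint;
   var length width;
   probplot;
run;
```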
options
specify the theoretical distribution for the plot or add features to the plot. If you specify more than
one variable, the options apply equally to each variable. Specify all options after the slash (/) in the
PROBPLOT statement. You can specify only one option naming the distribution in each PROBPLOT
statement, but you can specify any number of other options. The distributions available are the beta,
exponential, gamma, Gumbel, lognormal, normal, generalized Pareto, power function, Rayleigh,
two-parameter Weibull, and three-parameter Weibull. By default, the procedure produces a plot for the
normal distribution.
In the following example, the NORMAL option requests a normal probability plot for each variable,
while the MU= and SIGMA= normal-options request a distribution reference line corresponding to
the normal distribution with μ0 = 10 and σ0 = 0.3. The SQUARE option displays the plot in a square
frame, and the CTEXT= option specifies the text color.
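A sketch of such statements; the data set Parts and the text color blue are illustrative assumptions:

```sas
proc capability data=Parts noprint;
   var length width;
   /* Reference line for mu0=10, sigma0=0.3; the options apply to both variables */
   probplot length width / normal(mu=10 sigma=0.3)
                           square
                           ctext=blue;
run;
```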
Summary of Options
The following tables list the PROBPLOT statement options by function. For complete descriptions, see the
section “Dictionary of Options” on page 488.
Distribution Options
Table 6.66 summarizes the options for requesting a specific theoretical distribution.
Option Description
BETA(beta-options) Specifies beta probability plot for shape parameters α, β specified with mandatory ALPHA= and BETA= beta-options
EXPONENTIAL(exponential-options) Specifies exponential probability plot
GAMMA(gamma-options) Specifies gamma probability plot for shape parameter α specified with mandatory ALPHA= gamma-option
GUMBEL(Gumbel-options) Specifies Gumbel probability plot
LOGNORMAL(lognormal-options) Specifies lognormal probability plot for shape parameter σ specified with mandatory SIGMA= lognormal-option
NORMAL(normal-options) Specifies normal probability plot
PARETO(Pareto-options) Specifies generalized Pareto probability plot for shape parameter α specified with mandatory ALPHA= Pareto-option
POWER(power-options) Specifies power function probability plot for shape parameter α specified with mandatory ALPHA= power-option
RAYLEIGH(Rayleigh-options) Specifies Rayleigh probability plot
WEIBULL(Weibull-options) Specifies three-parameter Weibull probability plot for shape parameter c specified with mandatory C= Weibull-option
WEIBULL2(Weibull2-options) Specifies two-parameter Weibull
probability plot
Table 6.67 summarizes options that specify distribution parameters and control the display of a distribution
reference line. Specify these options in parentheses after the distribution option. For example, the following
statements use the NORMAL option to request a normal probability plot with a distribution reference line:
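For example, here is a sketch for the Rods data. The reference line parameters and the COLOR=, L=, and W= values are illustrative, not estimates.

```sas
proc capability data=Rods noprint;
   /* Reference line for a normal distribution with mu0=5.4 and sigma0=0.2; */
   /* COLOR=, L=, and W= set its color, line type, and width               */
   probplot Diameter / normal(mu=5.4 sigma=0.2 color=red l=2 w=2);
run;
```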
Option Description
Distribution Reference Line Options
COLOR= Specifies color of distribution reference line
L= Specifies line type of distribution reference line
SYMBOL= Specifies plotting character for line printer plots
W= Specifies width of distribution reference line
Beta-Options
ALPHA= Specifies mandatory shape parameter α
BETA= Specifies mandatory shape parameter β
SIGMA= Specifies σ0 for distribution reference line
THETA= Specifies θ0 for distribution reference line
Exponential-Options
SIGMA= Specifies σ0 for distribution reference line
THETA= Specifies θ0 for distribution reference line
Gamma-Options
ALPHA= Specifies mandatory shape parameter α
SIGMA= Specifies σ0 for distribution reference line
THETA= Specifies θ0 for distribution reference line
Gumbel-Options
MU= Specifies location parameter μ
SIGMA= Specifies scale parameter σ
Lognormal-Options
SIGMA= Specifies mandatory shape parameter σ
SLOPE= Specifies slope of distribution reference line
THETA= Specifies θ0 for distribution reference line
ZETA= Specifies ζ0 for distribution reference line (slope is exp(ζ0))
Normal-Options
MU= Specifies μ0 for distribution reference line
SIGMA= Specifies σ0 for distribution reference line
Pareto-Options
ALPHA= Specifies mandatory shape parameter α
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Power-Options
ALPHA= Specifies mandatory shape parameter α
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Rayleigh-Options
SIGMA= Specifies scale parameter σ
THETA= Specifies threshold parameter θ
Weibull-Options
C= Specifies mandatory shape parameter c
SIGMA= Specifies σ0 for distribution reference line
THETA= Specifies θ0 for distribution reference line
Weibull2-Options
C= Specifies c0 for distribution reference line (slope is 1/c0)
SIGMA= Specifies σ0 for distribution reference line (intercept is log(σ0))
SLOPE= Specifies slope of distribution reference line
THETA= Specifies known lower threshold θ0
General Options
Table 6.68 lists options that control the appearance of the plots.
Option Description
General Plot Layout Options
CONTENTS= Specifies table of contents entry for probability plot grouping
GRID Draws grid lines perpendicular to the percentile axis
HREF= Specifies reference lines perpendicular to the horizontal axis
HREFLABELS= Specifies line labels for HREF= lines
LEGEND= Identifies LEGEND statement
NADJ= Adjusts sample size (N) when computing percentiles
NOFRAME Suppresses frame around plotting area
NOLEGEND Suppresses legend
NOLINELEGEND Suppresses distribution reference line information in legend
NOSPECLEGEND Suppresses specifications information in legend
PCTLMINOR Requests minor tick marks for percentile axis
PCTLORDER= Specifies tick mark labels for percentile axis
RANKADJ= Adjusts ranks when computing percentiles
ROTATE Switches horizontal and vertical axes
SQUARE Displays plot in square format
VREF= Specifies reference lines perpendicular to the vertical axis
VREFLABELS= Specifies line labels for VREF= lines
Graphics Options
ANNOTATE= Specifies annotate data set
CAXIS= Specifies color for axis
CFRAME= Specifies color for frame
CGRID= Specifies color for grid lines
CHREF= Specifies colors for HREF= lines
CTEXT= Specifies color for text
CSTATREF= Specifies colors for STATREF= lines
CVREF= Specifies colors for VREF= lines
DESCRIPTION= Specifies description for plot in graphics catalog
FONT= Specifies software font for text
HAXIS= Specifies AXIS statement for horizontal axis
HEIGHT= Specifies height of text used outside framed areas
HMINOR= Specifies number of horizontal minor tick marks
HREFLABPOS= Specifies position for HREF= line labels
INFONT= Specifies software font for text inside framed areas
INHEIGHT= Specifies height of text inside framed areas
LGRID= Specifies a line type for grid lines
LHREF= Specifies line styles for HREF= lines
LSTATREF= Specifies line styles for STATREF= lines
LVREF= Specifies line styles for VREF= lines
NAME= Specifies name for plot in graphics catalog
NOHLABEL Suppresses label for horizontal axis
NOVLABEL Suppresses label for vertical axis
NOVTICK Suppresses tick marks and tick mark labels for vertical axis
STATREF= Specifies reference lines at values of summary statistics
STATREFLABELS= Specifies labels for STATREF= lines
STATREFSUBCHAR= Specifies substitution character for displaying statistic values in STATREFLABELS= labels
TURNVLABELS Turns and vertically strings out characters in labels for vertical axis
VAXIS= Specifies AXIS statement for vertical axis
VAXISLABEL= Specifies label for vertical axis
VMINOR= Specifies number of vertical minor tick marks
VREFLABPOS= Specifies horizontal position of labels for VREF= lines
WAXIS= Specifies line thickness for axes and frame
WGRID= Specifies line thickness for grid
Options for Comparative Plots
ANNOKEY Applies annotation to key cell only
CFRAMESIDE= Specifies color for filling frame for row labels
CFRAMETOP= Specifies color for filling frame for column labels
CPROP= Specifies color for proportion of frequency bar
CTEXTSIDE= Specifies color for row labels
CTEXTTOP= Specifies color for column labels
INTERTILE= Specifies distance between tiles
NCOLS= Specifies number of columns in comparative probability plot
NROWS= Specifies number of rows in comparative probability plot
OVERLAY Overlays plots for different class levels (ODS Graphics only)
Dictionary of Options
The following sections provide detailed descriptions of options specific to the PROBPLOT statement. See
“Dictionary of Common Options: CAPABILITY Procedure” on page 550 for detailed descriptions of options
common to all the plot statements.
General Options
You can specify the following options whether you are producing ODS Graphics output or traditional
graphics:
ALPHA=value-list|EST
specifies values for a mandatory shape parameter α (α > 0) for probability plots requested with the BETA, GAMMA, PARETO, and POWER options. A plot is created for each value specified. For examples, see the entries for the distribution options. If you specify ALPHA=EST, a maximum likelihood estimate is computed for α.
Agreement between the reference line and the point pattern indicates that the beta distribution with parameters α, β, θ0, and σ0 is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
BETA=value-list|EST
specifies values for the shape parameter β (β > 0) for probability plots requested with the BETA distribution option. A plot is created for each value specified with the BETA= option. If you specify BETA=EST, a maximum likelihood estimate is computed for β. For examples, see the preceding entry for the BETA option.
C=value(-list)|EST
specifies the shape parameter c (c > 0) for probability plots requested with the WEIBULL and
WEIBULL2 options. You must specify C= as a Weibull-option with the WEIBULL option; in this
situation it accepts a list of values, or if you specify C=EST, a maximum likelihood estimate is computed
for c. You can optionally specify C=value or C=EST as a Weibull2-option with the WEIBULL2 option
to request a distribution reference line; in this situation, you must also specify SIGMA=value or
SIGMA=EST.
For example, the first PROBPLOT statement below creates three three-parameter Weibull plots
corresponding to the shape parameters c = 1, c = 2, and c = 3. The second PROBPLOT statement
creates a single three-parameter Weibull plot corresponding to an estimated value of c. The third
PROBPLOT statement creates a single two-parameter Weibull plot with a distribution reference line
corresponding to c0 = 2 and σ0 = 3.
Agreement between the reference line and the point pattern indicates that the exponential distribution with parameters θ0 and σ0 is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
The intercept and slope are based on the quantile scale for the horizontal axis, which is displayed on a
Q-Q plot; see “QQPLOT Statement: CAPABILITY Procedure” on page 508.
To obtain a graphical estimate of α, specify a list of values for the ALPHA= option, and select the value that most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to θ0 and σ0 with the gamma-options THETA=θ0 and SIGMA=σ0. Alternatively, you can add a line corresponding to estimated values of θ0 and σ0 with the gamma-options THETA=EST and SIGMA=EST. Specify these options in parentheses, as in the following example:
Agreement between the reference line and the point pattern indicates that the gamma distribution with parameters α, θ0, and σ0 is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
GRID
draws reference lines perpendicular to the percentile axis at major tick marks.
$$p(x) = \frac{e^{-(x-\mu)/\sigma}}{\sigma}\exp\left(-e^{-(x-\mu)/\sigma}\right)$$
where μ is a location parameter and σ is a positive scale parameter.
The intercept and slope are based on the quantile scale for the horizontal axis, which is displayed on a
Q-Q plot; see “QQPLOT Statement: CAPABILITY Procedure” on page 508.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to μ0 and σ0 with the Gumbel-options MU=μ0 and SIGMA=σ0. Alternatively, you can add a line corresponding to estimated values of μ0 and σ0 with the Gumbel-options MU=EST and SIGMA=EST. Specify these options in parentheses following the GUMBEL option.
Agreement between the reference line and the point pattern indicates that the Gumbel distribution with parameters μ0 and σ0 is a good fit.
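The Gumbel distribution function implied by the density above is F(x) = exp(−e^{−(x−μ)/σ}), so its p quantile is μ + σ·[−log(−log p)]. The following Python sketch (not SAS code) uses that fact to show why a Gumbel sample lines up along a reference line with intercept μ0 and slope σ0 on the quantile scale. The plotting positions (i − 3/8)/(n + 1/4) are Blom's (1958) form and are an assumption about how the percentiles are computed; the function names are hypothetical.

```python
import math

def gumbel_plot_coords(data, rankadj=-0.375, nadj=0.25):
    """Pair each ordered observation with a standard Gumbel quantile.

    Assumed plotting position: p_i = (i + rankadj) / (n + nadj).
    Standard Gumbel quantile:  z_i = -log(-log(p_i)).
    """
    xs = sorted(data)
    n = len(xs)
    coords = []
    for i, x in enumerate(xs, start=1):
        p = (i + rankadj) / (n + nadj)
        coords.append((-math.log(-math.log(p)), x))
    return coords

def gumbel_reference_line(z, mu0, sigma0):
    """Reference line on the quantile scale: intercept mu0, slope sigma0."""
    return mu0 + sigma0 * z

# A sample placed exactly at Gumbel(mu=2, sigma=3) quantiles lies on the line.
n, mu0, sigma0 = 5, 2.0, 3.0
sample = [mu0 - sigma0 * math.log(-math.log((i - 0.375) / (n + 0.25)))
          for i in range(1, n + 1)]
for z, x in gumbel_plot_coords(sample):
    assert abs(x - gumbel_reference_line(z, mu0, sigma0)) < 1e-9
```

Real data scatter around the line; the closer the scatter, the better the Gumbel fit.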
where
θ = threshold parameter
ζ = scale parameter
σ = shape parameter (σ > 0)
The intercept and slope are based on the quantile scale for the horizontal axis, which is displayed on a
Q-Q plot; see “QQPLOT Statement: CAPABILITY Procedure” on page 508.
To obtain a graphical estimate of σ, specify a list of values for the SIGMA= option, and select the value that most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to θ0 and ζ0 with the lognormal-options THETA=θ0 and ZETA=ζ0. Alternatively, you can add a line corresponding to estimated values of θ0 and ζ0 with the lognormal-options THETA=EST and ZETA=EST. Specify these options in parentheses, as in the following example:
Agreement between the reference line and the point pattern indicates that the lognormal distribution with parameters σ, θ0, and ζ0 is a good fit. See Example 6.22 for an example.
You can specify the THRESHOLD= option as an alias for the THETA= option and the SCALE= option
as an alias for the ZETA= option.
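The reference line for a lognormal plot is specified through θ0 and ζ0, with slope exp(ζ0). One way to see why is sketched below in Python (not SAS code), assuming that the horizontal axis places the ith ordered observation at exp(σ·Φ⁻¹(p_i)), where Φ⁻¹ is the standard normal quantile function and p_i = (i − 3/8)/(n + 1/4). That axis formula is an assumption chosen because it makes the intercept θ0 and slope exp(ζ0) come out exactly; the function names are hypothetical. The sketch uses statistics.NormalDist (Python 3.8 or later).

```python
import math
from statistics import NormalDist

def lognormal_plot_coords(data, sigma, rankadj=-0.375, nadj=0.25):
    """Pair each ordered observation with exp(sigma * z_i), z_i a standard normal quantile."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(math.exp(sigma * std_normal.inv_cdf((i + rankadj) / (n + nadj))), x)
            for i, x in enumerate(xs, start=1)]

def lognormal_reference_line(q, theta0, zeta0):
    """Reference line with intercept theta0 and slope exp(zeta0)."""
    return theta0 + math.exp(zeta0) * q

# Exact lognormal quantiles for theta=5, zeta=-1, sigma=0.5 fall on the line.
theta0, zeta0, shape = 5.0, -1.0, 0.5
n = 7
sample = [theta0 + math.exp(zeta0 + shape * NormalDist().inv_cdf((i - 0.375) / (n + 0.25)))
          for i in range(1, n + 1)]
for q, x in lognormal_plot_coords(sample, shape):
    assert abs(x - lognormal_reference_line(q, theta0, zeta0)) < 1e-9
```

Because θ0 + exp(ζ0)·exp(σz) = θ0 + exp(ζ0 + σz), specifying SLOPE=exp(ζ0) and ZETA=ζ0 describe the same line.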
MU=value|EST
specifies μ0 for a probability plot requested with the GUMBEL and NORMAL options; μ0 is the mean for the normal distribution and the location parameter for the Gumbel distribution. If you specify MU=EST, μ0 is equal to the sample mean for the normal distribution. For the Gumbel distribution, a maximum likelihood estimate is calculated. See Example 6.21.
NADJ=value
specifies the adjustment value added to the sample size in the calculation of theoretical percentiles. The default is 1/4, as recommended by Blom (1958). Also refer to Chambers et al. (1983) for additional information.
NOLEGEND
suppresses legends for specification limits, fitted curves, distribution lines, and hidden observations.
NOLINELEGEND
NOLINEL
suppresses the legend for the optional distribution reference line.
distribution, and n is the number of nonmissing observations. The horizontal axis is scaled in percentile
units.
The point pattern on the plot tends to be linear with intercept μ and slope σ if the data are normally distributed with the specific density function
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad \text{for all } x$$
Agreement between the reference line and the point pattern indicates that the normal distribution with parameters μ0 and σ0 is a good fit.
NOSPECLEGEND
NOSPECL
suppresses the legend for specification limit reference lines.
where
θ = threshold parameter
PCTLORDER=value-list
specifies the tick mark values labeled on the theoretical percentile axis. Because the values are
percentiles, the labels must be between 0 and 100, exclusive. The values must be listed in increasing
order and must cover the plotted percentile range. Otherwise, a default list is used. For example,
consider the following:
Note that the ORDER= option in the AXIS statement is not supported by the PROBPLOT statement.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
The intercept and slope are based on the quantile scale for the horizontal axis, which is displayed on a
Q-Q plot; see “QQPLOT Statement: CAPABILITY Procedure” on page 508.
To obtain a graphical estimate of α, specify a list of values for the ALPHA= option, and select the value that most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to θ0 and σ0 with the power-options THETA=θ0 and SIGMA=σ0. Alternatively, you can add a line corresponding to estimated values of θ0 and σ0 with the power-options THETA=EST and SIGMA=EST. Specify these options in parentheses following the POWER option.
Agreement between the reference line and the point pattern indicates that the power function distribution with parameters α, θ0, and σ0 is a good fit.
RANKADJ=value
specifies the adjustment value added to the ranks in the calculation of theoretical percentiles. The default is −3/8, as recommended by Blom (1958). Also refer to Chambers et al. (1983) for additional information.
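To make the roles of RANKADJ= and NADJ= concrete, the following Python sketch (not SAS code) computes theoretical percentiles for ranks 1 through n. It assumes the percentile for rank i is 100·(i + RANKADJ)/(n + NADJ), which with the defaults gives Blom's (i − 3/8)/(n + 1/4); the function name is hypothetical.

```python
def theoretical_percentiles(n, rankadj=-3/8, nadj=1/4):
    """Theoretical percentiles (0 to 100) for ranks i = 1, ..., n.

    Assumed form: 100 * (i + RANKADJ) / (n + NADJ), with Blom's defaults.
    """
    if n < 1:
        raise ValueError("need at least one nonmissing observation")
    return [100.0 * (i + rankadj) / (n + nadj) for i in range(1, n + 1)]

# Five nonmissing observations with the default adjustments.
pcts = theoretical_percentiles(5)
print([round(p, 2) for p in pcts])  # [11.9, 30.95, 50.0, 69.05, 88.1]
```

With these defaults the middle rank of an odd-sized sample maps exactly to the 50th percentile, and no percentile reaches 0 or 100, so every point has a finite theoretical quantile.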
ROTATE
switches the horizontal and vertical axes so that the theoretical percentiles are plotted vertically while
the data are plotted horizontally. Regardless of whether the plot has been rotated, horizontal axis
options (such as HAXIS=) still refer to the horizontal axis, and vertical axis options (such as VAXIS=)
still refer to the vertical axis. All other options that depend on axis placement adjust to the rotated axes.
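SIGMA=value-list|EST
specifies the value of the parameter σ, where σ > 0. Alternatively, you can specify SIGMA=EST to request an estimate of σ0 that is computed as
$$\hat{\sigma}_0 = \sqrt{\frac{\sum_{i=1}^{n}\left(\log(x_i - \theta) - \hat{\zeta}\right)^2}{n-1}}$$
where ζ̂ is the mean of the values log(x_i − θ).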
The interpretation and use of the SIGMA= option depend on the distribution option with which it is
specified, as indicated by Table 6.69.
In the following example, the first PROBPLOT statement requests a normal plot with a distribution reference line corresponding to μ0 = 5 and σ0 = 2, and the second PROBPLOT statement requests a lognormal plot with shape parameter σ = 3:
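The SIGMA=EST formula for σ̂0 is the sample standard deviation (divisor n − 1) of the shifted, log-transformed data. The Python sketch below (not SAS code) evaluates it, assuming that the centering value ζ̂ is the mean of log(x_i − θ); the function name is hypothetical.

```python
import math

def lognormal_sigma_est(data, theta):
    """Return the n - 1 standard deviation of log(x_i - theta).

    Assumption: zeta_hat, the centering value, is the mean of log(x_i - theta).
    """
    if len(data) < 2:
        raise ValueError("need at least two nonmissing observations")
    if any(x <= theta for x in data):
        raise ValueError("every observation must exceed the threshold theta")
    logs = [math.log(x - theta) for x in data]
    n = len(logs)
    zeta_hat = sum(logs) / n
    return math.sqrt(sum((v - zeta_hat) ** 2 for v in logs) / (n - 1))

# Here log(x - 5) takes the values 0, 1, and 2, so the estimate is 1 (up to rounding).
print(lognormal_sigma_est([5 + math.exp(v) for v in (0.0, 1.0, 2.0)], theta=5.0))
```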
SLOPE=value|EST
specifies the slope for a distribution reference line requested with the LOGNORMAL and WEIBULL2
options. The intercept and slope are based on the quantile scale for the horizontal axis, which is
displayed on a Q-Q plot; see “QQPLOT Statement: CAPABILITY Procedure” on page 508.
When you use the SLOPE= option with the LOGNORMAL option, you must also specify a threshold parameter value θ0 with the THETA= lognormal-option to request the line. The SLOPE= option is an alternative to the ZETA= lognormal-option for specifying ζ0, because the slope is equal to exp(ζ0).
When you use the SLOPE= option with the WEIBULL2 option, you must also specify a scale parameter value σ0 with the SIGMA= Weibull2-option to request the line. The SLOPE= option is an alternative to the C= Weibull2-option for specifying c0, because the slope is equal to 1/c0. See "Location and Scale Parameters" on page 503.
For example, the first and second PROBPLOT statements below produce the same set of probability
plots as the third and fourth PROBPLOT statements:
SQUARE
displays the probability plot in a square frame. For an example, see Output 6.22.1. The default is a
rectangular frame.
THETA=value|EST
THRESHOLD=value
specifies the lower threshold parameter θ for probability plots requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, WEIBULL, and WEIBULL2 options. When used with the WEIBULL2 option, the THETA= option specifies the known lower threshold θ0, for which the default is 0. When used with the other distribution options, the THETA= option specifies θ0 for a distribution reference line; alternatively in this situation, you can specify THETA=EST to request a maximum likelihood estimate for θ0. To request the line, you must also specify a scale parameter. See Output 6.22.1 for an example of the THETA= option with a lognormal probability plot.
WEIBULL(C=value-list|EST < Weibull-options >)
WEIB(C=value-list < Weibull-options >)
creates a three-parameter Weibull probability plot for each value of the shape parameter c given by the
mandatory C= option or its alias, the SHAPE= option. If you specify C=EST, a plot is created based
on a maximum likelihood estimate for c. In the following example, the first PROBPLOT statement
creates four plots, and the second PROBPLOT statement creates a single plot:
To create the plot, the observations are ordered from smallest to largest, and the ith ordered observation is plotted against the quantile
$$\left[-\log\left(1 - \frac{i - 0.375}{n + 0.25}\right)\right]^{1/c}$$
where n is the number of nonmissing observations, and c is the Weibull distribution shape parameter. The horizontal axis is scaled in percentile units.
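The point pattern on the plot for C=c tends to be linear with intercept θ and slope σ if the data are Weibull distributed with the specific density function
$$p(x) = \begin{cases} \dfrac{c}{\sigma}\left(\dfrac{x-\theta}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)^{c}\right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases}$$
where
θ = threshold parameter
σ = scale parameter (σ > 0)
c = shape parameter (c > 0)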
The intercept and slope are based on the quantile scale for the horizontal axis, which is displayed on a
Q-Q plot; see “QQPLOT Statement: CAPABILITY Procedure” on page 508.
To obtain a graphical estimate of c, specify a list of values for the C= option, and select the value that
most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to θ0 and σ0 with the Weibull-options THETA=θ0 and SIGMA=σ0. Alternatively, you can add a line corresponding to estimated values of θ0 and σ0 with the Weibull-options THETA=EST and SIGMA=EST. Specify these options in parentheses, as in the following example:
Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c, θ0, and σ0 is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
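The three-parameter Weibull plot described in this entry can be sketched in Python as follows (not SAS code). The sketch implements the quantile [−log(1 − p_i)]^{1/c} with p_i = (i − 0.375)/(n + 0.25) and checks that a sample built from known θ, σ, and c lies on the line x = θ + σq, which is the intercept θ and slope σ stated above. The function names and parameter values are hypothetical.

```python
import math

def weibull_quantile(p, c):
    """Standard Weibull quantile [-log(1 - p)]^(1/c) for 0 < p < 1."""
    return (-math.log(1.0 - p)) ** (1.0 / c)

def weibull_plot_coords(data, c, rankadj=-0.375, nadj=0.25):
    """(quantile, ordered observation) pairs for a three-parameter Weibull plot."""
    xs = sorted(data)
    n = len(xs)
    return [(weibull_quantile((i + rankadj) / (n + nadj), c), x)
            for i, x in enumerate(xs, start=1)]

# Points built from theta = 1, sigma = 2, c = 1.5 lie on x = theta + sigma * q.
theta, sigma, c = 1.0, 2.0, 1.5
n = 6
sample = [theta + sigma * weibull_quantile((i - 0.375) / (n + 0.25), c)
          for i in range(1, n + 1)]
for q, x in weibull_plot_coords(sample, c):
    assert abs(x - (theta + sigma * q)) < 1e-9
```

Trying several values of c and keeping the one whose points look straightest is the graphical estimate described for the C= option; with c = 1 the quantile reduces to the exponential quantile −log(1 − p).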
To create the plot, the observations are ordered from smallest to largest, and the log of the shifted ith ordered observation x_(i), denoted by log(x_(i) − θ0), is plotted against the quantile
$$\log\left(-\log\left(1 - \frac{i - 0.375}{n + 0.25}\right)\right)$$
where n is the number of nonmissing observations. The horizontal axis is scaled in percentile units. Note that the C= shape parameter option is not mandatory with the WEIBULL2 option.
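The point pattern on the plot for THETA=θ0 tends to be linear with intercept log(σ) and slope 1/c if the data are Weibull distributed with the specific density function
$$p(x) = \begin{cases} \dfrac{c}{\sigma}\left(\dfrac{x-\theta_0}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta_0}{\sigma}\right)^{c}\right) & \text{for } x > \theta_0 \\ 0 & \text{for } x \le \theta_0 \end{cases}$$
where
θ0 = known lower threshold
σ = scale parameter (σ > 0)
c = shape parameter (c > 0)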
Agreement between the distribution reference line and the point pattern indicates that the Weibull distribution with parameters c0, θ0, and σ0 is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the SHAPE= option as an alias for the C= option.
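The following Python sketch (not SAS code) builds the two-parameter Weibull coordinates described in this entry and fits a straight line to them. The ordinary least-squares fit is only an illustration and not necessarily the method SAS uses for the EST keyword; it shows that exp(intercept) recovers σ0 and 1/slope recovers c0, which is the SLOPE=1/c0 relationship. The function names and parameter values are hypothetical.

```python
import math

def weibull2_plot_coords(data, theta0=0.0, rankadj=-0.375, nadj=0.25):
    """(log(-log(1 - p_i)), log(x_(i) - theta0)) pairs for a two-parameter Weibull plot."""
    xs = sorted(data)
    if xs[0] <= theta0:
        raise ValueError("every observation must exceed the known threshold theta0")
    n = len(xs)
    coords = []
    for i, x in enumerate(xs, start=1):
        p = (i + rankadj) / (n + nadj)
        coords.append((math.log(-math.log(1.0 - p)), math.log(x - theta0)))
    return coords

def least_squares_line(points):
    """Ordinary least-squares intercept and slope (illustration only)."""
    n = len(points)
    mq = sum(q for q, _ in points) / n
    my = sum(y for _, y in points) / n
    sqq = sum((q - mq) ** 2 for q, _ in points)
    sqy = sum((q - mq) * (y - my) for q, y in points)
    slope = sqy / sqq
    return my - slope * mq, slope

# Data at exact Weibull(sigma = 3, c = 2) quantiles with known threshold theta0 = 0.
n, sigma0, c0 = 8, 3.0, 2.0
sample = [sigma0 * (-math.log(1.0 - (i - 0.375) / (n + 0.25))) ** (1.0 / c0)
          for i in range(1, n + 1)]
intercept, slope = least_squares_line(weibull2_plot_coords(sample))
print(math.exp(intercept), 1.0 / slope)  # recovers sigma0 and c0 (up to rounding)
```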
ZETA=value|EST
specifies a value for the scale parameter ζ for lognormal probability plots requested with the LOGNORMAL option. Specify THETA=θ0 and ZETA=ζ0 to request a distribution reference line with intercept θ0 and slope exp(ζ0). See Output 6.22.1 for an example.
CGRID=color
specifies the color for the grid lines requested by the GRID option.
LEGEND=name | NONE
specifies the name of a LEGEND statement describing the legend for specification limit reference lines
and fitted curves. Specifying LEGEND=NONE is equivalent to specifying the NOLEGEND option.
LGRID=linetype
specifies the line type for the grid lines requested by the GRID option.
PCTLMINOR
requests minor tick marks for the percentile axis. See Output 6.22.1 for an example.
WGRID=n
specifies the width of the grid lines requested with the GRID option. If you use the WGRID= option,
you do not need to specify the GRID option.
GRIDCHAR=‘character’
specifies the character used for the lines requested by the GRID option for a line printer plot. The
default is the vertical bar (|).
NOOBSLEGEND
NOOBSL
suppresses the legend that indicates the number of hidden observations.
PROBSYMBOL=‘character’
specifies the character used to mark the points in a line printer plot. The default is the plus sign (+).
SYMBOL=‘character’
specifies the character used to display the distribution reference line in a line printer plot. The default
character is the first letter of the distribution option keyword.
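Distribution: density function p(x), range; parameters (location, scale, shape)
Beta: $p(x) = \frac{(x-\theta)^{\alpha-1}(\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}}$, θ < x < θ + σ; location θ, scale σ, shape α and β
Exponential: $p(x) = \frac{1}{\sigma}\exp\left(-\frac{x-\theta}{\sigma}\right)$, x ≥ θ; location θ, scale σ
Gamma: $p(x) = \frac{1}{\sigma\Gamma(\alpha)}\left(\frac{x-\theta}{\sigma}\right)^{\alpha-1}\exp\left(-\frac{x-\theta}{\sigma}\right)$, x > θ; location θ, scale σ, shape α
Gumbel: $p(x) = \frac{e^{-(x-\mu)/\sigma}}{\sigma}\exp\left(-e^{-(x-\mu)/\sigma}\right)$, all x; location μ, scale σ
Lognormal (3-parameter): $p(x) = \frac{1}{\sigma\sqrt{2\pi}\,(x-\theta)}\exp\left(-\frac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right)$, x > θ; location θ, scale ζ, shape σ
Normal: $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, all x; location μ, scale σ
Generalized Pareto: $p(x) = \frac{1}{\sigma}\left(1-\frac{\alpha(x-\theta)}{\sigma}\right)^{1/\alpha-1}$ for α ≠ 0, and $p(x) = \frac{1}{\sigma}\exp\left(-\frac{x-\theta}{\sigma}\right)$ for α = 0, x > θ; location θ, scale σ, shape α
Power function: $p(x) = \frac{\alpha}{\sigma}\left(\frac{x-\theta}{\sigma}\right)^{\alpha-1}$, x > θ; location θ, scale σ, shape α
Rayleigh: $p(x) = \frac{x-\theta}{\sigma^2}\exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right)$, x ≥ θ; location θ, scale σ
Weibull (3-parameter): $p(x) = \frac{c}{\sigma}\left(\frac{x-\theta}{\sigma}\right)^{c-1}\exp\left(-\left(\frac{x-\theta}{\sigma}\right)^{c}\right)$, x > θ; location θ, scale σ, shape c
Weibull (2-parameter): $p(x) = \frac{c}{\sigma}\left(\frac{x-\theta_0}{\sigma}\right)^{c-1}\exp\left(-\left(\frac{x-\theta_0}{\sigma}\right)^{c}\right)$, x > θ0; location θ0 (known), scale σ, shape c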
You can request these distributions with the BETA, EXPONENTIAL, GAMMA, GUMBEL, LOGNORMAL, NORMAL, PARETO, POWER, RAYLEIGH, WEIBULL, and WEIBULL2 options, respectively. If you do not specify a distribution option, a normal probability plot is created.
Shape Parameters
Some of the distribution options in the PROBPLOT statement require you to specify one or two shape
parameters in parentheses after the distribution keyword. These are summarized in Table 6.71.
You can visually estimate the value of a shape parameter by specifying a list of values for the shape parameter
option. The PROBPLOT statement produces a separate plot for each value. You can then use the value of the
shape parameter producing the most nearly linear point pattern. Alternatively, you can request that the plot
be created using an estimated shape parameter. For an example, see “Creating Lognormal Probability Plots”
on page 479.
For the LOGNORMAL and WEIBULL2 options, you can specify the slope directly with the SLOPE= option. That is, for the LOGNORMAL option, specifying THETA=θ0 and SLOPE=exp(ζ0) displays the same line as specifying THETA=θ0 and ZETA=ζ0. For the WEIBULL2 option, specifying SIGMA=σ0 and SLOPE=1/c0 displays the same line as specifying SIGMA=σ0 and C=c0.
5 The intercept and slope are based on the quantile scale for the horizontal axis, which is displayed on a Q-Q plot; see "QQPLOT Statement: CAPABILITY Procedure" on page 508.
ODS Graphics
Before you create ODS Graphics output, ODS Graphics must be enabled (for example, by using the ODS
GRAPHICS ON statement). For more information about enabling and disabling ODS Graphics, see the
section “Enabling and Disabling ODS Graphics” (Chapter 24, SAS/STAT User’s Guide).
The appearance of a graph produced with ODS Graphics is determined by the style associated with the ODS
destination where the graph is produced. PROBPLOT options used to control the appearance of traditional
graphics are ignored for ODS Graphics output.
When ODS Graphics is in effect, the PROBPLOT statement assigns a name to the graph it creates. You can
use this name to reference the graph when using ODS. The name is listed in Table 6.74.
See Chapter 4, “SAS/QC Graphics,” for more information about ODS Graphics and other methods for
producing charts.
data Sheets;
input Distance @@;
label Distance='Hole Distance in cm';
datalines;
9.80 10.20 10.27 9.70 9.76
10.11 10.24 10.20 10.24 9.63
9.99 9.78 10.10 10.21 10.00
9.96 9.79 10.08 9.79 10.06
10.10 9.95 9.84 10.11 9.93
10.56 10.47 9.42 10.44 10.16
10.11 10.36 9.94 9.77 9.36
9.89 9.62 10.05 9.72 9.82
9.99 10.16 10.58 10.70 9.54
10.31 10.07 10.33 9.98 10.15
;
The cutting process is in control, and you decide to check whether the process distribution is normal. The
following statements create a normal probability plot for Distance with lower and upper specification lines at
9.5 cm and 10.5 cm:
The plot is shown in Output 6.21.1. The MU= and SIGMA= normal-options request the diagonal reference line that corresponds to the normal distribution with estimated parameters μ̂ = 10.027 and σ̂ = 0.2889. The
LSL= and USL= SPEC statement options request the lower and upper specification lines. The SYMBOL
statement specifies the symbol marker for the plotted points.
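As a cross-check, the following Python sketch (not SAS code) recomputes the quoted estimates from the Sheets data and builds normal probability plot coordinates. It assumes that the estimate of σ for the normal distribution is the sample standard deviation with divisor n − 1, which reproduces the quoted 0.2889, and that the percentiles use Blom positions (i − 0.375)/(n + 0.25). The helper name reference_line is hypothetical.

```python
import statistics
from statistics import NormalDist

# Hole distances (cm) from the Sheets data set.
distance = [
    9.80, 10.20, 10.27, 9.70, 9.76,
    10.11, 10.24, 10.20, 10.24, 9.63,
    9.99, 9.78, 10.10, 10.21, 10.00,
    9.96, 9.79, 10.08, 9.79, 10.06,
    10.10, 9.95, 9.84, 10.11, 9.93,
    10.56, 10.47, 9.42, 10.44, 10.16,
    10.11, 10.36, 9.94, 9.77, 9.36,
    9.89, 9.62, 10.05, 9.72, 9.82,
    9.99, 10.16, 10.58, 10.70, 9.54,
    10.31, 10.07, 10.33, 9.98, 10.15,
]

mu_hat = statistics.mean(distance)      # sample mean
sigma_hat = statistics.stdev(distance)  # sample standard deviation, divisor n - 1
print(round(mu_hat, 3), round(sigma_hat, 4))  # 10.027 0.2889

# Plot coordinates: (theoretical percentile, ordered observation).
n = len(distance)
points = [(100 * (i - 0.375) / (n + 0.25), x)
          for i, x in enumerate(sorted(distance), start=1)]

def reference_line(pct, mu0=mu_hat, sigma0=sigma_hat):
    """Height of the diagonal reference line at a given percentile."""
    return mu0 + sigma0 * NormalDist().inv_cdf(pct / 100)
```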
The close agreement between the diagonal reference line and the point pattern indicates that the specific lognormal distribution with σ = 0.5, θ̂ = 5.004, and ζ̂ = −1.003 is a good fit for the diameter measurements.
Specifying HREF=95 adds a reference line indicating the 95th percentile of the lognormal distribution. The
HREFLABEL= option specifies a label for this line. The PCTLMINOR option displays minor tick marks on
the percentile axis. The VREF= option adds reference lines indicating diameter values of 5.8, 5.9, and 6.0,
and the CHREF= and CVREF= options specify colors for the horizontal and vertical reference lines.
Based on the intersection of the diagonal reference line with the HREF= line, the estimated 95th percentile of
the diameter distribution is 5.85 mm.
Note that you could also construct a similar plot in which all three parameters are estimated by substituting
SIGMA=EST for SIGMA=0.5 in the preceding statements.
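The 95th percentile can also be computed in closed form from the fitted lognormal distribution as θ + exp(ζ + σ·z0.95), where z0.95 is the standard normal 95th percentile. The Python sketch below (not SAS code) uses the estimates quoted for this example, with ζ̂ = −1.003; a negative ζ̂ is what keeps the percentile near the observed diameters of about 5.8 mm. The result, about 5.84 mm, is close to the 5.85 mm read from the plot.

```python
import math
from statistics import NormalDist

theta_hat = 5.004  # estimated threshold (mm)
zeta_hat = -1.003  # estimated scale parameter
sigma = 0.5        # shape parameter fixed with SIGMA=0.5

# 95th percentile of a three-parameter lognormal: theta + exp(zeta + sigma * z_0.95)
z95 = NormalDist().inv_cdf(0.95)
p95 = theta_hat + math.exp(zeta_hat + sigma * z95)
print(round(p95, 2))  # 5.84
```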
beta
exponential
gamma
Gumbel
three-parameter lognormal
normal
generalized Pareto
power function
Rayleigh
two-parameter Weibull
three-parameter Weibull
display a reference line corresponding to specific location and scale parameters for the theoretical
distribution
You can also create a comparative Q-Q plot by using the QQPLOT statement in conjunction with a CLASS
statement.
You have three alternatives for producing Q-Q plots with the QQPLOT statement:
ODS Graphics output is produced if ODS Graphics is enabled, for example by specifying the ODS
GRAPHICS ON statement prior to the PROC statement.
Legacy line printer charts are produced when you specify the LINEPRINTER option in the PROC
statement.
See Chapter 4, “SAS/QC Graphics,” for more information about producing these different kinds of graphs.
NOTE: Q-Q plots are similar to probability plots, which you can create with the PROBPLOT statement (see
“PROBPLOT Statement: CAPABILITY Procedure” on page 477). Q-Q plots are preferable for graphical
estimation of distribution parameters and capability indices, whereas probability plots are preferable for
graphical estimation of percentiles.
data Sheets;
input Distance @@;
label Distance='Hole Distance in cm';
datalines;
9.80 10.20 10.27 9.70 9.76
10.11 10.24 10.20 10.24 9.63
9.99 9.78 10.10 10.21 10.00
9.96 9.79 10.08 9.79 10.06
10.10 9.95 9.84 10.11 9.93
10.56 10.47 9.42 10.44 10.16
10.11 10.36 9.94 9.77 9.36
9.89 9.62 10.05 9.72 9.82
9.99 10.16 10.58 10.70 9.54
10.31 10.07 10.33 9.98 10.15
;
The cutting process is in control, and you decide to check whether the process distribution is normal. The following statements create a Q-Q plot for Distance, shown in Figure 6.38, with lower and upper specification lines at 9.5 cm and 10.5 cm (see footnote 6):
6 For a P-P plot using these data, see Figure 6.30. For a probability plot using these data, see Example 6.21.
Specifying MU=EST and SIGMA=EST with the NORMAL option requests the reference line (alternatively, you can specify numeric values for μ0 and σ0 with the MU= and SIGMA= options). The COLOR= and L= options specify the color of the line and the line type. The SQUARE option displays the plot in a square format, and the NOSPECLEGEND option suppresses the legend for the specification lines.
You can specify the keyword QQ as an alias for QQPLOT, and you can use any number of QQPLOT
statements in the CAPABILITY procedure. The components of the QQPLOT statement are described as
follows.
variables
are the process variables for which to create Q-Q plots. If you specify a VAR statement, the variables
must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the
input data set. If you do not specify a list of variables, then by default the procedure creates a Q-Q
plot for each variable listed in the VAR statement, or for each numeric variable in the DATA= data
set if you do not specify a VAR statement. For example, each of the following QQPLOT statements
produces two Q-Q plots, one for length and one for width:
options
specify the theoretical distribution for the plot or add features to the plot. If you specify more than
one variable, the options apply equally to each variable. Specify all options after the slash (/) in
the QQPLOT statement. You can specify only one option naming the distribution in each QQPLOT
statement, but you can specify any number of other options. The distributions available are the beta,
exponential, gamma, Gumbel, lognormal, normal, generalized Pareto, power function, Rayleigh,
two-parameter Weibull, and three-parameter Weibull. By default, the procedure produces a plot for the
normal distribution.
In the following example, the NORMAL option requests a normal Q-Q plot for each variable. The MU= and SIGMA= normal-options request a distribution reference line with intercept 10 and slope 0.3 for each plot, corresponding to a normal distribution with mean μ = 10 and standard deviation σ = 0.3. The SQUARE option displays the plot in a square frame, and the CTEXT= option specifies the text color.
ctext=blue;
run;
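In a Q-Q plot the horizontal axis carries the theoretical quantiles themselves rather than percentile labels, which is why MU=10 and SIGMA=0.3 translate directly into a line with intercept 10 and slope 0.3. The Python sketch below (not SAS code) computes normal Q-Q coordinates, assuming Blom plotting positions (i − 0.375)/(n + 0.25), and evaluates that line; the function names are hypothetical.

```python
from statistics import NormalDist

def normal_qq_coords(data, rankadj=-0.375, nadj=0.25):
    """(standard normal quantile, ordered observation) pairs for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i + rankadj) / (n + nadj)), x)
            for i, x in enumerate(xs, start=1)]

def reference_line(z, mu0=10.0, sigma0=0.3):
    """Line for MU=10 and SIGMA=0.3: intercept mu0, slope sigma0 in quantile units."""
    return mu0 + sigma0 * z
```

Points that fall near reference_line(z) are consistent with a normal distribution with mean 10 and standard deviation 0.3.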
Summary of Options
The following tables list the QQPLOT statement options by function. For complete descriptions, see
“Dictionary of Options” on page 517.
Distribution Options
Table 6.75 summarizes the options for requesting a specific theoretical distribution.
Option Description
BETA(beta-options) Specifies beta Q-Q plot for shape parameters α, β specified with mandatory ALPHA= and BETA= beta-options
EXPONENTIAL(exponential-options) Specifies exponential Q-Q plot
GAMMA(gamma-options) Specifies gamma Q-Q plot for shape parameter α specified with mandatory ALPHA= gamma-option
GUMBEL(Gumbel-options) Specifies Gumbel Q-Q plot
LOGNORMAL(lognormal-options) Specifies lognormal Q-Q plot for shape parameter σ specified with mandatory SIGMA= lognormal-option
NORMAL(normal-options) Specifies normal Q-Q plot
PARETO(Pareto-options) Specifies generalized Pareto Q-Q plot for shape parameter α specified with mandatory ALPHA= Pareto-option
POWER(power-options) Specifies power function Q-Q plot for shape parameter α specified with mandatory ALPHA= power-option
RAYLEIGH(Rayleigh-options) Specifies Rayleigh Q-Q plot
WEIBULL(Weibull-options) Specifies three-parameter Weibull Q-Q plot for shape parameter c specified with mandatory C= Weibull-option
WEIBULL2(Weibull2-options) Specifies two-parameter Weibull Q-Q plot
Table 6.76 summarizes options that specify parameter values for theoretical distributions and that control the
display of a distribution reference line. Specify these options in parentheses after the distribution option. For
example, the following statements use the NORMAL option to request a normal Q-Q plot with a specific
distribution reference line. The MU= and SIGMA= normal-options display a distribution reference line with
intercept 10 and slope 0.3. The COLOR= normal-option draws the line in red.
514 F Chapter 6: The CAPABILITY Procedure
Option Description
Distribution Reference Line Options
COLOR= Specifies color of distribution reference line
L= Specifies line type of distribution reference line
SYMBOL= Specifies plotting character for line printer plots
W= Specifies width of distribution reference line
Beta-Options
ALPHA= Specifies mandatory shape parameter ˛
BETA= Specifies mandatory shape parameter ˇ
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Exponential-Options
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Gamma-Options
ALPHA= Specifies mandatory shape parameter α
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Gumbel-Options
MU= Specifies reference line intercept
SIGMA= Specifies reference line slope
Lognormal-Options
SIGMA= Specifies mandatory shape parameter σ
SLOPE= Specifies reference line slope
THETA= Specifies reference line intercept
ZETA= Specifies reference line slope exp(ζ₀)
Normal-Options
CPKREF Specifies vertical reference lines at intersection of specification
limits with distribution reference line
CPKSCALE Rescales horizontal axis in Cpk units
MU= Specifies reference line intercept
SIGMA= Specifies reference line slope
Pareto-Options
ALPHA= Specifies mandatory shape parameter α
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Power-Options
ALPHA= Specifies mandatory shape parameter α
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Rayleigh-Options
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Weibull-Options
C= Specifies mandatory shape parameter c
SIGMA= Specifies reference line slope
THETA= Specifies reference line intercept
Weibull2-Options
C= Specifies c₀ for reference line (slope is 1/c₀)
SIGMA= Specifies σ₀ for reference line (intercept is log(σ₀))
SLOPE= Specifies reference line slope
THETA= Specifies known lower threshold θ₀
General Options
Table 6.77 lists options that control the appearance of the plots.
Option Description
General Plot Layout Options
CONTENTS= Specifies table of contents entry for Q-Q plot grouping
HREF= Specifies reference lines perpendicular to the horizontal axis
HREFLABELS= Specifies labels for HREF= lines
LEGEND= Specifies LEGEND statement
NADJ= Adjusts sample size (N) when computing quantiles
NOFRAME Suppresses frame around plotting area
NOLEGEND Suppresses legend
NOLINELEGEND Suppresses distribution reference line information in legend
NOSPECLEGEND Suppresses specifications information in legend
PCTLAXIS Adds a nonlinear percentile axis
PCTLMINOR Adds minor tick marks to percentile axis
PCTLSCALE Replaces theoretical quantiles with percentiles
RANKADJ= Adjusts ranks when computing quantiles
ROTATE Switches horizontal and vertical axes
SQUARE Displays Q-Q plot in square format
VREF= Specifies reference lines perpendicular to the vertical axis
VREFLABELS= Specifies labels for VREF= lines
Graphics Options
ANNOTATE= Specifies annotate data set
CAXIS= Specifies color for axis
CFRAME= Specifies color for frame
CGRID= Specifies color for grid lines
CHREF= Specifies colors for HREF= lines
CSTATREF= Specifies colors for STATREF= lines
CTEXT= Specifies color for text
CVREF= Specifies colors for VREF= lines
DESCRIPTION= Specifies description for plot in graphics catalog
FONT= Specifies software font for text
GRID Draws grid lines perpendicular to the quantile axis
HEIGHT= Specifies height of text used outside framed areas
HMINOR= Specifies number of horizontal minor tick marks
HREFLABPOS= Specifies vertical position of labels for HREF= lines
INFONT= Specifies software font for text inside framed areas
INHEIGHT= Specifies height of text inside framed areas
LGRID= Specifies a line type for grid lines
LHREF= Specifies line styles for HREF= lines
LSTATREF= Specifies line styles for STATREF= lines
LVREF= Specifies line styles for VREF= lines
NAME= Specifies name for plot in graphics catalog
NOHLABEL Suppresses label for horizontal axis
NOVLABEL Suppresses label for vertical axis
NOVTICK Suppresses tick marks and tick mark labels for vertical axis
STATREF= Specifies reference lines at values of summary statistics
STATREFLABELS= Specifies labels for STATREF= lines
STATREFSUBCHAR= Specifies substitution character for displaying statistic values in STATREFLABELS= labels
VAXIS= Specifies AXIS statement for vertical axis
VAXISLABEL= Specifies label for vertical axis
VMINOR= Specifies number of vertical minor tick marks
VREFLABPOS= Specifies horizontal position of labels for VREF= lines
WAXIS= Specifies line thickness for axes and frame
WGRID= Specifies thickness for grid lines
Option Description
ODSFOOTNOTE2= Specifies secondary footnote displayed on Q-Q plot
ODSTITLE= Specifies title displayed on Q-Q plot
ODSTITLE2= Specifies secondary title displayed on Q-Q plot
Dictionary of Options
The following sections provide detailed descriptions of options specific to the QQPLOT statement. See
“Dictionary of Common Options: CAPABILITY Procedure” on page 550 for detailed descriptions of options
common to all the plot statements.
General Options
You can specify the following options whether you are producing ODS Graphics output or traditional
graphics:
ALPHA=value-list|EST
specifies values for a mandatory shape parameter α (α > 0) for Q-Q plots requested with the BETA,
GAMMA, PARETO, and POWER options. A plot is created for each value specified. For examples, see
the entries for the distribution options. If you specify ALPHA=EST, a maximum likelihood estimate is
computed for α.
Agreement between the reference line and the point pattern indicates that the beta distribution with
parameters α, β, θ₀, and σ₀ is a good fit. You can specify the SCALE= option as an alias for the
SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
BETA=value-list|EST
specifies values for the shape parameter β (β > 0) for Q-Q plots requested with the BETA distribution
option. A plot is created for each value specified with the BETA= option. If you specify BETA=EST, a
maximum likelihood estimate is computed for β. For examples, see the preceding entry for the BETA
distribution option.
C=value(-list)|EST
specifies the shape parameter c (c > 0) for Q-Q plots requested with the WEIBULL and WEIBULL2
options. You must specify C= as a Weibull-option with the WEIBULL option; in this situation it accepts
a list of values, or if you specify C=EST, a maximum likelihood estimate is computed for c. You can
optionally specify C=value or C=EST as a Weibull2-option with the WEIBULL2 option to request a
distribution reference line; in this situation, you must also specify SIGMA=value or SIGMA=EST. For
an example, see Output 6.25.1.
CPKSCALE
rescales the quantile axis in Cpk units for plots requested with the NORMAL option. Specify
CPKSCALE in parentheses after the NORMAL option. You can use the CPKSCALE option with the
CPKREF option for graphical estimation of the capability indices CPU , CPL, and Cpk , as illustrated
in Output 6.26.1.
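$$p(x) = \begin{cases} \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\theta}{\sigma}\right) & \text{for } x \ge \theta \\[4pt] 0 & \text{for } x < \theta \end{cases}$$
where θ is the threshold parameter, and σ is the scale parameter (σ > 0).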
To assess the point pattern, you can add a diagonal distribution reference line with intercept θ₀ and
slope σ₀ with the exponential-options THETA=θ₀ and SIGMA=σ₀. Alternatively, you can add a line
corresponding to estimated values of θ₀ and σ₀ with the exponential-options THETA=EST and
SIGMA=EST. Specify these options in parentheses after the EXPONENTIAL option.
Agreement between the reference line and the point pattern indicates that the exponential distribution
with parameters θ₀ and σ₀ is a good fit. You can specify the SCALE= option as an alias for the
SIGMA= option and the THRESHOLD= option as an alias for the THETA= option.
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
To obtain a graphical estimate of α, specify a list of values for the ALPHA= option, and select the
value that most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line with intercept θ₀
and slope σ₀ with the gamma-options THETA=θ₀ and SIGMA=σ₀. Alternatively, you can add a
line corresponding to estimated values of θ₀ and σ₀ with the gamma-options THETA=EST and
SIGMA=EST. Specify these options in parentheses after the GAMMA option.
Agreement between the reference line and the point pattern indicates that the gamma distribution with
parameters α, θ₀, and σ₀ is a good fit. You can specify the SCALE= option as an alias for the SIGMA=
option and the THRESHOLD= option as an alias for the THETA= option.
$$p(x) = \frac{e^{-(x-\mu)/\sigma}}{\sigma}\exp\left(-e^{-(x-\mu)/\sigma}\right)$$
GRID
draws reference lines perpendicular to the quantile axis at major tick marks.
LEGEND=name | NONE
specifies the name of a LEGEND statement describing the legend for specification limit reference lines
and fitted curves. Specifying LEGEND=NONE is equivalent to specifying the NOLEGEND option.
You can specify the THRESHOLD= option as an alias for the THETA= option and the SCALE= option as an alias for the
ZETA= option.
You can also display the reference line by specifying THETA=θ₀, and you can specify the slope with
the SLOPE= option. For example, the following two QQPLOT statements produce charts with identical
reference lines:
MU=value|EST
specifies a value for the mean for a Q-Q plot requested with the GUMBEL and NORMAL options.
For the normal distribution, you can specify MU=EST to request a distribution reference line with
intercept equal to the sample mean, as illustrated in Figure 6.39. If you specify MU=EST for the
Gumbel distribution, a maximum likelihood estimate is calculated.
NADJ=value
specifies the adjustment value added to the sample size in the calculation of theoretical quantiles.
The default is 1/4, as described by Blom (1958). Also refer to Chambers et al. (1983) for additional
information.
NOLEGEND
LEGEND=NONE
suppresses legends for specification limits, fitted curves, distribution lines, and hidden observations.
For an example, see Output 6.26.1.
NOLINELEGEND
NOLINEL
suppresses the legend for the optional distribution reference line.
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad \text{for all } x$$
You can add a distribution reference line corresponding to estimated values of μ₀ and σ₀ with the normal-options MU=EST and SIGMA=EST;
the estimates of μ₀ and σ₀ are the sample mean and sample standard deviation. Specify these options
in parentheses after the NORMAL option.
For an example, see “Adding a Distribution Reference Line” on page 511. Agreement between the
reference line and the point pattern indicates that the normal distribution with parameters μ₀ and σ₀ is
a good fit. You can specify MU=EST and SIGMA=EST to request a distribution reference line with
the sample mean and sample standard deviation as the intercept and slope.
Other normal-options include CPKREF and CPKSCALE. The CPKREF option draws reference lines
extending from the intersections of specification limits with the distribution reference line to the
theoretical quantile axis. The CPKSCALE option rescales the theoretical quantile axis in Cpk units.
You can use the CPKREF option with the CPKSCALE option for graphical estimation of the capability
indices CPU , CPL, and Cpk , as illustrated in Output 6.26.1.
NOSPECLEGEND
NOSPECL
suppresses the legend for specification limit reference lines. For an example, see Figure 6.39.
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
To obtain a graphical estimate of α, specify a list of values for the ALPHA= option, and select the
value that most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to θ₀ and σ₀
with the Pareto-options THETA=θ₀ and SIGMA=σ₀. Alternatively, you can add a line corresponding
to estimated values of θ₀ and σ₀ with the Pareto-options THETA=EST and SIGMA=EST. Specify
these options in parentheses following the PARETO option.
Agreement between the reference line and the point pattern indicates that the generalized Pareto
distribution with parameters α, θ₀, and σ₀ is a good fit.
PCTLAXIS(axis-options)
adds a nonlinear percentile axis along the frame of the Q-Q plot opposite the theoretical quantile axis.
The added axis is identical to the axis for probability plots produced with the PROBPLOT statement.
When using the PCTLAXIS option, you must specify HREF= values in quantile units, and you cannot
use the NOFRAME option. You can specify the following axis-options:
CGRID=color
specifies the color used for grid lines.
GRID
draws grid lines perpendicular to the percentile axis at major tick marks.
GRIDCHAR=‘character ’
specifies the character used to draw grid lines associated with the percentile axis on line printer
plots.
LABEL=‘string’
specifies the label for the percentile axis.
LGRID=linetype
specifies the line type used for grid lines associated with the percentile axis.
PCTLORDER=value-list
specifies the tick mark values labeled on the percentile axis. The values must be listed in
increasing order and must be between 0 and 100, exclusive. Values that correspond to quantiles
that are outside the range of the theoretical quantile axis are not displayed.
WGRID=value
specifies the thickness for grid lines associated with the percentile axis.
NOTE: See Creating Normal Q-Q Plots in the SAS/QC Sample Library.
For example, the following statements display the plot in Figure 6.40:
PCTLSCALE
requests scale labels for the theoretical quantile axis in percentile units, resulting in a nonlinear axis
scale. Tick marks are drawn uniformly across the axis based on the quantile scale. In all other respects,
the plot remains the same, and you must specify HREF= values in quantile units. For a true nonlinear
axis, use the PCTLAXIS option or use the PROBPLOT statement.
NOTE: See Creating Normal Q-Q Plots in the SAS/QC Sample Library.
For example, the following statements display the plot in Figure 6.41:
Figure 6.41 Normal Q-Q Plot for Reading Percentiles of Specification Limits
where
θ = threshold parameter
σ = scale parameter (σ > 0)
α = shape parameter (α > 0)
To obtain a graphical estimate of α, specify a list of values for the ALPHA= option, and select the
value that most nearly linearizes the point pattern.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to θ₀ and σ₀
with the power-options THETA=θ₀ and SIGMA=σ₀. Alternatively, you can add a line corresponding
to estimated values of θ₀ and σ₀ with the power-options THETA=EST and SIGMA=EST. Specify
these options in parentheses following the POWER option.
Agreement between the reference line and the point pattern indicates that the power function distribution
with parameters α, θ₀, and σ₀ is a good fit.
RANKADJ=value
specifies the adjustment value added to the ranks in the calculation of theoretical quantiles. The default
is −3/8, as described by Blom (1958). Also refer to Chambers et al. (1983) for additional information.
ROTATE
switches the horizontal and vertical axes so that the theoretical percentiles are plotted vertically while
the data are plotted horizontally. Regardless of whether the plot has been rotated, horizontal axis
options (such as HAXIS=) refer to the horizontal axis, and vertical axis options (such as VAXIS=)
refer to the vertical axis. All other options that depend on axis placement adjust to the rotated axes.
SIGMA=value-list|EST
specifies the value of the distribution parameter σ, where σ > 0. Alternatively, you can specify
SIGMA=EST to request an estimate for σ₀ that is computed as described in “Lognormal Distribution”
on page 353. The use of the SIGMA= option depends on the distribution option specified, as indicated
by Table 6.78.
For an example using SIGMA=EST, see Output 6.26.1. For an example of lognormal plots using the
SIGMA= option, see Example 6.24.
SLOPE=value|EST
specifies the slope for a distribution reference line requested with the LOGNORMAL and WEIBULL2
options.
When you use the SLOPE= option with the LOGNORMAL option, you must also specify a threshold
parameter value θ₀ with the THETA= option. Specifying the SLOPE= option is an alternative to
specifying ZETA=ζ₀, which requests a slope of exp(ζ₀). See Output 6.24.4 for an example.
When you use the SLOPE= option with the WEIBULL2 option, you must also specify a scale parameter
value σ₀ with the SIGMA= option. Specifying the SLOPE= option is an alternative to specifying C=c₀,
which requests a slope of 1/c₀.
For example, the first and second QQPLOT statements that follow produce plots identical to those
produced by the third and fourth QQPLOT statements:
SQUARE
displays the Q-Q plot in a square frame. Compare Figure 6.38 with Figure 6.39. The default is a
rectangular frame.
THETA=value|EST
THRESHOLD=value|EST
specifies the lower threshold parameter for Q-Q plots requested with the BETA, EXPONENTIAL,
GAMMA, LOGNORMAL, PARETO, POWER, RAYLEIGH, WEIBULL, and WEIBULL2 options.
When used with the WEIBULL2 option, the THETA= option specifies the known lower threshold θ₀,
for which the default is 0. See Output 6.25.2 for an example.
When used with the other distribution options, the THETA= option specifies θ₀ for a distribution
reference line; alternatively in this situation, you can specify THETA=EST to request a maximum
likelihood estimate for θ₀. To request the line, you must also specify a scale parameter. See Output 6.24.4
for an example of the THETA= option with a lognormal Q-Q plot.
To create the plot, the observations are ordered from smallest to largest, and the ith ordered observation
is plotted against the quantile
$$\left[-\log\left(1-\frac{i-0.375}{n+0.25}\right)\right]^{1/c}$$
where n is the number of nonmissing observations, and c is the Weibull distribution shape parameter.
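The pattern on the plot for C=c tends to be linear with intercept θ and slope σ if the data are Weibull
distributed with the specific density function
$$p(x) = \begin{cases} \dfrac{c}{\sigma}\left(\dfrac{x-\theta}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)^{c}\right) & \text{for } x > \theta \\[4pt] 0 & \text{for } x \le \theta \end{cases}$$
where θ is the threshold parameter, σ is the scale parameter (σ > 0), and c is the shape parameter (c >
0).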
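The quantile formula and the density above can be checked numerically. The following Python sketch is illustrative only: the function names `weibull3_quantiles` and `weibull_cdf` and the parameter values are hypothetical, not part of SAS. It computes the horizontal coordinates for a chosen shape c with the default RANKADJ= and NADJ= adjustments, and confirms that the Weibull cumulative distribution function evaluated at θ + σq returns each plotting position. This is why Weibull data plotted against these quantiles fall near a line with intercept θ and slope σ.

```python
import math

def weibull3_quantiles(n, c, rankadj=-0.375, nadj=0.25):
    """Horizontal coordinates [-log(1 - p_i)]^(1/c), with p_i = (i + rankadj)/(n + nadj)."""
    if c <= 0:
        raise ValueError("shape parameter c must be positive")
    return [(-math.log(1.0 - (i + rankadj) / (n + nadj))) ** (1.0 / c)
            for i in range(1, n + 1)]

def weibull_cdf(x, theta, sigma, c):
    """Three-parameter Weibull CDF: 1 - exp(-((x - theta)/sigma)^c) for x > theta."""
    if x <= theta:
        return 0.0
    return 1.0 - math.exp(-(((x - theta) / sigma) ** c))

# Hypothetical parameter values, chosen only for illustration
n, theta, sigma, c = 10, 2.0, 3.0, 1.5
q = weibull3_quantiles(n, c)

# A point placed at theta + sigma * q_i has CDF equal to its plotting position p_i,
# so Weibull data plotted against q fall near the line with intercept theta, slope sigma.
max_error = max(
    abs(weibull_cdf(theta + sigma * qi, theta, sigma, c) - (i - 0.375) / (n + 0.25))
    for i, qi in enumerate(q, start=1)
)
print(f"largest CDF mismatch: {max_error:.1e}")
```

Calling `weibull3_quantiles` for each value in a list of c values is the numerical counterpart of specifying C=value-list.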
To obtain a graphical estimate of c, specify a list of values for the C= option, and select the value
that most nearly linearizes the point pattern. For an illustration, see Example 6.25. To assess the
point pattern, you can add a diagonal distribution reference line with intercept θ₀ and slope σ₀ with
the Weibull-options THETA=θ₀ and SIGMA=σ₀. Alternatively, you can add a line corresponding to
estimated values of θ₀ and σ₀ with the Weibull-options THETA=EST and SIGMA=EST. Specify these
options in parentheses after the WEIBULL option.
Agreement between the reference line and the point pattern indicates that the Weibull distribution with
parameters c, θ₀, and σ₀ is a good fit. You can specify the SCALE= option as an alias for the SIGMA=
option and the THRESHOLD= option as an alias for the THETA= option.
WEIBULL2< (Weibull2-options) >
W2< (Weibull2-options) >
creates a two-parameter Weibull Q-Q plot. You should use the WEIBULL2 option when your data
have a known lower threshold θ₀. You can specify the threshold value θ₀ with the THETA= option
or its alias, the THRESHOLD= option. If you are uncertain of the lower threshold value, you can
estimate θ₀ graphically by specifying a list of values for the THETA= option. Select the value that
most linearizes the point pattern. The default is θ₀ = 0.
To create the plot, the observations are ordered from smallest to largest, and the log of the shifted
ith ordered observation x₍ᵢ₎, log(x₍ᵢ₎ − θ₀), is plotted against the quantile
$$\log\left[-\log\left(1-\frac{i-0.375}{n+0.25}\right)\right]$$
where n is the number of nonmissing observations. Unlike the three-parameter Weibull quantile, the
preceding expression is free of distribution parameters. This is why the C= shape parameter option is
not mandatory with the WEIBULL2 option.
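The pattern on the plot for THETA=θ₀ tends to be linear with intercept log(σ) and slope 1/c if the data
are Weibull distributed with the specific density function
$$p(x) = \begin{cases} \dfrac{c}{\sigma}\left(\dfrac{x-\theta_0}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta_0}{\sigma}\right)^{c}\right) & \text{for } x > \theta_0 \\[4pt] 0 & \text{for } x \le \theta_0 \end{cases}$$
where θ₀ is a known lower threshold parameter, σ is a scale parameter (σ > 0), and c is a shape
parameter (c > 0).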
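The linear relationship above, log(x − θ₀) = log(σ) + (1/c)·log(−log(1 − p)), can be illustrated with a short Python sketch. The function names are hypothetical, the data are exact Weibull quantiles invented for the illustration, and an ordinary least-squares fit stands in for reading the slope and intercept off the plot by eye. The fit recovers σ from the exponentiated intercept and c from the reciprocal of the slope.

```python
import math

def weibull2_points(data, theta0=0.0, rankadj=-0.375, nadj=0.25):
    """(quantile, log(x_(i) - theta0)) pairs for a two-parameter Weibull Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        if x <= theta0:
            raise ValueError("every observation must exceed the known threshold theta0")
        p = (i + rankadj) / (n + nadj)
        points.append((math.log(-math.log(1.0 - p)), math.log(x - theta0)))
    return points

def fit_line(points):
    """Ordinary least-squares intercept and slope (a numeric stand-in for reading the plot by eye)."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((q - mean_q) ** 2 for q, _ in points)
    sxy = sum((q - mean_q) * (y - mean_y) for q, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_q, slope

# Hypothetical data: exact quantiles of a Weibull distribution with theta0 = 0, sigma = 4, c = 2
n, sigma, c = 20, 4.0, 2.0
data = [sigma * (-math.log(1.0 - (i - 0.375) / (n + 0.25))) ** (1.0 / c) for i in range(1, n + 1)]

intercept, slope = fit_line(weibull2_points(data))
# intercept estimates log(sigma) and slope estimates 1/c
print(f"sigma ~ {math.exp(intercept):.3f}, c ~ {1.0 / slope:.3f}")  # sigma ~ 4.000, c ~ 2.000
```

With real data the points scatter around the line, and the same two conversions (exp of the intercept, reciprocal of the slope) give rough estimates of σ and c.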
The advantage of a two-parameter Weibull plot over a three-parameter Weibull plot is that you can
visually estimate the shape parameter c and the scale parameter σ from the slope and intercept of
the point pattern; see Example 6.25 for an illustration of this method. The disadvantage is that the
two-parameter Weibull distribution applies only in situations where the threshold parameter is known.
See “Graphical Estimation” on page 535 for more information.
To assess the point pattern, you can add a diagonal distribution reference line corresponding to the scale
parameter σ₀ and shape parameter c₀ with the Weibull2-options SIGMA=σ₀ and C=c₀. Alternatively,
you can add a distribution reference line corresponding to estimated values of σ₀ and c₀ with the
Weibull2-options SIGMA=EST and C=EST. This line has intercept log(σ₀) and slope 1/c₀. Agreement
between the line and the point pattern indicates that the Weibull distribution with parameters c₀, θ₀,
and σ₀ is a good fit. You can specify the SCALE= option as an alias for the SIGMA= option and the
SHAPE= option as an alias for the C= option.
You can also display the reference line by specifying SIGMA=σ₀, and you can specify the slope with
the SLOPE= option. For example, the following QQPLOT statements produce identical plots:
ZETA=value|EST
specifies a value for the scale parameter ζ for lognormal Q-Q plots requested with the LOGNORMAL
option. Specify THETA=θ₀ and ZETA=ζ₀ to request a distribution reference line with intercept θ₀ and
slope exp(ζ₀).
CGRID=color
specifies the color for the grid lines associated with the quantile axis, requested by the GRID option.
LGRID=linetype
specifies the line type for the grid lines associated with the quantile axis, requested by the GRID option.
CPKREF
draws reference lines extending from the intersections of the specification limits with the distribution
reference line to the quantile axis in plots requested with the NORMAL option. Specify CPKREF in
parentheses after the NORMAL option. You can use the CPKREF option with the CPKSCALE option
for graphical estimation of the capability indices CPU , CPL, and Cpk , as illustrated in Output 6.26.1.
PCTLMINOR
requests minor tick marks for the percentile axis displayed when you use the PCTLAXIS option. See
the entry for the PCTLAXIS option for an example.
WGRID=n
specifies the width of the grid lines associated with the quantile axis, requested with the GRID option.
If you use the WGRID= option, you do not need to specify the GRID option.
NOOBSLEGEND
NOOBSL
suppresses the legend that indicates the number of hidden observations.
QQSYMBOL=‘character ’
specifies the character used to plot the Q-Q points in line printer plots. The default is the plus sign (+).
SYMBOL=‘character ’
specifies the character used for a distribution reference line in a line printer plot. The default character
is the first letter of the distribution option keyword.
graphical estimation of shape parameters, location and scale parameters, theoretical percentiles, and
capability indices
Then the ith ordered value x₍ᵢ₎ is represented on the plot by a point whose y-coordinate is x₍ᵢ₎ and whose
x-coordinate is
$$F^{-1}\left(\frac{i-0.375}{n+0.25}\right)$$
where F(·) is the theoretical cumulative distribution function with zero location parameter and
unit scale parameter.
You can modify the adjustment constants –0.375 and 0.25 with the RANKADJ= and NADJ= options. This
default combination is recommended by Blom (1958). For additional information, refer to Chambers et al.
(1983). Because x₍ᵢ₎ is a quantile of the empirical cumulative distribution function (ecdf), a Q-Q plot
compares quantiles of the ecdf with quantiles of a theoretical distribution. Probability plots (see “PROBPLOT
Statement: CAPABILITY Procedure” on page 477) are constructed the same way, except that the x-axis is
scaled nonlinearly in percentiles.
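The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's implementation: the function name `qq_coordinates` is hypothetical, RANKADJ= and NADJ= appear as keyword arguments with the Blom defaults, and the quantile function defaults to the standard normal, so the coordinates correspond to a normal Q-Q plot only. The sample values are the first five Diameter values from the Measures data set used in the examples.

```python
from statistics import NormalDist

def qq_coordinates(data, rankadj=-0.375, nadj=0.25, inv_cdf=NormalDist().inv_cdf):
    """(theoretical quantile, ordered value) pairs for a Q-Q plot.

    The plotting position of the ith ordered value is (i + rankadj) / (n + nadj);
    the defaults give Blom's (i - 0.375) / (n + 0.25). inv_cdf is the quantile
    function F^{-1} of the standardized distribution (standard normal here).
    """
    xs = sorted(x for x in data if x is not None)  # drop missing values, then order
    n = len(xs)
    return [(inv_cdf((i + rankadj) / (n + nadj)), x) for i, x in enumerate(xs, start=1)]

# First five Diameter values from the Measures data set
pts = qq_coordinates([5.501, 5.251, 5.404, 5.366, 5.445])
for q, x in pts:
    print(f"{q:+.3f}  {x:.3f}")
```

With n = 5 the middle observation has plotting position 2.625/5.25 = 0.5, so it sits at the zero quantile; passing a different `inv_cdf` would give the coordinates for another standardized distribution.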
If the quantiles of the theoretical and data distributions agree, the plotted points fall on or near the line
y = x.
If the theoretical and data distributions differ only in their location or scale, the points on the plot fall
on or near the line y = ax + b. The slope a and intercept b are visual estimates of the scale and
location parameters of the theoretical distribution.
Q-Q plots are more convenient than probability plots for graphical estimation of the location and scale
parameters because the x-axis of a Q-Q plot is scaled linearly. On the other hand, probability plots are more
convenient for estimating percentiles or probabilities.
There are many reasons why the point pattern in a Q-Q plot may not be linear. Chambers et al. (1983) and
Fowlkes (1987) discuss the interpretations of commonly encountered departures from linearity, and these are
summarized in Table 6.79.
In some applications, a nonlinear pattern may be more revealing than a linear pattern. However, Chambers
et al. (1983) note that departures from linearity can also be due to chance variation.
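Distribution: density function p(x); range; parameters (location, scale, shape)
Beta: $p(x) = \dfrac{(x-\theta)^{\alpha-1}(\theta+\sigma-x)^{\beta-1}}{B(\alpha,\beta)\,\sigma^{\alpha+\beta-1}}$; $\theta < x < \theta+\sigma$; location θ, scale σ, shape α, β
Exponential: $p(x) = \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\theta}{\sigma}\right)$; $x \ge \theta$; location θ, scale σ
Gamma: $p(x) = \dfrac{1}{\sigma\Gamma(\alpha)}\left(\dfrac{x-\theta}{\sigma}\right)^{\alpha-1}\exp\left(-\dfrac{x-\theta}{\sigma}\right)$; $x > \theta$; location θ, scale σ, shape α
Gumbel: $p(x) = \dfrac{e^{-(x-\mu)/\sigma}}{\sigma}\exp\left(-e^{-(x-\mu)/\sigma}\right)$; all x; location μ, scale σ
Lognormal (3-parameter): $p(x) = \dfrac{1}{\sigma\sqrt{2\pi}\,(x-\theta)}\exp\left(-\dfrac{(\log(x-\theta)-\zeta)^2}{2\sigma^2}\right)$; $x > \theta$; location θ, scale ζ, shape σ
Normal: $p(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$; all x; location μ, scale σ
Generalized Pareto: $p(x) = \dfrac{1}{\sigma}\left(1-\dfrac{\alpha(x-\theta)}{\sigma}\right)^{1/\alpha-1}$ for α ≠ 0, $p(x) = \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\theta}{\sigma}\right)$ for α = 0; $x > \theta$; location θ, scale σ, shape α
Power function: $p(x) = \dfrac{\alpha}{\sigma}\left(\dfrac{x-\theta}{\sigma}\right)^{\alpha-1}$; $x > \theta$; location θ, scale σ, shape α
Rayleigh: $p(x) = \dfrac{x-\theta}{\sigma^2}\exp\left(-\dfrac{(x-\theta)^2}{2\sigma^2}\right)$; $x \ge \theta$; location θ, scale σ
Weibull (3-parameter): $p(x) = \dfrac{c}{\sigma}\left(\dfrac{x-\theta}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta}{\sigma}\right)^{c}\right)$; $x > \theta$; location θ, scale σ, shape c
Weibull (2-parameter): $p(x) = \dfrac{c}{\sigma}\left(\dfrac{x-\theta_0}{\sigma}\right)^{c-1}\exp\left(-\left(\dfrac{x-\theta_0}{\sigma}\right)^{c}\right)$; $x > \theta_0$; location θ₀ (known), scale σ, shape c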
You can request these distributions with the BETA, EXPONENTIAL, GAMMA, GUMBEL, LOGNORMAL, NORMAL,
PARETO, POWER, RAYLEIGH, WEIBULL, and WEIBULL2 options, respectively. If you do not specify a distribution option, a normal Q-Q
plot is created.
Graphical Estimation
You can use Q-Q plots to estimate shape, location, and scale parameters and to estimate percentiles. If you
are working with a normal Q-Q plot, you can also estimate certain capability indices.
Shape Parameters
Some distribution options in the QQPLOT statement require that you specify one or two shape parameters in
parentheses after the distribution keyword. These are summarized in Table 6.81.
You can visually estimate a shape parameter by specifying a list of values for the shape parameter option. A
separate plot is displayed for each value, and you can then select the value that linearizes the point pattern.
Alternatively, you can request that the plot be created using an estimated shape parameter. See the entries for
the distribution options in the section “Dictionary of Options” on page 517 for details on specification of
shape parameters. Example 6.24 and Example 6.25 illustrate shape parameter estimation with lognormal and
Weibull Q-Q plots.
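The trial-and-error selection described above can be imitated numerically. The following Python sketch is an assumption-laden illustration, not how the procedure works. The function names are hypothetical, and the data are exact lognormal quantiles invented for the example. Linearity is scored by the correlation between the ordered data and the lognormal quantiles exp(σz), which stands in for a visual judgment. That horizontal coordinate is chosen to agree with the lognormal reference line of intercept θ and slope exp(ζ).

```python
import math
from statistics import NormalDist

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_lognormal_sigma(data, sigmas, rankadj=-0.375, nadj=0.25):
    """Pick the candidate shape sigma whose lognormal Q-Q plot is most nearly linear.

    The horizontal coordinate is exp(sigma * z_i), z_i the standard normal quantile
    of the plotting position; linearity is scored by correlation (an assumption,
    standing in for a visual judgment).
    """
    xs = sorted(data)
    n = len(xs)
    z = [NormalDist().inv_cdf((i + rankadj) / (n + nadj)) for i in range(1, n + 1)]
    scores = {s: pearson_r([math.exp(s * zi) for zi in z], xs) for s in sigmas}
    return max(scores, key=scores.get), scores

# Hypothetical data: exact lognormal quantiles with theta = 5, zeta = -0.9, sigma = 0.5
n = 30
z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
data = [5 + math.exp(-0.9 + 0.5 * zi) for zi in z]

best, scores = best_lognormal_sigma(data, [0.2, 0.5, 0.8])
print(best)  # 0.5
```

Because these illustrative data were generated with σ = 0.5, that candidate gives a perfectly straight pattern. With real data the scores only rank the candidates, and you should still inspect the plots.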
Note that for Q-Q plots requested with the WEIBULL2 option, you can estimate the shape parameter c from
a linear pattern using the fact that the slope of the pattern is 1/c. For an illustration, see Example 6.25.
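Distribution            Location    Scale  Shape  Intercept  Slope
Beta                    θ           σ      α, β   θ          σ
Exponential             θ           σ      -      θ          σ
Gamma                   θ           σ      α      θ          σ
Gumbel                  μ           σ      -      μ          σ
Lognormal               θ           ζ      σ      θ          exp(ζ)
Normal                  μ           σ      -      μ          σ
Generalized Pareto      θ           σ      α      θ          σ
Power function          θ           σ      α      θ          σ
Rayleigh                θ           σ      -      θ          σ
Weibull (3-parameter)   θ           σ      c      θ          σ
Weibull (2-parameter)   θ₀ (known)  σ      c      log(σ)     1/c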
You can enhance a Q-Q plot with a diagonal distribution reference line by specifying the parameters that
determine the slope and intercept of the line; alternatively, you can request estimates for these parameters.
This line is an aid to checking the linearity of the point pattern, and it facilitates parameter estimation. For
instance, specifying MU=3 and SIGMA=2 with the NORMAL option requests a line with intercept 3 and
slope 2. Specifying SIGMA=1 and C=2 with the WEIBULL2 option requests a line with intercept log(1) = 0
and slope 1/2.
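The mapping from the distribution options to the intercept and slope of the reference line can be written as a small lookup. This Python sketch is hypothetical. The function name and keyword names are illustrative and merely mirror the option names. The mapping itself follows the option entries and the parameter table in this section, and the two calls reproduce the examples in the preceding paragraph.

```python
import math

def reference_line(dist, **params):
    """(intercept, slope) of the distribution reference line for a QQPLOT distribution option.

    Keyword names mirror the option names: mu, sigma, theta, zeta, c.
    """
    dist = dist.lower()
    if dist in ("normal", "gumbel"):
        return params["mu"], params["sigma"]
    if dist == "lognormal":
        return params["theta"], math.exp(params["zeta"])
    if dist == "weibull2":
        return math.log(params["sigma"]), 1.0 / params["c"]
    if dist in ("beta", "exponential", "gamma", "pareto", "power", "rayleigh", "weibull"):
        return params["theta"], params["sigma"]
    raise ValueError(f"unknown distribution option: {dist}")

print(reference_line("normal", mu=3, sigma=2))   # (3, 2)
print(reference_line("weibull2", sigma=1, c=2))  # (0.0, 0.5)
```

The lognormal and two-parameter Weibull branches show why SLOPE= is a convenient alternative there: the slope is a transformed parameter, exp(ζ₀) or 1/c₀, rather than the parameter itself.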
With the LOGNORMAL and WEIBULL2 options, you can specify the slope directly with the SLOPE=
option. That is, for the LOGNORMAL option, specifying THETA=θ₀ and SLOPE=exp(ζ₀) gives the same
reference line as specifying THETA=θ₀ and ZETA=ζ₀. For the WEIBULL2 option, specifying SIGMA=σ₀
and SLOPE=1/c₀ gives the same reference line as specifying SIGMA=σ₀ and C=c₀.
For an example of parameter estimation using a normal Q-Q plot, see “Adding a Distribution Reference
Line” on page 511. Example 6.24 illustrates parameter estimation using a lognormal plot, and Example 6.25
illustrates estimation using two-parameter and three-parameter Weibull plots.
Theoretical Percentiles
There are two ways to estimate percentiles from a Q-Q plot:
Specify the PCTLAXIS option, which adds a percentile axis opposite the theoretical quantile axis. The
scale for the percentile axis ranges between 0 and 100 with tick marks at percentile values such as 1, 5,
10, 25, 50, 75, 90, 95, and 99. See Figure 6.40 for an example.
Specify the PCTLSCALE option, which relabels the horizontal axis tick marks with their percentile
equivalents but does not alter their spacing. For example, on a normal Q-Q plot, the tick mark labeled
“0” is relabeled as “50” because the 50th percentile corresponds to the zero quantile. See Figure 6.41
for an example.
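For a normal Q-Q plot, the relabeling performed by PCTLSCALE amounts to replacing each tick value q with 100·Φ(q), where Φ is the standard normal cumulative distribution function. The Python sketch below is illustrative only. The function name and the tick values are hypothetical, and rounding to one decimal is a presentation choice.

```python
from statistics import NormalDist

def percentile_labels(ticks, cdf=NormalDist().cdf):
    """Percentile label for each tick on the theoretical quantile axis.

    Only the labels change; the ticks stay evenly spaced in quantile units.
    """
    return [round(100 * cdf(t), 1) for t in ticks]

print(percentile_labels([-2, -1, 0, 1, 2]))  # [2.3, 15.9, 50.0, 84.1, 97.7]
```

The tick at 0 becomes 50, matching the example in the text. The uneven gaps between the labels show why the resulting axis scale is nonlinear.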
You can also estimate percentiles using probability plots created with the PROBPLOT statement. See
Output 6.22.1 for an example.
Capability Indices
When the point pattern on a normal Q-Q plot is linear, you can estimate the capability indices CPU , CPL,
and Cpk from the plot, as explained by Rodriguez (1992). This method exploits the fact that the horizontal
axis of a Q-Q plot indicates the distance in standard deviation units (multiples of σ) between a measurement
or specification limit and the process average.
In particular, one-third the standardized distance between an upper specification limit and the mean is the
one-sided capability index CPU.
$$\mathrm{CPU} = \frac{\mathrm{USL} - \mu}{3\sigma}$$
Likewise, one-third the standardized distance between a lower specification limit and the mean is the
one-sided capability index CPL.
$$\mathrm{CPL} = \frac{\mu - \mathrm{LSL}}{3\sigma}$$
Consequently, if you rescale the quantile axis of a normal Q-Q plot by a factor of three, you can read CPU
and CPL from the horizontal coordinates of the points at which the upper and lower specification lines
intersect the point pattern. Because Cpk is defined as the minimum of CPU and CPL, this method also
provides a graphical estimate of Cpk . For an illustration, see Example 6.26.
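The formulas above are easy to evaluate directly. The Python sketch below computes CPU, CPL, and Cpk. It also computes the coordinate at which a specification limit meets a normal reference line with intercept μ and slope σ, expressed in Cpk units as the standardized distance divided by three. The function names and the process values are hypothetical. In this sketch the lower limit falls at a negative coordinate whose magnitude is CPL.

```python
def capability_indices(mean, std, lsl, usl):
    """One-sided indices CPU and CPL, and Cpk as their minimum."""
    cpu = (usl - mean) / (3 * std)
    cpl = (mean - lsl) / (3 * std)
    return cpu, cpl, min(cpu, cpl)

def cpk_scale_coordinate(limit, mean, std):
    """Horizontal coordinate, in Cpk units, where a specification line meets the
    reference line with intercept mean and slope std: the standardized distance / 3."""
    return (limit - mean) / std / 3

# Hypothetical process and specification limits
mean, std, lsl, usl = 10.0, 0.5, 8.5, 12.25
cpu, cpl, cpk = capability_indices(mean, std, lsl, usl)
print(cpu, cpl, cpk)                          # 1.5 1.0 1.0
print(cpk_scale_coordinate(usl, mean, std))   # 1.5  (reads CPU)
print(cpk_scale_coordinate(lsl, mean, std))   # -1.0 (its magnitude reads CPL)
```

The graphical method replaces the known μ and σ with the intercept and slope of a reference line that fits the point pattern, which is why it requires a linear pattern.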
ODS Graphics
Before you create ODS Graphics output, ODS Graphics must be enabled (for example, by using the ODS
GRAPHICS ON statement). For more information about enabling and disabling ODS Graphics, see the
section “Enabling and Disabling ODS Graphics” (Chapter 24, SAS/STAT User’s Guide).
The appearance of a graph produced with ODS Graphics is determined by the style associated with the ODS
destination where the graph is produced. QQPLOT options used to control the appearance of traditional
graphics are ignored for ODS Graphics output.
When ODS Graphics is in effect, the QQPLOT statement assigns a name to the graph it creates. You can use
this name to reference the graph when using ODS. The name is listed in Table 6.84.
See Chapter 4, “SAS/QC Graphics,” for more information about ODS Graphics and other methods for
producing charts.
data Measures;
input Diameter @@;
label Diameter='Diameter in mm';
datalines;
5.501 5.251 5.404 5.366 5.445 5.576 5.607
5.200 5.977 5.177 5.332 5.399 5.661 5.512
5.252 5.404 5.739 5.525 5.160 5.410 5.823
5.376 5.202 5.470 5.410 5.394 5.146 5.244
5.309 5.480 5.388 5.399 5.360 5.368 5.394
5.248 5.409 5.304 6.239 5.781 5.247 5.907
5.208 5.143 5.304 5.603 5.164 5.209 5.475
5.223
;
The nonlinearity of the points in Output 6.23.1 indicates a departure from normality. Because the point
pattern is curved with slope increasing from left to right, a theoretical distribution that is skewed to the right,
such as a lognormal distribution, should provide a better fit than the normal distribution. The mild curvature
suggests that you should examine the data with a series of lognormal Q-Q plots for small values of the shape
parameter, as illustrated in the next example.
NOTE: You must specify a value for the shape parameter for a lognormal Q-Q plot with the SIGMA=
option or its alias, the SHAPE= option.
The plot in Output 6.24.2 displays the most linear point pattern, indicating that the lognormal distribution
with $\sigma = 0.5$ provides a reasonable fit for the data distribution.
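Data with this particular lognormal distribution have the density function

$$p(x) = \begin{cases} \dfrac{\sqrt{2}}{\sqrt{\pi}\,(x-\theta)} \exp\left(-2\left(\log(x-\theta) - \zeta\right)^2\right) & \text{for } x > \theta \\ 0 & \text{for } x \le \theta \end{cases}$$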
The points in the plot fall on or near the line with intercept $\theta$ and slope $\exp(\zeta)$. Based on Output 6.24.2,
$\theta \approx 5$ and $\exp(\zeta) \approx 1.2/3 = 0.4$, giving $\zeta \approx \log(0.4) \approx -0.92$.
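The following Python sketch shows how the coordinates of such a plot can be computed for the Measures data. It assumes that the horizontal coordinate of the $i$th ordered value is $\exp(\sigma z_i)$, where $z_i$ is the standard normal quantile at Blom's plotting position $(i - 0.375)/(n + 0.25)$. The plotting positions the procedure actually uses may differ. A least-squares line through the points then gives rough estimates of $\theta$ (intercept) and $\exp(\zeta)$ (slope). The function names are illustrative only.

```python
import math
from statistics import NormalDist, mean

# Diameters from the Measures data set (n = 50)
DIAMETERS = [
    5.501, 5.251, 5.404, 5.366, 5.445, 5.576, 5.607,
    5.200, 5.977, 5.177, 5.332, 5.399, 5.661, 5.512,
    5.252, 5.404, 5.739, 5.525, 5.160, 5.410, 5.823,
    5.376, 5.202, 5.470, 5.410, 5.394, 5.146, 5.244,
    5.309, 5.480, 5.388, 5.399, 5.360, 5.368, 5.394,
    5.248, 5.409, 5.304, 6.239, 5.781, 5.247, 5.907,
    5.208, 5.143, 5.304, 5.603, 5.164, 5.209, 5.475,
    5.223,
]


def lognormal_qq_points(data, sigma):
    """Return (horizontal, vertical) coordinates for a lognormal Q-Q plot.

    Assumption: horizontal coordinate = exp(sigma * z_i), where z_i is the
    standard normal quantile at Blom's position (i - 0.375) / (n + 0.25).
    """
    n = len(data)
    std_normal = NormalDist()
    xs = [math.exp(sigma * std_normal.inv_cdf((i - 0.375) / (n + 0.25)))
          for i in range(1, n + 1)]
    return xs, sorted(data)


def least_squares_line(xs, ys):
    """Return (intercept, slope, correlation) of the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)


xs, ys = lognormal_qq_points(DIAMETERS, sigma=0.5)
theta_hat, slope_hat, r = least_squares_line(xs, ys)
zeta_hat = math.log(slope_hat)  # the slope estimates exp(zeta)
print(f"theta ~ {theta_hat:.2f}, exp(zeta) ~ {slope_hat:.2f}, "
      f"zeta ~ {zeta_hat:.2f}, r = {r:.3f}")
```

Because the line is fitted by least squares rather than by eye, its intercept and slope need not match the graphical values $\theta \approx 5$ and $\exp(\zeta) \approx 0.4$ exactly.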
Estimating Percentiles
NOTE: See Creating Lognormal Q-Q Plots in the SAS/QC Sample Library.
You can use a Q-Q plot to estimate percentiles such as the 95th percentile of the lognormal distribution.
(You can also use a probability plot for this purpose; see Output 6.22.1.)
The point pattern in Output 6.24.2 has a slope of approximately 0.39 and an intercept of 5. The following
statements reproduce this plot, adding a lognormal reference line with this slope and intercept.
The PCTLAXIS option labels the major percentiles, and the GRID option draws percentile axis reference
lines. The 95th percentile is 5.9, because the distribution reference line intersects the 95th percentile
reference line at this value on the vertical axis.
Alternatively, you can compute this percentile from the estimated lognormal parameters. The 100$\alpha$th
percentile of the lognormal distribution is

$$P_\alpha = \exp\left(\sigma \Phi^{-1}(\alpha) + \zeta\right) + \theta$$

where $\Phi^{-1}(\cdot)$ is the inverse cumulative standard normal distribution. Consequently,

$$P_{0.95} \approx \exp\left(\tfrac{1}{2}\Phi^{-1}(0.95) + \log(0.39)\right) + 5 \approx \exp\left(\tfrac{1}{2}(1.645) - 0.94\right) + 5 \approx 5.89$$
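The same arithmetic can be checked with a few lines of Python. The function `lognormal_percentile` is a hypothetical helper that evaluates the percentile formula above; it uses `statistics.NormalDist().inv_cdf` for $\Phi^{-1}$.

```python
import math
from statistics import NormalDist


def lognormal_percentile(alpha, theta, zeta, sigma):
    """100*alpha-th percentile of a lognormal distribution with threshold theta,
    scale zeta, and shape sigma: exp(sigma * Phi^{-1}(alpha) + zeta) + theta."""
    z = NormalDist().inv_cdf(alpha)  # inverse cumulative standard normal
    return math.exp(sigma * z + zeta) + theta


# Estimates from the text: theta = 5, exp(zeta) = 0.39, sigma = 0.5
p95 = lognormal_percentile(0.95, theta=5.0, zeta=math.log(0.39), sigma=0.5)
print(f"{p95:.2f}")  # 5.89
```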
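data Measures;
set Measures;
/* Subtract the threshold 5 and take logs: if Diameter-5 is lognormal, */
/* Logdiam is normal, so a normal Q-Q plot of Logdiam checks the fit.  */
Logdiam=log(Diameter-5);
label Logdiam='log(Diameter-5)';
run;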
Because the point pattern in Output 6.24.5 is linear, you can estimate the lognormal parameters $\zeta$ and $\sigma$ as
the normal plot estimates of $\mu$ and $\sigma$, which are –0.99 and 0.51. These values correspond to the previous
estimates of –0.92 for $\zeta$ and 0.5 for $\sigma$.
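If the normal plot's reference line is based on the sample mean and standard deviation (an assumption here), the estimates of $\zeta$ and $\sigma$ are simply those statistics computed for log(Diameter − 5). The following Python sketch applies the same transformation as the DATA step above to the Measures data:

```python
import math
from statistics import mean, stdev

# Diameters from the Measures data set (n = 50)
DIAMETERS = [
    5.501, 5.251, 5.404, 5.366, 5.445, 5.576, 5.607,
    5.200, 5.977, 5.177, 5.332, 5.399, 5.661, 5.512,
    5.252, 5.404, 5.739, 5.525, 5.160, 5.410, 5.823,
    5.376, 5.202, 5.470, 5.410, 5.394, 5.146, 5.244,
    5.309, 5.480, 5.388, 5.399, 5.360, 5.368, 5.394,
    5.248, 5.409, 5.304, 6.239, 5.781, 5.247, 5.907,
    5.208, 5.143, 5.304, 5.603, 5.164, 5.209, 5.475,
    5.223,
]

THETA = 5.0  # threshold used in the DATA step: Logdiam = log(Diameter - 5)
logdiam = [math.log(d - THETA) for d in DIAMETERS]

zeta_hat = mean(logdiam)    # sample mean, estimates zeta
sigma_hat = stdev(logdiam)  # sample standard deviation (n - 1 divisor), estimates sigma
print(f"zeta ~ {zeta_hat:.2f}, sigma ~ {sigma_hat:.2f}")  # zeta ~ -0.99, sigma ~ 0.51
```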
data Failures;
input Time @@;
label Time='Time in Months';
datalines;
29.42 32.14 30.58 27.50 26.08 29.06 25.10 31.34
29.14 33.96 30.64 27.32 29.86 26.28 29.68 33.76
29.32 30.82 27.26 27.92 30.92 24.64 32.90 35.46
30.28 28.36 25.86 31.36 25.26 36.32 28.58 28.88
26.72 27.42 29.02 27.54 31.60 33.46 26.78 27.82
29.18 27.94 27.66 26.42 31.00 26.64 31.44 32.52
;
$$\mathrm{CPU} = \frac{\mathrm{USL} - \mu}{3\sigma}$$

and CPL is defined as

$$\mathrm{CPL} = \frac{\mu - \mathrm{LSL}}{3\sigma}$$
it follows that an estimate of CPU is $1.7/3 \approx 0.57$, and an estimate of CPL is $1.8/3 = 0.6$. Thus, except for
a factor of three, you can estimate CPU and CPL from the points of intersection between the specification
lines and the point pattern.
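The conversion from intersection points to index estimates amounts to a division by three. In the following Python sketch, `capability_from_qq` is a hypothetical helper that takes the horizontal coordinates at which the specification lines cross the point pattern; the lower intersection is entered as a negative coordinate, −1.8, because it lies below the center of the normal quantile axis.

```python
def capability_from_qq(z_usl, z_lsl):
    """Hypothetical helper: convert the horizontal (standard normal quantile)
    coordinates where the USL and LSL lines cross the point pattern into
    CPU, CPL, and Cpk, using z_usl = 3 * CPU and z_lsl = -3 * CPL."""
    cpu = z_usl / 3.0
    cpl = -z_lsl / 3.0
    return cpu, cpl, min(cpu, cpl)


cpu, cpl, cpk = capability_from_qq(1.7, -1.8)
print(f"CPU = {cpu:.2f}, CPL = {cpl:.2f}, Cpk = {cpk:.2f}")  # CPU = 0.57, CPL = 0.60, Cpk = 0.57
```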
The following statements facilitate this type of estimation by creating a Q-Q plot, displayed in Output 6.26.1,
in which the horizontal axis is rescaled by a factor of three:
Using this display, you can estimate CPU and CPL directly from the horizontal axis as 0.55 and 0.60,
respectively (the negative sign for –0.60 is ignored). The minimum of these values (0.55) is an estimate of
$C_{pk}$. Note that this estimate agrees with the numerically obtained estimate for $C_{pk}$ that is displayed on the
plot with the INSET statement.
See Rodriguez (1992) for further discussion concerning the use of Q-Q plots in process capability analysis.
CDFPLOT
COMPHISTOGRAM
HISTOGRAM
PPPLOT
PROBPLOT
QQPLOT
As noted, some options are applicable only to comparative plots produced by the COMPHISTOGRAM
statement or by another plot statement in conjunction with a CLASS statement.
General Options
ALPHADELTA=value
specifies the change in successive estimates of $\hat{\alpha}$ at which iteration terminates in the Newton-Raphson
approximation of the maximum likelihood estimate of $\alpha$ for gamma distributions requested with the
GAMMA option. Enclose the ALPHADELTA= option in parentheses after the GAMMA keyword.
Iteration continues until the change in $\alpha$ is less than the value specified or the number of iterations
exceeds the value of the MAXITER= option. The default value is 0.00001.
ALPHAINITIAL=value
specifies the initial value for $\hat{\alpha}$ in the Newton-Raphson approximation of the maximum likelihood
estimate of $\alpha$ for gamma distributions requested with the GAMMA option. Enclose the ALPHAINITIAL=
option in parentheses after the GAMMA keyword. The default value is Thom’s approximation
of the estimate of $\alpha$. See Johnson, Kotz, and Balakrishnan (1995).
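The following Python sketch illustrates how ALPHADELTA=, ALPHAINITIAL=, and MAXITER= interact in a Newton-Raphson search for the gamma shape parameter. It is not the procedure's implementation: it assumes the threshold is fixed at 0, solves the standard likelihood equation $\log\alpha - \psi(\alpha) = \log\bar{x} - \overline{\log x}$, starts from Thom's approximation, and uses hand-coded series approximations of the digamma and trigamma functions so that only the standard library is needed. The sample data are hypothetical.

```python
import math


def digamma(x):
    """psi(x) via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x):
    """psi'(x) via the recurrence psi'(x) = psi'(x + 1) + 1/x**2 and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def gamma_shape_mle(data, alphadelta=1e-5, maxiter=20):
    """Newton-Raphson for the gamma shape alpha (threshold fixed at 0)."""
    n = len(data)
    s = math.log(sum(data) / n) - sum(math.log(x) for x in data) / n
    # Thom's approximation as the starting value (plays the role of ALPHAINITIAL=)
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * s / 3.0)) / (4.0 * s)
    for _ in range(maxiter):                   # at most MAXITER= steps
        f = math.log(alpha) - digamma(alpha) - s
        f_prime = 1.0 / alpha - trigamma(alpha)
        new_alpha = alpha - f / f_prime
        if new_alpha <= 0.0:                   # keep the iterate positive
            new_alpha = alpha / 2.0
        change = abs(new_alpha - alpha)
        alpha = new_alpha
        if change < alphadelta:                # stop when the change is below ALPHADELTA=
            break
    return alpha


sample = [1.2, 2.5, 0.8, 3.1, 1.9, 2.2, 0.6, 1.4]  # hypothetical data
alpha_hat = gamma_shape_mle(sample)
sigma_hat = (sum(sample) / len(sample)) / alpha_hat  # scale estimate when theta = 0
print(f"alpha ~ {alpha_hat:.4f}, sigma ~ {sigma_hat:.4f}")
```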
CDELTA=value
specifies the change in successive estimates of c at which iterations terminate in the Newton-Raphson
approximation of the maximum likelihood estimate of c for Weibull distributions requested by the
WEIBULL option. Enclose the CDELTA= option in parentheses after the WEIBULL keyword.
Iteration continues until the change in c between consecutive steps is less than the value specified
or until the number of iterations exceeds the value of the MAXITER= option. The default value is
0.00001.
CINITIAL=value
specifies the initial value for $\hat{c}$ in the Newton-Raphson approximation of the maximum likelihood
estimate of c for Weibull distributions requested with the WEIBULL or WEIBULL2 option. The
default value is 1.8. See Johnson, Kotz, and Balakrishnan (1995).
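A similar Python sketch for the Weibull shape parameter shows the roles of CDELTA=, CINITIAL=, and MAXITER=. It assumes a two-parameter Weibull distribution with the threshold fixed at 0 and solves the standard likelihood equation $\sum x_i^c \log x_i / \sum x_i^c - 1/c - \overline{\log x} = 0$, starting from $c = 1.8$. The procedure's own algorithm may differ in detail, and the data are hypothetical.

```python
import math


def weibull_score(data, c):
    """Return g(c) and g'(c) for the Weibull shape likelihood equation
    g(c) = sum(x**c * ln x) / sum(x**c) - 1/c - mean(ln x) = 0 (threshold 0)."""
    logs = [math.log(x) for x in data]
    weights = [x ** c for x in data]
    total = sum(weights)
    m1 = sum(w * lx for w, lx in zip(weights, logs)) / total       # weighted mean of ln x
    m2 = sum(w * lx * lx for w, lx in zip(weights, logs)) / total  # weighted mean of (ln x)^2
    g = m1 - 1.0 / c - sum(logs) / len(logs)
    g_prime = (m2 - m1 * m1) + 1.0 / (c * c)
    return g, g_prime


def weibull_shape_mle(data, cinitial=1.8, cdelta=1e-5, maxiter=20):
    """Newton-Raphson iteration mirroring CINITIAL=, CDELTA=, and MAXITER=."""
    c = cinitial
    for _ in range(maxiter):
        g, g_prime = weibull_score(data, c)
        new_c = c - g / g_prime
        if new_c <= 0.0:          # keep the iterate positive
            new_c = c / 2.0
        change = abs(new_c - c)
        c = new_c
        if change < cdelta:       # stop when the change is below CDELTA=
            break
    return c


sample = [1.2, 2.5, 0.8, 3.1, 1.9, 2.2, 0.6, 1.4]  # hypothetical data
c_hat = weibull_shape_mle(sample)
# Scale estimate given c (threshold 0): sigma = (mean of x**c) ** (1/c)
sigma_hat = (sum(x ** c_hat for x in sample) / len(sample)) ** (1.0 / c_hat)
print(f"c ~ {c_hat:.4f}, sigma ~ {sigma_hat:.4f}")
```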
CONTENTS=‘string’
specifies the table of contents grouping entry for output produced by the plot statement. You can
specify CONTENTS=‘’ to suppress the grouping entry.
CPROP
CPROP=color | EMPTY
specifies the color for a horizontal bar whose length (relative to the width of the tile) indicates the
proportion of the total frequency that is represented by the corresponding cell in a comparative plot. By
default, no proportion bars are displayed. You can specify the keyword EMPTY to display empty bars.
For traditional graphics with the GSTYLE system option in effect, you can specify CPROP with no
argument to produce proportion bars using an appropriate color from the ODS style. The CPROP
option is not available with ODS Graphics.
HAXIS=value
specifies the name of an AXIS statement describing the horizontal axis.
HREF=values
draws reference lines that are perpendicular to the horizontal axis at the values that you specify. Also
see the CHREF= and LHREF= options.
HREFLABELS=‘label1’ . . . ‘labeln’
HREFLABEL=‘label1’ . . . ‘labeln’
HREFLAB=‘label1’ . . . ‘labeln’
specifies labels for the lines requested by the HREF= option. The number of labels must equal the
number of lines. Enclose each label in quotes. Labels can have up to 16 characters.
HREFLABPOS=n
specifies the vertical position of HREFLABELS= labels, as described in the following table:
n Position
1 Along top of plot
2 Staggered from top to bottom of plot
3 Along bottom of plot
4 Staggered from bottom to top of plot
INTERTILE=value
specifies the distance in horizontal percentage screen units between the framed areas, called tiles,
of a comparative plot. By default, INTERTILE=0.75 percentage screen units. You can specify
INTERTILE=0 to create contiguous tiles.
MAXITER=n
specifies the maximum number of iterations in the Newton-Raphson approximation of the maximum
likelihood estimate of $\alpha$ for gamma distributions requested with the GAMMA option and c for Weibull
distributions requested with the WEIBULL and WEIBULL2 options. Enclose the MAXITER= option
in parentheses after the GAMMA, WEIBULL, or WEIBULL2 keywords. The default value of n is 20.
NCOLS=n
NCOL=n
specifies the number of columns per panel in a comparative plot. By default, NCOLS=1 if you specify
only one CLASS variable, and NCOLS=2 if you specify two CLASS variables. If you specify two
CLASS variables, you can use the NCOLS= option with the NROWS= option.
NOHLABEL
suppresses the label for the horizontal axis. You can use this option to reduce clutter.
NOVLABEL
suppresses the label for the vertical axis. You can use this option to reduce clutter.
NOVTICK
suppresses the tick marks and tick mark labels for the vertical axis. This option also suppresses the
label for the vertical axis.
NROWS=n
NROW=n
specifies the number of rows per panel in a comparative plot. By default, NROWS=2. If you specify
two CLASS variables, you can use the NCOLS= option with the NROWS= option.
ODSFOOTNOTE2=FOOTNOTE2 | ‘string’
adds a secondary footnote to ODS Graphics output. If you specify the FOOTNOTE2 keyword, the
value of the SAS FOOTNOTE2 statement is used as the secondary graph footnote. If you specify a quoted
string, it is used as the secondary footnote. The quoted string can contain any of the following
escaped characters, which are replaced with the appropriate values from the analysis:
ODSTITLE=TITLE | NONE | DEFAULT | LABELFMT | ‘string’
specifies a title for ODS Graphics output. You can specify the following keywords:
TITLE (or TITLE1) uses the value of the SAS TITLE statement as the graph title.
NONE suppresses all titles from the graph.
DEFAULT uses the default ODS Graphics title (a descriptive title consisting of the plot type
and the analysis variable name).
LABELFMT uses the default ODS Graphics title with the variable label instead of the variable
name.
If you specify a quoted string, it is used as the graph title. The quoted string can contain the following
escaped characters, which are replaced with the appropriate values from the analysis:
ODSTITLE2=TITLE2 | ‘string’
specifies a secondary title for ODS Graphics output. If you specify the TITLE2 keyword, the value
of the SAS TITLE2 statement is used as the secondary graph title. If you specify a quoted string, it is
used as the secondary title. The quoted string can contain the following escaped characters, which are
replaced with the appropriate values from the analysis:
OVERLAY
specifies that plots associated with different levels of a CLASS variable be overlaid onto a single plot,
rather than displayed as separate cells in a comparative plot. If you specify the OVERLAY option with
one CLASS variable, the output associated with each level of the CLASS variable is overlaid on a
single plot. If you specify the OVERLAY option with two CLASS variables, a comparative plot based
on the first CLASS variable’s levels is produced. Each cell in this comparative plot contains overlaid
output associated with the levels of the second CLASS variable.
The OVERLAY option applies only to ODS Graphics output. It is not available in the COMPHISTOGRAM
statement.
SCALE=value
is an alias for the SIGMA= option for distributions requested by the BETA, EXPONENTIAL, GAMMA,
SB, SU, WEIBULL, and WEIBULL2 options and for the ZETA= option for distributions requested by
the LOGNORMAL option.
SHAPE=value
is an alias for the ALPHA= option for distributions requested by the GAMMA option, for the SIGMA=
option for distributions requested by the LOGNORMAL option, and for the C= option for distributions
requested by the WEIBULL and WEIBULL2 options.
STATREF=keyword-list
draws reference lines at the values of the statistics requested in the keyword-list . These reference lines
are perpendicular to the horizontal axis in a histogram or cdf plot, and perpendicular to the vertical
axis in a probability or Q-Q plot (unless the ROTATE option is specified). The STATREF= option does
not apply to the PPPLOT statement.
Valid keywords are listed in the following table:
Keyword Statistic
MAX Largest value
MEAN Sample mean
MEDIAN | Q2 Median (50th percentile)
MIN Smallest value
MODE Most frequent value
P pctl pctlth percentile
Q1 Lower quartile (25th percentile)
Q3 Upper quartile (75th percentile)
factor STD factor standard deviations from the mean
Note that the factor specified with the STD keyword can be positive (which puts a reference line above
the mean) or negative (below the mean).
Also see the CSTATREF=, LSTATREF=, STATREFLABELS=, and STATREFSUBCHAR= options.
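As an illustration of where some of these lines fall, the following Python sketch computes axis values for the MIN, MAX, MEAN, MEDIAN, and factor STD keywords. The helper `statref_positions` is hypothetical, and it assumes that the standard deviation uses the $n - 1$ divisor. Percentile keywords are omitted because their exact values depend on the percentile definition the procedure uses.

```python
from statistics import mean, median, stdev


def statref_positions(data, std_factors=()):
    """Hypothetical helper: axis values for a few STATREF= keywords.
    'factor STD' is computed as mean + factor * standard deviation."""
    m = mean(data)
    s = stdev(data)  # assumption: n - 1 divisor
    positions = {"MIN": min(data), "MAX": max(data),
                 "MEAN": m, "MEDIAN": median(data)}
    for factor in std_factors:
        # a positive factor places the line above the mean, a negative one below it
        positions[f"{factor} STD"] = m + factor * s
    return positions


lines = statref_positions([1, 2, 3, 4, 5], std_factors=(2, -2))
print(lines)
```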
STATREFLABELS=‘label1’ . . . ‘labeln’
STATREFLABEL=‘label1’ . . . ‘labeln’
STATREFLAB=‘label1’ . . . ‘labeln’
specifies labels for the lines requested by the STATREF= option. The number of labels must equal the
number of lines. Enclose each label in quotes. Labels can have up to 16 characters.
STATREFSUBCHAR=‘keyword-list’
specifies a substitution character (such as #) for labels specified with the STATREFLABELS= option.
When the labels are displayed on a graph, the first occurrence of the specified character in each label is
replaced with the value of the corresponding STATREF= statistic.
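The substitution rule can be sketched in Python as a single replacement of the first matching character. The helper name and the numeric format (`{:.4g}`) are assumptions for illustration; the procedure's own formatting of the statistic may differ.

```python
def substitute_statref_label(label, subchar, value, fmt="{:.4g}"):
    """Replace only the first occurrence of subchar in label with the
    formatted statistic value (the format is an assumption)."""
    return label.replace(subchar, fmt.format(value), 1)


print(substitute_statref_label("Mean = #", "#", 29.5))  # Mean = 29.5
```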
VAXIS=name
VAXIS=value-list
specifies the name of an AXIS statement describing the vertical axis. In a COMPHISTOGRAM or
HISTOGRAM statement, you can alternatively specify a value-list for the vertical axis.
VAXISLABEL=‘label’
specifies a label for the vertical axis. Labels can have up to 40 characters.
VREF=value-list
draws reference lines perpendicular to the vertical axis at the values specified. Also see the CVREF=
and LVREF= options.
VREFLABELS=‘label1’. . . ‘labeln’
VREFLABEL=‘label1’. . . ‘labeln’
VREFLAB=‘label1’. . . ‘labeln’
specifies labels for the lines requested by the VREF= option. The number of labels must equal the
number of lines. Enclose each label in quotes. Labels can have up to 16 characters.
VREFLABPOS=n
specifies the horizontal position of VREFLABELS= labels. If you specify VREFLABPOS=1, the
labels are positioned at the left of the plot. If you specify VREFLABPOS=2, the labels are positioned
at the right of the plot. By default, VREFLABPOS=1 for traditional graphics and 2 for ODS Graphics.
Options for Traditional Graphics
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
specifies an input data set that contains annotate variables, as described in SAS/GRAPH: Reference, for
annotating traditional graphics. The ANNOTATE= data set you specify in the plot statement is used
for all plots created by the statement. You can also specify an ANNOTATE= data set in the PROC
CAPABILITY statement to enhance all plots created by the procedure (see “ANNOTATE= Data Sets”
on page 232).
CAXIS=color
CAXES=color
CA=color
specifies the color for the axes and tick marks. This option overrides any COLOR= specifications in an
AXIS statement.
CFRAME=color
specifies the color for the area that is enclosed by the axes and frame.
CFRAMESIDE=color
specifies the color to fill the frame area for the row labels that display along the left side of a comparative
plot. This color also fills the frame area for the label of the corresponding CLASS variable, if you
associate a label with the variable.
CFRAMETOP=color
specifies the color to fill the frame area for the column labels that display across the top of a comparative
plot. This color also fills the frame area for the label of the corresponding CLASS variable, if you
associate a label with the variable.
CHREF=color | (color-list)
CH=color | (color-list)
specifies the colors for horizontal axis reference lines requested by the HREF= option. If you specify a
single color, it is used for all HREF= lines. Otherwise, if there are fewer colors specified than reference
lines requested, the remaining lines are displayed with the default reference line color. You can also
specify the value _default in the color list to request the default color.
COLOR=color
COLOR=color-list
specifies the color of the curve or reference line associated with a distribution or kernel density estimate.
Enclose the COLOR= option in parentheses after a distribution option or the KERNEL option. In a
HISTOGRAM statement, you can specify a list of colors in parentheses for multiple density curves.
CSTATREF=color | (color-list)
specifies the colors for reference lines requested by the STATREF= option. If you specify a single
color, it is used for all STATREF= lines. Otherwise, if there are fewer colors specified than reference
lines requested, the remaining lines are displayed with the default reference line color. You can also
specify the value _default in the color list to request the default color.
CTEXT=color
CT=color
specifies the color for tick mark values and axis labels.
CTEXTSIDE=color
specifies the color for the row labels that display along the left side of a comparative plot. If you do
not specify the CTEXTSIDE= option, the color specified with the CTEXT= option is used. You can
specify the CFRAMESIDE= option to change the background color for the row labels.
CTEXTTOP=color
specifies the color for the column labels that display across the top of a comparative plot. If you do
not specify the CTEXTTOP= option, the color specified with the CTEXT= option is used. You can use
the CFRAMETOP= option to change the background color for the column labels.
CVREF=color | (color-list)
CV=color | (color-list)
specifies the colors for lines requested with the VREF= option. If you specify a single color, it is used
for all VREF= lines. Otherwise, if there are fewer colors specified than reference lines requested, the
remaining lines are displayed with the default reference line color. You can also specify the value
_default in the color list to request the default color.
DESCRIPTION=‘string’
DES=‘string’
specifies a description, up to 256 characters long, for the GRSEG catalog entry for a traditional graphics
chart. The default value is the analysis variable name.
FONT=font
specifies a font for reference line and axis labels. You can also specify fonts for axis labels in an AXIS
statement. The FONT= option takes precedence over the FTEXT= font specified in the GOPTIONS
statement. For a list of software fonts, see SAS/GRAPH: Reference.
HEIGHT=value
specifies the height, in percentage screen units, of text for axis labels, tick mark labels, and legends.
This option takes precedence over the HTEXT= option in the GOPTIONS statement.
HMINOR=n
HM=n
specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor
tick marks are not labeled. By default, HMINOR=0.
INFONT=font
specifies a font to use for text inside the framed areas of the plot. The INFONT= option takes
precedence over the FTEXT= option in the GOPTIONS statement. For a list of software fonts, see
SAS/GRAPH: Reference.
INHEIGHT=value
specifies the height, in percentage screen units, of text used inside the framed areas of the plot. If you
do not specify the INHEIGHT= option, the height specified with the HEIGHT= option is used.
L=linetype
L=linetype-list
specifies the line type of the curve or reference line associated with a distribution or kernel density
estimate. Enclose the L= option in parentheses after the distribution option or the KERNEL option.
In a HISTOGRAM statement, you can specify a list of line types in parentheses for multiple density
curves.
LHREF=linetype | linetype-list
LH=linetype | linetype-list
specifies the line types for the reference lines that you request with the HREF= option. If you specify a
single line type, it is used for all HREF= lines. Otherwise, if there are fewer line types specified than
reference lines requested, the remaining lines are displayed with the default reference line type. You
can also specify line type 0 to request the default line type.
LSTATREF=linetype | linetype-list
specifies the line types for the reference lines that you request with the STATREF= option. If you
specify a single line type, it is used for all STATREF= lines. Otherwise, if there are fewer line types
specified than reference lines requested, the remaining lines are displayed with the default reference
line type. You can also specify line type 0 to request the default line type.
LVREF=linetype | linetype-list
LV=linetype | linetype-list
specifies the line types for lines requested with the VREF= option. If you specify a single line type,
it is used for all VREF= lines. Otherwise, if there are fewer line types specified than reference lines
requested, the remaining lines are displayed with the default reference line type. You can also specify
line type 0 to request the default line type.
NAME=‘string’
specifies the name of the GRSEG catalog entry for a traditional graphics plot, and the name of the
graphics output file if one is created. The name can be up to 256 characters long, but the GRSEG name
is truncated to eight characters. The default value is ‘CAPABILI’.
NOFRAME
suppresses the frame around the subplot area.
TURNVLABELS
TURNVLABEL
turns the characters in the vertical axis labels so that they display vertically.
VMINOR=n
VM=n
specifies the number of minor tick marks between each major tick mark on the vertical axis. Minor
tick marks are not labeled. The default is zero.
W=value
W=value-list
specifies the width in pixels of the curve or reference line associated with a distribution or kernel
density estimate. Enclose the W= option in parentheses after the distribution option or the KERNEL
option. In a HISTOGRAM statement, you can specify a list of widths in parentheses for multiple
density curves.
WAXIS=n
specifies the line thickness, in pixels, for the axes and frame.
VREFCHAR=‘character’
specifies the character used to form the lines requested by the VREF= option for a line printer chart.
The default is the hyphen (-).
References
Bai, D. S., and Choi, I. S. (1997). “Process Capability Indices for Skewed Populations.” Unpublished
manuscript, Korean Advanced Institute of Science and Technology, Taejon, Korea.
Bissell, A. F. (1990). “How Reliable Is Your Capability Index?” Journal of the Royal Statistical Society,
Series C 39:331–340.
Blom, G. (1958). Statistical Estimates and Transformed Beta Variables. New York: John Wiley & Sons.
Boyles, R. A. (1991). “The Taguchi Capability Index.” Journal of Quality Technology 23:107–126.
Boyles, R. A. (1992). Cpm for Asymmetrical Tolerances. Technical report, Precision Castparts Corp.,
Portland, OR.
Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. (1983). Graphical Methods for Data
Analysis. Belmont, CA: Wadsworth International Group.
Chen, H. F., and Kotz, S. (1996). “An Asymptotic Distribution of Wright’s Process Capability Index Sensitive
to Skewness.” Journal of Statistical Computation and Simulation 55:147–158.
Chen, K. S. (1998). “Incapability Index with Asymmetric Tolerances.” Statistica Sinica 8:253–262.
Chou, Y., Owen, D. B., and Borrego, S. A. (1990). “Lower Confidence Limits on Process Capability Indices.”
Journal of Quality Technology 22:223–229; corrigenda, 24, 251.
Conover, W. J. (1980). Practical Nonparametric Statistics. 2nd ed. New York: John Wiley & Sons.
Croux, C., and Rousseeuw, P. J. (1992). “Time-Efficient Algorithms for Two Highly Robust Estimators of
Scale.” Computational Statistics 1:411–428.
D’Agostino, R. B., and Stephens, M., eds. (1986). Goodness-of-Fit Techniques. New York: Marcel Dekker.
Dixon, W. J., and Tukey, J. W. (1968). “Approximate Behavior of the Distribution of Winsorized t
(Trimming/Winsorization 2).” Technometrics 10:83–98.
Ekvall, D. N., and Juran, J. M. (1974). “Manufacturing Planning.” In Quality Control Handbook, 3rd ed.,
edited by J. M. Juran. New York: McGraw-Hill.
Elandt, R. C. (1961). “The Folded Normal Distribution: Two Methods of Estimating Parameters from
Moments.” Technometrics 3:551–562.
Gnanadesikan, R. (1997). Statistical Data Analysis of Multivariate Observations. New York: John Wiley &
Sons.
Grimshaw, S. D. (1993). “Computing Maximum Likelihood Estimates for the Generalized Pareto Distribution.”
Technometrics 35:185–191.
Gupta, A. K., and Kotz, S. (1997). “A New Process Capability Index.” Metrika 45:213–224.
Hahn, G. J. (1969). “Factors for Calculating Two-Sided Prediction Intervals for Samples from a Normal
Distribution.” Journal of the American Statistical Association 64:878–898.
Hahn, G. J. (1970a). “Additional Factors for Calculating Prediction Intervals for Samples from a Normal
Distribution.” Journal of the American Statistical Association 65:1668–1676.
Hahn, G. J. (1970b). “Statistical Intervals for a Normal Population, Part 2: Formulas, Assumptions, Some
Derivations.” Journal of Quality Technology 2:195–206.
Hahn, G. J., and Meeker, W. Q. (1991). Statistical Intervals: A Guide for Practitioners. New York: John
Wiley & Sons.
Iman, R. L. (1974). “Use of a t-Statistic as an Approximation to the Exact Distribution of the Wilcoxon
Signed Rank Statistic.” Communications in Statistics 3:795–806.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions. 2nd ed. Vol. 1.
New York: John Wiley & Sons.
Johnson, N. L., Kotz, S., and Balakrishnan, N. (1995). Continuous Univariate Distributions. 2nd ed. Vol. 2.
New York: John Wiley & Sons.
Johnson, N. L., Kotz, S., and Kemp, A. W. (1992). Univariate Discrete Distributions. 2nd ed. New York:
John Wiley & Sons.
Johnson, N. L., Kotz, S., and Pearn, W. L. (1994). “Flexible Process Capability Indices.” Pakistan Journal of
Statistics 10:23–31.
Kotz, S., and Johnson, N. L. (1993). Process Capability Indices. London: Chapman & Hall.
Kotz, S., and Lovelace, C. R. (1998). Process Capability Indices in Theory and Practice. London: Edward
Arnold.
Krishnamoorthy, K., and Mathew, T. (2009). Statistical Tolerance Regions: Theory, Applications, and
Computation. Hoboken, NJ: John Wiley & Sons.
Kushler, R. H., and Hurley, P. (1992). “Confidence Bounds for Capability Indices.” Journal of Quality
Technology 24:188–195.
Lehmann, E. L., and D’Abrera, H. J. M. (1975). Nonparametrics: Statistical Methods Based on Ranks. San
Francisco: Holden-Day.
Luceño, A. (1996). “A Process Capability Index with Reliable Confidence Intervals.” Communications in
Statistics—Simulation and Computation 25:235–245.
Marcucci, M. O., and Beazley, C. F. (1988). “Capability Indices: Process Performance Measures.”
Transactions of ASQC Congress 42:516–523.
Montgomery, D. C. (1996). Introduction to Statistical Quality Control. 3rd ed. New York: John Wiley &
Sons.
Odeh, R. E., and Owen, D. B. (1980). Tables for Normal Tolerance Limits, Sampling Plans, and Screening.
New York: Marcel Dekker.
Owen, D. B., and Hua, T. A. (1977). “Tables of Confidence Limits on the Tail Area of the Normal Distribution.”
Communications in Statistics—Simulation and Computation 6:285–311.
Pearn, W. L., Kotz, S., and Johnson, N. L. (1992). “Distributional and Inferential Properties of Process
Capability Indices.” Journal of Quality Technology 24:216–231.
Rodriguez, R. N., and Bynum, R. A. (1992). “Examples of Short Run Process Control Methods with the
SHEWHART Procedure in SAS/QC Software.” Unpublished manuscript available from the authors.
Rousseeuw, P. J., and Croux, C. (1993). “Alternatives to the Median Absolute Deviation.” Journal of the
American Statistical Association 88:1273–1283.
Royston, J. P. (1992). “Approximating the Shapiro-Wilk W Test for Nonnormality.” Statistics and Computing
2:117–119.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. New York: Chapman & Hall.
Slifker, J. F., and Shapiro, S. S. (1980). “The Johnson System: Selection and Parameter Estimation.”
Technometrics 22:239–246.
Terrell, G. R., and Scott, D. W. (1985). “Oversmoothed Nonparametric Density Estimates.” Journal of the
American Statistical Association 80:209–214.
Tukey, J. W., and McLaughlin, D. H. (1963). “Less Vulnerable Confidence and Significance Procedures for
Location Based on a Single Sample: Trimming/Winsorization 1.” Sankhyā, Series A 25:331–352.
Vännman, K. (1997). “A General Class of Capability Indices in the Case of Asymmetric Tolerances.”
Communications in Statistics—Theory and Methods 26:2049–2072.
Velleman, P. F., and Hoaglin, D. C. (1981). Applications, Basics, and Computing of Exploratory Data
Analysis. Boston: Duxbury Press.
Wadsworth, H. M., Stephens, K. S., and Godfrey, A. B. (1986). Modern Methods for Quality Control and
Improvement. New York: John Wiley & Sons.
Wainer, H. (1974). “The Suspended Rootogram and Other Visual Displays: An Empirical Validation.”
American Statistician 28:143–145.
Wilk, M. B., and Gnanadesikan, R. (1968). “Probability Plotting Methods for the Analysis of Data.”
Biometrika 49:525–545.
Wright, P. A. (1995). “A Process Capability Index Sensitive to Skewness.” Journal of Statistical Computation
and Simulation 52:195–203.
Zhang, N. F., Stenback, G. A., and Wardrop, D. M. (1990). “Interval Estimation of Process Capability Index
Cpk.” Communications in Statistics—Theory and Methods 19:4455–4470.