Decision Analysis Using Microsoft Excel PDF
DECISION ANALYSIS USING MICROSOFT EXCEL
SPRING 2006
Michael R. Middleton
School of Business and Management
University of San Francisco
RandBiVarNormal
RandCumulative
RandDiscrete
RandExponential
RandInteger
RandNormal
RandSample
RandPoisson
RandTriangular
RandUniform
7.10 RiskSim Technical Details
7.11 Modeling Uncertain Relationships
Base Model, Four Inputs
Three Inputs
Two Inputs
Four Inputs with Three Uncertainties
Intermediate Details
Chapter 1 introduces the terminology for decision models that is used throughout the
book. Several ways to describe a decision problem are discussed, including spreadsheet
models, influence charts, decision trees, and consequence tables.
Chapter 2 contains the documentation and examples for the SensIt sensitivity analysis
add-in for Excel.
Chapter 3 discusses multi-attribute utility, which is a useful model for decision problems
with conflicting objectives. The discussion includes extensive sensitivity analysis for
multi-attribute utility using standard Excel features.
[Figure: modeling process diagram with the elements Difficult Problem, Abstraction, Math Model, Operations on Model, Model Results, and Implementation]
[Figure: influence chart. A rectangle denotes a controllable input; a rounded rectangle or oval denotes any other variable. Elements: Performance Measure (Output), Intermediate Variables, Controllable Factor (Input), Uncontrollable Factor (Input)]
16 Chapter 1 Introduction to Decision Modeling
[Figure: influence chart with output Net Cash Flow and intermediate variables including Total Costs, Sales Revenue, and Total Variable Cost]
[Figure: decision tree notation. A decision with many possible alternatives and an event with many possible outcomes are each drawn as a fan; the end value is Net Cash Flow]
Input Values
Type numbers in the Start, Step, and Stop edit boxes to specify values to be used in the
input variable’s cell. Cell references are not allowed.
Click OK: SensIt uses the Start, Step, and Stop values to prepare a table of values. Each
value is copied to the input variable Value cell, the worksheet is recalculated, and the
value of the output variable Value cell is copied to the table. (You could do this manually
in Excel using the Edit | Fill | Series and Data | Table commands.) SensIt uses the paired
input and output values to prepare an XY (Scatter) chart. The text in the label cells you
identified is used for the chart’s axis labels. (You could do this manually using the
ChartWizard.)
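The Start, Step, and Stop procedure can be sketched outside Excel as a simple loop. The following Python fragment is only an illustration: the function names sweep and annual_profit are hypothetical, and the linear profit model is an assumed stand-in for a worksheet, not the SensIt implementation.

```python
def sweep(model, start, step, stop):
    """Evaluate model at start, start + step, ..., stop; return (input, output) pairs."""
    count = int(round((stop - start) / step)) + 1
    return [(start + i * step, model(start + i * step)) for i in range(count)]


def annual_profit(hours_flown):
    # Hypothetical linear stand-in for a worksheet model (an assumption).
    return 42.5 * hours_flown - 20_000


# Each input value is plugged in, the "worksheet" is recalculated,
# and the output is recorded in a table of paired values.
table = sweep(annual_profit, start=500, step=50, stop=1000)
for hours, profit in table:
    print(f"{hours:>6} {profit:>10,.0f}")
```

The paired values in table correspond to the points that would be plotted on the XY (Scatter) chart.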
2.6 Many Inputs, Many Outputs Tornado 23
Hours Flown   Annual Profit
        500          $1,250
        550          $3,375
        600          $5,500
        650          $7,625
        700          $9,750
        750         $11,875
        800         $14,000
        850         $16,125
        900         $18,250
        950         $20,375
       1000         $22,500
[Chart: Annual Profit versus Hours Flown]
From the table and chart, we observe that Eagle must fly approximately 480 hours to
achieve a positive profit, assuming all other assumptions stay the same. The exact
threshold value for Hours Flown could be obtained using Excel's Goal Seek feature.
The uncertainty about Capacity of Scheduled Flights is associated with the widest swing
in Annual Profit.
[Tornado chart: horizontal axis is Annual Profit, from -$15,000 to $40,000]
2.10 SPIDER
Use SensIt’s Spider option to see how your model’s output depends on the same
percentage changes for each of the model’s input variables.
Click OK: SensIt Spider uses the Start (%), Step (%), and Stop (%) values and the
original (base case) numeric value in each input variable cell to prepare a table of
percentage change input values. For each input variable, all other input values are set at
their base case values, each percentage change input value is copied to the input variable
cell, the worksheet is recalculated, and the value of the output variable cell is copied to
the table. SensIt prepares two XY (Scatter) charts; the horizontal axis is percentage
change of input variables; the vertical axis is model output value on one chart and
percentage change of model output value on the other; the input variables’ labels are used
for chart legends.
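The spider calculation described above can be sketched in Python as follows. This is a minimal illustration, not SensIt's code: the function spider, the three-input profit model, and the base-case numbers are all hypothetical assumptions.

```python
def spider(model, base, percents):
    """Vary one input at a time to pct% of its base value; others stay at base case."""
    base_output = model(**base)
    result = {}
    for name in base:
        series = []
        for pct in percents:
            inputs = dict(base)                      # reset all inputs to base case
            inputs[name] = base[name] * pct / 100.0  # change only this input
            output = model(**inputs)
            # Record output value and output as a percentage of the base-case output.
            series.append((pct, output, 100.0 * output / base_output))
        result[name] = series
    return result


def profit(price, units, fixed_cost):
    # Hypothetical model used only for illustration.
    return price * units - fixed_cost


base = {"price": 10.0, "units": 1000.0, "fixed_cost": 4000.0}
result = spider(profit, base, percents=range(50, 151, 10))
```

Each entry of result holds one line of the spider chart: the second element of each tuple feeds the chart of output values, and the third feeds the chart of percentage change in output.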
2.11 Tips for Many Inputs, One Output 29
[Spider chart: model output (vertical axis, -$15,000 to $15,000) versus Input Value as % of Base Case (horizontal axis, 50% to 150%). Series: Operating Cost/Hour, Hours Flown, Charter Price/Hour, Proportion of Chartered Flights, Ticket Price/Hour, Insurance]
Alternatively, in some situations the values for each input variable may have lower and
upper bounds, so you may specify low and high values that are the absolute lowest and
highest possible values.
When you click OK, SensIt sets all of the input variables to their base-case values and
records the output value. Then SensIt goes through each of the input variables one at a
time, plugs the low-case value into the input cell, and records the value in the output cell.
It then repeats the process for the high case. For each substitution, all input values are
kept at their base-case values except for the single input value that is set at its low or high
value. SensIt then produces a spreadsheet that lists the numerical results as shown in
columns F, G, and H of the worksheet with the tornado chart.
In the worksheet, the variables are sorted by their "swing" -- the absolute value of the
difference between the output values in the low and high cases. "Swing" serves as a
rough measure of the impact of each input variable. The rows of numerical output are
sorted from highest swing at the top down to lowest swing at the bottom. Then SensIt
creates a bar chart of the sorted data.
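The low-case and high-case substitutions and the sort by swing can be sketched in Python. The function tornado, the profit model, and the low, base, and high values below are hypothetical assumptions chosen for illustration; this is not the SensIt implementation.

```python
def tornado(model, base, low, high):
    """One-at-a-time low/high substitution, sorted by swing (largest first)."""
    rows = []
    for name in base:
        low_output = model(**{**base, name: low[name]})    # only this input at low
        high_output = model(**{**base, name: high[name]})  # only this input at high
        swing = abs(high_output - low_output)
        rows.append((name, low_output, high_output, swing))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def profit(price, units, fixed_cost):
    # Hypothetical model used only for illustration.
    return price * units - fixed_cost


base = {"price": 10.0, "units": 1000.0, "fixed_cost": 4000.0}
low = {"price": 9.0, "units": 700.0, "fixed_cost": 3500.0}
high = {"price": 12.0, "units": 1100.0, "fixed_cost": 5000.0}

rows = tornado(profit, base, low, high)
for name, low_output, high_output, swing in rows:
    print(f"{name:<12} {low_output:>8,.0f} {high_output:>8,.0f} {swing:>8,.0f}")
```

The sorted rows are the data behind the bar chart: the input with the widest swing appears first.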
In general, you should focus your modeling efforts on those variables with the greatest
impact on the value measure.
If your model has input variables that are discrete or categorical, you should create
multiple tornado charts using different base case values of that input variable. For
example, if your model has an input variable "Government Regulation" that has possible
values 0 (zero) or 1, the low and high values will be 0 and 1, but you should run one
tornado chart with base case = 0 and another tornado chart with base case = 1.
2.12 Eagle Airlines Problem 31
Figure 2.14 Ten-Variable Worst Case and Best Case Inputs Determined by Solver
Variable Worst Case Base Case Best Case
Hours Flown 1000 800 1000
Charter Price/Hour $300 $325 $350
Ticket Price/Hour $95 $100 $108
Capacity of Scheduled Flights 40% 50% 60%
Proportion Of Chartered Flights 0.45 0.5 0.7
Operating Cost/Hour $260 $245 $230
Insurance $25,000 $20,000 $18,000
Proportion Financed 0.5 0.4 0.3
Interest Rate 13.0% 11.5% 10.5%
Purchase Price $90,000 $87,500 $85,000
Energy to produce
Cost
Environmental waste
Customer service
Selecting a best job
Monetary compensation
Geographical location
Travel requirements
Nature of work
Attribute Scores
[Chart: attribute score versus Life Span, in years (5 to 13)]
[Chart: attribute score versus Price ($5,000 to $20,000)]
36 Chapter 3 Multiattribute Utility
[Chart: attribute score versus Color (Red, Blue, Yellow)]
Swing Weights
Swing Weights (worksheet columns A:G, rows 1:10)

Attribute Swung from     Consequence to Compare
Worst to Best            Life span   Price     Color    Rank   Rate   Weight
(Benchmark)              6 years     $17,000   red         4      0    0.000
Life span                12 years    $17,000   red         2     75    0.405
Price                    6 years     $8,000    red         1    100    0.541
Color                    6 years     $17,000   yellow      3     10    0.054
Sum of rates                                                    185
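The Weight column is each Rate divided by the sum of the rates (185). A short Python check of that arithmetic, using the rates from the table (the variable names are illustrative):

```python
# Rates assessed for swinging each attribute from worst to best.
rates = {"Life span": 75, "Price": 100, "Color": 10}

total = sum(rates.values())  # 185
weights = {name: rate / total for name, rate in rates.items()}

for name, weight in weights.items():
    print(f"{name:<10} {weight:.3f}")
```

The weights sum to one by construction, which is what the additive utility function requires.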
Overall Scores
Other attributes might be important, e.g., comfort and prestige. The cost attribute should
include operating costs, insurance, and salvage value, in addition to purchase price. It
might be appropriate to combine the cost and lifetime attributes into a single attribute,
e.g., cost per year. Clemen [1] suggests that a set of attributes should be complete (so that
all important objectives are included), as small as possible (to facilitate analysis), not
redundant (to avoid double-counting a common underlying characteristic), and
decomposable (so that the decision maker can think about each attribute separately).
Dominance
An alternative can be eliminated if another alternative is better on some objectives and no
worse on the others. The Garnett is more expensive than the Delta, has the same lifetime,
and has a lower safety rating. So the Garnett can be eliminated from further
consideration.
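The dominance test can be written as a small function. The sketch below assumes every attribute has been scored so that a larger number is better; the scores for Delta and Garnett are hypothetical values chosen to match the description (worse on cost, equal on lifetime, worse on safety), not data from the book.

```python
def dominates(a, b):
    """True if a is no worse than b on every attribute and strictly better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    better_somewhere = any(x > y for x, y in zip(a, b))
    return no_worse and better_somewhere


# Hypothetical scores in the order (cost, lifetime, safety); larger is better.
delta = (0.6, 0.5, 0.7)
garnett = (0.4, 0.5, 0.3)

print(dominates(delta, garnett))  # Delta dominates Garnett
print(dominates(garnett, delta))
```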
Compared to the assessments for individual utility, the assessments for tradeoffs are
usually much more difficult to make. The following sections focus on assessments of
tradeoff weights and sensitivity analysis.
With three attributes, the two assessed weight ratios determine two equations and the
requirement that the weights sum to one determines a third equation. Using algebra, a
solution for the three unknown weights is shown in cells J8:J10 in Figure 5.
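The algebra can be sketched in Python. If the two assessed ratios are Cost/Safety and Cost/Lifetime, then the safety weight is the cost weight divided by the first ratio, the lifetime weight is the cost weight divided by the second ratio, and the sum-to-one requirement fixes the cost weight. The function name and the ratio values 1.5 and 5 used below are assumptions for illustration.

```python
def weights_from_ratios(cost_to_safety, cost_to_lifetime):
    """Solve w_cost + w_safety + w_lifetime = 1 given the two weight ratios."""
    # w_cost * (1 + 1/cost_to_safety + 1/cost_to_lifetime) = 1
    w_cost = 1.0 / (1.0 + 1.0 / cost_to_safety + 1.0 / cost_to_lifetime)
    return {
        "cost": w_cost,
        "safety": w_cost / cost_to_safety,
        "lifetime": w_cost / cost_to_lifetime,
    }


weights = weights_from_ratios(cost_to_safety=1.5, cost_to_lifetime=5.0)
for name, weight in weights.items():
    print(f"{name:<9} {weight:.4f}")
```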
The formula for overall utility in cell B7, with a relative reference to the attribute utilities
in B3:B5 and an absolute reference to the weights in J8:J10, is copied to cells C7:G7.
The MAX worksheet function determines the maximum overall utility in B7:G7, the
MATCH function determines the location of that maximum in B7:G7, and the INDEX
function returns the alternative name located in B2:G2. The zero argument in the
MATCH function is needed to specify that an exact match is required; the zero argument
in the INDEX function is used as a placeholder and could be omitted in this application
without affecting the results. Cell B13 combines these functions into a single formula.
After deleting cells A9:B12, the single formula is in cell B9. The arrangement shown in
Figure 6 is used for the remaining analyses.
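The same MAX, MATCH, and INDEX logic can be sketched in Python. The alternative names, utilities, and weights below are hypothetical; only the sequence of steps follows the text. Note that MATCH returns a 1-based position, whereas a Python list index is 0-based.

```python
# Hypothetical weights and individual utilities, one list per alternative,
# with attributes in the same order as the weights.
weights = [0.5, 0.3, 0.2]
names = ["Alpha", "Beta", "Gamma"]
utilities = [
    [1.0, 0.0, 0.5],  # Alpha
    [0.5, 1.0, 0.0],  # Beta
    [0.0, 0.6, 1.0],  # Gamma
]

# Overall utility: weighted sum of individual utilities for each alternative.
overall = [sum(w * u for w, u in zip(weights, column)) for column in utilities]

best = max(overall)             # like MAX over the overall utilities
position = overall.index(best)  # like MATCH(best, range, 0), but 0-based
choice = names[position]        # like INDEX over the alternative names
print(choice)
```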
3.3 Sensitivity Analysis Methods 43
Cell P7, corresponding to the original assessments, has a border. The data table is
dynamic, so the macro view may be refined near the base-case assessments by specifying
different input values.
Figure 8 shows that the Cost/Safety weight ratio must be less than 1.2 to affect the
choice. If the decision maker regards 1.2 as "far away" from 1.5, then the Egret choice is
appropriate. Otherwise, the decision maker should think more carefully about the original
assessments before making a choice based on this analysis. The assessment of the
Cost/Lifetime weight ratio is not as critical, because any value between 4 and 6 yields the
same choice.
4) Sum the scores, as shown in cell N9. In the additive utility function, the weight
for each attribute equals the score divided by the sum of the scores. (The algebra
solution, not shown here, is based on the special zero and one individual utility
values of the hypothetical alternatives.) Formulas are shown in Figure 10.
The individual utility values are in a column, and the weights are in a row. The
SUMPRODUCT function requires that the two arrays for its arguments have the same
orientation, so the TRANSPOSE function converts the weights into a column format, as
shown in Figure 11. The function in B7 must be array-entered; after typing the function,
hold down Control and Shift while you press Enter.
The results in the left table of Figure 13, cells Q13:R28, indicate that the Best-Lifetime
score must be greater than 30 to affect the choice. A refined data table in cells T13:U19
shows that the score must be greater than 33 before the choice changes from Egret to
Cruiser. If the decision maker regards 33 as "far away" from 20, then the Egret choice is
appropriate.
Figure 14 shows a similar sensitivity analysis for the Best-Safety score. The assessed
score of 70 must be greater than 89 to affect the choice.
To construct a two-way data table for sensitivity analysis of the swing weight
assessments as shown in Figure 15, enter a set of values in a row, R4:V4, and another set
of values in a column, Q5:Q13. In the top left cell of the data table, Q4, enter a formula
for determining the data table's output values, =B9. (To improve the appearance of the
table, cell Q4 is formatted with a custom three-semicolon format so that the formula
result is not displayed.) Select Q4:V13. Choose Data | Table. In the Data Table dialog
box, specify L9 as the Row Input Cell and M9 as the Column Input Cell. Click OK.
The table shows that the choice changes from Egret to Cruiser if the combination of
assessments is changed from 20 & 70 to 30 & 75. This table could be refined to examine
the exact threshold values.
The formula in cell B9 includes an IF function to verify that each weight is between 0
and 1, inclusive, and that the sum of the weights equals one. If not, the formula returns
empty text. This formula must be array-entered; after typing the function, hold down
Control and Shift while you press Enter.
Figure 18 shows a two-way table for sensitivity analysis of the weights. Cell R5
corresponds to the approximate base case assessments in the weight ratio and swing
weight methods.
Figure 19 is a more detailed view. The choice formula in cell B9 is modified by placing
the INDEX function inside the LEFT function so that only the first letter of the
alternative's name is returned.
The results in Figure 19 show that all alternatives in this data set are candidates
depending on the tradeoffs specified by the decision maker. In general, moving left to
right, if more weight is given to cost, a less expensive alternative is chosen.
Summary
This paper considered three methods for assessing tradeoffs in the additive utility
function. For each method, sensitivity analysis is useful for gaining insight into which
tradeoff assumptions are critical. Kirkwood [2] includes Excel VBA methods for
sensitivity analysis of individual utility functions in addition to weights.
Part 2 discusses Monte Carlo simulation, which is useful for incorporating uncertainty
into spreadsheet what-if models.
Separate chapters describe simulation using standard Excel features and simulation using
the RiskSim simulation add-in for Excel.
Additional topics in this part include multi-period evaluation models, inventory decisions,
and queuing models.
[Figures: three diagrams, each ending in Net Cash Flow]
[Chart: probability density versus Unit Sales, x (0 to 6000)]
5.2 Continuous Uncertain Quantities 59
[Chart: Cumulative Probability, P(X<=x), from 0.00 to 1.00, versus Unit Sales, x (0 to 6000)]
Both probability mass functions (for discrete UQs) and probability density functions (for
continuous UQs) have corresponding cumulative probability functions.
It is important to understand the relationship between a density function and its
cumulative probability function.
Cumulative probability can be expressed in four ways:
P(X<=x)   inclusive left-tail    probability that UQ X is less than or equal to x
P(X<x)    exclusive left-tail    probability that UQ X is strictly less than x
P(X>=x)   inclusive right-tail   probability that UQ X is greater than or equal to x
P(X>x)    exclusive right-tail   probability that UQ X is strictly greater than x
For continuous UQs the cumulative probability is the same for inclusive and exclusive.
P(X<=x) is the most common type.
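As a small illustration of how a density and its cumulative probability function are related, consider a uniform continuous UQ. The range 1000 to 5000 below is an assumption chosen for the sketch; the left-tail cumulative probability is the area under the constant density up to x, and the right tail is one minus the left tail.

```python
LOW, HIGH = 1000.0, 5000.0     # assumed range of the uniform UQ
DENSITY = 1.0 / (HIGH - LOW)   # constant height of the density function


def left_tail(x):
    """P(X<=x): area under the density from LOW up to x."""
    if x <= LOW:
        return 0.0
    if x >= HIGH:
        return 1.0
    return (x - LOW) * DENSITY


def right_tail(x):
    """P(X>=x); for a continuous UQ this equals P(X>x)."""
    return 1.0 - left_tail(x)


print(DENSITY, left_tail(3000.0), right_tail(3000.0))
```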
60 Chapter 5 Uncertain Quantities
[Chart: probability density versus Unit Sales, x (0 to 6000)]
[Chart: cumulative probability, from 0.00 to 1.00, versus Unit Sales, x (0 to 6000)]
[Chart: probability density versus Unit Sales, x (0 to 6000)]
[Chart: cumulative probability, P(X<=x), from 0.00 to 1.00, versus Unit Sales, x (0 to 6000)]
the RiskSim xla file from the C:\MyAddIns folder or if you opened the workbook on
another computer where the RiskSim xla file is not located at the same path), Excel
displays a dialog box like the one shown below.
If you see this dialog box or a similar warning when you open an Excel file, choose the
"Don't Update" option. The workbook will be opened, but any cell containing a reference
to a RiskSim function will display the #NAME? or similar error code.
To update the links after the workbook is open, be sure that a RiskSim xla file is open.
Then choose Edit | Links to see the dialog box shown below. (In this example the
workbook originally used functions from the RiskSim xla file located at
C:\middleton\risksim\risksim.xla.)
To update the links, click the Change Source button. A file browser window will open,
where you can navigate to the RiskSim xla file that is open. After you select the file using
the file browser, click OK. Back in the Edit Links dialog box, click the Close button.
In Excel 2003 the Edit Links dialog box has a Startup Prompt button. To avoid possible
problems when Excel tries to automatically update links while a file is being opened, we
recommend the default "Let users choose to display the alert or not."
Optionally, select the "Output Label Cell" edit box, and point or type a reference to a cell
containing the name of the model output (for example, a cell whose contents is the text
label "Net Profit").
Select the "Output Formula Cell" edit box, and point to a single cell on your worksheet or
type a cell reference. The output cell of your model must contain a formula that depends,
usually indirectly, on the model inputs determined by the random number generator
functions.
Select the "Random Number Seed" edit box, and type a number between zero and one. (If
you want to change the seed without performing a simulation, enter zero in the "Number
Of Trials" edit box.)
Select the "Number Of Trials" edit box, and type an integer value (for example, 100 or
500). This value, sometimes called the sample size or number of iterations, specifies the
number of times the worksheet will be recalculated to determine output values of your
model.
[Histogram: frequency (0 to 100) versus Net Cash Flow (-$8,000 to $14,000)]
[RiskSim 2.31 Pro - Cumulative Chart: Cumulative Probability (0.0 to 1.0) versus Net Cash Flow (-$8,000 to $14,000)]
The histogram is based on the frequency distribution in columns I:J. The cumulative
distribution is based on the sorted output values in column C and the cumulative
probabilities in column D.
7.8 Customizing RiskSim Charts 75
The cumulative probabilities start at 1/(2*N), where N is the number of trials, and
increase by 1/N. The rationale is that the lowest ranked output value of the sampled
values is an estimate of the population's values in the range from 0 to 1/N, and the lowest
ranked value is associated with the median of that range.
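The cumulative probabilities described above, starting at 1/(2*N) and increasing by 1/N, can be computed as in the following Python sketch. The function name and the four sample output values are hypothetical.

```python
def cumulative_points(outputs):
    """Pair each sorted output value with cumulative probability (2*i + 1) / (2*N)."""
    n = len(outputs)
    ranked = sorted(outputs)
    # i = 0 gives 1/(2*N); each later rank adds 1/N.
    return [(value, (2 * i + 1) / (2 * n)) for i, value in enumerate(ranked)]


points = cumulative_points([4200.0, -1500.0, 900.0, 7300.0])
for value, probability in points:
    print(f"{value:>10,.0f} {probability:.3f}")
```

With four trials the probabilities are 0.125, 0.375, 0.625, and 0.875, the midpoints of the four ranges of width 1/N.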
Column B contains the original sampled output values.
Columns F:G show percentiles based on Excel's PERCENTILE worksheet function.
Refer to Excel's online help for the interpolation method used by the PERCENTILE
function.
The summary measures in columns Q:R are also based on Excel worksheet functions:
AVERAGE, STDEV, QUARTILE, and SKEW.
widen the chart). Another way is to select the horizontal axis (click between the labels on
the horizontal axis so that "Value (X) axis" appears in the name box in the upper left of
Excel) and change to a smaller font size using the Font Size drop-down edit box on the
Formatting tool bar.
The histogram chart is a combination chart using a column chart type for the vertical bars
and an XY (Scatter) chart type for the horizontal axis. The two chart types align properly
as long as the horizontal axis retains the same minimum and maximum values.
For example, if you want more spacing between the dollar labels on the horizontal axis,
select the horizontal axis (so that "Value (X) axis" appears in the name box in the upper
left of Excel), choose Format | Selected Axis | Scale, and change the "Major unit" from
2000 to 4000. Do not change the Minimum = –8000 or the Maximum = 14000. The
histogram appears as shown below.
Figure 7.12 Original Histogram With Modified Horizontal Axis Major Unit
[Histogram: Frequency (0 to 160) versus Net Cash Flow, with horizontal axis labels every $4,000 from -$8,000 to $12,000]
The cumulative chart is a standard XY (Scatter) chart type, so you can change the major
unit as described above, but you can also change the minimum and maximum without
affecting the integrity of the chart.
Another way to obtain more spacing on the horizontal axis of the histogram or
cumulative chart is to use a custom format. For example, if you want to show values in
thousands instead of the original units, select the horizontal axis (click between the labels
on the horizontal axis so that "Value (X) axis" appears in the name box in the upper left
of Excel), choose Format | Selected Axis | Number | Custom, and enter a comma at the
end of the current format shown in the "Type:" edit box. After changing the original
7.9 Random Number Generator Functions 77
format "$#,##0" to "$#,##0," and modifying the horizontal axis title, the cumulative chart
appears as shown below.
Figure 7.13 Original Cumulative Chart With Horizontal Axis Custom Format
[Chart: Cumulative Probability (0.0 to 1.0) versus Net Cash Flow, in thousands of dollars (-$8 to $14)]
Returns #NUM! if trials is non-integer or less than one, or probability_s is less than zero
or more than one.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDBINOMIAL Example
A salesperson makes ten unsolicited calls per day, where the probability of making a sale
on each call is 30 percent. The uncertain total number of sales in one day is
=RANDBINOMIAL(10,0.3)
RANDBINOMIAL Related Function
FASTBINOMIAL: Same as RANDBINOMIAL without any error checking of the
arguments.
CRITBINOM(trials,probability_s,RAND()): Excel's inverse of the cumulative binomial,
or CRITBINOM(trials,probability_s,RANDUNIFORM(0,1)) to use the RiskSim Seed
feature.
RandBiVarNormal
Returns two random values from a bivariate normal distribution with a specified
correlation.
To use this random number generator function, select two adjacent cells on the
worksheet. Type =RANDBIVARNORMAL followed by numerical values for the five
arguments or references to cells containing the values, separated by commas, enclosed in
starting and ending parentheses. After typing the ending parenthesis, do not press Enter.
Instead, hold down the Control and Shift keys while you press Enter, thus "array
entering" the function.
Syntax:
RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12)
Returns #REF! if the array function is not entered into two adjacent cells.
Returns #NUM! if a standard deviation is negative or the correlation is outside the range
between -1 and +1.
Returns #VALUE! if an argument is not numeric.
Example: Select two adjacent cells, type
=RANDBIVARNORMAL(100,10,50,5,0.5)
Hold down Control and Shift while you press Enter.
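One common way to generate a correlated pair of normal values is to combine two independent standard normal draws. The Python sketch below uses that construction with the same five arguments; it is an assumption about how such a generator could work, not a description of RiskSim's internal method, and the function names are hypothetical.

```python
import math
import random


def rand_bivar_normal(mean1, stdev1, mean2, stdev2, correl12, rng):
    """Return one (x1, x2) pair from a bivariate normal with the given correlation."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x1 = mean1 + stdev1 * z1
    # Mix z1 and z2 so that the standardized second value has correlation correl12 with z1.
    x2 = mean2 + stdev2 * (correl12 * z1 + math.sqrt(1.0 - correl12 ** 2) * z2)
    return x1, x2


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2006)  # fixed seed so the run is repeatable
pairs = [rand_bivar_normal(100, 10, 50, 5, 0.5, rng) for _ in range(20000)]
print(round(sample_correlation(pairs), 2))
```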
RandCumulative
Returns a random value from a piecewise-linear cumulative distribution. This function
can model a continuous-valued uncertain quantity, X, by specifying points on its
cumulative distribution. Each point is specified by a possible value, x, and a
corresponding left-tail cumulative probability, P(X<=x). Random values are based on
linear interpolation between the specified points.
RANDCUMULATIVE Syntax: RANDCUMULATIVE(value_cumulative_table)
Value_cumulative_table must be a reference, or the defined name of a reference, for a
two-column range, with values in the left column and corresponding cumulative
probabilities in the right column.
RANDCUMULATIVE Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME? if the argument is text and the name is undefined.
Returns #NUM! if the first (top) cumulative probability is not zero, if the last (bottom)
cumulative probability is not one, or if the values or cumulative probabilities are not in
ascending order.
Returns #REF! if the number of columns in the table reference is not two.
Returns #VALUE! if the argument is not a reference, if the argument is a defined name
but not for a reference, or if any cell of the table contains text or is blank.
RANDCUMULATIVE Example
A corporate planner thinks that minimum possible market demand is 1000 units, median
is 5000, and maximum possible is 9000. Also, there is a ten percent chance that demand
will be less than 4000 and a ten percent chance it will exceed 7000. The values, x, and
cumulative probabilities, P(X<=x), are entered into spreadsheet cells A1:B5.
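The planner's assessments correspond to five points: (1000, 0), (4000, 0.1), (5000, 0.5), (7000, 0.9), and (9000, 1). The Python sketch below shows how linear interpolation between such points turns a uniform random number into a demand value. It is an illustration of the idea under the assumption that the cumulative probabilities are strictly ascending, not RiskSim's code; the function name is hypothetical.

```python
import bisect
import random

values = [1000.0, 4000.0, 5000.0, 7000.0, 9000.0]
cum_probs = [0.0, 0.1, 0.5, 0.9, 1.0]


def rand_cumulative(values, cum_probs, u):
    """Map u in [0, 1] to a value by linear interpolation on the cumulative points."""
    # Find the segment whose cumulative probabilities bracket u.
    i = bisect.bisect_right(cum_probs, u)
    i = min(max(i, 1), len(values) - 1)
    x0, x1 = values[i - 1], values[i]
    p0, p1 = cum_probs[i - 1], cum_probs[i]
    return x0 + (x1 - x0) * (u - p0) / (p1 - p0)


rng = random.Random(7)  # fixed seed so the run is repeatable
demand = rand_cumulative(values, cum_probs, rng.random())
print(rand_cumulative(values, cum_probs, 0.5))  # the median
```

A uniform random number of 0.5 maps to the assessed median of 5000 units.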
[Chart: Probability Density, f(x), versus Market Demand, x, in units (0 to 10,000)]
[Chart: Cumulative Probability, P(X<=x), versus Market Demand, x, in units (0 to 10,000)]
RandDiscrete
Returns a random value from a discrete probability distribution. This function can model
a discrete-valued uncertain quantity, X, by specifying its probability mass function. The
function is specified by each possible discrete value, x, and its corresponding probability,
P(X=x).
RANDDISCRETE Syntax: RANDDISCRETE(value_discrete_table)
Value_discrete_table must be a reference, or the defined name of a reference, for a two-
column range, with values in the left column and corresponding probability mass in the
right column.
RANDDISCRETE Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME? if the argument is text and the name is undefined.
Returns #NUM! if a probability is negative or if the probabilities do not sum to one.
Returns #REF! if the number of columns in the table reference is not two.
Returns #VALUE! if the argument is not a reference, if the argument is a defined name
but not for a reference, or if any cell of the table contains text or is blank.
RANDDISCRETE Example
A corporate planner thinks that uncertain market demand, X, can be approximated by
three possible values and their associated probabilities: P(X=3000) = 0.3, P(X=4000) =
0.6, and P(X=5000) = 0.1. The values and probabilities are entered into spreadsheet cells
A1:B3.
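A discrete value can be drawn by accumulating the probability mass until it exceeds a uniform random number. The Python sketch below applies that idea to the planner's three values; it illustrates the technique and is not RiskSim's implementation. The function name is hypothetical.

```python
import random

values = [3000, 4000, 5000]
probs = [0.3, 0.6, 0.1]


def rand_discrete(values, probs, u):
    """Return the first value whose cumulative probability exceeds u (0 <= u < 1)."""
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return value
    return values[-1]  # guard against rounding when u is very close to 1


rng = random.Random(11)  # fixed seed so the run is repeatable
draws = [rand_discrete(values, probs, rng.random()) for _ in range(10000)]
share_4000 = draws.count(4000) / len(draws)
print(round(share_4000, 2))
```

Over many recalculations, the share of draws equal to 4000 should be close to 0.6.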
[Chart: probability mass, P(X=x), versus Market Demand, x, in units (0 to 7000)]
[Chart: Cumulative Probability, P(X<=x), versus Market Demand, x, in units (0 to 7000)]
RandExponential
Returns a random value from an exponential distribution. This function can model the
uncertain time interval between successive arrivals at a queuing system or the uncertain
time required to serve a customer.
RANDEXPONENTIAL Syntax: RANDEXPONENTIAL(lambda)
Lambda is the mean number of occurrences per unit of time.
RANDEXPONENTIAL Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME? if the argument is text and the name is undefined.
Returns #NUM! if lambda is negative or zero.
Returns #VALUE! if the argument is a defined name of a cell and the cell is blank or
contains text.
RANDEXPONENTIAL Examples
Cars arrive at a toll plaza with a mean rate of 3 cars per minute. The uncertain time
between successive arrivals, measured in minutes, is =RANDEXPONENTIAL(3). The
average value returned by repeated recalculation of RANDEXPONENTIAL(3) is 0.333.
A bank teller requires an average of two minutes to serve a customer. The uncertain
customer service time, measured in minutes, is =RANDEXPONENTIAL(0.5). The
average value returned by repeated recalculation of RANDEXPONENTIAL(0.5) is 2.
RANDEXPONENTIAL Related Functions
FASTEXPONENTIAL: Same as RANDEXPONENTIAL without any error checking of
the arguments.
−LN(RAND())/lambda: Excel's inverse of the exponential, or
−LN(RANDUNIFORM(0,1))/lambda to use the RiskSim Seed feature.
RANDPOISSON: Counts number of occurrences for a Poisson process.
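The inverse formula can be tried in Python. The sketch below uses 1 minus a uniform random number inside the logarithm, which has the same distribution as the uniform number itself but cannot be zero; that substitution and the function name are choices made for this illustration.

```python
import math
import random


def rand_exponential(lam, rng):
    """Inverse-transform draw from an exponential distribution with rate lam."""
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1] and log is defined.
    return -math.log(1.0 - rng.random()) / lam


rng = random.Random(3)  # fixed seed so the run is repeatable
gaps = [rand_exponential(3.0, rng) for _ in range(20000)]
print(round(sum(gaps) / len(gaps), 2))
```

With a rate of 3 arrivals per minute, the average of many draws should be close to 1/3 minute, as stated in the toll plaza example.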
RandInteger
Returns a uniformly distributed random integer between two integers you specify.
RANDINTEGER Syntax: RANDINTEGER(bottom,top)
Bottom is the smallest integer RANDINTEGER will return.
Top is the largest integer RANDINTEGER will return.
RANDINTEGER Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME? if an argument is text and the name is undefined.
Returns #NUM! if top is less than or equal to bottom.
Returns #VALUE! if bottom or top is not an integer or if an argument is a defined name
of a cell and the cell is blank or contains text.
84 Chapter 7 Monte Carlo Simulation Using RiskSim
RANDINTEGER Example
The number of orders a particular customer will place next year is between 7 and 11, with
no number more likely than the others. The uncertain number of orders is
=RANDINTEGER(7,11).
RANDINTEGER Related Function
FASTINTEGER: Same as RANDINTEGER without any error checking of the
arguments.
RANDBETWEEN(bottom,top): Excel’s function for uniformly distributed integers,
without RiskSim’s capability of setting the seed.
RandNormal
Returns a random value from a normal distribution. This function can model a variety of
phenomena where the values follow the familiar bell-shaped curve, and it has wide
application in statistical quality control and statistical sampling.
RANDNORMAL Syntax: RANDNORMAL(mean,standard_dev)
Mean is the arithmetic mean of the normal distribution.
Standard_dev is the standard deviation of the normal distribution.
RANDNORMAL Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME? if an argument is text and the name is undefined.
Returns #NUM! if standard_dev is negative.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDNORMAL Example
The total market for a product is approximately normally distributed with mean 60,000
units and standard deviation 5,000 units. The uncertain total market is
=RANDNORMAL(60000,5000).
RANDNORMAL Related Function
FASTNORMAL: Same as RANDNORMAL without any error checking of the
arguments.
NORMINV(RAND(),mean,standard_dev): Excel's inverse of the normal, or
NORMINV(RANDUNIFORM(0,1),mean,standard_dev) to use the RiskSim Seed
feature.
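The inverse-of-the-normal approach has a direct counterpart in Python's standard library, where statistics.NormalDist provides inv_cdf. The sketch below applies it to the total market example; the function name rand_normal is hypothetical, and the loop that rejects a uniform value of exactly zero is a precaution added for this illustration.

```python
import random
from statistics import NormalDist


def rand_normal(mean, standard_dev, rng):
    """Draw a normal value by passing a uniform random number through the inverse CDF."""
    u = rng.random()
    while u == 0.0:  # inv_cdf requires 0 < u < 1
        u = rng.random()
    return NormalDist(mean, standard_dev).inv_cdf(u)


rng = random.Random(5)  # fixed seed so the run is repeatable
market = [rand_normal(60000, 5000, rng) for _ in range(5000)]
print(round(sum(market) / len(market)))
```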
RandSample
Returns a random sample without replacement from a population.
To use this random number generator function, select a number of cells equal to the
sample size, either in a single column or in a single row. Type =RANDSAMPLE
followed by a reference to the cells containing the population values, enclosed in
parentheses. After typing the ending parenthesis, do not press Enter. Instead, hold down
the Control and Shift keys while you press Enter, thus "array entering" the function.
Syntax: RANDSAMPLE(population)
The population argument is a reference to a range of values in a single column.
Returns #N/A if the population range is not part of a single column.
Returns #REF! if the function is not entered into two adjacent cells.
Example: Type population values into cells A1:A5. For a sample of size 3, select cells
B1:B3, and type =RANDSAMPLE(A1:A5) but don't press Enter. Hold down Control and
Shift while you press Enter.
RandPoisson
Returns a random value from a Poisson distribution. This function can model the
uncertain number of occurrences during a specified time interval, for example, the
number of arrivals at a service facility during an hour. The possible values of
RANDPOISSON are the non-negative integers, 0, 1, 2, ... .
RANDPOISSON Syntax: RANDPOISSON(mean)
Mean is the mean number of occurrences per unit of time.
RANDPOISSON Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if the argument is text and the name is undefined.
Returns #NUM! if mean is negative or zero.
Returns #VALUE! if mean is a defined name of a cell and the cell is blank or contains
text.
RANDPOISSON Examples
Cars arrive at a toll plaza with a mean rate of 3 cars per minute. The uncertain number of
arrivals in a minute is =RANDPOISSON(3). The average value returned by repeated
recalculation of RANDPOISSON(3) is 3.
86 Chapter 7 Monte Carlo Simulation Using RiskSim
A bank teller requires an average of two minutes to serve a customer. The uncertain
number of customers served in a minute is =RANDPOISSON(0.5). The average value
returned by repeated recalculation of RANDPOISSON(0.5) is 0.5.
RANDPOISSON Related Functions
FASTPOISSON: Same as RANDPOISSON without any error checking of the arguments.
RANDEXPONENTIAL: Describes time between occurrences for a Poisson process.
RandTriangular
Returns a random value from a triangular probability density function. This function can
model an uncertain quantity where the most likely value (mode) has the largest
probability of occurrence, the minimum and maximum possible values have essentially
zero probability of occurrence, and the probability density function is linear between the
minimum and the mode and between the mode and the maximum. This function can also
model a ramp density function where the minimum equals the mode or the mode equals
the maximum.
RANDTRIANGULAR Syntax:
RANDTRIANGULAR(minimum,most_likely,maximum)
Minimum is the smallest value RANDTRIANGULAR will return.
Most_likely is the most likely value RANDTRIANGULAR will return.
Maximum is the largest value RANDTRIANGULAR will return.
RANDTRIANGULAR Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.
Returns #NUM! if minimum is greater than or equal to maximum, if most_likely is less
than minimum, or if most_likely is greater than maximum.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDTRIANGULAR Example
The minimum time required to complete a particular task that is part of a large project is
4 hours, the most likely time required is 6 hours, and the maximum time required is 10
hours.
The function returning the uncertain time required for the task is entered into a cell:
=RANDTRIANGULAR(4,6,10).
[Figure: triangular probability density, f(x), and cumulative probability, P(X<=x), plotted against task time, x, in hours]
RandUniform
Returns a uniformly distributed random value between two values you specify. As a
special case, RANDUNIFORM(0,1) is the same as Excel's built-in RAND() function.
RANDUNIFORM Syntax: RANDUNIFORM(minimum,maximum)
Minimum is the smallest value RANDUNIFORM will return.
Maximum is the largest value RANDUNIFORM will return.
RANDUNIFORM Remarks
Returns #N/A if there are too few or too many arguments.
Returns #NAME! if an argument is text and the name is undefined.
Returns #NUM! if minimum is greater than or equal to maximum.
Returns #VALUE! if an argument is a defined name of a cell and the cell is blank or
contains text.
RANDUNIFORM Example
A corporate planner thinks that the company's product will garner between 10% and 15%
of the total market, with all possible percentages equally likely in the specified range. The
uncertain market proportion is =RANDUNIFORM(0.10,0.15).
RANDUNIFORM Related Function
FASTUNIFORM: Same as RANDUNIFORM without any error checking of the
arguments.
In the Risk Simulation dialog box, the "Random number seed" edit box changes the seed
only for the RiskSim functions; it does not have any effect on Excel's built-in RAND()
function.
Each of RiskSim's random number generator functions uses RandSeed as a building block.
RANDBINOMIAL(trials,probability_s) uses RandSeed as the cumulative probability in
Excel's built-in CRITBINOM function.
RANDBIVARNORMAL(mean1,stdev1,mean2,stdev2,correl12) uses two values of
RandNormal to obtain correlated normal values.
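One common way to turn two independent normal draws into a correlated pair is sketched below in Python. This is an illustration of the general technique, not RiskSim's own code; the function name and the use of Python's random.gauss for the two standard normal draws are assumptions made for the example.

```python
import math
import random

def rand_bivar_normal(mean1, stdev1, mean2, stdev2, correl12, z1, z2):
    """Combine two independent standard normal draws z1, z2 into a correlated pair."""
    x1 = mean1 + stdev1 * z1
    # Mix z1 and z2 so that the pair (x1, x2) has correlation correl12.
    x2 = mean2 + stdev2 * (correl12 * z1 + math.sqrt(1.0 - correl12 ** 2) * z2)
    return x1, x2

rng = random.Random(7)  # fixed seed so the run is repeatable
pairs = [rand_bivar_normal(100, 10, 50, 5, 0.8, rng.gauss(0, 1), rng.gauss(0, 1))
         for _ in range(50_000)]

# Sample correlation of the simulated pairs; it should be close to 0.8.
n = len(pairs)
m1 = sum(a for a, _ in pairs) / n
m2 = sum(b for _, b in pairs) / n
cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
s1 = math.sqrt(sum((a - m1) ** 2 for a, _ in pairs) / n)
s2 = math.sqrt(sum((b - m2) ** 2 for _, b in pairs) / n)
corr = cov / (s1 * s2)
```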
RANDCUMULATIVE(value_cumulative_table) uses the value of RandSeed, R, searches
to find the adjacent cumulative probabilities that bracket R, and interpolates on the linear
segment of the cumulative distribution to find the corresponding value.
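The bracketing and interpolation described for RANDCUMULATIVE can be sketched in Python as follows. The table layout (pairs of value and cumulative probability, starting at 0 and ending at 1) and the function name are assumptions for illustration; r stands in for the RandSeed value.

```python
from bisect import bisect_right

def rand_cumulative(table, r):
    """Invert a piecewise-linear cumulative distribution.

    table: (value, cumulative_probability) pairs in increasing order,
           with cumulative probabilities running from 0 to 1 (assumed layout).
    r:     uniform random number between 0 and 1.
    """
    values = [v for v, _ in table]
    cums = [c for _, c in table]
    # First table row whose cumulative probability is strictly greater than r.
    hi = bisect_right(cums, r)
    hi = min(max(hi, 1), len(cums) - 1)
    lo = hi - 1
    # Linear interpolation between the two bracketing rows.
    frac = (r - cums[lo]) / (cums[hi] - cums[lo])
    return values[lo] + frac * (values[hi] - values[lo])

table = [(10, 0.0), (20, 0.25), (40, 1.0)]  # hypothetical input table
x = rand_cumulative(table, 0.5)             # one third of the way from 20 to 40
```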
RANDDISCRETE(value_discrete_table) compares RandSeed with summed probabilities
of the input table until the sum exceeds the RandSeed value, and then returns the previous
value from the input table.
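A minimal Python sketch of this kind of table lookup follows. It uses the standard rule of returning the value whose running probability total first exceeds the random number, which may differ in detail from RiskSim's implementation; the function name and the demand table are hypothetical.

```python
def rand_discrete(table, r):
    """table: (value, probability) pairs whose probabilities sum to 1.
    r: uniform random number in [0, 1), standing in for RandSeed."""
    cumulative = 0.0
    for value, prob in table:
        cumulative += prob
        if cumulative > r:
            return value
    return table[-1][0]  # guard against floating-point round-off in the sum

demand = [(100, 0.2), (200, 0.5), (300, 0.3)]  # hypothetical discrete distribution
low = rand_discrete(demand, 0.1)
mid = rand_discrete(demand, 0.5)
high = rand_discrete(demand, 0.95)
```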
RANDEXPONENTIAL(lambda) uses the value of RandSeed, R, as follows. If the
exponential density function is f(t) = lambda*EXP(-lambda*t), the cumulative is
P(T<=t) = 1 - EXP(-lambda*t). Associating R with P(T<=t), the inverse cumulative is
t = -LN(1-R)/lambda. Since R and 1-R are both uniformly distributed between 0 and 1,
RiskSim uses -LN(R)/lambda for the returned value.
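The inverse-cumulative formula can be checked with a short Python sketch. The function name is hypothetical, and Python's seeded generator stands in for RandSeed; with lambda = 2 the average of many draws should be close to 1/lambda = 0.5.

```python
import math
import random

def rand_exponential(lam, r):
    """Inverse-transform sample: t = -ln(r) / lambda, for r in (0, 1]."""
    return -math.log(r) / lam

rng = random.Random(12345)  # fixed seed so the run is repeatable
# 1 - random() lies in (0, 1], which avoids taking the log of zero.
samples = [rand_exponential(2.0, 1.0 - rng.random()) for _ in range(100_000)]
mean = sum(samples) / len(samples)
```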
RANDINTEGER(bottom,top) returns bottom + INT(RandSeed*(top-bottom+1)).
RANDNORMAL(mean,standard_dev) uses two RandSeed values in the well-
documented Box-Muller method.
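For readers unfamiliar with Box-Muller, the basic (trigonometric) form is sketched below in Python. The text does not say which variant RiskSim uses, so this is an assumption; the function name is hypothetical, and r1 and r2 stand in for the two RandSeed values.

```python
import math
import random

def rand_normal(mean, standard_dev, r1, r2):
    """Basic Box-Muller transform; r1 in (0, 1], r2 in [0, 1)."""
    z = math.sqrt(-2.0 * math.log(r1)) * math.cos(2.0 * math.pi * r2)
    return mean + standard_dev * z

rng = random.Random(2006)  # fixed seed so the run is repeatable
draws = [rand_normal(60000, 5000, 1.0 - rng.random(), rng.random())
         for _ in range(50_000)]
avg = sum(draws) / len(draws)
sd = math.sqrt(sum((d - avg) ** 2 for d in draws) / (len(draws) - 1))
```

With the total-market example (mean 60,000 and standard deviation 5,000), avg and sd should land near those two inputs.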
RANDPOISSON(mean) compares RandSeed with cumulative probabilities of Excel's
built-in POISSON function until the probability exceeds the RandSeed value, and then
returns the previous value.
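A Python sketch of a cumulative-probability search of this kind follows. It returns the smallest count whose cumulative Poisson probability exceeds the random number, which is the standard inverse-cumulative rule and may differ in detail from RiskSim's code; the function name is hypothetical.

```python
import math

def rand_poisson(mean, r):
    """Smallest k with P(X <= k) > r, for r in [0, 1)."""
    k = 0
    pmf = math.exp(-mean)   # P(X = 0)
    cumulative = pmf
    # Stop if the terms underflow to zero, so the loop cannot run forever.
    while cumulative <= r and pmf > 0.0:
        k += 1
        pmf *= mean / k     # P(X = k) computed from P(X = k - 1)
        cumulative += pmf
    return k

arrivals = rand_poisson(3, 0.5)  # toll plaza example with mean 3 and r = 0.5
```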
RANDSAMPLE(population) uses RandSeed for each of the cells that were selected when
the function was array-entered, avoiding population values that have already been
selected, thus providing sampling without replacement.
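Sampling without replacement can be sketched in Python by removing each chosen value from the pool before the next draw. This is one simple way to avoid repeats, not necessarily the bookkeeping RiskSim uses; the function name is hypothetical.

```python
import random

def rand_sample(population, sample_size, rng):
    """Draw sample_size values from population without replacement."""
    remaining = list(population)
    sample = []
    for _ in range(sample_size):
        # Scale a uniform draw to an index into the values not yet selected.
        index = int(rng.random() * len(remaining))
        sample.append(remaining.pop(index))
    return sample

rng = random.Random(1)  # fixed seed so the run is repeatable
sample = rand_sample([10, 20, 30, 40, 50], 3, rng)
```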
RANDTRIANGULAR(minimum,most_likely,maximum) uses RandSeed once. The
triangular density function has two linear segments, so the cumulative distribution has
two quadratic segments. The returned value is determined by interpolation on the
appropriate quadratic segment.
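Because each segment of the cumulative distribution is quadratic, it can be inverted in closed form. The Python sketch below does that; it is an illustration under that assumption rather than RiskSim's own routine, and the function name is hypothetical.

```python
import math

def rand_triangular(minimum, most_likely, maximum, r):
    """Invert the triangular cumulative distribution at r, for r in [0, 1]."""
    span = maximum - minimum
    # Cumulative probability at the mode, where the two quadratic segments meet.
    p_mode = (most_likely - minimum) / span
    if r < p_mode:
        return minimum + math.sqrt(r * span * (most_likely - minimum))
    return maximum - math.sqrt((1.0 - r) * span * (maximum - most_likely))

task_time = rand_triangular(4, 6, 10, 0.5)  # task-time example; about 6.54 hours
```

The same function handles the ramp cases, where the mode equals the minimum or the maximum, because one of the two segments then has zero width.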
RANDUNIFORM(minimum,maximum) returns minimum + RandSeed*(maximum-minimum).
RANDUNIFORM(0,1) is equivalent to Excel's built-in RAND() function.
[Figure: influence diagram with nodes Unit Price, Fixed Costs, Units Sold, Unit Variable Cost, and Net Cash Flow]
Three Inputs
Price is variable. Units sold depends on price. The two cost inputs are independent.
[Figure: influence diagram with nodes Unit Price, Units Sold, Fixed Costs, Unit Variable Cost, and Net Cash Flow]
Two Inputs
Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.
[Figure: two influence diagrams; recoverable node labels include Units Sold, Variable Cost, and Net Cash Flow]
Intermediate Details
Price is variable. Units sold depends on price. Unit variable cost depends on fixed costs.
Fixed costs, units sold, and unit variable cost are uncertain.
Include revenue, total variable cost, and total costs as intermediate variables.
[Figure: influence diagram; recoverable node labels include Units Sold, Variable Cost, Total Variable Cost, and Net Cash Flow]
accumulated cash will include net cash flow (income minus expense) in each of the three
years, interest from CDs received at the end of the second and third years, and cash from
the sale of the property at the end of the third year.
In your initial analysis you have decided to ignore depreciation and other issues related to
income taxes.
Instead of purchasing the apartment building, you could invest the entire $2,000,000 in
certificates of deposit yielding 5 percent per year.
8.1 Apartment Building Purchase Problem 99
[Figure: chart with labels Competitor Entry and NPV 10%]
AJS Process 1
Under the first process, AJS's current machinery is used to make the product. The
following inputs are used:
Demand Demand for each of the three years is unknown. The three annual demands are
modeled as discrete uncertain quantities with the probability distributions shown in the
spreadsheet display.
106 Chapter 8 Multiperiod What-If Modeling
Variable Cost Variable cost per unit changes each year, depending on the costs for
materials and labor. The uncertainty about each variable cost is represented by a
continuous normal distribution with mean $4.00 and standard deviation $0.40.
Machine Failure Each year, AJS's machines fail occasionally, but obviously it is
impossible to predict when or how many failures will occur during the year. Each time a
machine fails, it costs the firm $8,000. The uncertainty about the number of machine
failures in each of the three years is represented by a Poisson random variable with
average 4 failures per year.
Fixed Cost Each year a fixed cost of $12,000 is incurred.
AJS Process 2
The second process involves scrapping the current equipment (it has no salvage value)
and purchasing new equipment to make the product at a cost of $60,000. Assume that the
firm pays cash for the new machine, and ignore tax effects.
Demand Because of the new machine, the final product is slightly altered and improved,
and consequently the demands are likely to be higher than before, although more
uncertain. The new demand distributions are shown in the spreadsheet display.
Variable Cost Variable cost per unit still changes each year. With the new machine it is
judged to be slightly lower but with more uncertainty, so the cost is described by a
normal distribution with mean $3.50 and standard deviation $1.00.
Machine Failure Equipment failures are less likely with the new equipment, with an
average of three per year. Such failures tend to be less serious with the new machine,
costing only $6,000.
Fixed Cost The annual fixed cost of $12,000 is unchanged.
8.3 Machine Simulation Model 107
[Figure: histogram of Frequency versus NPV, Upper Limit of Interval, and chart of Cumulative Probability versus NPV]
[Figure: histogram of Frequency versus NPV, Upper Limit of Interval, and chart of Cumulative Probability versus NPV]
Follow these instructions to show two or more risk profiles on the same chart.
Use RiskSim to obtain the sorted values, cumulative probabilities, and XY charts for
strategy A and strategy B.
To add the data for strategy B to the existing plot for strategy A, select the sorted values
and cumulative probabilities for strategy B (without including the text labels in row 1),
and choose Edit | Copy.
Click just inside the outer border of the strategy A chart to select it. From the main menu,
choose Edit | Paste Special. In the Paste Special dialog box, select "Add cells as New
series," select "Values (Y) in Columns," check the box for "Categories (X Values) in
First Column," and click OK.
Use the same method to add data for other strategies to the strategy A chart.
To change the lines and markers of a data series, click a data point on the chart to select
the data series, and choose Format | Selected Data Series | Patterns.
If the X values are quite different for the various strategies, it may be necessary to adjust
the minimum and maximum values on the Scale tab of the Format Axis dialog box.
[Figure: Cumulative Probability versus NPV for Process 1 and Process 2, shown on the same chart]
Modeling Inventory
Decisions
9
This chapter describes simulation and expected value methods for determining how much
of a product or service to have on hand for a single period when there is uncertain
demand and no possibility of reordering.
Performance measures
Equilibrium
Average waiting time
Average number of customers in line
System utilization, rho = mean arrival rate / mean service rate
Stable system: rho < 1
[Figure: diagram with nodes Unloading Capacity and Cost of Delays]
10.1 Queue Simulation 117
[Figure: worksheet excerpt of simulated annual delay costs, with a histogram of Annual Cost of Delays]
Part 3 describes decision tree models, which are particularly useful for sequential
decision problems under uncertainty. Documentation and examples are included for the
TreePlan decision tree add-in for Excel.
Sensitivity analysis with standard Excel features is used to check decision tree input
assumptions regarding probabilities and cash flows.
Subsequent chapters describe value of information and risk attitude.
inexpensive approach uses magnetic components. This magnetic method costs more than
the electronic method, and the engineers think that it has a higher chance of success.
DriveTek Research can work on only one approach at a time and has time to try only two
approaches. If it tries either the magnetic or electronic method and the attempt fails, the
second choice must be the mechanical method to guarantee a successful model.
The management of DriveTek Research needs help in incorporating this information into
a decision to proceed or not.
In the DriveTek problem, the first portion of the decision tree is shown in Figure 10.2.
Awarded contract
Prepare proposal
If DriveTek is awarded the contract, they must decide which approach to use. For the
electronic and magnetic approaches, the result is uncertain, as shown in Figure 10.3. The
arrangement of the decision and event branches is called the structure of the decision
tree.
126 Chapter 11 Introduction to Decision Trees
Electronic success
Electronic failure
Magnetic success
Prepare proposal
Try magnetic method
Magnetic failure
For representing a sequential decision problem, the tree diagram is usually better than the
written description. In some decision problems, the choice may be obvious by looking at
the diagram. That is, the decision maker may know enough about the desirability of the
outcomes (endpoints in the tree) and how likely they are. But usually the next step in the
analysis after documenting the structure is to assign values to the endpoints.
that the sure-success mechanical method will cost $120,000. The possibly-successful
electronic approach will cost $50,000, and the more-likely-successful magnetic approach
will cost $80,000. In the DriveTek problem, these distinct cash flows associated with
many of the decision and event branches are shown in Figure 10.4.
Prepare proposal (-$50,000)
    Awarded contract ($250,000)
        Try electronic method (-$50,000)
            Electronic success ($0): terminal value $150,000
            Electronic failure (-$120,000): terminal value $30,000
        Try magnetic method (-$80,000)
            Magnetic success ($0): terminal value $120,000
            Magnetic failure (-$120,000): terminal value $0
Figure 10.4 also shows the sum of branch cash flows at the endpoints. For example, the
$30,000 terminal value on the far right of the diagram is associated with the scenario
shown in Figure 10.5.
Prepare proposal
    0.5 Awarded contract
        Try electronic method
            0.5 Electronic success: $150,000
            0.5 Electronic failure: $30,000
        Try magnetic method
            0.7 Magnetic success: $120,000
            0.3 Magnetic failure: $0
    0.5 Not awarded contract: -$50,000
Next: How do you decide what choice to make at each decision node?
Occasional Use
If you plan to use TreePlan on an irregular basis, simply use Excel’s File | Open
command to load TreePlan.xla each time you want to use it. You may keep the
TreePlan.xla file on a floppy disk, your computer’s hard drive, or a network server.
Selective Use
You can use Excel’s Add-In Manager to install TreePlan. First, copy TreePlan.xla to a
location on your computer’s hard drive. Second, if you save TreePlan.xla in the Excel or
Office Library subdirectory, go to the third step. Otherwise, run Excel, choose Tools |
Add-Ins; in the Add-Ins dialog box, click the Browse button, use the Browse dialog box
to specify the location of TreePlan.xla, and click OK. Third, in the Add-Ins dialog box,
note that TreePlan is now listed with a check mark, indicating that its menu command
will appear in Excel, and click OK.
If you do not plan to use TreePlan and you want to free up main memory, uncheck the box
for TreePlan in the Add-In Manager. When you do want to use TreePlan, choose Tools |
Add-Ins and check TreePlan’s box.
130 Chapter 12 Decision Trees Using TreePlan
To remove TreePlan from the Add-In Manager, use Windows Explorer or another file
manager to delete TreePlan.xla from the Library subdirectory or from the location you
specified when you used the Add-In Manager’s Browse command. The next time you
start Excel and choose Tools | Add-Ins, a dialog box will state “Cannot find add-in …
treeplan.xla. Delete from list?” Click Yes.
Steady Use
If you want TreePlan’s options immediately available each time you run Excel, use
Windows Explorer or another file manager to save TreePlan.xla in the Excel XLStart
directory. Alternatively, in Excel you can use Tools | Options | General to specify an
alternate startup file location and use a file manager to save TreePlan.xla there. When you
start Excel, it tries to open all files in the XLStart directory and in the alternate startup file
location.
For additional information visit “TreePlan FAQ” at www.treeplan.com.
After opening TreePlan.xla in Excel, the command "Decision Tree" appears at the bottom
of the Tools menu (or, if you have a customized main menu, at the bottom of the sixth
main menu item).
Build up a tree by adding or modifying branches or nodes in the default tree. To change
the branch labels or probabilities, click on the cell containing the label or probability and
type the new label or probability. To modify the structure of the tree (e.g., add or delete
branches or nodes in the tree), select the node or the cell containing the node in the tree to
modify, and choose Tools | Decision Tree or press Ctrl+t. TreePlan will then present a
dialog box showing the available commands.
For example, to add an event node to the top branch of the tree shown above, select the
square cell (cell G4) next to the vertical line at the end of a terminal branch and press
Ctrl+t. TreePlan then presents this dialog box.
To add an event node to the branch, we change the selected terminal node to an event
node by selecting Change to event node in the dialog box, selecting the number of
branches (here two), and pressing OK. TreePlan then redraws the tree with an event node
in place of the terminal node.
Figure 12.3
A B C D E F G H I J K L M
1
2 0.5
3 Event 3
4 0
5 Decision 1 0 0
6
7 0 0 0.5
8 Event 4
9 0
10 1 0 0
11 0
12
13 Decision 2
14 0
15 0 0
16
The dialog boxes presented by TreePlan vary depending on what you have selected when
you choose Tools | Decision Tree or press Ctrl+t. The dialog box shown below is
presented when you press Ctrl+t with an event node selected; a similar dialog box is
presented when you select a decision node. If you want to add a branch to the selected
node, choose Add branch and press OK. If you want to insert a decision or event node
before the selected node, choose Insert decision or Insert event and press OK. To get a
description of the available commands, click on the Help button.
Figure 12.4
The Copy subtree command is particularly useful when building large trees. If two or
more parts of the tree are similar, you can copy and paste "subtrees" rather than building
up each part separately. To copy a subtree, select the node at the root of the subtree and
choose Copy subtree. This tells TreePlan to copy the selected node and everything to the
right of it in the tree. To paste this subtree, select a terminal node and choose Paste
subtree. TreePlan then duplicates the specified subtree at the selected terminal node.
Since TreePlan decision trees are built directly in Excel, you can use Excel's commands
to format your tree. For example, you can use bold or italic fonts for branch labels: select
the cells you want to format and change them using Excel's formatting commands. To
help you, TreePlan provides a Select dialog box that appears when you choose Tools |
Decision Tree or press Ctrl+t without a node selected. You can also bring up this dialog
box by pressing the Select button on the Node dialog box. From here, you can select all
items of a particular type in the tree. For example, if you choose Probabilities and press
OK, TreePlan selects all cells containing probabilities in the tree. You can then format all
of the probabilities simultaneously using Excel's formatting commands. (Because of
limitations in Excel, the Select dialog box will not be available when working with very
large trees.)
cell references, or labels pertaining to that branch. You may edit the labels, probabilities,
and partial cash flows associated with each branch. The partial cash flows are the amount
the firm "gets paid" to go down that branch. Here, the firm pays $50,000 if it decides to
prepare the proposal, receives $250,000 up front if awarded the contract, spends $50,000
to try the electronic method, and spends $120,000 on the mechanical method if the
electronic method fails.
Figure 12.5
PROBABILITIES: Enter numbers or formulas in these cells.
TERMINAL VALUES: TreePlan formula for sum of partial cash flows along path.
DECISION NODES: TreePlan formula for which alternative is optimal.
TERMINAL NODES
Don't prepare proposal
$0
$0 $0
The trees are "solved" using formulas embedded in the spreadsheet. The terminal values
sum all the partial cash flows along the path leading to that terminal node. The tree is
then "rolled back" by computing expected values at event nodes and by maximizing at
decision nodes; the rollback EVs appear next to each node and show the expected value
at that point in the tree. The numbers in the decision nodes indicate which alternative is
optimal for that decision. In the example, the "1" in the first decision node indicates that
it is optimal to prepare the proposal, and the "2" in the second decision node indicates the
firm should try the electronic method because that alternative leads to a higher expected
value, $90,000, than the mechanical method, $80,000.
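The rollback arithmetic can be reproduced outside the spreadsheet. The Python sketch below uses the DriveTek probabilities and terminal values given in this text; the helper function and variable names are hypothetical and are not part of TreePlan.

```python
def expected_value(branches):
    """branches: (probability, value) pairs at an event node."""
    return sum(p * v for p, v in branches)

# Rollback values for the three methods, from the DriveTek terminal values.
mechanical = 80_000
electronic = expected_value([(0.5, 150_000), (0.5, 30_000)])
magnetic = expected_value([(0.7, 120_000), (0.3, 0)])

# Decision node: choose the alternative with the largest rollback value.
methods = {"mechanical": mechanical, "electronic": electronic, "magnetic": magnetic}
best_method = max(methods, key=methods.get)

# Event node for the contract award, then the initial decision.
prepare = expected_value([(0.5, methods[best_method]), (0.5, -50_000)])
best_initial = "prepare proposal" if prepare > 0 else "don't prepare proposal"
```

Here the electronic method rolls back to $90,000, the magnetic method to $84,000, and preparing the proposal to $20,000, so the sketch selects the same alternatives as the tree.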
TreePlan has a few options that control the way calculations are done in the tree. To
select these options, press the Options button in any of TreePlan's dialog boxes. The first
choice is whether to Use Expected Values or Use Exponential Utility Function for
computing certainty equivalents. The default is to roll back the tree using expected values.
If you choose to use exponential utilities, TreePlan will compute utilities of endpoint cash
flows at the terminal nodes and compute expected utilities instead of expected values at
event nodes. Expected utilities are calculated in the cell below the certainty equivalents.
You may also choose to Maximize (profits) or Minimize (costs) at decision nodes; the
default is to maximize profits. If you choose to minimize costs instead, the cash flows are
interpreted as costs, and decisions are made by choosing the minimum expected value or
certainty equivalent rather than the maximum. See the Help file for details on these
options.
DriveTek Problem
DriveTek Research Institute discovers that a computer company wants a new tape drive
for a proposed new computer system. Since the computer company does not have
research people available to develop the new drive, it will subcontract the development to
an independent research firm. The computer company has offered a fee of $250,000 for
the best proposal for developing the new tape drive. The contract will go to the firm with
the best technical plan and the highest reputation for technical competence.
DriveTek Research Institute wants to enter the competition. Management estimates a cost
of $50,000 to prepare a proposal with a fifty-fifty chance of winning the contract.
However, DriveTek's engineers are not sure about how they will develop the tape drive if
they are awarded the contract. Three alternative approaches can be tried. The first
approach is a mechanical method with a cost of $120,000, and the engineers are certain
they can develop a successful model with this approach. A second approach involves
electronic components. The engineers estimate that the electronic approach will cost only
$50,000 to develop a model of the tape drive, but with only a 50 percent chance of
satisfactory results. A third approach uses magnetic components; this costs $80,000, with
a 70 percent chance of success.
DriveTek Research can work on only one approach at a time and has time to try only two
approaches. If it tries either the magnetic or electronic method and the attempt fails, the
second choice must be the mechanical method to guarantee a successful model.
The management of DriveTek Research needs help in incorporating this information into
a decision to proceed or not.
12.4 Step-by-Step TreePlan Tutorial 135
[Source: The tape drive example is adapted from Spurr and Bonini, Statistical Analysis
for Business Decisions, Irwin.]
Terminal Values
Each terminal node has an associated terminal value, sometimes called a payoff value,
outcome value, or endpoint value. Each terminal value measures the result of a scenario:
the sequence of decisions and events on a unique path leading from the initial decision
node to a specific terminal node.
To determine the terminal value, one approach assigns a cash flow value to each decision
branch and event branch and then sums the cash flow values on the branches leading to a
terminal node. In the DriveTek problem, there are distinct
cash flows associated with many of the decision and event branches. Some problems
require a more elaborate value model to determine the terminal values.
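As a check on the branch-summing approach, the Python sketch below adds the partial cash flows along every path in the DriveTek problem. The cash flows come from the problem description; the dictionary layout and names are only illustrative.

```python
# Partial cash flows on each branch, taken from the DriveTek description.
PREPARE, AWARD = -50_000, 250_000
MECHANICAL, ELECTRONIC, MAGNETIC = -120_000, -50_000, -80_000

# Each scenario lists the cash flows on the branches leading to one terminal node.
scenarios = {
    "mechanical": [PREPARE, AWARD, MECHANICAL],
    "electronic success": [PREPARE, AWARD, ELECTRONIC, 0],
    "electronic failure": [PREPARE, AWARD, ELECTRONIC, MECHANICAL],
    "magnetic success": [PREPARE, AWARD, MAGNETIC, 0],
    "magnetic failure": [PREPARE, AWARD, MAGNETIC, MECHANICAL],
    "not awarded": [PREPARE, 0],
    "don't prepare": [0],
}
terminal_values = {name: sum(flows) for name, flows in scenarios.items()}
```

The resulting values are $80,000, $150,000, $30,000, $120,000, $0, -$50,000, and $0, in the order listed.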
The following diagram shows the arrangement of branch names, probabilities, and cash
flow values on an unsolved tree.
Figure 12.7
Use mechanical method
-$120,000
0.5
Electronic success
-$120,000
0.7
Magnetic success
Prepare proposal
Try magnetic method $0
-$50,000
-$80,000 0.3
Magnetic failure
-$120,000
0.5
Not awarded contract
$0
$0
To build the decision tree, you use TreePlan’s dialog boxes to develop the structure. You
enter a branch name, branch cash flow, and branch probability (for an event) in the cells
above and below the left side of each branch. As you build the tree diagram, TreePlan
enters formulas in other cells.
Figure 12.8
Figure 12.9
A B C D E F G
1
2 Decision 1
3 0
4 0 0
5 1
6 0
7 Decision 2
8 0
9 0 0
3. Do not type the quotation marks in the following instructions. Select cell D2,
and enter Prepare proposal. Select cell D4, and enter –50000. Select cell D7,
and enter Don't prepare proposal.
Figure 12.10
A B C D E F G
1
2 Prepare proposal
3 -50000
4 -50000 -50000
5 2
6 0
7 Don't prepare proposal
8 0
9 0 0
4. Select cell F3. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Event Node, select Two Branches, and
click OK. The tree is redrawn.
Figure 12.11
Figure 12.12
A B C D E F G H I J K
1 0.5
2 Event 3
3 -50000
4 Prepare proposal 0 -50000
5
6 -50000 -50000 0.5
7 Event 4
8 -50000
9 2 0 -50000
10 0
11
12 Don't prepare proposal
13 0
14 0 0
5. Select cell H2, and enter Awarded contract. Select cell H4, and enter 250000.
Select cell H7, and enter Not awarded contract.
Figure 12.13
A B C D E F G H I J K
1 0.5
2 Awarded contract
3 200000
4 Prepare proposal 250000 200000
5
6 -50000 75000 0.5
7 Not awarded contract
8 -50000
9 1 0 -50000
10 75000
11
12 Don't prepare proposal
13 0
14 0 0
6. Select cell J3. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Decision Node, select Three Branches,
and click OK. The tree is redrawn.
Figure 12.14
A B C D E F G H I J K L M N O
1
2 Decision 5
3 200000
4 0 200000
5
6 0.5
7 Awarded contract Decision 6
8 1 200000
9 250000 200000 0 200000
10
11
12 Prepare proposal Decision 7
13 200000
14 -50000 75000 0 200000
15
16 0.5
17 Not awarded contract
18 1 -50000
19 75000 0 -50000
20
21
22 Don't prepare proposal
23 0
24 0 0
7. Select cell L2, and enter Use mechanical method. Select cell L4, and enter
–120000. Select cell L7, and enter Try electronic method. Select cell L9, and
enter –50000. Select cell L12, and enter Try magnetic method. Select cell L14,
and enter –80000.
Figure 12.15
A B C D E F G H I J K L M N O
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Awarded contract Try electronic method
8 2 150000
9 250000 150000 -50000 150000
10
11
12 Prepare proposal Try magnetic method
13 120000
14 -50000 50000 -80000 120000
15
16 0.5
17 Not awarded contract
18 1 -50000
19 50000 0 -50000
20
21
22 Don't prepare proposal
23 0
24 0 0
8. Select cell N8. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Event Node, select Two Branches, and
click OK. The tree is redrawn.
Figure 12.16
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Event 8
8 0.5 150000
9 Awarded contract Try electronic method 0 150000
10 2
11 250000 150000 -50000 150000 0.5
12 Event 9
13 150000
14 0 150000
15 Prepare proposal
16
17 -50000 50000 Try magnetic method
18 120000
19 -80000 120000
20
21 0.5
22 1 Not awarded contract
23 50000 -50000
24 0 -50000
25
26
27 Don't prepare proposal
28 0
29 0 0
9. Select cell P7, and enter Electronic success. Select cell P12, and enter
Electronic failure. Select cell P14, and enter –120000.
Figure 12.17
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Electronic success
8 0.5 150000
9 Awarded contract Try electronic method 0 150000
10 3
11 250000 120000 -50000 90000 0.5
12 Electronic failure
13 30000
14 -120000 30000
15 Prepare proposal
16
17 -50000 35000 Try magnetic method
18 120000
19 -80000 120000
20
21 0.5
22 1 Not awarded contract
23 35000 -50000
24 0 -50000
25
26
27 Don't prepare proposal
28 0
29 0 0
10. Select cell N18. From the Tools menu, choose Decision Tree. In the TreePlan
Terminal dialog box, select Change To Event Node, select Two Branches, and
click OK. The tree is redrawn.
Figure 12.18
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Electronic success
8 150000
9 0.5 Try electronic method 0 150000
10 Awarded contract
11 3 -50000 90000 0.5
12 250000 120000 Electronic failure
13 30000
14 -120000 30000
15
16 0.5
17 Event 10
18 Prepare proposal 120000
19 Try magnetic method 0 120000
20 -50000 35000
21 -80000 120000 0.5
22 Event 11
23 120000
24 0 120000
25
26 1 0.5
27 35000 Not awarded contract
28 -50000
29 0 -50000
30
31
32 Don't prepare proposal
33 0
34 0 0
11. Select cell P16, and enter .7. Select cell P17, and enter Magnetic success. Select
cell P21, and enter .3. Select cell P22, and enter Magnetic failure. Select cell
P24, and enter -120000.
Figure 12.19
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 80000
4 -120000 80000
5
6 0.5
7 Electronic success
8 150000
9 0.5 Try electronic method 0 150000
10 Awarded contract
11 2 -50000 90000 0.5
12 250000 90000 Electronic failure
13 30000
14 -120000 30000
15
16 0.7
17 Magnetic success
18 Prepare proposal 120000
19 Try magnetic method 0 120000
20 -50000 20000
21 -80000 84000 0.3
22 Magnetic failure
23 0
24 -120000 0
25
26 1 0.5
27 20000 Not awarded contract
28 -50000
29 0 -50000
30
31
32 Don't prepare proposal
33 0
34 0 0
12. Double-click the sheet tab (or right-click the sheet tab and choose Rename from
the shortcut menu), and enter Original. Save the workbook.
Figure 12.20
Branch Type   Branch Name                                   Cash Flow
Decision      Prepare proposal                              -$50,000
Event         Awarded contract                              $250,000
Decision      Try electronic method                         -$50,000
Event         Electronic failure (Use mechanical method)    -$120,000
              Terminal value                                $30,000
TreePlan put the formula =SUM(P14,L11,H12,D20) into cell S13 for determining the
terminal value.
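As a cross-check on that formula, the terminal value can be recomputed outside the worksheet by adding the partial cash flows along the path. The short Python sketch below is illustrative only; the dictionary and variable names are assumptions and are not part of TreePlan.

```python
# Partial cash flows on the path to the "Electronic failure" terminal node
# (values from Figure 12.20; the names are illustrative, not TreePlan's).
path_cash_flows = {
    "Prepare proposal": -50_000,
    "Awarded contract": 250_000,
    "Try electronic method": -50_000,
    "Electronic failure (Use mechanical method)": -120_000,
}

# Same arithmetic as =SUM(P14,L11,H12,D20) in cell S13.
terminal_value = sum(path_cash_flows.values())
print(terminal_value)  # 30000
```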
Other formulas, called rollback formulas, are in cells below and to the left of each node.
These formulas are used to determine the optimal choice at each decision node.
In cell B26, a formula displays 1, indicating that the first branch is the optimal choice.
Thus, the initial choice is to prepare the proposal. In cell J11, a formula displays 2,
indicating that the second branch (numbered 1, 2, and 3, from top to bottom) is the
optimal choice. If awarded the contract, DriveTek should try the electronic method. A
subsequent chapter provides more details about interpretation.
Figure 12.21
15. Select cell H12. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Partial Cash Flows
is selected, and click OK. With all partial cash flow cells selected, click the
Align Left button. With those cells still selected, choose Format | Cells. In the
Format Cells dialog box, click the Number tab. In the Category list box, choose
Currency; type 0 (zero) for Decimal Places; select $ in the Symbol list box;
select -$1,234 for Negative Numbers. Click OK.
Figure 12.22
16. Select cell I12. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Rollback EVs/CEs
is selected, and click OK. With all rollback cells selected, choose Format | Cells.
Repeat the Currency formatting of step 15 above.
17. Select cell S3. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Terminal Values is
selected, and click OK. With all terminal value cells selected, choose Format |
Cells. Repeat the Currency formatting of step 15 above.
Figure 12.23
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 $80,000
4 -$120,000 $80,000
5
6 0.5
7 Electronic success
8 $150,000
9 0.5 Try electronic method $0 $150,000
10 Awarded contract
11 2 -$50,000 $90,000 0.5
12 $250,000 $90,000 Electronic failure
13 $30,000
14 -$120,000 $30,000
15
16 0.7
17 Magnetic success
18 Prepare proposal $120,000
19 Try magnetic method $0 $120,000
20 -$50,000 $20,000
21 -$80,000 $84,000 0.3
22 Magnetic failure
23 $0
24 -$120,000 $0
25
26 1 0.5
27 $20,000 Not awarded contract
28 -$50,000
29 $0 -$50,000
30
31
32 Don't prepare proposal
33 $0
34 $0 $0
18. Double-click the Original (2) sheet tab (or right-click the sheet tab and choose
Rename from the shortcut menu), and enter Formatted. Save the workbook.
Figure 12.24
Explanation: A custom number format has four sections of format codes. The sections are
separated by semicolons, and they define the formats for positive numbers, negative
numbers, zero values, and text, in that order. When you specify three semicolons without
format codes, Excel does not display positive numbers, negative numbers, zero values, or
text. The formula remains in the cell, but its result is not displayed. Later, if you want to
display the result, you can change the format without having to enter the formula again.
Editing an existing format does not delete it. All formats are saved with the workbook
unless you explicitly delete a format.
21. Select cell A27. From the Tools menu, choose Decision Tree. In the TreePlan
Select dialog box, verify that the option button for Cells with Rollback EVs/CEs
is selected, and click OK. With all rollback values selected, choose Format |
Cells | Number. In the Category list box, select Custom. Scroll to the bottom of
the Type list box, and select the three-semicolon entry. Click OK.
22. Double-click the Formatted (2) sheet tab (or right-click the sheet tab and choose
Rename from the shortcut menu), and enter Model Inputs. Save the workbook.
Figure 12.25
0.5
Electronic success
$150,000
0.5 Try electronic method $0
Awarded contract
-$50,000 0.5
$250,000 Electronic failure
$30,000
-$120,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method $0
-$50,000
-$80,000 0.3
Magnetic failure
$0
-$120,000
0.5
Not awarded contract
-$50,000
$0
Alternative Model
If you want to emphasize that the time constraint forces DriveTek to use the mechanical
approach if they try either of the uncertain approaches and experience a failure, you can
change the terminal nodes in cells R13 and R23 to decision nodes, each with a single
branch.
Figure 12.26
0.5
Electronic success
$150,000
0.5 Try electronic method $0
Awarded contract
-$50,000 0.5
$250,000 Electronic failure Use mechanical method
$30,000
$0 -$120,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method $0
-$50,000
-$80,000 0.3
Magnetic failure Use mechanical method
$0
$0 -$120,000
0.5
Not awarded contract
-$50,000
$0
Payoff Distribution
Each strategy has an associated payoff distribution, sometimes called a risk profile. The
payoff distribution of a strategy is a probability distribution showing the probability of
obtaining each terminal value that the strategy can reach.
In decision tree models, the payoff distribution can be shown as a list of possible payoff
values, x, and the discrete probability of obtaining each value, P(X=x), where X
represents the uncertain terminal value associated with a strategy. Since a strategy
specifies a choice at each decision node, the uncertainty about terminal values depends
only on the occurrence of events. The probability of obtaining a specific terminal value
equals the product of the probabilities on the event branches on the path leading to the
terminal node.
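The product rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of TreePlan: the function name and the list-of-paths representation are assumptions, and the numbers are the DriveTek values for the strategy that tries the electronic method.

```python
from collections import defaultdict
from math import prod

def payoff_distribution(paths):
    """Return {terminal value: probability} for one strategy.

    `paths` is a list of (event_probabilities, terminal_value) pairs, one
    per terminal node that the strategy can reach.
    """
    dist = defaultdict(float)
    for event_probs, value in paths:
        # Probability of a terminal value = product of the event-branch
        # probabilities on the path leading to that terminal node.
        dist[value] += prod(event_probs)
    return dict(dist)

# Prepare proposal; if awarded the contract, try the electronic method.
electronic = payoff_distribution([
    ([0.5, 0.5], 150_000),   # awarded, electronic success
    ([0.5, 0.5], 30_000),    # awarded, electronic failure
    ([0.5], -50_000),        # not awarded
])
print(electronic)
```

Because a strategy fixes the choice at every decision node, only event probabilities enter the products, and the resulting probabilities sum to one.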
DriveTek Strategies
In this section each strategy of the DriveTek problem is described by a shorthand
statement and a more detailed statement. The possible branches following a specific
strategy are shown in decision tree form, and the payoff distribution is shown in a table
with an explanation of the probability calculations.
12.5 Decision Tree Solution 153
Figure 12.27
0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method
0.3
Magnetic failure
$0
0.5
Not awarded contract
-$50,000
Figure 12.28
Probability
Value, x P(X=x)
$80,000 0.50
-$50,000 0.50
1.00
Figure 12.29
0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method
0.3
Magnetic failure
$0
0.5
Not awarded contract
-$50,000
Figure 12.30
Probability
Value, x P(X=x)
$150,000 0.25 = 0.5 * 0.5
$30,000 0.25 = 0.5 * 0.5
-$50,000 0.50
1.00
Figure 12.31
0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method
0.3
Magnetic failure
$0
0.5
Not awarded contract
-$50,000
Figure 12.32
Probability
Value, x P(X=x)
$120,000 0.35 = 0.5 * 0.7
$0 0.15 = 0.5 * 0.3
-$50,000 0.50
1.00
Figure 12.33
0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method
0.3
Magnetic failure
$0
0.5
Not awarded contract
-$50,000
Figure 12.34
Probability
Value, x P(X=x)
$0 1.00
1.00
Strategy Choice
Since each strategy can be characterized completely by its payoff distribution, selecting
the best strategy becomes a problem of choosing the best payoff distribution.
One approach is to make a choice by direct comparison of the payoff distributions.
Figure 12.35
Strategy 1 (Mechanical)        Strategy 2 (Electronic)
            Probability                    Probability
Value, x    P(X=x)             Value, x    P(X=x)
$80,000     0.50               $150,000    0.25
-$50,000    0.50               $30,000     0.25
            1.00               -$50,000    0.50
                                           1.00
Certainty Equivalent
A certainty equivalent is a certain payoff value which is equivalent, for the decision
maker, to a particular payoff distribution. If the decision maker can determine his or her
certainty equivalent for the payoff distribution of each strategy, then the optimal strategy
is the one with the highest certainty equivalent.
The certainty equivalent is the minimum selling price for a payoff distribution; it depends
on the decision maker's personal attitude toward risk. A decision maker may be risk
preferring, risk neutral, or risk avoiding.
If the terminal values are not regarded as extreme (relative to the decision maker's total
assets), if the decision maker will encounter other decision problems with similar payoffs,
and if the decision maker has the attitude that he or she will "win some and lose some,"
then the decision maker's attitude toward risk may be described as risk neutral.
If the decision maker is risk neutral, the expected value is the appropriate certainty
equivalent for choosing among the strategies. Thus, for a risk neutral decision maker, the
optimal strategy is the one with the highest expected value.
The expected value of a payoff distribution is calculated by multiplying each terminal
value by its probability and summing the products. The expected value calculations for
each of the four strategies of the DriveTek problem are shown below.
Figure 12.36
Strategy 1 (Mechanical)
            Probability
Value, x    P(X=x)    x * P(X=x)
$80,000     0.50      $40,000
-$50,000    0.50      -$25,000
                      $15,000

Strategy 2 (Electronic)
            Probability
Value, x    P(X=x)    x * P(X=x)
$150,000    0.25      $37,500
$30,000     0.25      $7,500
-$50,000    0.50      -$25,000
                      $20,000

Strategy 3 (Magnetic)
            Probability
Value, x    P(X=x)    x * P(X=x)
$120,000    0.35      $42,000
$0          0.15      $0
-$50,000    0.50      -$25,000
                      $17,000

Strategy 4 (Don't)
            Probability
Value, x    P(X=x)    x * P(X=x)
$0          1.00      $0
                      $0
The four strategies of the DriveTek problem have expected values of $15,000, $20,000,
$17,000, and $0. Strategy 2 (Electronic) is the optimal strategy with expected value
$20,000.
A risk neutral decision maker's choice is based on the expected value. However, note that
if strategy 2 (Electronic) is chosen, the decision maker does not receive $20,000. The
actual payoff will be $150,000, $30,000, or -$50,000, with probabilities shown in the
payoff distribution.
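The expected value comparison can be reproduced with a short Python sketch. The payoff distributions are the ones listed for the four DriveTek strategies; the dictionary layout and function name are illustrative assumptions.

```python
# Payoff distributions of the four DriveTek strategies:
# each entry is a list of (terminal value, probability) pairs.
strategies = {
    "Mechanical": [(80_000, 0.50), (-50_000, 0.50)],
    "Electronic": [(150_000, 0.25), (30_000, 0.25), (-50_000, 0.50)],
    "Magnetic":   [(120_000, 0.35), (0, 0.15), (-50_000, 0.50)],
    "Don't":      [(0, 1.00)],
}

def expected_value(distribution):
    """Multiply each terminal value by its probability and sum the products."""
    return sum(x * p for x, p in distribution)

expected = {name: expected_value(d) for name, d in strategies.items()}
best = max(expected, key=expected.get)   # risk neutral: highest expected value

for name, value in expected.items():
    print(f"{name:<11} ${value:,.0f}")
print("Optimal:", best)
```

For a decision maker who is not risk neutral, `expected_value` would be replaced by whatever function returns that person's certainty equivalent; the selection step stays the same.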
Rollback Method
If we have a method for determining certainty equivalents (expected values for a risk
neutral decision maker), we don't need to examine every possible strategy explicitly.
Instead, the method known as rollback determines the single best strategy.
The rollback algorithm, sometimes called backward induction or "average out and fold
back," starts at the terminal nodes of the tree and works backward to the initial decision
node, determining the certainty equivalent rollback values for each node. Rollback values
are determined as follows:
• At a terminal node, the rollback value equals the terminal value.
• At an event node, the rollback value for a risk neutral decision maker is
determined using expected value; each branch probability is multiplied by the
successor rollback value, and the products are summed.
• At a decision node, the rollback value is set equal to the highest rollback value
on the immediate successor nodes.
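The three rules can be sketched as a small recursive function. This is an illustration under stated assumptions, not TreePlan's implementation: the nested-dictionary tree and the helper names `terminal`, `event`, and `decision` are hypothetical, and the tree uses the DriveTek terminal values and probabilities.

```python
def terminal(value):
    return {"type": "terminal", "value": value}

def event(*branches):       # each branch: (probability, successor node)
    return {"type": "event", "branches": list(branches)}

def decision(*branches):    # each branch: (label, successor node)
    return {"type": "decision", "branches": list(branches)}

def rollback(node):
    """Return the risk neutral rollback value (expected value) of a node."""
    kind = node["type"]
    if kind == "terminal":
        return node["value"]
    if kind == "event":
        return sum(p * rollback(child) for p, child in node["branches"])
    if kind == "decision":
        return max(rollback(child) for _, child in node["branches"])
    raise ValueError(f"unknown node type: {kind}")

method_choice = decision(
    ("Use mechanical method", terminal(80_000)),
    ("Try electronic method",
     event((0.5, terminal(150_000)), (0.5, terminal(30_000)))),
    ("Try magnetic method",
     event((0.7, terminal(120_000)), (0.3, terminal(0)))),
)
drivetek = decision(
    ("Prepare proposal",
     event((0.5, method_choice), (0.5, terminal(-50_000)))),
    ("Don't prepare proposal", terminal(0)),
)

print(rollback(method_choice))  # 90000.0
print(rollback(drivetek))       # 20000.0
```

The recursion visits each node once, which is why rollback scales better than enumerating every strategy.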
In TreePlan tree diagrams the rollback values are located to the left and below each
decision, event, and terminal node. Terminal values and rollback values for the DriveTek
problem are shown below.
Figure 12.37
0.5
Electronic success
$150,000
0.5 Try electronic method $150,000
Awarded contract
$90,000 0.5
$90,000 Electronic failure
$30,000
$30,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method $120,000
$20,000
$84,000 0.3
Magnetic failure
$0
$0
0.5
$20,000 Not awarded contract
-$50,000
-$50,000
Optimal Strategy
After the rollback method has determined certainty equivalents for each node, the optimal
strategy can be identified by working forward through the tree. At the initial decision
node, the $20,000 rollback value equals the rollback value of the "Prepare proposal"
branch, indicating the alternative that should be chosen. DriveTek will either be awarded
the contract or not; there is a subsequent decision only if DriveTek obtains the contract.
(In a more complicated decision tree, the optimal strategy must include decision choices
for all decision nodes that might be encountered.) At the decision node following
"Awarded contract," the $90,000 rollback value equals the rollback value of the "Try
electronic method" branch, indicating the alternative that should be chosen.
Subsequently, if the electronic method fails, DriveTek must use the mechanical method
to satisfy the contract.
Cell B26 has the formula =IF(A27=E20,1,IF(A27=E34,2)) which displays 1, indicating
that the first branch is the optimal choice. Thus, the initial choice is to prepare the
proposal. Cell J11 has the formula =IF(I12=M4,1,IF(I12=M11,2,IF(I12=M21,3))) which
displays 2, indicating that the second branch (numbered 1, 2, and 3, from top to bottom)
is the optimal choice. If awarded the contract, DriveTek should try the electronic method.
The pairs of rollback values at the relevant decision nodes ($20,000 and $90,000) and the
preferred decision branches are shown below in bold.
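The nested IF formulas compare the rollback value at the decision node with the rollback value of each branch in turn and report the number of the first match. A minimal Python sketch of that logic follows; the function name is an assumption, and the branch values are the rollback values quoted for the DriveTek tree.

```python
def optimal_branch(node_value, branch_values):
    """Mimic the nested IF: return the 1-based number of the first branch
    whose rollback value equals the decision node's rollback value."""
    for number, value in enumerate(branch_values, start=1):
        if value == node_value:
            return number
    return None  # no match; the Excel formula would display FALSE

# Cell B26: compares A27 with E20 (Prepare proposal) and E34 (Don't).
first_choice = optimal_branch(20_000, [20_000, 0])
# Cell J11: compares I12 with M4 (mechanical), M11 (electronic), M21 (magnetic).
second_choice = optimal_branch(90_000, [80_000, 90_000, 84_000])
print(first_choice, second_choice)  # 1 2
```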
Figure 12.38
A B C D E F G H I J K L M N O P Q R S
1
2 Use mechanical method
3 $80,000
4 $80,000
5
6 0.5
7 Electronic success
8 $150,000
9 0.5 Try electronic method $150,000
10 Awarded contract
11 2 $90,000 0.5
12 $90,000 Electronic failure
13 $30,000
14 $30,000
15
16 0.7
17 Magnetic success
18 Prepare proposal $120,000
19 Try magnetic method $120,000
20 $20,000
21 $84,000 0.3
22 Magnetic failure
23 $0
24 $0
25
26 1 0.5
27 $20,000 Not awarded contract
28 -$50,000
29 -$50,000
30
31
32 Don't prepare proposal
33 $0
34 $0
Taking into account event branches with subsequent terminal nodes, all branches and
terminal values associated with the optimal risk neutral strategy are shown below.
Figure 12.39
0.5
Electronic success
$150,000
0.5 Try electronic method
Awarded contract
0.5
Electronic failure
$30,000
0.7
Magnetic success
Prepare proposal $120,000
Try magnetic method
0.3
Magnetic failure
$0
0.5
Not awarded contract
-$50,000
The rollback method has identified strategy 2 (Electronic) as optimal. The rollback value
on the initial branch of the optimal strategy is $20,000, which must be the same as the
expected value for the payoff distribution of strategy 2. Some of the intermediate
calculations for the rollback method differ from the calculations for the payoff
distributions, but both approaches identify the same optimal strategy with the same initial
expected value. For decision trees with a large number of strategies, the rollback method
is more efficient.
(assuming that Newox has actually found gas). Alternatively, if gas is found, Newox can
decide to keep the well instead of selling to West Gas; in this case Newox manages the
gas production and takes its chances by selling the gas on the open market.
At the current price of natural gas, if gas is found it would have a value of $150,000 on
the open market. However, there is a possibility that the price of gas will rise to double its
current value, in which case a successful well will be worth $300,000.
The company's engineers feel that the chance of finding gas is 30 percent; their staff
economist thinks there is a 60 percent chance that the price of gas will double.
The following decision tree is based on information about cash flows and probability
assignments.
Figure 12.40
0.6
Success
+$5,000
National
0.4
Failure
-$1,000
0.93
Success
+$4,965
National
0.07
0.58 Failure
Favorable -$1,035
Don't
Brandon -$35
Test
0.14
Success
+$4,965
National
0.86
0.42 Failure
Unfavorable -$1,035
Don't
-$35
Don't
$0
12.7 Brandon Decision Tree Problem 167
Chapter 13 Sensitivity Analysis for Decision Trees
13.1 ONE-VARIABLE SENSITIVITY ANALYSIS
One-Variable Sensitivity Analysis using an Excel data table
1. Construct a decision tree model or financial planning model.
2. Identify the model input cell (H1) and model output cell (A10).
3. Modify the model so that probabilities will always sum to one. (That is, enter the
formula =1-H1 in cell H6.)
Figure 13.2
M N O P
1
2 +$100 =A10
3 0.00
4 0.10
5 0.20
6 0.30
7 0.40
8 0.50
9 0.60
10 0.70
11 0.80
12 0.90
13 1.00
14
8. In the Data Table dialog box, select the Column Input Cell edit box. Type the
model input cell (H1), or point to the model input cell (in which case the edit
box displays $H$1). Click OK.
Figure 13.3
9. The Data Table command substitutes each input value into the model input cell,
recalculates the worksheet, and displays the corresponding model output value
in the table.
10. Optional: Change the formula in cell O2 to
=CHOOSE(B9,"Introduce","Don't").
13.2 Two-Variable Sensitivity Analysis 173
Figure 13.4
M N O P
1 P(High Sales) Exp. Value
2
3 0.00 0
4 0.10 0
5 0.20 0
6 0.30 0
7 0.40 0
8 0.50 50
9 0.60 100
10 0.70 150
11 0.80 200
12 0.90 250
13 1.00 300
14
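The substitution that the Data Table command performs can be imitated in Python. The sketch below is hedged: the worksheet model itself is not reproduced in this section, so the payoffs of 300 for High Sales and -200 for Low Sales are assumptions chosen because they reproduce the Exp. Value column of Figure 13.4; the function and constant names are also illustrative.

```python
# Assumed payoffs (not stated in the text): chosen so that the sketch
# reproduces the Exp. Value column of Figure 13.4.
PAYOFF_HIGH, PAYOFF_LOW, PAYOFF_DONT = 300, -200, 0

def model_output(p_high):
    """Rollback value at the initial decision node for a given P(High Sales)."""
    p_low = 1 - p_high                       # plays the role of =1-H1 in cell H6
    ev_introduce = p_high * PAYOFF_HIGH + p_low * PAYOFF_LOW
    return max(ev_introduce, PAYOFF_DONT)    # decision node: take the better branch

# Substitute each input value, recalculate, and record the model output.
inputs = [i / 10 for i in range(11)]
table = [(p, round(model_output(p))) for p in inputs]

for p, ev in table:
    print(f"{p:.2f}  {ev}")
```

The output stays at zero until the expected value of introducing the product becomes positive, which is where the optimal choice switches.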
Optional: Activate the Base Case worksheet. From the Edit menu, choose Move Or Copy
Sheet. In the Move Or Copy dialog box, check the box for Create A Copy, and click OK.
Double-click the new worksheet tab and enter Strategy Region Table.
Figure 13.6
U V W X Y Z AA AB AC AD AE AF AG
1 P(Mag OK)
2 Elec 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
3 P(Elec OK) 1.0
4 0.9
5 0.8
6 0.7
7 0.6
8 0.5
9 0.4
10 0.3
11 0.2
12 0.1
13 0.0
From the Data menu, choose Table. In the Table dialog box, type P16 in the Row Input
Cell edit box, type P6 in the Column Input Cell edit box, and click OK.
With cells V2:AG13 still selected, click the Align Right button.
Figure 13.7
U V W X Y Z AA AB AC AD AE AF AG
1 P(Mag OK)
2 Elec 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
3 P(Elec OK) 1.0 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
4 0.9 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
5 0.8 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
6 0.7 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag
7 0.6 Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag
8 0.5 Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag Mag
9 0.4 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
10 0.3 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
11 0.2 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
12 0.1 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
13 0.0 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
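The strategy region table can be cross-checked without a data table. The following Python sketch assumes the DriveTek terminal values used earlier ($80,000 for the mechanical method; $150,000 or $30,000 for electronic success or failure; $120,000 or $0 for magnetic success or failure); the function name and labels are illustrative.

```python
def best_method(p_elec_ok, p_mag_ok):
    """Return the method with the highest expected terminal value."""
    ev = {
        "Mech": 80_000,
        "Elec": p_elec_ok * 150_000 + (1 - p_elec_ok) * 30_000,
        "Mag": p_mag_ok * 120_000 + (1 - p_mag_ok) * 0,
    }
    return max(ev, key=ev.get)

probs = [i / 10 for i in range(11)]

# Rows: P(Elec OK) from 1.0 down to 0.0; columns: P(Mag OK) from 0.0 to 1.0.
print("      " + " ".join(f"{p:4.1f}" for p in probs))
for p_elec in reversed(probs):
    row = " ".join(f"{best_method(p_elec, p_mag):>4}" for p_mag in probs)
    print(f"{p_elec:4.1f}  {row}")
```

Each cell is recomputed independently here, whereas the two-variable data table substitutes each pair of values into cells P6 and P16 and recalculates the worksheet.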
Embellishments
Select cells U1:AG13, and click the Copy button. Select cell AI1, right-click, and from
the shortcut menu choose Paste Special. In the Paste Special dialog box, click the Values
option button, and click OK. Right-click again, choose Paste Special, click the Formats
option button, and click OK.
Select columns AJ:AU. Choose Format | Column | Width, type 5, and click OK.
Select cell AJ2, right-click, and from the shortcut menu choose Clear Contents. Select
cells AK2:AU2, move the cursor near the border of the selection until it becomes an
arrow, click and drag the selection down to cells AK14:AU14. Similarly, select cell AK1
and move its contents down to cell AP15. Also, move the contents of cell AI3 to cell AI8.
Select cell AN1, and enter Strategy Region Table.
176 Chapter 13 Sensitivity Analysis for Decision Trees
Figure 13.8
AI AJ AK AL AM AN AO AP AQ AR AS AT AU
1 Strategy Region Table
2
3 1.0 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
4 0.9 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
5 0.8 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
6 0.7 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag
7 0.6 Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag
8 P(Elec OK) 0.5 Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag Mag
9 0.4 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
10 0.3 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
11 0.2 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
12 0.1 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
13 0.0 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
14 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
15 P(Mag OK)
Apply borders to appropriate ranges and cells to show the strategy regions. Apply
shading to cell AR8 to show the base case strategy.
Figure 13.9
AI AJ AK AL AM AN AO AP AQ AR AS AT AU
1 Strategy Region Table
2
3 1.0 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
4 0.9 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
5 0.8 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec
6 0.7 Elec Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag
7 0.6 Elec Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag
8 P(Elec OK) 0.5 Elec Elec Elec Elec Elec Elec Elec Elec Mag Mag Mag
9 0.4 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
10 0.3 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
11 0.2 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
12 0.1 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
13 0.0 Mech Mech Mech Mech Mech Mech Mech Mag Mag Mag Mag
14 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
15 P(Mag OK)
Keep same relative likelihood (base case) for the other probabilities.
Figure 13.10
A B C D E F G H I J K L M N O
1 0.2 P(Low Sales) OptStrat
2 High Sales
3 +$1,500 1.00 Don't
4 +$2,500 +$1,500 0.90 Don't
5 0.80 Don't
6 0.5 0.70 Don't
7 Intro Medium Sales 0.60 Intro
8 +$500 0.50 Intro
9 -$1,000 +$400 +$1,500 +$500 0.40 Intro
10 Base -> 0.30 Intro
11 0.3 0.20 Intro
12 Low Sales 0.10 Intro
13 1 -$500 0.00 Intro
14 +$400 +$500 -$500
15
16
17 Don't
18 $0
19 $0 $0
Figure 13.11
A B C D E F G H I J K L M N O
1 =(0.2/(0.2+0.5))*(1-H11) P(Low Sales) OptStrat
2 High Sales =CHOOSE(B13,"Intro","Don't")
3 1.00
4 0.90
5 0.80
6 =(0.5/(0.2+0.5))*(1-H11) 0.70
7 Intro Medium Sales 0.60
8 0.50
9 0.40
10 Base -> 0.30
11 0.3 0.20
12 Low Sales 0.10
13 0.00
14
15
16
17 Don't
18
19
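The proportional adjustment in the formulas of Figure 13.11 can be sketched in Python. The base-case probabilities and terminal values are read from Figure 13.10; the function names are assumptions made for illustration.

```python
BASE = {"High": 0.2, "Medium": 0.5, "Low": 0.3}        # base-case probabilities
PAYOFF = {"High": 1_500, "Medium": 500, "Low": -500}   # terminal values for Intro

def probabilities(p_low):
    """Vary P(Low Sales); split the remainder in the base-case ratio 0.2 : 0.5."""
    other = BASE["High"] + BASE["Medium"]
    return {
        "High": BASE["High"] / other * (1 - p_low),      # =(0.2/(0.2+0.5))*(1-H11)
        "Medium": BASE["Medium"] / other * (1 - p_low),  # =(0.5/(0.2+0.5))*(1-H11)
        "Low": p_low,
    }

def optimal_strategy(p_low):
    probs = probabilities(p_low)
    ev_intro = sum(probs[e] * PAYOFF[e] for e in PAYOFF)
    return "Intro" if ev_intro > 0 else "Don't"          # Don't has value $0

for i in range(10, -1, -1):
    p = i / 10
    print(f"{p:.2f}  {optimal_strategy(p)}")
```

Keeping the High-to-Medium ratio fixed means the three probabilities always sum to one, whatever value is substituted for P(Low Sales).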
Developing attributes for these three objectives turns out to be relatively straightforward.
Disposable income can be measured directly by calculating monthly take-home pay
minus average monthly rent (being careful to include utilities) for an appropriate
apartment. The second attribute is annual snowfall. For the third attribute, Robin has
located a magazine survey of large cities that scores those cities as places for single
professionals to live. Although the survey is not perfect from Robin's point of view, it
does capture the main elements of her concern about the quality of the singles community
and available activities. Also all three of the cities under consideration are included in the
survey.
Here are descriptions of the three job offers:
1 MPR Manufacturing in Flagstaff, Arizona. Disposable income estimate: $1600
per month. Snowfall range: 150 to 320 cm per year. Magazine score: 50 (out of
100).
2 Madison Publishing in St. Paul, Minnesota. Disposable income estimate: $1300
to $1500 per month. (The uncertainty here arises because Robin knows there is a
wide variety in apartment rental prices and will not know what is appropriate
and available until spending some time in the city.) Snowfall range: 100 to 400
cm per year. Magazine score: 75.
3 Pandemonium Pizza in San Francisco, California. Disposable income estimate:
$1200 per month. Snowfall range: negligible. Magazine score: 95.
Robin has created a decision tree to represent the situation. The uncertainties about
snowfall and disposable income are represented by the chance nodes that Robin has
included in the tree. The ratings in the consequence matrix are such that the worst
consequence has a rating of zero points and the best has 100.
Ratings in the consequence matrix (three attribute values at each endpoint of the decision
tree) are proportional scores, corresponding to linear individual utility over the range of
possible values for each attribute.
After considering the situation, Robin concludes that the quality of the city is most
important, the amount of snowfall is next, and the third is income. (Income is important,
but the variation between $1200 and $1600 is not enough to make much difference to
Robin.) Furthermore, Robin concludes that the weight of the magazine rating in the
consequence matrix should be 1.5 times the weight for the snowfall rating and three times
as much as the weight for the income rating. This information is used to calculate the
weights for the three attributes and to calculate overall scores for each of the endpoints in
the decision tree.
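The weight calculation follows from the stated ratios once the weights are required to sum to one. The Python sketch below works through that arithmetic; normalizing to a sum of one is an assumption (the usual convention for additive scores), and the endpoint ratings in the usage lines are hypothetical numbers, not values from Robin's consequence matrix.

```python
from fractions import Fraction

# Ratios stated in the text: magazine = 1.5 x snowfall = 3 x income.
relative = {
    "income": Fraction(1),
    "snowfall": Fraction(2),    # 3 / 1.5
    "magazine": Fraction(3),
}
total = sum(relative.values())
weights = {name: r / total for name, r in relative.items()}

def overall_score(ratings):
    """Weighted sum of the 0-100 attribute ratings at one endpoint."""
    return sum(weights[name] * ratings[name] for name in weights)

print(weights)   # income 1/6, snowfall 1/3, magazine 1/2

# Hypothetical ratings for one endpoint, for illustration only.
example = overall_score({"income": 60, "snowfall": 30, "magazine": 90})
print(example)   # 65
```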
13.4 Robin Pinelli's Sensitivity Analysis 179
0.3
Introduce Product Medium Sales
$100,000
-$300,000 $400,000
0.2
Low Sales
1 -$200,000
$100,000
Don't Introduce
$0
$0
0.3
Introduce Product Medium Sales
$100,000
$190,000
0.2
Low Sales
1 -$200,000
$190,000
Don't Introduce
$0
The two figures above show what is called the prior problem, i.e., the decision problem
under uncertainty before obtaining any additional information.
14.2 Expected Value of Perfect Information 183
Before you get a perfect prediction, you are uncertain about what that prediction will be.
If you originally think the probability of High Sales is 0.5, then you should also think the
probability is 0.5 that a perfect prediction will tell you that sales will be high.
After you get a prediction of "High Sales," the probability of actually having high sales is
1.0.
184 Chapter 14 Value of Information in Decision Trees
0.0
Introduce Product Medium Sales
$100,000
$400,000
0.5 0.0
"High Sales" Low Sales
1 -$200,000
$400,000
Don't Introduce
$0
0.0
High Sales
$400,000
1.0
Introduce Product Medium Sales
$100,000
$100,000
0.3 0.0
Perfect Prediction "Medium Sales" Low Sales
1 -$200,000
$230,000 $100,000
Don't Introduce
$0
0.0
High Sales
$400,000
0.0
Introduce Product Medium Sales
$100,000
-$200,000
0.2 1.0
"Low Sales" Low Sales
2 -$200,000
$0
Don't Introduce
$0
For a perfect prediction, the information message "Low Sales" is the same as the event
Low Sales, so the detailed structure shown above is not needed.
Introduce Product
0.5 $400,000
High Sales
1
$400,000
Don't Introduce
$0
Introduce Product
0.3 $100,000
Perfect Prediction Medium Sales
1
$230,000 $100,000
Don't Introduce
$0
Introduce Product
0.2 -$200,000
Low Sales
2
$0
Don't Introduce
$0
Figure 14.6 Payoff Table for Prior Problem with Expected Values
                             Alternatives
Probability   Event          Introduce    Don't
0.5           High Sales     $400,000     $0
0.3           Medium Sales   $100,000     $0
0.2           Low Sales      -$200,000    $0
For each row in the body of the payoff table, if you receive a perfect prediction that the
event in that row will occur, which alternative would you choose and what would your
payoff be?
Before you receive the prediction, you don't know which of the payoffs you will receive
(either $400,000 or $100,000 or $0), so you summarize the payoff distribution using
expected value, EVPP = 0.5*400,000 + 0.3*100,000 + 0.2*0 = $230,000.
In the example, the improvements associated with a perfect prediction of high, medium,
and low are $0, $0, and $200,000, respectively, with probabilities 0.5, 0.3, 0.2.
EVPI = Expected Improvement = 0.5*0 + 0.3*0 + 0.2*200,000 = $40,000
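Both routes to EVPI, the difference between EVPP and the prior expected value and the expected improvement, can be checked with a short Python sketch. The payoffs and probabilities are those of Figure 14.6; the variable and function names are illustrative assumptions.

```python
probs = {"High": 0.5, "Medium": 0.3, "Low": 0.2}
payoffs = {  # payoff table for the prior problem
    "Introduce": {"High": 400_000, "Medium": 100_000, "Low": -200_000},
    "Don't":     {"High": 0, "Medium": 0, "Low": 0},
}

def ev(alternative):
    return sum(probs[e] * payoffs[alternative][e] for e in probs)

# Prior problem: choose the alternative before the event is known.
prior_choice = max(payoffs, key=ev)
ev_prior = ev(prior_choice)

# Perfect prediction: choose the best alternative separately for each event.
best_by_event = {e: max(payoffs[a][e] for a in payoffs) for e in probs}
evpp = sum(probs[e] * best_by_event[e] for e in probs)

# EVPI two ways: difference of expected values, and expected improvement.
evpi = evpp - ev_prior
improvement = sum(
    probs[e] * (best_by_event[e] - payoffs[prior_choice][e]) for e in probs
)

print(f"EV prior ${ev_prior:,.0f}, EVPP ${evpp:,.0f}, EVPI ${evpi:,.0f}")
```

The improvement is nonzero only for Low Sales, where the prediction changes the choice from Introduce to Don't.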
Introduce Product
Low Sales
Success Prediction
Don't Introduce
High Sales
Introduce Product
Low Sales
Market Survey Inconclusive
Don't Introduce
High Sales
Introduce Product
Low Sales
Failure Prediction
Don't Introduce
High Sales
Introduce Product
Low Sales
Don't Survey
Don't Introduce
0.5
Electronic success
+$150,000
0.5 Try electronic method +$150,000
Awarded contract
2 +$90,000 0.5
+$90,000 Electronic failure
+$30,000
+$30,000
0.7
Magnetic success
Prepare proposal +$120,000
Try magnetic method +$120,000
+$20,000
+$84,000 0.3
Magnetic failure
$0
$0
No Additional Information
1 1 0.5
+$20,000 +$20,000 Not awarded contract
-$50,000
-$50,000
0.5
Electronic success
+$150,000
0.5 Try electronic method +$150,000
Awarded contract
3 +$90,000 0.5
+$120,000 Electronic failure
+$30,000
+$30,000
1.0
2 Magnetic success
+$30,500 Prepare proposal +$120,000
Try magnetic method +$120,000
+$35,000
+$120,000 0.0
Magnetic failure
$0
0.7 $0
"Magnetic Success"
1 0.5
+$35,000 Not awarded contract
-$50,000
-$50,000
0.5
Perfect Prediction Electronic success
+$150,000
+$30,500 0.5 Try electronic method +$150,000
Awarded contract
2 +$90,000 0.5
+$90,000 Electronic failure
+$30,000
+$30,000
0.0
Magnetic success
Prepare proposal +$120,000
Try magnetic method +$120,000
+$20,000
$0 1.0
Magnetic failure
$0
0.3 $0
"Magnetic Failure"
1 0.5
+$20,000 Not awarded contract
-$50,000
-$50,000
[Figures: DriveTek post-contract-award decision trees (section 14.3), with costs shown as negative payoffs. Alternatives: Use mechanical (-120000), Try electronic (-50000 on success, -170000 on failure), and Try magnetic (-80000 on success, -200000 on failure). The trees condition on predictions of "Electronic success" or "Electronic failure" (0.5 each) and on predictions of "Magnetic success" (0.7) or "Magnetic failure" (0.3).]
194 Chapter 14 Value of Information in Decision Trees
Imperfect Information
An engineer at Technometrics has developed a simple test device to evaluate the
component before shipping. For each component, the test device registers positive,
inconclusive, or negative. The test is not perfect, but it is consistent for a particular
component; that is, the test yields the same result for a given component regardless of
how many times it is tested. To calibrate the test device, it was run on a batch of known
good components and on a batch of known defective components. The results in the table
below, based on relative frequencies, show the probability of a test device result,
conditional on the true condition of the component.
For example, of the known defective components tested, sixty percent had a negative test
result.
An analyst at Technometrics suggested using Bayesian revision of probabilities to
combine the assessments about the reliability of the test device (shown above) with the
original assessment of the components' condition (25 percent defectives).
Technometrics uses expected monetary value for making decisions under uncertainty.
What is the maximum (per component) the company should be willing to pay for using
the test device?
Six possible outcomes (most detailed description of result of random process), described
by test result and component condition
Figure 15.6 Joint Frequency Table with Row and Column Totals
                  Component Condition
Test Result       Good    Defective    Total
Positive           210        10        220
Inconclusive        60        30         90
Negative            30        60         90
Total              300       100        400
198 Chapter 15 Value of Imperfect Information
Figure 15.7 Joint Probability Table with Row and Column Totals
                  Component Condition
Test Result       Good     Defective    Total
Positive          0.525      0.025      0.550
Inconclusive      0.150      0.075      0.225
Negative          0.075      0.150      0.225
Total             0.750      0.250      1.000
15.1 Technometrics Problem 199
Revision of Probability
      U              V         W         X              Y
 1    Prior          0.75      0.25                     = P(Main)
 2    Likelihood     Good      Bad
 3    Positive       0.7       0.1                      = P(Info | Main)
 4    Inconclusive   0.2       0.3
 5    Negative       0.1       0.6
 6
 7    Joint          Good      Bad       Preposterior
 8    Positive       0.525     0.025     0.550          = P(Info)
 9    Inconclusive   0.150     0.075     0.225
10    Negative       0.075     0.150     0.225
11
12    Posterior      Good      Bad
13    Positive       0.9545    0.0455                   = P(Main | Info)
14    Inconclusive   0.6667    0.3333
15    Negative       0.3333    0.6667
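The revision of probabilities in the worksheet can also be checked outside Excel. The following is a minimal Python sketch, not part of the original worksheet; the function name revise and the dictionary layout are illustrative assumptions. It multiplies each likelihood by its prior to get the joint probabilities, sums across conditions to get the preposterior probabilities, and divides to get the posteriors.

```python
# Bayesian revision of probabilities for the Technometrics test device.
# Priors and likelihoods are the values shown in the worksheet.
prior = {"Good": 0.75, "Bad": 0.25}                      # P(Main)
likelihood = {                                           # P(Info | Main)
    "Positive":     {"Good": 0.7, "Bad": 0.1},
    "Inconclusive": {"Good": 0.2, "Bad": 0.3},
    "Negative":     {"Good": 0.1, "Bad": 0.6},
}


def revise(prior, likelihood):
    """Return (joint, preposterior, posterior) probability tables."""
    # Joint: P(Info and Main) = P(Info | Main) * P(Main)
    joint = {info: {m: likelihood[info][m] * prior[m] for m in prior}
             for info in likelihood}
    # Preposterior: P(Info) = sum of the joint probabilities in each row
    preposterior = {info: sum(row.values()) for info, row in joint.items()}
    # Posterior: P(Main | Info) = P(Info and Main) / P(Info)
    posterior = {info: {m: joint[info][m] / preposterior[info] for m in prior}
                 for info in joint}
    return joint, preposterior, posterior


joint, preposterior, posterior = revise(prior, likelihood)
for info in likelihood:
    print(info, round(preposterior[info], 3),
          {m: round(p, 4) for m, p in posterior[info].items()})
```

The printed preposterior and posterior values should agree with rows 8 through 15 of the worksheet.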
by finding a value on the horizontal axis, scanning up to the plotted curve, and looking
left to the vertical axis to determine the utility.
A typical risk utility function might have the general shape shown below if you draw a
smooth curve approximately through the points.
[Figure: Risk utility function. Vertical axis: Utility U(x) or Expected Utility, 0.0 to 1.0. Horizontal axis: Monetary Value x or Certainty Equivalent, -$50,000 to $150,000.]
Since more value generally means more utility, the utility function is monotonically
non-decreasing (strictly increasing for the functions used here), so its inverse is well
defined. On the plot of the utility function, you locate a utility on the vertical axis,
scan right to the plotted curve, and look down to read the corresponding value.
The concept of a payoff distribution, risk profile, gamble, or lottery is important for
discussing utility functions. A payoff distribution is a set of payoffs, e.g., x1, x2, and x3,
with corresponding probabilities, P(X=x1), P(X=x2), and P(X=x3). For example, a
payoff distribution may be represented in decision tree form as shown below.
16.1 Risk Utility Function 203
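[Figure: Payoff distribution in decision tree form, with one branch for each payoff x1, x2, and x3 and its probability P(X=x1), P(X=x2), and P(X=x3).]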
The fundamental property of a utility function is that the utility of the certainty equivalent
CE of a payoff distribution is equal to the expected utility of the payoffs, i.e.,
U(CE) = P(X=x1)*U(x1) + P(X=x2)*U(x2) + P(X=x3)*U(x3).
It follows that if you compute the expected utility (EU) of a lottery,
EU = P(X=x1)*U(x1) + P(X=x2)*U(x2) + P(X=x3)*U(x3),
the certainty equivalent of the payoff distribution can be determined using the inverse of
the utility function. That is, you locate the expected utility on the vertical axis, scan right
to the plotted curve, and look down to read the corresponding certainty equivalent.
If a utility function has been determined, you can use this fundamental property to
determine the certainty equivalent of any payoff distribution. Calculations for the
Magnetic strategy in the DriveTek problem are shown below. First, using a plot of the
utility function, locate each payoff x on the horizontal axis and determine the
corresponding utility U(x) on the vertical axis. Second, compute the expected utility EU
of the lottery by multiplying each utility by its probability and summing the products.
Third, locate the expected utility on the vertical axis and determine the corresponding
certainty equivalent CE on the horizontal axis.
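[Figure: Graphical calculation of the certainty equivalent for the Magnetic strategy; the value read from the plot is approximately -$8,000 CE.]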
204 Chapter 16 Modeling Attitude Toward Risk
Computed values are displayed with four decimal places, but Excel's 15-digit precision is
used in all calculations. For a decision maker with a risk tolerance parameter of
$100,000, the payoff distribution for the Magnetic strategy has a certainty equivalent of
-$7,676. That is, if the decision maker is facing the payoff distribution shown in A9:B12
in Figure 4, he or she would be willing to pay $7,676 to be relieved of the obligation.
Formulas are shown in Figure 5. To construct the worksheet, enter the text in column A
and the monetary values in column B. To define names, select A2:B4, and choose Insert |
Name | Create. Similarly, select A6:B7, and choose Insert | Name | Create. Then enter the
formulas in B6:B7. Enter formulas in C10 and D10, and copy down. Finally, enter the
EU formula in D13 and the CE formula in D15. The defined names are absolute
references by default.
Figure 6 shows results for the same payoff distribution using a simplified form of the
exponential risk utility function with A = 1 and B = 1. This function could be represented
as U(x) = 1–EXP(–x/RT) with inverse CE = –RT*LN(1–EU). The utility and expected
utility calculations are different, but the certainty equivalent is the same.
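The same calculation can be sketched in Python using the simplified form U(x) = 1–EXP(–x/RT) and its inverse. This is an illustration, not the worksheet itself. The payoffs and probabilities below are an assumption read from the DriveTek tree for the Magnetic strategy: –$50,000 if the contract is not awarded (0.5), and, if it is awarded, +$120,000 on magnetic success (0.7) or $0 on magnetic failure (0.3). The function names are hypothetical.

```python
import math


def utility(x, rt):
    """Simplified exponential utility, U(x) = 1 - exp(-x/RT)."""
    return 1.0 - math.exp(-x / rt)


def certainty_equivalent(payoffs, probs, rt):
    """Compute EU, then invert the utility function: CE = -RT * ln(1 - EU)."""
    eu = sum(p * utility(x, rt) for x, p in zip(payoffs, probs))
    return -rt * math.log(1.0 - eu)


# Assumed payoff distribution for the Magnetic strategy (see the lead-in).
payoffs = [120_000, 0, -50_000]
probs = [0.5 * 0.7, 0.5 * 0.3, 0.5]

ce = certainty_equivalent(payoffs, probs, rt=100_000)
print(round(ce))
```

With a risk tolerance of $100,000, these assumed inputs give a certainty equivalent that rounds to –$7,676, the value reported for the worksheet.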
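[Figure 7: Risk tolerance assessment game. Play: 0.5 chance of Heads (+$Y) and 0.5 chance of Tails (-$Y/2). Don't play: $0.]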
For example, in a personal decision, you may be willing to play the game shown in
Figure 7 with equally likely payoffs of $100 and –$50, but you might not play with
payoffs of $100,000 and –$50,000. As the better payoff increases from $100 to $100,000
(and the corresponding worse payoff drops from –$50 to –$50,000), you reach a value
where you are indifferent between playing the game and receiving $0 for certain. At that
point, the value of the better payoff is an approximation of RT for an exponential risk
utility function describing your risk attitude.
In a business decision for a small company, the company may be willing to play the game
with payoffs of $200,000 and –$100,000 but not with payoffs of $20,000,000 and
–$10,000,000. Somewhere between a better payoff of $200,000 and $20,000,000, the
company would be indifferent between playing the game and not playing, thereby
determining the approximate RT for its business decision.
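[Figure: Assessment lottery. The Certainty Equivalent is set equal to a lottery with a 0.5 chance of the Better Payoff and a 0.5 chance of the Worse Payoff.]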
According to the fundamental property of a risk utility function, the utility of the
certainty equivalent equals the expected utility of the lottery, so the three values are
related as follows.
U(CertEquiv) = 0.5*U(BetterPayoff) + 0.5*U(WorsePayoff)
If you use the general form for an exponential utility function with parameters A, B, and
RT, and if you simplify terms, it follows that RT must satisfy the following equation.
Exp(–CertEquiv/RT) = 0.5*Exp(–BetterPayoff/RT) + 0.5*Exp(–WorsePayoff/RT)
Given the values for CE, Better, and Worse, you could use trial-and-error to find the
value of RT that exactly satisfies the equation. In Excel you can use Goal Seek or Solver
by creating a worksheet like Figure 9.
Enter the text in column A. Enter the assessment lottery values in B2:B4. Enter a
tentative RT value in B6. Select A2:B4, and use Insert | Name | Create; repeat for A6:B6
and A8:B9. Note that the parentheses symbol is not allowed in a defined name, so Excel
changes U(CE) to U_CE and EU(Lottery) to EU_Lottery.
16.4 Exact Risk Tolerance Using Excel 209
Figure 10 shows tentative values for the search. From the Tools menu, choose Goal Seek.
In the Goal Seek dialog box, enter B11, 0, and B6. If you point to cells, the reference
appears in the edit box as an absolute reference, as shown in Figure 11. Click OK.
The Goal Seek Status dialog box shows that a solution has been found. Click OK. The
worksheet appears as shown in Figure 12.
The difference between U(CE) and EU(Lottery) is not exactly zero. If you start at
$250,000, the Goal Seek converges to a difference of –6.2E–05, or –0.000062, which is
closer to zero, resulting in an RT of $243,041.
If extra precision is needed, use Solver. With Solver's default settings, the difference is
2.39E–08 with RT equal to $243,261. If you change the precision from 0.000001 to
0.00000001 or an even smaller value in Solver's Options, the difference will be even
closer to zero.
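The search that Goal Seek and Solver perform can be imitated with a simple bisection. The following Python sketch is an illustration under stated assumptions: it is not the method Excel uses internally, the names residual and solve_rt are hypothetical, and the assessment values (certainty equivalent $0, better payoff $100,000, worse payoff –$50,000) are invented for the example rather than taken from Figure 9.

```python
import math


def residual(rt, ce, better, worse):
    """Left side minus right side of the equation that RT must satisfy."""
    return (math.exp(-ce / rt)
            - 0.5 * math.exp(-better / rt)
            - 0.5 * math.exp(-worse / rt))


def solve_rt(ce, better, worse, lo=1e4, hi=1e7, tol=1e-6):
    """Bisection search for the RT that makes the residual zero.

    Assumes the residual changes sign exactly once between lo and hi.
    """
    f_lo = residual(lo, ce, better, worse)
    f_hi = residual(hi, ce, better, worse)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in [lo, hi]; widen the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid, ce, better, worse)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid    # root lies in the upper half
    return 0.5 * (lo + hi)


# Hypothetical assessment: indifferent between $0 for certain and a
# 50-50 lottery on +$100,000 and -$50,000.
rt = solve_rt(ce=0.0, better=100_000.0, worse=-50_000.0)
print(round(rt))
```

For these assumed inputs the exact RT is about $103,904, roughly 4 percent above the better payoff, which is why the better payoff at the point of indifference serves as an approximation of RT.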
16.5 Exact Risk Tolerance Using RiskTol.xla 211
Figure 16.17
RiskTolerance    CE Process 1    CE Process 2
   $5,000          -$25,597        -$37,262
  $10,000            $3,504        -$10,097
  $15,000           $23,904         $10,897
  $20,000           $37,468         $26,409
  $25,000           $46,811         $38,010
  $30,000           $53,528         $46,998
  $35,000           $58,541         $54,184
  $40,000           $62,404         $60,067
  $45,000           $65,459         $64,972
[Chart: Certainty Equivalent for Process 1 and Process 2.]
[Figure: Utility Function. Vertical axis: Utility or Expected Utility, U(x), 0.0 to 1.0. Horizontal axis: Value or Certainty Equivalent, x, -50000 to 150000.]
[Figure: Risk tolerance assessment game at increasing stakes, from more risk aversion (left) to less risk aversion (right).]
Play, Heads (0.5)   +$Y     +$10   +$100   +$1,000   +$10,000   +$100,000   +$200,000   +$300,000
Play, Tails (0.5)   -$Y/2   -$5    -$50    -$500     -$5,000    -$50,000    -$100,000   -$150,000
Don't play          $0      $0     $0      $0        $0         $0          $0          $0
Part 4 reviews basic concepts of data analysis and uses multiple regression to model
relationships for both cross-sectional and time series data.
The spreadsheet analysis uses Excel's standard Analysis ToolPak. Several chapters
include step-by-step instructions for descriptive statistics, histograms, and multiple
regression.
Categorical Measure
also called qualitative measure
assign a category level to each object of analysis
Nominal Measure: simple classification, "assign a name"
Ordinal Measure: ranked categories, "assign an ordered classification"
Numerical Measure
also called quantitative measure
assign a numerical value to each object of analysis
Interval Measure: rankings and numerical differences are meaningful
222 Chapter 17 Introduction to Data Analysis
Appropriate summary measures for shape are Excel's SKEW worksheet function and
Pearson's coefficient of skewness.
Distribution Shapes
In a distribution with positive skew, the mean is greater than the median.
In a distribution with negative skew, the mean is less than the median.
In a bimodal distribution, there is often a distinguishing characteristic for the two groups
of data that have been combined into a single distribution.
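The relationship between skew and the mean and median can be illustrated with a short Python sketch. The data values are hypothetical. The function assumes that Pearson's coefficient of skewness refers to the common median-based form, 3*(mean – median)/s, which is an assumption about the intended definition.

```python
import statistics


def pearson_skewness(values):
    """Median-based Pearson coefficient of skewness: 3*(mean - median)/s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return 3 * (mean - median) / s


# Hypothetical data with a few extreme values in the positive direction.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 10, 18]
print(statistics.mean(right_skewed), statistics.median(right_skewed))
print(pearson_skewness(right_skewed))
```

Here the mean exceeds the median, so the coefficient is positive. Reversing the sign of every value gives a negatively skewed data set and a negative coefficient.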
Chapter 18 Univariate Numerical Data
Excel includes several analysis tools useful for summarizing single-variable data. The
Descriptive Statistics analysis tool provides measures of central tendency, variability, and
skewness. The Histogram analysis tool provides a frequency distribution table,
cumulative frequencies, and the histogram column chart.
These tools are appropriate for data without any time dimension. If the data were
collected over time, first examine a time sequence plot of the data to detect patterns. If
the time sequence plot appears random, then the univariate tools may be used to
summarize the data.
If the Data Analysis command doesn't appear on the Tools menu, choose the Add-Ins
command from the Tools menu; in the Add-Ins Available list box, check the box next to
Analysis Tools. If Analysis Tools doesn't appear in the Add-Ins Available list box, you
may need to add the Analysis ToolPak through a custom installation using the Microsoft
Excel Setup program.
7. Output Range: Click the option button, click the adjacent edit box, and specify
a reference for the upper-left cell of the range where the descriptive statistics
output should appear, either by typing C1 or by clicking on cell C1 (in which
case $C$1 appears as the output range as shown in this example). Alternatively,
you can choose to send the output to a new sheet in the current workbook or to a
new sheet in a new workbook.
8. Summary statistics: This feature is the primary reason for using the Descriptive
Statistics analysis tool, so it should be selected. The summary statistics require
two columns in the output range for each data set.
9. Confidence Level for Mean: Select this checkbox to see the half-width of a
confidence interval for the mean, and type a number in the % edit box for the
desired confidence level. This example requests the half-width for a 90%
confidence interval.
10. Kth Largest: Select this checkbox if you want to know the kth largest value in
the data set, and type a number for k in the Kth Largest edit box. This example
requests the fourth largest value.
11. Kth Smallest: Select this checkbox to get the kth smallest value in the data set
and type a number for k in the Kth Smallest edit box. This example requests the
fourth smallest value.
12. When finished, click OK. Excel computes the descriptive statistics and puts the
results in the output range.
view uses different column widths and formatting for the entire worksheet. To display
only specific formulas, put a single quotation mark before the equal sign so that Excel
displays the cell contents as text, as shown in column F in Figure 18.3.
misleading measure of variation because it is based only on the two most extreme values,
which may not be representative.
The sample standard deviation (9.214 mpg) is the most widely used measure of variation
in data analysis. For each value in the data set the deviation between the value and the
mean is computed. Each deviation is squared, and the squared deviations are summed.
The sum of the squared deviations is divided by the count minus one (that is, n – 1),
obtaining the sample variance (84.890). The standard deviation equals the square root of
the variance.
The standard deviation has the same units or dimensions as the original values: mpg, in
this example. The variance is expressed in squared units: squared miles per gallon. The
standard deviation and variance reported in the output table are the sample standard
deviation and sample variance, computed using n – 1 in the denominator. To determine
the population standard deviation and population variance, computed using n in the
denominator, use the STDEVP and VARP worksheet functions.
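The steps in the variance calculation can be written out directly. The following Python sketch uses hypothetical mileage values, not the 17 cars in the example, and the function name variance is illustrative. The comments pair each result with the Excel function that uses the same denominator.

```python
import math


def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    return ss / (n - 1 if sample else n)


# Hypothetical mileage values in mpg.
mpg = [12, 15, 18, 21, 24, 27, 33]

s2 = variance(mpg)                     # same denominator as Excel's VAR
s = math.sqrt(s2)                      # same denominator as Excel's STDEV
sigma2 = variance(mpg, sample=False)   # same denominator as Excel's VARP
print(s2, s, sigma2)
```

The population variance always equals the sample variance multiplied by (n – 1)/n, so the two measures differ noticeably only for small data sets.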
The largest(4) and smallest(4) values in the output table are the fourth largest (33 mpg)
and fourth smallest (16 mpg) gas mileage values. In this data set of 17 values, they
correspond approximately to the 75th percentile (third quartile) and the 25th percentile
(first quartile). Interpolated values for the third and first quartiles are obtained
using the QUARTILE worksheet function, =QUARTILE(A2:A18,3) and
=QUARTILE(A2:A18,1), respectively. To obtain ranks and percentiles for all values
in the data set, use the Rank and Percentile analysis tool.
The standard error of the mean (2.235 mpg) equals the sample standard deviation
divided by the square root of the sample size. The standard error is a measure of
uncertainty about the mean, and it is used for statistical inference (confidence intervals
and hypothesis tests).
The value shown for the confidence level (90.0%) (3.901 mpg) is the half-width of a 90%
confidence interval for the mean. The specified confidence level, 90% in this example,
corresponds to t = 1.746 for the t distribution with 10% in the sum of two tails and n – 1
= 17 – 1 = 16 degrees of freedom. The half-width of a confidence interval is t times the
standard error—that is, 1.7459 times 2.2346 mpg, or 3.901 mpg.
A 90% confidence interval for the mean extends from the mean minus the half-width to
the mean plus the half-width, that is, from 23.471 – 3.901 to 23.471 + 3.901, or
approximately 19.6 to 27.4 mpg. Therefore, if we think of these 17 cars as a random
sample from a larger population, we can say with 90% confidence that the unknown
population mean is between 19.6 and 27.4 mpg.
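The arithmetic behind the standard error, the half-width, and the interval can be reproduced from the summary values in the text. This Python sketch takes the t value 1.7459 as given in the text rather than computing it. Because it starts from the rounded standard deviation, the last digit may differ slightly from Excel's output.

```python
import math

# Summary values reported in the text for the 17 cars.
n = 17
mean = 23.471        # mpg
s = 9.214            # sample standard deviation, mpg
t = 1.7459           # t value for 90% confidence with 16 degrees of freedom

std_error = s / math.sqrt(n)       # standard error of the mean
half_width = t * std_error         # half-width of the confidence interval
lower, upper = mean - half_width, mean + half_width

print(round(std_error, 3), round(half_width, 3), round(lower, 1), round(upper, 1))
```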
Kurtosis measures the degree of peakedness in symmetric distributions, relative to the
normal distribution. If a symmetric distribution has more values in the tails than a
corresponding normal distribution (typically with a sharper peak), the kurtosis measure
is positive. If the distribution has fewer values in the tails (typically with a flatter
shape), the kurtosis measure is negative. In this example, the distribution is
approximately symmetric with negative kurtosis (–0.547). (Excel computes the kurtosis
value using the fourth power of deviations from the mean. For details, search Help for
"KURT function.")
Skewness refers to the lack of symmetry in a distribution. If there are a few extreme
values in the positive direction, we say the distribution is positively skewed, or skewed to
the right. If there are a few extreme values in the negative direction, the distribution is
negatively skewed, or skewed to the left. Otherwise, the distribution is symmetric or
approximately symmetric. In this example, the measure is positive (+0.361). (Excel
computes the skewness value using the third power of deviations from the mean. For
details, search Help for "SKEW function.")
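For readers who want to see the third-power and fourth-power calculations, the following Python sketch implements the sample skewness and excess kurtosis formulas that, to our understanding, correspond to the SKEW and KURT worksheet functions. Treat that correspondence as an assumption and confirm it against Excel's Help. The function name and the data are hypothetical.

```python
import math


def skew_and_kurtosis(values):
    """Sample skewness and excess kurtosis (requires at least 4 values)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    # Sums of third and fourth powers of standardized deviations
    z3 = sum(((v - mean) / s) ** 3 for v in values)
    z4 = sum(((v - mean) / s) ** 4 for v in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


# Hypothetical symmetric data: skewness is zero and kurtosis is negative.
skew, kurt = skew_and_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)
```

A perfectly symmetric data set gives a skewness of zero, and evenly spread values such as these give a negative kurtosis.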
4. Input Range: Enter the reference for the range of cells containing the data
(A1:A18), including the label.
5. Bin Range: Enter the reference for the range of cells containing the values that
separate the intervals (H1:H10), including the label. These interval break points,
or bins, must be in ascending order.
6. Labels: Check this box to indicate that labels have been included in the
references for the input range and bin range.
7. Output Range: Enter the reference for the upper-left cell of the range where
you want the output table to appear (I1). The combined table and chart output
requires approximately ten columns.
8. Pareto: To obtain a standard frequency distribution and chart, clear the Pareto
checkbox. If this box is checked, the intervals are sorted according to
frequencies before preparing the chart. (In this example the box has been
cleared.)
9. Cumulative Percentage: Check this box for cumulative frequencies in addition
to the individual frequencies for each interval. (In this example the box has been
cleared.)
10. Chart Output: Check this box to obtain a histogram chart in addition to the
frequency distribution table on the worksheet. (In this example the box has been
checked.)
11. After you provide inputs to the dialog box, click OK. (If you receive the error
message "Cannot add chart to a shared workbook," click the OK button. Then
click New Workbook under Output in the Histogram dialog box. Use the Edit |
Move or Copy Sheet command to copy the results to the original workbook.)
Excel puts the frequency distribution and histogram on the worksheet. As shown in
Figure 18.6, the output table in columns I and J includes the original bins specified. These
bins are actually the upper limit for each interval; that is, the bins are actually bin
boundaries.
For example, the interval associated with bin value 15 (cell I4) includes mileage values
strictly greater than 10 (the previous bin value) and less than or equal to 15. There are
two such mileage values in this data set: 12 mpg and 15 mpg. Thus, for bin value 15 the
frequency is 2 (cell J4).
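The counting rule, in which each bin is an inclusive upper limit, can be sketched in Python as follows. The bin values and mileage values here are hypothetical, chosen so that 12 and 15 both fall in the interval for bin value 15, and the function name is illustrative.

```python
from bisect import bisect_left


def bin_frequencies(values, bins):
    """Count values per interval, where each bin is an inclusive upper limit.

    A value x is counted under the first bin b with x <= b. Values above the
    last bin go into a final overflow count (Excel labels that row "More").
    """
    counts = [0] * (len(bins) + 1)
    for x in values:
        counts[bisect_left(bins, x)] += 1
    return counts


bins = [10, 15, 20, 25, 30, 35, 40, 45, 50]   # ascending, as the tool requires
mileage = [12, 15, 16, 20, 21, 33, 51]        # hypothetical values
counts = bin_frequencies(mileage, bins)

for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Using bisect_left rather than bisect_right is what places a value equal to a bin boundary, such as 15, in the lower interval.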
Histogram Embellishments
To make the chart more like a traditional histogram and easier to interpret, make the
following changes.
1. Legend: Because only one series is shown on the chart, a legend isn't needed.
Click on the legend ("Frequency" on the right side of the chart) and press the
Delete key.
2. Plot area pattern: The plot area is the rectangular area bounded by the x and y
axes. Double-click the plot area (above the bars); in the Format Plot Area dialog
box, change Border to None and change Area to None. Click OK.
3. Y-axis labels: If you resize the chart vertically, intermediate values (0.5, 1.5,...)
may appear on the y axis, but frequencies must be integer values. Double-click
the y-axis (value axis); in the Format Axis dialog box on the Scale tab, set the
Major Unit and Minor Unit values to 1. Click OK.
4. Bar width: In traditional histograms, the bars are adjacent to each other, not
separated. Double-click one of the bars; in the Format Data Series dialog box on
the Options tab, change the gap width from 150% to 0%. Click OK.
5. X-axis labels: Double-click the x-axis (category axis); in the Format Axis dialog
box on the Alignment tab, double-click the Degrees edit box and type 0 (zero).
With this setting, the x-axis labels will be horizontal even if the chart is resized.
Click OK.
6. Chart title: Click on Histogram (chart title). Type Distribution of Gas
Mileage, hold down Alt and press Enter, type for 17 cars, and press Enter.
Click the Bold button to change from bold to normal type.
7. Y-axis title: Click on Frequency (value axis title). Click the Bold button to
change from bold to normal type.
8. X-axis title: Click on Bin. Enter Interval Maximum, in miles per gallon. Click
the Bold button to change from bold to normal type. Excel puts the x-axis values
at the center of each interval, not at the marks that separate the intervals. This
title makes it clear to the reader that these values are the maximum ones for each
interval.
9. Bar color: Columns in a dark color may print as black with no gaps, in which
case it is difficult to see the boundaries. Click on the center of one of the
columns to select the data series. Click the right mouse button, choose Format
Data Series, and click the Patterns tab. In the dialog box, leave Border at
Automatic and change Area from Automatic to None. Click OK.
To move the chart, click just inside the chart's outer border (chart area) and drag the chart
to the desired location. To resize the chart, first click the chart area and then click and
drag one of the eight handles.
When you first create a chart, Excel uses automatic scaling for the font sizes of the chart
title, the axis titles, and the axis labels. When you resize the chart, the font sizes change
and the number of axis labels displayed may change. For example, if the axis labels on
the horizontal axis have a large font size and you resize the chart to be narrow, perhaps
only every other axis label will be displayed.
One approach to chart and font sizing is to first decide the size of the chart. For this
example the chart is 6 columns wide using the standard column width of 8.43 and 14
rows high. The font size of the three titles is Arial 10, and the font size of the two axes is
Arial 8 so that all axis labels are displayed. The resulting histogram chart is shown in
Figure 18.7.
18.3 Better Histograms Using Excel 237
[Figure: Histogram. Vertical axis: Frequency. Horizontal axis: 0 to 50 in steps of 5.]
A histogram is usually shown in Excel as a Column chart type (vertical bars). The labels
of a Column chart are aligned under each bar as shown in Figure 18.7, and there is no
Excel feature for changing the alignment. A better histogram has a horizontal axis with
numerical labels aligned under the tick marks between the bars as shown in Figure 18.8.
To download a free Excel add-in for automatically creating a better histogram from data
on a worksheet or to view step-by-step instructions for creating a better histogram using
Excel's built-in features, go to the Better Histograms page at www.treeplan.com.
EXERCISES
Exercise 18.1 Construct a frequency distribution and histogram for the following selling
prices of 15 properties:
$26,000 $38,000 $43,600
31,000 39,600 44,800
37,400 31,200 40,600
34,800 37,200 41,800
39,200 38,400 45,200
Use intervals $5,000 wide starting at $25,000. Comment on the symmetry or skewness of
the selling prices.
Exercise 18.2 Determine measures of central tendency and dispersion for the selling
prices of the 15 properties in Exercise 18.1. Which measure(s) of central tendency should
be used to describe a typical selling price? What is the mode or modal interval?
Exercise 18.3 To verify the symmetry or skewness observed in Exercise 18.1, calculate
Pearson's coefficient of skewness.
Chapter 19 Bivariate Numerical Data
A scatterplot is useful for examining the relationship between two numerical variables. In
Excel this kind of chart is called an XY (scatter) chart; other names include scatter
diagram, scattergram, and XY plot. Such a graphical display is often the first step before
fitting a curve to the data using a regression model.
Example 19.1 (Adapted from Cryer, p. 139) The data shown in Figure 19.1 were
collected in a study of real estate property valuation. The 15 properties were sold in a
particular calendar year in a particular neighborhood in a city stratified into a number of
neighborhoods. Although the data displayed are from a single year, similar data are
available for each neighborhood for a number of years. Cryer's RealProp.dat file contains
4 variables for 60 observations; these 15 properties are the first and every fourth
observation.
Because we expect that selling price might depend on square feet of living space, selling
price becomes the dependent variable and square feet the explanatory variable. Some call
the dependent variable the response variable or the y variable. Similarly, other terms for
the explanatory variable are predictor variable, independent variable, or the x variable.
Our initial purpose is to visually examine the relationship between the square feet of
living space and the selling price of the parcels. Then we will calculate two summary
measures, correlation and covariance, using both the analysis tool and functions. Finally,
we will include a third variable, assessed value of the property, and use the analysis tool
to compute pairwise correlations. In subsequent chapters we will fit straight lines and
curves to these same data using regression models.
6. In step 3 (Chart Options) on the Titles tab, select the Chart Title edit box and
type Real Estate Properties. Don't press Enter; use the mouse or Tab key to
move among the edit boxes. Type Living Space, in Sq. Ft. for the value (x) axis
title (the horizontal axis), and Selling Price, in Thousands of Dollars for the
value (y) axis title (the vertical axis).
7. In step 3 (Chart Options) on the Gridlines tab, clear all checkboxes.
8. In step 3 (Chart Options) on the Legends tab, clear the checkbox for Show
Legend. (With only one set of data on the chart, a legend is not needed.) Click
Next.
9. In step 4 (Chart Location), verify that you want to place the chart as an object in
the current worksheet. Click Finish.
The chart is embedded on the worksheet, as shown in Figure 19.1. The property data
show a general positive relationship; more living space is associated with a higher selling
price, on the average. Follow steps 10 through 12 to obtain the embellished scatterplot
shown in Figure 19.2.
10. Change the x-axis to display 400 to 1400 square feet. Select the value (x) axis.
Right-click, choose Format Axis from the shortcut menu, and click the Scale
tab. Type 400 in the Minimum edit box, 1400 in the Maximum edit box, and 200
in the Major Unit edit box. Click OK.
11. Change the y-axis to display 20 to 50 thousands of dollars. Select the value (y)
axis. Right-click, choose Format Axis from the shortcut menu, and click the
Scale tab; type 20, 50, and 10 in the Minimum, Maximum, and Major Unit edit
boxes. Click the Number tab and set Decimal Places to zero. Then click OK.
12. To obtain the appearance shown in Figure 19.2, click just inside the outer border
of the chart to select the chart area. Click and drag the sizing handles so the
chart is approximately 6 standard column widths by 15 rows. Click the chart title
and choose Arial Bold 12 from the formatting toolbar. For each horizontal and
vertical axis and title, click the chart object and choose Arial Regular 10 from
the formatting toolbar. Double-click the y-axis title and change the space after
the comma to a carriage return. Double-click the grey plot area and change the
pattern for both border and area to None. Select the Price data (B2:B16) and
click the Increase Decimal button several times so that three digits
are displayed to the right of the decimal point.
5. Click OK. The output appears in cells D2:F4 as shown in Figure 19.3. (The
discussions of CORREL function and covariance outputs follow.)
The output is a matrix of pairwise correlations. The diagonal values are 1, indicating that
each variable has perfect positive correlation with itself. The value 0.814651 is the
correlation of Price and SqFt. The upper-right section is blank, because its values would
be the same as those in the lower-left section.
The following steps describe how to use Excel's CORREL function to determine the
correlation.
1. Enter CORREL Function in cell D6.
2. Select cell D7. Click the insert Function tool button (icon fx). In the Insert
Function dialog box, select Statistical in the category list box. In the function list
box, select CORREL. Then click OK.
3. To move the CORREL dialog box, click in any open area and drag. Select the
Array1 edit box, and click and drag on the worksheet to select A2:A16. Select
the Array2 edit box, and click and drag to select B2:B16. Do not include the text
labels in row 1 in either selection. Then click OK.
The value of the correlation coefficient appears in cell D7. Alternatively, you could have
entered the formula =CORREL(A2:A16,B2:B16) by typing or by a combination of
typing and pointing. Unlike the static text output of the analysis tool, the worksheet
function is dynamic. If the data values in A2:B16 are changed, the value of the
correlation coefficient in cell D7 will change.
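The calculation behind a correlation coefficient can be sketched in a few lines of Python. The function below computes the Pearson correlation from deviations about the means. The data values are hypothetical, not the 15 properties in Figure 19.1, and the function name correl is borrowed from the worksheet function only for readability.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical living space (sq. ft.) and selling price (thousands of dollars).
sqft = [520, 750, 830, 980, 1100, 1250]
price = [26.0, 31.5, 37.0, 36.5, 41.0, 44.5]

r = correl(sqft, price)
print(round(r, 4))
```

As with the worksheet function, the order of the two arguments does not matter, and a variable correlated with itself gives a value of 1.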
the Array2 edit box, and click and drag to select B2:B16. Do not include the text
labels in row 1 in either selection. Then click OK.
The population covariance value appears in cell D16. Alternatively, you could have
entered the formula =COVAR(A2:A16,B2:B16) by typing or by a combination of typing
and pointing. If the data values in A2:B16 are changed, the population covariance value
in cell D16 will change. The covariance computed by Excel's COVAR function uses n in
the denominator. In this example, n = 15, so 853.2427 = (14/15)*914.1886.
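The relationship between the two denominators can be checked with a short Python sketch. The function and the small data set are hypothetical; only the figures 914.1886, 853.2427, and n = 15 come from the text.

```python
def covariance(xs, ys, population=True):
    """Covariance with n (population) or n - 1 (sample) in the denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n if population else n - 1)

# Hypothetical data, used only to show the (n - 1)/n relationship.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

pop_cov = covariance(xs, ys, population=True)
sample_cov = covariance(xs, ys, population=False)

# The same conversion applied to the figures quoted in the text (n = 15).
book_population_cov = (14 / 15) * 914.1886
print(pop_cov, sample_cov, round(book_population_cov, 4))
```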
3. From the Tools menu, choose Data Analysis. From the Data Analysis dialog
box, select Correlation in the Analysis Tools list box and press OK. The
Correlation dialog box appears as shown in Figure 19.5.
246 Chapter 19 Bivariate Numerical Data
4. In the Input section, specify the location of the data in the Input Range edit box,
including the labels (A1:C16). Verify that the data is grouped in columns and be
sure the Labels box is checked.
5. In the Output Options section, click the Output Range button, click the adjacent
edit box, and specify the upper-left cell where the correlation output will be
located (E3).
6. Click OK. The output appears in cells E3:H6 as shown in Figure 19.4.
The output shows three pairwise correlations. The highest correlation, 0.814651, is
between SqFt and Price. The correlation between Assessed and Price, 0.67537, is smaller,
indicating less of a linear relationship between these two variables. The lowest
correlation, 0.424219, is between SqFt and Assessed.
If we must use a single explanatory variable to predict selling price in a linear regression
model, these correlations suggest that SqFt is a better candidate than Assessed, because
0.814651 is higher than 0.67537. If we can use two explanatory variables to predict
selling price in a multiple regression model, both SqFt and Assessed should be useful,
and there shouldn't be a problem with multicollinearity because the correlation between
these two explanatory variables is only 0.424219.
EXERCISES
Exercise 19.1 (Adapted from Keller, p. 642) An economist wanted to determine how
office vacancy rates depend on average rent. She took a random sample of the monthly
office rents per square foot and the percentage of vacant office space in ten different
cities. The results are shown in the following table.
Arrange the data in appropriate columns and prepare a scatterplot. Does there appear to
be a positive or negative relationship between the two variables?
Exercise 19.4 Compute the correlation coefficient for the data in Exercise 19.3.
Comment on the direction and strength of the linear relationship.
One-Sample Inference
for the Mean
20
This chapter covers the basic methods of statistical inference for the mean of a single
population. These methods are appropriate for a single random sample consisting of
values for a single variable. For example, a random sample of a particular brand of tires
would be used to construct a confidence interval for the average mileage of all tires of
that brand or to test the hypothesis that the average mileage of all tires is at least 40,000
miles.
the alternative hypothesis, HA. Usually, the alternative hypothesis is a statement about
what we are trying to show or prove. For example, to detect if the mean of monthly
accounts is significantly less than $70, the alternative hypothesis is HA: Mean < 70.
The null hypothesis is the opposite of the alternative hypothesis; that is, H0: Mean ≥ 70 or
simply H0: Mean = 70. Using the hypothesis test method, develop the distribution of
sample results that would be expected if the null hypothesis is true. Then compare the
particular sample result with this sampling distribution. If the sample result is one that is
likely to be obtained when the null hypothesis is true, we cannot reject the null
hypothesis, and we cannot conclude that the alternative hypothesis is true. On the other
hand, if the sample result is one that is unlikely to occur when the null hypothesis is true,
reject the null hypothesis and conclude the alternative hypothesis may be true.
null hypothesis. The end result of using this approach is a decision to either reject or not
reject the null hypothesis.
The other way to summarize the results of a hypothesis test is to report a p-value
(probability value, or prob-value). Using this reporting approach, we do not specify a
significance level or make a decision about rejecting the null hypothesis. Instead, we
simply report how likely it is that the observed sample result, or a sample result more
extreme, could be obtained if the null hypothesis is true. In a left-tail or right-tail test, we
report the probability in a single tail; in a two-tail test, we report the probability of
obtaining a difference (between the observed sample mean and the hypothesized
population mean) in either direction. A small p-value is associated with a more extreme
sample result; that is, a sample mean that is significantly different from the hypothesized
population mean.
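The three kinds of p-value can be illustrated with a rough Python sketch. It assumes a sample large enough that a normal distribution is an acceptable stand-in for the sampling distribution of the mean; the function name and all numbers (sample mean 66, standard deviation 12, n = 36) are hypothetical, with only the hypothesized mean of 70 taken from the example above.

```python
from statistics import NormalDist

def z_test_p_values(sample_mean, hypothesized_mean, std_dev, n):
    """Return z and the left-tail, right-tail, and two-tail p-values."""
    std_error = std_dev / n ** 0.5
    z = (sample_mean - hypothesized_mean) / std_error
    left = NormalDist().cdf(z)      # probability of a result this low or lower
    right = 1 - left                # probability of a result this high or higher
    two = 2 * min(left, right)      # a difference this large in either direction
    return z, left, right, two

# Hypothetical sample summary for HA: Mean < 70.
z, left, right, two = z_test_p_values(66.0, 70.0, 12.0, 36)
print(round(z, 2), round(left, 4), round(two, 4))
```

For the left-tail alternative HA: Mean < 70, the value to report would be the left-tail probability.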
5. Click the Options tab of the Add Trendline dialog box, as shown in Figure 21.3.
6. On the Add Trendline Options tab, select the Automatic: Linear (Series1) button
for Trendline Name. Be sure the checkbox for Set Intercept is clear. Click to put
checks in the Display Equation on Chart and Display R-squared Value on Chart
checkboxes, as shown in Figure 21.3. Then click OK. The trendline, equation,
and R² are inserted on the scatterplot as shown in Figure 21.4.
Trendline Interpretation
We can answer the question "What is the average relationship?" by examining the fitted
equation y = 0.021x + 18.789, which may be written as
Predicted Price = 18.789 + 0.021 * SqFt.
The y-intercept or constant term in the equation is 18.789, measured in the same units as
the y variable. Naively, the constant term says that a property with zero square feet of
living space has a selling price of 18.789 thousands of dollars. However, there are no
properties with fewer than 521 square feet in our data, so this constant can be considered
a starting point that is relevant for properties with living space between 521 and 1,298
square feet.
The slope or regression coefficient, 0.021, indicates the average change in the y variable
for a unit change in the x variable. In this example the slope is measured in thousands
of dollars per square foot, so 0.021 corresponds to $21 per square foot. If two properties differ by
100 square feet of living space, we expect the selling prices to differ by 0.021 * 100 = 2.1
thousands of dollars, or $2,100.
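The arithmetic can be reproduced with a small Python sketch that uses the rounded trendline coefficients from the chart. The function name and the two example sizes (1,000 and 1,100 square feet) are chosen here for illustration.

```python
INTERCEPT = 18.789   # thousands of dollars
SLOPE = 0.021        # thousands of dollars per square foot

def predicted_price(sqft):
    """Predicted selling price, in thousands of dollars."""
    return INTERCEPT + SLOPE * sqft

# Two properties that differ by 100 square feet of living space.
difference = predicted_price(1100) - predicted_price(1000)
print(round(difference, 3))   # difference in thousands of dollars
```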
One popular way to answer the question "How good is the relationship?" is to examine
the value for R², which measures the proportion of variation in the dependent variable, y,
that is explained using the x variable and the regression line. Here the R² value of 0.6637
indicates that approximately 66% of the variation in selling prices can be explained by a
linear model using living space. Perhaps the remaining 34% of the variation can be
explained using other property characteristics in a multiple regression model.
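To make "proportion of variation explained" concrete, the following Python sketch fits a least-squares line and computes R² as one minus the ratio of residual variation to total variation. The helper names and the data are hypothetical, so the result will not match the 0.6637 reported for the book's sample.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_resid / ss_total

# Hypothetical data (square feet, price in thousands of dollars).
sqft = [500, 700, 900, 1100, 1300]
price = [27, 35, 36, 44, 45]

intercept, slope = fit_line(sqft, price)
r2 = r_squared(sqft, price, intercept, slope)
print(round(intercept, 2), round(slope, 4), round(r2, 4))
```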
21.2 Regression Analysis Tool 257
Trendline Embellishments
If the equation displayed on the chart is used to calculate predicted selling prices, the
results may be imprecise because the intercept and slope have only three decimal places.
To display more decimal places, double-click the chart to activate it and click on the
region containing the equation and R² value to select them for editing. Then click the
Increase Decimal tool repeatedly to display more decimal places. The equation values
shown in Figure 21.5 were obtained by clicking Increase Decimal twice to change from
three decimal places to five. These changes affect both the equation and R² value, and
these changes must be made before any other editing.
With the equation and R² value selected, you can move the entire text box by clicking and
dragging near the edge of the box, and you can use the regular text editing options for
rearranging the text. Figure 21.5 shows the result of such editing; variable names were
substituted for x and y, terms were rearranged, and the last three significant figures of R²
were deleted. Once you begin any such editing, you are unable to use the Increase
Decimal or Decrease Decimal tools to change the displayed precision.
1. Arrange the data in columns with the x variable on the left and the y variable on
the right, as before. Make space for the results of the regression analysis to the
right of the data. Allow at least 16 columns. (Delete the scatterplot or move it far
to the right.)
2. From the Tools menu, choose the Data Analysis command. In the Data Analysis
dialog box, scroll the list box, select Regression, and click OK. The Regression
dialog box appears as shown in Figure 21.6.
In the Regression dialog box, move from box to box using the mouse or the tab key. For a
box requiring a range, select the box and then select the appropriate range on the
worksheet by pointing. To see cells on the worksheet, move the Regression dialog box by
clicking on its title bar and dragging, or click the collapse button on the right side of each
range edit box. Click the Help button for additional information.
3. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable. Include the label above the data.
4. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable. Include the label above the data.
5. Labels: Select this box, because the labels at the top of the Input Y Range and
Input X Range were included in those ranges.
6. Constant is Zero: Select this box only if you want to force the regression line to
pass through the origin (0,0).
7. Confidence Level: Excel automatically includes 95% confidence intervals for
the regression coefficients. For an additional confidence interval, select this box
and enter the level in the Confidence Level box.
8. Output location: Click the Output Range button, click to select the range edit
box on its right, and point to or type a reference for the top-left corner cell of a
range 16 columns wide where the summary output and charts should appear.
Alternatively, click the New Worksheet Ply button if you want the output to
appear on a separate sheet and optionally type a name for the new sheet, or click
the New Workbook button if you want the output in a separate workbook.
9. Residuals: Select this box to obtain the fitted values (predicted y) and residuals.
10. Residual Plots: Select this box to obtain charts of residuals versus each x
variable.
11. Standardized Residuals: Select this box to obtain standardized residuals (each
residual divided by the standard deviation of the residuals). This output makes it
easy to identify outliers.
12. Line Fit Plots: Select this box to obtain an XY (scatter) chart of the y input data
and fitted y values versus the x variable. This chart is similar to the scatterplot
with an inserted trendline shown in Figure 21.4.
13. Normal Probability Plots: This option is not implemented properly, so don't
check this box.
14. After selecting all options and pointing to or typing references, click OK. (If you
receive the error message "Cannot add chart to a shared workbook," click the
OK button. Then click New Workbook under Output in the Regression dialog
box. If desired, use the Edit | Move or Copy Sheet command to copy the results
back to the original workbook.) The summary output and charts appear.
15. Optional: To change column widths so that all summary output is visible, make
a nonadjacent selection. First select the cell containing the Adjusted R Square
label (D6). Hold down the Control key while clicking the following cells:
Significance F (I11), Coefficients (E16), Standard Error (F16), and Upper 95%
(J16). From the Format menu, choose Column | AutoFit Selection. The
formatted summary output is shown in Figure 21.7.
260 Chapter 21 Simple Linear Regression
16. Optional: The residual output appears below the summary output. To relocate
the residuals to facilitate comparisons, select columns C:E and choose Insert
from the shortcut menu. Select the residual output (H24:J39), including the row
of labels but excluding the Observation numbers, and choose Cut or Copy from
the shortcut menu. Select cell C1 and choose Paste from the shortcut menu.
Adjust the widths of columns C:E and decrease the decimals displayed in cells
C2:E16 to obtain the results shown in Figure 21.8.
Regression Interpretation
The intercept and slope of the fitted regression line are in the lower-left section labeled
"Coefficients" of the summary output in Figure 21.7. The Intercept coefficient
18.7894675 is the constant term in the linear regression equation, and the SqFt coefficient
0.02101025 is the slope. The regression equation is
Predicted Price = 18.7894675 + 0.02101025 * SqFt.
For an explanation of the intercept and slope, refer to Trendline Interpretation, Section
21.1.
In the residual output shown in Figure 21.8, the predicted prices, sometimes termed the
fitted values, are the result of estimating the selling price of each property using this
regression equation. The residuals are the difference between the actual and fitted values.
For example, the first property has 521 square feet. On the average, we would expect this
property to have a selling price of $29,736, but its actual selling price is $26,000. The
residual for this property is $26,000 – $29,736—that is, –$3,736. Its actual selling price is
$3,736 below what is expected. The residuals are also termed deviations or errors.
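The fitted value and residual for the first property can be verified with a short Python sketch using the coefficients from the summary output. The variable names are chosen here for illustration; prices are in thousands of dollars.

```python
INTERCEPT = 18.7894675
SLOPE = 0.02101025

sqft = 521            # living space of the first property
actual_price = 26.0   # actual selling price, thousands of dollars

fitted_price = INTERCEPT + SLOPE * sqft   # predicted (fitted) value
residual = actual_price - fitted_price    # actual minus fitted

print(round(fitted_price, 3), round(residual, 3))
```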
The four most common measures to answer the question "How good is the relationship?"
are the standard error, R², t statistics, and analysis of variance. The standard error,
3.23777441, shown in cell E7 of Figure 21.7, is expressed in the same units as the
dependent variable, selling price. As the standard deviation of the residuals, it measures
the scatter of the actual selling prices around the regression line. This summary of the
residuals is $3,238. The standard error is often called the standard error of the estimate.
R square, shown in cell E5 of Figure 21.7, measures the proportion of variation in the
dependent variable that is explained using the regression line. This proportion must be a
number between zero and one, and it is often expressed as a percentage. Here
approximately 66% of the variation in selling prices is explained using living space as a
predictor in a linear equation. Adjusted R square, shown in cell E6, is useful for
comparing this model with other models using additional explanatory variables.
The t statistics, shown in cells G17:G18 of Figure 21.7, are part of individual hypothesis
tests of the regression coefficients. For example, these 15 properties could be treated as a
sample from a larger population. The null hypothesis is that there is no relationship: the
population regression coefficient for living space is zero, implying that differences in
living space don't affect selling price. With a sample regression coefficient of 0.02101025
and a standard error of the coefficient (an estimate of the sampling error) of 0.004148397,
the coefficient is 5.064667 standard errors from zero. The two-tail p-value, 0.000217,
shown in cell H18, is the probability of obtaining these results, or something more
extreme, assuming the null hypothesis is true. Therefore, we reject the null hypothesis
and conclude there is a significant relationship between selling price and living space.
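The t statistic and its two-tail p-value can be checked approximately in Python. This sketch assumes 13 degrees of freedom (15 observations minus 2 estimated coefficients), which the text does not state explicitly, and it integrates the t density numerically with Simpson's rule rather than calling a statistics library. The function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tail_p(t, df, steps=2000):
    """Two-tail p-value: 1 minus twice the area between 0 and |t|."""
    t = abs(t)
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area

coefficient = 0.02101025   # SqFt coefficient from the summary output
std_error = 0.004148397    # standard error of the coefficient

t_stat = coefficient / std_error
p_value = two_tail_p(t_stat, df=13)   # df = 13 is an assumption (n - 2)
print(round(t_stat, 6), round(p_value, 6))
```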
The analysis of variance table, shown in cells D10:I14 of Figure 21.7, is a test of the
overall fit of the regression equation. Because it summarizes a test of the null hypothesis
that all regression coefficients are zero, it will be discussed in Chapter 23 with multiple
regression.
Regression Charts
For simple linear regression the analysis tool provides two charts: residual plot and line
fit plot. These charts are embedded near the top of the worksheet to the right of the
summary output. In the real estate properties example, the charts are originally located in
cells M1:S12; after relocating the residuals, the charts are in cells P1:V12.
The line fit plot is shown in Figure 21.9. This chart is similar to the scatterplot with
inserted trendline, except that the predicted values in this chart are markers without a line.
The following steps describe how to format the line fit plot.
1. Select the data series for Predicted Price by clicking one of the square markers
that are in a straight line. (Alternatively, select any chart object and use the up
and down arrow keys to make the selection.) The points are highlighted and
"=SERIES("Predicted Price",...)" appears in the formula bar. Right-click, choose
Format Data Series from the shortcut menu, and click the Patterns tab. Select
Automatic for Line and select None for Marker. Then click OK.
2. Select the x-axis by clicking on the horizontal line at the bottom of the plot area.
A square handle appears at each end of the x-axis. Right-click, choose Format
Axis from the shortcut menu, and click the Scale tab. Clear the Auto checkbox
for Minimum and type 400 in its edit box; clear the Auto checkbox for
Maximum and type 1400 in its edit box; clear the Auto checkbox for Major Unit
and type 200 in its edit box. Then click OK.
3. Select the y-axis. Right-click, choose Format Axis from the shortcut menu, and
click the Scale tab. Clear the Auto checkbox for Minimum and type 20 in its edit
box; clear the Auto checkbox for Maximum and type 50 in its edit box; clear the
Auto checkbox for Major Unit and type 10 in its edit box. Click the Number tab,
select Number in the Category list box, and click the Decimal Places spinner
control to select 0. Then click OK.
4. Optional: To obtain the appearance shown in Figure 21.10, select and enter more
descriptive text for the chart title, x-axis title, and y-axis title. Resize the chart so
that it is approximately 7 columns wide and 14 rows high. Select the chart title
and choose Arial 10 bold from the formatting toolbar. For the legend, axes and
axis titles, select each object and choose Arial 8.
This method, shown in cell H11, calculates the intercept and slope using least squares and
returns the predicted value of y for the specified value of x.
Yet another method for obtaining predicted y values is the TREND function, which has
the following syntax:
TREND(known_y's,known_x's,new_x's,const)
This function, unlike the FORECAST function, can also be used for multiple regression
(two or more x variables). Because the TREND function is an array function, it must be
entered in a special way, as described in the following steps.
1. Enter the data for the x and y variables (A2:B16) and values of the x variable
(D13:D16) for which predicted y values will be calculated.
2. Select a range where the predicted y values are to appear (H13:H16).
3. From the Insert menu, choose the Function command. Alternatively, click the
Insert Function button (icon fx). In the Insert Function dialog box, select
Statistical in the category list box and select TREND in the function list box.
Then click OK.
4. In the TREND dialog box, type or point (click and drag) to ranges on the
worksheet containing the known y values (B2:B16), known x values (A2:A16),
and new x values (D13:D16). Do not include the labels in row 1 in these ranges.
In the edit box labeled "Const," type the integer 1, which is interpreted as true,
indicating that an intercept term is desired. Then click OK.
5. With the function cells (H13:H16) still selected, press the F2 key (for editing).
The word "Edit" appears in the status bar at the bottom of the screen. Hold down
the Control and Shift keys and press Enter. The formula bar shows curly
brackets around the TREND function, indicating that the array function has been
entered correctly.
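For a single x variable with an intercept term, the calculation that TREND performs can be sketched in Python as a least-squares fit followed by evaluation at the new x values. The function below is a hypothetical stand-in, not Excel's implementation, and the data are invented so that the fitted line is exactly y = 1 + 2x.

```python
def trend(known_ys, known_xs, new_xs):
    """Least-squares predictions for new_xs (one x variable, with intercept)."""
    n = len(known_xs)
    mean_x = sum(known_xs) / n
    mean_y = sum(known_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in known_xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # One predicted y per new x, like the cells of the array result.
    return [intercept + slope * x for x in new_xs]

# Hypothetical data lying exactly on the line y = 1 + 2x.
known_xs = [1, 2, 3, 4]
known_ys = [3, 5, 7, 9]

predictions = trend(known_ys, known_xs, [5, 6])
print(predictions)
```

The argument order mirrors the worksheet function, with the known y values first.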
A companion function, LINEST, provides regression coefficients, standard errors, and
other summary measures. Like TREND, this function can be used for multiple regression
(two or more x variables) and must be array-entered. Its syntax is
LINEST(known_y's,known_x's,const,stats).
The "const" and "stats" arguments are true-or-false values, where "const" specifies
whether the fitted equation has an intercept term and "stats" indicates whether summary
statistics are desired.
To obtain the results shown in Figure 21.13, select D1:E5, type or use the Insert Function
tool to enter LINEST, press F2, and finally hold down the Control and Shift keys while
you press Enter. Cells D7:E11 show the numerical results that appear in cells D1:E5, and
cells D13:E17 describe the contents of those cells. These same values appear with labels
in the Regression analysis tool summary output shown in Figure 21.7.
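The quantities that LINEST returns for one x variable can be sketched in Python using the standard simple-regression formulas. The function name and data are hypothetical; the rows are arranged to mirror the five-row, two-column array that LINEST fills when "stats" is true (coefficients, standard errors, R² and standard error of the estimate, F and degrees of freedom, regression and residual sums of squares).

```python
import math

def linest_simple(ys, xs):
    """Regression summary for one x variable, laid out as five rows of two."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_reg = ss_total - ss_resid
    df = n - 2                                  # residual degrees of freedom
    se_y = math.sqrt(ss_resid / df)             # standard error of the estimate
    se_slope = se_y / math.sqrt(sxx)
    se_intercept = se_y * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return [
        [slope, intercept],
        [se_slope, se_intercept],
        [ss_reg / ss_total, se_y],
        [ss_reg / (ss_resid / df), df],
        [ss_reg, ss_resid],
    ]

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

table = linest_simple(ys, xs)
for row in table:
    print([round(value, 4) for value in row])
```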
EXERCISES
Exercise 21.1 Refer to the data on vacancy percentages and monthly rents for ten cities in
Exercise 19.1.
1. Prepare a scatterplot and insert a linear trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of vacancy percentage for a city where monthly rent per square
foot is $3.50.
Exercise 21.2 Refer to the data on study hours and test grades for 20 students in Exercise
19.3.
1. Prepare a scatterplot and insert a linear trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of test grade for a student who studies ten hours.
4. Student 7 studied ten hours and received a test grade of 63. Taking into account the
number of study hours, is this test grade below average, average, or above average?
For example, the upper-left panel shows data where the bulge points toward the
northwest (NW). The power (for x > 1) and logarithmic functions are appropriate for this
pattern. The lower-left panel shows data with a bulge toward the southwest (SW), in
which case the power, logarithmic, or exponential functions are candidates. And the
lower-right panel shows data with a bulge toward the southeast (SE), where the power
(for x > 1) and exponential functions are appropriate. In addition, all four data patterns
may be modeled using a quadratic function (polynomial of order 2).
If the pattern of the data on a scatterplot doesn't fit any of the single-bulge examples
shown in Figure 22.1, some other functional form may be needed. For example, if the
data have two bulges (an S shape), a cubic function (polynomial of order 3) may be
appropriate.
The general approach for inserting a nonlinear trendline is as follows. First, construct the
scatterplot. (Arrange the data on a worksheet with the x data in a column on the left and
the y data in a column on the right. Select both the x and y data and use the Chart Wizard
to construct the XY chart.) Second, click a data point on the chart to select the data series,
and choose Add Trendline from the Chart menu; alternatively, right-click the data series
and choose Add Trendline from the shortcut menu. The upper portion of the Add
Trendline dialog box Type tab is shown in Figure 22.2.
To obtain the trendline results shown in this chapter, select the appropriate type
(polynomial, logarithmic, power, or exponential) and in the Options tab select the
checkboxes for Display Equation on Chart and Display R-squared Value on Chart.
The first example is the real estate property data set described in Chapter 19. The
dependent variable is selling price, in thousands of dollars, and the explanatory variable
is living space, in square feet. Details for constructing the scatterplot are described in
Chapter 19, and steps for inserting a linear trendline are in Chapter 21.
In the residual plot of real estate property data—shown in Figure 21.11—the first two
properties with low square footage and the last two or three properties with high square
footage have negative residuals. This observation is some indication that a nonlinear fit
may be more appropriate. Although the curvature is minimal, the scatterplot shows a
slight bulge pointing toward the northwest (NW). Thus, the quadratic (polynomial of
order 2), power, and logarithmic functions are candidates.
22.1 POLYNOMIAL
Figure 22.3 shows the results for a quadratic fit (polynomial of order 2). The R² value of
68% is only slightly better than the value of 66% obtained with the linear fit described in
Chapter 21.
The following steps describe how to obtain more complete regression results using the
quadratic model.
1. Enter the data into columns A and C as shown in Figure 22.4. If the SqFt and
Price data are already in columns A and B, select column B and choose Insert
from the shortcut menu. Enter the label SqFt^2 in cell B1.
2. Select cell B2 and enter the formula =A2^2. To copy the formula to the other
cells in column B, select cell B2 and double-click the fill handle in its lower-
right corner. The squared values appear in column B.
272 Chapter 22 Simple Nonlinear Regression
3. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and click OK. The Regression dialog box
appears.
4. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (C1:C16), including the label in row 1.
5. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variables (A1:B16), including the labels in row 1.
6. Labels: Select this box, because labels were included in the Input X and Y
Ranges.
7. Do not select the checkboxes for Constant is Zero or Confidence Level.
8. Output options: Click the Output Range option button, select the edit box to the
right, and point to or enter a reference for the top-left corner cell of a range 16
columns wide where the summary output and charts should appear (E1). If
desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.4 shows the regression output after deleting the ANOVA portion (by selecting
E10:M14 and choosing Delete | Shift Cells Up from the shortcut menu). Compared to the
linear model in Chapter 21, this quadratic model has a slightly larger standard error and a
smaller adjusted R²; using these criteria, the quadratic model is not really better than the
linear one.
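The effect of the extra explanatory variable on adjusted R² can be illustrated with the usual adjustment formula. This Python sketch uses the R² values quoted in the text (0.6637 for the linear fit and the rounded 68% for the quadratic fit) with n = 15, so the quadratic figure is only approximate; the function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Linear model: one explanatory variable (SqFt).
linear_adj = adjusted_r_squared(0.6637, n=15, k=1)

# Quadratic model: two explanatory variables (SqFt and SqFt^2).
# 0.68 is the rounded value from the text, so this result is approximate.
quadratic_adj = adjusted_r_squared(0.68, n=15, k=2)

print(round(linear_adj, 4), round(quadratic_adj, 4))
```

The small gain in R² does not offset the penalty for the additional variable, which is consistent with the comparison described above.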
To make a prediction of average selling price using the quadratic model, enter the SqFt
value in a cell (A17, for example) and a formula for SqFt^2 (=A17^2 in cell B17). Then
22.2 LOGARITHMIC
The logarithmic model creates a trendline using the equation
y = c * Ln(x) + b
where Ln is the natural log function with base e (approximately 2.718). Because the log
function is defined only for positive values of x, the values of the explanatory variable in
your data set must be positive. If any x values are zero or negative, the Logarithmic icon
on the Add Trendline Type tab will be grayed out. (As a workaround, you can add a
constant to each x value.) The results of adding a logarithmic trendline to the scatterplot
of real estate property data are shown in Figure 22.5.
The following steps describe how to use the Regression analysis tool to obtain more
complete regression results using the logarithmic model.
1. Enter the data into columns A and C as shown in Figure 22.6. If the SqFt and
Price data are already in columns A and B, select column B and choose Insert
from the shortcut menu. Enter the label Ln(SqFt) in cell B1.
2. Select cell B2 and enter the formula =LN(A2). To copy the formula to the other
cells in column B, select cell B2 and double-click the fill handle in its lower-
right corner. The log values appear in column B.
3. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and click OK. The Regression dialog box
appears.
4. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (C1:C16), including the label in row 1.
5. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable (B1:B16), including the label in row 1.
6. Labels: Select this box, because labels were included in the Input X and Y
Ranges.
7. Do not select the checkboxes for Constant is Zero or Confidence Level.
8. Output options: Click the Output Range option button, select the text box to the
right, and point to or enter a reference for the top-left corner cell of a range 16
columns wide where the summary output and charts should appear (E1). If
desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.6 shows the regression output after deleting the ANOVA portion (by selecting
E10:M14 and choosing Delete | Shift Cells Up from the shortcut menu). Compared with
the linear model in Chapter 21, this logarithmic model has a smaller standard error and a
higher adjusted R²; using these criteria, the logarithmic model is somewhat better than the
linear one.
To make a prediction of average selling price using the logarithmic model, enter the SqFt
value in a cell (A17, for example) and a formula for Ln(SqFt) (=LN(A17) in cell B17).
Then build a formula for predicted price (=F12+F13*B17 in cell C17).
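The same approach can be sketched in Python: transform x with the natural log, fit a straight line to the transformed values, and predict from the fitted intercept and slope. The helper name and the data are hypothetical; the prices are generated from y = 12 * Ln(x) - 50 so that the fit can be checked against known coefficients.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data generated from y = c * Ln(x) + b with c = 12, b = -50.
sqft = [500, 700, 900, 1100, 1300]
price = [12 * math.log(x) - 50 for x in sqft]

ln_sqft = [math.log(x) for x in sqft]   # the Ln(SqFt) column
b, c = fit_line(ln_sqft, price)         # intercept b and coefficient c

new_sqft = 1000
predicted = b + c * math.log(new_sqft)  # same form as =F12+F13*B17
print(round(b, 4), round(c, 4), round(predicted, 4))
```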
22.3 POWER
The power model creates a trendline using the equation
y = c * x^b.
Excel uses a log transformation of the original x and y data to determine fitted values, so
the values of both the dependent and explanatory variables in your data set must be
positive. If any y or x values are zero or negative, the Power icon on the Add Trendline
Type tab will be grayed out. (As a workaround, you can add a constant to each y and x
value.) The results of adding a power trendline to the scatterplot of real estate property
data are shown in Figure 22.7.
The power trendline feature does not find values of b and c that minimize the sum of
squared deviations between actual y and predicted y (= c * x^b). Instead, Excel's method
takes the logarithm of both sides of the power formula, which then can be written as
Ln(y) = Ln(c) + b * Ln(x),
and uses standard linear regression with Ln(y) as the dependent variable and Ln(x) as the
explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of
squared deviations between actual Ln(y) and predicted Ln(y), using the formula
Ln(y) = Intercept + Slope * Ln(x).
Therefore, the Intercept value corresponds to Ln(c), and c in the power formula is equal
to Exp(Intercept). The Slope value corresponds to b in the power formula.
The following steps describe how to use the Regression analysis tool on the transformed
data to obtain regression results for the power model.
1. Enter the data into columns A and B as shown in Figure 22.8.
2. Enter the label Ln(SqFt) in cell C1. Select cell C2 and enter the formula
=LN(A2).
3. Enter the label Ln(Price) in cell D1. Select cell D2 and enter the formula
=LN(B2).
4. To copy the formulas to the other cells, select cells C2 and D2, and double-click
the fill handle in the lower-right corner of cell D2. The log values appear in
columns C and D.
5. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression and click OK. The Regression dialog box appears.
6. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (D1:D16), including the label in row 1.
7. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable (C1:C16), including the label in row 1.
8. Labels: Select this box, because labels are included in the Input X and Y
Ranges.
To determine the value of c for the power formula, select cell G14 and enter the formula
=EXP(G12). To make a prediction of average selling price using the power model, enter
the SqFt value in a cell (A17, for example). Then build a formula for predicted price
(=G14*A17^G13 in cell B17).
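The log-log procedure can be sketched in Python: regress Ln(y) on Ln(x), take the slope as b, and recover c as Exp(Intercept). The helper names and data are hypothetical; the prices are generated from y = 2 * x^0.5 so that the recovered values can be checked.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_power(xs, ys):
    """Fit y = c * x^b by linear regression of Ln(y) on Ln(x)."""
    intercept, slope = fit_line([math.log(x) for x in xs],
                                [math.log(y) for y in ys])
    return math.exp(intercept), slope   # c = Exp(Intercept), b = Slope

# Hypothetical data generated from y = 2 * x^0.5.
sqft = [500, 700, 900, 1100, 1300]
price = [2 * x ** 0.5 for x in sqft]

c, b = fit_power(sqft, price)
predicted = c * 1000 ** b               # same form as =G14*A17^G13
print(round(c, 4), round(b, 4), round(predicted, 4))
```

Because the fit minimizes squared deviations in Ln(y), real data with scatter would generally give different values of b and c than a direct least-squares fit of y itself.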
22.4 EXPONENTIAL
The exponential model creates a trendline using the equation
y = c * e^(b*x).
Excel uses a log transformation of the original y data to determine fitted values, so the
values of the dependent variable in your data set must be positive. If any y values are zero
or negative, the Exponential icon on the Add Trendline Type tab will be grayed out. (As a
workaround, you can add a constant to each y value.)
This function may be used to model exponentially increasing growth. The data shown in
Figure 22.9 are an example of such a pattern.
Time series data are often displayed using an Excel line chart instead of an XY (scatter)
chart. The following steps describe how to construct the line chart with an exponential
trendline shown in Figure 22.10.
1. Enter the year and sales data as shown in Figure 22.9.
2. Select the sales data (B2:B9) and click the Chart Wizard button.
3. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
"Line with markers displayed at each data value." Click Next. In step 2 (Chart
Source Data) on the Series tab, select the range edit box for Category (X) Axis
Labels, and click and drag A2:A9 on the worksheet. Click Next. In step 3 (Chart
Options) on the Titles tab, type the chart and axis labels shown in Figure 22.10;
on the Legend tab, clear the checkbox for Show Legend. Click Finish.
4. Click one of the data points of the chart to select the data series. Right-click and
choose Add Trendline from the shortcut menu. On the Type tab, click the
Exponential icon. On the Options tab, click Display Equation on Chart and click
Display R-squared Value on Chart. Then click OK.
Because this is a line chart instead of an XY (scatter) chart, Excel does not use the Year
data in column A for fitting the exponential function. The Year data are used only as
labels for the x-axis, but the values used for x in the exponential function are the numbers
1 through 8.
The exponential trendline feature does not find values of b and c that minimize the sum
of squared deviations between actual y and predicted y (= c * e^(b * x)). Instead, Excel's
method takes the logarithm of both sides of the exponential formula, which then can be
written as
Ln(y) = Ln(c) + b * x
and uses standard linear regression with Ln(y) as the dependent variable and x as the
explanatory variable. That is, Excel finds the intercept and slope that minimize the sum of
squared deviations between actual Ln(y) and predicted Ln(y), using the formula
Ln(y) = Intercept + Slope * x.
Therefore, the Intercept value corresponds to Ln(c), and c in the exponential formula is
equal to Exp(Intercept). The Slope value corresponds to b in the exponential formula.
The following steps describe how to use the Regression analysis tool on the transformed
data to obtain regression results for the exponential model.
1. Enter the data into columns A, B, and C as shown in Figure 22.11. If the Year
and Sales data are already in columns A and B as shown in Figure 22.9, select
column B, choose Insert from the shortcut menu, and enter the label X and
integers 1 through 8 in column B.
2. Enter the label Ln(Sales) in cell D1. Enter the formula =LN(C2) in cell D2.
3. To copy the formula, select cell D2 and double-click the fill handle in its lower-
right corner. The log values appear in column D.
4. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression and click OK. The Regression dialog box appears.
5. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (D1:D9), including the label in row 1.
6. Input X Range: Point to or enter the reference for the range containing values of
the explanatory variable (B1:B9), including the label in row 1.
7. Labels: Labels were included in the Input X and Y Ranges, so select this box.
8. Do not select the checkboxes for Constant is Zero or Confidence Level.
9. Output options: Click the Output Range option button, select the range edit box
to the right, and point to or enter a reference for the top-left corner cell of a
range 16 columns wide where the summary output and charts should appear
(F1). If desired, check the appropriate boxes for Residuals. Then click OK.
Figure 22.11 shows the regression output after deleting the ANOVA portion (by selecting
F10:N14 and choosing Delete | Shift Cells Up from the shortcut menu). The R Square
and Standard Error values cannot be compared directly with the linear model in Chapter
21. Here, R square is the proportion of variation in Ln(y) explained by x in a linear model,
and the standard error is expressed in the same units of measurement as Ln(y).
To determine the value of c for the exponential formula, select cell G14, and enter the
formula =EXP(G12). To make a prediction of average sales using the exponential model,
enter the x value in a cell (9 in cell B10, for example). Then build a formula for predicted
sales (=G14*EXP(G13*B10) in cell C10).
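The log-transform approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sales figures are hypothetical values generated from a known exponential curve (not the data in Figure 22.9), x takes the values 1 through 8 as in the line-chart trendline, and fit_exponential is a made-up helper name.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = c * e^(b*x) by least squares on Ln(y) versus x."""
    logs = [math.log(y) for y in ys]      # requires every y > 0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    slope = (sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_l - slope * mean_x
    return math.exp(intercept), slope     # c = Exp(Intercept), b = Slope

# Hypothetical sales series generated from y = 100 * e^(0.2 x), x = 1..8.
x_values = list(range(1, 9))
sales = [100.0 * math.exp(0.2 * x) for x in x_values]

c, b = fit_exponential(x_values, sales)
forecast_9 = c * math.exp(b * 9)          # same form as =G14*EXP(G13*B10)
```

The call to math.log fails for zero or negative y values, which mirrors the reason the Exponential icon is unavailable for such data.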
An alternative method for obtaining exponential regression results is to use the LOGEST
and GROWTH worksheet functions. The descriptions of these functions in Excel's on-
line help use the equation
y = b * m^x.
The GROWTH function is similar to the TREND function, except that it returns fitted
values for the exponential equation instead of the linear equation. GROWTH can also be
used for multiple regression (two or more x variables) and must be array-entered.
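The equation used in the help for LOGEST and GROWTH is the same exponential model written with a different base. The short derivation below is a sketch; to avoid a clash of symbols, the LOGEST parameters carry the subscript L, which is notation introduced here and is not used in Excel's help.

```latex
y = c\,e^{bx} = c\,\bigl(e^{b}\bigr)^{x} = b_L\, m_L^{\,x},
\qquad b_L = c, \quad m_L = e^{b}, \quad b = \ln m_L .
```

So the b reported by LOGEST plays the role of c in the trendline equation, and the natural log of m corresponds to the trendline's b.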
EXERCISES
Exercise 22.1 Seven identical automobiles were driven by employees for business
purposes for several days. The drivers reported average speed, in miles per hour, and gas
mileage, in miles per gallon, as shown in the following table.
Speed Gas Mileage
MPH MPG
32 20
37 23
44 26
49 27
56 26
62 25
68 22
1. Prepare a scatterplot and insert a quadratic trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of gas mileage for an automobile driven at an average speed of 50
miles per hour.
Exercise 22.2 A chain store tried different prices for a television set in five retail markets
during a four-week period. The following table shows the retail prices and sales rates, in
units sold per thousand of residents in the market.
Price Sales Rate
$275 1.60
$300 0.95
$325 0.65
$350 0.50
$375 0.45
1. Prepare a scatterplot and insert an appropriate trendline.
2. Use the Regression analysis tool to obtain complete diagnostics.
3. Make a prediction of sales rate for a market where the price is $295.
Chapter 23 Multiple Regression
In Chapter 21, a simple linear regression model examined the relationship between
selling price and living space for 15 real estate properties. The standard error was $3,328,
and R square was 0.664, indicating 66% of the variation in selling prices could be
explained using living space as the explanatory variable in a linear model.
More of the variation in selling prices might be explained by using an additional variable.
Data on the most recent assessed value (for property tax purposes) are also available;
perhaps selling price is related to assessed value. Multiple regression can examine the
relationship between selling price and two explanatory variables, living space and
assessed value. (The pairwise correlations among these three variables were examined in
Chapter 19.) The following steps describe how to use the Regression analysis tool for
multiple regression.
1. Arrange the data in columns with the two explanatory variables in columns on
the left and the dependent variable in a column on the right. The two (or more)
explanatory variables must be in adjacent columns. If the data from Chapter 21
(or Example 19.1) are in columns A and B, insert a new column B and enter the
new data for assessed value as shown in Figure 23.1.
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
scroll the list box, select Regression, and choose OK.
3. Input Y Range: Point to or enter the reference for the range containing values of
the dependent variable (selling prices, C1:C16). Include the label above the data.
4. Input X Range: Point to or enter the reference for the range containing values of
the two explanatory variables (SqFt and Assessed, A1:B16). Include the labels
above the data.
5. Other dialog box entries: Fill in the other checkboxes and edit boxes as shown in
Figure 23.1. Then click OK. If the error message "Regression - Cannot add chart
to a shared workbook" appears, click Cancel; to obtain chart output, select New
Workbook under Output Options in the Regression dialog box.
6. Optional: To change column widths so that all summary output labels are
visible, select the cell containing the Adjusted R Square label (E6) and hold
down the Control key while selecting cells containing the labels Coefficients
(F16), Standard Error (G16), Significance F (J11), and Upper 95% (K16). From
the Format menu, choose the Column command and select AutoFit Selection.
The results are shown in Figure 23.2.
23.1 INTERPRETATION OF REGRESSION OUTPUT
Significance of Coefficients
The t statistic for the SqFt coefficient is greater than two, indicating that 0.017 is
significantly different from zero. We can reject the null hypothesis that there is no
relationship between SqFt and Price in this model and conclude that a significant
relationship exists.
The t statistic for the Assessed coefficient is 2.79, indicating that 0.361 is significantly
different from zero.
The p-value is a two-tail probability using the t distribution. Since we would expect to see
a positive relationship between selling price and each explanatory variable, one-tail tests
are appropriate here. Dividing each p-value in the summary output by two, the one-tail p-
values are approximately 0.00038 and 0.0081. Thus, in this model we can reject the
hypotheses of no relationship between selling price and each explanatory variable at the
1% level of significance.
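The relationship between the two-tail p-value in the summary output and the one-tail p-value can be checked numerically. The Python sketch below is an illustration, not Excel's method: it integrates the t density with Simpson's rule, the function names are hypothetical, and the degrees of freedom are taken as 15 observations minus 3 estimated coefficients. In Excel, the TDIST worksheet function returns these probabilities directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def one_tail_p(t_stat, df, steps=2000):
    """Upper-tail probability P(T > t_stat) for t_stat >= 0, by Simpson's rule."""
    h = t_stat / steps                      # steps must be even
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t                # half the area lies above zero

df = 15 - 3                      # 15 observations, 3 estimated coefficients
p_one = one_tail_p(2.79, df)     # t statistic for the Assessed coefficient
p_two = 2 * p_one                # two-tail p-value, as reported by Excel
```

Because the t statistic is rounded to 2.79 here, the computed one-tail value should agree with the 0.0081 in the text only approximately.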
The t statistic for the Intercept term is usually ignored.
If the relationship between selling price and living space is linear (after taking into
account assessed value), then a random pattern should appear in the residual plot. On the
other hand, if we see curvature or some other systematic pattern, then we should change
our model to incorporate the nonlinear relationship.
Most observers would conclude that the residual plot is essentially random, so no
additional modeling is required. Because our sample size is so small (15 observations), it
can be difficult to detect nonlinear patterns.
Residual plots are also useful for detecting situations where the residuals are smaller in
one region and larger in another. In that case, the residual plot has the shape of a tree
resting on its side, and the standard error of the estimate, which summarizes all of the
residual terms, overstates the variation in one region and understates the variation in
another.
Looking at the plot of residuals versus assessed values shown in Figure 23.4, the pattern
also appears random. Once again, the small sample size makes it difficult to detect
nonlinear patterns.
Instead of typing the TREND function, an alternative is to select the output cells
(D18:D21) and click the Insert Function tool (icon fx). In the Insert Function dialog box,
select Statistical in the category list box, select TREND in the function list box, and click
OK. In the TREND dialog box, type or point to (click and drag) ranges on the worksheet
containing the known y values (C2:C16), known x values (A2:B16), and new x values
(A18:B21). Do not include the labels in row 1 in these ranges. In the edit box labeled
"Const," type the integer 1, which is interpreted as true, indicating that an intercept term
is desired. Then click OK. With the function cells (D18:D21) still selected, press the F2
key (for editing). The word "Edit" appears in the status bar at the bottom of the screen.
Hold down the Control and Shift keys and press Enter.
However, this prediction interval is approximate in two respects. First, instead of
using the standard error of the estimate, which measures only the scatter of the actual
values around the regression equation, we should use the standard error of a prediction,
which also takes into account uncertainty in the coefficients of the regression equation.
The standard error of a prediction is always greater than the standard error of the
estimate. Unfortunately, there is no simple way to compute the standard error of a
prediction using Excel.
Second, the number of standard errors for a 95% prediction interval based on 15
observations with our model should use a value of the t statistic with 12 degrees of
freedom, which is 2.179, not 2. (For a very large sample size, the normal distribution is
appropriate, and the number of standard errors is 1.96, which is approximately 2.)
Therefore, our interval is only a rough approximation. An exact 95% prediction
interval would be wider.
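The second point can be illustrated by computing the t multiplier directly. The Python sketch below is one possible approach, not the book's method: it finds the two-tail 5% critical value for 12 degrees of freedom by bisection on a numerically integrated t distribution, and all function names are hypothetical. In Excel, the TINV worksheet function returns this value directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t_value, df, steps=1000):
    """P(T > t_value) for t_value >= 0, using Simpson's rule on [0, t_value]."""
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(df, two_tail_alpha=0.05):
    """Find t such that P(|T| > t) = two_tail_alpha, by bisection."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if 2 * upper_tail(mid, df) > two_tail_alpha:
            low = mid      # tails still too heavy: move right
        else:
            high = mid
    return (low + high) / 2

t_12 = t_critical(12)          # 15 observations minus 3 coefficients
extra_width = t_12 / 2.0 - 1   # fractional widening versus a multiplier of 2
```

Using the t multiplier instead of 2 widens the interval by roughly 9 percent, before any allowance for the larger standard error of a prediction.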
EXERCISES
Exercise 23.1 The president of a national real estate company wanted to know why
certain branches of the company outperformed others. He felt that the key factors in
determining total annual sales (in $ millions) were the advertising budget (in $ thousands)
and the number of sales agents. To analyze the situation, he took a sample of eight offices
and collected the data in the following table.
Office   Advertising     Number      Annual Sales
         ($ thousands)   of Agents   ($ millions)
1        249             15          32
2        183             14          18
3        310             21          49
4        246             18          52
5        288             13          36
6        248             21          43
7        256             20          24
8        241             19          41
1. Prepare a regression model and interpret the coefficients.
2. Test to determine whether there is a linear relationship between each explanatory
variable and the dependent variable, with a 5% level of significance.
3. Make a prediction of annual sales for a branch with an advertising budget of
$250,000 and 17 agents.
Exercise 23.2 (adapted from Canavos, p. 602) A university placement office conducted a
study to determine whether the variation in starting salaries for school of business
graduates can be explained by the students' grade point average (GPA) and age upon
graduation. The placement office obtained the sample data shown in the following table.
GPA Age Starting Salary
2.95 22 $25,500
3.40 23 28,100
3.20 27 28,200
3.10 25 25,000
3.05 23 22,700
2.75 28 22,500
3.15 26 26,000
2.75 26 23,800
1. Prepare a regression model and interpret the coefficients.
2. Determine whether grade point average and age contribute substantially in
explaining the variation in the sample of starting salaries.
3. Make a prediction of starting salary for a 24-year-old graduate with a 3.00 GPA.
The initial analysis uses only construction grade as the predictor of selling price, followed
by a multiple regression model using construction grade and the other predictor variables
(square feet of living space and assessed value).
The following steps describe how to use indicator variables in a regression model. An
indicator variable is defined for each of the three categories. Low is selected as the base-
case category; only indicator variables for the Medium and High categories are included
in the regression model.
1. Arrange the data in a worksheet as shown in Figure 24.1.
2. Select columns C:E. With the pointer in the selected range, right-click and
choose Insert from the shortcut menu. Enter the labels Low, Medium, and High
in cells C1:E1.
3. Enter a formula in cell C2 for determining values of the Low indicator variable:
=IF(B2="Low",1,0). The meaning of this formula is "If the grade is low, use
the value 1; otherwise use the value 0."
4. Enter a formula in cell D2 for determining values of the Medium indicator
variable: =IF(B2="Medium",1,0). The meaning of this formula is "If the grade
is medium, use the value 1; otherwise use the value 0."
5. Enter a formula in cell E2 for determining values of the High indicator variable:
=IF(B2="High",1,0). The meaning of this formula is "If the grade is high, use
the value 1; otherwise use the value 0." If the three formulas are entered
correctly, the contents of cells C2:E2 are 1, 0, and 0.
24.1 Categories as Explanatory Variables 295
6. Select the new formulas in cells C2:E2. To copy the formulas to the other cells,
double-click the fill handle (small square in the lower-right corner of the
selected range). The worksheet should appear as shown in Figure 24.2.
These regression results yield the same average selling prices that would be obtained by
simply averaging the price for each construction grade. For example, the mean selling
price for the three high construction grade properties (44.8, 41.8, and 45.2) is 43.933.
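The link between indicator-variable coefficients and category averages can be verified with a small least-squares calculation. In the Python sketch below, the three High prices come from the text, while the Low and Medium prices are made-up values; the solve and ols helpers are hypothetical and solve the normal equations directly, which is adequate for a small illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients for y = b0 + b1*x1 + ... (intercept added)."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# High-grade prices come from the text; Low and Medium prices are made up.
grades = ["Low", "Low", "Medium", "Medium", "Medium", "High", "High", "High"]
prices = [30.0, 32.0, 36.0, 38.5, 37.0, 44.8, 41.8, 45.2]

# Low is the base case, so only Medium and High indicators enter the model.
indicators = [(1 if g == "Medium" else 0, 1 if g == "High" else 0) for g in grades]
intercept, b_medium, b_high = ols(indicators, prices)

fitted_high = intercept + b_high   # fitted price for a High-grade property
```

The intercept equals the mean price of the base-case (Low) category, and each indicator coefficient is the difference between that category's mean and the base-case mean.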
An advantage of using indicator variables is that they can be combined with other
explanatory variables in a multiple regression model. The following steps provide a
general description of how to use construction grade, assessed value, and living space as
explanatory variables.
1. The four x variables (Medium, High, SqFt, and Assessed) must be in adjacent
columns. If the data are arranged as shown in Figure 24.2, one method is to
select column F (Assessed), right-click, and choose Insert from the shortcut
menu. Then select column A (SqFt), right-click, and choose Copy from the
shortcut menu; select column F (empty), right-click, and choose Paste from the
shortcut menu. (Alternatively, after inserting empty column F, select column A,
position the mouse pointer near the edge of column A until it turns into an
arrow, and click and drag column A to column F.)
2. In the Regression dialog box, the Input Y Range contains the selling prices
(H1:H16), the Input X Range contains the values for the four explanatory
variables, Medium, High, SqFt, and Assessed (D1:G16), the Output Range is J1,
and the Labels, Residuals, and Standardized Residuals checkboxes are selected.
The following steps describe how to perform discriminant analysis for a binary dependent
variable using multiple regression.
1. Enter the data shown in Figure 24.5 on a worksheet.
2. Use the Regression analysis tool as described in Chapters 21, 22, and 23. The
Input Y Range is the bankruptcy 1/0 variable (C1:C17), the Input X Range
contains the two financial ratios (A1:B17), and the Output Range is E1. Select
the Labels checkbox and the Residuals checkbox.
3. Format the regression summary output as described in Chapter 21. The result is
shown in Figure 24.6.
300 Chapter 24 Regression Using Categorical Variables
Referring to the coefficients in the summary output shown in Figure 24.6 and rounding to
four decimal places, the fitted regression model is
Bankrupt = - 0.0027 - 1.7623 * NI/TA + 0.9600 * CA/NS.
The Predicted Bankrupt values calculated using this model are located below the
regression summary output. The following steps relocate the predicted values and
calculate other values for the discriminant analysis.
4. To make room for additional calculations, select columns D:F. With the pointer
in the selected range, right-click and choose Insert from the shortcut menu.
5. To relocate the predicted values, select cells I25:I41. With the pointer in the
selected range, right-click and choose Copy from the shortcut menu. Then select
cell D1, right-click, and choose Paste from the shortcut menu.
6. Optional: With the pasted range D1:D17 still selected, choose Column from the
Format menu and select AutoFit Selection. Select the predicted values D2:D17
and repeatedly click the Decrease Decimal tool button until three decimal places
are displayed.
The regression model uses the two financial ratios to predict the value 1 for bankrupt
firms and 0 for the sound firms. However, the predicted values are not exactly equal to 1
or 0, so we need a rule for predicting which firms are bankrupt and which are sound. A
simple rule is to predict bankruptcy if the Predicted Bankrupt value is greater than 0.5
and predict soundness if the Predicted Bankrupt value is less than or equal to 0.5.
24.4 Categories as the Dependent Variable 301
7. Enter the label Classification in cell E1 and adjust the column width. To
classify the Predicted Bankrupt values, enter a formula in cell E2:
=IF(D2>0.5,1,0). The meaning of this formula is "If the Predicted Bankrupt
value is greater than 0.5, use the value 1; otherwise use the value 0."
8. Enter the label Correct in cell F1. To determine which firms were classified
correctly, enter a formula in cell F2: =IF(C2=E2,1,0). This means "If the actual
Bankrupt value equals the predicted classification, use the value 1; otherwise use
the value 0."
9. Select the two formulas (E2:F2). To copy the formulas to the other cells, double-
click the fill handle (small square in the lower-right corner of the selected
range).
10. To determine the total number of correct classifications, select cell F18 and click
the sum tool twice. The results are shown in Figure 24.7.
To classify a firm that is not in the sample, we would substitute the firm's financial ratios into our model, evaluate the regression
equation to obtain a fitted value, and predict bankruptcy if the fitted value exceeds 0.5.
Additional analysis could involve trying classification threshold values other than 0.5.
Such analysis could be automated using Excel's Data Table feature.
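The classification rule and the threshold comparison can be sketched as follows. The regression coefficients are the rounded values from the fitted model above; the four firms and their ratios are hypothetical (they are not the data in Figure 24.5), and the function names are made up for illustration.

```python
def predicted_bankrupt(ni_ta, ca_ns):
    """Fitted regression model from the summary output (four-decimal coefficients)."""
    return -0.0027 - 1.7623 * ni_ta + 0.9600 * ca_ns

def classify(score, threshold=0.5):
    """Predict bankruptcy (1) if the score exceeds the threshold, else sound (0)."""
    return 1 if score > threshold else 0

# Hypothetical firms: (NI/TA, CA/NS, actual Bankrupt 1/0). Not the Figure 24.5 data.
firms = [(-0.10, 0.60, 1), (0.05, 0.30, 0), (-0.02, 0.45, 1), (0.08, 0.55, 0)]

def count_correct(threshold):
    """Number of firms whose predicted class matches the actual class."""
    return sum(classify(predicted_bankrupt(x1, x2), threshold) == actual
               for x1, x2, actual in firms)

# Try several classification thresholds, as a Data Table would in Excel.
correct_by_threshold = {t: count_correct(t) for t in (0.3, 0.4, 0.5, 0.6)}
```

With real data, the threshold that classifies the sample best may not perform best on new firms, so a comparison like this is only a starting point.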
EXERCISES
Exercise 24.1 Refer to the real estate property data in Figure 24.1. Determine the selling
price per square foot of living space for each of the 15 properties. Develop a regression
model using indicator variables for construction grade to explain the variation in price per
square foot. Interpret the coefficients. What is the expected price per square foot for a
property with low construction grade?
Exercise 24.2 (adapted from Canavos, p. 607) A personnel recruiter for industry wishes
to identify the factors that explain the starting salaries for business school graduates. He
believes that a student's grade point average (GPA) and academic major are appropriate
explanatory variables.
GPA Major Starting Salary
2.95 Management $21,500
3.20 Management 23,000
3.40 Management 24,100
2.85 Accounting 24,000
3.10 Accounting 27,000
2.85 Accounting 27,800
2.75 Finance 20,500
3.10 Finance 22,200
3.15 Finance 21,800
Fit an appropriate model to these data, evaluate it, and interpret it. What is the expected
starting salary for an accounting major with a 3.00 GPA?
Exercise 24.4 A credit manager has classified each of the company's loans as being either
current or in default. For each loan, the manager has data describing the person's annual
income and assets (both in thousands of dollars) and years of employment. The manager
wants to use this information to develop a rule for predicting whether a loan applicant
will default.
Income   Assets   Years of     Performance
                  Employment   (Current=1, Default=0)
44       105      10           1
26       109      19           1
39       120      12           1
50       139      20           1
42       84       9            1
35       120      13           0
28       84       10           0
37       114      5            0
26       109      15           0
33       114      10           1
37       150      5            0
30       144      4            0
32       75       15           1
32       135      8            0
42       135      4            0
33       94       13           1
33       124      7            0
25       135      14           0
1. Use a regression model for discriminant analysis of these data.
2. What proportion of the loans is properly classified by the model?
3. If an applicant has $40,000 annual income, $100,000 assets, and 11 years of
employment, what is the predicted performance: current or default?
Chapter 25 Regression Models for Cross-Sectional Data
25.1 CROSS-SECTIONAL REGRESSION CHECKLIST
Plot Y versus each X
1 Verify that the relationship agrees with your prior judgment, e.g., positive vs
negative relationship, linear vs nonlinear, strong vs weak
2 Identify outliers or unusual observations and decide whether to exclude
3 Determine whether the relationship is linear; if not, consider using a nonlinear
form, e.g., quadratic (include X and X^2 in the model)
Chapter 26 Time Series Data and Forecasts
26.1 Time Series Patterns
[Figure: trend patterns (Positive Nonlinear, Positive Linear, Negative Linear, No Trend); Value plotted against Time]
Figure 26.3 Typical Quarterly Seasonal Time Series with Linear Trend
[Figure: Value plotted against Quarter, quarters 1 through 20]
[Figure: Value plotted against Quarter, quarters 1 through 36. Strong seasonal pattern, no trend during first 12 quarters, positive trend during middle 12 quarters, no trend during last 12 quarters]
The first step is to examine a time sequence plot. Select the wage data, and use Excel's
Chart Wizard to create a Line chart type. Figure 27.1 shows the data and a plot of average
hourly wages of textile and apparel workers for the 18 months from January 1986
through June 1987. These data are the last 18 values from the 72-value data file
APAWAGES.DAT that accompanies Cryer, second edition; the original source is Survey
of Current Business, September issues, 1981–1987.
The R-square value indicates that approximately 73% of the variation in wages can be
explained using a linear time trend. The regression model is Fitted Wage = 5.7709 +
0.0095 * Month, indicating that wages increase by 0.0095 dollars per month, on the
average. The t statistic and p-value verify that there is a significant linear relationship.
The R Square, t statistic, and p-value indicate an excellent fit, but the line fit plot shown
in Figure 27.3 shows that the regression model assumption of independent residuals may
be violated. When wages are above the linear time trend, they tend to stay above, and
when they are below the trend line, they tend to stay below. In other words, if the
previous residual is positive, the current residual is likely to be positive, and if the
previous residual is negative, the current residual is likely to be negative. Thus, the
residuals are not independent. Successive residuals in this model are positively
correlated. This "stickiness" is positive autocorrelation, which can be quantified using the
Durbin-Watson statistic.
27.2 DURBIN-WATSON STATISTIC
For the linear time trend model, the residuals are in cells F25:F42. In Figure 27.4, cell
H25 contains the following formula for computing the Durbin-Watson statistic:
=SUMXMY2(F26:F42,F25:F41)/SUMSQ(F25:F42)
In general, for time periods 1 through n, the first argument for SUMXMY2 is the range
containing residuals for periods 2 through n, and the second argument is the range for
residuals for periods 1 through n – 1. The argument for SUMSQ is the range containing
residuals for periods 1 through n.
The possible values of the Durbin-Watson statistic range from 0 to 4. Values close to 0
indicate strong positive autocorrelation; a value of 2 indicates zero autocorrelation;
values near 4 indicate strong negative autocorrelation. Here the value 1.050 shows that
there is some positive autocorrelation of residuals.
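The worksheet formula divides the sum of squared differences between successive residuals by the sum of squared residuals. A minimal Python sketch of the same calculation follows; the two residual series are hypothetical (they are not the values in cells F25:F42), and the function name is made up.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals[:-1], residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual series (not the values in cells F25:F42).
sticky = [0.03, 0.02, 0.02, 0.01, -0.01, -0.02, -0.03, -0.02]
alternating = [0.02, -0.02, 0.02, -0.02, 0.02, -0.02]

dw_sticky = durbin_watson(sticky)            # well below 2: positive autocorrelation
dw_alternating = durbin_watson(alternating)  # well above 2: negative autocorrelation
```

The pairing of residuals[1:] with residuals[:-1] corresponds to the two SUMXMY2 arguments, periods 2 through n and periods 1 through n – 1.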
27.3 AUTOCORRELATION
The Durbin-Watson statistic measures autocorrelation of residuals associated with a
model. It is often useful to examine the correlation of time series values with themselves
before modeling. This approach looks at the correlation between current and previous
values. The previous values are called lagged values, and the number of time periods
between each current and previous value is the lag length. For example, values that are
one time period before the current values are called lag 1; values that are two periods
earlier are called lag 2.
The following steps describe how to construct an autocorrelation plot for lag 1.
1. Enter the month and wage data in columns A and B of a sheet as shown in
Figure 27.1 or copy previously entered data to a new sheet.
2. Select column B, right-click, and choose Insert from the shortcut menu.
3. Type the label Lag 1 in cell B1.
4. Select cells C2:C18 containing the first 17 wage values, right-click, and choose
Copy from the shortcut menu.
5. Select cell B3, right-click, and choose Paste from the shortcut menu. The top
section of the sheet appears as shown in Figure 27.5.
6. Select row 2, right-click, and choose Delete from the shortcut menu. The results
appear as shown in columns A, B, and C in Figure 27.6.
7. To calculate the correlation coefficient, enter the label CORREL= in cell F1
and enter the formula =CORREL(B2:B18,C2:C18) in cell G1. The value of the
correlation coefficient, r = 0.8545, appears in cell G1 as shown in Figure 27.6.
8. To prepare the chart, select cells B2:C18 and click the Chart Wizard button.
9. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, click the
XY (Scatter) chart type and click Next. In step 2 (Chart Source Data), verify the
data range and click Next. In step 3 (Chart Options) on the Titles tab, type chart
and axis titles as shown in Figure 27.6; on the Gridlines tab, clear all
checkboxes; on the Legend tab, clear the checkbox for Show Legend and click
Finish.
10. To facilitate interpreting the autocorrelation plot, change its size and axes. Use
the handles on the outermost edge of the chart to obtain a nearly square shape.
For both the vertical axis and the horizontal axis, select the axis, double-click or
right-click and choose Format Axis from the shortcut menu, click the Scale tab,
change Minimum to 5.7, change Maximum to 6, change Major Unit to .05, and
click OK. Change font size of the axes and titles to 8. The result appears as
shown in Figure 27.6.
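The copy-and-shift steps above pair each value with the value one period earlier and then compute an ordinary correlation coefficient. The Python sketch below shows the same pairing; the wage series is hypothetical (not the data in Figure 27.1), and the function names are made up for illustration.

```python
import math

def correl(xs, ys):
    """Pearson correlation coefficient, like Excel's CORREL."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lag1_autocorrelation(series):
    """Correlate each value with the value one period earlier."""
    lagged = series[:-1]     # periods 1 .. n-1 (the "Lag 1" column)
    current = series[1:]     # periods 2 .. n
    return correl(lagged, current)

# Hypothetical wage series (not the data in Figure 27.1).
wages = [5.70, 5.72, 5.75, 5.74, 5.78, 5.80, 5.79, 5.83]
r1 = lag1_autocorrelation(wages)
```

A series of n values yields n – 1 pairs, which is why 18 monthly wages produce the 17 rows used in the CORREL formula.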
316 Chapter 27 Autocorrelation and Autoregression
The autocorrelation plot in Figure 27.6 shows relatively strong correlation
between current wage and one-month previous wage. When the wage is low in a
particular month, it is likely that it will be low in the following month; when the wage is
high in a particular month, it is likely to be high in the following month.
27.4 AUTOREGRESSION
A regression model may be used to quantify the functional relationship between current
and previous values of time sequence data. When regression is used to analyze data that
exhibit autocorrelation, the technique is called autoregression, and the model is called an
autoregressive model. If only one-period lagged data are used for the explanatory
variable, the model is called an AR(1) model.
To develop an AR(1) model for the wage data, prepare the autocorrelation plot described in the
previous section. Right-click on a data point and choose Add Trendline from the shortcut
menu. In the Add Trendline dialog box, click the Type tab and click the Linear icon.
Click the Options tab and click the checkboxes for Display Equation on Chart and
Display R-squared Value on Chart. Then click OK. Optionally, click and drag to relocate
the equation and R-squared value. The results appear as shown in Figure 27.7.
The linear fit equation could be written as Wage = 0.8253 + 0.86 * Lag 1, or Current =
0.8253 + 0.86 * Previous, or Y(t) = 0.8253 + 0.86 * Y(t-1). The R-squared value indicates that
approximately 73% of the variation in wages can be explained using this simple linear
autoregressive model.
A forecast of wage for period 19 can be expressed as Y(19) = 0.8253 + 0.86 * Y(18) = 0.8253
+ 0.86 * 5.91 = 5.9079. A forecast for period 20 could be based on the forecast for period
19: Y(20) = 0.8253 + 0.86 * Y(19) = 0.8253 + 0.86 * 5.9079 = 5.9061. Of course, the likely
error increases for forecasts made further into the future. To quantify the error, to obtain
additional diagnostics, and to plot fitted and actual values in a time sequence plot, use the
Regression analysis tool.
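The chained forecasts can be reproduced with a short loop. The Python sketch below uses the trendline coefficients and the period-18 wage from the text; the function name is hypothetical.

```python
def ar1_forecasts(last_value, intercept, slope, periods):
    """Iterate Y(t) = intercept + slope * Y(t-1), feeding each forecast forward."""
    forecasts = []
    previous = last_value
    for _ in range(periods):
        previous = intercept + slope * previous
        forecasts.append(previous)
    return forecasts

# Coefficients and the period-18 wage (5.91) are taken from the text.
y19, y20 = ar1_forecasts(5.91, intercept=0.8253, slope=0.86, periods=2)
print(round(y19, 4), round(y20, 4))
```

Because the slope is less than 1 in absolute value, repeated forecasts of this form drift toward intercept / (1 - slope) = 0.8253 / 0.14 = 5.895, which is why the period-20 forecast is slightly lower than the period-19 forecast.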
If a blank sheet is needed, choose Worksheet from the Insert menu. Copy the data shown
in columns A, B, and C in Figure 27.7, select a blank worksheet, select cell A1, and
Paste. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
click Regression in the Analysis Tools list box and click OK.
In the Regression dialog box, the Input Y Range is C1:C18 and the Input X Range is
B1:B18. Check the Labels checkbox. The Output Range is E1. Check the Residuals and
Residual Plots checkboxes. Then click OK. (If the error message "Cannot add chart to a
shared workbook" appears, click Cancel; in the Regression dialog box, click New
Workbook in the Output Options, and click OK.) The results are shown in Figure 27.8.
Referring to cell F7 in Figure 27.8, the standard error of estimate for this AR(1) model is
0.03235, slightly larger than the standard error for the linear time trend model, 0.0319.
Thus, an approximate 95% prediction interval uses the previously calculated point
estimate plus or minus six cents (two standard errors = 2 * $0.03235 = $0.0647). The
residual plot, not shown here, has an essentially random pattern, indicating that the linear
relationship between wage and lag 1 is appropriate.
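As a rough check of the interval arithmetic, the sketch below applies the plus-or-minus two standard errors rule to the period 19 point forecast. The function name approx_interval is hypothetical, and the multiplier of 2 is the approximation used in the text, not an exact t value.

```python
def approx_interval(point, std_error, multiplier=2.0):
    """Approximate prediction interval: point estimate +/- multiplier * std error."""
    half_width = multiplier * std_error
    return point - half_width, point + half_width


# Point forecast for period 19 and the AR(1) standard error of estimate.
low, high = approx_interval(5.9079, 0.03235)
print(round(low, 4), round(high, 4))  # 5.8432 5.9726
```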
The following steps describe how to construct a time sequence plot showing actual and
fitted values.
1. Select C1:C18 and hold down the Control key while selecting F24:F41. Click
the Chart Wizard tool.
2. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
Line for the chart type, select "Line with markers displayed at each data value"
for the chart sub-type, and click Next.
3. In step 2 (Chart Source Data) on the Series tab, select the range edit box for
Category (X) Axis Labels, click and drag cells A2:A18 on the worksheet, and
click Next.
4. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles as shown
in Figure 27.9. On the Gridlines tab, uncheck all boxes and click Finish.
5. Select the horizontal axis and double-click, or right-click and choose Format
Axis from the shortcut menu. In the Format Axis dialog box, click the
Alignment tab, select the Degrees edit box, type 0 (zero), and click OK.
6. Select the Predicted Wage data series by clicking one of its markers on the chart.
Right-click and choose Format Data Series from the shortcut menu. In the
Format Data Series dialog box, click the Patterns tab. For Line, click the Custom
button and select the small dashed-line pattern from the Line Style drop-down
list box. Click OK.
7. Use the chart's sizing handles to resize the chart to be approximately 8 standard
columns wide and 17 rows high. Change the font size of the chart title, axis
titles, axes, and legend to 8. The chart appears as shown in Figure 27.9.
Each Predicted Wage value shown in Figure 27.9 depends upon the actual wage in the
previous month. The standard error of estimate is a summary measure of the vertical
distances between the actual wage and predicted wage for each month.
With cell C2 selected, double-click the fill handle in the lower-right corner.
With cells C2:C19 still selected, click the Decrease Decimal button repeatedly
until three decimal places are displayed.
4. Enter the labels Lag and ACF in cells E1 and F1, respectively. Enter the digits 1
through 6 in cells E2:E7. (Here we examine only the first 6 lags. For monthly
data where seasonality is expected, the first 12 lags should be investigated.)
5. Select cell F2. Enter the formula
=SUMPRODUCT(OFFSET(Z,E2,0,18-E2),OFFSET(Z,0,0,18-E2))/17.
With cell F2 selected, double-click the fill handle in the lower-right corner. With
cells F2:F7 still selected, click the Decrease Decimal button repeatedly until
three decimal places are displayed. The results appear as shown in columns A:F
in Figure 27.10. (To adapt the formula to other data, use the number of
observations, n, instead of 18, and use n–1 instead of 17.)
6. To create the correlogram, select cells F2:F7 and click the Chart Wizard tool.
7. In step 1 of the Chart Wizard (Chart Type), select Column as the chart type and
Clustered Column as the chart sub-type, and click Next. In step 2 (Chart Source
Data), verify the data range and click Next. In step 3 (Chart Options) on the
Titles tab, type the chart and axis titles shown in Figure 27.10; on the Gridlines
tab, clear all checkboxes; on the Legend tab, clear the checkbox for Show
Legend, and click Finish.
8. Double-click the vertical axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box, click the Scale tab; click
Minimum and type –0.2; click Maximum and type 1; click Major Unit and type
0.2; click OK.
9. Double-click the horizontal axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box, click the Patterns tab; in the Tick-
Mark Labels section click Low and click OK. The correlogram appears as
shown in Figure 27.10.
The lag 1 autocorrelation coefficient 0.822 shown in Figure 27.10 differs slightly from
the regular correlation coefficient 0.8545 for current and lag 1 shown in cell G1 in Figure
27.6. One of the reasons is that the autocorrelation coefficient uses z values for current
and lag based on the mean and standard deviation of all 18 observations, but the regular
correlation coefficient computes z values using the first 17 observations for current and
using the last 17 for lag. The autocorrelation coefficients for wages decrease gradually,
indicating that it may be worthwhile to investigate autoregressive models incorporating
lagged values beyond lag 1.
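The SUMPRODUCT/OFFSET formula in step 5 can be sketched in Python as follows. This is a minimal illustration, not the worksheet itself: it assumes the z values are standardized with the sample standard deviation (divisor n–1), which the steps shown here do not state explicitly, and the five-value series is hypothetical demo data.

```python
from statistics import mean, stdev


def autocorrelations(values, max_lag):
    """Autocorrelation for lags 1..max_lag using z values from all n observations."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # ASSUMPTION: sample standard deviation (n - 1 divisor)
    z = [(v - m) / s for v in values]
    acf = []
    for k in range(1, max_lag + 1):
        # Mirrors SUMPRODUCT(OFFSET(Z,k,0,n-k), OFFSET(Z,0,0,n-k)) / (n-1)
        total = sum(z[t + k] * z[t] for t in range(n - k))
        acf.append(total / (n - 1))
    return acf


# Hypothetical demo series, not the wage data.
demo = [1, 2, 3, 4, 5]
print([round(r, 3) for r in autocorrelations(demo, 2)])  # [0.4, -0.1]
```

Every lag uses the same mean and standard deviation, which is why the lag 1 value differs from the ordinary correlation between the current and lagged columns.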
6. Select rows 2 and 3. Choose Delete from the shortcut menu. Columns A through
D appear as shown in Figure 27.12.
After arranging the data, from the Tools menu choose Data Analysis. In the Data
Analysis dialog box, click Regression in the Analysis Tools list box and click OK. In the
Regression dialog box, the Input Y Range is D1:D17 and the Input X Range is B1:C17.
Check the Labels checkbox. The Output Range is F1. Optionally, select outputs in the
Residuals section and click OK. Formatted and edited results without the ANOVA table
are shown in Figure 27.12.
Compared to the AR(1) model, this AR(2) model has a slightly higher standard error of
estimate and a lower adjusted R². The t statistic for the Lag 2 explanatory variable is
0.16251, indicating that the Lag 2 regression coefficient is not significantly different from
zero. After taking lag 1 into account, the addition of lag 2 is not useful for explaining the
variation in wages.
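Arranging the data for an AR(2) regression amounts to pairing each value with its two predecessors and dropping the first two periods, which have incomplete lags. The sketch below illustrates that arrangement; the function name lagged_rows and the 18-value series are hypothetical.

```python
def lagged_rows(series, max_lag):
    """Pair each value with its lags 1..max_lag, skipping rows with missing lags."""
    rows = []
    for t in range(max_lag, len(series)):
        lags = [series[t - k] for k in range(1, max_lag + 1)]
        rows.append((lags, series[t]))
    return rows


# Hypothetical series of 18 observations (placeholder values, not the wage data).
series = list(range(1, 19))
rows = lagged_rows(series, 2)
print(len(rows))   # 16
print(rows[0])     # ([2, 1], 3)
```

With 18 observations and two lags, 16 complete rows remain, which is consistent with the 16 data rows in the regression input ranges B1:C17 and D1:D17.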
EXERCISES
Exercise 27.1 (adapted from Keller, p. 930) As a preliminary step in forecasting future
values, a large mail-order retail outlet has recorded the sales figures, in millions of
dollars, shown in the following table.
Year Sales Year Sales
1974 6.7 1984 14.2
1975 7.4 1985 18.1
1976 8.5 1986 16.0
1977 11.2 1987 11.2
1978 12.5 1988 14.8
1979 10.7 1989 15.2
1980 11.9 1990 14.1
1981 11.4 1991 12.2
1982 9.8 1992 15.7
1983 11.5
1. Fit a linear time trend and compute the Durbin-Watson statistic.
2. Construct an autocorrelation plot and develop an autoregressive model.
3. Make forecasts for 1993 using the linear time trend and autoregressive model.
Exercise 27.2 The following table shows annual sales in thousands of units for a new
product from the Ekans company.
Year Sales Year Sales Year Sales
1980 36 1985 61 1990 79
1981 44 1986 63 1991 87
1982 52 1987 66 1992 97
1983 56 1988 69 1993 101
1984 58 1989 73 1994 103
1. Fit a linear time trend and compute the Durbin-Watson statistic.
2. Calculate values of the autocorrelation function for lags 1 through 6.
3. Try autoregressive models AR(1), AR(2), AR(3), and AR(4). Which of these models
is most appropriate?
Time Series Smoothing
28
This chapter describes two methods for smoothing time series data: moving averages and
exponential smoothing. The purpose of smoothing is to eliminate the irregular and
seasonal variation in the data so it's easier to see the long-run behavior of the time series.
The long-run pattern is called the trend, and it may also include variation due to the
business cycle. The smoothed version of the data may be used to make a forecast of
trend, or it may be used as part of the analysis of seasonality, as described in Chapter 29.
The data set used for moving averages in this chapter and for seasonal analysis in Chapter
29 is quarterly U.S. retail sales, in billions of dollars, from first quarter 1983 through
fourth quarter 1987. These data, shown in column C of Figure 28.1, are a quarterly
aggregation of the monthly data in the file RETAIL.DAT that accompanies the second
edition of Cryer; the original source is Survey of Current Business, 1987.
326 Chapter 28 Time Series Smoothing
The following steps describe how to construct a time sequence plot using two lines
(quarter and year) for labeling the horizontal axis.
1. Enter the labels Year, Quarter, and Sales in row 1 and enter the years, quarters,
and sales data in columns A, B, and C.
2. Select cells A1:C21 and click the Chart Wizard button.
3. In step 1 (Chart Type) of the Chart Wizard on the Standard Types tab, select
Line for chart type and "Line with markers displayed at each data value" for
chart sub-type. Click Next.
4. In step 2 (Chart Source Data), verify the data range and click Next.
5. In step 3 (Chart Options) on the Titles tab, type the chart and axis titles shown in
Figure 28.2. On the Gridlines tab, clear all checkboxes. On the Legend tab, clear
the checkbox for Show Legend. Click Finish.
6. Click and drag the sizing handles so that the chart is approximately 9 columns
wide and 20 rows high.
7. To change the font size of the chart title, axis titles, and axes to 7, select each
object, click the Font Size tool on the Formatting toolbar, and enter 7.
28.1 Moving Average Using Add Trendline 327
8. Select the vertical axis and click the Decrease Decimal button.
9. Double-click the vertical axis; in the Format Axis dialog box on the Scale tab,
enter 200 for the Minimum.
10. Double-click the horizontal axis; in the Format Axis dialog box on the
Alignment tab, enter 0 (zero) in the Degrees edit box. The chart appears as
shown in Figure 28.2.
Quarterly U.S. retail sales exhibit strong seasonality with an upward linear trend. A
moving average may be used to eliminate the seasonal variation so the trend is even more
apparent.
The first moving average shown in Figure 28.4 is an average of the first four quarters and
is associated with 1983 quarter IV. The period is specified as 4 in this example because
the repeating pattern is four quarters long. If the time series data are monthly, the period
is usually 12. If daily data have a recurring pattern each week, the period should be 7.
When the Add Trendline command is used to obtain the moving average, the default
pattern is a medium-weight line as shown in Figure 28.4. The style and weight of the line
may be changed by double-clicking on the moving average line, but it isn't possible to
add markers. Also, there is no way to access the values that Excel uses to plot the moving
average.
28.2 Moving Average Data Analysis Tool 329
3. Make entries in the Moving Average dialog box as shown in Figure 28.5. Then
click OK. (If you receive the error message "Cannot add chart to a shared
workbook," click the OK button. To construct a line chart, select the Sales and
Moving Average data and click the Chart Wizard button.) The output appears in
columns D and E, as shown in Figure 28.6.
4. Double-click the vertical axis. In the Format Axis dialog box on the Scale tab,
click the Minimum edit box and enter 200. The results appear as shown in
Figure 28.6.
The Moving Average analysis tool puts formulas in the worksheet. Cell D5 contains the
formula =AVERAGE(C2:C5), cell D6 contains =AVERAGE(C3:C6), and so on. Each
average uses four values: the current sales and the three previous sales.
Cell E8 contains the formula =SQRT(SUMXMY2(C5:C8,D5:D8)/4). The
SUMXMY2(C5:C8,D5:D8) portion of this formula computes the difference between the
smoothed values in cells D5:D8 and the actual values in cells C5:C8, squares each of the
four differences, and sums the squared differences. Each of the standard error values in
column E is based on the four most recent values.
A simplistic forecasting model could use the last moving average, 376.8, as a forecast for
the next quarter's trend, with the standard error, 23.7, as a measure of uncertainty. A
forecast of the seasonal component could be combined with this trend forecast to obtain a
more accurate prediction of next quarter's sales.
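The worksheet formulas written by the Moving Average tool can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the eight data values are made-up demo numbers rather than the retail sales, and None stands in for the cells where Excel shows #N/A.

```python
from math import sqrt


def moving_average(values, period):
    """Trailing average of the current value and the (period - 1) previous values."""
    out = [None] * (period - 1)
    for i in range(period - 1, len(values)):
        window = values[i - period + 1 : i + 1]
        out.append(sum(window) / period)
    return out


def rolling_std_error(actual, smoothed, period):
    """Mirrors =SQRT(SUMXMY2(actual, smoothed)/period) over the latest `period` rows."""
    out = [None] * len(actual)
    first = 2 * (period - 1)  # first row with `period` smoothed values available
    for i in range(first, len(actual)):
        squared = sum(
            (actual[j] - smoothed[j]) ** 2 for j in range(i - period + 1, i + 1)
        )
        out[i] = sqrt(squared / period)
    return out


# Hypothetical demo data.
sales = [10, 20, 30, 40, 50, 60, 70, 80]
ma = moving_average(sales, 4)
se = rolling_std_error(sales, ma, 4)
print(ma[3], se[6])  # 25.0 15.0
```

Index 3 corresponds to worksheet row 5 (the first average) and index 6 to row 8 (the first standard error), matching cells D5 and E8 described above.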
The following steps describe how to use the Exponential Smoothing analysis tool without
specifying an initial smoothed value.
1. Enter the Quarter and Actual labels and data in columns A and B of a new
worksheet as shown in Figure 28.7. Enter the label Forecast in cell C1 and the
label StdError in cell D1.
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
click Exponential Smoothing in the Analysis Tools list box and click OK. The
Exponential Smoothing dialog box appears as shown in Figure 28.8.
3. Make entries in the Exponential Smoothing dialog box as shown in Figure 28.8.
Then click OK. The output appears in columns C and D, with the chart output to
the right. Adjust the size of the chart by clicking and dragging a handle on the
border to obtain the results shown in Figure 28.7.
The Exponential Smoothing analysis tool puts formulas in the worksheet. The actual
value in the first period is used as the forecast for the second period. That is, cell C3
contains the formula =B2. The forecast for the third period uses the actual value and
forecast from the second period in the recursive formula; cell C4 contains the formula
=0.1*B3+0.9*C3. In general, the forecast for a specific period is based on the actual and
forecast values from the previous period.
The damping factor specified here is 0.9, so the smoothing constant alpha is 0.1. To
obtain a forecast, the most recent actual value receives weight 0.1 in the recursive
formula. Because this weight is relatively small, the smoothed values respond very
slowly to changes in the actual values.
Cell D6 contains the formula =SQRT(SUMXMY2(B3:B5,C3:C5)/3). Each of the
standard error values in column D is based on the three previous actual values and
forecasts.
To obtain a forecast for quarter 19, a simplistic forecasting model could use the actual
and forecast values from quarter 18 in the recursive formula: 0.1 * 1.3 + 0.9 * 2.669 =
2.532. This forecast could be obtained by selecting cell C19 and dragging the fill handle
in the lower-right corner down to cell C20, which then contains the copied formula
=0.1*B19+0.9*C19, with the result 2.532.
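The recursive formula used by the Exponential Smoothing tool can be sketched in Python as shown here. The function names are hypothetical and the three-value series is made-up demo data; only the final line uses numbers from the text (the quarter 18 actual and forecast values).

```python
def next_forecast(actual, forecast, alpha=0.1):
    """One step of the recursion: alpha * actual + (1 - alpha) * forecast."""
    return alpha * actual + (1 - alpha) * forecast


def exponential_forecasts(actual, alpha=0.1):
    """Forecast for each period; the last entry is the forecast for the next period."""
    forecasts = [None, actual[0]]  # period 2 forecast is the period 1 actual
    for t in range(1, len(actual)):
        forecasts.append(next_forecast(actual[t], forecasts[t], alpha))
    return forecasts


# Hypothetical demo series.
demo = exponential_forecasts([10.0, 20.0, 30.0])
print([None if f is None else round(f, 3) for f in demo])  # [None, 10.0, 11.0, 12.9]

# Quarter 19 forecast from the quarter 18 actual and forecast values in the text.
print(round(next_forecast(1.3, 2.669), 3))  # 2.532
```

A damping factor of 0.9 in the Excel dialog box corresponds to alpha = 0.1 in this sketch.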
EXERCISES
Exercise 28.1 (adapted from Mendenhall, p. 635) The week's end closing prices for the
securities of the Color-Vision Company, a manufacturer of color television sets, have
been recorded over a period of 30 consecutive weeks as shown in the following table.
Week Price Week Price Week Price
1 $71 11 $75 21 $72
2 70 12 70 22 73
3 69 13 75 23 72
4 68 14 75 24 77
5 64 15 74 25 83
6 65 16 78 26 81
7 72 17 86 27 81
8 78 18 82 28 85
9 75 19 75 29 85
10 75 20 73 30 84
1. Determine the five-week moving average.
2. Use exponential smoothing with smoothing constant, alpha, of 0.1.
3. Use exponential smoothing with smoothing constant, alpha, of 0.5.
4. Which of the three smoothing results is most appropriate for detecting the long-
term trend for these data?
Exercise 28.2 (adapted from Mendenhall, p. 638) The following table shows gross
monthly sales revenue, in thousands of dollars, of a pharmaceutical company from
January 1989 through December 1992.
Year
Month 1989 1990 1991 1992
January 18.0 23.3 24.7 28.3
February 18.5 22.6 24.4 27.5
March 19.2 23.1 26.0 28.8
April 19.0 20.9 23.2 22.7
May 17.8 20.2 22.8 19.6
June 19.5 22.5 24.3 20.3
July 20.0 24.1 27.4 20.7
August 20.7 25.0 28.6 21.4
September 19.1 25.2 28.8 22.6
October 19.6 23.8 25.1 28.3
November 20.8 25.7 29.3 27.5
December 21.0 26.3 31.4 28.1
The time series shown in Figure 29.1 has a strong seasonal pattern with an upward trend.
Sales are consistently highest in quarter IV of each year and lowest in quarter I. The trend
appears to be linear.
336 Chapter 29 Time Series Seasonality
The following steps describe how to develop a regression model with linear time trend
and seasonal indicator variables.
1. Enter the labels and data shown in Figure 29.2. (Enter 1 and 2 in cells D2:D3,
select D2:D3, and double-click the fill handle. Enter the zero-one pattern in cells
E2:H5, copy, and paste to cells E6, E10, E14, and E18.)
2. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression from the Analysis Tools list box and click OK. The
Regression dialog box appears as shown in Figure 29.3.
29.1 Regression Using Indicator Variables 337
3. In the Regression dialog box, the Input Y Range is C1:C21 and the Input X
Range is D1:G21. (It is important to include only three of the four indicator
variables as x variables for the regression model.) Check the Labels box. Click
the Output Range option button, select the adjacent text box, and specify J1.
Check all checkboxes in the Residuals section. Then click OK. (If the error
message "Cannot add chart to a shared workbook" appears, click Cancel; in the
Regression dialog box, click New Workbook in the Output Options, and click
OK.) An edited portion of the regression output is shown in Figure 29.4.
The Coefficients section of the output in Figure 29.4 shows that the fitted equation is
Sales = 311.005 + 5.106*Time – 56.601*Winter – 19.387*Spring – 22.574*Summer.
After taking seasonality into account, retail sales increase by 5.106 billions of dollars per
quarter, on the average. The Fall quarter indicator variable was not included in the
regression input, so the Fall seasonal effect is included in the constant term 311.005. The
coefficient for the Winter indicator variable tells us that retail sales in the Winter quarter
are 56.601 billions of dollars less than sales in the Fall, on the average. Similarly, the
seasonal effect of Spring relative to Fall is measured by the –19.387 coefficient, and the
effect of Summer relative to Fall is measured by the –22.574 coefficient.
R square indicates that approximately 98.2% of the variation in retail sales can be
explained using linear time trend and seasonal indicators. The standard error of the
residuals is 6.089 billions of dollars, which may be loosely interpreted as the error
associated with predictions using this model. The absolute values of the t statistics are far
greater than two, and the related p-values are less than 0.0005, indicating significant
relationships between each explanatory variable and retail sales.
The Regression analysis tool's line fit plot for explanatory variable Time shows the actual
and fitted values in a time sequence plot. The following steps describe some
embellishments to obtain the chart shown in Figure 29.5.
4. Click and drag the chart sizing handles so that the chart is approximately 10
columns wide and 20 rows high. Change the font size to 10 for the chart title,
axis titles, axes, and legend.
5. Select the vertical axis. Double-click, or right-click and choose Format Axis
from the shortcut menu. In the Format Axis dialog box, click the Scale tab.
Click Minimum and type 200. Click Maximum and type 450. Click OK.
6. Select the horizontal axis. Double-click, or right-click and choose Format Axis
from the shortcut menu. In the Format Axis dialog box, click the Scale tab.
Click Minimum and type 1. Click Maximum and type 20. Click Major Unit and
type 1. Click OK.
7. Click one of the square markers associated with the Predicted Sales data series,
or use the up and down arrow keys to select the series. The formula bar shows
=SERIES("Predicted Sales",...). Double-click, or right-click and choose Format
Data Series from the shortcut menu. In the Format Data Series dialog box, click
the Patterns tab. Click Automatic for Line, click None for Marker, and click OK.
The chart appears as shown in Figure 29.5.
A forecast of retail sales in quarter 21 (Winter 1988) is obtained by setting Time = 21,
Winter = 1, Spring = 0, and Summer = 0. Referring to the fitted equation,
predicted Sales = 311.005 + 5.106 * 21 – 56.601 * 1 – 19.387 * 0 – 22.574 * 0
= 311.005 + 107.226 – 56.601 – 0 – 0
= 361.63 billions of dollars.
Forecasts for individual quarters may be calculated in a similar manner.
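The same calculation can be sketched in Python for any quarter. The coefficients are the rounded values from the fitted equation; the function name predicted_sales and the dictionary layout are hypothetical, and Fall is treated as the baseline season with no added effect.

```python
# Rounded coefficients from the fitted equation in the text.
CONSTANT = 311.005
TIME_COEF = 5.106
SEASON_EFFECT = {"Winter": -56.601, "Spring": -19.387, "Summer": -22.574, "Fall": 0.0}


def predicted_sales(time, season):
    """Predicted retail sales (billions of dollars) for a quarter number and season."""
    return CONSTANT + TIME_COEF * time + SEASON_EFFECT[season]


print(round(predicted_sales(21, "Winter"), 2))  # 361.63
```

Setting an indicator to 1 simply adds that season's coefficient, so a Fall forecast uses only the constant and the time term.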
To calculate fitted values and forecasts for a large number of quarters, the TREND
function is convenient. The following steps describe how to obtain fitted values for the
first 20 quarters and forecasts for the next 4 quarters.
8. Copy cells A18:B21 and paste into cell A22. Enter 1988 in cell A22.
9. Select cells D20:D21 and drag the fill handle down to cell D25.
10. Copy cells E18:H21 and paste into cell E22.
11. Enter the label Forecast in cell I1.
12. Select cells I2:I25. Click the Insert Function tool button (icon fx). In the Insert
Function dialog box, select Statistical in the category list box, select TREND in
the function list box, and click OK. In the TREND dialog box, fill in the dialog
box as shown in Figure 29.6 and click OK.
13. With I2:I25 selected, press F2 (or click in formula bar). To array-enter the
formula, hold down the Control and Shift keys and press Enter. Click the
Decrease Decimal button to display one decimal place. The results appear as
shown in Figure 29.7.
The forecasts for the next four quarters are shown in cells I22:I25 in Figure 29.7. The
forecast for quarter 21 (Winter 1988) using TREND agrees with the value calculated
earlier using the fitted equation from the Regression analysis tool: 361.6 billions of
dollars.
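What TREND does here is fit a least-squares equation to the known values and evaluate it for new x values. The Python sketch below illustrates the idea with the normal equations; it is not Excel's internal algorithm. All function names are hypothetical, the quarter order Winter, Spring, Summer, Fall is assumed, and the "history" is synthetic data generated exactly from the rounded fitted equation, since the actual sales figures are not reproduced here.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(x_rows, y):
    """Return coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(r) for r in x_rows]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x_row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], x_row))


def x_row(t):
    """Time plus Winter, Spring, Summer indicators (Fall omitted as baseline)."""
    q = (t - 1) % 4  # ASSUMPTION: 0=Winter, 1=Spring, 2=Summer, 3=Fall
    return [t, int(q == 0), int(q == 1), int(q == 2)]


# Synthetic history built from the rounded fitted equation (demo only).
assumed_coef = [311.005, 5.106, -56.601, -19.387, -22.574]
history = [predict(assumed_coef, x_row(t)) for t in range(1, 21)]

coef = fit_least_squares([x_row(t) for t in range(1, 21)], history)
forecast_21 = predict(coef, x_row(21))
print(round(forecast_21, 2))  # 361.63
```

Only three of the four indicators appear in each row; including all four alongside the intercept would make the normal equations singular.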
The following steps describe how to prepare a time sequence plot showing the actual,
fitted, and forecast values.
14. Select cells C1:C25. Hold down the Control key and select I1:I25. Click the
Chart Wizard button.
15. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
Line for chart type and "Line with markers displayed at each data value" for
chart sub-type. Click Next.
16. In step 2 (Chart Source Data) on the Series tab, select the range edit box for
Category (X) Axis Labels, and click and drag A2:B25. Click Next.
17. In step 3 (Chart Options) on the Titles tab, type the chart and axis labels shown
in Figure 29.8. On the Gridlines tab, clear all checkboxes. Click Finish.
18. Double-click the vertical axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box, click the Scale tab; click
Minimum and type 200; click Maximum and type 500; click Major Unit and
type 50; click OK.
19. Double-click the horizontal axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box on the Alignment tab, select the
Degrees edit box, type 0 (zero), and click OK.
20. Click on a data point or use the up and down arrow keys to select the actual
sales data series. Double-click, or right-click and choose Format Data Series
from the shortcut menu. In the Format Data Series dialog box, click the Patterns
tab. Click None for Line, click Automatic for Marker, and click OK.
21. Click on a data point or use the up and down arrow keys to select the forecast
data series. Double-click, or right-click and choose Format Data Series from the
shortcut menu. In the Format Data Series dialog box, click the Patterns tab.
Click Automatic for Line, click None for Marker, and click OK.
22. To format the chart as shown in Figure 29.8, click and drag the chart sizing
handles so that the chart is approximately 10 standard columns wide and 20
rows high. Change the font size to 8 for the chart title, axis titles, and legend.
Change the font size to 6 for the axes.
11. Select rows 2:5. Right-click and choose Delete from the shortcut menu. The data
appear as shown in columns C:E in Figure 29.11.
12. From the Tools menu, choose Data Analysis. In the Data Analysis dialog box,
select Regression from the Analysis Tools list box and click OK. In the
Regression dialog box, the Input Y Range is E1:E17 and the Input X Range is
C1:D17. Check the Labels box. Click the Output Range option button, select the
adjacent text box, and specify H1. Check the Residuals checkbox in the
Residuals section. Then click OK. A portion of the regression output is shown in
Figure 29.11.
29.2 AR(4) Model 345
Rounded to four decimal places, the fitted equation is Sales = 87.5903 – 0.1198 * Lag1 +
0.9236 * Lag4. The t statistics and p-values indicate significant relationships, and R
square shows that approximately 97% of the variation in sales can be explained using the
lagged variables.
The standard error of this AR(4) model is 5.9 billions of dollars, very close to the
standard error of the model using indicator variables, 6.1 billions of dollars. The
following steps describe how to obtain forecasts for the next four quarters and a plot of
actual, fitted, and forecast values.
13. Copy cells A14:B17 and paste into cell A18. Enter 1988 in cell A18.
14. Enter the label Forecast in cell F1.
15. The Predicted Sales values from regression output appear below the Summary
Output. Copy cells I26:I41 into cell F2.
16. Select cell E18. Enter the formula =I$17+I$18*E17+I$19*E14. Click the fill
handle and drag down to cell E21. The results appear as shown in Figure 29.12.
17. Select cells E18:E21. Move the mouse pointer near the edge of the selected
region until the pointer becomes an arrow. Click and drag right to column F.
(Alternatively, cut E18:E21 and paste special values to F18.) The results appear
as shown in Figure 29.13.
18. To prepare a line chart, select cells E1:F21 and click the Chart Wizard button. In
step 2 (Chart Source Data), select the range edit box for Category (X) Axis
Labels, and click and drag cells A2:B21.
19. Details for the Chart Wizard steps and formatting are described in steps 15
through 22 in section 29.1. The results appear as shown in Figure 29.14.
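The fill-down formula in step 16 produces each forecast from the value one quarter back and the value four quarters back, so later forecasts build on earlier ones. The Python sketch below shows that recursion using the rounded coefficients from the fitted equation. The function name is hypothetical, and the four history values are made-up placeholders, not the actual retail sales.

```python
# Rounded coefficients from the fitted equation in the text.
B0, B_LAG1, B_LAG4 = 87.5903, -0.1198, 0.9236


def ar_lag1_lag4_forecasts(history, steps):
    """Extend the series recursively; needs at least four historical values."""
    series = list(history)
    for _ in range(steps):
        # series[-1] is lag 1 and series[-4] is lag 4 relative to the new period.
        series.append(B0 + B_LAG1 * series[-1] + B_LAG4 * series[-4])
    return series[len(history):]


# Hypothetical last four quarters (placeholder values).
forecasts = ar_lag1_lag4_forecasts([300.0, 350.0, 360.0, 400.0], 4)
print([round(f, 1) for f in forecasts])
```

The first forecast depends only on observed values; the second through fourth use the preceding forecasts as their lag 1 inputs.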
29.3 Classical Time Series Decomposition 347
3. Select cell D4 and enter the formula =AVERAGE(C2:C5). This average of the
first four quarters is actually associated with a time point located between the
second and third quarters. Because it is located on the row of the third quarter, it
is labeled "Early_MA."
4. Select cell E4 and enter the formula =AVERAGE(C3:C6). This average of the
second through fifth quarters is actually associated with a time point located
between the third and fourth quarters. Since it is located on the row of the third
quarter, it is labeled "Late_MA."
5. Select cell F4 and enter the formula =AVERAGE(D4:E4). This average of the
Early_MA and Late_MA is centered on the third quarter.
6. Select cells D4:F4. Click the fill handle in the lower-right corner of the selection
and drag down to cell F19. Format the extended selection to display one decimal
place. The results appear as shown in Figure 29.15.
7. To chart the moving average, select cells C1:C25. Hold down the Control key
and select cells F1:F25. Click the Chart Wizard button.
8. In step 1 of the Chart Wizard (Chart Type) on the Standard Types tab, select
Line for chart type and "Line with markers displayed at each data value" for
chart sub-type; click Next. In step 2 (Chart Source Data) on the Series tab, select
the range edit box for Category (X) Axis Labels, and click and drag A2:B25;
click Next. In step 3 (Chart Options) on the Titles tab, type the chart and axis
labels shown in Figure 29.16; on the Gridlines tab, clear all checkboxes; click
Finish.
9. To format the chart, double-click the vertical axis, or right-click and choose
Format Axis from the shortcut menu. In the Format Axis dialog box, click the
Scale tab; click Minimum and type 200; click Maximum and type 500; click
Major Unit and type 50; click OK.
10. Double-click the horizontal axis, or right-click and choose Format Axis from the
shortcut menu. In the Format Axis dialog box on the Alignment tab, select the
Degrees edit box, type 0 (zero), and click OK.
11. Click on a data point to select the centered moving average data series. Double-
click, or right-click and choose Format Data Series from the shortcut menu. In
the Format Data Series dialog box, click the Patterns tab. Click Automatic for
Line, click None for Marker, and click OK.
12. To display all labels on the horizontal axis, click and drag the sizing handles to
make the chart wider. Also, select a smaller font size for the axes, axis titles, and
legend. The results are shown in Figure 29.16.
13. Enter the labels Ratio, AvgRatio, and Standard in cells G1:I1.
14. Select cell G4. Enter the formula =C4/F4. With cell G4 selected, click the fill
handle and drag down to cell G19. The results appear as shown in column G in
Figure 29.17. These numbers are the ratio of actual sales to the moving average.
For example, the number 1.0748 in cell G5 indicates that actual sales in that
particular fourth quarter were 107.48% of the average sales during the year.
15. Select cell H2 and enter the formula =AVERAGE(G6,G10,G14,G18). With
cell H2 selected, click the fill handle and drag down to cell H3.
16. Select cell H4 and enter the formula =AVERAGE(G4,G8,G12,G16). With cell
H4 selected, click the fill handle and drag down to cell H5. The results are
shown in column H in Figure 29.17. These formulas summarize the ratios for a
particular quarter for all years. For example, the value 1.0175 (approximately
1.02) in cell H3 indicates that sales in the second quarter are typically 2% above
the annual average. If the set of ratios in column G for a particular quarter has
outliers, these summaries in column H could use the MEDIAN or TRIMMEAN
functions.
17. Select cell H6 and click the AutoSum tool twice.
18. The base for an index is 1.00, so the four prospective indexes should sum to 4.
To modify the average ratios so that they sum to 4, select cell I2 and enter the
formula =H2*4/$H$6. With cell I2 selected, click the fill handle and drag down
to cell I5.
19. Select cell I6 and click the AutoSum tool twice. The seasonal indexes in column
I sum to 4 as shown in Figure 29.17.
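The worksheet steps for the centered moving average, the ratios, and the standardized seasonal indexes can be summarized in a short Python sketch. The function names are hypothetical, the series is assumed to start in the first quarter, and the demo data are synthetic (a constant level of 100 with made-up seasonal multipliers), chosen so the expected indexes are obvious.

```python
def centered_moving_average(values, period=4):
    """Average of the early and late moving averages, centered on each period."""
    half = period // 2
    cma = [None] * len(values)
    for i in range(half, len(values) - half):
        early = sum(values[i - half : i + half]) / period
        late = sum(values[i - half + 1 : i + half + 1]) / period
        cma[i] = (early + late) / 2
    return cma


def seasonal_indexes(values, period=4):
    """Average the actual/moving-average ratios by season, then scale to sum to period."""
    cma = centered_moving_average(values, period)
    ratios = [[] for _ in range(period)]
    for i, (v, c) in enumerate(zip(values, cma)):
        if c is not None:
            ratios[i % period].append(v / c)
    averages = [sum(r) / len(r) for r in ratios]
    scale = period / sum(averages)
    return [a * scale for a in averages]


# Synthetic demo: five years of quarterly data with multipliers 0.8, 1.0, 1.0, 1.2.
demo = [80.0, 100.0, 100.0, 120.0] * 5
indexes = seasonal_indexes(demo)
print([round(x, 4) for x in indexes])  # [0.8, 1.0, 1.0, 1.2]
```

As noted in step 16, a median or trimmed mean could replace the simple average of ratios when a quarter's ratios contain outliers.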
One use for the seasonal indexes shown in cells I2:I5 in Figure 29.17 is to seasonally
adjust historical data. The multiplicative model is Value_t = Trend_t * Seasonal_t * Random_t,
so if an original value is divided by the seasonal index, the result has only trend and
random components remaining. Successive seasonally adjusted values can be compared
to detect changes in the long-run behavior of the time series.
A second use is to combine the seasonal index with a forecast of trend to obtain a forecast
of value. The trend forecast may be obtained by extrapolating the moving average or
using a regression model. The following steps describe how to seasonally adjust the
historical data, extrapolate the linear time trend of the adjusted values four quarters, and
multiply the extrapolated trend by the appropriate seasonal index to obtain the forecasts.
20. Enter the labels Index, Trend, and Forecast in cells J1:L1.
21. Select cells I2:I5 and click the Copy button (or right-click and choose Copy
from the shortcut menu). Select cell J2, right-click, and choose Paste Special
from the shortcut menu. In the Paste Special dialog box, select Values for Paste
and None for Operation. Leave the Skip Blanks and Transpose checkboxes clear
and click OK.
22. Copy the values in cells J2:J5 and paste into cells J6, J10, J14, J18, and J22.
23. Select cell K2 and enter the formula =C2/J2. With cell K2 selected, click the fill
handle and drag down to cell K21. The values in cells K2:K21 are the seasonally
adjusted historical data.
24. With cells K2:K21 selected, right-click and choose Copy from the shortcut
menu. With cells K2:K21 still selected, right-click and choose Paste Special
from the shortcut menu. In the Paste Special dialog box, select Values for Paste
and None for Operation. Leave the Skip Blanks and Transpose checkboxes clear
and click OK.
25. With cells K2:K21 selected, click the fill handle in the lower-right corner of cell
K21 and drag down to cell K25. The results are shown in column K in Figure
29.18. When Excel's AutoFill is used in this manner, the series of numbers in
K2:K21 is extended using a linear trend. The same results could be obtained
using the values 1 through 20 as explanatory variables for fitting simple linear
regression and using the values 21 through 24 for predictions.
26. To chart the actual sales, seasonally adjusted sales, and the linear extrapolation,
select cells C1:C25, hold down the Control key, and select cells K1:K25. Click
the Chart Wizard, prepare a line chart, and format using steps 8 through 12 in
this section. The result is shown in Figure 29.19.
27. To combine the trend and seasonal components in the forecasts, select cell L22
and enter the formula =J22*K22. With cell L22 selected, double-click the fill
handle. The results appear as shown in Figure 29.18.
28. To chart the actual sales and forecasts, select cells C1:C25, hold down the
Control key, and select cells L1:L25. Click the Chart Wizard, prepare a line
chart, and format using steps 8 through 12 in this section. The result is shown in
Figure 29.20.
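The same three operations (divide by the seasonal index, extend the least-squares line through the adjusted values, multiply the extended trend by the index) can be sketched outside Excel. The Python below is an illustration only: the function names, the index values, and the sales series are hypothetical and are not the data behind Figure 29.18.

```python
def fit_line(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...; returns (intercept, slope)."""
    n = len(ys)
    ts = range(1, n + 1)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def decomposition_forecast(actual, seasonal_index, horizon=4):
    """Seasonally adjust, extrapolate a linear trend, and reapply the index."""
    period = len(seasonal_index)
    # Counterpart of step 23: adjusted value = actual / seasonal index.
    adjusted = [y / seasonal_index[i % period] for i, y in enumerate(actual)]
    # Counterpart of step 25: linear trend fitted to the adjusted values.
    intercept, slope = fit_line(adjusted)
    n = len(actual)
    forecasts = []
    for t in range(n + 1, n + horizon + 1):
        trend = intercept + slope * t
        # Counterpart of step 27: forecast = trend * seasonal index.
        forecasts.append(trend * seasonal_index[(t - 1) % period])
    return adjusted, forecasts


# Hypothetical quarterly data: trend 100 + 10t scaled by a made-up index.
index = [0.8, 1.1, 0.9, 1.2]
sales = [(100 + 10 * t) * index[(t - 1) % 4] for t in range(1, 9)]
adjusted, forecasts = decomposition_forecast(sales, index)
print([round(f, 1) for f in forecasts])
```

Because the made-up series is an exact linear trend times the index, the forecasts reproduce that pattern for periods 9 through 12; real data would leave residual scatter around the fitted line.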
The three methods analyze seasonality using different models, so there are some
differences in the results, as shown in Figure 29.21.
The additive model using linear time trend and seasonal indicator variables and the
multiplicative model using classical time series decomposition have very similar results.
For these particular data, the autoregressive model produces forecasts that are
consistently below the results of the other models; the autoregressive model using lag 1
and lag 4 would be more appropriate for seasonal data with a long-term meandering
pattern.
EXERCISES
Exercise 29.1 (adapted from Mendenhall, p. 647) The following table shows quarterly
earnings, in millions of dollars, for a multimedia communications firm for the years 1984
through 1989.
Year
Quarter 1984 1985 1986 1987 1988 1989
1 302.2 426.5 504.2 660.9 743.6 1043.6
2 407.3 451.5 592.4 706.0 774.5 1037.8
3 483.3 543.9 647.9 751.3 915.7 1167.6
4 463.2 590.5 726.4 758.6 1013.4 1345.3
1. Construct a time sequence plot of the quarterly earnings.
2. Develop a regression model using linear time trend and quarterly indicator variables.
Make forecasts for the next four quarters.
3. Develop a regression model using quadratic time trend and quarterly indicator
variables. Make forecasts for the next four quarters.
4. Develop an AR(4) model. Make forecasts for the next four quarters.
5. Use classical time series decomposition to obtain seasonal indexes.
Exercise 29.2 (adapted from Mendenhall, p. 646) Texas Chemical Products manufactures
an agricultural chemical that is applied to farmlands after crops have been harvested.
Because the chemical tends to deteriorate in storage, Texas Chemical cannot stockpile
quantities in advance of the winter season demand for the product. The following table
shows sales of the product, in thousands of pounds, over four consecutive years.
Year
Month 1 2 3 4
January 123 134 144 145
February 130 146 159 146
March 157 174 168 164
April 155 163 153 158
May 161 176 179 182
June 169 154 164 169
July 142 166 160 166
August 157 168 170 174
September 169 166 160 166
October 185 223 208 215
November 209 238 221 213
December 238 252 244 258
1. Construct a time sequence plot of the monthly sales.
2. Develop a regression model using linear time trend and monthly indicator variables.
Make forecasts for the next 12 months.
3. Develop an AR(12) model. Make forecasts for the next 12 months.
4. Use classical time series decomposition to obtain seasonal indexes.
time series forecast usually extrapolates beyond the original range of data, so the
standard error of estimate is a minimum indication of the uncertainty
surrounding a forecast.
[Chart residue from Chapter 30, Regression Models for Time Series Data: three charts; horizontal axis Time, periods 1 through 15.]
Part 5 Constrained Optimization
Graphical Solution
Constraints
Feasible region
Corner points (extreme points)
Objective function value at each corner point
Total enumeration vs. simplex algorithm (search)
Optimal solution
Sensitivity Analysis
Post-optimality analysis and interpretation of computer print-outs
Shadow price (a marginal value)
(Excel Solver Sensitivity Report, Constraints section, “Shadow Price”)
The shadow price for a particular constraint is the amount of change in the value
of the objective function corresponding to a unit change in the right-hand-side
value of the constraint.
364 Chapter 31 Product Mix Optimization
     A B                       C         D        E      F           G
 1   Small Example 1: Product mix problem
 2   Your company manufactures TVs and stereos, using a common parts inventory
 3   of power supplies, speaker cones, etc. Parts are in limited supply and you must
 4   determine the most profitable mix of products to build.
 5
 6                             TV set    Stereo          RHS
 7   Number to Build->         250       100      Used   Available   Slack
 8   Part Name  Chassis        1         1        350    450         100
 9              Picture Tube   1         0        250    250         0
10              Speaker Cone   2         2        700    800         100
11              Power Supply   1         1        350    450         100
12              Electronics    2         1        600    600         0
13   Profit
14   Per Unit                  $75       $50
15   By Product                $18,750   $5,000
16   Total                     $23,750
[Figure residue: chart titled Five Constraints; recovered line labels include Speaker Cone and Picture Tube; horizontal axis Number of Stereos, 0 to 700.]
31.2 Basic Product Mix Problem 367
Adjustable Cells
Cell    Name                        Original Value   Final Value
$C$7    Number to Build-> TV set    250              200
$D$7    Number to Build-> Stereo    100              200

Constraints
Cell    Name                 Cell Value   Formula          Status        Slack
$E$8    Chassis Used         400          $E$8<=$F$8       Not Binding   50
$E$9    Picture Tube Used    200          $E$9<=$F$9       Not Binding   50
$E$10   Speaker Cone Used    800          $E$10<=$F$10     Binding       0
$E$11   Power Supply Used    400          $E$11<=$F$11     Not Binding   50
$E$12   Electronics Used     600          $E$12<=$F$12     Binding       0

Constraints
                             Final    Shadow    Constraint   Allowable   Allowable
Cell    Name                 Value    Price     R.H. Side    Increase    Decrease
$E$8    Chassis Used         400      $0.00     450          1E+30       50
$E$9    Picture Tube Used    200      $0.00     250          1E+30       50
$E$10   Speaker Cone Used    800      $12.50    800          100         100
$E$11   Power Supply Used    400      $0.00     450          1E+30       50
$E$12   Electronics Used     600      $25.00    600          50          200
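The shadow prices in this report can be checked by brute force. The sketch below, written in Python rather than Excel, enumerates the corner points of the TV and stereo problem (total enumeration, which is workable only for very small problems) and then re-solves with one right-hand side raised by a single unit. The coefficients come from the worksheet above; the function names are illustrative, and this is not a description of how Solver works internally.

```python
from itertools import combinations

# Each constraint: (parts per TV, parts per stereo, parts available).
PARTS = {
    "Chassis":      (1, 1, 450),
    "Picture Tube": (1, 0, 250),
    "Speaker Cone": (2, 2, 800),
    "Power Supply": (1, 1, 450),
    "Electronics":  (2, 1, 600),
}
PROFIT = (75, 50)  # profit per TV, profit per stereo


def best_corner(parts):
    """Return (profit, tvs, stereos) at the best feasible corner point."""
    # Treat the axes tv = 0 and stereo = 0 as boundary lines too.
    lines = list(parts.values()) + [(1, 0, 0), (0, 1, 0)]
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines have no intersection
        tv = (c1 * b2 - c2 * b1) / det      # Cramer's rule
        st = (a1 * c2 - a2 * c1) / det
        feasible = tv >= -1e-9 and st >= -1e-9 and all(
            a * tv + b * st <= c + 1e-9 for a, b, c in parts.values()
        )
        if feasible:
            profit = PROFIT[0] * tv + PROFIT[1] * st
            if best is None or profit > best[0]:
                best = (profit, tv, st)
    return best


def shadow_price(name):
    """Change in best profit when one right-hand side rises by one unit."""
    bumped = dict(PARTS)
    a, b, rhs = bumped[name]
    bumped[name] = (a, b, rhs + 1)
    return best_corner(bumped)[0] - best_corner(PARTS)[0]


print(best_corner(PARTS))
for part in PARTS:
    print(part, shadow_price(part))
```

The one-unit difference equals the reported shadow price only while the change stays inside the allowable increase shown in the report.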
                       Product
Department    Chair    Bench    Table    Present Capacity
Bending       1.2      1.7      1.2      1,000 hours
Welding       0.8      0.0      2.3      1,200 hours
The profit contribution that Outdoors receives from manufacture and sale of one unit of
each product is $3 for a chair, $3 for a bench, and $5 for a table.
The company is trying to plan its production mix for the current selling season. They feel
that they can sell any number they produce, but unfortunately production is further
limited by available material because of a prolonged strike. The company currently has
on hand 2,000 pounds of tubing. The three products require the following amounts of
this tubing: 2 pounds per chair, 3 pounds per bench, and 4.5 pounds per table.
In order to determine the optimal product mix, the production manager has formulated
the linear programming problem as shown below.
31.3 Outdoors Problem 371
Product
Chair Bench Table
Contribution $3 $3 $5
A. The inventory manager suggests that the company produce 200 units of each
product. Is the plan to produce 200 units of each product a feasible plan, i.e.,
does it satisfy all constraints? If not, which constraints are not satisfied?
B. If the company produces 200 chairs, 200 benches, and 200 tables, how much
tubing, if any, will be left over?
Each of the following questions refers to the solution of the original linear programming
problem.
C. A local manufacturing firm has excess capacity in its welding department and
has offered to sell 100 hours of welding time to Outdoors for $3 per hour. This
arrangement would cost $300 and would increase welding capacity from 1,200
hours to 1,300 hours. Should Outdoors purchase the additional welding
capacity? Why or why not?
D. The marketing manager thinks that the original estimate of $3 profit contribution
per chair should be changed to $2.50 per chair. Should the production manager
solve the linear programming problem again using the $2.50 value, or should
Outdoors go ahead with the plan to produce 700 chairs, zero benches, and 133
tables? Why or why not?
E. A local metal products distributor has offered to sell Outdoors some additional
metal tubing for 60 cents per pound. Should Outdoors buy additional tubing at
this price? If so, how much would their contribution increase if they bought 500
pounds and used it in an optimal fashion?
F. The R&D department has been redesigning the bench to make it more
profitable. The new design will require 1.1 hours of tube bending time, 2 hours
of welding time, and 2.0 pounds of metal tubing. If they can sell one unit of this
bench with a unit contribution of $3, what effect will it have on overall
contribution?
G. Marketing has suggested a new patio awning that would require 1.8 hours of
tube bending time, 0.5 hours of welding time, and 1.3 pounds of metal tubing.
What contribution must this new product have to make it attractive to produce
this season?
H. Outdoors, Inc., has a chance to sell some of its capacity in tube bending at a
price of $1.50 per hour. If it sells 200 hours at that price, how will this affect
contribution?
I. If Outdoors, Inc., feels that it must produce benches to round out its production
line, what effect will production of benches have on overall contribution?
Adapted from Vatter et al., Quantitative Methods in Management, Irwin, 1978.
Spreadsheet Model
Solver Reports
Adjustable Cells
Cell    Name                       Original Value   Final Value
$C$3    Number to Build-> Chair    100              700
$D$3    Number to Build-> Bench    100              0
$E$3    Number to Build-> Table    100              133.33

Constraints
Cell    Name                 Cell Value   Formula        Status        Slack
$F$4    Tube Bending Used    1000         $F$4<=$G$4     Binding       0
$F$5    Welding Used         866.67       $F$5<=$G$5     Not Binding   333.33
$F$6    Tubing Used          2000         $F$6<=$G$6     Binding       0
Constraints
                             Final     Shadow    Constraint   Allowable   Allowable
Cell    Name                 Value     Price     R.H. Side    Increase    Decrease
$F$4    Tube Bending Used    1000      $1.167    1000         200         466.67
$F$5    Welding Used         866.67    $0.00     1200         1E+30       333.33
$F$6    Tubing Used          2000      $0.80     2000         555.56      333.33
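A shadow price applies only while the right-hand-side change stays within the allowable increase or decrease. The short Python sketch below encodes that rule for the three rows of this report. The function name and the trial changes (100, 600, and 50 units) are hypothetical, and returning None outside the range is a choice made here to signal that the problem must be solved again.

```python
import math


def value_of_rhs_change(shadow_price, change, allowable_increase, allowable_decrease):
    """Return shadow_price * change, or None if the change leaves the allowable range."""
    if change > allowable_increase or -change > allowable_decrease:
        return None  # the shadow price is no longer valid; re-solve the model
    return shadow_price * change


# (shadow price, allowable increase, allowable decrease), copied from the report.
REPORT = {
    "Tube Bending": (1.167, 200.0, 466.67),
    "Welding": (0.0, math.inf, 333.33),  # 1E+30 in the report is treated as unlimited
    "Tubing": (0.80, 555.56, 333.33),
}

price, inc, dec = REPORT["Tubing"]
print(value_of_rhs_change(price, 100, inc, dec))   # 100 more pounds of tubing
print(value_of_rhs_change(price, 600, inc, dec))   # beyond the allowable increase

price, inc, dec = REPORT["Welding"]
print(value_of_rhs_change(price, 50, inc, dec))    # non-binding constraint
```

A non-binding constraint such as welding has slack left over, so additional capacity adds nothing to total contribution.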
Chapter 32 Modeling Marketing Decisions
32.1 ALLOCATING ADVERTISING EXPENDITURES
Figure 32.1 Quick Tour
     A               B          C          D          E          F          G  H  I
 1   Quick Tour of Microsoft Excel Solver
 2   Month           Q1         Q2         Q3         Q4         Total
 3   Seasonality     0.9        1.1        0.8        1.2
 4
 5   Units Sold      3,592      4,390      3,192      4,789      15,962
 6   Sales Revenue   $143,662   $175,587   $127,700   $191,549   $638,498
 7   Cost of Sales   89,789     109,742    79,812     119,718    399,061
 8   Gross Margin    53,873     65,845     47,887     71,831     239,437
 9
10   Salesforce      8,000      8,000      9,000      9,000      34,000
11   Advertising     10,000     10,000     10,000     10,000     40,000
12   Corp Overhead   21,549     26,338     19,155     28,732     95,775
13   Total Costs     39,549     44,338     38,155     47,732     169,775
14
15   Prod. Profit    $14,324    $21,507    $9,732     $24,099    $69,662
16   Profit Margin   10%        12%        8%         13%        11%
17
18   Product Price   $40.00
19   Product Cost    $25.00
20
The following examples show you how to work with the model above to solve for one value or several
values to maximize or minimize another value, enter and change constraints, and save a problem model.

Row   Contains                 Explanation
3     Fixed values             Seasonality factor: sales are higher in quarters 2 and 4,
                               and lower in quarters 1 and 3.

5     =35*B3*(B11+3000)^0.5    Forecast for units sold each quarter: row 3 contains
                               the seasonality factor; row 11 contains the cost of
                               advertising.

6     =B5*$B$18                Sales revenue: forecast for units sold (row 5) times
                               price (cell B18).

7     =B5*$B$19                Cost of sales: forecast for units sold (row 5) times
                               product cost (cell B19).

8     =B6-B7                   Gross margin: sales revenues (row 6) minus cost of
                               sales (row 7).

10    Fixed values             Sales personnel expenses.

11    Fixed values             Advertising budget (about 6.3% of sales).

12    =0.15*B6                 Corporate overhead expenses: sales revenues (row 6)
                               times 15%.

13    =SUM(B10:B12)            Total costs: sales personnel expenses (row 10) plus
                               advertising (row 11) plus overhead (row 12).

15    =B8-B13                  Product profit: gross margin (row 8) minus total costs
                               (row 13).

16    =B15/B6                  Profit margin: profit (row 15) divided by sales revenue
                               (row 6).

18    Fixed values             Product price.

19    Fixed values             Product cost.

This is a typical marketing model that shows sales rising from a base figure (perhaps due to the sales
personnel) along with increases in advertising, but with diminishing returns. For example, the first
$5,000 of advertising in Q1 yields about 1,092 incremental units sold, but the next $5,000 yields only
about 775 units more.
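
The worksheet formulas for a single quarter can be written out as ordinary functions, which makes the diminishing returns easy to see. The Python sketch below mirrors rows 5 through 15 for one quarter; the function and constant names are illustrative and do not appear in the worksheet.

```python
from math import sqrt

PRICE = 40.0          # cell B18
COST = 25.0           # cell B19
OVERHEAD_RATE = 0.15  # row 12: overhead is 15% of sales revenue


def units_sold(seasonality, advertising):
    """Row 5: =35*B3*(B11+3000)^0.5"""
    return 35 * seasonality * sqrt(advertising + 3000)


def product_profit(seasonality, advertising, salesforce):
    """Row 15: gross margin (row 8) minus total costs (row 13)."""
    units = units_sold(seasonality, advertising)
    revenue = units * PRICE                                        # row 6
    gross_margin = revenue - units * COST                          # row 8
    total_costs = salesforce + advertising + OVERHEAD_RATE * revenue  # row 13
    return gross_margin - total_costs


# Q1 of the worksheet: seasonality 0.9, advertising 10,000, salesforce 8,000.
print(round(units_sold(0.9, 10000)), round(product_profit(0.9, 10000, 8000)))

# Diminishing returns: units gained from the first and the next $5,000 in Q1.
first_5000 = units_sold(0.9, 5000) - units_sold(0.9, 0)
next_5000 = units_sold(0.9, 10000) - units_sold(0.9, 5000)
print(first_5000, next_5000)
```

The two increments come out close to the figures quoted in the text, with the second clearly smaller than the first because of the square root in row 5.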

You can use Solver to find out whether the advertising budget is too low, and whether advertising
should be allocated differently over time to take advantage of the changing seasonality factor.

Solving for a Value to Maximize Another Value
One way you can use Solver is to determine the maximum value of a cell by changing another cell. The
two cells must be related through the formulas on the worksheet. If they are not, changing the value in
one cell will not change the value in the other cell.

For example, in the sample worksheet, you want to know how much you need to spend on advertising
to generate the maximum profit for the first quarter. You are interested in maximizing profit by changing
advertising expenditures.

On the Tools menu, click Solver. In the Set target cell box, type b15 or
select cell B15 (first-quarter profits) on the worksheet. Select the Max option.
In the By changing cells box, type b11 or select cell B11 (first-quarter advertising)
on the worksheet. Click Solve.

You will see messages in the status bar as the problem is set up and Solver starts working. After a
moment, you'll see a message that Solver has found a solution. Solver finds that Q1 advertising of
$17,093 yields the maximum profit of $15,093.
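
The same one-variable search can be imitated with a few lines of code. The Python sketch below uses a ternary search, a simple stand-in chosen here for illustration (it is not Solver's algorithm), and relies on the assumption that Q1 profit has a single peak as a function of advertising. The search bounds of 0 and 100,000 are also assumptions.

```python
from math import sqrt


def q1_profit(advertising):
    """Cell B15 as a function of cell B11, using the worksheet formulas for Q1."""
    units = 35 * 0.9 * sqrt(advertising + 3000)   # row 5
    revenue = 40.0 * units                        # row 6
    gross_margin = revenue - 25.0 * units         # row 8
    total_costs = 8000 + advertising + 0.15 * revenue  # row 13
    return gross_margin - total_costs


def maximize(f, lo, hi, tol=1e-6):
    """Ternary search for the maximum of a single-peaked function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # the peak lies to the right of m1
        else:
            hi = m2   # the peak lies to the left of m2
    return (lo + hi) / 2


best_advertising = maximize(q1_profit, 0.0, 100000.0)
print(round(best_advertising), round(q1_profit(best_advertising)))
```

The search lands on the same advertising level and profit that Solver reports for the first quarter.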

After you examine the results, select Restore original values and click OK to
discard the results and return cell B11 to its former value.

Resetting the Solver Options

If you want to return the options in the Solver Parameters dialog box to their original settings so that
you can start a new problem, you can click Reset All.

Solving for a Value by Changing Several Values

You can also use Solver to solve for several values at once to maximize or minimize another value. For
example, you can solve for the advertising budget for each quarter that will result in the best profits for
the entire year. Because the seasonality factor in row 3 enters into the calculation of unit sales in row 5
as a multiplier, it seems logical that you should spend more of your advertising budget in Q4 when the
sales response is highest, and less in Q3 when the sales response is lowest. Use Solver to determine
the best quarterly allocation.

On the Tools menu, click Solver. In the Set target cell box, type f15 or select
cell F15 (total profits for the year) on the worksheet. Make sure the Max option is
selected. In the By changing cells box, type b11:e11 or select cells B11:E11
(the advertising budget for each of the four quarters) on the worksheet. Click Solve.

After you examine the results, click Restore original values and click OK to
discard the results and return all cells to their former values.

You've just asked Solver to solve a moderately complex nonlinear optimization problem; that is, to find
values for the four unknowns in cells B11 through E11 that will maximize profits. (This is a nonlinear
problem because of the exponentiation that occurs in the formulas in row 5.) The results of this
unconstrained optimization show that you can increase profits for the year to $79,706 if you spend
$89,706 in advertising for the full year.

However, most realistic modeling problems have limiting factors that you will want to apply to certain
values. These constraints may be applied to the target cell, the changing cells, or any other value that
is related to the formulas in these cells.

Adding a Constraint

So far, the budget recovers the advertising cost and generates additional profit, but you're reaching a
point of diminishing returns. Because you can never be sure that your model of sales response to
advertising will be valid next year (especially at greatly increased spending levels), it doesn't seem
prudent to allow unrestricted spending on advertising.

Suppose you want to maintain your original advertising budget of $40,000. Add the constraint to the
problem that limits the sum of advertising during the four quarters to $40,000.

On the Tools menu, click Solver, and then click Add. The Add Constraint
dialog box appears. In the Cell reference box, type f11 or select cell F11
(advertising total) on the worksheet. Cell F11 must be less than or equal to $40,000.
The relationship in the Constraint box is <= (less than or equal to) by default, so
you don't have to change it. In the box next to the relationship, type 40000. Click
OK, and then click Solve.

After you examine the results, click Restore original values and then click OK
to discard the results and return the cells to their former values.

The solution found by Solver allocates amounts ranging from $5,117 in Q3 to $15,263 in Q4. Total
Profit has increased from $69,662 in the original budget to $71,447, without any increase in the
advertising budget.

Changing a Constraint

When you use Microsoft Excel Solver, you can experiment with slightly different parameters to decide
the best solution to a problem. For example, you can change a constraint to see whether the results
are better or worse than before. In the sample worksheet, try changing the constraint on advertising
dollars to $50,000 to see what that does to total profits.

On the Tools menu, click Solver. The constraint, $F$11<=40000, should
already be selected in the Subject to the constraints box. Click Change. In
the Constraint box, change 40000 to 50000. Click OK, and then click Solve.
Click Keep solver solution and then click OK to keep the results that are
displayed on the worksheet.

Solver finds an optimal solution that yields a total profit of $74,817. That's an improvement of $3,370
over the last figure of $71,447. In most firms, it's not too difficult to justify an incremental investment of
$10,000 that yields an additional $3,370 in profit, or a 33.7% return on investment. This solution also
results in profits of $4,889 less than the unconstrained result, but you spend $39,706 less to get there.
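
The two budget-constrained results can be cross-checked without Solver. For this particular model, if the whole budget is spent and every quarter receives a positive amount, equalizing the marginal profit of advertising across quarters makes (advertising + 3000) proportional to the square of the seasonality factor. That derivation is an addition made here, not part of the worksheet text, and the Python names below are illustrative.

```python
from math import sqrt

SEASONALITY = [0.9, 1.1, 0.8, 1.2]          # row 3
SALESFORCE = [8000, 8000, 9000, 9000]       # row 10
# Per-unit margin after cost of sales and 15% overhead on a $40 price.
MARGIN_PER_UNIT = 40.0 - 25.0 - 0.15 * 40.0


def total_profit(advertising):
    """Cell F15 for a list of four quarterly advertising amounts."""
    profit = 0.0
    for s, a, staff in zip(SEASONALITY, advertising, SALESFORCE):
        units = 35 * s * sqrt(a + 3000)
        profit += MARGIN_PER_UNIT * units - staff - a
    return profit


def allocate(budget):
    """Split a fully spent budget so that (a + 3000) is proportional to s**2."""
    total_sq = sum(s * s for s in SEASONALITY)
    pool = budget + 3000 * len(SEASONALITY)
    return [pool * s * s / total_sq - 3000 for s in SEASONALITY]


for budget in (40000, 50000):
    plan = allocate(budget)
    print(budget, [round(a) for a in plan], round(total_profit(plan)))
```

The allocation for $40,000 puts the least in Q3 and the most in Q4, matching the amounts Solver reports, and the $50,000 case reproduces the higher total profit. The shortcut would need revisiting if a computed amount turned negative.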

Saving a Problem Model

When you click Save on the File menu, the last selections you made in the Solver Parameters
dialog box are attached to the worksheet and retained when you save the workbook. However, you
can define more than one problem for a worksheet by saving them individually using Save Model in
the Solver Options dialog box. Each problem model consists of cells and constraints that you
entered in the Solver Parameters dialog box.

When you click Save Model, the Save Model dialog box appears with a default selection, based
on the active cell, as the area for saving the model. The suggested range includes a cell for each
constraint plus three additional cells. Make sure that this cell range is an empty range on the
worksheet.

On the Tools menu, click Solver, and then click Options. Click Save Model.
In the Select model area box, type h15:h18 or select cells H15:H18 on the
worksheet. Click OK.

Note You can also enter a reference to a single cell in the Select model area box. Solver will use
this reference as the upper-left corner of the range into which it will copy the problem specifications.

To load these problem specifications later, click Load Model on the Solver Options dialog box,
type h15:h18 in the Model area box or select cells H15:H18 on the sample worksheet, and then
click OK. Solver displays a message asking if you want to reset the current Solver option settings with
the settings for the model you are loading. Click OK to proceed.
[Figure residue: diagram relating Profit Margin (row 16), Prod. Profit (row 15), Gross Margin (row 8), and Units Sold (row 5).]

Problem Specifications

Target Cell      D18                 Goal is to maximize profit.

Changing cells   D9:F9               Units of each product to build.

Constraints      C11:C15<=B11:B15    Number of parts used must be less than or
                                     equal to the number of parts in inventory.

                 D9:F9>=0            Number to build value must be greater than or
                                     equal to 0.

The formulas for profit per product in cells D17:F17 include the factor ^H15 to show that profit per unit
diminishes with volume. H15 contains 0.9, which makes the problem nonlinear. If you change H15 to
1.0 to indicate that profit per unit remains constant with volume, and then click Solve again, the
optimal solution will change. This change also makes the problem linear.
Chapter 34 Integer-Valued Optimization Models
34.1 TRANSPORTATION PROBLEM
Figure 34.1 Transportation Problem
     A             B        C          D        E         F        G          H  I
 1   Example 2: Transportation Problem.
 2   Minimize the costs of shipping goods from production plants to warehouses near metropolitan demand
 3   centers, while not exceeding the supply available from each plant and meeting the demand from each
 4   metropolitan area.
 5
 6   Number to ship from plant x to warehouse y (at intersection):
 7   Plants:       Total    San Fran   Denver   Chicago   Dallas   New York
 8   S. Carolina   5        1          1        1         1        1
 9   Tennessee     5        1          1        1         1        1
10   Arizona       5        1          1        1         1        1
11                          ---        ---      ---       ---      ---
12   Totals:                3          3        3         3        3
13
14   Demands by Whse -->    180        80       200       160      220
15   Plants:       Supply   Shipping costs from plant x to warehouse y (at intersection):
16   S. Carolina   310      10         8        6         5        4
17   Tennessee     260      6          5        4         3        6
18   Arizona       280      3          4        5         5        9
19
20   Shipping:     $83      $19        $17      $15       $13      $19
21

The problem presented in this model involves the shipment of goods from three plants to five regional
warehouses. Goods can be shipped from any plant to any warehouse, but it obviously costs more to
ship goods over long distances than over short distances. The problem is to determine the amounts
to ship from each plant to each warehouse at minimum shipping cost in order to meet the regional
demand, while not exceeding the plant supplies.
27

Problem Specifications

Target cell      B20                 Goal is to minimize total shipping cost.

Changing cells   C8:G10              Amount to ship from each plant to each
                                     warehouse.

Constraints      B8:B10<=B16:B18     Total shipped must be less than or equal to
                                     supply at plant.

                 C12:G12>=C14:G14    Totals shipped to warehouses must be greater
                                     than or equal to demand at warehouses.

                 C8:G10>=0           Number to ship must be greater than or equal
                                     to 0.

You can solve this problem faster by selecting the Assume linear model check box in the Solver
Options dialog box before clicking Solve. A problem of this type has an optimum solution at which
amounts to ship are integers, if all of the supply and demand constraints are integers.
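
The target cell and the three constraint groups can be restated as a small checking routine. The Python sketch below evaluates any proposed shipping plan against the supplies, demands, and unit costs in the worksheet; it does not search for the minimum-cost plan. The function name is illustrative, and the second plan is a hypothetical feasible plan that is not claimed to be optimal.

```python
WAREHOUSES = ["San Fran", "Denver", "Chicago", "Dallas", "New York"]
DEMAND = [180, 80, 200, 160, 220]                                   # row 14
SUPPLY = {"S. Carolina": 310, "Tennessee": 260, "Arizona": 280}     # rows 16-18
COST = {                                                            # rows 16-18
    "S. Carolina": [10, 8, 6, 5, 4],
    "Tennessee":   [6, 5, 4, 3, 6],
    "Arizona":     [3, 4, 5, 5, 9],
}


def evaluate(plan):
    """Return (total shipping cost, True if every constraint is satisfied)."""
    total_cost = sum(c * x for p in plan for c, x in zip(COST[p], plan[p]))
    # B8:B10 <= B16:B18: shipments from each plant within its supply.
    supply_ok = all(sum(plan[p]) <= SUPPLY[p] for p in plan)
    # C12:G12 >= C14:G14: each warehouse receives at least its demand.
    received = [sum(plan[p][j] for p in plan) for j in range(len(WAREHOUSES))]
    demand_ok = all(r >= d for r, d in zip(received, DEMAND))
    # C8:G10 >= 0: no negative shipments.
    nonnegative = all(x >= 0 for p in plan for x in plan[p])
    return total_cost, supply_ok and demand_ok and nonnegative


# Starting values shown in the worksheet: one unit on every route.
start = {plant: [1] * 5 for plant in SUPPLY}
print(evaluate(start))

# Hypothetical feasible plan, for illustration only (not claimed optimal).
trial = {
    "S. Carolina": [0, 0, 90, 0, 220],
    "Tennessee":   [0, 0, 100, 160, 0],
    "Arizona":     [180, 80, 10, 0, 0],
}
print(evaluate(trial))
```

The starting plan reproduces the $83 in cell B20 but falls far short of demand, which is why Solver must change cells C8:G10.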
Problem Specifications

Target cell      H8                  Goal is to maximize interest earned.

Changing cells   B14:G14             Dollars invested in each type of CD.
                 B15, E15, B16

Constraints      B14:G14>=0          Investment in each type of CD must be greater than
                 B15:B16>=0          or equal to 0.
                 E15>=0

                 B18:H18>=100000     Ending cash must be greater than or equal to
                                     $100,000.

The optimal solution determined by Solver earns a total interest income of $16,531 by investing as much as
possible in six-month and three-month CDs, and then turns to one-month CDs. This solution satisfies all of the
constraints.

Suppose, however, that you want to guarantee that you have enough cash in month 5 for an equipment
payment. Add a constraint that the average maturity of the investments held in month 1 should not be more
than four months.

The formula in cell B20 computes a total of the amounts invested in month 1 (B14, B15, and B16), weighted
by the maturities (1, 3, and 6 months), and then it subtracts from this amount the total investment, weighted by
4. If this quantity is zero or less, the average maturity will not exceed four months. To add this constraint,
restore the original values and then click Solver on the Tools menu. Click Add. Type b20 in the Cell
Reference box, type 0 in the Constraint box, and then click OK. To solve the problem, click Solve.
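
The quantity in cell B20 can be written as a one-line calculation, which shows why a value of zero or less keeps the average maturity at four months or under. In the Python sketch below the function name and the dollar amounts are hypothetical; only the weights 1, 3, 6 and the limit of 4 come from the text.

```python
def maturity_slack(one_month, three_month, six_month, max_average=4):
    """Maturity-weighted investment minus max_average times total investment.

    A result of zero or less means the average maturity is at most max_average.
    """
    invested = one_month + three_month + six_month
    weighted = 1 * one_month + 3 * three_month + 6 * six_month
    return weighted - max_average * invested


# Hypothetical month-1 amounts in one-, three-, and six-month CDs.
print(maturity_slack(100, 0, 0))     # all short-term: well under the limit
print(maturity_slack(0, 0, 100))     # all six-month: violates the limit
print(maturity_slack(0, 200, 100))   # average maturity of exactly four months
```

Dividing the weighted sum by the total invested would give the average maturity directly, but the subtracted form keeps the constraint linear in the changing cells.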

To satisfy the four-month maturity constraint, Solver shifts funds from six-month CDs to three-month CDs. The
shifted funds now mature in month 4 and, according to the present plan, are reinvested in new three-month
CDs. If you need the funds, however, you can keep the cash instead of reinvesting. The $56,896 turning
over in month 4 is more than sufficient for the equipment payment in month 5. You've traded about $460 in
interest income to gain this flexibility.
35.2 Work Cap Alternate Formulations 391
Problem Specifications

Target cell               E18            Goal is to maximize portfolio return.

Changing cells            E10:E14        Weight of each stock.

Constraints               E10:E14>=0     Weights must be greater than or equal to 0.

                          E16=1          Weights must equal 1.

                          G18<=0.071     Variance must be less than or equal to 0.071.

Beta for each stock       B10:B13

Variance for each stock   C10:C13

Cells D21:D29 contain the problem specifications to minimize risk for a required rate of return of 16.4
percent. To load these problem specifications into Solver, click Solver on the Tools menu, click
Options, click Load Model, select cells D21:D29 on the worksheet, and then click OK until the
Solver Parameters dialog box is displayed. Click Solve. As you can see, Solver finds portfolio
allocations in both cases that surpass the rule of 20 percent across the board.

You can earn a higher rate of return (17.1 percent) for the same risk, or you can reduce your risk without
giving up any return. These two allocations both represent efficient portfolios.

Cells A21:A29 contain the original problem model. To reload this problem, click Solver on the Tools
menu, click Options, click Load Model, select cells A21:A29 on the worksheet, and then click OK.

Solver displays a message asking if you want to reset the current Solver option settings with the settings
for the model you are loading. Click OK to proceed.
35.4 MoneyCo Problem 395
The step-by-step instructions and screen shots in this book are based on Excel 2002
(Office XP). This appendix describes some differences between Excel 2002 on Windows
and Excel on the Macintosh.
If you are using Excel on an Apple Macintosh computer, first learn the Macintosh
graphical user interface, the basic features of the operating system, and the online help.
For example, to get answers to your questions about using Mac OS X, choose Mac Help
from the Help menu, type your question, and press the Return key.
Canavos, George C., and Don M. Miller. An Introduction to Modern Business Statistics.
Belmont, Calif.: Duxbury, 1993.
Clemen, Robert T. Making Hard Decisions: An Introduction to Decision Analysis. 2nd
ed. Belmont, Calif.: Duxbury, 1996.
Cryer, Jonathan D., and Robert B. Miller. Statistics for Business: Data Analysis and
Modeling. 2nd ed. Belmont, Calif.: Duxbury, 1994.
Keller, Gerald, Brian Warrack, and Henry Bartel. Statistics for Management and
Economics. 3rd ed. Belmont, Calif.: Duxbury, 1994.
Mendenhall, William, James E. Reinmuth, and Robert J. Beaver. Statistics for
Management and Economics. 7th ed. Belmont, Calif.: Duxbury, 1993.
Menzefricke, Ulrich. Statistics for Managers. Belmont, Calif.: Duxbury, 1995.
Survey of Current Business. Washington, D.C.: U.S. Government Printing Office, 1983-
1987.