Graduation Tests - Course Notes

CS2B‐10: Graduation tests – Course Notes

Syllabus objectives
4.5 Graduation and graduation tests
4.5.1 Describe and apply statistical tests of the comparison of crude estimates
with a standard mortality table testing for:
• the overall fit
• the presence of consistent bias
• the presence of individual ages where the fit is poor
• the consistency of the ‘shape’ of the crude estimates and the
  standard table.
4.5.3 Describe a test for smoothness of a set of graduated estimates.
4.5.7 Carry out a comparison of a set of crude estimates and a standard table, or
of a set of crude estimates and a set of graduated estimates.

The Actuarial Education Company © IFE: 2019 Examinations
0 Introduction
0.1 Contents
There is very little material in the Core Reading for this chapter that relates specifically to R.
However, in theory, any syllabus objective in Subject CS2 can be tested in the CS2B exam.
For example, you could be asked to carry out the calculations required for some of the graduation
tests in this chapter.
We have provided an exam‐style worked example below that covers some types of exercises
relating to this topic that could be examined in the practical exam.
0.2 Summary sheets
For the topics in this chapter you will need to be familiar with the basic R commands and
functions. We have summarised these in the R Refresher for CS2B document, which is available
for you to download from the main menu.
0.3 Data requirements
If you wish to follow our workings for the worked example, you will need to download the
following file into your working directory:
• Graduation2.csv

Worked example
Ex 1 (Exam style) We have been provided with a data file Graduation2.csv, which includes the following columns:
• AGE: age x
• ETR: central exposed to risk for age x last birthday
• DEATHS: number of deaths recorded at age x last birthday
• CRUDE: crude mortality rate for age x last birthday
• GRADUATED: graduated mortality rate for age x last birthday
• EXPECTED: expected number of deaths at age x last birthday based on the graduated rates
• ZX: individual standardised deviation for age x
The entries for the last four columns have not yet been calculated and currently just contain
zeros.
(i) (a) Read in the data file as a data table.
(b) Fill in the entries for the CRUDE column in your table. [6]
It has been proposed that the crude rates can be graduated using the Gompertz mortality law
$\mu_x = Bc^x$.
(ii) (a) Explain how a suitable Gompertz law could be fitted using the R function lm.
(b) Calculate suitable values for the parameters B and c .
(c) Fill in the entries for the GRADUATED column in your table, rounding your
answers to 6 decimal places.
(d) Plot a graph showing the crude rates and the graduated rates together. [18]
(iii) (a) Explain what the following R function does to an input vector v .
diff1<-function(v)v[-1]-v[-length(v)]
(b) Display the third differences of both the crude rates and the graduated rates in a
form that makes their values easy to compare.
(c) Comment on your results in (b). [12]
(iv) (a) Fill in the correct values in the remaining columns EXPECTED and ZX in your table.
(b) Plot a graph of the values of ZX against age.
(c) Comment on your graph in (iv)(b). [12]

[Total 48]
Solution
Ex 1 (i)(a) Read in the data
We have used the c:\Temp folder for our files. So we need to set the working directory (folder):
setwd("c:\\Temp")
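Note that you need to put the folder name in double quotes and (if you’re working in Windows)
you need to write each backslash in the path as two backslashes, since a single backslash is R’s
escape character inside a string. Forward slashes, as in "c:/Temp", also work.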
We can now read in the data file and check it has the correct contents:
data<-read.table("Graduation2.csv",header=T)
head(data)
AGE.ETR.DEATHS.CRUDE.GRADUATED.EXPECTED.ZX
1 25,3140,1,0,0,0,0
2 26,3217,1,0,0,0,0
3 27,3279,1,0,0,0,0
4 28,3349,1,0,0,0,0
5 29,3395,3,0,0,0,0
6 30,3403,2,0,0,0,0
This doesn’t look right. The table we have created appears to contain a single column of values of
the form 25,3140,1,0,0,0,0 with the very long name AGE.ETR.DEATHS…ZX, rather than treating it
as 7 separate columns of figures. So we need to include the sep option to specify that the data
values in the .csv file provided are separated by commas rather than spaces.
data<-read.table("Graduation2.csv",sep=",",header=T)
head(data)
AGE ETR DEATHS CRUDE GRADUATED EXPECTED ZX
1 25 3140 1 0 0 0 0
2 26 3217 1 0 0 0 0
3 27 3279 1 0 0 0 0
4 28 3349 1 0 0 0 0
5 29 3395 3 0 0 0 0
6 30 3403 2 0 0 0 0
This now has the format we would expect.
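As an aside, R also provides read.csv(), which is read.table() with sep="," and header=TRUE already set as defaults. The sketch below reproduces the problem and both fixes on a small temporary file containing just the six rows shown above (the object names tmp, bad, good and good2 are our own, chosen for illustration only).

```r
# Hypothetical sample file: the first six data rows shown by head(data)
# above, written to a temporary file so the example runs on its own
tmp <- tempfile(fileext = ".csv")
writeLines(c("AGE,ETR,DEATHS,CRUDE,GRADUATED,EXPECTED,ZX",
             "25,3140,1,0,0,0,0",
             "26,3217,1,0,0,0,0",
             "27,3279,1,0,0,0,0",
             "28,3349,1,0,0,0,0",
             "29,3395,3,0,0,0,0",
             "30,3403,2,0,0,0,0"), tmp)

# Default sep is white space, so each whole line is read as a single field
bad <- read.table(tmp, header = TRUE)
print(names(bad))   # "AGE.ETR.DEATHS.CRUDE.GRADUATED.EXPECTED.ZX"

# sep = "," splits on commas; read.csv() uses sep = "," and header = TRUE by default
good  <- read.table(tmp, sep = ",", header = TRUE)
good2 <- read.csv(tmp)
print(dim(good))              # 6 7
print(all.equal(good, good2)) # TRUE
```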
(i)(b) Entries for the CRUDE column
The crude rates for each age are calculated by dividing the number of deaths by the exposed to
risk.
data$CRUDE<-data$DEATHS/data$ETR
We can check that this has replaced the CRUDE column in our data table with the correct values:
head(data)
AGE ETR DEATHS CRUDE GRADUATED EXPECTED ZX
1 25 3140 1 0.0003184713 0 0 0
2 26 3217 1 0.0003108486 0 0 0
3 27 3279 1 0.0003049710 0 0 0
4 28 3349 1 0.0002985966 0 0 0
5 29 3395 3 0.0008836524 0 0 0
6 30 3403 2 0.0005877167 0 0 0
(ii)(a) How to fit a Gompertz law
The equation for the Gompertz mortality law is:
$\mu_x = Bc^x$
If we take logs, this is equivalent to:
$\log \mu_x = \log B + x \log c$
This has the form of a linear relationship between the variables $x$ and $\log \mu_x$, with intercept
parameter $\log B$ and slope $\log c$.
So we can use the lm (= linear model) function to fit this model in R.
(ii)(b) Values of the parameters B and c
The command we need to use to fit this linear model is:
gompertz<-lm(log(data$CRUDE)~data$AGE)
The formula log(data$CRUDE)~data$AGE indicates that we want to regress the values of
log(data$CRUDE) (the response, on the left of ~) on the independent variable data$AGE (on the right of ~).
We can see what parameter values R has calculated:
gompertz
Call:
lm(formula = log(data$CRUDE) ~ data$AGE)
Coefficients:
(Intercept) data$AGE
-10.8415 0.1029
This tells us that the parameter values for the fitted model are:
$\log B = -10.8415$ and $\log c = 0.1029$
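Exponentiating the fitted coefficients gives the Gompertz parameters directly. As a quick arithmetic check (using the rounded coefficients, so the final digits are approximate), including the graduated rate at age 25:

```latex
\begin{aligned}
B &= e^{\log B} = e^{-10.8415} \approx 1.957 \times 10^{-5} \\
c &= e^{\log c} = e^{0.1029} \approx 1.108 \\
\mu_{25} &= Bc^{25} = e^{-10.8415 + 25 \times 0.1029} = e^{-8.2690} \approx 0.000256
\end{aligned}
```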
We can pick these values out from the fitted model using the coef function:
coef(gompertz)
(Intercept) data$AGE
-10.8415488 0.1028654
The output from this function includes the headings (Intercept) and data$AGE. To assign the
correct values to the variables B and c , we need to remove these using the as.numeric
function, which extracts just the numerical information. We can then apply the exponential
function:
B=exp(as.numeric(coef(gompertz)))[1]
C=exp(as.numeric(coef(gompertz)))[2]
c(B,C)
[1] 1.956929e-05 1.108342e+00
We’ve used capital C for the variable name here to avoid confusion with the built‐in
concatenation function c , which we’ve used in the line afterwards!
Exam tip
Using the same name for two different things is a common source of errors in R. So, when you’re
deciding what name to give a new variable, make sure that that name doesn’t already exist. If in
doubt, you can use the ? function in R to find out.
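One way to check whether a name is already in use is the base function exists(), which returns TRUE if an object with that name can be found from the current environment. In the sketch below gomp_B is a hypothetical name we have made up. Note that capital C is also taken in a default session, by the stats function C(), which sets contrasts for a factor. Assigning a number to C does no harm in this example, because when R evaluates a call such as C(...) it only considers objects that are functions, but it shows how easily names can collide.

```r
# exists() reports whether an object with the given name is visible
# from the current environment (including attached packages)
exists("c")       # TRUE: the base concatenation function c()
exists("C")       # also TRUE in a default session: stats::C(), for factor contrasts
exists("gomp_B")  # hypothetical name: FALSE unless you have created it
```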
(ii)(c) Entries for the GRADUATED column
We can now fill in the column of graduated rates in our data table:
data$GRADUATED<-round(B*C^data$AGE,6)
head(data)
AGE ETR DEATHS CRUDE GRADUATED EXPECTED ZX
1 25 3140 1 0.0003184713 0.000256 0 0
2 26 3217 1 0.0003108486 0.000284 0 0
3 27 3279 1 0.0003049710 0.000315 0 0
4 28 3349 1 0.0002985966 0.000349 0 0
5 29 3395 3 0.0008836524 0.000386 0 0
6 30 3403 2 0.0005877167 0.000428 0 0
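The graduated rates can also be obtained from the fitted model itself: fitted() returns the fitted values of the log crude rates, so exponentiating them gives $Bc^x$ at each age. The sketch below checks this using only the six rows shown above, so its coefficients differ from those of the full fit over ages 25 to 75; the variable names are our own.

```r
# Hypothetical subset: only the six rows shown by head(data) above
age    <- 25:30
etr    <- c(3140, 3217, 3279, 3349, 3395, 3403)
deaths <- c(1, 1, 1, 1, 3, 2)
crude  <- deaths / etr

fit <- lm(log(crude) ~ age)
B <- exp(unname(coef(fit)[1]))   # exp(intercept)
C <- exp(unname(coef(fit)[2]))   # exp(slope)

# fitted() gives the fitted values of log(crude); exponentiating them
# should reproduce B * C^age at every age
grad_formula <- B * C^age
grad_fitted  <- exp(unname(fitted(fit)))
all.equal(grad_formula, grad_fitted)   # TRUE
```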
(ii)(d) Graph of the crude and graduated rates
We can use the plot function to plot the crude rates and then use the lines function to overlay a
curve for the graduated rates.
plot(data$AGE,data$CRUDE,
xlab="Age",ylab="Mortality rate",main="Crude and graduated rates")
lines(data$AGE,data$GRADUATED)
[Figure: Crude and graduated rates (x-axis: Age; y-axis: Mortality rate)]

(iii)(a) Explain the diff1 function
diff1<-function(v)v[-1]-v[-length(v)]
The v[-1] removes the first element from the vector v, while the v[-length(v)] removes the
last element. Subtracting these will then give a vector of the first differences of the values in v
(as suggested by the function name).
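Base R's diff() function does the same job, with a differences argument for repeated differencing. The sketch below checks that applying diff1 three times agrees with diff(v, differences = 3) on a simple cubic, whose third differences are constant.

```r
diff1 <- function(v) v[-1] - v[-length(v)]

v <- (1:6)^3                          # 1 8 27 64 125 216: a cubic in the index
d3_manual <- diff1(diff1(diff1(v)))   # first differencing applied three times
d3_base   <- diff(v, differences = 3) # base R equivalent
d3_manual                             # 6 6 6
all.equal(d3_manual, d3_base)         # TRUE
```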
(iii)(b) Calculate the third differences
The third differences can be calculated by applying the diff1 function three times to the rates.
To make the values easy to compare, we will multiply them by $10^6$ and display them in a table.
diff_crude=round(diff1(diff1(diff1(data$CRUDE)))*10^6,0)
diff_grad=round(diff1(diff1(diff1(data$GRADUATED)))*10^6,0)
R will number the rows in the table automatically when we display it, but these row numbers will
not match the actual ages. So it is best if we include the actual ages as the first column in our
table. The ages in the original data go up to age 75, but we need to restrict this to age 72
here, since each of the three levels of differencing removes one age.
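To avoid hardcoding the upper age of 72, the ages can be trimmed with head(age, -3), which drops the last three elements. The sketch below builds the comparison as a data frame (which also gives the age column a name) using only the six rows shown earlier. Since the third difference at age $x$ uses only ages $x$ to $x+3$, its three values should agree with the first three rows of the table below.

```r
diff1 <- function(v) v[-1] - v[-length(v)]

# Hypothetical subset: the six rows (ages 25 to 30) shown by head(data) earlier
age   <- 25:30
crude <- c(1, 1, 1, 1, 3, 2) / c(3140, 3217, 3279, 3349, 3395, 3403)

d3 <- round(diff1(diff1(diff1(crude))) * 10^6, 0)

# head(age, -3) drops the last three ages, one for each level of differencing
tab <- data.frame(AGE = head(age, -3), diff_crude = d3)
tab
```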
cbind(data$AGE[data$AGE<=72],diff_crude,diff_grad)
diff_crude diff_grad
[1,] 25 -2 0
[2,] 26 592 0
[3,] 27 -1472 2
[4,] 28 2055 0
[5,] 29 -3228 -1
[6,] 30 3800 2
[7,] 31 -2326 0
[8,] 32 18 1
[9,] 33 1398 1
[10,] 34 -1378 0
[11,] 35 2334 1
[12,] 36 -3829 2
[13,] 37 3829 0
[14,] 38 -3315 2
[15,] 39 986 1
[16,] 40 1046 1
[17,] 41 -1190 3
[18,] 42 3472 1
[19,] 43 -3254 2
[20,] 44 -1349 3
[21,] 45 415 1
[22,] 46 3675 5
[23,] 47 -285 2
[24,] 48 -4722 3
[25,] 49 4633 4
[26,] 50 -3373 5
[27,] 51 2679 5
[28,] 52 -5482 4
[29,] 53 8357 6
[30,] 54 -3873 7
[31,] 55 -1807 7
[32,] 56 4623 8
[33,] 57 -3577 9
[34,] 58 -2194 9
[35,] 59 6446 10
[36,] 60 -8835 14
[37,] 61 11501 12
[38,] 62 -13794 15
[39,] 63 16343 16
[40,] 64 -14429 18
[41,] 65 8462 20
[42,] 66 247 22
[43,] 67 -7678 25
[44,] 68 -1206 26
[45,] 69 17483 32
[46,] 70 -17614 31
[47,] 71 10855 39
[48,] 72 -12400 40
(iii)(c) Comment on the third differences
We can see from the table that the third differences of the crude rates are much larger in
magnitude (max = 17,614) and progress erratically. This is to be expected, since they are
calculated directly from the deaths, which contain a significant random element.
The third differences of the graduated rates are all very small (max = 40) and progress regularly.
(The slight irregularity we can see is a result of the rounding we applied to the graduated rates in
part (ii)(c).) This is to be expected, since they have been smoothed using a simple parametric
formula with just two parameters.
(iv)(a) Entries for the EXPECTED and ZX columns
We can easily fill in the expected numbers of deaths and the individual standardised deviations
(calculated as $z_x = (\text{DEATHS} - \text{EXPECTED}) / \sqrt{\text{EXPECTED}}$), rounding the values to an appropriate
number of decimal places:
data$EXPECTED<-round(data$GRADUATED*data$ETR,2)
data$ZX<-round((data$DEATHS-data$EXPECTED)/sqrt(data$EXPECTED),3)
head(data)
AGE ETR DEATHS CRUDE GRADUATED EXPECTED ZX
1 25 3140 1 0.0003184713 0.000256 0.80 0.224
2 26 3217 1 0.0003108486 0.000284 0.91 0.094
3 27 3279 1 0.0003049710 0.000315 1.03 -0.030
4 28 3349 1 0.0002985966 0.000349 1.17 -0.157
5 29 3395 3 0.0008836524 0.000386 1.31 1.477
6 30 3403 2 0.0005877167 0.000428 1.46 0.447
(iv)(b) Graph of the ZX’s
It will be easiest to spot any patterns in the $z_x$'s if we plot them using the type="b" option to
display both points for each age and lines connecting them.
plot(data$AGE,data$ZX,type="b",xlab="Age (x)",
ylab="zx",main="Individual standardised deviations")
[Figure: Individual standardised deviations (x-axis: Age (x); y-axis: $z_x$)]

(iv)(c) Comment on the graph
Without doing any detailed calculations we can see that:
• The individual standardised deviations (ISDs) appear to be randomly distributed with no obvious pattern.
• They all lie in the range (–3,3), so there are no extreme outliers.
• However, seven values lie outside the range (–2,2), which might warrant investigation.
• Roughly equal numbers lie above and below the zero line, which suggests that there is no
  systematic bias.
• There are no obvious runs of positive or negative values, which suggests that we haven’t
  overgraduated the rates.
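The visual checks above can be backed up with a few simple counts. The sketch below is illustrative only: it uses just the six $z_x$ values shown in part (iv)(a), whereas in practice you would apply it to the full column data$ZX. It counts the large deviations, compares the number of positive deviations with a Binomial$(m, \tfrac{1}{2})$ distribution using binom.test(), and uses rle() to measure runs of same-sign deviations.

```r
# Illustrative input: the six z_x values for ages 25 to 30 shown above
# (with the full data you would use zx <- data$ZX)
zx <- c(0.224, 0.094, -0.030, -0.157, 1.477, 0.447)
m  <- length(zx)

n_gt3 <- sum(abs(zx) > 3)   # extreme outliers
n_gt2 <- sum(abs(zx) > 2)   # ages that might warrant investigation
n_pos <- sum(zx > 0)        # compare with m / 2 when looking for bias

# With no bias, n_pos should behave like a Binomial(m, 1/2) count
signs_p <- binom.test(n_pos, m, p = 0.5)$p.value

# Runs of deviations with the same sign (a zero would form its own run);
# long runs of one sign can indicate overgraduation
runs <- rle(sign(zx))
longest_run <- max(runs$lengths)
n_pos_runs  <- sum(runs$values > 0)

c(n_gt3 = n_gt3, n_gt2 = n_gt2, n_pos = n_pos, signs_p = signs_p,
  longest_run = longest_run, n_pos_runs = n_pos_runs)
```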
1 Core reading
The Core Reading for this chapter contains a few specific references to R.
Chi‐square test
R can carry out a $\chi^2$ goodness-of-fit test using chisq.test().
However, the $\chi^2$ statistic can also be calculated directly using:
chi2 = sum((observed-expected)^2/expected)
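As a sketch, here is the direct calculation applied to the observed and expected deaths for ages 25 to 30 shown in part (iv)(a). Note that chisq.test(x, p = ...) treats p as a vector of probabilities (rescaled to sum to 1 if rescale.p = TRUE), so it does not reproduce this statistic when the total expected deaths differ from the total observed deaths. The degrees of freedom used here (number of ages less the two fitted parameters $B$ and $c$) are an assumption for illustration, and with expected counts this small the $\chi^2$ approximation would be unreliable; in practice ages would be grouped first.

```r
# Illustrative input: ages 25 to 30 only
# (with the full data use data$DEATHS and data$EXPECTED)
observed <- c(1, 1, 1, 1, 3, 2)
expected <- c(0.80, 0.91, 1.03, 1.17, 1.31, 1.46)

chi2 <- sum((observed - expected)^2 / expected)

# The same statistic is the sum of the squared (unrounded) ISDs
zx <- (observed - expected) / sqrt(expected)
print(all.equal(chi2, sum(zx^2)))   # TRUE

# Hypothetical degrees of freedom: number of ages less the 2 fitted parameters
dof <- length(observed) - 2
p_value <- pchisq(chi2, df = dof, lower.tail = FALSE)
c(chi2 = chi2, dof = dof, p_value = p_value)
```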
Serial correlation test
The serial correlation test can be carried out in R by treating the series of $z_x$'s as if it
were a time series and computing the first-order autocorrelation.
Alternatively, to compute $r_j$ (here with lag $j = 1$) in R from a set of 50 deviations $z_x$, use:
z1x <- zx[1:49]
z2x <- zx[2:50]
cor(z1x,z2x)
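The code above assumes exactly 50 deviations. A version that works for any number $m$ of deviations simply drops the last and first values respectively. For comparison, acf() from the stats package estimates the lag-1 autocorrelation using the mean of all $m$ values rather than separate means for the two overlapping series, so the two figures will not in general be equal. As before, the six $z_x$ values are used purely for illustration.

```r
# Illustrative input: the six z_x values for ages 25 to 30
# (in practice zx would hold all the deviations, e.g. zx <- data$ZX)
zx <- c(0.224, 0.094, -0.030, -0.157, 1.477, 0.447)
m  <- length(zx)

# Core Reading approach, generalised from 50 to m deviations:
# correlate each deviation with the next one
r1_cor <- cor(zx[-m], zx[-1])

# acf() demeans using the overall mean and returns lags 0 and 1; take lag 1
r1_acf <- acf(zx, lag.max = 1, plot = FALSE)$acf[2]

c(r1_cor = r1_cor, r1_acf = r1_acf)
```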
