Graduation Tests - Course Notes
Syllabus objectives
4.5 Graduation and graduation tests
4.5.1 Describe and apply statistical tests of the comparison of crude estimates with a standard mortality table, testing for:
- the overall fit
- the presence of consistent bias
- the presence of individual ages where the fit is poor
- the consistency of the ‘shape’ of the crude estimates and the standard table.
4.5.3 Describe a test for smoothness of a set of graduated estimates.
4.5.7 Carry out a comparison of a set of crude estimates and a standard table, or
of a set of crude estimates and a set of graduated estimates.
The Actuarial Education Company © IFE: 2019 Examinations
Page 2 CS2B‐10: Graduation tests – Course Notes
0 Introduction
0.1 Contents
There is very little material in the Core Reading for this chapter that relates specifically to R.
However, in theory, any syllabus objective in Subject CS2 can be tested in the CS2B exam.
For example, you could be asked to carry out the calculations required for some of the graduation
tests in this chapter.
We have provided an exam‐style worked example below that covers some types of exercises
relating to this topic that could be examined in the practical exam.
0.2 Summary sheets
For the topics in this chapter you will need to be familiar with the basic R commands and
functions. We have summarised these in the R Refresher for CS2 B document, which is available
for you to download from the main menu.
0.3 Data requirements
If you wish to follow our workings for the worked example, you will need to download the
following file into your working directory:
Graduation2.csv
Worked example
Ex 1 (Exam style)
We have been provided with a data file Graduation2.csv, which includes the following columns:
AGE         Age x
ETR         Central exposed to risk for age x last birthday
DEATHS      Number of deaths recorded at age x last birthday
CRUDE       Crude mortality rate for age x last birthday
GRADUATED   Graduated mortality rate for age x last birthday
EXPECTED    Expected number of deaths at age x last birthday based on the graduated rates
ZX          Individual standardised deviation for age x
The entries for the last four columns have not yet been calculated and currently just contain
zeros.
It has been proposed that the crude rates can be graduated using the Gompertz mortality law $\mu_x = Bc^x$.
(ii)(c) Fill in the entries for the GRADUATED column in your table, rounding your answers to 6 decimal places.
(iii)(a) Explain the diff1 function:
diff1<-function(v)v[-1]-v[-length(v)]
(iii)(b) Display the third differences of both the crude rates and the graduated rates in a form that makes their values easy to compare.
(iv)(b) Plot a graph of the values of ZX against age.
Solution
Ex 1 (i)(a) Read in the data
We have used the c:\Temp folder for our files. So we need to set the working directory (folder):
setwd("c:\\Temp")
Note that you need to put the folder name in quotes and, if you’re working in Windows, use either two backslashes or a single forward slash (as in "c:/Temp") to separate folder names.
We can now read in the data file and check it has the correct contents:
data<-read.table("Graduation2.csv",header=T)
head(data)
AGE.ETR.DEATHS.CRUDE.GRADUATED.EXPECTED.ZX
1 25,3140,1,0,0,0,0
2 26,3217,1,0,0,0,0
3 27,3279,1,0,0,0,0
4 28,3349,1,0,0,0,0
5 29,3395,3,0,0,0,0
6 30,3403,2,0,0,0,0
This doesn’t look right. The table we have created appears to contain a single column of values of
the form 25,3140,1,0,0,0,0 with the very long name AGE.ETR.DEATHS…ZX, rather than treating it
as 7 separate columns of figures. So we need to include the sep option to specify that the data
values in the .csv file provided are separated by commas rather than spaces.
data<-read.table("Graduation2.csv",sep=",",header=T)
head(data)
This now has the format we would expect.
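As an alternative to spelling out the separator, base R's read.csv function assumes sep="," and header=TRUE by default. The sketch below is a supplement to the solution rather than part of it: it writes a small temporary file containing the first two rows shown above (cut down to four columns for brevity) and reads it back both ways.

```r
# Illustrative only: a tiny file in the same layout as Graduation2.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("AGE,ETR,DEATHS,CRUDE",
             "25,3140,1,0",
             "26,3217,1,0"), tmp)

# read.table needs the separator spelled out ...
d1 <- read.table(tmp, sep = ",", header = TRUE)
# ... whereas read.csv assumes sep="," and header=TRUE
d2 <- read.csv(tmp)

ncol(d2)            # number of separate columns read in
all.equal(d1, d2)   # compare the two data frames
str(d2)             # column names and types
```

Checking ncol or str straight after reading a file is a quick way to catch the single-column problem described above.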
(i)(b) Entries for the CRUDE column
The crude rates for each age are calculated by dividing the number of deaths by the exposed to
risk.
data$CRUDE<-data$DEATHS/data$ETR
We can check that this has replaced the CRUDE column in our data table with the correct values:
head(data)
(ii)(a) How to fit a Gompertz law
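The equation for the Gompertz mortality law is:
$\mu_x = Bc^x$
If we take logs, this is equivalent to:
$\log \mu_x = \log B + x \log c$
This is linear in x, with intercept $\log B$ and slope $\log c$. So we can use the lm (= linear model) function to fit this model in R.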
The command we need to use to fit this linear model is:
gompertz<-lm(log(data$CRUDE)~data$AGE)
The formula inside the lm call indicates that we want to carry out a regression of the values of log(data$CRUDE) on the independent variable data$AGE.
We can see what parameter values R has calculated:
gompertz
Call:
lm(formula = log(data$CRUDE) ~ data$AGE)
Coefficients:
(Intercept) data$AGE
-10.8415 0.1029
This tells us that the parameter values for the fitted model are:
$\log B = -10.8415$ and $\log c = 0.1029$
We can pick these values out from the fitted model using the coef function:
coef(gompertz)
(Intercept) data$AGE
-10.8415488 0.1028654
The output from this function includes the headings (Intercept) and data$AGE. To assign the correct values to the variables B and c, we need to remove these using the as.numeric function, which extracts just the numerical information. We can then apply the exponential function:
B=exp(as.numeric(coef(gompertz)))[1]
C=exp(as.numeric(coef(gompertz)))[2]
c(B,C)
We’ve used capital C for the variable name here to avoid confusion with the built‐in concatenation function c, which we’ve used in the line afterwards!
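As a quick sanity check on this approach, the sketch below applies the same steps to made-up rates that follow a Gompertz law exactly, so the fitted values should reproduce the parameters we started with. The values B_true and C_true and the data frame toy are illustrative assumptions, not taken from Graduation2.csv. Passing the data frame through the data argument of lm is an optional variation that gives coefficient names without the data$ prefix.

```r
# Illustrative parameters (assumed, not from the exam data)
B_true <- 2e-05
C_true <- 1.1

# Rates that follow mu_x = B * c^x exactly
toy <- data.frame(AGE = 25:75)
toy$CRUDE <- B_true * C_true^toy$AGE

# Same regression as above, using the data argument of lm
fit <- lm(log(CRUDE) ~ AGE, data = toy)

# Intercept estimates log(B), slope estimates log(c)
B_hat <- exp(unname(coef(fit)[1]))
C_hat <- exp(unname(coef(fit)[2]))
c(B_hat, C_hat)

# A crude rate of zero would give log(0) = -Inf, which lm rejects,
# so it is worth checking for ages with no deaths before fitting
any(toy$CRUDE == 0)
```

With real data the crude rates contain random noise, so the fitted values will only approximate any underlying parameters.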
Exam tip
Using the same name for two different things is a common source of errors in R. So, when you’re
deciding what name to give a new variable, make sure that that name doesn’t already exist. If in
doubt, you can use the ? function in R to find out.
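A name can also be checked from code using the base R function exists, which reports whether an object with that name can currently be found. This is offered as a supplement to the ? lookup; the second name tested below is just a hypothetical example of a name that is unlikely to be in use.

```r
# c is the built-in concatenation function, so the name is already taken
c_taken <- exists("c")
c_taken

# is.function confirms that the existing object is a function
is.function(c)

# A hypothetical name that we would not expect to find in a fresh session
new_name_taken <- exists("gompertz_c_param")
new_name_taken
```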
(ii)(c) Entries for the GRADUATED column
We can now fill in the column of graduated rates in our data table:
data$GRADUATED<-round(B*C^data$AGE,6)
head(data)
(ii)(d) Graph of the crude and graduated rates
We can use the plot function to plot the crude rates and then use the lines function to overlay a
curve for the graduated rates.
plot(data$AGE,data$CRUDE,
xlab="Age",ylab="Mortality rate",main="Crude and graduated rates")
lines(data$AGE,data$GRADUATED)
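[Figure: "Crude and graduated rates". Crude rates plotted as points with the graduated rates overlaid as a curve; x-axis "Age", y-axis "Mortality rate".]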
(iii)(a) Explain the diff1 function
diff1<-function(v)v[-1]-v[-length(v)]
The v[-1] removes the first element from the vector v, while the v[-length(v)] removes the
last element. Subtracting these will then give a vector of the first differences of the values in v
(as suggested by the function name).
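To see this in action, the sketch below applies diff1 to a short made-up vector and compares it with the base R function diff, which calculates the same differences. The vector v is chosen purely for illustration.

```r
diff1 <- function(v) v[-1] - v[-length(v)]

v <- c(1, 4, 9, 16, 25)     # squares, chosen for illustration

diff1(v)                    # first differences: 3 5 7 9
diff(v)                     # base R equivalent

# Applying diff1 three times matches diff with differences = 3
third <- diff1(diff1(diff1(v)))
all.equal(third, diff(v, differences = 3))

length(v) - length(third)   # each differencing step drops one element
```

Since the squares are a quadratic sequence, their third differences are all zero, which makes the result easy to check by hand.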
(iii)(b) Calculate the third differences
The third differences can be calculated by applying the diff1 function three times to the rates.
diff_crude=round(diff1(diff1(diff1(data$CRUDE)))*10^6,0)
diff_grad=round(diff1(diff1(diff1(data$GRADUATED)))*10^6,0)
R will number the rows in the table automatically when we display it, but these row numbers will
not match the actual ages. So it is best if we include the actual ages as the first column in our
table. The ages in the original data went up to age 75, but we will need to restrict this to age 72
here, since each level of differencing we applied will have removed one age.
cbind(data$AGE[data$AGE<=72],diff_crude,diff_grad)
diff_crude diff_grad
[1,] 25 -2 0
[2,] 26 592 0
[3,] 27 -1472 2
[4,] 28 2055 0
[5,] 29 -3228 -1
[6,] 30 3800 2
[7,] 31 -2326 0
[8,] 32 18 1
[9,] 33 1398 1
[10,] 34 -1378 0
[11,] 35 2334 1
[12,] 36 -3829 2
[13,] 37 3829 0
[14,] 38 -3315 2
[15,] 39 986 1
[16,] 40 1046 1
[17,] 41 -1190 3
[18,] 42 3472 1
[19,] 43 -3254 2
[20,] 44 -1349 3
[21,] 45 415 1
[22,] 46 3675 5
[23,] 47 -285 2
[24,] 48 -4722 3
[25,] 49 4633 4
[26,] 50 -3373 5
[27,] 51 2679 5
[28,] 52 -5482 4
[29,] 53 8357 6
[30,] 54 -3873 7
[31,] 55 -1807 7
[32,] 56 4623 8
[33,] 57 -3577 9
[34,] 58 -2194 9
[35,] 59 6446 10
[36,] 60 -8835 14
[37,] 61 11501 12
[38,] 62 -13794 15
[39,] 63 16343 16
[40,] 64 -14429 18
[41,] 65 8462 20
[42,] 66 247 22
[43,] 67 -7678 25
[44,] 68 -1206 26
[45,] 69 17483 32
[46,] 70 -17614 31
[47,] 71 10855 39
[48,] 72 -12400 40
(iii)(c) Comment on the third differences
We can see from the table (where the differences are shown multiplied by 10^6) that the third differences of the crude rates are large in magnitude (largest absolute value = 17,614) and progress erratically, changing sign frequently. This is to be expected, since they are calculated directly from the deaths, which contain a significant random element.
The third differences of the graduated rates are all very small by comparison (largest absolute value = 40) and progress regularly. (The slight irregularity we can see is a result of the rounding we applied to the graduated rates in part (ii)(c).) This is to be expected, since they have been smoothed using a simple parametric formula with just two parameters.
(iv)(a) Entries for the EXPECTED and ZX columns
We can easily fill in the expected numbers of deaths and the individual standardised deviations (calculated as $z_x = (\text{DEATHS} - \text{EXPECTED}) / \sqrt{\text{EXPECTED}}$), rounding the values to an appropriate number of decimal places:
data$EXPECTED<-round(data$GRADUATED*data$ETR,2)
data$ZX<-round((data$DEATHS-data$EXPECTED)/sqrt(data$EXPECTED),3)
head(data)
(iv)(b) Graph of the ZX’s
It will be easiest to spot any patterns in the $z_x$'s if we plot them using the type="b" option to display both points for each age and lines connecting them.
plot(data$AGE,data$ZX,type="b",xlab="Age (x)",
ylab="zx",main="Individual standardised deviations")
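[Figure: "Individual standardised deviations". Values of zx plotted against age as points joined by lines; x-axis "Age (x)", y-axis "zx".]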
(iv)(c) Comment on the graph
Without doing any detailed calculations we can see that:
- The individual standardised deviations (ISDs) appear to be randomly distributed with no obvious pattern.
- They all lie in the range (–3,3), so there are no extreme outliers. However, seven values lie outside the range (–2,2), which might warrant investigation.
- Roughly equal numbers lie above and below the zero line, which indicates that there is no systematic bias.
- There are no obvious runs of positive or negative values, which indicates that we haven’t overgraduated the rates.
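Visual checks like these can be backed up with simple counts. The sketch below uses a short made-up vector zx in place of data$ZX (the values are purely illustrative), and counts the deviations outside (–2,2) and (–3,3), the numbers of positive and negative values, and the number of runs of positive values.

```r
# Made-up standardised deviations, for illustration only
zx <- c(0.4, -1.2, 2.3, -0.7, 0.1, -2.5, 1.1, 0.9, -0.3, -1.8)

sum(abs(zx) > 2)     # number outside (-2,2)
sum(abs(zx) > 3)     # number outside (-3,3)

n_pos <- sum(zx > 0) # deviations above the zero line
n_neg <- sum(zx < 0) # deviations below the zero line
c(n_pos, n_neg)

# Group consecutive values with the same sign using run-length encoding
runs <- rle(zx > 0)
sum(runs$values)     # number of runs of positive deviations
```

These counts are the raw ingredients for the formal individual standardised deviations, signs and grouping of signs tests; the tests themselves then compare the counts with their distributions under the null hypothesis.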
1 Core reading
The Core Reading for this chapter contains a few specific references to R.
Chi‐square test
However, the $\chi^2$ statistic can also be calculated directly using:
chi2 = sum((observed-expected)^2/expected)
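A minimal sketch of turning this statistic into a p-value is shown below. The vectors observed and expected are made-up values, and the degrees of freedom used (number of age groups less two fitted parameters, as for a Gompertz graduation) is an assumption that should be adjusted to suit the graduation method actually used.

```r
# Made-up observed and expected deaths, for illustration only
observed <- c(12, 18, 25, 31, 44)
expected <- c(10.5, 19.2, 24.1, 33.0, 41.8)

chi2 <- sum((observed - expected)^2 / expected)

# Assumed degrees of freedom: age groups minus 2 fitted parameters
df <- length(observed) - 2

# Upper-tail probability, since large values of chi2 indicate a poor fit
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
c(chi2, p_value)
```

The statistic is equal to the sum of the squared individual standardised deviations, so with the worked example's data it could also be obtained as sum(data$ZX^2), subject to the rounding applied to ZX.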
Serial correlation test
The serial correlations test can be carried out in R by considering the series of $z_x$'s as if they were a time series and computing the first-order autocorrelation.
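A minimal sketch of this calculation follows, using a made-up vector zx in place of data$ZX. The acf function returns the autocorrelations starting at lag 0, so the lag-1 value is the second element. Comparing r1 multiplied by the square root of the number of ages with the upper tail of N(0,1) is a common approximate one-sided form of the test; treat that step as an assumption if your course material specifies a different form.

```r
# Made-up standardised deviations, for illustration only
zx <- c(0.4, -1.2, 2.3, -0.7, 0.1, -2.5, 1.1, 0.9, -0.3, -1.8)
m <- length(zx)

# First-order (lag-1) autocorrelation, without drawing the plot
r1 <- acf(zx, lag.max = 1, plot = FALSE)$acf[2]

# Approximate test: r1 * sqrt(m) compared with N(0,1), upper tail
test_stat <- r1 * sqrt(m)
p_value <- pnorm(test_stat, lower.tail = FALSE)
c(r1, test_stat, p_value)
```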