
Calculating Correlation Coefficients With Repeated Observations - Part 1

This document discusses how to calculate correlation coefficients when there are repeated observations per subject, noting that simply combining all observations can be misleading. It recommends using multiple regression to look at variation within subjects, treating subject as a categorical variable. The analysis of variance for the regression allows calculating the correct within-subject correlation coefficient of -0.51 between pH and Paco2 for the sample data, rather than the incorrect -0.07 obtained without accounting for repeated measures.

Statistics Notes

Calculating correlation coefficients with repeated observations:
Part 1 - correlation within subjects
J Martin Bland, Douglas G Altman

This is the twelfth in a series of occasional notes on medical statistics.

In an earlier Statistics Note [1] we commented on the analysis of paired data where there is more than one observation per subject, as shown in table I. We pointed out that it could be highly misleading to analyse such data by combining repeated observations from several subjects and then calculating the correlation coefficient as if the data were a simple sample. This note is a response to several letters about the appropriate analysis for such data.

TABLE I - Repeated measurements of intramural pH and Paco2 for eight subjects [2]

Subject   pH     Paco2      Subject   pH     Paco2
1         6.68   3.97       5         7.30   4.32
1         6.53   4.12       5         7.37   3.23
1         6.43   4.09       5         7.27   4.46
1         6.33   3.97       5         7.28   4.72
2         6.85   5.27       5         7.32   4.75
2         7.06   5.37       5         7.32   4.99
2         7.13   5.41       6         7.38   4.78
2         7.17   5.44       6         7.30   4.73
3         7.40   5.67       6         7.29   5.12
3         7.42   3.64       6         7.33   4.93
3         7.41   4.32       6         7.31   5.03
3         7.37   4.73       6         7.33   4.93
3         7.34   4.96       7         6.86   6.85
3         7.35   5.04       7         6.94   6.44
3         7.28   5.22       7         6.92   6.52
3         7.30   4.82       8         7.19   5.28
3         7.34   5.07       8         7.29   4.56
4         7.36   5.67       8         7.21   4.34
4         7.33   5.10       8         7.25   4.32
4         7.29   5.53       8         7.20   4.41
4         7.30   4.75       8         7.19   3.69
4         7.35   5.51       8         6.77   6.09
5         7.35   4.28       8         6.82   5.58
5         7.30   4.44

The choice of analysis for the data in table I depends on the question we want to answer. If we want to know whether subjects with high values of intramural pH also tend to have high values of Paco2 we are interested in whether the average pH for a subject is related to the subject's average Paco2. We can use the correlation between the subject means, which we shall describe in a subsequent note. If we want to know whether an increase in pH within the individual was associated with an increase in Paco2 we want to remove the differences between subjects and look only at changes within.

To look at variation within the subject we can use multiple regression. We make one of our variables, pH or Paco2, the outcome variable and the other variable and the subject the predictor variables. Subject is treated as a categorical factor using dummy variables [3,4] and so has seven degrees of freedom. We use the analysis of variance table [3,4] for the regression (table II), which shows how the variability in pH can be partitioned into components due to different sources. This method is also known as analysis of covariance and is equivalent to fitting parallel lines through each subject's data (see figure). The residual sum of squares in table II represents the variation about these lines.

TABLE II - Analysis of variance for the data in table I

Source of    Degrees of   Sum of    Mean     Variance
variation    freedom      squares   square   ratio (F)   Probability
Subjects      7           2.9661    0.4237   48.3        <0.0001
Paco2         1           0.1153    0.1153   13.1        0.0008
Residual     38           0.3337    0.0088
Total        46           3.3139    0.0720

[Figure: pH against Paco2 for eight subjects, with parallel lines fitted for each subject]

We remove the variation due to subjects (and any other nuisance variables which might be present) and express the variation in pH due to Paco2 as a proportion of what's left:

$$\frac{\text{Sum of squares for Paco2}}{\text{Sum of squares for Paco2} + \text{residual sum of squares}}$$

The magnitude of the correlation coefficient within subjects is the square root of this proportion. For table II this is:

$$\sqrt{\frac{0.1153}{0.1153 + 0.3337}} = 0.51$$

The sign of the correlation coefficient is given by the sign of the regression coefficient for Paco2. Here the regression slope is -0.108, so the correlation coefficient within subjects is -0.51. The P value is found either from the F test in the associated analysis of variance table, or from the t test for the regression slope. It doesn't matter which variable we regress on which; we get the same correlation coefficient and P value either way.

If we incorrectly calculate the correlation coefficient ignoring the fact that we have 47 observations on only 8 subjects, we get -0.07, P=0.7. Hence the correct analysis within subjects reveals a relation which the incorrect analysis misses.

Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE
J Martin Bland, reader in medical statistics

Medical Statistics Laboratory, Imperial Cancer Research Fund, PO Box 123, London WC2A 3PX
Douglas G Altman, head

Correspondence to: Dr Bland.

BMJ 1995;310:446 (volume 310, 18 February 1995)

1 Bland JM, Altman DG. Correlation, regression, and repeated data. BMJ 1994;308:896.
2 Boyd O, Mackay CJ, Lamb G, Bland JM, Grounds RM, Bennett ED. Comparison of clinical information gained from routine blood-gas analysis and from gastric tonometry for intramural pH. Lancet 1993;341:142-6.
3 Altman DG. Practical statistics for medical research. London: Chapman and Hall, 1991.
4 Armitage P, Berry G. Statistical methods in medical research. 3rd ed. Oxford: Blackwell, 1994.
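The calculation described in this note can be checked with a short script. What follows is a minimal sketch rather than the authors' own computation. It assumes the table I values as transcribed in the code, and instead of building dummy variables it subtracts each subject's mean from both pH and Paco2. For a single covariate this gives the same slope, Paco2 sum of squares, and residual sum of squares as the regression with subject as a categorical factor. The function names are illustrative and do not come from the note.

```python
import math

# Table I as (subject, pH, Paco2); transcription is an assumption of this sketch.
DATA = [
    (1, 6.68, 3.97), (1, 6.53, 4.12), (1, 6.43, 4.09), (1, 6.33, 3.97),
    (2, 6.85, 5.27), (2, 7.06, 5.37), (2, 7.13, 5.41), (2, 7.17, 5.44),
    (3, 7.40, 5.67), (3, 7.42, 3.64), (3, 7.41, 4.32), (3, 7.37, 4.73),
    (3, 7.34, 4.96), (3, 7.35, 5.04), (3, 7.28, 5.22), (3, 7.30, 4.82),
    (3, 7.34, 5.07),
    (4, 7.36, 5.67), (4, 7.33, 5.10), (4, 7.29, 5.53), (4, 7.30, 4.75),
    (4, 7.35, 5.51),
    (5, 7.35, 4.28), (5, 7.30, 4.44), (5, 7.30, 4.32), (5, 7.37, 3.23),
    (5, 7.27, 4.46), (5, 7.28, 4.72), (5, 7.32, 4.75), (5, 7.32, 4.99),
    (6, 7.38, 4.78), (6, 7.30, 4.73), (6, 7.29, 5.12), (6, 7.33, 4.93),
    (6, 7.31, 5.03), (6, 7.33, 4.93),
    (7, 6.86, 6.85), (7, 6.94, 6.44), (7, 6.92, 6.52),
    (8, 7.19, 5.28), (8, 7.29, 4.56), (8, 7.21, 4.34), (8, 7.25, 4.32),
    (8, 7.20, 4.41), (8, 7.19, 3.69), (8, 6.77, 6.09), (8, 6.82, 5.58),
]


def within_subject_correlation(rows):
    """Return (slope, ss_paco2, ss_residual, r) with subject means removed."""
    by_subject = {}
    for subject, ph, paco2 in rows:
        by_subject.setdefault(subject, []).append((ph, paco2))

    s_ph = s_co2 = s_cross = 0.0  # within-subject sums of squares and products
    for obs in by_subject.values():
        mean_ph = sum(ph for ph, _ in obs) / len(obs)
        mean_co2 = sum(co2 for _, co2 in obs) / len(obs)
        for ph, co2 in obs:
            s_ph += (ph - mean_ph) ** 2
            s_co2 += (co2 - mean_co2) ** 2
            s_cross += (ph - mean_ph) * (co2 - mean_co2)

    slope = s_cross / s_co2            # common slope of the parallel lines
    ss_paco2 = s_cross ** 2 / s_co2    # variation in pH explained by Paco2
    ss_residual = s_ph - ss_paco2      # variation about the parallel lines
    magnitude = math.sqrt(ss_paco2 / (ss_paco2 + ss_residual))
    r = math.copysign(magnitude, slope)  # sign taken from the slope
    return slope, ss_paco2, ss_residual, r


def pooled_correlation(rows):
    """Ordinary correlation that ignores the subject grouping (the misleading one)."""
    n = len(rows)
    mean_ph = sum(ph for _, ph, _ in rows) / n
    mean_co2 = sum(co2 for _, _, co2 in rows) / n
    s_ph = sum((ph - mean_ph) ** 2 for _, ph, _ in rows)
    s_co2 = sum((co2 - mean_co2) ** 2 for _, _, co2 in rows)
    s_cross = sum((ph - mean_ph) * (co2 - mean_co2) for _, ph, co2 in rows)
    return s_cross / math.sqrt(s_ph * s_co2)


slope, ss_paco2, ss_residual, r_within = within_subject_correlation(DATA)
n_subjects = len({subject for subject, _, _ in DATA})
residual_df = len(DATA) - n_subjects - 1  # 47 - 8 - 1 = 38
f_ratio = ss_paco2 / (ss_residual / residual_df)

print(f"slope={slope:.3f}  SS(Paco2)={ss_paco2:.4f}  SS(residual)={ss_residual:.4f}")
print(f"F={f_ratio:.1f}  r within subjects={r_within:.2f}")
print(f"r ignoring subjects={pooled_correlation(DATA):.2f}")
```

With the values as transcribed, the script should agree with table II to rounding: a slope near -0.108, a within-subject correlation near -0.51, and a pooled correlation near -0.07. The script does not compute the P value. That would need the F distribution with 1 and 38 degrees of freedom, for example from a statistics library.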
