GitHub - iamYannC/general-stuff: random things

title: Aggregation methods for correlation
format: html
editor: visual
execute: warning, message

This document shows how aggregating a variable by averaging or by summing its components affects its correlation with another variable.


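library(faux)  # provides rnorm_multi() for simulating correlated variables

# Print the correlation obtained with each aggregation method and check
# (up to floating-point tolerance) whether the two actually agree.
# Defaults are evaluated lazily, so cor_avg and cor_sum only need to exist when msg() is called.
msg <- function(avg = cor_avg, sum = cor_sum) {
  cat("By Avg:", avg, "\n")
  cat("By Sum:", sum, "\n")
  cat(if (isTRUE(all.equal(avg, sum))) "They are the same.\n" else "They differ.\n")
}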

First, let's generate some fake data. The faux package's rnorm_multi() simulates normally distributed variables with a specified set of correlations.

Since the correlations are drawn at random, in this example it doesn't matter which set of variables is treated as X and which as Y.

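# Draw six random correlations from [-1, 1]: three for the X set, three for the Y set
set.seed(1)
r <- sample(seq(-1, 1, 0.001), 6, replace = TRUE)

# Simulate 1000 rows of three correlated x variables and three correlated y variables.
# A length-3 r is read as the upper triangle of the 3 x 3 correlation matrix; other seeds
# may give a matrix that is not positive definite, which rnorm_multi() rejects.
set.seed(2)
X <- rnorm_multi(n = 1000, varnames = paste0("x", 1:3), r = r[1:3])
set.seed(3)
Y <- rnorm_multi(n = 1000, varnames = paste0("y", 1:3), r = r[4:6])

# Peek at the first three rows
cbind(X, Y)[1:3, ]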
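For readers who want to see the idea without faux, here is a base-R sketch of one standard way to simulate correlated normals. It multiplies independent standard normals by the Cholesky factor of a target correlation matrix. This is not necessarily how rnorm_multi() works internally. `sim_correlated()` and `R_target` are hypothetical names, and the target matrix is an assumed example that must be positive definite.

```r
# Hypothetical helper: simulate n rows whose columns have correlation matrix R
sim_correlated <- function(n, R) {
  U <- chol(R)                                # upper-triangular U with t(U) %*% U == R; errors if R is not positive definite
  Z <- matrix(rnorm(n * ncol(R)), nrow = n)   # independent N(0, 1) draws
  Z %*% U                                     # each row now has covariance t(U) %*% U = R
}

# Illustrative target correlations (assumed, not taken from the document)
R_target <- matrix(c(1.0, 0.5, 0.2,
                     0.5, 1.0, 0.3,
                     0.2, 0.3, 1.0), nrow = 3)

set.seed(42)
X_sim <- sim_correlated(10000, R_target)
round(cor(X_sim), 2)  # close to R_target, up to sampling noise
```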


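# Aggregate the three components of each set, row by row
avg_X <- rowMeans(X)
sum_X <- rowSums(X)

avg_Y <- rowMeans(Y)
sum_Y <- rowSums(Y)

# Correlate the aggregates: average with average, sum with sum
cor_avg <- cor(avg_X, avg_Y)
cor_sum <- cor(sum_X, sum_Y)

msg()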
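The same comparison can be run without faux. This minimal sketch makes some assumptions to stay self-contained. It uses base-R normals, and it builds the Y components from the X components plus noise so that the aggregates are clearly correlated. The names `X_b`, `Y_b` and `cors` are hypothetical. It also compares a mixed pairing, the average of X against the sum of Y. Because every aggregate is a positive multiple of the other, that pairing should agree as well.

```r
set.seed(10)
n_obs <- 500
# Three components per set; Y shares a signal with X so the aggregates correlate
X_b <- matrix(rnorm(n_obs * 3), ncol = 3)
Y_b <- 0.5 * X_b + matrix(rnorm(n_obs * 3), ncol = 3)

cors <- c(
  avg_avg = cor(rowMeans(X_b), rowMeans(Y_b)),
  sum_sum = cor(rowSums(X_b),  rowSums(Y_b)),
  avg_sum = cor(rowMeans(X_b), rowSums(Y_b))
)
print(cors)

# all.equal() compares up to floating-point tolerance
all.equal(cors[["avg_avg"]], cors[["sum_sum"]])
all.equal(cors[["avg_avg"]], cors[["avg_sum"]])
```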

Mathematical solution

Let's define the average and the sum of a variable built from $n$ components $X_1, \dots, X_n$:

$\bar{X} = \frac{X_1 + X_2 + ... + X_n}{n}$

$S = X_1 + X_2 + ... + X_n = n\bar{X}$

The sum is therefore just the average multiplied by the constant $n$, i.e. a linear transformation of it:

$S = \bar{X} \cdot n$
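A quick numeric check of this identity on a small, arbitrary matrix. The values are made up and have 4 observations with $n = 3$ components each. Each row sum equals $n$ times the row mean.

```r
# Arbitrary 4 x 3 matrix (filled column by column): 4 observations, n = 3 components
M <- matrix(c(1, 4, 2, 8,
              3, 0, 5, 1,
              2, 2, 6, 3), ncol = 3)
n <- ncol(M)
S <- rowSums(M)       # row sums
X_bar <- rowMeans(M)  # row averages
print(cbind(S, n_times_mean = n * X_bar))
all.equal(S, n * X_bar)
```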

Pearson correlation is unchanged by any linear transformation $a + bX$ with $b > 0$; a negative $b$ would only flip its sign. Here $b = n > 0$, so both methods yield the same correlation.
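To see the linear-transformation claim numerically, this sketch applies $a + bX$ to made-up data, once with a positive $b$ and once with a negative $b$. The data and the constants 3 and 10 are arbitrary assumptions. With the positive slope the correlation is unchanged. With the negative slope only its sign flips, which is why the sum ($b = n > 0$) behaves exactly like the average.

```r
set.seed(7)
x <- rnorm(200)
y <- 0.6 * x + rnorm(200)

r_xy  <- cor(x, y)
r_pos <- cor(3 * x + 10, y)   # positive slope: correlation unchanged
r_neg <- cor(-3 * x + 10, y)  # negative slope: same magnitude, opposite sign
c(original = r_xy, positive_slope = r_pos, negative_slope = r_neg)
```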

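Here is a reminder of one of the many equivalent ways to write the Pearson correlation between two variables $X$ and $Y$ observed $N$ times. In this formula, $\bar{X}$ and $\bar{Y}$ are the means over the $N$ observations (not the component average defined above). $s_X$ and $s_Y$ are the standard deviations computed with divisor $N$: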

$$ r_{XY} = \frac{\sum_{i=1}^N (X_i - \bar{X})(Y_i - \bar{Y})}{N \cdot s_X \cdot s_Y} $$
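This small sketch evaluates the formula term by term on made-up data and compares the result with R's `cor()`. One assumption is stated explicitly: with $N$ in the denominator, $s_X$ and $s_Y$ must use divisor $N$. R's `sd()` divides by $N - 1$, so the equivalent form built on `sd()` uses $N - 1$ instead. `pop_sd()` and `pearson_manual()` are hypothetical helper names.

```r
# Standard deviation with divisor N, matching the N in the formula's denominator
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

pearson_manual <- function(x, y) {
  N <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / (N * pop_sd(x) * pop_sd(y))
}

set.seed(11)
x <- rnorm(300)
y <- -0.4 * x + rnorm(300)

r_manual  <- pearson_manual(x, y)
r_builtin <- cor(x, y)
# Equivalent form using sample SDs from sd(), which divide by N - 1
r_sample  <- sum((x - mean(x)) * (y - mean(y))) / ((length(x) - 1) * sd(x) * sd(y))
c(manual = r_manual, builtin = r_builtin, sample_sd_form = r_sample)
```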

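The formula implicitly standardizes both variables to z-scores. Subtracting the mean removes any additive shift, and dividing by the standard deviation cancels any positive rescaling. A linear transformation with a positive slope therefore does not affect the correlation!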
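The cancellation can be written out explicitly in the formula's own notation. Here $\bar{X}$ and $s_X$ are the mean and (divisor-$N$) standard deviation over the $N$ observations, and $X' = a + bX$ is a generic linear transformation with $b > 0$:

$$
\begin{aligned}
X'_i &= a + b X_i, \quad b > 0 \\
\overline{X'} &= a + b\bar{X} \quad\Rightarrow\quad X'_i - \overline{X'} = b\,(X_i - \bar{X}) \\
s_{X'} &= |b|\, s_X = b\, s_X \\
r_{X'Y} &= \frac{\sum_{i=1}^N b\,(X_i - \bar{X})(Y_i - \bar{Y})}{N \cdot b\, s_X \cdot s_Y} = r_{XY}
\end{aligned}
$$

Now take $X$ to be the average of the components and set $a = 0$, $b = n$. Then $X'$ is the sum $S$, and its correlation with $Y$ equals that of the average. The same argument applies to $Y$.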

