Understanding Correlation in Statistics

statistics in nursing

Correlation Meaning and Need

A correlation is a statistical measure of the relationship between two variables. The measure is best suited to variables that demonstrate a linear relationship with each other. The fit of the data can be visually represented in a scatterplot. Using a scatterplot, we can generally assess the relationship between the variables and determine whether they are correlated.

The correlation coefficient is a value that indicates the strength of the relationship between variables. The coefficient can take any value from -1 to +1.

A scatter plot illustrates the correlation between two attributes or variables: it represents how closely the two variables are connected. Three situations can arise when examining the relation between two variables:

 Positive Correlation (+1) – when the values of the two variables move in the same direction, so that an increase/decrease in the value of one variable is followed by an increase/decrease in the value of the other variable.
 Negative Correlation (-1) – when the values of the two variables move in the opposite direction, so that an increase/decrease in the value of one variable is followed by a decrease/increase in the value of the other variable.
 No Correlation (0) – when there is no linear dependence or relation between the two variables.
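The three situations above can be illustrated numerically. The following sketch, using made-up data, computes the correlation coefficient with NumPy's `corrcoef` for a perfectly positive and a perfectly negative linear relationship:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y_pos = 2 * x + 1    # moves in the same direction as x
y_neg = 10 - 3 * x   # moves in the opposite direction

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]

print(round(r_pos, 2))  # 1.0  (perfect positive correlation)
print(round(r_neg, 2))  # -1.0 (perfect negative correlation)
```

A set of unrelated values (e.g., random noise against x) would give a coefficient near 0.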

Correlation Formula

Correlation shows the relation between two variables, and the correlation coefficient measures the strength of that relation. To compare two datasets, we use correlation formulas.

Pearson Correlation Coefficient Formula

The most common formula is the Pearson correlation coefficient, used for linear dependence between data sets. The value of the coefficient lies between -1 and +1. A coefficient of zero means the data are considered unrelated, while +1 indicates a perfectly positive correlation and -1 a perfectly negative correlation.

r = [nΣxy − (Σx)(Σy)] / √([nΣx² − (Σx)²][nΣy² − (Σy)²])

Where n = number of paired values

Σx = sum of the first variable's values

Σy = sum of the second variable's values

Σxy = sum of the products of the paired values

Σx² = sum of the squares of the first variable's values

Σy² = sum of the squares of the second variable's values
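As a check on the formula, here is a small sketch that computes r directly from the sums defined above (the data values are made up for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from the summation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (y is exactly 2x)
print(pearson_r([1, 2, 3], [3, 2, 1]))        # -1.0 (y decreases as x increases)
```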

Rank-order correlation

Definition

A rank-order correlation is a correlation between two variables whose values are ranks.

Description

When variables are measured at least on ordinal scales, units of observation (e.g., individuals, nations, organizations, values) can be ranked. A ranking is an ordering of units of observation with respect to an attribute of interest. For example, nations can be ranked with respect to their quality of life, their freedom, their tightness or looseness, etc. A rank is the position of a unit of observation (e.g., a nation) in the ranking. Units of observation with higher ranks show the attribute of interest to a higher degree. If one is interested in the association between two rankings (e.g., quality of life and freedom of nations), rank-order correlations can be calculated.
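As a sketch of this idea, SciPy's `spearmanr` computes a rank-order correlation; the two rankings below (five hypothetical nations ranked on two attributes) are made up for illustration:

```python
from scipy import stats

# Hypothetical ranks of five nations on two attributes.
quality_of_life_rank = [1, 2, 3, 4, 5]
freedom_rank = [2, 1, 4, 3, 5]

rho, p_value = stats.spearmanr(quality_of_life_rank, freedom_rank)
print(round(rho, 1))  # 0.8: the two rankings largely agree
```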

Scatter diagrams
Scatter diagrams are used when you want to demonstrate the relationship between two
variables or when you have to identify data patterns.
A simple scatter plot can be used to see how ice cream sales vary with outdoor temperature. The two variables would be outside temperature and ice cream sales. This data could be collected and organized into a table. Once the data is organized into a table, it can be turned into ordered pairs.
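A minimal sketch of that table-to-ordered-pairs step, using made-up temperature and sales figures:

```python
# Hypothetical data: outdoor temperature (°C) and ice cream sales.
temps = [18, 22, 26, 30, 34]
sales = [120, 180, 250, 330, 410]

# Turn the two table columns into ordered pairs for a scatter plot.
pairs = list(zip(temps, sales))
print(pairs)  # [(18, 120), (22, 180), (26, 250), (30, 330), (34, 410)]
```

Each ordered pair becomes one point on the scatter diagram.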

Product-moment correlation
The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this new model/line of best fit).

Values of the product-moment correlation

The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of
0 indicates that there is no association between the two variables. A value greater than
0 indicates a positive association; that is, as the value of one variable increases, so does
the value of the other variable. A value less than 0 indicates a negative association; that
is, as the value of one variable increases, the value of the other variable decreases. This
is shown in the diagram below:

Simple linear regression analysis and prediction


When there is only one predictor variable, the prediction method is called simple regression. In simple linear regression, the topic of this section, the predictions of Y, when plotted as a function of X, form a straight line.
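A short sketch of fitting that straight line by least squares and using it for a prediction; the data points are hypothetical:

```python
# Least-squares fit of Y = a + b*X, then a prediction for a new X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.1, 5.9, 8.2, 9.9]  # hypothetical observations

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²; intercept a = ȳ - b*x̄
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x

def predict(x):
    return a + b * x

print(round(predict(6.0), 2))  # predicted Y for X = 6, about 11.95
```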

Comparison in Pairs
A paired comparison scale presents the respondent with two choices and calls for a
preference. For example, the respondent is asked which color he or she likes better, red
or blue, and a similar process is repeated throughout the scale items.
The pairwise comparison method (sometimes called the ‘paired comparison method’) is
a process for ranking or choosing from a group of alternatives by comparing them
against each other in pairs, i.e. two alternatives at a time. Pairwise comparisons are
widely used for decision-making, voting and studying people’s preferences.
Pairwise Comparison Steps:
1. Compute a mean difference for each pair of variables.
2. Find the critical mean difference.
3. Compare each calculated mean difference to the critical mean difference.
4. Decide whether to retain or reject the null hypothesis for that pair of means.
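The four steps above can be sketched as follows. The group means and the critical mean difference here are hypothetical; in practice the critical value is derived from the test's sampling distribution:

```python
from itertools import combinations

# Hypothetical group means and an assumed critical mean difference.
group_means = {"A": 10.0, "B": 13.5, "C": 10.8}
critical_difference = 2.5

for g1, g2 in combinations(group_means, 2):
    # Step 1: mean difference for this pair.
    mean_diff = abs(group_means[g1] - group_means[g2])
    # Steps 3-4: compare to the critical value and decide.
    decision = "reject H0" if mean_diff > critical_difference else "retain H0"
    print(f"{g1} vs {g2}: diff = {mean_diff:.1f} -> {decision}")
```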

Randomized block design is a type of experiment where participants who share certain characteristics are grouped together to form blocks, and then the treatment (or intervention) is randomly assigned within each block.
The objective of the randomized block design is to form groups where participants are
similar, and therefore can be compared with each other.
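A small sketch of this design: participants are grouped into blocks by a shared characteristic (here, hypothetical age bands), and treatments are randomized within each block:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical blocks of participants sharing a characteristic.
blocks = {
    "age_20_30": ["p1", "p2", "p3", "p4"],
    "age_31_40": ["p5", "p6", "p7", "p8"],
}
treatments = ["treatment", "control"]

assignment = {}
for block, members in blocks.items():
    # Each block gets an equal split of treatments, in random order.
    labels = treatments * (len(members) // len(treatments))
    random.shuffle(labels)
    assignment.update(zip(members, labels))

print(assignment)
```

Because the split is balanced within each block, treated and control participants can be compared against similar peers.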

Latin square design


Latin square design is a general version of the dye-swapping design for samples
from more than two biological conditions. The Latin square design requires that
the number of experimental conditions equals the number of different labels.
The same number of experimental runs as the number of treatment conditions
is also used. The treatment conditions are labelled once using each label and
sampled once under each experimental run.

A Latin Square design is a specific type of experimental design commonly


used in statistics and research studies. It is particularly useful in situations
where there are multiple factors or variables that need to be controlled or
balanced.
In a Latin Square design, the experimental units (e.g., participants, test
subjects, treatments) are arranged in a square grid-like pattern, where each
row and column contains a unique combination of treatments. The Latin
Square design ensures that each treatment appears exactly once in each row
and column, providing a balanced distribution of treatments across the
factors or variables being studied.

The Latin Square design is commonly used in situations where there are
constraints or limitations on the experimental conditions. For example:

1. Time-based Experiments: When conducting experiments over a


period of time, the Latin Square design can be used to ensure that
each treatment is equally distributed across different time intervals
or days.
2. Control of Confounding Variables: The Latin Square design
allows researchers to control for the influence of certain variables by
ensuring that each treatment appears once in each row and column.
This helps reduce the potential bias caused by uncontrolled factors.
3. Comparative Studies: In studies comparing different treatments or
interventions, the Latin Square design can be employed to ensure
that each treatment has an equal chance of being tested with
different participants or under different conditions.
4. Resource Optimization: The Latin Square design can be useful
when resources are limited, such as in clinical trials or laboratory
experiments, as it allows for a balanced distribution of treatments
within the available resources.
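One simple way to construct such a square is by cyclic shifts of the treatment labels, which guarantees each treatment appears exactly once in every row and column. A minimal sketch:

```python
def latin_square(treatments):
    """Build an n x n Latin square by cyclically shifting the labels."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)]
            for row in range(n)]

for row in latin_square(["A", "B", "C", "D"]):
    print(" ".join(row))
# A B C D
# B C D A
# C D A B
# D A B C
```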
Parametric Test
A parametric test in statistics is a sub-type of the hypothesis test. Parametric hypothesis testing is the most common type of testing done to understand the characteristics of the population from a sample.
While there are many types of parametric tests, and they have certain differences, a few properties are shared across all of them that make them part of the 'parametric' family. These properties include:
1. When using such tests, there needs to be a deep or proper understanding of the
population.
2. An extension of the above point is that to use such tests, several assumptions
regarding the population must be fulfilled (hence a proper understanding of the
population is required). A common assumption is that the population should be
normally distributed (at least approximately).
3. The outputs from such tests cannot be relied upon if the assumptions regarding
the population deviate significantly.
4. A large sample size is required to run such tests. Theoretically, the sample size
should be more than 30 so that the central limit theorem can come into effect,
making the sample normally distributed.
5. Such tests are more powerful, especially compared to their non-parametric
counterparts for the same sample size.
6. These tests are only helpful with continuous/quantitative variables.
7. Measurement of the central tendency (i.e., the central value of data) is typically
done using the mean.
8. The output from such tests is easy to interpret; however, it can be challenging to
understand their workings.

Non-Parametric Test


A non-parametric test places no requirement on the distribution of the population. The non-parametric test is a type of hypothesis test that does not depend on underlying distributional assumptions; instead, the test depends on the value of the median. This method of testing is also known as distribution-free testing. Test values are found based on the ordinal or the nominal level. A test performed when the independent variables are non-metric is known as a non-parametric test.
Differences Between The Parametric Test and The Non-Parametric Test

Properties                  Parametric Test               Non-Parametric Test

Assumptions                 Assumptions are made          No assumptions are made

Value for central tendency  Mean                          Median

Correlation                 Pearson correlation           Spearman correlation

Probabilistic distribution  Normal distribution           Arbitrary distribution

Population knowledge        Required                      Not required

Used for                    Interval data                 Nominal data

Application                 Variables                     Variables and attributes

Examples                    t-test, z-test                Mann-Whitney, Kruskal-Wallis

Other Differences
Advantages and Disadvantages of Parametric and Nonparametric Tests
Many people assume that the choice between parametric and nonparametric tests depends on whether the data are normally distributed. The distribution can act as a deciding factor when the data set is relatively small. In many cases, though, this is not a critical issue, for the following reasons:

 Parametric tests can handle non-normal distributions for many datasets.
 Nonparametric tests have firm assumptions of their own that can be harder to meet.

The appropriate response is usually dependent upon whether the mean or median is
chosen to be a better measure of central tendency for the distribution of the data.
 A parametric test is considered when you have the mean value as your central
value and the size of your data set is comparatively large. This test helps in
making powerful and effective decisions.
 A non-parametric test is considered regardless of the size of the data set if the
median value is better when compared to the mean value.

Ultimately, if your sample size is small, you may be compelled to use a nonparametric test. As the table shows, the sample size requirements aren't excessively large. If you have a small sample and need to use a less powerful nonparametric analysis, it doubly lowers your chances of detecting an effect.

Each non-parametric test acts as a counterpart of a parametric test. In the table given below, you will see the linked pairs of statistical hypothesis tests.

Brief Explanation of Parametric and Non Parametric Test

 Z-Test: When you need to compare the sample's mean with a hypothesized value (which often refers to the population mean), a one-sample z-test is used. The test has major requirements: the sample size should be more than 30, and the population's standard deviation should be known.

 One-Sample t-Test: If either of the requirements mentioned above cannot be met, you can use another type of parametric test known as the one-sample t-test. If the sample size is at least 15 and the standard deviation of the sample is known, you can use this test. The sample distribution should be approximately normal.

 Paired (dependent) t-Test: A paired t-test is used when data are collected from the same subjects, typically before and after an event; for example, the weight of a group of 10 sportsmen before and after a diet program. To compare the means of the before and after groups, you can use the paired t-test. The assumptions include that the pairs are independent of one another, that the before and after values belong to the same subjects, and that the differences between the groups are normally distributed.

 Two-Sample (Independent) t-Test: In situations where there are two separate samples, for example, house prices in Mumbai vs. house prices in Delhi, and you have to check whether the means of these samples are statistically significantly different, a two-sample t-test can be used. It assumes that each sample's data distribution is roughly normal, the values are continuous, the variance is equal in both samples, and the samples are independent of each other.

 One-Way Analysis of Variance (ANOVA): An extension of the two-sample t-test is one-way ANOVA, where we compare more than two groups. If someone asks whether ANOVA is a parametric test, the answer is a definitive yes. ANOVA analyses the variance of the groups and requires the population distribution to be normal, the variance to be homogeneous, and the groups to be independent.
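A one-way ANOVA can be sketched with SciPy's `f_oneway`; the three independent groups below use hypothetical data in which one group's mean clearly differs:

```python
from scipy import stats

# Hypothetical measurements for three independent groups.
group1 = [23, 25, 21, 24, 26]
group2 = [30, 31, 29, 32, 28]  # noticeably higher mean
group3 = [22, 24, 23, 25, 21]

# f_oneway returns the F statistic and the p-value for the null
# hypothesis that all group means are equal.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here leads to rejecting the null hypothesis that the group means are equal.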

 Pearson's Coefficient of Correlation: To understand the association between two continuous numeric variables, you can use Pearson's coefficient of correlation. It produces an 'r' value, where a value closer to -1 or +1 indicates a strong negative or positive correlation, respectively. A value close to 0 indicates no major correlation between the variables. Part of its assumptions is that both variables in question should be continuous.

Common types of non-parametric tests include:

 Wilcoxon signed-rank test: used as an alternative to the one-sample t-test.

 Mann-Whitney U-test / Wilcoxon rank-sum test: can be used as an alternative to the two-sample t-test.

 Kruskal-Wallis test: an alternative to the parametric one-way ANOVA.

 Spearman's rank correlation: an alternative to Pearson's correlation coefficient; it is important when the data are not continuous but in the form of ranks (ordinal data).

 Signed-rank test: an alternative to the parametric paired t-test.
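As a sketch of one of these alternatives, the Mann-Whitney U-test in SciPy compares two independent samples without assuming normality; the samples below are hypothetical:

```python
from scipy import stats

# Hypothetical scores for two independent groups.
sample_a = [3, 4, 2, 5, 4, 3]
sample_b = [6, 7, 5, 8, 7, 6]

# mannwhitneyu tests whether the two samples come from the same
# distribution, using ranks rather than raw values.
u_stat, p_value = stats.mannwhitneyu(sample_a, sample_b,
                                     alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because the test works on ranks, it remains valid for ordinal data and skewed distributions where a t-test's assumptions fail.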
