Correlation and Regression

Uploaded by

Awmtei OM I
Copyright © All Rights Reserved

INTERDISCIPLINARY IMN BIOSTATISTICS UNIT -1

Correlation and Regression

Correlation quantifies the strength of the linear relationship between a pair of variables, whereas
regression expresses the relationship in the form of an equation.

Correlation and regression are statistical methods used to describe the relationship between
two variables.

We use correlation to summarize the strength and direction of the relationship between pairs of
numeric variables.

• We use regression when we want to predict, optimize, or explain a numeric response from the
other variables (how x influences y).

In statistical terms, we use correlation to denote association between two quantitative variables. We
also assume that the association is linear, that is, one variable increases or decreases by a fixed
amount for each unit increase or decrease in the other.

• Correlation analysis is applied in quantifying the association between two continuous variables, for
example, a dependent and an independent variable, or two independent variables.

• If an increase (or decrease) in one variable is accompanied by a corresponding increase (or
decrease) in another, the two variables are said to be directly correlated. Similarly, if an increase in
one is accompanied by a decrease in the other, or vice versa, the variables are said to be indirectly
correlated. If changes in one variable are not linearly associated with changes in the other, they
are uncorrelated. Thus, correlation can be positive (direct correlation), negative (indirect
correlation), or zero. This relationship is given by the correlation coefficient.

• The degree of association is measured by a correlation coefficient, r.

• It is sometimes called Pearson’s correlation coefficient after its originator and is a measure of linear
association. If a curved line is needed to express the relationship, other, more complicated
measures of correlation must be used.

• The correlation coefficient is measured on a scale that varies from +1 through 0 to -1. Complete
correlation between two variables is expressed by either +1 or -1. Complete absence of correlation
is represented by 0.

• In statistical analysis, a p-value less than 0.05 is conventionally considered statistically significant.
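Pearson's r can be computed directly from its definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The Python below is a minimal sketch with hypothetical function names and made-up data. It labels the direction of the association from the sign of r, as described above. It computes r only, not its p-value, and it raises an error if either variable has no variation.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


def describe(r, tol=1e-12):
    """Hypothetical helper: label the direction of association from the sign of r."""
    if r > tol:
        return "positive (direct)"
    if r < -tol:
        return "negative (indirect)"
    return "zero (uncorrelated)"


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
print(describe(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])))  # negative (indirect)
```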

Regression involves estimating the best straight line to summarise the association.

• Regression can be defined as a measurement that is used to quantify how the change in one
variable will affect another variable.

• Regression is used to model how one variable depends on another. On its own, however, it describes an association and does not prove cause and effect.

• Regression analysis assesses the relationship between an outcome variable and one or more other
variables. The outcome variable is known as the dependent or response variable, and the risk
factors and confounders are known as predictors or independent variables. In regression analysis,
the dependent variable is denoted by “y” and the independent variables by “x”. If the
regression has one independent variable, it is known as simple linear regression. If it has
more than one independent variable, it is known as multiple linear regression.
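"The best straight line" is usually estimated by ordinary least squares. This method chooses the intercept a and slope b that minimise the sum of squared vertical distances between the points and the line y = a + bx. The sketch below assumes that criterion and covers simple linear regression only (one x). Function names and data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    b = sxy / sxx  # slope: change in y per unit change in x
    a = mean_y - b * mean_x  # intercept: the fitted line passes through (mean_x, mean_y)
    return a, b


def predict(a, b, x_new):
    """Predicted y for a new x value on the fitted line."""
    return a + b * x_new


a, b = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)               # 1.0 2.0
print(predict(a, b, 5))   # 11.0
```

Multiple linear regression extends the same least-squares idea to several x variables. That case is usually solved with matrix methods rather than these closed-form sums.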

Test of Significance (Hypothesis Testing)

In statistics, it is important to know whether the result of an experiment is statistically significant. To
measure this, predefined tests can be applied. These tests are called the tests of significance, or simply
the significance tests. The significance level (alpha, α) is the threshold, chosen before the test, that the
p-value must fall below for a result to be accepted as statistically significant. The p-value is the probability
of obtaining a result at least as extreme as the one observed if the null hypothesis were true. Bigger
samples are less prone to chance variation, so sample size plays a vital role in measuring statistical
significance. Only representative, random samples should be used for significance testing.

• In short, a significance test tells us how likely it is that a relationship at least as strong as the one
we found would arise by random chance alone if no real relationship existed. This indicates the risk
of error we would take if we assumed that the relationship exists. Statistical significance may be weak
or strong, and it does not necessarily indicate practical significance.

• A p-value of less than 0.05 (5%) means that, if the null hypothesis were true, results at least as
extreme as ours would occur by chance less than 5% of the time. When we declare a result statistically
significant at this level, we accept a 5% risk of concluding that an effect exists when the results are
actually due to chance. It does not mean there is a 95% probability that the results are not due to chance.

• NULL AND ALTERNATIVE HYPOTHESES: Every test of significance begins with a null hypothesis, H0. H0 represents a
theory that has been put forward, either because it is believed to be true or because it is to be used
as a basis for argument, but has not been proved. For example, in a clinical trial of a new drug, the
null hypothesis might be that the new drug is no better, on average, than the current drug. We
would write H0: there is no difference between the two drugs on average.

• The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to
establish. For example, in a clinical trial of a new drug, the alternative hypothesis might be that the
new drug has a different effect, on average, compared to that of the current drug. We would
write Ha: the two drugs have different effects, on average. The alternative hypothesis might also be
that the new drug is better, on average, than the current drug. In this case we would write Ha: the
new drug is better than the current drug, on average.

• The final conclusion once the test has been carried out is always given in terms of the null
hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude
"reject Ha", or even "accept Ha".

• If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true. It
only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting the null
hypothesis, on the other hand, suggests that the alternative hypothesis may be true.

• In conclusion, a hypothesis test helps statisticians determine whether the sample data provide enough
evidence to conclude that a research hypothesis holds for the entire population.
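The rule that conclusions are always phrased in terms of H0 can be expressed as a small helper. This is a hypothetical sketch. It assumes the p-value has already been computed by some test, and it follows the convention in these notes that p < α is significant.

```python
def conclusion(p_value, alpha=0.05):
    """Phrase the outcome of a significance test in terms of the null hypothesis."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Significant result: the evidence against H0 is strong enough
    if p_value < alpha:
        return "reject H0 in favor of Ha"
    # Never "accept H0": we only lack sufficient evidence against it
    return "do not reject H0"


print(conclusion(0.03))              # reject H0 in favor of Ha
print(conclusion(0.03, alpha=0.01))  # do not reject H0
```

Neither possible return value says "accept H0" or "reject Ha", which matches the wording rule above.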

CHI-SQUARED TEST: A chi-squared test (symbolically represented as χ²) is a significance test for
categorical data, based on observations from a random sample.

• It estimates how likely the observed data would be if the null hypothesis were true.

• It helps to determine whether there is a notable difference between the expected frequencies and
the observed frequencies in one or more classes or categories.

• STEPS: State the null and alternative hypotheses.

• Create a table of the observed and expected frequencies.

• Calculate the chi-square value from your observed and expected frequencies using the chi-square
formula.

• Calculate the degrees of freedom.

• Find the critical chi-square value in a chi-square critical value table or using statistical software.

• Compare the chi-square value to the critical value to determine which is larger.

• Decide whether to reject the null hypothesis. You should reject the null hypothesis if the chi-
square value is greater than the critical value. If you reject the null hypothesis, you can conclude
that your data are significantly different from what you expected.
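The steps above can be sketched in Python for a goodness-of-fit test. This is a minimal sketch under stated assumptions:
- the function names are hypothetical;
- df = number of categories − 1, the usual rule for goodness of fit when no parameters are estimated from the data;
- the critical values are the standard table values at α = 0.05 for df 1 to 5, hard-coded rather than computed;
- the counts, tested against a 1:2:1 ratio, are made-up illustration data.

```python
# Critical chi-square values at alpha = 0.05 (standard table values, df 1 to 5)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def goodness_of_fit(observed, expected_proportions):
    """Hypothetical helper: chi-square goodness-of-fit test at alpha = 0.05.

    Returns (statistic, df, critical value, reject_h0).
    """
    n = sum(observed)
    # Expected frequencies under H0: total count times each expected proportion
    expected = [n * p for p in expected_proportions]
    stat = chi_square_statistic(observed, expected)
    df = len(observed) - 1  # number of categories minus 1
    critical = CRITICAL_05[df]  # KeyError if df is outside the small table above
    # Reject H0 when the statistic is greater than the critical value
    return stat, df, critical, stat > critical


# Made-up counts checked against a 1:2:1 ratio
print(goodness_of_fit([30, 50, 20], [0.25, 0.5, 0.25]))  # (2.0, 2, 5.991, False)
```

With these made-up counts, χ² = 2.0 is below the critical value 5.991, so H0 (the 1:2:1 ratio) is not rejected. Statistical software would normally report an exact p-value instead of using a table lookup.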

• Types of chi-square tests: There are two commonly used chi-square tests: the chi-square goodness
of fit test and the chi-square test of independence. Both tests involve variables that divide our data
into categories, and for both we perform the same analysis steps listed above.

• Degrees of freedom (df)

• For the chi-square test of independence, the degrees of freedom depend on the size of the
contingency table on which the chi-square statistic has been computed. They are calculated as
(the number of rows in the table minus 1) times (the number of columns in the table minus 1).

• For a table with two rows of cells and two columns of cells, the calculation is:

• df = (2 - 1) x (2 - 1) = (1) x (1) = 1

• For a table with two rows of cells and three columns of cells, the calculation is:

• df = (2 - 1) x (3 - 1) = (1) x (2) = 2

• For a table with three rows of cells and three columns of cells, the calculation is:

• df = (3 - 1) x (3 - 1) = (2) x (2) = 4
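For the test of independence, the expected frequency of each cell comes from the table margins: (row total × column total) / grand total. The sketch below computes three things: the expected counts, the chi-square statistic, and df = (rows − 1) × (columns − 1) as described above. The function names and the 2 × 2 counts are hypothetical. It computes the plain statistic, without the Yates continuity correction that some software applies to 2 × 2 tables by default.

```python
def expected_counts(table):
    """Expected cell frequencies under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


def chi_square_independence(table):
    """Return (chi-square statistic, df) for a contingency table of observed counts."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return stat, degrees_of_freedom(table)


# Made-up 2 x 2 table: rows = treatment/control, columns = improved/not improved
observed = [[20, 30], [30, 20]]
print(chi_square_independence(observed))  # (4.0, 1)
```

Here χ² = 4.0 exceeds the df = 1 critical value of 3.841 at α = 0.05. H0 of independence would therefore be rejected for these made-up counts.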

• Alpha (significance level) and p-value

• The level of alpha can vary, but the smaller the value, the more stringent the requirement for
reaching statistical significance becomes. Alpha levels are often written as "α = .05", and a result is
called significant when its p-value is below alpha. Usual levels are α = .05 (a one-in-20 chance of
rejecting a null hypothesis that is actually true), α = .01 (a one-in-100 chance), or α = .001 (a
one-in-1,000 chance).

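• The chi-square statistic is calculated as:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed frequency and E_i is the expected frequency in category (or cell) i.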
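Here is a hypothetical worked example of the formula, using made-up counts. A coin tossed 100 times gives 60 heads and 40 tails. Under H0 the coin is fair, so each expected frequency is 50. For a goodness-of-fit test with k categories, df = k − 1.

```latex
\begin{aligned}
\chi^2 &= \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50} = \frac{100}{50} + \frac{100}{50} = 4.0 \\
df &= 2 - 1 = 1, \qquad \chi^2_{\text{crit}}(\alpha = 0.05,\ df = 1) = 3.841
\end{aligned}
```

Because 4.0 > 3.841, H0 would be rejected at α = 0.05. It would not be rejected at α = 0.01, where the df = 1 critical value is 6.635.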

STUDENT’S T-TEST: In statistics, Student’s t-test is a method of testing a hypothesis about the
mean of a small sample drawn from a normally distributed population whose standard deviation
is unknown.
It tells us whether the difference between group means (or between a mean and a standard value) is statistically significant.
One-Sample, Two-Sample, or Paired t-Test?
1. If a single group is being compared against a standard value (e.g., comparing the acidity of a
liquid to a neutral pH of 7), perform a one-sample t-test.
2. If the groups come from two different populations (e.g., people from two separate cities),
perform a two-sample t-test (also known as an independent t-test).
3. If both sets of measurements come from the same subjects (e.g., measurements before and after an
experimental treatment), perform a paired t-test.
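The one-sample and paired cases can be sketched with Python's standard `statistics` module. The one-sample statistic used here is t = (x̄ − μ0) / (s / √n) with df = n − 1. This is the standard textbook formula rather than one given in these notes. The paired test is computed as a one-sample test of the differences (after − before) against 0. Function names and data are hypothetical, and the resulting t would still be compared with a critical value from a t-table.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for testing whether the sample mean differs from mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of the mean uses the sample standard deviation (n - 1 denominator)
    standard_error = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1


def paired_t(before, after):
    """Return (t, df) for a paired t-test: a one-sample t-test of the differences against 0."""
    if len(before) != len(after):
        raise ValueError("before and after must be paired measurements")
    differences = [a - b for b, a in zip(before, after)]
    return one_sample_t(differences, 0)


# Made-up measurements on the same four subjects before and after a treatment
t, df = paired_t([10, 12, 14, 16], [12, 13, 17, 18])
print(f"t = {t:.3f}, df = {df}")  # t = 4.899, df = 3
```

If every difference is identical, the standard deviation is 0 and the division fails. In that case the t statistic is undefined.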

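For the two-sample t-test, with the standard deviations of the two groups pooled, the t-value is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

• t = t-value
• x̄1 and x̄2 = means of the two groups being compared
• S = pooled standard deviation of the two groups
• n1 and n2 = numbers of observations in each of the groups
• Larger t scores (in absolute value) = more difference between groups
• Smaller t scores = more similarity between groups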
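The two-sample formula can be sketched in Python as follows. This is a minimal sketch that assumes equal variances in the two groups, which is what pooling the standard deviations implies. The pooled S weights each sample variance by n − 1, and df = n1 + n2 − 2. The notes do not state either of these explicitly. Function names and data are hypothetical. When the variances clearly differ, Welch's t-test is the usual alternative.

```python
import math
from statistics import mean, variance


def pooled_sd(group1, group2):
    """Pooled standard deviation S, weighting each sample variance by its n - 1."""
    n1, n2 = len(group1), len(group2)
    pooled_variance = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def two_sample_t(group1, group2):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    s = pooled_sd(group1, group2)
    t = (mean(group1) - mean(group2)) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up measurements from two independent groups
t, df = two_sample_t([5, 6, 7], [2, 3, 4])
print(f"t = {t:.3f}, df = {df}")  # t = 3.674, df = 4
```

Swapping the two groups only flips the sign of t. This is why the size of t, not its sign, reflects how different the groups are.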
