Statistics Notes

Contents
1. Sampling
2. Reliability
3. Validity
4. Normal distribution
5. Types of error
6. Parametric tests
7. Student t-test
8. ANOVA
9. Non Parametric tests
Statistics: It is a broad subject, with applications in a vast number of different fields. Generally speaking, it is the methodology of collecting, analysing, interpreting and drawing conclusions from information.
Descriptive statistics: it provides methods for organising and summarising information through graphs, charts, tables and the calculation of measures like averages, measures of variation and percentiles. Examples are frequency tables, class intervals, bar graphs etc.
Inferential statistics: it consists of methods for drawing, and measuring the reliability of, conclusions about a population based on the information obtained from a sample of the population. It includes point estimation, interval estimation and hypothesis testing. Examples are the chi square test, t test, ANOVA etc.
Both are inter-related. It is always necessary to use methods of descriptive statistics to organise and summarise the information before methods of inferential statistics can be used.
Preliminary descriptive statistics guide the choice of the appropriate inferential method to be used.
Variable: it is any characteristic that varies from one individual member of the population to another. Examples are height, weight, age, marital status etc.
Types of variables
Quantitative variable: yields numerical information. Examples are age, height etc. It is further divided into Discrete and Continuous.
Discrete variable: it can take only specified values, often integers; no intermediate values are possible. Examples are number of children, number of students taking an exam etc.
Continuous variable: the data are not restricted to specified values, and fractional values are possible. Examples are height, weight etc. Weight can be measured accurately to the tenth of a gram.
Qualitative variable: yields non-numerical information. Examples are marital status, sex etc.
Tables: the simplest means of summarising a set of observations; they can be used to represent all types of data.
Frequency distribution
o The number of observations that fall into a particular class of the qualitative variable is called the frequency. A table listing all the classes and their frequencies is called a frequency distribution.
o For Nominal and Ordinal data, a frequency distribution consists of a set of classes or categories with the numerical count that corresponds to each one.
o To display Interval or Ratio data, the data must be broken down into a series of distinct, non-overlapping intervals called class intervals (CI).
o If there are too many class intervals, the summary is not much of an improvement over the raw data. If there are too few class intervals, a great deal of information is lost.
o Usually class intervals are constructed so that all have equal width; this facilitates comparison between the classes.
o Once the upper and lower limits for each class interval are selected, the number of values that fall within that pair of limits is counted and the result is arranged as a table.
Relative frequency
o This gives the proportion of values that fall into a given interval in the frequency distribution.
o It is calculated by dividing the number of values within that interval by the total number of values in the table.
o It can be useful in comparing sets of data that contain unequal numbers of observations.
SBP (mmHg)    Frequency    Relative frequency (%)
100-109       5            10
110-119       15           30
120-129       21           42
130-139       6            12
140-149       3            6
Total         50           100
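As a quick sketch of this calculation in Python, using the SBP counts from the table above (the dictionary layout is just one convenient way to hold a frequency table):

```python
# Relative frequency = count in a class interval / total count, expressed as %
freq = {"100-109": 5, "110-119": 15, "120-129": 21, "130-139": 6, "140-149": 3}

total = sum(freq.values())  # 50 observations in all
for interval, count in freq.items():
    rel = 100 * count / total  # relative frequency as a percentage
    print(f"{interval}: {count:2d}  ->  {rel:.0f}%")
```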
Scales of measurement
The scale gives a certain structure to the variable and also defines the meaning of the variable.
There are four types of scales: Nominal, Ordinal, Interval and Ratio.
Nominal:
o Simplest type of data.
o Values fall in unordered categories or classes.
o Nominal refers to the fact that the categories are merely names.
o Used to represent Qualitative data.
o Examples include gender (male & female) and blood groups (A, B, AB & O).
o If the data have only two distinct values, they are called Binary/Dichotomous.
Ordinal:
o If the nominal categories can be put into an order, the data are called Ordinal data.
o Used to represent Qualitative data.
o Examples include depression graded as mild, moderate & severe.
o Here a natural order exists among the groups, but the difference between the groups is not necessarily equal.
Interval:
o Here the data can be placed in a meaningful order and there is an equal difference between the groups.
o Ratios of the measurements cannot be taken; that is, there is no absolute zero.
o Used to compare Quantitative data.
o Examples include temperature measured in centigrade, and time (even though 0000 hrs exists, there is nothing like "no time").
Ratio:
o Here there is a comparable difference between the variables as well as an absolute zero.
o Ratios of the measurements can also be taken.
o Used to measure Quantitative data.
o Examples include temperature measured in Kelvin.
Scale       Properties of data                       Permissible statistics
Nominal     Unordered categories                     Mode, Chi-square test
Ordinal     Ordered categories                       Mode/Median
Interval    Equal differences, no absolute zero      Mean, Standard Deviation
Ratio       Equal differences, absolute zero         Correlation, Regression, t-Test, ANOVA
Sampling
Central tendency
In any distribution, the majority of the observations pile up, or cluster, in a particular region. This is referred to as the central tendency of the distribution.
It is thus a statistical measure that determines a single score defining the centre of the distribution.
It makes a large amount of information easily understandable.
There are three measures of central tendency: Mean, Median & Mode.
Mean and Median can be applied only to Quantitative data, whereas Mode can be used with either Quantitative or Qualitative data.
Mean: the most frequently used measure of central tendency.
It is the sum total of all the observations divided by the total number of observations.
It is a stable average based on all observations.
It is calculated for Quantitative data measured on an interval/ratio scale.
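A minimal Python sketch of the three measures using the standard library's statistics module; the weights below are made-up values for illustration:

```python
import statistics

weights = [60.5, 62.0, 58.4, 71.2, 65.3, 65.3, 69.8]  # hypothetical weights (kg)

print(statistics.mean(weights))    # sum of observations / number of observations
print(statistics.median(weights))  # middle value of the sorted data
print(statistics.mode(weights))    # most frequently occurring value (65.3)
```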
Reliability
Definition: Reliability refers to the ability of a measurement instrument to produce the same result on repeated measurement.
Types of reliability:
1. Scorer/Inter-rater reliability:
It refers to the ability of a measurement instrument to produce the same result when administered by two different raters.
It is the probability that two raters will (i) give the same score to a given answer, (ii) rate a given behaviour in the same way, and (iii) add up the scores properly.
Scorer reliability should be near perfect.
2. Test-Retest reliability:
It assesses the ability of a measurement to arrive at the same result for the same subject on repeated administrations.
The interval between the test and retest should be long enough to ensure that the person's responses are based on his/her current condition rather than the memory of responses in the first test administration.
If the interval is too long, there is a risk that the person's condition may have changed.
3. Parallel form reliability:
It refers to the degree to which two equivalent versions of a test give the same result.
This type of reliability is usually used when a test cannot be exactly repeated.
4. Split half reliability:
If a test cannot be repeated or if there is no parallel form, a test can be split in two and the two halves are correlated with each other, e.g. odd vs. even items.
There is a mathematical formula for computing the mean of all possible split halves.
5. Internal consistency:
The degree to which one test item correlates with all other test items.
It is denoted by the coefficient α (Cronbach's alpha).
It should not drop below 0.7.
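As an illustrative sketch, coefficient α can be computed with the standard formula α = (k/(k−1)) · (1 − Σ item variances / variance of total scores); the item scores below are invented for demonstration:

```python
import statistics

# Rows = subjects, columns = items (hypothetical 4-item scale, 5 subjects)
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

k = len(scores[0])                  # number of items
items = list(zip(*scores))          # one tuple of scores per item
item_vars = [statistics.variance(col) for col in items]
total_var = statistics.variance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # ~0.94 here; should not drop below 0.7
```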
Psychometric tests aim to measure a real quantity. The real quantity is the true score (t); the score obtained on the test is the observed score (x). As no test is perfect, there is error (e).
The aim is to reduce e to a minimum, to make the test as reliable as possible.
In practice, when the test is repeated, each occasion will give a different score, i.e. the observed scores will cluster around the true score. Like the distribution of any variable, the distribution of observed scores has a mean and SD.
If the reliability of the test were perfect, observed score = true score (x = t, i.e. e = 0).
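A small simulation sketch of this model (x = t + e), with assumed values for the true-score and error SDs; it shows that error pulls the test-retest correlation below 1 (statistics.correlation requires Python 3.10+):

```python
import random
import statistics

random.seed(1)
true = [random.gauss(100, 15) for _ in range(500)]   # true scores t (assumed SD 15)
test = [t + random.gauss(0, 5) for t in true]        # first administration: x = t + e
retest = [t + random.gauss(0, 5) for t in true]      # second administration, fresh error

# With no error the correlation would be 1 (x = t); error lowers it.
print(statistics.correlation(test, retest))          # approx 0.90 with these SDs
```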
Validity
Definition: the degree to which a test measures what it is supposed to measure is known as validity.
Types of validity:
1. Face validity:
It refers to whether the test seems sensible to the person completing it, i.e. does it appear to measure what it is meant to be measuring.
2. Content validity:
It refers to the degree to which the test measures all the aspects of the item that is being assessed.
For example, a test for depression should have questions asking about depressive symptoms.
3. Concurrent validity:
It reveals whether, at a given point of time, high scorers on a test are more likely than low scorers to meet an external criterion measured at the same time.
To determine a test's concurrent validity, external measures are obtained at the same time that the test is given to the sample of subjects.
For example, the correlation between HAMD and MADRS, in which the concurrent validity of MADRS is checked against HAMD, an already established instrument for depression.
4. Predictive validity:
It refers to the degree to which a test predicts whether some criterion is achieved in the future.
Here the external criterion would have to be obtained a number of years down the road for the test to have predictive validity.
For example, whether a childhood IQ test predicts later academic achievement.
5. Construct validity:
It refers to whether a test measures some specified hypothetical construct.
If a test is measuring one construct, there should not be clusters of items that seem to be measuring different things.
6. Factorial validity:
If a test breaks down into various sub-factors, then the number and nature of these factors should remain stable across time and different subject populations.
7. Incremental validity:
It refers to whether the test results improve decision making.
For example, whether knowledge of neuropsychological test results improves the detection of brain injury.
Normal distribution
Also called the Bell-shaped curve or Gaussian distribution, after the well-known mathematician Carl Friedrich Gauss.
It is the most common and widely used continuous distribution.
A bell-shaped curve can be obtained by compiling data into a frequency table and graphing it as a histogram.
The normal distribution is easy to work with mathematically. In many practical cases, methods developed using normal theory work well even when the distribution is only approximately normal.
The standard normal distribution (Z distribution) is used to find probabilities and percentiles for regular normal distributions. It serves as a standard by which all other normal distributions are measured.
It is a normal distribution with a mean of 0 and a standard deviation of 1.
Properties of the standard normal curve
o Its shape is symmetric.
o The area under the curve is greatest in the middle, where there is a hump, and it thins out towards the tails.
o It has a mean denoted by μ (mu) and a standard deviation denoted by σ (sigma).
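These properties can be explored with Python's built-in NormalDist; a short sketch (the SBP mean and SD at the end are assumed values for illustration):

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal (Z) distribution

# Probability of an observation falling within 1 and 2 SDs of the mean
print(z.cdf(1) - z.cdf(-1))    # ~0.68
print(z.cdf(2) - z.cdf(-2))    # ~0.95

# Any normal variable is standardised with z = (x - mu) / sigma
sbp = NormalDist(mu=120, sigma=10)  # hypothetical SBP distribution
print(sbp.zscore(140))              # 2.0 SDs above the mean
```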
Types of error
The aim of doing a study is to check whether the data agree with certain predictions. These predictions are called hypotheses.
Hypotheses arise from the theory that drives the research. They are formulated before collecting the data.
Hypothesis testing helps to find out whether the variation between two sample distributions can be explained through random chance or not. Before concluding that two distributions vary in a meaningful way, precautions must be taken to see that the differences are not just due to random chance.
There are two types of hypothesis: the Null (H0) and the Alternative hypothesis (H1).
The null hypothesis is usually a statement that the parameter has a value corresponding to, in some sense, no effect.
The alternative hypothesis is a hypothesis which contradicts the null hypothesis.
A significance test is a way of statistically testing a hypothesis by comparing the data values. It analyses the strength of the sample evidence against the null hypothesis. The test is conducted to investigate whether the data contradict the null hypothesis, suggesting that the alternative hypothesis is true.
The p-value is the probability, if H0 were true, that the test statistic would fall in a collection of values at least as extreme as the one observed. The smaller the p-value, the more strongly the data contradict H0. When the p-value is ≤ 0.05, the data are taken to contradict H0 sufficiently.
Type I (α) error: Rejecting a true null hypothesis.
It leads to the conclusion that a difference is significant, when in fact there is no real difference.
In simple terms, it is asserting something that is absent. It is a False Positive.
For example, a study indicating that a particular treatment cures a disease when in fact it does not.
Its probability is popularly known as the p-value. The maximum p-value allowed is called the level of significance. Because this error is serious, the p-value is kept low, mostly less than 5% (p < 0.05).
This means there is only a 5 in 100 chance that the variation we are seeing is due to chance.
Type II (β) error: Accepting a false null hypothesis.
It leads to the conclusion that a difference is not significant, when in fact there is a real difference.
It is also called a False Negative.
For example, a study indicating that a new treatment modality fails to work, when in fact it does work.
The quantity 1 − β is called the Power of the test and indicates the sensitivity of the test.
Type II error can be decreased by ensuring enough power.
All statistical hypothesis tests have a probability of making type I and type II errors. These error rates are traded off against each other: for a given test, the effort to reduce one type of error generally results in an increase in the other type.
For a given test, one way to reduce both errors is to increase the sample size wherever feasible.
Since it is not possible to reduce both type I & II errors at once for a fixed sample size, the α error is fixed at a tolerable limit and the β error is minimized by choosing an adequate sample size.
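A simulation sketch of the type I error rate (hypothetical population values, scipy assumed available): when H0 is really true, about 5% of tests still come out "significant" at p < 0.05 purely by chance.

```python
import random
from scipy import stats

random.seed(0)
false_positives = 0
trials = 1000
for _ in range(trials):
    # Both groups drawn from the SAME population, so H0 is actually true
    a = [random.gauss(100, 15) for _ in range(30)]
    b = [random.gauss(100, 15) for _ in range(30)]
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1  # a type I error: rejecting a true H0

print(false_positives / trials)  # close to 0.05, the chosen alpha
```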
Parametric tests
A parametric test is one that makes assumptions about the parameters (defining properties) of the population distribution from which one's data are drawn.
These tests assume that the data are normally distributed and rely on group means.
Most well-known elementary statistical methods are parametric.
These tests assume more about a given population. When the assumptions are correct, they produce more accurate and precise estimates.
When the assumptions are not correct, these tests have a greater chance of failing, and for this reason they are not robust statistical methods.
Parametric formulae are often simpler to write down and faster to compute; their simplicity, however, does not make up for their lack of robustness.
Examples of Parametric tests are t-tests and ANOVA.
Advantages over Non-parametric tests
1. Statistical power: parametric tests have more statistical power than non-parametric tests, and are thus more likely to detect a significant effect when one really exists.
2. Parametric tests perform well when the spread of each group is different: even though non-parametric tests don't assume that the data follow a normal distribution, they have other assumptions, such as that the data for all groups must have the same spread.
3. Parametric tests can perform well even with skewed and non-normal distributions, provided the data satisfy some guidelines about sample size.
Common parametric tests include the 1-Sample t-test, the 2-Sample t-test and One-way ANOVA.
Student t-test
Developed by W.S. Gosset, a chemist working for the Guinness brewery in Dublin, Ireland ("Student" was his pen name).
It can be used to determine whether two sets of data are significantly different from each other.
A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution if the null hypothesis is supported.
The t-distribution is similar to the standard normal distribution, but it is shorter and flatter than the standard normal (Z) distribution. As the sample size (degrees of freedom) increases, the t distribution approaches the standard normal distribution.
The degrees of freedom are a simple function of the sample size, i.e. (n − 1).
A particular advantage of the t-test is that it does not require any knowledge of the population standard deviation. It can therefore be used to test hypotheses about a completely unknown population, where the only available information comes from the sample.
All that is required for a hypothesis test with the t-test is a sample and a reasonable hypothesis about the population mean.
Types of Student t-test
1. One sample t-test: there is only one group; typically used to compare a sample mean to a known population mean. E.g. a group of schizophrenic patients is assessed on cognitive tests and the group is compared with the population.
2. Paired/Dependent sample t-test: there are two groups and two means which are related to each other, such as two scores for each person, or matched scores. E.g. a sample is tested for metabolic side effects before and after treatment, and the before and after scores are compared to see if there is any difference.
3. Unpaired/Independent/two sample t-test: there are two means from two groups that are not related to each other; used to compare means from independent groups. E.g. two samples of patients, one given a placebo and the other a newer antipsychotic, are compared to assess the utility of the newer antipsychotic.
Assumptions:
1. Data must come from a population that follows a normal distribution.
2. For the two sample t-test, the two populations must have equal variances. If the variances are not equal, Welch's t-test is used.
3. Each score must be independent of all other scores.
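The three variants, and Welch's correction for unequal variances, map onto functions in scipy.stats; a sketch with invented scores (not real study data):

```python
from scipy import stats

before = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]   # hypothetical scores, same patients
after = [5.9, 5.2, 6.4, 6.1, 5.3, 6.2]
controls = [4.6, 5.0, 4.8, 5.2, 4.7, 4.9]

print(stats.ttest_1samp(before, popmean=5.0))  # one sample: compare to a known mean
print(stats.ttest_rel(before, after))          # paired: same subjects tested twice
print(stats.ttest_ind(before, controls))       # unpaired: independent groups
print(stats.ttest_ind(before, controls, equal_var=False))  # Welch's t-test
```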
ANOVA
Types of ANOVA, depending on the number of treatments and the way they are applied to the subjects in the experiment:
1. One-way ANOVA is used to test for differences among three or more independent groups. E.g. Group A is given vodka, Group B is given gin, and Group C is given a placebo. All groups are then tested with a memory task.
2. One-way ANOVA for repeated measures is used when the subjects are dependent groups; this means that the same subjects are used for each treatment. E.g. Group A is given vodka and tested on a memory task. The same group is allowed a rest period of five days and then the experiment is repeated with gin. Again, the procedure is repeated using a placebo.
3. 2×2 ANOVA, the most common type of factorial analysis of variance, is used when the experimenter wants to study the effects of two or more treatment variables. Factorial ANOVA can also be 2×2×2, 3×3, etc., but higher numbers of factors are rarely used because the calculations are lengthy and the results are hard to interpret. E.g. in an experiment testing the effects of expectation of vodka and the actual receiving of vodka, subjects are randomly assigned to four groups: 1) expect vodka-receive vodka, 2) expect vodka-receive placebo, 3) expect placebo-receive vodka, and 4) expect placebo-receive placebo (the last group is used as the control group). Each group is then tested on a memory task. The advantage of this design is that multiple variables can be tested at the same time instead of running two different experiments.
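A sketch of the one-way design above using scipy.stats.f_oneway, with invented memory-task scores for the three groups:

```python
from scipy import stats

vodka = [12, 10, 11, 9, 13, 10]     # hypothetical memory-task scores
gin = [11, 9, 10, 10, 12, 9]
placebo = [15, 14, 16, 13, 15, 14]

f_stat, p_value = stats.f_oneway(vodka, gin, placebo)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value says at least one group mean differs; a post-hoc test
# (e.g. Tukey's HSD) is needed to say which groups differ.
```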
Assumptions of ANOVA
1. The data come from populations that follow a normal distribution.
2. The groups have equal variances (homogeneity of variance).
3. Each observation is independent of all other observations.
4. The dependent variable is measured on an interval or ratio scale.
Non Parametric tests
Many of the tests used for the analysis of data presume that the data have a normal distribution. When the data do not meet a normal distribution, Non-parametric tests are used.
These are tests that don't make any presumption about the distribution of the data.
They work with the Median, which is a much more flexible statistic because it is not affected by outliers.
Advantages of non-parametric tests
1. When the area of study is better represented by the median, that is, the median better represents the centre of the distribution.
2. When the sample size is small: with a small sample size it is not possible to ascertain the distribution of the data, so parametric tests may lack sufficient power to provide meaningful results.
3. Presence of outliers: parametric tests can only assess continuous data and their results can be significantly affected by outliers.
Purpose                                                      Parametric test            Non-parametric test
Testing a mean                                               One sample t test          Sign test/Wilcoxon signed-rank test
Comparison of means of 2 unrelated groups                    Independent sample t test  Mann-Whitney U test
Comparison of means of 2 related samples                     Paired t test              Wilcoxon signed-rank test
Comparison of means of > 2 unrelated samples                 ANOVA                      Kruskal-Wallis test
Comparison of means of > 2 related samples                   Repeated measures ANOVA    Friedman's test
Assessing the relationship between 2 quantitative variables  Pearson's correlation      Spearman's correlation
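As a sketch of how one row of this table behaves in practice, here is the independent t-test next to its non-parametric counterpart, the Mann-Whitney U test, on invented data containing an outlier:

```python
from scipy import stats

group_a = [4, 5, 6, 5, 4, 5]
group_b = [6, 7, 8, 7, 6, 40]   # note the outlier (40)

print(stats.ttest_ind(group_a, group_b).pvalue)     # mean-based, dragged by the outlier
print(stats.mannwhitneyu(group_a, group_b).pvalue)  # rank-based, robust to the outlier
```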
Chi square test
Conditions for applying the chi square test:
3. Frequency data must have a precise numerical value and be organised into groups.
4. All the observations in the sample must be independent.
5. The lowest expected frequency should not be less than 5.
Limitations of chi square test
1. Can be applied only to a fourfold table.
2. Will not give a result if the expected value in any cell is less than 5.
3. If the sample total is less than 50, the result needs to be interpreted with caution.
4. The test only tells about the presence or absence of an association between two events, not the strength of the association.
5. It tells only about the probability of occurrence and does not indicate a cause and effect relationship.
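A sketch of the chi square test on a hypothetical fourfold (2 × 2) table using scipy.stats.chi2_contingency; the returned expected frequencies let one check the "no expected count below 5" condition:

```python
from scipy import stats

# Hypothetical 2 x 2 table: rows = treated/untreated, columns = improved/not improved
table = [[30, 10],
         [18, 22]]

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print(expected)  # every expected count should be at least 5
```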
The use of statistics begins even before the study is started. It starts from designing the study: what type of study (retrospective/prospective, case control/cohort, cross sectional/follow up), how to collect the sample (simple/stratified random, convenience sampling) etc.
Sample size calculation is the most important step before the study is actually started. There are various methods by which it can be calculated, and various software packages are available to do so.
Once the sample size is obtained, the data are collected and have to be transferred onto an MS Excel sheet or an SPSS data sheet.
An SPSS data sheet helps in analysing the data easily. Even an MS Excel data sheet can be converted into an SPSS data sheet without much difficulty using SPSS software.
The data thus obtained are analysed using descriptive and inferential statistics.
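As a sketch of the analysis step in Python with pandas, assuming the data were exported to a hypothetical file study_data.csv with a column named sbp (an Excel sheet can be read similarly via read_excel):

```python
import pandas as pd

# Hypothetical file and column names - substitute your own
df = pd.read_csv("study_data.csv")

# Descriptive statistics: count, mean, SD, min/max and quartiles
print(df["sbp"].describe())
```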