
BIOSTATISTICS AND RESEARCH METHODS

BIOSTATISTICS
STATISTICS - is the science of collecting, summarizing, presenting and interpreting data, and of using them to
estimate the magnitude of associations and test hypotheses
BIOSTATISTICS is a branch of applied statistics that is concerned with the application of statistical methods to
biological events (medicine, clinical trials, demography, population estimation, modelling, community diagnosis
and surveys). When the different statistical methods are applied in biological, medical and public health data, they
constitute the discipline of biostatistics. In general, the purpose of using biostatistics is to gather data that can be
used to provide honest information about unanswered biomedical questions. Biostatistics is now considered an
essential tool in the planning and delivery of health care systems. The knowledge and ability to use bio-statistical
techniques have also become increasingly important in health sciences. The medical practitioner in the 21st century
will need a far greater ability to evaluate new information than in the past. A good understanding of biostatistics
can improve clinical thinking, decision making, evaluations and medical research. The role of biostatistics in
medical education is now well recognised.
Constant – Quantities that do not vary e.g. in biostatistics, mean, standard deviation are considered constant for a
population
• Variable – Characteristics which takes different values for different person, place or thing such as height, weight,
blood pressure
• Parameter – It is a constant that describes a population e.g. in a college there are 40% girls. This describes the
population, hence it is a parameter.
• Statistic – A statistic is a constant that describes the sample e.g. out of 200 students of the same college, 45% are girls. This 45% is a statistic as it describes the sample.
• Attribute – A characteristic on the basis of which the population can be divided into categories or classes e.g. gender, caste, religion.
Essential features of statistics
a. Principles and methods for the collection, presentation, analysis and interpretation of numerical data of different kinds.
1. Observational data, qualitative data.
2. Data that has been obtained by a repetitive operation.
3. Data affected to a marked degree by a multiplicity of causes.
b. The science and art of dealing with variation in such a way as to obtain reliable results.
c. Controlled objective methods whereby group trends are abstracted from observations on many separate
individuals.
d. The science of experimentation which may be regarded as mathematics applied to observational data.
WHY STATISTICS?
 Variability in measurement can be handled using statistics, e.g. each investigator makes observations according to his judgement of the situation (depending upon his skills, knowledge, and experience).
 Statistics provides a way of organizing information on a wider and more formal basis than relying on the
exchange of anecdotes and personal experience.
 There is a great deal of intrinsic (inherent) variation in most biological processes.
 Public health and medicine are becoming increasingly quantitative. As technology progresses, the physician
encounters more and more quantitative rather than descriptive information. In one sense, statistics is the
language of assembling and handling quantitative material. Even if one’s concern is only with the results of
other people’s manipulation and assemblage of data, it is important to achieve some understanding of this
language to interpret their results properly.
 The planning, conduct, and interpretation of much of medical research are becoming increasingly reliant on
statistical technology. Is this new drug or procedure better than the one commonly in use? How much better?
What, if any, are the risks of side effects associated with its use? In testing a new drug how many patients
must be treated, and in what manner, to demonstrate its worth? What is the normal variation in some clinical
measurements? How reliable and valid is the measurement? What is the magnitude and effect of laboratory
and technical error? How does one interpret abnormal values?
 Statistics pervades the medical literature. As a consequence of the increasingly quantitative nature of public
health and medicine and its reliance on statistical methodology, the medical literature is replete with reports
in which statistical techniques are used extensively.
 Epidemiology and Biostatistics are sister sciences or disciplines.
 Epidemiology collects facts relating to a group of population in places, times and situations.
 Biostatistics converts all the facts into figures and at the end translates them into facts, interpreting
the significance of their results.
 Epidemiology and biostatistics both deal with the facts-figures-facts
QUANTITATIVE METHODOLOGY

USES OF BIOSTATISTICS
1. To test whether the difference between two populations is real or by chance occurrence.
2. To study the correlation between attributes in the same population.
3. To evaluate the efficacy of vaccines/drugs.
4. To measure mortality and morbidity.
5. To evaluate the achievements of public health programs/research
6. To fix priorities in public health programs
7. To help promote health legislation and create administrative standards for oral health.
DATA
What is DATA?
Data are facts or figures, or information, especially numerical facts, collected together for reference or information.
Data is the plural of "datum”. The collective recording of observations either numerical or otherwise is called data.
Many of the steps to conducting a field investigation rely on identifying relevant existing data or collecting new
data that address the key investigation objectives.
Types of data
Qualitative
Quantitative
Qualitative data
Observation or information characterized by measurement on a categorical scale (dichotomous, nominal or ordinal
scale).
Data that describe the quality of the subject studied. E.g. gender, ethnicity, death or survival, nationality etc.
Generally described in terms of percentages or proportions. Mostly displayed by using contingency tables, pie
charts, and bar charts.
Quantitative data
Data in numerical quantities such as continuous measurements or counts. Observation for which the differences
between numbers have meaning on a numerical scale. They measure the quantity of something.
Types of numerical scales;
Continuous scale (e.g. Age, height)
Discrete scale (e.g. Number of pregnancies)
Described in terms of means and standard deviation.
Frequency tables and histograms are most often used to display this type of information.
How to analyse DATA?
Using STATISTICS!
A small representative ‘sample’ is used to study a big ‘population’
Why?
Expensive to conduct a very large study
Impossible to collect information from everyone in the population
POPULATIONS AND SAMPLES
POPULATION - is the collection or set of all of the values that a variable may have. The population is a
complete collection of data on the group under study. e.g.: If we are interested in the weights of students
enrolled in Vet. Med at the University of Abuja, then our population consists of the weights of all of these
students, and our variable of interest is the weight.
Population Size (N): The number of elements in the population is called the population size and is denoted by N.
Populations can be thought of as existing or conceptual. Existing populations are well–defined sets of data
containing elements that could be identified explicitly while Conceptual populations are non–existing, yet
visualized, or imaginable sets of measurements. Examples:
Existing populations
a. Red blood cells (RBC) counts of children diagnosed with malaria as of October 30, 2016 at the UATH
Gwagwalada.
b. Amount of active drug in all 50 mg berenil sachet manufactured in October 2009.

c. Presence or absence of prior myocardial infarction (MI) in all male horses between 3 and 6 years of age brought
to VTH Gwagwalada.
Conceptual populations
Could be thought of as the characteristics of all people with a disease, now or in the near future, or as the outcomes of some treatment given to a large group of subjects.
a. Bioavailability of a drug’s oral dose relative to i. v. dose in all healthy subjects under identical conditions.
b. Presence or absence of MI’s in all current and future high blood pressure patients who receive short–acting
calcium channel blockers.
c. Positive or negative result of all pregnant women using a particular type of pregnancy test kit.
Target Population
Target population refers to the ENTIRE group of individuals or objects to which researchers are interested in
generalizing the conclusions. The target population usually has varying characteristics and it is also known as the
theoretical population.
Accessible Population
The accessible population is the population in research to which the researchers can apply their conclusions. This
population is a subset of the target population and is also known as the study population. It is from the accessible
population that researchers draw their samples. The factor which determines the choice of the population is the
problem under investigation. The population should be such that it can provide the most authentic and dependable
data necessary for solving the problem and should be such that the generalizations or conclusions from the study
can validly apply to it. So when a researcher is specifying his research population, he is setting some standard
against which his study will be judged.

SAMPLE
A sample is a part of a population. From the population, we select various elements on which we collect our data.
This part of the population on which we collect data is called the sample. E.g. suppose we are interested in
studying the characteristics of the weights of the students enrolled in Vet. Med. at the Uni. Abuja. If we randomly
select 30 students among the students in Vet. Med. at the Uni. Abuja and measure their weights, then the
weights of these 30 students form our sample. Sample Size (n): The number of elements in the sample is called the
sample size and is denoted by n. A sample is a collection of sampling units selected from the population.
Sampling unit: Is a member of the population.
SAMPLING

Sampling Methods

Sampling Methods can be classified into one of two categories:

 Probability Sampling: Sample has a known probability of being selected

 Non-probability Sampling: Sample does not have a known probability of being selected, as in convenience or
voluntary response surveys

PROBABILITY SAMPLING

In probability sampling, it is possible to both determine which sampling units belong to which sample and the
probability that each sample will be selected. The following sampling methods are examples of probability
sampling:
 Simple Random Sampling

 Stratified Sampling

 Cluster Sampling

 Systematic Sampling
 Multistage Sampling (in which some of the methods above are combined in stages).
NON-PROBABILITY SAMPLING (Purposive selection)

With non-probability sampling methods, we do not know the probability that each population element will be
chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

Non-probability sampling methods offer two potential advantages - convenience and cost. The main disadvantage
is that non-probability sampling methods do not allow you to estimate the extent to which sample statistics are
likely to differ from population parameters. Only probability sampling methods permit that kind of analysis.
Examples of non-probability sampling methods are:

VOLUNTARY SAMPLE

A voluntary sample is made up of people who self-select into the survey. Often, these folks have a strong interest
in the main topic of the survey.
E.g. suppose that a news show asks viewers to participate in an online poll. This would be a voluntary sample.
The sample is chosen by the viewers, not by the survey administrator.

CONVENIENCE SAMPLE

A convenience sample is made up of people who are easy to reach.


Consider the following example. A pollster interviews shoppers at a local mall. If the mall was chosen because it
was a convenient site from which to solicit survey participants and/or because it was close to the pollster's home or
business, this would be a convenience sample.

PROBABILITY SAMPLING METHODS

 SIMPLE RANDOM SAMPLING.

A simple random sample (n) is drawn from a population (N) in such a way that every possible sample of size (n)
has an equal opportunity of being chosen.

Simple random sampling refers to any sampling method that has the following properties.
-The population consists of N objects.
-The sample consists of n objects.
-If all possible samples of n objects are equally likely to occur, the sampling method is called simple random
sampling.
There are many ways to obtain a simple random sample. One way would be the lottery method. Each of the N
population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then,
a blindfolded researcher selects n numbers. Population members having the selected numbers are included in the
sample.
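As a minimal sketch of the lottery method just described, the Python snippet below draws a simple random sample without replacement; the population of 500 student ID numbers and the sample size of 30 are invented for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers of N = 500 students (the population).
population = list(range(1, 501))

# Draw a simple random sample of n = 30 without replacement;
# every possible sample of size 30 is equally likely to be chosen.
sample = random.sample(population, k=30)
print(sorted(sample))
```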
 STRATIFIED SAMPLING
With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each
group, a probability sample (often a simple random sample) is selected. The sample resulting from combining these
samples is called a stratified random sample. In stratified sampling, the groups are called strata. For example,
suppose we conduct a national survey. We might divide the population into groups or strata, based on geography -
north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
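The sketch below illustrates the idea of stratified sampling in Python, grouping a hypothetical population by the geographic strata mentioned above and drawing a simple random sample within each stratum; the population and the per-stratum sample size of 10 are assumptions made only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical population: (person_id, region) pairs, where region is the stratum.
regions = ["north", "east", "south", "west"]
population = [(i, random.choice(regions)) for i in range(1, 401)]

# Group the population into strata.
strata = defaultdict(list)
for person_id, region in population:
    strata[region].append(person_id)

# Within each stratum, draw a simple random sample (here, 10 respondents per stratum).
stratified_sample = {
    region: random.sample(ids, k=min(10, len(ids))) for region, ids in strata.items()
}
print({region: len(ids) for region, ids in stratified_sample.items()})
```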
 CLUSTER SAMPLING
With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is
called a cluster. A sample of clusters is chosen, using a probability method (often simple random sampling). Only
individuals within sampled clusters are surveyed.
Note the difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes
elements from each stratum. With cluster sampling, in contrast, the sample includes elements only from sampled
clusters.
 MULTISTAGE SAMPLING
In multistage sampling, we select a sample by using combinations of different sampling methods. E.g., in Stage 1,
we might use cluster sampling to choose clusters from a population, in Stage 2, we might use simple random
sampling to select a subset of elements from each chosen cluster for the final sample.
SAMPLING ERRORS

• Faulty sample design

• Small sample size

Non-Sampling errors
o Coverage errors- due to non-response or noncooperation of the informant.
o Observational errors: interview bias, imperfect experimental technique.
o Processing errors: Statistical Analysis
DATA PRESENTATION
Two main types of data presentation are:
• Tabulation
• Graphic representation - charts and diagrams
TABULATION
Tables are a simple device used for the presentation of statistical data.
PRINCIPLES:
 Tables should be as simple as possible (two or three small tables are preferable to one large table).
 Data should be presented according to size or importance, chronologically or alphabetically.
 Should be self-explanatory.
 Each row and column should be labelled concisely and clearly.
 A Specific unit of measure for the data should be given.
 The title should be clear, concise and to the point.
 Total should be shown.
 Every table should contain a title as to what is depicted in the table.
 In the small table, vertical lines separating the column may not be necessary.
 If the data are not original, their source should be given in a footnote.
TYPES OF TABLES

Tables may be classified as: Master table, Simple table, or Frequency distribution table.
Master table
Contains all the data obtained from a survey
SIMPLE TABLE
One way tables supply the answer to questions about one characteristic of data only.
FREQUENCY DISTRIBUTION TABLE
A two-column frequency table.
The first column lists the classes into which the data are grouped.
The second column lists the frequency for each classification
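As an illustrative sketch, the Python snippet below builds such a two-column frequency distribution table from a small invented set of discrete observations (number of pregnancies for 20 hypothetical patients).

```python
from collections import Counter

# Hypothetical discrete data: number of pregnancies recorded for 20 patients.
data = [0, 1, 2, 2, 3, 1, 0, 4, 2, 1, 3, 2, 0, 1, 2, 5, 3, 2, 1, 0]

# Two-column frequency distribution table: class (value) and its frequency.
freq = Counter(data)
print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
```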
Charts and diagrams
Most convincing and appealing ways of depicting statistical results.
Principles

1. Every diagram must be given a self-explanatory title.

2. Simple and consistent with the data.

3. The values of the variable are presented on the horizontal or X-axis and the frequencies on the vertical or Y-axis.

4. Number of lines drawn in any graph should not be many.

5. Scale of presentation for X-axis and Y-axis should be mentioned.

6. The scale of the division of both the axes should be proportional and the divisions should be marked along with the details of the variable and frequencies
presented on the axes.
BAR CHARTS
• Represents qualitative data.
• Bars can be either vertical or horizontal.
• Suitable scale is chosen
• Bars are usually equally spaced
• They are of three types:
• Simple bar chart – represents only one variable.
• Multiple bar chart – for each category of a variable there is a set of bars.
• Component/proportional bar chart – each individual bar is divided into two or more parts.
PIE CHART

• Entire graph looks like a pie.

• It is divided into different sectors corresponding to the frequencies.


LINE DIAGRAM
Useful to study changes of values in the variable over time and is the simplest type of diagram.

HISTOGRAM
• Pictorial presentation of a frequency distribution.
• No space between the cells on a histogram.
• Class intervals are given on the horizontal axis and frequencies on the vertical axis.
• The area of a rectangle is proportional to the frequency.

Histograms (quantitative continuous data)

A histogram is the graph of the frequency distribution of continuous measurement variables. It is constructed based on the following principles:
a) The horizontal axis is a continuous scale running from one end of the distribution to the other. It should be labelled with the name of the variable and the units of
measurement.
b) For each class in the distribution a vertical rectangle is drawn, with its base on the horizontal axis extending from one class boundary of the class to the other; there will never be any gap between the histogram rectangles.
The bases of all rectangles will be determined by the width of the class intervals.

Histogram

• Display the frequency distributions of one variable


• Very similar to a bar chart that is used for categorical data
• Consists of a set of columns with no space between each of them
– On the horizontal axis, the variables of consideration
– On the vertical axis, the scale of frequency of occurrence
– The area under each column represents the frequency of each class and thus the total under all columns equals the total frequency
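A minimal plotting sketch of these principles is shown below using matplotlib (assumed to be installed); the patient ages and the choice of five class intervals are invented for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical continuous data: ages (in years) of 20 patients.
ages = [23, 34, 45, 29, 31, 40, 52, 38, 27, 33,
        48, 36, 41, 30, 55, 44, 37, 26, 39, 49]

# Histogram: adjacent columns with no gaps, class intervals on the horizontal axis,
# frequency of occurrence on the vertical axis.
plt.hist(ages, bins=5, edgecolor="black")
plt.xlabel("Age (years)")
plt.ylabel("Frequency")
plt.title("Histogram of patient ages")
plt.show()
```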
HISTOGRAM

• For normally distributed data the histogram is unimodal (a bell-shaped curve)
• Symmetric about its mean (mirror image)
• Mean = Median = Mode
NOTE

In descriptive presentation use Mean (SD) or Median (IQR)


– Normal distribution data, use Mean (SD)
– Skewed distribution data, use Median (IQR)
• In inferential results use Mean (SE)
TYPES OF STATISTICS
 Descriptive statistics
 Inferential statistics

Descriptive statistics
Describe the frequency and distribution of the data collected from a group of samples in order to characterize and represent the population.
E.g. - Percentage of patients attending the diabetes clinic
– Gender, age group, education level of the patients
– Patients waiting time for doctors consultation
– Patients fasting glucose and HbA1c level
– Etc.
VARIABLE
A variable is a characteristic that can take on different values for different members of the group under study e.g. a group of university students will be found to differ in
gender, height, attitudes, intelligence and many ways. These characteristics are called variables.
• Categories of variable
– Continuous vs. Discrete
– Independent vs. Dependent
• Continuous variable
– Can take on any values on the measurement scale under study
– Do not fit into a finite number of categories
– Referred to as measurement data
– E.g. weight, height, age, blood pressure etc.
• Discrete variable
– Only designated values or integer values i.e. 1, 2, 3…
– Fit into limited categories
– Referred to as count data (dichotomous/ multichotomous)
• E.g. dichotomous
– Male-Female
– Yes-No
• E.g. multichotomous
– Malay-Chinese-Indian
Man Utd-Arsenal-Chelsea-Man City-Liverpool

• Independent variable (IV)


– Manipulated based on the purpose of the investigation
– Set by researcher
• Dependent variable (DV)
– Consequence of the independent variable
– Affected by the independent variable
– Outcome
Scales

• Type of scales
– Nominal – classify observation that cannot be numerically arranged (no order)
– Ordinal – assign an order to categories so that one category is higher than another
– Interval / ratio – ordered values (as in ordinal scales) but with equal intervals between them; ratio scales in addition have a true zero
MEASURES OF STATISTICAL AVERAGES OR CENTRAL TENDENCY
• Central value around which all the other observations are distributed.
• Main objective is to condense the entire mass of data and to facilitate the comparison.
• The most common measures of central tendency used in the sciences are:
– MEAN
– MEDIAN
– MODE

Mode
• Most frequently occurring observation in a data set is called the mode
• Not often used in medical statistics.
• Is the most frequently occurring value in a set of discrete data
• Can be more than one mode if two or more values are equally common
E.g.
– Data: 1, 3, 4, 7, 2, 5, 9, 4, 6, 7, 8, 9, 3, 4, 9, 6, 4, 5, 2, 1, 6, 6, 7, 4, 3
– Ordered: 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 9, 9, 9
– Mode = 4 (it occurs five times, more often than any other value)

Median

• The value halfway through the ordered data set


• Generally a good descriptive measure of the location which works well for skewed data or data with outliers
• When all the observations are arranged either in ascending order or descending order, the middle observation is known as the median.
• In case of even numbers the average of the two middle values is taken.
• Median is a better indicator of central value as it is not affected by the extreme values.
• E.g. (n = 25)
– Data: 3, 4, 7, 2, 5, 1, 9, 4, 6, 7, 8, 9, 3, 4, 9, 6, 4, 5, 2, 1, 6, 6, 7, 4, 3
– Ordered data: 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 9, 9, 9
– Median = 5 (the 13th of the 25 ordered values)
Mean

• The sample mean is an estimator available for estimating the population mean.
• Its value depends equally on all of the data which may include outliers.
• Refers to the arithmetic mean
• It is obtained by adding the individual observations and dividing by the total number of observations.
• Advantages – it is easy to calculate.
• Most useful of all the averages.
• Disadvantages – influenced by abnormal values.




• E.g. (n = 10): 3, 4, 7, 2, 5, 7, 5, 5, 1, 2
Mean = (3 + 4 + 7 + 2 + 5 + 7 + 5 + 5 + 1 + 2) / 10 = 41 / 10 = 4.1
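Using the same ten observations, the three measures of central tendency can be checked with Python's standard statistics module, as in the sketch below.

```python
import statistics

# Data from the mean example above (n = 10).
data = [3, 4, 7, 2, 5, 7, 5, 5, 1, 2]

print(statistics.mean(data))    # 4.1
print(statistics.median(data))  # 4.5 (average of the two middle values, since n is even)
print(statistics.mode(data))    # 5  (occurs three times, more often than any other value)
```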

MEASUREMENT OF DISPERSION
• Dispersion is the degree of spread or variation of the variable about a central value.
• Helps to know how widely the observations are spread on either side of the average.
• Used to describe the variability (spread and dispersion) in a given sample
Dispersion measurement;
– Range
– Percentiles
– Variance
– Standard deviation
– Standard error
– Interquartile range
Range
– Defined as the difference between the value of the largest item and the smallest item.
– Gives no information about the values that lie between the extreme values.
– Difference between the highest and the lowest value
Percentiles
– Indicate the percentage of individuals with values equal to or below a given value.
Variance
– Provides information about how individuals differ within sample.
Standard deviation (SD)
– Gives information about the spread/variability of scores around the mean.
The Standard error (SE)
– Indicates the precision (certainty) of the sample mean itself.
Interquartile Range (IQR)
– The distance between the 1st and 3rd quartiles.
Range

• Is the difference between the smallest and largest value in a set of observation
• Range = (the largest value – the smallest value)
– E.g. 3,5,6,7,9,10
– Range = 10 – 3 = 7
• Uses only extreme values and ignores the other values in the data set.
Variance

• Measure spread or dispersion within a set of sample data.


• E.g. for n observations x1, x2, x3, ..., xn with sample mean x̄ = (x1 + x2 + ... + xn) / n,
• the sample variance is s² = Σ(xi − x̄)² / (n − 1)
Standard deviation (SD)

• Most important and widely used measure of studying dispersion.

• The greater the S.D, greater will be the magnitude of dispersion from the mean.

• Smaller S.D means a higher degree of uniformity of the observations.

• Measure of spread or dispersion of a set of data


• Calculated by taking the square root of the variance:
S.D. = √[Σ(xi − x̄)² / (n − 1)]
• The more widely the values are spread out, the larger the standard deviation.
Standard Error (of the Mean)

• The SEM quantifies the precision of the mean: SEM = SD / √n.


• A small SEM indicates that the sample mean is likely to be quite close to the true population mean.
• A large SEM indicates that the sample mean is likely to be far from the true population mean
• A small SEM can be due to a large sample size rather than due to tight data.
Interquartile Range (IQR)

• IQR is the distance between the 1st and 3rd quartiles.
• It is not sensitive to extreme values (outliers).
• Thus, it is usually described together with the median in a skewed distribution of observation.
• Formula: IQR = (Q3 – Q1)
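The sketch below computes these measures of dispersion for the same ten observations used in the central tendency example, using Python's statistics module (statistics.quantiles assumes Python 3.8 or later).

```python
import statistics

# Same sample used in the central tendency example (n = 10).
data = [3, 4, 7, 2, 5, 7, 5, 5, 1, 2]
n = len(data)

data_range = max(data) - min(data)            # Range = largest value - smallest value
variance = statistics.variance(data)          # Sample variance, divisor (n - 1)
sd = statistics.stdev(data)                   # Standard deviation = square root of the variance
se = sd / n ** 0.5                            # Standard error of the mean = SD / sqrt(n)
q1, q2, q3 = statistics.quantiles(data, n=4)  # Quartiles; IQR = Q3 - Q1
iqr = q3 - q1

print(data_range, round(variance, 2), round(sd, 2), round(se, 2), round(iqr, 2))
```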
DRAWING INFERENCE FROM RESEARCH DATA
Statistics could be classified into two: descriptive and inferential statistics. While descriptive statistics deals with the methods and techniques of summarising and
describing information (data), inferential statistics goes beyond mere summarising and description of data. Inferential statistics is concerned with gaining knowledge of
a population’s characteristics from information collected from a random sample of the population. In other words, it is concerned with drawing inference or
generalizations about the characteristics of a population based on data collected from a random sample of that population. Therefore, with inferential statistics, we can
draw conclusions that apply beyond the actual subjects studied and extend to other subjects not studied but which belong to the same population as those studied.

It should be noted that any research whose aim is to draw conclusions that can apply only to the actual elements or subjects studied will be of limited applicability, and this can hardly be the aim of any meaningful research. Rather, any meaningful research should be interested in conclusions that, although based on a limited number of subjects/elements actually studied, could still apply to other
subjects/elements not studied. Generally, this is what we desire to achieve in any research; and only inferential statistics can help us to realize such a desire. To this
extent, inferential statistics has contributed immensely to the development of research by providing more efficient ways of handling data and dealing with complex
problems. Our understanding of the effects has also widened through the application of inferential statistics.

Hypothesis Testing

Drawing inference about a population based on a random sample from that population involves formulation and testing of hypothesis. A hypothesis could be conceived
generally as an informed guess, a hunch or conjecture about the solution of the problem under investigation. However, in inferential statistics, the term assumes a very
specific meaning. In this context, a hypothesis is a guess, a hunch, or a conjecture about one or more population parameters. In other words, any hypothesis to be
subjected to an inferential statistical test must specify the relevant population parameter(s) on which the test is to be based.
The hypothesis tested is usually stated as a Null hypothesis, represented with the symbol H0. A null hypothesis is one which posits that no difference or no relationship exists between two variables. It is a hypothesis of no difference or no effect. For instance, in a study to compare the performance of male and female students in science, a simple null hypothesis could be: 'There is no significant difference between the mean performance of male and female students in science.' Using symbols, this can be expressed thus:

H0: µB = µG   or   µB − µG = 0

Where µB = population mean of male students and µG = population mean of female students.

This null hypothesis is usually tested against what is called the Alternative hypothesis, given the symbol Ha. The alternative hypothesis specifies the possible conditions not included in the null hypothesis, i.e. the conditions under which the null hypothesis does not hold. It is the hypothesis which we accept when the null hypothesis is rejected. In the example above, a (non-directional) alternative hypothesis could be:

There is a significant difference between the mean performance of male and female students in science;

i.e. Ha: µB ≠ µG   or   µB − µG ≠ 0

Other alternative hypotheses (possible conditions not included in the null hypothesis) could be:

(a) The mean performance of male students is significantly higher than the mean performance of female students in science.

i.e. Ha: µB > µG   or   µB − µG > 0

(b) The mean performance of female students is significantly higher than the mean performance of male students in science.

i.e. Ha: µG > µB   or   µG − µB > 0
Decision Rule

In testing a hypothesis, we usually compare the calculated value of the test statistic with a critical or table value of the test statistic. The critical or table value of a test statistic, therefore, serves as a criterion value. This serves as the basis for rejecting or not rejecting the null hypothesis. As a rule, the decision to reject or not reject the null hypothesis depends on whether the calculated value of the test statistic is greater than or less than the critical value.

Decision One

Reject the Null hypothesis if the calculated value of the test statistic is greater than the critical value.
Decision two
Do not reject the Null hypothesis if the calculated value of the test statistic is less than the critical value.
The critical values of the various test statistics are usually read off from standard statistical tables or obtained from statistical software or online.

ERRORS IN DECISION AND LEVEL OF SIGNIFICANCE

When the researcher decides to reject or not to reject the Null hypothesis, he does so fully aware that his decision cannot be perfectly correct. Decisions of this sort are usually characterised by some degree of error, and the researcher is usually keen, not only on reducing such errors, but also on knowing their magnitude. Let us consider the possible situations that can occur when we decide either to reject or not to reject a Null hypothesis. There are four such situations, as follows:

1.A true Null hypothesis is not rejected


2.A true Null hypothesis is rejected
3.A false Null hypothesis is rejected
4.A false Null hypothesis is not rejected
When a true Null hypothesis is not rejected or a false Null
hypothesis is rejected, the correct decision is taken and no error is involved as in 1 and 3 above. However when a true Null hypothesis is rejected or a false Null
hypothesis is not rejected, an incorrect decision is made.
These represent the two types of errors encountered in hypothesis testing. These errors are called Type I and Type II errors.

Type I Error is made when a true Null hypothesis is rejected.
Type II Error is made when a false Null hypothesis is not rejected.

The table below shows the possible conditions in making a decision about the Null hypothesis, including the two types of errors.

STATE OF NATURE              DECISION
                             Reject Null hypothesis      Do not reject Null hypothesis
Null hypothesis is true      Type I Error                Correct decision
Null hypothesis is false     Correct decision            Type II Error


The probability of making type I error is designated alpha (α),
while the probability of making type II error is designated beta (β).
The probability of making a type I error while testing a Null hypothesis is called the level of significance or the alpha level. This represents the amount of risk of error the researcher is willing to tolerate in rejecting the Null hypothesis. This being the case, the researcher therefore specifies an alpha level or level of significance before conducting the test.

The higher the level of significance, the higher the error associated with the decision. For instance, a level of significance of 0.95 or 95% implies that a decision to reject the Null hypothesis stands 95 chances of being in error out of every 100 cases. In other words, such a decision is likely to be correct 5 out of every 100 cases. On the other hand, a level of significance of 0.05 or 5% means that the decision to reject the Null hypothesis is likely to be in error 5 out of every 100 cases. This would mean that such a decision is likely to be correct 95 out of 100 times.

However, we cannot, because of the above fact, make the level of significance too small. If we make it too small, we reduce Type I error but increase Type II error. The determination of the probability associated with Type II error is not easy, and we will not go into it here. Fortunately, in practice, the choice of an alpha level is no longer much of a problem, as the 0.05 and 0.01 levels of significance have come to be accepted as desirable.

Testing Hypothesis about the Difference between Two Population Means when the sample size is large: The Z-test
The Z-test is usually adopted in testing hypotheses about the difference between two population means when the sample size is large. Generally, a sample is considered to be large if its size is equal to or greater than 30; otherwise, the sample is regarded as small.
Assuming x̄1 and x̄2 represent the means of two independent samples, S1 and S2 the corresponding standard deviations, and n1 and n2 the sample sizes, then the Z-test statistic (or ratio) is computed using the formula:

Z = (x̄1 − x̄2) / SDx̄

Where SDx̄ = standard error of the difference between the means:

SDx̄ = √(S1²/n1 + S2²/n2)

so that

Z = (x̄1 − x̄2) / √(S1²/n1 + S2²/n2)

To illustrate the application of Z-test in testing for the significance of the difference between two independent means, let us look at the following example. In a study to
determine the effectiveness of a new instructional method (E-Learning method) relative to the conventional methods, 40 students were assigned to each of the two
methods. The scores obtained by the two groups on the post-test administered after the treatment were as shown below.
Post-exam scores of students exposed to New E- learning method and the Conventional method
E-Learning Method: 23, 18, 22, 15, 14, 16, 13, 11, 20, 19, 28, 17, 20, 17, 12, 24, 14, 10, 17, 14, 8, 21, 19, 23, 26, 18, 23, 22, 14, 20, 14, 17, 19, 14, 24, 24, 29, 20, 12, 16
Conventional Method: 8, 15, 10, 5, 5, 11, 14, 13, 10, 17, 12, 9, 16, 0, 4, 23, 15, 6, 15, 30, 0, 10, 7, 16, 9, 13, 8, 15, 11, 12, 10, 3, 19, 5, 20, 14, 10, 3, 11, 9

The researcher now wants to determine whether those exposed to the E-learning method did better than those exposed to the conventional method. To do this, he has to
formulate and test an appropriate hypothesis about the difference between the means of the two groups of students. The procedures are as follows
Step 1: The appropriate null hypothesis under the Z-test is:
HO: There is no significant difference between the mean score of the students exposed to the E-learning method and the mean score of those exposed to the conventional method, i.e. µE − µC = 0 or µE = µC
Where µE = population mean score of the group exposed to the E-learning method
And µC = population mean score of the group exposed to the conventional method.
The alternative hypothesis against which the null hypothesis will be tested is stated as follows:
HA: There is a significant difference between the mean score of the students exposed to the E-learning method and the mean score of those exposed to the conventional method, i.e. µE ≠ µC or µE − µC ≠ 0, where µE and µC retain their previous meanings.
Step 2: Level of significance
The null hypothesis will be tested at the 0.05 or 5% level of significance.
Step 3: Computation of the test statistic.
The test statistics in this case is the Z ratio. Before we can compute it, we must first calculate the means and the standard deviations of the two groups of students.
We will not go into the computation of the means and the standard deviation here since we are already familiar with the procedure involved. The means and the standard
deviations of the two groups have been calculated to be:
                         E-Learning method    Conventional method
Mean (x̄)                18.18                10.83
Standard deviation (S)   4.95                 5.43
Sample size (n)          40                   40

Verify that the means and standard deviation of the two groups computed in the table above are correct.
Z-test (ratio) can then be computed as follows:

Z = (x̄1 − x̄2) / √(S1²/n1 + S2²/n2)

By substituting:

Z = (18.18 − 10.83) / √(4.95²/40 + 5.43²/40)
  = 7.35 / √(0.61 + 0.74)
  = 7.35 / 1.16
  = 6.34

Step 4: Critical or table value of Z


The critical or table value of Z for a two-tailed test at the 0.05 level of Significance is ±1.96.
Step 5: Decision Rule
Reject the null hypothesis if the Z-calculated is greater than the Z-critical; otherwise, do not reject Ho.
Step 6: Inference or Decision
Since the calculated value of Z (6.34) is greater than the critical value of Z (±1.96), we reject the null hypothesis and uphold the alternative hypothesis.
Step 7: Conclusion
There is a significant difference between the mean performance of the students exposed to the E-learning method and that of the students exposed to the conventional method, in favour of the E-learning method group.
All the pertinent information relating to the test are presented in a table below:
                      Mean    Standard deviation   n    Standard Error   Z-cal   Z-crit
E-Learning method     18.18   4.95                 40   1.16             6.34    1.96
Conventional method   10.83   5.43                 40
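For readers who want to check the arithmetic, the sketch below reproduces the Z computation in Python from the summary statistics above; any small difference from 6.34 in the last decimal place comes only from rounding in the hand calculation.

```python
import math

# Summary statistics from the worked example above.
mean_e, sd_e, n_e = 18.18, 4.95, 40   # E-learning group
mean_c, sd_c, n_c = 10.83, 5.43, 40   # Conventional group

# Standard error of the difference between the two means.
se_diff = math.sqrt(sd_e**2 / n_e + sd_c**2 / n_c)

# Z ratio.
z = (mean_e - mean_c) / se_diff
print(round(se_diff, 2), round(z, 2))   # approximately 1.16 and 6.33

# Decision: reject H0 if |Z| exceeds the two-tailed critical value of 1.96 at alpha = 0.05.
print("Reject H0" if abs(z) > 1.96 else "Do not reject H0")
```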
Testing Hypothesis about the Difference between means when sample Size is small: The t-Test
When the sample size is small (i.e. n<30) the z-test is no longer appropriate for testing the difference between means, in such cases, we use the t-test which has been
devised to take care of small sample cases. The t-distribution was discovered by W.S GOSSET, a young chemist, while he was working on quality control at a brewery in
Dublin, Ireland. He discovered that for small samples, the sampling distribution was markedly different from the normal distribution and that as the sample increased, the
distribution closely approximated the normal distribution. Since he was working for a company, Gosset could not publish his findings under his real name, so he chose the pen name 'Student', from which this distribution derived the name Student's t-distribution.
The procedures involved in carrying out the t-test are essentially the same as for the z-test. Whereas the z-test can be used for large samples only, the t-test (which is a small sample test) can also be used for large samples. If the sample becomes sufficiently large, the t-distribution coincides with the z-distribution.
To be able to find the critical t-value, we must determine what is called the degrees of freedom, in addition to such other considerations as the level of significance and whether the test will be one-tailed or two-tailed.
Degree of freedom
Degrees of freedom refer to the number of ways in which any set of scores is free to vary. This depends on the number of restrictions placed on the set of scores. Consider the following set of 6 scores: 15, 9, 3, 12, 5, and 10, whose mean is 9. Having fixed the mean at 9, we have placed a restriction on this set of scores, hence only 5 of the scores are free to vary. Once the values of any 5 of the scores are determined, the value of the sixth score is fixed, so the sixth score cannot vary. In this case, the set of scores above has 5 degrees of freedom, i.e. (n − 1) degrees of freedom. For a two-sample test, the formula for calculating the degrees of freedom (df) is:

df = n1 + n2 − 2
Where n1 and n2 are the sizes of the two samples.

The t-ratio is calculated using the formula:

t = (x̄1 − x̄2) / √(S1²/n1 + S2²/n2)

Where x̄, S and n denote the mean, standard deviation and number of cases (sample size) for each of the two groups.
Example 1
In a study to investigate the influence of gender on students' achievement in Biostatistics, the following scores were obtained in a Biostatistics achievement test for 400
level DVM students.
Scores of 400 Level DVM students in Biostatistics achievement test
Male Students: 9, 12, 16, 15, 5, 15, 10, 18, 18, 20, 26, 10, 8, 19
Female Students: 6, 10, 12, 18, 13, 17, 11, 9, 19, 5, 15, 10
The researcher wants to test if gender is a significant factor in the achievement of 400 level DVM students in Biostatistics.
Since n < 30, the appropriate test statistic is the t-test. Here are the steps involved in carrying out the t-test of difference between means;

Step one: Formulate an appropriate null and alternative hypothesis

 Ho: There is no significant difference between the mean Biostats achievement of male and female 400 L DVM students.

 Ha: There is a significant difference between the mean Biostats achievement of male and female 400 DVM students.

Step two: Choose an alpha level (level of significance). The test will be conducted at 0.05 level of significance,

Step three: Calculate the test statistic, namely the t-ratio, using the formula:

t = (x̄m − x̄f) / √(Sm²/nm + Sf²/nf)

Where:
x̄m = mean for male students
x̄f = mean for female students
Sm, Sf = standard deviations for male and female students
nm = number of male students
nf = number of female students
From this example, these have been calculated as:

                         Male     Female
Mean (x̄)                15.00    12.08
Standard deviation (S)   5.79     4.50
Total Number (n)         14       12

Therefore: t = (15.00 − 12.08) / √(5.79²/14 + 4.50²/12)
             = 2.92 / √(2.39 + 1.69)
             = 2.92 / √4.08
             = 2.92 / 2.02 = 1.45

Step four: Having computed the test statistic, we now determine the critical value (table value) of that test statistic. In the case of the t-test, we first calculate the
degree of freedom (df) using the formula.
 df = n1 + n2 - 2
= 14 + 12 – 2
= 26 – 2 = 24
df = 24
With the df =24 and 0.05 level of significance, reference is made to the table for the t-distribution for a two-tailed test as suggested by the non-directional alternative
hypothesis. At 0.05 level of significance and 24 df for two-tailed test, critical or table value of t=2.064.
Step five: We now state our decision rule as follows: Reject Ho in favour of Ha if the calculated value of t exceeds the critical (table) value. Otherwise do not
reject Ho.
Step six: Inference/Decision
The calculated t-value is 1.45 while the critical (table) value is 2.064, since the calculated t-value is less than the critical (table) value, we do not reject the null
hypothesis (we accept the null hypothesis).
Step seven: Conclusion
The result of the test suggests that the observed difference between the mean achievement of the 400 level DVM male and female students in Biostats is not statistically significant. The probability that the observed difference resulted from sampling errors is high (i.e. greater than 0.05). We therefore conclude that there is no significant difference between the mean achievements of male and female 400L DVM students in Biostats. Any differences observed are such that they could have arisen from sampling errors.
The table containing relevant information from the test is presented below:

Two - tailed t-test of difference between mean of male and female 400 Level DVM in Biostats

          Mean    Standard deviation   n    Degrees of freedom   Standard Error   t-cal*   t-crit**
Male      15.00   5.79                 14   24                   2.02             1.45     2.064
Female    12.08   4.50                 12

Assuming the researcher had formulated a directional alternative hypothesis, a one-tailed test would have been required. Let us test our null hypothesis against the following alternative hypothesis.
Ha: The mean achievement of male 400L DVM students is significantly higher than the mean achievement of their female counterparts in Biostats.
The steps involved in the one-tailed test are the same as those of the two-tailed test; the only difference is in the critical (table) value of the test statistic. Our level of significance is 0.05 as previously.

Computation of Test Statistic


The test statistic is computed as in the case of the two-tailed test, so t-cal = 1.45.
Critical (table) value
The critical (table) value of a one-tailed test is different from that of a two-tailed test. For a one-tailed test at a level of significance of 0.05 and 24 df, the critical (table) value = 1.711.
Decision rule
Reject Ho in favour of Ha if the calculated value of the test statistic is greater than the critical (table) value; otherwise do not reject Ho.
Decision/inference
The calculated t-value is 1.45 while the critical (table) value is 1.711. Since the calculated value of t is less than the critical value, we do not reject the null hypothesis.

Conclusion
The mean achievement of male 400 L DVM students is not significantly higher than that of their female counterparts in Biostats.

The table containing relevant information from the test is presented below:

One–tailed t-test of difference between mean of male and female 400 Level DVM in Biostats

          Mean    Standard deviation   n    Degrees of freedom   Standard Error   t-cal*   t-crit**
Male      15.00   5.79                 14   24                   2.02             1.45     1.711
Female    12.08   4.50                 12
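The calculations above can be reproduced in Python from the summary statistics, as in the sketch below; it assumes SciPy is available and uses the same unpooled formula as the notes, with scipy.stats.t.ppf supplying the critical values instead of a printed table.

```python
from scipy import stats

# Summary statistics from the gender and Biostatistics example above.
mean_m, sd_m, n_m = 15.00, 5.79, 14   # male students
mean_f, sd_f, n_f = 12.08, 4.50, 12   # female students

# t ratio using the same (unpooled) formula as in the notes.
se_diff = (sd_m**2 / n_m + sd_f**2 / n_f) ** 0.5
t_cal = (mean_m - mean_f) / se_diff
print(round(t_cal, 2))                                # approximately 1.45

# Degrees of freedom and critical values at alpha = 0.05.
df = n_m + n_f - 2                                    # 24
t_crit_two_tailed = stats.t.ppf(1 - 0.05 / 2, df)     # approximately 2.064
t_crit_one_tailed = stats.t.ppf(1 - 0.05, df)         # approximately 1.711
print(round(t_crit_two_tailed, 3), round(t_crit_one_tailed, 3))

# In both tests the calculated t (1.45) is below the critical value, so H0 is not rejected.
print("Reject H0" if abs(t_cal) > t_crit_two_tailed else "Do not reject H0")
```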
CHI- SQUARE TEST (X2)
In medical or biomedical research, we are often interested in studying the association between two or more variables. The relationship between two or more continuous
variables can be studied by the methods of correlation, regression etc.
In some cases we may want to study the association between two discrete variables like smoking and lung cancer, sometimes the association between a continuous
variable grouped into categories; mild, moderate and severe.
The Chi-Square test (X2) is a non-parametric inferential statistical method used in the analysis of frequencies or nominal data. As a non-parametric statistic, it makes no
restrictive assumption about the distribution of scores in question and so it can be used where the assumption of parametric statistics about the distribution are not
satisfied. Consequently the statistic, has found extensive application in the field of medical or biomedical research and other sciences particularly in the analysis of data in
the form of frequencies or categories.
The Chi-square test is a two-tailed test; it can only indicate whether or not a set of observed frequencies differs significantly from the corresponding set of expected frequencies, and not the direction in which they differ. The general formula for the computation of the Chi-square statistic is:
X2 = Σ [(O − E)² / E]
Where O = Observed frequency
E = Expected or theoretical frequency
And Σ = Sum of
The above formula means that we have to obtain the expected or theoretical frequencies first. The expected frequencies are those frequencies which occur under the null
hypothesis, while the observed frequencies correspond to the frequencies obtained by direct observation of the phenomenon or event under consideration. Having obtained
the expected frequencies, we then, calculate the square of the differences between the observed frequencies and the expected frequencies. The squared differences are then
divided by the corresponding expected frequencies and the ratios summed up to get X 2. The obtained calculated X2 is then compared with the critical (table) value. If the
calculated X2 value is greater than the critical (table) value, we then reject the null hypothesis, otherwise, we do not reject the null hypothesis.
To find the critical (table) value, we have to decide on the level of significance (alpha level) and calculate the associated degrees of freedom, just as we did in the t-test. The critical or table value of X2 can be obtained from the sampling distribution table of X2 for df ≤ 30; for df > 30, it can be calculated.
The Chi square test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
The purpose of the test is to evaluate how likely the observations that are made would be, assuming the null hypothesis is true.
The Chi-square test, unlike other tests of significance such as the Z and t tests, is a nonparametric test not based on any assumption about the distribution of the variable. This statistic, though different, also follows a specific distribution known as the Chi-square distribution, and it is very useful in research. It is most commonly used when data are in frequencies, such as the number of responses in two or more categories.
APPLICATION OF CHI- SQUARE
The test involves the calculation of a quantity called Chi-square (X2), named after the Greek letter 'Chi' (χ) and pronounced 'Kye'. It was developed by Karl Pearson and has the following three common but very important applications in biostatistics, as tests of:
1. Proportion (significance).
2. Association (Independence).
3. Goodness of fit.
Test of Proportion (Significance)
Used as an alternative test to find the significance of the difference between two or more proportions. The Chi-square test is yet another very useful test which can be applied to the same type of data, with two additional advantages:
i. To compare the values of two samples even if they are small (less than 30). The test can still be applied provided a correction factor (Yates' correction) is applied and the expected value is not less than 5 in any cell.
ii. To compare the frequencies of two multinomial samples.
Test of Association (Independence)
The test of association between two events in binomial or multinomial samples is the most important application of the X 2 test in statistical methods. It measures the
probability of association between two discrete attributes. Two events can often be studied for their association such as smoking and cancer, treatment and outcome of
disease, vaccination and immunity, etc.
There are two possibilities, either they influence or affect each other or they do not. In other words they are either independent of each other or they are dependent on each
other, i.e. associated.
The X2 test has an added advantage. It can be applied to find association or relationship between two discrete attributes when there are more than two classes or groups as
happens in multinomial samples, e.g. to test the association between number of cigarettes and incidence of lung cancer, etc.
The X2 Goodness of Fit Test
Chi-square (X2) test is also applied as a test of "goodness of fit", to determine if actual numbers are similar to the expected or theoretical numbers. The goodness of fit test
is employed to indicate whether or not a set of observed frequencies fits closely the expected (theoretical) frequencies.
If the calculated X2 value of the sample is found to be higher than the critical value at the chosen level of significance, e.g. a probability of 0.05, the hypothesis of no difference (Ho) between two proportions, or the hypothesis of independence (Ho) of two characters, is rejected. If the calculated value is lower, the hypothesis of no difference is not rejected, and we conclude that the difference is due to chance or that the two characters are not associated. The exact probability for a larger or smaller calculated value of X2 is found from the table of significant values. It may be 0.1, 0.01, 0.001 or somewhere in between. The level of significance of the X2 value may be stated in percentages as 5%, 1% and so on, instead of as the probability of occurrence by chance given in the table (P = 0.05 or 0.01 and so on).
Example
In a study to determine the course preference of a group of 120 Vet students, it was observed that 65 preferred Biostatistics while 55 preferred Microbiology, as shown in the table below.
Subject preference of 120 Vet Medicine students

COURSE PREFERENCE
Biostatistics   Microbiology
65              55

The researcher is interested in testing whether or not these frequencies fit the expected frequencies under the null hypothesis that the observed subject preference pattern of
the Vet students is due to chance.
The expected frequencies are those that will occur if the null hypothesis is true. In this case, if the null hypothesis is true, we would expect 50% of the Vet students to show preference for Biostat and the other 50% to show preference for Microbiology. Accordingly, the expected frequencies will be 60 for Biostat and 60 for Microbiology.
Observed and expected frequencies of course preference of 120 Vet students

                       COURSE PREFERENCE
                       Biostat    Microbiology
Observed Frequency     65         55
Expected Frequency     60         60
We then state the Ho and Ha hypothesis as follows:
Ho = the subject preference of Vet students is due to chance
Or Vet students do not show any preference for either Biostat or Microbiology
Ha = the subject preference pattern of Vet student is not due to chance.
Or Vet students show more preference for either Biostat or Microbiology.
We then choose the 5% (0.05) level of significance for this test. To compute the X2, we apply the formula:
X2 = Σ [(O − E)² / E]
Where O = observed frequency, E = expected or theoretical frequency, and Σ = sum of.

X2 = (65 − 60)²/60 + (55 − 60)²/60
   = 5²/60 + (−5)²/60
   = 25/60 + 25/60
   = 0.4167 + 0.4167
   = 0.83 (approximately)
Now, to find the critical X2 we first determine the degrees of freedom (df). For a goodness of fit test, df = (number of categories − 1), so the df in this case = 2 − 1 = 1. We then refer to the table of the sampling distribution of X2 at the 0.05 level of significance and 1 df to obtain the critical X2 as 3.84.

The calculated X2 = 0.83 and the critical (table) value of X2 = 3.84.


Since the calculated value is less than the critical value, we do not reject the Ho. In other words, Vet students did not exhibit any particular course preference. The observed course preference pattern could be said to have arisen from chance factors. Therefore the observed frequencies are a good fit to what we would expect if Vet students did not have a preference for any course.
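A quick check of this goodness of fit test in Python is sketched below; it assumes SciPy is installed and uses scipy.stats.chisquare, which computes the same Σ(O − E)²/E statistic and also returns a p-value.

```python
from scipy.stats import chisquare

# Observed course preferences of the 120 Vet students.
observed = [65, 55]
# Expected frequencies under H0 (no preference): 60 and 60.
expected = [60, 60]

result = chisquare(f_obs=observed, f_exp=expected)
print(round(result.statistic, 2))   # approximately 0.83, as computed by hand above
print(round(result.pvalue, 3))      # well above 0.05, so H0 is not rejected
```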

Let us consider another example involving the goodness of fit test. In the admission policy of Federal Universities in Nigeria, the following set of criteria is adopted.

Merit = 40%
Locality = 30%
Educationally less developed states (ELDS) = 20%
University discretion = 10%
In the University of Abuja, in 2020, out of a total of 250 candidates offered admission, 80 were offered admission on the basis of Merit, 70 on the basis of locality, 65 on
the basis of ELDS and 35 on the basis of University discretion. Let us test the null hypothesis which states that the distribution of candidates offered admission into the
University of Abuja does not differ from that stipulated by the admission policy of Universities in Nigeria. The alternative hypothesis would be that the candidates offered
admission into the University of Abuja differ significantly from that stipulated in the admission policy.
We now compute the expected frequencies based on the null hypothesis. If the null hypothesis is true, the expected frequency of the candidates to be admitted under the
various criteria will be:
Merit = 40% of 250 = (40/100) × 250 = 100
Locality = 30% of 250 = (30/100) × 250 = 75
ELDS = 20% of 250 = (20/100) × 250 = 50
University Discretion = 10% of 250 = (10/100) × 250 = 25
Both the observed and expected frequencies are presented in a table as follows:
Observed and Expected Frequencies of Admission into a University

              Merit   Locality   ELDS   University Discretion
Observed (O)  80      70         65     35
Expected (E)  100     75         50     25

The X2 is now computed as follows:

X2 = Σ [(O − E)² / E]
   = (80 − 100)²/100 + (70 − 75)²/75 + (65 − 50)²/50 + (35 − 25)²/25
   = 4.0 + 0.33 + 4.5 + 4.0
   = 12.83

The calculated X2 = 12.83. The associated degrees of freedom = (number of categories − 1) = 4 − 1 = 3. At the 0.05 level of significance and 3 df, the critical X2 = 7.815.
The calculated X2 is greater than the critical X2; we therefore reject the null hypothesis, which implies that there is a significant discrepancy between the frequencies of candidates admitted under the various categories and those specified in the admission policy.
The X2 Test of Independence
Another major and popular application of the X2 test is testing for the independence of two variables. In this case, two factors or variables, each having two or more levels/categories, are involved and the researcher wants to test whether or not the two variables are dependent on each other. The table in which the observed and expected frequencies associated with the various levels of the two variables are presented is called the contingency table. It is referred to as a contingency table because it displays data associated with two variables that are possibly contingent upon (i.e. dependent on) one another. A contingency table is usually named by the number of rows (R) and number of columns (C) it has, as an R×C contingency table. For instance, if there are 3 rows and 5 columns, the contingency table will be called a 3×5 contingency table.
In a contingency table it is customary to put the expected frequency in the same cell as the corresponding observed frequency. However, the expected frequency is then differentiated by enclosing it in a bracket. Consider the following example:
EXAMPLE: A researcher studying the attitude to science of 300 students from different socio-economic backgrounds on a 4-point Likert-type scale, obtained the
following information:
Attitude to Science of 300 students from different Socio-economic Backgrounds

Socio-economic status    Strongly agree    Agree    Disagree    Strongly disagree    Total
High                     20                15       30          15                   80
Middle                   40                35       15          10                   100
Low                      50                39       18          13                   120
Total                    110               89       63          38                   300

The researcher is interested in knowing whether or not science attitude is dependent on the socio-economic status of the students. The appropriate hypotheses under the
X2 test of independence are formulated as follows:
Ho: The attitude of students towards science is independent of their socio-economic status.
Ha: The attitude of students towards science is dependent on their socio-economic status.
Let us choose the 0.05 level of significance for testing the hypothesis. We now calculate the X 2. In doing this the expected frequencies have to be calculated first. In a
contingency table, the expected frequency of each cell is calculated as follows:
E(RC) = (fR × fC) / N
Where E (RC) = Expected frequency of the cell
fR = total row frequency
fC = total column frequency
N = total frequency
Let us compute the expected frequencies for this example
Row 1, Cell 1: E = (80 × 110)/300 = 29.33
Row 1, Cell 2: E = (80 × 89)/300 = 23.73
Row 1, Cell 3: E = (80 × 63)/300 = 16.80
Row 1, Cell 4: E = (80 × 38)/300 = 10.13
Row 2, Cell 1: E = (100 × 110)/300 = 36.67
Row 2, Cell 2: E = (100 × 89)/300 = 29.67
Row 2, Cell 3: E = (100 × 63)/300 = 21.00
Row 2, Cell 4: E = (100 × 38)/300 = 12.67
Row 3, Cell 1: E = (120 × 110)/300 = 44.00
Row 3, Cell 2: E = (120 × 89)/300 = 35.60
Row 3, Cell 3: E = (120 × 63)/300 = 25.20
Row 3, Cell 4: E = (120 × 38)/300 = 15.20
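As a cross-check, the whole table of expected frequencies can be generated at once, since each expected value is simply (row total × column total)/N. A minimal sketch in Python (assuming NumPy is available) is:

    import numpy as np

    observed = np.array([[20, 15, 30, 15],
                         [40, 35, 15, 10],
                         [50, 39, 18, 13]])   # the observed 3 x 4 attitude table

    row_totals = observed.sum(axis=1)          # 80, 100, 120
    col_totals = observed.sum(axis=0)          # 110, 89, 63, 38
    N = observed.sum()                         # 300

    expected = np.outer(row_totals, col_totals) / N   # E(RC) = fR x fC / N for every cell
    print(np.round(expected, 2))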

These expected frequencies are now presented along with the corresponding frequencies in a 3 x 4 contingency table as shown below.
A 3 x 4 Contingency table

Socio-economic status    Strongly agree    Agree         Disagree      Strongly disagree    Total
High                     20 (29.33)        15 (23.73)    30 (16.80)    15 (10.13)           80
Middle                   40 (36.67)        35 (29.67)    15 (21.00)    10 (12.67)           100
Low                      50 (44.00)        39 (35.60)    18 (25.20)    13 (15.20)           120
Total                    110               89            63            38                   300

The figures in brackets are the expected frequencies. Next, we compute the X2 as follows:
X2 = Σ (O – E)2 / E
= (20 – 29.33)2/29.33 + (15 – 23.73)2/23.73 + (30 – 16.80)2/16.80 + (15 – 10.13)2/10.13
+ (40 – 36.67)2/36.67 + (35 – 29.67)2/29.67 + (15 – 21.00)2/21.00 + (10 – 12.67)2/12.67
+ (50 – 44.00)2/44.00 + (39 – 35.60)2/35.60 + (18 – 25.20)2/25.20 + (13 – 15.20)2/15.20
= 2.97 + 3.21 + 10.37 + 2.34 + 0.30 + 0.96 + 1.71 + 0.56 + 0.82 + 0.32 + 2.06 + 0.32 = 25.94
i.e. the calculated X2 =25.94

To determine the critical X2 value, we first determine the associated degrees of freedom. The degrees of freedom in a contingency table are given by:
df = (R – 1)(C – 1)
Where R = number of rows
C = number of columns
Therefore, in the present case, df = (3 – 1) × (4 – 1) = 2 × 3 = 6.
We now refer to the table of the sampling distribution of X2 for 6 df at the 0.05 level of significance. The critical X2 for 6 df and the 0.05 level of significance is 12.592.
X2 Cal. = 25.94
X2 Crit. = 12.592

The calculated value exceeds the critical value, hence we reject the null hypothesis. This implies that the attitude of students to science is dependent on their socio-
economic status. In other words, students from different socio-economic backgrounds hold different attitudes towards science.
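The same test of independence can be run directly by software. A minimal sketch in Python (assuming NumPy and SciPy are available) is:

    import numpy as np
    from scipy.stats import chi2_contingency, chi2

    observed = np.array([[20, 15, 30, 15],
                         [40, 35, 15, 10],
                         [50, 39, 18, 13]])   # observed 3 x 4 attitude table

    stat, p_value, df, expected = chi2_contingency(observed)   # X2, p-value, df and expected frequencies
    critical = chi2.ppf(0.95, df=df)                           # critical X2 at the 0.05 level, df = 6

    print(round(stat, 2), df, round(critical, 3), round(p_value, 4))
    # stat is about 25.9 (the small difference from 25.94 is rounding in the hand calculation),
    # which exceeds 12.592, so the null hypothesis of independence is rejected

Note that chi2_contingency also returns the full matrix of expected frequencies, so it can replace the cell-by-cell computation shown earlier.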
Computation of X2 from a 2×2 Contingency table
Assuming we have a 2×2 Contingency table whose cell frequencies and marginal totals are represented as follows:
a b k
c d l
m n N

With a, b, c, d as the cell frequencies, k, l, m, n as the marginal frequencies and N as the total frequency.
To compute X2 from this kind of contingency table, we use the formula:
X2 = N(ad – bc)2 / (k × l × m × n)
Example:
In a study, the opinion of male and female students was sought on the introduction of artificial insemination in all food animals in Nigeria and the following data were
obtained.
Yes No Total
Male 50(a) 46(b) 96(k)
Female 70(c) 61(d) 131(l)
Total 120(m) 107(n) 227(N)

X2 = N (ad – bc) 2
klmn

= [227 × (50×61 – 46×70)2] / (96 × 131 × 120 × 107)
= (227 × 28,900) / 161,475,840
= 6,560,300 / 161,475,840
= 0.04

At the 0.05 level of significance and 1 df, the critical X2 = 3.841.


So, X2 Cal. < X2 Crit., hence we do not reject the null hypothesis. The opinion of the students was therefore independent of sex.
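As a check, the shortcut formula can be evaluated in a few lines of plain Python (a minimal sketch; the variable names follow the a, b, c, d notation above):

    a, b = 50, 46          # Male: Yes, No
    c, d = 70, 61          # Female: Yes, No

    k, l = a + b, c + d    # row totals: 96, 131
    m, n = a + c, b + d    # column totals: 120, 107
    N = a + b + c + d      # 227

    x2 = N * (a * d - b * c) ** 2 / (k * l * m * n)
    print(round(x2, 4))    # about 0.04, well below the critical value of 3.841

This shortcut gives the uncorrected X2 for a 2×2 table; scipy.stats.chi2_contingency applies Yates' continuity correction by default when df = 1, so correction=False would be needed to reproduce this value.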
BASIC STEPS IN WRITING A PROJECT PROPOSAL
A research proposal is a document that describes the essential features of a study to be conducted in the future, as well as the strategy whereby the inquiry may be logically
and successfully accomplished. Steps involved include:
 Determining the general topic

 Performing a Literature review on the topic

 Identifying a gap in the literature;

 Identifying a problem highlighted by the gap in the literature and framing a purpose for the study.

A Typical Layout of a Project Proposal

1.0 Introduction
1.1 Statement of the research problems
1.2 Research questions/Hypothesis
1.3 Justification of the study
1.4 Aim of the study
1.5 Objectives of the study

2.0 Methodology
2.1 Study location
2.2 Sampling
2.3 Sampling method
2.4 Sample size determination
2.5 Sample collection
2.6 Sample processing
2.7 Sample analysis/laboratory analysis
2.8 Statistical analysis
2.9 Ethical consideration
2.10 Budget for the study
2.11 Time frame for the study
2.12 Expected outcome/impact of the study

3.0 References
Outline of a Typical Format of a Project
 Title Page
 Abstract
- Stress content not intent
- Assume a knowledgeable reader
- Write the abstract last
- Avoid passive voice
- Keep it short, not more than 500 words
- Make quantitative and not qualitative statements
- Do not use equations or other mathematical notations
- Empathise with the first-time reader
 Keywords
 List of figures and tables
 Acknowledgements
 Table of contents
 Statement of Original authorship
1.0 Introduction
1.1 Background of research
1.2 Research problem
1.3 Research questions/Hypotheses
1.4 Justification for the research
2.0 Literature Review
3.0 Methods and materials
4.0 Results and Discussion
5.0 Conclusion and suggestions for further work
6.0 References
Appendices
There may be some slight changes in the structure, but you will need to justify it.
BASIC STRUCTURE OF A PROJECT
QUESTION                                                    SECTION OF PROJECT
1. Why am I doing it?                                       Introduction: Significance
2. What is known?                                           Review of research
3. What is unknown?                                         Identifying gaps
4. What do I hope to discover?                              Aims
5. How am I going to discover it?                           Methodology
6. What have I found?                                       Results
7. What does it mean?                                       Discussion
8. So what? What are the possible applications or           Conclusion
   recommendations? What contribution does it make
   to knowledge? What next?                                 Recommendation for further research
