Biostatistics Notes for MRCOG Part 1
Definition of Biostatistics:
 Biostatistics is the application of statistical methods to biological, medical, and health sciences to analyze
and interpret data.
Data refers to a collection of facts, observations, measurements, or information that can be analyzed,
interpreted, or used for various purposes. Data can take different forms, such as numbers, text, images,
or sounds, and it is typically collected for analysis in fields such as research, business, and science.
There are two main types of data:
1. Qualitative Data: Descriptive data that can be categorized but not measured (e.g., colors, names, labels).
2. Quantitative Data: Numerical data that can be measured and quantified (e.g., age, height, weight).
Data is often processed and analyzed using statistical tools to extract meaningful insights and support
decision-making.
Importance of Biostatistics in Medicine:
 Helps in evidence-based medicine and clinical decision-making.
 Essential for clinical trials and evaluating treatment efficacy.
 Supports public health policies and disease surveillance.
 Aids in designing and analyzing medical research studies.
Research Process Steps:
1. Identify Research Question – Define the objective clearly.
2. Literature Review – Review existing studies.
3. Formulate Hypothesis – Make a testable prediction.
4. Study Design – Choose appropriate methodology.
5. Data Collection – Gather relevant data.
6. Data Analysis – Use statistical tools for interpretation.
7. Results Interpretation – Draw conclusions.
8. Publication – Share findings.
Sample and Population:
 Population: The entire group being studied (e.g., all pregnant women).
 Sample: A subset of the population selected for study (e.g., 200 pregnant women from a hospital).
Variable:
 A characteristic or factor that can vary among individuals in a study (e.g., age, weight, blood pressure).
Types of Variables:
A. Qualitative Variables (Categorical):
 Represent categories or groups.
1. Nominal: No natural order (e.g., blood group: A, B, AB, O).
2. Ordinal: Categories have a natural order, but the intervals between them are not necessarily equal (e.g., pain scale: mild, moderate, severe).
3. Binary: Only two possible categories (e.g., yes/no, male/female).
B. Quantitative Variables (Numerical):
 Represent measurable quantities.
1. Discrete: Countable numbers (e.g., number of pregnancies).
2. Continuous: Measured on a continuous scale (e.g., height, weight, blood pressure).
Continuous Variables (Interval and Ratio Scales):
Interval Variable:
o Numeric values with equal intervals between measurements.
o No true zero point (zero does not mean "absence of the variable"), so ratios are not meaningful: 20 °C is not "twice as hot" as 10 °C.
o Example: Temperature in Celsius or Fahrenheit, IQ scores.
Ratio Variable:
o Numeric values with equal intervals and a true zero point (zero indicates "absence of the variable").
o Allows for meaningful ratios (e.g., 80 kg is twice as heavy as 40 kg).
o Example: Weight, height, age, blood pressure, income.
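A quick arithmetic check makes the "true zero" distinction concrete. The Python sketch below uses illustrative temperatures and weights (not study data) and compares a ratio taken on the Celsius scale with the same ratio on the Kelvin scale, which does have a true zero (absolute zero).

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin; 0 K is absolute zero, a true zero point."""
    return celsius + 273.15

# Interval scale: the ratio depends on where zero happens to be placed.
celsius_ratio = 20 / 10
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Ratio scale: zero means "no weight", so the ratio is meaningful.
weight_ratio = 80 / 40

print(f"20 °C / 10 °C on the Celsius scale: {celsius_ratio:.3f}")
print(f"Same temperatures in Kelvin:       {kelvin_ratio:.3f}")
print(f"80 kg / 40 kg:                     {weight_ratio:.3f}")
```

The Celsius ratio of 2 disappears once the temperatures are expressed on a scale with a true zero, whereas the weight ratio keeps its meaning.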
Hypothesis
 A hypothesis is a proposed explanation or assumption made based on limited evidence as a starting point for
further investigation.
 It is a testable statement that predicts the relationship between two or more variables.
Characteristics of a Good Hypothesis:
 Clear and specific
 Testable and falsifiable
 Based on existing knowledge or literature
 Predicts a relationship between variables
Null Hypothesis (H₀) and Alternative Hypothesis (H₁)
Null Hypothesis (H₀):
1. States that there is no association, difference, or effect between variables being studied.
2. Assumes any observed effect is due to chance.
3. Example: "There is no difference in pregnancy outcomes between Group A and Group B."
Alternative Hypothesis (H₁ or Hₐ):
1. States that there is an association, difference, or effect between variables.
2. Opposes the null hypothesis.
3. Example: "There is a difference in pregnancy outcomes between Group A and Group B."
Examples of Null and Alternative Hypotheses
Example 1: Drug Efficacy
 Null Hypothesis (H₀): "There is no difference in blood pressure control between patients taking Drug A and those
taking a placebo."
 Alternative Hypothesis (H₁): "Patients taking Drug A have better blood pressure control than those taking a
placebo."
Example 2: Exercise and Weight Loss
 Null Hypothesis (H₀): "Regular exercise has no effect on weight loss in obese individuals."
 Alternative Hypothesis (H₁): "Regular exercise leads to weight loss in obese individuals."
Example 3: Smoking and Pregnancy Outcomes
 Null Hypothesis (H₀): "Smoking during pregnancy does not affect birth weight."
 Alternative Hypothesis (H₁): "Smoking during pregnancy reduces birth weight."
Rejecting or Failing to Reject the Null Hypothesis:
 Reject Null Hypothesis (H₀): When the p-value is less than the significance level (α), H₀ is rejected, indicating
a statistically significant result.
 Fail to Reject Null Hypothesis: When the p-value is greater than or equal to the significance level (α), there is
insufficient evidence to reject H₀. This is sometimes loosely called "accepting" H₀, but it does not prove that H₀ is true.
P-Value:
 The probability of observing results at least as extreme as the study data if H₀ is true.
 Low p-value (< 0.05): Evidence against H₀ → Reject H₀.
 High p-value (≥ 0.05): Insufficient evidence against H₀ → Fail to reject H₀.
Probability (P):
 Measures the likelihood of an event occurring, expressed as a value between 0 and 1.
 0: Impossible event.
 1: Certain event.
Significance Level (α):
 The threshold probability for rejecting H₀, typically set at 0.05 (5%).
 If p-value < α: Reject H₀ (significant result).
 If p-value ≥ α: Fail to reject H₀ (not significant).
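The p-value and the α decision rule can be sketched in a few lines of Python. This assumes a test statistic z that follows a standard normal distribution when H₀ is true (as in a z-test); the three z values are hypothetical, and other tests (t, chi-square) would need their own distributions.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value: P(|Z| >= |z|) for Z ~ N(0, 1), i.e. assuming H0 is true."""
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p_value, alpha=0.05):
    """Reject H0 only if p < alpha; otherwise fail to reject (never 'accept')."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical z statistics from three imaginary studies.
for z in (1.0, 2.0, 3.0):
    p = two_sided_p_value(z)
    print(f"z = {z:.1f}: p = {p:.4f} -> {decide(p)}")
```

A z of about 1.96 gives p ≈ 0.05, which is why 1.96 reappears later as the multiplier for 95% confidence intervals.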
Data Distribution:
 Refers to the way data points are spread or arranged in a dataset.
 Helps understand patterns, trends, and variability in data.
Normal Distribution:
Also called a Gaussian distribution or bell curve.
Key Characteristics:
o Symmetrical, bell-shaped curve.
o Mean = Median = Mode.
o Data points cluster around the mean, with fewer points in the tails.
o About 68% of data lies within 1 standard deviation (SD) of the mean.
o About 95% of data lies within 2 SD of the mean (exactly 95% lies within 1.96 SD).
o About 99.7% of data lies within 3 SD of the mean.
Example: Birth weights of newborns in a population approximately follow a normal distribution.
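The 68–95–99.7 figures can be checked directly: for a normal distribution, the proportion of values within k SD of the mean is erf(k / √2). A minimal Python sketch using only the standard library:

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.2%}")
```

This shows why "2 SD" is a rounded rule of thumb (about 95.4%), while 1.96 SD corresponds to exactly 95%.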
Skewed Distribution:
 A distribution where the data points are not symmetrically distributed around the mean.
 The tail of the distribution is longer on one side.
Types of Skewed Distribution:
Positive Skew (Right Skew):
o Tail extends to the right.
o Usually Mean > Median > Mode.
o Example: Income distribution in a population.
Negative Skew (Left Skew):
o Tail extends to the left.
o Usually Mean < Median < Mode.
o Example: Age at retirement in a population.
Skewness describes the direction of asymmetry. Because the mean is pulled towards the long tail, the median (with the IQR) is usually a better summary than the mean (with the SD) for skewed data.
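The ordering of mean, median and mode in a right-skewed dataset can be seen with Python's statistics module. The lengths of hospital stay below are hypothetical values chosen to have a long right tail.

```python
from statistics import mean, median, mode

# Hypothetical lengths of hospital stay (days): most are short, a few are very long.
stay_days = [2, 2, 2, 3, 3, 4, 5, 7, 14, 30]

stay_mean = mean(stay_days)      # pulled towards the long right tail
stay_median = median(stay_days)  # average of the two middle values (3 and 4)
stay_mode = mode(stay_days)      # most frequent value

print("mean:  ", stay_mean)
print("median:", stay_median)
print("mode:  ", stay_mode)
```

The two long stays (14 and 30 days) drag the mean well above the median, giving the usual right-skew pattern Mean > Median > Mode.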
Measures of Central Tendency:
 Statistical measures that describe the center or typical value of a dataset.
Mean:
o The arithmetic average of all data points.
o Formula: Mean = (Sum of all values) / (Number of values)
o Sensitive to outliers.
Median:
o The middle value when data points are arranged in ascending or descending order; with an even number of values, it is the average of the two middle values.
o Little affected by outliers.
Mode:
o The most frequently occurring value in the dataset.
o A dataset can have no mode, one mode (unimodal), or multiple modes (bimodal/multimodal).
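The different sensitivity of the mean and median to outliers, and the possibility of more than one mode, can be sketched with Python's statistics module. The maternal ages are made-up values; the second list contains one hypothetical data-entry error (90).

```python
from statistics import mean, median, multimode

ages = [28, 30, 31, 32, 34]               # hypothetical maternal ages (years)
ages_with_error = [28, 30, 31, 32, 90]    # same list with one data-entry error

print("clean data:     mean", mean(ages), "median", median(ages))
print("with outlier:   mean", mean(ages_with_error), "median", median(ages_with_error))

# multimode returns every most-frequent value, so a bimodal dataset gives two modes.
print("modes:", multimode([1, 2, 2, 3, 3, 4]))
```

One extreme value moves the mean from 31 to 42.2 years, while the median stays at 31.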
Measures of Dispersion or Spread:
 These measures show the variability or spread of data points in a dataset.
Range:
o The difference between the largest and smallest values in the dataset.
o Range = Maximum value - Minimum value
o Simple, but very sensitive to outliers because it depends only on the two extreme values.
Variance:
o The average of the squared deviations of each data point from the mean; the larger the variance, the more spread out the values. (For a sample, the sum of squared deviations is divided by n − 1 rather than n.)
Standard Deviation (SD):
o The square root of the variance, giving a measure of spread in the same units as the data.
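The range, variance and SD can be computed from first principles and cross-checked against Python's statistics module, where pvariance/pstdev divide by n and variance/stdev divide by n − 1. The dataset is hypothetical.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values

data_range = max(data) - min(data)
mean_value = sum(data) / len(data)

# Variance = average squared deviation from the mean.
squared_deviations = [(x - mean_value) ** 2 for x in data]
population_variance = sum(squared_deviations) / len(data)        # divide by n
sample_variance = sum(squared_deviations) / (len(data) - 1)      # divide by n - 1

print("range:", data_range)
print("population variance:", population_variance, "SD:", math.sqrt(population_variance))
print("sample variance:", round(sample_variance, 3), "SD:", round(math.sqrt(sample_variance), 3))

# Cross-check against the standard library.
assert math.isclose(population_variance, pvariance(data))
assert math.isclose(math.sqrt(population_variance), pstdev(data))
assert math.isclose(sample_variance, variance(data))
assert math.isclose(math.sqrt(sample_variance), stdev(data))
```

Squaring the deviations stops positive and negative deviations cancelling out; taking the square root returns the SD to the original units.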
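Interquartile Range (IQR):
o The range between the first quartile (Q1) and the third quartile (Q3), representing the middle 50% of
the data.
o Formula: IQR = Q3 − Q1
o Example:
For the dataset: 1, 3, 5, 7, 9
Q1 = 3 and Q3 = 7 (the medians of the lower half 1, 3, 5 and the upper half 5, 7, 9, with the median 5 included in each half),
IQR = 7 − 3 = 4.
o Note: Textbooks and software use slightly different conventions for quartiles (for example, whether the median is included in each half), so quartiles of small datasets can differ between methods.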
Example of Interquartile Range (IQR):
For the dataset:
6, 7, 12, 13, 15, 18, 19, 21, 23
Arrange the data in ascending order (here it is already sorted):
6, 7, 12, 13, 15, 18, 19, 21, 23
Find the median (Q2):
The median is the middle value of the dataset. With 9 values, it is the 5th value, 15.
So, Q2 = 15.
Find the first quartile (Q1):
This is the median of the lower half of the dataset (excluding Q2).
6, 7, 12, 13 → The median of these values is (7 + 12) / 2 = 9.5.
So, Q1 = 9.5.
Find the third quartile (Q3):
This is the median of the upper half of the dataset (excluding Q2).
18, 19, 21, 23 → The median of these values is (19 + 21) / 2 = 20.
So, Q3 = 20.
Calculate IQR:
IQR = Q3 − Q1 = 20 − 9.5 = 10.5
So, the Interquartile Range (IQR) is 10.5.
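The "median of each half" method for quartiles can be sketched in Python. The function name quartiles_by_halves and its include_median flag are hypothetical, introduced here to show both conventions for an odd number of values; library functions such as statistics.quantiles use interpolation rules and may give different values for some datasets.

```python
from statistics import median

def quartiles_by_halves(values, include_median=False):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set and upper half.

    For an odd number of values, include_median decides whether the middle
    value is also placed in each half (conventions differ between textbooks).
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    if n % 2 == 0:
        lower, upper = s[:half], s[half:]
    elif include_median:
        lower, upper = s[:half + 1], s[half:]
    else:
        lower, upper = s[:half], s[half + 1:]
    return median(lower), median(s), median(upper)

def iqr(values, include_median=False):
    """Interquartile range Q3 - Q1 using quartiles_by_halves."""
    q1, _, q3 = quartiles_by_halves(values, include_median)
    return q3 - q1

worked = [6, 7, 12, 13, 15, 18, 19, 21, 23]
print(quartiles_by_halves(worked), iqr(worked))            # median excluded from halves

small = [1, 3, 5, 7, 9]
print(quartiles_by_halves(small, include_median=True), iqr(small, include_median=True))
print(iqr(small))                                          # same data, median excluded
```

The last line shows why the convention matters: the five-value dataset gives an IQR of 4 with the median included in each half, but 6 with it excluded.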

Standard Error of the Mean (SEM):
 The standard error of the mean (SEM) measures how precisely the sample mean estimates the population
mean. It reflects the variability of sample means around the population mean.
o Formula: SEM = SD / √n, where n is the sample size.
The SEM provides insight into the precision of the sample mean as an estimate of the population
mean: the larger the sample, the smaller the SEM.
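The SEM formula can be sketched in Python to show how it shrinks with sample size. The SD of 5 kg and the sample sizes are hypothetical illustration values.

```python
import math

def standard_error_of_mean(sd, n):
    """SEM = SD / sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical SD of 5 kg: quadrupling the sample size halves the SEM.
for n in (25, 100, 400):
    print(f"n = {n:>3}: SEM = {standard_error_of_mean(5.0, n):.2f} kg")
```

Because n sits under a square root, doubling precision (halving the SEM) needs four times as many participants.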
Confidence Interval (CI):
 A range of values that is likely to contain the true population parameter with a certain level of confidence.
 It provides an estimate of uncertainty around the sample statistic (e.g., mean, proportion).
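Key Points:
 Confidence Level: Typically 95% or 99%. A 95% confidence level means that if the study were repeated many
times, about 95% of the intervals calculated in this way would contain the true population parameter.
o Formula for Confidence Interval (for population mean): CI = Mean ± Z × SEM = Mean ± Z × (SD / √n)
(for small samples, a t-value replaces Z).
Example:
 A study reports the mean weight of 100 patients is 70 kg, with a standard deviation of 5 kg.
 For a 95% confidence level, the Z-score is 1.96.
 SEM = 5 / √100 = 0.5 kg, so 95% CI = 70 ± 1.96 × 0.5 = 70 ± 0.98, i.e., 69.02 kg to 70.98 kg.
 This means we are 95% confident that the true population mean lies within this range.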
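The confidence interval calculation can be sketched in Python using the figures from the weight example (mean 70 kg, SD 5 kg, n = 100). It uses the normal (Z) approximation, which is reasonable for a sample of this size; the 99% multiplier of about 2.576 is added here for comparison.

```python
import math

def mean_confidence_interval(mean, sd, n, z=1.96):
    """Normal-approximation CI for a population mean: mean ± z * SD / sqrt(n)."""
    sem = sd / math.sqrt(n)
    margin = z * sem
    return mean - margin, mean + margin

# Weight example: mean 70 kg, SD 5 kg, n = 100, 95% level (z = 1.96).
low, high = mean_confidence_interval(70.0, 5.0, 100)
print(f"95% CI: {low:.2f} kg to {high:.2f} kg")

# A 99% interval (z ≈ 2.576) is wider for the same data.
low99, high99 = mean_confidence_interval(70.0, 5.0, 100, z=2.576)
print(f"99% CI: {low99:.2f} kg to {high99:.2f} kg")
```

Higher confidence comes at the cost of a wider interval; a larger sample narrows both.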
Type 1 and Type 2 Errors
Type 1 and Type 2 errors are errors made in hypothesis testing when deciding whether to reject the null hypothesis.
Type 1 Error (False Positive)
 Definition: Rejecting the null hypothesis when it is actually true.
 Example: Concluding that a drug is effective when it is actually not.
 Significance: The probability of a Type 1 error is denoted by α (alpha), the significance level, and is typically
set at 0.05 (5%).
Type 2 Error (False Negative)
 Definition: Failing to reject the null hypothesis when it is actually false.
 Example: Concluding that a drug is not effective when it is actually effective.
 Significance: The probability of a Type 2 error is denoted by β (beta). The probability of avoiding a Type 2 error
(correctly rejecting a false null hypothesis) is called power (1 − β), and it is typically desired to be 80% or higher.
Summary:
 Type 1 Error (False Positive): Incorrectly rejecting a true null hypothesis (probability α, conventionally 0.05).
 Type 2 Error (False Negative): Incorrectly failing to reject a false null hypothesis (probability β, commonly set at
0.20, corresponding to 80% power).
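Type 1 and Type 2 error rates can be illustrated by simulation. The Python sketch below repeatedly simulates a hypothetical two-group study and applies a two-sided z-test. It assumes normally distributed outcomes with a known SD of 1 in both groups; the sample size (64 per group) and the true difference (0.5 SD) are illustrative choices that give a power of roughly 80%. The function name rejection_rate is hypothetical.

```python
import math
import random

def rejection_rate(true_difference, n_per_group=64, sd=1.0, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated two-group studies in which a two-sided z-test rejects H0 (no difference).

    Assumes normally distributed outcomes with a known, equal SD in both groups.
    """
    rng = random.Random(seed)
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference in means
    rejections = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_difference, sd) for _ in range(n_per_group)]
        z = (sum(group_b) / n_per_group - sum(group_a) / n_per_group) / se
        p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / trials

# H0 true (no real difference): every rejection is a Type 1 error.
type1_rate = rejection_rate(true_difference=0.0)
# H0 false (true difference of 0.5 SD): rejections estimate power; the rest are Type 2 errors.
power = rejection_rate(true_difference=0.5)
beta = 1 - power

print(f"Type 1 error rate ~ {type1_rate:.3f} (alpha = 0.05)")
print(f"Power ~ {power:.3f}, Type 2 error rate (beta) ~ {beta:.3f}")
```

When H₀ is true, about 5% of studies still reject it, which is exactly what α = 0.05 allows; when a real effect exists, roughly one study in five misses it at this sample size.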
Examples of Type 1 and Type 2 Errors:
Example 1
Null Hypothesis (H₀): The patient does not have the disease.
Alternative Hypothesis (H₁): The patient has the disease.
o Type 1 Error (False Positive): The test incorrectly indicates that the patient has the disease when they
do not.
 Example: A patient without the disease is told they have it, leading to unnecessary treatment.
o Type 2 Error (False Negative): The test incorrectly indicates that the patient does not have the disease
when they actually do.
 Example: A patient with the disease is told they are disease-free, leading to delayed or no
treatment.
Example 2
Null Hypothesis (H₀): The drug has no effect.
Alternative Hypothesis (H₁): The drug has an effect.
o Type 1 Error (False Positive): The study concludes the drug is effective when it actually isn't.
 Example: A new drug is approved for use based on a study showing it works, even though it has
no real effect.
o Type 2 Error (False Negative): The study concludes the drug is ineffective when it actually is effective.
 Example: A potentially life-saving drug is rejected because the study fails to show its
effectiveness.
Example 3
Null Hypothesis (H₀): The educational program has no impact on students' performance.
Alternative Hypothesis (H₁): The educational program improves students' performance.
o Type 1 Error (False Positive): The study concludes the program improves student performance when it
actually does not.
 Example: The program is implemented in schools based on misleading results showing it boosts
performance.
o Type 2 Error (False Negative): The study concludes the program has no impact on performance when it
actually does.
 Example: The program is discontinued because the study fails to detect its positive effect on
students.
