biostatistics notes part 1
Biostatistics
Definition of Biostatistics:
Biostatistics is the application of statistical methods to biological, medical, and health sciences to analyze
and interpret data.
Data refers to a collection of facts, observations, measurements, or information that can be analyzed,
interpreted, or used for various purposes. Data can take different forms, such as numbers, text, images,
or sounds, and it is typically collected for analysis in fields such as research, business, and science.
1. Qualitative Data: Descriptive data that can be categorized but not measured (e.g., colors, names, labels).
2. Quantitative Data: Numerical data that can be measured and quantified (e.g., age, height, weight).
Data is often processed and analyzed using statistical tools to extract meaningful insights and support
decision-making.
Population: The entire group being studied (e.g., all pregnant women).
Sample: A subset of the population selected for study (e.g., 200 pregnant women from a hospital).
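As a rough illustration of the population/sample distinction, the sketch below draws a simple random sample without replacement. The ID numbers, the population size of 5,000, and the seed are hypothetical; only the sample size of 200 comes from the example above. Simple random sampling is just one of several ways a sample might be selected.

```python
import random

# Hypothetical population: ID numbers for 5,000 pregnant women (assumed size).
population = list(range(1, 5001))

# Fixed seed so the illustration is reproducible (an arbitrary choice).
rng = random.Random(42)

# Simple random sample of 200 women, drawn without replacement.
sample = rng.sample(population, k=200)

print(len(sample))          # number of women in the sample
print(sorted(sample)[:5])   # a few of the selected IDs
```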
Variable:
A characteristic or factor that can vary among individuals in a study (e.g., age, weight, blood pressure).
Types of Variables:
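Continuous Variables
Interval Variable:
o Numeric values with equal intervals but no true zero point (zero does not indicate "absence of the variable").
o Differences are meaningful, but ratios are not.
o Example: Temperature in degrees Celsius.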
Ratio Variable:
o Numeric values with equal intervals and a true zero point (zero indicates "absence of the variable").
o Allows for meaningful ratios (e.g., twice as much).
o Example: Weight, height, age, blood pressure, income.
Hypothesis
A hypothesis is a proposed explanation or assumption made based on limited evidence as a starting point for
further investigation.
It is a testable statement that predicts the relationship between two or more variables.
Null Hypothesis (H₀):
1. States that there is no association, difference, or effect between variables being studied.
2. Assumes any observed effect is due to chance.
3. Example: "There is no difference in pregnancy outcomes between Group A and Group B."
Null Hypothesis (H₀): "There is no difference in blood pressure control between patients taking Drug A and those
taking a placebo."
Alternative Hypothesis (H₁): "Patients taking Drug A have better blood pressure control than those taking a
placebo."
Null Hypothesis (H₀): "Regular exercise has no effect on weight loss in obese individuals."
Alternative Hypothesis (H₁): "Regular exercise leads to significant weight loss in obese individuals."
Null Hypothesis (H₀): "Smoking during pregnancy does not affect birth weight."
Alternative Hypothesis (H₁): "Smoking during pregnancy reduces birth weight."
Reject Null Hypothesis (H₀): When the p-value is less than the significance level (α), H₀ is rejected, indicating
a statistically significant result.
Fail to Reject Null Hypothesis (H₀): When the p-value is greater than the significance level (α), there is
insufficient evidence to reject H₀. This does not prove that H₀ is true, so H₀ is never "accepted".
P-Value:
Represents the probability of observing results at least as extreme as the study data if H₀ is true.
Low p-value (< 0.05): Strong evidence against H₀ → Reject H₀.
High p-value (> 0.05): Weak evidence against H₀ → Fail to reject H₀.
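The phrase "as extreme as the study data" can be made concrete with an exact calculation. The sketch below uses a hypothetical study (not from these notes) in which 16 of 20 patients improve, and H₀ states that the true improvement rate is 50%. The one-sided p-value is the probability of seeing 16 or more improvements if H₀ is true. The function name and the numbers are illustrative assumptions.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, p_null: float = 0.5) -> float:
    """P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 16 improvements out of 20 patients, H0 rate = 0.5.
p_value = binomial_upper_tail(16, 20)
print(round(p_value, 4))  # 0.0059
```

Because this p-value is well below 0.05, the hypothetical result would count as strong evidence against H₀.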
Significance Level (α):
The threshold probability for rejecting H₀, typically set at 0.05 (5%).
If p-value < α: Reject H₀ (significant result).
If p-value > α: Fail to reject H₀ (not significant).
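The decision rule can be written as a small function. This is a minimal sketch: the function name `decide` is hypothetical, and treating a p-value exactly equal to α as "fail to reject" is an assumed convention that the notes do not state.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value and alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    # Assumed convention: p_value == alpha is treated as not significant.
    return "fail to reject H0"

print(decide(0.03))              # reject H0
print(decide(0.20))              # fail to reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0
```

The last call shows that the same p-value can lead to a different decision when a stricter α is chosen.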
Data Distribution:
Normal Distribution:
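Key Characteristics:
o Symmetric, bell-shaped curve centered on the mean.
o Mean, median, and mode are equal.
o About 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.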
Skewed Distribution:
A distribution where the data points are not symmetrically distributed around the mean.
The tail of the distribution is longer on one side.
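Types of Skewed Distribution:
o Positively (right) skewed: the longer tail is on the right; the mean is usually greater than the median.
o Negatively (left) skewed: the longer tail is on the left; the mean is usually less than the median.
Skewness indicates the direction of the longer tail and affects which summary statistics and tests are appropriate.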
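One common way to quantify skewness is the moment coefficient (third central moment divided by the 1.5 power of the second). The sketch below is one possible implementation, not a method given in these notes. The datasets are hypothetical, and the "left-skewed" set is simply the mirror image of the right-skewed one. A positive result indicates a longer right tail, a negative result a longer left tail.

```python
from statistics import fmean, median

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    center = fmean(values)
    m2 = sum((x - center) ** 2 for x in values) / n
    m3 = sum((x - center) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical lengths of hospital stay in days: most are short, one is long.
right_skewed = [1, 2, 2, 3, 3, 4, 5, 21]
left_skewed = [-x for x in right_skewed]  # mirror image, for illustration only
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, "skewness:", round(skewness(data), 2),
          "mean:", fmean(data), "median:", median(data))
```

In the right-skewed set the single long stay pulls the mean (5.125) above the median (3.0).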
Mean:
o The sum of all values divided by the number of values.
o Affected by outliers.
Median:
o The middle value when data points are arranged in ascending or descending order.
o Largely unaffected by outliers, because it depends on the order of the values rather than their size.
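Mode:
o The value that occurs most frequently in the dataset.
o A dataset can have one mode, more than one mode, or no repeated value at all.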
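The three measures of central tendency can be compared with Python's standard `statistics` module. The ages below are hypothetical. Adding one extreme value shows how the mean shifts noticeably while the median and mode barely move.

```python
from statistics import mean, median, mode

ages = [22, 25, 25, 27, 30]        # hypothetical ages in years
ages_with_outlier = ages + [95]    # same data plus one extreme value

print(mean(ages), median(ages), mode(ages))   # 25.8 25 25
print(round(mean(ages_with_outlier), 2),      # 37.33
      median(ages_with_outlier),              # 26.0
      mode(ages_with_outlier))                # 25
```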
Range:
o The difference between the largest and smallest values in the dataset.
o Range = Maximum value - Minimum value
o Measures data spread but is sensitive to outliers.
Variance:
o The average of the squared differences between each data point and the mean; it measures how spread out the values are.
Standard Deviation (SD):
o The square root of variance, providing a more interpretable measure of spread in the same units as the
data.
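The spread measures above can be computed as in the sketch below, using hypothetical measurements. The notes do not say whether variance divides by n (population) or n − 1 (sample), so both versions are shown.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

data_range = max(values) - min(values)   # Range = Maximum - Minimum
pop_var = pvariance(values)              # population variance (divides by n)
pop_sd = pstdev(values)                  # square root of pop_var
samp_var = variance(values)              # sample variance (divides by n - 1)
samp_sd = stdev(values)                  # square root of samp_var

print(data_range)                              # 7
print(pop_var, pop_sd)                         # variance 4, SD 2
print(round(samp_var, 3), round(samp_sd, 3))   # 4.571 2.138
```

The sample versions are slightly larger because dividing by n − 1 corrects for estimating the mean from the same data.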
Interquartile Range (IQR):
o The range between the first quartile (Q1) and the third quartile (Q3), representing the middle 50% of
the data.
o Formula: IQR = Q3 − Q1
o Example:
For dataset: 1, 3, 5, 7, 9
Q1 = 3, Q3 = 7
Calculate IQR:
IQR = 7 − 3 = 4
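The IQR example can be checked with `statistics.quantiles` (Python 3.8+). Quartile conventions differ between textbooks and software. The `method="inclusive"` setting is an assumption chosen because it reproduces Q1 = 3 and Q3 = 7 for this dataset; other conventions can give different quartiles for the same data.

```python
from statistics import quantiles

data = [1, 3, 5, 7, 9]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3)  # 3.0 7.0
print(iqr)     # 4.0
```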
Standard Error of the Mean (SEM):
The standard error of the mean (SEM) measures how precisely the sample mean estimates the population
mean. It reflects the variability of sample means around the population mean.
Formula: SEM = SD / √n, where n is the sample size.
Example
The SEM provides insight into the precision of the sample mean as an estimate of the population
mean.
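A short sketch of the SEM calculation, using the usual formula SEM = SD / √n. The blood pressure readings and the helper names are hypothetical. The second helper works from summary statistics and shows how the SEM shrinks as the sample size grows.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """SEM from raw data: sample standard deviation divided by sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    return stdev(sample) / sqrt(n)

def sem_from_summary(sd, n):
    """SEM from a reported standard deviation and sample size."""
    return sd / sqrt(n)

systolic = [118, 122, 126, 130, 134]  # hypothetical readings in mmHg
print(round(standard_error(systolic), 3))  # 2.828

# Same SD, larger sample: the SEM gets smaller.
print(sem_from_summary(5, 25))   # 1.0
print(sem_from_summary(5, 100))  # 0.5
```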
Confidence Interval (CI):
A range of values that is likely to contain the true population parameter with a certain level of confidence.
It provides an estimate of uncertainty around the sample statistic (e.g., mean, proportion).
Key Points:
Confidence Level: Typically 95% or 99%. With a 95% level, about 95% of intervals constructed this way from
repeated samples would contain the true population parameter.
Example:
A study reports the mean weight of 100 patients is 70 kg, with a standard deviation of 5 kg.
For a 95% confidence level, the Z-score is 1.96.
SEM = 5 / √100 = 0.5 kg
95% CI = 70 ± (1.96 × 0.5) = 70 ± 0.98 = 69.02 kg to 70.98 kg
This means we are 95% confident that the true population mean lies within this range.
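The interval for this example can be computed as mean ± Z × SD / √n. The sketch below follows the Z-based approach used in the example. The function name is hypothetical, and the critical value comes from `NormalDist.inv_cdf` rather than the rounded 1.96. When the standard deviation is estimated from a small sample, a t-based critical value is often used instead; that variation is not shown here.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Z-based confidence interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)                       # Z times the SEM
    return mean - margin, mean + margin

# Values from the example: mean 70 kg, SD 5 kg, n = 100 patients.
low, high = mean_confidence_interval(70, 5, 100)
print(round(low, 2), round(high, 2))  # 69.02 70.98
```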
Type 1 and Type 2 Errors:
Type 1 and Type 2 errors are mistakes made in hypothesis testing when deciding whether to reject the null hypothesis.
Summary:
Type 1 Error (False Positive): Incorrectly rejecting a true null hypothesis. Its probability is α, commonly set at 0.05.
Type 2 Error (False Negative): Incorrectly failing to reject a false null hypothesis. Its probability is β, commonly
set at 0.20 (power = 1 − β = 0.80).
Example 1
Null Hypothesis (H₀): The patient does not have the disease.
o Type 1 Error (False Positive): The test incorrectly indicates that the patient has the disease when they
do not.
Example: A patient without the disease is told they have it, leading to unnecessary treatment.
o Type 2 Error (False Negative): The test incorrectly indicates that the patient does not have the disease
when they actually do.
Example: A patient with the disease is told they are disease-free, leading to delayed or no
treatment.
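Example 2
Null Hypothesis (H₀): The new drug has no effect.
o Type 1 Error (False Positive): The study concludes the drug is effective when it actually is not.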
Example: A new drug is approved for use based on a study showing it works, even though it has
no real effect.
o Type 2 Error (False Negative): The study concludes the drug is ineffective when it actually is effective.
Example: A potentially life-saving drug is rejected because the study fails to show its
effectiveness.
Example 3
Null Hypothesis (H₀): The educational program has no impact on students' performance.
o Type 1 Error (False Positive): The study concludes the program improves student performance when it
actually does not.
Example: The program is implemented in schools based on misleading results showing it boosts
performance.
o Type 2 Error (False Negative): The study concludes the program has no impact on performance when it
actually does.
Example: The program is discontinued because the study fails to detect its positive effect on
students.
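The two error types can also be illustrated by simulation. The sketch below repeatedly runs a two-sided z-test of H₀: mean = 0 on simulated data with a known standard deviation of 1. All settings (sample size 30, true effect 0.5, 2,000 simulated studies, the seed, the function name) are assumptions chosen for illustration. When H₀ is true, the rejection rate estimates the Type 1 error rate and should be close to α. When H₀ is false, the share of studies that fail to reject estimates the Type 2 error rate β.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mean, n=30, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated studies that reject H0: mean = 0 (known SD = 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)                       # z statistic under H0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))        # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / trials

# H0 is true (no effect): every rejection is a Type 1 error.
type1_rate = rejection_rate(true_mean=0.0)

# H0 is false (true mean 0.5): every failure to reject is a Type 2 error.
type2_rate = 1 - rejection_rate(true_mean=0.5)

print("Estimated Type 1 error rate:", type1_rate)
print("Estimated Type 2 error rate:", type2_rate)
```

Increasing the sample size in this sketch would lower the Type 2 error rate while leaving the Type 1 error rate near α.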