Chapter-1
Introduction to Data and Statistics
Integration with Business: Modern businesses heavily rely on data and statistics for
operations and decision-making.
Importance of Data: Collecting and analyzing data is crucial for business operations and
making informed decisions.
Definition and Utility: Data, associated variables, and scales of measurement are
fundamental for management professionals.
Statistics: Statistics transform numbers into useful information, aiding fact-based
decision-making and understanding variation.
Variables
Definition: Variables represent numbers, amounts, or situations that can change.
Types:
o Categorical (Qualitative): Variables whose values are labels or categories (e.g., yes/no, day of
the week).
o Numerical (Quantitative): Variables representing quantities.
Discrete: Countable values (e.g., number of employees).
Continuous: Measurable values (e.g., time waiting at an ATM).
Measurement Scales
Definition: Determines which comparisons (equivalence, order, differences, and ratios) are
meaningful for a variable's values.
Types:
o Nominal: Categories without order (e.g., car brands).
o Ordinal: Categories with a meaningful order (e.g., grades).
o Interval: Differences between values are meaningful, no true zero (e.g.,
temperature in Celsius).
o Ratio: Differences and ratios are meaningful, true zero exists (e.g., salary).
Collecting Data
Importance: Objective data collection is crucial for accuracy.
Population and Sample:
o Population (N): Entire group of interest.
o Sample (n): Subset of the population used for analysis.
Parameter vs. Statistic: Parameter describes a population; a statistic describes a sample.
Methods of Data Collection
Primary Data: Collected directly by the researcher.
Secondary Data: Collected by someone else, used by the researcher.
Techniques:
o Data from Organizations: Collected and distributed by entities (e.g., financial
data).
o Designed Experiment: Controlled experiments to collect specific data.
o Surveys: Questionnaires collecting opinions and behaviors.
o Observational Studies: Directly observing behavior in a natural setting.
Sampling Methods
Probability Sampling: Each element has a known chance of being selected.
o Simple Random Sampling: Equal chance for all elements.
o Systematic Sampling: Every nth element is selected.
o Stratified Sampling: Subsamples drawn from different strata.
o Cluster Sampling: Samples drawn from clusters of elements.
Non-Probability Sampling: Selection probability is unknown.
o Convenience Sampling: Selecting the most easily available elements.
o Judgment Sampling: Selected based on the researcher's judgment.
o Quota Sampling: Ensuring subgroups are represented.
o Snowball Sampling: Initial respondents recruit further participants.
Survey Design
Components: Designing a questionnaire, pretesting, and editing.
Google Forms: An example tool for creating and distributing surveys.
Summary
Statistics: Science of collecting, analyzing, presenting, and interpreting data.
Data: Facts and figures used for analysis.
Key Terms:
o Data: Collected information.
o Variable: A characteristic of interest.
o Nominal Scale: Identifies attributes.
o Ordinal Scale: Indicates order or rank.
Chapter-2
Basic Concepts
Data: Facts and figures used for analysis.
Statistics: Collection, organization, analysis, interpretation, and presentation of data.
Organizing Data
Categorical Variables: Values that are names or labels (e.g., color, breed).
Quantitative Variables: Numerical values that can be measured or counted.
o Discrete Variables: Countable values (e.g., number of heads in coin flips).
o Continuous Variables: Measurable values within a range (e.g., weight).
Frequency Distribution
Definition: Tabular summary of data showing the number of items in each class.
Example:
o Coke Classic: 9
o Pepsi: 8
o Diet Coke: 13
o Sprite: 9
o Dr. Pepper: 11
Relative and Percent Frequency
Relative Frequency: Proportion of items in a class (Frequency of class / Total
frequency).
Percent Frequency: Relative frequency expressed as a percentage (Relative frequency ×
100).
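The arithmetic above can be sketched in a few lines of Python, using the counts from the frequency distribution in the text:

```python
# Relative and percent frequencies for the soft-drink frequency distribution above
freq = {"Coke Classic": 9, "Pepsi": 8, "Diet Coke": 13, "Sprite": 9, "Dr. Pepper": 11}

total = sum(freq.values())  # 50 purchases in all
for drink, f in freq.items():
    rel = f / total   # relative frequency = class frequency / total frequency
    pct = rel * 100   # percent frequency = relative frequency x 100
    print(f"{drink}: {rel:.2f} ({pct:.0f}%)")
```

Note that the relative frequencies always sum to 1 and the percent frequencies to 100%.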
Visualizing Categorical Data
Bar Graph: Summarizes frequency distribution with bars.
Pie Chart: Represents data as slices of a circle.
Pareto Chart: Bar chart in descending order with cumulative percentage line.
Visualizing Numerical Data
Dot Plot: Plots individual values along a number line to show their distribution.
Scatter Plot: Plots paired data points to show trends or relationships.
Histogram: Graphical representation of data distribution; helps identify skewness.
Cumulative Distribution (Ogive): Plots cumulative frequency on y-axis.
Steps to Create Frequency Distribution (Example)
1. Determine Classes: Decide on 5-20 classes based on data size.
2. Class Width: Calculate using the formula: (Largest value - Smallest value) / Number of
classes.
3. Class Limits: Ensure each data item belongs to one class.
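The class-width step can be sketched with a small hypothetical data set:

```python
# Class width for a frequency distribution (hypothetical data)
data = [12, 15, 21, 8, 30, 25, 18, 27, 11, 22]

num_classes = 5  # chosen between 5 and 20 based on data size
width = (max(data) - min(data)) / num_classes  # (largest - smallest) / classes
print(width)  # (30 - 8) / 5 = 4.4, usually rounded up to a convenient value like 5
```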
Creating Graphs in Excel
Scatter Plot:
1. Select data cells.
2. Insert scatter plot from chart group.
Histogram:
1. Select class and frequency data.
2. Insert column chart and format data series.
Best Practices for Visualization
Use simple graphs/charts.
Provide clear titles and labels.
Avoid unnecessary decorative elements (chart junk).
Key Definitions
Frequency Distribution: Number of data values in each class.
Cumulative Frequency Distribution: Number of data values less than or equal to the
upper class limit.
Important Graph Types
Bar Chart: For categorical data.
Pie Chart: For proportional representation.
Histogram: For numerical data distribution.
Scatter Plot: For relationships between two numerical variables.
Pareto Chart: For prioritizing categories based on frequency.
By focusing on these key points, you'll be well-prepared for questions on data organization and
visualization in your exam.
Chapter-3
Objectives
Understand types of statistics
Use measures of location (descriptive statistics)
Comprehend measures of variability
Grasp covariance and the coefficient of correlation
Utilize Excel for descriptive statistics
Introduction
Numerical Measures: Summarize data using measures of location, dispersion, shape,
and association.
Sample Statistics vs. Population Parameters: Statistics for a sample are called sample
statistics; for a population, they are called population parameters.
Point Estimator: Sample statistic used to estimate a population parameter.
Central Tendency
Mean (Average):
o Population Mean (µ): Sum of all values divided by the total number of values.
o Sample Mean (𝑥̅): Sum of sample values divided by the sample size.
Median: Middle value when data is ordered. For even number of observations, it's the
average of the two middle values.
Mode: Most frequently occurring value.
Example Calculation
Sample Mean in Excel: =AVERAGE(D4:D13) or =SUM(D4:D13)/10.
Weighted Mean: =SUMPRODUCT(weights, values) / SUM(weights).
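The same measures have direct equivalents in Python's standard `statistics` module; the values below are hypothetical:

```python
import statistics

values = [4, 7, 7, 9, 10, 12]

print(statistics.mean(values))    # like =AVERAGE(...): 49/6, about 8.17
print(statistics.median(values))  # even count, so average of the two middle values (7, 9) = 8.0
print(statistics.mode(values))    # most frequent value: 7

# Weighted mean, like =SUMPRODUCT(weights, values)/SUM(weights)
weights = [1, 2, 1, 1, 2, 1]
wmean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
print(wmean)  # 66/8 = 8.25
```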
Variation and Shape
Range (R): Difference between the largest and smallest values.
Variance and Standard Deviation:
o Variance measures the average squared deviation from the mean.
o Standard Deviation is the square root of the variance; it expresses the typical
deviation from the mean in the same units as the data.
Exploring Numerical Data
Percentile: Value below which a given percentage of observations fall.
Interquartile Range (IQR): Difference between the third quartile (Q3) and the first
quartile (Q1).
Five-Number Summary: Minimum, Q1, Median, Q3, Maximum.
Boxplot: Visual representation of the five-number summary.
Covariance and Correlation
Covariance: Measures the direction of the linear relationship between two variables; its
magnitude depends on the units, so it does not by itself indicate strength.
Coefficient of Correlation (r):
o Ranges from -1 to +1.
o Values close to 0 indicate no relationship.
o Positive values indicate a positive relationship; negative values indicate a negative
relationship.
o r = Cov(X, Y) / (sX · sY) = ∑(X − x̄)(Y − ȳ) / √(∑(X − x̄)² ∑(Y − ȳ)²)
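A short sketch computing sample covariance and correlation; the paired data are hypothetical:

```python
import math

# Sample covariance and correlation coefficient (hypothetical paired data)
x = [2.0, 4.0, 6.0, 8.0]
y = [3.0, 5.0, 8.0, 10.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
r = cov / (sx * sy)  # correlation = covariance scaled by both standard deviations

print(cov, r)  # r is close to +1 for this strongly linear data
```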
Key Formulas
Sample Mean: x̄ = ∑X / n
Population Mean: µ = ∑X / N
Variance (Sample): s² = ∑(X − x̄)² / (n − 1)
Standard Deviation (Sample): s = √[∑(X − x̄)² / (n − 1)]
Covariance: Cov(X, Y) = ∑(X − x̄)(Y − ȳ) / (n − 1)
Correlation Coefficient: r = ∑(X − x̄)(Y − ȳ) / √[∑(X − x̄)² ∑(Y − ȳ)²]
Key Terms
Mean: Average value.
Median: Middle value.
Mode: Most frequent value.
Range: Spread between maximum and minimum values.
Variance: Average squared deviation from the mean.
Standard Deviation: Square root of the variance; the typical deviation from the mean.
Covariance: Measure of the linear relationship between two variables.
Correlation Coefficient: Measure of the strength and direction of the linear relationship
between two variables.
Chapter-4
Probability: Numerical measure of the likelihood that an event occurs, ranging from 0 to 1.
Formula: Probability (P) = Number of favorable outcomes / Total number of possible outcomes.
Types of Probability:
o A Priori Probability: Based on prior knowledge or logical deduction (e.g., January days in
a year).
o Empirical Probability: Based on observed data (e.g., interest in a class).
o Subjective Probability: Based on personal judgment or experience (e.g., predicting sales
of a new product).
Probability of Events
Event: A set of outcomes (e.g., days in January).
Complement of an Event: All outcomes not in the event (e.g., days not in January).
Union of Events (A ∪ B): Probability of either event A or B occurring.
o Formula: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Intersection of Events (A ∩ B): Probability of both events A and B occurring.
o For independent events: P(A ∩ B) = P(A) × P(B)
Mutually Exclusive Events: No common outcomes (e.g., days in January and February).
Conditional Probability
Definition: Probability of event A given that event B has occurred.
o Formula: P(A|B) = P(A ∩ B) / P(B)
Example: Probability of promotion given that an officer is a man or a woman.
Ethical Issues in Probability
Ensuring clarity and transparency in probability-related information to avoid public confusion
and mistrust, particularly in advertisements.
Bayes' Theorem
Purpose: To update prior probability estimates based on new information.
Application: Used for revising probabilities, especially when initial probabilities are known, and
additional data is obtained.
Key Formulas and Concepts
Prior Probability: Initial probability estimate.
Posterior Probability: Revised probability based on new information.
Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
Joint Probability: Probability of two events occurring simultaneously.
Marginal Probability: Probability of a single event occurring, ignoring the other events.
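A minimal sketch of Bayes' theorem in Python; all the probabilities below are hypothetical, chosen only to illustrate the prior-to-posterior update:

```python
# Bayes' theorem: revise a prior probability with new evidence.
# Hypothetical numbers: a diagnostic test for a rare condition.
p_a = 0.01              # prior P(A): condition present
p_b_given_a = 0.95      # likelihood P(B|A): test positive given condition
p_b_given_not_a = 0.10  # false-positive rate P(B|not A)

# Marginal probability of the evidence, P(B), by total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior: P(A|B) = P(B|A) P(A) / P(B)
posterior = p_b_given_a * p_a / p_b
print(posterior)  # about 0.088: a positive test raises 1% to roughly 9%
```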
Key Words
Probability: Numerical value indicating the likelihood of an event.
Conditional Probability: Probability of an event given another event has occurred.
Joint Probability: Probability of two events happening together.
Marginal Probability: Probability of an individual event occurring.
Bayes' Theorem: Method for calculating revised probabilities.
This condensed summary captures the essential concepts and examples related to probability as
they apply to business decision-making and statistical analysis.
Chapter-5
5.0 Objectives
Understand properties of probability distributions.
Differentiate between discrete and continuous probability distributions.
Compute expected value and variance.
Calculate probabilities for Binomial and Poisson distributions.
5.1 Introduction
Familiarize with probability distributions, especially Binomial and Poisson.
Learn assumptions and applications through problems.
5.2 Definitions
Random Variable: Numerical value representing outcomes of a statistical experiment.
Discrete Random Variables: Countable outcomes (e.g., number of customers).
Continuous Random Variables: Measurable outcomes over a range (e.g., time).
5.3 Probability Distributions
Probability Distribution: Function providing probabilities of all possible outcomes.
Discrete Probability Distributions: Probabilities for discrete random variables.
o Represented by Probability Mass Function (PMF) or Cumulative Distribution
Function (CDF).
o PMF gives the probability that the variable takes a particular value x exactly.
o CDF gives the cumulative probability of all values up to and including x.
Continuous Probability Distributions: Probabilities for continuous random variables
defined as area under the curve of its PDF.
Properties of Discrete Probability Distributions
Probabilities lie between 0 and 1.
Outcomes are mutually exclusive.
Total probabilities sum to 1.
5.4 The Importance of Expected Value in Decision-Making
Expected Value (E[X]): Measure of the center of the distribution.
Variance (Var(X)): Measure of spread around the expected value.
Standard Deviation (SD(X)): Square root of variance, indicating spread.
Properties of Mean (Expected Value)
E(X + Y) = E(X) + E(Y)
E(aX) = a · E(X)
E(X + a) = E(X) + a
Properties of Variance
V(aX + b) = a² · V(X)
V(X + Y) = V(X) + V(Y) (when X and Y are independent)
For pairwise independent variables: V(a1X1 + a2X2 + ... + anXn) = a1²V(X1) + a2²V(X2) + ... + an²V(Xn)
5.5 Binomial Probability Distribution
Used for number of successes in n independent trials with probability p of success.
PMF: P(X = k) = C(n, k) · p^k · (1 − p)^(n−k)
Mean: µ = np
Variance: σ² = np(1 − p)
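A minimal sketch of the binomial PMF, mean, and variance (n = 10 and p = 0.5 are hypothetical):

```python
from math import comb

# Binomial PMF: probability of exactly k successes in n independent trials
def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
print(binomial_pmf(3, n, p))  # P(X = 3) = 120/1024 = 0.1171875
print(n * p)                  # mean np = 5.0
print(n * p * (1 - p))        # variance np(1-p) = 2.5
```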
5.6 Poisson Distribution
Used for number of events in a fixed interval of time/space.
PMF: P(X = k) = λ^k e^(−λ) / k!
Mean and Variance: both equal λ
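A matching sketch for the Poisson PMF (λ = 2 events per interval is hypothetical):

```python
from math import exp, factorial

# Poisson PMF: probability of k events in a fixed interval of time or space
def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

print(poisson_pmf(0, 2.0))  # P(X = 0) = e^-2, about 0.1353
print(poisson_pmf(2, 2.0))  # P(X = 2) = 2e^-2, about 0.2707
```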
5.7 Let Us Sum Up
Summary of key concepts: probability distributions, expected value, variance, and
specific distributions (Binomial, Poisson).
5.8 Key Words
Random Variable, Discrete, Continuous, Probability Distribution, Expected Value,
Variance, Binomial, Poisson.
5.9 Case
Practical application of probability distributions in decision-making.
Chapter-6
Introduction
Objective: Understand and apply continuous distributions, including Uniform, Normal,
and Exponential distributions.
Purpose: Solve practical problems using continuous distributions, with exercises for
practice.
6.2 Continuous Distributions: Introduction
Continuous Random Variable: A variable with a range of possible values within an
interval.
Common Example: Normal distribution.
Use: Probability distributions help predict outcomes based on known properties.
Probability Distributions of Continuous Variables
Definition: Continuous random variables take all values in an interval.
Calculation: Probabilities found using calculus, not physical measurement.
6.3 Normal Distribution
Type: Continuous and most commonly used distribution.
Applications: Variables like weight, height, etc.
Parameters: Mean (µ) and standard deviation (σ).
Standard Normal Distribution: Mean of 0 and standard deviation of 1.
Characteristics:
o Symmetric distribution.
o Uni-modal.
o Continuous range from –∞ to +∞.
o Total area under the curve is 1.
o Mean, median, and mode are equal.
Properties of Normal Distribution
Symmetry: Curve is symmetric around the mean.
Mean, Median, Mode: All are equal.
Asymptotic: Curve approaches but never touches the x-axis.
Unimodal: One peak point.
Quartiles: Equidistant from mean.
Linear Combination: If X and Y are independent normal variates, aX + bY is also
normal.
Importance of Normal Distribution
1. Sample size increase leads to normal properties.
2. Skewed variables can be transformed to normal.
3. Sampling distributions tend to normal.
4. Basis for hypothesis testing.
5. Statistical Quality Control relies on normal distribution.
6. Approximation to binomial and Poisson distributions.
7. Theoretical and applied usefulness.
8. Mathematically convenient.
Area Under the Normal Curve
Total Area: 1 (50% on each side of the mean).
Z-Scores: Standard normal table used for probabilities.
Example: P(-2 ≤ z ≤ +2) = 0.9544 (approx 95%).
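In place of a printed z-table, the standard normal CDF can be evaluated with the error function; a sketch reproducing the example above:

```python
from math import erf, sqrt

# Standard normal CDF via the error function (replaces a z-table lookup)
def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# Area between z = -2 and z = +2, the example from the text
print(phi(2) - phi(-2))  # about 0.9545, i.e. roughly 95%
```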
6.5 The Uniform Distribution
Definition: Equal probability over an interval.
Density Function: f(x) = 1/(b − a) for a ≤ x ≤ b, so P(x1 ≤ X ≤ x2) = (x2 − x1)/(b − a).
Mean: (a + b)/2.
Standard Deviation: (b − a)/√12.
Example: Uniform Distribution
Process time between 20 to 40 minutes.
Probability for 25 to 30 minutes is 25%.
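The process-time example above can be sketched directly from the formulas (interval [20, 40] minutes):

```python
# Uniform distribution on [a, b]: probabilities are proportional to interval length
a, b = 20.0, 40.0

def uniform_prob(x1, x2, a, b):
    # P(x1 <= X <= x2) = (x2 - x1) / (b - a)
    return (x2 - x1) / (b - a)

mean = (a + b) / 2         # 30.0 minutes
sd = (b - a) / 12 ** 0.5   # about 5.77 minutes
print(uniform_prob(25, 30, a, b))  # 0.25, i.e. 25%
```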
6.6 The Exponential Distribution
Definition: Time between random occurrences.
Characteristics:
o Continuous, right-skewed.
o Ranges from 0 to ∞.
o Apex at x = 0.
o Decreases gradually as x increases.
Density Function: f(x) = λe^(−λx) for x ≥ 0.
Parameter: λ (inverse of mean).
Example: Exponential Distribution
Arrivals at ticket counter (Poisson distributed, 3 customers/minute).
Probability of an interval of 2+ minutes is 85%.
6.7 The Normal Approximation to the Binomial Distribution
Definition: Approximate binomial distribution using normal distribution for large sample
sizes.
Conditions: np ≥ 10 and n(1 − p) ≥ 10.
Transformation:
o Mean: µ = np.
o Standard Deviation: σ = √(np(1 − p)).
Example: Normal Approximation
Convert binomial parameters to normal.
Use normal distribution properties to estimate probabilities.
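A sketch of the approximation with hypothetical parameters (n = 100, p = 0.5), including a continuity correction (a standard refinement, not discussed above):

```python
from math import erf, sqrt

# Normal approximation to the binomial (hypothetical n and p)
n, p = 100, 0.5
mu = n * p                     # 50.0
sigma = sqrt(n * p * (1 - p))  # 5.0

def normal_cdf(x, mu, sigma):
    # Normal CDF via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Approximate P(X <= 55), using a continuity correction of +0.5
prob = normal_cdf(55.5, mu, sigma)
print(prob)  # about 0.864
```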
6.8 Summary
Key formulas, concepts, and distributions (Normal, Uniform, Exponential).
Application of standard normal variables, mean, and standard deviation.
Practical examples and exercises included.
Chapter-7
Introduction
Objective: Understand the concepts of the sampling distribution, central limit theorem,
distribution of a sample’s mean, and sample proportions.
Purpose: Solve practical problems related to sampling distributions, with exercises for
practice.
7.2 Sampling Distribution
Definition: The probability distribution of a statistic.
Concept: If all possible samples of size n are drawn from a population and a statistic is
computed for each sample, the probability distribution of this statistic is called a sampling
distribution.
7.3 Sampling Distribution of the Mean (X̄ )
Definition: The sample mean is a random variable with its probability distribution.
Example: Drawing a sample of size n = 2 from a uniformly distributed population over the
integers 1 to 6.
Key Points:
o The distribution of the sample mean may differ from the population distribution.
o Probability calculations for sample means often involve z-scores and normal
distribution tables.
Central Limit Theorem (CLT)
Definition: As the sample size increases, the sampling distribution of the mean tends to a
normal distribution.
Conditions:
o Sample size n > 30 for non-normal populations.
o Any sample size if the population is normally distributed.
Formulas:
o Mean of the sample means: µx̄ = µ.
o Standard deviation of the sample means (Standard Error): σx̄ = σ/√n.
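A small simulation sketch of the central limit theorem; the population and the sample counts below are hypothetical:

```python
import random
import statistics

# CLT sketch: means of repeated samples cluster around mu with spread sigma/sqrt(n)
random.seed(42)
population = [random.uniform(0, 100) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 36
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(2_000)
]

print(mu, statistics.mean(sample_means))                 # close to each other
print(sigma / n ** 0.5, statistics.stdev(sample_means))  # both near the standard error
```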
Sampling Distribution of the Difference of Means
Definition: Comparing means from two different populations.
Formulas:
o Mean of the difference: µ(x̄1 − x̄2) = µ1 − µ2.
o Standard error of the difference: σ(x̄1 − x̄2) = √(σ1²/n1 + σ2²/n2).
Example: Probability calculations for comparing the lifetimes of products from two
manufacturers.
7.4 Sampling Distribution of the Proportion
Definition: Distribution of sample proportions based on the binomial distribution.
Formulas:
o Mean of the sample proportion: µp̄ = p.
o Standard deviation of the sample proportion: σp̄ = √(p(1 − p)/n).
Example: Calculating the probability of a sample proportion deviating from the
population proportion.
Sampling Distribution of the Difference of Proportions
Definition: Comparing proportions from two different populations.
Formulas:
o Mean of the difference: µ(p̄1 − p̄2) = p1 − p2.
o Standard error of the difference: σ(p̄1 − p̄2) = √(p1(1 − p1)/n1 + p2(1 − p2)/n2).
Example: Probability calculations for the difference in defect rates between products
from two companies.
7.5 Determining Sample Size
Factors to Consider:
1. Tolerable error.
2. Desired confidence level.
3. Population variance.
Formula: n = (Z(α/2) · σ / E)²,
where Z(α/2) is the z-score for the desired confidence level, σ is
the population standard deviation, and E is the tolerable error.
Example: Calculating the required sample size to estimate average income within a
specific confidence interval and error tolerance.
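The sample-size formula can be sketched with hypothetical inputs:

```python
from math import ceil

# Required sample size n = (z * sigma / E)^2 (all inputs hypothetical)
z = 1.96       # z-score for 95% confidence
sigma = 500.0  # assumed population standard deviation (e.g., of income)
E = 100.0      # tolerable error in the estimate

n = ceil((z * sigma / E) ** 2)  # always round up to the next whole unit
print(n)  # (1.96 * 5)^2 = 96.04, so n = 97
```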
7.6 Summary
Key formulas and concepts of sampling distribution and determining estimates within
samples.
Application of central limit theorem, distribution of sample means, and sample
proportions.
Examples and exercises to practice calculating sample sizes and understanding sampling
distributions.
Chapter-8
1. Basic Terms
Null Hypothesis (H0): Statement of no effect or status quo.
Alternative Hypothesis (Ha): Statement indicating the presence of an effect or
difference.
2. Types of Errors
Type I Error (α): Rejecting a true null hypothesis (false positive).
Type II Error (β): Accepting a false null hypothesis (false negative).
3. Significance Level
Common levels: 1%, 5%, 10%.
p-value: Probability of obtaining test results at least as extreme as the results observed,
under the assumption that the null hypothesis is correct.
Decision Rule: Reject H0 if p-value < significance level.
4. Steps in Hypothesis Testing
1. Formulate Hypotheses:
o Example: H0: μ = μ0, Ha: μ ≠ μ0.
2. Choose the Test:
o Z-test for known population standard deviation (σ) or large samples (n > 30).
o t-test for unknown population standard deviation or small samples (n ≤ 30).
3. Calculate Test Statistic:
o Z-test: Z = (X̄ − µ0) / (σ/√n)
o t-test: t = (X̄ − µ0) / (s/√n)
4. Find Critical Value:
o Use statistical tables or software.
5. Make Decision:
o Compare test statistic with critical value or use p-value.
5. One-Tail vs. Two-Tail Tests
One-Tail Test: Tests for effect in one direction (e.g., μ > μ0 or μ < μ0).
Two-Tail Test: Tests for effect in both directions (e.g., μ ≠ μ0).
6. Example Formulas
Z-Test: Z = (X̄ − µ0) / (σ/√n)
t-Test: t = (X̄ − µ0) / (s/√n)
o Degrees of Freedom (df): n − 1
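Both test statistics can be sketched directly from the formulas above; the sample figures are hypothetical:

```python
from math import sqrt

# One-sample Z and t test statistics (hypothetical sample figures)
def z_statistic(xbar, mu0, sigma, n):
    # Use when the population standard deviation sigma is known (or n > 30)
    return (xbar - mu0) / (sigma / sqrt(n))

def t_statistic(xbar, mu0, s, n):
    # Use when sigma is unknown and n <= 30; compare against t-table with df = n - 1
    return (xbar - mu0) / (s / sqrt(n))

print(z_statistic(xbar=52.0, mu0=50.0, sigma=8.0, n=64))  # 2.0
print(t_statistic(xbar=52.0, mu0=50.0, s=8.0, n=16))      # 1.0
```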
Key Points to Remember
Null Hypothesis (H0): Usually states no effect or no difference.
Alternative Hypothesis (Ha): Indicates the presence of an effect or difference.
Type I Error (α): Probability of rejecting true H0.
Type II Error (β): Probability of accepting false H0.
Significance Level (α): Commonly 0.05 (5%).
p-value: If p < α, reject H0.
Z-Test: Use when σ is known or n > 30.
t-Test: Use when σ is unknown and n ≤ 30.
One-Tail Test: Tests for a specific direction.
Two-Tail Test: Tests for any difference in either direction.
Chapter-9
Objectives
1. Test hypothesis of difference in two means with known population standard
deviation.
2. Test hypothesis of difference in two means with unknown population standard
deviation.
3. Calculate Z test and t-test in the case of two dependent populations.
4. Test hypothesis of differences in two population proportions.
5. Test hypothesis of the average difference in two related populations.
Key Concepts and Steps
9.1 Introduction to Two-Sample Hypothesis Testing
Comparing Two Independent Populations
Two-Sample Z-Test:
o Used when population standard deviations (σ1 and σ2) are known or sample sizes
are large (n > 30).
o Formula: Z = ((X̄1 − X̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)
o Example Steps:
1. Formulate H0: µ1 = µ2 and Ha: µ1 ≠ µ2.
2. Calculate Z-statistic.
3. Compare with critical value from Z-table.
Two-Sample t-Test:
o Used when population standard deviations are unknown.
o Formula: t = ((X̄1 − X̄2) − (µ1 − µ2)) / √(s1²/n1 + s2²/n2)
o Degrees of Freedom (df): Smaller of n1 − 1 and n2 − 1.
Comparing Two Related Populations
Paired t-Test:
o Used when samples are related (e.g., before and after measurements).
o Formula: t = D̄ / (sD/√n)
o D̄: Mean of the differences; sD: Standard deviation of the differences.
Comparing Two Population Proportions
Z-Test for Proportions:
o Formula: Z = (p̄1 − p̄2) / √(p̂(1 − p̂)(1/n1 + 1/n2))
o p̂ is the pooled proportion: p̂ = (x1 + x2) / (n1 + n2).
9.2 Detailed Steps for Z-Test and t-Test
Steps for Hypothesis Testing
1. Formulate Hypotheses:
o H0: µ1 = µ2
o Ha: µ1 ≠ µ2 (two-tailed) or Ha: µ1 > µ2 / Ha: µ1 < µ2 (one-tailed)
2. Select Significance Level (α):
o Common choices: 0.01, 0.05, 0.10
3. Choose the Test:
o Z-Test for known σ or large samples (n > 30).
o t-Test for unknown σ or small samples (n ≤ 30).
4. Calculate the Test Statistic:
o Z-Test Formula: Z = (X̄1 − X̄2) / √(σ1²/n1 + σ2²/n2)
o t-Test Formula: t = (X̄1 − X̄2) / √(s1²/n1 + s2²/n2)
5. Find Critical Value or p-Value:
o Use Z-table or t-table based on the selected test.
6. Make Decision:
o Compare calculated value with critical value.
o If |calculated value| > critical value, reject H0.
Examples
1. Comparing Means of Electric Bulbs:
o Given:
n1 = 100, X̄1 = 1300, σ1 = 82
n2 = 100, X̄2 = 1288, σ2 = 93
o Test Statistic: Z = (1300 − 1288) / √(82²/100 + 93²/100) = 0.968
o Decision:
Critical value for α = 0.05 is ±1.96.
Since 0.968 < 1.96, do not reject H0.
2. Comparing Proportions of Tea Consumption:
o Given:
n1 = 100, x1 = 60, p̄1 = 0.60
n2 = 200, x2 = 100, p̄2 = 0.50
o Pooled proportion: p̂ = (60 + 100) / (100 + 200) ≈ 0.533
o Test Statistic: Z = (0.60 − 0.50) / √(0.533 × 0.467 × (1/100 + 1/200)) ≈ 1.64
o Decision:
Critical value for α = 0.05 is ±1.96.
Since 1.64 < 1.96, do not reject H0.
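Both worked examples can be rechecked in a few lines of Python, using only the counts given in the text (the tea example recomputes the pooled proportion from the raw counts):

```python
from math import sqrt

# 1. Two-sample Z-test for the electric-bulb means
z_bulbs = (1300 - 1288) / sqrt(82**2 / 100 + 93**2 / 100)
print(round(z_bulbs, 3))  # 0.968 < 1.96, so do not reject H0

# 2. Two-proportion Z-test for tea consumption, with the pooled proportion
n1, x1, n2, x2 = 100, 60, 200, 100
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # (60 + 100) / 300, about 0.533
z_tea = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print(round(z_tea, 2))  # 1.64 < 1.96, so do not reject H0
```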
Summary
1. Use Z-test for large samples or known σ\sigmaσ.
2. Use t-test for small samples or unknown σ\sigmaσ.
3. For paired samples, use the paired t-test.
4. For proportions, use the Z-test for proportions.
5. Follow standard steps: Formulate hypotheses, choose significance level, select test,
calculate statistic, find critical value, make a decision.
Chapter-10