
SESSION 1 SUMMARY

STATISTICS is the science of collecting, organizing, analyzing, interpreting and presenting data.
A STATISTIC is a single measure (number) used to summarize a sample data set.

There are TWO kinds of statistics:


 DESCRIPTIVE STATISTICS refers to the collection, presentation, and summary of data (using charts, graphs, or numerical summaries).
 INFERENTIAL STATISTICS refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.

EMPIRICAL DATA represent data collected through observation and experiments.

There are EIGHT pitfalls in statistics:


PITFALL 1: Conclusions from Small Samples
PITFALL 2: Conclusions from Nonrandom Samples
PITFALL 3: Conclusions from Rare Events
PITFALL 4: Poor Survey Methods
PITFALL 5: Assuming a Causal Link
PITFALL 6: Generalization to Individuals
PITFALL 7: Unconscious Bias
PITFALL 8: Significance versus Importance

An OBSERVATION is a single member of a collection of items that we want to study.
A VARIABLE is a characteristic of the subject or individual.
A DATA SET consists of all the values of all of the variables for all of the observations we have chosen to observe.

Data Set       Variables
UNIVARIATE     One
BIVARIATE      Two
MULTIVARIATE   More than two

Type of Data:
- Categorical (Qualitative): Verbal Label or Coded
- Numerical (Quantitative): Discrete or Continuous

TIME SERIES DATA are collected from one observation at different, equally spaced points in time → TRENDS AND PATTERNS OVER TIME.
CROSS-SECTIONAL DATA are collected from multiple observations at the same point in time → VARIATION AMONG OBSERVATIONS & RELATIONSHIPS.
POOLED CROSS-SECTIONAL AND TIME SERIES DATA (PANEL DATA) combine both: they are collected from multiple observations at different, equally spaced points in time.
NOTICE:
DATUM is the singular of DATA.
PERIODICITY is the tendency of an event or series of events to happen repeatedly in a fixed pattern.
Level of Measurement   Characteristics
NOMINAL                Categories only
ORDINAL                Rank has meaning
INTERVAL               Distance has meaning
RATIO                  Meaningful zero exists

SESSION 2 SUMMARY
A LIKERT SCALE is a special case of interval data frequently used in survey research.
SAMPLE involves looking only at some items selected from the population.
CENSUS is an examination of all items in a defined population.

STATISTICS are computed from a sample of n items, chosen from a population of N items.
Statistics can be used as estimates of PARAMETERS found in the population.
SYMBOLS are used to represent population parameters and sample statistics.

The TARGET POPULATION is the population we are interested in.
The SAMPLING FRAME is the group from which we take the sample.

If we allow duplicates when sampling, then we are sampling WITH REPLACEMENT.
If we do not allow duplicates when sampling, then we are sampling
WITHOUT REPLACEMENT.
DUPLICATES are unlikely when n is MUCH SMALLER than N.

SIMPLE RANDOM SAMPLING means that we use random numbers to select items from the list.
SYSTEMATIC SAMPLING means that we sample by choosing every kth
item from a list, starting from a randomly chosen entry on the list.
STRATIFIED SAMPLING means that we utilize prior information about the
population and sample when the population can be divided into relatively
HOMOGENEOUS SUBGROUPS of known size (STRATA). A simple
random sample of the desired size is taken within each STRATUM.
CLUSTER SAMPLING means that we sample based on strata that consist of geographical regions. It includes:
- One-stage cluster sampling: the sample consists of all elements in each of k randomly chosen subregions (CLUSTERS).
- Two-stage cluster sampling: first choose k subregions, then choose a random sample of elements within each cluster.
JUDGMENT SAMPLING is a non-probability sampling method that relies on
the expertise of the sampler to choose items that are representative of the
population and can be affected by subconscious bias.
QUOTA SAMPLING is a special kind of judgment sampling, in which the
interviewer chooses a certain number of people in each category.
CONVENIENCE SAMPLE is a sampling method that takes advantage of
whatever sample is available at that moment and is a quick way to sample.
A FOCUS GROUP is a panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering.
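As a rough illustration of how the random selection methods above can be carried out, here is a small Python sketch; the sampling frame and sample size are hypothetical:

```python
import random

frame = list(range(1, 501))   # hypothetical sampling frame of N = 500 item IDs
n = 10                        # desired sample size

# Simple random sampling (without replacement): random numbers pick items from the list.
srs = random.sample(frame, n)

# Systematic sampling: choose every k-th item, starting from a random entry in the first k.
k = len(frame) // n
start = random.randrange(k)
systematic = frame[start::k][:n]

print("Simple random:", sorted(srs))
print("Systematic:   ", systematic)
```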

SESSION 3 SUMMARY
Methods of organizing, exploring, and summarizing data include:
- VISUAL (CHARTS and GRAPHS) provides insight into characteristics
of a data set WITHOUT USING MATHEMATICS.
- NUMERICAL (STATISTICS or TABLES) provides insight into
characteristics of a data set USING MATHEMATICS.

A DOT PLOT is the simplest graphical display of n individual values of numerical data.
A STACKED DOT PLOT compares TWO OR MORE groups using a
common X-axis scale.
A FREQUENCY DISTRIBUTION is a table formed by classifying n data
values into k classes (BINS).
BIN LIMITS define the values to be included in each bin. Widths must all be
the same except when we have open-ended bins.
FREQUENCIES are the number of observations within each bin.
They can be expressed as RELATIVE FREQUENCIES (frequency divided by the total) or PERCENTAGES (relative frequency times 100).
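A small sketch of how a frequency distribution is tabulated; the data values and bin limits below are invented for illustration:

```python
data = [23, 27, 31, 35, 36, 41, 44, 48, 52, 55, 58, 61, 64, 67, 72]  # hypothetical values
bins = [(20, 35), (35, 50), (50, 65), (65, 80)]   # k = 4 bins with equal widths

n = len(data)
for lo, hi in bins:
    # count values falling in [lo, hi)
    freq = sum(lo <= x < hi for x in data)
    rel = freq / n                 # relative frequency
    print(f"[{lo}, {hi}): frequency={freq}, relative={rel:.3f}, percent={100*rel:.1f}%")
```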

A HISTOGRAM is a graphical representation of a frequency distribution (a bar chart).
X-AXIS shows end points of each bin.
Y-AXIS shows frequency within each bin.

MODAL CLASS is a histogram bar that is higher than those on either side.
Modal classes may be artifacts of the way bin limits are chosen.
There are THREE types of modal classes:
- UNIMODAL – a single modal class.
- BIMODAL – two modal classes.
- MULTIMODAL – more than two modal classes.

A histogram may suggest the shape of the population. It is influenced by the NUMBER OF BINS and BIN LIMITS.
SKEWNESS is indicated by the direction of the longer tail of the histogram.
LEFT-SKEWED – (NEGATIVELY SKEWED) has a longer left tail.
RIGHT-SKEWED – (POSITIVELY SKEWED) has a longer right tail.
SYMMETRIC – both tail areas are the same.
PROTOTYPE DISTRIBUTION SHAPES

A SIMPLE LINE CHART is used to display a time series, spot trends, or compare time periods, and can display several variables at once.
TWO-SCALE LINE CHART is used to compare variables that differ in
magnitude or are measured in different units.

ARITHMETIC SCALE – distances on the Y-axis are proportional to the magnitude of the variable being displayed.
LOGARITHMIC SCALE – (RATIO SCALE) equal distances represent
equal ratios.

A LOG SCALE is useful for time series data that might be expected to grow at
a compound annual percentage rate. Use a log scale for the vertical axis when
data vary over a wide range (by more than an order of magnitude). It reveals
whether the quantity is growing at:
- INCREASING PERCENT (CONCAVE UPWARD).
- CONSTANT PERCENT (STRAIGHT LINE).
- DECLINING PERCENT (CONCAVE DOWNWARD).
COLUMN CHART is a vertical display of the data.
BAR CHART is a horizontal display of the data.
PARETO CHART is a special type of bar chart used in quality management to display the frequency of defects or errors of different types. Categories are displayed in descending order of frequency. Focus on the SIGNIFICANT FEW.
In a STACKED chart, BAR HEIGHT is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and the total.
A PIE CHART can only convey a general idea of the data. Pie charts should be used to portray data that sum to a total. A pie chart should have only a few slices. Each slice can be labeled with data values or percents.

SCATTER PLOTS can convey patterns in data pairs that would not be apparent from a table. A scatter plot is a starting point for BIVARIATE DATA ANALYSIS, in which we investigate the association and relationship between two variables.
There are ELEVEN deceptive graphing techniques:
ERROR 1: Nonzero Origin
ERROR 2: Elastic Graph Proportions
ERROR 3: Dramatic Title and Distracting Pictures
ERROR 4: 3-D and Novelty Graphs
ERROR 5: Rotated Graphs
ERROR 6: Unclear Definitions or Scales
ERROR 7: Vague Sources
ERROR 8: Complex Graphs
ERROR 9: Gratuitous Effects
ERROR 10: Estimated Data
ERROR 11: Area Trick

CUMULATIVE frequency is the sum from the smallest value up to the value under consideration.
A RANDOM VARIABLE is a variable whose value is not known in advance.
A RANDOM EXPERIMENT is a process whose results are not known in advance.

In Stata, a NUMBER is colored BLACK, a WORD is colored RED, and a UNIT OF DEFINITION is colored BLUE.

The MEDIAN (M) is the 50TH PERCENTILE or MIDPOINT of the ordered sample data. M separates the upper and lower halves of the ordered observations.
If n is ODD, the median is THE MIDDLE OBSERVATION in the ordered
data set. If n is EVEN, the median is the average of THE MIDDLE TWO
OBSERVATIONS in the ordered data set.
POPULATION MEAN: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

SAMPLE MEAN: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

N: Population size; n: Sample size
X: Random variable; x: Specific value
µ = E(X): Expected value of a discrete random variable X.


MEAN is SENSITIVE to EXTREME VALUES (OUTLIERS).
MEDIAN is INSENSITIVE to EXTREME VALUES (OUTLIERS).
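A quick numerical illustration of this sensitivity, using an invented sample and one added extreme value:

```python
from statistics import mean, median

sample = [52, 55, 57, 60, 63]          # hypothetical sample
with_outlier = sample + [250]          # add one extreme value

print(mean(sample), median(sample))              # 57.4  57
print(mean(with_outlier), median(with_outlier))  # mean jumps to 89.5; median moves only to 58.5
```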

SESSION 4 SUMMARY
The MODE is the most frequently occurring data value; a data set may have multiple modes or no mode. The mode is most useful for DISCRETE or CATEGORICAL DATA with only a few distinct data values. For CONTINUOUS DATA or DATA WITH A WIDE RANGE, the mode is rarely useful.
Type of Variable                Best Measure of Central Tendency
Nominal                         Mode
Ordinal                         Median
Interval / Ratio (not skewed)   Mean
Interval / Ratio (skewed)       Median
VARIATION is the SPREAD of data points about the center of the distribution
in a sample.
RANGE is the difference between the largest and smallest observation:
$Range = x_{max} - x_{min}$

The GROWTH RATE is a variation on the geometric mean used to find the average growth rate for a time series:
$GR = \sqrt[n-1]{\dfrac{x_n}{x_1}} - 1$
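A minimal sketch of the growth-rate formula applied to an invented five-period series:

```python
# Average growth rate GR = (x_n / x_1)**(1/(n-1)) - 1 for a time series of n values.
revenue = [100.0, 112.0, 121.0, 138.0, 150.0]   # hypothetical yearly values, n = 5
n = len(revenue)
gr = (revenue[-1] / revenue[0]) ** (1 / (n - 1)) - 1
print(f"average growth rate = {gr:.4%}")        # about 10.67% per period
```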

POPULATION VARIANCE: $\sigma^2 = \dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$

SAMPLE VARIANCE: $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

POPULATION STANDARD DEVIATION: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$

SAMPLE STANDARD DEVIATION: $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

POPULATION COEFFICIENT OF VARIATION: $CV = \dfrac{\sigma}{\mu} \times 100$

SAMPLE COEFFICIENT OF VARIATION: $CV = \dfrac{s}{\bar{x}} \times 100$

POPULATION MEAN ABSOLUTE DEVIATION: $MAD = \dfrac{\sum_{i=1}^{N}|x_i - \mu|}{N}$

SAMPLE MEAN ABSOLUTE DEVIATION: $MAD = \dfrac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}$

DEVIATION:
$x_i - \mu$ ($\mu$ is the population mean)
$x_i - \bar{x}$ ($\bar{x}$ is the sample mean)

The EMPIRICAL RULE states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of the data. For:
• k = 1, 68.26% will lie within μ ± 1σ
• k = 2, 95.44% will lie within μ ± 2σ
• k = 3, 99.73% will lie within μ ± 3σ
A STANDARDIZED VARIABLE (z) redefines each observation in terms of the number of standard deviations from the mean.
SAMPLE: $z_i = \dfrac{x_i - \bar{x}}{s}$    POPULATION: $z_i = \dfrac{x_i - \mu}{\sigma}$
A NEGATIVE z means that the observation is to the LEFT of the mean; a POSITIVE z means that the observation is to the RIGHT of the mean.

For a normal distribution, the range of values is almost 6σ (from µ − 3σ to µ + 3σ). If you know the range R (high − low), you can estimate the standard deviation as σ = R/6.
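A small sketch that standardizes one hypothetical observation and recomputes the empirical-rule percentages from the standard normal CDF; the mean, standard deviation, and x value below are made up:

```python
from statistics import NormalDist

mu, sigma = 70.0, 5.0            # hypothetical population mean and standard deviation
x = 82.0
z = (x - mu) / sigma             # standardized value: 2.4 standard deviations above the mean

nd = NormalDist()                # standard normal distribution
for k in (1, 2, 3):
    pct = nd.cdf(k) - nd.cdf(-k)                  # area within mu +/- k*sigma
    print(f"within mu +/- {k} sigma: {pct:.4%}")  # 68.27%, 95.45%, 99.73%
print("z =", z)
```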

PERCENTILES are data that have been divided into 100 groups.
DECILES are data that have been divided into 10 groups.
QUINTILES are data that have been divided into 5 groups.
QUARTILES are data that have been divided into 4 groups.
Median (Q2)
Interquartile Range (IQR = Q3 − Q1)

              Inner fences          Outer fences
Lower fence   Q1 − 1.5 (Q3 − Q1)    Q1 − 3.0 (Q3 − Q1)
Upper fence   Q3 + 1.5 (Q3 − Q1)    Q3 + 3.0 (Q3 − Q1)

MIDHINGE: $Midhinge = \dfrac{Q_1 + Q_3}{2}$
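A sketch computing the quartiles, the IQR, the fences, and the midhinge for an invented sample (note that statistics.quantiles uses one common quartile convention; a textbook method may differ slightly):

```python
from statistics import quantiles

data = [12, 14, 15, 16, 18, 19, 21, 22, 24, 30, 31, 45]   # hypothetical sample
q1, q2, q3 = quantiles(data, n=4, method="inclusive")      # quartiles (Q2 is the median)
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences: values outside are unusual
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # outer fences: values outside are outliers
midhinge = (q1 + q3) / 2

print(q1, q2, q3, iqr, inner, outer, midhinge)
```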

The CORRELATION COEFFICIENT is a statistic that describes the degree of LINEARITY between paired observations on two quantitative variables X and Y.

POPULATION: $\rho = \dfrac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{N}(x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \mu_y)^2}}$

SAMPLE: $r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$

Note: $-1 \le r \le +1$

The COVARIANCE of two random variables X and Y measures the degree to which the values of X and Y change together.

POPULATION: $\sigma_{xy} = \dfrac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$

SAMPLE: $s_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$

The CORRELATION COEFFICIENT is the COVARIANCE divided by the product of the standard deviations of X and Y.

POPULATION: $\rho = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$

SAMPLE: $r = \dfrac{s_{xy}}{s_x s_y}$
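A short sketch computing the sample covariance and the correlation coefficient from invented paired data:

```python
from math import sqrt

x = [2.0, 4.0, 6.0, 8.0, 10.0]        # hypothetical paired observations
y = [3.1, 4.9, 7.2, 8.8, 11.3]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)   # sample covariance
s_x = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))                   # sample std devs
s_y = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
r = s_xy / (s_x * s_y)                                                    # correlation coefficient

print(f"s_xy = {s_xy:.4f}, r = {r:.4f}")    # r lies between -1 and +1
```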
WEIGHTED MEAN: $\bar{x} = \sum_{j=1}^{k} w_j x_j$ where $\sum_{j=1}^{k} w_j = 1.00$

GROUP MEAN: $\bar{x} = \dfrac{\sum_{j=1}^{k} f_j m_j}{n}$

SKEWNESS: $Skewness = \dfrac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^3$

KURTOSIS: $Kurtosis = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^4 - \dfrac{3(n-1)^2}{(n-2)(n-3)}$

SESSION 5 SUMMARY
A RANDOM EXPERIMENT is an observational process whose results cannot
be known in advance. The set of all possible OUTCOMES is the SAMPLE
SPACE for the experiment. A sample space can be finite (discrete) or infinite. If the outcome is a CONTINUOUS measurement, the sample space cannot be listed but can be described by a rule.

An EVENT is any subset of outcomes in the sample space. A simple event, or elementary event, is a single outcome. A DISCRETE SAMPLE SPACE consists of all the simple events (Ei):
S = {E1, E2, …, En}.

The PROBABILITY of an event is a number that measures the relative likelihood that the event will occur. The probability of event A, denoted as P(A), must lie within the interval from 0 to 1:
0 ≤ P(A) ≤ 1
 If P(A) = 0, then the event CANNOT occur.
 If P(A) = 1, then the event IS CERTAIN to occur.
P(S) = P(E1) + P(E2) + … + P(En) = 1

The CLASSICAL (A PRIORI) APPROACH refers to the process of assigning probabilities before the event is observed or the experiment is conducted. A priori probabilities are based on logic, not experience. Instead of performing the experiment, we can use deduction to determine the probability of an event.
THE LAW OF LARGE NUMBERS says that as the number of trials
increases, any empirical probability approaches its theoretical limit.
The EMPIRICAL (RELATIVE FREQUENCY) APPROACH assigns probabilities by counting the frequency (f) of observed outcomes defined on the experimental sample space:
$P = \dfrac{f}{n}$

The SUBJECTIVE APPROACH reflects someone’s informed judgment about the likelihood of an event and is used when there is no repeatable random experiment.

A TREE DIAGRAM helps you visualize all possible outcomes and starts with
a contingency table. The tree diagram shows all events along with their
marginal, conditional, and joint probabilities.
A JOINT PROBABILITY represents the intersection of two events in a cross-
tabulation table.
The MARGINAL PROBABILITY of a single event is found by dividing a
row or column total by the total sample size.

Type of Probability – Symbol & Calculation
- Complement: P(A) + P(A′) = 1, so P(A′) = 1 − P(A)
- Union (A or B): A ∪ B (∪: cup)
- Intersection (A and B): A ∩ B (∩: cap)
- Law of Addition (General): P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- Mutually Exclusive Events: A ∩ B = ∅ → P(A ∩ B) = 0
- Law of Addition (Special, for mutually exclusive events): P(A ∪ B) = P(A) + P(B)
- Dichotomous (Binary) Events
- Collectively Exhaustive Events
- Conditional Probability: P(A|B) = P(A ∩ B) / P(B), for P(B) > 0
- General Law of Multiplication: P(A ∩ B) = P(A|B) × P(B)
- Odds (in favour): P(A) / P(A′) = P(A) / (1 − P(A))
- Odds (against): P(A′) / P(A) = (1 − P(A)) / P(A)
- Independent Events: P(A|B) = P(A ∩ B) / P(B) = P(A) × P(B) / P(B) = P(A)
- Multiplication Law (Independent Events): P(A1 ∩ … ∩ An) = P(A1) × … × P(An)
- Bayes' Theorem: P(A|B) = P(A) × P(B|A) / P(B)

SESSION 6 SUMMARY
BAYES’ THEOREM states that the PRIOR (MARGINAL) probability of an
event B is revised after event A has been considered to yield a POSTERIOR
(CONDITIONAL) probability.
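A small numerical sketch of Bayes' Theorem; the prior P(B) and the conditional probabilities of A below are invented for illustration:

```python
# Hypothetical screening example: P(B) is the prior probability of a condition,
# P(A|B) and P(A|B') are the probabilities of a positive test given B and not-B.
p_b = 0.02
p_a_given_b = 0.95
p_a_given_not_b = 0.10

# Total probability of A, then the posterior P(B|A) via Bayes' Theorem.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
p_b_given_a = p_a_given_b * p_b / p_a

print(f"P(A) = {p_a:.4f}, posterior P(B|A) = {p_b_given_a:.4f}")   # ~0.1624
```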

If event A can occur in n1 ways and event B can occur in n2 ways, then events A and B can occur in n1 × n2 ways.
In general, m events can occur in n1 × n2 × … × nm ways.

n FACTORIAL is the number of ways that n items can be arranged in a particular order; it is the product of all integers from 1 to n. Factorials are useful for counting the possible arrangements of any n items.
$n! = n \times (n-1) \times (n-2) \times \dots \times 1$

A PERMUTATION is an arrangement, in a particular order, of r randomly sampled items from a group of n:
${}_nP_r = \dfrac{n!}{(n-r)!}$

A COMBINATION is an arrangement of r items chosen at random from n items where the order of the selected items is not important:
${}_nC_r = \dfrac{n!}{r!(n-r)!}$
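A quick check of the factorial, permutation, and combination formulas using Python's math module; the values n = 8 and r = 3 are arbitrary:

```python
from math import factorial, perm, comb

n, r = 8, 3
print(factorial(n))        # 8! = 40320 arrangements of all 8 items
print(perm(n, r))          # nPr = n!/(n-r)! = 336 ordered arrangements of 3 of the 8
print(comb(n, r))          # nCr = n!/(r!(n-r)!) = 56 unordered selections of 3 of the 8
print(perm(n, r) == factorial(n) // factorial(n - r))   # True: matches the formula
```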
A RANDOM VARIABLE is a function or rule that assigns a numerical value
to each outcome in the sample space of a random experiment.

A DISCRETE RANDOM VARIABLE has a countable number of distinct values.
A DISCRETE PROBABILITY DISTRIBUTION assigns a probability to
each value of a discrete random variable X.
$0 \le P(x_i) \le 1$ (the probability for any given value of X)
$\sum_{i=1}^{n} P(x_i) = 1$ (the sum over all values of X)

A PROBABILITY DISTRIBUTION FUNCTION (PDF) is a mathematical function that shows the probability of each X-value.
A CUMULATIVE DISTRIBUTION FUNCTION (CDF) is a mathematical
function that shows the cumulative sum of probabilities, adding from the
smallest to the largest X-value, gradually approaching unity.

The EXPECTED VALUE E(X) of a discrete random variable is the sum of all
X-values weighted by their respective probabilities. E(X) is a measure of central
tendency.
$\mu = E(X) = \sum_{i=1}^{N} x_i P(x_i)$

The VARIANCE is a weighted average of the dispersion about the mean and a
measure of variability.
$Var(X) = \sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2 P(x_i)$

The STANDARD DEVIATION is the square root of the variance:
$\sigma = \sqrt{\sigma^2} = \sqrt{Var(X)}$
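A short sketch computing E(X), Var(X), and σ for an invented discrete distribution:

```python
from math import sqrt

# Hypothetical discrete distribution: values of X and their probabilities (sum to 1).
x_vals = [0, 1, 2, 3]
probs  = [0.1, 0.3, 0.4, 0.2]

mu = sum(x * p for x, p in zip(x_vals, probs))                # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(x_vals, probs))   # Var(X)
sigma = sqrt(var)                                             # standard deviation

print(mu, var, sigma)   # 1.7, 0.81, 0.9
```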

The UNIFORM DISTRIBUTION describes a random variable with a finite number of integer values from a to b (the only two parameters). Each value of the random variable is equally likely to occur.
SESSION 7 & 8 SUMMARY
A random experiment with only 2 outcomes is a BERNOULLI
EXPERIMENT.
One outcome is arbitrarily labeled a “success” (denoted X = 1) and the other a
“failure” (denoted X = 0).
π is the P(success), 1 – π is the P(failure). “Success” is defined as the less likely
outcome so that π < 0.5 for convenience.
P(0) + P(1) = (1 – π) + π = 1 and 0 ≤ π ≤ 1
The expected value (mean) and variance of a Bernoulli experiment are E(X) = π and V(X) = π(1 – π).

The BINOMIAL DISTRIBUTION arises when a Bernoulli experiment is repeated n times. Each trial is independent, so the probability of success π remains constant on each trial.
In a binomial experiment, we are interested in X = number of successes in n
trials.
X = X1 + X2 + … + Xn.

The probability of a particular number of successes P(X) is determined by the parameters n and π.

For COMPOUND EVENTS, individual probabilities are added to obtain any desired event probability. Cumulative probabilities can be evaluated by summing individual X probabilities.

The POISSON DISTRIBUTION describes the number of occurrences within a continuous interval – a randomly chosen unit of time or space.
The events in Poisson distribution occur randomly and independently over a
continuum of time or space. We will call the continuum “time” since the most
common Poisson application is modeling arrivals per unit of time. The Poisson
model’s only parameter is λ, which represents the mean number of events per
unit of time or space.
The Poisson distribution may be used to approximate a binomial by setting λ =
nπ. This approximation is helpful when the binomial calculation is difficult.
The general rule for a good approximation is that n should be “large” and π should be “small.” A common rule of thumb says the approximation is adequate if n ≥ 20 and π ≤ 0.05.
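A sketch comparing exact binomial probabilities with their Poisson approximation for hypothetical n = 200 and π = 0.02 (so λ = nπ = 4):

```python
from math import comb, exp, factorial

n, pi = 200, 0.02          # n large, pi small, so the approximation should be adequate
lam = n * pi               # Poisson parameter lambda = n*pi = 4

for x in range(0, 9):
    binom = comb(n, x) * pi**x * (1 - pi)**(n - x)       # exact binomial probability
    poisson = lam**x * exp(-lam) / factorial(x)          # Poisson approximation
    print(f"x={x}: binomial={binom:.4f}, poisson={poisson:.4f}")
```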

DISCRETE VARIABLE – each value of X has its own probability P(X).
CONTINUOUS VARIABLE – events are intervals and probabilities are areas under continuous curves. A single point has no probability.
A CONTINUOUS PDF is denoted f(x); it must be nonnegative, and the total area under the curve = 1.
A CONTINUOUS CDF is denoted F(x); it shows P(X ≤ x), the cumulative proportion of scores, and is useful for finding probabilities.
CONTINUOUS PROBABILITY FUNCTIONS, unlike discrete distributions, have probability 0 at any single point. The entire area under any PDF, by definition, is 1.

The NORMAL DISTRIBUTION is symmetric and unimodal about the mean. We can use the normal distribution to approximate the binomial and Poisson distributions.

If events per unit of time follow a Poisson distribution, the time until the next
event follows the EXPONENTIAL DISTRIBUTION. The time until the next
event is a continuous variable.
Type of Distribution – Formula
- Discrete Uniform Distribution: $P(X = x) = \dfrac{1}{b - a + 1}$ for x = a, a+1, …, b
- Binomial Distribution: $P(X = x) = {}_nC_x\,\pi^x (1-\pi)^{n-x}$
- Poisson Distribution: $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$
- Normal Distribution: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
- Standard Normal Distribution: $z = \dfrac{x-\mu}{\sigma}$
- Exponential Distribution: $f(x) = \lambda e^{-\lambda x}$, with $P(X \le x) = 1 - e^{-\lambda x}$

SESSION 9 SUMMARY
SAMPLE MEAN is considered a random variable.
SAMPLE STATISTIC is a random variable whose value depends on which
population items are included in the random sample.
SAMPLING DISTRIBUTION is a distribution of a sample statistic when we
repeat sampling with the same sample size.
ESTIMATOR is a statistic derived from a sample to infer the value of a
population parameter.
ESTIMATE is the value of the estimator in a particular sample.

The CENTRAL LIMIT THEOREM states that the sample mean will have an approximately normal distribution even if the population is not normal, as long as the sample size is large enough.
Population shape is SYMMETRIC: n ≥ 15
Population shape is NOT KNOWN: n ≥ 30
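A simulation sketch of the Central Limit Theorem using a deliberately skewed (exponential) population; the population, sample size, and number of repetitions are arbitrary choices:

```python
import random
from statistics import mean, stdev

random.seed(1)
population = [random.expovariate(1.0) for _ in range(100_000)]   # strongly right-skewed

n = 30                                                           # "large enough" sample size
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# The sampling distribution of the mean is centred near the population mean (~1.0),
# its spread is roughly sigma/sqrt(n), and its shape is approximately normal.
print(mean(sample_means), stdev(sample_means))
```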

SESSION 10 & 11 SUMMARY


The POPULATION PROPORTION is denoted π. The number of successes X in a sample of n items is a random variable with:
$\mu_X = n\pi$
$\sigma_X^2 = n\pi(1-\pi) \;\rightarrow\; \sigma_X = \sqrt{n\pi(1-\pi)}$
The SAMPLE PROPORTION is considered a sample mean:
$p = \dfrac{x}{n}, \qquad \mu_p = \pi$
$\sigma_p^2 = \dfrac{\pi(1-\pi)}{n} \;\rightarrow\; \sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
Standardize: $z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$
CONFIDENCE INTERVAL
Standardize: $\bar{x} \sim N\!\left(\mu_x,\ \dfrac{\sigma_x^2}{n}\right)$ to $z_{\bar{x}} = \dfrac{\bar{x} - \mu_x}{\sigma_x/\sqrt{n}} \sim N(0,1)$

If $\sigma_x$ is known:
 Lower limit: $\bar{x} - 1.96\,\dfrac{\sigma_x}{\sqrt{n}}$
 Upper limit: $\bar{x} + 1.96\,\dfrac{\sigma_x}{\sqrt{n}}$
$95\%\ CI = \left(\bar{x} - 1.96\,\dfrac{\sigma_x}{\sqrt{n}};\ \bar{x} + 1.96\,\dfrac{\sigma_x}{\sqrt{n}}\right)$
$99\%\ CI = \left(\bar{x} - 2.58\,\dfrac{\sigma_x}{\sqrt{n}};\ \bar{x} + 2.58\,\dfrac{\sigma_x}{\sqrt{n}}\right)$

If $\sigma_x$ is not known:
$s_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}, \qquad t = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$

$(1-\alpha)100\%\ CI = \bar{x} \pm z_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}}$, with $z_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}} = e$ (margin of error)
Sample size: $e = z_{1-\alpha/2} \times \dfrac{\sigma_x}{\sqrt{n}} \;\rightarrow\; n = \dfrac{(z_{1-\alpha/2})^2\,\sigma_x^2}{e^2}$

$(1-\alpha)100\%\ CI = \bar{x} \pm t_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}}$, with $t_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}} = e$ (margin of error)
Sample size: $e = t_{1-\alpha/2} \times \dfrac{\sigma_x}{\sqrt{n}} \;\rightarrow\; n = \dfrac{(t_{1-\alpha/2})^2\,\sigma_x^2}{e^2}$
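A minimal sketch of a confidence interval for the mean with hypothetical data; it uses the z critical value for simplicity, although with σ unknown and a small n the t critical value above is the better choice:

```python
from math import sqrt
from statistics import mean, stdev, NormalDist

sample = [48.2, 51.5, 49.8, 53.1, 50.4, 47.9, 52.6, 50.1]   # hypothetical data
n = len(sample)
xbar, s = mean(sample), stdev(sample)        # stdev() already divides by n - 1

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)      # 1.96; a t critical value (df = n-1)
                                             # would be slightly wider for small n
e = z * s / sqrt(n)                          # margin of error
print(f"{(1-alpha)*100:.0f}% CI: ({xbar - e:.2f}, {xbar + e:.2f})")
```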
$(1-\alpha)100\%\ CI = p \pm z_{1-\alpha/2}\,\sqrt{\dfrac{p(1-p)}{n}}$, with $z_{1-\alpha/2}\,\sqrt{\dfrac{p(1-p)}{n}} = e$ (margin of error)
Sample size: $e = z_{1-\alpha/2}\,\sqrt{\dfrac{p(1-p)}{n}} \;\rightarrow\; n = \dfrac{(z_{1-\alpha/2})^2\,p(1-p)}{e^2}$

ONE-SIDED (ONE-TAILED) TEST
A TYPE I ERROR is rejecting H0 when in fact H0 is true: P(Type I error) = α.
A TYPE II ERROR is not rejecting H0 when in fact H0 is false: P(Type II error) = β.

H0: μ = a
H1: μ > a
Decide on the significance level α.
Look up the standard normal CDF table or the t table for the critical value c.
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_x/\sqrt{n}}$ or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$
Compare with c:
 If $z_{\bar{x}} < c$ (or $t_{\bar{x}} < c$), do not reject H0 at the α level.
 If $z_{\bar{x}} > c$ (or $t_{\bar{x}} > c$), reject H0 at the α level.

H0: π = a
H1: π > a
Decide on the significance level α.
Look up the standard normal CDF table for the critical value c.
$z_p = \dfrac{p - \pi}{\sqrt{\dfrac{p(1-p)}{n}}}$
Compare with c in the same way as above.


H0: μ = a
H1: μ ≠ a
Decide on the significance level α.
Look up the standard normal CDF table or the t table for the critical value c.
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_x/\sqrt{n}}$ or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$
Compare with c:
 If $|z_{\bar{x}}| < c$ (or $|t_{\bar{x}}| < c$), do not reject H0 at the α level.
 If $|z_{\bar{x}}| > c$ (or $|t_{\bar{x}}| > c$), reject H0 at the α level.

P-VALUE
H0: μ = a
H1: μ ≠ a
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_x/\sqrt{n}}$ or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$
$P\text{-value} = P(z < -|z_{\bar{x}}| \text{ or } z > |z_{\bar{x}}|) = P(t < -|t_{\bar{x}}| \text{ or } t > |t_{\bar{x}}|)$
 If the P-value > α, do not reject H0 at the α level.
 If the P-value < α, reject H0 at the α level.
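A sketch of a two-tailed z test with a p-value, using invented summary numbers (μ0, σx, x̄, n):

```python
from math import sqrt
from statistics import NormalDist

# Two-tailed test of H0: mu = 50 vs H1: mu != 50, with sigma_x assumed known.
mu0, sigma_x = 50.0, 8.0
xbar, n = 53.2, 40
alpha = 0.05

z = (xbar - mu0) / (sigma_x / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # P(Z < -|z| or Z > |z|)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```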

SESSION 12 SUMMARY
H0: μ1 = μ2 → μ1 − μ2 = 0
H1: μ1 ≠ μ2 → μ1 − μ2 ≠ 0

1. Independent samples, σ1 and σ2 unknown, σ1 = σ2
Pooled variance:
$s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
Test statistic:
$t_{stat} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$(1-\alpha)100\%\ CI = \bar{x}_1 - \bar{x}_2 \pm t_{1-\alpha/2}\,\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
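A sketch of the pooled-variance t statistic from invented summary statistics for two independent samples:

```python
from math import sqrt

# Hypothetical summary statistics for two independent samples, equal variances assumed.
n1, xbar1, s1 = 25, 72.4, 8.1
n2, xbar2, s2 = 30, 68.9, 7.6

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)       # pooled variance
t_stat = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))          # tests mu1 - mu2 = 0
df = n1 + n2 - 2

print(f"pooled variance = {sp2:.2f}, t = {t_stat:.3f}, df = {df}")
# Compare t with the critical value from a t table with df degrees of freedom.
```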

2. Related / matched samples
$d_i = x_{1i} - x_{2i}$
$\bar{d} = \dfrac{\sum x_{1i} - \sum x_{2i}}{n}$
$s_d^2 = \dfrac{\sum (d_i - \bar{d})^2}{n-1}, \qquad s_{\bar{d}}^2 = \dfrac{s_d^2}{n}$
Test statistic:
$t_{stat} = \dfrac{\bar{d} - (\mu_1 - \mu_2)}{s_d/\sqrt{n}}$ where df = n − 1
$(1-\alpha)100\%\ CI = \bar{d} \pm t_{1-\alpha/2}\,\dfrac{s_d}{\sqrt{n}}$

H0: π1 = π2 → π1 − π2 = 0
H1: π1 ≠ π2 → π1 − π2 ≠ 0

3. Population proportions
Pooled proportion: $p = \dfrac{x_1 + x_2}{n_1 + n_2}$
$\sigma_{\Delta p} = \sqrt{\dfrac{p(1-p)}{n_1} + \dfrac{p(1-p)}{n_2}}$
Test statistic:
$z_{stat} = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$(1-\alpha)100\%\ CI = p_1 - p_2 \pm z_{1-\alpha/2}\,\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
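A sketch of the two-proportion z test with hypothetical counts; the pooled proportion follows the formula above:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts of successes x out of n in two independent samples.
x1, n1 = 48, 200
x2, n2 = 33, 180

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                                 # pooled proportion
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))                   # two-tailed

print(f"p1={p1:.3f}, p2={p2:.3f}, z={z:.3f}, p-value={p_value:.4f}")
```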

EXTRA SESSION SUMMARY

H0: σ1² = σ2²
H1: σ1² ≠ σ2²
Decide on the significance level α.
Look up the F table for the critical value c.
Compute the test statistic:
$F\text{-stat} = \dfrac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1}$
Compare the F-stat with c:
If F-stat < c, do not reject H0 at the α level.
If F-stat > c, reject H0 at the α level.

One-way ANOVA (comparing c group means):
H0: μ1 = μ2 = … = μc
H1: H0 is not true
Compute each group (individual) mean and the overall mean, then:
$F\text{-stat} = \dfrac{SSB/(c-1)}{SSE/(n-c)} \sim F_{c-1,\,n-c}$
POPULATION SIMPLE REGRESSION MODEL
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$y_i$ is the dependent variable (explained variable)
$x_i$ is the independent variable (explanatory variable)
i is the observation, i = 1, …, N
$\beta_0$ is the intercept
$\beta_1$ is the slope
$\varepsilon_i$ is the disturbance (error); it contains all the unobservable factors
Total derivative of $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$: $\Delta y = \beta_1 \Delta x + \Delta\varepsilon$
If $\Delta\varepsilon = 0 \rightarrow \Delta y = \beta_1 \Delta x$

SAMPLE REGRESSION MODEL
$y_i = b_0 + b_1 x_i + e_i$ is the actual value
$\hat{y}_i = b_0 + b_1 x_i$ is the fitted or predicted value (on the regression line)
$y_i = \hat{y}_i + e_i$ ($e_i$ is the residual)

ORDINARY LEAST SQUARES
$\min_{b_0,\,b_1} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b_0 = \bar{y} - b_1\bar{x}$
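A minimal sketch of these OLS formulas on invented (x, y) data:

```python
# Closed-form OLS fit of y = b0 + b1*x (hypothetical data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, fitted)]   # e_i = y_i - yhat_i

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("sum of residuals ~ 0:", round(sum(residuals), 10))
```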
