
SESSION 1 SUMMARY

STATISTICS is the science of collecting, organizing, analyzing, interpreting and presenting data.
A STATISTIC is a single measure (number) used to summarize a sample data set.

There are TWO kinds of statistics:


 DESCRIPTIVE STATISTICS refers to the collection, presentation, and summary of data (using charts, graphs, or numerical summaries).
 INFERENTIAL STATISTICS refers to generalizing from a sample to a population, estimating unknown population parameters, drawing conclusions, and making decisions.

EMPIRICAL DATA represent data collected through observation and experiments.

There are EIGHT pitfalls in statistics:


PITFALL 1: Conclusions from Small Samples
PITFALL 2: Conclusions from Nonrandom Samples
PITFALL 3: Conclusions from Rare Events
PITFALL 4: Poor Survey Methods
PITFALL 5: Assuming a Causal Link
PITFALL 6: Generalization to Individuals
PITFALL 7: Unconscious Bias
PITFALL 8: Significance versus Importance

An OBSERVATION is a single member of a collection of items that we want to study.
A VARIABLE is a characteristic of the subject or individual.
A DATA SET consists of all the values of all of the variables for all of the observations we have chosen to observe.

Data Set       Variables
UNIVARIATE     One
BIVARIATE      Two
MULTIVARIATE   More than two

Type of Data:
- Categorical (Qualitative): Verbal Label or Coded
- Numerical (Quantitative): Discrete or Continuous

TIME SERIES DATA are collected from one observation at different, equally spaced points in time → TRENDS AND PATTERNS OVER TIME.
CROSS-SECTIONAL DATA are collected from multiple observations at the same point in time → VARIATION AMONG OBSERVATIONS & RELATIONSHIPS.
POOLED CROSS-SECTIONAL AND TIME SERIES DATA (PANEL DATA) combine both: they are collected from multiple observations at different, equally spaced points in time.
NOTICE:
DATUM is the singular of DATA.
PERIODICITY is the tendency of an event or series of events to happen repeatedly in a fixed pattern.
Level of Measurement   Characteristics
NOMINAL                Categories only
ORDINAL                Rank has meaning
INTERVAL               Distance has meaning
RATIO                  Meaningful zero exists

SESSION 2 SUMMARY
A LIKERT SCALE is a special case of interval data frequently used in survey research.
SAMPLE involves looking only at some items selected from the population.
CENSUS is an examination of all items in a defined population.

STATISTICS are computed from a sample of n items, chosen from a population of N items.
Statistics can be used as estimates of PARAMETERS found in the population.
SYMBOLS are used to represent population parameters and sample statistics.

The TARGET POPULATION is the population we are interested in.
The SAMPLING FRAME is the group from which we take the sample.

If we allow duplicates when sampling, then we are sampling WITH REPLACEMENT.
If we do not allow duplicates when sampling, then we are sampling
WITHOUT REPLACEMENT.
DUPLICATES are unlikely when n is MUCH SMALLER than N.

SIMPLE RANDOM SAMPLING means that we use random numbers to select items from the list.
SYSTEMATIC SAMPLING means that we sample by choosing every kth
item from a list, starting from a randomly chosen entry on the list.
STRATIFIED SAMPLING means that we utilize prior information about the
population and sample when the population can be divided into relatively
HOMOGENEOUS SUBGROUPS of known size (STRATA). A simple
random sample of the desired size is taken within each STRATUM.
CLUSTER SAMPLING means that we sample based on strata that consist of geographical regions. It includes:
- One-stage cluster sampling: the sample consists of all elements in each of k randomly chosen subregions (CLUSTERS).
- Two-stage cluster sampling: first choose k subregions, then choose a random sample of elements within each cluster.
JUDGMENT SAMPLING is a non-probability sampling method that relies on
the expertise of the sampler to choose items that are representative of the
population and can be affected by subconscious bias.
QUOTA SAMPLING is a special kind of judgment sampling, in which the
interviewer chooses a certain number of people in each category.
CONVENIENCE SAMPLE is a sampling method that takes advantage of
whatever sample is available at that moment and is a quick way to sample.
A FOCUS GROUP is a panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering.
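As a rough illustration of how the random selection methods above can be carried out, here is a small Python sketch; the sampling frame and sample size are hypothetical:

```python
import random

frame = list(range(1, 501))   # hypothetical sampling frame of N = 500 item IDs
n = 10                        # desired sample size

# Simple random sampling (without replacement): random numbers pick items from the list.
srs = random.sample(frame, n)

# Systematic sampling: choose every k-th item, starting from a random entry in the first k.
k = len(frame) // n
start = random.randrange(k)
systematic = frame[start::k][:n]

print("Simple random:", sorted(srs))
print("Systematic:   ", systematic)
```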

SESSION 3 SUMMARY
Methods of organizing, exploring, and summarizing data include:
- VISUAL (CHARTS and GRAPHS) provides insight into characteristics
of a data set WITHOUT USING MATHEMATICS.
- NUMERICAL (STATISTICS or TABLES) provides insight into
characteristics of a data set USING MATHEMATICS.

A DOT PLOT is the simplest graphical display of n individual values of numerical data.
A STACKED DOT PLOT compares TWO OR MORE groups using a
common X-axis scale.
A FREQUENCY DISTRIBUTION is a table formed by classifying n data
values into k classes (BINS).
BIN LIMITS define the values to be included in each bin. Widths must all be
the same except when we have open-ended bins.
FREQUENCIES are the number of observations within each bin.
They can be expressed as RELATIVE FREQUENCIES (frequency divided by the total) or PERCENTAGES (relative frequency times 100).
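A small sketch of how a frequency distribution is tabulated; the data values and bin limits below are invented for illustration:

```python
data = [23, 27, 31, 35, 36, 41, 44, 48, 52, 55, 58, 61, 64, 67, 72]  # hypothetical values
bins = [(20, 35), (35, 50), (50, 65), (65, 80)]   # k = 4 bins with equal widths

n = len(data)
for lo, hi in bins:
    # count values falling in [lo, hi)
    freq = sum(lo <= x < hi for x in data)
    rel = freq / n                 # relative frequency
    print(f"[{lo}, {hi}): frequency={freq}, relative={rel:.3f}, percent={100*rel:.1f}%")
```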

A HISTOGRAM is a graphical representation of a frequency distribution (a bar chart).
X-AXIS shows end points of each bin.
Y-AXIS shows frequency within each bin.

MODAL CLASS is a histogram bar that is higher than those on either side.
Modal classes may be artifacts of the way bin limits are chosen.
There are THREE types of modal classes:
- UNIMODAL – a single modal class.
- BIMODAL – two modal classes.
- MULTIMODAL – more than two modal classes.

A histogram may suggest the shape of the population. It is influenced by the NUMBER OF BINS and BIN LIMITS.
SKEWNESS is indicated by the direction of the longer tail of the histogram.
LEFT-SKEWED – (NEGATIVELY SKEWED) has a longer left tail.
RIGHT-SKEWED – (POSITIVELY SKEWED) has a longer right tail.
SYMMETRIC – both tail areas are the same.
PROTOTYPE DISTRIBUTION SHAPES

A SIMPLE LINE CHART is used to display a time series, spot trends, or compare time periods, and can display several variables at once.
TWO-SCALE LINE CHART is used to compare variables that differ in
magnitude or are measured in different units.

ARITHMETIC SCALE – distances on the Y-axis are proportional to the magnitude of the variable being displayed.
LOGARITHMIC SCALE – (RATIO SCALE) equal distances represent
equal ratios.

A LOG SCALE is useful for time series data that might be expected to grow at
a compound annual percentage rate. Use a log scale for the vertical axis when
data vary over a wide range (by more than an order of magnitude). It reveals
whether the quantity is growing at:
- INCREASING PERCENT (CONCAVE UPWARD).
- CONSTANT PERCENT (STRAIGHT LINE).
- DECLINING PERCENT (CONCAVE DOWNWARD).
COLUMN CHART is a vertical display of the data.
BAR CHART is a horizontal display of the data.
PARETO CHART is a special type of bar chart used in quality management to display the frequency of defects or errors of different types. Categories are displayed in descending order of frequency. Focus on the SIGNIFICANT FEW.
In a STACKED chart, BAR HEIGHT is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and the total.
A PIE CHART can only convey a general idea of the data. Pie charts should be used to portray data that sum to a total. A pie chart should have only a few slices. Each slice can be labeled with data values or percents.

SCATTER PLOTS can convey patterns in data pairs that would not be apparent from a table. A scatter plot is a starting point for BIVARIATE DATA ANALYSIS, in which we investigate the association and relationship between two variables.
There are ELEVEN deceptive graphing techniques:
ERROR 1: Nonzero Origin
ERROR 2: Elastic Graph Proportions
ERROR 3: Dramatic Title and Distracting Pictures
ERROR 4: 3-D and Novelty Graphs
ERROR 5: Rotated Graphs
ERROR 6: Unclear Definitions or Scales
ERROR 7: Vague Sources
ERROR 8: Complex Graphs
ERROR 9: Gratuitous Effects
ERROR 10: Estimated Data
ERROR 11: Area Trick

CUMULATIVE frequency is the sum from the smallest value up to the value under consideration.
A RANDOM VARIABLE is a variable whose value is not known in advance.
A RANDOM EXPERIMENT is a process whose results are not known in advance.

In Stata, a NUMBER is colored BLACK, a WORD is colored RED, and a UNIT OF DEFINITION is colored BLUE.

The MEDIAN (M) is the 50TH PERCENTILE or MIDPOINT of the ordered sample data. M separates the upper and lower halves of the ordered observations.
If n is ODD, the median is THE MIDDLE OBSERVATION in the ordered
data set. If n is EVEN, the median is the average of THE MIDDLE TWO
OBSERVATIONS in the ordered data set.
POPULATION MEAN: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

SAMPLE MEAN: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

N: Population size; n: Sample size
X: Random variable; x: Specific value
µ = E(X): Expected value of a discrete random variable X.


MEAN is SENSITIVE to EXTREME VALUES (OUTLIERS).
MEDIAN is INSENSITIVE to EXTREME VALUES (OUTLIERS).
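A quick numerical illustration of this sensitivity, using an invented sample and one added extreme value:

```python
from statistics import mean, median

sample = [52, 55, 57, 60, 63]          # hypothetical sample
with_outlier = sample + [250]          # add one extreme value

print(mean(sample), median(sample))              # 57.4  57
print(mean(with_outlier), median(with_outlier))  # mean jumps to 89.5; median moves only to 58.5
```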

SESSION 4 SUMMARY
The MODE is the most frequently occurring data value; a data set may have multiple modes or no mode. The mode is most useful for DISCRETE or CATEGORICAL DATA with only a few distinct data values. For CONTINUOUS DATA or DATA WITH A WIDE RANGE, the mode is rarely useful.
Type of Variable                Best Measure of Central Tendency
Nominal                         Mode
Ordinal                         Median
Interval / Ratio (not skewed)   Mean
Interval / Ratio (skewed)       Median
VARIATION is the SPREAD of data points about the center of the distribution
in a sample.
RANGE is the difference between the largest and smallest observation:
$Range = x_{max} - x_{min}$

The GROWTH RATE is a variation on the geometric mean used to find the average growth rate for a time series:
$GR = \sqrt[n-1]{\dfrac{x_n}{x_1}} - 1$
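A minimal sketch of the growth-rate formula applied to an invented five-period series:

```python
# Average growth rate GR = (x_n / x_1)**(1/(n-1)) - 1 for a time series of n values.
revenue = [100.0, 112.0, 121.0, 138.0, 150.0]   # hypothetical yearly values, n = 5
n = len(revenue)
gr = (revenue[-1] / revenue[0]) ** (1 / (n - 1)) - 1
print(f"average growth rate = {gr:.4%}")        # about 10.67% per period
```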

POPULATION VARIANCE: $\sigma^2 = \dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$

SAMPLE VARIANCE: $s^2 = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

POPULATION STANDARD DEVIATION: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$

SAMPLE STANDARD DEVIATION: $s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

POPULATION COEFFICIENT OF VARIATION: $CV = \dfrac{\sigma}{\mu} \times 100$

SAMPLE COEFFICIENT OF VARIATION: $CV = \dfrac{s}{\bar{x}} \times 100$

POPULATION MEAN ABSOLUTE DEVIATION: $MAD = \dfrac{\sum_{i=1}^{N}|x_i - \mu|}{N}$

SAMPLE MEAN ABSOLUTE DEVIATION: $MAD = \dfrac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}$

DEVIATION:
$x_i - \mu$ ($\mu$ is the population mean)
$x_i - \bar{x}$ ($\bar{x}$ is the sample mean)

The EMPIRICAL RULE states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of the data. For:
• k = 1, 68.26% will lie within μ ± 1σ
• k = 2, 95.44% will lie within μ ± 2σ
• k = 3, 99.73% will lie within μ ± 3σ
A STANDARDIZED VARIABLE (z) redefines each observation in terms of the number of standard deviations from the mean.
SAMPLE: $z_i = \dfrac{x_i - \bar{x}}{s}$    POPULATION: $z_i = \dfrac{x_i - \mu}{\sigma}$
A NEGATIVE z means that the observation is to the LEFT of the mean; a POSITIVE z means that the observation is to the RIGHT of the mean.

For a normal distribution, the range of values is almost 6σ (from µ − 3σ to µ + 3σ). If you know the range R (high − low), you can estimate the standard deviation as σ = R/6.
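A small sketch that standardizes one hypothetical observation and recomputes the empirical-rule percentages from the standard normal CDF; the mean, standard deviation, and x value below are made up:

```python
from statistics import NormalDist

mu, sigma = 70.0, 5.0            # hypothetical population mean and standard deviation
x = 82.0
z = (x - mu) / sigma             # standardized value: 2.4 standard deviations above the mean

nd = NormalDist()                # standard normal distribution
for k in (1, 2, 3):
    pct = nd.cdf(k) - nd.cdf(-k)                  # area within mu +/- k*sigma
    print(f"within mu +/- {k} sigma: {pct:.4%}")  # 68.27%, 95.45%, 99.73%
print("z =", z)
```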

PERCENTILES are data that have been divided into 100 groups.
DECILES are data that have been divided into 10 groups.
QUINTILES are data that have been divided into 5 groups.
QUARTILES are data that have been divided into 4 groups.
Median (Q2)
Interquartile Range (IQR = Q3 − Q1)

              Inner fences          Outer fences
Lower fence   Q1 − 1.5 (Q3 − Q1)    Q1 − 3.0 (Q3 − Q1)
Upper fence   Q3 + 1.5 (Q3 − Q1)    Q3 + 3.0 (Q3 − Q1)

MIDHINGE: $Midhinge = \dfrac{Q_1 + Q_3}{2}$
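A sketch computing the quartiles, the IQR, the fences, and the midhinge for an invented sample (note that statistics.quantiles uses one common quartile convention; a textbook method may differ slightly):

```python
from statistics import quantiles

data = [12, 14, 15, 16, 18, 19, 21, 22, 24, 30, 31, 45]   # hypothetical sample
q1, q2, q3 = quantiles(data, n=4, method="inclusive")      # quartiles (Q2 is the median)
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences: values outside are unusual
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # outer fences: values outside are outliers
midhinge = (q1 + q3) / 2

print(q1, q2, q3, iqr, inner, outer, midhinge)
```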

The CORRELATION COEFFICIENT is a statistic that describes the degree of LINEARITY between paired observations on two quantitative variables X and Y.

POPULATION: $\rho = \dfrac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{N}(x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \mu_y)^2}}$

SAMPLE: $r = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$

Note: $-1 \le r \le +1$

The COVARIANCE of two random variables X and Y measures the degree to which the values of X and Y change together.

POPULATION: $\sigma_{xy} = \dfrac{\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)}{N}$

SAMPLE: $s_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1}$

The CORRELATION COEFFICIENT is the COVARIANCE divided by the product of the standard deviations of X and Y.

POPULATION: $\rho = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$

SAMPLE: $r = \dfrac{s_{xy}}{s_x s_y}$
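A short sketch computing the sample covariance and the correlation coefficient from invented paired data:

```python
from math import sqrt

x = [2.0, 4.0, 6.0, 8.0, 10.0]        # hypothetical paired observations
y = [3.1, 4.9, 7.2, 8.8, 11.3]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)   # sample covariance
s_x = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))                   # sample std devs
s_y = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
r = s_xy / (s_x * s_y)                                                    # correlation coefficient

print(f"s_xy = {s_xy:.4f}, r = {r:.4f}")    # r lies between -1 and +1
```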
WEIGHTED MEAN: $\bar{x} = \sum_{j=1}^{k} w_j x_j$ where $\sum_{j=1}^{k} w_j = 1.00$

GROUP MEAN: $\bar{x} = \dfrac{\sum_{j=1}^{k} f_j m_j}{n}$

SKEWNESS: $Skewness = \dfrac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^3$

KURTOSIS: $Kurtosis = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^4 - \dfrac{3(n-1)^2}{(n-2)(n-3)}$

SESSION 5 SUMMARY
A RANDOM EXPERIMENT is an observational process whose results cannot
be known in advance. The set of all possible OUTCOMES is the SAMPLE
SPACE for the experiment. A sample space can be finite (discrete) or infinite. If the outcome is a CONTINUOUS measurement, the sample space cannot be listed but can be described by a rule.

An EVENT is any subset of outcomes in the sample space. A simple event, or elementary event, is a single outcome. A DISCRETE SAMPLE SPACE consists of all the simple events (Ei):
S = {E1, E2, …, En}.

The PROBABILITY of an event is a number that measures the relative likelihood that the event will occur. The probability of event A, denoted as P(A), must lie within the interval from 0 to 1:
0 ≤ P(A) ≤ 1
 If P(A) = 0, then the event CANNOT occur.
 If P(A) = 1, then the event IS CERTAIN to occur.
P(S) = P(E1) + P(E2) + … + P(En) = 1

The CLASSICAL (A PRIORI) APPROACH refers to the process of assigning probabilities before the event is observed or the experiment is conducted. A priori probabilities are based on logic, not experience. Instead of performing the experiment, we can use deduction to determine the probability of an event.
THE LAW OF LARGE NUMBERS says that as the number of trials
increases, any empirical probability approaches its theoretical limit.
The EMPIRICAL (RELATIVE FREQUENCY) APPROACH assigns probabilities by counting the frequency (f) of observed outcomes defined on the experimental sample space:
$P = \dfrac{f}{n}$

The SUBJECTIVE APPROACH reflects someone’s informed judgment about the likelihood of an event and is used when there is no repeatable random experiment.

A TREE DIAGRAM helps you visualize all possible outcomes and starts with
a contingency table. The tree diagram shows all events along with their
marginal, conditional, and joint probabilities.
A JOINT PROBABILITY represents the intersection of two events in a cross-
tabulation table.
The MARGINAL PROBABILITY of a single event is found by dividing a
row or column total by the total sample size.

Type of Probability – Symbol & Calculation
- Complement: P(A) + P(A′) = 1, so P(A′) = 1 − P(A)
- Union (A or B): A ∪ B (∪: cup)
- Intersection (A and B): A ∩ B (∩: cap)
- Law of Addition (General): P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- Mutually Exclusive Events: A ∩ B = ∅ → P(A ∩ B) = 0
- Law of Addition (Special, for mutually exclusive events): P(A ∪ B) = P(A) + P(B)
- Dichotomous (Binary) Events
- Collectively Exhaustive Events
- Conditional Probability: P(A|B) = P(A ∩ B) / P(B), for P(B) > 0
- General Law of Multiplication: P(A ∩ B) = P(A|B) × P(B)
- Odds (in favour): P(A) / P(A′) = P(A) / (1 − P(A))
- Odds (against): P(A′) / P(A) = (1 − P(A)) / P(A)
- Independent Events: P(A|B) = P(A ∩ B) / P(B) = P(A) × P(B) / P(B) = P(A)
- Multiplication Law (Independent Events): P(A1 ∩ … ∩ An) = P(A1) × … × P(An)
- Bayes' Theorem: P(A|B) = P(A) × P(B|A) / P(B)

SESSION 6 SUMMARY
BAYES’ THEOREM states that the PRIOR (MARGINAL) probability of an
event B is revised after event A has been considered to yield a POSTERIOR
(CONDITIONAL) probability.
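A small numerical sketch of Bayes' Theorem; the prior P(B) and the conditional probabilities of A below are invented for illustration:

```python
# Hypothetical screening example: P(B) is the prior probability of a condition,
# P(A|B) and P(A|B') are the probabilities of a positive test given B and not-B.
p_b = 0.02
p_a_given_b = 0.95
p_a_given_not_b = 0.10

# Total probability of A, then the posterior P(B|A) via Bayes' Theorem.
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
p_b_given_a = p_a_given_b * p_b / p_a

print(f"P(A) = {p_a:.4f}, posterior P(B|A) = {p_b_given_a:.4f}")   # ~0.1624
```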

If event A can occur in n1 ways and event B can occur in n2 ways, then events A and B can occur in n1 × n2 ways.
In general, m events can occur in n1 × n2 × … × nm ways.

n FACTORIAL is the number of ways that n items can be arranged in a particular order; it is the product of all integers from 1 to n. Factorials are useful for counting the possible arrangements of any n items.
$n! = n \times (n-1) \times (n-2) \times \dots \times 1$

A PERMUTATION is an arrangement, in a particular order, of r randomly sampled items from a group of n:
${}_nP_r = \dfrac{n!}{(n-r)!}$

A COMBINATION is an arrangement of r items chosen at random from n items where the order of the selected items is not important:
${}_nC_r = \dfrac{n!}{r!(n-r)!}$
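A quick check of the factorial, permutation, and combination formulas using Python's math module; the values n = 8 and r = 3 are arbitrary:

```python
from math import factorial, perm, comb

n, r = 8, 3
print(factorial(n))        # 8! = 40320 arrangements of all 8 items
print(perm(n, r))          # nPr = n!/(n-r)! = 336 ordered arrangements of 3 of the 8
print(comb(n, r))          # nCr = n!/(r!(n-r)!) = 56 unordered selections of 3 of the 8
print(perm(n, r) == factorial(n) // factorial(n - r))   # True: matches the formula
```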
A RANDOM VARIABLE is a function or rule that assigns a numerical value
to each outcome in the sample space of a random experiment.

A DISCRETE RANDOM VARIABLE has a countable number of distinct values.
A DISCRETE PROBABILITY DISTRIBUTION assigns a probability to
each value of a discrete random variable X.
$0 \le P(x_i) \le 1$ (the probability for any given value of X)
$\sum_{i=1}^{n} P(x_i) = 1$ (the sum over all values of X)

A PROBABILITY DISTRIBUTION FUNCTION (PDF) is a mathematical function that shows the probability of each X-value.
A CUMULATIVE DISTRIBUTION FUNCTION (CDF) is a mathematical
function that shows the cumulative sum of probabilities, adding from the
smallest to the largest X-value, gradually approaching unity.

The EXPECTED VALUE E(X) of a discrete random variable is the sum of all
X-values weighted by their respective probabilities. E(X) is a measure of central
tendency.
$\mu = E(X) = \sum_{i=1}^{N} x_i P(x_i)$

The VARIANCE is a weighted average of the dispersion about the mean and a
measure of variability.
$Var(X) = \sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2 P(x_i)$

The STANDARD DEVIATION is the square root of the variance:
$\sigma = \sqrt{\sigma^2} = \sqrt{Var(X)}$
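A short sketch computing E(X), Var(X), and σ for an invented discrete distribution:

```python
from math import sqrt

# Hypothetical discrete distribution: values of X and their probabilities (sum to 1).
x_vals = [0, 1, 2, 3]
probs  = [0.1, 0.3, 0.4, 0.2]

mu = sum(x * p for x, p in zip(x_vals, probs))                # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(x_vals, probs))   # Var(X)
sigma = sqrt(var)                                             # standard deviation

print(mu, var, sigma)   # 1.7, 0.81, 0.9
```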

The UNIFORM DISTRIBUTION describes a random variable with a finite number of integer values from a to b (the only two parameters). Each value of the random variable is equally likely to occur.
SESSION 7 & 8 SUMMARY
A random experiment with only 2 outcomes is a BERNOULLI
EXPERIMENT.
One outcome is arbitrarily labeled a “success” (denoted X = 1) and the other a
“failure” (denoted X = 0).
π is the P(success), 1 – π is the P(failure). “Success” is defined as the less likely
outcome so that π < 0.5 for convenience.
P(0) + P(1) = (1 – π) + π = 1 and 0 ≤ π ≤ 1
The expected value (mean) and variance of a Bernoulli experiment are E(X) = π and V(X) = π(1 – π).

The BINOMIAL DISTRIBUTION arises when a Bernoulli experiment is repeated n times. Each trial is independent, so the probability of success π remains constant on each trial.
In a binomial experiment, we are interested in X = number of successes in n
trials.
X = X1 + X2 + … + Xn.

The probability of a particular number of successes P(X) is determined by the parameters n and π.

For COMPOUND EVENTS, individual probabilities are added to obtain any desired event probability. Cumulative probabilities can be evaluated by summing individual X probabilities.

The POISSON DISTRIBUTION describes the number of occurrences within a continuous interval – a randomly chosen unit of time or space.
The events in Poisson distribution occur randomly and independently over a
continuum of time or space. We will call the continuum “time” since the most
common Poisson application is modeling arrivals per unit of time. The Poisson
model’s only parameter is λ, which represents the mean number of events per
unit of time or space.
The Poisson distribution may be used to approximate a binomial by setting λ =
nπ. This approximation is helpful when the binomial calculation is difficult.
The general rule for a good approximation is that n should be “large” and π should be “small.” A common rule of thumb says the approximation is adequate if n ≥ 20 and π ≤ 0.05.
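A sketch comparing exact binomial probabilities with their Poisson approximation for hypothetical n = 200 and π = 0.02 (so λ = nπ = 4):

```python
from math import comb, exp, factorial

n, pi = 200, 0.02          # n large, pi small, so the approximation should be adequate
lam = n * pi               # Poisson parameter lambda = n*pi = 4

for x in range(0, 9):
    binom = comb(n, x) * pi**x * (1 - pi)**(n - x)       # exact binomial probability
    poisson = lam**x * exp(-lam) / factorial(x)          # Poisson approximation
    print(f"x={x}: binomial={binom:.4f}, poisson={poisson:.4f}")
```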

DISCRETE VARIABLE – each value of X has its own probability P(X).
CONTINUOUS VARIABLE – events are intervals and probabilities are areas under continuous curves. A single point has no probability.
A CONTINUOUS PDF is denoted f(x); it must be nonnegative, and the total area under the curve = 1.
A CONTINUOUS CDF is denoted F(x); it shows P(X ≤ x), the cumulative proportion of scores, and is useful for finding probabilities.
CONTINUOUS PROBABILITY FUNCTIONS, unlike discrete distributions, have probability 0 at any single point. The entire area under any PDF, by definition, is 1.

The NORMAL DISTRIBUTION is symmetric and unimodal about the mean. We can use the normal distribution to approximate the binomial and Poisson distributions.

If events per unit of time follow a Poisson distribution, the time until the next
event follows the EXPONENTIAL DISTRIBUTION. The time until the next
event is a continuous variable.
Type of Distribution – Formula
- Discrete Uniform Distribution: $P(X = x) = \dfrac{1}{b - a + 1}$ for x = a, a+1, …, b
- Binomial Distribution: $P(X = x) = {}_nC_x\,\pi^x (1-\pi)^{n-x}$
- Poisson Distribution: $P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$
- Normal Distribution: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$
- Standard Normal Distribution: $z = \dfrac{x-\mu}{\sigma}$
- Exponential Distribution: $f(x) = \lambda e^{-\lambda x}$, with $P(X \le x) = 1 - e^{-\lambda x}$

SESSION 9 SUMMARY
SAMPLE MEAN is considered a random variable.
SAMPLE STATISTIC is a random variable whose value depends on which
population items are included in the random sample.
SAMPLING DISTRIBUTION is a distribution of a sample statistic when we
repeat sampling with the same sample size.
ESTIMATOR is a statistic derived from a sample to infer the value of a
population parameter.
ESTIMATE is the value of the estimator in a particular sample.

The CENTRAL LIMIT THEOREM states that the sample mean will have an approximately normal distribution even if the population is not normal, as long as the sample size is large enough.
Population shape is SYMMETRIC: n ≥ 15
Population shape is NOT KNOWN: n ≥ 30
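A simulation sketch of the Central Limit Theorem using a deliberately skewed (exponential) population; the population, sample size, and number of repetitions are arbitrary choices:

```python
import random
from statistics import mean, stdev

random.seed(1)
population = [random.expovariate(1.0) for _ in range(100_000)]   # strongly right-skewed

n = 30                                                           # "large enough" sample size
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# The sampling distribution of the mean is centred near the population mean (~1.0),
# its spread is roughly sigma/sqrt(n), and its shape is approximately normal.
print(mean(sample_means), stdev(sample_means))
```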

SESSION 10 & 11 SUMMARY


The POPULATION PROPORTION is denoted π. The number of successes X in a sample of n items is a random variable with:
$\mu_X = n\pi$
$\sigma_X^2 = n\pi(1-\pi) \;\rightarrow\; \sigma_X = \sqrt{n\pi(1-\pi)}$
The SAMPLE PROPORTION is considered a sample mean:
$p = \dfrac{x}{n}, \qquad \mu_p = \pi$
$\sigma_p^2 = \dfrac{\pi(1-\pi)}{n} \;\rightarrow\; \sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
Standardize: $z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$
CONFIDENCE INTERVAL
Standardize: $\bar{x} \sim N\!\left(\mu_x,\ \dfrac{\sigma_x^2}{n}\right)$ to $z_{\bar{x}} = \dfrac{\bar{x} - \mu_x}{\sigma_x/\sqrt{n}} \sim N(0,1)$

If $\sigma_x$ is known:
 Lower limit: $\bar{x} - 1.96\,\dfrac{\sigma_x}{\sqrt{n}}$
 Upper limit: $\bar{x} + 1.96\,\dfrac{\sigma_x}{\sqrt{n}}$
$95\%\ CI = \left(\bar{x} - 1.96\,\dfrac{\sigma_x}{\sqrt{n}};\ \bar{x} + 1.96\,\dfrac{\sigma_x}{\sqrt{n}}\right)$
$99\%\ CI = \left(\bar{x} - 2.58\,\dfrac{\sigma_x}{\sqrt{n}};\ \bar{x} + 2.58\,\dfrac{\sigma_x}{\sqrt{n}}\right)$

If $\sigma_x$ is not known:
$s_x = \sqrt{\dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}, \qquad t = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$

$(1-\alpha)100\%\ CI = \bar{x} \pm z_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}}$, with $z_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}} = e$ (margin of error)
Sample size: $e = z_{1-\alpha/2} \times \dfrac{\sigma_x}{\sqrt{n}} \;\rightarrow\; n = \dfrac{(z_{1-\alpha/2})^2\,\sigma_x^2}{e^2}$

$(1-\alpha)100\%\ CI = \bar{x} \pm t_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}}$, with $t_{1-\alpha/2}\,\dfrac{s_x}{\sqrt{n}} = e$ (margin of error)
Sample size: $e = t_{1-\alpha/2} \times \dfrac{\sigma_x}{\sqrt{n}} \;\rightarrow\; n = \dfrac{(t_{1-\alpha/2})^2\,\sigma_x^2}{e^2}$
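A minimal sketch of a confidence interval for the mean with hypothetical data; it uses the z critical value for simplicity, although with σ unknown and a small n the t critical value above is the better choice:

```python
from math import sqrt
from statistics import mean, stdev, NormalDist

sample = [48.2, 51.5, 49.8, 53.1, 50.4, 47.9, 52.6, 50.1]   # hypothetical data
n = len(sample)
xbar, s = mean(sample), stdev(sample)        # stdev() already divides by n - 1

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)      # 1.96; a t critical value (df = n-1)
                                             # would be slightly wider for small n
e = z * s / sqrt(n)                          # margin of error
print(f"{(1-alpha)*100:.0f}% CI: ({xbar - e:.2f}, {xbar + e:.2f})")
```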
$(1-\alpha)100\%\ CI = p \pm z_{1-\alpha/2}\,\sqrt{\dfrac{p(1-p)}{n}}$, with $z_{1-\alpha/2}\,\sqrt{\dfrac{p(1-p)}{n}} = e$ (margin of error)
Sample size: $e = z_{1-\alpha/2}\,\sqrt{\dfrac{p(1-p)}{n}} \;\rightarrow\; n = \dfrac{(z_{1-\alpha/2})^2\,p(1-p)}{e^2}$

ONE-SIDED (ONE-TAILED) TEST
A TYPE I ERROR is rejecting H0 when in fact H0 is true: P(Type I error) = α.
A TYPE II ERROR is not rejecting H0 when in fact H0 is false: P(Type II error) = β.

H0: μ = a
H1: μ > a
Decide on the significance level α.
Look up the standard normal CDF table or the t table for the critical value c.
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_x/\sqrt{n}}$ or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$
Compare with c:
 If $z_{\bar{x}} < c$ (or $t_{\bar{x}} < c$), do not reject H0 at the α level.
 If $z_{\bar{x}} > c$ (or $t_{\bar{x}} > c$), reject H0 at the α level.

H0: π = a
H1: π > a
Decide on the significance level α.
Look up the standard normal CDF table for the critical value c.
$z_p = \dfrac{p - \pi}{\sqrt{\dfrac{p(1-p)}{n}}}$
Compare with c in the same way as above.


H0: μ = a
H1: μ ≠ a
Decide on the significance level α.
Look up the standard normal CDF table or the t table for the critical value c.
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_x/\sqrt{n}}$ or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$
Compare with c:
 If $|z_{\bar{x}}| < c$ (or $|t_{\bar{x}}| < c$), do not reject H0 at the α level.
 If $|z_{\bar{x}}| > c$ (or $|t_{\bar{x}}| > c$), reject H0 at the α level.

P-VALUE
H0: μ = a
H1: μ ≠ a
$z_{\bar{x}} = \dfrac{\bar{x} - \mu}{\sigma_x/\sqrt{n}}$ or $t_{\bar{x}} = \dfrac{\bar{x} - \mu}{s_x/\sqrt{n}}$
$P\text{-value} = P(z < -|z_{\bar{x}}| \text{ or } z > |z_{\bar{x}}|) = P(t < -|t_{\bar{x}}| \text{ or } t > |t_{\bar{x}}|)$
 If the P-value > α, do not reject H0 at the α level.
 If the P-value < α, reject H0 at the α level.
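A sketch of a two-tailed z test with a p-value, using invented summary numbers (μ0, σx, x̄, n):

```python
from math import sqrt
from statistics import NormalDist

# Two-tailed test of H0: mu = 50 vs H1: mu != 50, with sigma_x assumed known.
mu0, sigma_x = 50.0, 8.0
xbar, n = 53.2, 40
alpha = 0.05

z = (xbar - mu0) / (sigma_x / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # P(Z < -|z| or Z > |z|)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```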

SESSION 12 SUMMARY
H0: μ1 = μ2 → μ1 − μ2 = 0
H1: μ1 ≠ μ2 → μ1 − μ2 ≠ 0

1. Independent samples, σ1 and σ2 unknown, σ1 = σ2
Pooled variance:
$s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
Test statistic:
$t_{stat} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$(1-\alpha)100\%\ CI = \bar{x}_1 - \bar{x}_2 \pm t_{1-\alpha/2}\,\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
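A sketch of the pooled-variance t statistic from invented summary statistics for two independent samples:

```python
from math import sqrt

# Hypothetical summary statistics for two independent samples, equal variances assumed.
n1, xbar1, s1 = 25, 72.4, 8.1
n2, xbar2, s2 = 30, 68.9, 7.6

sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)       # pooled variance
t_stat = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))          # tests mu1 - mu2 = 0
df = n1 + n2 - 2

print(f"pooled variance = {sp2:.2f}, t = {t_stat:.3f}, df = {df}")
# Compare t with the critical value from a t table with df degrees of freedom.
```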

2. Related / matched samples
$d_i = x_{1i} - x_{2i}$
$\bar{d} = \dfrac{\sum x_{1i} - \sum x_{2i}}{n}$
$s_d^2 = \dfrac{\sum (d_i - \bar{d})^2}{n-1}, \qquad s_{\bar{d}}^2 = \dfrac{s_d^2}{n}$
Test statistic:
$t_{stat} = \dfrac{\bar{d} - (\mu_1 - \mu_2)}{s_d/\sqrt{n}}$ where df = n − 1
$(1-\alpha)100\%\ CI = \bar{d} \pm t_{1-\alpha/2}\,\dfrac{s_d}{\sqrt{n}}$

H0: π1 = π2 → π1 − π2 = 0
H1: π1 ≠ π2 → π1 − π2 ≠ 0

3. Population proportions
Pooled proportion: $p = \dfrac{x_1 + x_2}{n_1 + n_2}$
$\sigma_{\Delta p} = \sqrt{\dfrac{p(1-p)}{n_1} + \dfrac{p(1-p)}{n_2}}$
Test statistic:
$z_{stat} = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$(1-\alpha)100\%\ CI = p_1 - p_2 \pm z_{1-\alpha/2}\,\sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
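A sketch of the two-proportion z test with hypothetical counts; the pooled proportion follows the formula above:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical counts of successes x out of n in two independent samples.
x1, n1 = 48, 200
x2, n2 = 33, 180

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                                 # pooled proportion
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))                   # two-tailed

print(f"p1={p1:.3f}, p2={p2:.3f}, z={z:.3f}, p-value={p_value:.4f}")
```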

EXTRA SESSION SUMMARY

H0: σ1² = σ2²
H1: σ1² ≠ σ2²
Decide on the significance level α.
Look up the F table for the critical value c.
Compute the test statistic:
$F\text{-stat} = \dfrac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1}$
Compare the F-stat with c:
If F-stat < c, do not reject H0 at the α level.
If F-stat > c, reject H0 at the α level.

One-way ANOVA (comparing c group means):
H0: μ1 = μ2 = … = μc
H1: H0 is not true
Compute each group (individual) mean and the overall mean, then:
$F\text{-stat} = \dfrac{SSB/(c-1)}{SSE/(n-c)} \sim F_{c-1,\,n-c}$
POPULATION SIMPLE REGRESSION MODEL
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$y_i$ is the dependent variable (explained variable)
$x_i$ is the independent variable (explanatory variable)
i is the observation, i = 1, …, N
$\beta_0$ is the intercept
$\beta_1$ is the slope
$\varepsilon_i$ is the disturbance (error); it contains all the unobservable factors
Total derivative of $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$: $\Delta y = \beta_1 \Delta x + \Delta\varepsilon$
If $\Delta\varepsilon = 0 \rightarrow \Delta y = \beta_1 \Delta x$

SAMPLE REGRESSION MODEL
$y_i = b_0 + b_1 x_i + e_i$ is the actual value
$\hat{y}_i = b_0 + b_1 x_i$ is the fitted or predicted value (on the regression line)
$y_i = \hat{y}_i + e_i$ ($e_i$ is the residual)

ORDINARY LEAST SQUARES
$\min_{b_0,\,b_1} \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$
$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$b_0 = \bar{y} - b_1\bar{x}$
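A minimal sketch of these OLS formulas on invented (x, y) data:

```python
# Closed-form OLS fit of y = b0 + b1*x (hypothetical data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, fitted)]   # e_i = y_i - yhat_i

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("sum of residuals ~ 0:", round(sum(residuals), 10))
```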
