Analyzing Motel Occupancy Rates
Type of Data
- Categorical (Qualitative): Verbal Label or Coded
- Numerical (Quantitative): Discrete or Continuous
SESSION 2 SUMMARY
A LIKERT SCALE is a special case of interval data frequently used in survey
research.
SAMPLE involves looking only at some items selected from the population.
CENSUS is an examination of all items in a defined population.
SESSION 3 SUMMARY
Methods of organizing, exploring, and summarizing data include:
- VISUAL (CHARTS and GRAPHS) methods provide insight into characteristics
of a data set WITHOUT USING MATHEMATICS.
- NUMERICAL (STATISTICS or TABLES) methods provide insight into
characteristics of a data set USING MATHEMATICS.
MODAL CLASS is a histogram bar that is higher than those on either side.
Modal classes may be artifacts of the way bin limits are chosen.
There are THREE types of modal class:
- UNIMODAL – a single modal class.
- BIMODAL – two modal classes.
- MULTIMODAL – more than two modal classes.
A LOG SCALE is useful for time series data that might be expected to grow at
a compound annual percentage rate. Use a log scale for the vertical axis when
data vary over a wide range (by more than an order of magnitude). It reveals
whether the quantity is growing at:
- INCREASING PERCENT (CONCAVE UPWARD).
- CONSTANT PERCENT (STRAIGHT LINE).
- DECLINING PERCENT (CONCAVE DOWNWARD).
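The straight-line case can be checked numerically. The sketch below uses an invented series (start value 100, growth of 8% per period; both numbers are assumptions for illustration) and shows that equal percentage growth produces equal steps on a log scale.

```python
import math

# Hypothetical series: starts at 100 and grows 8% per period (illustrative values).
start, rate, periods = 100.0, 0.08, 6
series = [start * (1 + rate) ** t for t in range(periods)]

# A log-scale axis plots log10(value). Equal steps between consecutive
# points mean the plotted points fall on a straight line.
log_values = [math.log10(v) for v in series]
steps = [b - a for a, b in zip(log_values, log_values[1:])]

print([round(s, 6) for s in steps])
```

Every step equals log10(1.08), so a constant percent growth rate plots as a straight line.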
COLUMN CHART is a vertical display of the data.
BAR CHART is a horizontal display of the data.
A PARETO CHART is a special type of bar chart used in quality management to
display the frequency of defects or errors of different types. Categories are
displayed in descending order of frequency. Focus on the SIGNIFICANT FEW.
In a STACKED COLUMN CHART, the BAR HEIGHT is the sum of several subtotals.
Areas may be compared by color to show patterns in the subgroups and total.
A PIE CHART can only convey a general idea of the data. Pie charts should be
used to portray data which sum to a total. A pie chart should have only a few
slices. Each slice can be labeled with data values or percents.
SCATTER PLOTS can convey patterns in data pairs that would not be
apparent from a table. A scatter plot is a starting point for BIVARIATE DATA
ANALYSIS, in which we investigate the association and relationship between
two variables.
There are ELEVEN deceptive graphing techniques:
ERROR 1: Nonzero Origin
ERROR 2: Elastic Graph Proportions
ERROR 3: Dramatic Title and Distracting Pictures
ERROR 4: 3-D and Novelty Graphs
ERROR 5: Rotated Graphs
ERROR 6: Unclear Definitions or Scales
ERROR 7: Vague Sources
ERROR 8: Complex Graphs
ERROR 9: Gratuitous Effects
ERROR 10: Estimated Data
ERROR 11: Area Trick
POPULATION MEAN
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
SAMPLE MEAN
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
SESSION 4 SUMMARY
The MODE is the most frequently occurring data value. A data set may have
multiple modes or no mode. The mode is most useful for DISCRETE or CATEGORICAL
DATA with only a few distinct data values. For CONTINUOUS DATA or
DATA WITH A WIDE RANGE, the mode is rarely useful.
Type of Variable                 Best Measure of Central Tendency
Nominal                          Mode
Ordinal                          Median
Interval / Ratio (not skewed)    Mean
Interval / Ratio (skewed)        Median
VARIATION is the SPREAD of data points about the center of the distribution
in a sample.
RANGE is the difference between the largest and smallest observations.
\[ \text{Range} = x_{\max} - x_{\min} \]
\[ GR = \sqrt[n-1]{\frac{x_n}{x_1}} - 1 \]
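As a quick check of the GR formula, here is a minimal sketch. The function name `growth_rate` and the revenue figures are invented for illustration; the series is built to grow exactly 10% per period.

```python
def growth_rate(values):
    """Average growth rate per period: (x_n / x_1) ** (1 / (n - 1)) - 1."""
    n = len(values)
    if n < 2 or values[0] <= 0 or values[-1] <= 0:
        raise ValueError("need at least two positive observations")
    return (values[-1] / values[0]) ** (1 / (n - 1)) - 1


# Hypothetical series: 100 -> 110 -> 121 (10% growth in each period).
revenue = [100.0, 110.0, 121.0]
gr = growth_rate(revenue)
print(f"GR = {gr:.4f}")
```

Only the first and last observations enter the formula, so intermediate values do not affect GR.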
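POPULATION VARIANCE
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
SAMPLE VARIANCE
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
POPULATION STANDARD DEVIATION
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
SAMPLE STANDARD DEVIATION
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
POPULATION COEFFICIENT OF VARIATION
\[ CV = \frac{\sigma}{\mu} \times 100 \]
SAMPLE COEFFICIENT OF VARIATION
\[ CV = \frac{s}{\bar{x}} \times 100 \]
MEAN ABSOLUTE DEVIATION (MAD)
Population:
\[ MAD = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N} \]
Sample:
\[ MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \]
DEVIATION
\( x_i - \mu \) (\( \mu \) is the population mean)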
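The measures of variation above can be computed with Python's standard `statistics` module. This is a sketch on invented data; the eight values are an assumption chosen so the population results come out as round numbers.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the course

mean = statistics.fmean(data)
pop_var = statistics.pvariance(data)   # population variance, divides by N
samp_var = statistics.variance(data)   # sample variance, divides by n - 1
pop_sd = statistics.pstdev(data)       # population standard deviation
samp_sd = statistics.stdev(data)       # sample standard deviation
cv_pop = pop_sd / mean * 100           # coefficient of variation, in percent
cv_samp = samp_sd / mean * 100
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation

print(mean, pop_var, pop_sd, round(samp_var, 4), round(cv_pop, 1), mad)
```

The sample versions are slightly larger than the population versions because they divide by n - 1 rather than N.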
The EMPIRICAL RULE states that for data from a normal distribution,
we expect the interval μ ± kσ to contain a known percentage of the data. For:
• k = 1, about 68.26% will lie within μ ± 1σ
• k = 2, about 95.44% will lie within μ ± 2σ
• k = 3, about 99.73% will lie within μ ± 3σ
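These percentages can be reproduced from the standard normal CDF. The sketch below uses `statistics.NormalDist` from the standard library. Direct computation gives 68.27% and 95.45%; values such as 68.26% and 95.44% come from doubling four-decimal table entries.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of a normal distribution lying within k standard deviations of the mean.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.2%}")
```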
A STANDARDIZED VARIABLE (z) redefines each observation in terms of
the number of standard deviations from the mean.
SAMPLE
\[ z_i = \frac{x_i - \bar{x}}{s} \]
POPULATION
\[ z_i = \frac{x_i - \mu}{\sigma} \]
NEGATIVE z means that the observation is to the LEFT of the mean.
POSITIVE z means that the observation is to the RIGHT of the mean.
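A minimal sketch of the sample version of the z formula, using five invented observations (an assumption for illustration only):

```python
import statistics

data = [10, 12, 14, 16, 18]  # illustrative sample

x_bar = statistics.fmean(data)
s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

# Each z-score is the distance from the mean in standard deviations.
z_scores = [(x - x_bar) / s for x in data]

for x, z in zip(data, z_scores):
    side = "left of" if z < 0 else "right of" if z > 0 else "at"
    print(f"x = {x}: z = {z:+.3f} ({side} the mean)")
```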
PERCENTILES are data that have been divided into 100 groups.
DECILES are data that have been divided into 10 groups.
QUINTILES are data that have been divided into 5 groups.
QUARTILES are data that have been divided into 4 groups.
Median (Q2)
MIDHINGE
\[ \text{Midhinge} = \frac{Q_1 + Q_3}{2} \]
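POPULATION
\[ r_p = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{N} (x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{N} (y_i - \mu_y)^2}} \]
SAMPLE
\[ r_s = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]
Note: \( -1 \le r \le +1 \)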
POPULATION COVARIANCE
\[ \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]
SAMPLE COVARIANCE
\[ s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} \]
A CORRELATION COEFFICIENT is the COVARIANCE divided by the
product of the standard deviations of X and Y.
POPULATION
\[ \rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]
SAMPLE
\[ r = \frac{s_{xy}}{s_x s_y} \]
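The relation "correlation = covariance / (product of standard deviations)" can be sketched for the sample case as follows. The helper names and the five data pairs are invented for illustration.

```python
import math


def sample_covariance(xs, ys):
    """s_xy: sum of cross-deviations divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r = s_xy / (s_x * s_y); the variance is the covariance of a variable with itself."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


xs = [1, 2, 3, 4, 5]  # illustrative data pairs
ys = [2, 4, 5, 4, 5]
s_xy = sample_covariance(xs, ys)
r = sample_correlation(xs, ys)
print(f"s_xy = {s_xy:.3f}, r = {r:.4f}")
```

Unlike the covariance, r has no units and always falls between -1 and +1.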
WEIGHTED MEAN
\[ \bar{x} = \sum_{j=1}^{k} w_j x_j \quad \text{where} \quad \sum_{j=1}^{k} w_j = 1.00 \]
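GROUP MEAN
\[ \bar{x} = \frac{\sum_{j=1}^{k} f_j m_j}{n} \]
SKEWNESS
\[ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3 \]
KURTOSIS
\[ \text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \]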
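The skewness and kurtosis formulas are easy to mistype, so a small sketch may help. It assumes the sample standard deviation s (n - 1 divisor) and a fourth-power sum in the kurtosis; the function names and data are invented for illustration.

```python
import statistics


def skewness(data):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - x_bar) / s) ** 3)."""
    n = len(data)
    x_bar, s = statistics.fmean(data), statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in data)


def kurtosis(data):
    """Sample excess kurtosis; needs at least four observations."""
    n = len(data)
    x_bar, s = statistics.fmean(data), statistics.stdev(data)
    fourth = sum(((x - x_bar) / s) ** 4 for x in data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return lead * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


symmetric = [1, 2, 3, 4, 5]      # illustrative data
right_skewed = [1, 1, 1, 2, 10]  # long right tail
print(skewness(symmetric), skewness(right_skewed), kurtosis(symmetric))
```

A symmetric data set has skewness near zero, and a long right tail gives a positive value.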
SESSION 5 SUMMARY
A RANDOM EXPERIMENT is an observational process whose results cannot
be known in advance. The set of all possible OUTCOMES is the SAMPLE
SPACE for the experiment. A sample space can be finite (a discrete list of
outcomes) or infinite. If the outcome is a CONTINUOUS measurement, the sample
space cannot be listed but can be described by a rule.
A TREE DIAGRAM helps you visualize all possible outcomes and starts with
a contingency table. The tree diagram shows all events along with their
marginal, conditional, and joint probabilities.
A JOINT PROBABILITY represents the intersection of two events in a cross-
tabulation table.
The MARGINAL PROBABILITY of a single event is found by dividing a
row or column total by the total sample size.
Intersection: \( A \cap B \) (A and B; ∩ is read "cap")
Law of Addition (General): \( P(A \cup B) = P(A) + P(B) - P(A \cap B) \)
Mutually Exclusive Events: \( A \cap B = \emptyset \rightarrow P(A \cap B) = 0 \)
Law of Addition (Special): \( P(A \cup B) = P(A) + P(B) \)
Dichotomous (Binary) Events: an event \( A \) and its complement \( A' \)
Collectively Exhaustive Events: events whose union is the entire sample space
Conditional Probability: \( P(A|B) = \frac{P(A \cap B)}{P(B)} \) for \( P(B) > 0 \)
General Law of Multiplication: \( P(A \cap B) = P(A|B) \times P(B) \)
Odds (In Favour): \( \frac{P(A)}{P(A')} = \frac{P(A)}{1 - P(A)} \)
Odds (Against): \( \frac{P(A')}{P(A)} = \frac{1 - P(A)}{P(A)} \)
Independent Events: \( P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \times P(B)}{P(B)} = P(A) \)
Multiplication Law (Independent Events): \( P(A_1 \cap \dots \cap A_n) = P(A_1) \dots P(A_n) \)
Bayes' Theorem: \( P(A|B) = \frac{P(A) \times P(B|A)}{P(B)} \)
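A short sketch of Bayes' Theorem and the odds formulas follows. The screening numbers (1% prior, 95% and 5% conditional probabilities) are invented for illustration. P(B) is expanded with the law of total probability, which these notes do not list.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(A) * P(B|A) / P(B).

    P(B) is computed as P(A)P(B|A) + P(A')P(B|A') (law of total probability,
    an addition not stated in the notes).
    """
    p_b = prior_a * p_b_given_a + (1 - prior_a) * p_b_given_not_a
    return prior_a * p_b_given_a / p_b


def odds_in_favour(p):
    """Odds in favour of an event with probability p: P(A) / (1 - P(A))."""
    return p / (1 - p)


# Hypothetical screening test: A = has condition, B = tests positive.
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(f"P(A|B) = {posterior:.4f}")
print(f"odds in favour when P(A) = 0.75: {odds_in_favour(0.75):.1f} to 1")
```

The posterior is much larger than the 1% prior but still well below 50%, because false positives from the large unaffected group dominate.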
SESSION 6 SUMMARY
BAYES’ THEOREM states that the PRIOR (MARGINAL) probability of an
event B is revised after event A has been considered to yield a POSTERIOR
(CONDITIONAL) probability.
If event A can occur in \( n_1 \) ways and event B can occur in \( n_2 \) ways, then
events A and B can occur in \( n_1 \times n_2 \) ways.
In general, m events can occur in \( n_1 \times n_2 \times \dots \times n_m \) ways.
The number of ways that n items can be arranged in a particular order is
n FACTORIAL, the product of all integers from 1 to n. Factorials are useful for
counting the possible arrangements of any n items.
\[ n! = n \times (n-1) \times (n-2) \times \dots \times 1 \]
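Both counting rules can be verified by listing every outcome. The sketch below uses the standard `itertools` module; the shirts, trousers, and letters are invented examples.

```python
import math
from itertools import permutations, product

# Fundamental rule of counting: n1 x n2 combined outcomes.
shirts = ["red", "blue", "green"]  # n1 = 3 (illustrative)
trousers = ["jeans", "chinos"]     # n2 = 2 (illustrative)
outfits = list(product(shirts, trousers))

# Factorial: number of orderings of n distinct items.
arrangements = list(permutations("ABCD"))

print(len(outfits), len(shirts) * len(trousers))
print(len(arrangements), math.factorial(4))
```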
The EXPECTED VALUE E(X) of a discrete random variable is the sum of all
X-values weighted by their respective probabilities. E(X) is a measure of central
tendency.
\[ \mu = E(X) = \sum_{i=1}^{N} x_i P(x_i) \]
The VARIANCE is a weighted average of the dispersion about the mean and a
measure of variability.
\[ Var(X) = \sigma^2 = \sum_{i=1}^{N} (x_i - \mu)^2 P(x_i) \]
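A minimal sketch of E(X) and Var(X) for a discrete random variable. The probability table `pmf` is invented for illustration.

```python
import math

# Hypothetical discrete distribution: value -> probability (must sum to 1).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# E(X): each value weighted by its probability.
mu = sum(x * p for x, p in pmf.items())

# Var(X): squared deviations from mu, weighted by probability.
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(var)

print(f"E(X) = {mu:.2f}, Var(X) = {var:.2f}, SD = {sd:.2f}")
```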
If events per unit of time follow a Poisson distribution, the time until the next
event follows the EXPONENTIAL DISTRIBUTION. The time until the next
event is a continuous variable.
Type of Distribution
- Uniform Discrete Distribution
- Binomial Distribution
- Poisson Distribution
- Normal Distribution
- Standard Normal Distribution
- Exponential Distribution
SESSION 9 SUMMARY
SAMPLE MEAN is considered a random variable.
SAMPLE STATISTIC is a random variable whose value depends on which
population items are included in the random sample.
SAMPLING DISTRIBUTION is a distribution of a sample statistic when we
repeat sampling with the same sample size.
ESTIMATOR is a statistic derived from a sample to infer the value of a
population parameter.
ESTIMATE is the value of the estimator in a particular sample.
The CENTRAL LIMIT THEOREM states that the sample mean will have an
approximately normal distribution even if the population distribution is not
normal, as long as the sample size is large enough.
Population shape is SYMMETRIC: n ≥ 15
Population shape is NOT KNOWN: n ≥ 30
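The Central Limit Theorem can be illustrated with a small simulation. This sketch makes several assumptions: an exponential population with mean 1 (strongly right-skewed), samples of size 30, 2,000 repetitions, and a fixed seed so the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (arbitrary choice) for repeatability


def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1.0."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


n, repetitions = 30, 2000
means = [sample_mean(n) for _ in range(repetitions)]

center = statistics.fmean(means)  # should be close to the population mean, 1.0
spread = statistics.stdev(means)  # should be close to sigma / sqrt(n)

print(f"mean of sample means: {center:.3f}")
print(f"std. dev. of sample means: {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from normal.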
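\[ \sigma_p^2 = \frac{\pi(1-\pi)}{n} \rightarrow \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \]
Standardize \( p \sim N\left(\pi;\ \frac{\pi(1-\pi)}{n}\right) \) to
\[ \frac{p - \pi}{\sqrt{\dfrac{\pi(1-\pi)}{n}}} \sim N(0; 1) \]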
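CONFIDENCE INTERVAL
Standardize \( \bar{x} \sim N\left(\mu_x;\ \frac{\sigma_x^2}{n}\right) \) to
\[ z_x = \frac{\bar{x} - \mu_x}{\sigma_x / \sqrt{n}} \sim N(0; 1) \]
If \( \sigma_x \) is known:
Lower limit: \( \bar{x} - 1.96 \frac{\sigma_x}{\sqrt{n}} \)
Upper limit: \( \bar{x} + 1.96 \frac{\sigma_x}{\sqrt{n}} \)
\[ 95\%\ CI = \left( \bar{x} - 1.96 \frac{\sigma_x}{\sqrt{n}};\ \bar{x} + 1.96 \frac{\sigma_x}{\sqrt{n}} \right) \]
\[ 99\%\ CI = \left( \bar{x} - 2.58 \frac{\sigma_x}{\sqrt{n}};\ \bar{x} + 2.58 \frac{\sigma_x}{\sqrt{n}} \right) \]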
If \( \sigma_x \) is not known:
\[ s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
\[ t = \frac{\bar{x} - \mu}{s_x / \sqrt{n}} \]
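\[ (1-\alpha)100\%\ CI = \bar{x} \pm z_{1-\frac{\alpha}{2}} \frac{s_x}{\sqrt{n}} \]
With \( z_{1-\frac{\alpha}{2}} \frac{s_x}{\sqrt{n}} = e \) (margin of error)
\[ e = z_{1-\frac{\alpha}{2}} \times \frac{\sigma_x}{\sqrt{n}} \rightarrow n = \frac{\left(z_{1-\frac{\alpha}{2}}\right)^2 \sigma_x^2}{e^2} \]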
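The z-based interval and the sample-size formula can be sketched with `statistics.NormalDist`, which supplies the quantile instead of a printed table. The function names and the numbers (mean 50, standard deviation 10, n = 25, target error 2) are invented for illustration.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval x_bar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{1 - alpha/2}
    e = z * sigma / math.sqrt(n)                        # margin of error
    return x_bar - e, x_bar + e


def sample_size_for_mean(sigma, e, confidence=0.95):
    """Smallest n with margin of error at most e: n = (z * sigma / e) ** 2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)


low, high = z_interval(x_bar=50, sigma=10, n=25)
n_needed = sample_size_for_mean(sigma=10, e=2)
print(f"95% CI: ({low:.2f}; {high:.2f}), n needed for e = 2: {n_needed}")
```

The sample size is rounded up because rounding down would leave the margin of error slightly above the target.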
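\[ (1-\alpha)100\%\ CI = \bar{x} \pm t_{1-\frac{\alpha}{2}} \frac{s_x}{\sqrt{n}} \]
With \( t_{1-\frac{\alpha}{2}} \frac{s_x}{\sqrt{n}} = e \) (margin of error)
\[ e = t_{1-\frac{\alpha}{2}} \times \frac{\sigma_x}{\sqrt{n}} \rightarrow n = \frac{\left(t_{1-\frac{\alpha}{2}}\right)^2 \sigma_x^2}{e^2} \]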
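\[ (1-\alpha)100\%\ CI = p \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}} \]
With \( z_{1-\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}} = e \) (margin of error)
\[ e = z_{1-\frac{\alpha}{2}} \sqrt{\frac{p(1-p)}{n}} \rightarrow n = \frac{\left(z_{1-\frac{\alpha}{2}}\right)^2 p(1-p)}{e^2} \]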
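The same pattern applies to a proportion. In this sketch the function names and inputs (p = 0.4 with n = 100, and a planning value p = 0.5 with e = 0.05) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def proportion_interval(p, n, confidence=0.95):
    """Confidence interval p +/- z * sqrt(p(1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    e = z * math.sqrt(p * (1 - p) / n)  # margin of error
    return p - e, p + e


def sample_size_for_proportion(p, e, confidence=0.95):
    """n = z^2 * p(1 - p) / e^2, rounded up to a whole observation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


low, high = proportion_interval(p=0.4, n=100)
n_needed = sample_size_for_proportion(p=0.5, e=0.05)
print(f"95% CI: ({low:.3f}; {high:.3f}), n needed: {n_needed}")
```

Using p = 0.5 when planning is the conservative choice, because p(1 - p) is largest there.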
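\( H_0: \mu = a \)
\( H_1: \mu > a \)
\( H_0: \pi = a \)
\( H_1: \pi > a \)
Decide on the significance level \( \alpha \).
Look up the CDF table of the standard normal distribution.
\[ z_p = \frac{p - \pi}{\sqrt{\dfrac{p(1-p)}{n}}} \quad (\pi = a \text{ under } H_0) \]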
H1: μ ≠ a
P-VALUE
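\( H_0: \mu = a \)
\( H_1: \mu \ne a \)
\[ z_x = \frac{\bar{x} - \mu}{\sigma_x / \sqrt{n}} \quad \text{or} \quad t_x = \frac{\bar{x} - \mu}{s_x / \sqrt{n}} \]
\[ P\text{-value} = P(z < -|z_x| \text{ or } z > |z_x|) \quad \text{or} \quad P(t < -|t_x| \text{ or } t > |t_x|) \]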
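A sketch of the two-tailed p-value for the z version, using `statistics.NormalDist` in place of a CDF table. The function name and inputs are invented; the t version would need a t distribution, which the standard library does not provide.

```python
import math
from statistics import NormalDist


def two_tailed_p_value(x_bar, mu0, sigma, n):
    """Return (z_x, p-value) for H0: mu = mu0 against H1: mu != mu0."""
    z_x = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Both tails: P(z < -|z_x|) + P(z > |z_x|) = 2 * (1 - CDF(|z_x|)).
    p_value = 2 * (1 - NormalDist().cdf(abs(z_x)))
    return z_x, p_value


# Hypothetical sample: mean 52 from n = 100, with sigma = 10 and H0: mu = 50.
z_x, p_value = two_tailed_p_value(x_bar=52, mu0=50, sigma=10, n=100)
alpha = 0.05
print(f"z = {z_x:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```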
SESSION 12 SUMMARY
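\( H_0: \mu_1 = \mu_2 \rightarrow \mu_1 - \mu_2 = 0 \)
\( H_1: \mu_1 \ne \mu_2 \rightarrow \mu_1 - \mu_2 \ne 0 \)
Test statistic:
\[ t_{stat} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]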
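\[ \bar{d} = \frac{\sum x_{1i} - \sum x_{2i}}{n} \]
\[ s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n-1} \]
\[ s_{\bar{d}}^2 = \frac{s_d^2}{n} \]
Test statistic:
\[ t_{stat} = \frac{\bar{d} - (\mu_1 - \mu_2)}{s_d / \sqrt{n}} \quad \text{where } df = n - 1 \]
\[ (1-\alpha)100\%\ CI = \bar{d} \pm t_{1-\frac{\alpha}{2}} \frac{s_d}{\sqrt{n}} \]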
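A minimal sketch of the paired-sample test statistic on invented before/after measurements for five subjects. The critical t value must still come from a t table, since the standard library has no t distribution.

```python
import math
import statistics

# Hypothetical paired measurements for five subjects.
x1 = [12, 15, 11, 14, 13]
x2 = [10, 14, 10, 11, 12]

d = [a - b for a, b in zip(x1, x2)]  # paired differences d_i
n = len(d)
d_bar = statistics.fmean(d)          # equals (sum(x1) - sum(x2)) / n
s_d = statistics.stdev(d)            # standard deviation of the differences

hypothesized_diff = 0                # mu_1 - mu_2 under H0
t_stat = (d_bar - hypothesized_diff) / (s_d / math.sqrt(n))
df = n - 1

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, t = {t_stat:.2f}, df = {df}")
```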
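3. Population proportion
\( H_0: \pi_1 = \pi_2 \rightarrow \pi_1 - \pi_2 = 0 \)
\( H_1: \pi_1 \ne \pi_2 \rightarrow \pi_1 - \pi_2 \ne 0 \)
Pooled proportion:
\[ p = \frac{x_1 + x_2}{n_1 + n_2} \]
\[ \sigma_{\Delta p} = \sqrt{\frac{p(1-p)}{n_1} + \frac{p(1-p)}{n_2}} \]
Test statistic:
\[ z_{stat} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p(1-p) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]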
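The two-proportion test statistic can be sketched as follows. The function name and the counts (60 of 100 versus 40 of 100) are invented for illustration, and the hypothesized difference is taken as zero.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi_1 = pi_2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                         # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))   # standard error of p1 - p2
    return (p1 - p2) / se


# Hypothetical counts: 60 successes of 100 versus 40 successes of 100.
z_stat = two_proportion_z(x1=60, n1=100, x2=40, n2=100)
print(f"z = {z_stat:.4f}")
```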
\( H_0: \mu_1 = \mu_2 = \dots = \mu_c \)
\( H_1: H_0 \text{ is not true} \)
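i is the observation, i = 1, …, N
\( \beta_0 \) is the intercept
\( \beta_1 \) is the slope
\( y_i = \hat{y}_i + e_i \) (\( e_i \) is the residual)
\[ \min_{b_0, b_1} \sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2 \]
\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]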
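The least-squares formulas for the slope and intercept can be sketched directly. The function name `fit_line` and the five data pairs are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


xs = [1, 2, 3, 4, 5]  # illustrative data pairs
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)

# Residuals e_i = y_i - (b0 + b1 * x_i); they sum to zero for a least-squares fit.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, sum of residuals = {sum(residuals):.2e}")
```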