Revision Notes: Bob Francis 2005
[Figure: plot of P(X = r) against r for r = 0 to 20, with regions labelled "Accept H0" and "Reject H0"]
Revision Notes
Statistics 1
Topic
Classifying data
Data Presentation
Examples
References
MEI Stats 1
Pages 12 and 13 Categorical data Discrete data Continuous data Pages 22 to 23 Grouped data
Categorical: non-numerical categories, e.g. favourite colours of 30 children, political party voted for by 1000 electors.
Discrete: numerical data taking particular, often integral, values, e.g. number of goals, shoe size.
Continuous: numerical values measured to a given accuracy, e.g. length, weight, time, speed.
Example (categorical): political parties: Conservative, Labour, Liberal Democrats, Greens, etc.
Example (discrete): goals scored in consecutive games: 3, 3, 0, 4, 2, 1, 2, 1, 0, 2, 3, 3, 4, 2, 1, 2, 2, 3, 1, 2
Example (continuous): heights measured to the nearest cm: 181, 178, 160, 182, 166, 169, 174, 159, 180, 177, 177, 182, 173, 174, 161, 177, 185, 166, 166, 186
Frequency Distributions
Categorical: frequencies of various non-numerical categories, e.g. % supporting political parties.
Discrete: frequencies of discrete values, e.g. goals scored by one team in 20 consecutive games.
Continuous: frequencies of continuous values in class intervals with associated boundaries, e.g. number of students with heights measured to the nearest 5 cm.
Grouped discrete data can be treated as if it were continuous, e.g. distribution of marks in a test.
Categorical: Conservative 23%, Labour 42%, Liberal Democrats 18%, Greens 7%, rest 10%
MEI Stats 1
Pages 17 to 19 Frequency distributions Pages 24 to 26 Grouped data
MEI Stats 1
Pages 56 to 58 Bar charts and vertical line charts Pages 62 to 69 Histograms Pages 74 to 77 Cumulative frequency curves
MEI Stats 1
Pages 6 to 8 Stem and leaf diagrams
MEI Stats 1
Pages 73 and 74 Box & whisker plots
Skewness
A frequency distribution for discrete or continuous data may exhibit symmetry, positive skew or negative skew, according to its 'shape'. The discrete frequency distribution example (goals scored by a football team) is roughly symmetrical. The stem and leaf example (distribution of marks) exhibits negative skewness. The distribution of lengths of telephone calls may well exhibit positive skewness, peaking well to the left of the mid-range.
MEI Stats 1
Pages 5 and 6 Shapes of distributions
MEI Stats 1
Pages 13 to 16 Central Tendency Pages 24 to 27 Grouped data
MEI Stats 1
Pages 71 and 72 Quartiles
Standard deviation: s =
Dispersion [spread]
Range: maximum value $-$ minimum value
Inter-quartile range (IQR): upper quartile $-$ lower quartile $= Q_3 - Q_1$
Sum of squares: $S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$ (raw data)
Outliers (a): $174.1 \pm 2 \times 9.37 = 155.36$ or $192.84$; the value 196 lies beyond these limits, so there is one outlier.
Outliers (b): $166 - 1.5 \times 14.5 = 144.25$; $180.5 + 1.5 \times 14.5 = 202.25$; no values lie beyond these limits, so there are no outliers.
MEI Stats 1
Pages 31 to 40 Range Sum of squares Root mean square deviation Standard deviation Page 73 Inter-Quartile Range
Frequency Distribution
Goals scored by one team in 20 consecutive games:
Goals scored (x)   0   1   2   3   4
Frequency (f)      2   4   7   5   2
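$S_{xx} = \sum (x - \bar{x})^2 f = \sum x^2 f - n\bar{x}^2$ (frequency distribution)
Mean square deviation: $\frac{S_{xx}}{n}$; rmsd: $\sqrt{\frac{S_{xx}}{n}}$
Variance: $\frac{S_{xx}}{n-1}$; Standard deviation: $s = \sqrt{\frac{S_{xx}}{n-1}}$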
Mode = 2
Midrange = $(0 + 4) \div 2 = 2$
Median = 2 (the 10th and 11th values in rank order are both 2)
Mean = $\bar{x} = \frac{\sum xf}{\sum f} = \frac{41}{20} = 2.05$
Using a calculator
Make sure that you can use a scientific or graphical calculator to find the mean [$\bar{x}$], root mean square deviation, rmsd [$\sigma_n$] and standard deviation, s [$\sigma_{n-1}$] of a raw data set and a frequency distribution.
Lower quartile $Q_1 = 1$; Upper quartile $Q_3 = 3$
Range = $4 - 0 = 4$
Inter-quartile range (IQR) = $3 - 1 = 2$
Sum of squares: $S_{xx} = \sum x^2 f - n\bar{x}^2 = 109 - 20 \times 2.05^2 = 24.95$
Root mean square deviation: rmsd = $\sqrt{\frac{S_{xx}}{n}} = \sqrt{\frac{24.95}{20}} = 1.12$ (3 s.f.)
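The summary statistics for the goals example can be checked with a short script. This is a minimal sketch using only the frequency table from these notes; the variable names are illustrative.

```python
from math import sqrt

# Frequency distribution from the notes: goals scored (x) -> frequency (f)
freq = {0: 2, 1: 4, 2: 7, 3: 5, 4: 2}

n = sum(freq.values())                                   # total frequency
mean = sum(x * f for x, f in freq.items()) / n           # sum(xf) / sum(f)
sxx = sum(x * x * f for x, f in freq.items()) - n * mean ** 2
rmsd = sqrt(sxx / n)                                     # divisor n
s = sqrt(sxx / (n - 1))                                  # divisor n - 1

print(f"n = {n}, mean = {mean:.2f}, Sxx = {sxx:.2f}")
print(f"rmsd = {rmsd:.2f}, s = {s:.2f}")
```

The only difference between rmsd and s is the divisor (n against n - 1), which is why the two values are close for a sample of 20.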
Graphical calculator
Data Analysis for the TI-83 + accompanying notes
Outliers
Can be applied to data which are:
(a) at least 2 standard deviations from the mean, i.e. beyond $\bar{x} \pm 2s$
(b) at least $1.5 \times$ IQR beyond the nearer quartile, i.e. below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$
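Both outlier rules can be sketched in a few lines. The function names below are hypothetical, and the quartiles are passed in by hand because conventions for locating quartiles differ between textbooks and software.

```python
from math import sqrt

def outliers_sd(data):
    """Rule (a): values at least 2 standard deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    s = sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return [x for x in data if abs(x - mean) >= 2 * s]

def outliers_iqr(data, q1, q3):
    """Rule (b): values at least 1.5 x IQR beyond the nearer quartile."""
    iqr = q3 - q1
    return [x for x in data if x <= q1 - 1.5 * iqr or x >= q3 + 1.5 * iqr]

# Goals scored in 20 consecutive games (raw data from these notes)
goals = [3, 3, 0, 4, 2, 1, 2, 1, 0, 2, 3, 3, 4, 2, 1, 2, 2, 3, 1, 2]

print(outliers_sd(goals))          # rule (a)
print(outliers_iqr(goals, 1, 3))   # rule (b), with Q1 = 1 and Q3 = 3
```

For the goals data both calls return an empty list, since every value lies inside both sets of limits.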
MEI Stats 1
Pages 40 and 41 Outliers & s.d. Pages 73 and 74 Outliers & IQR
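Standard deviation: $s = \sqrt{\frac{S_{xx}}{n-1}} = \sqrt{\frac{24.95}{19}} = 1.15$ (3 s.f.)
Outliers (a): $2.05 \pm 2 \times 1.15 = -0.25$ or $4.35$; no values lie beyond these limits, so there are no outliers.
Outliers (b): $1 - 1.5 \times 2 = -2$; $3 + 1.5 \times 2 = 6$; no values lie beyond these limits, so there are no outliers.
For data sets x and y with $y = 5x - 20$, given $\bar{x} = 24.8$ and $s_x = 7.3$: $\bar{y} = 5 \times 24.8 - 20 = 104$ and $s_y = 5 \times 7.3 = 36.5$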
Coding
If $y = ax + b$ then: $\bar{y} = a\bar{x} + b$
MEI Stats 1
Pages 43 to 45 Linear coding
and $s_y = a\,s_x$
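The coding rules can be checked numerically. The sketch below assumes the example coding is y = 5x - 20 (the minus sign is inferred, since minus signs are missing throughout the extracted text) and uses a hypothetical helper, `coded_summary`. It also confirms the rules on the raw goals data with Python's `statistics` module.

```python
from statistics import mean, stdev

def coded_summary(a, b, mean_x, s_x):
    """Return (mean_y, s_y) for y = a*x + b using the coding rules.

    abs(a) keeps the spread non-negative if a is negative; for a > 0
    this is the same as a * s_x in the notes.
    """
    return a * mean_x + b, abs(a) * s_x

# Example values from the notes: mean of x is 24.8, s_x is 7.3
mean_y, s_y = coded_summary(5, -20, 24.8, 7.3)
print(f"mean_y = {mean_y:.1f}, s_y = {s_y:.1f}")

# Check on raw data: code every value, then compare the two routes
goals = [3, 3, 0, 4, 2, 1, 2, 1, 0, 2, 3, 3, 4, 2, 1, 2, 2, 3, 1, 2]
coded = [5 * x - 20 for x in goals]
direct = (mean(coded), stdev(coded))
via_rules = coded_summary(5, -20, mean(goals), stdev(goals))
```

Adding b shifts every value by the same amount, so it changes the mean but not the spread.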
Probability of events
Probability 1
Examples
Experimental Probability
In a statistical experiment a drawing pin is thrown 100 times, landing point-down 37 times. The probability of event A (the drawing pin landing point-down) may be estimated as:
$P(A) \approx \frac{37}{100} = 0.37$
$P(A') = 1 - P(A) = 1 - 0.37 = 0.63$
References
MEI Stats 1
Pages 87 to 91 Measuring probability Experimental and theoretical probability The complement of an event Expectation (expected frequency)
Probability describes the likelihood of an event occurring in a statistical experiment. Probability is measured on a scale of 0 (impossible) to 1 (certain).
Theoretical Probability
The theoretical probability of an event A is given by $P(A) = \frac{n(A)}{n(\mathcal{E})}$, where A is the set of favourable outcomes and $\mathcal{E}$ is the set of all possible outcomes.
The experimental probability of an event is: (number of successes) $\div$ (number of trials).
The complementary event of A is written A' and is defined as the set of possible outcomes not in set A. Hence $P(A') = 1 - P(A)$.
The expectation (expected frequency) of an event is the number of times it is expected to occur in n repetitions of the experiment, and is given by: Expected frequency = $n \times P(A)$
An ordinary pack of cards is shuffled and a card chosen at random. The probability of event A (card chosen is a picture card) is calculated by:
$P(A) = \frac{12}{52} = \frac{3}{13}$
$P(A') = 1 - P(A) = 1 - \frac{3}{13} = \frac{10}{13}$
If the experiment is repeated 100 times, then the expectation (expected frequency) of a picture card being chosen is $100 \times \frac{3}{13} \approx 23.1$.
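The picture-card calculation can be reproduced with exact fractions. This is a small sketch using Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# 12 picture cards (J, Q, K in each of 4 suits) in a pack of 52
p_picture = Fraction(12, 52)        # reduced automatically
p_not_picture = 1 - p_picture       # complement rule
expected = 100 * p_picture          # expected frequency in 100 trials

print(p_picture, p_not_picture, round(float(expected), 1))
```

An expected frequency need not be a whole number; it is a long-run average, not a prediction for any single set of 100 trials.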
Sample Space
Two fair dice are thrown and their scores added.
 +    1   2   3   4   5   6
 1    2   3   4   5   6   7
 2    3   4   5   6   7   8
 3    4   5   6   7   8   9
 4    5   6   7   8   9  10
 5    6   7   8   9  10  11
 6    7   8   9  10  11  12
Event A (Total = 7): $P(A) = \frac{6}{36} = \frac{1}{6}$
Event B (Total > 8): $P(B) = \frac{10}{36} = \frac{5}{18}$
Sample space
The sample space for an experiment illustrates the set of all possible outcomes. An event is therefore a sub-set of the sample space. Probabilities can be calculated from first principles.
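Calculating from first principles amounts to listing the sample space and counting. The sketch below does this for the two-dice example; `prob` is a hypothetical helper that assumes all 36 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs of scores on two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = favourable outcomes / all outcomes (equally likely)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_a = prob(lambda o: sum(o) == 7)   # Event A: total = 7
p_b = prob(lambda o: sum(o) > 8)    # Event B: total > 8
print(p_a, p_b)
```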
MEI Stats 1
Pages 92 to 94 Probability of one event or another Pages 94 and 95 Mutually exclusive events
MEI Stats 1
Pages 109 to 110 Dependent and independent events
Tree diagrams
Probability 2
Examples
Independent events
A child's toy has two parts; 90% of top parts and 75% of bottom parts are perfect. Parts are placed together at random.
Event A (top part is perfect): $P(A) = 0.9$
Event B (bottom part is perfect): $P(B) = 0.75$
$P(A \cap B) = P(A) \times P(B) = 0.9 \times 0.75 = 0.675$
References
MEI Stats 1
Pages 98 to 101 The probability of events from two trials
A useful way of illustrating probabilities for both independent and dependent events. Multiply probabilities along the branches (and); Add probabilities at the ends of branches (or).
[Figures: tree diagrams for independent events and for dependent events]
Dependent events
A pack of cards is shuffled; two cards are chosen at random without replacement.
Event A (1st card is a picture card): $P(A) = \frac{12}{52}$
Event B (2nd card is a picture card): $P(B \mid A) = \frac{11}{51}$
$P(A \cap B) = P(A) \times P(B \mid A) = \frac{12}{52} \times \frac{11}{51} = \frac{11}{221}$
Tree diagrams may have more than two branches at each division and/or more than two sets. Tree diagrams may be asymmetrical.
Conditional Probability
The multiplication law for dependent probabilities, $P(A \cap B) = P(A) \times P(B \mid A)$, may be rearranged to give:
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$ or $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
$P(B) = \frac{12}{52} \times \frac{11}{51} + \frac{40}{52} \times \frac{12}{51} = \frac{3}{13}$
MEI Stats 1
Pages 107 to 113 Conditional probability
If event A logically precedes event B then the right-hand version is useful for calculating posterior conditional probability.
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{11/221}{3/13} = \frac{11}{51}$
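The two-card example can be worked through with exact fractions: multiply along the branches, add across the branches that end in B, then divide to get the posterior probability. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(12, 52)              # 1st card is a picture card
p_b_given_a = Fraction(11, 51)      # 2nd is a picture, given 1st was
p_b_given_not_a = Fraction(12, 51)  # 2nd is a picture, given 1st was not

# Multiply along the branch A then B
p_a_and_b = p_a * p_b_given_a

# Add the two branches that end in B
p_b = p_a_and_b + (1 - p_a) * p_b_given_not_a

# Posterior: probability the 1st was a picture, given the 2nd was
p_a_given_b = p_a_and_b / p_b

print(p_a_and_b, p_b, p_a_given_b)
```

P(B) comes out equal to P(A): with no information about the first card, the second card is as likely to be a picture card as the first.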
MEI Stats 1
Pages 139 to 140 Factorials and arrangements Pages 143 to 146 Combinations Binomial coefficients Pages 147 to 149 Calculating probabilities in less simple cases
$^nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$, for $r = 0, 1, 2, \ldots, n$
$^{17}C_5 = \frac{17!}{5!\,12!} = 6188$
Suppose that n distinct objects are divided into types S and T, where $n(S) = n_1$ and $n(T) = n_2$, and r objects are selected at random from the n objects. The probability that there are $r_1$ of type S and $r_2$ of type T is:
$\frac{^{n_1}C_{r_1} \times {}^{n_2}C_{r_2}}{^nC_r}$, where $r_1 + r_2 = r$ and $n_1 + n_2 = n$
$^7C_2 \times {}^{10}C_3 = \frac{7!}{2!\,5!} \times \frac{10!}{3!\,7!} = 2520$
Hence the probability that the team, chosen at random, contains exactly two girls is $\frac{2520}{6188} \approx 0.407$ (3 s.f.)
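The selection formula can be sketched with `math.comb`. The figures in the example (17, 5, 7, 2, 10, 3) suggest a team of 5 chosen from 7 girls and 10 boys; that reading is an assumption inferred from the surviving numbers, and `prob_split` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def prob_split(n1, r1, n2, r2):
    """P(r1 of type S and r2 of type T) when r1 + r2 objects are
    chosen at random from n1 of type S and n2 of type T."""
    return Fraction(comb(n1, r1) * comb(n2, r2), comb(n1 + n2, r1 + r2))

total_teams = comb(17, 5)                  # all possible teams of 5
two_girl_teams = comb(7, 2) * comb(10, 3)  # 2 girls and 3 boys
p_two_girls = prob_split(7, 2, 10, 3)

print(total_teams, two_girl_teams, p_two_girls, round(float(p_two_girls), 3))
```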
MEI Stats 1
Pages 118 to 124 Definitions Notation Vertical line charts Calculation of probabilities
$\frac{25}{12}k = 1 \Rightarrow k = \frac{12}{25} = 0.48$
Definition by formula: Sometimes it is possible to define the probability function as a formula, as a function of r: $P(X = r) = f(r)$ for values of r (usually integral). Often the function f includes a constant, k, which can be found using the property $\sum p_i = 1$.
Definition by table: For a small set of values it is often convenient to list the associated probabilities $p_i$ for each $x_i$:
$x_i$          $x_1$   $x_2$   $x_3$   ...   $x_{n-1}$   $x_n$
$P(X = x_i)$   $p_1$   $p_2$   $p_3$   ...   $p_{n-1}$   $p_n$
Calculation of probabilities: Sometimes you need to be able to calculate the probability of some compound event, given the values from the table or function.
Explanation of probabilities: Often you need to explain how the probability $P(X = x_k)$, for some value of k, is derived from first principles.
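Finding the constant k from the property that the probabilities sum to 1 can be sketched as follows. The formula used here, P(X = r) = k/r for r = 1, 2, 3, 4, is purely a hypothetical illustration and is not taken from these notes; it is chosen because it leads to an equation of the form (25/12)k = 1.

```python
from fractions import Fraction

# Hypothetical probability function: P(X = r) = k / r for r = 1, 2, 3, 4
values = [1, 2, 3, 4]

# Sum of the probabilities with k factored out: k * (1 + 1/2 + 1/3 + 1/4)
coefficient = sum(Fraction(1, r) for r in values)

# The probabilities must sum to 1, so k = 1 / coefficient
k = 1 / coefficient

dist = {r: k / r for r in values}
print(coefficient, k, float(k))
```

Once k is known, the formula gives every probability in the distribution, and checking that they sum to 1 is a quick safeguard against arithmetic slips.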
Definition by table
In a competition, you have to match 4 inventors with 4 inventions. Assume this is done at random. Let X represent the number of correct matchings. The distribution is given by the table:
r           0     1     2     3     4
P(X = r)   3/8   1/3   1/4    0    1/24
Expectation (mean)
The expectation (or mean) of a discrete random variable is defined by: $E(X) = \mu = \sum x_i P(X = x_i) = \sum x_i p_i$
Calculation of probabilities:
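If two friends both enter the competition, the probability that both guess the same number of correct matchings is:
$\left(\frac{3}{8}\right)^2 + \left(\frac{1}{3}\right)^2 + \left(\frac{1}{4}\right)^2 + 0^2 + \left(\frac{1}{24}\right)^2 = \frac{91}{288} \approx 0.316$ (3 s.f.)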
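The matching distribution can be confirmed by listing all 4! = 24 possible matchings and counting the correct ones in each. The sketch below treats a matching as a permutation of positions 0 to 3, and the final step assumes the two friends guess independently.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

n_items = 4

# For each matching, count positions where item i is paired with i
counts = Counter(
    sum(1 for i, j in enumerate(perm) if i == j)
    for perm in permutations(range(n_items))
)
total = sum(counts.values())  # number of possible matchings

# Probability distribution of X = number of correct matchings
dist = {r: Fraction(counts.get(r, 0), total) for r in range(n_items + 1)}

# Both friends score the same: sum of squared probabilities
p_same = sum(p * p for p in dist.values())

print(dist)
print(p_same, round(float(p_same), 3))
```

P(X = 3) is 0 because three correct matchings force the fourth to be correct as well.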
MEI Stats 1
Pages 127 to 130 Expectation of a discrete random variable
Variance
The variance of a discrete random variable is defined by: $\text{Var}(X) = \sigma^2 = E([X - \mu]^2) = \sum x^2 P(X = x) - \mu^2$
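The expectation and variance formulas can be applied to the inventors table with exact fractions. This is a minimal sketch; it computes the variance both ways to show that the two forms agree.

```python
from fractions import Fraction

# Distribution of X (number of correct matchings) from the table
dist = {
    0: Fraction(3, 8),
    1: Fraction(1, 3),
    2: Fraction(1, 4),
    3: Fraction(0),
    4: Fraction(1, 24),
}

mu = sum(x * p for x, p in dist.items())                      # E(X)
var_short = sum(x * x * p for x, p in dist.items()) - mu**2   # sum(x^2 p) - mu^2
var_long = sum((x - mu) ** 2 * p for x, p in dist.items())    # E([X - mu]^2)

print(mu, var_short, var_long)
```

For this distribution both the mean and the variance equal 1.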
Explanation of probabilities:
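Explanation of why $P(X = 2) = \frac{1}{4}$:
Total number of possible matchings = $4! = 24$
Exactly two correct matchings can occur in $^4C_2 = 6$ ways (choose which two are correct; the other two must then be swapped)
$P(X = 2) = \frac{6}{24} = \frac{1}{4}$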
MEI Stats 1
Pages 127 to 130 Variance of a discrete random variable
MEI Stats 1
Pages 153 to 156 The binomial distribution Pages 158 to 161 Expectation of B(n, p) Using the binomial distribution
MEI Stats 1
$P(X \leq 5) = 0.8982$
$P(X = 4) = P(X \leq 4) - P(X \leq 3) = 0.7687 - 0.5665 = 0.2022$
$P(X > 6) = 1 - P(X \leq 6) = 1 - 0.9629 = 0.0371$
$P(3 \leq X \leq 6) = P(X \leq 6) - P(X \leq 2) = 0.9629 - 0.3287 = 0.6342$
Hypothesis Testing
A null hypothesis ($H_0$) is tested against an alternative hypothesis ($H_1$) at a particular significance level. According to given criteria, the null hypothesis is either rejected or not rejected. A hypothesis test can be either 1-tailed or 2-tailed.
MEI Stats 1
Pages 169 to 173
Two-tail test
A survey claims: "15% of the population are left-handed". Hypothesis test of the survey's claim at the 10% level:
(1) $H_0$: $p = 0.15$; $H_1$: $p \neq 0.15$ (2-tail)
(2) Decide on the significance level: 10%
(3) Data collected: 7 left-handed in a random sample of 20
(4) Conduct test: since 7 > mean ($20 \times 0.15 = 3$), $P(X \geq 7) = 1 - P(X \leq 6) = 1 - 0.9781 = 0.0219$
(5) Interpret result: since $P(X \geq 7) < 5\%$ (half of the 10% level in a 2-tail test), there is enough evidence to reject $H_0$, i.e. reject the claim that 15% of the population are left-handed.
Critical region: since $P(X \leq 0) = 0.0388$ and $P(X \leq 1) = 0.1756$, and $P(X \geq 6) = 0.0673$ and $P(X \geq 7) = 0.0219$, $\{x : x = 0 \text{ or } x \geq 7\}$ is the critical region.
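The tail probabilities and critical region for this test can be reproduced without tables. The sketch below builds the binomial probabilities from `math.comb`; the function names are illustrative, and the 10% level is split equally between the two tails, as in the worked test.

```python
from math import comb

def binom_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binom_cdf(n, p, r):
    """P(X <= r) for X ~ B(n, p); returns 0 when r < 0."""
    return sum(binom_pmf(n, p, k) for k in range(r + 1))

n, p, alpha = 20, 0.15, 0.10
observed = 7

# Upper-tail probability of the observed value: P(X >= 7)
p_upper = 1 - binom_cdf(n, p, observed - 1)
reject_h0 = p_upper < alpha / 2

# Critical region: values whose tail probability is below 5% in either tail
lower_tail = [r for r in range(n + 1) if binom_cdf(n, p, r) < alpha / 2]
upper_tail = [r for r in range(n + 1) if 1 - binom_cdf(n, p, r - 1) < alpha / 2]

print(f"P(X >= 7) = {p_upper:.4f}, reject H0: {reject_h0}")
print("lower tail:", lower_tail, "upper tail starts at:", upper_tail[0])
```

The observed value 7 falls in the upper part of the critical region, which agrees with the decision to reject the null hypothesis.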
Excel Spreadsheet
Binomial Distribution, Hypothesis Testing and Critical Regions