Assignment Unit 4 Math 1280
Data:
1. Using the list of 17 numbers at the top of the page, the median of this data,
rounded to two decimal places, is:_____ 2.8
To find the median:
# Write the values into an R object x:
> x <- c(7.2, 1.2, 1.8, 2.8, 18, -1.9, -0.1, -1.5, 13.0, 3.2, -1.1, 7.0, 0.5, 3.9, 2.1, 4.1, 6.5)
# Then sort the observations:
> sort(x)
[1] -1.9 -1.5 -1.1 -0.1 0.5 1.2 1.8 2.1 2.8 3.2
[11] 3.9 4.1 6.5 7.0 7.2 13.0 18.0
# Count the observations:
> length(x)
[1] 17
# Find the median, i.e. the middle value of the sorted data
1: By counting. With 17 observations the median is the 9th sorted value
(8 observations lie on each side of it) = 2.8
2: Using R:
> median(x)
[1] 2.8
2. If you find the median using the original method (paper and pencil), you
have to arrange the values into numeric order (True/False).____ True.
3. (The calculations MUST be done manually, do not use R) The
interquartile range for this data is (round each value to 3 decimal
places):__6.000
The first quartile (Q1) of the data is 0.500 (the 5th of the 17 sorted values).
The third quartile (Q3) of the data is 6.500 (the 13th of the 17 sorted values).
The interquartile range is IQR = Q3 - Q1 = 6.500 - 0.500 = 6.000
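As a cross-check on the manual work (not a replacement for it), the quartiles can be read off the sorted vector by position. This is only a sketch: it assumes the quartile convention in which the median is counted in both halves of the data, so that Q1 is the 5th and Q3 the 13th of the 17 sorted values. The names s, q1, q3 and iqr are illustrative and do not come from the assignment.

```r
# The 17 observations from question 1
x <- c(7.2, 1.2, 1.8, 2.8, 18, -1.9, -0.1, -1.5, 13.0, 3.2, -1.1, 7.0,
       0.5, 3.9, 2.1, 4.1, 6.5)

s <- sort(x)      # sorted observations
q1 <- s[5]        # assumed position of the first quartile (median of s[1:9])
q3 <- s[13]       # assumed position of the third quartile (median of s[9:17])
iqr <- q3 - q1    # interquartile range = Q3 - Q1

print(c(Q1 = q1, Q3 = q3, IQR = iqr))

# Base R's IQR() with its default settings should agree for this data
print(IQR(x))
```

Other quartile conventions (for example, leaving the median out of both halves) can give slightly different values, so the convention should match the one used in the course text.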
4. The formula for calculating the interquartile range is_____________
(show the formula and a citation to the source that you used).
Interquartile range equation = Q3 - Q1
Q3= the third quartile and Q1 = the first quartile.
Citation: Yakir, B. (March, 2011)
5. (The calculations MUST be done manually, do not use R) Using techniques
that we studied in this course, the upper and the lower cutoff points (rounded
to three decimal places) for identifying outliers in the given data sample are:
______ and ______ (this is not a request to show any outliers—just the cutoff
points that would determine what constitutes an outlier.) You may round to
three decimal places.
The upper cutoff point = Q3 + (1.5 * IQR) = 6.500 + (1.5 * 6.000) = 15.500
The lower cutoff point = Q1 - (1.5 * IQR) = 0.500 - (1.5 * 6.000) = -8.500
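The same arithmetic can be written out in R to check the hand calculation. This sketch simply plugs in the quartiles found in question 3 and assumes the usual 1.5 * IQR rule for the cutoff points.

```r
# Quartiles taken from the manual calculation in question 3
q1 <- 0.5
q3 <- 6.5
iqr <- q3 - q1

# Cutoff points under the 1.5 * IQR rule (assumed to be the course's rule)
lower.cutoff <- q1 - 1.5 * iqr
upper.cutoff <- q3 + 1.5 * iqr

print(c(lower = lower.cutoff, upper = upper.cutoff))
```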
6. The summary() command shows a list of outliers, if there are any
(True/False):______________False
7. The list of outlier values is:_____________ (if there are none, write
"NA"). 18 (the only value outside the cutoff points)
To find it in R:
> lower.cutoff <- -8.5
> upper.cutoff <- 15.5
> x[x < lower.cutoff | x > upper.cutoff]
[1] 18
8. The standard deviation of the list of 17 numbers is (round to 3 decimal
places):
> sd(x)
[1] 5.249468
= 5.249
A Random Variable:
9. The missing probability value (under the number 4) in the random
variable table above is:_______ .15. The known probabilities in the table sum to
.10 + .15 + .25 + .10 + .10 + .15 = .85, so the missing value is 1 - .85 = .15
10. The sum of the probabilities in the second row of any random variable
table like the one above should equal (round to 3 decimal places): _____ 1.000. The
sum of probabilities must equal 1 (Yakir, 2011, p. 66).
(.10 + .15 + .25 + .10 + .10 + .15 + .15 = 1.000)
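The missing probability and the sum-to-one check can be sketched in R. The table itself is not reproduced in this document, so the values 0 to 6 and their probabilities below are an assumption, reconstructed from the sums written out in answers 13 and 15. The names Y, P and p.missing are illustrative.

```r
# Assumed table: values 0..6, with the probability under 4 unknown (NA)
Y <- 0:6
P <- c(0.10, 0.15, 0.25, 0.10, NA, 0.10, 0.15)

# The probabilities must sum to 1, so the missing one is the remainder
p.missing <- 1 - sum(P, na.rm = TRUE)
P[is.na(P)] <- p.missing

print(round(p.missing, 3))
print(round(sum(P), 3))
```

Comparing with round() or all.equal() rather than == avoids spurious mismatches caused by floating-point arithmetic.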
11. Read section 4.4.1 in the book (Yakir, 2011). Do the numbers in the
table above (for the random variable) represent a data sample (Yes/No)?____
No
12. In the random variable table shown above, the value in the second row
represents the cumulative probability of the corresponding values in the first
row (True/False) _________ False
13. The probability that a randomly selected value from this random variable
will be less than or equal to 3 is :_____ P(Y <= 3) = P(Y = 0) + P(Y = 1) +
P(Y = 2) + P(Y = 3) = 0.10 + 0.15 + 0.25 + 0.10 = 0.60
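A short R sketch of the same cumulative probability, again assuming the table values 0 to 6 and the probabilities implied by answers 13 and 15 (the names Y, P, cum.P and p.le.3 are illustrative). It also shows how the cumulative probabilities differ from the plain probabilities in the second row of the table, which is the point of question 12.

```r
# Assumed table of values and probabilities
Y <- 0:6
P <- c(0.10, 0.15, 0.25, 0.10, 0.15, 0.10, 0.15)

# P(Y <= 3): add the probabilities of all values not exceeding 3
p.le.3 <- sum(P[Y <= 3])
print(round(p.le.3, 3))

# Cumulative probabilities P(Y <= y) for every y in the table
cum.P <- cumsum(P)
print(round(cum.P, 3))
```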
14. What is the probability that a randomly selected value from the random
variable would be exactly 1.5? ______ 0
15. Review section 4.4 in the book (Yakir, 2011), especially pages 57-58.
The expectation of the random variable is:____ E(Y) = (0*.10) + (1*.15) + (2*.25) +
(3*.10) + (4*.15) + (5*.10) + (6*.15) = 0 + .15 + .50 + .30 + .60 + .50 + .90
Answer: 2.95
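The expectation is a probability-weighted sum, which takes one line in R. As before, the table values and probabilities are assumed from the terms written out in this answer; Y, P and mu are illustrative names.

```r
# Assumed table of values and probabilities
Y <- 0:6
P <- c(0.10, 0.15, 0.25, 0.10, 0.15, 0.10, 0.15)

# Expectation: each value multiplied by its probability, then summed
mu <- sum(Y * P)
print(round(mu, 3))

# For contrast, the unweighted average of the first row ignores the probabilities
print(mean(Y))
```

The unweighted mean of the first row is 3, not 2.95, which is why the statement in question 16 is false.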
16. To find the expectation of a random variable by using a relative
frequency table, you can add the values in the first row of the table and divide
by the number of columns in the table (True/False)_______ False
17. Study Yakir (2011) pp. 57-59 and solved problems 4.1.6-4.1.8. The
(population) standard deviation of the random variable above is (round to 3
decimal places):_______ (hint, you cannot put values from the table into the
sd() function because the sd() function does not adjust for the probabilities).
Var(Y) = sum of (y - 2.95)^2 * P(y) = 3.6475, so the standard deviation is
sqrt(3.6475) = 1.910
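A minimal R sketch of this calculation, assuming the same reconstructed table (values 0 to 6 with the probabilities implied by answers 13 and 15). It weights each squared deviation by its probability instead of calling sd(); the names mu, var.Y and sd.Y are illustrative.

```r
# Assumed table of values and probabilities
Y <- 0:6
P <- c(0.10, 0.15, 0.25, 0.10, 0.15, 0.10, 0.15)

mu <- sum(Y * P)                 # expectation of Y
var.Y <- sum((Y - mu)^2 * P)     # variance: probability-weighted squared deviations
sd.Y <- sqrt(var.Y)              # standard deviation: square root of the variance

print(round(var.Y, 4))
print(round(sd.Y, 3))
```

R drops trailing zeros when printing, so the rounded standard deviation displays as 1.91 rather than 1.910.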
18. If you have already calculated the standard deviation of a data sample,
what is the next thing to do to find the variance: Square the standard deviation
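This relationship can be checked on the 17 numbers from question 1. The sketch below compares the squared sample standard deviation with the output of var(); the name var.from.sd is illustrative.

```r
# The 17 observations from question 1
x <- c(7.2, 1.2, 1.8, 2.8, 18, -1.9, -0.1, -1.5, 13.0, 3.2, -1.1, 7.0,
       0.5, 3.9, 2.1, 4.1, 6.5)

# Squaring the sample standard deviation gives the sample variance
var.from.sd <- sd(x)^2
print(isTRUE(all.equal(var.from.sd, var(x))))
```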
A Population:
19. Determine how many observations in the pop3.csv file are of type a: __
49,949___.
> pppp3 <- read.csv("pop3.csv")
> colnames(pppp3)
[1] "type" "time"
> length(pppp3$type[ pppp3$type == 'a'])
[1] 49949
20. Using the appropriate R function with the default options, what is the
median of the time column of pop3 (round to 3 decimal places): __4.473
> round(median(pppp3$time), 3)
[1] 4.473
21. What is the variance of the time column of pop3 (rounded to three decimal
places)? _______54.916
> round(var(pppp3$time), 3)
[1] 54.916
Reference:
Yakir, B. (2011). Introduction to Statistical Thinking (With R, Without Calculus). The
Hebrew University.