Bluebottle Size and Handedness Analysis

The document presents an assignment analyzing the relationship between the handedness and size of bluebottles washed up on Maroubra's shore, focusing on observational data from a sample of 70 individuals. It includes statistical analyses such as median, variance, outlier detection, correlation, and hypothesis testing to assess the significance of the findings. The results indicate a moderate positive correlation between bluebottle length and sail height, with strong evidence against the null hypothesis regarding the mean length of bluebottles being less than or equal to 3.4 cm.

Uploaded by

Joseph Ha

MATH1041 Assignment, Joseph Onyoo Ha, z5420498

Question 1

1a. The relationship between bluebottles’ handedness and their size (as length and sail height).

1b. Washed-up bluebottles of Maroubra (assuming they do not conclusively represent the global
population)

1c. 70 washed up bluebottles on Maroubra’s shore

1d. n=70

1e. It’s an observational study; there is no attempt to influence responses while measuring individuals’
relevant features in the sample.
Question 2

2a.
median(BBData$Sail_Height); var(BBData$Sail_Height) # median, variance for sail height
median(BBData$Bluebottle_Length); var(BBData$Bluebottle_Length) # median, variance for bluebottle length

Sail_Height: M ≈ 0.46 cm, s² ≈ 0.09 cm²

Bluebottle_Length: M = 4.40 cm, s² ≈ 2.96 cm²

2b.
boxplot.stats(BBData$Sail_Height)$out # outliers flagged by the boxplot of Sail Height
boxplot.stats(BBData$Bluebottle_Length)$out # outliers flagged by the boxplot of Length

These commands reveal that Sail Height has 2 outliers (1.3, 1.3) and Length has 2 outliers (8.7, 10.3).
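
As a rough illustration of what boxplot.stats() flags, the 1.5 × IQR rule can be written out by hand. The vector below is made up for the example (it is not the assignment data), and the names hinges, fence_width and manual_out are illustrative only. Note that boxplot.stats() works from Tukey's hinges, as returned by fivenum(), rather than from quantile().

```r
# Hypothetical sail heights in cm (illustrative only, not the assignment data)
x <- c(0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.7, 1.3)

# Lower and upper hinges are elements 2 and 4 of the five-number summary
hinges <- fivenum(x)[c(2, 4)]

# Fences sit 1.5 hinge-spreads beyond each hinge
fence_width <- 1.5 * diff(hinges)
lower_fence <- hinges[1] - fence_width
upper_fence <- hinges[2] + fence_width

# Values outside the fences are flagged as outliers
manual_out <- x[x < lower_fence | x > upper_fence]
manual_out

# Should agree with the built-in helper
boxplot.stats(x)$out
```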

2c.
length(which(BBData$Sail_Height > 0.6)) / 70 # count of sail heights greater than 0.6, over n = 70 (proportion)

p̂_obs(Sail Height > 0.6 cm) ≈ 0.214 (3 sf)

2d.
greater0.6 <- which(BBData$Sail_Height > 0.6) # indices where sail height exceeds 0.6 cm
longer6 <- which(BBData$Bluebottle_Length > 6) # indices where length exceeds 6 cm
# size of the intersection of the two index sets, divided by the sample size (= proportion)
length(intersect(greater0.6, longer6)) / 70

p̂_obs(Sail Height > 0.6 cm ∩ Length > 6 cm) ≈ 0.114 (3 sf)
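
The same proportions can also be obtained by averaging logical vectors, since TRUE counts as 1 and FALSE as 0. The sketch below uses a small made-up data frame called toy (a hypothetical stand-in for BBData, with invented values) purely to show the idea.

```r
# Hypothetical stand-in for BBData (illustrative values only)
toy <- data.frame(
  Sail_Height       = c(0.4, 0.7, 0.5, 0.9, 0.3),
  Bluebottle_Length = c(4.1, 6.5, 5.0, 5.8, 3.2)
)

tall <- toy$Sail_Height > 0.6        # TRUE where sail height exceeds 0.6 cm
long <- toy$Bluebottle_Length > 6    # TRUE where length exceeds 6 cm

p_tall <- mean(tall)          # proportion with sail height > 0.6 cm
p_both <- mean(tall & long)   # proportion meeting both conditions

# The index-based approach gives the same joint proportion
p_both_idx <- length(intersect(which(tall), which(long))) / nrow(toy)

c(p_tall = p_tall, p_both = p_both, p_both_idx = p_both_idx)
```

Using nrow() instead of a hard-coded 70 keeps the denominator correct if the data set changes.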


Question 3

3a.
               Min.  1st Qu.  Median   Mean  3rd Qu.    Max.
Left-handed   1.500    3.900   4.400  4.734    5.700  10.300
Right-handed  2.200    4.100   4.700  5.164    6.000   8.500

3b.

3c.
Differences: The Left group has 2 outliers, while the Right group has none. The centre of the data
(both median and mean) is lower for Left than for Right.

Similarities: Similar IQR and similar skewness (both right-skewed).

3d.

n_Left = 59, n_Right = 11

3e. The right-handed sample is much smaller than the left-handed one, so summary statistics for the
left-handed sample are more reliable. In larger samples, the sample mean is more likely to be close to
the true mean, outliers that cause skew are more readily identified (aligning with my observations for
Left), and margins of error are smaller overall.
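
To put a rough number on the sample-size argument: the standard error of a mean is s/√n, so if the two groups had the same standard deviation (an assumption made only for this illustration), the ratio of their standard errors depends on the sample sizes alone. The variable names below are illustrative.

```r
n_left  <- 59   # left-handed sample size from 3d
n_right <- 11   # right-handed sample size from 3d

# With a common standard deviation s, SE = s / sqrt(n), so s cancels in the ratio
se_ratio <- (1 / sqrt(n_right)) / (1 / sqrt(n_left))
se_ratio   # how many times larger the right-handed standard error would be
```

Under that equal-spread assumption, the right-handed mean is a little over twice as uncertain as the left-handed mean.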
Question 4

4a. Length is the explanatory variable.

4b. Sail Height is the response variable

4c. A scatterplot is an appropriate graphical summary.

4d. A regression line can be fitted to the scatterplot.

4e.

4f. A relationship exists: it is moderate in strength, positive in direction and linear in form.

4g. cor(BBData$Bluebottle_Length, BBData$Sail_Height)

r = 0.591 (3 sf)
4h. r measures the strength and direction of the linear association between two variables, that is, how
closely the points cluster around a straight line fitted through them. r can range from –1 to 1
(inclusive). The sign of r (+ or –) shows whether the response variable tends to increase or decrease,
respectively, as the explanatory variable increases. As a rule of thumb, r is considered weak at a
magnitude of 0 to 0.5, moderate between 0.5 and 0.7, and strong beyond that. In this case, at around
+0.591, it shows a moderate positive association. The cluster of points is roughly straight, without
curving away from the fitted line, so it appears to follow a linear trend at first glance.

4i. Variability explained = r² = (0.5913363)² ≈ 0.350 = 35.0% (3 sf)


About 35% of the variability of sail height is explained by the length of a bluebottle.
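
For a simple linear regression with one explanatory variable, the squared correlation equals the R² reported by lm(). The sketch below checks this on simulated data; the vectors len and sail and all parameter values are invented for illustration and are not the assignment data.

```r
set.seed(1)  # for reproducibility of the simulated values

# Hypothetical lengths and sail heights (cm), loosely linear with noise
len  <- rnorm(70, mean = 4.8, sd = 1.7)
sail <- 0.1 + 0.08 * len + rnorm(70, sd = 0.2)

r <- cor(len, sail)                  # sample correlation
fit <- lm(sail ~ len)                # least-squares regression of sail on length
r_squared_lm <- summary(fit)$r.squared

# The two quantities agree up to floating-point error
all.equal(r^2, r_squared_lm)
```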
4j. Interpreting r² in this way is appropriate only under the assumption that there are no outliers.
4k.
Using the 1.5 × IQR rule, no lower outliers were found, but one value (the 28th observation) exceeding
the upper boundary was found. Therefore, the assumption in 4j does not hold, and the regression line in
4e may have been distorted by this outlier.
Question 5

5a.

5b.

5c. The sample quantiles of Length follow the normal quantiles fairly well, so normality is a fair assumption.


5d. The observations must come from a random sample; as the bluebottles were collected and replaced, the
lengths can be treated as independent and identically distributed. Combined with a reasonably large
sample size of 70, the sample mean is approximately normally distributed, so the sample gives a valid
estimate of the true population mean.

5e.
mean(BBData$Bluebottle_Length) #sample mean of length
#df=70-1=69, sample size n=70
qt(0.975, 69) #finds t which marks middle 95%
sd(BBData$Bluebottle_Length) #sample standard deviation
mean(BBData$Bluebottle_Length) - qt(0.975, 69) *
sd(BBData$Bluebottle_Length)/sqrt(70) #minimum of interval
mean(BBData$Bluebottle_Length) + qt(0.975, 69) *
sd(BBData$Bluebottle_Length)/sqrt(70) #maximum

Therefore, from the derived minimum and maximum, CI_95%(μ) = [4.39, 5.21] cm (3 sf).
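
As a sanity check, the interval can be recomputed from summary statistics alone. This sketch assumes the rounded values reported in 6a (x̄ ≈ 4.80 cm, s = 1.72 cm, n = 70); the variable names are illustrative, and small differences from the data-based result are possible because of rounding.

```r
xbar <- 4.80   # rounded sample mean (cm), as reported in 6a
s    <- 1.72   # rounded sample standard deviation (cm), as reported in 6a
n    <- 70     # sample size

t_star <- qt(0.975, df = n - 1)   # critical value for the middle 95% of t(69)
margin <- t_star * s / sqrt(n)    # margin of error

ci <- c(lower = xbar - margin, upper = xbar + margin)
round(ci, 2)
```

With the raw data available, t.test(BBData$Bluebottle_Length)$conf.int should give the same interval in one call.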

5f. There is no information on the population standard deviation, σ, so it is estimated by s and a
t-distribution is used instead of the standard normal (z) distribution.

5g. For a single calculated interval, it is not meaningful to speak of a 95% probability that it contains
the true mean; the 95% refers to the procedure, in that about 95% of intervals built this way from
repeated samples will contain the true mean. Individually, a sample interval either contains it or does
not. Secondly, a 95% confidence interval does not imply that 95% of data points lie within it, but
rather, it gives a range of plausible values for the population mean.
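
The repeated-sampling interpretation can be illustrated with a small simulation. Everything here is hypothetical: the "true" mean and standard deviation are invented, the population is assumed normal, and the names covers and coverage are illustrative.

```r
set.seed(42)  # for reproducibility

true_mean <- 4.8   # hypothetical population mean (cm)
true_sd   <- 1.7   # hypothetical population standard deviation (cm)
n    <- 70         # sample size per simulated study
reps <- 2000       # number of simulated studies

# For each simulated sample, build a 95% t-interval and record whether it
# contains the true mean
covers <- replicate(reps, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  margin <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

coverage <- mean(covers)   # proportion of intervals that captured the true mean
coverage
```

The proportion should land close to 0.95, while each individual interval either captured the true mean or missed it.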
Question 6

6a. Let μ_L be the true mean length of Maroubra’s beached bluebottles.

H₀: μ_L ≤ 3.4 cm, H̃₀: μ_L = 3.4 cm, Hₐ: μ_L > 3.4 cm

Because σ is unknown, we must use the test statistic T = (X̄ − μ₀) / (S / √n). The mean sample length in
Maroubra was x̄ ≈ 4.80 cm, the sample standard deviation was s = 1.72 cm, and the sample size was n = 70.
This yields t ≈ (4.80 − 3.4) / (1.72 / √70) ≈ 6.82. Also, the null distribution under H̃₀ is
t(n − 1) = t(70 − 1) = t(69).
Under the null hypothesis,
tval2 <- (mean(BBData$Bluebottle_Length) - 3.4) / (sd(BBData$Bluebottle_Length) / sqrt(70)) # observed value of t
pt(tval2, df = 69, lower.tail = FALSE) # p-value; lower.tail = FALSE gives P(T >= t)
# > [1] 1.409422e-09

P-value = P_{μ_L = μ₀}(T ≥ 6.82) ≈ 1.41 × 10⁻⁹

The P-value, at about 1.41E-7%, is between 0 and 0.1%, providing very strong evidence against the null
hypothesis: the data strongly suggest that the true mean length of beached bluebottles in Maroubra is
greater than the South African estimate of 3.4 cm.
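
The test statistic and P-value can be approximated from the rounded summary values alone. This is only a sketch with illustrative variable names; because x̄ and s are rounded here, the result comes out near 6.81 rather than the 6.82 obtained from the raw data.

```r
xbar <- 4.80   # rounded sample mean (cm)
s    <- 1.72   # rounded sample standard deviation (cm)
n    <- 70     # sample size
mu0  <- 3.4    # hypothesised mean under the null (cm)

# One-sample t statistic: distance of the sample mean from mu0 in standard errors
t_obs <- (xbar - mu0) / (s / sqrt(n))

# Upper-tail probability under t(n - 1), matching Ha: mu_L > 3.4
p_value <- pt(t_obs, df = n - 1, lower.tail = FALSE)

c(t_obs = t_obs, p_value = p_value)
```

With the raw data, t.test(BBData$Bluebottle_Length, mu = 3.4, alternative = "greater") performs the same one-sided test in a single call.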

6b.

Yes, both follow t distributions.
