Methods Guide

Uploaded by

Brigitta Domokos
Copyright
© All Rights Reserved

Basic Synthesis

Measures of central tendency (middle point of data set)

- Mean (average)
- Median (middle value of set)
- Mode (most recurring value in set)

Data set

For categorical variables such as ‘institution’ and ‘gender’, you can only
produce a frequency count.

Measures of dispersion (how far data points vary from the mean)

- Range (X max – X min)
- Variance (average squared distance, i.e. area, away from the mean)
- Standard deviation (spread around the mean)

The sequence is 4, 6, 8, 6, 3, 6, 8, 7, 9, and 8.

1. Find the mean.
2. Find the median.
3. Find the mode.
4. Find the range.

Answers

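1. 4+6+8+6+3+6+8+7+9+8 = 65, 65/10 = 6.5

2. 3, 4, 6, 6, 6, 7, 8, 8, 8, 9: with ten values, the median is the average
of the 5th and 6th values, (6 + 7)/2 = 6.5

3. 6 & 8 (each occurs three times)

4. Xmax - Xmin, 9 - 3 = 6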
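The four answers can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module (`multimode` needs Python 3.8 or later); the variable names are illustrative and not part of the notes.

```python
import statistics

# The sequence from the exercise
data = [4, 6, 8, 6, 3, 6, 8, 7, 9, 8]

mean = statistics.mean(data)          # sum of the values / number of values
median = statistics.median(data)      # average of the two middle values when n is even
modes = statistics.multimode(data)    # every value tied for most frequent
data_range = max(data) - min(data)    # X max - X min

print(mean, median, modes, data_range)
```

`multimode` is used rather than `mode` because this sequence has two modes.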

Variance

The sum of the squared distances of each value from the mean, divided by
the number of data points.

Each data point has a distance from the mean; we square that distance, add
up the squares, and divide by n-1 in the case of a sample (or by N in the
case of a whole population).

When we talk about the population, we capitalise the notation (N rather
than n).

Variance Formulas

Used to measure Samples: s^2 = Σ((x - x̄)^2) / (n - 1)

Used to measure Populations: σ^2 = Σ((X - μ)^2) / N


If you simply take the distance of each point from the mean, the distances
of the points above the mean (positive) cancel out those of the points below
the mean (negative), so the distances add up to zero. Therefore, we square
each distance, converting it into an area, so that minus becomes plus and
all we have is a positive area. These areas are then added up.

The 2nd is a shorthand version of how to do it: make it an area, add up the
areas, and divide by n-1 (sample).

Variance is used as a basis for studying the population.

x̄ is the average (mean).

Simple Example.

The sequence: 2, 3, 4, 5, 6, 7, 8

x̄ (mean) = 5

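Find the variance.

(2 - 5)^2 = 9
(3 - 5)^2 = 4
(4 - 5)^2 = 1
(5 - 5)^2 = 0
(6 - 5)^2 = 1
(7 - 5)^2 = 4
(8 - 5)^2 = 9

9+4+1+0+1+4+9 = 28

The total number of values in the sequence is 7.

28 / 7 = 4
Variance = 4 (dividing by N = 7 treats the sequence as a whole population;
treating it as a sample and dividing by n-1 = 6 would give 28 / 6 ≈ 4.67)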
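The same working can be sketched in Python to show how the choice of divisor changes the result. The variable names are illustrative.

```python
import statistics

data = [2, 3, 4, 5, 6, 7, 8]
mean = statistics.mean(data)                      # 5

# Square each distance from the mean so negatives cannot cancel positives
squared_distances = [(x - mean) ** 2 for x in data]
total = sum(squared_distances)                    # 9+4+1+0+1+4+9 = 28

population_variance = total / len(data)           # divide by N
sample_variance = total / (len(data) - 1)         # divide by n-1

print(population_variance, sample_variance)
```

The standard library offers `statistics.pvariance` (population) and `statistics.variance` (sample) for the same two calculations.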
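
Standard Deviation

The data are distributed around the mean in units of standard deviation.
Normally, about 34% of our values fall between the mean and one standard
deviation above it and, because the curve is symmetrical, another 34% fall
between the mean and one standard deviation below it. So 68% of our data set
would normally fall within minus and plus one standard deviation. The
standard deviation is our metre ruler. Values below the mean are a negative
number of standard deviations away.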

Properties of curve:
- Normal curve is symmetrical
- The average, median and mode are equal
- The tail ends do not touch the axis
- Given a mean (x̄) and a standard deviation (S), you can draw the curve
- Caution: normal curve assumptions need groups of around 50 or more

Finding the standard deviation is fairly simple: first find the variance of
a sequence, then take the square root of the answer.

Example

In the previous example we found that the variance of the sequence was 4.
Find the standard deviation of the same sequence.

Standard Deviation = √Variance = √4 = 2
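As a quick check, the square-root step can be sketched in Python. The population form (divide by N) matches the variance of 4 used in the example; the sample form is shown only for comparison.

```python
import math
import statistics

data = [2, 3, 4, 5, 6, 7, 8]

# Population standard deviation: square root of the population variance
population_sd = math.sqrt(statistics.pvariance(data))

# Sample standard deviation (n-1 divisor), slightly larger
sample_sd = statistics.stdev(data)

print(population_sd, sample_sd)
```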

Z-Score

An example
Describe the Maltese population given that:
- The average age is 47
- The standard deviation of the population is 15

Plus one s.d. is 47 + 15 = 62.

About 34% of the population is between 47 and 62. The next point is 77 (plus
2 s.d.); nearly 50% of the population is between 47 and 77. The next point is
92 (plus 3 s.d.). If the percentages for each band are added up, they give
the percentage of the population in that age bracket.

Therefore, any age can be expressed in terms of standard deviations away
from the mean. For example, how far is someone aged 65 from the mean? The
distance is (65 - 47) / 15 = 1.2 s.d. The share of the population beyond that
point can then be found using a Z score table.
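The idea of measuring ages in standard deviations can be sketched as follows. This assumes ages follow a normal curve with the stated mean and s.d., and uses `statistics.NormalDist` (Python 3.8 or later) in place of a printed Z table; the helper `z_score` is a hypothetical name.

```python
from statistics import NormalDist

mean_age = 47
sd_age = 15

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Ages at -3 ... +3 standard deviations from the mean
bands = [mean_age + k * sd_age for k in range(-3, 4)]

# Assumption: ages are normally distributed
ages = NormalDist(mu=mean_age, sigma=sd_age)
share_47_to_62 = ages.cdf(62) - ages.cdf(47)   # mean to +1 s.d., about 0.34
share_47_to_77 = ages.cdf(77) - ages.cdf(47)   # mean to +2 s.d., about 0.48

z_65 = z_score(65, mean_age, sd_age)           # 18 / 15 = 1.2

print(bands, round(share_47_to_62, 2), round(share_47_to_77, 2), z_65)
```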
Example:
Average TV viewing = 100 mins
Standard deviation = 15 mins
What proportion of viewers watch 90 mins or more?

Take (90 - 100) divided by the standard deviation (15) = -0.67 = Z score.
The Z score is negative because 90 is below the mean.

Look up this Z score in the Z score table at the back of the book.
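For readers without the table to hand, the lookup can be approximated in Python. This is a sketch that assumes viewing times are normally distributed; note that 90 is below the mean, so the Z score is negative and the share at 90 mins or more is well above half.

```python
from statistics import NormalDist

# Assumption: viewing time follows a normal curve
viewing = NormalDist(mu=100, sigma=15)

z = (90 - 100) / 15                   # about -0.67
share_90_plus = 1 - viewing.cdf(90)   # area to the right of 90 mins

print(round(z, 2), round(share_90_plus, 2))
```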
Another example

Average age of population = 45
Standard deviation = 15
What is my potential reach for a magazine directed to 61+?

Z = (61 - 45) / 15 = 16/15 = 1.07, which corresponds to a tail area of
0.1423 in the Z score table.

Therefore, 14% of the population is 61 years old or more.
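The table lookup can be cross-checked in Python, again assuming a normal curve. The sketch computes the tail area twice: once from the Z score rounded to two decimals, as a printed table requires, and once without rounding.

```python
from statistics import NormalDist

standard_normal = NormalDist()        # mean 0, s.d. 1, as in a Z table

z = (61 - 45) / 15                    # 1.0667, rounded to 1.07 for the table
reach_from_table_z = 1 - standard_normal.cdf(1.07)
reach_exact = 1 - NormalDist(mu=45, sigma=15).cdf(61)

print(round(reach_from_table_z, 4), round(reach_exact, 4))
```

Both values round to 14%, so rounding the Z score does not change the conclusion here.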

Other distributions besides Z-distribution

The table in the book for the T score is quite abbreviated (table C2).

Comparing 2 people in different cohorts.

A score is best assessed in relation to the rest of the group. What appears
to be a low score in person A's cohort can actually be a sizable high score
in person B's cohort.
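A small sketch of the comparison, using Z scores. The cohort means, standard deviations and the raw score of 55 are hypothetical numbers chosen for illustration; they do not come from the notes.

```python
def z_score(score, cohort_mean, cohort_sd):
    """Express a raw score in standard deviations from its cohort mean."""
    return (score - cohort_mean) / cohort_sd

raw_score = 55  # hypothetical: the same raw score in both cohorts

# Hypothetical cohort A: mean 70, s.d. 10
z_in_cohort_a = z_score(raw_score, 70, 10)   # below the cohort mean

# Hypothetical cohort B: mean 45, s.d. 5
z_in_cohort_b = z_score(raw_score, 45, 5)    # well above the cohort mean

print(z_in_cohort_a, z_in_cohort_b)
```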

Sampling distribution

- Not one sample, but a number of samples
- Each sample has an average and a standard deviation
- Plot a new curve with the sample averages
- Sampling distribution is approximately a normal curve when the samples
are reasonably large
- It is narrower than the sample distribution
- Allows us to be closer in our estimations
- Allows us to consider the properties of a normal curve
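The points above can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is made up (uniform ages between 18 and 90, deliberately not a normal curve), and the sample size of 50 and the 1,000 repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 ages spread evenly between 18 and 90
population = [random.uniform(18, 90) for _ in range(10_000)]

# Take many samples and record the average of each one
sample_size = 50
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(1_000)
]

population_sd = statistics.pstdev(population)
standard_error = statistics.stdev(sample_means)   # s.d. of the sample averages

print(round(statistics.mean(population), 1), round(statistics.mean(sample_means), 1))
print(round(population_sd, 1), round(standard_error, 1))
```

The spread of the sample averages comes out much smaller than the spread of the population, which is why the curve of averages is narrower.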

3 distributions

1. Population
2. Sample
3. Sampling distribution

Confidence intervals

● Using samples to attempt to talk about populations
● Given our knowledge of the normal distribution, we can estimate the
confidence interval around the mean
● Especially useful because we often have to make assumptions about
the mean of the population: average plus or minus 1 standard error
● 68% CI = x̄ ± σ_x̄ (the sample average plus or minus 1 standard error)
● In 68% of cases, the population average lies between minus and plus
1 s.e. of the sample average.

Sampling distribution: take sample groups that are representative of the
population and calculate the average and s.d. of each. Plot the averages of
the different subgroups. Another normal curve is obtained, which has its own
average and s.d. Its points are the averages of the samples taken, so its
average is the average of averages, and its s.d. is the standard error
(S.E. = s.d. of the averages). Prediction is much more accurate.
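A minimal sketch of the interval calculation. The sample figures (average 47, s.d. 15, 100 people) are hypothetical, and the standard error is estimated from a single sample as s.d. / √n, a standard formula that the notes do not state. The 95% interval (about 1.96 standard errors) is included only for comparison.

```python
import math

# Hypothetical sample statistics
sample_mean = 47.0
sample_sd = 15.0
n = 100

# Estimated standard error of the mean from one sample
standard_error = sample_sd / math.sqrt(n)

# 68% CI: average plus or minus 1 standard error
ci_68 = (sample_mean - standard_error, sample_mean + standard_error)

# 95% CI: average plus or minus about 1.96 standard errors
ci_95 = (sample_mean - 1.96 * standard_error, sample_mean + 1.96 * standard_error)

print(standard_error, ci_68, ci_95)
```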

Relationship between two variables

Chi-Squared
- A chi-square test is a statistical test used to compare observed results
with expected results. The purpose of this test is to determine if a
difference between observed data and expected data is due to chance,
or if it is due to a relationship between the variables you are studying.

- Therefore, a chi-square test is an excellent choice to help us better
understand and interpret the relationship between our two categorical
variables.

- The chi-squared test involves comparing the observed frequencies (the
actual data you have) with the expected frequencies (the frequencies
you would expect if the variables were independent or if they followed a
particular distribution). The test calculates a chi-squared statistic, which
quantifies the difference between observed and expected frequencies.
The larger the chi-squared statistic, the stronger the evidence of an
association between the variables.

Example:
Suppose you are conducting a survey to determine whether there is a
relationship between gender and a preference for two different types of soft
drinks: "Soda A" and "Soda B." You survey 100 people and record their
preferences as follows:
- 40 males prefer Soda A
- 20 males prefer Soda B
- 30 females prefer Soda A
- 10 females prefer Soda B

Now, you want to test whether there is a significant association between
gender and soft drink preference using a chi-squared test. Here's how you can
set it up:

Null Hypothesis (H0): There is no association between gender and soft drink
preference.
Alternative Hypothesis (H1): There is an association between gender and
soft drink preference.

1. Create an observed frequency table:

          Soda A    Soda B    Total
Male        40        20        60
Female      30        10        40
Total       70        30       100

2. Calculate the expected frequencies for each cell under the assumption
of independence. To do this, you can use the formula for expected
frequency:

Expected Frequency = (Row Total * Column Total) / Grand Total

For the "Male - Soda A" cell:

Expected Frequency = (60 * 70) / 100 = 42

Similarly, calculate the expected frequencies for all cells:

Male - Soda B = (60 * 30) / 100 = 18
Female - Soda A = (40 * 70) / 100 = 28
Female - Soda B = (40 * 30) / 100 = 12

Calculate the chi-squared statistic using the formula:

Chi-squared = Σ((Observed - Expected)^2 / Expected) for all cells

3. Calculate the chi-squared value for each cell and sum them up:

Chi-squared = ((40 - 42)^2 / 42) + ((20 - 18)^2 / 18) + ((30 - 28)^2 / 28) +
((10 - 12)^2 / 12) = 0.095 + 0.222 + 0.143 + 0.333 = 0.79

If asked to find the degrees of freedom, there is no need to worry.

- Degrees of freedom refers to the number of values in a statistical
calculation that are free to vary.

To continue with the previous example:

Determine the degrees of freedom, which is

(number of rows - 1) * (number of columns - 1).

In this case, it's (2 - 1) * (2 - 1) = 1.

4. Now look up the critical chi-squared value from a chi-squared
distribution table or use a statistical calculator. For a significance level of
0.05 and 1 degree of freedom, the critical value is approximately 3.841.

5. Compare the calculated chi-squared statistic (0.79) with the critical
value (3.841). Since 0.79 is less than 3.841, you would fail to reject the
null hypothesis.

Conclusion: Based on the chi-squared test, there is not enough
evidence to conclude that there is a significant association between
gender and soft drink preference in this sample.

This is a simplified example, but it demonstrates the basic steps of conducting
a chi-squared test for independence with categorical data. The interpretation
may vary depending on the specific data and research question.
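The whole worked example can be reproduced with a short script. This is a sketch in plain Python, with no statistics library, that follows the steps described above; the critical value 3.841 is taken from the text and not computed, and the variable names are illustrative. With these counts the statistic works out to roughly 0.79.

```python
# Observed counts from the survey
observed = {
    "Male": {"Soda A": 40, "Soda B": 20},
    "Female": {"Soda A": 30, "Soda B": 10},
}
columns = ["Soda A", "Soda B"]

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
column_totals = {col: sum(observed[row][col] for row in observed) for col in columns}
grand_total = sum(row_totals.values())

expected = {}
chi_squared = 0.0
for row in observed:
    for col in columns:
        # Expected Frequency = (Row Total * Column Total) / Grand Total
        e = row_totals[row] * column_totals[col] / grand_total
        expected[(row, col)] = e
        chi_squared += (observed[row][col] - e) ** 2 / e

degrees_of_freedom = (len(observed) - 1) * (len(columns) - 1)
critical_value = 3.841   # 0.05 significance level, 1 degree of freedom (from the text)
reject_null = chi_squared > critical_value

print(expected)
print(round(chi_squared, 2), degrees_of_freedom, reject_null)
```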

Hope this helps xxxx
