
Difference Between Statistic and Statistics

The document defines key statistical concepts such as samples and populations, descriptive and inferential statistics, measures of dispersion including range, standard deviation and interquartile range. It also covers standard deviation for samples and populations, measures of relative position including percentiles, quartiles and z-scores, and how to construct a box and whisker plot.

Uploaded by

maria69

Statistics and statistic

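Statistics is the discipline of collecting, analyzing, and interpreting data. Mathematical statistics, more narrowly, is the application of probability theory, a branch of mathematics, to statistics, as opposed to techniques for collecting statistical data.

A statistic (singular) is a single number computed from a sample, such as a sample mean.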

Samples and Populations

A sample is a selection taken from a larger group (the "population") that will, hopefully, let you find out things
about the larger group. Samples should be chosen randomly.

Example

1. You ask 100 randomly chosen people at a football match what their main job is. Your sample is the
100, while the population is all the people at that match.

2. Kevin asks 20 randomly chosen people at his school. His sample is the 20, while the population is all the people at the
school.
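As a minimal sketch of the first example, the following Python snippet draws a simple random sample of 100 people. The population of 5,000 spectator ID numbers and the fixed seed are assumptions made only for illustration; they do not come from the example itself.

```python
import random

# Hypothetical population: ID numbers for 5,000 spectators at the match.
population = list(range(1, 5001))

# A fixed seed is used only so that the sketch is repeatable.
rng = random.Random(42)

# Draw 100 distinct people; every spectator is equally likely to be chosen.
sample = rng.sample(population, k=100)

print(len(sample))      # size of the sample
print(len(population))  # size of the population
```

Sampling without replacement (as `sample` does here) means nobody is asked twice.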

Descriptive and Inferential Statistics

Descriptive statistics uses the data to provide descriptions of the population, either through numerical
calculations or graphs or tables. 

Inferential statistics makes inferences and predictions about a population based on a sample of data
taken from the population in question.

Definition of Measures of Dispersion and its three commonly used measures

Measures of Dispersion - As the name suggests, a measure of dispersion shows how scattered the
data are. It tells how much the values vary from one another and gives a clear idea of the distribution of
the data. A measure of dispersion shows the homogeneity or the heterogeneity of the distribution of
the observations.

Range - The range is the most common and most easily understood measure of dispersion. It is the
difference between the two extreme observations of the data set. If $X_{\max}$ and $X_{\min}$ are the two extreme
observations, then

$\text{Range} = X_{\max} - X_{\min}$

Standard Deviation - A standard deviation is the positive square root of the arithmetic mean of the
squares of the deviations of the given values from their arithmetic mean. It is denoted by a Greek letter
sigma, σ. It is also referred to as root mean square deviation. The standard deviation is given as

$\sigma = \left[ \frac{\sum_i (y_i - \bar{y})^2}{n} \right]^{1/2} = \left[ \frac{\sum_i y_i^2}{n} - \bar{y}^2 \right]^{1/2}$
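The two forms of the formula (the definition, and the shortcut that uses the mean of the squares) can be checked against each other numerically. The sketch below uses a small made-up data set, which is an assumption for illustration only.

```python
import math

# Hypothetical data set, chosen only to illustrate the two formulas.
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(y)
y_bar = sum(y) / n  # arithmetic mean

# Definition: root of the mean of the squared deviations from the mean.
sigma_definition = math.sqrt(sum((v - y_bar) ** 2 for v in y) / n)

# Shortcut: root of (mean of the squares minus the square of the mean).
sigma_shortcut = math.sqrt(sum(v * v for v in y) / n - y_bar ** 2)

print(sigma_definition, sigma_shortcut)  # both are 2.0 for this data set
```

The shortcut form needs only one pass over the data, but it can lose precision when the values are large relative to their spread, so the definition is often the safer choice in code.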

Interquartile Range - The interquartile range is defined as the difference between the 75th and
25th percentiles (also called the third and first quartiles). Hence the interquartile range describes the
middle 50% of observations. If the interquartile range is large, it means that the middle 50% of
observations are spaced wide apart.

Standard Deviation for Samples and Populations

Standard deviation measures the spread of a data distribution. It measures the typical distance between
each data point and the mean.

Examples:

1. Example: Sam has 20 rose bushes.

The number of flowers on each bush is

9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

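Work out the Standard Deviation, using the formula for the population standard deviation:

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$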

Step 1. Work out the mean

In the formula above μ (the greek letter "mu") is the mean of all our values ...

Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

The mean is:

(9+2+5+4+12+7+8+11+9+3+7+4+12+5+4+10+9+6+9+4) / 20 = 140/20 = 7

So: μ = 7
 

Step 2. Then for each number: subtract the Mean and square the result

This is the part of the formula that says:

$(x_i - \mu)^2$

So what is x_i? They are the individual x values 9, 2, 5, 4, 12, 7, etc.

In other words x_1 = 9, x_2 = 2, x_3 = 5, etc.

So it says "for each value, subtract the mean and square the result", like this

Example (continued):

(9 - 7)² = (2)² = 4

(2 - 7)² = (-5)² = 25

(5 - 7)² = (-2)² = 4

(4 - 7)² = (-3)² = 9

(12 - 7)² = (5)² = 25

(7 - 7)² = (0)² = 0

(8 - 7)² = (1)² = 1

... etc ...

And we get these results:

4, 25, 4, 9, 25, 0, 1, 16, 4, 16, 0, 9, 25, 4, 9, 9, 4, 1, 4, 9

Step 3. Then work out the mean of those squared differences.

To work out the mean, add up all the values then divide by how many.

First add up all the values from the previous step.

But how do we say "add them all up" in mathematics? We use "Sigma": Σ

The handy Sigma Notation says to sum up as many terms as we want.

[Figure: Sigma Notation]

We want to add up all the values from 1 to N, where N=20 in our case because there are 20 values:

Example (continued):

$\sum_{i=1}^{N} (x_i - \mu)^2$

Which means: sum all values from (x_1 - 7)² to (x_N - 7)².

We already calculated (x_1 - 7)² = 4 etc. in the previous step, so just sum them up:

= 4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9 = 178

But that isn't the mean yet. We need to divide by how many, which is done by multiplying by 1/N (the
same as dividing by N):

Example (continued):

Mean of squared differences = (1/20) × 178 = 8.9

(Note: this value is called the "Variance") 

Step 4. Take the square root of that:

Example (concluded):

σ = √(8.9) = 2.983...
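The four steps can be reproduced in a few lines of Python. This is a sketch of the calculation above, not a prescribed method; the variable names are illustrative, and `statistics.pstdev` from the standard library is mentioned only as a cross-check.

```python
import math
import statistics

flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]
N = len(flowers)

mu = sum(flowers) / N                              # Step 1: the mean
squared_diffs = [(x - mu) ** 2 for x in flowers]   # Step 2: squared differences
variance = sum(squared_diffs) / N                  # Step 3: mean of those (variance)
sigma = math.sqrt(variance)                        # Step 4: square root

print(mu, variance, round(sigma, 3))  # prints: 7.0 8.9 2.983
```

Calling `statistics.pstdev(flowers)` gives the same population standard deviation directly.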

2. Example: Sam has 20 rose bushes, but only counted the flowers on 6 of them!

The "population" is all 20 rose bushes, and the "sample" is the 6 bushes that Sam counted the flowers
of.

Let us say Sam's flower counts are:


9, 2, 5, 4, 12, 7

We can still estimate the Standard Deviation.

But when we use the sample as an estimate of the whole population, the Standard Deviation formula
changes to this:

The formula for Sample Standard Deviation:

$s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$

The important change is "N-1" instead of "N" (which is called "Bessel's correction").

The symbols also change to reflect that we are working on a sample instead of the whole population:

• The mean is now x̄ (for sample mean) instead of μ (the population mean),
• And the answer is s (for Sample Standard Deviation) instead of σ.

But that does not affect the calculations. Only N-1 instead of N changes the calculations.

OK, let us now calculate the Sample Standard Deviation:

Step 1. Work out the mean


Example 2: Using sampled values 9, 2, 5, 4, 12, 7

The mean is (9+2+5+4+12+7) / 6 = 39/6 = 6.5

So:

x̄ = 6.5

Step 2. Then for each number: subtract the Mean and square the result
Example 2 (continued):

(9 - 6.5)² = (2.5)² = 6.25

(2 - 6.5)² = (-4.5)² = 20.25

(5 - 6.5)² = (-1.5)² = 2.25

(4 - 6.5)² = (-2.5)² = 6.25

(12 - 6.5)² = (5.5)² = 30.25

(7 - 6.5)² = (0.5)² = 0.25

Step 3. Then work out the mean of those squared differences.

To work out the mean, add up all the values then divide by how many.

But hang on ... we are calculating the Sample Standard Deviation, so instead of dividing by how many
(N), we will divide by N-1

Example 2 (continued):

Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5

Divide by N-1: (1/5) × 65.5 = 13.1

(This value is called the "Sample Variance")

Step 4. Take the square root of that:


Example 2 (concluded):

s = √(13.1) = 3.619...
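The same calculation with Bessel's correction can be sketched in Python as follows. The variable names are illustrative; `statistics.stdev` from the standard library divides by N-1 as well and serves as a cross-check.

```python
import math
import statistics

sample = [9, 2, 5, 4, 12, 7]
N = len(sample)

x_bar = sum(sample) / N                              # Step 1: sample mean
sum_sq = sum((x - x_bar) ** 2 for x in sample)       # Step 2: squared differences, summed
sample_variance = sum_sq / (N - 1)                   # Step 3: divide by N-1, not N
s = math.sqrt(sample_variance)                       # Step 4: square root

print(x_bar, sum_sq, round(sample_variance, 1), round(s, 3))
# prints: 6.5 65.5 13.1 3.619
```

Dividing by N instead would give √(65.5/6) ≈ 3.304, which tends to underestimate the spread of the whole population when only a sample is available.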

Definition of Measures of Relative Position and its measures

Measures of Position. Statisticians often talk about the position of a value, relative to other values in a
set of data. The most common measures of position are percentiles, quartiles, and standard scores (aka,
z-scores).

Percentiles - Assume that the elements in a data set are rank ordered from the smallest to the largest.
The values that divide a rank-ordered set of elements into 100 equal parts are called percentiles.

Quartiles - Quartiles divide a rank-ordered data set into four equal parts. The values that separate the
parts are called the first, second, and third quartiles; and they are denoted by Q1, Q2, and Q3, respectively.
The chart below shows a set of four numbers divided into quartiles.

Standard Scores (z-Scores) - A standard score (aka, a z-score) indicates how many standard
deviations an element is from the mean. A standard score can be calculated from the following formula.

z = (X - μ) / σ

Examples

1. A national achievement test is administered annually to 3rd graders. The test has a mean score of 100
and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on the test?
Solution

Jane's score was 118. From the z-score equation, we know

z = (X - μ) / σ

where z is the z-score, X is the value of the element, μ is the mean of the population, and σ is the
standard deviation.

Solving for Jane's test score (X), we get

X = (z × σ) + μ = (1.20 × 15) + 100 = 18 + 100 = 118
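The z-score formula and its rearrangement can be sketched as two small Python functions. The function names `z_score` and `score_from_z` are hypothetical helpers introduced for this illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


def score_from_z(z, mu, sigma):
    """Rearranged form: recover the raw value X from its z-score."""
    return z * sigma + mu


# Jane's test: mean 100, standard deviation 15, z-score 1.20.
jane_score = score_from_z(1.20, mu=100, sigma=15)
print(round(jane_score, 2))                    # 118.0
print(round(z_score(jane_score, 100, 15), 2))  # 1.2, back where we started
```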

2.

Box and Whisker Plot

In a box and whisker plot, the ends of the box are the upper and lower quartiles, so the box spans the
interquartile range. The median is marked by a vertical line inside the box. The whiskers are the two lines
outside the box that extend to the highest and lowest observations.

Example:

1. Find Q1, Q2, and Q3 for the following data set, and draw a box-and-whisker plot.

{2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23}

There are 12 data points. The middle two are 11 and 12. So the median, Q2, is 11.5.

The "lower half" of the data set is the set {2, 6, 7, 8, 8, 11}. The median here is 7.5.
So Q1 = 7.5.

The "upper half" of the data set is the set {12, 13, 14, 15, 22, 23}. The median here
is 14.5. So Q3 = 14.5.

A box-and-whisker plot displays the values Q1, Q2, and Q3, along with the extreme values of
the data set (2 and 23, in this case):

A box & whisker plot shows a "box" with left edge at Q1, right edge at Q3, the "middle" of the
box at Q2 (the median) and the maximum and minimum as "whiskers".

Note that the plot divides the data into 4 equal parts. The left whisker represents the
bottom 25% of the data, the left half of the box represents the second 25%, the right half of the
box represents the third 25%, and the right whisker represents the top 25%.
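The "median of each half" procedure used in this example can be sketched in Python. The function name `five_number_summary` is a hypothetical helper, and this is only one of several quartile conventions; library routines may use a different interpolation rule and return slightly different values.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = values[half:] if n % 2 == 0 else values[half + 1:]
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )


data = [2, 6, 7, 8, 8, 11, 12, 13, 14, 15, 22, 23]
print(five_number_summary(data))  # (2, 7.5, 11.5, 14.5, 23)
```

These five numbers are exactly what the plot needs: the two whisker ends, the two box edges, and the median line.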

2. Find Q1, Q2, and Q3 for the following data set. Identify any outliers, and draw a box-and-whisker
plot.

{5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102}

There are 15 values, arranged in increasing order. So, Q2 is the 8th data point, 50.

Q1 is the 4th data point, 46, and Q3 is the 12th data point, 56.

The interquartile range IQR is Q3−Q1, or 56−46 = 10.

Now we need to find whether there are values less than Q1−(1.5×IQR) or greater
than Q3+(1.5×IQR).

Q1−(1.5×IQR) = 46−15 = 31

Q3+(1.5×IQR) = 56+15 = 71

Since 5 is less than 31, and 75 and 102 are greater than 71, there are 3 outliers.

The box-and-whisker plot is as shown. Note that 40 and 58 are shown as the ends of the whiskers,
with the outliers plotted separately.
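The 1.5×IQR rule from this example can be sketched in Python as follows. The quartiles are taken from the worked example rather than recomputed, and the variable names are illustrative.

```python
data = [5, 40, 42, 46, 48, 49, 50, 50, 52, 53, 55, 56, 58, 75, 102]

# Quartiles as found in the worked example (4th and 12th data points).
q1, q3 = 46, 56
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr   # values below this are outliers
upper_fence = q3 + 1.5 * iqr   # values above this are outliers

outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

# The whiskers stop at the most extreme values that are not outliers.
whisker_low, whisker_high = min(inside), max(inside)

print(lower_fence, upper_fence)   # 31.0 71.0
print(outliers)                   # [5, 75, 102]
print(whisker_low, whisker_high)  # 40 58
```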

The Normal distortion and its applications

Distortion is the alteration of the original shape (or other characteristic) of something.
In communications and electronics it means the alteration of the waveform of an information-bearing
signal, such as an audio signal representing sound or a video signal representing images, in an
electronic device or communication channel.
Example

1. The Kansas Lottery routinely shows its recent results from the Pick 3 Lottery. One of the statistics
reported is the number of times each number (0 through 9) is drawn among the three winning numbers.
The table shows a chart of the number of times each number was drawn during 1,613 total Pick 3 games
(4,839 single numbers drawn). It also reports the percentage of times that each number was drawn.
Depending on how you choose to look at these results, you can make the statistics appear to tell very
different stories.

Numbers Drawn in the Pick 3 Lottery

Number Drawn    No. of Times Drawn (out of 4,839)    Percentage of Times Drawn (No. of Times Drawn ÷ 4,839)
0               485                                  10.0%
1               468                                  9.7%
2               513                                  10.6%
3               491                                  10.1%
4               484                                  10.0%
5               480                                  9.9%
6               487                                  10.1%
7               482                                  10.0%
8               475                                  9.8%
9               474                                  9.8%

The way lotteries typically display results like those in the table is shown in the top graph in the
following image.
[Figure: Bar graphs showing (a) the number of times each number was drawn and (b) the percentage of times each
number was drawn.]

Notice that in this chart, it seems that the number 1 doesn’t get drawn nearly as often (only 468 times)
as number 2 does (513 times). The difference in the height of these two bars appears to be very large,
exaggerating the difference in the number of times these two numbers were drawn. However, to put
this in perspective, the actual difference here is 513 – 468 = 45 out of a total of 4,839 numbers drawn. In
terms of percentages, the difference between the number of times the number 1 and the number 2 are
drawn is 45 ÷ 4,839 ≈ 0.009, or only nine-tenths of one percent (0.009 × 100% = 0.9%).
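The percentages in the table and the size of the gap between the numbers 1 and 2 can be recomputed from the raw counts. The sketch below uses only the counts given in the table; the variable names are illustrative.

```python
# Number drawn -> number of times drawn, copied from the table.
counts = {0: 485, 1: 468, 2: 513, 3: 491, 4: 484,
          5: 480, 6: 487, 7: 482, 8: 475, 9: 474}

total = sum(counts.values())  # 4,839 single numbers drawn

# Percentage of draws for each number, rounded to one decimal place.
percentages = {n: round(100 * c / total, 1) for n, c in counts.items()}

# Gap between the most and least frequently drawn numbers (2 and 1).
difference = counts[2] - counts[1]
difference_pct = 100 * difference / total

print(total)                     # 4839
print(percentages[1], percentages[2])  # 9.7 10.6
print(difference, round(difference_pct, 1))  # 45 0.9
```

On a percentage scale that starts at zero, a gap of under one percentage point is barely visible, which is why the two presentations of the same data look so different.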

Why was the top graph in the image made this way? It might lead people to think they’ve got an inside
edge if they choose the number 2 because it’s “on a hot streak”; or they might be led to choose the
number 1 because it’s “due to come up.” Both of these theories are wrong, by the way; because the
numbers are chosen at random, what happened in the past doesn’t matter. The bottom graph in the
figure has been made correctly.

2. A final example of this type of misleading graph: Terri Schiavo was removed from life support after a
years-long court battle. CNN used a graph similar to the one below to show who agreed with the
decision to remove the feeding tube.

Linear correlation
A linear relationship means that you can represent the relationship between two variables with
a line (the word “linear” literally means “a line”). In other words, the graph of a linear relationship is a
straight line with no curves.

Example

1. If a set of data is linearly related, you can show that relationship using a linear equation. A linear
equation has the form:
y = mx + b
Where:
“m” is the slope of the line,
“x” is any point (an input or x-value) on the line,
and “b” is where the line crosses the y-axis.

In this equation, “b” is the y-intercept and “m” is the slope.
y = mx + b is commonly called the slope-intercept form (and sometimes the Slope Formula).
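As a small illustration of y = mx + b, the sketch below evaluates a line at a few x-values. The slope m = 2 and intercept b = 1 are arbitrary values assumed for the example, and `line` is a hypothetical helper name.

```python
def line(x, m, b):
    """Evaluate y = m*x + b for a given input x."""
    return m * x + b


# Assumed example values: slope 2, y-intercept 1.
m, b = 2, 1
points = [(x, line(x, m, b)) for x in range(4)]

print(points)  # [(0, 1), (1, 3), (2, 5), (3, 7)]
```

Each step of 1 in x raises y by m = 2, and the point at x = 0 sits at the y-intercept b = 1.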

2. Positive and Negative Linear Relationships

• If a straight line on a graph travels upwards from left to right, it has a positive linear relationship.
It shows a steady rate of increase.

• If a straight line on a graph travels downwards from left to right, it has a negative linear
relationship. It shows a steady rate of decrease.
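Whether a line rises or falls is decided by the sign of its slope. The sketch below computes the slope from two points and labels the direction; the function names `slope` and `direction` and the sample points are assumptions made for illustration.

```python
def slope(p1, p2):
    """Slope of the straight line through two points (x1 must differ from x2)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def direction(m):
    """Label a linear relationship by the sign of its slope."""
    if m > 0:
        return "positive"
    if m < 0:
        return "negative"
    return "flat"


rising = slope((0, 1), (2, 5))    # line travels upwards from left to right
falling = slope((0, 4), (2, 0))   # line travels downwards from left to right

print(rising, direction(rising))    # 2.0 positive
print(falling, direction(falling))  # -2.0 negative
```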
