Data Analysis

An Introduction and Overview
 Numerical representations of our data

 Can be:
 Descriptive statistics summarize data.
 Inferential statistics are tools that indicate how
much confidence we can have when we generalize
from a sample to a population.
 Statistics depend on our sampling methods:
 Probability or Non-probability? (i.e. Random or
not?)
 Even with probability samples, there is a
possibility that the statistics we obtain do not
accurately reflect the population.
 Sampling Error
 Inadequate sampling frame, low response rate,
coverage (some people in population not given a
chance of selection)
 Non-Sampling Error
  Problems with transcribing and coding data;
observer/instrument error; misrepresentation as
error.
 Levels of Measurement – the relationship
among the values that are assigned to a
variable and the attributes of that variable.
 Nominal- naming
  Ordinal- rank order (high to low but no
indication of how much higher or lower one
subject is than another)
  Interval- equal intervals between values
  Ratio- equal intervals AND an absolute zero
(e.g. a ruler)
 Age: under 30, 30-39, 40-49, 50-59
 Gender: Male, Female
 Level of Agreement: Strongly Agree, Agree,
Neutral, Disagree, Strongly Disagree
 Percentage of the library budget spent on staff
salaries.
  Descriptive objectives/questions: descriptive statistics
  Comparative research objectives/hypotheses: inferential statistics

 Can be applied to any measurements
(quantitative or qualitative)
 Offers a summary/ overview/ description of
data. Does not explain or interpret.
  Number
  Frequency Count
  Percentage
  Deciles and quartiles
  Measures of Central Tendency (Mean, Median, Mode)
  Variability
  Variance and standard deviation
  Graphs
  Normal Curve
 Averages
 Mode: most frequently occurring value in a
distribution (any scale, most unstable)
 Median: midpoint in the distribution below which
half of the cases reside (ordinal and above)
 Mean: arithmetic average- the sum of all values in a
distribution divided by the number of cases (interval
or ratio)
 Example (11 test scores)
61, 61, 72, 77, 80, 81, 82, 85, 89, 90, 92
The median is 81 (half of the scores fall above 81,
and half below)
 Example (6 scores)
3, 3, 7, 10, 12, 15
Even number of scores = Median is half-way
between the two middle scores
Sum the middle scores (7+10=17) and divide by 2
17/2 = 8.5
 Insensitive to extremes
3, 3, 7, 10, 12, 15, 200
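
The median examples above can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative only and not part of the original slides.

```python
from statistics import mean, median

eleven_scores = [61, 61, 72, 77, 80, 81, 82, 85, 89, 90, 92]
six_scores = [3, 3, 7, 10, 12, 15]
with_outlier = six_scores + [200]

median_eleven = median(eleven_scores)    # odd count: the middle (6th) score
median_six = median(six_scores)          # even count: average of 7 and 10
median_outlier = median(with_outlier)    # the extreme score 200 barely moves it

print(median_eleven)     # 81
print(median_six)        # 8.5
print(median_outlier)    # 10

# The mean, by contrast, is pulled strongly toward the extreme score.
print(round(mean(six_scores), 2), round(mean(with_outlier), 2))  # 8.33 35.71
```

Adding the single score of 200 shifts the median from 8.5 to 10, while the mean jumps from about 8.33 to about 35.71.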

  Mean is the sum of a set of values divided by the number of values:
 Scores: 5, 6, 7, 10, 12, 15
 Sum: 55
 Number of scores: 6
 Computation of Mean: 55/6= 9.17
 Mode is the most frequently occurring value in
a set.
 Best used for nominal data.
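
As a minimal sketch, the mean computation above and a mode on nominal data might look like this in Python. The `library_visit_reasons` list is hypothetical data invented for illustration; it does not come from the slides.

```python
from statistics import mode

scores = [5, 6, 7, 10, 12, 15]
total = sum(scores)          # 55
count = len(scores)          # 6
average = total / count      # sum divided by the number of scores
print(round(average, 2))     # 9.17

# Hypothetical nominal data: the mode is the only average that applies,
# because categories cannot be added or ranked.
library_visit_reasons = ["borrow", "study", "borrow", "internet", "events", "borrow"]
most_common_reason = mode(library_visit_reasons)
print(most_common_reason)    # borrow
```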
 Skewed to the right (positive) or left (negative)
 An extremely hard test that results in a lot of
low grades will be skewed to the right:
  The mode is smaller than the median, which is
smaller than the mean. This relationship exists
because the mode is the point on the x-axis
under the highest point of the curve, that is, the
score with the greatest frequency. The median is
the point on the x-axis that cuts the distribution
in half, such that 50% of the area falls on each
side. The mean is pulled furthest toward the long
tail by the few extreme scores.
 An extremely easy test will result in a lot of
high grades, and will skew to the left (negative)
 The order of the measures of central tendency
would be the opposite of the positively skewed
distribution, with the mean being smaller than
the median, which is smaller than the mode.
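
The ordering of the three averages under skew can be illustrated with a small sketch. Both score lists are hypothetical, chosen only so that the hard test has a long tail of high scores and the easy test is its mirror image.

```python
from statistics import mean, median, mode

# Hypothetical scores: most students score low on the hard test (right skew).
hard_test = [40, 40, 40, 45, 50, 55, 60, 70, 95]
# Mirror image: most students score high on the easy test (left skew).
easy_test = [100 - score for score in hard_test]

for label, scores in (("hard", hard_test), ("easy", easy_test)):
    print(label, mode(scores), median(scores), f"{mean(scores):.1f}")
# hard 40 50 55.0   -> mode < median < mean (positive skew)
# easy 60 50 45.0   -> mean < median < mode (negative skew)
```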
  Variability is the difference among scores;
it shows how subjects vary:
  Dispersion: extent of scatter around the “average”
  Range: difference between the highest and lowest scores in a distribution
  Variance and standard deviation: spread of scores in
a distribution. The greater the scatter, the larger the
variance
  Interval or ratio level data
  Standard deviation
  Measures how much subjects differ from the
mean of their group
 The more spread out the subjects are around
the mean, the larger the standard deviation
 Sensitive to extremes or “outliers”
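
A minimal sketch of range, variance, and standard deviation follows. The two groups are hypothetical and share the same mean, so only their spread differs. It assumes the sample formulas (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
from statistics import mean, stdev, variance

# Hypothetical groups with the same mean (10) but different scatter.
tight_group = [8, 9, 10, 11, 12]
wide_group = [2, 6, 10, 14, 18]

def score_range(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)

for label, scores in (("tight", tight_group), ("wide", wide_group)):
    print(label, score_range(scores),
          f"{variance(scores):.2f}", f"{stdev(scores):.2f}")
# tight 4 2.50 1.58
# wide 16 40.00 6.32

# One extreme score inflates the standard deviation sharply.
with_outlier = tight_group + [60]
print(stdev(with_outlier) > stdev(tight_group))  # True
```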
  Allows for comparisons across variables
  e.g. is there a relation between one’s occupation and
their reason for using the public library?
 Hypothesis Testing
  The level of significance is the predetermined
probability threshold at which a null hypothesis is
rejected. The most common level is p < .05
  p = probability
  < = less than (> = more than)
  Type I error: Reject the null hypothesis when it is
really true
  Type II error: Fail to reject the null hypothesis when it is
really false
  By using inferential statistics to make decisions,
we can report how likely results at least as extreme
as ours would be if the null hypothesis were true
(the p value we report)
  By reporting the p value, we alert readers to
the risk that we were incorrect when we
decided to reject the null hypothesis
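
The link between the significance level and Type I error can be sketched with a small simulation. Everything here is an assumption made for illustration: a population with a known mean and standard deviation, a two-tailed z test, and a fixed random seed. When the null hypothesis is true by construction, roughly 5% of samples should still lead to rejection at the .05 level.

```python
import random
from statistics import NormalDist

rng = random.Random(0)                 # fixed seed so the run is repeatable
ALPHA = 0.05
MU, SIGMA = 100, 15                    # hypothetical population parameters
SAMPLE_SIZE, TRIALS = 30, 2000
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96, two-tailed

false_rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population where the null hypothesis is TRUE.
    sample = [rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)]
    sample_mean = sum(sample) / SAMPLE_SIZE
    z = (sample_mean - MU) / (SIGMA / SAMPLE_SIZE ** 0.5)
    if abs(z) > z_critical:
        false_rejections += 1          # rejected a true null: Type I error

type_i_rate = false_rejections / TRIALS
print(type_i_rate)                     # expected to land near 0.05
```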
 Chi-square test of independence: two variables
(nominal and nominal, nominal and ordinal, or
ordinal and ordinal)
 Affected by number of cells, number of cases
  2-tailed distribution= non-directional hypothesis
 1-tailed distribution= directional hypothesis
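
As a rough sketch, the chi-square statistic for a test of independence can be computed by hand from a contingency table. The 2 x 2 counts below are hypothetical (for instance, two occupations by two reasons for using the library). The critical value 3.841 is the standard tabled value for 1 degree of freedom at the .05 level.

```python
# Hypothetical observed counts: rows = occupation, columns = reason for visit.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom depend on the number of cells: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE_DF1 = 3.841             # tabled value for df = 1, p < .05

print(round(chi_square, 2), df)        # 16.67 1
print(chi_square > CRITICAL_VALUE_DF1) # True: reject independence at .05
```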
 Correlation—the extent to which two variables
are related across a group of subjects
 Pearson r
 It can range from -1.00 to 1.00
 -1.00 is a perfect inverse relationship—the strongest possible
inverse relationship
 0.00 indicates the complete absence of a relationship
 1.00 is a perfect positive relationship—the strongest possible
direct relationship
 The closer a value is to 0.00, the weaker the relationship
 The closer a value is to -1.00 or +1.00, the stronger it is
 Spearman rho
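
The range of Pearson r described above can be illustrated with a hand-rolled sketch. The helper `pearson_r` and the data lists are hypothetical names and values introduced only for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: co-variation divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

visits = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]      # rises in step with visits
inverse = [10, 8, 6, 4, 2]     # falls in step with visits
weak = [3, 1, 4, 1, 5]         # only loosely related to visits

print(round(pearson_r(visits, direct), 2))    # 1.0
print(round(pearson_r(visits, inverse), 2))   # -1.0
print(round(pearson_r(visits, weak), 2))      # 0.35
```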
 t-test
 Test the difference between two sample means for
significance
 pretest to posttest
 Relates to research design
 Perhaps used for information literacy instruction
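
For a pretest-to-posttest design, one common choice is the paired (dependent samples) t test; the slides do not say which variant is meant, so this is an assumption. The scores are hypothetical, e.g. five students tested before and after an information literacy session. The critical value 2.776 is the standard tabled two-tailed value for 4 degrees of freedom at the .05 level.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for the same five students, before and after instruction.
pretest = [60, 65, 70, 72, 80]
posttest = [66, 70, 73, 80, 86]

differences = [post - pre for pre, post in zip(pretest, posttest)]
n = len(differences)
standard_error = stdev(differences) / sqrt(n)   # spread of the mean difference
t_statistic = mean(differences) / standard_error
df = n - 1
CRITICAL_T_DF4 = 2.776                          # two-tailed, p < .05, df = 4

print(f"t = {t_statistic:.2f}, df = {df}")      # t = 6.89, df = 4
print(abs(t_statistic) > CRITICAL_T_DF4)        # True: significant at .05
```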
  Analysis of variance
 Regression analysis (including step-wise
regression)
  Analysis of variance (ANOVA) tests the
difference(s) among two or more means
  It can be used to test the difference between
two means
 So use t-test or ANOVA?
 KEY: ANOVA also can be used to test the
difference among more than two means in a
single test—which cannot be done with a t test
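
A one-way ANOVA on three groups can be sketched by hand as the ratio of between-group to within-group variance. The three groups are hypothetical. The critical value 5.14 is the standard tabled F value for 2 and 6 degrees of freedom at the .05 level.

```python
# Hypothetical scores for three groups compared in a single test.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_scores = [score for group in groups for score in group]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(group) / len(group) for group in groups]

# Between-group variation: how far each group mean sits from the grand mean.
ss_between = sum(len(group) * (m - grand_mean) ** 2
                 for group, m in zip(groups, group_means))
# Within-group variation: how far scores sit from their own group mean.
ss_within = sum((score - m) ** 2
                for group, m in zip(groups, group_means)
                for score in group)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)
CRITICAL_F_2_6 = 5.14                  # tabled value for df = (2, 6), p < .05

print(round(f_statistic, 2), df_between, df_within)   # 13.0 2 6
print(f_statistic > CRITICAL_F_2_6)                   # True
```

A significant F only indicates that at least one mean differs from the others; it does not say which one.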
 Parametric statistical tests generally require
interval or ratio level data and assume that the
scores were drawn from a normally distributed
population or that both sets of scores were
drawn from populations with the same
variance or spread of scores
 Nonparametric methods do not make
assumptions about the shape of the population
distribution. These are typically less powerful
and often need large samples
