Statistics
- Discipline that concerns the collection, organization,
analysis, interpretation, and presentation of data.
- Comes from the following words:
Neo-Latin word ‘statisticum collegium’ (council of
state)
Italian word ‘statista’ (statesman or politician)
- German word ‘Statistik’, first introduced by Gottfried Achenwall (1749),
originally designated the analysis of data about the state,
signifying the “science of state”.
SHORT HISTORY
Time             | Contributor            | Contribution
4500-3000 BCE    | Ancient Rome and China | Census Revenues
433-357 BCE      | Bhadrabahu             | Probability
27 BCE – 17 CE   | Roman Emperor          | Surveys on birth and death
Middle Ages      |                        | Censuses on population
14th Century     |                        | Keeping records
1620-1674        | John Graunt            | School of Political Arithmetic
1623-1687        | William Petty          | Survey methods
1701-1761        | Thomas Bayes           | Bayes’ Theorem
1623-1662        | Blaise Pascal          | Probability Theory
1601-1665        | Pierre de Fermat       | Theory of Numbers
1749-1827        | Pierre Laplace         | Fundamentals of Statistics
1777-1855        | Carl Friedrich Gauss   | Normal Curve
1822-1911        | Francis Galton         | Regression and Correlation
1857-1936        | Karl Pearson           | Father of Modern Statistics
1863-1945        | Charles Spearman       | Ranks
1878-1939        | Kirstine Smith         | Chi-square
DESCRIPTIVE AND INFERENTIAL
Descriptive Statistics
- Summarizes the characteristics of a data set, reporting
observed facts and outcomes without generalizing beyond the data.
- Presented in charts and graphs
- Measures of central tendency
Mean, median, mode
- Measures of dispersion or variability
Range, variance, standard deviation (skewness describes the
shape of a distribution rather than its spread).
- Measures of frequency distribution
Count or percentage of each particular outcome
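The measures listed above can be sketched in a few lines of Python using only the standard library. The scores are made-up illustration data, and the population forms of variance and standard deviation are an assumption here (use `statistics.variance` and `statistics.stdev` when the data are a sample).

```python
import statistics
from collections import Counter

scores = [70, 85, 85, 90, 100]  # hypothetical data set

# Measures of central tendency
mean = statistics.mean(scores)      # 86
median = statistics.median(scores)  # 85
mode = statistics.mode(scores)      # 85

# Measures of dispersion (population formulas, dividing by N)
data_range = max(scores) - min(scores)   # 30
variance = statistics.pvariance(scores)  # 94
std_dev = statistics.pstdev(scores)      # square root of the variance

# Frequency distribution: count and percentage of each outcome
counts = Counter(scores)
percentages = {value: 100 * count / len(scores) for value, count in counts.items()}

print(mean, median, mode, data_range, variance, round(std_dev, 2))
print(percentages)
```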
Inferential Statistics
- Uses a sample of data to make predictions or conclusions
about a larger population.
- Tests a hypothesis or assesses whether your data can be
generalized to a broader population.
- Hypothesis tests, or tests of significance
- Correlation analysis
- Logistic or linear regression analysis
- Confidence intervals
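As one illustration of inference from a sample, the sketch below estimates a confidence interval for a population mean. It assumes a normal approximation (a z-interval); for small samples a t-distribution is usually more appropriate. The function name and the sample values are hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error uses the sample standard deviation (divides by n - 1).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_err
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval([10, 12, 14, 16, 18])
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```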
Population
- The entire group of people, objects, events, organizations,
or other items that you want to study. A population can be
any size, including infinite.
- “N”
Sample
- A subset of the population that you will collect data from.
The sample size is usually smaller than the population
size; a sample that covers the whole population is a census.
- “n”
Data Collection
Data
- A collection of facts and statistics
Data Collection
- The process of gathering and analyzing accurate data
from various sources to find answers
Purposes of Data Collection
1. Research design
2. Sampling design
3. Data processing
Types of Data Collection
1. Primary data collection
Data collected directly by the researcher for the
first time, tailored specifically to the study’s
objectives. Examples include surveys, interviews,
and experiments.
2. Secondary data collection
Data collected previously by other researchers or
institutions, used for purposes different from its
original collection. Examples include government
reports, historical records, and previously published
studies.
PRIMARY DATA COLLECTION
Primary Data Collection
- is the process of gathering data directly from a first-hand
source.
Qualitative Primary Data Collection
gathers non-numeric data to gain deeper insights
into the behaviors, attitudes, and motivations of
the target group.
Quantitative Primary Data Collection
focuses on collecting numeric data that can be
statistically analyzed.
TYPES OF PRIMARY DATA COLLECTION
Offline Primary Data Collection
1. Interviews
2. Experimental Studies
3. Observation
4. Focus groups
Online Primary Data Collection
1. Online surveys
2. Online interviews
3. Web scraping
4. Mobile data collection
Self-Collection
1. Social Media
2. Blogs and Forums
3. Self-Reported Diaries
ADVANTAGES OF USING PRIMARY DATA
Specific Relevance
Timeliness
Control
Adaptability
Uniqueness
Research continuity
DISADVANTAGES OF USING PRIMARY DATA
Can be time-consuming
Can be invasive and disruptive
Can be expensive
SECONDARY DATA COLLECTION
Secondary Data Collection
- data collected by someone other than the primary user
and made available for other researchers to use.
ADVANTAGES OF USING SECONDARY DATA
Can give you a greater understanding
Can help to suggest evaluation questions
Can provide a basis for comparison
DISADVANTAGES OF USING SECONDARY DATA
Concepts used may not be the same
Questions may be defined differently
Units of measurement may be different
The data may be outdated
May be inaccurate or biased
May not meet the needs of the researcher
DATA COLLECTION TOOLS
Word association
Sentence completion
Role-playing
In-person surveys
Online/web surveys
Mobile surveys
Phone surveys
Observation
Sampling Method
Sampling Method
- The process of selecting a sample population from the
target population.
Population
- The population is the entirety of the group, including all
the members that form a set of data.
- It is the entire group that you want to draw conclusions
about.
Sample
- It is the specific group of individuals that you will collect
data from.
- It is a subset of units in a population, selected to
represent all units in a population of interest.
Sampling frame
- The sampling frame is the actual list of individuals that
the sample will be drawn from. Ideally, it should include
the entire target population (and nobody who is not part
of that population).
Sample size
- The number of individuals you should include in your
sample depends on various factors, including the size
and variability of the population and your research design.
There are different sample size calculators and formulas
depending on what you want to achieve with statistical
analysis.
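One commonly used formula, not named in these notes and shown here only as an assumption, is Cochran's formula for estimating a proportion, with an optional finite population correction. The function name and default values below are illustrative.

```python
import math
from statistics import NormalDist

def cochran_sample_size(margin_of_error, confidence=0.95,
                        proportion=0.5, population_size=None):
    """Sample size for estimating a proportion (Cochran's formula)."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # proportion = 0.5 is the most conservative (largest) choice.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    if population_size is not None:
        # Finite population correction shrinks n for small populations.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

print(cochran_sample_size(0.05))                        # 385
print(cochran_sample_size(0.05, population_size=1000))  # 278
```

A larger margin of error or a smaller population lowers the required sample size; a more variable population (proportion near 0.5) raises it.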
Two types of Sampling
1. Probability Sampling
2. Non-Probability Sampling
PROBABILITY SAMPLING
Probability Sampling
- is a sampling method that involves randomly selecting a
sample, or a part of the population that you want to
research, allowing you to make strong statistical
inferences about the whole group.
- Probability sampling means that every member of the
population has a known, nonzero chance of being selected. It is mainly
used in quantitative research. If you want to produce
results that are representative of the whole population,
probability sampling techniques are the most valid choice.
TYPES OF PROBABILITY SAMPLING
Simple Random Sampling
In a simple random sample, every member of the
population has an equal chance of being selected.
Your sampling frame should include the whole
population.
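A minimal sketch of simple random sampling, assuming a made-up sampling frame of 100 student IDs. The fixed seed is only there to make the example reproducible.

```python
import random

# Hypothetical sampling frame listing the whole population (N = 100).
sampling_frame = [f"student_{i:03d}" for i in range(1, 101)]

rng = random.Random(42)  # seeded for reproducibility
# random.sample draws without replacement; every member is equally likely.
sample = rng.sample(sampling_frame, k=10)  # n = 10

print(sample)
```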
Systematic Sampling
Every member of the population is listed with a number,
but instead of randomly generating numbers, individuals
are chosen at regular intervals (for example, every 10th
person on the list, starting from a randomly chosen point).
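The regular-interval idea can be sketched as follows. The random starting point within the first interval is a common convention assumed here, and the function name is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick n members at a fixed interval from a numbered list."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    # Random start within the first interval, then every k-th member.
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))  # hypothetical numbered list of 100 members
picked = systematic_sample(frame, n=10, seed=1)
print(picked)
```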
Stratified Sampling
Stratified sampling involves dividing the population
into subpopulations that may differ in important
ways. It allows you to draw more precise conclusions
by ensuring that every subgroup is properly
represented in the sample.
To use this sampling method, you divide the
population into subgroups (called strata) based on
the relevant characteristic, then select a random
sample from each stratum.
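A sketch of stratified sampling with proportional allocation, where each stratum contributes the same fraction of its members. Proportional allocation is one option among several and is an assumption here; the year-level strata and function name are made up.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Randomly sample the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # group units by stratum
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 first-year and 40 senior students.
population = [("first_year", i) for i in range(60)] + [("senior", i) for i in range(40)]
chosen = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1, seed=7)
print(chosen)
```

With a fraction of 0.1, the sample keeps the 60:40 split of the population (6 first-year and 4 senior students).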
Cluster Sampling
Cluster sampling also involves dividing the population into
subgroups, but each subgroup should have similar
characteristics to the whole population. Instead of
sampling individuals from each subgroup, you
randomly select entire subgroups.
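In contrast with stratified sampling, the sketch below selects whole subgroups at random and keeps every member of each selected subgroup (single-stage cluster sampling, assumed here). The school names and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select entire clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical clusters: four schools with five students each.
schools = {
    f"school_{letter}": [f"{letter}{i}" for i in range(1, 6)]
    for letter in "ABCD"
}
selected = cluster_sample(schools, n_clusters=2, seed=3)
print(selected)
```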
NON-PROBABILITY SAMPLING
Non-Probability Sampling
- In non-probability sampling, individuals are selected by non-random
criteria, so not every member of the population has a known chance of being selected. It can
rely on the subjective judgement of the researcher. This
type of sample is easier and cheaper to access, but it has
a higher risk of sampling bias.
- Non-probability sampling techniques are often used
in exploratory and qualitative research. In these types of
research, the aim is not to test a hypothesis about a
broad population, but to develop an initial understanding
of a small or under-researched population.
TYPES OF NON-PROBABILITY SAMPLING METHODS
Convenience Sampling
A convenience sample simply includes the individuals
who happen to be most accessible to the researcher.
Convenience samples are at risk for both sampling
bias and selection bias.
Voluntary Response Sampling
Similar to a convenience sample, a voluntary
response sample is mainly based on ease of
access. Instead of the researcher choosing
participants and directly contacting them, people
volunteer themselves (for example, by responding to a
public online survey).
Purposive Sampling
Samples are chosen based on the goals of the study.
Participants may be chosen for their knowledge of
the subject being studied or because they satisfy the
traits and conditions set by the researchers.
This type of sampling, also known as judgement
sampling, involves the researcher using their
expertise to select a sample that is most useful to
the purposes of the research.
Quota Sampling
The proportions of the groups in the population are
considered in the number and selection of the
respondents.
Quota sampling relies on the non-random selection
of a predetermined number or proportion of units.
This is called a quota.
Snowball Sampling
Participants in the study are asked to recruit
other members for the study.
The downside here is also representativeness, as
you have no way of knowing how representative
your sample is due to the reliance on participants
recruiting others.
Presentation of Data
Data presentation
- Data presentation is the art of transforming raw data into
a visual format that's easy to understand and interpret.
It's like turning numbers and statistics into a captivating
story that your audience can quickly grasp. When done
right, data presentation can be a game-changer, enabling
you to convey complex information effectively.
- In short, data presentation visualizes complex data for
better understanding.
- It turns numbers and statistics into a story that your
audience can follow and trust.
- It enables someone to convey complex information
effectively.
WHY IS DATA PRESENTATION IMPORTANT?
1. Clarity: Data presentations make complex information
clear and concise.
2. Engagement: Visuals, such as charts and graphs, grab
your audience's attention.
3. Comprehension: Visual data is easier to understand
than long, numerical reports.
4. Decision-making: Well-presented data aids informed
decision-making.
5. Impact: It leaves a lasting impression on your audience.
Importance: Data presentations enhance clarity, engage the
audience, aid decision-making, and leave a lasting impact.
TYPES OF DATA PRESENTATION
Textual Presentation
- Textual presentation harnesses the power of words and
sentences to elucidate and contextualize your data.
- This method is commonly used to provide a narrative
framework for the data, offering explanations, insights,
and the broader implications of your findings. It serves as
a foundation for a deeper understanding of the data's
significance.
Tabular Presentation
- Tabular presentation employs tables to arrange and
structure your data systematically.
- These tables are invaluable for comparing various data
groups or illustrating how data evolves over time. They
present information in a neat and organized format,
facilitating straightforward comparisons and reference
points.
Graphical Presentation
- Graphical presentation harnesses the visual impact of
charts and graphs to breathe life into your data.
- Charts and graphs are powerful tools for spotlighting
trends, patterns, and relationships hidden within the data.
TYPES OF GRAPHICAL PRESENTATION
Bar Graph
o Use a bar chart when you want to compare
different categories or items.
o They are ideal for comparing different categories of
data.
o In this method, each category is represented by a
distinct bar, and the height of the bar corresponds
to the value it represents. Bar charts provide a
clear and intuitive way to discern differences
between categories.
Pie Graph
o Use a pie chart when you want to show how
something is divided into parts. They are great for
data given in percentages!
o It excels at illustrating the relative proportions of
different data categories.
o Each category is depicted as a slice of the pie, with
the size of each slice corresponding to the
percentage of the total value it represents. Pie
charts are particularly effective for showcasing the
distribution of data.
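The slice sizes of a pie chart follow directly from each category's share of the total. The sketch below computes the percentage and the slice angle (share of 360 degrees) for a made-up household budget; the function name is hypothetical.

```python
def pie_slices(values):
    """Return {category: (percent_of_total, slice_angle_in_degrees)}."""
    total = sum(values.values())
    return {
        category: (100 * amount / total, 360 * amount / total)
        for category, amount in values.items()
    }

# Hypothetical household budget.
expenses = {"Rent": 500, "Food": 300, "Transport": 120, "Other": 80}
slices = pie_slices(expenses)
for category, (percent, angle) in slices.items():
    print(f"{category:<10} {percent:5.1f}%  {angle:6.1f} degrees")
```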
Line Graph
o Use a line graph when you want to track how
something changes over different timescales
(minutes, hours, days, etc.)
o They are the go-to choice when showcasing how
data evolves over time.
o Each point on the line represents a specific value at
a particular time period. This method enables
viewers to track trends and fluctuations effortlessly,
making it perfect for visualizing data with temporal
dimensions.
Scatter Plots
o They are the tool of choice when exploring the
relationship between two variables.
o In this method, each point on the plot represents a
pair of values for the two variables in question.
Scatter plots help identify correlations, outliers, and
patterns within data pairs.
Types: Textual, Tabular, and Graphical presentations offer
various ways to present data.
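The correlation that a scatter plot reveals visually can also be summarized with a single number. The sketch below computes Pearson's correlation coefficient by hand for paired values; the choice of Pearson's r and the study-hours data are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours_studied = [1, 2, 3, 4, 5]  # hypothetical x values
quiz_scores = [2, 4, 6, 8, 10]   # hypothetical y values
r = pearson_r(hours_studied, quiz_scores)
print(round(r, 3))
```

Values near 1 or -1 indicate points lying close to a rising or falling straight line; values near 0 indicate no linear pattern.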
ADVANTAGES OF THE GRAPHICAL PRESENTATION
1. Easy to Understand
2. Quick Comparison
3. Trend Analysis
4. Space Saving
5. Visual Impact
PARTS OF TABLE PRESENTATION
1. Title
2. Table No.
3. Column Heading/Caption
4. Row Heading/Stub
5. Footnote
6. Information Source
ADVANTAGES OF USING TABULAR PRESENTATION
1. Brief and simple presentation
2. Facilitates comparison
3. Simple analysis
4. Highlights characteristics of data
5. Cost-effective
6. Provides reference
WHEN TO USE
Textual Presentation
- Use text for narrative, complex ideas, background
information, or summary.
Example:
1. News article
2. Research
3. Description of findings
Tabular Presentation
- Use tables for detailed data, comparisons, reference, or
precision.
Example:
1. Sales data
2. Favorite things
3. Exam results
Graphical Presentation
- Use graphs for trend analysis, comparisons, relationships,
or summary.
Example:
1. Pie chart: Different categories of expenses in a
household budget.
2. Bar chart: Comparing the performance of different
teams or products.
WHAT TO INCLUDE IN DATA PRESENTATION
Data points
- A data point can be represented as a number, date,
time, word, or binary value.
- This is where the audience can see the complete data from
the survey conducted.
Graphical methods
- Graphical methods are visual presentations of data that
help us understand and communicate information more
effectively.
- This is where we can easily compare the highest and the
lowest in the survey conducted.
Data source
- Clearly identify where the data came from, including any
relevant details about its collection and analysis.
DOs IN DATA PRESENTATION
1. Use visuals
2. Keep it simple
3. Highlight key points
DON’Ts IN DATA PRESENTATION
1. Overload with data
2. Include unrelated data
3. Lose focus
REMEMBER!!
When interpreting a graph, always start with “the graph shows”.