Introduction To Statistical Programming - PPT Week 2 - Descriptive Statistics

Uploaded by

therezia.ryu

DESCRIPTIVE STATISTICS

UNIVERSITAS BINA NUSANTARA

SUBJECT MATTER EXPERT

Rinda Nariswari, S.Si., M.Si.
LEARNING OUTCOME

• LO1: Describe the basic concepts of descriptive and inferential statistics.
• LO2: Calculate the statistical measures related to descriptive and inferential statistics.
• LO3: Use SPSS and SmartPLS to do statistical analysis.
• LO4: Interpret the output of a statistical analysis.
SUBTOPICS

1. Describing Data: Graphical

2. Describing Data: Numerical
3. Association Between Two Variables
4. Calculating Univariate Parameters with SPSS
5. Nominal Associations with SPSS
DESCRIBING DATA: GRAPHICAL
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

• We can describe categorical variables using frequency distribution tables and graphs such as bar
charts, pie charts, and Pareto diagrams.
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

• A frequency distribution is a table used to organize data. The left column (called classes or
groups) includes all possible responses on a variable being studied. The right column is a
list of the frequencies, or number of observations, for each class.
• A relative frequency distribution is obtained by dividing each frequency by the number of
observations and multiplying the resulting proportion by 100%.
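The two definitions above can be sketched in a few lines of Python. The responses below are hypothetical and only illustrate the calculation; `frequency_table` is an illustrative helper name, not something from the slides.

```python
from collections import Counter

# Hypothetical categorical responses (not taken from the slides).
responses = ["bus", "car", "car", "bike", "car", "bus", "walk", "car", "bus", "bike"]

def frequency_table(values):
    """Return {class: (frequency, relative frequency in %)}."""
    counts = Counter(values)
    n = len(values)
    # Relative frequency = frequency / number of observations * 100%.
    return {cls: (freq, 100 * freq / n) for cls, freq in counts.items()}

table = frequency_table(responses)
for cls, (freq, pct) in sorted(table.items()):
    print(f"{cls:<6}{freq:>4}{pct:>8.1f}%")
```

The relative frequencies always sum to 100%, which is a quick check on the table.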
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

• The classes that we use to construct frequency distribution tables of a categorical variable
are simply the possible responses to the categorical variable.

• If our intent is to draw attention to the frequency of each category, then we will most likely
draw a bar chart.
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

A cross table, sometimes called a crosstab or a contingency table, lists the number of
observations for every combination of values for two categorical or ordinal variables. The
combination of all possible intervals for the two variables defines the cells in a table. A cross
table with r rows and c columns is referred to as an r × c cross table.
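A minimal sketch of how such a table can be counted from raw pairs of observations. The data and the variable names (`region`, product `A`/`B`) are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical paired observations: (region, preferred product).
observations = [
    ("north", "A"), ("north", "B"), ("south", "A"),
    ("south", "A"), ("north", "A"), ("south", "B"),
]

def cross_table(pairs):
    """Count observations for every combination of row and column values."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    cells = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, cells

rows, cols, cells = cross_table(observations)
print("      " + "  ".join(cols))
for r, row in zip(rows, cells):
    print(f"{r:<6}" + "  ".join(str(v) for v in row))
```

Here the result is a 2 × 2 cross table, since each variable takes two values.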
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

Figure 1.2 displays this information in a component or stacked bar chart. Figure 1.3 is a cluster, or side-
by-side, bar chart of the same data.
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

• If we want to draw attention to the proportion of frequencies in each category, then we will probably
use a pie chart to depict the division of a whole into its constituent parts.
• The circle (or “pie”) represents the total, and the segments (or “pieces of the pie”) cut from its center
depict shares of that total.
• The pie chart is constructed so that the area of each segment is proportional to the corresponding
frequency.
GRAPHS TO DESCRIBE CATEGORICAL VARIABLES

A Pareto diagram is a bar chart that displays the frequency of defect causes. The bar at the
left indicates the most frequent cause and the bars to the right indicate causes with
decreasing frequencies. A Pareto diagram is used to separate the “vital few” from the
“trivial many.”
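The ordering behind a Pareto diagram can be sketched as follows. The defect counts are hypothetical; the cumulative percentage column is a common companion to the bars and is an addition here, not something stated on the slide.

```python
# Hypothetical defect counts by cause (illustrative only).
defects = {"scratch": 12, "misalignment": 45, "crack": 8, "discoloration": 30, "other": 5}

def pareto_table(counts):
    """Sort causes by decreasing frequency and attach cumulative percentages."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for cause, freq in ordered:
        running += freq
        rows.append((cause, freq, 100 * running / total))
    return rows

rows = pareto_table(defects)
for cause, freq, cum_pct in rows:
    print(f"{cause:<14}{freq:>4}{cum_pct:>8.1f}%")
```

In this made-up example the first two causes account for 75% of all defects, which is the "vital few" pattern the diagram is meant to expose.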
GRAPHS TO DESCRIBE TIME-SERIES DATA

• A time series is a set of measurements, ordered over time, on a particular quantity of interest.
• In a time series, the sequence of the observations is important. A line chart, also called a
time-series plot, is a series of data plotted at various time intervals.
• Examples of time-series data include:
  - annual university enrolment
  - annual interest rates
  - the gross domestic product over a period of years
  - daily closing prices for shares of common stock
  - daily exchange rates between various world currencies
  - government receipts and expenditures over a period of years
  - monthly product sales
  - quarterly corporate earnings
  - social network weekly traffic
GRAPHS TO DESCRIBE TIME-SERIES DATA

The time-series plot in Figure 1.7 shows the annual GDP data growing rather steadily over a long
period of time, from 1929 through 2009. This pattern clearly shows a strong upward trend component
that is stronger in some periods than in others.
GRAPHS TO DESCRIBE TIME-SERIES DATA

From the data file RELEVANT Magazine we obtain the number of weekly new visitors for a recent
9-week period from both Facebook and Twitter. This information is given in Table 1.5. The
time-series plot in Figure 1.12 shows the trend over this same time period.
GRAPHS TO DESCRIBE NUMERICAL VARIABLES

Frequency Distributions
• Similar to a frequency distribution for categorical data, a frequency distribution for numerical data
is a table that summarizes data by listing the classes in the left column and the number of
observations in each class in the right column.
• Determining the classes of a frequency distribution for numerical data requires answers to certain
questions: How many classes should be used? How wide should each class be? There are some
general rules.
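One common convention, assumed here rather than taken from the slides, is to choose the number of classes k and use equal-width classes whose width is (largest value − smallest value) / k, rounded up. A minimal sketch with hypothetical completion times (not the Table 1.6 data):

```python
import math

# Hypothetical completion times in seconds (illustrative only).
times = [222, 241, 238, 251, 263, 229, 270, 244, 236, 258, 247, 233]

def frequency_distribution(data, k):
    """Group data into k equal-width classes [lower, upper)."""
    low, high = min(data), max(data)
    # Round the width up so that k classes cover the whole range.
    width = max(1, math.ceil((high - low) / k))
    counts = [0] * k
    for x in data:
        # Clamp so that the largest value always lands in the last class.
        idx = min(int((x - low) // width), k - 1)
        counts[idx] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(k)]

classes = frequency_distribution(times, k=5)
for lower, upper, freq in classes:
    print(f"{lower} to under {upper}: {freq}")
```

The classes are non-overlapping and every observation falls into exactly one of them, so the frequencies sum to n.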
GRAPHS TO DESCRIBE NUMERICAL VARIABLES

Example 1.9 Employee Completion Times

The supervisor of a very large plant obtained the time (in seconds) for a random sample of
n = 110 employees to complete a particular task. The goal is to complete this task in less than
4.5 minutes (270 seconds). Table 1.6 contains these times (in seconds). The data are stored in
the data file Completion Times. What do the data indicate?
GRAPHS TO DESCRIBE NUMERICAL VARIABLES

Histogram
A histogram is a graph that consists of vertical bars constructed on a horizontal line that is marked off
with intervals for the variable being displayed. The intervals correspond to the classes in a frequency
distribution table. The height of each bar is proportional to the number of observations in that interval.
The number of observations can be displayed above the bars.

Ogive
An ogive, sometimes called a cumulative line graph, is a line that connects points that are the cumulative
percent of observations below the upper limit of each interval in a cumulative frequency distribution.
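The points of an ogive can be computed directly from a frequency distribution. A minimal sketch, using hypothetical class limits and frequencies:

```python
from itertools import accumulate

# Hypothetical class upper limits and class frequencies (illustrative only).
upper_limits = [232, 242, 252, 262, 272]
frequencies = [2, 4, 3, 1, 2]

def ogive_points(limits, freqs):
    """Pair each class upper limit with the cumulative percent of observations below it."""
    total = sum(freqs)
    return [(u, 100 * c / total) for u, c in zip(limits, accumulate(freqs))]

points = ogive_points(upper_limits, frequencies)
for upper, cum_pct in points:
    print(f"below {upper}: {cum_pct:.1f}%")
```

The cumulative percentages never decrease and the last point is always 100%.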
GRAPHS TO DESCRIBE NUMERICAL VARIABLES

Stem-and-Leaf Display
• A stem-and-leaf display is an exploratory data analysis (EDA) graph that is an alternative to the histogram. Data are grouped
according to their leading digits (called stems), and the final digits (called leaves) are listed
separately for each member of a class. The leaves are displayed individually in ascending order
after each of the stems.

• Describe the following random sample of 10 final exam grades for an introductory accounting class
with a stem-and-leaf display.
88 51 63 85 79 65 79 70 73 77
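A minimal sketch of the display for these ten grades, taking the tens digit as the stem and the units digit as the leaf (`stem_and_leaf` is an illustrative helper name):

```python
from collections import defaultdict

grades = [88, 51, 63, 85, 79, 65, 79, 70, 73, 77]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

display = stem_and_leaf(grades)
for stem in sorted(display):
    print(f"{stem} | " + " ".join(str(leaf) for leaf in display[stem]))
```

Most of the grades fall in the 70s, which the longest row of leaves makes visible at a glance.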
GRAPHS TO DESCRIBE NUMERICAL VARIABLES

Scatter Plot
We can prepare a scatter plot by locating one point for each pair of two variables that represent an
observation in the data set. The scatter plot provides a picture of the data, including the following:
1. The range of each variable
2. The pattern of values over the range
3. A suggestion as to a possible relationship between the two variables
4. An indication of outliers (extreme points)
DESCRIBING DATA: NUMERICAL
MEASURES OF CENTRAL TENDENCY AND LOCATION

Arithmetic Mean
The arithmetic mean (or simply mean) of a set of data is the sum of the data values divided by the
number of observations. If the data set is the entire population of data, then the population mean, μ, is a
parameter given by $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size.
MEASURES OF CENTRAL TENDENCY AND LOCATION

Median
The median is the middle observation of a set of observations that are arranged in increasing (or
decreasing) order. If the sample size, n, is an odd number, the median is the middle observation. If the
sample size, n, is an even number, the median is the average of the two middle observations. The median
will be the number located in the 0.50(n + 1)th ordered position.

Mode
The mode, if one exists, is the most frequently occurring value. A distribution with one mode is called
unimodal; with two modes, it is called bimodal; and with more than two modes, the distribution is
said to be multimodal. The mode is most commonly used with categorical data.
MEASURES OF CENTRAL TENDENCY AND LOCATION

Example 2.1 Demand for Bottled Water

The demand for bottled water increases during the hurricane season in Florida. The number of 1-gallon bottles of
water sold for a random sample of n = 12 hours in one store during hurricane season is:
60 84 65 67 75 72
80 85 63 82 70 75
Describe the central tendency of the data.
MEASURES OF CENTRAL TENDENCY AND LOCATION

Example 2.1 Demand for Bottled Water

The average or mean hourly number of 1-gallon bottles of water demanded is found as follows:
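The calculation can be checked with Python's standard `statistics` module, using the twelve hourly sales figures from the example. The median and mode are included because the example asks for the central tendency as a whole.

```python
import statistics

# Hourly sales of 1-gallon bottles from Example 2.1 (n = 12).
bottles = [60, 84, 65, 67, 75, 72, 80, 85, 63, 82, 70, 75]

mean = statistics.mean(bottles)      # sum of the values (878) divided by n (12)
median = statistics.median(bottles)  # average of the 6th and 7th ordered values
mode = statistics.mode(bottles)      # most frequently occurring value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

The mean is about 73.17 bottles per hour, the median is 73.5, and the mode is 75, the only value that occurs twice.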
MEASURES OF CENTRAL TENDENCY AND LOCATION

Percentiles and Quartiles

• To find percentiles and quartiles, data must first be arranged in order from the smallest to the largest
values.
• The Pth percentile is a value such that approximately P% of the observations are at or below that
number. Percentiles separate large ordered data sets into 100ths. The 50th percentile is the median.
• The Pth percentile is found as follows:

Pth percentile = value located in the (P/100)(n + 1)th ordered position
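A minimal sketch of this position rule. When the position is not a whole number, the sketch interpolates linearly between the two neighbouring ordered values; that interpolation step is an assumption, since the slide only gives the position formula. The data are the bottled-water sales from Example 2.1.

```python
def percentile(data, p):
    """Value at the (p/100)(n + 1)th ordered position, interpolating between neighbours."""
    ordered = sorted(data)
    n = len(ordered)
    position = (p / 100) * (n + 1)         # 1-based ordered position
    position = min(max(position, 1), n)    # keep the position inside the data
    lower = int(position)                  # whole part of the position
    fraction = position - lower            # fractional part of the position
    if lower >= n:
        return float(ordered[-1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

bottles = [60, 84, 65, 67, 75, 72, 80, 85, 63, 82, 70, 75]
q1 = percentile(bottles, 25)   # position 3.25
q2 = percentile(bottles, 50)   # position 6.5
q3 = percentile(bottles, 75)   # position 9.75
print(q1, q2, q3)
```

For n = 12 the 25th percentile sits at position 3.25, a quarter of the way from the 3rd ordered value (65) to the 4th (67).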

MEASURES OF CENTRAL TENDENCY AND LOCATION

Percentiles and Quartiles

• Quartiles are descriptive measures that separate large data sets into four quarters. The first quartile,
Q1, (or 25th percentile) separates approximately the smallest 25% of the data from the remainder of the
data. The second quartile, Q2, (or 50th percentile) is the median.
• The third quartile, Q3, (or 75th percentile), separates approximately the smallest 75% of the data from
the remaining largest 25% of the data.
MEASURES OF CENTRAL TENDENCY AND LOCATION

The five-number summary refers to the five descriptive measures: minimum, first quartile, median,
third quartile, and maximum.

minimum < Q1 < median < Q3 < maximum

MEASURES OF CENTRAL TENDENCY AND LOCATION

Example 2.5 Demand for Bottled Water
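A sketch of the five-number summary for the bottled-water data of Example 2.1, using `statistics.quantiles` from the Python standard library. Its `"exclusive"` method places quartiles at the (P/100)(n + 1)th ordered position and interpolates between neighbouring values, which is assumed here to match the intended calculation.

```python
import statistics

bottles = [60, 84, 65, 67, 75, 72, 80, 85, 63, 82, 70, 75]

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(bottles, n=4, method="exclusive")
five_number_summary = (min(bottles), q1, median, q3, max(bottles))

print(five_number_summary)
```

Under these assumptions the summary is 60 < 65.5 < 73.5 < 81.5 < 85.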

MEASURES OF VARIABILITY

• In this section we present descriptive numbers that measure the variability or spread of the observations from the
mean. In particular, we include the range, interquartile range, variance, standard deviation, and coefficient of
variation.
• While two data sets could have the same mean, the individual observations in one set could vary more
from the mean than do the observations in the second set. Consider the following two sets of sample
data

• Although the mean is 10 for both samples, clearly the data in sample A are farther from 10 than are
the data in sample B. We need descriptive numbers to measure this spread.
RANGE AND INTERQUARTILE RANGE

Box-and-Whisker Plot
A box-and-whisker plot is a graph that describes the shape of a distribution in terms of the five-number summary:
the minimum value, first quartile (25th percentile), the median, the third quartile (75th percentile), and the
maximum value. The inner box shows the numbers that span the range from the first to the third quartile. A line is
drawn through the box at the median. There are two “whiskers.” One whisker is the line from the 25th percentile to
the minimum value; the other whisker is the line from the 75th percentile to the maximum value.
RANGE AND INTERQUARTILE RANGE

Example 2.8 Gilotti’s Pizzeria

• The distribution of sales for Location 3 is skewed left, which indicates the presence of
days with sales less than most of the other days ($200 and $300) or perhaps a data-entry error.
• The distribution of sales in Location 4 is skewed right, indicating the presence of
sales higher than most of the other days ($2,200 and $2,000) or the possibility that
sales were incorrectly recorded.
VARIANCE AND STANDARD DEVIATION

Example 2.9 Gilotti’s Pizzeria Sales

Calculate the standard deviation of daily sales for Gilotti’s Pizzeria, Location 1. The daily sales for
Location 1 are:
6 8 10 12 14 9 11 7 13 11

To calculate sample variance and standard deviation follow these three steps:
• Step 1: Calculate the sample mean, xbar. It is equal to 10.1.
• Step 2: Find the difference between each of the daily sales and the mean of 10.1.
• Step 3: Square each difference
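The steps can be traced in Python on the Location 1 data. The last two lines go beyond the three listed steps and apply the standard definition of the sample variance (sum of squared differences divided by n − 1) and its square root.

```python
import math

sales = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]
n = len(sales)

mean = sum(sales) / n                       # Step 1: sample mean
deviations = [x - mean for x in sales]      # Step 2: difference from the mean
squared = [d ** 2 for d in deviations]      # Step 3: squared differences

variance = sum(squared) / (n - 1)           # sample variance (divide by n - 1)
std_dev = math.sqrt(variance)               # sample standard deviation

print(f"mean = {mean:.1f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```

The squared differences sum to 60.9, so the sample variance is 60.9 / 9 ≈ 6.77 and the standard deviation is about 2.60.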
COEFFICIENT OF VARIATION
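The coefficient of variation expresses the standard deviation as a percentage of the mean, CV = (s / x̄) × 100% for a sample with a positive mean. A minimal sketch, reusing the Location 1 sales from Example 2.9 as illustrative input (`coefficient_of_variation` is an illustrative helper name):

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return 100 * statistics.stdev(sample) / statistics.mean(sample)

sales = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]
cv = coefficient_of_variation(sales)
print(f"CV = {cv:.1f}%")
```

Because it is unit-free, the coefficient of variation allows the spread of data sets with different means or units to be compared.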
ASSOCIATION BETWEEN TWO VARIABLES
MEASURES OF RELATIONSHIPS BETWEEN VARIABLES

• Covariance (Cov) is a measure of the linear relationship between two variables. A positive value
indicates a direct or increasing linear relationship, and a negative value indicates a decreasing
linear relationship.
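For reference, the usual definition of the sample covariance for n paired observations (x_i, y_i) is given below. The notation s_xy is an assumption, since the slide does not show the formula.

```latex
\operatorname{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}
```

Each term is positive when x_i and y_i lie on the same side of their means and negative otherwise, which is why the sign of the covariance indicates the direction of the linear relationship.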
MEASURES OF RELATIONSHIPS BETWEEN VARIABLES

Correlation Coefficient
The correlation coefficient is computed by dividing the covariance by the product of the
standard deviations of the two variables.
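A minimal sketch of this calculation, assuming the sample versions of the covariance and standard deviations (divisor n − 1). The paired data are hypothetical.

```python
import math

# Hypothetical paired sample (illustrative only).
x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 15]

def sample_covariance(a, b):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)

def correlation(a, b):
    """Covariance divided by the product of the two standard deviations."""
    s_a = math.sqrt(sample_covariance(a, a))  # covariance of a with itself is its variance
    s_b = math.sqrt(sample_covariance(b, b))
    return sample_covariance(a, b) / (s_a * s_b)

cov_xy = sample_covariance(x, y)
r = correlation(x, y)
print(f"Cov = {cov_xy:.1f}, r = {r:.3f}")
```

Dividing by the standard deviations removes the units, so r always lies between −1 and +1, whereas the covariance depends on the scale of the variables.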
MEASURES OF RELATIONSHIPS BETWEEN VARIABLES

Correlation Coefficient

Figure 2.4 Scatter Plots and Correlation

CALCULATING UNIVARIATE PARAMETERS WITH SPSS
CALCULATING UNIVARIATE PARAMETERS WITH SPSS

• This section uses the sample dataset spread.sav.

• There are two ways to calculate univariate parameters with SPSS:
1. Most descriptive parameters can be calculated by clicking the menu items Analyze → Descriptive
Statistics → Frequencies. In the menu that opens, first select the variables for which the univariate
statistics are to be calculated. If there is a cardinal variable among them, deactivate the option Display
frequency tables.
2. Alternatively, select Analyze → Descriptive Statistics → Descriptives... Once again, select the desired
variables and choose the univariate parameters in the submenu Options.
CALCULATING UNIVARIATE PARAMETERS WITH SPSS

OUTPUT SPSS
NOMINAL ASSOCIATION WITH SPSS

• Go to Analyze → Descriptive Statistics → Crosstabs.

• Enter your dependent variable in the “Row(s)” box and the independent variable in the “Column(s)” box.
Using the GSS 2008 (1500 cases) database, we can test for the association of the independent variable
“SEX” and the dependent variable “Happy”.
NOMINAL ASSOCIATION WITH SPSS

Click Statistics. Since we are looking at a nominal and an ordinal variable, we will use lambda.
NOMINAL ASSOCIATION WITH SPSS

Click Continue, then OK. Your output will look like this:

Look in the Value column at the row for the dependent variable “General Happiness”. The lambda value is .000:
knowing “SEX” does not reduce the errors made in predicting “Happy”, so lambda indicates no association
between the two variables.
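To see how a lambda of .000 can arise, the sketch below computes the asymmetric Goodman and Kruskal lambda from a cross table with the dependent variable in the rows. Both tables are hypothetical counts made up for illustration; they are not the GSS 2008 output.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting the row (dependent) variable from the column variable.

    table[i][j] is the count for dependent category i and independent category j.
    """
    n = sum(sum(row) for row in table)
    # E1: prediction errors when always guessing the overall modal row category.
    e1 = n - max(sum(row) for row in table)
    # E2: prediction errors when guessing the modal row category within each column.
    e2 = n - sum(max(col) for col in zip(*table))
    return (e1 - e2) / e1 if e1 else 0.0

# Hypothetical counts: rows = three levels of the dependent variable, columns = two groups.
same_mode = [[20, 25], [40, 45], [10, 15]]
# Hypothetical counts where the modal row differs between the columns.
different_mode = [[30, 10], [10, 30]]

print(goodman_kruskal_lambda(same_mode))       # same modal category in both columns
print(goodman_kruskal_lambda(different_mode))  # modal category depends on the column
```

Lambda is 0 whenever the modal category of the dependent variable is the same in every column, even if the column percentages differ somewhat, so it is worth inspecting the cross table itself alongside the lambda value.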
EXERCISES

1.
EXERCISES

2.
REFERENCES

Newbold, P., Carlson, W. L., & Thorne, B. M. (2023). Statistics for Business and Economics (10th ed.).
Pearson Education. Chapters 1 and 2.

Cleff, T. (2019). Applied Statistics and Multivariate Data Analysis for Business and Economics:
A Modern Approach Using SPSS, Stata, and Excel. Chapters 3 and 4.
THANK YOU
DESCRIPTIVE STATISTICS
UNIVERSITAS BINA NUSANTARA

December 2023
