Notes (Chapter 1 - 3)

CHAPTER 1 – DATA AND STATISTICS
What is statistics?
• can refer to numerical facts (e.g., averages, medians, percentages, and maximums) that help us understand a variety of business and economic situations
• can also refer to the art and science of collecting, analyzing, presenting, and interpreting data
Applications in Business and Economics
• ACCOUNTING
- Public accounting firms use statistical sampling procedures when conducting audits for their clients.
• ECONOMICS
- Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
• FINANCE
- Financial advisors use price-earnings ratios and dividend yields to guide their investment advice.
• MARKETING
- Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
• PRODUCTION
- A variety of statistical quality control charts are used to monitor the output of a production process.
• INFORMATION SYSTEMS
- A variety of statistical information helps administrators assess the performance of computer networks.
Data and Data Sets
• Data → the facts and figures collected, analyzed, and summarized for presentation and interpretation.
• Data set → all the data collected in a particular study.
Elements, Variables, and Observations
• Elements → the entities on which data are collected.
• Variable → a characteristic of interest for the elements.
• Observation → the set of measurements obtained for a particular element.
o A data set with n elements contains n observations.
o The total number of data values in a complete data set is the number of elements multiplied by the number of variables.
Scales of Measurement (NOIR)
• The scale determines the amount of information contained in the data. It also indicates the data summarization and statistical analyses that are most appropriate.
• Scales of measurement include nominal, ordinal, interval, and ratio:
• NOMINAL
- Data are labels or names used to identify an attribute of the element.
- A nonnumeric label or numeric code may be used.
- Example: The WTO status of each nation in the WTO data set (see Cross-Sectional Data) is classified using nonnumeric labels, "member" and "observer". Alternatively, a numeric code could be used for the WTO status variable by letting 1 denote a member nation and 2 denote an observer nation.
• ORDINAL
- The data have the properties of nominal data, and the order or rank of the data is meaningful.
- A nonnumeric label or numeric code may be used.
- Example: The nonnumeric rating labels from AAA to F used for Fitch ratings can be rank ordered from the best credit rating (AAA) to the poorest credit rating (F). A numeric code can also be used, e.g., the class rank of a student in school.
• INTERVAL
- The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
- Interval data are always numeric.
- Example: Melissa has an SAT score of 1985, while Kevin has an SAT score of 1880. Melissa scored 105 points more than Kevin.
• RATIO
- The data have all the properties of interval data, and the ratio of two values is meaningful.
- Variables such as distance, height, weight, and time use the ratio scale.
- This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
- Example: Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa (Melissa : Kevin = 36 : 72 = 1 : 2).
Categorical and Quantitative Data
Data can be further classified as being categorical or quantitative.
• Categorical Data
o Labels or names used to identify an attribute of each element
o Often referred to as qualitative data
o Use either the nominal or ordinal scale of measurement
o Can be either numeric or nonnumeric
o Appropriate statistical analysis is rather limited
• Quantitative Data
o Indicate how many or how much
▪ Discrete → if measuring how many
▪ Continuous → if measuring how much
o Are always numeric
o Ordinary arithmetic operations are meaningful for quantitative data
Cross-Sectional Data
• are collected at the same or approximately the same point in time.
• Example: Data on variables such as WTO status, per capita GDP, and Fitch rating for 60 different WTO nations at the same point in time.
Time Series Data
• are collected over several time periods.
• Example: U.S. average price per gallon of conventional regular gasoline between 2010 and 2015.
• Graphs of time series help analysts:
▪ understand what happened in the past,
▪ identify any trends over time, and
▪ project future values for the time series.
Data Sources
• EXISTING SOURCES
▪ DATA AVAILABLE FROM INTERNAL COMPANY RECORDS
▪ DATA AVAILABLE FROM SELECTED GOVERNMENT AGENCIES
• STATISTICAL STUDIES – OBSERVATIONAL
▪ In observational (nonexperimental) studies, no attempt is made to control or influence the variables of interest.
▪ Example: surveys; studies of smokers and nonsmokers are observational studies because researchers do not determine or control who will smoke and who will not.
• STATISTICAL STUDIES – EXPERIMENTAL
▪ In experimental studies, the variable of interest is first identified. Then one or more other variables are identified and controlled so that data can be obtained about how they influence the variable of interest.
▪ Example: The largest experimental study ever conducted is believed to be the 1954 Public Health Service experiment for the Salk polio vaccine. Nearly two million U.S. children (grades 1-3) were selected.
Data Acquisition Considerations
• TIME REQUIREMENT
▪ Searching for information can be time consuming.
▪ Information may no longer be useful by the time it is available.
• COST OF ACQUISITION
▪ Organizations often charge for information even when it is not their primary business activity.
• DATA ERRORS
▪ Using any data that happen to be available or were acquired with little care can lead to misleading information.
Descriptive Statistics
• Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
• Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
• Example: Hudson Auto Repair. The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Numerical Descriptive Statistics
• The most common numerical descriptive statistic is the mean (or average), which provides a measure of the central tendency, or central location, of the data for a variable.
• Example: Hudson's mean cost of parts, based on the 50 tune-ups studied, is $79 (found by summing the 50 cost values and then dividing by 50).
Statistical Inference
• Population → the set of all elements of interest in a particular study.
• Sample → a subset of the population.
• Statistical inference → the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population.
• Census → collecting data for the entire population.
• Sample survey → collecting data for a sample.
Statistical Analysis Using Microsoft Excel
Analytics
• Scientific process of transforming data into insight for making better decisions.
• Types:
▪ DESCRIPTIVE A. → analytical techniques that describe what happened in the past.
▪ PREDICTIVE A. → analytical techniques that use models constructed from past data to predict the future or to assess the impact of one variable on another.
▪ PRESCRIPTIVE A. → analytical techniques that yield a best course of action to take.
Data Warehousing
• Capturing, storing, and maintaining the data is referred to as data warehousing; it is a significant undertaking.
• Organizations obtain large amounts of data on a daily basis by means of magnetic card readers, bar code scanners, point-of-sale terminals, and touch screen monitors.
• Example(s): Wal-Mart captures data on 20-30 million transactions per day; Visa processes 6,800 payment transactions per second.
Data Mining
• Data mining is used, for example, to identify related products that customers who have already purchased a specific product are also likely to purchase (pop-ups are then used to draw attention to those related products).
• As another example, data mining is used to identify customers who should receive special discount offers based on their past purchasing volumes.
• The most effective data mining systems use automated procedures to discover relationships in the data and predict future outcomes, prompted by only general, even vague, queries by the user.
• The major applications of data mining have been made by companies with a strong consumer focus, such as retail, financial, and communication firms.
• Requirements:
▪ Statistical methodology (e.g., multiple regression, logistic regression, and correlation) is heavily used.
▪ Computer science technologies, including those involving artificial intelligence and machine learning, are also needed.
▪ Significant investment in time and money.
• Model Reliability:
▪ A statistical model for a particular sample may not be applicable to other data.
▪ The data set can be partitioned into a training set (model development) and a test set (validating the model).
▪ Overfitting the model is a danger: misleading associations and conclusions may appear to exist.
▪ Careful interpretation of results and extensive testing are important.
Ethical Guidelines of Statistical Practice
• Unethical behavior can take a variety of forms, including:
▪ improper sampling
▪ inappropriate analysis of the data
▪ development of misleading graphs
▪ use of inappropriate summary statistics
▪ biased interpretation of the statistical results
• Be fair, thorough, objective, and neutral as you collect, analyze, and present data.
• "Ethical Guidelines for Statistical Practice" → developed by the American Statistical Association.
• It contains 67 guidelines organized into 8 topic areas:
▪ Professionalism
▪ Responsibilities to Funders, Clients, and Employers
▪ Responsibilities in Publications and Testimony
▪ Responsibilities to Research Subjects
▪ Responsibilities to Research Team Colleagues
▪ Responsibilities to Other Statisticians/Practitioners
▪ Responsibilities Regarding Allegations of Misconduct
▪ Responsibilities of Employers Including Organizations, Individuals, Attorneys, or Other Clients
CHAPTER 2A: DESCRIPTIVE STATISTICS
(TABULAR AND GRAPHICAL DISPLAYS)
Categorical Data
• FREQUENCY DISTRIBUTION
- is a tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes.
- Objective: to provide insights about the data that cannot be quickly obtained by looking only at the original data.
• BAR CHART
- is a graphical display for depicting categorical (qualitative) data.
- In quality control applications, bar charts are used to identify the most important causes of problems.
- horizontal axis → the labels used for each class
- vertical axis → frequency, relative frequency, or percent frequency scale
- a bar of fixed width is drawn above each class label, with its height extended to the class's frequency (or relative or percent frequency)
- bars are separated → to emphasize the fact that each class is separate
- Pareto diagram → a bar chart whose bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first); named after Vilfredo Pareto, an Italian economist.
• RELATIVE FREQUENCY DISTRIBUTION
- Relative frequency → the fraction or proportion of the total number of data items belonging to a class (relative frequency of a class = frequency of the class / n).
- Relative frequency distribution → a tabular summary of a set of data showing the relative frequency for each class.
• PERCENT FREQUENCY DISTRIBUTION
- Percent frequency → the relative frequency multiplied by 100.
- Percent frequency distribution → a tabular summary of a set of data showing the percent frequency for each class.
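To make the relative and percent frequency definitions concrete, here is a minimal Python sketch. The 20 soft drink purchases are hypothetical (the notes do not list the data behind their pie chart), and `frequency_table` is an illustrative helper, not a library function. Sorting by descending frequency also gives the Pareto ordering described under the bar chart.

```python
from collections import Counter

def frequency_table(data):
    """Return (label, frequency, relative frequency, percent frequency) rows,
    sorted by descending frequency (the Pareto ordering)."""
    n = len(data)
    rows = []
    # most_common() lists categories from the highest count to the lowest
    for label, freq in Counter(data).most_common():
        rows.append((label, freq, freq / n, 100 * freq / n))
    return rows

# Hypothetical sample of 20 soft drink purchases (illustrative only)
purchases = ["Pepsi"] * 9 + ["Coca-Cola"] * 5 + ["Dr. Pepper"] * 4 + ["Sprite"] * 2

table = frequency_table(purchases)
for label, freq, rel, pct in table:
    print(f"{label:<12}{freq:>4}{rel:>8.2f}{pct:>8.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any hand-built table.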
Quantitative Data
• FREQUENCY DISTRIBUTION
- Example: Sanderson and Clifford, a small public accounting firm, wants to determine the time in days required to complete year-end audits. It takes a sample of 20 clients.
Year-end Audit Time (in days)

12 14 19 18 15
15 18 17 20 27
22 23 22 21 33
28 14 18 16 13
- Three (3) steps are necessary to define the classes for a frequency distribution with quantitative data:
o Step 1: Determine the number of nonoverlapping classes.
o Step 2: Determine the width of each class.
o Step 3: Determine the class limits.
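The three steps can be sketched in Python for the Sanderson and Clifford audit times. The choices of 5 classes and a first lower limit of 10 are assumptions, and the width rule used here, (largest value - smallest value) / number of classes rounded up to a convenient whole number, is a common guideline rather than something stated in these notes.

```python
import math

# Year-end audit times (days) for the 20 Sanderson and Clifford clients
audit_times = [12, 14, 19, 18, 15,
               15, 18, 17, 20, 27,
               22, 23, 22, 21, 33,
               28, 14, 18, 16, 13]

# Step 1: number of nonoverlapping classes (5 is an assumed, convenient choice)
num_classes = 5

# Step 2: approximate width = (largest - smallest) / number of classes,
# rounded up to a whole number: (33 - 12) / 5 = 4.2 -> 5
width = math.ceil((max(audit_times) - min(audit_times)) / num_classes)

# Step 3: class limits, starting from an assumed lower limit of 10
first_lower = 10
classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
           for i in range(num_classes)]

# Count the observations in each class (both limits are inclusive)
freq = [sum(lo <= x <= hi for x in audit_times) for lo, hi in classes]

n = len(audit_times)
for (lo, hi), f in zip(classes, freq):
    print(f"{lo}-{hi}: {f:2d}  relative={f / n:.2f}  percent={100 * f / n:.0f}%")
```

With these classes, 8 of the 20 audits (40%) fall in 15-19 days and 5 (25%) in 20-24 days.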
• PIE CHART
- is a commonly used graphical display for presenting relative frequency and percent frequency distributions for categorical data.
- Inferences from the pie chart (soft drink preference example):
▪ Almost one-half of the customers surveyed preferred Pepsi (the left side of the pie).
▪ The second preference is Dr. Pepper, with 25% of the customers opting for it.
▪ Only 5% of the customers opted for Sprite.
• RELATIVE FREQUENCY AND PERCENT FREQUENCY DISTRIBUTIONS
- Example: Sanderson and Clifford
- Insights obtained from the percent frequency distribution:
▪ 40% of the audits required from 15 to 19 days.
▪ Another 25% of the audits required 20 to 24 days.
▪ Only 5% of the audits required more than 30 days.
• DOT PLOT
- one of the simplest graphical summaries of data
- horizontal axis → range of data values
- each data value is then represented by a dot placed above the axis
- Example: Sanderson and Clifford
• CUMULATIVE DISTRIBUTIONS
- Cumulative frequency distribution → shows the number of items with values less than or equal to the upper limit of each class (last entry = total number of observations).
- Cumulative relative frequency distribution → shows the proportion of items with values less than or equal to the upper limit of each class (last entry = 1.00).
- Cumulative percent frequency distribution → shows the percentage of items with values less than or equal to the upper limit of each class (last entry = 100).
- Example: Sanderson and Clifford
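A short Python sketch of the three cumulative distributions for the audit times, assuming the five classes 10-14, 15-19, 20-24, 25-29, and 30-34 (their frequencies, 4, 8, 5, 2, and 1, are counted from the 20 audit times listed earlier). Each entry is a running total, so the last entries equal n, 1.00, and 100.

```python
from itertools import accumulate

# Assumed classes for the audit times and their frequencies
classes = ["10-14", "15-19", "20-24", "25-29", "30-34"]
freq = [4, 8, 5, 2, 1]
n = sum(freq)  # 20 observations

cum_freq = list(accumulate(freq))          # running totals of the frequencies
cum_rel = [c / n for c in cum_freq]        # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]  # cumulative percent frequency

for label, cf, cr, cp in zip(classes, cum_freq, cum_rel, cum_pct):
    upper = label.split("-")[1]            # values <= this class's upper limit
    print(f"<= {upper}: {cf:2d}  {cr:.2f}  {cp:.0f}%")
```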
• HISTOGRAM
- Common graphical display of quantitative data
- Horizontal axis → variable of interest
- A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency.
- Unlike a bar chart, a histogram has no natural separation between the rectangles of adjacent classes.
- Example: Sanderson and Clifford
- Histograms showing skewness:
▪ Symmetric → the left tail is the mirror image of the right tail (e.g., heights of people)
▪ Moderately skewed left → a longer tail to the left (e.g., exam scores)
▪ Moderately skewed right → a longer tail to the right (e.g., housing values)
▪ Highly skewed right → a very long tail to the right (e.g., executive salaries)
• STEM-AND-LEAF DISPLAY
- shows both the rank order and shape of the distribution of the data.
- It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
- The leading digits of each data item are arranged to the left of a vertical line.
- To the right of the vertical line, we record the last digit for each item in rank order.
- Each line (row) in the display is referred to as a stem.
- Each digit on a stem is a leaf.
- Example: The number of questions answered correctly on an aptitude test by 50 students, analyzed with the help of a stem-and-leaf display. The data are given in the following table.
No. of questions answered correctly by 50 students

112 73 126 82 92 115 95 84 68 100
72 92 128 104 108 76 141 119 98 85
69 76 118 132 96 91 81 113 115 94
97 86 127 134 100 102 80 98 106 106
107 73 124 83 92 81 106 75 95 119
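The stem-and-leaf construction can be sketched in Python for the 50 aptitude-test values. Here the stem is every digit except the last (so 112 has stem 11 and leaf 2); `stem_and_leaf` is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its leaves (last digit),
    in rank order; stems with no leaves are kept as empty rows."""
    groups = defaultdict(list)
    for x in sorted(data):              # sorting puts the leaves in rank order
        stem, leaf = divmod(x, 10)
        groups[stem].append(leaf)
    lo, hi = min(groups), max(groups)
    return {s: groups.get(s, []) for s in range(lo, hi + 1)}

scores = [112, 73, 126, 82, 92, 115, 95, 84, 68, 100,
          72, 92, 128, 104, 108, 76, 141, 119, 98, 85,
          69, 76, 118, 132, 96, 91, 81, 113, 115, 94,
          97, 86, 127, 134, 100, 102, 80, 98, 106, 106,
          107, 73, 124, 83, 92, 81, 106, 75, 95, 119]

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

The longest rows (stems 9 and 10) show where the data concentrate, while every original value can still be read back from the display.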
CHAPTER 2B: DESCRIPTIVE STATISTICS
(TABULAR AND GRAPHICAL DISPLAYS)
Crosstabulation
- is a method of summarizing the data for two variables (a tabular summary of data for two variables)
- can be used when:
▪ one variable is categorical and the other is quantitative,
▪ both variables are categorical, or
▪ both variables are quantitative.
- The left and top margin labels define the classes for the two variables.
- Example: Zagat's Restaurant Review. A crosstabulation of quality rating and meal price data for 300 Los Angeles restaurants is given here.
- Insights gained from the preceding crosstabulation:
▪ The greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20-29 range.
▪ Only 2 restaurants have an excellent rating and a meal price in the $10-19 range.
- Row or Column Percentages
▪ Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
Stacked Bar Chart
- is a bar chart in which each bar is broken into rectangular segments of a different color.
- If percentage frequencies are displayed, all bars will be of the same height (or length), extending to the 100% mark.
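A minimal Python sketch of a crosstabulation and its row percentages. The (quality rating, meal price) pairs are hypothetical, since the Zagat table itself is not reproduced in these notes; row and column labels are simply sorted alphabetically. Each row of percentages sums to 100, which is also what a stacked bar chart of percentage frequencies displays.

```python
from collections import Counter

def crosstab(pairs):
    """Count (row value, column value) pairs into a nested dict table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100 * v / total for c, v in cells.items()}
    return result

# Hypothetical (quality rating, meal price) observations
pairs = ([("Good", "$10-19")] * 6 + [("Good", "$20-29")] * 3 + [("Good", "$30-39")] * 1
         + [("Very Good", "$10-19")] * 2 + [("Very Good", "$20-29")] * 5
         + [("Very Good", "$30-39")] * 3
         + [("Excellent", "$20-29")] * 1 + [("Excellent", "$30-39")] * 4)

table = crosstab(pairs)
pct = row_percentages(table)
for rating, cells in pct.items():
    print(rating, {price: f"{p:.0f}%" for price, p in cells.items()})
```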
- Simpson's Paradox → the reversal of conclusions based on aggregate and unaggregated data.
- Scatter diagrams and trendlines → are useful in exploring the relationship between two variables.
▪ Scatter diagram → a graphical presentation of the relationship between two quantitative variables.
▪ One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
▪ The general pattern of the plotted points suggests the overall relationship between the variables.
▪ Trendline → provides an approximation of the relationship.
Data Visualization: Best Practices in Creating Effective Graphical Displays
- Data visualization → describes the use of graphical displays to summarize and present information about a data set.
- The goal is to communicate the key information about the data as effectively and clearly as possible.
Choosing the Type of Graphical Display
Displays used to show the distribution of data:
• Bar Chart → to show the frequency distribution or relative frequency distribution for categorical data
• Pie Chart → to show the relative frequency or percent frequency for categorical data
• Dot Plot → to show the distribution for quantitative data over the entire range of the data
• Histogram → to show the frequency distribution for quantitative data over a set of class intervals
• Stem-and-Leaf Display → to show both the rank order and shape of the distribution for quantitative data
Displays used to make comparisons:
• Side-by-Side Bar Chart → to compare two variables
• Stacked Bar Chart → to compare the relative frequency or percent frequency of two categorical variables
Displays used to show relationships:
• Scatter Diagram → to show the relationship between two quantitative variables
• Trendline → to approximate the relationship of data in a scatter diagram
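A trendline is commonly a straight line fitted by least squares; the Python sketch below assumes that choice and uses hypothetical (x, y) data. The slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the fitted line passes through the point of means.

```python
def least_squares_trendline(x, y):
    """Fit y = b0 + b1 * x by least squares; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical scatter-diagram data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_trendline(x, y)
print(f"trendline: y = {b0:.2f} + {b1:.2f}x")   # trendline: y = 2.20 + 0.60x
```

A positive slope here matches the upward drift of the plotted points; the line summarizes the pattern but does not pass through every point.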
Data Dashboard
• Data dashboard → a widely used data visualization tool.
• It organizes and presents key performance indicators (KPIs) used to monitor an organization or process.
• It provides timely, summary information that is easy to read, understand, and interpret.
• Some additional guidelines include:
o Minimize the need for screen scrolling.
o Avoid unnecessary use of color or 3D.
o Use borders between charts to improve readability.
Side-by-Side Bar Chart
- is a graphical display for depicting multiple bar charts on the same display.
- Each cluster of bars represents one value of the first variable.
- Each bar within a cluster represents one value of the second variable.
CHAPTER 3A: DESCRIPTIVE STATISTICS
(NUMERICAL MEASURES)
Numerical Measures
• Sample statistics → measures computed for data from a sample.
• Population parameters → measures computed for data from a population.
• Point estimator → a sample statistic used to estimate the corresponding population parameter.
Measures of Location
• MEAN [Excel function → AVERAGE(data cell range)]
- the most important measure of location
- provides a measure of the central location of the data
- the mean of a data set is the average of all the data values
- The sample mean x̄ is the point estimator of the population mean µ.
- Sample mean: $\bar{x} = \frac{\sum x_i}{n}$
- Population mean: $\mu = \frac{\sum x_i}{N}$
- Example: Monthly Starting Salary. A placement office wants to know the average starting salary of business graduates. Monthly starting salaries for a sample of 12 business school graduates are provided here.
• MEDIAN [Excel function → MEDIAN(data cell range)]
- is the value in the middle when the data items are arranged in ascending order (least to greatest).
- For an odd number of observations, the median is the middle value; for an even number, it is the average of the two middle values.
- is the measure of location most often reported for annual income and property value data.
- Whenever a data set has extreme values, the median is the preferred measure of central location. A few extremely large incomes or property values can inflate the mean.
- Trimmed Mean
o another measure sometimes used when extreme values are present
o it is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
o Example: the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
• MODE [Excel function → MODE.SNGL(data cell range)]
- is the value that occurs with greatest frequency.
- the greatest frequency can occur at two or more different values
- Bimodal → if the data have exactly two modes
- Multimodal → if the data have more than two modes
- Example: Monthly Starting Salary (data in ascending order). The only monthly starting salary that occurs more than once is $3,880, so mode = 3,880.
Using Excel to Compute the Mean, Median, and Mode
• WEIGHTED MEAN
- In some instances the mean is computed by giving each observation a weight that reflects its relative importance. The choice of weights depends on the application (e.g., the number of credit hours earned for each grade when computing a GPA).
- Weighted mean: $\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
- Example: Purchase of Raw Material. Consider the following sample of five purchases of a raw material over a period of three months:
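The location measures above can be checked with Python's standard `statistics` module. The 12 salaries are an assumption: the notes do not list them, so these values were chosen to be consistent with the figures the notes quote (mode 3,880; values 3,850, 3,880, 3,950, and 4,050). The trimmed-mean helper rounds the number of values removed from each end down, one common convention; with only 12 values a 5% trim removes nothing, so a 10% trim is shown. The GPA data are hypothetical.

```python
import statistics

# ASSUMPTION: illustrative monthly starting salaries for 12 graduates
salaries = [3850, 3950, 4050, 3880, 3755, 3710, 3890, 4130, 3940, 4325, 3920, 3880]

mean = statistics.mean(salaries)      # sum / n
median = statistics.median(salaries)  # n is even: average of the two middle values
mode = statistics.mode(salaries)      # most frequent value

def trimmed_mean(data, proportion):
    """Drop int(proportion * n) values from each end, then average the rest
    (rounding the count down is an assumed convention)."""
    k = int(proportion * len(data))
    kept = sorted(data)[k:len(data) - k]
    return statistics.mean(kept)

def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)"""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical GPA: grade points (A=4, B=3, C=2) weighted by credit hours
grade_points = [4, 3, 3, 2]
credit_hours = [3, 4, 2, 3]
gpa = weighted_mean(grade_points, credit_hours)   # 36 / 12

print(mean, median, mode, trimmed_mean(salaries, 0.10), gpa)
```

Removing the smallest (3,710) and largest (4,325) salaries pulls the mean from 3,940 to 3,924.5, closer to the median of 3,905.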
• GEOMETRIC MEAN [Excel function → GEOMEAN(data cell range)]
- is calculated by finding the nth root of the product of n values: $\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$
- It is often used in analyzing growth rates in financial data (where using the arithmetic mean will provide misleading results).
- It should be applied any time you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, . . .).
- Other common applications include changes in populations of species, crop yields, pollution levels, and birth and death rates.
- Example: Mutual Fund
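A Python sketch of why the geometric mean is used for growth rates. The two annual returns are hypothetical (not the Mutual Fund example's data): a fund that gains 100% and then loses 50% ends where it started, yet the arithmetic mean return is 25%. The geometric mean of the growth factors (1 + return) gives the correct mean rate of 0%.

```python
import math
import statistics

def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical fund: +100% in year 1, -50% in year 2 ($100 -> $200 -> $100)
returns = [1.00, -0.50]                     # annual returns as proportions
growth_factors = [1 + r for r in returns]   # 2.0 and 0.5

arith_mean_return = statistics.fmean(returns)   # 0.25, i.e. 25% (misleading)
geo_factor = geometric_mean(growth_factors)     # 1.0
mean_growth_rate = geo_factor - 1               # 0.0, i.e. 0% per year

# The standard library provides the same calculation (Python 3.8+)
assert math.isclose(statistics.geometric_mean(growth_factors), geo_factor)

print(f"arithmetic mean return: {arith_mean_return:.0%}")
print(f"geometric mean growth rate: {mean_growth_rate:.0%}")
```

Compounding the geometric rate over both years reproduces the actual ending value, which the arithmetic mean does not.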
• QUARTILES [Excel function → QUARTILE.EXC(array, quart)]
- 1st quartile (Q1) → 25th percentile
- 2nd quartile (Q2) → 50th percentile = median
- 3rd quartile (Q3) → 75th percentile
- Example: Monthly Starting Salary → 3rd quartile (75th percentile)
• PERCENTILES [Excel function → PERCENTILE.EXC(data range, p/100)]
- provide information about how the data are spread over the interval from the smallest value to the largest value (e.g., admission test scores for colleges and universities are frequently reported in terms of percentiles).
- pth percentile → a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
- Example: Monthly Starting Salary (80th percentile)
Measures of Variability
- It is often desirable to consider measures of variability (dispersion), as well as measures of location.
- E.g., in choosing supplier A or supplier B, we might consider not only the average delivery time for each, but also the variability in delivery time for each.
• RANGE
- is the difference between the largest and smallest data values.
- It is the simplest measure of variability.
- It is very sensitive to the smallest and largest data values.
• INTERQUARTILE RANGE
- is the difference between the third quartile and the first quartile: IQR = Q3 - Q1.
- It is the range for the middle 50% of the data.
- It overcomes the sensitivity to extreme data values.
- Example: Monthly Starting Salary: $Q_3 = \frac{3{,}950 + 4{,}050}{2} = 4{,}000$ and $Q_1 = \frac{3{,}850 + 3{,}880}{2} = 3{,}865$, so $\text{IQR} = 4{,}000 - 3{,}865 = 135$.
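Quartile and percentile rules differ between textbooks and software. The Python sketch below implements the hand rule that reproduces the arithmetic in these notes (compute i = (p/100)n; if i is an integer, average the ith and (i + 1)th ordered values, otherwise round i up), and contrasts it with `statistics.quantiles(..., method="exclusive")`, which, like Excel's QUARTILE.EXC and PERCENTILE.EXC, interpolates at position p(n + 1)/100. The 12 salaries are assumed values (the notes do not list the data), chosen to match the figures quoted in the notes; `percentile_hand` is a hypothetical helper.

```python
import math
import statistics

# ASSUMPTION: illustrative monthly starting salaries for 12 graduates
salaries = sorted([3850, 3950, 4050, 3880, 3755, 3710, 3890, 4130, 3940, 4325, 3920, 3880])

def percentile_hand(data, p):
    """Hand rule, valid for 0 < p < 100 (positions are 1-based):
    i = p * n / 100; integer i -> average the ith and (i+1)th values,
    otherwise round i up and take that value."""
    x = sorted(data)
    i = p * len(x) / 100
    if i == int(i):
        i = int(i)
        return (x[i - 1] + x[i]) / 2
    return x[math.ceil(i) - 1]

q1 = percentile_hand(salaries, 25)             # (3850 + 3880) / 2
q3 = percentile_hand(salaries, 75)             # (3950 + 4050) / 2
iqr = q3 - q1
data_range = max(salaries) - min(salaries)
p80 = percentile_hand(salaries, 80)            # i = 9.6 -> 10th value

# p(n + 1) interpolation, the same position rule as Excel's ...EXC functions
q1_exc, q2_exc, q3_exc = statistics.quantiles(salaries, n=4, method="exclusive")

print(q1, q3, iqr, data_range, p80)
print(q1_exc, q2_exc, q3_exc)
```

Both rules agree on the median here but place Q1 and Q3 slightly differently, so quartiles from Excel's QUARTILE.EXC may not match hand calculations exactly.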
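• VARIANCE [Excel function → VAR.S(data cell range)]
- is a measure of variability that utilizes all the data.
- It is based on the difference between the value of each observation ($x_i$) and the mean (x̄ for a sample, µ for a population).
- is useful in comparing the variability of two or more variables.
- is the average of the squared differences between each data value and the mean; for a sample, the sum of squared deviations is divided by n - 1 rather than n:
  Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$; sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
• STANDARD DEVIATION [Excel function → STDEV.S(data cell range)]
- is the positive square root of the variance: $s = \sqrt{s^2}$ (sample), $\sigma = \sqrt{\sigma^2}$ (population).
- It is measured in the same units as the data, making it more easily interpreted than the variance.
• COEFFICIENT OF VARIATION
- indicates how large the standard deviation is in relation to the mean: $\left(\frac{s}{\bar{x}} \times 100\right)\%$ for a sample, $\left(\frac{\sigma}{\mu} \times 100\right)\%$ for a population.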
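A Python sketch of the sample variance, standard deviation, and coefficient of variation. It uses 12 assumed monthly starting salaries (the notes do not list the data). `statistics.variance` and `statistics.stdev` divide by n - 1, matching Excel's VAR.S and STDEV.S.

```python
import statistics

# ASSUMPTION: illustrative monthly starting salaries for 12 graduates
salaries = [3850, 3950, 4050, 3880, 3755, 3710, 3890, 4130, 3940, 4325, 3920, 3880]

x_bar = statistics.mean(salaries)
s2 = statistics.variance(salaries)   # sample variance: divides by n - 1 (like VAR.S)
s = statistics.stdev(salaries)       # sample standard deviation (like STDEV.S)
cv = s / x_bar * 100                 # coefficient of variation, in percent

# The same variance computed directly from the definition
manual_s2 = sum((x - x_bar) ** 2 for x in salaries) / (len(salaries) - 1)

print(f"mean={x_bar}, s^2={s2:.2f}, s={s:.2f}, CV={cv:.1f}%")
```

A CV of about 4% says the typical deviation is small relative to the mean salary, which is easier to compare across data sets than the variance itself.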
Example: Monthly Starting Salary (Variance, Standard Deviation, and Coefficient of Variation)
CHAPTER 3B: DESCRIPTIVE STATISTICS
(NUMERICAL MEASURES)
Measures of Distribution Shape, Relative Location, and Detecting Outliers
• DISTRIBUTION SHAPE
- Skewness → an important measure of the shape of a distribution. It can be easily computed using statistical software.
o Formula (sample data): $\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$
o Symmetric (not skewed)
- skewness is zero
- mean and median are equal
o Moderately Skewed Left
- skewness is negative
- mean will usually be less than the median
o Moderately Skewed Right
- skewness is positive
- mean will usually be more than the median
o Highly Skewed Right
- skewness is positive (often above 1.0)
- mean will usually be more than the median
• CHEBYSHEV'S THEOREM
- At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
- Chebyshev's theorem requires z > 1, but z need not be an integer.
- At least 75% of the data values must be within z = 2 standard deviations of the mean.
- At least 89% of the data values must be within z = 3 standard deviations of the mean.
- At least 94% of the data values must be within z = 4 standard deviations of the mean.
- Example: Marks of Students
Suppose the marks of 100 students in a course had a mean of 70 and a standard deviation of 5. We want to know the number of students having test scores between 60 and 80.
60 and 80 are 2 standard deviations below and above the mean, respectively, so with z = 2 at least $1 - 1/2^2 = 0.75$ of the students, i.e., at least 75 students, have scores between 60 and 80.
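A Python sketch of the sample skewness, n / ((n - 1)(n - 2)) Σ((xᵢ - x̄)/s)³ (the form used by Excel's SKEW), and of Chebyshev's bound applied to the marks example. The two small data sets are hypothetical; `sample_skewness` and `chebyshev_bound` are illustrative helpers, not library functions.

```python
import statistics

def sample_skewness(data):
    """n / ((n-1)(n-2)) * sum(((x_i - x_bar) / s) ** 3), with s the sample SD."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in data)

def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2

# Hypothetical data sets
symmetric = [1, 2, 3, 4, 5]          # mirror-image tails: skewness 0
right_skewed = [1, 1, 2, 2, 3, 10]   # one long right tail: skewness > 0

# Marks example: mean 70, standard deviation 5, interval 60 to 80
z = (80 - 70) / 5                         # 2.0 standard deviations
min_students = chebyshev_bound(z) * 100   # at least 75 of the 100 students

print(round(sample_skewness(symmetric), 6), round(sample_skewness(right_skewed), 2))
print(min_students)
```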
• Z-SCORES
- is often called the standardized value.
- It denotes the number of standard deviations a data value $x_i$ is from the mean: $z_i = \frac{x_i - \bar{x}}{s}$
- Excel's STANDARDIZE function can be used to compute the z-score.
- An observation's z-score → a measure of the relative location of the observation in a data set.
- A data value less than the sample mean will have a z-score less than zero.
- A data value greater than the sample mean will have a z-score greater than zero.
- A data value equal to the sample mean will have a z-score of zero.
- Example: Class size data
• EMPIRICAL RULE
- used when the data are believed to approximate a bell-shaped distribution.
- can be used to determine the percentage of data values that must be within a specified number of standard deviations of the mean.
- The rule is based on the normal distribution (Chapter 6).
- For data having a bell-shaped distribution:
o Approximately 68% of the data values will be within ±1 standard deviation of the mean.
o Approximately 95% of the data values will be within ±2 standard deviations of the mean.
o Almost all of the data values will be within ±3 standard deviations of the mean.
• DETECTING OUTLIERS
- Outlier → an unusually small or unusually large value in a data set.
- A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
- It might be:
o an incorrectly recorded data value
o a data value that was incorrectly included in the data set
o a correctly recorded unusual data value that belongs in the data set
- Example: Class size data
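A Python sketch of z-scores and the |z| > 3 outlier screen. The class sizes are assumed values (the notes do not list the class size data), and the second data set is a hypothetical one built to contain an outlier. With 10 or fewer observations no sample z-score can exceed 3 in absolute value, so the screen only flags values in larger data sets.

```python
import statistics

def z_scores(data):
    """z_i = (x_i - x_bar) / s, using the sample standard deviation."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - x_bar) / s for x in data]

def z_outliers(data, cutoff=3.0):
    """Values whose z-score is below -cutoff or above +cutoff."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > cutoff]

# ASSUMPTION: hypothetical class sizes (mean 44, s = 8)
class_sizes = [46, 54, 42, 46, 32]
zs = z_scores(class_sizes)
print(zs)                        # [0.25, 1.25, -0.25, 0.25, -1.5]
print(z_outliers(class_sizes))   # [] : no value lies beyond +/-3

# Hypothetical larger data set with one extreme value
print(z_outliers([10] * 12 + [100]))   # [100]
```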
Five-Number Summaries and Box Plots
• Five-number summary:
▪ Smallest value
▪ First (1st) quartile
▪ Median
▪ Third (3rd) quartile
▪ Largest value
Box Plot
- is a graphical summary of data that is based on a five-number summary.
- A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
- Box plots provide another way to identify outliers.
o A box is drawn with its ends located at the first and third quartiles.
o A vertical line is drawn in the box at the location of the median (second quartile).
o Limits are located (not drawn) using the interquartile range (IQR): lower limit = Q1 - 1.5(IQR), upper limit = Q3 + 1.5(IQR).
o Data outside these limits are considered outliers.
o The location of each outlier is shown with the symbol ●.
Measures of Association Between Two Variables
- Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.
o COVARIANCE
- is a measure of the linear association between two variables.
- Positive values → positive relationship; negative values → negative relationship.
- Sample covariance: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
o CORRELATION COEFFICIENT
- Correlation → a measure of linear association and not necessarily causation.
- Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
- Sample correlation coefficient: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
- The coefficient can take on values between -1 and +1.
o Strong negative linear relationship → values near -1
o Strong positive linear relationship → values near +1
- The closer the correlation is to zero, the weaker the relationship.
- Example: Stereo and Sound Equipment Store. The store's manager wants to determine the relationship between the number of weekend television commercials shown and the sales at the store during the following week.
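A Python sketch combining the five-number summary, the box-plot limits Q1 - 1.5(IQR) and Q3 + 1.5(IQR), and the sample covariance and correlation. Quartiles here use `statistics.quantiles(..., method="exclusive")` (the p(n + 1) rule of Excel's QUARTILE.EXC); other quartile rules can shift the limits slightly. The salaries and the commercials/sales figures are assumed, illustrative values, and the helpers are hypothetical. Python 3.10+ also provides `statistics.covariance` and `statistics.correlation`.

```python
import statistics

def five_number_summary(data):
    """(smallest, Q1, median, Q3, largest), quartiles by the exclusive rule."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, med, q3, max(data)

def box_plot_outliers(data):
    """Values outside the limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

def covariance(x, y):
    """Sample covariance: sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """Pearson correlation s_xy / (s_x * s_y); always between -1 and +1."""
    return covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# ASSUMPTION: illustrative monthly starting salaries for 12 graduates
salaries = [3850, 3950, 4050, 3880, 3755, 3710, 3890, 4130, 3940, 4325, 3920, 3880]
print(five_number_summary(salaries))   # (3710, 3857.5, 3905.0, 4025.0, 4325)
print(box_plot_outliers(salaries))     # [4325]

# ASSUMPTION: illustrative weekly commercials (x) and sales (y)
commercials = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]
print(covariance(commercials, sales))             # 11.0
print(round(correlation(commercials, sales), 2))  # 0.93
```

A correlation near +0.93 indicates a strong positive linear relationship between commercials and sales in this illustrative data, but by itself it does not show that the commercials cause the sales.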
Data Dashboards: Adding Numerical Measures to Improve Effectiveness
- Data dashboards are not limited to graphical displays.
- The addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is often critical.
- Dashboards are often interactive.
- Drilling down → refers to functionality in interactive dashboards that allows the user to access information and analyses at increasingly detailed levels.
