
Detection of Outliers: Iglewicz and Hoaglin

This document discusses the detection and identification of outliers in data. It defines an outlier as an observation that deviates markedly from other observations. Outliers need to be identified for two main reasons: 1) outliers may indicate bad or incorrect data, and 2) outliers could be scientifically interesting observations rather than errors. The document outlines three issues related to outliers: labeling potential outliers, accommodating outliers in statistical analyses, and formally identifying outliers. It focuses on labeling and identifying outliers. Additionally, it notes that outliers should be identified assuming an approximately normal distribution, and normal probability plots, box plots, and histograms can help check this assumption and identify potential outliers.

Uploaded by

Joseph Tang

1.3.5.17. Detection of Outliers

Introduction

An outlier is an observation that appears to deviate markedly
from other observations in the sample.

Identification of potential outliers is important for the
following reasons.

1. An outlier may indicate bad data. For example, the
data may have been coded incorrectly or an
experiment may not have been run correctly. If it can
be determined that an outlying point is in fact
erroneous, then the outlying value should be deleted
from the analysis (or corrected if possible).
2. In some cases, it may not be possible to determine if
an outlying point is bad data. Outliers may be due to
random variation or may indicate something
scientifically interesting. In any event, we typically
do not want to simply delete the outlying observation.
However, if the data contain significant outliers, we
may need to consider the use of robust statistical
techniques.

Labeling, Accommodation, Identification

Iglewicz and Hoaglin distinguish the following three issues
with regard to outliers.

1. outlier labeling - flag potential outliers for further
investigation (i.e., are the potential outliers erroneous
data, indicative of an inappropriate distributional
model, and so on).
2. outlier accommodation - use robust statistical
techniques that will not be unduly affected by
outliers. That is, if we cannot determine that potential
outliers are erroneous observations, do we need to
modify our statistical analysis to more appropriately
account for these observations?
3. outlier identification - formally test whether
observations are outliers.
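
As a rough illustration of the accommodation issue, the sketch below compares how the mean and standard deviation react to a single extreme value against how the median and the median absolute deviation (MAD) react. The data values and the helper name `mad` are hypothetical, and the median and MAD are only one possible choice of robust summaries, assumed here for illustration.

```python
from statistics import mean, median, stdev

# Hypothetical measurements; the last value of `contaminated` is an
# artificial extreme observation added for illustration.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1]
contaminated = clean + [14.5]


def mad(data):
    """Median absolute deviation from the median (a robust scale estimate)."""
    center = median(data)
    return median([abs(x - center) for x in data])


for name, values in (("clean", clean), ("contaminated", contaminated)):
    print(
        f"{name:13s} mean={mean(values):.3f} stdev={stdev(values):.3f} "
        f"median={median(values):.3f} MAD={mad(values):.3f}"
    )
```

In this sketch the extreme value moves the mean and inflates the standard deviation considerably, while the median and MAD change only slightly. That is the sense in which a robust technique is "not unduly affected" by outliers.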

This section focuses on the labeling and identification issues.

Normality Assumption

Identifying an observation as an outlier depends on the
underlying distribution of the data. In this section, we limit
the discussion to univariate data sets that are assumed to
follow an approximately normal distribution. If the normality
assumption for the data being tested is not valid, then a
determination that there is an outlier may in fact be due to the
non-normality of the data rather than the presence of an
outlier.

For this reason, it is recommended that you generate
a normal probability plot of the data before applying an
outlier test. Although you can also perform formal tests for
normality, the presence of one or more outliers may cause
the tests to reject normality when it is in fact a reasonable
assumption for applying the outlier test.

In addition to checking the normality assumption, the lower
and upper tails of the normal probability plot can be a useful
graphical technique for identifying potential outliers. In
particular, the plot can help determine whether we need to
check for a single outlier or whether we need to check for
multiple outliers.
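
The coordinates of a normal probability plot can be computed with a short sketch like the one below. It pairs each ordered observation with a standard normal quantile. The function name `normal_probability_points`, the sample values, and the use of Blom plotting positions are assumptions made for illustration; other plotting-position conventions exist and give similar pictures.

```python
from statistics import NormalDist


def normal_probability_points(data):
    """Return (theoretical quantile, ordered value) pairs for plotting."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Blom plotting positions: an assumed convention, not the only one.
    positions = [(i - 0.375) / (n + 0.25) for i in range(1, n + 1)]
    quantiles = [std_normal.inv_cdf(p) for p in positions]
    return list(zip(quantiles, ordered))


# Hypothetical sample with one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5]
points = normal_probability_points(sample)

for quantile, value in points:
    print(f"{quantile:6.2f}  {value:5.1f}")
```

If the data are approximately normal, the pairs fall close to a straight line. In this hypothetical sample the first nine pairs are roughly linear, and the last pair sits far above the line they suggest, which is the kind of upper-tail departure that points to a single potential outlier.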

The box plot and the histogram can also be useful graphical
tools in checking the normality assumption and in identifying
potential outliers.
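
One common way a box plot labels potential outliers is with fences placed 1.5 interquartile ranges beyond the quartiles. The sketch below assumes that convention, which the text above does not specify. The function names, the multiplier `k`, the sample values, and the quartile interpolation method are all illustrative choices.

```python
from statistics import quantiles


def box_plot_fences(data, k=1.5):
    """Return (lower, upper) fences at k interquartile ranges from the quartiles."""
    # Quartile definitions vary between packages; "inclusive" is one option.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def label_potential_outliers(data, k=1.5):
    """Return the values that fall outside the box plot fences."""
    lower, upper = box_plot_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample with one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5]
flagged = label_potential_outliers(sample)
print(flagged)
```

Values flagged this way are only labeled as potential outliers. Whether they are erroneous, or whether a formal test is warranted, still has to be decided separately.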
