Detection of Outliers: Iglewicz and Hoaglin

This document discusses the detection and identification of outliers in data. It defines an outlier as an observation that deviates markedly from other observations. Outliers need to be identified for two main reasons: 1) outliers may indicate bad or incorrect data, and 2) outliers could be scientifically interesting observations rather than errors. The document outlines three issues related to outliers: labeling potential outliers, accommodating outliers in statistical analyses, and formally identifying outliers. It focuses on labeling and identifying outliers. Additionally, it notes that outliers should be identified assuming an approximately normal distribution, and normal probability plots, box plots, and histograms can help check this assumption and identify potential outliers.

Uploaded by

Joseph Tang


1.3.5.17. Detection of Outliers

Introduction
An outlier is an observation that appears to deviate markedly from other observations in the sample.

Identification of potential outliers is important for the following reasons.

1. An outlier may indicate bad data. For example, the data may have been coded incorrectly or an experiment may not have been run correctly. If it can be determined that an outlying point is in fact erroneous, then the outlying value should be deleted from the analysis (or corrected if possible).
2. In some cases, it may not be possible to determine if an outlying point is bad data. Outliers may be due to random variation or may indicate something scientifically interesting. In any event, we typically do not want to simply delete the outlying observation. However, if the data contains significant outliers, we may need to consider the use of robust statistical techniques.

Labeling, Accommodation, Identification
Iglewicz and Hoaglin distinguish the following three issues with regard to outliers.

1. outlier labeling - flag potential outliers for further investigation (i.e., are the potential outliers erroneous data, indicative of an inappropriate distributional model, and so on).
2. outlier accommodation - use robust statistical techniques that will not be unduly affected by outliers. That is, if we cannot determine that potential outliers are erroneous observations, do we need to modify our statistical analysis to more appropriately account for these observations?
3. outlier identification - formally test whether observations are outliers.

This section focuses on the labeling and identification issues.

Normality Assumption
Identifying an observation as an outlier depends on the underlying distribution of the data. In this section, we limit the discussion to univariate data sets that are assumed to follow an approximately normal distribution. If the normality assumption for the data being tested is not valid, then a determination that there is an outlier may in fact be due to the non-normality of the data rather than the presence of an outlier.

For this reason, it is recommended that you generate a normal probability plot of the data before applying an outlier test. Although you can also perform formal tests for normality, the presence of one or more outliers may cause the tests to reject normality when it is in fact a reasonable assumption for applying the outlier test.

In addition to checking the normality assumption, the lower and upper tails of the normal probability plot can be a useful graphical technique for identifying potential outliers. In particular, the plot can help determine whether we need to check for a single outlier or whether we need to check for multiple outliers.
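
As a rough illustration of the normal probability plot described above, the following Python sketch computes the plot coordinates without drawing them. The function name normal_probability_points, the plotting position (i - 0.5)/n, and the sample values are all assumptions made for this example and are not taken from the text; other plotting positions are in common use. Points that fall close to a straight line are consistent with normality, while a point that breaks away in the lower or upper tail is a candidate outlier.

```python
from statistics import NormalDist


def normal_probability_points(data):
    """Return (theoretical normal quantile, ordered value) pairs.

    Hypothetical helper: uses the plotting position (i - 0.5) / n,
    which is one common choice and an assumption of this sketch.
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    points = []
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # assumed plotting position, strictly in (0, 1)
        points.append((std_normal.inv_cdf(p), value))
    return points


# Hypothetical sample: nine values near 10 and one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5]

for quantile, value in normal_probability_points(sample):
    print(f"{quantile:7.3f}  {value:5.1f}")
```

Plotting the second column against the first gives the normal probability plot. In this made-up sample, the largest value would be expected to sit well above the line suggested by the other nine points, which is the kind of tail behavior the text says to look for.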

The box plot and the histogram can also be useful graphical
tools in checking the normality assumption and in identifying
potential outliers.
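
The box plot can be turned into a simple labeling rule. The sketch below uses the common box plot convention of flagging values more than 1.5 interquartile ranges beyond the quartiles. That multiplier, the "inclusive" quartile method, the function names, and the sample data are assumptions of this example and are not specified in the text. Flagged values are only labeled for further investigation, not formally identified as outliers.

```python
from statistics import quantiles


def boxplot_fences(data, k=1.5):
    """Return (lower, upper) fences at k interquartile ranges beyond the quartiles.

    Hypothetical helper: k = 1.5 is the usual box plot convention,
    assumed here rather than taken from the text.
    """
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def label_potential_outliers(data, k=1.5):
    """Return the values lying outside the box plot fences."""
    lower, upper = boxplot_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample: nine values near 10 and one suspiciously large value.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.5]

print(boxplot_fences(sample))
print(label_potential_outliers(sample))
```

Because the fences are built from quartiles rather than the mean and standard deviation, a single extreme value has little effect on where they fall.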

