
Anomaly Detection

Course:
Data Mining II

June 18th, 2020

Jovelson Aguilar Sabino Junior


Konstantin Köhler
Contents

1. Introduction
2. Challenges and issues
3. Methods and techniques
4. Conclusion
5. References

Anomaly Detection

Jovelson Aguilar Sabino Junior and Konstantin Köhler

Keywords: anomaly detection, outlier detection, novelty detection.

1 Introduction

Anomaly detection is a peculiar problem in Data Mining. The objective of Data Mining is to discover interesting patterns and knowledge from large amounts of data, as described by Han et al. [2011], which usually means finding a model that fits most observations well while disregarding the exceptions. In contrast, according to Chandola et al. [2009], anomaly detection refers to the problem of finding patterns in data that do not conform to expected behavior, i.e., the goal of anomaly detection is to identify characteristics of the data that are often neglected by other Data Mining methods. Hence, the very concept of anomaly detection is challenging.
Many techniques have been developed to solve this problem, some of them specific to an application domain, but this report presents the more generic methods and approaches. Furthermore, some examples of real-life applications are introduced.
Although the traditional techniques have specific issues, the rapid development of this field is explained by the large impact that an anomaly can cause in a domain, usually a loss or damage, as shown in more detail in the following sections.
This work's objective is to provide a literature review on the topic, highlighting its concepts, main challenges and proposed methods. Both classic and recent works in the area were used.

1.1 Definitions

In order to provide a comprehensive overview of anomaly detection, it is necessary to outline some basic concepts.

Noise. According to Gama [2010], noise is a perturbation in the data with neither consistency nor persistence and may, for example, cause errors when detecting drift. Noise can occur due to a random error in variable generation or an incorrect recording. Chandola et al. [2009] remark that noise is a phenomenon of no interest in the data that acts as a hindrance to data analysis. Hence, noise should be removed or accommodated. However, these tasks are not part of this report, which focuses on deviations that are of actual interest in a given domain.

Outlier. The classic definition by Hawkins [1980] is that an outlier is an observation that seems to be inconsistent with the remainder. Outliers are often related to extreme values, but this is not always the case; what characterizes an outlier is its deviation from the other objects.
The literature classifies outliers into three types. Global outliers are aberrant observations regardless of any context or sequence, and they are usually extreme values. For example, a temperature of -10°C in Porto is a global outlier. Contextual outliers are observations that are unexpected in a given context but considered normal in other situations. For example, 5°C is an outlier in Porto in summer but not in winter. Collective outliers become outliers not because of their individual values, but due to their interaction with other observations. For example, rain in December is normal in Porto, but 30 consecutive days of rain constitute a collective outlier.

Anomaly. Anomalies are unexpected observations or patterns found in the data. They are also referred to as surprising, suspicious, rare or deviating events. The idea of an anomaly is very similar to that of an outlier, but anomalies are characterized by their "interestingness" or real-life relevance, as mentioned by Chandola et al. [2009]. However, outlier and anomaly are often used interchangeably, and most of the time when analysts work on detecting an anomaly they are, in fact, working on detecting an outlier. Ribeiro, Pereira and Gama [2016], for example, state that, in engineering applications, one can consider that an anomaly is an outlier. In this report, we frequently use the word outlier to refer to an anomaly.

1.2 Anomaly Detection Examples

Anomaly detection has been applied in many domains to identify fraud, damage, disease or other undesirable outcomes. Some applications are described below.

Fraud detection. Some frauds can be recognized by identifying patterns in the data. The classical example is credit card fraud detection, where the fraud can be detected from an unusual purchase, a purchase in an unusual context, or an unusual sequence of purchases. This kind of technique is also used by mobile phone operators in order to detect unauthorized use of the network carried out by deception. Cochran [2018] presents the case of Infinity Property & Casualty Corporation, an insurance company that has increased the number of cases in which it recovers money from the parties responsible for damages. It is also reported that the company achieved a 35.5% improvement in claims processing speed and improved customer satisfaction levels. Other types of fraud detection are performed by regulatory agencies such as the US Securities and Exchange Commission (SEC), which aims to detect illegal practices such as insider trading. Fraud detection is a very challenging task, as companies demand that suspicious actions be identified (and blocked) promptly, while some advanced types of fraud are only detected with deeper analysis.

Industrial damage detection. Outlier detection is used in many industries, such as roads, railways, chemical engineering, turbines and power plants. The main idea is to prevent damage or diminish its extent, for example by recommending that equipment be maintained. In general, industrial damage costs money and reputation. To avoid further loss or damage, immediate detection is generally expected, which makes this task very challenging. In some sectors, such as the food and pharmaceutical industries, failures often occur in sequence, which means that if a failure is not avoided or repaired in time, there is a high probability of waste, loss and health risks. When the data is recorded by sensors, there is a temporal aspect, which demands time series modeling techniques. As highlighted by Ribeiro, Pereira & Gama [2016], structural faults are usually related to a consecutive set of outliers, which makes the task even more complex, as it demands sequential anomaly detection techniques.

Image processing. Many areas such as surveillance, health care and satellite monitoring have been adopting anomaly detection techniques in order to improve the detection of abnormalities in both static and moving images. For example, the Massachusetts Institute of Technology (MIT) and the Massachusetts General Hospital (MGH) have developed a model that screens mammograms and is able to identify patterns related to breast cancer. The model finds anomalies in this kind of image that are indistinguishable to the human eye. It is reported that it identifies 31% of breast cancers in high-risk patients, while the traditional methods achieve only 18% [Barzilay et al. 2019]. Similarly, Zhang et al. [2020] developed a model that reached 96% sensitivity and 70.65% specificity in diagnosing patients with COVID-19 by screening chest X-ray images using deep learning. Anomaly detection in image processing faces ethical problems: most training examples come from white people, so the algorithms are less accurate at identifying anomalies in black patients. Furthermore, critics have concerns about privacy, especially in surveillance activities.

Medical and public health anomaly detection. Besides the studies on image processing, there are other ways in which medicine and public health have benefited from anomaly detection techniques. For example, some studies aim to detect disease outbreaks in a specific area as well as to prevent bioterrorism. As explained by Buckeridge et al. [2004], statistical process control (SPC) methods, originally designed to identify defective products in a manufacturing process, outperform other methods in this domain. This topic has attracted interest since the coronavirus pandemic. In recent years, there has also been development in other important areas such as electrocardiograms and electroencephalograms, which are time series data and thus require collective anomaly detection methods. In general, in this domain, it is better to classify a normal observation as abnormal than the opposite, since the cost of, for example, diagnosing a sick patient as healthy is very high.

Other domains. Anomaly detection is used successfully in many other areas, such as text mining, sensor networks, speech recognition, traffic monitoring, and detecting ecosystem disturbances. It is also broadly used to detect intrusions in computer and network infrastructure by identifying attacks, faults and defects in the connection.

1.3 Related work


Many authors from different domains have written about anomaly detection and similar problems, whose approaches are often useful in analogous tasks. Below, change detection and novelty detection are defined in order to distinguish them from anomaly detection, while pointing out some similarities.

Change detection. According to Gama [2010], change detection is the process of learning concept drift over time. It takes into account consistency (Et ≤ threshold) and persistence (p times). Concept drift is a gradual change and should be distinguished from noise, stationarity and concept shift (an abrupt change). Thus, change detection focuses on slow changes in the mechanism that generates the observations.

Novelty detection. Novelty detection aims to detect a novel or unknown concept, whereas it is not interested in sparse independent examples, which would be noise or outliers. In this process, the novel patterns are typically incorporated into the normal model after being detected [Gama 2010]. Traditional techniques allow the recognition of novel concepts in unlabeled data by clustering the unknown observations (e.g. with k-means). As mentioned by Carvalho and Gama [2007], some algorithms detect both concept drift (change) and novelty (e.g. OLINDDA).

It is clear that the goal of novelty detection is broader than that of anomaly detection, since anomaly detection focuses only on finding discordant data so as to prevent a loss or damage, while novelty detection analyses the anomalies in order to evaluate whether they can be labeled as a new class or subclass. In any case, as noted by Chandola et al. [2009], solutions for novelty detection are used for anomaly detection and vice versa.

2 Challenges and issues

2.1 General Challenges

The challenge of detecting anomalies brings with it the issue of focusing on the exception. As in the material world, it is much more difficult to find something scarce than something abundant. Moreover, it is hard to identify an object as part of a class when there is little prior information about other similar objects. In both cases, the task is even more complex if the object to be detected resembles an object of no interest. Indeed, in anomaly detection there is usually little availability of labeled data (examples of outliers), which can restrict the choice of methods, increase costs and decrease sensitivity. Because the proportion of anomalies is typically low, some algorithms tend to ignore them; repairing this problem creates a new one, since increasing the sensitivity can lead to a higher false positive rate. Another peculiarity of anomaly detection can be a source of false alarms: the boundary between normal and abnormal is not precise, which, conversely, can also mask the outlier. But it is not only normal data that can be confused with outliers: noise is often similar to outliers, and distinguishing them is a hard task, as both are typically rare and different from the remaining observations.
There are more problems than the intrinsic task of detecting outliers: the notion of an outlier is different in each domain and, at times, quite subjective. This makes it difficult to create algorithms that can be applied across different areas.
In addition, the anomaly detection model can become obsolete more quickly than other data mining
models. First, what is considered normal behavior tends to change naturally over time, which has a stronger
impact on the task of separating normal observations from abnormal ones. Second, in fraud detection - a classic
anomaly detection application - the offender usually adapts to make the fraud appear normal. This characteristic
is even more challenging when the domain requires blocking, in real time, the mechanism that generates the
anomaly.

2.2 Challenges related to type of outlier

Global outliers are the easiest to detect, as their identification depends mostly on finding a measure that defines what an outlier is in the specific domain.
Contextual outlier detection requires background information to determine contextual attributes and behavioural contexts. Thus, an expert dictates which characteristics external to the object define a given context and which threshold the example should reach to be considered an outlier. It should be noted that considering contexts in outlier detection can also help to avoid false alarms [Han et al., 2011].
In collective outlier detection, background information is necessary to find relationships among objects, i.e., the issue in this type of outlier is to discover a measure that evaluates whether the behavior of a given object is close enough to the behavior of a group of objects for them to be considered outliers together. As collective outliers depend on their relationship with other objects, the individual object may not be an outlier, which makes the task, in this case, better described as sequential anomaly detection. As modeling sequences demands time series techniques, it is notably complex.

2.3 Challenges related to degree of supervision

Supervised. In supervised learning, the task is modeled as a classification problem, so the first issue is to label the abnormal examples. Moreover, since anomaly detection is usually employed on datasets with imbalanced classes, supervised methods have serious limitations in identifying most kinds of abnormal observations. There are techniques to minimize the imbalance problem, but they typically increase the rate of false positives.

Unsupervised. Techniques that draw inferences from data without labeled examples are used in detecting anomalies, but they often confuse noise with outliers. Moreover, it is computationally expensive to find clusters first and then outliers.

Semi-supervised. Semi-supervised techniques face problems similar to the supervised ones: if there are only a few labeled outliers, the outliers are probably not well represented in the model.

Thus, supervised methods are mostly used in domains where the cost of labeling is worthwhile and it is better to predict a negative-class example as an anomaly than to miss a true outlier. It is remarkable that, while supervised techniques usually outperform unsupervised or semi-supervised techniques, the latter are often preferred due to the cost of labeling examples [Ribeiro, Pereira & Gama 2016].

2.4 Challenges related to number of attributes

It is possible to identify an anomaly in univariate and bivariate data by using statistical techniques or visual representations. This kind of data does not bring additional challenges beyond the issues previously presented. However, multivariate data demands more sophisticated approaches, as it is more difficult to obtain representative plots and the relationships between variables are less intuitive. It is much more challenging, for example, to identify a lurking variable in multivariate data, which can disturb the anomaly detection process by hiding outliers or fabricating them. One approach to address this issue is to reduce the dimensionality with Principal Component Analysis (PCA), at the cost of some loss of information.
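A minimal Python sketch of this idea is shown below (using scikit-learn; the synthetic data and the choice of five components are illustrative assumptions, not taken from the cited works). The retained variance ratio quantifies the loss of information mentioned above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))       # synthetic high-dimensional data

pca = PCA(n_components=5)             # number of components is a design choice
X_reduced = pca.fit_transform(X)      # lower-dimensional representation for the detector

# fraction of the original variance kept after the projection
print(pca.explained_variance_ratio_.sum())
```
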
As mentioned by Ramchandran & Sangaiah [2018], anomaly detection algorithms designed for low-dimensional data are not suitable for high-dimensional data; thus, it is necessary to choose an appropriate algorithm according to the number of attributes in the dataset.

3 Methods and techniques

An overwhelming number of techniques exist for conducting outlier detection. In the context of this work, a few are introduced and their respective main purposes are identified.
In the following, statistical, proximity-based and classification-based (neural network) approaches are covered.

3.1 Statistical

“An anomaly is an observation which is suspected of being partially or wholly irrelevant because it is not
generated by the stochastic model assumed” [Anscombe and Guttman 1960]. In other words: Normal data
instances occur in high probability regions of a stochastic model, while anomalies occur in the low probability
regions.
Statistical techniques try to fit a statistical model to the given data. Afterwards an inference test is
conducted in order to determine whether an unseen observation belongs to that model or not. Observations that
have a low probability of being generated from the proposed model based on the applied test statistics are then
labeled as anomalies. Both parametric and non-parametric techniques can be used to fit a statistical model. Parametric techniques require the assumption of the underlying probability distribution and the estimation of its parameters from the given data [Eskin 2000]. Non-parametric techniques generally do not require knowledge of the underlying distribution [Desforges et al. 1998].
In the following, both kinds of techniques are portrayed in greater detail.

Parametric. Parametric techniques assume that the given data was generated by a specific probability distribution. Thereby, its distribution-specific set of parameters Θ (e.g. µ and σ for normal distributions) is used inside the probability density function f(x, Θ), where x is an observation. The "anomaly score" of an observation x is the inverse of the probability density function f(x, Θ).
The parameters Θ of the assumed underlying distribution can be estimated in several ways; Maximum Likelihood Estimation (MLE) is a common approach. Hereby, the selected parameter set is the one under whose distribution the observed data appears most plausible.
In order to detect outliers, one has to measure the "distance" of an observation to the estimated mean µ. The distance is then compared to a threshold, which needs to be chosen from case to case. Researchers often use µ ± 3σ as a threshold, which covers 99.7% of the data when it comes from a Gaussian distribution [Chandola et al. 2009, pp. 33f].
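As a minimal sketch of this rule (synthetic data; the function name and the factor k = 3 are our own illustrative choices, not from the cited works), µ and σ are estimated from the sample and observations further than 3σ from the mean are flagged:

```python
import numpy as np

def three_sigma_outliers(x, k=3.0):
    """Flag values further than k standard deviations from the sample mean."""
    mu, sigma = x.mean(), x.std()
    return np.abs(x - mu) > k * sigma    # boolean mask of suspected outliers

# toy example: one injected extreme value among roughly Gaussian data
x = np.concatenate([np.random.normal(20, 2, size=500), [45.0]])
print(np.where(three_sigma_outliers(x))[0])   # should point at the last index
```
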
While it is quite simple to introduce a distance measure in the univariate case on a continuous scale, one has to come up with more sophisticated techniques when working with multivariate data and/or nominal or ordinal scales.
In order to apply a distance measure on multivariate data, a multivariate distance function d has to be
introduced. This function d will take the multivariate data X as an input and will output a non-negative real
number:

d : X × X → [0, ∞) (eq. 1)

Researchers have come up with many different distance functions. One of the most popular distance functions is the Mahalanobis Distance.

The Mahalanobis Distance measures the distance of a given point to the mean of the dataset in standard deviations. Therefore, the distance grows along the principal components of the dataset. If the data were scaled along the principal components to have unit variance, the Mahalanobis Distance would equal the Euclidean Distance.

Figure: transformation of a dataset along its principal components [Gyebnár et al. 2019].

The obtained distance can then be used as shown before for the univariate case.
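A minimal sketch of this procedure in plain NumPy is given below (synthetic correlated data; the function name and the threshold of 3 are illustrative assumptions, to be tuned per use case). Each observation's Mahalanobis distance to the sample mean is computed with the inverse sample covariance and compared to the threshold.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row of X to the sample mean, scaled by the covariance,
    i.e. measured in standard deviations along the principal components."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    # d(x) = sqrt((x - mu)^T Sigma^{-1} (x - mu)) for every row x
    return np.sqrt(np.einsum('ij,jk,ik->i', diff, cov_inv, diff))

X = np.random.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
d = mahalanobis_distances(X)
outliers = d > 3.0     # threshold chosen from case to case, as discussed above
```
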

Non-Parametric. ​Non-parametric approaches typically make fewer a priori assumptions about the given data.
Thus, non-parametric approaches can generally be used in more scenarios than parametric approaches.
In the following, two specific non-parametric techniques for anomaly detection are introduced.

Histogram-Based. The use of histograms can be seen as the simplest non-parametric statistical approach to detecting outliers. In the univariate case, a histogram is created based on the training data. Hereby, the bin size h has to be chosen manually.
When checking an unseen observation, one labels it as an anomaly if it falls in an empty bin. If the bin is already populated, the observation is considered normal.
Thus, the bin size has a great impact on the detection algorithm's behavior. If the bins are small, many normal test instances will fall in empty or rare bins, resulting in a high false alarm rate. If the bins are large, many anomalous test instances will fall in frequent bins, resulting in a high false negative rate. Thus, determining the optimal bin size is a key challenge when working with histograms for outlier detection [Han et al. 2011, pp. 558ff].
When working with multivariate data, a common approach is to build a histogram for every feature of the data.
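The following sketch (plain NumPy, univariate, with an arbitrary number of bins; the function name is ours) illustrates the idea: a histogram is built on training data, and a test value is flagged if it falls into an empty bin or outside the training range.

```python
import numpy as np

def histogram_outliers(train, test, bins=20):
    """Flag test values that fall into an empty training bin (or out of range)."""
    counts, edges = np.histogram(train, bins=bins)
    test = np.asarray(test, dtype=float)
    idx = np.searchsorted(edges, test, side='right') - 1
    flags = np.ones(test.shape, dtype=bool)           # default: anomalous
    inside = (idx >= 0) & (idx < len(counts))
    flags[inside] = counts[idx[inside]] == 0           # empty bin -> anomaly
    return flags

train = np.random.normal(0, 1, size=1000)
print(histogram_outliers(train, [0.1, 7.5]))           # expected: [False  True]
```
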

Kernel-Based. Besides the histogram approach, kernel-based approaches are common when using non-parametric techniques for outlier detection. Hereby, the probability density function (pdf) of a given data set is estimated through a kernel (which can be chosen manually).
Anomaly detection techniques based on kernel functions are similar to the parametric methods described earlier; the only difference is the density estimation technique used. An unseen observation which lies in a low-probability area of the obtained pdf is labeled as anomalous [Chandola et al. 2009, p. 38].

When working with multivariate data, one can simply use a multivariate kernel.
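As a sketch of the kernel-based approach (using, for example, scikit-learn's KernelDensity with a Gaussian kernel; the bandwidth and the 1% quantile threshold are arbitrary illustrative choices), a kernel density estimate is fitted to the data and test observations with unusually low estimated density are flagged:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))                   # "normal" data
X_test = np.array([[0.1, -0.2], [6.0, 6.0]])           # second point lies far away

kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X_train)

# flag observations whose log-density falls below a low quantile of the
# training log-densities (the quantile plays the role of the threshold)
threshold = np.quantile(kde.score_samples(X_train), 0.01)
print(kde.score_samples(X_test) < threshold)           # expected: [False  True]
```
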

Use Case. Statistical approaches towards outlier detection are useful as they can operate in an unsupervised setting without any need for labeled training data.
The approaches output an anomaly score (as mentioned above) which can, and indeed has to be, interpreted. Depending on the setting, this can be seen as either an advantage or a disadvantage of statistical methods.
The need to make assumptions about the actual probability distribution in parametric approaches is an obvious disadvantage of that method: if the assumption does not hold true, the obtained model is useless.
Histogram-based approaches suffer when applied to a more complex multivariate setting, as they are not able to capture interactions between different features.

3.2 Proximity Based

Proximity-based approaches can be divided into two groups: distance-based approaches and density-based approaches. Both search for dense neighborhoods and label observations in less dense neighborhoods as anomalies. Thereby, both approaches rely on the use of arbitrary distance functions (compare eq. 1).
In the following, both approaches are portrayed, and conclusions about their preferred use cases are drawn.

Distance Based. Even though many distance-based approaches can be found in the literature, most implementations are straightforward.
All of the methods have the use of a distance function in common. This function can be arbitrary as long as it follows eq. 1. Similar to the statistical approaches, an appropriate distance function computes a non-negative real number from a multivariate input.
The distance-based approaches differ in the number of neighbors they take into account: some use k = 1 (and obtain decent results), others use a larger k.
After obtaining a distance for an unseen observation, it is usually compared to a threshold to determine whether the observation is anomalous or not.
There are many extensions to this basic distance-based approach for outlier detection. They mostly serve a specific use case and are therefore not discussed further in this work [Chandola et al. 2009, pp. 25ff].
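A minimal sketch of such a distance-based detector is shown below (Euclidean distance via scikit-learn's NearestNeighbors; k, the synthetic data and the 99% quantile threshold are illustrative assumptions). Each observation is scored by the distance to its k-th nearest neighbor and the score is compared to a threshold.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_distance_scores(X, k=5):
    """Anomaly score: distance of each point to its k-th nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    distances, _ = nn.kneighbors(X)        # first column is the point itself (0)
    return distances[:, -1]

X = np.vstack([np.random.normal(0, 1, size=(300, 2)),
               [[8.0, 8.0]]])              # one injected global outlier
scores = knn_distance_scores(X, k=5)
outliers = scores > np.quantile(scores, 0.99)   # threshold chosen per domain
```
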

Density Based. While distance-based approaches suffice to find global outliers, they perform poorly when applied to a dataset of heterogeneous density (contextual outliers). Hereby, multiple data sources with different probability distributions might occur in the same data set; distance-based methods cannot differentiate between them and would apply a global threshold when looking for outliers.
Density-based approaches can help, as they detect outliers with respect to their local neighborhoods rather than the global data distribution.
A popular density-based approach was derived by Breunig et al. [2000], who assigned an anomaly score to a given data instance, known as the Local Outlier Factor (LOF). For any given observation, the LOF score is the ratio of the average local density of the k nearest neighbors of the observation and the local density of the observation itself. While this approach works in heterogeneous datasets, its efficiency (as with most kNN approaches) leaves room for improvement.
Another popular variation of the LOF was derived by Tang et al. [2002]. It is called the Connectivity-based Outlier Factor (COF). It differs from the default LOF in the manner in which the k neighbors of an observation are derived.
The COF computes the neighborhoods for the observations in an incremental manner. To start, the closest observation to the given observation is added to the set of neighbors. The next observation added is the one whose distance to the existing neighborhood is minimal among all remaining
observations. Once the complete k-neighborhood is computed, the anomaly score (COF) is computed similarly to the LOF. Contrary to LOF, COF is able to capture regions such as straight lines (compare the figure below) [Chandola et al. 2009, pp. 27ff].

Figure: comparison of neighborhoods for LOF and COF.
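The LOF described above is available, for example, in scikit-learn. The sketch below (synthetic data with two regions of very different density and an arbitrary number of neighbors) flags points that are sparse relative to their own neighborhood rather than to the global distribution:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
dense = rng.normal(0.0, 0.3, size=(300, 2))     # tight cluster
sparse = rng.normal(8.0, 2.0, size=(100, 2))    # loose cluster
X = np.vstack([dense, sparse, [[1.5, 1.5]]])    # last point: local outlier

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                     # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_          # larger score = more anomalous
print(labels[-1], scores[-1])
```
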

Use Case. When using proximity-based approaches, one does not need to make any assumptions about the underlying data. The approaches are unsupervised and purely data-driven.
Density-based approaches like the LOF or COF can detect anomalies in data sets of heterogeneous density, which can be a great advantage (depending on the use case).
However, the introduced proximity-based approaches are all based on k-nearest-neighbor operations and are therefore computationally costly.
Furthermore, if the dataset has normal observations which do not have enough close neighbors, the algorithm might wrongly label them as outliers.

3.3 Classification Based

Classification-based approaches can also be used for outlier detection. Thereby, a model (classifier) is obtained (learned) from a set of labeled data. After the learning process, unseen observations are fed into the model and classified by it.

Neural Network Based. Neural network approaches are common when using classification-based techniques for outlier detection. This work specifically focuses on the use of autoencoders.
In simple words, autoencoders are Artificial Neural Networks (ANNs) that meet two criteria:
1. The ANN's number of input neurons equals the number of its output neurons.
2. The ANN's number of neurons in the hidden layers is smaller than the number of its input/output neurons.

Figure: schematic structure of an autoencoder [Salim 2018].

Generally, autoencoders are used to recreate the input data as accurately as possible while compressing its size drastically. As the figure above indicates, the autoencoder adapts to a linear combination of the most important features in the dataset due to the limited number of neurons in the hidden layer. This ensures that the data's main structure persists while unimportant information is filtered out.
When an autoencoder is trained on "normal" data, it can afterwards recreate "normal" observations, but will fail when recreating novel data. By measuring how poorly the autoencoder reconstructs an unseen observation, one can classify it as an outlier (or a normal instance) [Japkowicz 1999].
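A minimal sketch of this scheme, written here with Keras (the framework, layer sizes, synthetic training data and the 99% quantile threshold are all illustrative assumptions, not taken from the report), trains an autoencoder with a bottleneck on normal data only and scores new observations by their reconstruction error:

```python
import numpy as np
from tensorflow import keras

n_features = 20
autoencoder = keras.Sequential([
    keras.Input(shape=(n_features,)),
    keras.layers.Dense(8, activation='relu'),             # bottleneck (criterion 2)
    keras.layers.Dense(n_features, activation='linear'),  # output size = input size (criterion 1)
])
autoencoder.compile(optimizer='adam', loss='mse')

X_normal = np.random.normal(size=(2000, n_features))      # "normal" training data
autoencoder.fit(X_normal, X_normal, epochs=10, batch_size=64, verbose=0)

def reconstruction_error(model, X):
    """Mean squared reconstruction error per observation."""
    return np.mean((X - model.predict(X, verbose=0)) ** 2, axis=1)

# observations whose error exceeds a high quantile of the normal errors
# are labeled as outliers
threshold = np.quantile(reconstruction_error(autoencoder, X_normal), 0.99)
```
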

Use Case. As neural network based approaches distinguish between a training and a testing phase, the actual testing is fast, since each unseen observation is compared against a pre-computed model.
Furthermore, multi-class classification can be conducted efficiently by neural network approaches. However, these techniques rely on the availability of accurate labels for the various normal classes, which are often hard to obtain in practice.

4 Conclusion

As discussed throughout this report, anomaly detection is an important area of Data Mining, with applications in different domains. The use of anomaly detection often has a direct impact on the economic well-being of organizations or contributes greatly to health and security.
However, the efficient use of outlier detection algorithms is often challenging, as anomaly detection problems typically come with few labeled examples and high costs to obtain them.
It was shown that, practically, every specific use case needs a more or less tailor-made approach in order to operate effectively.

References

1. Yala, A., Lehman, C., Schuster, T., Portnoi, T., and Barzilay, R. 2019. A Deep Learning Mammography-based Model for Improved Breast Cancer Risk Prediction.
2. Ramchandran, A. and Sangaiah, A. K. 2018. In: Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications.
3. Anscombe, F. J. and Guttman, I. 1960. Rejection of outliers. Technometrics 2, 2, 123–147.
4. Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 2000. Lof: identifying density-based local
outliers. In Proceedings of 2000 ACM SIGMOD International Conference on Management of Data.
ACM Press, 93–104.
5. Chandola, V., Banerjee, A., Kumar, V. 2009. Anomaly Detection: A Survey. ACM Computing Surveys
6. Cochran, J.J. (Editor). (2018) INFORMS Analytics Body of Knowledge. Wiley
7. Desforges, M., Jacob, P., and Cooper, J. 1998. Applications of probability density estimation to the
detection of abnormal conditions in engineering. In Proceedings of Institute of Mechanical Engineers.
Vol. 212. 687–703.
8. Eskin, E. 2000. Anomaly detection over noisy data using learned probability distributions. In Proceedings of the Seventeenth International Conference on Machine Learning. Morgan Kaufmann Publishers Inc., 255–262.
9. Faria, E.R., Gonçalves, I. J. C. R., Carvalho, A. C. P. L. F. de, Gama, J. Novelty detection in data
streams. 2015.
10. Gama, J. Knowledge discovery from data streams (2010).
11. Gama J, R. P., Spinosa EJ, Carvalho A. OLINDDA: a cluster-based approach for detecting novelty and
concept drift in data streams (2007)
12. Gyebnár, Gyula, et al. “Personalized Microstructural Evaluation Using a Mahalanobis-Distance Based
Outlier Detection Strategy on Epilepsy Patients’ DTI Data – Theory, Simulations and Example
Cases.” ​Plos One​, vol. 14, no. 9, 2019, doi:10.1371/journal.pone.0222720.
13. Han, J., Kamber, M., & Pei, J. (2011). Data Mining: Concepts and Techniques (3rd ed.). San Francisco,
CA, USA: Morgan Kaufmann Publishers Inc.
14. Hawkins, D.: “Identification of Outliers”, Chapman and Hall, London, 1980
15. Buckeridge, D. L., Burkom, H., Campbell, M., Hogan, W. R., and Moore, A. W., for the BioALIRT Project. 2004. Algorithms for rapid outbreak detection: a research synthesis.
16. Japkowicz, N. (1999). Concept-learning in the absence of counter-examples: An autoassociation-based
approach to classification.

17. Zhang, J., Xie, Y., Li, Y., Shen, C., and Xia, Y. 2020. COVID-19 Screening on Chest X-ray Images Using Deep Learning based Anomaly Detection.
18. Ribeiro, R., Pereira, P., and Gama, J. 2016. Sequential anomalies: a study in the Railway Industry.
19. Salim, A. 2018. https://medium.com/@ally_20818/anomaly-detection-with-auto-encoders-how-we-used-it-for-cervical-cancer-detection-bdae74cbf05a
20. Tang, J., Chen, Z., Chee Fu, A. W., and W. Cheung, D. 2002. Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 535–548.
