Anomaly Detection

This document discusses anomaly detection in machine learning and network security. It defines anomalies and outliers, and explains how to detect outliers using z-scores. Common causes of anomalies are given as different data classes, natural variation, and data errors. Challenges in anomaly detection are discussed, including obtaining accurate labels for supervised learning methods and dealing with different data types.

Uploaded by

Amita Soni

MACHINE LEARNING AND NETWORK SECURITY

UNIT 2: Anomaly Detection


Anomaly vs Outliers
A standard normal table (also called the unit normal table or z-score table) is a mathematical table of the values
of Φ, the cumulative distribution function of the standard normal distribution. The z-score, also known as
the standard score, indicates how many standard deviations a data point lies from the mean.

$$Z = \frac{X - \mu}{\sigma}$$
Reference Link: https://www.machinelearningplus.com/machine-learning/how-to-detect-outliers-with-z-score/
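As a minimal sketch of z-score based outlier detection, the snippet below computes Z = (X − μ) / σ for every value and flags those whose absolute z-score exceeds a threshold. The function names, the sample readings, and the threshold of 3 are assumptions chosen for illustration (3 is a common convention, not a value taken from these notes), and the population standard deviation is used for σ.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score (x - mu) / sigma of every value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

def find_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical sample: 19 readings near 50 plus one extreme reading.
readings = [48, 49, 50, 51, 52] * 3 + [47, 50, 50, 53] + [120]
print(find_outliers(readings))
```

Note that an extreme value also inflates μ and σ, so on very small samples a single outlier may never reach the threshold.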
Standard Normal Distribution
Mean=0, Standard Deviation=1
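The values listed in a standard normal table can also be computed directly. The sketch below evaluates Φ through the error function in Python's standard library and derives the two-sided tail probability, which shows how quickly values far from the mean become unlikely. The function names are illustrative, not part of the original notes.

```python
import math

def standard_normal_cdf(z):
    """Phi(z): probability that a standard normal variable is <= z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_tail(z):
    """Probability of observing |Z| >= |z| under the standard normal."""
    return 2.0 * (1.0 - standard_normal_cdf(abs(z)))

for z in (1.0, 2.0, 3.0):
    print(f"Phi({z}) = {standard_normal_cdf(z):.4f}, "
          f"P(|Z| >= {z}) = {two_sided_tail(z):.4f}")
```

Under these definitions, roughly 0.27% of standard normal values lie more than 3 standard deviations from the mean.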
Causes of Anomalies

1. Data from different classes

An object may be different because it belongs to a different class. Credit card theft, intrusion detection, disease
outcomes, and abnormal test results are good examples of anomalies that can be identified using class labels. Example:
measuring the weights of oranges when a few grapefruit are mixed in.

2. Natural variation

In a Normal or Gaussian distribution, the probability of a data object decreases rapidly as its distance from the mean
increases. Objects far out in the tails are rare and are therefore often considered anomalies; they are also called
outliers. Example: unusually tall people.

3. Data measurement and Collection Errors

These kinds of errors occur when erroneous data is collected or when a measurement deviates from the true value.
Example: a recorded weight of 200 pounds for a 2-year-old child.
Figure legend: line A is blue, line B is green, and line C is red.
We could use a clustering algorithm to assign each data point to a cluster.
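One way to assign cluster membership can be sketched with a plain one-dimensional k-means. The notes do not name a specific clustering algorithm, so k-means is an assumption here, as are the function name, the starting centers, and the hypothetical weights in grams, which echo the oranges and grapefruit example above.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
                  for x in values]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [x for x, lab in zip(values, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = sum(members) / len(members)
    return labels, centers

# Hypothetical fruit weights in grams: five oranges and three grapefruit.
weights = [140, 150, 155, 160, 145, 310, 300, 320]
labels, centers = kmeans_1d(weights, centers=[min(weights), max(weights)])
print(labels, centers)
```

The small cluster far from the majority is then a candidate group of anomalies, under the assumption that normal instances are the more frequent ones.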
Other Challenges in Anomaly Detection
Machine learning methods can be classified in many different ways. Quite frequently, we differentiate between
supervised and unsupervised learning. In supervised learning, the learning program needs labeled examples
given by a “teacher”, whereas in unsupervised learning, the program learns patterns directly from the data,
without any human intervention or guidance. The typical approach adopted by a supervised method is to build a
predictive model for the normal and anomaly classes and then compare any unseen data instance against the model to
identify which class it belongs to. An unsupervised method, by contrast, works based on certain assumptions. It
assumes that (i) normal instances are far more frequent than anomalous instances and (ii) anomalous instances
are statistically different from normal instances. If these assumptions do not hold, such methods
suffer from high false alarm rates.
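The false alarm rate mentioned above can be made concrete as the fraction of truly normal instances that a detector flags as anomalous. The following is a small sketch; the function name, the label encoding (1 = anomaly, 0 = normal), and the sample labels are assumptions made for illustration.

```python
def false_alarm_rate(y_true, y_pred):
    """Fraction of truly normal instances flagged as anomalies.

    Labels (assumed encoding): 1 = anomaly, 0 = normal.
    """
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    normals = false_pos + true_neg
    if normals == 0:
        raise ValueError("no normal instances to evaluate")
    return false_pos / normals

# Hypothetical ground truth and detector output for ten instances.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
print(false_alarm_rate(y_true, y_pred))
```

Here two of the eight normal instances are flagged, giving a false alarm rate of 0.25.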

For supervised learning, an important issue is to obtain accurate and representative labels, especially for the
anomaly classes.
Various Types of Data

The attributes used to describe real-life objects can be of different types. The following are the commonly used
types of attribute variables.
