Understanding Data Mining
• Decision making
• Marketing
• Fraud detection
Data Mining is popular with many businesses because it lets them learn more about their customers, prevent fraud and identity theft, and make smarter marketing decisions.
Definitions of Data Mining
Data Mining is the analysis step of Knowledge Discovery in Databases (KDD).
It is the core of the KDD process, involving the application of algorithms that explore the data, develop the model and discover previously unknown patterns.
Algorithm: a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer.
Definitions of Data Mining, continued
Data Mining is the process of discovering
new, hidden or unexpected patterns and
inferring associations in raw data.
Data Mining is a collection of powerful
techniques intended to analyse large
amounts of data.
There is no single Data Mining approach.
Data Mining can employ a range of
methods, either individually or in
combination with each other.
Importance of Data Mining
• Data are being generated in enormous quantities
• Data are being collected over long periods of time
• Data are being kept for long periods of time
• Computing power is formidable and cheap
• A variety of Data Mining software is available
Goal of Data Mining
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use and action.
Predictive Analytics
Unsupervised learning
• Association Analysis
• Sequential Pattern Analysis
• Clustering
• Text Mining/Social Media Sentiment Analysis
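Association analysis, the first technique in the list above, is commonly scored with two measures: support (how often a set of items occurs together) and confidence (how often a rule "a -> b" holds when a is present). The sketch below assumes a small hypothetical set of market-basket transactions; the data and the helper names `support` and `confidence` are illustrative only, not part of the original material.

```python
from itertools import combinations

# Hypothetical market-basket transactions: each set holds the items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Score the rule "a -> b" for every pair of items that ever appear together.
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    s = support({a, b}, transactions)
    if s > 0:
        c = confidence({a}, {b}, transactions)
        print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

With this toy data, bread and milk appear together in 2 of 4 baskets (support 0.50), and 2 of the 3 baskets containing bread also contain milk (confidence 0.67). No labels are involved, which is why the technique sits under unsupervised learning.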
Supervised Learning
In supervised learning, the training data include the known (correct) outputs, which are used to train the machine to produce the desired outputs.
In supervised learning, we are given a data set and already know what the correct output should look like.
In supervised learning, there is a relationship between the input and the output, which the machine learns from the training examples.
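To make the input-output relationship concrete, here is a minimal sketch of one simple supervised method, a 1-nearest-neighbour classifier, in Python. The training examples, the choice of features (weight in grams and diameter in cm) and the `predict` function are hypothetical illustrations; the point is only that every training example comes with its known correct output.

```python
import math

# Hypothetical labeled training data: (input features, known correct output).
# Features are (weight in grams, diameter in cm).
training_data = [
    ((150.0, 7.5), "apple"),
    ((160.0, 8.0), "apple"),
    ((5.0, 2.0), "cherry"),
    ((6.0, 2.2), "cherry"),
]


def predict(features, training_data):
    """1-nearest-neighbour: return the label of the closest training example."""
    _, label = min(training_data, key=lambda example: math.dist(features, example[0]))
    return label


print(predict((155.0, 7.8), training_data))  # apple
print(predict((4.5, 1.9), training_data))    # cherry
```

Nearest-neighbour is chosen here only because it is short; any supervised method follows the same pattern of learning from inputs paired with known outputs.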
Unsupervised Learning
In unsupervised learning, no labeled outputs are provided; instead, the data are grouped (clustered) into classes based on their similarity.
Unsupervised learning allows us to approach problems with little or no idea what our results should look like.
We can derive structure from data
where we don't necessarily know the
effect of the variables.
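Clustering is one way to derive structure from unlabeled data. The following is a minimal sketch of k-means, a widely used clustering algorithm, in plain Python. The points, the value k=2 and the function names are hypothetical; the farthest-point initialisation is a simple deterministic choice made for this sketch (library implementations typically use the randomized k-means++ instead).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, iterations=10):
    """Group unlabeled points into k clusters with Lloyd's k-means algorithm."""
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centroids)))

    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return clusters


# Hypothetical unlabeled 2-D measurements: no outputs are given.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
print(k_means(points, k=2))
```

The algorithm is never told what the groups mean; it separates the three points near (1, 1) from the three near (8, 8) purely by distance.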
Illustration: Supervised vs Unsupervised
Situation:
• Basket full of fresh fruits (apple, banana, cherry, grape, orange).
• The task is to group fruits of the same type together.
Supervised learning:
• From previous work, you already know the shape of each fruit, so it is easy to group fruits of the same type together.
• In data mining, this previous work is called the training data.
• You have already learned from your training data (i.e., the features of each fruit).
Illustration: Supervised vs Unsupervised, continued
Unsupervised Learning:
• You have no prior knowledge about the fruits. So how will you group fruits of the same type?
• To group them, select any physical characteristic of the fruit.
• If you choose color, the fruits are grouped by color:
• Red group: apples and cherries.
• Green group: bananas and grapes.
• Then select another physical characteristic, e.g., size:
• Red and big group: apples.
• Red and small group: cherries.
• Green and big group: bananas.
• Green and small group: grapes.
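The unsupervised illustration above can be sketched in Python by grouping items on the chosen characteristics alone. The fruit records below are hypothetical and copy the colors and sizes used in the illustration; the fruit names are attached only so the result is readable, and the grouping key never looks at them. Note that this hand-picks the attributes to group on rather than learning them, so it shows the idea of grouping without labels, not a full clustering algorithm.

```python
from collections import defaultdict

# Hypothetical fruit records: (name, color, size). Only color and size are used for grouping.
fruits = [
    ("apple", "red", "big"),
    ("cherry", "red", "small"),
    ("banana", "green", "big"),
    ("grape", "green", "small"),
    ("apple", "red", "big"),
    ("grape", "green", "small"),
]


def group_by(items, key):
    """Collect item names into groups that share the same key value."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item[0])
    return dict(groups)


# First pass: group on one physical characteristic (color).
by_color = group_by(fruits, key=lambda f: f[1])
# Second pass: refine using a second characteristic (color and size).
by_color_and_size = group_by(fruits, key=lambda f: (f[1], f[2]))

print(by_color)
print(by_color_and_size)
```

Adding the second characteristic splits each color group in two, mirroring how the illustration separates apples from cherries and bananas from grapes.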
Categories of Supervised Learning
Supervised learning problems are categorized into "classification" and "regression" problems.
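The two categories differ in the type of output being predicted: classification predicts a discrete category, while regression predicts a continuous number. A minimal sketch in Python, assuming hypothetical toy data with a single input (fruit diameter in cm); the single-threshold classifier and the least-squares line are simple illustrative choices, not methods prescribed by this material.

```python
# Hypothetical toy data with one input feature: fruit diameter in cm.
diameters = [2.0, 2.2, 7.5, 8.0]
fruit_types = ["cherry", "cherry", "apple", "apple"]  # discrete labels -> classification
weights_g = [5.0, 6.0, 150.0, 160.0]                  # continuous values -> regression

# Classification: learn one diameter threshold separating the two classes
# (midway between the largest cherry and the smallest apple in the training data).
threshold = (max(d for d, t in zip(diameters, fruit_types) if t == "cherry")
             + min(d for d, t in zip(diameters, fruit_types) if t == "apple")) / 2


def classify(diameter):
    """Predict a category."""
    return "cherry" if diameter < threshold else "apple"


# Regression: fit weight = slope * diameter + intercept by ordinary least squares.
n = len(diameters)
mean_x = sum(diameters) / n
mean_y = sum(weights_g) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(diameters, weights_g))
         / sum((x - mean_x) ** 2 for x in diameters))
intercept = mean_y - slope * mean_x


def predict_weight(diameter):
    """Predict a continuous number."""
    return slope * diameter + intercept


print(classify(3.0), classify(7.0))
print(round(predict_weight(7.0), 1))
```

Both models learn from the same inputs; what makes one a classification problem and the other a regression problem is only whether the known outputs are categories or numbers.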