Machine Learning
Machine learning is the systematic study of algorithms and systems that improve their knowledge or
performance on a task (i.e., learn a model for accomplishing it) with experience (from available data/examples).
Examples:
Given a URL, decide whether it is a sports website or not.
Given that a buyer is purchasing a book at an online store, suggest some related products for that
buyer.
Given an ultrasound image of the abdomen of a pregnant woman, predict the weight of the
baby.
Unlike humans, who learn from past experiences, a computer does not have “experiences” of its own.
A computer system instead learns from data, which represent “past experiences” of an
application domain.
Objective of machine learning: learn a target function that can be used to predict the
values of a discrete class attribute, e.g., approved or not approved, and high risk or low
risk.
The task is commonly called supervised learning, classification, or inductive learning.
Supervised Learning
The computer is presented with example inputs and their desired outputs, given by a "teacher",
and the goal is to learn a general rule that maps inputs to outputs.
Supervised learning is a machine learning technique for learning a function from training
data.
The training data consist of pairs of input objects (typically vectors), and desired outputs.
The output of the function can be a continuous value (called regression) or a
class label of the input object (called classification).
The task of the supervised learner is to predict the value of the function for any valid
input object after having seen a number of training examples (i.e. pairs of input and target
output).
To achieve this, the learner has to generalize from the presented data to unseen situations
in a "reasonable" way.
When the output is a class label, supervised learning is also called classification.
Classifier performance depends greatly on the characteristics of the data to be classified.
There is no single classifier that works best on all given problems.
Determining a suitable classifier for a given problem is, however, still more an art than a
science.
The most widely used classifiers are the neural network (multi-layer perceptron),
support vector machines, k-nearest neighbors, Gaussian mixture model,
Gaussian naive Bayes, decision tree, and RBF classifiers.
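Of the classifiers listed above, k-nearest neighbors is the simplest to write down. A minimal sketch (the toy data and the function name knn_predict are made up for this illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; squared Euclidean
    distance is used for ranking (the square root is not needed for sorting).
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters in 2-D.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (9, 9)))  # B
```

Note that k-NN has no training phase in the usual sense: it simply memorizes the examples and generalizes at query time, which is the "reasonable generalization" requirement stated above in its most literal form.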
Information gain is defined as the difference between the original information requirement
(i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after
partitioning on A). That is,
Gain(A) = Info(D) - Info_A(D)
where Info(D) = - sum_{i=1..m} p_i log2(p_i) is the expected information needed to classify a
tuple in D, and Info_A(D) = sum_{j=1..v} (|D_j|/|D|) x Info(D_j) is the information still needed
after partitioning D on A into the subsets D_1, ..., D_v.
In other words, Gain(A) tells us how much would be gained by branching on A. It is the
expected reduction in the information requirement caused by knowing the value of A. The
attribute A with the highest information gain, Gain(A), is chosen as the splitting attribute at
node N.
Hence, the gain in information from such a partitioning would be
Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246 bits.
Similarly, we can compute Gain(income) = 0.029 bits, Gain(student) = 0.151 bits, and
Gain(credit rating) = 0.048 bits. Because age has the highest information gain among the
attributes, it is selected as the splitting attribute. Node N is labeled with age, and branches are
grown for each of the attribute’s values. The tuples are then partitioned accordingly, as shown
in Figure 6.5. Notice that the tuples falling into the partition for age = middle aged all belong
to the same class. Because they all belong to class “yes,” a leaf should therefore be created at
the end of this branch and labeled with “yes.” The final decision tree returned by the
algorithm is shown in Figure 6.5.
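The entropy and information-gain computations used above can be sketched in a few lines (a toy illustration with made-up data, not the training table of Figure 6.5; the function names entropy and info_gain are chosen for this example):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected number of bits needed to classify a tuple in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(A) = Info(D) - Info_A(D), where A is the attribute at index `attr`."""
    n = len(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    info_a = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - info_a

# Made-up toy data: attribute 0 separates the classes perfectly,
# attribute 1 carries no class information at all.
rows = [("youth", "high"), ("youth", "low"), ("senior", "low"), ("senior", "high")]
labels = ["no", "no", "yes", "yes"]
print(info_gain(rows, labels, 0))  # 1.0
print(info_gain(rows, labels, 1))  # 0.0
```

A decision-tree builder such as ID3 simply evaluates info_gain for every candidate attribute and splits on the one with the highest value, exactly as done for age above.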
Gain Ratio:
C4.5, a successor of ID3, uses an extension of information gain known as the gain ratio, which
attempts to overcome the bias of information gain toward attributes with many values.
It applies a kind of normalization to information gain using a “split information” value:
SplitInfo_A(D) = - sum_{j=1..v} (|D_j|/|D|) x log2(|D_j|/|D|)
The gain ratio is then defined as
GainRatio(A) = Gain(A) / SplitInfo_A(D)
and the attribute with the maximum gain ratio is selected as the splitting attribute.
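The split information and gain ratio can be checked numerically (a toy sketch with made-up values and function names, assuming the standard C4.5 definitions SplitInfo_A(D) = - sum_j (|D_j|/|D|) log2(|D_j|/|D|) and GainRatio(A) = Gain(A) / SplitInfo_A(D)):

```python
from collections import Counter
from math import log2

def split_info(values):
    """SplitInfo_A(D): entropy of the partition sizes produced by attribute A."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    return gain / split_info(values)

# Four tuples split evenly into two branches: SplitInfo = 1 bit.
values = ["youth", "youth", "senior", "senior"]
print(split_info(values))       # 1.0
print(gain_ratio(0.5, values))  # 0.5
```

The normalization matters most when an attribute has many distinct values (e.g. a product ID): its SplitInfo is large, so its gain ratio is penalized even if its raw information gain is high.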
Gini Index:
The Gini index is used in CART. Using the earlier notation, the Gini index measures the impurity of D, a
data partition or set of training tuples, as
Gini(D) = 1 - sum_{i=1..m} p_i^2
where p_i is the probability that a tuple in D belongs to class C_i. For a binary split of D on
attribute A into partitions D_1 and D_2,
Gini_A(D) = (|D_1|/|D|) x Gini(D_1) + (|D_2|/|D|) x Gini(D_2)
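These two quantities can be sketched directly from their definitions, Gini(D) = 1 - sum_i p_i^2 and Gini_A(D) = |D_1|/|D| Gini(D_1) + |D_2|/|D| Gini(D_2) (a toy illustration; the data and function names are made up):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2, with p_i the fraction of class C_i in D."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    """Gini_A(D) for a binary split of D into the partitions d1 and d2."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

labels = ["yes", "yes", "no", "no"]
print(gini(labels))                              # 0.5 (maximally impure for 2 classes)
print(gini_split(["yes", "yes"], ["no", "no"]))  # 0.0 (both partitions are pure)
```

CART evaluates gini_split for every candidate binary split and chooses the one with the lowest value, i.e. the largest reduction in impurity.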
PREDICTION
Numeric prediction is the task of predicting continuous (or ordered) values for given input.
For example, we may wish to predict the salary of college graduates with 10 years of work
experience, or the potential sales of a new product given its price. By far, the most widely
used approach for numeric prediction is regression.
Regression analysis can be used to model the relationship between one or more independent
or predictor variables and a dependent or response variable.
The response variable is what we want to predict.
Regression analysis is a good choice when all of the predictor variables are continuous
valued as well.
1) Linear Regression
Straight-line linear regression involves a response variable, y, and a single predictor
variable, x. It models y as a linear function of x:
y = w0 + w1 x
where the regression coefficients w0 and w1 are solved for by the method of least squares.
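A minimal least-squares fit for the straight-line model y = w0 + w1 x can be sketched as follows (the data and the function name fit_line are made up for this illustration; the closed-form formulas for w0 and w1 are the standard one-predictor least-squares solution):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w0 + w1*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w1 = covariance(x, y) / variance(x); w0 shifts the line through the means.
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Toy data generated from y = 1 + 2x, so the fit is exact.
w0, w1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(w0, w1)  # 1.0 2.0
```

On noisy data the same formulas return the line minimizing the sum of squared residuals rather than an exact fit.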
2) Nonlinear Regression
Polynomial regression is often of interest when there is just one predictor variable. It can
be modeled by adding polynomial terms to the basic linear model. By applying transformations
to the variables, we can convert the nonlinear model into a linear one that can then be solved by
the method of least squares.
The nonlinear (polynomial) regression model is as follows:
y = w0 + w1 x + w2 x^2 + w3 x^3
To convert this equation to linear form, we define new variables:
x1 = x, x2 = x^2, x3 = x^3
which yields the linear model y = w0 + w1 x1 + w2 x2 + w3 x3, solvable by least squares.
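The transformation above can be sketched end to end: expand each x into the new variables (1, x, x^2, x^3) and solve the resulting linear least-squares problem via the normal equations (a toy illustration with made-up data and function names; the Gaussian-elimination solver is a generic textbook routine, not something from the source):

```python
def poly_features(x, degree=3):
    """The variable transformation: x -> (1, x1=x, x2=x^2, x3=x^3)."""
    return [x ** d for d in range(degree + 1)]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poly(xs, ys, degree=3):
    """Least squares on the linearized model: solve (X^T X) w = X^T y."""
    X = [poly_features(x, degree) for x in xs]
    k = degree + 1
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(X[i][a] * ys[i] for i in range(len(xs))) for a in range(k)]
    return solve(XtX, Xty)

# Toy data generated from y = 1 + x^3, so the recovered w is (1, 0, 0, 1).
w = fit_poly([-2, -1, 0, 1, 2], [-7, 0, 1, 2, 9])
```

The key point is that fit_poly never fits anything nonlinear: after the change of variables the problem is an ordinary linear least-squares problem in (x1, x2, x3).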
Unsupervised learning
In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own
to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end.
Example
Suppose you have a basket filled with several different types of fruit, and your task is to
arrange them into groups.
This time you know nothing about the fruits; suppose this is the first time you have seen
them, so you have no prior clue about them.
So, how will you arrange them? What will you do first?
You will take a fruit and arrange the fruits by considering some physical characteristic of
each fruit.
Suppose you consider color first.
Then you will group them using color as the base condition.
Then the groups will be something like this.
RED COLOR GROUP: apples & cherry fruits.
GREEN COLOR GROUP: bananas & grapes.
So now you will take another physical character such as size.
RED COLOR AND BIG SIZE: apple.
RED COLOR AND SMALL SIZE: cherry fruits.
GREEN COLOR AND BIG SIZE: bananas.
GREEN COLOR AND SMALL SIZE: grapes.
Job done: the fruits are grouped.
Here you did not learn anything beforehand; there was no training data and no response variable.
This type of learning is known as unsupervised learning.
Clustering comes under unsupervised learning.
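The grouping procedure above, done automatically on numeric features, is clustering. A minimal sketch of the classic k-means algorithm on 1-D points (a toy illustration; the data and the function name kmeans_1d are made up):

```python
def kmeans_1d(points, k=2, iters=20):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster; repeat."""
    pts = sorted(points)
    # Initialize centroids with k roughly evenly spaced sample points.
    centroids = [pts[i * (len(pts) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[j].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groups, discovered without any labels.
centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(sorted(centroids))  # [2.0, 11.0]
```

As in the fruit example, no labels are used: the structure (two groups) is discovered purely from the similarity of the inputs to each other.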
Clustering