Lecture 1 - Intro
Evaluation Schedule
Evaluation Criteria
• CWS (15%), PRS (25%)
  Assignments
  Tutorials
  Quizzes/Random Questions
• MTE (20%)
  One piece of innovative work in the form of a small project, startup idea, collaborative project, automation, simulation, case study, or solution to real-time social, economic, and technical problems, etc. (group of maximum 2 students): graphical abstract
• ETE (40%)
  (15×2 = 30%) 3 class tests, one after every 4 weeks; the best 2 will be considered for evaluation.
  (10×1 = 10%) Minor tests in the form of quizzes, short-answer questions, MCQs, open-ended/essay questions, etc.; the better of the two will be considered for evaluation.
Course Content

Unit 1: Introduction to Machine Learning (8 contact hours)
Overview of different tasks: classification, regression, clustering, control; concept learning; information theory and decision trees; data representation; diversity of data; data table; forms of learning; basic linear algebra in machine learning techniques.

Unit 2: Supervised Learning (12 contact hours)
Decision trees, nearest neighbours, linear classifiers and kernels, neural networks, linear regression, logistic regression, Support Vector Machines.

Unit 3: Unsupervised Learning (10 contact hours)
Clustering, Expectation Maximization, K-Means clustering, dimensionality reduction, feature selection, PCA, factor analysis, manifold learning.

Unit 4: Reinforcement Learning (8 contact hours)
Elements of reinforcement learning, basics of dynamic programming; finding optimal policies; value iteration; policy iteration; TD learning; Q-learning; actor-critic.

Unit 5: Recent Applications & Research Topics (4 contact hours)
Applications in the fields of web and data mining, text recognition, speech recognition, finance.

Total Contact Hours: 42
Books

Text Books
1. Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2004
2. Machine Learning, Tom Mitchell, McGraw Hill, 1997
3. Elements of Machine Learning, Pat Langley, Morgan Kaufmann Publishers
4. Applied Machine Learning, M. Gopal, McGraw Hill, 2018

Reference
1. The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer Series in Statistics, Springer, 2001
Resources: Journals

1. IEEE Transactions on Pattern Analysis and Machine Intelligence
2. IEEE Transactions on Neural Networks and Learning Systems
3. Pattern Recognition
4. International Journal of Computer Vision
5. IEEE Transactions on Fuzzy Systems

Ranking: http://www.guide2research.com/topconf/machine-learning
A Few Quotes

• “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)
What is Machine Learning?

• A branch of artificial intelligence, concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.
• A computer is said to learn from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with experience E.

E: Experience   T: Task   P: Performance measure

Example:
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
Example 1: Class of ML Analysis
Example 3: Clustering news

• Methodology: consider “typical words” for each category; classify using a “distance” measure.
What is Machine Learning?

Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
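The contrast can be made concrete with a minimal sketch (the temperature-conversion task and all names here are hypothetical): in the ML paradigm, data and desired outputs go in, and the “program” (here, a slope and an intercept) comes out.

```python
# Traditional programming: the "program" (conversion rule) is written by hand.
def fahrenheit_by_hand(celsius):
    return celsius * 9.0 / 5.0 + 32.0

# Machine learning: the computer derives the program from data plus outputs.
data = [0.0, 10.0, 20.0, 30.0]        # inputs (Celsius)
outputs = [32.0, 50.0, 68.0, 86.0]    # desired outputs (Fahrenheit)

# Least-squares fit of a line: the learned "program" is (slope, intercept).
n = len(data)
mean_x = sum(data) / n
mean_y = sum(outputs) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(data, outputs)) \
        / sum((x - mean_x) ** 2 for x in data)
intercept = mean_y - slope * mean_x

print(round(slope, 6), round(intercept, 6))  # 1.8 32.0 -- same rule, learned
```

The learned coefficients reproduce the hand-written rule, but they were obtained from examples alone.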
Resources: Datasets
• UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
• Kaggle: https://www.kaggle.com/notebook
Why Machine Learning?

• Consider the example of spam filtering with the traditional approach:
  First, we look at what spam typically looks like; e.g., terms such as “4U,” “credit card,” “free,” and “amazing.”
  Then we write a detection rule for each pattern, and an email is flagged if a pattern is detected.
  We test our program and repeat steps 1 and 2 until it is good enough.
Since the problem is not trivial, your program will likely become a long list of complex rules that is pretty hard to maintain.
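The hand-written approach can be sketched as follows (the trigger words are the ones from the slide; the function name and test strings are illustrative):

```python
# Traditional approach: a hand-maintained list of rules.
SPAM_PATTERNS = ["4u", "credit card", "free", "amazing"]

def is_spam_by_rules(email_text):
    """Flag an email if any hand-written pattern is detected."""
    text = email_text.lower()
    return any(pattern in text for pattern in SPAM_PATTERNS)

print(is_spam_by_rules("Amazing offer, FREE credit card 4U!"))  # True
print(is_spam_by_rules("Meeting moved to 3pm tomorrow"))        # False
```

Every new spam trick means editing SPAM_PATTERNS by hand, which is exactly the maintenance burden described above.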
Why Machine Learning?...

• ML techniques automatically learn which words and phrases are good predictors of spam, by detecting unusually frequent word patterns in the spam examples compared to the ham examples.
• The program is much shorter, easier to maintain, and most likely more accurate.
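A minimal sketch of the learned alternative (toy emails and a simple word-frequency score; this is illustrative, not any specific production filter):

```python
from collections import Counter

# Toy training data: the learner, not the programmer, decides which
# words predict spam, by comparing word frequencies in spam vs. ham.
spam = ["free credit card offer", "amazing free prize 4u"]
ham = ["meeting at noon", "credit the invoice to the project account"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.lower().split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)

def spam_score(email_text):
    """Per-word evidence: +1 if the word is more frequent in spam, -1 if in ham."""
    score = 0
    for w in email_text.lower().split():
        score += (spam_counts[w] > ham_counts[w]) - (ham_counts[w] > spam_counts[w])
    return score

print(spam_score("free prize offer"))   # positive -> looks like spam
print(spam_score("project meeting"))    # negative -> looks like ham
```

Adding more training examples updates the counts automatically; no rule list is edited by hand.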
Why Machine Learning?...
• ML algorithms can be inspected to see what has been learned. For
instance, once the spam filter has been trained on enough spam, it can
easily be inspected to reveal the list of words and combinations of words
that it believes are the best predictors of spam.
• Sometimes this will reveal unsuspected correlations or new trends, and
thereby lead to a better understanding of the problem.
• Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
Why Machine Learning?...
• No human experts
industrial/manufacturing control
mass spectrometer analysis, drug design, astronomic discovery
• Black-box human expertise
face/handwriting/speech recognition
driving a car, flying a plane
• Rapidly changing phenomena
credit scoring, financial modeling
diagnosis, fraud detection
• Need for customization/personalization
personalized news reader
movie/book recommendation
Benefits of ML over Rule-Based Approaches
• Problems for which existing solutions require a lot of
hand-tuning or long lists of rules: one ML algorithm can
often simplify code and perform better.
• Complex problems for which there is no good solution at
all using a traditional approach: the best ML techniques
can find a solution.
• Fluctuating environments: an ML system can adapt to new data.
• Getting insights about complex problems and large
amounts of data.
Applications

Machine learning draws on many neighbouring fields: statistics, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, and neuroscience.

Supervised learning: annotated data is used to build a predictive model, which is then applied to new data.
Machine Learning System
Machine Learning in a Nutshell

• Tens of thousands of machine learning algorithms.
• Hundreds of new ones every year.
• Every machine learning algorithm has three components:
  Representation
  Evaluation
  Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
Evaluation

• Confusion matrix
• Accuracy
• Error rate
• Recall/Sensitivity/True positive rate
• Specificity
• ROC
• Squared error
• Likelihood
• Posterior probability
• Cost/Utility
• Margin
• F-score
• etc.
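Several of these metrics derive directly from the confusion matrix; a minimal sketch with hypothetical counts:

```python
# Hypothetical confusion matrix for a binary classifier (100 examples).
tp, fn = 40, 10   # actual positives: predicted positive / predicted negative
fp, tn = 5, 45    # actual negatives: predicted positive / predicted negative

accuracy    = (tp + tn) / (tp + tn + fp + fn)
error_rate  = 1 - accuracy
recall      = tp / (tp + fn)              # sensitivity / true positive rate
specificity = tn / (tn + fp)              # true negative rate
precision   = tp / (tp + fp)
f_score     = 2 * precision * recall / (precision + recall)

print(accuracy, recall, specificity, round(f_score, 3))  # 0.85 0.8 0.9 0.842
```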
Optimization

• Combinatorial optimization: finding an optimal object from a finite set of objects. E.g., greedy search.
• Convex optimization: finding the minimum of a convex function. E.g., gradient descent.
• Constrained optimization: optimizing an objective function with respect to some variables in the presence of constraints on those variables. E.g., linear programming.
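The convex case can be illustrated with a minimal gradient-descent sketch on a hypothetical function f(x) = (x - 3)²:

```python
# Convex optimization by gradient descent on the hypothetical f(x) = (x - 3)**2.
def grad(x):
    return 2.0 * (x - 3.0)   # derivative f'(x)

x = 0.0                      # arbitrary starting point
learning_rate = 0.1
for _ in range(200):         # repeatedly step downhill along -f'(x)
    x -= learning_rate * grad(x)

print(round(x, 4))  # 3.0 -- the minimizer of f
```

Each step moves against the gradient; for a convex function this converges to the global minimum for a suitably small learning rate.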
Examples of Machine Learning Problems
• Pattern Recognition
Facial identities or facial expressions
Handwritten or spoken words (e.g., Siri)
Medical images
Sensor Data/IoT
• Optimization
Many parameters have “hidden” relationships that can be the basis of
optimization
• Pattern Generation
Generating images or motion sequences
• Anomaly Detection
Unusual patterns in the telemetry from physical and/or virtual plants (e.g., data
centers)
Unusual sequences of credit card transactions
Unusual patterns of sensor data from a nuclear power plant
• or unusual sound in your car engine or …
• Prediction
Future stock prices or currency exchange rates
Web-based Examples of ML

• Web data is huge, and tasks that have to be performed on very big datasets often use ML, especially if the data is noisy or non-stationary.
• Spam filtering, fraud detection: the enemy adapts, so we must adapt too.
• Recommendation systems: lots of noisy data. Million dollar prize!
• Information retrieval: find documents or images with similar content.
• Data visualization: display a huge database in a revealing way.
Domain of ML
Types of Learning

• Supervised (inductive) learning: training data includes desired outputs.
• Unsupervised learning: training data does not include desired outputs.
• Semi-supervised learning: training data includes a few desired outputs.
• Reinforcement learning: rewards from a sequence of actions.
Inductive Learning

• Learner discovers rules by observing examples.
• Given examples of a function (X, F(X)), predict the value F(X) for new examples X.
  Discrete F(X): Classification
  Continuous F(X): Regression
  F(X) = Probability(X): Probability estimation
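A minimal instance-based sketch of learning from (X, F(X)) examples (toy one-dimensional data with discrete labels, so this is classification; 1-nearest-neighbour is just one convenient choice of learner):

```python
# Examples of a function given as (x, F(x)) pairs; labels are discrete.
examples = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

def predict(x_new):
    """1-nearest-neighbour: copy the label of the closest training example."""
    nearest_x, nearest_label = min(examples, key=lambda ex: abs(ex[0] - x_new))
    return nearest_label

print(predict(1.5))  # "small"
print(predict(7.0))  # "large"
```

Replacing the discrete labels with real values (and the copied label with, e.g., an average of neighbours) would turn the same idea into regression.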
Learning Algorithms
Semi-supervised learning
Machine learning structure
• Supervised learning

Supervised Learning

E.g. Supervised Learning

Document Classifier: E.g. Supervised Learning

Spectrum of Supervision

Unsupervised Learning

E.g. Unsupervised Learning

Reinforcement Learning

E.g. Reinforcement Learning

Why is Machine Learning Hard?
What We’ll Cover
• Fundamentals of Linear Algebra and Probability
• Supervised learning
Linear Regression
Logistic Regression
Decision tree induction
Instance-based learning
Bayesian learning
Neural networks
Support vector machines
Model ensembles
• Unsupervised learning
Clustering
Dimensionality reduction
• Reinforcement Learning
Data Representation

• Information systems: represent knowledge extracted from raw data, which is used for decision making.
• Data warehousing: provides integrated, consistent, and cleaned data to machine learning algorithms.
• Data table: used to represent the information.
DATA TABLE

• Each row represents a measurement/observation, and each column gives the value of an attribute of the information system for all measurements/observations.
• Rows are variously called instances, examples, samples, measurements, observations, records, patterns, objects, cases, or events.
• Columns are called attributes or features.
E.G. DATA TABLE

• Consider patient information in the data table below.
• Features/attributes: Headache, Muscle Pain, Temperature. These attributes are represented in linguistic form.

Patient  Headache  Muscle Pain  Temperature  Flu
1        NO        YES          HIGH         YES
2        YES       YES          HIGH         YES
3        YES       YES          VERY HIGH    YES
4        NO        YES          NORMAL       NO
5        YES       NO           HIGH         NO
6        NO        YES          VERY HIGH    YES
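The table can be represented directly in code (a minimal sketch; the rows are the slide's data, and the key names are transcribed from the column headers):

```python
# The decision table from the slide, one dict per patient (row).
patients = [
    {"Headache": "NO",  "MusclePain": "YES", "Temperature": "HIGH",      "Flu": "YES"},
    {"Headache": "YES", "MusclePain": "YES", "Temperature": "HIGH",      "Flu": "YES"},
    {"Headache": "YES", "MusclePain": "YES", "Temperature": "VERY HIGH", "Flu": "YES"},
    {"Headache": "NO",  "MusclePain": "YES", "Temperature": "NORMAL",    "Flu": "NO"},
    {"Headache": "YES", "MusclePain": "NO",  "Temperature": "HIGH",      "Flu": "NO"},
    {"Headache": "NO",  "MusclePain": "YES", "Temperature": "VERY HIGH", "Flu": "YES"},
]

# Condition attributes vs. the decision attribute.
condition_attributes = ["Headache", "MusclePain", "Temperature"]
decision_attribute = "Flu"

# In this data, every patient with muscle pain and a (very) high temperature has flu.
flu_rows = [p for p in patients
            if p["MusclePain"] == "YES" and p["Temperature"] != "NORMAL"]
print(all(p["Flu"] == "YES" for p in flu_rows))  # True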
E.G. DATA TABLE

• For directed/supervised learning, the outcome of each observation is known a priori.
• Decision attribute: one distinguished attribute that represents the outcome; an information system of this kind is called a decision system.
• E.g., 'Flu' is the decision attribute, taking values {Flu: Yes}, {Flu: No}.
• Flu is a decision attribute with respect to the condition attributes: Headache, Muscle Pain, Temperature.
E.G. DATA TABLE

• A data file represents the inputs as N instances: s(1), s(2), s(3), ..., s(N).
• Each individual instance s(i), i = 1, 2, ..., N, that provides the input to the machine learning tools is characterized by its values for a predefined set of n features/attributes x1, x2, x3, ..., xn, i.e., xj, j = 1, 2, 3, ..., n.
E.G. DATA TABLE

The data table lays out instances against features: the rows are the instances s(1), s(2), s(3), ..., s(N); the columns are the features x1, x2, x3, ..., xn plus the decision y; cell (i, j) holds the value xj(i) of feature xj for instance s(i).
DATA REPRESENTATION

• An instance can be represented by its n attributes/features xj, j = 1, 2, 3, ..., n.
• These n numerical features can be visualized as a point in the n-dimensional state space ℜ^n.
• x = [x1 x2 x3 x4 ... xn]^T ∈ ℜ^n. The set X is the finite set of feature vectors x(i) over all possible instances.
• X can also be visualized as the region of the state space ℜ^n to which the instances belong, i.e., X ⊂ ℜ^n.
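A minimal sketch of instances as points in ℜ^n (the numeric feature values here are hypothetical):

```python
# Each instance s(i) is represented by a feature vector x = [x1 ... xn]^T,
# i.e., a point in the n-dimensional state space R^n.
X = [
    (36.6, 1.0, 0.0),   # x(1): temperature, headache flag, muscle-pain flag
    (39.2, 1.0, 1.0),   # x(2)
    (40.1, 0.0, 1.0),   # x(3)
]

N = len(X)        # number of instances
n = len(X[0])     # dimension of the state space

# A simple state-space operation: Euclidean distance between two instances.
def distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

print(N, n)  # 3 3
print(distance(X[0], X[1]))
```

Viewing instances as vectors is what lets geometric notions such as distance, and hence nearest-neighbour methods, apply to a data table.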
DATA REPRESENTATION

• Here, x(i) is the representation of s(i), and X is the representation space.
• The pair (S, X) constitutes the information system, where S is a non-empty set of instances and X is a non-empty set of features.
• The index i ranges over instances and j over features:
  s(i), i = 1, 2, 3, ..., N, with s(i) ∈ S
  x(i), i = 1, 2, 3, ..., N, with x(i) ∈ X (set of feature vectors)
  xj(i), j = 1, 2, 3, ..., n, are the components of x(i)
• The features xj, j = 1, 2, ..., n, may be viewed as state variables, and the feature vector x as a state vector in the n-dimensional space.
DATA REPRESENTATION

• For every feature xj, the set of values it can take, Vxj ⊂ ℜ, is called the domain of xj, j = 1, 2, ..., n.
• xj(i) ∈ Vxj for i = 1, 2, ..., N.
• The tuple (S, X, Y) may then be constituted, and this is called a decision system.
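The domain Vxj of each feature can be computed directly from the earlier patient table (a minimal sketch over the same data; variable names are illustrative):

```python
# The patient table from the earlier slides (condition attributes + decision).
rows = [
    ("NO",  "YES", "HIGH",      "YES"),
    ("YES", "YES", "HIGH",      "YES"),
    ("YES", "YES", "VERY HIGH", "YES"),
    ("NO",  "YES", "NORMAL",    "NO"),
    ("YES", "NO",  "HIGH",      "NO"),
    ("NO",  "YES", "VERY HIGH", "YES"),
]
features = ["Headache", "MusclePain", "Temperature", "Flu"]

# Domain V_xj of feature x_j: the set of values it takes over all N instances.
domains = {name: {row[j] for row in rows} for j, name in enumerate(features)}

print(sorted(domains["Temperature"]))  # ['HIGH', 'NORMAL', 'VERY HIGH']
print(sorted(domains["Flu"]))          # ['NO', 'YES']
```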
Thank You