Lecture 1 - Intro
Evaluation Schedule
Evaluation Criteria
• CWS (15%), PRS (25%)
  Assignments
  Tutorials
  Quizzes/Random Questions
• MTE (20%)
  One piece of innovative work in the form of a small project, startup idea, collaborative project, automation, simulation, case study, or solution to real-time social, economic, and technical problems, etc. (group of maximum 2 students): graphical abstract
• ETE (40%)
  (15×2 = 30%) 3 class tests, one after every 4 weeks; the best 2 will be considered for evaluation.
  (10×1 = 10%) Minor tests in the form of quizzes, short-answer questions, MCQs, open-ended/essay questions, etc.; the better of the two will be considered for evaluation.
Course Content

Unit 1: Introduction to Machine Learning (8 contact hours)
Overview of different tasks: classification, regression, clustering, control; concept learning; information theory and decision trees; data representation; diversity of data; data table; forms of learning; basic linear algebra in machine learning techniques.

Unit 2: Supervised Learning (12 contact hours)
Decision trees, nearest neighbours, linear classifiers and kernels, neural networks, linear regression, logistic regression, Support Vector Machines.

Unit 3: Unsupervised Learning (10 contact hours)
Clustering, Expectation Maximization, K-Means clustering, dimensionality reduction, feature selection, PCA, factor analysis, manifold learning.

Unit 4: Reinforcement Learning (8 contact hours)
Elements of reinforcement learning, basics of dynamic programming; finding optimal policies; value iteration; policy iteration; TD learning; Q-learning; actor-critic.

Unit 5: Recent Applications & Research Topics (4 contact hours)
Applications in the fields of web and data mining, text recognition, speech recognition, finance.

Total Contact Hours: 42
Books

Text Books
1. Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2004
2. Machine Learning, Tom Mitchell, McGraw Hill, 1997
3. Elements of Machine Learning, Pat Langley, Morgan Kaufmann Publishers
4. Applied Machine Learning, M. Gopal, McGraw Hill, 2018

Reference
1. The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer Series in Statistics, Springer, 2001
Resources: Journals

1. IEEE Transactions on Pattern Analysis and Machine Intelligence
2. IEEE Transactions on Neural Networks and Learning Systems
3. Pattern Recognition
4. International Journal of Computer Vision
5. IEEE Transactions on Fuzzy Systems

Ranking: http://www.guide2research.com/topconf/machine-learning
A Few Quotes

• “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)
• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)
• “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)
• “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)
What is Machine Learning?

• A branch of artificial intelligence, concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.
• A computer is said to learn from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with experience E.

E: Experience   T: Task   P: Performance measure

Example:
T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself
Example 1: Class of ML Analysis
Example 3: Clustering news

• Methodology: consider “typical words” for each category; classify using a “distance” measure.
What is Machine Learning?

Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
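The contrast can be made concrete with a minimal sketch (the temperature-conversion task and all names here are hypothetical): in the ML paradigm, data and desired outputs go in, and the “program” (here, a slope and an intercept) comes out.

```python
# Traditional programming: the "program" (conversion rule) is written by hand.
def fahrenheit_by_hand(celsius):
    return celsius * 9.0 / 5.0 + 32.0

# Machine learning: the computer derives the program from data plus outputs.
data = [0.0, 10.0, 20.0, 30.0]        # inputs (Celsius)
outputs = [32.0, 50.0, 68.0, 86.0]    # desired outputs (Fahrenheit)

# Least-squares fit of a line: the learned "program" is (slope, intercept).
n = len(data)
mean_x = sum(data) / n
mean_y = sum(outputs) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(data, outputs)) \
        / sum((x - mean_x) ** 2 for x in data)
intercept = mean_y - slope * mean_x

print(round(slope, 6), round(intercept, 6))  # 1.8 32.0 -- same rule, learned
```

The learned coefficients reproduce the hand-written rule, but they were obtained from examples alone.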
Resources: Datasets
• UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
• Kaggle: https://www.kaggle.com/notebook
Why Machine Learning?

• Consider the example of spam filtering with the traditional approach:
  First, we look at what spam typically looks like; e.g., terms such as “4U,” “credit card,” “free,” and “amazing.”
  Then we write a detection rule for each pattern, and an email is flagged if a pattern is detected.
  We test our program and repeat steps 1 and 2 until it is good enough.
Since the problem is not trivial, your program will likely become a long list of complex rules that is pretty hard to maintain.
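The hand-written approach can be sketched as follows (the trigger words are the ones from the slide; the function name and test strings are illustrative):

```python
# Traditional approach: a hand-maintained list of rules.
SPAM_PATTERNS = ["4u", "credit card", "free", "amazing"]

def is_spam_by_rules(email_text):
    """Flag an email if any hand-written pattern is detected."""
    text = email_text.lower()
    return any(pattern in text for pattern in SPAM_PATTERNS)

print(is_spam_by_rules("Amazing offer, FREE credit card 4U!"))  # True
print(is_spam_by_rules("Meeting moved to 3pm tomorrow"))        # False
```

Every new spam trick means editing SPAM_PATTERNS by hand, which is exactly the maintenance burden described above.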
Why Machine Learning?...

• ML techniques automatically learn which words and phrases are good predictors of spam, by detecting unusually frequent word patterns in the spam examples compared to the ham examples.
• The program is much shorter, easier to maintain, and most likely more accurate.
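A minimal sketch of the learned alternative (toy emails and a simple word-frequency score; this is illustrative, not any specific production filter):

```python
from collections import Counter

# Toy training data: the learner, not the programmer, decides which
# words predict spam, by comparing word frequencies in spam vs. ham.
spam = ["free credit card offer", "amazing free prize 4u"]
ham = ["meeting at noon", "credit the invoice to the project account"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.lower().split())

spam_counts, ham_counts = word_counts(spam), word_counts(ham)

def spam_score(email_text):
    """Per-word evidence: +1 if the word is more frequent in spam, -1 if in ham."""
    score = 0
    for w in email_text.lower().split():
        score += (spam_counts[w] > ham_counts[w]) - (ham_counts[w] > spam_counts[w])
    return score

print(spam_score("free prize offer"))   # positive -> looks like spam
print(spam_score("project meeting"))    # negative -> looks like ham
```

Adding more training examples updates the counts automatically; no rule list is edited by hand.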
Why Machine Learning?...
• ML algorithms can be inspected to see what has been learned. For
instance, once the spam filter has been trained on enough spam, it can
easily be inspected to reveal the list of words and combinations of words
that it believes are the best predictors of spam.
• Sometimes this will reveal unsuspected correlations or new trends, and
thereby lead to a better understanding of the problem.
• Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.
Why Machine Learning?...
• No human experts
industrial/manufacturing control
mass spectrometer analysis, drug design, astronomic discovery
• Black-box human expertise
face/handwriting/speech recognition
driving a car, flying a plane
• Rapidly changing phenomena
credit scoring, financial modeling
diagnosis, fraud detection
• Need for customization/personalization
personalized news reader
movie/book recommendation
Benefits of ML over Rule-Based Approaches
• Problems for which existing solutions require a lot of
hand-tuning or long lists of rules: one ML algorithm can
often simplify code and perform better.
• Complex problems for which there is no good solution at
all using a traditional approach: the best ML techniques
can find a solution.
• Fluctuating environments: an ML system can adapt to new data.
• Getting insights about complex problems and large
amounts of data.
Applications

Machine learning draws on many neighbouring fields: statistics, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, and neuroscience.

Supervised learning: annotated data is used to build a predictive model, which is then applied to new data.
Machine Learning System
Machine Learning in a Nutshell

• Tens of thousands of machine learning algorithms.
• Hundreds of new ones every year.
• Every machine learning algorithm has three components:
  Representation
  Evaluation
  Optimization
Representation
• Decision trees
• Sets of rules / Logic programs
• Instances
• Graphical models (Bayes/Markov nets)
• Neural networks
• Support vector machines
• Model ensembles
• Etc.
Evaluation

• Confusion matrix
• Accuracy
• Error rate
• Recall/Sensitivity/True positive rate
• Specificity
• ROC
• Squared error
• Likelihood
• Posterior probability
• Cost/Utility
• Margin
• F-score
• etc.
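Several of these metrics derive directly from the confusion matrix; a minimal sketch with hypothetical counts:

```python
# Hypothetical confusion matrix for a binary classifier (100 examples).
tp, fn = 40, 10   # actual positives: predicted positive / predicted negative
fp, tn = 5, 45    # actual negatives: predicted positive / predicted negative

accuracy    = (tp + tn) / (tp + tn + fp + fn)
error_rate  = 1 - accuracy
recall      = tp / (tp + fn)              # sensitivity / true positive rate
specificity = tn / (tn + fp)              # true negative rate
precision   = tp / (tp + fp)
f_score     = 2 * precision * recall / (precision + recall)

print(accuracy, recall, specificity, round(f_score, 3))  # 0.85 0.8 0.9 0.842
```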
Optimization

• Combinatorial optimization: finding an optimal object from a finite set of objects. E.g., greedy search.
• Convex optimization: finding the minimum of a convex function. E.g., gradient descent.
• Constrained optimization: optimizing an objective function with respect to some variables in the presence of constraints on those variables. E.g., linear programming.
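The convex case can be illustrated with a minimal gradient-descent sketch on a hypothetical function f(x) = (x - 3)²:

```python
# Convex optimization by gradient descent on the hypothetical f(x) = (x - 3)**2.
def grad(x):
    return 2.0 * (x - 3.0)   # derivative f'(x)

x = 0.0                      # arbitrary starting point
learning_rate = 0.1
for _ in range(200):         # repeatedly step downhill along -f'(x)
    x -= learning_rate * grad(x)

print(round(x, 4))  # 3.0 -- the minimizer of f
```

Each step moves against the gradient; for a convex function this converges to the global minimum for a suitably small learning rate.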
Examples of Machine Learning Problems
• Pattern Recognition
Facial identities or facial expressions
Handwritten or spoken words (e.g., Siri)
Medical images
Sensor Data/IoT
• Optimization
Many parameters have “hidden” relationships that can be the basis of
optimization
• Pattern Generation
Generating images or motion sequences
• Anomaly Detection
Unusual patterns in the telemetry from physical and/or virtual plants (e.g., data
centers)
Unusual sequences of credit card transactions
Unusual patterns of sensor data from a nuclear power plant
• or unusual sound in your car engine or …
• Prediction
Future stock prices or currency exchange rates
Web-based Examples of ML

• Web data is huge, and tasks that have to be performed on very big datasets often use ML, especially if the data is noisy or non-stationary.
• Spam filtering, fraud detection: the enemy adapts, so we must adapt too.
• Recommendation systems: lots of noisy data. Million dollar prize!
• Information retrieval: find documents or images with similar content.
• Data visualization: display a huge database in a revealing way.
Domain of ML
Types of Learning

• Supervised (inductive) learning: training data includes desired outputs.
• Unsupervised learning: training data does not include desired outputs.
• Semi-supervised learning: training data includes a few desired outputs.
• Reinforcement learning: rewards from a sequence of actions.
Inductive Learning

• Learner discovers rules by observing examples.
• Given examples of a function (X, F(X)), predict the value F(X) for new examples X.
  Discrete F(X): Classification
  Continuous F(X): Regression
  F(X) = Probability(X): Probability estimation
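A minimal instance-based sketch of learning from (X, F(X)) examples (toy one-dimensional data with discrete labels, so this is classification; 1-nearest-neighbour is just one convenient choice of learner):

```python
# Examples of a function given as (x, F(x)) pairs; labels are discrete.
examples = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

def predict(x_new):
    """1-nearest-neighbour: copy the label of the closest training example."""
    nearest_x, nearest_label = min(examples, key=lambda ex: abs(ex[0] - x_new))
    return nearest_label

print(predict(1.5))  # "small"
print(predict(7.0))  # "large"
```

Replacing the discrete labels with real values (and the copied label with, e.g., an average of neighbours) would turn the same idea into regression.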
Learning Algorithms
Semi-supervised learning
Machine learning structure
• Supervised learning

Supervised Learning

E.g. Supervised Learning

Document Classifier: E.g. Supervised Learning

Spectrum of Supervision

Unsupervised Learning

E.g. Unsupervised Learning

Reinforcement Learning

E.g. Reinforcement Learning

Why is Machine Learning Hard?
What We’ll Cover
• Fundamentals of Linear Algebra and Probability
• Supervised learning
Linear Regression
Logistic Regression
Decision tree induction
Instance-based learning
Bayesian learning
Neural networks
Support vector machines
Model ensembles
• Unsupervised learning
Clustering
Dimensionality reduction
• Reinforcement Learning
Data Representation

• Information systems: represent knowledge extracted from raw data, which is used for decision making.
• Data warehousing: provides integrated, consistent, and cleaned data to machine learning algorithms.
• Data table: used to represent the information.
DATA TABLE

• Each row represents a measurement/observation, and each column gives the value of an attribute of the information system for all measurements/observations.
• Rows are variously called instances, examples, samples, measurements, observations, records, patterns, objects, cases, or events.
• Columns are called attributes or features.
E.G. DATA TABLE

• Consider patient information in the data table below.
• Features/attributes: Headache, Muscle Pain, Temperature. These attributes are represented in linguistic form.

Patient  Headache  Muscle Pain  Temperature  Flu
1        NO        YES          HIGH         YES
2        YES       YES          HIGH         YES
3        YES       YES          VERY HIGH    YES
4        NO        YES          NORMAL       NO
5        YES       NO           HIGH         NO
6        NO        YES          VERY HIGH    YES
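The table can be represented directly in code (a minimal sketch; the rows are the slide's data, and the key names are transcribed from the column headers):

```python
# The decision table from the slide, one dict per patient (row).
patients = [
    {"Headache": "NO",  "MusclePain": "YES", "Temperature": "HIGH",      "Flu": "YES"},
    {"Headache": "YES", "MusclePain": "YES", "Temperature": "HIGH",      "Flu": "YES"},
    {"Headache": "YES", "MusclePain": "YES", "Temperature": "VERY HIGH", "Flu": "YES"},
    {"Headache": "NO",  "MusclePain": "YES", "Temperature": "NORMAL",    "Flu": "NO"},
    {"Headache": "YES", "MusclePain": "NO",  "Temperature": "HIGH",      "Flu": "NO"},
    {"Headache": "NO",  "MusclePain": "YES", "Temperature": "VERY HIGH", "Flu": "YES"},
]

# Condition attributes vs. the decision attribute.
condition_attributes = ["Headache", "MusclePain", "Temperature"]
decision_attribute = "Flu"

# In this data, every patient with muscle pain and a (very) high temperature has flu.
flu_rows = [p for p in patients
            if p["MusclePain"] == "YES" and p["Temperature"] != "NORMAL"]
print(all(p["Flu"] == "YES" for p in flu_rows))  # True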
E.G. DATA TABLE

• For directed/supervised learning, the outcome of each observation is known a priori.
• Decision attribute: one distinguished attribute that represents the outcome; an information system of this kind is called a decision system.
• E.g., 'Flu' is the decision attribute, taking values {Flu: Yes}, {Flu: No}.
• Flu is a decision attribute with respect to the condition attributes: Headache, Muscle Pain, Temperature.
E.G. DATA TABLE

• A data file represents the inputs as N instances: s(1), s(2), s(3), ..., s(N).
• Each individual instance s(i), i = 1, 2, ..., N, that provides the input to the machine learning tools is characterized by its values for a predefined set of n features/attributes x1, x2, x3, ..., xn, i.e., xj, j = 1, 2, 3, ..., n.
E.G. DATA TABLE

The data table lays out instances against features: the rows are the instances s(1), s(2), s(3), ..., s(N); the columns are the features x1, x2, x3, ..., xn plus the decision y; cell (i, j) holds the value xj(i) of feature xj for instance s(i).
DATA REPRESENTATION

• An instance can be represented by its n attributes/features xj, j = 1, 2, 3, ..., n.
• These n numerical features can be visualized as a point in the n-dimensional state space ℜ^n.
• x = [x1 x2 x3 x4 ... xn]^T ∈ ℜ^n. The set X is the finite set of feature vectors x(i) over all possible instances.
• X can also be visualized as the region of the state space ℜ^n to which the instances belong, i.e., X ⊂ ℜ^n.
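A minimal sketch of instances as points in ℜ^n (the numeric feature values here are hypothetical):

```python
# Each instance s(i) is represented by a feature vector x = [x1 ... xn]^T,
# i.e., a point in the n-dimensional state space R^n.
X = [
    (36.6, 1.0, 0.0),   # x(1): temperature, headache flag, muscle-pain flag
    (39.2, 1.0, 1.0),   # x(2)
    (40.1, 0.0, 1.0),   # x(3)
]

N = len(X)        # number of instances
n = len(X[0])     # dimension of the state space

# A simple state-space operation: Euclidean distance between two instances.
def distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

print(N, n)  # 3 3
print(distance(X[0], X[1]))
```

Viewing instances as vectors is what lets geometric notions such as distance, and hence nearest-neighbour methods, apply to a data table.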
DATA REPRESENTATION

• Here, x(i) is the representation of s(i), and X is the representation space.
• The pair (S, X) constitutes the information system, where S is a non-empty set of instances and X is a non-empty set of features.
• The index i ranges over instances and j over features:
  s(i), i = 1, 2, 3, ..., N, with s(i) ∈ S
  x(i), i = 1, 2, 3, ..., N, with x(i) ∈ X (set of feature vectors)
  xj(i), j = 1, 2, 3, ..., n, are the components of x(i)
• The features xj, j = 1, 2, ..., n, may be viewed as state variables, and the feature vector x as a state vector in the n-dimensional space.
DATA REPRESENTATION

• For every feature xj, the set of values it can take, Vxj ⊂ ℜ, is called the domain of xj, j = 1, 2, ..., n.
• xj(i) ∈ Vxj for i = 1, 2, ..., N.
• The tuple (S, X, Y) may then be constituted, and this is called a decision system.
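The domain Vxj of each feature can be computed directly from the earlier patient table (a minimal sketch over the same data; variable names are illustrative):

```python
# The patient table from the earlier slides (condition attributes + decision).
rows = [
    ("NO",  "YES", "HIGH",      "YES"),
    ("YES", "YES", "HIGH",      "YES"),
    ("YES", "YES", "VERY HIGH", "YES"),
    ("NO",  "YES", "NORMAL",    "NO"),
    ("YES", "NO",  "HIGH",      "NO"),
    ("NO",  "YES", "VERY HIGH", "YES"),
]
features = ["Headache", "MusclePain", "Temperature", "Flu"]

# Domain V_xj of feature x_j: the set of values it takes over all N instances.
domains = {name: {row[j] for row in rows} for j, name in enumerate(features)}

print(sorted(domains["Temperature"]))  # ['HIGH', 'NORMAL', 'VERY HIGH']
print(sorted(domains["Flu"]))          # ['NO', 'YES']
```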
Thank You