Lecture 01 - Introduction to AML (Jan 2024)

AI4003

Applied Machine Learning

Instructor: Dr. M. Tariq


Today’s Class

• A little about me
• Intro to Applied Machine Learning
• Course outline and logistics


Muhammad Tariq | Professor & Head
Electrical Engineering | FAST-NUCES | Islamabad Campus
Postdoc, Princeton University | PhD, Waseda University | MS, Hanyang University
Fulbright Scholar (USA) | MEXT Scholar (Japan) | HEC Scholar (S. Korea)
Enlisted in Stanford's Top 2% Scientists

Course outline

Grading Policy

Grading method: absolute grading

Assessment     Weightage %   Due When
Assignment     6%            At end of difficult topics
Quiz           6%            After each assignment
Project        8%            After Sessional-II
Sessional I    15%           Sessional-I Week
Sessional II   15%           Sessional-II Week
Final Exam     50%           Final Exam Week
Homework details

• Implement and apply machine learning methods in Python notebooks
• Submit a report PDF and the Jupyter notebook
Learning resources
• Syllabus
• Recordings
• Assignments
• Schedule
• Lecture slides and readings

Lectures
• In-person

Office hours
• Friday 11:30 am to 1:00 pm
Readings/textbook: Forsyth, Applied Machine Learning

The lectures are not directly based on any textbook, but will point you to relevant readings from David Forsyth’s Applied Machine Learning, which is our primary text, or to other online resources. The AML book is quite good and worth reading, even for the parts not covered in lectures.
Academic Integrity

These are OK
• Discuss homework with classmates (don’t show each other code)
• Use Stack Overflow to learn how to use a Python module
• Get ideas from online sources (make sure to attribute the source)

Not OK
• Copying or looking at homework-specific code (i.e., claiming credit for part of an assignment based on code that you didn’t write)
• Using external resources (code, ideas, data) without acknowledging them

Remember
• Ask if you’re not sure whether something is OK
• You are safe as long as you acknowledge all of your sources of inspiration, code, etc. in your write-up
Other comments

Prerequisites
• Probability, linear algebra, calculus, signals and systems
• Experience with Python will help but is not necessary, with the understanding that it may take more time to complete assignments
• Watch the tutorials (see schedule: intro reading) for linear algebra, Python/NumPy, and Jupyter notebooks
How is this course different from…

• This course provides a foundation for ML practice, while most ML courses provide a foundation for ML research.
• This course has less theory, derivations, and optimization, and more on application representations and examples.
Should you take this course?

Take this course if …
• You want to learn how to apply machine learning
• You like coding-based homework and are OK with math too
• You are willing to spend 10-12 hours per week (maybe even more) on lectures, reading, review, and assignments

Do not take this course if …
• You want more of a theoretical background
• You want to focus on one application domain (take vision, NLP, or a special topics course instead)
• You want an “easy A” (it’s not going to be easy)
Feedback is welcome

• I will occasionally solicit feedback in class, directly or indirectly; please respond.
• You can always talk to me after class or send me an email.
• My goal is to be a force multiplier on how much you can learn with a given amount of effort.
What to do next

• Read the syllabus and schedule
• Unless you consider yourself highly proficient in Python/NumPy and linear algebra, watch/do the tutorials linked on the web page
General Purpose Learners

Kamath et al. 2022


What is machine learning?

• Create predictive models or useful insights from raw data
  – Alexa speech recognition
  – Amazon product recommendations
  – Tesla autopilot
  – GPT-3 text generation
  – Image generation
  – Data visualization

[Diagram: Data → Algorithm → predictive models / insights]

ML spins raw data into gold!
What is Machine Learning?

• “Learning is any process by which a system improves performance from experience.”
  – Herbert Simon

• Definition by Tom Mitchell (1998): Machine Learning is the study of algorithms that
  – improve their performance P
  – at some task T
  – with experience E.
• A well-defined learning task is given by <P, T, E>.
  – For example: T = recognizing handwritten digits, P = classification accuracy, E = a dataset of labeled digit images.
The whole machine learning problem
(Example: voice recognition in Alexa)

1. Data preparation
   a. Collect and curate data
   b. Annotate the data (for supervised problems)
   c. Split your data into train, validation, and test sets (see the sketch after this list)

2. Algorithm and model development (our focus, but it’s important to understand all of it)
   a. Design methods to extract features from the data
   b. Design a machine learning model and identify key parameters and loss
   c. Train, select parameters, and evaluate your designs using the validation set

3. Final evaluation using the test set

4. Integrate into your application
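
Below is a minimal sketch of step 1c, splitting a dataset into train/validation/test sets with scikit-learn; the 60/20/20 ratio and the random toy data are illustrative assumptions, not course requirements.

    # Minimal sketch: train/validation/test split with scikit-learn.
    # The 60/20/20 ratio and the random toy data are illustrative assumptions.
    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(100, 5)             # 100 examples, 5 features (toy data)
    y = np.random.randint(0, 2, size=100)  # toy binary labels

    # First hold out 20% as the test set ...
    X_trainval, X_test, y_trainval, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # ... then split the remainder 75/25, giving 60/20/20 overall.
    X_train, X_val, y_train, y_val = train_test_split(
        X_trainval, y_trainval, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))  # 60 20 20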


Course objectives

• Learn how to solve problems with ML
• Key concepts and methodologies for learning from data
• Algorithms and their strengths and limitations
• Domain-specific representations
• Ability to select the right tools for the job

“The global machine learning market is expected to grow from $21.17 billion in 2022 to $209.91 billion by 2029, at a CAGR of 38.8%. With the field growing at such an exponential rate, the number of jobs is growing too, and machine learning is one of the most trending career paths of today.” - Emeritus
Traditional Programming
  Data + Program → Computer → Output

Machine Learning
  Data + Output → Computer → Program

Slide credit: Pedro Domingos
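
To make the contrast concrete, here is a small illustrative sketch; the spam-filter setting, the rule, and the toy data are all hypothetical. In traditional programming we write the rule by hand; in machine learning we supply (data, output) pairs and the computer produces the rule.

    # Traditional programming: we hand-write the program (the rule).
    def is_spam_rule(text: str) -> bool:
        return "free money" in text.lower()

    # Machine learning: we provide data + outputs and learn the program.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["free money now", "meeting at noon", "win free money", "lunch tomorrow"]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (toy labels)

    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)                       # learns the "program" from data
    print(model.predict(["free money tonight"]))   # likely [1]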
When Do We Use Machine Learning?

ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)

Learning isn’t always useful:
• There is no need to “learn” to calculate payroll

Based on slide by E. Alpaydin
A classic example of a task that requires machine learning: it is very hard to say what makes a 2.

Slide credit: Geoffrey Hinton
Some more examples of tasks that are best solved by using a learning algorithm

• Recognizing patterns: facial identities or facial expressions; handwritten or spoken words; medical images
• Generating patterns: generating images or motion sequences
• Recognizing anomalies: unusual credit card transactions; unusual patterns of sensor readings in a nuclear power plant
• Prediction: future stock prices or currency exchange rates

Slide credit: Geoffrey Hinton
Sample Applications

• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]

Slide credit: Pedro Domingos
State of the Art Applications of Machine Learning
Autonomous Cars

• Nevada made it legal for autonomous cars to drive on roads in June 2011
• As of 2013, four states (Nevada, Florida, California, and Michigan) have legalized autonomous cars

Penn’s Autonomous Car (Ben Franklin Racing Team)
Autonomous Car Sensors

Autonomous Car Technology

[Figure panels: Path Planning; Laser Terrain Mapping; Learning from Human Drivers; Adaptive Vision; Sebastian; Stanley]

Images and movies taken from Sebastian Thrun’s multimedia website.
Deep Learning in the Headlines

Deep Belief Net on Face Images

Learned feature hierarchy, from bottom to top:
• pixels
• edges
• object parts (combinations of edges)
• object models

Based on materials by Andrew Ng
Learning of Object Parts

Slide credit: Andrew Ng
Training on Multiple Objects

• Trained on 4 classes (cars, faces, motorbikes, airplanes)
• Second layer: shared features and object-specific features
• Third layer: more specific features

Slide credit: Andrew Ng
Scene Labeling via Deep Learning

[Farabet et al., ICML 2012; PAMI 2013]

Inference from Deep Learned Models

Generating posterior samples from faces by “filling in” experiments (cf. Lee and Mumford, 2003). Combine bottom-up and top-down inference.

[Figure rows: input images; samples from feedforward inference (control); samples from full posterior inference]

Slide credit: Andrew Ng
Machine Learning in Automatic Speech Recognition

A typical speech recognition system: ML is used to predict phone states from the sound spectrogram.

Deep learning has state-of-the-art results:

# Hidden Layers     1     2     4     8     10    12
Word Error Rate %   16.0  12.8  11.4  10.9  11.0  11.1

Baseline GMM performance = 15.4%

[Zeiler et al., “On rectified linear units for speech recognition”, ICASSP 2013]
Impact of Deep Learning in Speech Technology

Slide credit: Li Deng, MS Research
Fake Videos

• Cheapfake: video slowed down to make Nancy Pelosi’s speech appear slurred and drunk
• Deepfake: a puppet-mastered deepfake to transfer the source’s head movement and facial expressions onto Putin’s face

Image credit: Washington Post

Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from a sequence of actions

Based on slide by Pedro Domingos
Supervised Learning: Regression

• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
  – y is real-valued == regression

[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970-2020]

Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)
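
As a minimal sketch of “learn a function f(x) to predict y”, here is a line fit to toy (x, y) pairs with scikit-learn; the numbers are made up for illustration and are not the sea-ice data.

    # Minimal regression sketch: learn f(x) ≈ w*x + b from toy (x, y) pairs.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # inputs, shape (n, 1)
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])           # real-valued targets (toy)

    f = LinearRegression().fit(x, y)  # learn f from the (x, y) pairs
    print(f.predict([[6.0]]))         # predict y for an unseen x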
Supervised Learning: Classification

• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
  – y is categorical == classification

Example: Breast Cancer (Malignant / Benign)
• y: 1 (Malignant) or 0 (Benign)
• x: Tumor Size
• The learned classifier divides the Tumor Size axis into a “Predict Benign” region and a “Predict Malignant” region.

Based on example by Andrew Ng
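
A minimal sketch of this classifier, fitting logistic regression to made-up one-dimensional tumor-size data; every number below is an illustrative assumption, not real medical data.

    # Minimal classification sketch on toy 1-D "tumor size" data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    size = np.array([[1.0], [1.5], [2.0], [3.0], [3.5], [4.0]])  # cm (toy)
    label = np.array([0, 0, 0, 1, 1, 1])  # 0 = benign, 1 = malignant (toy)

    clf = LogisticRegression().fit(size, label)
    print(clf.predict([[1.8], [3.2]]))   # e.g. [0 1]
    print(clf.predict_proba([[2.5]]))    # class probabilities near the boundary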
Supervised Learning

• x can be multi-dimensional
  – Each dimension corresponds to an attribute, e.g. Age, Tumor Size, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape

Based on example by Andrew Ng
Unsupervised Learning

• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
  – E.g., clustering
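
A minimal clustering sketch: k-means on toy 2-D points, with k = 2 as an illustrative choice; note that no labels are given, only the x’s.

    # Minimal clustering sketch: k-means on toy 2-D points (no labels).
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],   # one blob
                  [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])  # another blob

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(km.labels_)           # cluster assignments discovered from X alone
    print(km.cluster_centers_)  # the two discovered centers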
Unsupervised Learning

Genomics application: group individuals by genetic similarity

[Figure: Genes × Individuals heatmap]

[Source: Daphne Koller]
Unsupervised Learning

• Organize computing clusters
• Social network analysis
• Market segmentation
• Astronomical data analysis

Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Slide credit: Andrew Ng
Unsupervised Learning

• Independent component analysis (ICA) – separate a combined signal into its original sources

Image credit: statsoft.com. Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
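
A minimal ICA sketch with scikit-learn’s FastICA: mix two toy source signals, then recover them from the mixtures. The signal shapes and the mixing matrix are illustrative assumptions.

    # Minimal ICA sketch: unmix two toy signals with FastICA.
    import numpy as np
    from sklearn.decomposition import FastICA

    t = np.linspace(0, 8, 2000)
    s1 = np.sin(2 * t)             # toy source 1: sinusoid
    s2 = np.sign(np.sin(3 * t))    # toy source 2: square wave
    S = np.c_[s1, s2]

    A = np.array([[1.0, 0.5], [0.5, 1.0]])  # illustrative mixing matrix
    X = S @ A.T                              # observed mixed signals

    ica = FastICA(n_components=2, random_state=0)
    S_est = ica.fit_transform(X)  # recovered sources (up to scale/permutation)
    print(S_est.shape)            # (2000, 2)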
Reinforcement Learning

• Given a sequence of states and actions with (delayed) rewards, output a policy
  – A policy is a mapping from states → actions that tells you what to do in a given state
• Examples:
  – Credit assignment problem
  – Game playing
  – Robot in a maze
  – Balance a pole on your hand
The Agent-Environment Interface

Agent and environment interact at discrete time steps t = 0, 1, 2, ...
• The agent observes state at step t: s_t ∈ S
• produces action at step t: a_t ∈ A(s_t)
• gets resulting reward: r_{t+1} ∈ ℝ
• and resulting next state: s_{t+1}

Trajectory: s_t, a_t → r_{t+1}, s_{t+1}, a_{t+1} → r_{t+2}, s_{t+2}, a_{t+2} → r_{t+3}, s_{t+3}, a_{t+3} → ...

Slide credit: Sutton & Barto
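
A minimal sketch of this loop in code, written against the Gymnasium API with a random policy; the CartPole environment and the episode length are illustrative choices, not part of the slide.

    # Minimal sketch of the agent-environment loop (Gymnasium-style API).
    # CartPole-v1 and the random policy are illustrative choices.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    s_t, _ = env.reset(seed=0)            # observe initial state s_0

    for t in range(100):
        a_t = env.action_space.sample()   # policy: here, a random action
        s_next, r_next, terminated, truncated, _ = env.step(a_t)
        # (s_t, a_t) -> (r_{t+1}, s_{t+1}), exactly the loop on the slide
        s_t = s_next
        if terminated or truncated:
            s_t, _ = env.reset()
    env.close()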
Reinforcement Learning

https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning

• Learn a policy from user demonstrations

Stanford Autonomous Helicopter
http://heli.stanford.edu/
https://www.youtube.com/watch?v=VCdxqn0fcnE
Framing a Learning Problem

Designing a Learning System

• Choose the training experience
• Choose exactly what is to be learned
  – i.e., the target function
• Choose how to represent the target function
• Choose a learning algorithm to infer the target function from the experience

[Diagram: Environment/Experience → Training data → Learner → Knowledge → Performance Element, with testing data fed to the Performance Element]

Based on slide by Ray Mooney
Training vs. Test Distribution

• We generally assume that the training and test examples are independently drawn from the same overall distribution of data
  – We call this “i.i.d.”, which stands for “independent and identically distributed”
• If examples are not independent, collective classification is required
• If the test distribution is different, transfer learning is required

Slide credit: Ray Mooney
ML in a Nutshell

• Tens of thousands of machine learning algorithms
  – Hundreds new every year

• Every ML algorithm has three components:
  – Representation – what the model looks like; how knowledge is represented
  – Optimization – the process for finding good models; how programs are generated
  – Evaluation – how good models are differentiated; how programs are evaluated
Various Function Representations
• Numerical functions
– Linear regression
– Neural networks
– Support vector machines
• Symbolic functions
– Decision trees
– Rules in propositional logic
– Rules in first-order predicate logic
• Instance-based functions
– Nearest-neighbor
– Case-based
• Probabilistic Graphical Models
– Naïve Bayes
– Bayesian networks
– Hidden-Markov Models (HMMs)
– Probabilistic Context Free Grammars (PCFGs)
– Markov networks

Slide credit: Ray Mooney
Various Search/Optimization Algorithms
• Gradient descent
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution

Slide credit: Ray Mooney
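
To ground the first family above, here is a minimal gradient-descent sketch for least-squares linear regression written from scratch; the step size, iteration count, and toy data are illustrative assumptions.

    # Minimal gradient descent sketch: least-squares fit of y ≈ w*x + b.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([3.0, 5.0, 7.0, 9.0])   # toy data generated by y = 2x + 1

    w, b, lr = 0.0, 0.0, 0.01            # initial parameters and step size
    for _ in range(5000):
        err = (w * x + b) - y            # residuals
        w -= lr * 2 * np.mean(err * x)   # gradient of mean squared error w.r.t. w
        b -= lr * 2 * np.mean(err)       # gradient w.r.t. b
    print(w, b)                          # approaches w ≈ 2, b ≈ 1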
Evaluation
• Accuracy
• Precision and recall
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.

Slide credit: Pedro Domingos
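
A minimal sketch of computing a few of these measures with scikit-learn; the labels and predictions are made-up toy values.

    # Minimal evaluation sketch: accuracy, precision, recall, squared error.
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, mean_squared_error)

    y_true = [1, 0, 1, 1, 0, 1]   # toy ground-truth labels
    y_pred = [1, 0, 0, 1, 0, 1]   # toy model predictions

    print(accuracy_score(y_true, y_pred))      # fraction correct
    print(precision_score(y_true, y_pred))     # of predicted 1s, how many are right
    print(recall_score(y_true, y_pred))        # of true 1s, how many were found
    print(mean_squared_error(y_true, y_pred))  # squared-error view of the same toys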
ML in Practice

• Understand the domain, prior knowledge, and goals
• Data integration, selection, cleaning, pre-processing, etc.
• Learn models
• Interpret results
• Consolidate and deploy discovered knowledge
(the middle steps are typically repeated in a loop)

Based on a slide by Pedro Domingos
Lessons Learned about Learning

• Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
• Function approximation can be viewed as a search through a space of hypotheses (representations of functions) for one that best fits a set of training data.
• Different learning methods assume different hypothesis spaces (representation languages) and/or employ different search techniques.

Slide credit: Ray Mooney
A Brief History of Machine Learning
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM

Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
Based on slide by Ray Mooney
What We’ll Cover in this Course

• Supervised learning
  – Decision tree induction
  – Linear regression
  – Logistic regression
  – Support vector machines & kernel methods
  – Model ensembles
  – Bayesian learning
  – Neural networks & deep learning
  – Learning theory
• Unsupervised learning
  – Clustering
  – Dimensionality reduction
• Reinforcement learning
  – Temporal difference learning
  – Q learning
• Evaluation
• Applications

Our focus will be on applying machine learning to real applications
