Lecture 01 - Introduction To AML-Jan24
• A little about me
Office hours
• Friday, 11:30 am to 1:00 pm
The lectures are not directly based on any textbook, but will point you to relevant readings from David Forsyth’s Applied Machine Learning, our primary text, or to other online resources. The AML book is quite good and worth reading, even for parts not covered in lectures.
Academic Integrity
These are OK
• Discuss homeworks with classmates (don’t show each other code)
• Use Stack Overflow to learn how to use a Python module
• Get ideas from online (make sure to attribute the source)
Not OK
• Copying or looking at homework-specific code (i.e., claiming credit for part of an assignment based on code you didn’t write)
• Using external resources (code, ideas, data) without acknowledging them
Remember
• Ask if you’re not sure whether something is OK
• You are safe as long as you acknowledge all of your sources of inspiration, code, etc. in your write-up
Other comments
Prerequisites
• Probability, linear algebra, calculus, signals and systems
• Experience with Python will help but is not necessary, understanding that it may take more time to complete assignments
• Watch the tutorials (see schedule: intro reading) for linear algebra, Python/NumPy, and Jupyter notebooks
How is this course different from…
This course provides a foundation for ML practice, while most ML courses provide a foundation for ML research. It has less theory, derivations, and optimization, and more on applications, representations, and examples.
Should you take this course?
• I will occasionally solicit feedback in class, directly or indirectly – please respond
• You can always talk to me after class or send me email
• My goal is to be a force multiplier on how much you can learn with a given amount of effort
What to do next
– Data visualization
What is Machine Learning?
[Diagram: Machine Learning, relating Data, Computer Program, and Output]
Slide credit: Pedro Domingos
When Do We Use Machine Learning?
ML is used when:
• Human expertise does not exist (navigating on Mars)
• Humans can’t explain their expertise (speech recognition)
• Models must be customized (personalized medicine)
• Models are based on huge amounts of data (genomics)
Slide credit: Geoffrey Hinton
Some more examples of tasks that are best solved using a learning algorithm
Slide credit: Geoffrey Hinton
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
Slide credit: Pedro Domingos
State of the Art Applications of Machine Learning
Autonomous Cars
Autonomous Car Technology
[Images: path planning; Sebastian; Stanley]
Deep Belief Net on Face Images
[Figure: feature hierarchy learned from face images: pixels → edges → object parts (combinations of edges) → object models]
Based on materials by Andrew Ng
Learning of Object Parts
Slide credit: Andrew Ng
Training on Multiple Objects
Slide credit: Andrew Ng
Scene Labeling via Deep Learning
[Figure panels: input images; samples from feedforward inference (control); samples from full posterior inference]
Slide credit: Andrew Ng
Machine Learning in Automatic Speech Recognition
A Typical Speech Recognition System
[Table: results for networks with 1, 2, 4, 8, 10, and 12 hidden layers]
Slide credit: Li Deng, MS Research
Fake Videos
• Cheapfake: a video slowed down to make Nancy Pelosi’s speech appear slurred and drunk
• Deepfake: a puppet-mastered deepfake that transfers a source actor’s head movements and facial expressions onto Putin’s face
Types of Learning
• Supervised (inductive) learning
– Given: training data + desired outputs (labels)
• Unsupervised learning
– Given: training data (without desired outputs)
• Semi-supervised learning
– Given: training data + a few desired outputs
• Reinforcement learning
– Rewards from sequence of actions
Based on slide by Pedro Domingos
Supervised Learning: Regression
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is real-valued == regression
[Plot: September Arctic Sea Ice Extent (1,000,000 sq km) vs. year, 1970–2020]
Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)
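To make the regression setup concrete, here is a minimal sketch in Python (assuming NumPy and scikit-learn are available; the numbers are made up for illustration, not the actual sea-ice data):

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative (x, y) pairs: x = year, y = a made-up measurement.
x = np.array([[1980], [1990], [2000], [2010]])  # shape (n, 1): one feature
y = np.array([7.8, 6.9, 6.3, 4.9])              # real-valued targets -> regression

model = LinearRegression().fit(x, y)            # learn f(x) = w*x + b
print(model.predict(np.array([[2020]])))        # predict y for a new x

Here f is restricted to a line, f(x) = w*x + b; richer function classes plug into the same given-(x, y), learn-f recipe.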
Supervised Learning: Classification
• Given (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
– y is categorical == classification
Breast Cancer (Malignant / Benign)
[Plot: label 1 (malignant) / 0 (benign) vs. tumor size]
Based on example by Andrew Ng
Supervised Learning: Classification (cont.)
[Plots: a decision threshold on tumor size separating “predict benign” from “predict malignant”; a two-feature version plotting age vs. tumor size]
• Other possible features: clump thickness, uniformity of cell size, uniformity of cell shape, …
Based on example by Andrew Ng
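A matching sketch for classification, again with made-up numbers and scikit-learn assumed: logistic regression predicting the categorical label y = 1 (malignant) / 0 (benign) from a single tumor-size feature.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training pairs: x = tumor size, y = 1 (malignant) / 0 (benign).
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])                 # categorical targets -> classification

clf = LogisticRegression().fit(x, y)
print(clf.predict(np.array([[2.5], [4.5]])))     # hard labels: benign / malignant
print(clf.predict_proba(np.array([[2.5], [4.5]])))  # class probabilities

Adding more features (clump thickness, age, ...) just means giving x more columns; the learned decision boundary then lives in that higher-dimensional feature space.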
Unsupervised Learning
• Given x1, x2, ..., xn (without labels)
• Output hidden structure behind the x’s
– E.g., clustering
Unsupervised Learning
Genomics application: group individuals by genetic similarity
[Heatmap: genes vs. individuals, grouped by genetic similarity]
[Source: Daphne Koller]
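A minimal clustering sketch, using k-means via scikit-learn (an assumption; the slides do not name a specific algorithm): given unlabeled rows, it outputs a hidden group index per row, much like grouping individuals by similarity above.

import numpy as np
from sklearn.cluster import KMeans

# Made-up unlabeled data: rows are individuals, columns are features.
X = np.array([[0.1, 0.2], [0.0, 0.1], [5.0, 5.1], [5.2, 4.9]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # hidden structure: a cluster index per row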
Unsupervised Learning
• Independent component analysis – separate a combined signal into its original sources
Image credit: statsoft.com Audio from http://www.ism.ac.jp/~shiro/research/blindsep.html
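A minimal ICA sketch, assuming scikit-learn’s FastICA; the sources and mixing matrix are made up. Two known signals are mixed together, and ICA recovers them from the mixtures alone (up to scaling and ordering):

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                       # source 1: a sine wave
s2 = np.sign(np.sin(3 * t))              # source 2: a square wave
S = np.c_[s1, s2]                        # true sources, shape (n_samples, 2)
A = np.array([[1.0, 0.5], [0.5, 1.0]])   # made-up mixing matrix
X = S @ A.T                              # observed mixed signals only

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # recovered sources (up to scale/order)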
Reinforcement Learning
• Given a sequence of states and actions with (delayed) rewards, output a policy
– Policy is a mapping from states → actions that tells you what to do in a given state
• Examples:
– Credit assignment problem
– Game playing
– Robot in a maze
– Balance a pole on your hand
The Agent-Environment Interface
[Diagram: the agent-environment loop: in state s_t the agent takes action a_t, receives reward r_{t+1}, and transitions to state s_{t+1}; the sequence continues s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, ...]
Slide credit: Sutton & Barto
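To ground the loop above, here is a minimal sketch of tabular Q-learning (one particular RL algorithm; the slides do not prescribe it) on a made-up 5-state chain world. Each step follows the interface exactly: from state s_t take action a_t, receive reward r_{t+1}, land in s_{t+1}; the learned policy maps states to actions.

import numpy as np

# Made-up environment: 5 states in a row, actions 0 = left, 1 = right.
# Reaching state 4 gives reward 1 and ends the episode (delayed reward).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, 4)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

policy = Q.argmax(axis=1)  # the learned policy: state -> action
print(policy)              # should choose "right" (action 1) in every state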
Reinforcement Learning
https://www.youtube.com/watch?v=4cgWya-wjgY
Inverse Reinforcement Learning
• Learn a policy from user demonstrations
Designing a Learning System
[Diagram: learning system components: Environment/Experience, Knowledge, Performance Element, Testing data]
Based on slide by Ray Mooney
Training vs. Test Distribution
• We generally assume that the training and test examples are independently drawn from the same overall distribution of data
– We call this “i.i.d.”, which stands for “independent and identically distributed”
Slide credit: Ray Mooney
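A small sketch of the i.i.d. assumption in practice, assuming scikit-learn: randomly splitting one dataset makes the held-out test set a draw from the same distribution as the training set.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up data; a random split keeps train and test draws from the
# same overall distribution, matching the i.i.d. assumption above.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out i.i.d. test data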
Various Search/Optimization Algorithms
• Gradient descent (see the sketch after this list)
– Perceptron
– Backpropagation
• Dynamic Programming
– HMM Learning
– PCFG Learning
• Divide and Conquer
– Decision tree induction
– Rule learning
• Evolutionary Computation
– Genetic Algorithms (GAs)
– Genetic Programming (GP)
– Neuro-evolution
Slide credit: Ray Mooney
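A minimal sketch of the first entry, gradient descent, on a least-squares objective with made-up data; only NumPy is assumed. The loss is L(w) = ||Xw - y||^2, its gradient is 2 X^T (Xw - y), and each iteration moves w a small step downhill:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])            # made-up target weights
y = X @ true_w + 0.01 * rng.normal(size=100)   # noisy linear observations

w = np.zeros(3)
lr = 0.01
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y)   # gradient of the squared-error loss
    w -= lr * grad / len(y)        # step opposite the (averaged) gradient
print(w)                           # approaches true_w

Perceptron and backpropagation in the list above are instances of the same idea: compute a gradient of an error measure and step against it.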
Evaluation
• Accuracy
• Precision and recall (see the sketch after this list)
• Squared error
• Likelihood
• Posterior probability
• Cost / Utility
• Margin
• Entropy
• K-L divergence
• etc.
Slide credit: Pedro Domingos
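A small sketch of the first few metrics on made-up labels, assuming scikit-learn:

from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up true and predicted labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # fraction of predictions that are correct
print(precision_score(y_true, y_pred))  # of predicted positives, fraction truly positive
print(recall_score(y_true, y_pred))     # of true positives, fraction that were found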
ML in Practice
Based on a slide by Pedro Domingos
Lessons Learned about Learning
• Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
Slide credit: Ray Mooney
A Brief History of Machine Learning
History of Machine Learning
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
• 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
Slide credit: Ray Mooney
History of Machine Learning (cont.)
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
Based on slide by Ray Mooney
What We’ll Cover in this Course