Lecture 1
Kernel Methods
Natural Language Processing
Image Processing
Signal Processing
Instructors
• Lectures:
Radu Ionescu (raducu.ionescu@gmail.com)
• Labs:
Alin Croitoru (alincroitoru97@gmail.com)
Eduard Poesina (eduardgabriel.poe@gmail.com)
Vlad Hondru (vlad.hondru25@gmail.com)
• Website:
https://practical-ml-fmi.github.io/ML/
Grading System
• Your final grade is composed of:
50% for Project 1
50% for Project 2
• Both projects are individual!
• Each project consists of employing machine learning methods
on a specific data set
• Project 1 is about participating in a Kaggle competition
The competition will be launched in a couple of weeks
• Project 2 is about comparing two unsupervised approaches
There are many datasets out there, so no overlap allowed
among students!
Methods and data sets must be chosen beforehand!
• Project 1 must be presented no later than week 10
• Project 2 must be presented no later than the day of the “exam”
• There will be no paper exam, only oral exam!
• The average grade of projects 1 and 2 must be >= 5
• The project consists of the code implementation in Python (any
library is allowed) and a PDF report including (2 points):
a description of the data set (for project 2 only)
a description of the implemented machine learning methods
figures and / or tables with results / hyperparameter tuning
comments / interpretation for the results
conclusion
The first project consists of implementing machine learning
method(s) for the proposed Kaggle challenge (TBA)
The grades will be proportional to your model’s accuracy:
- Top 1-20 => your grade can be up to 10
- Top 21-50 => your grade can be up to 9
- Top 51-80 => your grade can be up to 8
- Top 81-100 => your grade can be up to 7
- Top 101-120 => your grade can be up to 6
- Others => your grade can be up to 5
Ranks can change depending on the final number of
participants
Submit projects to: practical.ml.fmi@gmail.com
Submit .py files only! (.ipynb not accepted)
We will set deadlines (during every evaluation session) for:
choosing the project
submitting the project
presenting the project
• If you don’t know the dates, please ask! Don’t wait until the
presentation day!
• Extra points during lectures / labs
awarded only in the first round of evaluation
• Lectures:
awarded based on the ranking of answers on Kahoot
top 3 get up to 0.3 points per lecture, the next 3 up to 0.2 points,
and so on
• Labs:
the first student to solve an exercise gets 0.2 points
maximum 0.4 points per lab for each student
• Up to 1 bonus point during lectures (added to final grade)
• Up to 1 bonus point during labs (added to final grade)
• Maybe up to 2 bonus points for some data annotation (TBD)
(NO) Collaboration Policy
• Collaboration
Each student must write their own code for the project(s)
Borrowing code from web sources with copy & paste is
not permitted under any circumstances
• No tolerance on plagiarism
Neither ethical nor in your best interest
Code will be checked automatically and manually!
Don’t cheat. We will find out!
We are serious about this!
Examples of unacceptable plagiarism
Examples of acceptable code
What is artificial intelligence (AI)?
• The ultimate goal of artificial intelligence is to build systems
able to reach human intelligence levels
• Turing test: a computer is said to possess human-level
intelligence if a remote human interrogator, within a fixed time
frame, cannot distinguish between the computer and a human
subject based on their replies to various questions posed by the
interrogator
Perhaps we are going in the right direction?
What is machine learning (ML)?
• Traditional programming: Data + Program => Computer => Output
• Machine Learning: Data + Output => Computer => Program
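The contrast above can be made concrete with a toy sketch (the height data below is hypothetical, not from the lecture): in traditional programming a human writes the rule, while in machine learning the rule (here, a simple threshold) is derived from labeled examples.

```python
# Traditional programming: Data + Program => Output.
# A human hand-writes the rule:
def classify_by_rule(height_cm):
    return "tall" if height_cm > 180 else "short"  # threshold chosen by a human

# Machine learning: Data + Output => Program.
# The computer derives the rule (the threshold) from labeled examples:
def learn_threshold(heights, labels):
    # try every midpoint between consecutive sorted samples, keep the best split
    pairs = sorted(zip(heights, labels))
    candidates = [(a[0] + b[0]) / 2 for a, b in zip(pairs, pairs[1:])]
    best_t, best_acc = None, -1.0
    for t in candidates:
        acc = sum((h > t) == (y == "tall") for h, y in pairs) / len(pairs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

heights = [160, 170, 175, 185, 190, 200]
labels  = ["short", "short", "short", "tall", "tall", "tall"]
threshold = learn_threshold(heights, labels)  # the learned "program"
print(threshold)  # -> 180.0, the midpoint that separates the two groups
```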
A well-posed machine learning problem
• What problems can be solved* with machine learning?
• Well-posed machine learning problem:
"A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure
P, if its performance at tasks in T, as measured by P,
improves with experience E.” – Tom Mitchell
(*) implies a certain degree of accuracy
• Arthur Samuel (1959) wrote a program for playing checkers
(perhaps the first program based on the concept of learning, as
defined by Tom Mitchell)
• The program played 10K games against itself
• The program was designed to find the good and bad positions
on the board from the current state, based on the probability of
winning or losing
• In this example:
E = 10000 games
T = play checkers
P = win or lose
Strong AI versus Weak AI
• Strong / generic / true AI
(see the Turing test and its extensions)
• Weak / narrow AI
(focuses on a specific well-posed problem)
When do we use machine learning?
• We use ML when it is hard (impossible) to define a set of
rules by hand / to write a program based on explicit rules
• [Figure: model performance vs. amount of training data, improving with more compute power and better algorithms / models]
ML in a nutshell
• Data, …
• At the intersection of: Computer Science, Statistics, Applied Maths, Biology, Neuroscience
• Non-standard paradigms:
Active learning
Transfer learning
Transductive learning
Supervised learning
• We have a set of labeled training samples
• Example 1: object recognition in images annotated
with corresponding class labels
[Images of scenes annotated with class labels: Car, Person, Dog]
• Example 2: handwritten digit recognition (on the MNIST data
set)
• Images of 28 x 28 pixels
• We can represent each image as a vector x of 784 components
• We train a classifier f such that f(x) predicts the correct digit (0-9)
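A minimal sketch of this representation, using random pixels as a stand-in for a real MNIST image:

```python
import numpy as np

# A 28 x 28 grayscale image (random pixels as a stand-in for an MNIST digit)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Flatten to a vector x with 784 components
x = image.reshape(-1)
print(x.shape)  # -> (784,)

# A classifier is then a function f : R^784 -> {0, 1, ..., 9}
# mapping the pixel vector to a digit class.
```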
• Example 2 (continued): handwritten digit recognition (on the
MNIST data set)
• Starting with a training set of about 60K images (about 6000 images
per class)
• … the error rate can go down to 0.23% (using convolutional neural
networks)
• Among the first (learning-based) systems used in a large-scale
commercial setting for postal code and bank cheque processing
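To illustrate the training pipeline (a sketch, not the convolutional network that reaches 0.23% error), here is a simple linear classifier on scikit-learn's small 8x8 digits dataset, used as a lightweight stand-in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small 8x8 digits dataset bundled with scikit-learn
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Train a linear classifier on the flattened pixel vectors
clf = LogisticRegression(max_iter=2000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")  # a simple linear model already does well
```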
• Example 3: face detection
[Slide annotations: Confidence / performance guarantee? Why a linear combination? Why these words? Where do the weights come from?]
• Example 5: predicting stock prices on the market
• Regression
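A toy regression sketch on a synthetic "price" series (illustrative only; real stock prediction is far harder):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic noisy upward trend standing in for a price series
rng = np.random.default_rng(42)
days = np.arange(100).reshape(-1, 1)                        # feature: time step
prices = 50 + 0.3 * days.ravel() + rng.normal(0, 2, 100)    # noisy linear trend

# Regression: the model outputs a real number, not a class label
model = LinearRegression().fit(days, prices)
next_day = model.predict([[100]])
print(model.coef_[0], next_day[0])  # recovered slope ~0.3, next value ~80
```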
Age estimation in images
• Classification?
• Regression?
What age?
The supervised learning paradigm
Supervised learning models
• Naive Bayes (lecture 2)
• k-Nearest Neighbors (lecture 3)
• Decision trees and random forests (lecture 4)
• Support Vector Machines (lectures 5, 6)
• Kernel methods (lecture 5)
• Kernel Ridge Regression (lecture 5)
• Neural networks (lectures 7, 8, 9)
• Many others…
Unsupervised learning
• We have an unlabeled training set of samples
• Example 1: clustering images based on similarity
• Example 1: clustering MNIST images based on
similarity [Georgescu et al. ICIP2019]
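A small-scale sketch of the idea, clustering scikit-learn's 8x8 digits (a stand-in for MNIST) without ever showing the labels to the model:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Group the digit images purely by similarity of their pixel vectors
digits = load_digits()
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(digits.data)  # labels are never used
print(cluster_ids[:10])  # cluster index assigned to the first 10 images
```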
• Example 2: unsupervised feature learning
• Example 2: unsupervised feature learning for
abnormal event detection [Ionescu et al. CVPR2019]
• Example 3: clustering mammals by family, species, etc.
• Dimensionality Reduction
Unsupervised learning models
• K-means clustering (lectures 10, 11)
• DBScan (lecture 12)
• Hierarchical clustering (lecture 12)
• Principal Component Analysis (lecture 13)
• t-Distributed Stochastic Neighbor Embedding
(lecture 13)
• Hidden Markov Models
• Many others…
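As a taste of dimensionality reduction from the list above, a minimal PCA sketch projecting the 64-dimensional digit vectors down to 2 components:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Project 64-dimensional digit vectors onto their top 2 principal components
digits = load_digits()
pca = PCA(n_components=2)
embedded = pca.fit_transform(digits.data)
print(embedded.shape)                  # -> (1797, 2)
print(pca.explained_variance_ratio_)   # fraction of variance kept per component
```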
Semi-supervised learning
• We have a training set of samples that are partially
annotated with class labels
• Example 1: object recognition in images, some of
which are annotated with corresponding class labels
[Images, only some of which are annotated: Car, Dog, Person]
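One way to sketch this setting is with scikit-learn's label propagation family, hiding about 90% of the digit labels (marked -1, the library's convention for "unlabeled"):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelSpreading

# Keep labels for only ~10% of the samples; mark the rest as unlabeled (-1)
digits = load_digits()
rng = np.random.default_rng(0)
y_partial = digits.target.copy()
hidden = rng.random(len(y_partial)) > 0.1
y_partial[hidden] = -1

# Propagate the few known labels through the data's neighborhood graph
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(digits.data, y_partial)
accuracy = (model.transduction_ == digits.target).mean()
print(f"accuracy on all samples: {accuracy:.3f}")
```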
Reinforcement learning
• How does it work?
• The system learns intelligent behavior using a
reinforcement signal (reward)
• The reward is given after several actions are taken (it
does not come after every action)
• Time matters (data is sequential, not i.i.d.)
• The actions of the system can influence the data
• Example 1: learning to play Go
• +/- reward for winning / losing the game
• Example 2: teaching a robot to ride a bike
• +/- reward for moving forward / falling
• Example 3: learning to play Pong from image pixels
• +/- reward for increasing the personal / adversary score
Reinforcement learning paradigm
Formalizing as Markov Decision Process
• Solution based on dynamic programming (small
graphs) or approximation (large graphs)
• Goal: select the actions that maximize the total final
reward
• The actions can have long-term consequences
• Sacrificing the immediate reward can lead to higher
rewards in the long term
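A dynamic-programming sketch (value iteration) on a tiny, made-up MDP: the transition table and rewards below are hypothetical, chosen so that sacrificing the immediate reward pays off.

```python
# Deterministic toy MDP: transitions[s][a] = (next_state, reward)
transitions = {
    0: {"a": (1, 0.0), "b": (2, 1.0)},   # "b" gives an immediate reward of 1
    1: {"a": (2, 10.0), "b": (0, 0.0)},  # but "a" leads to a bigger delayed reward
    2: {"a": (2, 0.0), "b": (2, 0.0)},   # absorbing terminal-like state
}
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman optimality update
V = {s: 0.0 for s in transitions}
for _ in range(100):
    V = {s: max(r + gamma * V[s2] for (s2, r) in acts.values())
         for s, acts in transitions.items()}

# In state 0, forgoing the immediate reward (action "a") is optimal
best_action = max(transitions[0],
                  key=lambda a: transitions[0][a][1] + gamma * V[transitions[0][a][0]])
print(best_action, round(V[0], 2))  # -> a 9.0
```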
• AlphaGo example:
Narrator 1: “That’s a very strange move”
Narrator 2: “I thought it was a mistake”
But actually, “the move turned the course of the
game. AlphaGo went on to win Game Two, and at
the post-game press conference, Lee Sedol was in
shock.”
https://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/
Active learning
• Given a large set of unlabeled samples, we have to
choose a small subset for annotation in order to
obtain a good classification model
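One common concrete strategy is uncertainty sampling; the sketch below (assuming an initial pool of 30 labeled digits) queries the samples the current model is least confident about:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X, y = digits.data, digits.target

# Start with a tiny labeled pool; the rest is unlabeled
labeled = list(range(30))
unlabeled = list(range(30, len(X)))

# Train on the labeled pool, then pick the least confident unlabeled samples
clf = LogisticRegression(max_iter=2000).fit(X[labeled], y[labeled])
confidence = clf.predict_proba(X[unlabeled]).max(axis=1)  # top-class probability
query = [unlabeled[i] for i in np.argsort(confidence)[:10]]
print(query)  # the 10 samples the model most wants labeled next
```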
Transfer learning
• Starting with a model trained for a certain task /
domain, use the model for a different task / domain
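A minimal flavor of the idea (a sketch under simplifying assumptions, not a full fine-tuning recipe): learn a representation on a "source" task (digits 0-4), then reuse it, frozen, for a different "target" task (digits 5-9).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

digits = load_digits()
source = digits.target < 5   # "source" domain: digits 0-4
target = ~source             # "target" domain: digits 5-9

# Learn a representation on the source domain only
pca = PCA(n_components=16).fit(digits.data[source])

# Reuse the frozen representation for the target task
X_target = pca.transform(digits.data[target])
clf = LogisticRegression(max_iter=2000).fit(X_target, digits.target[target])
acc = clf.score(X_target, digits.target[target])
print(f"target-task accuracy: {acc:.3f}")
```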