Learning From Data
Lecture 1
The Learning Problem
Introduction
Motivation
Credit Default - A Running Example
Summary of the Learning Problem
M. Magdon-Ismail
CSCI 4100/6100
Resources
4. TA.
5. Professor.
6. Prerequisites? assignment #0
c AM
L Creator: Malik Magdon-Ismail The Learning Problem: 2 /16 The storyline −→
The Storyline
1. What is Learning?
2. Can We do it?
3. How to do it?
4. How to do it well?
5. General principles?
6. Advanced techniques.
(Side labels on the slide: concepts, theory, practice.)
Let’s Define a Tree?
Are These Trees?
Learning “What are Trees” is ‘Easy’
Defining is Hard; Recognizing is Easy
Learning to Rate Movies
• Why? So that Netflix can make better movie recommendations, and get more rentals.
Previous Ratings Reflect Future Ratings
[Figure: viewer factors (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) and movie factors (comedy content? action content? blockbuster? Tom Cruise in it?)]
• Viewer taste & movie content imply viewer rating.
• Netflix has data. We can learn to identify movie “categories” as well as viewer “preferences”.
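One way to read "viewer taste and movie content imply viewer rating" is as a match between two lists of factors. The sketch below assumes that each viewer and each movie is described by four numbers, one per factor named on the slide, and that the predicted rating is their dot product. The factor order, the numeric values, and the name `predict_rating` are illustrative assumptions; the slides do not specify a formula.

```python
# Hypothetical factor order, taken from the labels on the slide.
FACTORS = ("comedy", "action", "blockbuster", "tom_cruise")


def predict_rating(viewer, movie):
    """Score a (viewer, movie) pair as the sum of factor-by-factor matches."""
    if len(viewer) != len(movie):
        raise ValueError("viewer and movie must use the same factors")
    return sum(v * m for v, m in zip(viewer, movie))


# Made-up values, for illustration only.
viewer = [1.0, -0.5, 0.0, 2.0]  # likes comedy, mildly dislikes action, likes Tom Cruise
movie = [0.5, 1.0, 1.0, 1.0]    # some comedy, an action blockbuster with Tom Cruise

score = predict_rating(viewer, movie)
print(score)
```

In a learning setting these factor values would not be typed in by hand; they would be estimated from the ratings data that Netflix already has.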
Class Motto:
Credit Approval
age 32 years
gender male
salary 40,000
debt 26,000
years in job 1 year
years at home 3 years
... ...
• Using salary, debt, years in residence, etc., approve for credit or not.
• No magic credit approval formula.
• Banks have lots of data.
  – customer information: salary, debt, etc.
  – whether or not they defaulted on their credit.
The Key Players
Learning
• Start with a set of candidate hypotheses H which you think are likely to represent f.
  H = {h1, h2, ...} is called the hypothesis set or model.
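A hypothesis set can be pictured as a collection of candidate decision rules. The sketch below assumes a deliberately simple family: each hypothesis approves credit when the debt-to-salary ratio is at or below some threshold. The rule family, the threshold values, and the name `make_threshold_rule` are hypothetical choices for illustration; only the salary and debt figures come from the credit approval slide.

```python
from typing import Callable, Dict, List

Customer = Dict[str, float]
Hypothesis = Callable[[Customer], int]  # +1 = approve, -1 = deny


def make_threshold_rule(max_debt_ratio: float) -> Hypothesis:
    """Build one candidate h: approve when debt/salary is at most the threshold."""
    def h(customer: Customer) -> int:
        ratio = customer["debt"] / customer["salary"]
        return 1 if ratio <= max_debt_ratio else -1
    return h


# A small finite hypothesis set H = {h1, h2, h3, h4}; the thresholds are arbitrary.
THRESHOLDS = [0.25, 0.5, 0.75, 1.0]
H: List[Hypothesis] = [make_threshold_rule(t) for t in THRESHOLDS]

# The applicant from the slide: salary 40,000 and debt 26,000 (ratio 0.65).
applicant = {"salary": 40_000.0, "debt": 26_000.0}
decisions = [h(applicant) for h in H]
print(decisions)
```

The candidates disagree about this applicant, which is the point: choosing H only fixes which rules are on the table, not which of them is used.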
Summary of the Learning Setup
TRAINING EXAMPLES: (x1, y1), (x2, y2), ..., (xN, yN)
HYPOTHESIS SET: H
LEARNING ALGORITHM: A
FINAL HYPOTHESIS: g ≈ f (learned credit approval formula)
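The pieces of the setup can be sketched together in a few lines. The sketch below assumes the simplest possible learning algorithm: try every hypothesis in a finite H and keep the one that makes the fewest mistakes on the training examples. This selection rule, the one-dimensional input (a debt-to-salary ratio standing in for the full customer record), the function names, and the five data points are all invented for illustration; they are not taken from the slides.

```python
from typing import Callable, List, Tuple

Hypothesis = Callable[[float], int]  # maps a debt/salary ratio to +1 or -1
Example = Tuple[float, int]          # one training example (x_n, y_n)


def threshold_rule(t: float) -> Hypothesis:
    """One hypothesis: approve (+1) when the ratio is at most t, else deny (-1)."""
    return lambda x: 1 if x <= t else -1


def training_error(h: Hypothesis, data: List[Example]) -> float:
    """Fraction of training examples on which h disagrees with the label."""
    return sum(h(x) != y for x, y in data) / len(data)


def learn(hypotheses: List[Hypothesis], data: List[Example]) -> Hypothesis:
    """Algorithm A (assumed): return the hypothesis with the lowest training error."""
    return min(hypotheses, key=lambda h: training_error(h, data))


# Invented toy data: (debt/salary ratio, +1 if the customer repaid, -1 if they defaulted).
data = [(0.10, 1), (0.30, 1), (0.45, 1), (0.65, -1), (0.90, -1)]

# Hypothesis set H built from arbitrary thresholds.
H = [threshold_rule(t) for t in (0.2, 0.5, 0.8)]

g = learn(H, data)  # final hypothesis
print(training_error(g, data))
```

Here g is only known to agree with the training examples; whether g ≈ f on new customers is a separate question.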