L1 Intro
Machine learning is
changing the world
3/27/18
Old view of ML
Data → ML Algorithm → “My curve is better than your curve” → Write a paper
Applications using machine learning:
• Search
• Coupons
• Retail
• Advertising
• Real Estate
• Human Resources
• Legal Advice
• Dating
• Wearables
• CRM
• Taxis
Generically…
Study of algorithms that
improve their performance
at some task
with experience
Data → ML Method → Intelligence
ML case studies
Case Study 1:
Predicting house prices
Data → ML Method (Regression) → Intelligence
[Figure: scatter plot of price ($) against house size; the price of a new house ($ = ??) is predicted from its size + house features]
9 ©2018 Emily Fox STAT/CSE 416: Intro to Machine Learning
What is regression?
From features to predictions
Input x: features derived from data
Learn x → y relationship
Predict y: continuous “output” or “response” to input
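As an illustration of learning an x → y relationship, here is a minimal sketch of least-squares linear regression for the house price case study. The sizes and prices are made-up toy numbers, and `predict_price` is a hypothetical helper name; the course may use different tools and models.

```python
import numpy as np

# Hypothetical toy data (not from the lecture): house size in sq ft and sale price in $.
sizes = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
prices = np.array([200000.0, 290000.0, 410000.0, 500000.0, 600000.0])

# Design matrix with an intercept column, so that price ≈ w[0] + w[1] * size.
X = np.column_stack([np.ones_like(sizes), sizes])

# Fit the weights by ordinary least squares.
w, *_ = np.linalg.lstsq(X, prices, rcond=None)

def predict_price(size):
    """Predict a continuous output y (price) from the input x (size)."""
    return w[0] + w[1] * size

print(predict_price(2200.0))
```

A single feature keeps the sketch short; additional house features would simply become extra columns of `X`.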
Stock prediction
• Predict the price of a stock (y)
• Depends on x =
- Recent history of stock price
- News events
- Related commodities
Tweet popularity
• How many people will retweet your tweet?
• Depends on # followers, # of followers of followers,
features of text tweeted,
popularity of hashtag,
# of past retweets,…
Output y: scale from very sad to very happy
Inputs x are brain region intensities
Case Study 2:
Sentiment analysis
Data → ML Method (Classification) → Intelligence
Example review: “Sushi was awesome, the food was awesome, but the service was awful.”
All reviews: Score(x) > 0 on the “awesome” side, Score(x) < 0 on the “awful” side
What is classification?
From features to predictions
Data → ML Method (Classifier) → Intelligence
Input x: features derived from data
Learn x → y relationship
Predict y: categorical “output”, class or label
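To make the Score(x) idea from the sentiment case study concrete, here is a minimal sketch of a linear classifier over word counts. The word weights are hypothetical and hand-picked for illustration; a real classifier would learn them from labeled reviews. The names `WORD_WEIGHTS`, `score`, and `classify` are assumptions, not part of the course material.

```python
# Hypothetical, hand-picked word weights (a trained classifier would learn these).
WORD_WEIGHTS = {"awesome": 1.0, "awful": -1.5}

def score(review):
    """Sum the weights of the words in the review; unknown words count as 0."""
    words = review.lower().replace(",", " ").replace(".", " ").split()
    return sum(WORD_WEIGHTS.get(word, 0.0) for word in words)

def classify(review):
    """Predict a categorical output y from the sign of Score(x)."""
    return "positive" if score(review) > 0 else "negative"

review = "Sushi was awesome, the food was awesome, but the service was awful."
print(score(review), classify(review))
```

With these particular weights the two occurrences of “awesome” outweigh the single “awful”, so the example review lands on the positive side.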
Spam filtering
Input x: text of email, sender, IP,…
Output y: spam or not spam
Multiclass classifier
Output y has more than 2 categories
Input x: webpage
Output y: Education, Finance, or Technology
Image classification
Input x: image pixels
Output y: predicted object
Case Study 3:
Document retrieval
Data → ML Method (Nearest neighbor) → Intelligence
What is retrieval?
Search for related items
Data → ML Method (Nearest Neighbor) → Intelligence
Input x, {x’}: features for query point + features of all other datapoints
Compute distances to other x’
Output xNN: “nearest” point or set of points to query
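The retrieval recipe (compute distances from the query to every other point, return the nearest) can be sketched as follows. Euclidean distance and the tiny feature vectors are assumptions chosen for illustration; the slide does not fix a particular distance or feature representation.

```python
import numpy as np

def nearest_neighbor(query, corpus):
    """Return the row index of the corpus point closest to the query, and its distance."""
    distances = np.linalg.norm(corpus - query, axis=1)  # Euclidean distance to each x'
    index = int(np.argmin(distances))
    return index, float(distances[index])

# Hypothetical feature vectors (for example, word counts) for three articles.
corpus = np.array([
    [3.0, 0.0, 1.0],
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 3.0],
])
query = np.array([2.0, 0.0, 1.0])

best_index, best_distance = nearest_neighbor(query, corpus)
print(best_index, best_distance)
```

Returning a set of points instead of a single one would amount to sorting `distances` and keeping the smallest few.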
[Figure: a query article and its nearest neighbor]
Retrieval applications
Just about everything…
• Products
• Images
• News articles
• Social networks (people you might want to connect with)
• Streaming content:
- Songs
- Movies
- TV shows
- …
Data → ML Method (Clustering) → Intelligence
[Figure: clusters labeled ENTERTAINMENT and SCIENCE]
What is clustering?
Discover groups of similar inputs
Data → ML Method (Clustering) → Intelligence
Input {x}: features for points in dataset
Separate points into disjoint sets
Output {z}: cluster labels per datapoint
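One way to separate points into disjoint sets is k-means, which also appears in the course topic list. The sketch below is a minimal version under stated assumptions: centers are initialized at the first k points (a simplification; random initialization is more common), and the six 2-D points are made-up toy data.

```python
import numpy as np

def kmeans(points, k, n_iters=20):
    """Minimal k-means: returns a cluster label z per datapoint and the cluster centers."""
    points = np.asarray(points, dtype=float)
    centers = points[:k].copy()  # simplification: start from the first k points
    for _ in range(n_iters):
        # Assign each point to its closest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

# Hypothetical toy data: two well-separated groups of points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(points, k=2)
print(labels, centers)
```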
Clustering images
For search, group as:
- Ocean
- Pink flower
- Dog
- Sunset
- Clouds
-…
Or users on websites…
Discover groups of
users for better
targeting of content
Embedding
Example: Embedding images to visualize data
Data → ML Method (PCA) → Intelligence
Images with
thousands or
millions of pixels
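A minimal sketch of embedding high-dimensional images into two dimensions with PCA, computed here through the singular value decomposition. The random array is only a stand-in for real image pixels, and `pca_embed` is a hypothetical helper name.

```python
import numpy as np

def pca_embed(data, n_components=2):
    """Project the rows of data onto their top principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in data: 50 "images" with 1024 pixel values each (random, for illustration only).
rng = np.random.default_rng(0)
images = rng.random((50, 1024))

embedding = pca_embed(images)
print(embedding.shape)
```

Each image becomes a point in the plane, which is what makes a 2-D scatter plot of the dataset possible.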
Case Study 4:
Product recommendation
Data → ML Method (Matrix Factorization) → Intelligence
Input: your past purchases + purchase histories of all customers
Learned: customer features and product features
Output: recommended items
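A minimal sketch of matrix factorization for recommendation: customer features U and product features V are fit so that U Vᵀ matches the observed entries, and the unobserved entries of U Vᵀ then serve as scores for recommendations. The rating matrix, the use of plain gradient descent, and all hyperparameters are assumptions for illustration, not the course's prescribed method.

```python
import numpy as np

# Hypothetical customer-by-product matrix; 0 marks an entry we have not observed.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
observed = (ratings > 0).astype(float)

def observed_mse(U, V):
    """Mean squared error of U @ V.T on the observed entries only."""
    error = observed * (U @ V.T - ratings)
    return float((error ** 2).sum() / observed.sum())

def factorize(n_features=2, n_iters=2000, step=0.01, reg=0.01, seed=0):
    """Fit customer features U and product features V by gradient descent."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((ratings.shape[0], n_features))
    V = 0.1 * rng.standard_normal((ratings.shape[1], n_features))
    initial_mse = observed_mse(U, V)
    for _ in range(n_iters):
        error = observed * (U @ V.T - ratings)
        grad_U = error @ V + reg * U
        grad_V = error.T @ U + reg * V
        U -= step * grad_U
        V -= step * grad_V
    return U, V, initial_mse

U, V, initial_mse = factorize()
predictions = U @ V.T  # scores for every customer/product pair, including unobserved ones
print(initial_mse, observed_mse(U, V))
```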
Case Study 5:
Visual product recommender
Data → ML Method (Deep Learning) → Intelligence
[Figure: network diagram with inputs x1, x2, intermediate units z1, z2, and output y]
Data → ML Method (Deep Learning) → Intelligence
Input {x}: raw data or extracted features for points in dataset
Nonlinear feature representation
Output {z}: label or value
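The idea of a nonlinear feature representation can be sketched as the forward pass of a tiny network with two inputs, two hidden units, and one output. The weights below are hypothetical and fixed by hand, and tanh is an assumed choice of nonlinearity; in practice the weights are learned from data.

```python
import numpy as np

def forward(x, W1, b1, w2, b2):
    """One hidden layer: nonlinear features z from inputs x, then an output y from z."""
    z = np.tanh(W1 @ x + b1)  # nonlinear feature representation
    y = w2 @ z + b2           # output value computed from the features
    return z, y

# Hypothetical hand-picked weights (a trained network would learn these).
W1 = np.array([[1.0, -1.0],
               [0.5, 0.5]])
b1 = np.zeros(2)
w2 = np.array([1.0, 1.0])
b2 = 0.0

x = np.array([1.0, 1.0])
z, y = forward(x, W1, b1, w2, b2)
print(z, y)
```

Stacking more such layers is what gives deep learning its name.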
[Figure: bar chart on a 0 to 0.25 axis comparing SuperVision, ISI, and OXFORD_VGG, annotated “Huge gain”]
Syllabus
Data → ML Method → Intelligence
Use pre-specified
(black box)
Detailed topics
Models
• Linear regression, regularized approaches (ridge, Lasso)
• Linear classifiers: logistic regression
• Non-linear models: decision trees
• Nearest neighbors, clustering
• Recommender systems
• Deep learning
Algorithms
• Gradient descent
• Boosting
• K-means
Course logistics
Prerequisites
• Formally:
- Either CSE 143 or CSE 160; either STAT 311 or STAT/MATH 390 or STAT 391
• Basic Probability + Statistics
- Distributions, densities, independence, marginalization, conditioning,
expectation, variance…
• Programming
- Python will be very useful, but we’ll help you get started
Computing needs
• Everything will be on JupyterHub
- Just need to log in
- No need to install and run Python locally
- Email with username/password
TAs:
• Devin Didericksen
- Office hours: Tuesday 3:30 – 5:00pm, 3rd floor CSE breakout
• Varun Mahadevan
- Office hours: Wednesdays, 12:30 – 2pm, 5th floor CSE breakout
• John Kaltenbach
- Office hours: TBA
• Hunter Schafer
- Office hours: Mondays, 12:30 – 2pm; Tuesdays 12:30 – 1:30pm, CSE 220
• Patrick Spieker
- Office hours: Wednesdays and Fridays, 10:30 – 11:30am, 3rd floor CSE breakout
Quiz Sections
• Important to attend weekly
• Topics:
- Intros to and demos of running things in Python
- Reinforcing concepts from lecture
- Bonus material to supplement lectures
• Course website
- https://courses.cs.washington.edu/courses/cse416/18sp/
- Lecture slides, quiz section handouts, high-level (static) course info
• Canvas
- Discussion board, access to concept quizzes, submissions of work, and grades
• Google calendar
- Live updates to schedules (also via email to course mailing list)
- Shared url to be announced…stay tuned
Textbooks
• None! Come to lectures and quiz sections
- Annotated slides will be posted
- Quiz section handouts will be posted
- Blog posts and other sources will sometimes be referenced, too
• Optional Books:
- A Course in Machine Learning; Hal Daumé III
http://ciml.info
- Machine Learning: A Probabilistic Perspective; Kevin Murphy
- Pattern Recognition and Machine Learning; Chris Bishop
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction;
Trevor Hastie, Robert Tibshirani, Jerome Friedman
https://web.stanford.edu/~hastie/ElemStatLearn/
Programming assignments
Programming assignments give hands-on experience with ML methods on
real data. The assignments are hard, so start early :)
Collaboration policy:
• You may discuss the questions
• Each student must write their own code and submit their own answers
- We will be using cheating-detection software
• Submit the names of anyone with whom you collaborate
• Please don’t search for answers on the web, Google, etc.
- Please ask us if you are not sure whether you can use a particular reference
Exams
• Concept quizzes
- Online!!!
- Spread throughout the quarter
- At least one per major topic
- Primary purpose is to make sure you are following content
- Must be completed 100% individually
• Final
- Finals week
- Monday, June 4, 10:30-12:20 in MLR 301
Grading
• Programming assignments (60%)
- Start early, Start early, Start early, Start early, Start early, Start early,
Start early, Start early, Start early, Start early, Start early, Start early,
Start early, Start early, Start early, Start early, Start early, Start early
- Bonus Assignment 0 to get set up with tools (0%)
• Final (25%)
• Resources:
- Java-to-Python guide (thanks to Hunter!)
- Videos on Python and Turi Create fundamentals
- Quiz section intro to running things on JupyterHub
You’ll be able to do
amazing things…