Introduction to maximum entropy models
Adwait Ratnaparkhi
Yahoo! Labs
Introduction
• This talk is geared for Technical Yahoos…
– Who are unfamiliar with machine learning or maximum entropy
– Who are familiar with machine learning but not maximum
entropy
– Who want to see an example of machine learning on the grid
– Who are looking for an introductory talk
• Lab session is geared for Technical Yahoos…
– Who brought their laptop
– Who want to obtain more numerical intuition about the
optimization algorithms
– Who don’t mind coding a little Perl
– Who have some patience in case things go awry!
Outline
• What is maximum entropy modeling?
• Example applications of maximum entropy framework
• Lab session
– Look at the mapper and reducer of a maximum entropy
parameter estimation algorithm
What is a maximum entropy model?
• A framework for combining evidence into a probability model
• Probability can then be used for classification, or input to
next component
– Content match: P(click | page, ad, user)
– Part-of-speech tagging: P(part-of-speech-tag | word )
– Categorization: P(category | page )
• Principle of maximum entropy is used as optimization
criterion
– Relationship to maximum likelihood optimization
– Same model form as multi-class logistic regression
Machine learning approach
• Data with labels
– Automatic: ad click
• sponsored search, content match, graphical ads
– Manual: editorial process
• page categories, part-of-speech tags
• Objective
– Training phase
• Find a (large) training set
• Use training set to construct a classifier or probability model
– Test phase
• Find an (honest) test set
• Use model to assign labels to previously unseen data
Why machine learning? Why maxent?
• Why machine learning?
– Pervasive ambiguity
• Any task with natural language features
– For these tasks, it is easier to collect and annotate data than to hand-code an expert system
– For certain web tasks, annotation is free (e.g., clicks)
• Why maxent?
– Little restriction on kinds of evidence
• No independence assumption
– Works well with sparse features
– Works well in parallel settings, like Hadoop
– Appeal of maxent interpretation
Sentence Boundary Detection
• Which “.” denotes a sentence boundary?
Mr. B. Green from X.Y.Z. Corp. said that the U.S. budget deficit hit $1.42 trillion for the year that ended Sept. 30. The previous year’s deficit was $459 billion.
(The correct boundary is the “.” after “Sept. 30”.)
• Model: P({yes, no} | candidate boundary)
Part-of-speech tagging
• What is the part of speech of flies ?
– Fruit flies like a banana.
– Time flies like an arrow.
• Model: P(tag | current word, surrounding words, …)
– End goal is sequence of tags
Content match ad clicks
P(click|page, ad, user)
Ambiguity resolution: an artificial example
• Tagging unknown, or “out-of-vocabulary”, words:
– Given word, predict POS tag based on spelling features
• Assume all words are either:
– Proper Noun (NNP)
– Verb Gerund (VBG)
• Find a model: P(tag | word) where tag is in {NNP,VBG}
• Assume a training set
– Editorially derived (word, tag) pairs
– Training data tags are { NNP, VBG }
– Reminder: artificial example!
Ambiguity resolution (cont’d)
• Ask the data
– q1: Is the first letter of the word capitalized?
– q2: Does the word end in ing?
• Evidence for (q1, NNP)
– Clinton, Bush, Reagan as NNP
• Evidence for (q2, VBG)
– trading, offering, operating as VBG
• Rules based on q1,q2 can conflict
– Boeing
• How to choose in ambiguous cases?
Features represent evidence
• a = what we are predicting
• b = what we observe
• Terminology
– Here, the input to a feature is the (a,b) pair
– But in much of the ML literature, the input to a feature is just b
$$f_{y,q}(a,b) = \begin{cases} 1 & \text{if } a = y \text{ and } q(b) = \text{true} \\ 0 & \text{otherwise} \end{cases}$$
$$f_1(a,b) = 1 \text{ if } a = \mathrm{NNP} \text{ and } q_1(b) = \text{true}$$
$$f_2(a,b) = 1 \text{ if } a = \mathrm{VBG} \text{ and } q_2(b) = \text{true}$$
Combine the features: probability model
$$p(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$$
$$Z(b) = \sum_{a'} \prod_{j=1}^{k} \alpha_j^{f_j(a',b)}$$
$$f_j(a,b) \in \{0,1\}: \text{features} \qquad \alpha_j > 0: \text{parameters}$$
• Probability is product of weights of active features
• Why this form?
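To make the model form concrete, here is a minimal Python sketch that scores the two-feature example from the previous slides; the α values below are made up for illustration and would normally come from training.

def q1(word):                 # capitalization question from the earlier slide
    return word[0].isupper()

def q2(word):                 # "ends in -ing" question
    return word.endswith("ing")

# Indicator features pair an outcome with a question, as in f_{y,q}(a, b).
FEATURES = [("NNP", q1), ("VBG", q2)]

# Hypothetical parameter values (alpha_j > 0); real values come from training.
ALPHAS = [5.0, 3.0]

def score(tag, word):
    """Unnormalized product of the weights of the active features."""
    s = 1.0
    for (a, q), alpha in zip(FEATURES, ALPHAS):
        if tag == a and q(word):
            s *= alpha
    return s

def p(tag, word, tags=("NNP", "VBG")):
    """p(a | b) = score(a, b) / Z(b)."""
    z = sum(score(t, word) for t in tags)
    return score(tag, word) / z

print(p("NNP", "Boeing"), p("VBG", "Boeing"))   # 0.625 and 0.375 with these weights

With these made-up weights the capitalization feature outweighs the -ing feature, so Boeing comes out as NNP, mirroring the worked example on the next slide.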
Probabilities for ambiguity resolution
$$p(\mathrm{NNP} \mid \text{Boeing}) = \frac{1}{Z}\,\alpha_1^{f_1(a,b)}\,\alpha_2^{f_2(a,b)} = \frac{1}{Z}\,\alpha_1$$
$$p(\mathrm{VBG} \mid \text{Boeing}) = \frac{1}{Z}\,\alpha_1^{f_1(a,b)}\,\alpha_2^{f_2(a,b)} = \frac{1}{Z}\,\alpha_2$$
• How do we find optimal parameter values?
Maximum likelihood estimation
$$Q = \left\{\, p \;\middle|\; p(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)} \right\}$$
$$r(a,b) = \text{normalized frequency of } (a,b) \text{ in the training data}$$
$$L(p) = \sum_{a,b} r(a,b) \log p(a \mid b)$$
$$p_{\mathrm{ML}} = \arg\max_{p \in Q} L(p)$$
Principle of maximum entropy (Jaynes, 1957)
• Use the probability model that is maximally uncertain w.r.t. the observed evidence
• Why? Anything else assumes a fact you have not
observed.
$$P = \{\text{models consistent with the evidence}\}$$
$$H(p) = \text{entropy of } p$$
$$p_{\mathrm{ME}} = \arg\max_{p \in P} H(p)$$
Maxent example
• Task: estimate joint distribution p(A,B)
– A is in {x,y}
– B is in {0,1}
• Define a feature f
– Assume some expected value over a training set
$$f(a,b) = 1 \text{ iff } (a = y \text{ and } b = 1),\ 0 \text{ otherwise}$$
$$E[f] = 7/10 = 0.7 \qquad \sum_{a,b} p(a,b) = 1$$

p(A,B)   B=0   B=1
A=x       ?     ?
A=y       ?    0.7
Maxent example (cont’d)
• Define entropy: $$H(p) = -\sum_{a,b} p(a,b)\log p(a,b)$$
• One way to meet the constraints: H(p) = 1.25

p(A,B)   B=0    B=1
A=x      0.05   0.2
A=y      0.05   0.7

• Maxent way to meet the constraints: H(p) = 1.35

p(A,B)   B=0    B=1
A=x      0.1    0.1
A=y      0.1    0.7
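A quick numerical check of the two tables above, in bits (log base 2); the values come out near the 1.25 and 1.35 quoted above.

import math

def entropy(cells):
    """H(p) = -sum over cells of p(a,b) * log2 p(a,b)."""
    return -sum(p * math.log2(p) for p in cells if p > 0)

one_way = [0.05, 0.2, 0.05, 0.7]   # the ad hoc table above, E[f] = 0.7
maxent  = [0.1, 0.1, 0.1, 0.7]     # remaining mass spread uniformly

print(entropy(one_way))   # ~1.26 bits
print(entropy(maxent))    # ~1.36 bits: the maxent table is the less committal one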
Conditional maximum entropy (Berger et al, 1995)
$$E_r f_j = \sum_{a,b} r(a,b)\, f_j(a,b) \quad \text{(observed expectation of } f_j\text{)}$$
$$E_p f_j = \sum_{a,b} r(b)\, p(a \mid b)\, f_j(a,b) \quad \text{(model's expectation of } f_j\text{)}$$
$$P = \{\, p \mid E_p f_j = E_r f_j,\ j = 1 \ldots k \,\}$$
$$H(p) = -\sum_{a,b} r(b)\, p(a \mid b) \log p(a \mid b)$$
$$p_{\mathrm{ME}} = \arg\max_{p \in P} H(p)$$
Duality of ML and ME
• Under ME it must be the case that:
$$p_{\mathrm{ME}}(a \mid b) = \frac{1}{Z(b)} \prod_{j=1}^{k} \alpha_j^{f_j(a,b)}$$
• ML and ME solutions are the same
– $p_{\mathrm{ME}} = p_{\mathrm{ML}}$
– ML: form is assumed without justification
– ME: constraints are assumed, form is derived
Extensions: Minimum divergence modeling
• Kullback-Leibler divergence
– Measures “distance” between 2 probability distributions
– Not symmetric!
– See (Cover and Thomas, Elements of Information Theory)
$$D(p,q) = \sum_{a,b} r(b)\, p(a \mid b) \log \frac{p(a \mid b)}{q(a \mid b)}$$
$$D(p,q) \ge 0, \qquad D(p,q) = 0 \text{ iff } p = q$$
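A direct transcription of the divergence above into Python, as a sketch; here p and q map (a, b) pairs to the conditional probabilities p(a | b) and q(a | b), and r maps each context b to its empirical probability.

import math

def kl_divergence(p, q, r):
    """D(p, q) = sum over (a,b) of r(b) * p(a|b) * log( p(a|b) / q(a|b) )."""
    total = 0.0
    for (a, b), p_ab in p.items():
        if p_ab > 0:
            total += r[b] * p_ab * math.log(p_ab / q[(a, b)])
    return total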
Extensions : Minimum divergence models (cont’d)
$$P = \{\, p \mid E_p f_j = E_r f_j,\ j = 1 \ldots k \,\}$$
$$p_{\mathrm{MD}} = \arg\min_{p \in P} D(p,q)$$
$$p_{\mathrm{MD}}(a \mid b) = \frac{q(a \mid b)\prod_{j=1}^{k} \alpha_j^{f_j(a,b)}}{\sum_{a'} q(a' \mid b)\prod_{j=1}^{k} \alpha_j^{f_j(a',b)}}$$
• Minimum divergence framework:
– Start with prior model q
– From the set of consistent models P, minimize KL divergence to q
• Parameters will reflect deviation from prior model
– Use case: prior model is static
• Same as maximizing entropy when q is uniform
• See (Della Pietra et al, 1992) for an example in language modeling
Parameter estimation (an incomplete list)
• Generalized Iterative Scaling (Darroch & Ratcliff, 1972)
– Find correction feature and constant
– Iterative updates
• Improved iterative scaling (Della Pietra et al., 1997)
• Conjugate gradient
• Sequential conditional GIS (Goodman, 2002)
• Correction-free GIS (Curran and Clark, 2003)
• GIS details:
– Define correction feature: $$f_{k+1}(a,b) = C - \sum_{j=1}^{k} f_j(a,b)$$
– Updates: $$\alpha_j^{(0)} = 1, \qquad \alpha_j^{(n)} = \alpha_j^{(n-1)} \left[\frac{E_r f_j}{E_{p^{(n-1)}} f_j}\right]^{1/C}$$
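A sketch of a single GIS iteration on in-memory data, assuming training events of the form (label, weight, questions) and features keyed by (question, label) pairs; the correction feature is omitted for brevity, and the names here are illustrative rather than the lab code.

import math
from collections import defaultdict

def gis_iteration(events, alphas, C, labels):
    """One GIS update: alpha_j <- alpha_j * (E_r f_j / E_p f_j) ** (1 / C).

    events: list of (label, weight, [questions]) training instances.
    alphas: dict mapping (question, label) feature pairs to current parameters;
    features are assumed to be selected from observed (question, label) pairs.
    C: correction constant (max number of active features per event).
    """
    observed = defaultdict(float)   # E_r f_j
    expected = defaultdict(float)   # E_p f_j under the current model
    for label, weight, questions in events:
        # Observed expectation: features that fire with the true label.
        for q in questions:
            if (q, label) in alphas:
                observed[(q, label)] += weight
        # Model expectation: p(a | b) for every possible outcome a.
        scores = {a: math.prod(alphas.get((q, a), 1.0) for q in questions)
                  for a in labels}
        z = sum(scores.values())
        for a in labels:
            p = scores[a] / z
            for q in questions:
                if (q, a) in alphas:
                    expected[(q, a)] += weight * p
    return {f: alpha * (observed[f] / expected[f]) ** (1.0 / C)
            for f, alpha in alphas.items()}

Each call to gis_iteration corresponds to one map/reduce pass in the grid version described later.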
Comparisons
• Same model form as multi-class logistic regression
• Diverse forms of evidence
• Compared to decision trees:
– Advantage: No data fragmentation
– Disadvantage: No feature construction
• Compared to naïve Bayes
– No independence assumptions
• Scales well on sparse feature sets
– Parameter estimation (GIS) is O([# of training samples] × [# of predictions] × [avg. # of features per training event])
Disadvantages
• “Perfect” predictors cause parameters to diverge
– Suppose the word “the” only occurred with the tag DT
– The estimation algorithm forces p(a|b) = 1 in order to meet the constraints
• Parameter for (the, DT) will diverge to infinity
• May beat out other parameters estimated from many examples!
• A remedy
– Gaussian priors or “fuzzy maximum entropy” (Chen &
Rosenfeld, 2000)
– Discount observed expectations
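With a Gaussian prior on the log parameters (writing $\lambda_j = \log \alpha_j$), the training objective picks up a penalty term of roughly this form, which keeps the parameters of “perfect” predictors from diverging; this is a sketch of the idea rather than the exact Chen & Rosenfeld formulation:

$$L'(p) = \sum_{a,b} r(a,b) \log p(a \mid b) - \sum_{j=1}^{k} \frac{\lambda_j^2}{2\sigma_j^2}$$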
How to specify a maxent model
• Outcomes
– What are we predicting?
• Questions
– What information is useful for predicting the outcome?
• Feature selection
– Candidate feature set consists of all (outcome, question) pairs
– Given candidate feature set, what subset do we use?
Finding the part-of-speech
• Part of Speech (POS) Tagging
– Return a sequence of POS tags
Input: Fruit flies like a banana
Output: N N V D N
Input: Time flies like an arrow
Output: N V P D N
• Train maxent models from the POS tags of the Penn Treebank
(Marcus et al, 1993)
• Use heavily pruned search procedure to find highest
probability tag sequence
Model for POS tagging
(Ratnaparkhi, 1996)
• Outcomes
– 45 POS tags (Penn Treebank)
• Question Patterns:
– common words: word identity
– rare words: presence of prefix, suffix, capitalization, hyphen,
and numbers
– previous 2 tags
– surrounding 2 words
• Feature Selection
– count cutoff of 10
A training event
• Example:
– ...stories about well-heeled communities and ...
NNS IN ???
• Outcome: JJ (adjective)
• Questions
– w[i-1]=about, w[i-2]=stories, w[i+1]=communities, w[i+2]=and,
t[i-1]=IN, t[i-2][i-1]=NNS IN,
pre[1]=w, pre[2]=we, pre[3]=wel, pre[4]=well,
suf[1]=d,suf[2]=ed,suf[3]=led,suf[4]=eled
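A sketch of how question strings like those above could be generated for position i of a tagged sentence; the helper below is illustrative (the capitalization, hyphen, and number questions are omitted) and is not the original tagger's code.

def questions_for(words, tags, i, counts, rare_cutoff=5):
    """Generate question strings for predicting the tag of words[i].

    counts maps words to training-set frequencies; words below rare_cutoff
    get the spelling questions instead of the word-identity question.
    """
    w = words[i]
    qs = [
        f"w[i-1]={words[i-1]}" if i >= 1 else "w[i-1]=*BOUNDARY*",
        f"w[i-2]={words[i-2]}" if i >= 2 else "w[i-2]=*BOUNDARY*",
        f"w[i+1]={words[i+1]}" if i + 1 < len(words) else "w[i+1]=*BOUNDARY*",
        f"w[i+2]={words[i+2]}" if i + 2 < len(words) else "w[i+2]=*BOUNDARY*",
        f"t[i-1]={tags[i-1]}" if i >= 1 else "t[i-1]=*BOUNDARY*",
        f"t[i-2][i-1]={tags[i-2]} {tags[i-1]}" if i >= 2 else "t[i-2][i-1]=*BOUNDARY*",
    ]
    if counts.get(w, 0) < rare_cutoff:
        # Rare words: spelling questions (prefixes and suffixes up to length 4).
        qs += [f"pre[{n}]={w[:n]}" for n in range(1, 5) if len(w) >= n]
        qs += [f"suf[{n}]={w[-n:]}" for n in range(1, 5) if len(w) >= n]
    else:
        qs.append(f"w[i]={w}")   # common words: the word identity itself
    return qs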
Finding the best POS sequence
$$a_1^* \ldots a_n^* = \arg\max_{a_1 \ldots a_n} \prod_{i=1}^{n} p(a_i \mid b_i)$$
• Find the maximum-probability sequence of n tags
– Use a "top K" breadth-first search:
– Tag left to right, but maintain only the top K ranked hypotheses
• Best ranked hypothesis is not guaranteed to be optimal
• Alternative: Conditional random fields (Lafferty et al,
2001)
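A sketch of the "top K" search, assuming a hypothetical scoring function p_tag(tag, words, i, history) that applies the maxent model at one position:

def beam_search(words, tagset, p_tag, K=5):
    """Tag left to right, keeping only the top-K ranked partial tag sequences."""
    beam = [([], 1.0)]                       # (tag sequence so far, probability)
    for i in range(len(words)):
        candidates = []
        for history, prob in beam:
            for tag in tagset:
                candidates.append((history + [tag],
                                   prob * p_tag(tag, words, i, history)))
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:K]
    return beam[0][0]                        # best-ranked, not guaranteed optimal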
Performance
Domain                                           Word accuracy   Unknown word accuracy   Sentence accuracy
English: Wall St. Journal                        96.6%           85.3%                   47.3%
Spanish: CRATER corpus (with re-mapped tagset)   97.7%           83.3%                   60.4%
Summary
• Errors:
– are typically the words that are difficult to annotate
• that, about, more
• Architecture
– Can be ported easily for similar tasks with different tag sets,
esp. named entity detection
• Name detector
– Tags = { begin_Name, continue_Name, other }
– Sequence probability can be used downstream
• Available for download
– MXPOST & MXTERMINATOR
Maxent model for Keystone content-match
• Yahoo’s Keystone content-match uses a click model
$$P(\text{click} \mid \text{page},\text{ad},\text{user}) = \frac{1}{Z(\text{page},\text{ad},\text{user})} \prod_{j} \alpha_j^{f_j(\text{click},\,\text{page},\,\text{ad},\,\text{user})}$$
• Use features of the page, ad, and user to predict the click
• Use cases:
– Select ads with the (page, ad) cosine score, then use the click model to re-rank
– Select ads directly with click model score
Maxent model for Keystone content-match
• Outcomes: click (1) or no click (0)
• Questions:
– Unigrams, phrases, categories on the page side
– Unigrams, phrases, categories on the ad side
– The user’s BT category
• Feature selection
– Count cutoff, mutual information
Some recent work
• Recent work
– Using the (page,ad) cosine score (Opal) as a feature
– Using page domain → ad bid phrase mappings
– Using user BT → ad bid phrase mappings
– Using user age+gender → bid phrase mappings
• Contact
– Andy Hatch (aohatch)
– Abraham Bagherjeiran (abagher)
Maxent on the grid
• Several grid maxent implementations at Yahoo
• Correction-Free GIS (Curran and Clark, 2003)
– Implemented for verification and lab session
– Not product-ready code!
• Input for each iteration
– Training data format:
[label] [weight] [q1 … qN]
– Feature set format:
[q] [label] [parameter]
• Output for each iteration
– New feature set format:
[q] [label] [new parameter]
Maxent on the grid (cont’d)
• Map phase (parallelization across training data)
– Collect observed feature expectations
– Collect model’s feature expectations w.r.t. current model
– Use feature name as key
• Reduce phase (parallelization across model parameters)
– For each key (feature):
• Sum up observed and model feature expectations
• Do the parameter update
– Write the new model
• Repeat for N iterations
Maxent on the grid (cont’d)
• maxent_mapper
– args: [file of model params]
– stdin: [training data, one instance per line]
– stdout: [feature name] [observed val] [current param] [model val]
• maxent_reducer
– args: [iteration] [correction constant]
– stdin: ( input from maxent_mapper, sorted by key )
– stdout: [feature name] [new param]
• Uses Hadoop streaming; can also be run off the grid
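A much-simplified Python sketch of the mapper and reducer described above, following the I/O formats listed on this slide; the lab implementation is not Python and handles additional details, so treat this purely as an outline.

import sys
from collections import defaultdict

def load_model(path):
    """Feature set format: [q] [label] [parameter]."""
    model = {}
    for line in open(path):
        q, label, param = line.split()
        model[(q, label)] = float(param)
    return model

def mapper(model, stdin=sys.stdin, out=sys.stdout):
    """Reads training data ([label] [weight] [q1 ... qN]) and emits, per feature:
    [feature name] \t [observed val] \t [current param] \t [model val]."""
    labels = {label for (_q, label) in model}
    obs = defaultdict(float)
    exp = defaultdict(float)
    for line in stdin:
        fields = line.split()
        label, weight, questions = fields[0], float(fields[1]), fields[2:]
        for q in questions:                       # observed expectations
            if (q, label) in model:
                obs[(q, label)] += weight
        scores = {a: 1.0 for a in labels}         # model expectations
        for a in labels:
            for q in questions:
                scores[a] *= model.get((q, a), 1.0)
        z = sum(scores.values())
        for a in labels:
            for q in questions:
                if (q, a) in model:
                    exp[(q, a)] += weight * scores[a] / z
    for (q, a), param in model.items():
        out.write(f"{q} {a}\t{obs[(q, a)]}\t{param}\t{exp[(q, a)]}\n")

def reducer(C, stdin=sys.stdin, out=sys.stdout):
    """Sums the mappers' statistics per feature and writes [feature name] [new param]."""
    totals = defaultdict(lambda: [0.0, 1.0, 0.0])   # observed, current param, model
    for line in stdin:
        key, obs, param, mod = line.rstrip("\n").split("\t")
        totals[key][0] += float(obs)
        totals[key][1] = float(param)               # same current param from every mapper
        totals[key][2] += float(mod)
    for key, (obs, param, mod) in totals.items():
        new_param = param * (obs / mod) ** (1.0 / C) if mod > 0 else param
        out.write(f"{key}\t{new_param}\n")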
Lab session
• Log in to a UNIX machine or Mac
• mkdir maxent_lab; cd maxent_lab
• svn checkout
svn+ssh://[Link]/yahoo/adsciences/contextual
advertising/streaming_maxent/trunk/GIS .
• cd src; make clean all
• cd ../unit_test
• ./doloop
Lab Exercise: Sentence Boundary Detection
• Problem: given a “.” in free text, classify it:
– Yes, it is a sentence boundary
– No, it is not
• Not a super-hard problem, but not super-easy!
– A hand-coded baseline can achieve a high result
– With foreign languages, hand-coding is tougher
• cd data
• The Penn Treebank corpus: [Link]
– gunzip -c [Link] | head
– Newline indicates sentence boundary
– The Penn Treebank tokenizes text for NLP
• Undid tokenization as much as possible for this exercise
Lab: data generation
• cd ../lab/data
• Look at [Link]
– creates train, development-test, and test splits
– Feature extraction
• [true label] [weight] [q1] … [qN]
no 1.0 *default* prefix=Nov
yes 1.0 *default* prefix=29
• *default* is always on
– Model can estimate prior probability of Yes and No
• Prefix: char sequence before “.” up to space char
• run ./[Link]
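A sketch of the feature extraction just described, assuming one sentence per line in the input (as in the lab data); the function name and details are illustrative, not the actual [Link] script. Each yielded tuple would then be written out in the [true label] [weight] [q1] … [qN] format.

def boundary_events(lines):
    """Yield (label, weight, questions) for each '.' in the corpus.

    Each input line holds one sentence, so a '.' that ends the line is a
    boundary ('yes') and any other '.' is not ('no').
    """
    for line in lines:
        line = line.rstrip("\n")
        for i, ch in enumerate(line):
            if ch != ".":
                continue
            label = "yes" if line[i + 1:].strip() == "" else "no"
            # Prefix question: character sequence before the '.' up to a space.
            prefix = line[:i].rsplit(" ", 1)[-1]     # e.g. 'Nov' in 'Nov. 29.'
            yield (label, 1.0, ["*default*", f"prefix={prefix}"])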
Lab: feature selection and training
• cd ../train
• Look at dotrain
– selectfeatures
• select features with frequency >= cutoff
– train
• find correction constant
• Iterate (each iteration is one map/reduce job)
– maxent_mapper: collect stats
– maxent_reducer: find update
• Look at dotest
– [Link]: Classifies as “yes” if prob > 0.5
– Evaluation: # of correctly classified test instances
• Run ./dotrain
Lab: matching expectations
• GIS should bring model expectation closer to observed
expectation
• After the 1st map ([Link]):

Feature          Observed   Parameter    Model
prefix=$1 no     462        1            233
prefix=$1 yes    4          1            233

• After the 9th map ([Link]):

Feature          Observed   Parameter    Model
prefix=$1 no     462        1.51773      461.574
prefix=$1 yes    4          0.00731677   4.42558
Lab: results
• Log-likelihood of training data must increase
[Figure: log likelihood of the training data by iteration (1-9), increasing from about -35,000 toward 0]
• Accuracy:
– Train: 46670 correct out of 47771, or 97.70%
– Dev: 14940 correct out of 15579, or 95.90%
Lab: your turn!!!
• Beat this result (on the development set only)!
• Things to try
– Feature extraction
• Look at data and find other things that would be useful for
sentence boundary detection
• data/[Link]
– Suffix features
– Feature classes
– Feature selection
• Pay attention to the number of features
– Number of iterations
– Pay attention to train vs. test accuracy
Lab: Now let’s try the real test set
• Take your best model, and try it on the test set
./dotest [Link] ../data/te
• Did you beat the baseline?
13937 correct out of 14586, or 95.55%
• Who has the highest result?