Natural Language Processing
with Deep Learning
Language Models and
Recurrent Neural Networks
Overview
Today we will:
• Introduce a new NLP task
  • Language Modeling
  …which motivates…
• Introduce a new family of neural networks
  • Recurrent Neural Networks (RNNs)
These are two of the most important ideas for the rest of the class!
Language Modeling
• Language Modeling is the task of predicting what word comes next.
  [Example: “the students opened their ___” → books? laptops? exams? minds?]
• More formally: given a sequence of words x^{(1)}, x^{(2)}, \ldots, x^{(t)}, compute the probability distribution of the next word x^{(t+1)}:
  P(x^{(t+1)} \mid x^{(t)}, \ldots, x^{(1)})
  where x^{(t+1)} can be any word in the vocabulary V = \{w_1, \ldots, w_{|V|}\}
• A system that does this is called a Language Model.
Language Modeling
• You can also think of a Language Model as a system that
assigns probability to a piece of text.
• For example, if we have some text x^{(1)}, \ldots, x^{(T)}, then the probability of this text (according to the Language Model) is:
  P(x^{(1)}, \ldots, x^{(T)}) = P(x^{(1)}) \times P(x^{(2)} \mid x^{(1)}) \times \dots \times P(x^{(T)} \mid x^{(T-1)}, \ldots, x^{(1)}) = \prod_{t=1}^{T} P(x^{(t)} \mid x^{(t-1)}, \ldots, x^{(1)})
  (each conditional factor is what our LM provides)
You use Language Models every day!
n-gram Language Models
the students opened their ___
• Question: How to learn a Language Model?
• Answer (pre-Deep Learning): learn an n-gram Language Model!
• Definition: An n-gram is a chunk of n consecutive words.
  • unigrams: “the”, “students”, “opened”, “their”
  • bigrams: “the students”, “students opened”, “opened their”
  • trigrams: “the students opened”, “students opened their”
  • 4-grams: “the students opened their”
• Idea: Collect statistics about how frequent different n-grams are, and use these to predict the next word.
n-gram Language Models
• First we make a simplifying assumption: x^{(t+1)} depends only on the preceding n-1 words.

  P(x^{(t+1)} \mid x^{(t)}, \ldots, x^{(1)}) \approx P(x^{(t+1)} \mid x^{(t)}, \ldots, x^{(t-n+2)})   (assumption: condition only on the preceding n-1 words)

  = \frac{P(x^{(t+1)}, x^{(t)}, \ldots, x^{(t-n+2)})}{P(x^{(t)}, \ldots, x^{(t-n+2)})}   (definition of conditional prob: prob of an n-gram divided by prob of an (n-1)-gram)

• Question: How do we get these n-gram and (n-1)-gram probabilities?
• Answer: By counting them in some large corpus of text!

  \approx \frac{\mathrm{count}(x^{(t+1)}, x^{(t)}, \ldots, x^{(t-n+2)})}{\mathrm{count}(x^{(t)}, \ldots, x^{(t-n+2)})}   (statistical approximation)
n-gram Language Models: Example
Suppose we are learning a 4-gram Language Model.

  as the proctor started the clock, the students opened their ___

The 4-gram model discards everything except the last 3 words and conditions on “students opened their”:

  P(w \mid \text{students opened their}) = \frac{\mathrm{count}(\text{students opened their } w)}{\mathrm{count}(\text{students opened their})}

For example, suppose that in the corpus:
• “students opened their” occurred 1000 times
• “students opened their books” occurred 400 times
  → P(books | students opened their) = 0.4
• “students opened their exams” occurred 100 times
  → P(exams | students opened their) = 0.1
(Should we have discarded the “proctor” context?)
(A small counting sketch follows below.)
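The counting recipe above is straightforward to implement. Here is a minimal sketch (not the lecture's code; the function names and toy corpus are invented for illustration) of estimating 4-gram probabilities by counting:

```python
from collections import Counter, defaultdict

def train_ngram_lm(tokens, n):
    """Count n-grams and their (n-1)-gram contexts in a list of tokens."""
    context_counts = Counter()
    next_word_counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])
        next_word = tokens[i + n - 1]
        context_counts[context] += 1
        next_word_counts[context][next_word] += 1
    return context_counts, next_word_counts

def prob(next_word, context, context_counts, next_word_counts):
    """P(next_word | context) = count(context + next_word) / count(context)."""
    context = tuple(context)
    if context_counts[context] == 0:
        return 0.0  # sparsity problem: this context was never observed
    return next_word_counts[context][next_word] / context_counts[context]

corpus = "the students opened their books . the students opened their exams .".split()
counts = train_ngram_lm(corpus, n=4)
print(prob("books", ("students", "opened", "their"), *counts))   # 0.5 in this toy corpus
```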
Sparsity Problems with n-gram Language Models
Sparsity Problem 1
Problem: What if “students opened their w” never occurred in the data? Then w has probability 0!
(Partial) Solution: Add a small 𝛿 to the count for every w ∈ V. This is called smoothing.

Sparsity Problem 2
Problem: What if “students opened their” never occurred in the data? Then we can’t calculate the probability for any w!
(Partial) Solution: Just condition on “opened their” instead. This is called backoff.
(See the sketch of both fixes below.)

Note: Increasing n makes sparsity problems worse.
Typically we can’t have n bigger than 5.
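Both partial solutions are small modifications of the counting estimator. A rough sketch, assuming the hypothetical count tables from the previous sketch plus a `vocab` set:

```python
def smoothed_prob(next_word, context, context_counts, next_word_counts, vocab, delta=0.1):
    """Add-delta smoothing: (count + delta) / (count(context) + delta * |V|)."""
    context = tuple(context)
    numer = next_word_counts[context][next_word] + delta
    denom = context_counts[context] + delta * len(vocab)
    return numer / denom

def backoff_prob(next_word, context, models):
    """models: (context_counts, next_word_counts) pairs for decreasing n,
    e.g. [4-gram, trigram, bigram]. Back off to a shorter context whenever
    the longer one was never observed."""
    for i, (context_counts, next_word_counts) in enumerate(models):
        ctx = tuple(context[i:])              # drop words from the left as we back off
        if context_counts[ctx] > 0:
            return next_word_counts[ctx][next_word] / context_counts[ctx]
    return 0.0
```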
Storage Problems with n-gram Language Models
Storage: Need to store counts for all n-grams you saw in the corpus.
Increasing n or increasing the corpus increases model size!
n-gram Language Models in practice
• You can build a simple trigram Language Model over a 1.7 million word corpus (Reuters: business and financial news) in a few seconds on your laptop*

Condition on “today the” and get a probability distribution over the next word:
  company 0.153
  bank    0.153
  price   0.077
  italian 0.039
  emirate 0.039
  …

Sparsity problem: not much granularity in the probability distribution. Otherwise, seems reasonable!
* Try for yourself: https://nlpforhackers.io/language-models/
Generating text with an n-gram Language Model
• You can also use a Language Model to generate text.

Condition on “today the”, get the probability distribution, and sample the next word:
  company 0.153
  bank    0.153
  price   0.077
  italian 0.039
  emirate 0.039
  …
Generating text with an n-gram Language Model
• You can also use a Language Model to generate text.

Condition on “today the price”, get the probability distribution, and sample the next word:
  of  0.308
  for 0.050
  it  0.046
  to  0.046
  is  0.031
  …
Generating text with an n-gram Language Model
• You can also use a Language Model to generate text.

Condition on “today the price of”, get the probability distribution, and sample the next word:
  the  0.072
  18   0.043
  oil  0.043
  its  0.036
  gold 0.018
  …
Generating text with an n-gram Language Model
• You can also use a Language Model to generate text.

  today the price of gold
Generating text with an n-gram Language Model
• You can also use a Language Model to generate text.

  today the price of gold per ton , while production of shoe
  lasts and shoe industry , the bank intervened just after it
  considered and rejected an imf demand to rebuild depleted
  european stocks , sept 30 end primary 76 cts a share .

Surprisingly grammatical!
…but incoherent. We need to consider more than three words at a time if we want to model language well.
But increasing n worsens the sparsity problem and increases model size…
(A small sampling sketch follows below.)
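Here is a rough sketch of that sampling loop (hypothetical, reusing the count tables from the earlier n-gram sketch): condition on the last n-1 words, sample the next word from the resulting distribution, append it, and repeat.

```python
import random

def generate(context_counts, next_word_counts, seed, num_words=15):
    words = list(seed)                         # e.g. ("today", "the") for a trigram LM
    n_minus_1 = len(seed)
    for _ in range(num_words):
        context = tuple(words[-n_minus_1:])    # condition on the last n-1 words
        dist = next_word_counts[context]
        if not dist:                           # unseen context: stop (or back off)
            break
        candidates = list(dist.keys())
        weights = list(dist.values())          # proportional to counts, i.e. to P(w | context)
        words.append(random.choices(candidates, weights=weights)[0])
    return " ".join(words)
```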
How to build a neural Language Model?
• Recall the Language Modeling task:
• Input: sequence of words x^{(1)}, x^{(2)}, \ldots, x^{(t)}
• Output: prob dist of the next word P(x^{(t+1)} \mid x^{(t)}, \ldots, x^{(1)})
• How about a window-based neural model?
• We saw this applied to Named Entity Recognition in Lecture 3:
  [Figure: a window-based classifier over “museums in Paris are amazing” predicting LOCATION for the center word “Paris”]
A fixed-window neural Language Model
  as the proctor started the clock | the students opened their ___
  (discard everything to the left of the fixed window; condition only on “the students opened their”)
A fixed-window neural Language Model
[Figure: the fixed-window neural LM over “the students opened their”:
  words / one-hot vectors x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}
  concatenated word embeddings e = [e^{(1)}; e^{(2)}; e^{(3)}; e^{(4)}]
  hidden layer h = f(W e + b_1)
  output distribution \hat{y} = \mathrm{softmax}(U h + b_2) \in \mathbb{R}^{|V|}, e.g. placing high probability on “books”, “laptops”, …]
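A minimal PyTorch sketch of this kind of architecture (the class name and dimensions are invented, not the lecture's code): embed each word in the window, concatenate the embeddings, pass them through a hidden layer, and apply a softmax over the vocabulary.

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=200, window_size=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)               # e^(i) = E x^(i)
        self.hidden = nn.Linear(window_size * embed_dim, hidden_dim)   # h = f(W e + b1)
        self.output = nn.Linear(hidden_dim, vocab_size)                # softmax(U h + b2)

    def forward(self, window):           # window: (batch, window_size) word indices
        e = self.embed(window)           # (batch, window_size, embed_dim)
        e = e.reshape(e.size(0), -1)     # concatenate the window's embeddings
        h = torch.tanh(self.hidden(e))
        return torch.log_softmax(self.output(h), dim=-1)  # log prob dist over the next word
```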
A fixed-window neural Language Model
Improvements over n-gram LM:
• No sparsity problem
• Don’t need to store all observed n-grams

Remaining problems:
• Fixed window is too small
• Enlarging the window enlarges W
• Window can never be large enough!
• x^{(1)} and x^{(2)} are multiplied by completely different weights in W. No symmetry in how the inputs are processed.

We need a neural architecture that can process any length input.
Recurrent Neural Networks (RNN)
A family of neural architectures
Core idea: Apply the same weights W repeatedly
[Figure: an input sequence x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, … of any length feeds a chain of hidden states h^{(0)}, h^{(1)}, h^{(2)}, …; each hidden state is computed from the previous hidden state and the current input using the same weights W, and an output \hat{y}^{(t)} (optional) can be produced at every step]
An RNN Language Model
[Figure: the RNN-LM unrolled over “the students opened their”:
  words / one-hot vectors x^{(t)} \in \mathbb{R}^{|V|}
  word embeddings e^{(t)} = E x^{(t)}
  hidden states h^{(t)} = \sigma(W_h h^{(t-1)} + W_e e^{(t)} + b_1), where h^{(0)} is the initial hidden state
  output distribution \hat{y}^{(t)} = \mathrm{softmax}(U h^{(t)} + b_2) \in \mathbb{R}^{|V|}, e.g. placing high probability on “books”, “laptops”, …]
Note: this input sequence could be much longer, but this slide doesn’t have space!
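A minimal PyTorch sketch of these equations (hypothetical class name and dimensions; a real implementation would use nn.RNN or batched operations), showing the same weights W_h and W_e applied at every timestep:

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # e^(t) = E x^(t)
        self.W_e = nn.Linear(embed_dim, hidden_dim)
        self.W_h = nn.Linear(hidden_dim, hidden_dim)
        self.U = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):                     # tokens: (batch, seq_len) word indices
        batch, seq_len = tokens.shape
        h = torch.zeros(batch, self.W_h.out_features)   # h^(0): initial hidden state
        logits = []
        for t in range(seq_len):                   # same weights applied on every timestep
            e_t = self.embed(tokens[:, t])
            h = torch.sigmoid(self.W_h(h) + self.W_e(e_t))
            logits.append(self.U(h))               # unnormalized scores over the next word
        return torch.stack(logits, dim=1)          # (batch, seq_len, vocab_size)
```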
An RNN Language Model
RNN Advantages:
• Can process any length input
• Computation for step t can (in theory) use information from many steps back
• Model size doesn’t increase for longer input
• Same weights applied on every timestep, so there is symmetry in how inputs are processed.
RNN Disadvantages:
• Recurrent computation is slow
• In practice, difficult to access information from many steps back
(More on these later in the course)
Training an RNN Language Model
• Get a big corpus of text which is a sequence of words x^{(1)}, \ldots, x^{(T)}
• Feed into the RNN-LM; compute the output distribution \hat{y}^{(t)} for every step t.
  • i.e. predict the probability dist of every word, given the words so far
• Loss function on step t is the cross-entropy between the predicted probability distribution \hat{y}^{(t)} and the true next word y^{(t)} (one-hot for x^{(t+1)}):
  J^{(t)}(\theta) = CE(y^{(t)}, \hat{y}^{(t)}) = -\sum_{w \in V} y^{(t)}_w \log \hat{y}^{(t)}_w = -\log \hat{y}^{(t)}_{x_{t+1}}
• Average this to get the overall loss for the entire training set:
  J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J^{(t)}(\theta)
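In code, this loss is just the cross-entropy between the step-t prediction and the word at step t+1. A rough sketch, assuming the hypothetical RNNLM class from the previous sketch:

```python
import torch
import torch.nn.functional as F

def lm_loss(model, tokens):
    """tokens: (batch, seq_len) word indices for one batch of sentences."""
    logits = model(tokens[:, :-1])       # predict a distribution at steps 1..T-1
    targets = tokens[:, 1:]              # the true "next word" at each of those steps
    # cross_entropy averages -log yhat_{x^(t+1)} over all steps and sentences: J(theta)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```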
Training an RNN Language Model
[Figure: the corpus “the students opened their exams …” is fed into the RNN-LM; at each step the model outputs a predicted prob dist \hat{y}^{(t)}, and the step-t loss is the negative log probability the model assigned to the true next word:
  J^{(1)}(\theta) = negative log prob of “students”
  J^{(2)}(\theta) = negative log prob of “opened”
  J^{(3)}(\theta) = negative log prob of “their”
  J^{(4)}(\theta) = negative log prob of “exams”
  …
  Loss: J^{(1)}(\theta) + J^{(2)}(\theta) + J^{(3)}(\theta) + J^{(4)}(\theta) + … , averaged to give J(\theta) = \frac{1}{T} \sum_{t=1}^{T} J^{(t)}(\theta)]
Training an RNN Language Model
• However: Computing the loss and gradients across the entire corpus x^{(1)}, \ldots, x^{(T)} is too expensive!
• In practice, consider x^{(1)}, \ldots, x^{(T)} as a sentence (or a document)
• Recall: Stochastic Gradient Descent allows us to compute the loss and gradients for a small chunk of data, and update.
• Compute loss J(\theta) for a sentence (actually a batch of sentences), compute gradients and update weights. Repeat. (A sketch of this loop follows below.)
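A rough sketch of that loop (the `batches` iterable of token-index tensors is hypothetical, and `lm_loss` is the sketch from above):

```python
import torch

def train(model, batches, epochs=1, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for tokens in batches:               # tokens: (batch, seq_len) word indices
            optimizer.zero_grad()
            loss = lm_loss(model, tokens)    # cross-entropy loss for this batch of sentences
            loss.backward()                  # backpropagation through time
            optimizer.step()                 # update the weights, then repeat
```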
Backpropagation for RNNs
Question: What’s the derivative of J^{(t)}(\theta) w.r.t. the repeated weight matrix W_h?
Answer:
  \frac{\partial J^{(t)}}{\partial W_h} = \sum_{i=1}^{t} \left. \frac{\partial J^{(t)}}{\partial W_h} \right|_{(i)}
“The gradient w.r.t. a repeated weight is the sum of the gradient w.r.t. each time it appears”
Why?
Multivariable Chain Rule
For a function f(x, y) where x(t) and y(t) are themselves functions of t:
  \frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}
Source:
https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version
Backpropagation for RNNs: Proof sketch
In our example, W_h appears once at every timestep i, and J^{(t)} depends on W_h through each of these appearances. Apply the multivariable chain rule:
  \frac{\partial J^{(t)}}{\partial W_h} = \sum_{i=1}^{t} \left. \frac{\partial J^{(t)}}{\partial W_h} \right|_{(i)} \frac{\partial W_h\big|_{(i)}}{\partial W_h} = \sum_{i=1}^{t} \left. \frac{\partial J^{(t)}}{\partial W_h} \right|_{(i)} \quad \text{since } \frac{\partial W_h\big|_{(i)}}{\partial W_h} = 1
Source:
https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/differentiating-vector-valued-functions/a/multivariable-chain-rule-simple-version
Backpropagation for RNNs
Question: How do we calculate this?
Answer: Backpropagate over timesteps i = t, …, 0, summing gradients as you go.
This algorithm is called “backpropagation through time”.
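A tiny scalar example (not from the lecture) that can be checked by hand: the same weight w is used at two timesteps, and the gradient autograd computes equals the sum of the per-appearance contributions.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
h0 = torch.tensor(1.0)
h1 = w * h0          # w used at step 1
h2 = w * h1          # w used again at step 2, so h2 = w^2 * h0
h2.backward()
# dh2/dw = 2*w*h0 = 4.0
#   = (contribution from the step-2 appearance, holding h1 fixed: h1 = 2.0)
#   + (contribution through step 1: dh2/dh1 * dh1/dw = w * h0 = 2.0)
print(w.grad)        # tensor(4.)
```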
Generating text with an RNN Language Model
Just like an n-gram Language Model, you can use an RNN Language Model to generate text by repeated sampling. The sampled output is the next step’s input.
[Figure: starting from “my”, each step samples the next word from \hat{y}^{(t)} and feeds it back in as the next input: my → favorite → season → is → spring]
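A rough sketch of this repeated-sampling loop (the word↔index mappings are hypothetical, and the model is the RNNLM sketch from earlier); re-running the model over the whole prefix at each step is wasteful but keeps the sketch simple:

```python
import torch

def generate_text(model, word2id, id2word, seed="my", num_words=10):
    tokens = [word2id[seed]]
    for _ in range(num_words):
        inp = torch.tensor([tokens])                  # (1, current_length)
        logits = model(inp)[0, -1]                    # scores for the next word
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, 1).item()  # sample from yhat^(t)
        tokens.append(next_id)                        # sampled output is the next input
    return " ".join(id2word[i] for i in tokens)
```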
Generating text with an RNN Language Model
• Let’s have some fun!
• You can train an RNN-LM on any kind of text, then generate text in that style.
• RNN-LM trained on Obama speeches:
  Source: https://medium.com/@samim/obama-rnn-machine-generated-political-speeches-c8abd18a2ea0
• RNN-LM trained on Harry Potter:
  Source: https://medium.com/deep-writing/harry-potter-written-by-artificial-intelligence-8a9431803da6
• RNN-LM trained on recipes:
  Source: https://gist.github.com/nylki/1efbaa36635956d35bcc
• RNN-LM trained on paint color names:
  This is an example of a character-level RNN-LM (predicts what character comes next)
  Source: http://aiweirdness.com/post/160776374467/new-paint-colors-invented-by-neural-network
Evaluating Language Models
• The standard evaluation metric for Language Models is perplexity:
  \text{perplexity} = \prod_{t=1}^{T} \left( \frac{1}{P_{LM}(x^{(t+1)} \mid x^{(t)}, \ldots, x^{(1)})} \right)^{1/T}
  (the inverse probability of the corpus, according to the Language Model, normalized by the number of words)
• This is equal to the exponential of the cross-entropy loss J(\theta):
  \prod_{t=1}^{T} \left( \frac{1}{\hat{y}^{(t)}_{x_{t+1}}} \right)^{1/T} = \exp\left( \frac{1}{T} \sum_{t=1}^{T} -\log \hat{y}^{(t)}_{x_{t+1}} \right) = \exp(J(\theta))
• Lower perplexity is better!
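Since perplexity = exp(J(θ)), it can be computed directly from the average cross-entropy loss. A rough sketch reusing the hypothetical `lm_loss` from above:

```python
import math
import torch

def perplexity(model, batches):
    total_loss, total_words = 0.0, 0
    with torch.no_grad():
        for tokens in batches:
            n_words = tokens[:, 1:].numel()                   # number of predicted words
            total_loss += lm_loss(model, tokens).item() * n_words
            total_words += n_words
    return math.exp(total_loss / total_words)                 # exp of average cross-entropy
```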
RNNs have greatly improved perplexity
[Figure: perplexity results on a benchmark, from an n-gram model down through increasingly complex RNNs; perplexity improves (lower is better)]
Source: https://research.fb.com/building-an-efficient-neural-language-model-over-a-billion-words/
Why should we care about Language Modeling?
• Language Modeling is a benchmark task that helps us
measure our progress on understanding language
• Language Modeling is a subcomponent of many NLP tasks,
especially those involving generating text or
estimating the probability of text:
• Predictive typing
• Speech recognition
• Handwriting recognition
• Spelling/grammar correction
• Authorship identification
• Machine translation
• Summarization
• Dialogue
• etc.
Recap
• Language Model: A system that predicts the next word
• Recurrent Neural Network: A family of neural networks that:
• Take sequential input of any length
• Apply the same weights on each step
• Can optionally produce output on each step
• Recurrent Neural Network ≠ Language Model
• We’ve shown that RNNs are a great way to build an LM.
• But RNNs are useful for much more!
RNNs can be used for tagging
e.g. part-of-speech tagging, named entity recognition
[Figure: an RNN over “the startled cat knocked over the vase” predicts a tag at each position: DT JJ NN VBN IN DT NN]
RNNs can be used for sentence classification
e.g. sentiment classification
[Figure: an RNN runs over “overall I enjoyed the movie a lot”; a sentence encoding computed from its hidden states is fed to a classifier, which predicts “positive”]
How to compute the sentence encoding?
• Basic way: use the final hidden state
• Usually better: take an element-wise max or mean of all hidden states
(A sketch of both options follows below.)
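A minimal PyTorch sketch (hypothetical class name and dimensions) contrasting the two choices of sentence encoding:

```python
import torch
import torch.nn as nn

class RNNSentenceClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, embed_dim=100, hidden_dim=200,
                 pooling="mean"):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)
        self.pooling = pooling

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        hidden_states, _ = self.rnn(self.embed(tokens))
        if self.pooling == "final":
            encoding = hidden_states[:, -1]           # basic way: final hidden state
        else:
            encoding = hidden_states.mean(dim=1)      # usually better: mean of all states
        return self.classifier(encoding)              # class scores, e.g. positive/negative
```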
RNNs can be used as an encoder module
e.g. question answering, machine translation, many other tasks!
[Figure: question answering example.
  Context: Ludwig van Beethoven was a German composer and pianist. A crucial figure …
  Question: what nationality was Beethoven ?
  Answer: German]
Here the RNN acts as an encoder for the Question (the hidden states represent the Question). The encoder is part of a larger neural system.
RNN-LMs can be used to generate text
e.g. speech recognition, machine translation, summarization
[Figure: an RNN-LM generates “what’s the weather”, conditioned on an Input (audio); starting from <START>, each generated word is fed back in as the next input]
This is an example of a conditional language model.
We’ll see Machine Translation in much more detail later.
A note on terminology
RNN described in this lecture = “vanilla RNN”
Next lecture: You will learn about other RNN flavors
like GRU and LSTM and multi-layer RNNs
By the end of the course: You will understand phrases like
“stacked bidirectional LSTM with residual connections and self-attention”
Next time
• Problems with RNNs!
  • Vanishing gradients
  …which motivates…
• Fancy RNN variants!
  • LSTM
  • GRU
  • multi-layer
  • bidirectional