SEMINARS OF THE ESCOLA DE INFORMÁTICA & COMPUTAÇÃO
RECENT ADVANCES IN
NATURAL LANGUAGE
PROCESSING WITH DEEP
LEARNING
Prof. Eduardo Bezerra
ebezerra@[Link]
October 8, 2020
Summary
2
Introduction
NLP Periods
Symbolic-based
Corpus-based
Neural-based
Conclusions
3
Introduction
What is NLP?
4
At the intersection of linguistics,
computer science, and artificial
intelligence.
It deals with processing and analyzing
large amounts of natural language data.
Here, “processing and analyzing” means extracting
context and meaning.
NLP is popular, but it is hard!
5
Homonymy, polysemy, …
Jaguar is a luxury vehicle brand of Jaguar Land Rover.
The jaguar is an animal of the genus Panthera native to the Americas.
Natural languages are unstructured,
redundant and ambiguous.
Enraged cow injures farmer with ax.
NLP Tasks/Applications
6
Text classification, clustering, summarization
Machine translation
Conversational chatbots
Question answering
Speech synthesis & recognition
Text generation
Auto-correcting
7
NLP Periods
NLP periods
8
9
Symbolic-based NLP (1950s-1990s)
Symbolic-based NLP
10
Georgetown Experiment (1954)
ELIZA (1964-1966)
Cyc Project (1984)
WordNet (1985)
1950s-1990s
Georgetown-IBM experiment
11
Machine translation: automatic
translation of Russian sentences into
English.
“The experiment was considered a
success and encouraged
governments to invest in
computational linguistics. The
project managers claimed that
machine translation would be a
reality in three to five years.”
1954
ELIZA
12
“Natural language” conversation through
pattern matching.
“...sister...” → “Tell me more about your
family.”
1966
Cyc Project
13
1984-
WordNet
14
155,327 words
organized in
175,979 synsets for
a total of 207,016
word-sense pairs
1985-
WordNet – graph fragment
15
[Figure: fragment of the WordNet graph. Nodes include chicken, hen, duck, goose, hawk, bird, animal, creature, egg, feather, wing, beak, claw, leg, meat, and sound; edges carry relations such as Is_a, Part, Purpose, Means, Typ_obj, Typ_subj, Caused_by, and Not_is_a.]
16
Corpus-based NLP (1990s-2010s)
Corpus-based NLP
17
1990s-2010s
Corpus-based NLP (aka ML-based)
18
Successful applications of ML methods to
text data
e.g., SVM, HMM
1990s-2010s
Corpus-based NLP (aka ML-based)
19
Text Mining
1990s-2010s
20
Neural-based NLP (2010s-present)
21
Conception, gestation, …, birth!
22
"There is a moment of conception and
a moment of birth, but between them
there is a long period of gestation."
Jonas Salk, 1914-1995
Distributional Hypothesis
23
“The more semantically similar two words are, the more
distributionally similar they will be in turn, and thus the more
that they will tend to occur in similar linguistic contexts.”
“words that are similar in meaning occur in similar contexts”
1950s
Distributional Hypothesis
24
“words that are similar in meaning occur in similar contexts”
It would be marvelous to watch a match between Kasparov and Fischer.
It would be fantastic to watch a match between Kasparov and Fischer.
(similar words: “marvelous” / “fantastic”)
Zellig Harris, 1909-1992
1950s
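As an illustration (my own addition, not on the original slides), the sketch below builds simple co-occurrence counts from a toy two-sentence corpus: words used in similar contexts, such as “marvelous” and “fantastic”, end up with similar context counts. The corpus and the window size are arbitrary choices.

```python
from collections import Counter, defaultdict

# Toy corpus echoing the slide's example sentences.
corpus = [
    "it would be marvelous to watch a match between kasparov and fischer",
    "it would be fantastic to watch a match between kasparov and fischer",
]

window = 2                     # how many neighbors on each side count as "context"
cooc = defaultdict(Counter)    # word -> counts of its context words

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                cooc[word][tokens[j]] += 1

# "marvelous" and "fantastic" have identical context counts in this toy corpus.
print(cooc["marvelous"])
print(cooc["fantastic"])
```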
Vector Space Model (for Information Retrieval)
25
SMART Information Retrieval System
term-doc matrix
First attempt to model text elements as vectors
1960s
Gerard Salton, 1927-1995
Vector Space Model
26
Similarity between docs (sentences, words)
1960s
Gerard Salton, 1927-1995
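A minimal sketch of the vector space model idea (an illustration of mine, not the SMART system itself): build a term-document count matrix and compare documents with cosine similarity. The library and the toy corpus are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply today",
]

# Term-document matrix: one row per document, one column per term.
X = CountVectorizer().fit_transform(docs)

# Cosine similarity between document vectors: the first two documents share
# many terms, so their similarity is much higher than with the third one.
print(cosine_similarity(X))
```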
Distributed Representations
27
1986
Latent semantic analysis (LSA)
28
1988
Latent semantic analysis (LSA)
29
1988
Latent semantic analysis (LSA)
30
LSA creates context vectors
1988
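A hedged sketch of the LSA idea: a truncated SVD of the term-document matrix yields low-dimensional “context vectors” for the documents. The corpus and the number of components are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "doctors treat patients in the hospital",
    "nurses and doctors work in hospitals",
    "the striker scored a goal in the match",
    "the team won the football match",
]

X = CountVectorizer().fit_transform(docs)     # term-document counts
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vectors = svd.fit_transform(X)            # dense, low-dimensional "context vectors"

# Medical documents end up close to each other, and so do the sports ones.
print(doc_vectors)
```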
Distributed representation – an example
31
Image by Garrett Hoffman
Distributed representation – an example
32
Image by Garrett Hoffman
Distributed representation – an example
33
Image by Garrett Hoffman
Conception, gestation, …, birth!
34
Conception, gestation
Distributional hypothesis
Vector Space model
LSA
Distributed representations
Now, for the birth of Deep Learning based
NLP…
Neural-based NLP (aka Deep Learning based)
35
Most SOTA results in NLP today are
obtained through Deep Learning
methods.
One of the main achievements of this
period is related to building rich
distributed representations of text
objects through deep neural networks.
2010s-present
word2vec
36
Efficient Estimation of Word Representations in Vector Space, September 7th, 2013.
Distributed Representations of Words and Phrases and their Compositionality, October 16th, 2013. (20K+ citations)
Tomas Mikolov
2013
Idea: each word can be represented by a fixed-length numeric vector. Words of similar meanings have similar vectors.
word2vec
37
In word2vec, a single hidden layer NN is trained to
perform a certain “fake” task.
Skip-gram: predicting surrounding context words
given a center word.
CBOW: predicting a center word from the surrounding
context.
But this NN is not actually used after training!
Instead, the goal is to learn the weights of the hidden
layer: these weights are the “word vectors”.
word2vec: skip-gram alternative
38
The task: given a specific word w in the middle
of a sentence (the input word), look at the
words nearby and pick one word at random.
The solution: train an ANN to produce the
probability (for every word in the vocabulary) of
being nearby w.
“nearby” is defined by a "window size" hyperparameter (typical value: 5).
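To make the skip-gram task concrete, here is a small sketch (my own, not from the talk) that generates the (center word, context word) pairs the network is trained on, using the window-size hyperparameter described above.

```python
def skipgram_pairs(tokens, window=5):
    """Generate (center, context) training pairs as in skip-gram."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps over the lazy dog".split()
for center, context in skipgram_pairs(sentence, window=2):
    print(center, "->", context)
```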
word2vec
39
Each word in the vocabulary is
represented using one-hot encoding (a.k.a. a
local representation!).
Credits: Marco Bonzanini
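A tiny sketch of one-hot (local) encoding for a toy vocabulary; the vocabulary and its ordering are arbitrary.

```python
import numpy as np

vocab = ["ants", "bee", "cat", "dog"]           # toy vocabulary
index = {w: i for i, w in enumerate(vocab)}     # word -> position in the vocabulary

def one_hot(word):
    """Local representation: all zeros except a 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[index[word]] = 1.0
    return v

print(one_hot("cat"))   # [0. 0. 1. 0.]
```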
word2vec
40
Credits: Marco Bonzanini
word2vec
41
Credits: Marco Bonzanini
word2vec
42
Skip-gram NN
architecture
The number of neurons in the hidden layer (a hyperparameter) determines the size of the embedding.
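A minimal PyTorch sketch of this architecture (an assumption of mine, not code from the talk): the hidden layer's width is the embedding size, and the output layer scores every word in the vocabulary.

```python
import torch
import torch.nn as nn

vocab_size = 10_000
embedding_dim = 300   # number of hidden neurons = size of the word vectors

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.hidden = nn.Embedding(vocab_size, embedding_dim)  # hidden-layer weights
        self.output = nn.Linear(embedding_dim, vocab_size)     # scores over the vocabulary

    def forward(self, center_word_ids):
        h = self.hidden(center_word_ids)   # look up the center word's vector
        return self.output(h)              # logits for every possible context word

model = SkipGram(vocab_size, embedding_dim)
print(model(torch.tensor([42])).shape)     # torch.Size([1, 10000])
# After training, model.hidden.weight holds the word vectors (the embeddings).
```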
word2vec
43
word2vec
44
word2vec captures context similarity:
If words wj and wk have similar contexts, then the
model needs to output very similar results for
them.
One way for the network to do this is to make the word
vectors for wj and wk very similar.
So, if two words have similar contexts, the network
is motivated to learn similar word vectors for
them.
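In practice, word2vec is usually trained with an off-the-shelf library; a hedged sketch using gensim (not mentioned on the slides), on a toy corpus:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (a real corpus would be far larger).
sentences = [
    "the king rules the kingdom".split(),
    "the queen rules the kingdom".split(),
    "the dog chases the cat".split(),
    "the cat chases the mouse".split(),
]

# sg=1 selects skip-gram; window=5 as on the slides; vector_size is the embedding dimension.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, epochs=100)

# Words that occur in similar contexts end up with similar vectors.
print(model.wv.most_similar("king", topn=3))
```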
word2vec
45
Credits: [Link]
Embedding models
46
Word2Vec
GloVe
SkipThoughts
Paragraph2Vec
Doc2Vec
FastText
Currently, the distributional hypothesis, realized through vector embedding models generated by ANNs, is used pervasively in NLP.
Encoder-Decoder models (aka seq2seq models)
47
Encoder
Decoder
[Link]
“Classical” Encoder-Decoder model
48
“The idea is to use one LSTM to read the input sequence,
one timestep at a time, to obtain large fixed-dimensional
vector representation, and then to use another LSTM to
extract the output sequence from that vector.” (Sutskever et al., 2014)
2014
recurrent architecture
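A minimal PyTorch sketch of this classical encoder-decoder pattern (my own simplification, with made-up dimensions): one LSTM reads the input into a fixed-size state, and another LSTM unrolls the output sequence from that state.

```python
import torch
import torch.nn as nn

src_vocab, tgt_vocab, emb, hidden = 1000, 1000, 64, 128

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # The encoder reads the whole input and summarizes it in the state (h, c).
        _, state = self.encoder(self.src_emb(src_ids))
        # The decoder starts from that fixed-size state and produces the output sequence.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)   # logits for each target position

model = Seq2Seq()
logits = model(torch.randint(0, src_vocab, (2, 7)),   # batch of 2 source sequences
               torch.randint(0, tgt_vocab, (2, 5)))   # batch of 2 target prefixes
print(logits.shape)                                   # torch.Size([2, 5, 1000])
```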
Encoder-Decoder model with Attention
49
2015
recurrent architecture
Attention mechanisms in recurrent NNs
50
2015
Bahdanau et al. (2015)
Transformers
51
ATTENTION
“We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.”
2017
feedforward architecture!
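As a concrete illustration of the attention mechanism the Transformer is built on, here is a NumPy sketch of scaled dot-product attention (shapes are illustrative; this is not the paper's code).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how well each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
    return weights @ V                                 # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))   # one value vector per key
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```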
Transformers
52
Transformers are the current SOTA neural
architecture when it comes to producing text
representations for use in most NLP tasks.
From Vaswani et al. (2017)
Famous Transformer Models
53
BERT (Bidirectional Encoder Representations from Transformers)
GPT-2 (Generative Pre-trained Transformer 2)
GPT-3
2018-2020
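These pretrained models are straightforward to try out; a hedged sketch using the Hugging Face transformers library (not cited on the slides), which downloads the model weights on first use:

```python
from transformers import pipeline

# BERT-style masked language modeling: fill in the blank.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The jaguar is a [MASK] native to the Americas."))

# GPT-2: open-ended text generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Natural language processing is", max_length=30))
```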
54
Conclusions
Takeaway notes
55
SOTA results in most NLP tasks are currently
obtained with neural-based methods.
Neural-based NLP is recent, but it relies on
older ideas.
The attention mechanism is a novel and very
promising idea.
Pretrained models
56
[Link]
[Link]
Neural Nets need a Vapnik!
57
The theory behind the generalization properties of ANNs is not yet completely understood.
TODO: Natural Language Understanding
58
Headlines:
Enraged Cow Injures Farmer With Ax
Hospitals Are Sued by 7 Foot Doctors
Ban on Nude Dancing on Governor’s Desk
Iraqi Head Seeks Arms
Local HS Dropouts Cut in Half
Juvenile Court to Try Shooting Defendant
Stolen Painting Found by Tree
Humans use their underlying understanding of the world as context.
Source: CS188
TODO: Common Sense Knowledge
59
"If a mother has a son, then the son is younger than
the mother and remains younger for his entire life."
"If President Trump is in Washington, then his left foot
is also in Washington,"
Food for thought
60
“There’ll be a lot of people who argue against
it, who say you can’t capture a thought like
that. But there’s no reason why not. I think
you can capture a thought by a vector.”
Geoff Hinton
These slides are available at
[Link]
Eduardo Bezerra (ebezerra@cefet-[Link])
62
Backup slides
Language Models (Unigrams, Bigrams, etc.)
63
A model that assigns a probability to a
sequence of tokens.
A good language model assigns...
...high probability to (syntactically and semantically)
valid sentences.
...low probability to nonsense.
Language Models (Unigrams, Bigrams, etc.)
64
Mathematically, we can apply an LM to
any given sequence of n words:
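The equation did not survive the export, so here is the standard formulation it presumably showed: the chain rule factorization of the joint probability of the sequence.

```latex
P(w_1, w_2, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})
```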
Language Models (Unigrams, Bigrams, etc.)
65
An example:
"The quick brown fox jumps over the lazy
dog."
Another example:
"The quik brown lettuce over jumps the
lazy dog.“
Language Models (Unigrams, Bigrams, etc.)
66
Unigram model
Bigram model
But how can we learn these
probabilities?
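For reference (the slide's formulas are not in the export), the unigram and bigram approximations, with the conditional probabilities typically estimated from corpus counts (maximum likelihood):

```latex
\text{Unigram: } P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i)
\qquad
\text{Bigram: } P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}),
\quad
P(w_i \mid w_{i-1}) \approx \frac{\mathrm{count}(w_{i-1}\, w_i)}{\mathrm{count}(w_{i-1})}
```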
Transfer Learning
67
68
Neural Nets
Artificial Neural Net
69
It is possible to build arbitrarily complex
networks using the artificial neuron as
the basic component.
Artificial Neural Net
70
Feedforward Neural Network
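A tiny NumPy sketch of a feedforward network's forward pass (layer sizes and activation are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer with 4 neurons (ReLU) and a single linear output.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input (3 features) -> hidden (4 neurons)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden (4 neurons) -> output (1 value)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2               # output layer (linear)

print(forward(np.array([0.5, -1.2, 3.0])))
```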
Training
71
Given a training set of (input, target) pairs of the form
{(x(1), y(1)), ..., (x(N), y(N))},
training an ANN corresponds to using this set to
adjust the parameters of the network, so that the
training error is minimized.
So, training an ANN is an optimization problem.
Training
72
The error signal (computed with a cost function)
is used during training to gradually change the
weights (parameters), so that the predictions become
more accurate.
The training loop:
1. Pick a batch of training examples.
2. Propagate them through the layers from input to output (forward pass).
3. Backpropagate the error signal through the layers from the output to the input (backward pass).
4. Update the parameters W, b for all hidden layers.
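A minimal PyTorch sketch of this loop (my own illustration, on toy regression data): forward pass, error computation, backpropagation, and parameter update.

```python
import torch
import torch.nn as nn

# Toy regression data: y = 2x + 1 plus noise.
X = torch.linspace(-1, 1, 100).unsqueeze(1)
y = 2 * X + 1 + 0.1 * torch.randn_like(X)

model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    pred = model(X)            # propagate the batch from input to output
    loss = loss_fn(pred, y)    # error signal from the cost function
    loss.backward()            # backpropagate the error through the layers
    optimizer.step()           # update the parameters W, b

print(loss.item())             # training error after optimization
```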