Encoder-Decoder architecture and Transformers in MT
Rishu Kumar
August 9, 2024
Outline
1. Recap
2. Attention is all you need
3. Encoder-Decoder models
4. Transformer
Recap
RNN
Figure: This is an RNN¹
¹ Image credits: Jindřich Helcl, Jindřich Libovický, unless explicitly mentioned otherwise.
A Fancy image for RNN
Vanilla RNN
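The vanilla RNN slide itself is only an image; as a minimal NumPy sketch (the weight names and toy dimensions are my own illustrative assumptions), one recurrent step simply mixes the current input with the previous hidden state:

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions, illustrative only: 4-dimensional inputs, 3-dimensional hidden state.
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), np.zeros(3)

h = np.zeros(3)                      # initial hidden state
for x_t in rng.normal(size=(5, 4)):  # a source "sentence" of 5 input vectors
    h = rnn_step(x_t, h, W_x, W_h, b)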
LSTM
Figure: LSTM (Long Short-Term Memory)
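The cell itself is only shown as an image; for reference, the standard LSTM formulation (not reproduced from the slide) uses three gates and a memory cell:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)    (forget gate)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)    (input gate)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)    (output gate)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

The additive update of the cell state c_t is what lets information and gradients survive over longer spans than in a vanilla RNN.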
Attention is all you need
But why?
Even an LSTM is still unable to keep the relevant information across a long input.
The attention mechanism was introduced to mitigate this shortcoming.
Transformer-based models are one family of models built around attention.
Recommended Reading:
1. Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2016²)
2. Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
² The original paper was first published in 2014.
Encoder-Decoder models
Encoder-Decoder
Figure: Encoder-Decoder overview
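A minimal sketch of the overall structure (function and variable names are my own, purely illustrative): the encoder reads the whole source sentence and hands the decoder a single fixed-size vector, which is exactly the bottleneck discussed on the next slides.

import numpy as np

def encode(source_vectors, W_x, W_h, b):
    # Run a vanilla RNN over the source and keep only the final hidden state.
    h = np.zeros(W_h.shape[0])
    for x_t in source_vectors:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h  # one fixed-size vector has to summarise the whole sentence

def decode_step(y_prev, s_prev, U_y, U_s, c):
    # One decoder step, conditioned on the previous output embedding and decoder state.
    return np.tanh(U_y @ y_prev + U_s @ s_prev + c)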
A Better Representation
Shortcomings
Figure: Losing information from words³
³ Image credit: [Link]
Shortcomings
Figure: Information and processing bottleneck
Let’s introduce Attention
Attention: probabilistic retrieval of encoder states when estimating the probability of each target word.
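A minimal sketch of that "probabilistic retrieval" view (dot-product scoring is used here only for illustration; Bahdanau et al. learn a small alignment network instead):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    # Score every encoder state, turn the scores into a probability distribution,
    # and return the probability-weighted mixture as the context vector.
    scores = encoder_states @ decoder_state   # one score per source position
    weights = softmax(scores)                 # the "probabilistic retrieval"
    return weights @ encoder_states, weights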
RNN with Attention
Attention, but with Maths
Bahdanau et al. describe e_{ij} in a different manner:
e_{ij} is the score of an alignment model, measuring how well the inputs around position j and the output at position i match.
We are just expanding the equation as described in that paper; if you remember RNNs, it is computed from the outputs of the recurrent operations.
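Since the equations themselves live on the slide image, here is the attention from Bahdanau et al. written out: the context vector c_i for target position i is a weighted sum of the encoder annotations h_j, where the weights come from a softmax over the alignment scores, and a(\cdot) is a small feed-forward network applied to the previous decoder state s_{i-1} and each h_j.

e_{ij} = a(s_{i-1}, h_j)
\alpha_{ij} = \exp(e_{ij}) / \sum_k \exp(e_{ik})
c_i = \sum_j \alpha_{ij} h_j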
Visualisation of RNN with attention
Transformer
Finally, let's start with the Transformer
Throughout the discussion of Transformers, we will keep coming across three terms, namely key, query, and value, so let's define them (here in terms of the encoder-decoder attention we just saw):
key: vectors representing all the encoder inputs (what we match the query against)
query: the hidden state of the decoder (what we are looking for)
value: the encoder hidden states (what we retrieve)
The attention is defined in the Attention is all you need paper as:
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left( \frac{QK^{T}}{\sqrt{d_k}} \right) V
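A minimal single-head NumPy sketch of this scaled dot-product attention (no masking, no multi-head projections; the toy shapes at the bottom are illustrative assumptions):

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)      # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (n_queries, d_v)

# Toy example: 2 queries attending over 4 key/value pairs of width 8.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
context = scaled_dot_product_attention(Q, K, V)       # shape (2, 8)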
Transformer Visualized
The Illustrated Transformer
[Link]