Understanding Transformer Architecture

The document discusses the architecture and components of the Transformer model, focusing on the attention mechanism, encoder block, and positional encoding. It explains how the encoder processes input data and the role of multi-head attention in creating query, key, and value vectors for self-attention. Overall, it provides insights into the workings of neural networks in natural language processing.

GENERATIVE AI

WEEK-4
Asst. Prof. Dr. Murat ŞİMŞEK
NLP
Transformer
Self-Attention
Seq2Seq
Attention Mechanism

The attention mechanism is a neural network architecture that allows a deep learning model to focus on the specific, relevant parts of its input when producing an output.
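A minimal sketch of the core computation behind this idea, scaled dot-product attention, assuming PyTorch; the function name, shapes, and toy data are illustrative and not taken from the slides.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of query, key, and value vectors.
    d_k = Q.size(-1)
    # How strongly each query matches each key, scaled by sqrt(d_k).
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5     # (seq_len, seq_len)
    # Softmax turns the scores into attention weights that sum to 1 per query.
    weights = F.softmax(scores, dim=-1)
    # Each output vector is a weighted sum of the value vectors.
    return weights @ V                                # (seq_len, d_k)

# Toy usage: self-attention over 4 tokens with 8-dimensional vectors (Q = K = V).
x = torch.randn(4, 8)
print(scaled_dot_product_attention(x, x, x).shape)    # torch.Size([4, 8])
```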
Transformer Architecture

LLM

Transformer
Encoder
• The main purpose of the encoder block is to provide the data needed for the desired output by encoding the given input data.
• This data includes the words and the context of the given sentence.
• To perform the encoding, the encoder block uses word embedding, positional encoding, and multi-head attention layers; a rough sketch of how these pieces fit together follows below.
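A rough sketch of one encoder layer, assuming the PyTorch API; the residual connections, layer normalization, and dimensions follow the original Transformer paper rather than anything stated on the slides.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention followed by a feed-forward network,
    each wrapped in a residual connection and layer normalization."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)      # self-attention: queries, keys, values all come from x
        x = self.norm1(x + attn_out)          # residual connection + layer norm
        return self.norm2(x + self.ff(x))     # position-wise feed-forward + residual + norm

# Input: word embeddings with positional encoding already added,
# shaped (batch, seq_len, d_model).
tokens = torch.randn(1, 10, 512)
print(EncoderLayer()(tokens).shape)           # torch.Size([1, 10, 512])
```

Word embedding and positional encoding, described on the next slides, are applied once to the input before it enters a stack of such layers.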
Word Embedding
Positional Encoding

Positional Encoding enables the conversion of the word vectors produced by the word-embedding layer into numerical vectors that also contain the order information of the words in the sentence.
Positional Encoding
Data passing through the positional encoding layer becomes ready to be processed in the encoder section of the Transformer.
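A small sketch of the sinusoidal positional encoding from the original Transformer paper, assuming PyTorch; the slides do not spell out a formula, so this is one common choice: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).

```python
import torch

def positional_encoding(seq_len, d_model):
    # pos: token position; i: index of the even embedding dimensions.
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimensions
    angle = pos / (10000 ** (i / d_model))                          # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angle)   # sine on even dimensions
    pe[:, 1::2] = torch.cos(angle)   # cosine on odd dimensions
    return pe

# The encoding is added to the word-embedding vectors (not concatenated),
# so each word vector now carries its position in the sentence.
embeddings = torch.randn(10, 512)             # 10 words, d_model = 512
x = embeddings + positional_encoding(10, 512)
```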
Multi-Head Attention
Multi-Head Attention

The first step in calculating self-attention is to create three vectors from each of the encoder's input vectors (in this case, the embedding of each word). So for each word, we create a Query vector, a Key vector, and a Value vector. These vectors are created by multiplying the embedding by three weight matrices that are learned during training.
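A sketch of this step for a single attention head, assuming PyTorch; the projection sizes (d_model = 512, d_k = 64) match the original paper, and the names W_q, W_k, W_v are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, d_k = 512, 64                       # 8 heads of size 64 in the original paper
W_q = nn.Linear(d_model, d_k, bias=False)    # the three learned weight matrices
W_k = nn.Linear(d_model, d_k, bias=False)
W_v = nn.Linear(d_model, d_k, bias=False)

x = torch.randn(10, d_model)                 # embeddings (plus positional encoding) for 10 words

# Each word's embedding is multiplied by the three learned matrices,
# giving its Query, Key, and Value vectors.
Q, K, V = W_q(x), W_k(x), W_v(x)             # each (10, d_k)

# One head of self-attention: scaled dot-product attention over Q, K, V.
weights = F.softmax(Q @ K.T / d_k ** 0.5, dim=-1)    # (10, 10) attention weights
head = weights @ V                                    # (10, d_k)

# Multi-head attention repeats this with separate W_q / W_k / W_v per head
# and concatenates the head outputs before a final linear projection.
```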
Self-Attention

Multi-Headed Attention
[Slides 40–65: Transformer Architecture (figures only)]
[Slides 66–75: Attention (figures only)]
[Slides 76–86: Self-Attention (figures only)]
[Slides 87–132: figures only, no titles recovered]
