Unit 5

Sequence Modelling – Recurrent and Recursive Nets
Sequence Modelling
• Sequence modeling is the task of modeling sequential data: building a mathematical model to
understand and study sequential data, and using that understanding to generate, predict or
classify sequences for a specific application.
• Sequential data has three properties:
• Elements in the sequence can repeat
• It follows an order (contextual arrangement)
• Its length varies (potentially without bound)
• Examples of sequential data include:
• Text and sentences, audio (e.g. speech and music), motion pictures or videos, time-series
data (e.g. stock market data), DNA sequences, protein structures, material composition,
decision-making
• Sequential data is difficult to model because of these properties, and it
requires a different method. For instance, if sequential data is fed
through a feed-forward network, the network cannot model it well,
because sequential data has variable length.
• A feed-forward network works well with fixed-size input and
does not take the structure of the data into account well.
• Convolutional neural networks, on the other hand, were created to
process structured, grid-like data, such as images.
• They can deal with long sequences of data, but are limited in how well
they capture the ordering of a sequence.
Recurrent Neural Network
• Recurrent neural networks are used to model sequential data with the time step index t, and
incorporate the technique of context vectorizing.
• Recurrent Neural Network(RNN) is a type of Neural Network where the output from the
previous step is fed as input to the current step.
• In traditional neural networks, all the inputs and outputs are independent of each other. But in
cases where the next word of a sentence has to be predicted, the previous words are
required, and hence there is a need to remember them.
• Thus RNNs came into existence, solving this issue with the help of a hidden layer.
• The main and most important feature of an RNN is its hidden state, which remembers some
information about a sequence.
• The state is also referred to as the memory state, since it remembers previous inputs to the
network. The network uses the same parameters for every input, as it performs the same task
on all the inputs or hidden layers to produce the output. This reduces the number of
parameters, unlike other neural networks.
• Like many other deep learning algorithms, recurrent neural networks are relatively old.
They were initially created in the 1980s, but only in recent years have we seen their true
potential. An increase in computational power along with the massive amounts of data
that we now have to work with, and the invention of long short-term memory (LSTM) in
the 1990s, has really brought RNNs to the foreground.
• Because of their internal memory, RNNs can remember important things about the input
they received, which allows them to be very precise in predicting what’s coming next.
• This is why they’re the preferred algorithm for sequential data like time series, speech,
text, financial data, audio, video, weather and much more. Recurrent neural networks
can form a much deeper understanding of a sequence and its context compared to other
algorithms.
• Since RNNs are being used in the software behind Siri and Google Translate, recurrent
neural networks show up a lot in everyday life.
How Do Recurrent Neural Networks Work?
• Sequential data is basically just ordered data in which related things
follow each other. Examples are financial data or the DNA sequence.
The most popular type of sequential data is perhaps time series data,
which is just a series of data points that are listed in time order.
• RNNs and feed-forward neural networks get their names from the way they
channel information.
• In a feed-forward neural network, the information only moves in one direction
— from the input layer, through the hidden layers, to the output layer. The
information moves straight through the network.
• Feed-forward neural networks have no memory of the input they receive and
are bad at predicting what’s coming next. Because a feed-forward network only
considers the current input, it has no notion of order in time. It simply can’t
remember anything about what happened in the past except its training.
• In an RNN, the information cycles through a loop. When it makes a decision, it
considers the current input and also what it has learned from the inputs it
received previously.
• A usual RNN has only a short-term memory; in combination with an LSTM it
also gains a long-term memory.
• Another good way to illustrate the concept of a recurrent neural network’s
memory is to explain it with an example: Imagine you have a normal feed-
forward neural network and give it the word “neuron” as an input and it
processes the word character by character. By the time it reaches the
character “r,” it has already forgotten about “n,” “e” and “u,” which makes it
almost impossible for this type of neural network to predict which character
would come next.
• A recurrent neural network, however, is able to remember those characters
because of its internal memory. It produces output, copies that output and
loops it back into the network.
• Therefore, an RNN has two inputs: the present and the recent past.
This is important because the sequence of data contains crucial
information about what is coming next, which is why an RNN can do
things other algorithms can’t.
• A feed-forward neural network, like all other deep learning
algorithms, assigns a weight matrix to its inputs and then produces the
output.
• Note that an RNN applies weights to the current input and also to the previous
hidden state. Furthermore, a recurrent neural network adjusts these weights
over time through gradient descent and backpropagation through time.
Types of Recurrent Neural Networks
• One to One
• One to Many
• Many to One
• Many to Many
• 1. One-to-One
• The simplest type of RNN is One-to-One, which allows a single input
and a single output. It has fixed input and output sizes and acts as a
traditional neural network. The One-to-One application can be found
in Image Classification.
• 2. One-to-Many
• One-to-Many is a type of RNN that gives multiple
outputs when given a single input. It takes a fixed input
size and gives a sequence of data outputs. Its
applications can be found in Music
Generation and Image Captioning.
• 3. Many-to-One
• Many-to-One is used when a single output is required from multiple
input units or a sequence of them. It takes a sequence of inputs to
display a fixed output. Sentiment Analysis is a common example of
this type of Recurrent Neural Network.
• 4. Many-to-Many
• Many-to-Many is used to generate a sequence of output data from a
sequence of input units.
• This type of RNN is further divided into the following two
subcategories:
• 1. Equal Unit Size: In this case, the number of input and
output units is the same. A common application can be found in
Named-Entity Recognition.
• 2. Unequal Unit Size: In this case, inputs and outputs have different
numbers of units. Its application can be found in Machine Translation.
Recurrent Neuron and RNN Unfolding
• The fundamental processing unit in a Recurrent Neural Network (RNN) is the
recurrent unit (rather than an explicitly named “recurrent neuron”).
• This unit has the unique ability to maintain a hidden state, allowing the network
to capture sequential dependencies by remembering previous inputs while
processing.
• Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) variants
improve the RNN’s ability to handle long-term dependencies.
RNN Architecture
• RNNs are a type of neural network that has hidden states and allows past outputs to be
used as inputs. They usually go like this:
• Here’s a breakdown of its key components:
• Input Layer: This layer receives the initial element of the sequence data. For example,
in a sentence, it might receive the first word as a vector representation.
• Hidden Layer: The heart of the RNN, the hidden layer contains a set of interconnected
neurons. Each neuron processes the current input along with the information from the
previous hidden layer’s state. This “state” captures the network’s memory of past
inputs, allowing it to understand the current element in context.
• Activation Function: This function introduces non-linearity into the network, enabling it
to learn complex patterns. It transforms the combined input from the current input
layer and the previous hidden layer state before passing it on.
• Output Layer: The output layer generates the
network’s prediction based on the processed
information. In a language model, it might predict the
next word in the sequence.
• Recurrent Connection: A key distinction of RNNs is
the recurrent connection within the hidden layer. This
connection allows the network to pass the hidden state
information (the network’s memory) to the next time
step. It’s like passing a baton in a relay race, carrying
information about previous inputs forward
• RNNs have the same input and output architecture as any other deep
neural architecture.
• However, differences arise in the way information flows from input to
output.
• Unlike deep neural networks, where each dense layer has its own weight
matrix, in an RNN the weights remain the same across the whole
network. It calculates a hidden state h_t for every input x_t,
using the following formulas:
• h_t = σ(U·x_t + W·h_{t-1} + B)
• y_t = O(V·h_t + C)
• Hence
• y_t = f(x_t, h_{t-1}, W, U, V, B, C)
• Here the state matrix H has element h_i, the state of the
network at timestep i.
• The parameters in the network are W, U, V, B, C, which are shared
across timesteps.
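• A minimal NumPy sketch of this recurrence (assuming tanh for the hidden activation σ and softmax for the output activation O; the dimensions and random weights are purely illustrative):

```python
import numpy as np

def rnn_forward(X, U, W, V, B, C):
    """Run h_t = tanh(U x_t + W h_{t-1} + B) and y_t = softmax(V h_t + C)
    over a whole sequence, reusing the same U, W, V, B, C at every step."""
    h = np.zeros(W.shape[0])              # h_0 is assumed to be a vector of zeros
    hs, ys = [], []
    for x_t in X:                         # one time step per input vector
        h = np.tanh(U @ x_t + W @ h + B)  # new hidden state (shared weights)
        z = V @ h + C
        y = np.exp(z - z.max()) / np.exp(z - z.max()).sum()  # softmax output
        hs.append(h)
        ys.append(y)
    return np.array(hs), np.array(ys)

# Illustrative shapes: 5 time steps, 8-dim inputs, 16 hidden units, 10 output classes
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
U = rng.normal(size=(16, 8)); W = rng.normal(size=(16, 16)); V = rng.normal(size=(10, 16))
B = np.zeros(16); C = np.zeros(10)
hs, ys = rnn_forward(X, U, W, V, B, C)
print(hs.shape, ys.shape)                 # (5, 16) (5, 10)
```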
Working of RNN
• The Recurrent Neural Network consists of multiple fixed activation
function units, one for each time step.
• Each unit has an internal state which is called the hidden state of the
unit. This hidden state signifies the past knowledge that the network
currently holds at a given time step.
• This hidden state is updated at every time step to signify the change
in the knowledge of the network about the past.
• 1) The hidden state h_t holds information about the previous words in the sequence. It is
calculated from the previous hidden state h_(t-1) and the current
word vector x_t, with a non-linear activation function f
(usually tanh or sigmoid) applied to the combined sum. It is acceptable to
assume that h_0 is a vector of zeros.
• 2) The output y_t is the predicted word vector at a given time step t. We use
the softmax function to produce a (V, 1) vector with all elements
summing up to 1, where V is the vocabulary size. This probability distribution gives us the index of the
most likely next word from the vocabulary.
• 3) The cross-entropy loss at each time step t measures
the error between the predicted and the actual word.
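• The three operations at a single time step can be sketched as follows (the weight names W_hx, W_hh, W_yh are illustrative, not the notation of any particular library):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_hx, W_hh, W_yh, target_idx):
    """One time step: 1) hidden state update, 2) softmax prediction, 3) cross-entropy loss."""
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev)                 # 1) h_t from x_t and h_(t-1)
    z = W_yh @ h_t
    y_hat = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # 2) probabilities over the vocabulary
    loss = -np.log(y_hat[target_idx])                         # 3) cross-entropy vs. the actual next word
    return h_t, y_hat, loss
```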
Example:
• First, we need to train the network using a large dataset. For this
purpose we can choose any large text. When training is done, we can
input the sentence “Napoleon was the Emperor of…” and expect a
reasonable prediction based on the knowledge learned from that text.
• we input one example at a time and produce one result, both of which
are single words.
• The difference with a feedforward network comes in the fact that we
also need to be informed about the previous inputs before evaluating
the result.
• So you can view RNNs as multiple feedforward neural networks, passing
information from one to the other.
• Here x_1, x_2, x_3, …, x_t represent the input words from the text,
y_1, y_2, y_3, …, y_t represent the predicted next words and h_0,
h_1, h_2, h_3, …, h_t hold the information for the previous input
words.
• Since plain text cannot be used in a neural network, we need to
encode the words as vectors.
• The simplest approach is one-hot encoding: if our vocabulary is apple, apricot, banana, …, king, …,
zebra and the word is banana, then its vector is [0, 0, 1, …, 0, …, 0]. A better
approach is to use dense word embeddings (word2vec or GloVe).
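• A tiny sketch of one-hot encoding over a toy vocabulary (the vocabulary here is illustrative):

```python
import numpy as np

vocab = ["apple", "apricot", "banana", "king", "zebra"]   # toy vocabulary
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

print(one_hot("banana"))   # [0. 0. 1. 0. 0.]
```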
• If you are wondering what these W’s are, each of them represents the weights of
the network at a certain stage.
• As mentioned above, the weights are matrices initialized with random values and
adjusted using the error from the loss function.
• We do this adjustment using the backpropagation algorithm, which updates the
weights.
• Once we have obtained the correct weights, predicting the next word in the
sentence “Napoleon was the Emperor of…” is quite straightforward. Plugging
each word at a different time step of the RNN would produce h_1, h_2, h_3, h_4.
We can derive y_5 using h_4 and x_5 (vector of the word “of”).
• If our training was successful, we should expect that the index of the largest
number in y_5 is the same as the index of the word “France” in our vocabulary.
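• A minimal NumPy sketch of this inference step (random, untrained weights and a toy vocabulary, so the actual prediction here is meaningless; it only illustrates how h_1 … h_5 and y_5 are obtained and how argmax picks the predicted word):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab = ["France", "Napoleon", "Emperor", "of", "the", "was"]            # toy vocabulary
E = rng.normal(size=(len(vocab), 8))                                     # toy word vectors
W_hx = rng.normal(size=(16, 8))
W_hh = rng.normal(size=(16, 16))
W_yh = rng.normal(size=(len(vocab), 16))

h = np.zeros(16)                                                         # h_0
for word in ["Napoleon", "was", "the", "Emperor", "of"]:                 # x_1 ... x_5
    x = E[vocab.index(word)]
    h = np.tanh(W_hx @ x + W_hh @ h)                                     # h_1 ... h_5
z = W_yh @ h
y5 = np.exp(z - z.max()) / np.exp(z - z.max()).sum()                     # y_5
print(vocab[int(np.argmax(y5))])   # with properly trained weights this should print "France"
```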
Steps of RNN
• 1.Sequence Input:
• RNNs take a sequence of inputs as their primary data format. Each
element in the sequence corresponds to a time step, and the goal is
to process the sequence step by step to learn patterns and
relationships.
• 2. Hidden State:
• At each time step, an RNN maintains a hidden state vector, which
captures information about the current input and the previous hidden
state. This hidden state serves as the memory of the network and
allows it to maintain context from earlier time steps.
• 3. Input and Hidden State Interaction:
• The input at each time step is combined with the previous
hidden state to produce a new hidden state. This combination
involves matrix multiplication and activation functions, often a
hyperbolic tangent (tanh) or rectified linear unit (ReLU).
• 4. Updating Hidden State: The result of this combination becomes the
new hidden state, which is carried forward to the next time step.
• 5.Sequence Processing:
• The RNN processes the entire sequence by iteratively
updating the hidden state for each time step. As it progresses
through the sequence, the hidden state accumulates
information about the previous inputs and hidden states.
• 6. Output Generation:
• Depending on the task, an RNN can produce an output at each
time step or only at the final time step. For example, in
language modeling, an output might be generated at each step
to predict the next word, whereas in sequence classification, the
final hidden state might be used to produce the final prediction.
• 7. Backpropagation Through Time (BPTT):
• To train an RNN, the process of backpropagation is extended
through time. Errors are propagated backward from the output
predictions to the initial time step, updating the weights to
minimize the difference between predicted and actual values.
Backpropagation Through Time (BPTT)
• Recurrent Neural Networks are those networks that deal with
sequential data.
• They predict outputs using not only the current inputs but also by
taking into consideration those that occurred before it.
• In other words, the current output depends on the current input as well
as a memory element (which takes into account the past inputs).
• For training such networks, we use good old backpropagation but
with a slight twist. We don’t train the system only at a
specific time “t”; we train it at time “t” as well as on all that has
happened before time “t”, i.e. t-1, t-2, t-3, …
• In a traditional feedforward neural network, the data flows through the network
in one direction, from the input layer through the hidden layers to the output
layer.
• However, in RNNs, there are connections between nodes in different time steps,
which means that the output of the network at one time step depends on the
input at that time step as well as the previous time steps.
• BPTT works by unfolding the RNN over time, creating a series of interconnected
feedforward networks. Each time step corresponds to one layer in this unfolded
network, and the weights between layers are shared across time steps. The
unfolded network can be thought of as a very deep feedforward network, where
the weights are shared across layers.
• During training, the error is backpropagated through the unfolded network, and
the weights are updated using gradient descent. This allows the network to learn
to predict the output at each time step based on the input at that time step as
well as the previous time steps.
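• A minimal PyTorch sketch of this idea (toy sizes and random data): accumulating the loss over the unrolled sequence and calling loss.backward() propagates the error back through every time step, which is exactly BPTT with the cell weights shared across steps:

```python
import torch
import torch.nn as nn

# Toy setup: batch of 4 sequences, 10 time steps, 8-dim inputs, 5 output classes (illustrative sizes)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 5)
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 10, 8)               # (batch, time, features)
targets = torch.randint(0, 5, (4, 10))  # one class label per time step

outputs, h_n = rnn(x)                   # unrolls the shared-weight cell over all 10 steps
logits = head(outputs)                  # (batch, time, classes)
loss = criterion(logits.reshape(-1, 5), targets.reshape(-1))

optimizer.zero_grad()
loss.backward()                         # gradients flow backward through every time step (BPTT)
optimizer.step()
```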
• S1, S2, S3 are the hidden states or memory units at time t1, t2, t3
respectively, and Ws is the weight matrix associated with it.
• X1, X2, X3 are the inputs at time t1, t2, t3 respectively, and Wx is the
weight matrix associated with it.
• Y1, Y2, Y3 are the outputs at time t1, t2, t3 respectively, and Wy is the
weight matrix associated with it. For any time t, we have the
following two equations:
• S_t = g1(Wx·X_t + Ws·S_{t-1})
• Y_t = g2(Wy·S_t)
• where g1 and g2 are activation functions.
• Let us now perform backpropagation at time t = 3. Let the error
function be: E3 = (d3 − Y3)²
Adjusting Wy
• We are using the squared error here, where d3 is the desired output
at time t = 3.
• To perform backpropagation, we have to adjust the weights
associated with the inputs, the memory units and the outputs.
• ∂E3/∂Wy = (∂E3/∂Y3)·(∂Y3/∂Wy)
Explanation: E3 is a function of Y3. Hence, we differentiate E3 w.r.t. Y3. Y3 is a function of Wy. Hence,
we differentiate Y3 w.r.t. Wy.
Adjusting Ws

• ∂E3/∂Ws = (∂E3/∂Y3)·(∂Y3/∂S3)·(∂S3/∂Ws)
  + (∂E3/∂Y3)·(∂Y3/∂S3)·(∂S3/∂S2)·(∂S2/∂Ws)
  + (∂E3/∂Y3)·(∂Y3/∂S3)·(∂S3/∂S2)·(∂S2/∂S1)·(∂S1/∂Ws)
Explanation: E3 is a function of Y3. Hence, we differentiate E3 w.r.t. Y3. Y3 is a function of S3. Hence, we
differentiate Y3 w.r.t. S3. S3 is a function of Ws. Hence, we differentiate S3 w.r.t. Ws. But we can’t stop with this;
we also have to take the previous time steps into consideration. So, we also differentiate (partially) the error
function with respect to the memory units S2 and S1, taking into consideration the weight matrix Ws. We
have to keep in mind that a memory unit, say St, is a function of its previous memory unit St-1. Hence, we
differentiate S3 w.r.t. S2 and S2 w.r.t. S1.
• Generally, we can express this formula as:
• ∂E3/∂Ws = Σ_{i=1}^{3} (∂E3/∂Y3)·(∂Y3/∂S3)·(∂S3/∂S_i)·(∂S_i/∂Ws)
Adjusting Wx

• ∂E3/∂Wx = Σ_{i=1}^{3} (∂E3/∂Y3)·(∂Y3/∂S3)·(∂S3/∂S_i)·(∂S_i/∂Wx)
Explanation: E3 is a function of Y3. Hence, we differentiate E3 w.r.t. Y3. Y3 is a function of S3.
Hence, we differentiate Y3 w.r.t. S3. S3 is a function of Wx. Hence, we differentiate S3 w.r.t. Wx.
Again we can’t stop with this; we also have to take the previous time steps into consideration, so we
differentiate (partially) the error function with respect to the memory units S2 and S1, taking into
consideration the weight matrix Wx.
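• A scalar sketch of these three gradients (assuming, for readability, identity activations for g1 and g2 and one-dimensional weights ws, wx, wy; all numeric values are illustrative):

```python
# Forward pass with S_t = wx*X_t + ws*S_{t-1} and Y_t = wy*S_t (identity g1, g2 assumed)
ws, wx, wy = 0.5, 0.8, 1.2         # illustrative weight values
x = [0.0, 1.0, 2.0, 3.0]           # x[1..3] are the inputs X1, X2, X3 (x[0] is unused)
d3 = 2.0                           # desired output at t = 3

s = [0.0, 0.0, 0.0, 0.0]           # s[0] is the initial state
for t in (1, 2, 3):
    s[t] = wx * x[t] + ws * s[t - 1]
y3 = wy * s[3]
E3 = (d3 - y3) ** 2                # squared error at t = 3

# Backward pass: chain rule, summing the contributions of the earlier time steps
dE_dY3 = 2 * (y3 - d3)
dE_dWy = dE_dY3 * s[3]                                                     # dE3/dWy
dE_dWs = dE_dY3 * wy * sum(ws ** (3 - k) * s[k - 1] for k in (1, 2, 3))    # dE3/dWs
dE_dWx = dE_dY3 * wy * sum(ws ** (3 - k) * x[k] for k in (1, 2, 3))        # dE3/dWx
print(dE_dWy, dE_dWs, dE_dWx)
```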
Limitations of BPTT

• Backpropagation through time (BPTT) can, in practice, only be
employed for a limited number of time steps, such as 8 or 10.
• If we continue to backpropagate further, the gradient becomes too small. This
is known as the “vanishing gradient” problem.
• This is because the value of the information diminishes geometrically with
time. Therefore, if the number of time steps is greater than, say, 10,
the information is effectively discarded.
• Going Beyond RNNs:
• One of the most famous solutions to this issue is to use what are known
as Long Short-Term Memory (LSTM) cells instead of
conventional RNN cells. However, there can be another issue,
referred to as the exploding gradient problem, in which the gradient
becomes uncontrollably large.
• Solution:
• A well-known method is gradient clipping: at each
time step, we determine whether the gradient δ is greater than a
threshold. If it is, we normalize (rescale) it.
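• A minimal sketch of norm-based gradient clipping (the threshold value is illustrative):

```python
import numpy as np

def clip_gradient(grad, threshold=5.0):
    """If the gradient norm exceeds the threshold, rescale the gradient
    so its norm equals the threshold; otherwise leave it unchanged."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0, 12.0])     # ||g|| = 13
print(clip_gradient(g))            # rescaled so that its norm is 5
```

• In practice, deep learning frameworks provide this directly, e.g. torch.nn.utils.clip_grad_norm_ in PyTorch.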
Bi-directional Recurrent Neural Network
• Bi-directional recurrent neural networks (Bi-RNNs) are artificial neural
networks that process input data in both the forward and backward
directions.
• They are often used in natural language processing tasks, such as
language translation, text classification, and named entity recognition.
• In addition, they can capture contextual dependencies in the input
data by considering past and future contexts.
• Bi-RNNs consist of two separate RNNs that process the input data in
opposite directions, and the outputs of these RNNs are combined to
produce the final output.
• The goal of a Bi-RNN is to capture the contextual dependencies in the
input data by processing it in both directions, which can be useful in
various natural language processing (NLP) tasks.
• One common way to combine the outputs of the forward and reverse
RNNs is to concatenate them. Still, other methods, such as element-
wise addition or multiplication, can also be used. The choice of
combination method can depend on the specific task and the desired
properties of the final output.
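• A small sketch of these combination options for a single time step (toy three-dimensional hidden states):

```python
import numpy as np

h_forward = np.array([0.1, 0.2, 0.3])     # forward hidden state at some time step
h_backward = np.array([0.4, 0.5, 0.6])    # backward hidden state at the same step

combined_concat = np.concatenate([h_forward, h_backward])   # concatenation: dimension doubles
combined_sum = h_forward + h_backward                       # element-wise addition: dimension unchanged
combined_prod = h_forward * h_backward                      # element-wise multiplication
print(combined_concat, combined_sum, combined_prod)
```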
• The BRNN functions similarly to a conventional recurrent neural network in the
forward direction, updating the hidden state based on the current input
and the prior hidden state at each time step.
• The backward hidden layer, on the other hand, processes the input sequence in
the opposite direction, updating the hidden state based on the current input and
the hidden state of the next time step.
• Compared to conventional unidirectional recurrent neural networks, the
accuracy of the BRNN is improved since it can process information in both
directions and account for both past and future contexts.
• Because the two hidden layers can complement one another and give the final
prediction layer more data, using two distinct hidden layers also offers a type of
model regularization.
• To update the model parameters, gradients are computed
for both the forward and the backward hidden layers using backpropagation
through time, the technique typically used to train BRNNs.
• The input sequence is processed by the BRNN in a single forward pass
at inference time, and predictions are made based on the combined
outputs of the two hidden layers.
Working of Bidirectional Recurrent Neural Network
• Inputting a sequence: A sequence of data points, each represented as
a vector of the same dimensionality, is fed into the BRNN. The
sequences may have different lengths.
• Dual Processing: The data is processed in both the forward and the backward
direction. In the forward direction, the hidden state at time step t is
determined from the input at that step and the hidden state at step t-1.
In the backward direction, the hidden state at step t is determined from
the input at step t and the hidden state at step t+1.
• Computing the hidden state: A non-linear activation function on the
weighted sum of the input and previous hidden state is used to
calculate the hidden state at each step. This creates a memory
mechanism that enables the network to remember data from earlier
steps in the process.
• Determining the output: A non-linear activation function is used to
determine the output at each step from the weighted sum of the
hidden state and a number of output weights. This output has two
options: it can be the final output or input for another layer in the
network.
• Training: The network is trained through a supervised learning
approach where the goal is to minimize the discrepancy between the
predicted output and the actual output. The network adjusts its
weights in the input-to-hidden and hidden-to-output connections
during training through backpropagation.
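• A minimal PyTorch sketch of the working described above, using the built-in bidirectional RNN layer (toy sizes and random inputs):

```python
import torch
import torch.nn as nn

# Bi-directional RNN layer; the sizes here are illustrative.
birnn = nn.RNN(input_size=8, hidden_size=16, bidirectional=True, batch_first=True)

x = torch.randn(4, 10, 8)          # (batch, time steps, features)
outputs, h_n = birnn(x)

# At every time step the forward and backward hidden states are concatenated,
# so the feature dimension of the output doubles.
print(outputs.shape)               # torch.Size([4, 10, 32])
print(h_n.shape)                   # torch.Size([2, 4, 16])  -> one final state per direction
```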
Applications of Bidirectional Recurrent Neural Network
• Bi-RNNs have been applied to various natural language processing (NLP)
tasks, including:
• Sentiment Analysis: By taking into account both the prior and
subsequent context, BRNNs can be utilized to categorize the sentiment
of a particular sentence.
• Named Entity Recognition: By considering the context both before and
after a mention, BRNNs can be used to identify named entities in
a sentence.
• Part-of-Speech Tagging: The classification of words in a phrase into their
corresponding parts of speech, such as nouns, verbs, adjectives, etc., can
be done using BRNNs.
• Machine Translation: BRNNs can be used in encoder-decoder models
for machine translation, where the decoder creates the target
sentence and the encoder analyses the source sentence in both
directions to capture its context.
• Speech Recognition: When the input voice signal is processed in both
directions to capture the contextual information, BRNNs can be used
in automatic speech recognition systems.
Advantages of Bidirectional RNN
• Context from both past and future: With the ability to process
sequential input both forward and backward, BRNNs provide a
thorough grasp of the full context of a sequence. Because of this,
BRNNs are effective at tasks like sentiment analysis and speech
recognition.
• Enhanced accuracy: BRNNs frequently yield more precise answers
since they take both historical and upcoming data into account.
• Efficient handling of variable-length sequences: When compared to
conventional RNNs, which require padding to have a constant length,
BRNNs are better equipped to handle variable-length sequences.
• Resilience to noise and irrelevant information: BRNNs may be
resistant to noise and irrelevant data that are present in the data. This
is so because both the forward and backward paths offer useful
information that supports the predictions made by the network.
• Ability to handle sequential dependencies: BRNNs can capture long-
term links between sequence pieces, making them extremely adept at
handling complicated sequential dependencies.
Disadvantages of Bidirectional RNN
• Computational complexity: Given that they analyze data both
forward and backward, BRNNs can be computationally expensive due
to the increased amount of calculations needed.
• Long training time: BRNNs can also take a while to train because
there are many parameters to optimize, especially when using huge
datasets.
• Difficulty in parallelization: Due to the requirement for sequential
processing in both the forward and backward directions, BRNNs can
be challenging to parallelize.
• Overfitting: BRNNs are prone to overfitting, since their many
parameters can result in overly complicated models, especially
when trained on small datasets.
• Interpretability: Due to the processing of data in both forward and
backward directions, BRNNs can be tricky to interpret since it can be
difficult to comprehend what the model is doing and how it is
producing predictions.
Encoder-decoder sequence to sequence architectures
• In Deep Learning, Many Complex problems can be solved by constructing better
neural network architecture.
• Seq2Seq model or Sequence-to-Sequence model, is a machine learning
architecture designed for tasks involving sequential data.
• The RNN (Recurrent Neural Network) and its variants are very useful in sequence-
to-sequence learning. The RNN variant LSTM (Long Short-Term Memory) is the
most commonly used cell in seq2seq learning tasks.
• The encoder-decoder architecture for recurrent neural networks is the standard
neural machine translation method, which rivals and in some cases outperforms
classical statistical machine translation methods.
• This architecture was only pioneered in 2014, yet it has been adopted as the core
technology inside Google’s Translate service.
• At the core of seq2seq models lies the attention mechanism, a game-
changer that allows the decoder to focus dynamically on the most
relevant parts of the input sequence.
• This attention-based approach boosts accuracy and provides valuable
insights into the model’s decision-making process. While classical
seq2seq models faced hurdles with long sequences, the advent of
transformers using self-attention has pushed the boundaries further.
Encoder-Decoder Model
• The seq2seq models are encoder-decoder models.
• The encoder processes the input sequence and transforms it into a fixed-size
hidden representation.
• The decoder uses the hidden representation to generate output sequence.
• The encoder-decoder structure allows them to handle input and output
sequences of different lengths, making them well suited to sequential
data.
• Seq2Seq models are trained using a dataset of input-output pairs, where the
input is a sequence of tokens, and the output is also a sequence of tokens.
• The model is trained to maximize the likelihood of the correct output
sequence given the input sequence.
• Encoder Block
• The main purpose of the encoder block is to process the input sequence and capture
information in a fixed-size context vector.
• Architecture:
• The input sequence is put into the encoder.
• The encoder processes each element of the input sequence using neural networks (or
transformer architecture).
• Throughout this process, the encoder keeps an internal state, and the ultimate hidden
state functions as the context vector that encapsulates a compressed representation of
the entire input sequence. This context vector captures the semantic meaning and
important information of the input sequence.
• The final hidden state of the encoder is then passed as the context vector to the decoder.
• Decoder Block
• The decoder block is similar to the encoder block. The decoder processes the context vector from
the encoder to generate the output sequence incrementally.
• Architecture:
• In the training phase, the decoder receives both the context vector and the desired target output
sequence (ground truth).
• During inference, the decoder relies on its own previously generated outputs as inputs for
subsequent steps.
• The decoder uses the context vector to comprehend the input sequence and create the
corresponding output sequence. It engages in autoregressive generation, producing individual
elements sequentially. At each time step, the decoder uses the current hidden state, the context
vector, and the previous output token to generate a probability distribution over the possible next
tokens. The token with the highest probability is then chosen as the output, and the process
continues until the end of the output sequence is reached.
• The decoder and encoder architecture utilizes RNNs to generate the
desired outputs. Let’s look at the simplest seq2seq model.
• For a given sequence of inputs (x_1, …, x_T), an RNN generates a
sequence of outputs (y_1, …, y_T).
• Recurrent Neural Networks can easily map sequences to sequences
when the alignment between the inputs and the outputs is known
in advance.
• Although the vanilla version of RNN is rarely used, its more advanced
version i.e. LSTM or GRU is used.
• This is because RNN suffers from the problem of vanishing gradient.
• The LSTM develops the context of the word by taking two inputs at each
point in time: one from the user and the other from its own previous
output, hence the name recurrent (the output goes back in as input). A minimal
code sketch of the overall encoder-decoder structure follows below.
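• A minimal PyTorch sketch of an encoder-decoder model with GRU cells and greedy decoding (the vocabulary sizes, dimensions and special token ids SOS/EOS are illustrative, and the model here is untrained, so its output is meaningless until trained):

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 100, 120, 32, 64   # illustrative sizes
SOS, EOS = 0, 1                                     # illustrative special token ids

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(SRC_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
    def forward(self, src):
        _, h = self.rnn(self.embed(src))            # final hidden state = fixed-size context vector
        return h

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TGT_VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)
    def forward(self, token, h):
        output, h = self.rnn(self.embed(token), h)
        return self.out(output), h                  # logits over the target vocabulary

encoder, decoder = Encoder(), Decoder()
src = torch.randint(2, SRC_VOCAB, (1, 7))           # one source sentence of 7 tokens

# Inference: autoregressive greedy decoding conditioned on the context vector.
h = encoder(src)
token = torch.tensor([[SOS]])
result = []
for _ in range(20):                                 # cap the output length
    logits, h = decoder(token, h)
    token = logits.argmax(dim=-1)                   # pick the most probable next token
    if token.item() == EOS:
        break
    result.append(token.item())
print(result)
```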
• Advantages of seq2seq Models
• Flexibility: Seq2Seq models can handle a wide range of tasks such as
machine translation, text summarization, and image captioning, as
well as variable-length input and output sequences.
• Handling Sequential Data: Seq2Seq models are well-suited for tasks
that involve sequential data such as natural language, speech, and
time series data.
• Handling Context: The encoder-decoder architecture of Seq2Seq
models allows the model to capture the context of the input
sequence and use it to generate the output sequence.
• Attention Mechanism: Using attention mechanisms allows the model
to focus on specific parts of the input sequence when generating the
output, which can improve performance for long input sequences.
• Disadvantages of seq2seq Models
• Computationally Expensive: Seq2Seq models require significant
computational resources to train and can be difficult to optimize.
• Limited Interpretability: The internal workings of Seq2Seq models
can be difficult to interpret, which can make it challenging to
understand why the model is making certain decisions.
• Overfitting: Seq2Seq models can overfit the training data if they are
not properly regularized, which can lead to poor performance on new
data.
• Handling Rare Words: Seq2Seq models can have difficulty handling
rare words that are not present in the training data.
• Handling Long input Sequences: Seq2Seq models can have difficulty
handling input sequences that are very long, as the context vector
may not be able to capture all the information in the input sequence.
• Applications of Seq2Seq model
• Text Summarization: The seq2seq model effectively understands the
input text which makes it suitable for news and document
summarization.
• Speech Recognition: Seq2Seq models, especially those with attention
mechanisms, excel at processing audio waveforms for automatic speech
recognition (ASR). They can capture spoken-language patterns effectively.
• Image Captioning: The seq2seq model integrates image features from
CNNs with text generation capabilities for image captioning, and can
describe images in a human-readable format.
Long short-term memory (LSTM)
• Self study
