
Abstractive Text Summarization

Using Deep Learning

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING LAB (18CS62)

PROJECT REPORT

Submitted by

Arunima Maitra 1RV18CS029


Ayush Daga 1RV18CS034

Under the guidance of

Prof. Manonmani S, Assistant Professor

Department of Computer Science and Engineering
2020-2021
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
Certified that the project work titled Abstractive Text Summarization Using Deep
Learning is carried out by Arunima Maitra (1RV18CS029) and Ayush Daga
(1RV18CS034) in partial fulfilment of the completion of the course Artificial Intelligence
and Machine Learning (18CS62) of the VI Sem, Computer Science Engineering program,
during the academic year 2020-2021. It is certified that all corrections/suggestions indicated
for the Internal Assessment have been incorporated in the project report and duly approved
by the laboratory faculty.

Signature of Internal Guide Signature of Head of the Department


Prof. Manonmani S Dr. Ramakanth Kumar P

External Viva

Name of Examiners Signature with Date

1
2
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DECLARATION

We, the students of sixth semester B.E., Department of Computer Science and Engineering,
RV College of Engineering, Bengaluru-59, hereby declare that the project titled ‘Abstractive
Text Summarization Using Deep Learning’ has been carried out by us and submitted in
fulfilment of the completion of the course Artificial Intelligence and Machine Learning
(18CS62) of the VI Sem during the year 2020-21. We further declare that the content of the
report was originally prepared by us and that we have acknowledged all the sources used in
the preparation of this report.

Place: Bengaluru Date:

Name Signature
1. Arunima Maitra (1RV18CS029)

2. Ayush Daga (1RV18CS034)


Acknowledgement

We are indebted to our guide, Prof. Manonmani S, Assistant Professor, Dept. of CSE, for the
wholehearted support, suggestions and invaluable advice throughout our project work, and for
the help extended in the preparation of this report.

Our sincere thanks to Dr. Ramakanth Kumar P., Professor and Head, Department of
Computer Science and Engineering, RVCE, for his valuable suggestions and expert advice
along with support and encouragement.

We would also like to thank Dr. Subramanya K. N., Principal, RVCE, Bengaluru, for his
moral support towards completing our project work.

We would also like to thank our parents and all the faculty members of the Department of
Computer Science & Engineering for their constant support and encouragement.

Last, but not the least, we would like to thank our peers and friends who provided us with
valuable suggestions to improve the project.
ABSTRACT

The volume of information and published data on the internet is growing day by day. It has become
tough for users to go through this load of information and analyse it to draw conclusions. Accessing
and reading the required information in the shortest possible time has therefore become an open
research problem. Text summarization addresses this problem by generating a summary that retains
the sentences which are most important in the document, without losing any valuable information.
In this process, the extracted information is presented to the user as a concise summary. Text
summarization approaches can be broadly classified into two types: extractive summarization and
abstractive summarization. This project focuses primarily on the abstractive text summarization
method. Key challenges in text summarization include topic identification, interpretation, summary
generation, and evaluation of the generated summary. Automatic text summarization techniques are
needed to address the ever-growing amount of text data available online, both to help discover the
relevant information and to consume it faster.

Abstractive text summarization is an approach of summarizing text by generating new sentences
from the original text. Here the objective is to build a text summarizer where the input is a long
sequence of words (a text body) and the output is a short summary, which is a sequence as well.
Hence, the problem can be modelled as a many-to-many Seq2Seq problem. The encoder-decoder
architecture is mainly used to solve sequence-to-sequence problems where the input and output
sequences are of different lengths, with variants of RNNs preferred as the encoder and decoder
components. However, a potential issue with this encoder-decoder approach is that the network must
compress all the necessary information of a source sentence into a fixed-length vector. This makes it
difficult to cope with long sentences, and the performance of a basic encoder-decoder deteriorates
rapidly as the length of the input sentence increases. To address this issue, an attention mechanism
can be used: it aims to predict a word by looking at a few specific parts of the sequence only, rather
than the entire sequence.

The evaluation metric used for this model is BLEU (Bilingual Evaluation Understudy). The BLEU
score for the training set is 0.8265, while the BLEU score for the test set is 0.7730.
List of Acronyms:
AI - Artificial Intelligence
ML - Machine Learning
RNN - Recurrent Neural Network
NLP - Natural Language Processing
Seq2Seq - Sequence to sequence
DL - Deep Learning
LSTM - Long Short Term Memory
NLTK - Natural Language Toolkit

List of figures:

Figure 1: Encoder architecture 09


Figure 2: Decoder architecture 10
Figure 3: Inference phase 10
Figure 4: Snapshot of Amazon fine food reviews dataset 13
Figure 5: Snapshot of Amazon fine food reviews dataset 14
Figure 6: Architectural flow of abstractive text summarization model 18
Figure 7: Data preprocessing steps 19
Figure 8: Result obtained from the model 21
Figure 9: Result obtained from the model 22
Figure 10: Result obtained from the model 22
Figure 11: Result obtained from the model 23
Figure 12: Result obtained from the model 23
TABLE OF CONTENTS

Abstract I

List of acronyms II

List of figures III

1. Introduction 1
1.1. Project Domain and Problem addressing 1
1.2 Issues and Challenges 2
1.3 Need for AI-based solutions 2
1.4. Problem Statement 3
1.5. Project objectives 3
1.6. Summary 3

2. Literature Study 4

3. Design Details 9
3.1 Architecture 9
3.2 Methodology 11
3.3 Data set details 12
3.4 ML/DL techniques used 14
3.5 Hardware and Software requirements 15

4. Implementation details of the Project 16


4.1 Language/Tools/APIs used 16
4.2 Workflow diagrams 18
4.3 Data preprocessing 19
4.4 Validation methodology 20

5. Results and Analysis 21

6. Conclusion and Future Enhancement 24


6.1 Novelty in the proposed solution 24
6.2 Limitations of the Project 24
6.3. Future Enhancements 25
6.4. Summary 26

References 27
Appendices
Appendix A: Screenshots 28
Appendix B: Printout of the base paper used for implementation of this project 31
Chapter 1
INTRODUCTION

1.1 Project Domain and Problem addressing

After the rise of the internet, the generation of massive amounts of data became a regular scenario.
The internet houses information related to numerous domains such as entertainment, health, politics,
world affairs, medicine, education, literacy, news and technology. This data takes various forms,
such as images, videos, and mathematical, statistical, numerical and textual data. Among these, text
data is quite hard to interpret since it involves huge numbers of characters, so a system is needed to
extract the vital parts of the information; text summarization is one way to achieve this. The problem
domain for the proposed project is ‘Text Summarization’. Summarization is the task of condensing a
piece of text to a shorter form, reducing the size of the initial text while preserving its key
informational elements and the meaning of the content. In general, two different approaches are used
for automatic summarization: extraction-based and abstraction-based methods. Extractive text
summarization picks sentences directly from the document based on a scoring function to form a
coherent summary; this method works by identifying important sections of the text, cropping them
out and stitching together portions of the content to produce a condensed version. Abstractive text
summarization methods, on the other hand, aim to produce a summary by interpreting the text using
advanced NLP techniques in order to generate a new, shorter text, parts of which may not appear in
the original document, that conveys the most critical information from the original text. This requires
rephrasing sentences and incorporating information from the full text, much as a human-written
abstract does. The method proposed in this project is to create an automatic text summarizer that
follows the abstractive method and is able to produce short text summaries. AI can enable humans to
obtain key information from ever-increasing data, given the advancing development of technology,
and solve the problem of generating titles (short summaries). Deep learning analyses complex
problems to facilitate the decision-making process and attempts to imitate what the human brain can
achieve by extracting features at different levels of abstraction. There are important applications of
text summarization in various NLP-related tasks, such as medical use cases (summarization of
electronic health records), text classification, question answering and bots, and internal document
workflow summarization.

1.2 Issues and Challenges

The main objective of the project is to implement an abstractive text summarizer using advanced
NLP techniques to reduce the given text to a smaller number of sentences without leaving out the
main ideas of the original text. Automatic text summarization is applied in different domains: for
example, search engines apply the idea to reviews of textual resources, and news websites use it to
condense information into shorter news items. Semantic similarity plays a vital part in many natural
language processing tasks and in many associated areas such as text summarization, text
classification and document clustering. Word2Vec is a model in natural language processing used to
represent words as vectors, and these vectors are then used for several tasks in natural language
processing. In text summarization, the similarity between words can be calculated by applying
mathematical functions to these vectors: the pair of words considered for semantic similarity
calculation is transformed into vectors, and their closeness is computed using cosine similarity, i.e.
the cosine of the angle between the vectors. Semantic similarity measures can be categorized as
knowledge based and corpus based. Knowledge-based measures extract meaning from the sentence
using details from a semantic store. Corpus-based methods process the document and fetch important
information from it, and these are used for calculating the similarity among words, phrases or
sentences.
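To make the cosine-similarity calculation described above concrete, the following is a minimal sketch using NumPy. The word vectors shown are toy values standing in for Word2Vec embeddings, not vectors produced in this project.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors; close to 1.0 for similar words."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 4-dimensional vectors standing in for Word2Vec embeddings (illustrative values only)
food = np.array([0.8, 0.1, 0.3, 0.5])
meal = np.array([0.7, 0.2, 0.4, 0.4])
print(cosine_similarity(food, meal))
```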

1.3 Need for AI-based solutions

Automatically generated text summaries help reduce reading time, are non-biased compared to
human authored summaries, and could also be beneficial for a lot of personal day to day applications
like email summarization, for posting on social media sites like Twitter, and more. Automatic text
summarization is an extremely challenging task as the AI model needs to understand the
domain-specific language of the source text ( for example- scientific paper, news article, legal
document), then identify the salient features of the source text (like in case of scientific paper -
problem statement, novel contributions, methodology, future work, references), and finally produce a
coherent summary that is faithful to the essence of the article. The shorter the summary, the more it
will have to abstract over the source text. Although extractive text summarization was the most
popular approach over the past few years, recent progress in Natural language processing (NLP) has
enabled the rise of more abstractive summarization models. Therefore, to conclude, automatic text
summarization is an interesting and exciting research problem with a wide range of industry
applications like news highlights, entity-centric summarization for search engines, legal contract
analysis, question answering systems, and more. The recent developments in NLP have also enabled
progress in human-like abstractive summarization. Recent work has also tested incorporating human
feedback to train and improve summarization systems with great success. Thus, NLP has become an
integral part of abstractive summarization techniques and methods.

1.4 Problem Statement

The problem statement for the project is:

Developing a text summarization model to generate a summary for Amazon fine food reviews using
abstraction based approach

1.5 Project Objectives

The major objectives of the project are listed below:

1. Implementation of an abstractive text summarizer using advanced NLP techniques to reduce the
given text to a smaller number of sentences without leaving out the main ideas of the original text.

2. The input to the summarizer is a long sequence of words (in a text body), and the output is a short
summary (which is a sequence as well). So, this problem is modelled as a Seq2Seq problem.

3. To generate precise and accurate information representation with correct semantics.

4. To quickly give the reader or listener an idea of what this material is saying.

1.6 Summary

Today's world is flooded with an ever-increasing number of articles and links to choose from. As
this data grows, so does the importance of semantic density. Having a generated summary lets one
decide whether to dive deeper into the source or not. Converting lengthy texts into short and
meaningful sentences is the main idea behind text summarization, and various algorithms exist to
achieve this. Machine learning models are trained first to understand the given document and then to
create a summary of it. These models achieve this task either by extracting important words out of
the document or by creating human-like sentences to form the summary. Natural language
processing has various applications, and automatic text summarization is one of its most popular and
useful techniques. Text summarization can be used for many purposes: email summaries, movie
reviews, news headlines, outlines of student notes, summarizing information for business people and
government officials, summarizing medical data for doctors, summarizing legal documents, and
novel or book summaries that help consumers decide whether to read further. The main objective of a
text summarization system is to identify the most important information in the given text and present
it to the end users. In this project, the Amazon Fine Food Reviews dataset is given as input to the
system and abstractive text summarization is performed using NLP techniques, as discussed in the
following sections.

Chapter 2

LITERATURE STUDY

Several papers pertaining to the topic of abstractive text summarization were explored. From those
papers, the following points were drawn.

From [1], titled “Automatic Text Summarization of News Articles in Serbian Language” by Dijana
Kosmajac and Vlado Kešelj:
● The objective of this paper is to develop machine learning algorithms that can
automatically shorten longer texts and deliver accurate summaries. First, the generated
summaries should fluently pass on the intended messages. Second, the generated summaries
should reduce reading time and speed up the process of researching relevant information.
● In Experiment 1, the ROUGE-1 value for the word2vec implementation was higher, but
ROUGE-L was higher for the Gensim implementation. Interestingly, the other two runs,
with Kešelj stemming and English translation, showed that Gensim had a lower result,
whereas in Experiment 2 there was no significant difference between the translated and
stemmed evaluations. The authors plan to compare the evaluation with POS information
included in the original language.
● The limitation was that the authors tried to apply a deep learning encoder-decoder
architecture with no success. They tried to validate it with two “off-the-shelf”
implementations, seq2seq and Neural Monkey, with similarly negative results. However,
they did not use any semantic enhancement such as pre-trained word vectors; one
possibility is that their dataset was too small.

From [2], titled “Abstract Text Summarization: A Low Resource Challenge” by Shantipriya Parida
and Petr Motlicek:
● The objective of this paper was to build an abstract text summarizer for the German
language text using the state-of-the-art “Transformer” model. It proposes an iterative
data augmentation approach which uses synthetic data along with the real summarization
data for the German language.
● The results obtained show that the summaries generated without a minimum length
constraint appear better compared to summaries with minimum length constraint.
Although they do not explicitly specify a minimum length parameter for generating
summaries for the models, the average length of words generated by model S2 is longer

4
than the model S1. There was a variance in summaries generated by model S3 as
compared to S2 and S1. In terms of Rouge score model S3 outperforms model S1 but
performs worse than model S2.
● The limitations noted were: i) synthetic summarization data needs further investigation, and
ii) transfer learning on text summarization for multilingual low-resource datasets with
little or no ground-truth summaries remains to be applied.

From [3], titled “Text Summarization using Deep Learning” by Kasimahanthi Divya, Kambala
Sneha, Baisetti Sowmya and G Sankara Rao:
● The objective of this paper was to understand and implement a method of achieving text
summaries accurately using deep learning methods.
● In this text summarization model, ROUGE is used as the validation metric. It consists of a
set of metrics used for evaluation; it compares the result automatically produced by the
model against human-produced results. In this setting, the human-produced summaries
used for reference are called reference summaries, and the summaries generated by the
model are called system summaries. The model is evaluated based on the extent to which
the system summaries are similar to the reference summaries.
● The limitation was that ROUGE provides reasonable results in the case of extractive
summarisation. However, in abstractive summarisation, ROUGE is not enough, as it
depends on exact matching between words.

From [4], titled “A Survey on Abstractive Text Summarization” by N. Moratanch and Dr. S.
Chitrakala:
● The objective is to collectively summarize and decipher the various methodologies,
challenges and issues of abstractive summarization. State-of-the-art benchmark datasets and
their properties are explored. The objective of this survey is to show that most abstractive
summarization methods produce a highly cohesive, coherent, less redundant and
information-rich summary.
● The results obtained were that the quality of the summary is improved in a structure-based
approach, but it has some grammatical issues since it does not take the semantic
representation of the document into consideration. The semantic-based model provides
better linguistic quality to the summary since it involves a semantic representation of the
text document; it overcomes the issues of the structure-based approach, reduces redundancy
in the summary and provides information-rich content.
● The limitations were that there is no generalized framework, and parsing and alignment of
parse trees are difficult. Extracting the important sentences and ordering them as in the
original source document to produce an efficient summary remains an open issue, and
information diffusion is still not handled properly in abstractive text summarization.

From [5], titled “An approach to Abstractive Text Summarization” by Huong Thanh Le and Tien
Manh Le:
● The paper presents an approach to abstractive text summarization based on discourse rules,
syntactic constraints, and a word graph. Discourse rules and syntactic constraints are used
in the process of generating sentences from keywords. Experimental results show that the
approach is promising in solving the abstractive summarization task.
● Among the 433 sentences generated by their abstractive summarizer, 95% of the sentences
were syntactically correct, and 72% of those sentences were complete in meaning, with
unimportant parts at the end of sentences being removed. Most cases of incomplete
sentences were due to the process of completing the end of a new sentence in the sentence
reduction phase.
● The limitation was that text generation in general, and automatic abstractive text
summarization in particular, is still a challenging task; more work should be done to
improve the quality of the system, such as improving the meaning completeness of
sentences generated in the sentence reduction phase, further compressing sentences, and
investigating strategies to efficiently combine sentences in the summary.

From [6], titled “Multiple Text Document Summarization System using Hybrid Summarization
Technique” by Harsha Dave and Shree Jaswal:
● The objective was to propose a system to generate an abstractive summary from the
extractive summary using WordNet ontology. Multiple document types were used, such as
text, PDF and Word files. The paper discusses various text summarization techniques and
then presents, step by step, the multiple-document text summarization approach.
● The result is compared with the existing online extractive tools as well as with human
generated summaries and shows the proposed system gives good results.
● The future work is that the summarization accuracy can be increased by comparing this
abstractive system with some other abstractive system.

From [7], titled “Abstractive text summarization using sequence-to-sequence RNNs and beyond”
by R. Nallapati, B. Zhou and C. Santos:
● The objective was to propose a Switching Pointer-Generator to avoid the generation of the
“UNK” token, a token generated by most summarization systems when they try to produce
a word that is outside their training dictionary. It also shows that the model learns to use
pointers very accurately, not only for named entities but also for multi-word phrases.
● The results show that the improvement of the overall model is not significant. It is
suggested that the model's impact may be clearer in other document sets, where the tail
distribution of rare words is heavier. A hierarchical attention model is also proposed, but it
did not show a significant difference on all datasets. The work also introduced a new
large-scale dataset, CNN/Daily Mail, which is very important due to the lack of such
datasets.
● The limitation was that the performance improvement of the overall model is not
significant.

From [8], titled “Selective Encoding for Abstractive Sentence Summarization” by Q. Zhou,
N. Yang, F. Wei and M. Zhou:
● The objective was motivated by the fact that, unlike machine translation, where alignment
between all parts of the input and output is required, there is no explicit alignment between
input sentences in sentence summarization. The challenge is not to infer the alignment, but
to select the highlights and filter out secondary information. To solve this task, the authors
propose extending the seq2seq framework with an additional selective encoding model.
● The model consists of a sentence encoder (bidirectional GRU) and an attention-equipped
decoder (attentional GRU). The selective gate network constructs a second-level sentence
representation by controlling the information flow from the encoder to the decoder. The
proposed layer was also shown to perform as expected; it highlighted the representation of
important words from the input sentence.
● The model is evaluated on the Gigaword, DUC 2004 and MSR abstractive sentence
summarization datasets. The proposed selective encoding model outperformed the
state-of-art baseline models.

From [9], titled “Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly
Redundant Opinions” by Kavita Ganesan, ChengXiang Zhai and Jiawei Han:

● The objective was to present a novel graph- based summarization framework that
generates concise abstractive summaries of highly redundant opinions.

● Evaluation results on summarizing user reviews show that their summaries have better
agreement with human summaries compared to the baseline extractive method.

● The key idea of their approach is to first construct a textual graph that represents the text
to be summarized. Then, three unique properties of this graph are used to explore and
score various sub-paths that help in generating candidate abstractive summaries.

From [10], titled “Sentiment Analysis and Text Summarization of Online Reviews: A Survey” by
Pankaj Gupta, Ritu Tiwari and Nirmal Robert:

● The objective of this paper was to review different techniques of sentiment analysis and
different techniques of text summarization. Sentiment analysis is a machine learning
approach in which a machine learns and analyzes the sentiments and emotions present in
text, using machine learning methods such as the Naive Bayes classifier and Support
Vector Machines (SVM).
● Text summarization uses natural language processing (NLP), and linguistic features of
sentences are used for checking the importance of the words and sentences that can be
included in the final summary.
● In this paper, a survey was done of previous research work related to text summarization
and sentiment analysis, so that new research areas can be explored by considering the
merits and demerits of the current techniques and strategies. These methods are used to
determine the emotions and sentiments in text data such as reviews of movies or products.

Chapter 3
DESIGN DETAILS

3.1 Architecture

Encoder-Decoder architecture:
The Sequence to Sequence model uses a method of encoder decoder based machine translation.
Encoder-Decoder architecture is used in predicting sequences when the length of output and input
data may vary. The input sequence is read entirely by the encoder and a fixed length internal
representation is generated. The internal representation captures the entire context of the input data
sequence. The decoder network uses this internal representation to predict the output words until the
end of the sequence token is reached.

Encoder:
Once the data preprocessing is done, the encoder decoder architecture is set up. It is done in two
phases- the training phase and the inference phase. In the training phase, first the encoder and
decoder are set up. Then the model is trained to predict the target sequence offset by one timestep.
An encoder Long Short Term Memory (LSTM) network reads the entire input sequence; at each
timestep, one word is fed into the encoder. It then processes the information at every timestep and
captures the contextual information present in the input sequence.
The diagram illustrating this is shown below.

Figure 1: Encoder architecture. The hidden state (hi) and cell state (ci) of the last timestep are used
to initialize the decoder, because the encoder and decoder are two different LSTM networks.

Decoder:
The decoder is also an LSTM network which reads the entire target sequence word-by-word and
predicts the same sequence offset by one timestep. The decoder is trained to predict the next word in
the sequence given the previous word.

Figure 2: Decoder architecture

<start> and <end> are the special tokens which are added to the target sequence before feeding it
into the decoder. The target sequence is unknown while decoding the test sequence. So, prediction
of the target sequence is started by passing the first word into the decoder which would always be
the <start> token. And the <end> token signals the end of the sentence.
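To make the training-phase setup described above concrete, the following is a minimal Keras sketch of a stacked-LSTM encoder whose final states initialise an LSTM decoder, in the spirit of the architecture in Figures 1 and 2. All hyperparameters (sequence length, vocabulary sizes, embedding and latent dimensions) are assumed values for illustration and are not taken from this report.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Assumed hyperparameters, chosen only for illustration
max_text_len = 80
x_vocab, y_vocab = 10000, 4000
embed_dim, latent_dim = 100, 300

# Encoder: three stacked LSTM layers reading the source text
encoder_inputs = Input(shape=(max_text_len,))
enc_emb = Embedding(x_vocab, embed_dim)(encoder_inputs)
enc_out1 = LSTM(latent_dim, return_sequences=True)(enc_emb)
enc_out2 = LSTM(latent_dim, return_sequences=True)(enc_out1)
encoder_outputs, state_h, state_c = LSTM(latent_dim, return_sequences=True,
                                         return_state=True)(enc_out2)

# Decoder: an LSTM initialised with the encoder's final hidden and cell states,
# trained to predict the target sequence offset by one timestep
decoder_inputs = Input(shape=(None,))
dec_emb = Embedding(y_vocab, embed_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(dec_emb,
                                                initial_state=[state_h, state_c])

# Per-timestep softmax over the summary vocabulary
output = TimeDistributed(Dense(y_vocab, activation='softmax'))(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], output)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.summary()
```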

After completing the training phase, the model is tested on new source sequences for which the
target sequence is unknown. So, the inference architecture needs to be set up for the purpose of
decoding a test sequence. The architecture is shown in the figure below.

Figure 3: Inference architecture for decoding the test sequence

The encoder-decoder architecture can pose a problem: it is difficult for the encoder to compress
long sequences into a fixed-length vector. To overcome this problem, a concept called the attention
mechanism is used. It aims to predict a word by looking at a few specific parts of the sequence only,
rather than the entire sequence.

Attention layer:
A sequence-to-sequence model with an attention mechanism consists of an encoder, a decoder and an
attention layer. The attention mechanism is used to focus on the individual parts of the input that are
more important at a particular time. It can be implemented by taking inputs from each timestep and
assigning a weight to each timestep, where the weight depends on the contextual importance of that
particular timestep. This helps the decoder pay attention to the most relevant parts of the input data
sequence so that it can optimally generate the next word in the output sequence.
Each component specified in the architecture model is explained in detail in the methodology
section.
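The following is a minimal sketch of how such an attention layer can be wired into the training model sketched earlier, using Keras' built-in dot-product Attention layer as a stand-in for the global attention used in this project; the variables encoder_outputs, decoder_outputs, encoder_inputs, decoder_inputs and y_vocab come from that earlier sketch.

```python
from tensorflow.keras.layers import Attention, Concatenate, Dense, TimeDistributed
from tensorflow.keras.models import Model

# Attention over the encoder's per-timestep outputs:
# query = decoder states, value = encoder states
attn_context = Attention()([decoder_outputs, encoder_outputs])

# Concatenate the attended context with the decoder output before the softmax layer
decoder_concat = Concatenate(axis=-1)([decoder_outputs, attn_context])
output = TimeDistributed(Dense(y_vocab, activation='softmax'))(decoder_concat)

model = Model([encoder_inputs, decoder_inputs], output)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
```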

3.2 Methodology

In the current development period, data scientists are trying to create algorithms which can help to
replicate the human brain and achieve its potential and functionalities. This has been achieved by
the concept of neural networks. Neural Networks are the set of algorithms that can recognize certain
patterns in the input data. They closely resemble the human brain and have the capability to create
models that can work or function like a human brain. Recurrent Neural Networks (RNN) are a type
of neural networks. They are feed-forward neural networks which have an internal memory. In a
traditional neural network, the input and the output sequences are independent of each other. But in
order to predict a sequence or a sentence, the previous words are crucial to predict the next word.
Hence, there is the requirement of internal memory. RNN helps to store the previous memory with
the help of hidden states which remembers information about previous sequences.

The RNN is so named because it recurrently performs the same function on every input and hidden
layer. This reduces the complexity of having to store separate parameters for each layer in the
network, thus saving memory. The output for the current input also depends on past outputs: after an
output is produced, it is fed back into the same network so that it can be stored and used while
processing the next output in the same sequence. To generate an output, an RNN therefore considers
both the current input and the output stored from the previous input.
RNNs work perfectly when it comes to short contexts. But when there is the requirement to create a
summary of a complete article, which could be extremely lengthy, consisting of more than 5,000
words or so, it is necessary to capture the context behind the complete input sequence and not just
the output of the previous input. Hence, we need a network that can capture the complete context
like a human brain. Unfortunately, simple RNN fails to capture the context or the long term relation
of the data, that is, it cannot remember or recall data in the input that occurred long before and
hence cannot make an effective prediction. RNN can remember data or context only for a short
term. This problem is often regarded as the vanishing gradient problem. This issue can be resolved
by a slightly different version of RNN - The Long Short Term Memory (LSTM) Networks.

Long Short-Term Memory (LSTM) networks are a better version of RNN. They can remember the
past data easily by resolving the vanishing gradient problem. LSTM uses back propagation to train
the model. LSTM is well-suited for predictions and classifications of data sequences of unknown
durations. They can also be used in language translation and text summarization methods.
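As a small illustration of the difference discussed above, the sketch below contrasts a plain RNN layer with an LSTM layer in Keras; the batch size, sequence length and dimensions are toy values, not values used in this project. The LSTM returns a cell state in addition to the hidden state, which is what lets it carry longer-range context.

```python
import tensorflow as tf

# Toy batch: 2 sequences, 80 timesteps, 100-dimensional embeddings (assumed shapes)
x = tf.random.normal((2, 80, 100))

rnn_out = tf.keras.layers.SimpleRNN(128)(x)                       # final hidden state only
lstm_out, h, c = tf.keras.layers.LSTM(128, return_state=True)(x)  # output plus hidden and cell state

print(rnn_out.shape, lstm_out.shape, h.shape, c.shape)            # each is (2, 128)
```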

3.3 Data set details

The data set used in this project of abstractive text summarization is: Amazon Fine Food
Reviews.

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10
years, including all ~500,000 reviews up to October 2012. Reviews include product and user
information, ratings, and a plaintext review. There are also reviews from all other Amazon
categories. Amazon reviews are often the most publicly visible reviews of consumer products.
Analyzing the structure of such a large database of Amazon reviews and visualizing this
information can be extremely helpful for the consumers.
The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon. The
dataset could be categorized as :
1. Number of reviews: 568,454
2. Number of users: 256,059
3. Number of products: 74,258
4. Timespan: Oct 1999 — Oct 2012
5. Number of Attributes/Columns in data: 10

Attribute Information:

1. Id
2. ProductId — unique identifier for the product
3. UserId — unique identifier for the user
4. ProfileName
5. Helpfulness Numerator — number of users who found the review helpful
6. Helpfulness Denominator — number of users who indicated whether they found the
review helpful or not
7. Score — rating between 1 and 5
8. Time — timestamp for the review
9. Summary — brief summary of the review
10. Text — text of the review
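A minimal sketch of loading the dataset with pandas and keeping the two columns used by the summarizer (the review text as input and its summary as target). The file name Reviews.csv is the name used in the Kaggle release of this dataset and is assumed here.

```python
import pandas as pd

reviews = pd.read_csv("Reviews.csv")          # Kaggle release of Amazon Fine Food Reviews
print(reviews.shape)                          # roughly (568454, 10), matching the counts above

# Keep only the review body and its short summary, dropping empty and duplicate rows
data = reviews[["Text", "Summary"]].dropna().drop_duplicates(subset=["Text"])
print(data.head())
```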

Figure 4: Snapshot of Amazon fine food reviews dataset

Figure 5: Snapshot of Amazon fine food reviews dataset
3.4 ML/DL techniques used in the project

The model used here is a sequence-to-sequence model. Sequence-to-sequence learning is a training
approach that can convert sequences from one input domain into sequences of another output domain.
It is generally used when the input and output of a model can be of variable lengths. It is a method
of encoder-decoder based machine translation that maps an input sequence to an output sequence
with a tag and an attention value. The idea is to use two LSTMs that work together with special
tokens and try to predict the next sequence from the previous sequence (as illustrated below).
Encoder-Decoder architecture: as described in Section 3.1, the input sequence is read entirely by the
encoder and a fixed-length internal representation is generated; this representation captures the entire
context of the input data sequence, and the decoder network uses it to predict the output words until
the end-of-sequence token is reached.
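The toy sketch below illustrates the special-token idea mentioned above: each summary is wrapped in start/end tokens, the decoder input is the token sequence, and the training target is the same sequence shifted by one timestep (teacher forcing). The token names and example summaries are assumed for illustration only.

```python
# Toy summaries; the real targets come from the Summary column of the dataset
summaries = ["great value for money", "tasty but too salty"]
marked = ["<start> " + s + " <end>" for s in summaries]

for s in marked:
    tokens = s.split()
    decoder_input  = tokens[:-1]   # <start> w1 w2 ... wn
    decoder_target = tokens[1:]    # w1 w2 ... wn <end>
    print(decoder_input, "->", decoder_target)
```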

3.5 Hardware and Software requirements
Minimum Hardware Requirements
● Processor: x86 processor @ 1.6 GHz or faster
● RAM: 4 GB or higher
● Monitor, keyboard, mouse
● Hard disk: 20 GB or higher
● Good internet connection
Software Requirements
● Language: Python 3.7.10
● Platform: Google Colab
● Packages: NumPy 1.19, Pandas 1.1.5, TensorFlow 2.5.0, Keras, Beautiful Soup 4, NLTK 3

Chapter 4
Implementation Details

4.1 Language/Tools/APIs
1. Language- The language used for this project is Python. From development to deployment and
maintenance, Python helps developers to be productive about the software being built. Benefits that
make Python the best fit for deep learning and AI-based projects include simplicity and consistency,
access to great libraries and frameworks for AI and machine learning (ML), flexibility, platform
independence, and a wide community. These add to the overall popularity of the language. Python
offers concise and readable code. While complex algorithms and versatile workflows stand behind
machine learning and AI, Python’s simplicity allows developers to write reliable systems.
Developers get to put all their effort into solving an ML problem instead of focusing on the
technical nuances of the language. Python is appealing to many developers as it’s easy to learn.
Python code is understandable by humans, which makes it easier to build models for machine
learning. Implementing AI and ML algorithms can be tricky and requires a lot of time. It’s vital to
have a well-structured and well-tested environment to enable developers to come up with the best
coding solutions. To reduce development time, programmers turn to a number of Python
frameworks and libraries. A software library is pre-written code that developers use to solve
common programming tasks. Python, with its rich technology stack, has an extensive set of libraries
for artificial intelligence and machine learning. Some of them are as follows:

1. Keras, TensorFlow, and Scikit-learn for machine learning


2. NumPy for high-performance scientific computing and data analysis
3. SciPy for advanced computing
4. Pandas for general-purpose data analysis
5. Seaborn for data visualization

One of the other key reasons for Python's popularity is that it’s a platform independent language.
Python is supported by many platforms including Linux, Windows, and macOS. Python code can be
used to create standalone executable programs for most common operating systems, which means
that Python software can be easily distributed and used on those operating systems without a Python
interpreter.

A few of the other languages used in the field of Artificial Intelligence include R, Java, Scala and Julia.
2. Tool- The tool used for carrying out the development of the project is Google Colaboratory.
Google Colab provides a free cloud service based on Jupyter Notebooks that supports a free GPU.
Colab is ideal for everything from improving Python coding skills to working with deep learning
libraries like PyTorch, Keras, TensorFlow, and OpenCV. It is possible to create notebooks in Colab,
upload notebooks, store notebooks, share notebooks, mount Google Drive and use whatever is
stored in there, import most directories, upload personal Jupyter Notebooks, upload notebooks
directly from GitHub, upload Kaggle files, download notebooks, and a lot more. It only supports
Python as of now, not R or Scala.

3. Library/Frameworks/APIs-
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data
structures and data analysis tools for the Python programming language. Pandas is a software
library written for the Python programming language for data manipulation and analysis. In
particular, it offers data structures and operations for manipulating numerical tables and time series.
NumPy is a Python library used for working with arrays; NumPy stands for Numerical Python. It
also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open-source project and can be used freely.
NumPy arrays are stored at one continuous place in memory, unlike lists, so processes can access
and manipulate them very efficiently. NumPy is written partially in Python, but most of the parts
that require fast computation are written in C or C++.
TensorFlow is a free and open-source software library for machine learning. It can be used across a
range of tasks but has a particular focus on training and inference of deep neural networks.
TensorFlow is a symbolic math library based on dataflow and differentiable programming.
TensorFlow is Google Brain's second-generation system; version 1.0.0 was released on February
11, 2017.
Keras is an open-source software library that provides a Python interface for artificial neural
networks. Keras acts as an interface for the TensorFlow library. Up until version 2.3, Keras
supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and
PlaidML. TensorFlow 2 comes with a tight integration of Keras and an intuitive high-level API,
tf.keras, to build neural networks and other ML models.
Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an
HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse
tree. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is
useful for web scraping.

4.2 Workflow diagrams

The workflow diagram of abstractive text summarization is shown in the diagram below. The model
used is the abstractive method which is implemented using deep learning techniques. The algorithm
used is the LSTM or Long Short Term Memory model which is a type of Recurrent Neural Network
model. The model used is sequence to sequence model. Sequence-to-sequence learning is a training
model that can convert sequences of one input domain into the sequences of another output domain.
It is generally used when the input and output of a model can be of variable lengths.

Figure 6: Architectural flow of abstractive text summarization model

4.3 Data preprocessing

Performing basic pre-processing steps is very important before the model building part is
implemented. Using untidy and unclean text as data could lead to potentially poor results. Hence, all
unwanted symbols, characters, etc. are first removed from the text so that they do not affect the
objective of the problem statement. The following preprocessing tasks are performed on the data:
convert everything to lowercase, remove (‘s), remove any text inside parentheses ( ), eliminate
punctuation and special characters, remove stopwords, and remove short words.
First, all the text is converted to lowercase and then split into words. There are several contractions
in the English language, such as ain’t, aren't, didn't, doesn't, don't, I'm, etc., so a contraction-expansion
step is included in the data preprocessing stage. To obtain clean text, unnecessary components must
be removed; regular expressions are used to remove them. The next step is to remove stop words,
which carry little or no meaning. After stop words are removed, lemmatization is performed so that
variant forms of the same word are reduced to a common form. After completing all the steps, a
purified or clean text is obtained.

Figure 7: Data preprocessing steps
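A minimal sketch, under assumed helper names, of the cleaning steps listed above: HTML removal, lowercasing, contraction expansion, removal of parenthesised text, possessives, punctuation, stopwords and very short words, followed by lemmatization. The contraction map shown is a small illustrative fragment (the full mapping is much larger), and the NLTK stopword and WordNet corpora must be downloaded once.

```python
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
# One-time setup: nltk.download('stopwords'); nltk.download('wordnet')

contraction_map = {"ain't": "is not", "aren't": "are not",
                   "don't": "do not", "i'm": "i am"}      # assumed small fragment
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = BeautifulSoup(text, "html.parser").get_text()  # strip any HTML markup
    text = text.lower()                                    # convert to lowercase
    text = re.sub(r"\([^)]*\)", "", text)                  # drop text inside parentheses
    text = " ".join(contraction_map.get(w, w) for w in text.split())
    text = re.sub(r"'s\b", "", text)                       # remove possessive ('s)
    text = re.sub(r"[^a-z ]", " ", text)                   # punctuation and special characters
    words = [lemmatizer.lemmatize(w) for w in text.split()
             if w not in stop_words and len(w) > 1]        # stopwords and short words
    return " ".join(words)

print(clean_text("I'm loving this (second order), it's GREAT!!"))
```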

4.4 Validation methodology

In this text summarization model, BLEU is used as the validation metric. BLEU (Bilingual
Evaluation Understudy) is a method for automatic evaluation of machine translation first described
by Kishore Papineni. The closeness of a translation is measured by finding legitimate differences in
word choice and word order between the reference human translation and the translation generated
by the machine.
A few terms used in the context of BLEU: the reference translation is the human translation, and the
candidate translation is the machine translation.
In this project, to measure the effectiveness of the generated output, the closeness of the machine
output to the human reference is evaluated using BLEU. BLEU compares the n-grams of the
candidate translation with the n-grams of the reference translation and counts the number of matches;
these matches are independent of the positions where they occur. The more matches between the
candidate and reference translation, the better the machine translation. The BLEU metric can be used
for machine translation, image captioning, text summarization and speech recognition. In Python, the
nltk library provides an implementation to calculate the BLEU score. To conclude, BLEU measures
the closeness of the machine output to the human reference, taking translation length, word choice,
and word order into consideration.
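A minimal example of computing a cumulative BLEU-4 score with the nltk library mentioned above; the reference and candidate token lists are toy values for illustration. Smoothing is applied because short summaries often have no 4-gram overlap at all.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["great", "healthy", "dog", "food"]]   # tokenised human (reference) summary
candidate = ["great", "dog", "food"]                # tokenised machine (candidate) summary

# Cumulative BLEU-4: equal weights on 1- to 4-grams, the variant reported in Chapter 5
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 4))
```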

Chapter 5
Results and Analysis

The results generated by the BLEU evaluation metric are shown here, in the images below.

Figure 8: Result obtained from the model

Figure 9: Result obtained from the model

Figure 10: Result obtained from the model

Figure 11: Result obtained from the model

Figure 12: Result obtained from the model

During implementation, a 3-layer stacked LSTM encoder-decoder model with a global attention
mechanism was used. With this model, a cumulative BLEU-4 score of 0.8265 was achieved on the
training set (constituting 80% of the dataset), and a cumulative BLEU-4 score of 0.7730 was achieved
on the test set.

Chapter 6
Conclusion and Future Enhancement

6.1 Novelty in the proposed solution

In recent years, the volume of textual data has rapidly increased, which has generated a valuable
resource for extracting and analysing information. To retrieve useful knowledge within a reasonable
time period, this information must be summarised. As a result, there have been many recent
approaches to abstractive text summarisation using deep learning models. Furthermore, the
measures that are utilised to evaluate the quality of summarisation have been investigated, and
Recall-Oriented Understudy for Gisting Evaluation (ROUGE-1, ROUGE-2 and ROUGE-L) and
BLEU are determined to be the most commonly applied metrics. In one of the papers referred to
during the literature survey, some of the challenges encountered during the summarisation process
and the solutions proposed in each approach were analysed. The analysis of the various approaches
shows that recurrent neural networks with an attention mechanism and long short-term memory
(LSTM) are the most prevalent techniques for abstractive text summarisation. Most deep learning
approaches use the original LSTM model; however, in this project a 3-layer stacked LSTM is
implemented. The original LSTM model consists of a single hidden LSTM layer followed by a
standard feedforward output layer. The stacked LSTM is an extension to this model that has multiple
hidden LSTM layers, where each layer contains multiple memory cells.

6.2 Limitations of the Project

Some of the current limitations in the project are as follows:

1. The project has presented a good approach to English-to-English text summarization using
encoding and decoding with LSTMs. However, it is limited to the English language as of now.

2. Another important limitation is that the model provides a correct summary only for short to
medium length texts.

6.3 Future Enhancements

Some of the future enhancements could be :

1. Increasing the training dataset size and building the model. The generalization capability of a
deep learning model enhances with an increase in the training dataset size.
2. Implementing a Bi-Directional LSTM, which is capable of capturing the context from both
directions and results in a better context vector (a minimal sketch follows this list).
3. Using the beam search strategy for decoding the test sequence instead of using the greedy
approach (argmax).
4. Evaluating the performance of the model based on some other metric like ROUGE, along with
the BLEU score.
5. Implementing pointer-generator networks and coverage mechanisms.
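A minimal sketch of enhancement 2 above: replacing the first encoder layer with a bidirectional LSTM. Shapes, vocabulary size and dimensions are assumed values; the forward and backward states are concatenated so they can initialise the decoder as in Section 3.1.

```python
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate

encoder_inputs = Input(shape=(80,))                 # assumed maximum text length
enc_emb = Embedding(10000, 100)(encoder_inputs)     # assumed vocabulary and embedding size
enc_out, fh, fc, bh, bc = Bidirectional(
    LSTM(128, return_sequences=True, return_state=True))(enc_emb)

state_h = Concatenate()([fh, bh])   # merged forward/backward hidden states
state_c = Concatenate()([fc, bc])   # merged forward/backward cell states; these
                                    # can initialise the decoder as before
```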

6.4 Summary
The increasing growth of the Internet has made a huge amount of information available. It is difficult
for humans to summarize large amounts of text. Thus, there is an immense need for automatic
summarization tools in this age of information overload. The International Data Corporation (IDC)
projects that the total amount of digital data circulating annually around the world would sprout from
4.4 zettabytes in 2013 to hit 180 zettabytes in 2025. That’s a huge amount of data circulating in the
digital world. There is a dire need for algorithms which can automatically shorten the amount of data
with accurate summaries that capture the essence of the intended messages. Furthermore, applying
text summarization reduces reading time and accelerates the process of researching information,
playing a major role in the current era of rapid development and digitalisation. Humans are generally
quite good at this task, as we have the capacity to understand the
meaning of a text document and extract salient features to summarize the documents using our own
words. However, automatic methods for text summarization are crucial in today’s world where there
is an overabundance of data and lack of manpower as well as time to interpret the data.

References
[1] D. Kosmajac and V. Kešelj, "Automatic Text Summarization of News Articles in Serbian
Language," 2019 18th International Symposium INFOTEH-JAHORINA (INFOTEH), 2019, pp. 1-6,
doi: 10.1109/INFOTEH.2019.8717655.

[2] Parida, Shantipriya & Motlicek, Petr. (2019). Abstract Text Summarization: A Low Resource
Challenge. 10.18653/v1/D19-1616.

[3] Kasimahanthi Divya, Kambala Sneha, Baisetti Sowmya, G Sankara Rao (2020). Text
Summarization using Deep Learning.

[4] Moratanch, N. & Gopalan, Chitrakala. (2016). A survey on abstractive text summarization. 1-7.
10.1109/ICCPCT.2016.7530193.

[5] H. T. Le and T. M. Le, "An approach to abstractive text summarization," 2017 International
Conference on Soft Computing and Pattern Recognition (SoCPaR), 2017

[6] Dave, Harsha & Jaswal, Shree. (2016). Multiple Text Document Summarization System using
Hybrid Summarization Technique. 10.1109/NGCT.2015.7375231.

[7] R. Nallapati, B. Zhou, C. Santos. (2017). Abstractive text summarization using
sequence-to-sequence RNNs and beyond.

[8] Zhou, Qingyu & Yang, Nan & Wei, Furu & Zhou, Ming. (2017). Selective Encoding for
Abstractive Sentence Summarization.

[9] Kavita Ganesan, ChengXiang Zhai, Jiawei Han. “Opinosis: A Graph-Based Approach to
Abstractive Summarization of Highly Redundant Opinions.” Proceedings of the 23rd International
Conference on Computational Linguistics.

[10] Pankaj Gupta, Ritu Tiwari and Nirmal Robert. “Sentiment Analysis and Text Summarization of
Online Reviews: A Survey.” International Conference on Communication and Signal Processing,
2016.

Appendix 1: Screenshots of the summary
Some of the screenshots of the summaries of the reviews generated by the model are attached below.

Fig1. Summary of the reviews generated by the model

Fig2. Summary of the reviews generated by the model

Fig3. Summary of the reviews generated by the model

Fig4. Summary of the reviews generated by the model

Fig5. Summary of the reviews generated by the model

Fig6. Summary of the reviews generated by the model

Fig7. Summary of the reviews generated by the model

Fig8. Summary of the reviews generated by the model

Appendix 2: Base Paper


Abstractive method of text summarization with sequence to sequence RNNs

Abu Kaisar Mohammad Masum, Sheikh Abujar, Md Ashraful Islam Talukder, AKM Shahariar Azad Rabby, Syed Akhter Hossain
Dept. of CSE, Daffodil International University, Dhaka, Bangladesh
mohammad15-6759@diu.edu.bd, sheikh.cse@diu.edu.bd, islam15-7100@diu.edu.bd, azad15-5424@diu.edu.bd, aktarhossain@daffodilvarsity.edu.bd

Abstract— Text summarization is one of the famous problems in natural language processing and
deep learning in recent years. Generally, text summarization contains a short note on a large text
document. Our main purpose is to create a short, fluent and understandable abstractive summary of a
text document. For making a good summarizer we have used amazon fine food reviews dataset,
which is available on Kaggle. We have used reviews text descriptions as our input data, and
generated a simple summary of that review descriptions as our output. To assist produce some
extensive summary, we have used a bi-directional RNN with LSTM’s in encoding layer and attention
model in decoding layer. And we applied the sequence to sequence model to generate a short
summary of food descriptions. There are some challenges when we working with abstractive text
summarizer such as text processing, vocabulary counting, missing word counting, word embedding,
the efficiency of the model or reduce value of loss and response machine fluent summary. In this
paper, the main goal was increased the efficiency and reduce train loss of sequence to sequence
model for making a better abstractive text summarizer. In our experiment, we’ve successfully
reduced the training loss with a value of 0.036 and our abstractive text summarizer able to create a
short summary of English to English text.

Keywords— Text Processing, Word-Embedding, Missing Word Counting, Vocabulary Counting,
Deep Learning, Bi-directional RNN, Encoding, Decoding.

I. INTRODUCTION

Human has a receptivity to assume the interpretation of a text document. They can select noticeable
features to short summarize the text documents using their own words. Nevertheless, automated text
summarization is the ultimate way nowadays worlds for efficiency, effectiveness, less preferential
and minimize time.

There are two ways to summarize text, one is extractive and another is abstractive. In an extractive
method, dragging the main sentences or words and extract some necessary parts from the source
document, and combines them to make a summary. Abstractive text summarization is a method
which made up of some core words or sentences to generate a summary. Those words or sentences
may be present or not in the text document. That means the abstract method can create a summary
from itself which can reduce grammatical inconsistency of a text document.

In this paper, our proposed method is to create an automatic text summarizer which follows the
abstractive method and able to respond to some short length text summary. As the dataset, we have
used amazon fine food reviews dataset. These datasets contain a large number of fine food reviews
data. But for our working purpose, we have used a few data for the train. After successfully training
dataset for almost eight hours, we got a satisfactory result.

Working with text is very difficult every time. To produce a proper abstractive text summarizer,
there is some step to follow, i.e. better text processing or pre-processing, vocabulary counting,
missing word counting, word embedding counting, use the special token for the encoder and
decoder, fixed the length of input and output text. In our proposed method we work with all of those
steps. We have used the sequence to sequence model with a two-layered bidirectional RNN's on the
input text and two layers RNN's, each with an LSTM using Bahdanau attention [1] on the target text
to produce an extensive summary. An encoder encodes source sentence into a fixed-length vector
from which a decoder generates an output sequence. The proposed model was originally used to
resolve relevant text such as machine translation. We have modified them to food reviews
summarization.

We have discussed various necessary factors concerning in text summarization on the basis of
improving their efficiency to create a fluent summary. Major processes are mentioned in details and
more importantly discussed deep learning methods and models were explained here.

II. LITERATURE REVIEW

Neural machine translation differs from traditional machine translation and has recently become a very successful approach (Kalchbrenner et al., 2013) [8]. A single neural network, usually organized as an encoder and a decoder, is trained jointly to perform the translation. To improve the performance of the basic encoder-decoder, the encoder compresses the input text into a fixed-length representation from which the decoder generates the output (Bahdanau et al., 2014) [1].

Abstractive text summarization creates a summary of a text document from its intrinsic content, choosing the key content of the document using the available vocabulary. A sequence of words from the source document is the input, and the predicted sequence of target words is the summary of the document. For short texts, an encoder-decoder RNN with attention, originally developed for machine translation, has been applied to text summarization (Ramesh Nallapati, Bowen Zhou, et al., 2016) [2].

Sutskever et al. [4] describe an end-to-end approach to sequence-to-sequence learning using a multilayer LSTM. The network consists of an encoder and a decoder: the encoder reads a fixed-length input text and the decoder produces the output.

Sequence-to-sequence learning has further improved neural machine translation through attention. There are two kinds of attention model (Minh-Thang Luong et al., 2015) [5]: global and local. The global approach attends to all of the source words, whereas the local approach attends to only a subset of the source words at a time. Both approaches are effective for machine translation.

Neural networks can also generate responses for short-text conversation. Lifeng Shang et al. [10] proposed the Neural Responding Machine, which follows the usual encoding and decoding scheme.

Reading a long text takes time, and sometimes it is not easy to understand its meaning. A summary therefore saves time and makes the meaning easier to grasp, and an automatically generated summary is an effective way to provide one. The main motivation of our research is to build an abstractive text summarizer that generates such summaries automatically. Although we build on an existing model, increasing its effectiveness through better text processing and careful handling of the model input and output is important. In this paper we describe our contributions on the implementation side to increase accuracy and reduce the loss during training: we introduce a simple pre-processing pipeline and apply lemmatization to reduce the inflectional forms of the words that end up in the summary.

III. METHODOLOGY

In this section we present our methodology for building an abstractive text summarizer for English text documents. Many successful English text summarizers have been built before, but we have tried to produce a better one using our own approach, and we were able to create a better abstractive summarizer for English text. For building and training the model we used the CPU version of TensorFlow. Fig. 1 shows our workflow for the abstractive text summarizer.

Figure 1: Working flow.

A. Problem Assertion

Given a large number of review descriptions, we need to create a short sequence from each description that represents its summary. Let the input description be a sequence of D words x_1, ..., x_D drawn from a vocabulary of size V, and let the generated output sequence be y_1, ..., y_S with S < D; that is, the summary sequence is shorter than the description sequence. We assume that all output sequences come from the same vocabulary.
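To make the formulation concrete, the small sketch below (our own illustration, not part of the original method; every name in it is hypothetical) expresses the contract that a summarizer must satisfy under this problem assertion.

Listing 1 (illustrative): the summarization contract.

from typing import List

def is_valid_summary(description: List[str], summary: List[str], vocabulary: set) -> bool:
    """Check the problem-assertion constraints: the summary y_1..y_S must be
    strictly shorter than the description x_1..x_D and drawn from the shared vocabulary."""
    return len(summary) < len(description) and all(word in vocabulary for word in summary)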
B. Data Collection

All deep learning algorithms need a huge amount of data to produce good results, so a good text summarizer also

needs a good dataset. Several datasets for building text summarizers are available online, but we selected the Amazon Fine Food Reviews dataset. The raw dataset contains 568,454 fine food reviews; for our purposes we used only 20,000 of them. The dataset contains several columns, but we used only two: the summary and the review text description.
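A minimal sketch of this selection step follows (our own illustration, not code from the paper; the file name Reviews.csv and the column names Summary and Text are assumptions based on the Kaggle release of the dataset).

Listing 2 (illustrative): loading the two columns used in this work.

import pandas as pd

# Load the raw Amazon Fine Food Reviews file (file and column names assumed).
reviews = pd.read_csv("Reviews.csv")

# Keep only the two columns used in this work and drop incomplete rows.
data = reviews[["Summary", "Text"]].dropna().reset_index(drop=True)

# Work with a 20,000-review subset, as described above.
data = data.iloc[:20000]
print(data.shape)  # (20000, 2)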
C. Data Preprocessing

In the data pre-processing phase we followed several steps. First, we converted all of the text to lower case and then split it into words. English has several contractions, such as ain't, aren't, didn't, doesn't, don't and I'm, so we expanded contractions in the pre-processing stage. Clean text also requires removing unnecessary components, which can be done with regular expressions. Next, we removed stop words, which carry little meaning. After removing stop words, we lemmatized the words, which collapses the variant inflectional forms of the same word. After completing all of these steps we obtained purified, clean text.

Figure 2: Data pre-processing.
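The steps above can be sketched as follows (an illustrative implementation under our own naming, not the paper's code; the contraction list is abbreviated and NLTK is assumed for stop words and lemmatization).

Listing 3 (illustrative): cleaning a review description.

import re
from nltk.corpus import stopwords          # assumes the NLTK "stopwords" corpus is installed
from nltk.stem import WordNetLemmatizer    # assumes the NLTK "wordnet" corpus is installed

CONTRACTIONS = {"ain't": "am not", "aren't": "are not", "didn't": "did not",
                "doesn't": "does not", "don't": "do not", "i'm": "i am"}  # abbreviated
STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text, remove_stopwords=True):
    text = text.lower()
    # Expand contractions word by word.
    text = " ".join(CONTRACTIONS.get(w, w) for w in text.split())
    # Strip unnecessary components (tags, URLs, punctuation) with regular expressions.
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    if remove_stopwords:
        tokens = [w for w in tokens if w not in STOP_WORDS]
    # Lemmatize to reduce inflectional forms.
    tokens = [lemmatizer.lemmatize(w) for w in tokens]
    return " ".join(tokens)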
D. Vocabulary Count & Word Embedding

The significance of a word depends not only on its frequency but also on its similarity to other words, so we count the total vocabulary of the purified text and of the summaries. After counting the vocabulary, we checked word occurrences; for example, we tested the word "test" and its occurrence count was 37.

To improve our model we used a pre-trained word-to-vector file. Several pre-trained embedding files are available, such as GloVe, ConceptNet Numberbatch, wiki-news-300d-1M.vec and crawl-300d-2M.vec; for this work we used ConceptNet Numberbatch. We fixed the vocabulary size and kept only the words that occur more than 20 times in our dataset.
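A sketch of this counting and embedding step is given below (our own illustration; the Numberbatch file name and the 300-dimensional vector size are assumptions).

Listing 4 (illustrative): counting the vocabulary and reading pre-trained vectors.

import numpy as np
from collections import Counter

def build_vocab(texts, min_occurrences=20):
    # Count every word in the cleaned descriptions and summaries.
    counts = Counter(word for text in texts for word in text.split())
    # Keep only words seen more often than the threshold.
    vocab = {w for w, c in counts.items() if c > min_occurrences}
    return vocab, counts

def load_embeddings(path="numberbatch-en.txt", dim=300):
    # Read a pre-trained word-to-vector file (ConceptNet Numberbatch text format assumed).
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:  # skips any header line
                embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# vocab, counts = build_vocab(clean_texts + clean_summaries)
# print(counts["test"])  # e.g. 37 in our run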
E. Model

Different kinds of deep learning models are used for different purposes. When working with text, long short-term memory (LSTM) cells are helpful for text modelling, and machine translation is the classic task for teaching a machine about text sequences. Every translator, such as Google Translate, uses an encoder and a decoder to translate a sequence of text from one language into another.

i. Neural Machine Translation

Neural machine translation is an approach for translating one language into another. Most neural machine translation systems use an encoder and a decoder: the encoder takes the input sequence and the decoder predicts the output sequence. Neural machine translation looks for a target sentence x that maximizes the conditional probability of x given a source sentence y, i.e. arg max_x p(x|y).

ii. RNN Encoder-Decoder

A two-layer RNN encoder-decoder architecture was first introduced by Cho et al. [11] and later extended by Bahdanau et al. [1]; these encoder-decoder models were originally used only for machine translation. We use a neural network of this kind with two RNN layers: the encoder consumes a fixed-length sentence and the decoder produces the output sequence. The two RNN layers are trained jointly to maximize the conditional probability of the target text sequence, and the hidden units improve memory capacity and training. We train our model to learn the probability of an English summary given the corresponding English input sentence.

If the encoder reads an input sentence X = (x_1, ..., x_{Tx}) (the Input Words rows of Tables 2 and 3 are inputs of the model), then with c a context vector,

  h_t = f(x_t, h_{t-1})                                   (1)

and

  c = q({h_1, ..., h_{Tx}}),

where h_t is the hidden state at time t, c is the context vector generated from the sequence of hidden states, and f and q are non-linear functions.

If the decoder predicts the word sequence y = (y_1, ..., y_{Ty}) (the Response Summary rows of Tables 2 and 3 are predicted sequences), then its probability is

  p(y) = \prod_{t=1}^{Ty} p(y_t | {y_1, ..., y_{t-1}}, c),     (2)

and each conditional probability is modelled as

  p(y_t | {y_1, ..., y_{t-1}}, c) = g(y_{t-1}, s_t, c),        (3)

where g is a non-linear function, y_t is the output word probability and s_t is the decoder hidden state. With attention, the context vector for output position i becomes

  c_i = \sum_{j=1}^{Tx} a_{ij} h_j.                            (4)

We use a bi-directional RNN, which consists of a forward and a backward recurrent neural network. The forward RNN reads the input in the order x_1 to x_{Tx} and produces the hidden states \overrightarrow{h_1}, ..., \overrightarrow{h_{Tx}}; the backward RNN reads the input in the order x_{Tx} to x_1 and produces the hidden states \overleftarrow{h_1}, ..., \overleftarrow{h_{Tx}}. So

  h_j = [\overrightarrow{h_j}^T ; \overleftarrow{h_j}^T]^T,    (5)

where h_j summarizes both the preceding and the following words around position j. Here a_{ij} is the softmax of e_{ij}, a normalized alignment score that expresses how well the input at position j aligns with the output at position i, with

  e_{ij} = a(s_{i-1}, h_j).                                    (6)

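To make equations (4) to (6) concrete, the following NumPy sketch (our own illustration, not the paper's code) computes the alignment scores, attention weights and context vector for one decoder step, using a small feed-forward alignment function a(.); all parameter names are hypothetical.

Listing 5 (illustrative): one step of Bahdanau-style attention.

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_step(s_prev, H, Wa, Ua, va):
    """s_prev : previous decoder state s_{i-1}, shape (n,)
    H      : encoder states h_1..h_Tx stacked, shape (Tx, 2n) for a bi-directional encoder
    Wa, Ua, va : parameters of the alignment model a(s_{i-1}, h_j)"""
    # e_ij = a(s_{i-1}, h_j)                      -- equation (6)
    e = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h_j) for h_j in H])
    # a_ij = softmax of e_ij over input positions j
    a = softmax(e)
    # c_i = sum_j a_ij * h_j                      -- equation (4)
    c = (a[:, None] * H).sum(axis=0)
    return a, c

# Example with random numbers: Tx = 5 input positions, n = 4 decoder units.
rng = np.random.default_rng(0)
Tx, n = 5, 4
H = rng.normal(size=(Tx, 2 * n))   # concatenated forward/backward states, equation (5)
s_prev = rng.normal(size=n)
Wa, Ua, va = rng.normal(size=(n, n)), rng.normal(size=(n, 2 * n)), rng.normal(size=n)
weights, context = attention_step(s_prev, H, Wa, Ua, va)
print(weights.sum())  # approximately 1.0: the attention weights form a distribution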
Figure 3: View of the model.

iii. Sequence-to-Sequence Model

Every sequence-to-sequence model has an encoder and a decoder built from LSTM cells. In our text summarization method we use a word-embedding file, and we count the vocabulary size of the cleaned text, which is used as the model input.

Figure 4: Sequence-to-sequence model.

We added some special tokens to the vocabulary: <UNK>, <PAD>, <EOS> and <GO>. The vocabulary is limited for several reasons, so some words cannot be kept; those words are replaced by the <UNK> token. The <PAD> token pads every sentence in a batch to the same length. The <EOS> token marks the end of a sequence and signals the encoder that the input is complete. The <GO> token instructs the decoder to start producing the output sequence. In the data pre-processing stage we add <UNK> and replace out-of-vocabulary words, and before training we add <GO> and <EOS> to the data, which is converted to word ids for sequence translation. In this sequence-to-sequence model, x is the input sequence of the encoder and y is the generated output (response) sequence.

IV. EXPERIMENT AND OUTPUT

We used a TensorFlow 1.12.0 sequence-to-sequence model. Once training stops, the machine is able to create its own summary; to generate one, we take an input sentence from the dataset and define the summary length randomly. For the parameters we used an attention-based encoder with epochs = 100, batch size = 64, RNN size = 256, learning rate = 0.005 and keep probability = 0.75, and we used the Adam optimizer, which adapts the learning rate of each parameter; for faster convergence a vanilla gradient descent optimizer was used. After data pre-processing, 20% of the whole dataset was kept for testing and the other 80% was used for training: of the 20,000 samples in total, 16,000 were used for training and 4,000 for testing.

Table 1: Sample numbers for training and testing.

Role  | Sample number | Percentage
Train | 16000         | 80%
Test  | 4000          | 20%

The machine gave mostly positive output. A few response outputs are wrong, but the positive response outputs look remarkably good. Below are positive responses from the machine after a few hours of training our model on the dataset.

Table 2: Sample 1 output.

Original Text: I believe gerber may have mixed apricots with the peaches. It now has a sour after taste and my son hates it.
Original Summary: Does not taste like peaches anymore
Input Words: believe gerber may mixed apricots peaches sour test son hates
Response Summary: not for best

Table 3: Sample 2 output.

Original Text: Maybe, I got a bad one. It's very plain, nothing special in flavor. I felt like I wasted my time and money. I will not buy this flavor again.
Original Summary: Very plain
Input Words: maybe got bad one plain nothing special flavor felt like wasted time money buy flavor
Response Summary: tasted plain

V. CONCLUSION AND FUTURE WORK

This paper has presented an approach to English-to-English text summarization using encoding and decoding with LSTMs. No machine gives one hundred per cent accurate values or predictions, and neither does ours.

Our summarizer provides mostly accurate summaries: for a few reviews the model cannot predict accurately, but for the others it gives good performance and a fluent summary. Overall, we were able to create understandable, fluent and short summaries while reducing the training loss.

We have identified some limitations in our experiment. The most important one is that the machine provides a correct summary only for short texts; for long texts the output is mostly incorrect. Another important limitation is that the text and summary lengths have to be fixed. Training on the dataset also needs a long time and a strong hardware configuration.

In future work, our main purpose is a model that can learn from inputs of any length and generate a correct summary without a fixed length. This model was used for English text, but there is still no good abstractive text summarizer for our Bengali language. Some resources are available for Bengali text, but they are not sufficient for building a better abstractive text summarizer: for example, there is no adequate word-to-vector resource for Bengali, and no lemmatizer is available for the language. In the future we will try to solve these problems in order to build a better text summarizer for our mother tongue, and we will try to expand the Bengali resources available to machine learning, deep learning and natural language processing researchers.

ACKNOWLEDGMENT

We would like to thank the DIU-NLP and Machine Learning Research Lab for providing all of the research facilities. Special thanks to our supervisor and course instructor for their patience and support in overcoming the numerous obstacles we faced during this research work.
REFERENCES
[1] Dzmitry Bahdanau, K. Cho, Y. Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate". International Conference on Learning Representations (ICLR), 19 May 2014.
[2] Ramesh Nallapati, Bowen Zhou, et al., "Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond". The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 26 Aug 2016.
[3] K. Cho, B. van Merrienboer, D. Bahdanau, Y. Bengio, "On the Properties of Neural Machine Translation: Encoder-Decoder Approaches". Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), 7 Oct 2014.
[4] Sutskever et al., "Sequence to Sequence Learning with Neural Networks". Conference on Neural Information Processing Systems (NIPS), 2014.
[5] M. Luong, H. Pham, Christopher D. Manning, "Effective Approaches to Attention-based Neural Machine Translation". Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
[6] Peter J. Liu et al., "Generating Wikipedia by Summarizing Long Sequences". International Conference on Learning Representations (ICLR), 2018.
[7] Minh-Thang Luong, Ilya Sutskever, et al., "Addressing the Rare Word Problem in Neural Machine Translation". Association for Computational Linguistics (ACL), 2015.
[8] Kalchbrenner et al., "Recurrent Continuous Translation Models". Empirical Methods in Natural Language Processing (EMNLP), 2013.
[9] Rico Sennrich et al., "Neural Machine Translation of Rare Words with Subword Units". Association for Computational Linguistics (ACL), 2016.
[10] Lifeng Shang et al., "Neural Responding Machine for Short-Text Conversation". Association for Computational Linguistics (ACL), 2015.
[11] Cho, Kyunghyun, et al. (2014), "Learning Phrase Representations using RNN Encoder-Decoder". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP).
