Abstractive Text Summarization Using Deep Learning
PROJECT REPORT
Submitted by
Arunima Maitra (1RV18CS029)
Ayush Daga (1RV18CS034)
Department of Computer Science and Engineering
2020-2021
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
Certified that the project work titled Abstractive Text Summarization Using Deep
Learning is carried out by Arunima Maitra (1RV18CS029) and Ayush Daga
(1RV18CS034) in partial fulfilment of the completion of the course Artificial Intelligence
and Machine Learning (18CS62) of the VI Sem, Computer Science Engineering program,
during the academic year 2020-2021. It is certified that all corrections/suggestions indicated
for the Internal Assessment have been incorporated in the project report and duly approved
by the laboratory faculty.
External Viva
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
DECLARATION
We, the students of sixth semester B.E., Department of Computer Science and Engineering,
RV College of Engineering, Bengaluru-59, hereby declare that the project titled ‘Abstractive
Text Summarization Using Deep Learning’ has been carried out by us and submitted in
fulfilment of the completion of the course Artificial Intelligence and Machine Learning
(18CS62) of the VI Sem during the year 2020-21. Further, we declare that the content of the
report was originally prepared by us and that we have acknowledged all the sources used for
the preparation of this report.
Name                                Signature
1. Arunima Maitra (1RV18CS029)
2. Ayush Daga (1RV18CS034)

ACKNOWLEDGEMENT
We are indebted to our guide, Manonmani S, Assistant Professor, Department of CSE, for her
wholehearted support, suggestions and invaluable advice throughout our project work, and for
her help in the preparation of this report.
Our sincere thanks to Dr. Ramakanth Kumar P., Professor and Head, Department of
Computer Science and Engineering, RVCE, for his valuable suggestions and expert advice
along with his support and encouragement.
We would also like to thank Dr. Subramanya K. N., Principal, RVCE, Bengaluru, for his
moral support towards completing our project work.
We would also like to thank our parents and all the faculty members of the Department of
Computer Science and Engineering for their constant support and encouragement.
Last, but not the least, we would like to thank our peers and friends who provided us with
valuable suggestions to improve the project.
ABSTRACT
The volume of information and published data on the internet is growing day by day. It has become
tough for users to go through loads of information and analyse it to draw conclusions. Therefore,
accessing and reading the required information in the shortest possible time remains an open research
problem. Text summarization addresses this problem by generating a summary, selecting the most
important sentences from the document without losing any valuable information. In this process, the
extracted information is generated as a comprehensive report and presented to the user as a concise
summary. Text summarization approaches can be broadly classified into two types: extractive
summarization and abstractive summarization. This project primarily focuses on the abstractive text
summarization method. Key challenges in text summarization include topic identification,
interpretation, summary generation, and evaluation of the generated summary. Automatic text
summarization techniques are needed to address the ever-growing amount of text data available
online, both to help discover all the relevant information and to consume it faster.
Abstractive text summarization is an approach to summarizing text by generating new sentences
from the original text. Here the objective is to build a text summarizer where the input is a long
sequence of words (in a text body), and the output is a short summary, which is a sequence as well.
Hence, we can model this problem as a Many-to-Many Seq2Seq problem. The Encoder-Decoder
architecture is mainly used to solve sequence-to-sequence problems where the input and output
sequences are of different lengths. Variants of RNN are preferred as the encoder and decoder
components. However, a potential issue with this encoder-decoder approach is that the neural network
needs to compress all the necessary information of a source sequence into a fixed-length vector. This
may make it difficult for the network to cope with long sentences; as a result, the performance of a
basic encoder-decoder deteriorates rapidly as the length of the input sentence increases. To solve this
issue, a mechanism known as the attention mechanism can be used. The attention mechanism aims to
predict a word by looking at a few specific parts of the sequence only, rather than the entire sequence.
The evaluation metric used in this model is BLEU (Bilingual Evaluation Understudy). The BLEU
score on the training set is 0.8265, while the BLEU score on the test set is 0.7730.
List of Acronyms:
AI - Artificial Intelligence
ML - Machine Learning
RNN - Recurrent Neural Network
NLP - Natural Language Processing
Seq2Seq - Sequence to sequence
DL - Deep Learning
LSTM - Long Short Term Memory
NLTK - Natural Language Toolkit
Table of Contents
Abstract
List of Acronyms
1. Introduction
1.1 Project Domain and Problem Addressing
1.2 Issues and Challenges
1.3 Need for AI-Based Solutions
1.4 Problem Statement
1.5 Project Objectives
1.6 Summary
2. Literature Study
3. Design Details
3.1 Architecture
3.2 Methodology
3.3 Data Set Details
3.4 ML/DL Techniques Used
3.5 Hardware and Software Requirements
4. Implementation Details
5. Results and Analysis
6. Conclusion and Future Enhancement
References
Appendices
Appendix A: Screenshots
Appendix B: Printout of the base paper used for implementation of this project
Chapter 1
INTRODUCTION
1.1 Project Domain and Problem Addressing
After the rise of the internet, the generation of massive amounts of data became a regular scenario.
The internet houses information related to numerous domains such as entertainment, health, politics,
world affairs, medicine, education, literacy and news. This data takes various forms, such as images,
videos, and mathematical, statistical, numerical and textual data. Among these, text data is quite hard
to interpret since it involves huge numbers of characters; hence a system is needed to extract the vital
parts of the information, and text summarization is one way to achieve this. The problem domain for
the proposed project is ‘Text Summarization’. Summarization is the task of condensing a piece of text
to a shorter form, reducing the size of the initial text while preserving key informational elements and
the meaning of the content. In general, there are two different approaches used for automatic
summarization: extraction-based and abstraction-based methods. Extractive text summarization picks
up sentences directly from the document based on a scoring function to form a coherent summary.
This method works by identifying important sections of the text, cropping them out and stitching
together portions of the content to produce a condensed version. Abstractive text summarization
methods, on the other hand, aim at producing a summary by interpreting the text using advanced NLP
techniques in order to generate a new, shorter text, parts of which may not appear in the original
document, that conveys the most critical information from the original text. This requires rephrasing
sentences and incorporating information from the full text, much as a human-written abstract usually
does. The method proposed in this report is to create an automatic text summarizer which follows the
abstractive method and is able to produce short text summaries. AI can enable humans to obtain key
information from ever-increasing volumes of data and solve the problem of generating titles (short
summaries). Deep learning analyses complex problems to facilitate the decision-making process and
attempts to imitate what the human brain can achieve by extracting features at different levels of
abstraction. Text summarization has important applications in various NLP-related tasks, such as
summarization of electronic health records in medical cases, text classification, question answering
and bots, and internal document workflow summarization.
1.2 Issues and Challenges
The main objective of the project is to implement an abstractive text summarizer using advanced
NLP techniques to reduce the given text to a smaller number of sentences without leaving out the
main ideas of the original text. Automatic text summarization is applied in different domains; for
example, search engines apply the idea to reviews of textual resources, and news websites use it to
condense information into shorter pieces of news. Semantic similarity plays a vital part in many
natural language processing tasks and in many associated areas such as text summarization, text
classification and document clustering. Word2Vec is a natural language processing model used to
represent words as vectors, and these vectors are then used for several tasks in natural language
processing. In text summarization, the similarity between words can be calculated by applying
mathematical functions to these vectors: the pair of words considered for the semantic similarity
calculation is transformed into vectors, and the degree of closeness is computed using cosine
similarity, i.e. the cosine of the angle between the two vectors. Semantic similarity measures can be
categorized as knowledge-based and corpus-based. Knowledge-based measures extract meaning from
a sentence using details from a semantic knowledge store, while corpus-based methods process the
document and fetch important information from it, which is then used to calculate the similarity
among words, phrases or sentences.
Automatically generated text summaries help reduce reading time, are less biased than
human-authored summaries, and can also be beneficial for many day-to-day applications such as
email summarization and posting on social media sites like Twitter. Automatic text summarization is
an extremely challenging task: the AI model needs to understand the domain-specific language of the
source text (for example, a scientific paper, news article or legal document), then identify the salient
features of the source text (in the case of a scientific paper, the problem statement, novel
contributions, methodology, future work and references), and finally produce a coherent summary
that is faithful to the essence of the article. The shorter the summary, the more it has to abstract over
the source text. Although extractive text summarization was the most popular approach over the past
few years, recent progress in natural language processing (NLP) has enabled the rise of more
abstractive summarization models. Therefore, automatic text summarization is an interesting and
exciting research problem with a wide range of industry applications, such as news highlights,
entity-centric summarization for search engines, legal contract analysis and question answering
systems. Recent developments in NLP have also enabled progress in human-like abstractive
summarization, and recent work has tested incorporating human feedback to train and improve
summarization systems with great success. Thus, NLP has become an
integral part of abstractive summarization techniques and methods.
1.4 Problem Statement
To develop a text summarization model that generates a summary for Amazon Fine Food Reviews
using an abstraction-based approach.
1.5 Project Objectives
1. Implementation of an abstractive text summarizer using advanced NLP techniques to reduce the
given text to a smaller number of sentences without leaving out the main ideas of the original text.
2. The input to the summarizer is a long sequence of words (in a text body), and the output is a short
summary (which is a sequence as well), so the problem is modelled as a Seq2Seq problem.
3. To generate a precise and accurate information representation with correct semantics.
4. To quickly give the reader or listener an idea of what the material is saying.
1.6 Summary
Today's world is flooded with an ever-increasing number of articles and links to choose from. As this
data grows, so does the importance of semantic density. Having a generated summary lets one decide
whether to dive deeper into a piece or not. Converting lengthy texts into short, meaningful sentences
is the main idea behind text summarization, and various algorithms exist to achieve this. Machine
learning models are trained first to understand the given document and then to create a summary of
it. These models achieve this task either by extracting important words from the document or by
creating human-like sentences to form the summary. Natural language processing has various
applications, and automatic text summarization is one of its most popular and useful techniques. Text
summarization can be used for purposes such as email summaries, movie reviews, news headlines,
outlines of student notes, summarizing information for business people and government officials,
summarizing medical data for doctors, summarizing legal documents, and book or novel summaries
that help consumers decide whether to read further. The main objective of a text summarization
system is to identify the most important information in the given text and present it to the end users.
In this report, the Amazon Fine Food Reviews dataset is given as input to the system and abstractive
text summarization is performed using NLP techniques, as discussed in the following sections.
Chapter 2
LITERATURE STUDY
Several papers pertaining to the topic of abstractive text summarization were explored. From those
papers, the following points were concluded.
From [1], titled “Automatic Text Summarization of News Articles in Serbian Language” by authors
Dijana Kosmajac and Vlado Kešelj:
● The objective of this paper is to develop machine learning algorithms that can
automatically shorten longer texts and deliver accurate summaries. First, the generated
summaries should fluently pass on the intended messages. Second, the generated summaries
should reduce reading time and speed up the process of researching relevant information.
● In the results obtained in Experiment 1, the ROUGE-1 value for word2vec was higher,
but ROUGE-L was higher for the Gensim implementation. Interestingly, the other two
runs, with Kešelj stemming and English translation, show that Gensim had a lower result,
whereas in Experiment 2 there is no significant difference between the translated and
stemmed evaluations. The authors plan to compare the evaluation with POS information
included in the original language.
● The limitation was that the authors tried to apply a deep learning encoder-decoder
architecture with no success. They tried to validate it with two “off-the-shelf”
implementations, seq2seq and Neural Monkey, with similar negative results. However, they
did not use any semantic enhancement such as pre-trained word vectors. One possibility is
that their dataset was too small.
From [2], titled “Abstract Text Summarization: A Low Resource Challenge” by authors Shantipriya
Parida and Petr Motlicek:
● The objective of this paper was to build an abstractive text summarizer for German-language
text using the state-of-the-art Transformer model. It proposes an iterative data augmentation
approach which uses synthetic data along with the real summarization data for the German
language.
● The results obtained show that the summaries generated without a minimum length
constraint appear better than summaries generated with a minimum length constraint.
Although the authors do not explicitly specify a minimum length parameter for generating
summaries, the average length of words generated by model S2 is longer than that of model
S1. There was a variance in the summaries generated by model S3 compared to S2 and S1.
In terms of ROUGE score, model S3 outperforms model S1 but performs worse than model
S2.
● The directions for future work were: i) investigating synthetic summarization data, and
ii) applying transfer learning on text summarization for multilingual low-resource datasets
with little or no ground-truth summaries.
From [3], titled “Text Summarization using Deep Learning” by authors Kasimahanthi Divya,
Kambala Sneha, Baisetti Sowmya and G Sankara Rao:
● The objective of this paper was to understand and implement a method of producing
accurate text summaries using deep learning methods.
● In this text summarization model, ROUGE is used as the validation metric. ROUGE
consists of a set of metrics used for evaluation; it compares the result automatically
produced by the model against human-produced results. The human-produced summaries
used for reference are called reference summaries, and the summaries generated by the
model are called system summaries. The model is evaluated based on the extent to which
the system summaries are similar to the reference summaries.
● The limitation was that ROUGE provides reasonable results in the case of extractive
summarisation. However, for abstractive summarisation, ROUGE is not enough, as it
depends on exact matching between words.
From [4], titled “A Survey on Abstractive Text Summarization” by authors N. Moratanch and Dr. S.
Chitrakala:
● The objective is to collectively summarize and decipher the various methodologies,
challenges and issues of abstractive summarization. State-of-the-art benchmark datasets and
their properties are explored. The survey aims to show that most abstractive summarization
methods produce highly cohesive, coherent, less redundant and information-rich summaries.
● The results obtained were that the quality of the summary is improved in a structure-based
approach, but it has some grammatical issues since it does not take the semantic
representation of the document into consideration. The semantic-based model provides
better linguistic quality to the summary since it involves a semantic representation of the
text document; it overcomes the issues of the structure-based approach, reduces redundancy
in the summary and provides information-rich content.
● The limitations were that there is no generalized framework, and parsing and alignment of
parse trees is difficult. Extracting the important sentences and ordering them as in the
original source document to produce an efficient summary is an open issue. Information
diffusion is still not handled properly by abstractive text summarization.
From [5], titled “An approach to Abstractive Text Summarization” by authors Huong Thanh Le and
Tien Manh Le:
● The objective of this paper is an approach to abstractive text summarization based on
discourse rules, syntactic constraints, and a word graph. Discourse rules and syntactic
constraints are used in the process of generating sentences from keywords. Experimental
results show that the approach is promising in solving the abstractive summarization task.
● The results obtained were that, among 433 sentences generated by the abstractive
summarizer, 95% of the sentences are syntactically correct, and 72% of those sentences are
complete in meaning, with unimportant parts at the end of sentences being removed. Most
cases of incomplete sentences are due to the process of completing the end of a new
sentence in the sentence reduction phase.
● The limitation is that text generation in general, and abstractive text summarization in
particular, is still a challenging task; more work should be done to improve the quality of the
system, such as improving the meaning completeness of sentences generated in the sentence
reduction phase, further compressing sentences, and investigating strategies to efficiently
combine sentences in the summary.
From [6], titled “Multiple Text Document Summarization System using Hybrid Summarization
Technique” by authors Harsha Dave and Shree Jaswal:
● The objective was to propose a system that generates an abstractive summary from an
extractive summary using the WordNet ontology. Multiple document types, such as text,
PDF and Word files, were used. The paper discusses various text summarization techniques
and then describes, step by step, the multiple-document text summarization approaches.
● The result is compared with existing online extractive tools as well as with human-generated
summaries, and shows that the proposed system gives good results.
● The future work is that the summarization accuracy can be increased by comparing this
abstractive system with other abstractive systems.
From [7], titled “Abstractive text summarization using sequence-to-sequence RNNs and beyond” by
authors R. Nallapati, B. Zhou and C. Santos:
● The objective was to propose a Switching Pointer-Generator to avoid the generation of the
“UNK” token, a token generated by most summarization systems when they try to produce
a word that is outside their training dictionary. The paper also shows that the model learns
to use pointers very accurately, not only for named entities but also for multi-word phrases.
● The results show that the improvement of the overall model is not significant. It is
proposed that the model's impact may be clearer on other document sets, where the tail
distribution of rare words is heavier. A hierarchical attention model is proposed, but it did
not show a significant difference on all datasets. The work also introduced a new large-scale
dataset, CNN/Daily Mail, which is very important due to the previous lack of such datasets.
● The limitation was that the performance improvement of the overall model is not
significant.
From [8], titled “Selective Encoding for Abstractive Sentence Summarization” by authors Q. Zhou,
N. Yang, F. Wei and M. Zhou:
● The motivation was that, unlike machine translation, where alignment between all parts of
the input and output is required, there is no explicit alignment between input sentences in
sentence summarization. The challenge is not to infer the alignment, but to select the
highlights and filter out secondary information. To solve this task, the paper proposes to
extend the seq2seq framework with an additional selective encoding model.
● The model consists of a sentence encoder (a bidirectional GRU) and an attention-equipped
decoder (an attentional GRU). The selective gate network constructs a second-level sentence
representation by controlling the information flow from the encoder to the decoder. The
proposed layer was shown to perform as expected: it highlighted the representation of
important words from the input sentence.
● The model is evaluated on the Gigaword, DUC 2004 and MSR abstractive sentence
summarization datasets. The proposed selective encoding model outperformed the
state-of-the-art baseline models.
From [9], titled “Opinosis: A Graph-Based Approach to Abstractive Summarization of Highly
Redundant Opinions” by authors Kavita Ganesan, ChengXiang Zhai and Jiawei Han:
● The objective was to present a novel graph-based summarization framework (Opinosis) that
generates concise abstractive summaries of highly redundant opinions.
● Evaluation results on summarizing user reviews show that the summaries have better
agreement with human summaries compared to the baseline extractive method.
● The key idea of the approach is to first construct a textual graph that represents the text to
be summarized. Then, three unique properties of this graph are used to explore and score
various sub-paths that help in generating candidate abstractive summaries.
From [10], titled “Sentiment Analysis and Text Summarization of Online Reviews: A Survey” by
authors Pankaj Gupta, Ritu Tiwari and Nirmal Robert:
● The objective of this paper was to review different techniques of sentiment analysis and
different techniques of text summarization. Sentiment analysis is a machine learning
approach in which a machine learns and analyzes the sentiments and emotions present in
text, using machine learning methods such as the Naive Bayes classifier and Support Vector
Machines (SVM). These methods are used to determine the emotions and sentiments in text
data such as reviews of movies or products.
● Text summarization uses natural language processing (NLP), and linguistic features of
sentences are used to check the importance of the words and sentences that can be included
in the final summary.
● In this paper, a survey of previous research work related to text summarization and
sentiment analysis has been carried out, so that new research areas can be explored by
considering the merits and demerits of the current techniques and strategies.
Chapter 3
DESIGN DETAILS
3.1 Architecture
Encoder-Decoder architecture:
The Sequence-to-Sequence model uses an encoder-decoder architecture, a method originally
developed for machine translation. The encoder-decoder architecture is used for predicting sequences
where the lengths of the input and output data may vary. The input sequence is read entirely by the
encoder, and a fixed-length internal representation is generated. This internal representation captures
the entire context of the input data sequence. The decoder network uses the internal representation to
predict the output words until the end-of-sequence token is reached.
Encoder:
Once the data preprocessing is done, the encoder-decoder architecture is set up. This is done in two
phases: the training phase and the inference phase. In the training phase, the encoder and decoder are
first set up, and the model is then trained to predict the target sequence offset by one timestep. An
encoder Long Short-Term Memory (LSTM) network reads the entire input sequence: at each
timestep, one word is fed into the encoder, which processes the information and captures the
contextual information present in the input sequence.
The diagram to illustrate this is shown below.
Figure 1: Encoder architecture used. The hidden state (hi) and cell state (ci) of the last timestep are
used to initialize the decoder, because the encoder and decoder are two different sets of LSTM
layers.
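A minimal sketch of how such an encoder can be set up with tf.keras is shown here; the vocabulary
size, sequence length and latent dimension are assumed placeholder values, not the exact ones used
in the project, and a single LSTM layer is shown for brevity (the project stacks three).

from tensorflow.keras.layers import Input, Embedding, LSTM

max_text_len = 80        # assumed maximum length of a review, in words
x_vocab_size = 10000     # assumed size of the review vocabulary
latent_dim = 300         # assumed dimensionality of the LSTM hidden/cell state

# Encoder: embed the review and run it through an LSTM
encoder_inputs = Input(shape=(max_text_len,))
enc_emb = Embedding(x_vocab_size, 100, trainable=True)(encoder_inputs)

# return_sequences=True keeps per-timestep outputs (needed later for attention);
# return_state=True exposes the final hidden state (h) and cell state (c)
encoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(enc_emb)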
Decoder:
The decoder is also an LSTM network; it reads the entire target sequence word by word and predicts
the same sequence offset by one timestep. The decoder is trained to predict the next word in the
sequence given the previous word.
<start> and <end> are special tokens which are added to the target sequence before feeding it into
the decoder. The target sequence is unknown while decoding a test sequence, so prediction of the
target sequence is started by passing the first word into the decoder, which is always the <start>
token; the <end> token signals the end of the sentence.
After completing the training phase, the model is tested on new source sequences for which the
target sequence is unknown. So, the inference architecture needs to be set up for the purpose of
decoding a test sequence. The architecture is shown in the figure below.
The use of the encoder-decoder architecture can pose a few problems, the main one being that it is
difficult for the encoder to compress long sequences into a fixed-length vector. In order to overcome
this problem, a concept called the attention mechanism is used. It aims to predict a word by looking
at a few specific parts of the sequence only, rather than the entire sequence.
Attention layer:
A Sequence-to-Sequence model with an attention mechanism consists of an encoder, a decoder and
an attention layer. The attention mechanism is used to focus on the individual parts of the input which
are more important at a particular time. It is implemented by taking the encoder outputs from each
time step and assigning a weight to each time step; the weight depends on the contextual importance
of that particular time step. It helps the model pay attention to the most relevant parts of the input
data sequence so that the decoder can optimally generate the next word in the output sequence.
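As an illustration, tf.keras ships an additive (Bahdanau-style) attention layer that can be placed
between the encoder and decoder outputs of the earlier sketches; this is a simplified stand-in for the
global attention layer used in the project, and decoder_outputs here refers to the decoder LSTM
outputs before the final softmax layer.

from tensorflow.keras.layers import AdditiveAttention, Concatenate

# Context vectors: for every decoder timestep, a weighted sum of encoder outputs,
# with weights derived from the decoder states (query) and encoder states (value)
attention = AdditiveAttention()
context = attention([decoder_outputs, encoder_outputs])

# Concatenate each context vector with the decoder output, then apply the softmax layer
decoder_concat = Concatenate(axis=-1)([decoder_outputs, context])
decoder_probs = TimeDistributed(Dense(y_vocab_size, activation='softmax'))(decoder_concat)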
Each and every component specified in the architecture model is thoroughly explained in the
methodology section.
3.2 Methodology
In the current period of development, data scientists are trying to create algorithms that can help
replicate the human brain and its capabilities. This has been pursued through the concept of neural
networks. Neural networks are sets of algorithms that can recognize certain patterns in input data;
they loosely resemble the human brain and can be used to create models that function somewhat like
it. Recurrent Neural Networks (RNNs) are one type of neural network. Unlike traditional
feed-forward neural networks, they have an internal memory. In a traditional neural network, the
input and output sequences are independent of each other, but in order to predict a word in a
sentence, the previous words are crucial. Hence, there is a requirement for internal memory. An RNN
stores information about previous elements of the sequence with the help of hidden states.
The RNN is so named because it recurrently applies the same function to every element of the input
and to the hidden state. This reduces the complexity of having to store separate parameters for each
layer in the network, thus saving memory. The output for the current input also depends on the past
outputs: after an output is produced, it is fed back into the same network so that it can be stored and
used while processing the next element of the same sequence. In order to generate an output, an
RNN considers the current input and the output stored from the previous input.
RNNs work well when it comes to short contexts. But when there is a requirement to create a
summary of a complete article, which could be extremely lengthy, consisting of more than 5,000
words or so, it is necessary to capture the context behind the complete input sequence and not just
the output of the previous input. Hence, we need a network that can capture the complete context
like a human brain. Unfortunately, a simple RNN fails to capture the context or the long-term
relations in the data; that is, it cannot remember or recall input that occurred long before, and hence
cannot make an effective prediction. An RNN can remember data or context only for a short term.
This problem is often referred to as the vanishing gradient problem. It can be resolved by a slightly
different version of the RNN: the Long Short-Term Memory (LSTM) network.
Long Short-Term Memory (LSTM) networks are an improved version of RNNs. They can remember
past data more easily by mitigating the vanishing gradient problem. LSTMs use backpropagation
through time to train the model. An LSTM is well suited for predictions and classifications on data
sequences of unknown duration, and is also used in language translation and text summarization.
3.3 Data Set Details
The dataset used in this project on abstractive text summarization is the Amazon Fine Food Reviews
dataset.
This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10
years, including all ~500,000 reviews up to October 2012. Reviews include product and user
information, ratings, and a plaintext review. There are also reviews from all other Amazon
categories. Amazon reviews are often the most publicly visible reviews of consumer products.
Analyzing the structure of such a large database of Amazon reviews and visualizing this
information can be extremely helpful for the consumers.
The dataset can be summarized as follows:
1. Number of reviews: 568,454
2. Number of users: 256,059
3. Number of products: 74,258
4. Timespan: Oct 1999 — Oct 2012
5. Number of Attributes/Columns in data: 10
Attribute Information:
1. Id
2. ProductId — unique identifier for the product
3. UserId — unique identifier for the user
4. ProfileName
5. Helpfulness Numerator — number of users who found the review helpful
6. Helpfulness Denominator — number of users who indicated whether they found the
review helpful or not
7. Score — rating between 1 and 5
8. Time — timestamp for the review
9. Summary — brief summary of the review
10. Text — text of the review
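A minimal sketch of loading the dataset with pandas and keeping the two columns used for
summarization is shown here; the file name Reviews.csv follows the Kaggle release and is an
assumption, as is the subsample size.

import pandas as pd

# Load the Kaggle CSV and keep only the columns needed for summarization
data = pd.read_csv("Reviews.csv", nrows=100000)        # subsample for faster experimentation
data = data[["Text", "Summary"]].dropna()              # review body and its short summary
data = data.drop_duplicates(subset=["Text"])           # drop duplicate reviews

print(data.shape)
print(data.head())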
Figure 5: Snapshot of Amazon fine food reviews dataset
3.4 ML/DL techniques used in the project
The model used here is a sequence-to-sequence (Seq2Seq) model. Sequence-to-sequence learning
trains a model to convert sequences from one input domain into sequences of another output domain,
and is generally used when the input and output of the model can be of variable length. It is a method
of encoder-decoder based machine translation that maps an input sequence to an output sequence
with a tag and an attention value. The idea is to use two LSTMs that work together with a special
token and try to predict the next state sequence from the previous sequence.
Encoder-Decoder architecture: as described in Section 3.1, the input sequence is read entirely by the
encoder and a fixed-length internal representation is generated, which captures the entire context of
the input data sequence; the decoder network then uses this internal representation to predict the
output words until the end-of-sequence token is reached.
3.5 Hardware and Software requirements
Minimum Hardware Requirements
● Processor: x86 processor @ 1.6 GHz or faster
● RAM: 4 GB or higher
● Monitor, keyboard, mouse
● Hard disk: 20 GB or higher
● Good internet connection
Software Requirements
● Language: Python 3.7.10
● Platform: Google Colab
● Packages: NumPy 1.19, Pandas 1.1.5, TensorFlow 2.5.0, Keras, Beautiful Soup 4, NLTK 3
Chapter 4
IMPLEMENTATION DETAILS
4.1 Language/Tools/APIs
1. Language- The language used for this project is Python. From development to deployment and
maintenance, Python helps developers be productive and confident about the software being built.
Benefits that make Python a good fit for deep learning and AI-based projects include simplicity and
consistency, access to great libraries and frameworks for AI and machine learning (ML), flexibility,
platform independence, and a wide community. These add to the overall popularity of the language.
Python offers concise and readable code. While complex algorithms and versatile workflows stand
behind machine learning and AI, Python's simplicity allows developers to write reliable systems.
Developers get to put their effort into solving an ML problem instead of focusing on the technical
nuances of the language. Python is appealing to many developers as it is easy to learn, and Python
code is understandable by humans, which makes it easier to build models for machine learning.
Implementing AI and ML algorithms can be tricky and requires a lot of time, so it is vital to have a
well-structured and well-tested environment that enables developers to come up with the best coding
solutions. To reduce development time, programmers turn to a number of Python frameworks and
libraries. A software library is pre-written code that developers use to solve common programming
tasks. Python, with its rich technology stack, has an extensive set of libraries for artificial intelligence
and machine learning; the ones used in this project are described under item 3 below.
One of the other key reasons for Python's popularity is that it’s a platform independent language.
Python is supported by many platforms including Linux, Windows, and macOS. Python code can be
used to create standalone executable programs for most common operating systems, which means
that Python software can be easily distributed and used on those operating systems without a Python
interpreter.
A few of the other languages used in the field of Artificial Intelligence include R, Java, Scala and Julia.
2. Tool- The tool used for carrying out the development of the project is Google Colaboratory.
Google Colab provides a free cloud service based on Jupyter notebooks that supports a free GPU.
Colab is ideal for everything from improving Python coding skills to working with deep learning
libraries like PyTorch, Keras, TensorFlow and OpenCV. It is possible to create notebooks in Colab,
upload notebooks, store and share notebooks, mount Google Drive and use whatever is stored in
there, import most directories, upload personal Jupyter notebooks, upload notebooks directly from
GitHub, upload Kaggle files, download notebooks, and a lot more. It only supports Python as of now,
not R or Scala.
3. Library/Frameworks/APIs-
Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data
structures and data analysis tools for the Python programming language. Pandas is a software
library written for the Python programming language for data manipulation and analysis. In
particular, it offers data structures and operations for manipulating numerical tables and time series.
NumPy is a Python library used for working with arrays. NumPy stands for Numerical Python. It
also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open source project and can be used freely.
NumPy arrays are stored at one continuous place in memory unlike lists, so processes can access
and manipulate them very efficiently. NumPy is a Python library and is written partially in Python,
but most of the parts that require fast computation are written in C or C++.
TensorFlow is a free and open-source software library for machine learning. It can be used across a
range of tasks but has a particular focus on training and inference of deep neural networks.
TensorFlow is a symbolic math library based on dataflow and differentiable programming.
TensorFlow is Google Brain's second-generation system. Version 1.0.0 was released on February
11, 2017.
Keras is an open-source software library that provides a Python interface for artificial neural
networks. Keras acts as an interface for the TensorFlow library. Up until version 2.3, Keras
supported multiple backends, including TensorFlow, Microsoft Cognitive Toolkit, Theano, and
PlaidML. TensorFlow 2 comes with a tight integration of Keras and an intuitive high-level API,
tf.keras, to build neural networks and other ML models.
Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an
HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse
tree. It creates a parse tree for parsed pages that can be used to extract data from HTML, which is
useful for web scraping.
4.2 Workflow diagrams
The workflow diagram of abstractive text summarization is shown in the diagram below. The model
used is the abstractive method which is implemented using deep learning techniques. The algorithm
used is the LSTM or Long Short Term Memory model which is a type of Recurrent Neural Network
model. The model used is sequence to sequence model. Sequence-to-sequence learning is a training
model that can convert sequences of one input domain into the sequences of another output domain.
It is generally used when the input and output of a model can be of variable lengths.
4.3 Data preprocessing
Performing basic preprocessing steps is very important before the model-building part is
implemented. Using untidy and unclean text as data could lead to potentially disastrous results.
Hence, all the unwanted symbols, characters, etc. are removed from the text first so that they do not
affect the objective of the problem statement. The following preprocessing tasks are performed on
the data: convert everything to lowercase, remove (‘s), remove any text inside parentheses ( ),
eliminate punctuation and special characters, remove stopwords, and remove short words.
At first, all the text is converted to lowercase and then split into words. There are several
contractions in the English language, such as ain’t, aren't, didn't, doesn't, don't, I'm, etc., so
contractions are expanded in the data preprocessing stage. For a clean text, unnecessary components
need to be removed; regular expressions are used to remove them. The next step is to remove stop
words, which carry little or no meaning. After stop-word removal, lemmatization is performed on the
words, which maps variant forms of the same word to a common base form. After the completion of
all the steps, a purified, clean text is obtained.
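A minimal sketch of the kind of cleaning function described above is given here; the contraction
mapping is truncated for brevity and the exact rules used in the project may differ.

import re
from nltk.corpus import stopwords   # requires a one-time nltk.download('stopwords')

contraction_mapping = {"ain't": "is not", "aren't": "are not", "didn't": "did not",
                       "doesn't": "does not", "don't": "do not", "i'm": "i am"}
stop_words = set(stopwords.words('english'))

def clean_text(text):
    text = text.lower()                                        # 1. lowercase
    text = re.sub(r"\([^)]*\)", "", text)                      # 2. remove text inside parentheses
    text = " ".join(contraction_mapping.get(w, w)
                    for w in text.split())                     # 3. expand contractions
    text = re.sub(r"'s\b", "", text)                           # 4. remove 's
    text = re.sub(r"[^a-z\s]", " ", text)                      # 5. drop punctuation/special characters
    tokens = [w for w in text.split() if w not in stop_words]  # 6. remove stopwords
    tokens = [w for w in tokens if len(w) > 2]                 # 7. remove short words
    # lemmatization (e.g. with nltk's WordNetLemmatizer) would follow here in the full pipeline
    return " ".join(tokens)

print(clean_text("The cookies (from Amazon) weren't great, but they're edible!"))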
4.4 Validation methodology
In this text summarization model, BLEU is used as the validation metric. BLEU (Bilingual
Evaluation Understudy) is a method for the automatic evaluation of machine translation first
described by Kishore Papineni. The closeness of a translation is measured by finding legitimate
differences in word choice and word order between the reference human translation and the
translation generated by the machine.
A few terms in the context of BLEU: the reference translation is the human translation, and the
candidate translation is the machine translation.
In this project, to measure the effectiveness of the generated output, the closeness of the machine
translation to the human reference translation is evaluated using BLEU. BLEU compares the
n-grams of the candidate translation with the n-grams of the reference translation and counts the
number of matches; these matches are independent of the positions where they occur. The greater the
number of matches between the candidate and the reference, the better the machine translation. The
BLEU metric can be used for machine translation, image captioning, text summarization and speech
recognition. In Python, the nltk library provides an implementation to calculate the BLEU score, as
sketched below. To conclude, BLEU measures the closeness of the machine translation to the human
reference translation, taking translation length, word choice and word order into consideration.
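A small illustration of computing a cumulative BLEU-4 score with NLTK is shown here; the
reference and candidate sentences are made-up examples, not outputs of the model.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["great", "tasting", "coffee"]]    # human-written summary (tokenised)
candidate = ["great", "coffee"]                  # summary produced by the model

# Cumulative BLEU-4 uses equal weights over 1- to 4-grams; smoothing avoids zero scores
# when a higher-order n-gram has no match in such short sentences.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=smooth)
print(round(score, 4))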
Chapter 5
RESULTS AND ANALYSIS
The results generated by the BLEU evaluation metric are shown here, in the images below.
Figure 9: Result obtained from the model
Figure 11: Result obtained from the model
During implementation, a 3-layer stacked LSTM encoder-decoder model with a global attention
mechanism was used. With this model, a cumulative BLEU-4 score of 0.8265 was achieved on the
training set (constituting 80% of the dataset), and a cumulative BLEU-4 score of 0.7730 was achieved
on the test set.
Chapter 6
CONCLUSION AND FUTURE ENHANCEMENT
6.1 Conclusion
In recent years, the volume of textual data has rapidly increased, creating a valuable resource for
extracting and analysing information. To retrieve useful knowledge within a reasonable time, this
information must be summarised. As a result, there have been many recent approaches to abstractive
text summarisation using deep learning models. Furthermore, the measures used to evaluate the
quality of summarisation were investigated, and Recall-Oriented Understudy for Gisting Evaluation
(ROUGE-1, ROUGE-2 and ROUGE-L) and BLEU were found to be the most commonly applied
metrics. In one of the papers referred to during the literature survey, some of the challenges
encountered during the summarisation process and the solutions proposed in each approach were
analysed. The analysis of the several approaches shows that recurrent neural networks with an
attention mechanism and long short-term memory (LSTM) are the most prevalent techniques for
abstractive text summarisation. Most deep learning approaches use the original LSTM model;
however, in this project a 3-layer stacked LSTM is implemented. The original LSTM model consists
of a single hidden LSTM layer followed by a standard feedforward output layer. The stacked LSTM
is an extension of this model that has multiple hidden LSTM layers, where each layer contains
multiple memory cells.
6.2 Limitations
1. The project presents a good approach to English-to-English text summarization using encoding
and decoding with LSTMs. However, it is limited to the English language as of now.
2. Another important limitation is that the model provides a correct summary only for short- to
medium-length texts.
6.3 Future Enhancements
1. Increasing the training dataset size and rebuilding the model. The generalization capability of a
deep learning model improves with an increase in the training dataset size.
2. Implementing a bi-directional LSTM, which is capable of capturing the context from both
directions and results in a better context vector.
3. Using the beam search strategy for decoding the test sequence instead of the greedy approach
(argmax), as illustrated in the sketch after this list.
4. Evaluating the performance of the model with another metric such as ROUGE, along with the
BLEU score.
5. Implementing pointer-generator networks and coverage mechanisms.
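As a rough illustration of point 3, the difference between greedy (argmax) decoding and beam search
can be sketched as follows. The decode_step(state, token) function is a hypothetical helper assumed
to return a probability distribution over the next word together with the updated decoder state; this is
not code from the project.

import numpy as np

def greedy_decode(decode_step, state, start_token, end_token, max_len=10):
    # At every step, keep only the single most probable word
    tokens = [start_token]
    for _ in range(max_len):
        probs, state = decode_step(state, tokens[-1])
        next_token = int(np.argmax(probs))
        if next_token == end_token:
            break
        tokens.append(next_token)
    return tokens[1:]

def beam_search_decode(decode_step, state, start_token, end_token, beam_width=3, max_len=10):
    # Keep the beam_width most probable partial summaries instead of just one
    beams = [([start_token], state, 0.0)]            # (tokens, decoder state, log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, st, logp in beams:
            if tokens[-1] == end_token:
                candidates.append((tokens, st, logp))
                continue
            probs, new_st = decode_step(st, tokens[-1])
            for tok in np.argsort(probs)[-beam_width:]:
                candidates.append((tokens + [int(tok)], new_st,
                                   logp + float(np.log(probs[tok] + 1e-12))))
        beams = sorted(candidates, key=lambda b: b[2], reverse=True)[:beam_width]
    return beams[0][0][1:]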
6.4 Summary
The increasing growth of the Internet has made a huge amount of information available, and it is
difficult for humans to summarize such large amounts of text. Thus, there is an immense need for
automatic summarization tools in this age of information overload. The International Data
Corporation (IDC) projects that the total amount of digital data circulating annually around the world
will grow from 4.4 zettabytes in 2013 to 180 zettabytes in 2025. That is a huge amount of data
circulating in the digital world, and there is a dire need for algorithms which can automatically
shorten it while producing accurate summaries that capture the essence of the intended messages.
Furthermore, applying text summarization reduces reading time and accelerates the process of
researching information, playing a major role in the current era of rapid development and
digitalisation. Humans are generally quite good at this task, as we have the capacity to understand the
meaning of a text document and extract salient features to summarize it in our own words. However,
automatic methods for text summarization are crucial in today's world, where there is an
overabundance of data and a lack of manpower as well as time to interpret it.
References
[1] D. Kosmajac and V. Kešelj, "Automatic Text Summarization of News Articles in Serbian
Language," 2019 18th International Symposium INFOTEH-JAHORINA (INFOTEH), 2019, pp. 1-6,
doi: 10.1109/INFOTEH.2019.8717655.
[2] Parida, Shantipriya & Motlicek, Petr. (2019). Abstract Text Summarization: A Low Resource
Challenge. 10.18653/v1/D19-1616.
[3] Kasimahanthi Divya, Kambala Sneha, Baisetti Sowmya, G Sankara Rao (2020). Text
Summarization using Deep Learning.
[4] Moratanch, N. & Gopalan, Chitrakala. (2016). A survey on abstractive text summarization. 1-7.
10.1109/ICCPCT.2016.7530193.
[5] H. T. Le and T. M. Le, "An approach to abstractive text summarization," 2017 International
Conference on Soft Computing and Pattern Recognition (SoCPaR), 2017
[6] Dave, Harsha & Jaswal, Shree. (2016). Multiple Text Document Summarization System using
Hybrid Summarization Technique. 10.1109/NGCT.2015.7375231.
[7] R. Nallapati, B. Zhou, C. dos Santos, Ç. Gülçehre and B. Xiang, "Abstractive Text Summarization
using Sequence-to-Sequence RNNs and Beyond," Proceedings of the 20th SIGNLL Conference on
Computational Natural Language Learning (CoNLL), 2016.
[8] Zhou, Qingyu & Yang, Nan & Wei, Furu & Zhou, Ming. (2017). Selective Encoding for
Abstractive Sentence Summarization.
[9] K. Ganesan, C. Zhai and J. Han, "Opinosis: A Graph Based Approach to Abstractive
Summarization of Highly Redundant Opinions," Proceedings of the 23rd International Conference
on Computational Linguistics (COLING), 2010.
[10] Pankaj Gupta, Ritu Tiwari and Nirmal Robert, "Sentiment Analysis and Text Summarization of
Online Reviews: A Survey," International Conference on Communication and Signal Processing,
2016.
Appendix A: Screenshots of the summary
Some of the screenshots of the summaries of the reviews generated by the model are attached below.
Fig3. Summary of the reviews generated by the model
Fig5. Summary of the reviews generated by the model
Fig7. Summary of the reviews generated by the model
Appendix B: Printout of the base paper used for implementation of this project
Abstract— Text summarization is one of the famous problems in natural language processing and
deep learning in recent years. Generally, text summarization contains a short note on a large text
document. Our main purpose is to create a short, fluent and understandable abstractive summary of a
text document. For making a good summarizer we have used amazon fine food reviews dataset,
which is available on Kaggle. We have used reviews text descriptions as our input data, and
generated a simple summary of that review descriptions as our output. To assist produce some
extensive summary, we have used a bi-directional RNN with LSTM’s in encoding layer and attention
model in decoding layer. And we applied the sequence to sequence model to generate a short
summary of food descriptions. There are some challenges when we working with abstractive text
summarizer such as text processing, vocabulary counting, missing word counting, word embedding,
the efficiency of the model or reduce value of loss and response machine fluent summary. In this
paper, the main goal was increased the efficiency and reduce train loss of sequence to sequence
model for making a better abstractive text summarizer. In our experiment, we’ve successfully
reduced the training loss with a value of 0.036 and our abstractive text summarizer able to create a
short summary of English to English text.

Keywords— Text Processing, Word-Embedding, Missing Word Counting, Vocabulary Counting,
Deep Learning, Bi-directional RNN, Encoding, Decoding.

I. INTRODUCTION

Human has a receptivity to assume the interpretation of a text document. They can select noticeable
features to short summarize the text documents using their own words. Nevertheless, automated text
summarization is the ultimate way nowadays worlds for efficiency, effectiveness, less preferential
and minimize time.

There are two ways to summarize text, one is extractive and another is abstractive. In an extractive
method, dragging the main sentences or words and extract some necessary parts from the source
document, and combines them to make a summary. Abstractive text summarization is a method
which made up of some core words or sentences to generate a summary. Those words or sentences
may be present or not in the text document. That means the abstract method can create a summary
from itself which can reduce grammatical inconsistency of a text document.

In this paper, our proposed method is to create an automatic text summarizer which follows the
abstractive method and able to respond to some short length text summary. As the dataset, we have
used amazon fine food reviews dataset. These datasets contain a large number of fine food reviews
data. But for our working purpose, we have used a few data for the train. After successfully training
dataset for almost eight hours, we got a satisfactory result.

Working with text is very difficult every time. To produce a proper abstractive text summarizer,
there is some step to follow, i.e. better text processing or pre-processing, vocabulary counting,
missing word counting, word embedding counting, use the special token for the encoder and decoder,
fixed the length of input and output text. In our proposed method we work with all of those steps. We
have used the sequence to sequence model with a two-layered bidirectional RNN's. On the input text
and two layers RNN's, each with an LSTM using Bahdanau attention [1] on the target text to produce
an extensive summary. An encoder encodes source sentence into a fixed-length vector from which a
decoder generates an output sequence. The proposed model was originally used to resolve relevant
text such as machine translation. We have modified them to food reviews summarization.

We have discussed various necessary factors concerning in text summarization on the basis of
improving their efficiency to create a fluent summary. Major processes are mentioned in details and
more importantly discussed deep learning methods and models were explained here.
In the sequence to sequence model, x is the input sequence of the encoder and y is the generated
output or response sequence. The reversed sequence order is $(x_{T_x} \; \text{to} \; x_1)$ and the
backward hidden states are $(\overleftarrow{h}_1, \ldots, \overleftarrow{h}_{T_x})$.

    $h_j = [\overrightarrow{h}_j^{\top} ; \overleftarrow{h}_j^{\top}]^{\top}$    (5)

where $h_j$ is a summary of the preceding and following words. Here, $a_{ij}$ is the softmax of
$e_{ij}$, a normalized exponential function that shows how the input at position j aligns with the
output at position i, where

    $e_{ij} = a(s_{i-1}, h_j)$    (6)

IV. EXPERIMENT AND OUTPUT

We have used the tensorflow 1.12.0 sequence to sequence model. When training stops, we are able to
create the machine's own summary. To make a summary we take an input sentence from the dataset
and define the summary length randomly. For the parameters, we have used an attention-based
encoder. We take epoch = 100, batch size = 64, RNN size = 256, learning rate = 0.005, keep
probability = 0.75, and we used the Adam optimizer, which calculates the learning rate of each
parameter. For faster convergence a vanilla gradient descent optimizer was used. After data
pre-processing, 20% of the whole dataset was kept for testing and the other 80% was used for
training. In total 20k samples were used: 16,000 for training and another 4,000 for testing.

Role       Sample number    Percentage
Training   16,000           80%
Test       4,000            20%
provides a maximum accurate summary. Few reviews, the model cannot predict accurately but for
other reviews, it provides great performance and fluent summary. After all, we are successfully able
to create an understandable, fluent and short summary with reducing the training loss.
We have identified some limitation in our experiment. A most important limitation is machine
provide a correct summary only short text. The maximum output of long text provides incorrectly.
Another important limitation is needed to fixed the text and summary length. Long time and strong
hardware configuration need to train the dataset.
In future, our main purpose is generating a model which
learn any length and generate correct summary which has no
fixed length. This model was used for English text but there
is not any good text summarizer in our Bengali language.
Some Resources are available for Bengali text. But these
resources are not sufficient for making a better abstractive
text summarizer. Such as, there is not sufficient word to
vector for the Bengali language, even no lemmatizer is
available for the Bengali language. In future, we will try to
solve all of these problems for making a better text
summarizer for our mother language. And we will try to
increase Bengali resources for Machine Learning, Deep
Learning and Natural Language Processing Researcher.
ACKNOWLEDGMENT
We would like to thanks our DIU-NLP and Machine
Learning Research Lab for providing all research facility.
Special thanks, of our supervisor and course instructor for
their patience and support in overcoming numerous obstacles
we have been facing through our research work.
REFERENCES
[1] Dzmitry Bahdanau, K.Cho, Y.Bengio. “Neural Machine
Translation by Jointly Learning to Align and Translate”.
International Conference on Learning Representation
(ICLR), 19 May 2014.
[2] Ramesh Nallapati, Bowen Zhou, et al “Abstractive Text
Summarization Using Sequence-to-Sequence RNNs and
Beyond”.The SIGNLL Conference on Computational
Natural Language Learning (CoNLL), 26 Aug 2016.
[3] K.Cho, B .van Merrienboer, D.Bahdanau, Y.Bengio “
On the Properties of Neural Machine translation: Encoder
Decoder Approaches”. Workshop on Syntax, Semantics and
Structure in Statistical Translation (SSST-8),7 Oct 2014.
[4] Sutskever et al “Sequence to Sequence Learning with
Neural Networks”. Conference on Neural Information
Processing Systems (NIPS,2014).
[5] M. Luong, H. Pham, Christopher D. Manning “Effective
Approaches to Attention-based Neural Machine
Translation”. Conference on Empirical Methods in Natural
Language Processing (EMNLP 2015).
[6] Peter J. Liu et al. “Generating Wikipedia by Summarizing
Long Sequences”. International Conference on Learning
Representation (ICLR), 2018.
[7] Minh-Thang Luong, Ilya Sutskever et al Addressing the
Rare Word Problem in Neural Machine Translation”.
Association for Computational Linguistics (ACL, 2015).
[8] Kalchbrenner et al. “Recurrent continuous translation models”. Empirical Methods in Natural
Language Processing (EMNLP), 2013.
[9] Rico Sennrich et al. “Neural Machine Translation of Rare Words with Subword Units”.
Association for Computational Linguistics (ACL), 2016.
[10] Lifeng Shang et al. “Neural Responding Machine for Short-Text Conversation”. Association for
Computational Linguistics (ACL), 2015.
[11] Cho, Kyunghyun et al. (2014) “Learning Phrase Representations using RNN Encoder Decoder”.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP).