Mini Project Report Final

The document presents a project report for a 'Multilingual Text Summarizer' developed by Hrishikesh Thakare, Sanchit Atre, and Neeraj Sharma as part of their Bachelor of Technology in Information Technology. The project aims to create a web application that summarizes English text and translates it into Indian languages like Hindi and Marathi, utilizing Natural Language Processing techniques. The report includes details on the system's design, methodology, implementation, and potential future enhancements.

MULTILINGUAL TEXT SUMMARIZER

Submitted in partial fulfillment of the requirements

of the degree of

Bachelor of Technology in Information Technology


By

Hrishikesh Thakare (Roll No. 23101C0025)

Sanchit Atre (Roll No. 23101C0026)

Neeraj Sharma (Roll No. 23101C0027)

Under the Guidance of

Dr./Prof. Rasika Ransing

Department of Information Technology

Autonomous Institute affiliated to University of Mumbai


Vidyalankar Institute of Technology
Wadala(E), Mumbai-400437

University of Mumbai

2024-25

CERTIFICATE OF APPROVAL

This is to certify that the project entitled

“Multilingual Text Summarizer”

is a bonafide work of

Hrishikesh Thakare (Roll No. 23101C0025)

Sanchit Atre (Roll No. 23101C0026)

Neeraj Sharma (Roll No. 23101C0027)

submitted to the University of Mumbai in partial fulfillment of the requirement for


the award of the

degree of Bachelor of Technology in Information Technology.

___________________________

(Prof./Dr. Rasika Ransing)


Project Guide

Dr. Vidya Chitre Dr. Sangeeta Joshi


Head of Department Principal

PROJECT REPORT APPROVAL FOR S. E.
This project report entitled Multilingual Text Summarizer by
1. Hrishikesh Thakare (23101C0025)
2. Sanchit Atre (23101C0026)
3. Neeraj Sharma (23101C0027)

is approved for the degree of Bachelor of Technology in Information Technology.

1._______________________________

Name and Signature External Examiner

2._______________________________

Name and Signature Internal Examiner

Date:

Place:

DECLARATION

I declare that this written submission represents my ideas in my own words and
where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles of
academic honesty and integrity and have not misrepresented or fabricated or falsified
any idea/data/fact/source in my submission. I understand that any violation of the
above will be cause for disciplinary action by the Institute and can also evoke penal
action from the sources which have thus not been properly cited or from whom
proper permission has not been taken when needed.

Name of student Roll No. Signature


1. Hrishikesh Thakare 23101C0025
2. Sanchit Atre 23101C0026
3. Neeraj Sharma 23101C0027

Date:
Place: Mumbai

ACKNOWLEDGEMENT

Before presenting our second year mini project work entitled “Multilingual
Text Summarizer”, we would like to convey our sincere thanks to the people
who guided us throughout the course for this project work.

First, we would like to express our sincere thanks to our beloved Principal Dr.
Sangeeta Joshi for providing various facilities to carry out this project.

We would like to express our immense gratitude to our Project Guide,
Dr./Prof. Rasika Ransing, for constant encouragement, support, guidance,
and mentoring at every stage of the project and report.

We would like to express our sincere thanks to our H.O.D., Dr. Vidya Chitre,
for the encouragement, co-operation, and suggestions offered at each stage
of the report.

Finally, we would like to thank all the teaching and non-teaching staff of the
college, and our friends, for their moral support rendered during the course of
the reported work, and for their direct and indirect involvement in the
completion of our report work, which made our endeavour fruitful.

Date:
Place: Mumbai

ABSTRACT
The rapid growth of digital content has made it increasingly important to process and
consume textual data efficiently. Reading lengthy documents or articles can be time-
consuming, especially for non-native English speakers. This project aims to address
this challenge by building a web-based application capable of generating concise
summaries of English text and translating them into native Indian languages such as
Hindi and Marathi. This dual functionality significantly enhances accessibility and
comprehension for regional language users.

The system utilizes an extractive summarization approach, employing techniques such
as tokenization, TF-IDF vectorization, thematic relevance, and sentence scoring based
on both content relevance and positional importance. The top-scoring sentences are
selected and assembled into a concise summary. To enable multilingual support, the
system integrates the googletrans library for real-time translation of the generated
summary into the user's chosen language.

The frontend of the application is developed using React.js, offering a clean and
responsive user interface where users can input their text, choose the output language,
and control the number of sentences in the summary. It also provides real-time
statistics, such as character, word, and sentence counts, enhancing user engagement
and usability. The backend, built with Python’s Flask framework, handles the core
logic for summarization and translation. It communicates with the frontend via a REST
API and ensures smooth, asynchronous processing of input requests.

This project demonstrates a practical application of Natural Language Processing
(NLP) in a full-stack environment and has the potential to be extended for document
summarization, academic research assistance, or multilingual content creation. It
provides a strong foundation for future enhancements in the field of AI-driven text
simplification and translation.

LIST OF FIGURES
Fig 4.1 System Block Diagram (Page No. 4)
Fig 6.1 Frontend User Interface (Page No. 7)
Fig 6.2 Backend API Workflow (Page No. 9)
Fig 7.1 Example Hindi Translation Output (Page No. 11)

LIST OF TABLES
No tables are included as the project content did not require tabular representation.

LIST OF ABBREVIATIONS
NLP: Natural Language Processing

TF-IDF: Term Frequency-Inverse Document Frequency

API: Application Programming Interface

UI: User Interface

REST: Representational State Transfer

HTTP: Hypertext Transfer Protocol

CORS: Cross-Origin Resource Sharing

NLTK: Natural Language Toolkit

CONTENTS
Chapter No. TITLE Page No.
Abstract v
LIST OF FIGURES vi
LIST OF TABLES vii
LIST OF ABBREVIATIONS viii

1 INTRODUCTION 1
1.1 Problem Definition 1
1.2 Aim and Objective 1
1.3 Organization of the Report 1
2 REVIEW OF LITERATURE 2
2.1 Literature Survey 2
3 REQUIREMENT SPECIFICATION 3
3.1 Introduction 3
3.2 Hardware requirements 3
3.3 Software requirements 3
3.4 Feasibility Study 3
3.5 Cost Estimation 3
4 PROJECT ANALYSIS & DESIGN 4
4.1 System Block Diagram 4
4.2 Flow Diagram 4
5 METHODOLOGY 5
5.1 Extractive Summarization 5
5.2 Translation Logic 6
6 IMPLEMENTATION DETAILS 7
6.1 Frontend 7
6.2 Backend 8
6.3 Frontend-Backend Integration 10

7 RESULT ANALYSIS 11
7.1 Example Input and Output 11
7.2 Comparative Analysis 12
8 CONCLUSION & FUTURE SCOPE 13

REFERENCES 14
PLAGIARISM REPORT 14
Github Link 14

CHAPTER 1: INTRODUCTION
1.1 Problem Definition
Manually reading large texts can be time-consuming, especially when users are non-native English
speakers. A system that can summarize and translate content into native languages like Hindi and
Marathi is needed to improve comprehension.

1.2 Aim and Objective


• To develop an extractive text summarization system for English text.
• To implement a translation layer that converts summaries to native languages.
• To offer a simple and user-friendly web interface.

1.3 Organization of the Report


• Chapter 1 introduces the problem and objectives.
• Chapter 2 presents a literature review.
• Chapter 3 covers software and hardware requirements.
• Chapter 4 details the design.
• Chapter 5 explains the methodology.
• Chapter 6 includes implementation specifics.
• Chapter 7 analyses results.
• Chapter 8 concludes with future enhancements.

CHAPTER 2: REVIEW OF LITERATURE
2.1 Literature Survey
Existing libraries such as Sumy and gensim (before version 4.0), along with algorithms such as
TextRank, offer summarization but no built-in translation. Google Translate can translate full
text but does not summarize it. This project combines extractive summarization with translation
into native languages, improving accessibility and comprehension for regional-language readers.

CHAPTER 3: REQUIREMENT SPECIFICATION
3.1 Introduction
This chapter outlines the essential requirements for building and deploying the project. It specifies
the necessary hardware, software, and other resources needed to implement the system.
Additionally, it covers the feasibility study, assessing the technical, operational, and economic
aspects of the project, as well as providing a cost estimation. The aim is to ensure that all necessary
resources are identified and that the project is achievable within the provided constraints.

3.2 Hardware Requirements


• CPU: Intel i3 or above
• RAM: 4 GB minimum
• Storage: 2 GB available space for code, libraries, and browser cache
• GPU: Not required, as the application uses lightweight statistical methods suitable for CPU
execution

3.3 Software Requirements


• Frontend: React, JavaScript, CSS
• Backend: Python, Flask, NLTK, Scikit-learn, Googletrans
• Supporting libraries: NumPy, Flask-Cors

3.4 Feasibility Study


• Technical: Uses stable, open-source tools.
• Operational: Simple UI supports ease of use.
• Economic: No commercial tools required.

3.5 Cost Estimation


No direct costs were incurred. The entire project was built using free resources.

CHAPTER 4: PROJECT ANALYSIS & DESIGN
4.1 System Block Diagram

Fig. 4.1 System Block Diagram


4.2 Flow Diagram
1. User inputs text
2. React sends API request to Flask
3. Summarizer extracts key sentences
4. Summary is translated if needed
5. Output displayed in chosen language

CHAPTER 5: METHODOLOGY
5.1 Extractive Summarization
The summarization process uses an extractive approach based on statistical scoring of sentences.
The steps are as follows:
• Sentence Tokenization
o The input text is split into individual sentences using nltk.sent_tokenize().
o This step is crucial for sentence-level analysis and scoring.
• TF-IDF Vectorization
o Each sentence is converted into a numerical representation using TfidfVectorizer.
o The vectorizer uses word_tokenize() to break sentences into individual words,
considers unigrams, bigrams, and trigrams (ngram_range=(1,3)), and removes
English stop words.
o This quantifies the importance of each term in a sentence relative to the entire text.
• Cosine Similarity Scoring
o A mean vector representing the entire document is computed.
o Each sentence vector is compared to the document vector using cosine similarity.
o This helps identify sentences that best represent the entire document.
• Positional Weighting
o Sentences at the beginning and end of a text often introduce or conclude its main
points, so they tend to be more informative.
o A U-shaped positional scoring function boosts the scores of these sentences.
o This bias favors opening and closing sentences when the summary is assembled.
• Sentence Selection
o Each sentence receives a final score based on TF-IDF weight, similarity, and
position.
o The top N sentences (default: 3) are selected in their original order to maintain flow
and coherence.
o These sentences are concatenated to form the extractive summary.
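
The steps above can be sketched in a few dozen lines of Python. This is a minimal, standard-library-only illustration rather than the project's actual code: a regex splitter stands in for nltk.sent_tokenize(), a hand-written unigram TF-IDF stands in for TfidfVectorizer (no bigrams or trigrams), and the short stop-word list, the function names, and the 70/30 split between similarity and position (position_share) are all assumptions, since the report does not state how the scores are combined.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the list used by the project).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and",
              "it", "that", "this", "with", "by", "as", "for", "from"}


def split_sentences(text):
    """Naive regex splitter standing in for nltk.sent_tokenize()."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase unigram tokens with stop words removed."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def tfidf_vectors(sentences):
    """One sparse TF-IDF vector (dict) per sentence; each sentence is a 'document'."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vectors.append({term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
                        for term, count in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def position_weight(index, total):
    """U-shaped weight: 1.0 for the first and last sentence, 0.0 in the middle."""
    if total < 2:
        return 1.0
    x = index / (total - 1)  # relative position in [0, 1]
    return (2 * x - 1) ** 2


def summarize(text, num_sentences=3, position_share=0.3):
    """Return the top-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return " ".join(sentences)
    vectors = tfidf_vectors(sentences)
    # Mean vector representing the whole document.
    mean = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            mean[term] += weight / len(vectors)
    scores = [(1 - position_share) * cosine(vec, mean)
              + position_share * position_weight(i, len(sentences))
              for i, vec in enumerate(vectors)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))
```

Calling summarize(text, num_sentences=3) returns three sentences joined in document order. Raising position_share pushes the selection toward the opening and closing sentences, while setting it to 0 ranks purely by similarity to the document's mean vector.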

5.2 Translation Logic
Once the summary is generated, the translation module processes it if the target language is not
English:
• Translation Library
o The googletrans==4.0.0rc1 library is used to translate text from English to Hindi
(hi) or Marathi (mr).
o It is an unofficial client that calls the Google Translate web service at run time, so an
internet connection is required.
• Retry Mechanism
o Translation attempts can occasionally fail due to API limitations or network issues.
o A retry loop attempts up to 3 times to ensure successful translation before failing
gracefully.
• Language Flexibility
o While currently supporting Hindi and Marathi, the system can be extended to other
languages supported by Google Translate.
o The translated summary is returned to the frontend for user display.
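
A minimal sketch of the retry logic described above, assuming a three-attempt limit as stated in the report. The function name, the delay between attempts, and the choice to fall back to the English summary on failure are illustrative assumptions. The translation call is passed in as translate_fn so the sketch runs without network access.

```python
import time


def translate_with_retry(text, dest, translate_fn, max_attempts=3, delay_seconds=1.0):
    """Call translate_fn(text, dest) up to max_attempts times.

    Returns (translated_text, True) on success, or (text, False) if every
    attempt raised, so the caller can fall back to the English summary.
    """
    if dest == "en":
        return text, True  # the summary is already in English
    for attempt in range(1, max_attempts + 1):
        try:
            return translate_fn(text, dest), True
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    return text, False
```

With googletrans installed, translate_fn could plausibly be a small wrapper such as lambda text, dest: Translator().translate(text, src="en", dest=dest).text, with dest set to "hi" or "mr".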

CHAPTER 6: IMPLEMENTATION DETAILS
6.1 Frontend (React.js)
The frontend of the system is developed using React.js, focusing on usability and interactivity. The
application provides the following key features:
• Live Text Analysis: Dynamically updates and displays character count, word count, and
sentence count as the user types.
• Language Selection: Allows users to choose the output language—English, Hindi, or
Marathi—through a dropdown menu.
• Sentence Limit Control: Users can define the number of sentences they wish the summary
to contain, up to the maximum sentence count detected in the input.
• Summary Display and Copy Feature: After processing, the summary is displayed with
an option to copy it to the clipboard.
• Error Handling and Accessibility: Provides user-friendly error messages for network
issues, invalid inputs, and server delays. It also supports keyboard shortcuts and includes
accessibility attributes like aria-labels.
The frontend uses the native JavaScript fetch() API to send HTTP POST requests to the Flask
backend and handles the response asynchronously.
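
The live text statistics can be illustrated with a short counting routine. The project computes these values in the React frontend, and the report does not give its exact counting rules, so the sketch below is written in Python (to match the backend examples) and its word and sentence rules are assumptions.

```python
import re


def text_stats(text):
    """Return character, word, and sentence counts for the input text."""
    stripped = text.strip()
    words = stripped.split()  # whitespace-separated tokens
    # Treat runs of '.', '!' or '?' as sentence boundaries.
    sentences = [s for s in re.split(r"[.!?]+", stripped) if s.strip()]
    return {
        "characters": len(text),
        "words": len(words),
        "sentences": len(sentences),
    }
```

The sentence count from such a routine could also serve as the upper bound for the sentence limit control.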

Fig. 6.1 Frontend User Interface

6.2 Backend (Flask API)
The backend is developed using Python and the Flask micro-framework. It manages the core logic
of the application, including summarization and translation. Key features include:
• Flask-based REST API: Processes requests from the frontend and returns a translated
summary based on user input.
• Cross-Origin Resource Sharing (CORS): Enabled using flask-cors to allow
communication between the React frontend and the Flask backend.
• Input Handling: Accepts user input text, target language, and desired number of sentences
for summarization.
• Output Response: Returns a translated summary along with a status indicator.
• Robust Error Handling:
o Manages missing or invalid input gracefully.
o Handles translation failures with retry mechanisms.
o Includes timeout safeguards to prevent the backend from hanging or slowing down.
• Lightweight and Efficient: Optimized for fast response and low resource usage, suitable
for deployment on modest hardware.
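
The input handling and error handling described above might look like the following validation step. It is a sketch under stated assumptions: the field names (text, language, num_sentences), the defaults, and the error messages are hypothetical, as the report does not list the API's request schema. It is written as a plain function so it runs without Flask installed.

```python
SUPPORTED_LANGUAGES = {"en", "hi", "mr"}  # English, Hindi, Marathi


def validate_request(payload):
    """Validate a decoded JSON request body.

    Returns (clean_values, None) on success or (None, error_message) on failure.
    Field names and defaults are hypothetical.
    """
    if not isinstance(payload, dict):
        return None, "Request body must be a JSON object."
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return None, "Field 'text' is required and must be non-empty."
    language = payload.get("language", "en")
    if not isinstance(language, str) or language not in SUPPORTED_LANGUAGES:
        return None, f"Unsupported language: {language!r}."
    num_sentences = payload.get("num_sentences", 3)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_sentences, bool) or not isinstance(num_sentences, int) or num_sentences < 1:
        return None, "Field 'num_sentences' must be a positive integer."
    return {"text": text.strip(), "language": language, "num_sentences": num_sentences}, None
```

Inside a Flask route, the payload would typically come from request.get_json(silent=True), and a validation error would be returned with jsonify(...) and an HTTP 400 status.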

Fig. 6.2 Backend API Workflow

6.3 Frontend-Backend Integration
The integration between frontend and backend follows RESTful API principles and uses HTTP
protocol for communication:
• The React application sends data to the Flask server using fetch() with the Content-Type
set to application/json.
• The Flask server processes the input, returns the summary, and the React client displays it
in the UI.
• The system is developed and tested locally with the Flask server hosted at
http://127.0.0.1:5000 and the React app served via a local development server (e.g., Vite,
Create React App, etc.).
This modular architecture enables clean separation of concerns and makes the system scalable for
future feature additions such as file uploads.
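
For testing the backend without the React client, the same JSON POST request can be built from Python's standard library. This is only a sketch: the /summarize path and the field names are hypothetical, since the report does not name the endpoint or its schema.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/summarize"  # the path is a hypothetical example


def build_request(text, language="hi", num_sentences=3, url=API_URL):
    """Build (but do not send) the JSON POST request for the summarizer API."""
    body = json.dumps({
        "text": text,
        "language": language,
        "num_sentences": num_sentences,
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request, timeout=10):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send_request(build_request("Some long text ...", "mr", 2)) against a running local server would return the decoded response. The timeout mirrors the safeguard against slow responses mentioned for the backend.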

CHAPTER 7: RESULT ANALYSIS
7.1 Example Input and Output
Input Text:
Artificial Intelligence (AI) has transformed various industries, from healthcare to finance, by
automating tasks and improving efficiency. In healthcare, AI-powered algorithms assist doctors
in diagnosing diseases like cancer with greater accuracy. In finance, AI-driven systems help
detect fraudulent transactions, ensuring security in digital payments. Additionally, AI is widely
used in customer service through chatbots that provide instant support to users. Despite these
advantages, AI also raises ethical concerns, such as job displacement and privacy issues. Many
experts argue that while AI increases productivity, it should be regulated to prevent misuse.
Governments and organizations are now focusing on creating policies to ensure AI is used
responsibly. Furthermore, advancements in AI, such as natural language processing and machine
learning, continue to improve human-computer interaction. As AI evolves, it is crucial to balance
innovation with ethical considerations to create a future where technology benefits everyone.
Output Summary (in Hindi):
आर्टिफिशियल इंटेलिजेंस (एआई) ने विभिन्न उद्योगों को स्वास्थ्य सेवा से वित्त तक, कार्यों को स्वचालित
करके और दक्षता में सुधार करके बदल दिया है। वित्त में, एआई-संचालित सिस्टम डिजिटल भुगतान में
सुरक्षा सुनिश्चित करते हुए, धोखाधड़ी लेनदेन का पता लगाने में मदद करते हैं। इसके अलावा, प्राकृतिक
भाषा प्रसंस्करण और मशीन लर्निंग जैसे एआई में प्रगति, मानव-कंप्यूटर इंटरैक्शन में सुधार करना जारी है।

Fig. 7.1 Example Hindi Translation Output

7.2 Comparative Analysis
Original Length: 142 words
Summary Length: 61 words
Reduction: ~57%
Note: The summary was generated with the 3-sentence option of the mini project, which
allows users to choose the number of sentences in the output.
Translation Accuracy: Acceptable for informal use
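
The reduction figure follows directly from the two word counts. The helper below is a small illustrative function (its name is an assumption) that reproduces the arithmetic.

```python
def reduction_percent(original_words, summary_words):
    """Percentage by which the summary is shorter than the original."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    return (original_words - summary_words) / original_words * 100


# Word counts reported above: 142 in the input, 61 in the summary.
print(f"{reduction_percent(142, 61):.1f}%")  # prints 57.0%
```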

CHAPTER 8: CONCLUSION & FUTURE SCOPE
Conclusion
The Multilingual Text Summarizer successfully demonstrates the integration of extractive
summarization and real-time translation to address the challenge of processing lengthy English
texts for non-native speakers. Key achievements include:
1. Technical Implementation:
o A fully functional web application combining a responsive frontend with an
efficient backend API.
o Extractive summarization based on TF-IDF, cosine similarity, and positional
weighting, which reduced the example text in Chapter 7 by about 57%.
o Translation of summaries into Hindi and Marathi, with a retry mechanism for
failed requests.
2. User Benefits:
o Intuitive interface with real-time text analysis and user-friendly controls.
o Enhanced accessibility for regional language speakers through translations
that are acceptable for informal use.
The project successfully meets its core objectives while maintaining technical feasibility and cost-
effectiveness.

Future Scope
• Add abstractive summarization using deep learning
• Support other Indian languages
• Allow input from PDFs or Word documents
• Offline summarization via local models

REFERENCES
• Googletrans Documentation
• NLTK Toolkit
• Scikit-learn TF-IDF
• Mihalcea & Tarau (2004), TextRank
• React & Flask Official Docs

APPENDICES
• GitHub link: Project Link
• Plagiarism Report:
• No papers published as of now
