Generation_of_Brand_Product_Reputation_using_Twitter_Data

The document discusses a method for generating brand reputation using sentiment analysis of Twitter data, focusing on categorizing sentiments into positive, negative, or neutral. It highlights the importance of social media analytics in understanding customer opinions and improving business strategies. The proposed system utilizes the Hadoop Map-Reduce framework for processing large datasets and employs the Maximum Entropy Algorithm for sentiment classification.

2018 International Conference on Information, Communication, Engineering and Technology (ICICET)

Zeal College of Engineering and Research, Narhe, Pune, India. Aug 29-31, 2018

Generation of Brand/Product Reputation using Twitter Data
Prof. Pranalini A. Joshi (1)
Department of Information Technology
Zeal College of Engineering and Research
Pune, India
pranalini.ketkar@gmail.com

Garry Simon (2)
Department of Information Technology
Zeal College of Engineering and Research
Pune, India
garysimon7777@gmail.com

Prof. Yogesh P. Murumkar (3)
Department of Information Technology
Zeal College of Engineering and Research
Pune, India
yogeshpm5555@gmail.com

Abstract: Sentiment Analysis is a variant of Opinion Mining. It deals with going through volumes of existing data collected from social networking websites such as Twitter and processing that data in order to derive conclusions from it. It goes a step further: it not only gathers and analyses the data, but also categorizes it into three categories, namely positive, negative and sometimes neutral. The data from Twitter is collected and analyzed on the fly to extract public sentiment about a particular brand. This feature of Sentiment Analysis can be used to recognize the market value of a business brand among its users. After comprehending the overall value of the brand in the eyes of its consumers, the brand owners can determine how their product is performing in the market in order to take corrective action, if the need arises, to improve their product and strategically take over the market. Thus, this paper proposes a smart method to campaign for a business brand, whereby the business owner determines their position in the market, and how well (or badly) the business is doing, by mining data and deriving inferences from it. This gives them the capability to make insightful and well-informed decisions, thereby providing a cost-effective and highly efficient method to review a business. It thus gives business owners the ability to add value to their business and acquire a competitive edge.

Keywords: Sentiment Analysis, Opinion Mining, Social Brand Monitoring, Social Media Analytics, Business Analytics

I. INTRODUCTION

Business Analytics has been booming for several decades. Many organizations have realized its importance and have invested significant amounts in this global phenomenon. This has enabled organizations to take cognizance of the current market scenario and strategically steer their businesses to success, reaping exponential profits and unprecedented growth. Social Media Analytics is a branch of Business Analytics (BA) and has grown into a profound and widely used technical strategy in business. Social Media Analytics can be concisely defined as the capability to analyze and break down huge volumes of data, both semi-structured and unstructured, from Social Media. Social Media is the "new big thing" that has happened to the world, and not without good reason. It is a revolution in itself, which has given organizations an alternate and unique medium of communication, where they have access to huge amounts of useful data. Since the advent of Web 2.0, the Internet has been redefined in every way and nothing has been the same since; its capabilities have rapidly multiplied and its reach has substantially grown. Social Media platforms form an integral component of the World Wide Web revolution. Social Media has provided customers a new and incomparable channel to interact with organizations and businesses, and an unprecedented opportunity to offer their opinions, suggestions and remarks on the products and services being offered. Social Media possesses the unparalleled ability to influence the perspective of customers and their interest and inclination in purchasing products or services. Thus, with the launch of Social Media, customers are equipped with the ability to give their opinions about any topic under the sun, and this ability extends to discussions, public polls, debates etc. on a public platform. Online Social Networks, along with micro-blogging websites, have therefore become the top priority for users to express their thoughts on a particular product, event or activity, and that too in real time. Sentiment Analysis is used to derive inferences from diverse texts. This appealing property of Sentiment Analysis can be used to extract reviews, to conduct election polls and to determine answers to trending questions. By studying and interpreting user behavior on online social networks, businesses determine how customers take their products and services, and also figure out ways and means to better their brand reputation and exponentially increase their electronic commerce.

II. LITERATURE SURVEY

The following are among the many challenges in the domain of Sentiment Analysis which need to be dealt with and resolved:

i) "Hidden Sentiment Identification" is to analyze and comprehend the actual emotion in the data rather than simply classifying it into one of the three polarities, i.e. positive, negative or neutral.

ii) "Handling Polysemy" deals with words that have more than one meaning, leading to multiple sentiment polarities.

iii) "Mapping Slang" is to identify the slang in the data, determine its associated meanings and conclude its polarity.

Generally, the practice has been that, in order to figure out the reputation of any business, tools or services are provided by various agencies, wherein several sentiment analysis algorithms are implemented to determine

978-1-5386-5510-8/18/$31.00 ©2018 IEEE


Authorized licensed use limited to: Consortium - Algeria (CERIST). Downloaded on March 15,2024 at 23:07:09 UTC from IEEE Xplore. Restrictions apply.
the sentiment in a sentence or extract the opinion from the text. The algorithms used to determine the polarity of the text in question include those based on lexical resources. Other popular approaches are based on Machine Learning, where algorithms such as Support Vector Machines or Naive Bayes classifiers are utilized. Along with extracting the sentiment in the text, the other advantage of Sentiment Analysis is to evaluate and determine the influence of users on social networking portals or microblogging sites. Various Social Media Monitoring tools and Social Media Services are available which evaluate how visible a particular brand is on the social networks. Brand Watch and Sysomos are a few prominent examples, which are used for business marketing and to understand how customers really feel about a brand.

III. METHODOLOGY

Hadoop Map-Reduce Framework

Hadoop is an open source software project written in Java. It is used to optimize the usage of massive volumes of data. It is essentially a software framework for the distributed processing of large datasets across large clusters of commodity servers. Hadoop is based on a simple programming model called the MapReduce model. It provides reliability through replication.

A. Hadoop Ecosystem

In the Hadoop Ecosystem, there are two components:

i) HDFS (Hadoop Distributed File System) for storage.

ii) MapReduce for processing.

Hadoop Distributed File System

It is one of the primary components of a Hadoop cluster and it is designed as a Master-Slave architecture.

Fig. 1. Hadoop Master/Slave Architecture

Operations such as opening, closing and renaming files and directories are managed by the Master (NameNode), along with the mapping of blocks to DataNodes. It also regulates access to files by clients. Slaves (DataNodes) are responsible for serving read and write requests from the client, along with block creation, deletion and replication upon instructions from the Master (NameNode).

B. Hadoop Map Reduce Framework

Fig. 2. HDFS Architecture

When a client makes a request to a Hadoop cluster, this request is managed by the JobTracker. The JobTracker, working with the NameNode, distributes work as closely as possible to the data on which it will work. The NameNode is the master of the file system, providing the metadata services for data distribution and replication. The JobTracker schedules map and reduce tasks into available slots at one or more TaskTrackers. The Map and Reduce operations are performed on the DataNodes, which are slaves to the NameNode. When the map and reduce tasks are completed, the TaskTracker notifies the JobTracker, which identifies which tasks are complete and eventually notifies the client after the conclusion of the job.

IV. PROPOSED SYSTEM

This system has the capacity to gauge the feelings of customers about a product and hence lets organizations understand their position in the market. By analyzing the content produced by users, organizations can obtain an effective idea of what users think of their products; as a result, they can effectively manage their reputation in the market and take corrective action before the user gets to respond on a particular product, with the help of ad-hoc marketing campaigns and digital marketing, in order to assess the
sentiment of their customers. More importantly, the data available on Social Media platforms is free of cost, so there is no financial burden, and this freely available data can be used to create prediction models in order to accurately predict the sentiment. Hence, the objective of the system is to obtain the recent tweets in the required time frame and to evaluate them in order to get the sentiments of the users from the text after it has been analyzed, so that, on the collection and collation of these tweets, the overall image of the business can be generated.

V. SYSTEM DESIGN

Fig. 3. Process of Sentiment Analysis: The Flow

Tweet data is accumulated using the streaming API through a library known as Twitter4j, which provides tweet data for the particular topic.

The Twitter4j API gives us the ability to crawl the web, in this case Twitter. Access to this API can be obtained by possessing a Twitter account and being registered as a developer.

The collected Twitter data is analyzed by gathering the adjectives in the tweet and categorizing the data into positive, negative or neutral. The analysis of the data is executed in parallel using Scala on Apache Spark and its RDDs (Resilient Distributed Datasets). Data is prepared using the following set of procedures:

i) Stop Word Removal: Stop words are words that don't carry any sentiment, and hence are dead weight. Thus, it is mandatory to get rid of them in order to optimize the process.

ii) Tokenization: This is used so that the tokens can be singled out and identified, i.e. the given text is broken down into its individual components so that it is pre-processed for tagging the different parts of speech.

iii) POS (Part of Speech) tagging: Parts of speech such as nouns, adjectives, verbs and more are identified in this phase. The objective of Part of Speech tagging is to separate out the adjectives from a phrase so that the underlying latent emotion can be identified with ease. The emphasis is on breaking up the sentence and isolating the adjectives in it.

Scala is used to stream the data from Twitter using the Twitter4j API, and the data is acquired and stored in JSON (JavaScript Object Notation) format, a lightweight format used for data interchange.

Once the data has been prepared, groomed and refined, the next and most vital stage is to extract and identify the sentiment hidden in the text, which is achieved through the Maximum Entropy Algorithm. This enables us not only to determine the polarity of the sentence but also to comprehend the influence of the Twitter user who wrote it. Ordinarily, the approach used to gauge the influence of a particular user is to look at their followers, their mentions on Twitter and the reactions to their tweets. The pre-classified data for training the model is provided by a dictionary known as SentiWordNet. The Maximum Entropy Algorithm uses entropy as a criterion to classify the text into the classes Positive, Negative and Neutral with the help of the training data provided. The Maximum Entropy Algorithm is a probabilistic model that excels in the classification of text. It also takes relatively less time to train when compared to other algorithms. Moreover, Laplacian Smoothing is used to deal with words that have not been encountered in the training data. Another noteworthy aspect of this system is that the Maximum Entropy Algorithm is used in combination with Part of Speech tagging so as to achieve and maintain the best possible accuracy. Also, negation handling techniques are employed to take care of "not" in sentences, so that the meaning of the sentence is not altered.

A. Emoticons

These are entities used in sentences in order to convey a feeling or an emotion in a given text. They are most widely used and found in written communication. Over the last decades, they have dominated the social networking sites. Some examples of emoticons are :-) for a positive feeling/emotion and :-( for a negative feeling/emotion. Our application makes use of them in order to classify the post into different classes of polarity.

B. System Architecture

The entire application consists of three distinct functional tiers.

i) Presentation Layer: This is what the end-user sees and where the input is collected and the output is displayed. This is the layer established for the purpose of interaction with the end-user. Input is taken from the user in the form of keywords to be searched for, or the name of the brand/product, along with the start date and the end date of the search in the data streamed from Twitter.

ii) Application Layer: This layer is used for executing all the logical operations. This layer is created using the Scala language. It accomplishes its task of Sentiment Analysis by seeking adjectives in the given tweets and classifying them as Positive, Negative or Neutral.

iii) Database Layer: This is the layer used for storage. Data from Twitter is streamed into HDFS using the Twitter4j API. Using this interface, all the content on Twitter regarding a particular feature can be pulled from its database and stored in this layer.
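
The data-preparation steps described in Section V (stop word removal, tokenization and POS tagging to isolate adjectives) can be illustrated with a minimal Python sketch. This is not the authors' Scala implementation: the stop-word set and the adjective lexicon below are tiny hypothetical stand-ins for a full stop-word list and a trained part-of-speech tagger, and all function names are assumptions made for illustration. The sketch tokenizes first and filters stop words second, which is simply the most convenient order in code.

```python
import re

# Hypothetical, deliberately tiny resources. A real system would use a full
# stop-word list and a trained part-of-speech tagger instead.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "this", "it", "but"}
ADJECTIVES = {"great", "good", "bad", "slow", "awful", "amazing"}

TOKEN_PATTERN = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the tweet and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_stop_words(tokens):
    """Drop tokens that carry no sentiment."""
    return [t for t in tokens if t not in STOP_WORDS]


def tag_parts_of_speech(tokens):
    """Stand-in for POS tagging: lexicon words are ADJ, everything else OTHER."""
    return [(t, "ADJ" if t in ADJECTIVES else "OTHER") for t in tokens]


def extract_adjectives(text):
    """Run the three preparation steps and keep only the adjectives."""
    tagged = tag_parts_of_speech(remove_stop_words(tokenize(text)))
    return [word for word, tag in tagged if tag == "ADJ"]


print(extract_adjectives("The camera is great but the battery is awful"))
# prints ['great', 'awful']
```

The adjectives returned here are the units that the later classification stage would assign a polarity to.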

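
The classification stage in Section V relies on the Maximum Entropy Algorithm together with negation handling. As a rough illustration, a maximum entropy classifier over bag-of-words features is equivalent to multinomial logistic regression, which the sketch below trains by gradient ascent on the conditional log-likelihood. Everything here is an assumption made for illustration: the toy training sentences are hand-labelled (the paper uses SentiWordNet), the negation rule only marks the single token after "not", and unseen words are simply ignored rather than smoothed.

```python
import math
from collections import defaultdict

LABELS = ["positive", "negative", "neutral"]

# Hypothetical hand-labelled examples, used only to make the sketch runnable.
TRAIN = [
    ("good phone", "positive"),
    ("great camera", "positive"),
    ("bad battery", "negative"),
    ("awful screen", "negative"),
    ("not good phone", "negative"),
    ("not bad battery", "positive"),
    ("phone arrived today", "neutral"),
    ("battery is included", "neutral"),
]


def features(text):
    """Bag-of-words features; the token right after "not" becomes NOT_<token>."""
    feats, negate = [], False
    for tok in text.lower().split():
        if tok == "not":
            negate = True
            continue
        feats.append("NOT_" + tok if negate else tok)
        negate = False
    return feats


def predict_proba(weights, feats):
    """Softmax over the per-label sums of feature weights."""
    scores = [sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, epochs=200, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, gold in samples:
            feats = features(text)
            probs = predict_proba(weights, feats)
            for y, p in zip(LABELS, probs):
                step = rate * ((1.0 if y == gold else 0.0) - p)
                for f in feats:
                    weights[(f, y)] += step
    return weights


def classify(weights, text):
    """Return the most probable label for the text."""
    probs = predict_proba(weights, features(text))
    return LABELS[probs.index(max(probs))]


weights = train(TRAIN)
print(classify(weights, "not good phone"))
```

Marking negated tokens as separate features lets "good" and "NOT_good" receive opposite weights, which is one simple way to keep "not" from inverting the meaning unnoticed.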
Fig. 4. System Architecture

Finally, the result is displayed in a graphical format such as a pie chart, donut or half-donut. Then, the overall sentiment is derived and summarized into one of the following emotions: i) Joy ii) Disappointment iii) Furious iv) Thrilled. The polarity of every tweet is categorized into the following set of values: i) 0 ii) 1 iii) -0.5 iv) 0.5. Since the system is analyzing real-time data, the data is collected and analyzed on the fly; thus, this application is successful in providing Sentiment Analysis over any topic in real time, which characterizes it as a real-time application.

VI. RESULT

A keyword, in the form of a string, is accepted from the user. The user can type any string relevant to a brand into the text box provided next to the Search button.

A pre-decided number of tweets that are found to be relevant to the string keyed in are drawn from the Twitter database and then analyzed to conclude the holistic sentiment regarding the keyword. The results are then visualized graphically, using representations such as pie charts or donut charts, and tables. The sentiment is shown graphically and the polarity is displayed in the tables for every tweet collected.

VII. CONCLUSION

Sentiment Analysis is the need of the hour for all businesses, not only to determine their market value in the eyes of the customer, but also to give them a competitive advantage by offering deep insights into the market scenario. Our application shows that conclusions can be derived from data that is collected in real time and also scrutinized in real time, thereby providing results on the fly.
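
The summarization of per-tweet results into chart data can be pictured in the map/reduce style described in Section III. The sketch below is a single-process Python illustration of the programming model only; it does not use Hadoop, and the function names and sample tweets are hypothetical. The map phase emits one (label, 1) pair per classified tweet, a shuffle step groups the pairs by label, and the reduce phase sums each group; the counts are then turned into percentage shares of the kind a pie or donut chart would display.

```python
from collections import defaultdict


def map_phase(classified_tweets):
    """Map: emit one (label, 1) pair per classified tweet."""
    for _tweet, label in classified_tweets:
        yield label, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each label."""
    return {key: sum(values) for key, values in grouped.items()}


def shares(counts):
    """Convert label counts to percentage shares for a pie or donut chart."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical, already-classified tweets for a searched keyword.
tweets = [
    ("love it", "positive"),
    ("hate it", "negative"),
    ("love the screen", "positive"),
    ("it shipped", "neutral"),
]

counts = reduce_phase(shuffle(map_phase(tweets)))
print(counts)          # prints {'positive': 2, 'negative': 1, 'neutral': 1}
print(shares(counts))  # prints {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```

On a real cluster the map and reduce functions would run as separate tasks on different nodes, with the framework performing the grouping step between them.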
