
Sentiment Analysis with NLP & Deep Learning

DEEP LEARNING | INTERMEDIATE | NLP | PYTHON

This article was published as a part of the Data Science Blogathon.


Introduction

The main idea of this article is to help you understand the concept of sentiment analysis with deep learning and NLP. Let's try to understand this with the help of a case. Anirudh has owned an e-commerce company, Universal, for the past year, and he was very happy as more and more new customers were coming to purchase through his platform. One day he learned that one of his friends was not satisfied with a product bought through the platform: a foldable geared cycle whose assembly parts were missing. The friend had seen a few negative reviews from other customers but purchased from Anirudh anyway because of their friendship. After listening to his friend, Anirudh decided to deploy a machine-learning algorithm to categorize user reviews and their sentiments so that his team could understand their customers better and provide products and services without any inconvenience.

Goal: To understand what users are saying about products and services, which can inform future decision-making.

Objectives:

1. Compile and tag past user review data.

2. Use NLP and deep learning to classify the reviews and determine their polarity/sentiment.

His team labelled the past user review data by reading each review and classifying it into one or more categories.

A 1 indicates the presence of a category. For example, the first review talks about Usability and its polarity/sentiment is negative because the user is complaining (indicated by Polarity = 0), whereas the second review talks about Features and Functionality and has a positive polarity/sentiment (indicated by Polarity = 1).
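
To make the labelling scheme concrete, below is a minimal sketch of what two labelled rows might look like. The review texts and the subset of category columns shown are invented for illustration; only the column names mirror the dataset used later in the article.

import pandas as pd

# Hypothetical labelled rows: a 1 flags the categories a review talks about,
# and Polarity 1 = positive sentiment, 0 = negative sentiment.
sample = pd.DataFrame({
    'Review': ['hard to use, keeps slipping out of my hand',
               'love the new auto-fold feature, works flawlessly'],
    'Usability': [1, 0],
    'Features': [0, 1],
    'Functionality': [0, 1],
    'Polarity': [0, 1],
})
print(sample)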
 

Anant is the data scientist at the company, and now it's his turn to step onto the playground. He is using a Jupyter Notebook for model building.

Getting Familiar with Data

Importing Core Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt                     # For plotting
import seaborn as sns                               # For plotting
from sklearn.metrics import log_loss                # For model evaluation
from sklearn.model_selection import RepeatedKFold   # For cross-validation

Reading data

data = pd.read_csv("/content/train.csv")
data.head()  # First 5 rows

Observation: By looking at the data, it is clear that there are 12 classes, which makes this a multiclass problem. In addition, a review can belong to more than one category; for example, Id 1 has two categories, Features and Functionality. Since the categories are not mutually exclusive, the problem is also multilabel. Combining both, we can say that it is a multiclass, multilabel challenge.
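
As a quick sanity check (a small sketch, not part of the original notebook, assuming the category columns listed in the dataframe), counting how many category flags each review carries confirms the multilabel nature of the data:

label_cols = ['Components', 'Delivery and Customer Support', 'Design and Aesthetics', 'Dimensions',
              'Features', 'Functionality', 'Installation', 'Material', 'Price', 'Quality', 'Usability']

# Distribution of the number of categories per review; counts for values >= 2 confirm multilabel
print(data[label_cols].sum(axis=1).value_counts())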

data.columns         # List of columns in the dataframe

data.isnull().sum()  # Count of missing values per column

data.info()          # Column data types and non-null counts

data.shape           # Number of rows and columns

Exploratory Data Analysis (EDA)

What % of users are talking negatively about the product/services?

sentiment_count = data.Polarity.value_counts()
sentiment_type = data.Polarity.value_counts().index

# 1 - Positive, 0 - Negative
plt.pie(sentiment_count, labels=sentiment_type, autopct='%1.1f%%', colors=['green', 'red'])
plt.show()

Observation: 22.3% of users are giving negative reviews. In simple terms, roughly 1 out of 5 users gives a negative review.
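
The exact share can also be read directly from the Polarity column (a one-line sketch, not part of the original notebook):

# Percentage of negative (0) vs positive (1) reviews
print(data['Polarity'].value_counts(normalize=True) * 100)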

Within each category, what % of the reviews are negative?

col = ['Components', 'Delivery and Customer Support', 'Design and Aesthetics', 'Dimensions', 'Features',
       'Functionality', 'Installation', 'Material', 'Price', 'Quality', 'Usability']

negative_category = []
for i in col:
    k = ((data[i].sum() - data[data[i] == 1]['Polarity'].sum()) / data[i].sum()) * 100
    negative_category.append(k)

fig = plt.figure(figsize=(30, 10))
plt.bar(col, negative_category, color='red')
plt.xlabel("Categories")
plt.ylabel("% of Negative Reviews")
plt.title("Negative Reviews Category Wise")
plt.show()


Observation: Across the categories, users give the highest percentage of negative reviews in the Components category, followed by Material and Delivery and Customer Support.

Text Preprocessing in NLP

Before we feed our data as input to a machine learning algorithm, it is important to prepare it in a way that reduces processing time, takes less memory, and yields the best evaluation metrics.

Lowercasing & Decontraction

Lowercasing removes capitalization so that the same word is treated identically. For example, "Look" and "look" would otherwise be considered different tokens because the first is capitalized. Decontraction expands contractions such as "can't" into "can not".

import re

def clean_text(text):
    text = text.lower()
    text = re.sub(r"what's", "what is ", text)
    text = re.sub(r"'s", " ", text)
    text = re.sub(r"'ve", " have ", text)
    text = re.sub(r"can't", "can not ", text)
    text = re.sub(r"n't", " not ", text)
    text = re.sub(r"i'm", "i am ", text)
    text = re.sub(r"'re", " are ", text)
    text = re.sub(r"'d", " would ", text)
    text = re.sub(r"'ll", " will ", text)
    text = text.strip(' ')
    return text

data['Review'] = data['Review'].map(lambda com: clean_text(com))

data['Review'][0]

Stop Words Removal

Stop words such as "I", "are", "my", and "me" are used for grammatical flow and for connecting sentences, but they convey little meaning on their own. If we get rid of stop words, we can reduce the size of our data without much information loss. The NLTK library is used here to remove stop words.
!pip install nltk
import nltk
nltk.download('stopwords')  # Download the stop word list (needed once)
from nltk.corpus import stopwords

stop_words = stopwords.words('english')
data['Review'] = data['Review'].apply(lambda x: ' '.join([word for word in x.split() if word not in stop_words]))
data['Review'][0]

y = np.array(data[['Components', 'Delivery and Customer Support', 'Design and Aesthetics', 'Dimensions',
                   'Features', 'Functionality', 'Installation', 'Material', 'Price', 'Quality', 'Usability',
                   'Polarity']])
X = data['Review']
X.head(5)
X.shape

Feature Matrix through TF-IDF

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(min_df=20, ngram_range=(1, 4), max_features=250)
vectorizer.fit(X)
X = vectorizer.transform(X)  # Convert the reviews into a TF-IDF feature matrix (numerical values)
X = X.todense()              # Dense matrix for the Keras model
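
To inspect which n-grams the vectorizer actually kept, something like the following sketch can be used; get_feature_names_out assumes a recent scikit-learn version (older releases expose get_feature_names instead):

# Inspect the learned vocabulary and the resulting matrix shape
print(len(vectorizer.get_feature_names_out()))  # At most 250 features, as configured
print(vectorizer.get_feature_names_out()[:20])  # A sample of the retained n-grams
print(X.shape)                                  # (number of reviews, number of features)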

Deep Learning in Picture

Deep learning attempts to mimic the human brain, and it often gives fruitful results when applied to a problem like this one. A network is generally considered "deep" when it has multiple hidden layers. Each unit that takes in, processes, or outputs data is called a neuron, just like in our brain, and based on their role these neurons are organized into input, hidden, and output layers. More hidden layers mean a more complex model!

!pip install tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.initializers import RandomNormal

def get_model(n_inputs, n_outputs):
    batch_size = 256
    hidden_units = 64
    dropout = 0.2
    model = Sequential()
    model.add(Dense(hidden_units, input_dim=n_inputs, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dropout(dropout))
    model.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dropout(dropout))
    model.add(Dense(n_outputs))
    model.add(Activation('sigmoid'))  # One sigmoid output per label (multilabel setup)
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
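
As a quick architecture check (a usage sketch, not part of the original notebook), the model can be instantiated with the input and output sizes taken from the feature matrix and label array built earlier, and then summarized:

# Build a throwaway instance just to inspect layer shapes and parameter counts
demo_model = get_model(n_inputs=X.shape[1], n_outputs=y.shape[1])
demo_model.summary()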

import tensorflow as tf

def evaluate_model(X, y):
    results_test = []
    results_train = []
    callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5, min_delta=0.05)
    n_inputs, n_outputs = X.shape[1], y.shape[1]
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
    for train_ix, test_ix in cv.split(X):
        X_train, X_test = X[train_ix], X[test_ix]
        y_train, y_test = y[train_ix], y[test_ix]
        model = get_model(n_inputs, n_outputs)
        model.fit(X_train, y_train, verbose=0, epochs=50, callbacks=[callback])
        yhat_train = model.predict(X_train)
        yhat_test = model.predict(X_test)
        train_log_loss = log_loss(y_train, yhat_train)
        test_log_loss = log_loss(y_test, yhat_test)
        results_train.append(train_log_loss)
        results_test.append(test_log_loss)
    return results_train, results_test, model

results_train, results_test, model = evaluate_model(X, y)
print(results_train)
print(results_test)

Training and Validation Score

print(sum(results_train) / len(results_train))  # Average training log loss across folds
print(sum(results_test) / len(results_test))    # Average cross-validation log loss across folds

Training Log Loss = 3.45 

Cross-Validation Log Loss = 3.69

It's Time to Test on Current User Reviews

test_data = pd.read_csv("/content/test.csv")
test_data.head(5)

Preprocessing

test_data['Review'] = test_data['Review'].map(lambda com: clean_text(com))
test_data['Review'] = test_data['Review'].apply(lambda x: ' '.join([word for word in x.split() if word not in stop_words]))

test_vectorised_data = vectorizer.transform(test_data['Review'])
test_vectorised_data = test_vectorised_data.todense()
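
Since the same cleaning steps are applied to both the train and test reviews, they could be bundled into a single helper so the two code paths cannot drift apart. A sketch (the name preprocess_reviews is ours, not from the original article):

def preprocess_reviews(df, vectorizer, stop_words):
    """Apply lowercasing/decontraction, stop-word removal, and TF-IDF in one place."""
    cleaned = df['Review'].map(clean_text)
    cleaned = cleaned.apply(lambda x: ' '.join([w for w in x.split() if w not in stop_words]))
    return vectorizer.transform(cleaned).todense()

# Equivalent to the manual steps above:
# test_vectorised_data = preprocess_reviews(test_data, vectorizer, stop_words)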

Prediction on Test Data

prediction_on_test_data = model.predict(test_vectorised_data)

df_test = pd.DataFrame(prediction_on_test_data,
                       columns=['Components', 'Delivery and Customer Support', 'Design and Aesthetics',
                                'Dimensions', 'Features', 'Functionality', 'Installation', 'Material',
                                'Price', 'Quality', 'Usability', 'Polarity'])
df_test
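
The model outputs a probability between 0 and 1 for every column. To turn these into hard category and sentiment labels, a simple 0.5 threshold can be applied (a sketch; 0.5 is a common default, not a threshold tuned in the article):

# Convert sigmoid probabilities into binary flags: 1 if probability >= 0.5, else 0
df_test_labels = (df_test >= 0.5).astype(int)
df_test_labels.head()
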
Conclusion

Let's check a few review classifications and the polarity suggested by our model.

Review: Made of very thin cheap metal broke on the very first crimp. Had to rush to a local hardware store spend 60 more on another because the water was shut off in my home. Did not return because using the case for the new one.

Our model categorizes it as Quality: 0.86 and Polarity/Sentiment: 0.06 (negative).

Review: As good as the brand names, no jams or misfires on my Paslode fuel cell nailer or on my Banks
(HF) nailer.

Our model categorizes it as Functionality: 0.79 and Polarity/Sentiment: 0.88 (positive).

Different departments can now take action based on the negative reviews in their bucket.
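
For example, the quality team could pull only the reviews flagged as Quality with a negative predicted polarity, building on the thresholded labels sketched above (df_test_labels is our hypothetical name, not from the original article):

# Reviews predicted to mention Quality and to be negative (Polarity = 0)
quality_negative = test_data[(df_test_labels['Quality'] == 1) & (df_test_labels['Polarity'] == 0)]
print(quality_negative['Review'].head())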

Thus, the above article has explained how we can categorize user reviews and analyze their sentiment with deep learning & NLP.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.

Article Url - https://www.analyticsvidhya.com/blog/2022/02/sentiment-analysis-with-nlp-deep-learning/

DEVENDRA Maindola
