1 - Sentiment - Analysis - NLP - Ipynb - Codes Only
%cd /content/drive/MyDrive/nlp_project
!ls #checking if files are there or not
mydata.head()
As can be seen above, the dataset is imbalanced, so we will use undersampling to balance it.
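The undersampling cell itself is not part of this export. A minimal sketch of one common approach (random undersampling down to the size of the rarest class) is given here; the helper name `undersample`, the toy data, and the `reviews` column are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

def undersample(df: pd.DataFrame, label_col: str = "label", seed: int = 42) -> pd.DataFrame:
    """Randomly downsample every class to the size of the rarest class."""
    n_min = df[label_col].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    # Concatenate the per-class samples, then shuffle the rows
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

# Toy data standing in for mydata: 8 positive reviews, 2 negative (hypothetical)
toy = pd.DataFrame({
    "reviews": [f"review {i}" for i in range(10)],
    "label": [1] * 8 + [0] * 2,
})
data_balanced = undersample(toy)
print(data_balanced["label"].value_counts().to_dict())
```

Undersampling discards majority-class rows, which is acceptable here because only a small evaluation sample is sent to the API later on.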
import re

def clean_text(text):
    # Replace special characters and punctuation with spaces
    text = re.sub(r"[^\w\s]", " ", text)
    return text
https://colab.research.google.com/drive/1v0c7kmDSBApUGFTq0mibaidimGca-PWg?authuser=6#scrollTo=Rx1NfrvIxcyP&printMode=true 1/5
3/26/24, 4:07 PM 1_sentiment_analysis_nlp.ipynb - Colaboratory
import pandas as pd
data_balanced
Splitting the dataset into 5% training and 95% test sets
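The split cell is missing from this export. As a rough sketch, a 5%/95% split can be done with pandas alone; `split_train_test`, `train_set`, and the toy frame are hypothetical names, and the sketch assumes the DataFrame index is unique.

```python
import pandas as pd

def split_train_test(df: pd.DataFrame, train_frac: float = 0.05, seed: int = 42):
    """Return (train, test), where train is a random train_frac share of df."""
    train = df.sample(frac=train_frac, random_state=seed)
    # Everything not drawn for training becomes the test set
    test = df.drop(train.index)
    return train, test

# Toy data standing in for the balanced dataset (hypothetical)
toy = pd.DataFrame({
    "clean_reviews": [f"review {i}" for i in range(200)],
    "label": [i % 2 for i in range(200)],
})
train_set, test_set = split_train_test(toy)
print(len(train_set), len(test_set))
```

This split is not stratified; scikit-learn's `train_test_split` with its `stratify` argument is a common alternative when class proportions must be preserved.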
import pandas as pd
# Necessary packages
import pathlib
import textwrap
import google.generativeai as genai
from IPython.display import Markdown
def to_markdown(text):
    text = text.replace('•', ' *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
genai.configure(api_key=GOOGLE_API_KEY)
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)
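# Assumption: the cell that creates the model is missing from this export
model = genai.GenerativeModel('gemini-pro')
%%time
response = model.generate_content("how great is MS Dhoni?")
to_markdown(response.text)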
Integrating the Gemini Pro API into our sentiment analysis task
test_set_sample
json_data = test_set_sample[['clean_reviews','pred_label']].to_json(orient='records')
prompt = f"""
You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three backticks.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.
```
{json_data}
```
"""
print(prompt)
response = model.generate_content(prompt)
print(response.text)
import json
df_sample
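The cell that builds `df_sample` from `response.text` does not appear in this export. A minimal sketch of the parsing step follows, assuming the model returns the JSON array wrapped in backticks as the prompt requests; `parse_label_reply` and the sample reply are hypothetical.

```python
import json

def parse_label_reply(reply_text: str) -> list:
    """Extract the JSON array of records from a reply wrapped in backticks."""
    cleaned = reply_text.strip().strip("`").strip()
    # Some replies open the fence with a language tag such as "json"
    if cleaned.lower().startswith("json"):
        cleaned = cleaned[4:]
    return json.loads(cleaned)

# Hypothetical reply text, shaped the way the prompt asks for
fence = "`" * 3
sample_reply = (
    fence + "\n"
    '[{"clean_reviews": "great phone", "pred_label": 1},'
    ' {"clean_reviews": "stopped working", "pred_label": 0}]'
    "\n" + fence
)
records = parse_label_reply(sample_reply)
print([r["pred_label"] for r in records])
```

Passing the resulting list to `pd.DataFrame(records)` would give a frame with `clean_reviews` and `pred_label` columns, which is the shape the later cells expect from `df_sample`.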
test_set_sample['pred_label'] = df_sample['pred_label'].values
test_set_sample
from sklearn.metrics import confusion_matrix, accuracy_score

y_true = test_set_sample["label"]
y_pred = test_set_sample["pred_label"]
confusion_matrix(y_true, y_pred)
test_set_total = test_set.sample(100)
test_set_total['pred_label'] = ''
test_set_total
batches = []
batch_size = 25
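The loop that fills `batches` is not included in this export. One plausible way to do it is to slice the DataFrame into consecutive chunks of `batch_size` rows; `make_batches` and the toy frame below are assumptions for illustration.

```python
import pandas as pd

def make_batches(df: pd.DataFrame, batch_size: int) -> list:
    """Split df into consecutive row slices of at most batch_size rows."""
    return [
        df.iloc[start:start + batch_size]
        for start in range(0, len(df), batch_size)
    ]

# Toy frame standing in for test_set_total (hypothetical data)
toy = pd.DataFrame({
    "clean_reviews": [f"review {i}" for i in range(100)],
    "pred_label": "",
})
batches = make_batches(toy, batch_size=25)
print(len(batches), [len(b) for b in batches])
```

Smaller batches keep each prompt short and limit how much work is lost if a single reply cannot be parsed.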
import time
def gemini_completion_function(batch, current_batch, total_batch):
    """Classify one batch of reviews with Gemini.

    Step 1: Convert the DataFrame to JSON using the to_json() method.
    Step 2: Prepare the Gemini prompt.
    Step 3: Call the Gemini API.
    """
    json_data = batch[['clean_reviews', 'pred_label']].to_json(orient='records')
    prompt = f"""You are an expert linguist, who is good at classifying customer review sentiments into Positive/Negative labels.
Help me classify customer reviews into: Positive(label=1), and Negative(label=0).
Customer reviews are provided between three backticks below.
In your output, only return the Json code back as output - which is provided between three backticks.
Your task is to update predicted labels under 'pred_label' in the Json code.
Don't make any changes to Json code format, please.
Error handling instruction: In case a Customer Review violates API policy, please assign it default sentiment as Negative (label=0).
```
{json_data}
```
"""
    print(prompt)
    response = model.generate_content(prompt)
    time.sleep(5)  # pause between consecutive API calls
    return response
batch_count = len(batches)
responses = []
for i in range(len(batches)):
    responses.append(gemini_completion_function(batches[i], i, batch_count))
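As written, the loop above stops at the first request that raises an exception. A possible hardening, sketched here with only the standard library, is to retry each call with an increasing delay; `call_with_retries` and `flaky_request` are hypothetical names, and the sketch makes no assumption about which exceptions the API client raises.

```python
import time

def call_with_retries(call, max_attempts=3, base_delay=1.0):
    """Run call() and retry on any exception, doubling the wait each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            wait = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {wait:.2f}s")
            time.sleep(wait)

# Stand-in for a flaky API call: fails twice, then succeeds
state = {"calls": 0}

def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = call_with_retries(flaky_request, base_delay=0.01)
print(result, state["calls"])
```

In the notebook's loop, the call could be wrapped as `call_with_retries(lambda: gemini_completion_function(batches[i], i, batch_count))`.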
import json
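The cell that builds `df_total` from `responses` is missing from this export. A rough sketch of that step follows, assuming each reply is a backtick-wrapped JSON array with one record per review in the batch; `collect_records` and the fake replies are hypothetical.

```python
import json

def collect_records(reply_texts, batch_sizes):
    """Parse every reply and return all records in batch order."""
    all_records = []
    for text, expected in zip(reply_texts, batch_sizes):
        records = json.loads(text.strip().strip("`"))
        # A reply with missing or extra rows would misalign the labels
        if len(records) != expected:
            raise ValueError(f"expected {expected} records, got {len(records)}")
        for record in records:
            # Labels may come back as strings; normalise them to int
            record["pred_label"] = int(record["pred_label"])
        all_records.extend(records)
    return all_records

# Hypothetical reply texts, one per batch
fence = "`" * 3
fake_replies = [
    fence + '\n[{"clean_reviews": "loved it", "pred_label": "1"}]\n' + fence,
    fence + '\n[{"clean_reviews": "broke in a week", "pred_label": 0}]\n' + fence,
]
records = collect_records(fake_replies, batch_sizes=[1, 1])
print([r["pred_label"] for r in records])
```

With real replies, `reply_texts` would be `[r.text for r in responses]`, and `pd.DataFrame(records)` would produce a frame usable as `df_total`.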
test_set_total['pred_label'] = df_total['pred_label'].values
test_set_total
y_true = test_set_total["label"]
y_pred = test_set_total["pred_label"]
print(confusion_matrix(y_true, y_pred))
print(f"\nAccuracy: {accuracy_score(y_true, y_pred)}")
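To make the printed numbers easier to read: scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`. The pure-Python sketch below reproduces that layout and the accuracy calculation on made-up labels; the function names and the toy lists are illustrative only.

```python
def binary_confusion(true_labels, pred_labels):
    """Return [[TN, FP], [FN, TP]] for labels 0 (negative) and 1 (positive)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(true_labels, pred_labels):
        matrix[t][p] += 1  # row = true label, column = predicted label
    return matrix

def accuracy_from_matrix(matrix):
    """Share of predictions on the diagonal (correct ones)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Made-up labels for illustration
toy_true = [1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 0, 1, 1]
toy_matrix = binary_confusion(toy_true, toy_pred)
print(toy_matrix, round(accuracy_from_matrix(toy_matrix), 3))
```

Here two negatives and two positives are classified correctly, with one false positive and one false negative, giving an accuracy of 4/6.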