Amazon Sentiment Analysis Documentation
**Purpose:** Analyze and classify the sentiment of Amazon reviews using machine learning.
**Dataset:** The dataset consists of Amazon reviews labeled with scores from 1 to 5. The training
set has 3 million samples and the test set has 650,000 samples. Each sample includes a class index
1. **Importing Libraries**
- `TfidfVectorizer` and `CountVectorizer`: Convert text into numerical features, as TF-IDF scores
and raw term counts respectively.
- `SVC` (Support Vector Classifier) and `LogisticRegression`: Models for sentiment classification.
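
As a rough illustration of how these pieces fit together, the sketch below chains `TfidfVectorizer` and `LogisticRegression` in a scikit-learn pipeline. The four-review corpus and its labels are hypothetical and far too small for a meaningful model; they only show the mechanics, not the configuration used in this project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus, used only to show the mechanics.
texts = [
    "great product, works perfectly",
    "love it, excellent quality",
    "terrible, broke after one day",
    "awful quality, waste of money",
]
labels = ["Positive", "Positive", "Negative", "Negative"]

# TF-IDF turns each review into a sparse vector of weighted terms;
# LogisticRegression then fits a linear classifier on those vectors.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

# Predict labels for unseen text using the fitted vocabulary and weights.
predictions = model.predict(["excellent product", "terrible waste"])
print(list(predictions))
```

Swapping `CountVectorizer` for `TfidfVectorizer`, or `SVC` for `LogisticRegression`, should only require changing the corresponding pipeline step.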
5. **Evaluation Metrics**
6. **Data Visualization**
- `matplotlib` and `seaborn`: Create various plots, like word frequency distributions.
- **Data Load:** The dataset is loaded and the first few rows are displayed for verification.
- **Dataset Dimensions:** Shows the number of rows and columns, giving a sense of the data's scale.
- Columns are renamed for clarity, and the title and review text are combined into a single field for analysis.
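
The loading and preparation steps above might look roughly like the following with pandas. The in-memory sample stands in for the real file, and the header-less three-column layout and the column names `rating`, `title`, `text`, and `review` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for the real CSV file.
# The header-less, three-column layout is an assumption.
raw_csv = io.StringIO(
    '5,"Great read","Could not put it down."\n'
    '1,"Disappointing","Stopped working after a week."\n'
    '3,"Okay","Does the job, nothing special."\n'
)

df = pd.read_csv(raw_csv, header=None)
print(df.head())  # first rows, for a quick sanity check
print(df.shape)   # (number of rows, number of columns)

# Give the positional columns readable names (assumed names),
# then merge title and body into a single text field.
df.columns = ["rating", "title", "text"]
df["review"] = df["title"] + " " + df["text"]
```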
9. **Assigning Sentiment Labels**
- A function assigns labels for Positive (rating > 3), Negative (rating < 3), and Neutral (rating = 3).
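
A minimal sketch of such a labeling function, using the thresholds stated above. The name `assign_sentiment` is hypothetical.

```python
def assign_sentiment(rating: int) -> str:
    """Map a 1-5 rating to a sentiment label using the thresholds above."""
    if rating > 3:
        return "Positive"
    if rating < 3:
        return "Negative"
    return "Neutral"


# Apply the rule to every possible rating value.
labels = [assign_sentiment(r) for r in [1, 2, 3, 4, 5]]
print(labels)
```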
- **Sentiment Distribution**: Visualizes the count of positive, negative, and neutral sentiments to show
class balance.
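
The numbers behind such a plot can be sketched as below. The label list is hypothetical, and the plotting call is left out so the example runs without a display; the resulting counts could be passed to a bar plot such as `seaborn.countplot`.

```python
from collections import Counter

# Hypothetical labels; in practice these come from the labeling step.
sentiments = ["Positive", "Negative", "Positive", "Neutral", "Positive", "Negative"]

# Count each class and convert the counts to shares of the total.
counts = Counter(sentiments)
total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}

# Print classes from most to least frequent to expose any imbalance.
for label, count in counts.most_common():
    print(f"{label:<8} {count:>3}  {shares[label]:.0%}")
```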
- Missing values in reviews are filled with empty strings so that later text operations do not fail on null entries.
- Review lengths are grouped and counted, showing how review lengths are distributed.
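
The two steps above could be sketched with pandas as follows. The sample reviews, the `review` column name, and the bucket edges are illustrative assumptions, not values taken from the project.

```python
import pandas as pd

# Hypothetical reviews, including one missing value.
df = pd.DataFrame({"review": ["Great", None, "Works fine, arrived on time", "Bad"]})

# Replace missing reviews with empty strings so string methods work on every row.
df["review"] = df["review"].fillna("")

# Character length per review, then bucket the lengths into ranges.
df["length"] = df["review"].str.len()
bins = [0, 10, 50, 1000]  # illustrative bucket edges
df["length_group"] = pd.cut(df["length"], bins=bins, include_lowest=True)

# Number of reviews per length bucket, ordered from shortest to longest.
length_counts = df["length_group"].value_counts().sort_index()
print(length_counts)
```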
- Cleaning steps are outlined for later processing, including tokenization and stop-word removal.
class distribution.
- A word cloud visually represents word frequencies, with common words appearing larger.
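
As a hedged sketch of the cleaning steps and of the word frequencies a word cloud is sized by, the example below tokenizes with a regular expression, removes stop words, and counts the remaining tokens. The stop-word set, the helper name `clean_tokens`, and the sample reviews are all hypothetical; a real pipeline would likely use a fuller stop-word list.

```python
import re
from collections import Counter

# Small illustrative stop-word set; not the list used in the project.
STOP_WORDS = {"the", "a", "is", "it", "and", "this", "was", "to", "of"}


def clean_tokens(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


reviews = [
    "This battery is great and the price was great",
    "The battery died, it was a waste of money",
]

# Frequency of each remaining token across all reviews.
word_counts = Counter(tok for review in reviews for tok in clean_tokens(review))
print(word_counts.most_common(3))
```

The resulting counts are the kind of input a word cloud renderer scales word sizes by.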