Emotion Chatbot System - Code Documentation and Performance Analysis


Table of Contents
1. System Overview
2. Code Architecture
3. Component Details
4. Performance Testing
5. Chapter Summaries
   Chapter 5 Summary: Case Studies
   Chapter 6 Summary: Related Work
   Chapter 7 Summary: Modeling of Experimental System
6. Usage Examples
7. Dependencies
8. Future Improvements
9. Conclusion

System Overview
This project implements an emotion-aware chatbot system that detects emotions in user text and generates appropriate empathic responses. The system uses a fine-tuned BERT-based model (bert-base-uncased) for emotion classification across 8 emotion categories: joy, sadness, anger, fear, surprise, disgust, love, and neutral.

Key Features
Emotion Classification: Fine-tuned BERT model for multi-class emotion detection
Hybrid Prediction System: Combines neural model predictions with keyword-based fallback for improved accuracy
Context-Aware Responses: Selects empathic responses from a curated dataset based on detected emotions
RESTful API: Flask-based web service for real-time emotion detection and response generation
Performance Optimized: Fast inference with keyword matching for common cases

Code Architecture
The codebase is organized into modular components:

research_project/
├── src/
│   ├── models/                   # Model training and evaluation
│   │   ├── train_classifier.py   # Fine-tune BERT for emotion classification
│   │   └── evaluate.py           # Evaluate model performance
│   ├── chatbot/                  # Chatbot application
│   │   ├── app.py                # Flask REST API server
│   │   └── response_selector.py  # Emotion detection and response selection
│   └── utils/                    # Utility functions
│       ├── dataset.py            # Data loading and preprocessing
│       └── metrics.py            # Evaluation metrics
├── data/                         # Dataset files
│   └── emotion_chatbot_samples.csv
├── models/                       # Trained model checkpoints
└── tests/                        # Unit tests

Component Details
1. Model Training (src/models/train_classifier.py)
Purpose: Fine-tunes a pre-trained BERT model for emotion classification.

Key Functionality:

Loads emotion-labeled text data from CSV
Tokenizes text using BERT tokenizer (max_length=128)
Splits data into train/validation/test sets (70%/15%/15%)
Fine-tunes bert-base-uncased using Hugging Face Transformers
Saves model and tokenizer to specified output directory

Training Configuration:

Base model: bert-base-uncased
Batch size: 8 (train), 16 (eval)
Epochs: 3 (default)
Evaluation strategy: After each epoch
Best model selection: Based on F1 score

Usage:

python src/models/train_classifier.py --data data/emotion_chatbot_samples.csv --output models/emotion_bert_small --epochs 3
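
The script itself is not reproduced in this document; the following is a minimal sketch of the training setup described above (batch sizes 8/16, 3 epochs, per-epoch evaluation, best model by F1). The helper names load_emotion_dataset and compute_metrics and their signatures are assumptions and may differ from the actual utility modules.

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

from src.utils.dataset import load_emotion_dataset   # assumed helper name
from src.utils.metrics import compute_metrics        # assumed helper name

MODEL_NAME = "bert-base-uncased"
OUTPUT_DIR = "models/emotion_bert_small"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
splits, label2id = load_emotion_dataset("data/emotion_chatbot_samples.csv",
                                        tokenizer, max_length=128)

model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(label2id))

args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    eval_strategy="epoch",        # evaluate after each epoch
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",   # best checkpoint chosen by macro F1
)

trainer = Trainer(model=model, args=args,
                  train_dataset=splits["train"],
                  eval_dataset=splits["validation"],
                  compute_metrics=compute_metrics)
trainer.train()
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)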

2. Model Evaluation (src/models/evaluate.py)


Purpose: Evaluates trained model performance on test set.

Key Functionality:

Loads trained model and tokenizer
Evaluates on test split using same preprocessing pipeline
Computes macro-averaged precision, recall, F1, and accuracy

Metrics Computed:

Accuracy: Overall classification accuracy
Precision (macro): Average precision across all emotion classes
Recall (macro): Average recall across all emotion classes
F1 Score (macro): Average F1 across all emotion classes (per class, the harmonic mean of precision and recall)

Usage:

python src/models/evaluate.py --model models/emotion_bert_small --data data/emotion_chatbot_samples.csv
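
As with training, only a hedged sketch of the evaluation step is given here, reusing the hypothetical helpers from the previous sketch; the real script may be structured differently.

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

from src.utils.dataset import load_emotion_dataset   # assumed helper name
from src.utils.metrics import compute_metrics        # assumed helper name

model_dir = "models/emotion_bert_small"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

# Same deterministic preprocessing/splitting pipeline as in training.
splits, _ = load_emotion_dataset("data/emotion_chatbot_samples.csv",
                                 tokenizer, max_length=128)

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="eval_out",
                                         per_device_eval_batch_size=16),
                  compute_metrics=compute_metrics)
print(trainer.evaluate(splits["test"]))   # accuracy, macro precision/recall/F1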

3. Response Selector (src/chatbot/response_selector.py)


Purpose: Core component that detects emotions and selects appropriate responses.

Key Features:

Emotion Prediction Pipeline:

1. Keyword Override (Priority): Checks for emotion-indicating keywords first, prioritizing positive emotions to avoid false negatives
2. Model Prediction: Uses fine-tuned BERT model if no keyword match
3. Confidence Check: If model confidence < 0.3, falls back to keyword matching
4. Negative Emotion Validation: Double-checks if model predicts negative emotion but keywords suggest positive

Response Selection:

Primary: Loads responses from CSV dataset grouped by emotion
Fallback: Uses hardcoded templates if CSV responses unavailable
Randomization: Selects random response from available options for variety

Keyword Hints System:

Comprehensive keyword lists for each emotion
Positive emotions (joy, love) checked first to prevent misclassification
Negative emotions (sadness, anger, fear, disgust) checked second
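
response_selector.py itself is not reproduced in this document. The sketch below illustrates the hybrid logic described above; function names and the heavily abbreviated keyword table are purely illustrative, and the negative-emotion validation step is folded into the positive-first keyword ordering.

import random

import torch

# Abbreviated, illustrative keyword table; positive emotions come first so they
# are matched before the negative ones.
KEYWORD_HINTS = {
    "joy":     ["great", "happy", "awesome", "excited"],
    "love":    ["love", "adore"],
    "sadness": ["sad", "down", "miserable"],
    "anger":   ["angry", "furious", "annoyed"],
    "fear":    ["afraid", "scared", "worried"],
    "disgust": ["disgusting", "gross"],
}

def keyword_emotion(text):
    lowered = text.lower()
    for emotion, words in KEYWORD_HINTS.items():
        if any(word in lowered for word in words):
            return emotion
    return None

def predict_emotion(text, model, tokenizer, id2label, threshold=0.3):
    hint = keyword_emotion(text)
    if hint is not None:                    # 1. keyword override has priority
        return hint
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]   # 2. model prediction
    confidence, idx = probs.max(dim=-1)
    if confidence.item() < threshold:
        # 3. low confidence: the real selector falls back to keyword matching
        #    here; in this sketch the keywords were already checked above.
        return "neutral"
    return id2label[idx.item()]

def select_response(emotion, responses_by_emotion):
    # Responses are grouped by emotion (e.g. loaded from the CSV); pick one at
    # random, with a hardcoded fallback when no responses exist for the emotion.
    options = responses_by_emotion.get(emotion) or ["I'm here for you."]
    return random.choice(options)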

Usage:

selector = ResponseSelector(model_path='models/emotion_bert_small', csv_path='data/emotion_chatbot_samples.csv')
response = selector.select_response("I'm feeling great today!")

4. Flask API (src/chatbot/app.py)


Purpose: RESTful web service for emotion detection and response generation.

Endpoints:

GET /ping: Health check endpoint (returns 'pong')
POST /chat: Main endpoint for emotion detection and response
  Request: {"text": "user message"}
  Response: {"response": "empathic response"}

Configuration:

Default host: 0.0.0.0 (all interfaces)
Default port: 5000 (configurable via PORT environment variable)
Model path: Configurable via --model argument
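
A minimal sketch of a Flask service with the endpoints and configuration above, assuming the ResponseSelector interface shown earlier; the argument defaults and the 400 error for a missing text field are assumptions, not documented behaviour of the actual app.py.

import argparse
import os

from flask import Flask, jsonify, request

from src.chatbot.response_selector import ResponseSelector

app = Flask(__name__)
selector = None  # initialised in main()

@app.route("/ping", methods=["GET"])
def ping():
    return "pong"

@app.route("/chat", methods=["POST"])
def chat():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text", "")
    if not text:
        # Error handling is an assumption; the documented schema only covers success.
        return jsonify({"error": "missing 'text' field"}), 400
    return jsonify({"response": selector.select_response(text)})

def main():
    global selector
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="models/emotion_bert_small")      # illustrative default
    parser.add_argument("--csv", default="data/emotion_chatbot_samples.csv")  # illustrative default
    args = parser.parse_args()
    selector = ResponseSelector(model_path=args.model, csv_path=args.csv)
    # 0.0.0.0:5000 by default; the port can be overridden via the PORT variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))

if __name__ == "__main__":
    main()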

Usage:

python src/chatbot/app.py --model models/emotion_bert_small --csv data/emotion_chatbot_samples.csv

Example API Call:

curl -X POST http://localhost:5000/chat -H "Content-Type: application/json" -d '{"text": "I just got promoted!"}'

5. Dataset Utilities (src/utils/dataset.py)


Purpose: Data loading and preprocessing for training.

Key Functionality:

Loads CSV with columns: text, label
Generates label mapping automatically from unique labels
Splits data deterministically (70% train, 15% validation, 15% test)
Tokenizes text using provided tokenizer
Returns Hugging Face DatasetDict with train/validation/test splits

Data Format:

CSV must contain text and label columns
Labels are automatically mapped to integer indices
Text is tokenized with truncation and padding to max_length
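
The utility itself is not shown here; the following sketch shows how these steps might be implemented, with an assumed function name and signature (the real dataset.py may differ).

import pandas as pd
from datasets import Dataset, DatasetDict

def load_emotion_dataset(csv_path, tokenizer, max_length=128, seed=42):
    df = pd.read_csv(csv_path)                        # expects 'text' and 'label' columns
    label2id = {label: i for i, label in enumerate(sorted(df["label"].unique()))}
    df["labels"] = df["label"].map(label2id)          # map labels to integer indices

    ds = Dataset.from_pandas(df[["text", "labels"]])
    # Deterministic 70/15/15 split via a fixed seed.
    train_rest = ds.train_test_split(test_size=0.30, seed=seed)
    val_test = train_rest["test"].train_test_split(test_size=0.50, seed=seed)
    splits = DatasetDict({"train": train_rest["train"],
                          "validation": val_test["train"],
                          "test": val_test["test"]})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=max_length)

    return splits.map(tokenize, batched=True), label2id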

6. Metrics (src/utils/metrics.py)
Purpose: Computes evaluation metrics for model performance.

Metrics:

Accuracy: Overall correct predictions
Precision (macro): Average precision across all classes
Recall (macro): Average recall across all classes
F1 Score (macro): Macro-averaged F1 score

Uses scikit-learn's precision_recall_fscore_support and accuracy_score.
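
A sketch of a Trainer-compatible compute_metrics built on those scikit-learn functions; the exact shape of metrics.py may differ slightly.

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}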


Performance Testing
Performance tests were conducted using the test_performance.py script; the results are summarized below.

Test Environment
Model: emotion_bert_small (fine-tuned BERT-base-uncased)
Dataset: 64 samples (56 training + 8 test)
Test runs: 50-100 iterations per test
Hardware: CPU-based inference

1. Inference Speed Test


Results:

Average inference time: 39.31 ms per prediction
Standard deviation: 2.27 ms
Min/Max time: 36.00 ms / 46.01 ms
Throughput: 25.44 predictions/second

Analysis:

Inference is fast enough for real-time chatbot applications
Low variance indicates consistent performance
Suitable for interactive use cases (< 50 ms latency)
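
test_performance.py is not reproduced here; a timing loop of roughly this shape (an assumption, not the actual script) is the usual way to obtain figures like those above.

import statistics
import time

def time_inference(predict_fn, text, runs=100):
    # Warm up once so model loading does not skew the first sample.
    predict_fn(text)
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(text)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "stdev_ms": statistics.stdev(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "throughput_per_s": 1000.0 / mean_ms,
    }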

2. Response Selector Performance


Results:

Average response time: 0.20 ms per response
Standard deviation: 0.40 ms
Throughput: 5,002 responses/second

Analysis:

Extremely fast response selection (keyword matching is near-instant)
Most responses use keyword override, bypassing model inference
System can handle high request volumes

3. Model Accuracy Test


Results (on test set):

Accuracy: 33.33%
Precision (macro): 16.67%
Recall (macro): 19.44%
F1 Score (macro): 17.86%

Analysis:

Lower accuracy expected due to small test set (9 samples)
With only 64 total samples, model has limited training data
Performance would improve significantly with larger dataset
Keyword fallback system compensates for model limitations

4. Memory Usage
Results:

Model parameters: 109,488,392 total parameters
Trainable parameters: 109,488,392 (all parameters)
Model size: ~440 MB (BERT-base-uncased standard size)

Analysis:
Standard BERT-base model size
Suitable for deployment on modern servers
Can be optimized with model quantization if needed
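
For reference, the parameter count and approximate float32 footprint can be checked as follows; the final line shows PyTorch's dynamic int8 quantization as an optional step mentioned above, not part of the current system.

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("models/emotion_bert_small")

# Parameter count and rough float32 size (about 438 MB, i.e. the ~440 MB quoted above).
n_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {n_params:,}")
print(f"Approx. float32 size: {n_params * 4 / 1e6:.0f} MB")

# Optional dynamic int8 quantisation of the linear layers for lighter CPU inference.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)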

Performance Summary

| Metric             | Value        | Notes                             |
|--------------------|--------------|-----------------------------------|
| Inference Speed    | 39.31 ms     | Fast enough for real-time use     |
| Response Selection | 0.20 ms      | Extremely fast (keyword-based)    |
| Throughput         | 25.44 pred/s | Good for interactive applications |
| Model Accuracy     | 33.33%       | Limited by small dataset          |
| Model Size         | 109M params  | Standard BERT-base size           |

Recommendations
1. Dataset Expansion: Increase training data to improve model accuracy
2. Model Optimization: Consider using smaller models (e.g., DistilBERT) for faster inference
3. Caching: Cache model predictions for repeated queries (a small sketch follows this list)
4. GPU Acceleration: Use GPU for faster inference in production
5. Hybrid Approach: Keep the current keyword + model approach, which remains effective for small datasets
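
A purely illustrative sketch of recommendations 3 and 4; the names are hypothetical and the prediction function is a stub standing in for the hybrid predictor.

from functools import lru_cache

import torch

# Recommendation 4: pick the GPU automatically when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running inference on {device}")

def predict_emotion(text: str) -> str:
    # Stub standing in for the hybrid predictor; a real version would run the
    # fine-tuned BERT classifier on `device`.
    return "neutral"

@lru_cache(maxsize=1024)
def cached_predict(text: str) -> str:
    # Recommendation 3: identical repeated queries hit the cache and skip inference.
    return predict_emotion(text)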

Chapter Summaries
Case Studies
The Case Studies chapter provides a comprehensive literature review of emotion-aware conversational systems, contextualizing the proposed chatbot within the broader field of affective computing, natural language processing, and empathetic dialogue systems.

The chapter begins with the historical foundations of affective computing (Picard, Ekman, Cambria), establishing the psychological and computational principles that
underpin emotion recognition in text. It reviews major emotion theories (basic emotions, dimensional models) and connects them to modern NLP-based classification
approaches.

The chapter then surveys traditional approaches to emotion detection, such as keyword-based, lexicon-based, and machine-learning models, highlighting their limitations
in handling linguistic ambiguity, contextual expressions, and long-term dependencies. This is followed by an extensive review of deep learning and transformer-based
methods, with a focus on BERT, RoBERTa, GPT, and knowledge-enriched transformers. The literature consistently shows that transformer-based architectures yield
state-of-the-art accuracy and robustness in emotion recognition tasks—an insight that motivates the choice of BERT in the current research.

Subsequently, the chapter analyzes emotion-aware dialogue systems, including AffectBot, XiaoIce, empathetic open-domain models (Rashkin et al.), and knowledge-enhanced conversational agents. Each system is compared to the current project along dimensions such as modality, emotional memory, context awareness, interpretability, and computational complexity. While XiaoIce and AffectBot represent large-scale, resource-intensive systems, the proposed chatbot prioritizes modularity, reproducibility, transparency, and deployment efficiency, making it more suitable for research environments and lightweight applications.

The chapter also compares traditional rule-based chatbots to modern emotion-aware systems, showing improvements in user satisfaction, engagement, and
conversation length. These findings reinforce that emotional intelligence is a key factor in building meaningful, long-term human–AI interactions.

Finally, the chapter identifies research gaps: the need for transparent architectures, accessible implementations, lightweight deployment, and interpretable emotion-to-response mappings. The proposed system directly addresses these gaps with a modular BERT-based classifier, hybrid keyword fallback, and template-driven empathetic responses.

Related work
The Related Work chapter develops a rigorous mathematical, architectural, and algorithmic model of the proposed emotion-aware chatbot. It formally defines all system components, data flows, and interactions, providing a reproducible and analytically grounded foundation for implementation.

The chapter begins by introducing the system architecture, which is modeled as a modular pipeline consisting of:

Input processing

Emotion recognition (BERT + keyword override)

Memory/context module

Response generation (template-based, emotion-aware)

Output formatting

Each module is defined using formal notation, including sets, functions, and mappings. The BERT classifier is modeled mathematically using embeddings, attention
layers, feed-forward networks, and the cross-entropy optimization objective. The chapter also formalizes the hybrid prediction strategy, where rule-based keyword
detection takes precedence in low-confidence scenarios, ensuring robustness despite the small dataset.
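
The chapter's own notation is not reproduced in this document; as an illustration only, the classifier objective and the hybrid decision rule could be written as:

% Illustrative notation only; the chapter's own symbols may differ.
% Cross-entropy objective over the 8 emotion classes:
\mathcal{L}(\theta) = -\sum_{i=1}^{N} \sum_{c=1}^{8} y_{i,c}\,\log p_\theta(c \mid x_i)

% Hybrid decision rule with keyword match k(x) and confidence threshold \tau = 0.3:
\hat{e}(x) =
\begin{cases}
  k(x) & \text{if a keyword matches} \\
  \arg\max_c p_\theta(c \mid x) & \text{if } \max_c p_\theta(c \mid x) \ge \tau \\
  \text{fallback}(x) & \text{otherwise}
\end{cases}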

The data model and flow are described from dataset loading and tokenization to batch construction, training, inference, and multi-turn conversation management. The
memory module is modeled as a state machine tracking conversation history and emotion transitions, enabling context-aware responses.

The chapter includes detailed algorithm definitions for:

Emotion classification

Keyword override logic

Context update

Template selection

Semantic similarity ranking (optional extension)

A complete performance model is then provided, calculating training time, inference latency, throughput, memory consumption, and computational complexity. This includes real-world performance results from the implemented system:

~39 ms average inference latency for BERT

0.20 ms average response selection latency

Throughput of ~25 predictions/second

Model size of ~440 MB

These findings confirm real-time feasibility even on CPU.

The chapter concludes with reproducibility mechanisms (fixed seeds, deterministic splits), system constraints (sequence length, context windows, memory limits), and assumptions (English-only, discrete emotion taxonomy, single-user sessions). Altogether, this chapter delivers a mathematically rigorous, fully specified model of the chatbot system.

Modeling of Experimental System
The Modeling of Experimental System chapter presents the full implementation, integration, and evaluation of the emotion-aware chatbot system, using the models described earlier. It translates the theoretical formalization into a functional software prototype built with Python, PyTorch, Hugging Face Transformers, Flask, and supporting utility modules.

The chapter begins with the software architecture, mirroring the chapter 6 model:

train_classifier.py for fine-tuning BERT

evaluate.py for performance assessment

response_selector.py implementing the hybrid classifier

app.py providing a REST API

utility modules for datasets and metrics


This modular architecture ensures clarity, maintainability, and extensibility.

Next, the chapter describes the training process, including tokenizer configuration, dataset splitting, hyperparameters (batch size, epochs, optimizer, learning rate), and the evaluation pipeline. The model is fine-tuned on the custom dataset of ~64 samples, with macro-averaged metrics showing modest accuracy due to the dataset size, but stable inference performance. The chapter highlights that the keyword fallback system substantially improves reliability beyond what raw accuracy metrics reflect.

The response selection system is presented as a lightweight yet effective mechanism for empathetic conversation. Emotion-specific response templates are loaded from
a CSV file, enabling scalable and interpretable response strategies. The system is capable of real-time response generation with extremely low latency.

A performance evaluation section provides empirical measurements:

39.31 ms average model inference time

0.20 ms average response selector time

25.44 inferences / second throughput

109M parameters (standard BERT-base)

The Flask API is described along with usage examples, JSON schemas, error handling, and deployment notes.

The chapter ends with a discussion of limitations and future work, such as dataset enlargement, adopting smaller transformer models (DistilBERT) for optimized
performance, enabling context-aware multi-turn emotion modeling, LLM-based response generation, and multi-language support.

Overall, Chapter 7 demonstrates that the implemented system is functional, efficient, modular, transparent, and well-aligned with the research objectives.

Usage Examples
Training a New Model

cd research_project
$env:PYTHONPATH="D:\Desktop\facultate\anul 3\Research project\research_project"
python src/models/train_classifier.py --data data/emotion_chatbot_samples.csv --output models/my_model --epochs 5

Evaluating a Model

cd research_project
$env:PYTHONPATH="D:\Desktop\facultate\anul 3\Research project\research_project"
python src/models/evaluate.py --model models/emotion_bert_small --data data/emotion_chatbot_samples.csv

Running the Chatbot API

cd research_project
$env:PYTHONPATH="D:\Desktop\facultate\anul 3\Research project\research_project"
python src/chatbot/app.py --model models/emotion_bert_small --csv data/emotion_chatbot_samples.csv

Note: Make sure you're in the research_project directory and set the PYTHONPATH so Python can find the src module. Alternatively, you can use:

cd research_project
python -m src.chatbot.app --model models/emotion_bert_small --csv data/emotion_chatbot_samples.csv

Testing Performance
cd research_project
$env:PYTHONPATH="D:\Desktop\facultate\anul 3\Research project\research_project"
python test_performance.py --model models/emotion_bert_small --data data/emotion_chatbot_samples.csv --runs 100

Testing via API

# Start the server
python src/chatbot/app.py

# In another terminal, test the API
curl -X POST http://localhost:5000/chat -H "Content-Type: application/json" -d '{"text": "I feel sad today"}'

Dependencies
Key dependencies (see requirements.txt):

transformers==4.57.1: Hugging Face Transformers for BERT models
torch==2.8.0: PyTorch for deep learning
flask==3.0.3: Web framework for API
datasets==4.4.1: Dataset handling
pandas==2.3.3: Data manipulation
scikit-learn==1.6.1: Evaluation metrics
sentence-transformers==5.1.2: Semantic similarity (optional)

Future Improvements
1. Larger Dataset: Collect more training samples for better model accuracy
2. Context Awareness: Track conversation history for multi-turn emotion detection
3. Response Generation: Use LLMs to generate dynamic responses instead of templates
4. Multi-language Support: Extend to multiple languages
5. Emotion Intensity: Predict emotion intensity levels (intensity labels are already available in the dataset but are not yet used by the model)
6. User Personalization: Adapt responses based on user preferences
7. Real-time Learning: Fine-tune model based on user feedback

Conclusion
This emotion chatbot system demonstrates a practical approach to emotion-aware conversational AI using fine-tuned BERT models. The hybrid keyword + neural model
approach provides robust emotion detection even with limited training data. The system achieves good inference speed suitable for real-time applications, though model
accuracy could be improved with a larger dataset.

The modular architecture allows for easy extension and improvement, making it a solid foundation for more advanced emotion-aware chatbot systems.
