Brazilian e-commerce data analysis project using the Olist dataset to uncover insights about delivery times, customer satisfaction, and revenue patterns.
Full analysis and findings available in: OlistDataPresentation.pdf
Source: Olist Brazilian E-Commerce Public Dataset
The dataset contains information about:
- Orders and their status
- Customers and their locations
- Order items and products
- Payments
- Reviews and ratings
- Sellers
olistdata/
├── data/
│   ├── csv/              # CSV datasets
│   └── olist.zip         # Original dataset archive
├── olistanalyisis.py     # Main analysis script
├── Untitled.ipynb        # Jupyter notebook analysis
└── venv/                 # Python virtual environment
The analysis includes:
- Data Loading & Merging: Combines multiple datasets (orders, customers, products, payments, reviews, sellers)
- Data Cleaning: Handles timestamps, calculates delivery times, removes null values
- Descriptive Statistics: Average delivery time, review scores
- Visualizations:
  - Delivery time distribution
  - Review score vs delivery time correlation
  - Top 10 states by revenue
- Key Insights: Customer satisfaction metrics and delivery performance
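The loading, merging, and cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in olistanalyisis.py: the function names `merge_orders` and `add_delivery_days` are hypothetical, delivery time is assumed to mean the gap between purchase and delivery to the customer, and the tiny tables are synthetic stand-ins that reuse the public Olist column names.

```python
import pandas as pd


def merge_orders(orders, customers, reviews):
    """Attach customer location and review score to each order (hypothetical helper)."""
    merged = orders.merge(customers, on="customer_id", how="left")
    # Note: in the real data an order can have more than one review,
    # which would duplicate rows here; deduplicate first if that matters.
    return merged.merge(reviews[["order_id", "review_score"]], on="order_id", how="left")


def add_delivery_days(df):
    """Parse timestamps, compute delivery time in days, drop undelivered orders."""
    out = df.copy()
    for col in ("order_purchase_timestamp", "order_delivered_customer_date"):
        out[col] = pd.to_datetime(out[col], errors="coerce")
    delta = out["order_delivered_customer_date"] - out["order_purchase_timestamp"]
    out["delivery_days"] = delta.dt.total_seconds() / 86400
    return out.dropna(subset=["delivery_days"])


# Synthetic sample data (not taken from the dataset)
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c2", "c3"],
    "order_purchase_timestamp": [
        "2018-01-01 10:00:00", "2018-01-02 08:00:00", "2018-01-03 09:00:00",
    ],
    "order_delivered_customer_date": [
        "2018-01-05 10:00:00", "2018-01-12 08:00:00", None,  # o3 never delivered
    ],
})
customers = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "customer_state": ["SP", "RJ", "MG"],
})
reviews = pd.DataFrame({"order_id": ["o1", "o2", "o3"], "review_score": [5, 2, 4]})

clean = add_delivery_days(merge_orders(orders, customers, reviews))
print(clean[["order_id", "customer_state", "delivery_days", "review_score"]])
print("Average delivery time (days):", clean["delivery_days"].mean())
```

Left joins keep every order even when a customer or review row is missing; the undelivered order is then dropped only at the cleaning step, which keeps the two concerns separate.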
Requirements:
- Python 3.12
- pandas
- numpy
- matplotlib
- seaborn
# Clone the repository
git clone https://github.com/ignaciogomenuka/olistdata.git
cd olistdata
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install pandas numpy matplotlib seaborn jupyter
# Run the main analysis script
python olistanalyisis.py
# Or open the notebook
jupyter notebook Untitled.ipynb

- Delivery Performance: Analysis of average delivery times across different regions
- Customer Satisfaction: Correlation between delivery time and review scores
- Revenue Distribution: Geographic revenue patterns across Brazilian states
- Review Patterns: Percentage of positive reviews (4-5 stars)
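As a rough illustration of how the four metrics above might be computed from a merged table, consider the sketch below. The `summarize` function is hypothetical, the input rows are synthetic, and treating `payment_value` as the revenue measure is an assumption; the project may define revenue differently (for example from item prices).

```python
import pandas as pd


def summarize(df, top_n=10):
    """Compute delivery, satisfaction, and revenue metrics (hypothetical helper)."""
    return {
        # Average delivery time per state
        "avg_delivery_by_state": df.groupby("customer_state")["delivery_days"].mean(),
        # Pearson correlation between delivery time and review score
        "delivery_review_corr": df["delivery_days"].corr(df["review_score"]),
        # Total revenue per state, largest first (assumes payment_value is revenue)
        "revenue_by_state": (
            df.groupby("customer_state")["payment_value"]
            .sum()
            .sort_values(ascending=False)
            .head(top_n)
        ),
        # Share of reviews with 4 or 5 stars, as a percentage
        "positive_pct": (df["review_score"] >= 4).mean() * 100,
    }


# Synthetic merged table (not taken from the dataset)
sample = pd.DataFrame({
    "customer_state": ["SP", "SP", "RJ", "MG"],
    "delivery_days": [4.0, 6.0, 10.0, 20.0],
    "review_score": [5, 4, 3, 1],
    "payment_value": [100.0, 50.0, 80.0, 30.0],
})

result = summarize(sample)
print(result["revenue_by_state"])
print("Positive reviews (%):", result["positive_pct"])
print("Delivery/review correlation:", result["delivery_review_corr"])
```

In this toy sample, slower deliveries carry lower scores, so the correlation comes out negative; the real figures are in OlistDataPresentation.pdf.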
The dataset consists of multiple interconnected tables that provide a comprehensive view of the e-commerce operations:
- olist_orders_dataset.csv: Order information
- olist_customers_dataset.csv: Customer data
- olist_order_items_dataset.csv: Items per order
- olist_products_dataset.csv: Product catalog
- olist_order_payments_dataset.csv: Payment details
- olist_order_reviews_dataset.csv: Customer reviews
- olist_sellers_dataset.csv: Seller information
- olist_geolocation_dataset.csv: Geolocation data
- product_category_name_translation.csv: Category translations
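One possible way to load these tables into memory is sketched below. The `TABLE_FILES` mapping and `load_tables` helper are hypothetical names, and the sketch assumes the CSV files keep their original filenames inside the chosen directory. The demo writes two tiny synthetic files to a temporary directory so it runs without the real data.

```python
from pathlib import Path
import tempfile

import pandas as pd

# Short table name -> CSV filename (filenames as listed above)
TABLE_FILES = {
    "orders": "olist_orders_dataset.csv",
    "customers": "olist_customers_dataset.csv",
    "order_items": "olist_order_items_dataset.csv",
    "products": "olist_products_dataset.csv",
    "payments": "olist_order_payments_dataset.csv",
    "reviews": "olist_order_reviews_dataset.csv",
    "sellers": "olist_sellers_dataset.csv",
    "geolocation": "olist_geolocation_dataset.csv",
    "category_translation": "product_category_name_translation.csv",
}


def load_tables(csv_dir):
    """Read every known CSV found in csv_dir into a dict of DataFrames."""
    csv_dir = Path(csv_dir)
    tables = {}
    for name, filename in TABLE_FILES.items():
        path = csv_dir / filename
        if path.exists():  # skip files that are missing rather than failing
            tables[name] = pd.read_csv(path)
    return tables


# Demo with two synthetic files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"order_id": ["o1"], "customer_id": ["c1"]}).to_csv(
        Path(tmp) / TABLE_FILES["orders"], index=False
    )
    pd.DataFrame({"customer_id": ["c1"], "customer_state": ["SP"]}).to_csv(
        Path(tmp) / TABLE_FILES["customers"], index=False
    )
    demo = load_tables(tmp)

print(sorted(demo))
```

With the repository layout, the call would presumably be `load_tables("data/csv")`. The tables then connect through shared key columns such as `order_id`, `customer_id`, `product_id`, and `seller_id`.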
Ignacio Muñoz Gomeñuka
This project uses public data from Olist. Please refer to the original dataset terms of use.

