Credit Card Fraud Detection
This project implements a Credit Card Fraud Detection System that uses several machine learning models to identify fraudulent transactions. The workflow covers data preprocessing, exploratory data analysis (EDA), dimensionality reduction, and model evaluation.
- Overview
- Features
- Technologies Used
- Data Preprocessing
- Model Implementations
- Evaluation Metrics
- User Interface
- Results
- How to Run
- Acknowledgments
The Credit Card Fraud Detection System classifies transactions as fraudulent or non-fraudulent. It emphasizes minimizing Type-II errors (False Negatives) so that fewer fraudulent transactions are overlooked.
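One common way to trade false negatives for false positives is to lower the decision threshold applied to the predicted fraud probability. The sketch below illustrates that idea only; it is not taken from the project's notebook. The synthetic data from `make_classification`, the thresholds 0.5 and 0.2, and the helper `recall_at` are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 10% positives), NOT the project's dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
fraud_probability = model.predict_proba(X_test)[:, 1]


def recall_at(threshold):
    """Recall when a transaction is flagged if its fraud probability >= threshold."""
    predictions = (fraud_probability >= threshold).astype(int)
    return recall_score(y_test, predictions)


# Hypothetical thresholds: 0.5 is the usual default, 0.2 flags more transactions.
recall_default = recall_at(0.5)
recall_lowered = recall_at(0.2)
print(f"recall at 0.5: {recall_default:.3f}, recall at 0.2: {recall_lowered:.3f}")
```

Lowering the threshold can only flag more transactions, so recall never decreases; the cost is usually lower precision.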
- Data preprocessing including handling missing values, normalization, and encoding.
- Dimensionality reduction using PCA and t-SNE.
- Machine learning models:
- Logistic Regression
- Support Vector Machines (SVM)
- Linear Discriminant Analysis (LDA)
- K-Nearest Neighbors (KNN)
- Evaluation of model performance with metrics such as precision, recall, F1-score, AUC, and ROC curve.
- Interactive User Interface for real-time predictions using Gradio.
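As a rough illustration of the dimensionality reduction step listed above, the following sketch projects a feature matrix to two dimensions with PCA and t-SNE from scikit-learn. The random placeholder matrix, the two-component target, and the perplexity value are assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix (200 rows, 8 features), NOT the project's data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Both methods are scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# PCA: linear projection onto the directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()

# t-SNE: nonlinear embedding, typically used for visualization only.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X_scaled)

print(X_pca.shape, X_tsne.shape, round(float(explained), 3))
```

t-SNE has no `transform` method for unseen data in scikit-learn, so in a sketch like this it serves for plotting, while PCA can also feed a downstream classifier.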
- Python (Jupyter Notebook)
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, Gradio
- Data Loading and Inspection:
- Loaded dataset to analyze its structure and content.
- Data Cleaning:
- Handled missing values and removed unnecessary columns such as customerID.
- Normalization:
- Applied z-score normalization to numerical data.
- Encoding:
- Label Encoding for ordinal categorical data.
- One-Hot Encoding for nominal categorical data.
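The preprocessing steps above could be sketched with scikit-learn as follows. Apart from customerID, every column name here (`amount`, `risk_level`, `channel`) is hypothetical, and median imputation is an assumption, since the README does not say how missing values were handled. The sketch also uses `OrdinalEncoder` in place of label encoding for the ordinal feature, because scikit-learn's `LabelEncoder` is intended for target labels.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "customerID": ["a1", "b2", "c3", "d4"],
    "amount": [10.0, None, 250.0, 40.0],
    "risk_level": ["low", "high", "medium", "low"],  # ordinal
    "channel": ["web", "pos", "web", "atm"],          # nominal
})

# Data cleaning: drop the identifier and fill the missing value
# (median imputation is an assumption).
df = df.drop(columns=["customerID"])
df["amount"] = df["amount"].fillna(df["amount"].median())

preprocessor = ColumnTransformer(
    [
        # z-score normalization for numerical data
        ("numeric", StandardScaler(), ["amount"]),
        # integer codes that respect the stated order low < medium < high
        ("ordinal", OrdinalEncoder(categories=[["low", "medium", "high"]]), ["risk_level"]),
        # one indicator column per category
        ("nominal", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

features = preprocessor.fit_transform(df)
print(features.shape)
```

Wrapping the steps in a `ColumnTransformer` lets the same fitted transformations be reapplied to new transactions at prediction time.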
- Logistic Regression:
- Baseline model and a variant with polynomial feature augmentation.
- Emphasis on recall for fraud detection.
- Support Vector Machines (SVM):
- Linear, Polynomial, and RBF kernels with hyperparameter tuning.
- Linear Discriminant Analysis (LDA):
- Linear classifier that projects features to maximize separation between classes.
- K-Nearest Neighbors (KNN):
- Optimal neighbor selection via cross-validation.
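A minimal sketch of how the four model families above might be set up and compared on recall with scikit-learn is shown below. The synthetic data, the parameter grids, the polynomial degree, and the 3-fold cross-validation are assumptions chosen to keep the example small; they are not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (about 20% positives), NOT the project's dataset.
X, y = make_classification(
    n_samples=600, n_features=6, weights=[0.8, 0.2], random_state=0
)

models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    # Polynomial feature augmentation (degree 2 is an assumption).
    "logistic_poly": make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=2), LogisticRegression(max_iter=2000)
    ),
    "lda": LinearDiscriminantAnalysis(),
    # Kernel and C are tuned by an inner grid search scored on recall.
    "svm": GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        {"svc__kernel": ["linear", "poly", "rbf"], "svc__C": [0.1, 1, 10]},
        scoring="recall",
        cv=3,
    ),
    # Number of neighbors is selected by cross-validation.
    "knn": GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [5, 10, 20]},
        scoring="recall",
        cv=3,
    ),
}

# Outer cross-validation gives a mean recall estimate per model.
recall_by_model = {
    name: cross_val_score(model, X, y, scoring="recall", cv=3).mean()
    for name, model in models.items()
}

for name, recall in recall_by_model.items():
    print(f"{name}: mean recall = {recall:.3f}")
```

Scoring the searches on recall rather than accuracy reflects the stated emphasis on catching fraudulent transactions.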
- Accuracy: Fraction of all predictions that are correct.
- Precision: Fraction of transactions flagged as fraudulent that are actually fraudulent.
- Recall: Fraction of actual fraudulent transactions that the model flags.
- F1-Score: Harmonic mean of precision and recall.
- AUC: Area under the ROC curve, summarizing performance across all decision thresholds.
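The metrics above can be computed with `sklearn.metrics`. The small worked example below uses ten made-up labels and scores (not project results) so that each value can be checked by hand; the 0.5 threshold is an assumption.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up labels (1 = fraud) and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.55, 0.8, 0.9]

# Flag a transaction when its score is at least 0.5.
y_pred = [1 if score >= 0.5 else 0 for score in y_score]
# Confusion counts at this threshold: TP = 3, FN = 1, FP = 2, TN = 4.

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 7 / 10
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true, y_pred)                # 2PR / (P + R) = 2 / 3
auc = roc_auc_score(y_true, y_score)         # uses scores, not thresholded labels: 19 / 24

print(accuracy, precision, recall, round(f1, 4), round(auc, 4))
```

Note that AUC is computed from the raw scores, so it does not depend on the chosen threshold, while the other four metrics do.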
- An interactive UI is developed using Gradio.
- Allows users to enter transaction details and receive a fraud probability prediction in real time.
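A Gradio front end along these lines might look like the sketch below. It is not the project's app.py: the three generic numeric inputs, the stand-in model trained on synthetic data, and the function names `predict_fraud_probability` and `build_interface` are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model trained on synthetic data, NOT the project's trained model.
X, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X, y)


def predict_fraud_probability(feature_1, feature_2, feature_3):
    """Return the model's predicted probability of the positive (fraud) class."""
    row = np.array([[feature_1, feature_2, feature_3]], dtype=float)
    return float(model.predict_proba(row)[0, 1])


def build_interface():
    """Wrap the prediction function in a simple Gradio form."""
    # Imported here so the prediction function works without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=predict_fraud_probability,
        inputs=[
            gr.Number(label="Feature 1"),
            gr.Number(label="Feature 2"),
            gr.Number(label="Feature 3"),
        ],
        outputs=gr.Number(label="Fraud probability"),
        title="Credit Card Fraud Detection",
    )


# To start the local web UI, call:
# build_interface().launch()
```

Keeping the prediction logic in a plain function makes it easy to test separately from the interface.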
- Logistic Regression:
- Accuracy: 78.7%, AUC: 82.6%, Recall: 56.5%
- SVM:
- Best Kernel: RBF
- AUC: 80.2%, Recall: 42.6%
- KNN:
- Optimal Neighbors: 20
- Accuracy: 77%, AUC: 79.6%
- LDA:
- Accuracy: 77.1%, AUC: 81.4%, Recall: 53.5%
- Clone the repository:
git clone <repository-url>
- Navigate to the project directory and install dependencies:
pip install -r requirements.txt
- Open the Jupyter Notebook:
jupyter notebook
- Run the notebook for model training and evaluation.
- Launch the Gradio interface:
python app.py
- This project was developed as part of an Introduction to Machine Learning course.
- Special thanks to the dataset providers and open-source community.