High School Students' Grade Prediction

This project analyzes and predicts the grades of high school students, using exploratory data analysis (EDA) and regression models to identify the key factors influencing academic performance. The analysis highlights how factors such as family environment and alcohol consumption habits can affect academic achievement, insights that could help target educational interventions. This work is the final project of my Master's in Data Science at start2impact and reflects my commitment to fostering knowledge from a young age and supporting it with data.

This repository contains a project focused on data exploration and grade prediction for a sample of high school students, with the goal of identifying key factors influencing academic performance.

Project Structure

dataset/: This folder contains the data sources used in the project. The datasets have been extracted from Kaggle and are utilized for both data exploration and prediction.
grade_EDA_PDA.ipynb: This Jupyter Notebook is the core of the project. It includes:
- Exploratory Data Analysis (EDA): Investigation of the dataset to understand the distributions, relationships, and key patterns.
- Predictive Data Analysis (PDA): The application of machine learning models to predict students' grades based on the explored features.
grade_documentation.pdf: This PDF document provides detailed documentation of the project, including the methodology, data processing steps, model selection, and evaluation.

How to Run the Project

Clone the repository:

git clone https://github.com/lalessia/students-grades-prediction.git

Navigate to the project directory:

cd students-grades-prediction

Install requirements: You can install all the required packages using the following command:

pip install -r requirements.txt

Open the Jupyter Notebook: You can open the notebook using Jupyter Notebook or Jupyter Lab.

jupyter notebook grade_EDA_PDA.ipynb

Run the notebook: Execute the cells in the notebook to perform the data exploration and run the predictive models.

Key Challenges

This project presented several significant challenges:

Variable Encoding: Properly encoding categorical variables was crucial to ensure that the data could be effectively utilized in the analysis. Choosing the right encoding techniques was essential to maintain the integrity of the data and avoid introducing biases.
Feature Selection: Identifying the most impactful features for data analysis was a critical step. It required careful consideration to ensure that only the variables with the highest predictive power were included, which directly influenced the accuracy of the analysis and predictions.
Regression Algorithm Selection: Choosing the most appropriate regression algorithms for predicting final grades was another major challenge. The selection process involved evaluating various models to find the best fit for the data, balancing complexity with predictive performance.
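As a rough illustration of how the three steps above can fit together, the sketch below chains one-hot encoding, univariate feature selection, and a comparison of two regressors in a single scikit-learn pipeline. It is not taken from the notebook: the data is synthetic, the column names are hypothetical, and the choice of `SelectKBest` with `f_regression`, `k=3`, and the two candidate models are assumptions made only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data; column names are hypothetical, not the project's schema.
rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "family_support": rng.choice(["yes", "no"], size=n),
    "weekend_alcohol": rng.integers(1, 6, size=n),
    "study_time": rng.integers(1, 5, size=n),
})
y = 8 + 2 * X["study_time"] - 0.5 * X["weekend_alcohol"] + rng.normal(0, 1, size=n)

# Variable encoding: one-hot encode the categorical column, pass numeric ones through.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["family_support"]),
    ],
    remainder="passthrough",
)

# Algorithm selection: candidate regressors to compare (an assumed shortlist).
candidates = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in candidates.items():
    pipeline = Pipeline([
        ("encode", preprocess),
        # Feature selection: keep the 3 encoded features with the highest F-score.
        ("select", SelectKBest(score_func=f_regression, k=3)),
        ("model", model),
    ])
    # Mean R-squared over 5 cross-validation folds.
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Keeping encoding and selection inside the pipeline means both are refit on each training fold, so no information from the validation fold leaks into the preprocessing.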

Project Details

Dataset

The dataset used in this project includes information on various aspects of high school students, such as demographic details, alcohol consumption habits, family environment and academic performance. The dataset has been preprocessed and is available in the dataset/ folder.

EDA

The Exploratory Data Analysis (EDA) phase focuses on understanding the relationships between different variables within the dataset.
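A minimal sketch of this kind of relationship check, assuming pandas is available, is shown below. The toy data and column names are hypothetical and do not reflect the real dataset; the idea is only to show how numeric features can be ranked by their correlation with the grade and how a categorical feature can be compared by group mean.

```python
import pandas as pd

# Toy data with hypothetical column names (not the project's real schema).
df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5],
    "absences": [9, 8, 4, 5, 1],
    "family_support": ["no", "no", "yes", "no", "yes"],
    "final_grade": [8, 10, 12, 14, 16],
})

# Correlation of each numeric feature with the target, strongest first.
numeric = df.select_dtypes(include="number")
corr_with_grade = (
    numeric.corr()["final_grade"]
    .drop("final_grade")
    .sort_values(key=abs, ascending=False)
)

# Mean grade per category for a categorical feature.
grade_by_support = df.groupby("family_support")["final_grade"].mean()

print(corr_with_grade)
print(grade_by_support)
```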

Machine Learning Algorithms

The project applies several machine learning algorithms to predict students' final grades.

Implemented sklearn algorithms:

The models are evaluated on the following metrics:

- R-squared (R²)
- Root mean squared error (RMSE)
- Mean absolute error (MAE)
- Maximum error
- Training time

The notebook provides a step-by-step guide to how these models are implemented and evaluated.
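For readers who want a quick picture of the evaluation before opening the notebook, the metrics listed above could be computed roughly as follows. This is a sketch under assumptions, not the notebook's code: the data comes from `make_regression`, `LinearRegression` stands in for whichever models the notebook uses, and the `report` dictionary is a name introduced here for illustration.

```python
import time

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (
    max_error,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the student dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Training time: wall-clock seconds spent in fit().
model = LinearRegression()
start = time.perf_counter()
model.fit(X_train, y_train)
train_time = time.perf_counter() - start

# Score predictions on the held-out test split.
y_pred = model.predict(X_test)
report = {
    "r2": r2_score(y_test, y_pred),
    # RMSE is the square root of the mean squared error.
    "rmse": float(np.sqrt(mean_squared_error(y_test, y_pred))),
    "mae": mean_absolute_error(y_test, y_pred),
    "max_error": max_error(y_test, y_pred),
    "train_time_s": train_time,
}
print(report)
```

RMSE, MAE, and maximum error are all expressed in grade points, so they can be read directly against the grading scale, while R-squared is unitless.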

Contributing

If you would like to contribute to this project, feel free to fork the repository and submit a pull request. Contributions that improve the analysis, add new models, or enhance the documentation are welcome.
