This project explores machine learning models that predict student success in online courses, using a dataset from the Open University. Specifically, it compares frequentist and Bayesian methods for model evaluation and introduces a fairness metric, the Absolute Between Receiver Operating Characteristic Area (ABROCA), which quantifies how a model's predictions differ across demographic subgroups.
The primary goal is to identify students at risk of failing or withdrawing within the first 30 days of a course. Predictions like these could help guide resource allocation toward at-risk students.
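ABROCA is the area between the ROC curves of two demographic subgroups, measured over a common false-positive-rate axis; larger values mean the model ranks students differently depending on group membership. The sketch below shows one way to compute it with scikit-learn and NumPy. It is an illustration of the metric, not the repository's implementation, and the function and variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

def abroca(y_true, y_score, group):
    """Absolute between-ROC area for a binary sensitive attribute.

    y_true  : 0/1 labels (e.g. 1 = failed or withdrew)
    y_score : predicted probabilities from the classifier
    group   : subgroup label per student (e.g. gender), exactly two values
    """
    values = np.unique(group)
    if len(values) != 2:
        raise ValueError("ABROCA as defined here needs exactly two subgroups")

    grid = np.linspace(0.0, 1.0, 1001)  # common FPR grid for both curves
    tprs = []
    for v in values:
        mask = group == v
        fpr, tpr, _ = roc_curve(y_true[mask], y_score[mask])
        tprs.append(np.interp(grid, fpr, tpr))  # ROC re-sampled onto the grid

    # Integrate the absolute gap between the two ROC curves over FPR in [0, 1].
    return auc(grid, np.abs(tprs[0] - tprs[1]))
```

Applied to held-out predictions with `group` set to the gender or disability column, this would yield the kind of per-model fairness values examined in the analysis notebook.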
The project includes:

- **Data Engineering and Preprocessing**
  - The Open University Learning Analytics Dataset (OULAD) was imported into a PostgreSQL database for efficient data wrangling and cleansing.
  - Feature engineering created additional predictive fields from student demographics, course information, assessment data, and interactions with the virtual learning environment (VLE); a sketch of this kind of feature construction follows the list below.
- **Model Training and Evaluation**
  - Various machine learning classifiers were trained, including tree-based ensemble methods, logistic regression, and support vector machines.
  - Hyperparameters were tuned with both grid search and random search to optimize model performance.
  - Model performance was compared using frequentist and Bayesian evaluation techniques.
- **Fairness Metric**
  - The ABROCA metric was applied to evaluate model fairness across gender and disability subgroups; it correlated with demographic balance but showed little relationship with overall model performance.
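To make the feature-engineering step concrete, the sketch below builds the kind of early-engagement features described above: total VLE clicks and active days in the first 30 days of a presentation, joined to a binary at-risk label. It assumes the standard OULAD CSV files and column names and uses pandas; it is illustrative and does not reproduce the repository's SQL-based pipeline.

```python
import pandas as pd

# Standard OULAD files: demographics/outcomes and per-day VLE click counts.
student_info = pd.read_csv("studentInfo.csv")
student_vle = pd.read_csv("studentVle.csv")

# Keep only interactions from the first 30 days of each presentation.
early = student_vle[student_vle["date"] < 30]

keys = ["code_module", "code_presentation", "id_student"]
engagement = (
    early.groupby(keys)
    .agg(total_clicks=("sum_click", "sum"),
         active_days=("date", "nunique"))
    .reset_index()
)

# Students with no early VLE activity get zero-valued engagement features.
features = student_info.merge(engagement, on=keys, how="left")
features[["total_clicks", "active_days"]] = (
    features[["total_clicks", "active_days"]].fillna(0)
)

# Binary target: 1 if the student failed or withdrew, 0 otherwise.
features["at_risk"] = features["final_result"].isin(["Fail", "Withdrawn"]).astype(int)
```

The 30-day cutoff mirrors the project's goal of flagging at-risk students early in a presentation, before most assessments are due.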
This repository contains:

- **Data Processing Pipeline**: Scripts for setting up the database, loading the OULAD dataset, and transforming it into usable formats, located in `/src/app/etl/`.
- **Feature Engineering Scripts**: Custom scripts for creating predictive features based on student engagement and assessment data.
- **Model Training Pipeline**: A configurable pipeline using scikit-learn to train and test various classification models with exhaustive hyperparameter tuning, found in `/src/app/model/main.py`.
- **Evaluation Metrics**: Implementations of frequentist tests (Friedman and Nemenyi) and Bayesian methods to compare model efficacy; a sketch of the frequentist comparison follows the list below.
- **Fairness Analysis**: Code for calculating ABROCA and visualizing results across demographic categories, provided in the `/src/notebooks/ModelEvaluation.ipynb` notebook.
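As a concrete illustration of the frequentist comparison, the sketch below runs a Friedman test over per-fold scores of several models and, if it rejects, a Nemenyi post-hoc test to see which pairs differ. It uses SciPy plus the third-party scikit-posthocs package, which is an assumption here rather than a stated dependency, and the scores are placeholder values, not results from this project.

```python
import pandas as pd
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # assumption: installed via `pip install scikit-posthocs`

# Rows = cross-validation folds (blocks), columns = models, values = AUC per fold.
# These numbers are illustrative placeholders only.
scores = pd.DataFrame({
    "random_forest": [0.86, 0.84, 0.87, 0.85, 0.86],
    "extra_trees":   [0.85, 0.85, 0.86, 0.84, 0.86],
    "logistic_reg":  [0.81, 0.80, 0.82, 0.79, 0.81],
    "linear_svm":    [0.80, 0.79, 0.81, 0.78, 0.80],
})

# Friedman test: do the models' rank distributions differ across folds?
stat, p_value = friedmanchisquare(*[scores[c] for c in scores.columns])
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")

# If the Friedman test rejects, the Nemenyi post-hoc test gives pairwise p-values.
if p_value < 0.05:
    print(sp.posthoc_nemenyi_friedman(scores))
```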
To set up the project:

- Clone this repository:

  ```bash
  git clone https://github.com/witzbeck/master_proj.git
  cd master_proj
  ```

- Install dependencies using Poetry:

  ```bash
  poetry install
  ```

- Set up PostgreSQL for the data pipeline as per the instructions in the `/src/app/etl/` directory; a minimal illustration of loading one OULAD table follows below.
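The repository's ETL scripts handle the actual database setup; purely as an illustration of the loading step, the sketch below writes one OULAD CSV into a PostgreSQL table with pandas and SQLAlchemy. The connection string, database name, and table name are placeholders, and this is not the project's ETL code.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection details; adjust user, password, host, and database name.
engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/oulad")

# Load one of the OULAD CSV files and write it to a staging table.
student_info = pd.read_csv("studentInfo.csv")
student_info.to_sql("student_info", engine, if_exists="replace", index=False)

print(f"Loaded {len(student_info)} rows into student_info")
```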
To run the analysis:

- **Data Pipeline**: Run the ETL pipeline located in the `/src/app/etl/` directory to ingest, clean, and transform the dataset.
- **Model Training**: Use the model training pipeline in `/src/app/model/main.py` to train and evaluate models with specified hyperparameters; a sketch of the kind of hyperparameter search involved follows the list below.
- **Fairness Analysis**: Run the `/src/notebooks/ModelEvaluation.ipynb` notebook to calculate ABROCA and evaluate fairness across demographic groups.
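The training pipeline performs exhaustive hyperparameter tuning; the sketch below shows the general pattern with scikit-learn's GridSearchCV and RandomizedSearchCV on a random forest. The parameter ranges and the synthetic feature matrix are illustrative, not the pipeline's actual configuration.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in for the engineered OULAD features (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Exhaustive grid search over every combination, scored by ROC AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "n_estimators": [100, 300],
        "max_depth": [None, 10, 20],
        "min_samples_leaf": [1, 5],
    },
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
).fit(X, y)

# Random search samples a fixed number of configurations from distributions.
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(100, 600),
        "max_depth": [None, 5, 10, 20, 40],
        "min_samples_leaf": randint(1, 20),
    },
    n_iter=25,
    scoring="roc_auc",
    cv=5,
    random_state=0,
    n_jobs=-1,
).fit(X, y)

print("grid search best:", grid.best_params_, round(grid.best_score_, 3))
print("random search best:", random_search.best_params_, round(random_search.best_score_, 3))
```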
Key findings:

- **Best Performing Models**: Tree-based ensemble methods, such as Random Forest and Extra Trees, had the best overall performance.
- **Fairness Metric**: ABROCA showed a strong association with demographic ratios but did not correlate directly with predictive accuracy, indicating that it may be a valuable fairness measure independent of overall performance.
Future development could expand on feature engineering, automate hyperparameter optimization, and further explore the fairness metric with additional datasets and subgroups.
For a detailed explanation of the project's methodology, please refer to `paper.pdf` in the latest release.