
cs265-mlsys-2024

Activation Checkpointing: Trading off compute for memory in deep neural network training

An activation checkpointing implementation that scales deep learning training workloads beyond the batch sizes whose intermediate activations fit entirely on a single GPU.
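For background, activation checkpointing discards selected intermediate activations during the forward pass and recomputes them during the backward pass, trading extra compute for lower memory. A minimal sketch of the idea using PyTorch's built-in torch.utils.checkpoint utility (illustrative only, not this repository's algorithm; the two-stage model below is hypothetical):

import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
        self.stage2 = torch.nn.Linear(512, 10)

    def forward(self, x):
        # stage1's intermediate activations are not stored; they are
        # recomputed from x when gradients are needed in backward().
        h = checkpoint(self.stage1, x, use_reentrant=False)
        return self.stage2(h)

model = CheckpointedNet()
loss = model(torch.randn(32, 512)).sum()
loss.backward()  # triggers recomputation of stage1's forward pass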


Instructions

git clone https://github.com/harsha070/cs265-mlsys-2024.git
git clone https://github.com/pytorch/benchmark
pip install -e benchmark/
# Patch transformers' BERT model by deleting line 935 (adjust the path to your environment)
sed -i '935d' /usr/local/lib/python3.10/dist-packages/transformers/models/bert/modeling_bert.py

Enable operator fusion in the optimizer (a sketch of the change follows the list).

  1. In ~/benchmark/torchbenchmark/util/framework/huggingface/basic_configs.py, lines 309-314, add fused=True to the optimizer kwargs.
  2. In ~/benchmark/torchbenchmark/util/framework/vision/model_factory.py, line 39, add fused=True to the optimizer kwargs.
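The resulting optimizer construction should look roughly like this (illustrative sketch; the actual TorchBench code differs, and fused=True requires CUDA parameters and a recent PyTorch):

import torch

model = torch.nn.Linear(512, 10).cuda()  # stand-in for the benchmark model

# fused=True selects the single-kernel CUDA implementation of Adam,
# replacing the slower per-parameter (for-loop) update.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, fused=True)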

To benchmark a model, run one of the following (the first argument is the TorchBench model class; the trailing integer appears to be the batch size):

python cs265-mlsys-2024/benchmarks.py torchbenchmark.models.hf_Bert.Model 24
python cs265-mlsys-2024/benchmarks.py torchbenchmark.models.resnet18.Model 1024
python cs265-mlsys-2024/benchmarks.py torchbenchmark.models.resnet50.Model 256

About

ML Systems Research Project
