VisionDeltaNet is a lightweight vision backbone inspired by the paper "Parallelizing Linear Transformers with the Delta Rule over Sequence Length". It combines transformers with convolutions for efficient feature extraction and reaches over 70% accuracy on CIFAR-10 within 10 epochs.
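For readers unfamiliar with the delta rule named in that paper, here is a minimal pure-Python sketch of the sequential form of the update, S <- S - beta * (S k - v) k^T, with readout o = S q. It is only an illustration: the function names `delta_rule_step` and `read` are hypothetical, and this is neither the code in vdelta.py nor the parallel, chunked algorithm the paper develops.

```python
def delta_rule_step(S, k, v, beta):
    """One sequential delta-rule update: S <- S - beta * (S k - v) k^T.

    S is a d_v x d_k matrix stored as a list of rows; k has length d_k
    and v has length d_v. Names and layout are assumptions for this sketch.
    """
    # Value currently stored for key k (the "prediction" S k).
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    # Correct the stored association by a fraction beta of the error.
    return [
        [s - beta * (p - vi) * kj for s, kj in zip(row, k)]
        for row, p, vi in zip(S, pred, v)
    ]


def read(S, q):
    """Read out o = S q for a query q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


# Write the value [3, 4] under the key [1, 0], then overwrite it with [1, 1].
S = [[0.0, 0.0], [0.0, 0.0]]
S = delta_rule_step(S, k=[1.0, 0.0], v=[3.0, 4.0], beta=1.0)
first = read(S, [1.0, 0.0])
S = delta_rule_step(S, k=[1.0, 0.0], v=[1.0, 1.0], beta=1.0)
second = read(S, [1.0, 0.0])
```

With a unit-norm key and beta = 1, the second write fully replaces the value stored under that key instead of adding to it, which is the behaviour that distinguishes the delta rule from a purely additive linear-attention update.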
-
Install dependencies:
pip install torch torchvision
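To confirm the install worked before starting a training run, a small check like the following may help. It is a sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # torch and torchvision are the two dependencies this README lists.
    missing = missing_packages(["torch", "torchvision"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only looks up the modules without importing them, so it runs quickly even when torch is installed.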
-
To train and evaluate on CIFAR-10, run:
python vdelta.py