
Vision Transformers for Cross-domain Few-shot Learning

This code was adapted from the following repositories:

  1. Meta-Dataset
  2. Vision Transformers
  3. Selecting Universal Representations

Install requirements

setup.sh

Get pretrained ViT-B16 model

cd checkpoints/pretrained_ckpts/
wget https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz
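
The checkpoint is a plain NumPy .npz archive, so the download can be sanity-checked before training. A minimal sketch (the key names below follow the official ViT release and are an assumption about this particular file):

import numpy as np

# Inspect the pretrained ImageNet-21k checkpoint.
ckpt = np.load('checkpoints/pretrained_ckpts/ViT-B_16.npz')
print(len(ckpt.files), 'parameter tensors')
print(ckpt['embedding/kernel'].shape)  # patch-embedding kernel, e.g. (16, 16, 3, 768)
print(ckpt['cls'].shape)               # class token, e.g. (1, 1, 768)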

Prepare Meta-Dataset following the instructions in the Meta-Dataset repository

Fine-tune ViT on a new dataset:

- Extract the datasets to the ./data folder, set the model config, and fine-tune.

- Fine-tuning is currently supported on the CIFAR-10, CIFAR-100, Omniglot, Aircraft, and Textures (DTD) datasets.

- To fine-tune on other datasets, add the corresponding dataloaders to utils/data_utils.py (a sketch is given after the command below).

python train.py --model_config=vit_configs/dtd_224.gin 
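
As a starting point for the last item above, a new dataset can be wired in roughly as follows. This is only a sketch: get_custom_loaders and the ./data/my_dataset layout are hypothetical names, and the loaders already in utils/data_utils.py may use a different interface.

import os
import torch
from torchvision import datasets, transforms

def get_custom_loaders(data_root='./data/my_dataset', img_size=224, batch_size=64):
    # Train/test transforms; the commands in this README use 224x224 inputs,
    # and mean/std of 0.5 is assumed here for normalisation.
    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(img_size),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
    ])
    test_tf = transforms.Compose([
        transforms.Resize((img_size, img_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
    ])
    # Assumes an ImageFolder-style layout: data_root/train/<class>/*.jpg etc.
    train_set = datasets.ImageFolder(os.path.join(data_root, 'train'), transform=train_tf)
    test_set = datasets.ImageFolder(os.path.join(data_root, 'test'), transform=test_tf)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=4)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=4)
    return train_loader, test_loader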

Evaluate a single feature extractor on tasks from Meta-Dataset:

python eval_vit.py --testsets dtd  --trainset 'dtd' --img_size 224
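
For context, evaluating a single frozen extractor on a Meta-Dataset task usually reduces to embedding the support and query images and classifying each query by its nearest class centroid. A minimal sketch of that step, assuming features have already been extracted (the exact metric and preprocessing in eval_vit.py may differ):

import torch
import torch.nn.functional as F

def ncc_accuracy(support_feats, support_labels, query_feats, query_labels):
    # Nearest-centroid classification on L2-normalised features.
    support_feats = F.normalize(support_feats, dim=-1)
    query_feats = F.normalize(query_feats, dim=-1)
    n_way = int(support_labels.max()) + 1
    # Class prototype = mean support embedding of each class.
    prototypes = torch.stack(
        [support_feats[support_labels == c].mean(dim=0) for c in range(n_way)]
    )
    # Cosine similarity of every query to every prototype.
    logits = query_feats @ prototypes.t()
    preds = logits.argmax(dim=-1)
    return (preds == query_labels).float().mean().item()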

Evaluate ViT with Selecting Universal Representations (SUR) on Meta-Dataset:

python eval_vit_sur.py --testsets cu_birds  --trainset 'imagenet dtd' --img_size 224 --num_tasks 100
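
The SUR step combines features from several extractors (here, the ImageNet- and DTD-trained backbones) by learning per-backbone selection weights on each task's support set. A rough sketch of the idea, not the exact implementation in eval_vit_sur.py:

import torch
import torch.nn.functional as F

def sur_weights(support_feats_list, support_labels, n_steps=40, lr=0.1):
    # support_feats_list: one [n_support, d_i] tensor per feature extractor.
    lam = torch.zeros(len(support_feats_list), requires_grad=True)
    opt = torch.optim.Adam([lam], lr=lr)
    n_way = int(support_labels.max()) + 1
    for _ in range(n_steps):
        w = torch.sigmoid(lam)
        # Weight each backbone's normalised features and concatenate them.
        feats = torch.cat(
            [wi * F.normalize(f, dim=-1) for wi, f in zip(w, support_feats_list)], dim=-1
        )
        prototypes = torch.stack(
            [feats[support_labels == c].mean(dim=0) for c in range(n_way)]
        )
        # Nearest-centroid loss on the support set drives the selection weights.
        loss = F.cross_entropy(feats @ prototypes.t(), support_labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(lam).detach()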

Visualize attention maps on an image:

python visualize_attention.py --test_img data/test_imgs/dog.jpg
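
The attention maps come from the transformer's self-attention layers; one common way to turn per-layer attentions into a single image-level map is attention rollout. A minimal sketch, assuming the per-block attention tensors have already been collected from the model (the actual method in visualize_attention.py may differ):

import torch

def attention_rollout(attentions):
    # attentions: list of [batch, heads, tokens, tokens] tensors, one per block.
    num_tokens = attentions[0].size(-1)
    result = torch.eye(num_tokens).unsqueeze(0)
    for attn in attentions:
        attn = attn.mean(dim=1)                       # average over heads
        attn = attn + torch.eye(num_tokens)           # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalise rows
        result = attn @ result                        # propagate attention through layers
    # Attention of the CLS token (index 0) over the patch tokens.
    return result[:, 0, 1:]

For a 224x224 input with 16x16 patches, the returned 196-value map reshapes to a 14x14 grid that can be upsampled and overlaid on the input image.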
