Zero-to-Hero: ViT🚀

I have tried to cover all the bases for understanding and implementing Vision Transformers (ViT) and their evolution into Video Vision Transformers (ViViT). The main focus is on modeling spatio-temporal relations with vision transformers.


1. Vision Transformer (ViT) Fundamentals:

Surveys and Overviews:

Key Papers:

  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: Paper | Code
  • Training data-efficient image transformers & distillation through attention (DeiT): Paper | Code
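The core move in the ViT paper above is to split an image into fixed-size patches and linearly embed each one, exactly as the title "An Image is Worth 16x16 Words" suggests. Below is a minimal PyTorch sketch of that patch embedding; the class and parameter names are illustrative, not taken from this repo's code.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches and linearly embed each patch.

    A Conv2d with kernel_size == stride == patch_size is equivalent to
    slicing the image into non-overlapping patches and applying one
    shared linear projection to each.
    """
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable [CLS] token and position embeddings, as in the ViT paper.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                    # x: (B, 3, 224, 224)
        x = self.proj(x)                     # (B, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)     # (B, 196, 768) -- one token per patch
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)       # prepend [CLS] -> (B, 197, 768)
        return x + self.pos_embed

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape)  # torch.Size([2, 197, 768])
```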

Concepts and Tutorials:

  • "Attention Is All You Need": Paper
  • "The Illustrated Transformers": Blog Post
  • "Vision Transformer Explained" Blog Post

2. Convolutional ViT and Hybrid Models:

  • CvT: Introducing Convolutions to Vision Transformers: Paper | Code
  • CoAtNet: Marrying Convolution and Attention for All Data Sizes: Paper
  • ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases: Paper | Code
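A common thread in these hybrids is reintroducing convolutional inductive bias into tokenization. CvT, for example, replaces ViT's non-overlapping patch projection with an overlapping strided convolution. A hedged sketch of that idea follows; the kernel/stride/dimension choices are illustrative, not CvT's exact configuration.

```python
import torch
import torch.nn as nn

class ConvTokenEmbedding(nn.Module):
    """Overlapping convolutional tokenization in the spirit of CvT.

    Unlike ViT's stride == kernel_size projection, kernel 7 / stride 4 /
    padding 2 makes neighboring tokens share pixels, adding a local
    (convolutional) inductive bias before any attention is applied.
    """
    def __init__(self, in_chans=3, embed_dim=64):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=7, stride=4, padding=2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                 # x: (B, 3, H, W)
        x = self.proj(x)                  # (B, 64, H/4, W/4), overlapping patches
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)  # (B, H*W/16, 64) token sequence
        return self.norm(x), (H, W)       # keep (H, W) to fold tokens back later

tokens, (h, w) = ConvTokenEmbedding()(torch.randn(2, 3, 224, 224))
print(tokens.shape, h, w)  # torch.Size([2, 3136, 64]) 56 56
```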

3. Efficient Transformers and Swin Transformer:

  • Swin Transformer: Hierarchical Vision Transformer using Shifted Windows: Paper | Code
  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions: Paper | Code
  • Efficient Transformers: A Survey: Paper
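Swin's efficiency comes from computing self-attention inside small non-overlapping windows (and shifting the window grid between layers so information crosses window borders) instead of attending over all tokens at once. A minimal sketch of the window partitioning step; the attention masking Swin applies after shifting is omitted here.

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into (num_windows*B, ws*ws, C)
    groups of tokens; attention is then computed within each window only."""
    B, H, W, C = x.shape
    ws = window_size
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    x = x.permute(0, 1, 3, 2, 4, 5).contiguous()
    return x.view(-1, ws * ws, C)

# A shifted window layer is just a roll of the feature map before partitioning
# (Swin's post-shift attention mask is omitted in this sketch).
x = torch.randn(2, 56, 56, 96)                         # Swin-T stage-1 shape
shifted = torch.roll(x, shifts=(-3, -3), dims=(1, 2))  # shift by ws // 2
windows = window_partition(shifted, window_size=7)
print(windows.shape)  # torch.Size([128, 49, 96]) -- 64 windows x 2 images
```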

4. Space-Time Attention and Video Transformers:

  • TimeSformer: Is Space-Time Attention All You Need for Video Understanding? Paper | Code
  • Space-Time Mixing Attention for Video Transformer: Paper
  • MViT: Multiscale Vision Transformers: Paper | Code
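TimeSformer's "divided" space-time attention factorizes joint spatio-temporal attention into a temporal pass (each patch attends to the same patch position across frames) followed by a spatial pass (each patch attends within its own frame), cutting cost from O((N·T)²) to O(N·T² + T·N²). A sketch of that factorization using nn.MultiheadAttention; the CLS-token handling and layer norms from the paper are omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn

class DividedSpaceTimeAttention(nn.Module):
    """Divided space-time attention in the spirit of TimeSformer:
    attend over time first, then over space, instead of jointly."""
    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, N, D) -- T frames, N patch tokens per frame
        B, T, N, D = x.shape

        # Temporal attention: each spatial position attends across frames.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, D)
        t, _ = self.temporal(t, t, t)
        x = x + t.reshape(B, N, T, D).permute(0, 2, 1, 3)

        # Spatial attention: each frame's tokens attend within that frame.
        s = x.reshape(B * T, N, D)
        s, _ = self.spatial(s, s, s)
        return x + s.reshape(B, T, N, D)

out = DividedSpaceTimeAttention()(torch.randn(2, 8, 196, 192))
print(out.shape)  # torch.Size([2, 8, 196, 192])
```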

5. Video Vision Transformer (ViViT):

  • ViViT: A Video Vision Transformer: Paper | Code
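ViViT extends ViT's 2D patch embedding to video by extracting spatio-temporal "tubelets" with a 3D convolution, so each token summarizes a small clip volume rather than a single-frame patch. A minimal sketch; the 2x16x16 tubelet size is illustrative.

```python
import torch
import torch.nn as nn

class TubeletEmbedding(nn.Module):
    """ViViT-style tubelet embedding: a 3D convolution whose kernel and
    stride equal the tubelet size maps each t x p x p video volume to
    one token -- the 3D analogue of ViT's patch embedding."""
    def __init__(self, in_chans=3, embed_dim=768, tubelet=(2, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_chans, embed_dim,
                              kernel_size=tubelet, stride=tubelet)

    def forward(self, x):                    # x: (B, 3, T, H, W)
        x = self.proj(x)                     # (B, 768, T/2, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_tubelets, 768)

# A 16-frame 224x224 clip yields 8 * 14 * 14 = 1568 tokens.
tokens = TubeletEmbedding()(torch.randn(2, 3, 16, 224, 224))
print(tokens.shape)  # torch.Size([2, 1568, 768])
```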

How to use this Repo?

  • Start by reading the survey papers to get a broad understanding of the field.
  • For each key paper, read the abstract and introduction, then skim through the methodology and results sections.
  • Implement key concepts using the provided GitHub repositories or your own code.
  • Experiment with different architectures and datasets to solidify your understanding.
  • Use the additional resources to dive deeper into specific topics or applications.
