gary109's Collections

Vision Transformers

 - Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
   Paper • 2309.04354 • 15
 - Vision Transformers Need Registers
   Paper • 2309.16588 • 83
 - AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
   Paper • 2309.16414 • 19
 - MotionLM: Multi-Agent Motion Forecasting as Language Modeling
   Paper • 2309.16534 • 16
 - BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
   Paper • 2201.12086 • 3
 - FiT: Flexible Vision Transformer for Diffusion Model
   Paper • 2402.12376 • 48
 - Subobject-level Image Tokenization
   Paper • 2402.14327 • 19
 - Scalable Diffusion Models with Transformers
   Paper • 2212.09748 • 18
 - mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
   Paper • 2408.04840 • 34
 - Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM
   Paper • 2408.07246 • 22