Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data Paper • 2510.03264 • Published Sep 26 • 23
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models. • 13 items • Updated 11 days ago • 33
Gated Delta Networks: Improving Mamba2 with Delta Rule Paper • 2412.06464 • Published Dec 9, 2024 • 13
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10, 2024 • 32
DiffiT: Diffusion Vision Transformers for Image Generation Paper • 2312.02139 • Published Dec 4, 2023 • 16
FasterViT: Fast Vision Transformers with Hierarchical Attention Paper • 2306.06189 • Published Jun 9, 2023 • 31