SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Paper • 2312.07987 • Published Dec 13, 2023 • 41
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 57
Levels of AGI for Operationalizing Progress on the Path to AGI Paper • 2311.02462 • Published Nov 4, 2023 • 38
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code Paper • 2311.07989 • Published Nov 14, 2023 • 26
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 146
LooseControl: Lifting ControlNet for Generalized Depth Conditioning Paper • 2312.03079 • Published Dec 5, 2023 • 16
Scaling Laws of Synthetic Images for Model Training ... for Now Paper • 2312.04567 • Published Dec 7, 2023 • 8
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper • 2312.03818 • Published Dec 6, 2023 • 34
Prompt Cache: Modular Attention Reuse for Low-Latency Inference Paper • 2311.04934 • Published Nov 7, 2023 • 34
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 51