Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization Paper • 2409.00492 • Published Aug 31, 2024 • 11
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment Paper • 2405.03594 • Published May 6, 2024 • 7
Sparse Finetuning for Inference Acceleration of Large Language Models Paper • 2310.06927 • Published Oct 10, 2023 • 15
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models Paper • 2203.07259 • Published Mar 14, 2022 • 4