Article: GGML and llama.cpp join HF to ensure the long-term progress of Local AI (Feb 20)
Article: Welcome Gemma 4: Frontier multimodal intelligence on device (25 days ago)
Article: Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines (Mar 5)
Reply: KV caching lets the model reuse the key and value tensors it has already computed for previous tokens, so at each decoding step it only has to run the attention computation for the newly generated token. Here is an illustrated explanation of KV caching: https://huggingface.co/blog/not-lain/kv-caching
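The idea in that reply can be illustrated with a minimal NumPy sketch (a toy single-head example with made-up dimensions, not code from the linked blog post): keys and values for already-processed tokens sit in a cache, and when a new token arrives we compute only its query, key, and value, append K/V to the cache, and attend over everything seen so far.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention of the query over all cached positions.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8  # toy head dimension (assumption for illustration)

# Pretend 5 tokens were already processed: their K and V are cached,
# so we never recompute them.
k_cache = rng.normal(size=(5, d))
v_cache = rng.normal(size=(5, d))

# A new token arrives: compute projections only for this one token...
q_new = rng.normal(size=(1, d))
k_new = rng.normal(size=(1, d))
v_new = rng.normal(size=(1, d))

# ...append its K/V to the cache and attend over all 6 positions.
k_cache = np.concatenate([k_cache, k_new])
v_cache = np.concatenate([v_cache, v_new])
out = attention(q_new, k_cache, v_cache)  # shape (1, d)
```

Without the cache, every decoding step would recompute K and V for the full prefix; with it, per-step work for those projections stays constant while only the attention itself grows with sequence length.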
Article: KV Caching Explained: Optimizing Transformer Inference Efficiency (Jan 30, 2025)