Article: Prefill and Decode for Concurrent Requests - Optimizing LLM Performance • By tngtech • Apr 16 • 51
Collection: Quantization Spaces on the Hub ⚡ • Spaces that let you quantize models directly on the Hub • 4 items • Updated Nov 4, 2024 • 6
Collection: Reasoning Router • Explores routing for hybrid models between “Thinking” (accurate) and “Non-Thinking” (fast) modes, using open models (Qwen3) • 8 items • Updated Sep 25 • 2
Collection: Scaling Test-Time Compute with Open Models • Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated Jan 6 • 27
Article: Use Models from the Hugging Face Hub in LM Studio • By yagilb • Nov 28, 2024 • 140
Collection: 🪐 SmolLM • A series of smol LLMs (135M, 360M, and 1.7B). We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5 • 237