The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 2.43k
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation Paper • 2406.07529 • Published Jun 11, 2024
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published 27 days ago • 216
Context Clues Collection Models from the paper Context Clues • 16 items • Updated 18 days ago • 6
Qwen2 Collection Qwen2 language models, including pretrained and instruction-tuned models in 5 sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. • 39 items • Updated Jul 21 • 373