Article: Prefill and Decode for Concurrent Requests - Optimizing LLM Performance, by tngtech (published Apr 16)
Space: The Ultra-Scale Playbook 🌌, the ultimate guide to training LLMs on large GPU clusters
Paper: On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification (arXiv:2508.05629, published Aug 7)
Paper: The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (arXiv:2402.17764, published Feb 27, 2024)
Paper: AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs (arXiv:2507.05687, published Jul 8)
Paper: GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models (arXiv:2112.10741, published Dec 20, 2021)