TileRT • Collection • Tile-Based Runtime for Ultra-Low Latency LLM Inference • 1 item • Updated 12 days ago
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits • Paper • 2402.17764 • Published Feb 27, 2024 • 627
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge • Paper • 2407.00088 • Published Jun 25, 2024 • 12