APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding Paper • 2502.05431 • Published Feb 8, 2025
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference Paper • 2402.09398 • Published Feb 14, 2024
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation Paper • 2205.13542 • Published May 26, 2022
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023