Parallel Loop Transformer for Efficient Test-Time Computation Scaling Paper • 2510.24824 • Published 3 days ago • 12
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published 14 days ago • 85
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 18 days ago • 168
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 76
Efficient LLM Pretraining: Packed Sequences and Masked Attention Article • By sirluk • Published Oct 7, 2024 • 55
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning Paper • 2505.16933 • Published May 22 • 34
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published May 17 • 40