ARGenSeg: Image Segmentation with Autoregressive Image Generation Model Paper • 2510.20803 • Published 5 days ago • 8
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published 6 days ago • 24
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence Paper • 2509.12203 • Published Sep 15 • 19
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 236
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21 • 254
Chat with Kimi-VL-A3B-Thinking-2506 🤔: Chat with images, videos, or PDFs to generate text • Running on Zero • 172
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents Paper • 2508.05954 • Published Aug 8 • 6
Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Article • Jul 9 • 697
Asynchronous Robot Inference: Decoupling Action Prediction and Execution Article • Jul 10 • 43
SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data Article • Jun 3 • 268
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9 • 45