Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 76
UniVid: Unifying Vision Tasks with Pre-trained Video Generation Models Paper • 2509.21760 • Published Sep 26 • 14
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11 • 236
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning Paper • 2509.09674 • Published Sep 11 • 79
Robix: A Unified Model for Robot Interaction, Reasoning and Planning Paper • 2509.01106 • Published Sep 1 • 48
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 75
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Paper • 2506.18088 • Published Jun 22 • 18
CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios Paper • 2506.13977 • Published Jun 11 • 10
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28 • 11
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer Paper • 2504.20690 • Published Apr 29 • 19
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 77
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published Apr 10 • 47
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8 • 180
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model Paper • 2504.05594 • Published Apr 8 • 11
Long-Context Autoregressive Video Modeling with Next-Frame Prediction Paper • 2503.19325 • Published Mar 25 • 73
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper • 2503.16408 • Published Mar 20 • 41
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published Mar 17 • 29
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10 • 16