Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10 • 4
FlowTok: Flowing Seamlessly Across Text and Image Tokens Paper • 2503.10772 • Published Mar 13 • 19
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? Paper • 2503.09949 • Published Mar 13 • 5
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference Paper • 2410.21465 • Published Oct 28, 2024 • 11
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology Paper • 2507.07999 • Published Jul 10 • 48
CyberV: Cybernetics for Test-time Scaling in Video Understanding Paper • 2506.07971 • Published Jun 9 • 5
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models Paper • 2505.24164 • Published May 30
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding Paper • 2504.10465 • Published Apr 14 • 27
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World Paper • 2506.24102 • Published Jun 30
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer Paper • 2504.10462 • Published Apr 14 • 15
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23 • 33
VINCIE: Unlocking In-context Image Editing from Video Paper • 2506.10941 • Published Jun 12 • 2
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training Paper • 2506.05301 • Published Jun 5 • 56
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration Paper • 2501.01320 • Published Jan 2 • 12
CodeContests+: High-Quality Test Case Generation for Competitive Programming Paper • 2506.05817 • Published Jun 6 • 9
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published Apr 3 • 48
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16 • 69
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published Aug 20 • 82
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling Paper • 2508.17445 • Published Aug 24 • 80
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning Paper • 2508.18966 • Published Aug 26 • 56
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks Paper • 2508.15804 • Published Aug 14 • 15
MOSAIC: Multi-Subject Personalized Generation via Correspondence-Aware Alignment and Disentanglement Paper • 2509.01977 • Published Sep 2 • 12
Robix: A Unified Model for Robot Interaction, Reasoning and Planning Paper • 2509.01106 • Published Sep 1 • 48
Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots Paper • 2509.02530 • Published Sep 2 • 9
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? Paper • 2509.04292 • Published Sep 4 • 57
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 122
Lynx: Towards High-Fidelity Personalized Video Generation Paper • 2509.15496 • Published Sep 19 • 12