FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published Oct 14 • 37
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published 12 days ago • 37
VideoSSR: Video Self-Supervised Reinforcement Learning Paper • 2511.06281 • Published 15 days ago • 22
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published 17 days ago • 197
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published 27 days ago • 20
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22 • 28
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation Paper • 2510.18692 • Published Oct 21 • 39
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models Paper • 2510.17519 • Published Oct 20 • 9
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Paper • 2510.15742 • Published Oct 17 • 50
ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation Paper • 2510.11000 • Published Oct 13 • 8
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters Paper • 2510.07546 • Published Oct 8 • 21
InstructX: Towards Unified Visual Editing with MLLM Guidance Paper • 2510.08485 • Published Oct 9 • 16
DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published Oct 8 • 73
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published Oct 9 • 63
Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model Paper • 2510.02390 • Published Sep 30 • 3
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published Oct 9 • 70