Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals Paper • 2510.27684 • Published Oct 31 • 22
UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions Paper • 2511.03334 • Published Nov 5 • 51
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30 • 116
SAO-Instruct: Free-form Audio Editing using Natural Language Instructions Paper • 2510.22795 • Published Oct 26 • 5
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22 • 28
Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published Sep 30 • 33
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26 • 184
ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published Sep 25 • 63
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published Sep 24 • 80
SD3.5-Flash: Distribution-Guided Distillation of Generative Flows Paper • 2509.21318 • Published Sep 25 • 10
DiffusionNFT: Online Diffusion Reinforcement with Forward Process Paper • 2509.16117 • Published Sep 19 • 21
EdgeFusion: On-Device Text-to-Image Generation Paper • 2404.11925 • Published Apr 18, 2024 • 23
Cut2Next: Generating Next Shot via In-Context Tuning Paper • 2508.08244 • Published Aug 11 • 13
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Paper • 2508.04825 • Published Aug 6 • 58