Unimedvl: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis Paper • 2510.15710 • Published Oct 17 • 6
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark Paper • 2402.02242 • Published Feb 3, 2024
dMLLM-TTS: Self-Verified and Efficient Test-Time Scaling for Diffusion Multi-Modal Large Language Models Paper • 2512.19433 • Published 4 days ago • 3
Lumina-DiMOO Family Collection Open-Sourced Large Diffusion Language Model for Multi-Modal Generation and Understanding • 4 items • Updated 2 days ago • 5
Lumina-DiMOO Family Collection Open-Sourced Large Diffusion Language Model for Multi-Modal Generation and Understanding • 4 items • Updated 2 days ago • 5
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 29 days ago • 212
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield Paper • 2511.22677 • Published 29 days ago • 28
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 29 days ago • 212
Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield Paper • 2511.22677 • Published 29 days ago • 28
Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision Paper • 2504.04903 • Published Apr 7
Factuality Matters: When Image Generation and Editing Meet Structured Visuals Paper • 2510.05091 • Published Oct 6 • 19
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7 • 54
PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published Oct 20 • 62
RoboChallenge: Large-scale Real-robot Evaluation of Embodied Policies Paper • 2510.17950 • Published Oct 20 • 7
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning Paper • 2511.01833 • Published Nov 3 • 15