Lumina-OmniLV: A Unified Multimodal Framework for General Low-Level Vision Paper • 2504.04903 • Published Apr 7
Factuality Matters: When Image Generation and Editing Meet Structured Visuals Paper • 2510.05091 • Published Oct 6 • 18
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding Paper • 2510.06308 • Published Oct 7 • 53
PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published 28 days ago • 62
Skywork UniPic 2.0: Building Kontext Model with Online RL for Unified Multimodal Model Paper • 2509.04548 • Published Sep 4 • 4
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance Paper • 2508.21016 • Published Aug 28
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23, 2024 • 25
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow Paper • 2410.07536 • Published Oct 10, 2024 • 5
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 50
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning Paper • 2504.16080 • Published Apr 22 • 15
Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation Paper • 2507.13032 • Published Jul 17
TIDE : Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation Paper • 2503.07050 • Published Mar 10