On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models Paper • 2510.09008 • Published 28 days ago • 15
Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers Paper • 2507.08422 • Published Jul 11 • 36
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling Paper • 2507.11061 • Published Jul 15 • 37
Efficient Personalization of Quantized Diffusion Model without Backpropagation Paper • 2503.14868 • Published Mar 19 • 20
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10 • 86
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 193
Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation Paper • 2502.08690 • Published Feb 12 • 43
Visual Question Decomposition on Multimodal Large Language Models Paper • 2409.19339 • Published Sep 28, 2024 • 9
Cottention: Linear Transformers With Cosine Attention Paper • 2409.18747 • Published Sep 27, 2024 • 17
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding Paper • 2409.06210 • Published Sep 10, 2024 • 26