OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024 • 11
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 87
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Paper • 2403.14773 • Published Mar 21, 2024 • 11
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models Paper • 2312.14091 • Published Dec 21, 2023 • 17
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 15
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models Paper • 2312.00079 • Published Nov 30, 2023 • 17
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model Paper • 2211.08332 • Published Nov 15, 2022
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators Paper • 2303.13439 • Published Mar 23, 2023 • 5
OneFormer: One Transformer to Rule Universal Image Segmentation Paper • 2211.06220 • Published Nov 10, 2022