Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation Paper • 2406.06890 • Published Jun 11, 2024 • 1
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation Paper • 2407.10937 • Published Jul 15, 2024 • 1
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1, 2024 • 14
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation Paper • 2410.23277 • Published Oct 30, 2024 • 9
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition Paper • 2007.15796 • Published Jul 31, 2020
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition Paper • 2102.05775 • Published Feb 10, 2021
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10, 2025 • 20
Audio-Aware Large Language Models as Judges for Speaking Styles Paper • 2506.05984 • Published Jun 6, 2025 • 15
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Paper • 2506.10128 • Published Jun 11, 2025 • 22
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models Paper • 2507.15375 • Published Jul 21, 2025 • 30
DisCo: Disentangled Control for Referring Human Dance Generation in Real World Paper • 2307.00040 • Published Jun 30, 2023 • 25
Equivariant Similarity for Vision-Language Foundation Models Paper • 2303.14465 • Published Mar 25, 2023
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling Paper • 2206.07160 • Published Jun 14, 2022
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning Paper • 2111.13196 • Published Nov 25, 2021
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) Paper • 2309.17421 • Published Sep 29, 2023 • 4
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation Paper • 2310.08541 • Published Oct 12, 2023 • 18
MM-VID: Advancing Video Understanding with GPT-4V(ision) Paper • 2310.19773 • Published Oct 30, 2023 • 20