Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning Paper • 2303.14369 • Published Mar 25, 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment Paper • 2305.12218 • Published May 20, 2023
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges Paper • 2311.05112 • Published Nov 9, 2023 • 1
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference Paper • 2406.18139 • Published Jun 26, 2024 • 2
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension Paper • 2411.13093 • Published Nov 20, 2024 • 2
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension Paper • 2503.08689 • Published Mar 11, 2025 • 4
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation Paper • 2505.20292 • Published May 26, 2025 • 53
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published Nov 26, 2024 • 37
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators Paper • 2404.05014 • Published Apr 7, 2024 • 34
Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach Paper • 2401.15652 • Published Jan 28, 2024
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Paper • 2406.18522 • Published Jun 26, 2024 • 20