Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering Paper • 2510.14605 • Published Oct 16 • 4
Taming Modality Entanglement in Continual Audio-Visual Segmentation Paper • 2510.17234 • Published Oct 20 • 4
R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published Aug 28 • 109
Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger Paper • 2506.07785 • Published Jun 9 • 1
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1 • 37
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated Jul 21 • 664
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published Sep 10, 2024 • 16