Cambrian-S: Towards Spatial Supersensing in Video Paper • 2511.04670 • Published about 18 hours ago • 15
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning Paper • 2510.13515 • Published 23 days ago • 11
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28 • 44
Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval Paper • 2509.09118 • Published Sep 11 • 8
Region-based Cluster Discrimination for Visual Representation Learning Paper • 2507.20025 • Published Jul 26 • 19
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 131
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 237
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings Paper • 2506.23115 • Published Jun 29 • 37
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 270
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 74
QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23 • 43
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 88
On Path to Multimodal Generalist: General-Level and General-Bench Paper • 2505.04620 • Published May 7 • 82
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 36
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22 • 63