- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 202
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 6.26k • 131
- OpenGVLab/InternVL3_5-38B
  Image-Text-to-Text • 38B • Updated • 12.1k • 33
- OpenGVLab/InternVL3_5-30B-A3B
  Image-Text-to-Text • 31B • Updated • 41.7k • 38
Collections
Collections including paper arxiv:2508.18265
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23
- Apriel-1.5-15b-Thinker
  Paper • 2510.01141 • Published • 113
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources
  Paper • 2509.21268 • Published • 101
- LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
  Paper • 2509.00676 • Published • 83
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 113
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 154
- Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
  Paper • 2508.14029 • Published • 118
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 202
- iVideoGPT: Interactive VideoGPTs are Scalable World Models
  Paper • 2405.15223 • Published • 17
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 55
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 90
- Matryoshka Multimodal Models
  Paper • 2405.17430 • Published • 34
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 298
- Qwen3 Technical Report
  Paper • 2505.09388 • Published • 308
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 202
- How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
  Paper • 2509.18905 • Published • 28
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 202
- SmolVLM: Redefining small and efficient multimodal models
  Paper • 2504.05299 • Published • 200
- Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
  Paper • 2504.15271 • Published • 66
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 202
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 137
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 154
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
  Paper • 2508.13167 • Published • 127