Collections including paper arxiv:2508.18265
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 649 • 132
- OpenGVLab/InternVL3_5-38B
  Image-Text-to-Text • 38B • Updated • 5.88k • 37
- OpenGVLab/InternVL3_5-30B-A3B
  Image-Text-to-Text • 31B • Updated • 42.9k • 37

- Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
  Paper • 2310.19909 • Published • 21
- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- FlashDecoding++: Faster Large Language Model Inference on GPUs
  Paper • 2311.01282 • Published • 37
- Prompt Cache: Modular Attention Reuse for Low-Latency Inference
  Paper • 2311.04934 • Published • 34

- Perception-Aware Policy Optimization for Multimodal Reasoning
  Paper • 2507.06448 • Published • 47
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
  Paper • 2507.05920 • Published • 11
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 166
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13

- Motif-Technologies/Motif-2.6B
  Text Generation • 3B • Updated • 172 • 78
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- OpenGVLab/InternVL3_5-241B-A28B
  Image-Text-to-Text • 241B • Updated • 649 • 132
- openbmb/MiniCPM-V-4_5
  Image-Text-to-Text • 9B • Updated • 49.5k • 1.02k

- FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
  Paper • 2508.11987 • Published • 71
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 496

- Agentic Reinforced Policy Optimization
  Paper • 2507.19849 • Published • 158
- Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
  Paper • 2507.22448 • Published • 66
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning
  Paper • 2508.21113 • Published • 110

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 8.01k • 1.22k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 141 • 15
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63

- ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems
  Paper • 2503.20756 • Published • 7
- BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
  Paper • 2505.09568 • Published • 97
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
  Paper • 2508.18265 • Published • 208
- Qwen3-Omni Technical Report
  Paper • 2509.17765 • Published • 139