-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2411.10442
-
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
OpenGVLab/InternVL2_5-78B-MPO
Image-Text-to-Text • 78B • Updated • 80 • 54 -
OpenGVLab/InternVL2_5-38B-MPO
Image-Text-to-Text • 38B • Updated • 1.08k • 20 -
OpenGVLab/InternVL2_5-26B-MPO
Image-Text-to-Text • 26B • Updated • 86 • 14
-
Stream of Search (SoS): Learning to Search in Language
Paper • 2404.03683 • Published • 31 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 61 -
Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 45
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 118 -
NVILA: Efficient Frontier Visual Language Models
Paper • 2412.04468 • Published • 59 -
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 133
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 56 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 30 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 46
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 28 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
-
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 -
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Paper • 2503.18878 • Published • 119 -
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 -
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 141
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 39 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 37 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47
-
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
OpenGVLab/InternVL2_5-78B-MPO
Image-Text-to-Text • 78B • Updated • 80 • 54 -
OpenGVLab/InternVL2_5-38B-MPO
Image-Text-to-Text • 38B • Updated • 1.08k • 20 -
OpenGVLab/InternVL2_5-26B-MPO
Image-Text-to-Text • 26B • Updated • 86 • 14
-
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 118 -
NVILA: Efficient Frontier Visual Language Models
Paper • 2412.04468 • Published • 59 -
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 133
-
Stream of Search (SoS): Learning to Search in Language
Paper • 2404.03683 • Published • 31 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 61 -
Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 45
-
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper • 2410.13861 • Published • 56 -
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Paper • 2411.07975 • Published • 30 -
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Paper • 2411.10442 • Published • 87 -
Multimodal Autoregressive Pre-training of Large Vision Encoders
Paper • 2411.14402 • Published • 46