Collections including paper arxiv:2511.04570

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
  Paper • 2511.02779 • Published • 55
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
  Paper • 2511.04962 • Published • 52
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 204

- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 204
- V-Thinker: Interactive Thinking with Images
  Paper • 2511.04460 • Published • 94
- TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
  Paper • 2511.01833 • Published • 15
- ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
  Paper • 2510.27492 • Published • 80

- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
  Paper • 2508.20751 • Published • 89
- TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
  Paper • 2508.17445 • Published • 80
- VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
  Paper • 2508.19247 • Published • 41
- VibeVoice Technical Report
  Paper • 2508.19205 • Published • 124

- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 204
- Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
  Paper • 2511.12609 • Published • 102
- When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
  Paper • 2511.02779 • Published • 55

- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models
  Paper • 2511.10629 • Published • 116
- PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
  Paper • 2511.09057 • Published • 72
- Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
  Paper • 2511.04570 • Published • 204

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25