Collections

Discover the best community collections!

Collections including paper arxiv:2505.09388

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models

Paper • 2506.06395 • Published Jun 5 • 133
Magistral

Paper • 2506.10910 • Published Jun 12 • 65
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs

Paper • 2506.07240 • Published Jun 8 • 7
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Paper • 2506.09991 • Published Jun 11 • 55

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
WebThinker: Empowering Large Reasoning Models with Deep Research Capability

Paper • 2504.21776 • Published Apr 30 • 59
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30 • 49

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88
Distilling LLM Agent into Small Models with Retrieval and Code Tools

Paper • 2505.17612 • Published May 23 • 81
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 188

Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76
Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 38
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
AdaptThink: Reasoning Models Can Learn When to Think

Paper • 2505.13417 • Published May 19 • 82

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 97
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

Paper • 2505.11049 • Published May 16 • 60
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 134

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

Paper • 2506.08889 • Published Jun 10 • 23
MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9 • 92
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
OpenThoughts: Data Recipes for Reasoning Models

Paper • 2506.04178 • Published Jun 4 • 48

Skywork Open Reasoner 1 Technical Report

Paper • 2505.22312 • Published May 28 • 54
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

Paper • 2505.21191 • Published May 27 • 3
Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 188
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
TPTT: Transforming Pretrained Transformer into Titans

Paper • 2506.17671 • Published Jun 21 • 5
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published Oct 30 • 114

May 2025 - Top Papers

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29 • 72
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Paper • 2505.09343 • Published May 14 • 73
LLMs for Engineering: Teaching Models to Design High Powered Rockets

Paper • 2504.19394 • Published Apr 27 • 14
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions

Paper • 2504.19056 • Published Apr 27 • 18

Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers

Paper • 2504.20752 • Published Apr 29 • 92
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Paper • 2504.21233 • Published Apr 30 • 49
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model

Paper • 2211.11363 • Published Nov 21, 2022 • 1
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20, 2024 • 50
