-
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 133 -
Magistral
Paper • 2506.10910 • Published • 65 -
Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Paper • 2506.07240 • Published • 7 -
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Paper • 2506.09991 • Published • 55
Collections
Collections including paper arxiv:2505.09388
-
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88 -
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 81 -
Qwen3 Technical Report
Paper • 2505.09388 • Published • 317 -
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 188
-
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Paper • 2505.09568 • Published • 97 -
Qwen3 Technical Report
Paper • 2505.09388 • Published • 317 -
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Paper • 2505.11049 • Published • 60 -
Emerging Properties in Unified Multimodal Pretraining
Paper • 2505.14683 • Published • 134
-
SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Paper • 2506.08889 • Published • 23 -
MiniCPM4: Ultra-Efficient LLMs on End Devices
Paper • 2506.07900 • Published • 92 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 262 -
OpenThoughts: Data Recipes for Reasoning Models
Paper • 2506.04178 • Published • 48
-
Skywork Open Reasoner 1 Technical Report
Paper • 2505.22312 • Published • 54 -
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities
Paper • 2505.21191 • Published • 3 -
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 188 -
Qwen3 Technical Report
Paper • 2505.09388 • Published • 317
-
The Leaderboard Illusion
Paper • 2504.20879 • Published • 72 -
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 73 -
LLMs for Engineering: Teaching Models to Design High Powered Rockets
Paper • 2504.19394 • Published • 14 -
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Paper • 2504.19056 • Published • 18
-
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Paper • 2504.20752 • Published • 92 -
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Paper • 2504.21233 • Published • 49 -
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
Paper • 2211.11363 • Published • 1 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50