-
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 185 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper • 2508.14029 • Published • 118 -
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 137
Collections
Discover the best community collections!
Collections including paper arxiv:2408.06195
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 285 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 99 -
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 54 -
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
Paper • 2311.07445 • Published
-
Phi-4 Technical Report
Paper • 2412.08905 • Published • 121 -
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 50 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 28
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
Text Generation • 8B • Updated • 113k • 338 -
unsloth/Qwen3-8B-GGUF
Text Generation • 8B • Updated • 23.9k • 74 -
SaffalPoosh/system_design_expert
Text Generation • Updated • 2 • 1
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 285 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Thinking LLMs: General Instruction Following with Thought Generation
Paper • 2410.10630 • Published • 21 -
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 16 -
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 9
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3
-
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 87 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3
-
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 185 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Paper • 2508.14029 • Published • 118 -
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • 2509.25541 • Published • 137
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF
Text Generation • 8B • Updated • 113k • 338 -
unsloth/Qwen3-8B-GGUF
Text Generation • 8B • Updated • 23.9k • 74 -
SaffalPoosh/system_design_expert
Text Generation • Updated • 2 • 1
-
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 285 -
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though
Paper • 2501.04682 • Published • 99 -
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Paper • 2410.02884 • Published • 54 -
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
Paper • 2311.07445 • Published
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 285 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73
-
Phi-4 Technical Report
Paper • 2412.08905 • Published • 121 -
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 50 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 48 -
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 28
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Thinking LLMs: General Instruction Following with Thought Generation
Paper • 2410.10630 • Published • 21 -
Democratizing Reasoning Ability: Tailored Learning from Large Language Model
Paper • 2310.13332 • Published • 16 -
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 9
-
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3
-
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 87 -
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Paper • 2409.01704 • Published • 83 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 -
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Paper • 2405.06682 • Published • 3