zhuww's Collections

RL

updated 16 days ago

  • Large Reasoning Models Learn Better Alignment from Flawed Thinking

    Paper • 2510.00938 • Published 26 days ago • 57

  • What Characterizes Effective Reasoning? Revisiting Length, Review, and Structure of CoT

    Paper • 2509.19284 • Published Sep 23 • 22

  • Learning to Reason as Action Abstractions with Scalable Mid-Training RL

    Paper • 2509.25810 • Published 28 days ago • 5

  • Agent Learning via Early Experience

    Paper • 2510.08558 • Published 18 days ago • 243