Xuandong Zhao
Xuandong
6 followers · 6 following
https://xuandongzhao.github.io/
xuandongzhao
XuandongZhao
xuandong-zhao-a3270610b
AI & ML interests
None yet
Recent Activity
Replied to Kseniase's post 26 days ago
8 Emerging trends in Reinforcement Learning

Reinforcement learning is having a moment - and not just this week. Some of its directions are already showing huge promise, while others are still early but exciting. Here’s a look at what’s happening right now in RL:

1. Reinforcement Pre-Training (RPT) → https://huggingface.co/papers/2506.08007
   Reframes next-token pretraining as RL with verifiable rewards, yielding scalable reasoning gains
2. Reinforcement Learning from Human Feedback (RLHF) → https://huggingface.co/papers/1706.03741
   The top approach. It trains a model using human preference feedback, building a reward model and then optimizing the policy to generate outputs people prefer
3. Reinforcement Learning with Verifiable Rewards (RLVR) → https://huggingface.co/papers/2506.14245
   Moves from subjective (human-labeled) rewards to objective ones that can be automatically verified, like in math, code, or rubrics as reward, for example → https://huggingface.co/papers/2508.12790, https://huggingface.co/papers/2507.17746
4. Multi-objective RL → https://huggingface.co/papers/2508.07768
   Trains LMs to balance multiple goals at once, like being helpful but also concise or creative, ensuring that improving one goal doesn’t ruin another
5. Parallel thinking RL → https://huggingface.co/papers/2509.07980
   Trains parallel chains of thought, boosting math accuracy and final ceilings. It first teaches the model “parallel thinking” skill on easier problems, then uses RL to refine it on harder ones

Read further below ⬇️
And if you like this, subscribe to the Turing post: https://www.turingpost.com/subscribe
Also, check out our recent guide about the past, present and future of RL: https://www.turingpost.com/p/rlguide
New activity 3 months ago
sunblaze-ucb/Qwen2.5-1.5B-Intuitor-MATH-1EPOCH: Improve model card: Add transformers library, expand description, links, and usage
New activity 3 months ago
sunblaze-ucb/OLMo-2-7B-SFT-GRPO-MATH-1EPOCH: Improve model card: Add library, links, and usage example
Organizations
Papers (7)
arxiv:2505.19590
arxiv:2504.04715
arxiv:2410.06172
arxiv:2401.17256
Spaces (1)
Unigram-Watermark 👀 • Runtime error • 1
Models (7)
Xuandong/Qwen3-14B-Intuitor-MATH-1EPOCH-R16-A100-ENLOSS • 15B • Updated Jun 16 • 1
Xuandong/Qwen3-14B-Intuitor-MATH-1EPOCH-R8-A100 • 15B • Updated Jun 16 • 2
Xuandong/Qwen3-14B-GRPO-MATH-1EPOCH-R8-A100 • 15B • Updated Jun 16 • 1
Xuandong/OLMo-2-7B-SFT-GRPO-MATH-1EPOCH-R16-A100 • 7B • Updated Jun 16
Xuandong/OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH-R16-A100 • 7B • Updated Jun 16
Xuandong/HPD-TinyBERT-F128 • Feature Extraction • Updated May 10, 2022 • 11 • 1
Xuandong/HPD-MiniLM-F128 • Feature Extraction • Updated May 10, 2022 • 4
Datasets (0)
None public yet