Penghui Qi

QPHutu

AI & ML interests

None yet

Recent Activity

liked a dataset 4 days ago

LLM360/guru-RL-92k

liked a dataset 4 days ago

zwhe99/DeepMath-103K

updated a dataset 5 days ago

sail/Sanity-Test-R1D-1.5B

liked 2 datasets 4 days ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20 • 91.9k • 2.64k • 38

zwhe99/DeepMath-103K

Viewer • Updated May 29 • 103k • 6.33k • 269

updated a dataset 5 days ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated 5 days ago • 1.52k • 58 • 6

liked a dataset 6 days ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated 5 days ago • 1.52k • 58 • 6

updated a collection 6 days ago

Precision-RL

Collection

Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated 6 days ago

published a dataset 6 days ago

sail/Sanity-Test-R1D-1.5B

Viewer • Updated 5 days ago • 1.52k • 58 • 6

liked a model 10 days ago

zz1358m/SofT-GRPO-master

Updated 7 days ago • 6

upvoted a paper 14 days ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published 15 days ago • 116

authored a paper 17 days ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published 21 days ago • 27

upvoted a paper 18 days ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published 21 days ago • 27

commented a paper 18 days ago

Defeating the Training-Inference Mismatch via FP16

Paper • 2510.26788 • Published 21 days ago • 27

upvoted 2 papers about 2 months ago

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26 • 67

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26 • 68

liked a dataset about 2 months ago

SynthLabsAI/Big-Math-RL-Verified

Viewer • Updated Mar 25 • 251k • 5.66k • 211

upvoted 3 papers 3 months ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 224

VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

Paper • 2509.01055 • Published Sep 1 • 73

SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2 • 83

updated a collection 4 months ago

LLM Agent

Collection

4 items • Updated Aug 4

upvoted a paper 5 months ago

Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published Jul 2 • 55
