Language Models Can Learn from Verbal Feedback Without Scalar Rewards (paper, arXiv:2509.22638, published Sep 26, 67 upvotes)
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey (paper, arXiv:2509.02547, published Sep 2, 219 upvotes)
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use (paper, arXiv:2509.01055, published Sep 1, 72 upvotes)
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning (paper, arXiv:2509.02479, published Sep 2, 83 upvotes)
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy (paper, arXiv:2507.01352, published Jul 2, 54 upvotes)
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning (paper, arXiv:2506.24119, published Jun 30, 50 upvotes)
Optimizing Anytime Reasoning via Budget Relative Policy Optimization (paper, arXiv:2505.13438, published May 19, 36 upvotes)
Understanding R1-Zero-Like Training: A Critical Perspective (paper, arXiv:2503.20783, published Mar 26, 56 upvotes)
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization (paper, arXiv:2503.01328, published Mar 3, 16 upvotes)
⚓️ Sailor Language Models: open language models tailored for South-East Asia (SEA), released by Sea AI Lab (collection, 17 items, updated Dec 3, 2024, 17 upvotes)
📈 Scaling Laws with Vocabulary: how vocabulary size should grow as you scale up your language model (collection, 5 items, updated Aug 11, 2024, 6 upvotes)
🧬 RegMix: Data Mixture as Regression, an automatic data-mixture method for large language model pre-training (collection, 10 items, updated Jul 26, 2024, 8 upvotes)