
Xiangyu

xixy

https://xixy.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper 13 days ago

Universal Reasoning Model

commented on a paper 15 days ago

Rethinking Expert Trajectory Utilization in LLM Post-training

commented on a paper 15 days ago

State over Tokens: Characterizing the Role of Reasoning Tokens


Organizations

None yet

commented on 2 papers 15 days ago

Rethinking Expert Trajectory Utilization in LLM Post-training

Paper • 2512.11470 • Published 19 days ago • 7 •

State over Tokens: Characterizing the Role of Reasoning Tokens

Paper • 2512.12777 • Published 17 days ago • 3 •

commented on 2 papers about 2 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9 • 131 •

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Paper • 2511.06307 • Published Nov 9 • 51 •

commented on 3 papers 2 months ago

Learning from the Best, Differently: A Diversity-Driven Rethinking on Data Selection

Paper • 2510.18909 • Published Oct 21 • 4 •

First Try Matters: Revisiting the Role of Reflection in Reasoning Models

Paper • 2510.08308 • Published Oct 9 • 24 •

New activity in XenArcAI/MathX-5M 5 months ago

What is the model used to produce responses?

#3 opened 5 months ago by xixy

New activity in a-m-team/AM-DeepSeek-R1-0528-Distilled 7 months ago

Now that's China speed!

#1 opened 7 months ago by reign12

commented on a paper 7 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 263 •

New activity in a-m-team/AM-DeepSeek-R1-0528-Distilled 7 months ago

Are there any code-related evaluation results?

#2 opened 7 months ago by xixy

commented on a paper 7 months ago

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Paper • 2505.21067 • Published May 27 • 3 •

commented on a paper 8 months ago

Model Merging in Pre-training of Large Language Models

Paper • 2505.12082 • Published May 17 • 40 •

commented on a paper about 1 year ago

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

Paper • 2411.06208 • Published Nov 9, 2024 • 21 •

commented on 2 papers over 1 year ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28, 2024 • 104 •

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11, 2024 • 93 •