Ruizhong Qiu

q-rz

https://q-rz.github.io

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 2 months ago

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Paper • 2605.22642 • Published May 21 • 35

upvoted 2 papers 2 months ago

Code as Agent Harness

Paper • 2605.18747 • Published May 18 • 225

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

Paper • 2605.10899 • Published May 11 • 79

upvoted 3 papers 3 months ago

Heterogeneous Scientific Foundation Model Collaboration

Paper • 2604.27351 • Published Apr 30 • 222

Recursive Multi-Agent Systems

Paper • 2604.25917 • Published Apr 28 • 288

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Paper • 2604.10577 • Published Apr 12 • 27

upvoted a paper 4 months ago

ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

Paper • 2603.10160 • Published Mar 10 • 26

upvoted 2 papers 5 months ago

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 154

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Paper • 2602.08222 • Published Feb 9 • 290

upvoted 2 papers 6 months ago

Agentic Reasoning for Large Language Models

Paper • 2601.12538 • Published Jan 18 • 207

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158

upvoted a paper 8 months ago

Latent Collaboration in Multi-Agent Systems

Paper • 2511.20639 • Published Nov 25, 2025 • 130

upvoted 3 papers 10 months ago

TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

Paper • 2510.06217 • Published Oct 7, 2025 • 67

Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

Paper • 2510.00526 • Published Oct 1, 2025 • 11

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137

upvoted 3 papers about 1 year ago

VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use

Paper • 2505.19255 • Published May 25, 2025 • 5

Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6, 2025 • 73

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5, 2025 • 81