10 26 150

Xie

Zhihui

https://zhxie.site/

zhxieml

AI & ML interests

None yet

Recent Activity

liked a dataset about 5 hours ago

rl-research/dr-tulu-rl-data

liked a model 5 days ago

meituan-longcat/LongCat-Flash-Chat

upvoted a paper 16 days ago

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

View all activity

Organizations

upvoted a paper 16 days ago

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published 22 days ago • 70

upvoted a paper 19 days ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published 21 days ago • 44

upvoted 2 papers about 1 month ago

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

Paper • 2510.08240 • Published Oct 9 • 40

Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

Paper • 2510.07242 • Published Oct 8 • 30

upvoted a paper about 2 months ago

RESTRAIN: From Spurious Votes to Signals -- Self-Driven RL with Self-Penalization

Paper • 2510.02172 • Published Oct 2 • 7

upvoted a paper 3 months ago

Jointly Reinforcing Diversity and Quality in Language Model Generations

Paper • 2509.02534 • Published Sep 2 • 24

upvoted 3 papers 6 months ago

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Paper • 2505.19641 • Published May 26 • 67

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Paper • 2505.19897 • Published May 26 • 104

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19 • 45

upvoted 2 papers 8 months ago

MegaMath: Pushing the Limits of Open Math Corpora

Paper • 2504.02807 • Published Apr 3 • 34

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Paper • 2503.12329 • Published Mar 16 • 27

upvoted a paper 9 months ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 50

upvoted a collection 9 months ago

UI Agent

Collection

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics • 425 items • Updated 6 days ago • 64

upvoted a paper 9 months ago

Teaching Language Models to Critique via Reinforcement Learning

Paper • 2502.03492 • Published Feb 5 • 24

upvoted 2 papers 11 months ago

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 87

Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 43

upvoted a paper 12 months ago

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Paper • 2411.17451 • Published Nov 26, 2024 • 11

upvoted a paper about 1 year ago

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25, 2024 • 63

upvoted 2 papers over 1 year ago

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15, 2024 • 167

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Paper • 2404.12387 • Published Apr 18, 2024 • 39

Xie

AI & ML interests

Recent Activity

Organizations

Zhihui's activity