Yanxiao Zhao

sdpkjc

https://sdpkjc.me

AI & ML interests

Reinforcement Learning

Recent Activity

updated a dataset 15 days ago

TheFactoryX/edition_0001_Rowan-hellaswag-readymade

Viewer • Updated 15 days ago • 500 • 27

published a dataset 15 days ago

TheFactoryX/edition_0001_Rowan-hellaswag-readymade

Viewer • Updated 15 days ago • 500 • 27

updated a dataset 15 days ago

TheFactoryX/edition_0000_fancyzhx-ag_news-readymade

Viewer • Updated 15 days ago • 500 • 23

published a dataset 15 days ago

TheFactoryX/edition_0000_fancyzhx-ag_news-readymade

Viewer • Updated 15 days ago • 500 • 23

updated a Space 15 days ago

README

published a Space 15 days ago

README

New activity in xlangai/ubuntu_osworld_file_cache about 2 months ago

Fix update_browse_history_setup

#7 opened about 2 months ago by

sdpkjc

New activity in sdpkjc/SATQuest 2 months ago

Update dataset card: Add paper link, task categories, and tags

#2 opened 2 months ago by

nielsr

authored 3 papers 2 months ago

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

Paper • 2508.14040 • Published Aug 19 • 3

SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning

Paper • 2502.11896 • Published Feb 17

updated a collection 2 months ago

SATQuest

Collection

SATQuest Dataset Collections • 3 items • Updated Sep 4 • 1

upvoted a paper 2 months ago

SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

commented on a paper 2 months ago

SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

upvoted a paper 2 months ago

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

Paper • 2508.14040 • Published Aug 19 • 3

updated 2 datasets 4 months ago

sdpkjc/SATQuest-RFT-3k

Viewer • Updated Jul 30 • 3k • 10

sdpkjc/SATQuest

Viewer • Updated Sep 6 • 140 • 32

New activity in Qwen/Qwen3-1.7B 5 months ago

Fix chat template in case of multiple assistant messages and no thinking

#9 opened 6 months ago by

VityaVitalich

updated a dataset 6 months ago

sdpkjc/24problems_quiz-eval-n4-1-10-24

Viewer • Updated May 22 • 55.5k • 10

published a dataset 6 months ago

sdpkjc/24problems_quiz-eval-n4-1-10-24

Viewer • Updated May 22 • 55.5k • 10
