Article: Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO) • By ariG23498 • Jan 19 • 32
Reply: Hi there, I think there is an error in your PPO description. PPO itself does not explicitly penalize the KL divergence from the initial (reference) policy.
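To make the distinction concrete, here is a sketch of the two objectives as usually written (following the standard PPO paper and the common RLHF formulation; the symbols $\epsilon$, $\beta$, $r_\phi$, and $\pi_{\text{ref}}$ are the usual ones, not taken from the article itself). PPO's clipped surrogate only constrains the update relative to the *old* policy via the clipped ratio:

$$
L^{\text{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}.
$$

The explicit KL penalty against a frozen reference policy is something RLHF pipelines typically add on top, by shaping the reward:

$$
r(x, y) = r_\phi(x, y) - \beta\,\mathrm{KL}\!\left[\pi_\theta(\cdot \mid x)\,\big\|\,\pi_{\text{ref}}(\cdot \mid x)\right],
$$

so the KL-to-reference term comes from the RLHF setup rather than from the PPO algorithm itself.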
Article: DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge • By NormalUhr • Feb 7 • 243