SII-Yibin Wang

CodeGoat24

https://codegoat24.github.io/

AI & ML interests

I'm part of the Shanghai Innovation Institute, focusing on multimodal RL and generation.

Recent Activity

updated a collection 2 days ago

UnifiedReward Edit Models

upvoted a paper 4 days ago

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Paper • 2511.01295 • Published 5 days ago • 36

upvoted a collection 5 days ago

UniREditBench

4 items • Updated 5 days ago • 1

upvoted 2 papers 17 days ago

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

Paper • 2510.17722 • Published 19 days ago • 19

UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation

Paper • 2510.18701 • Published 18 days ago • 66

upvoted a paper 23 days ago

RLFR: Extending Reinforcement Learning for LLMs with Flow Environment

Paper • 2510.10201 • Published 28 days ago • 35

upvoted 4 papers about 1 month ago

G^2RPO: Granular GRPO for Precise Reward in Flow Models

Paper • 2510.01982 • Published Oct 2 • 5

Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published Oct 7 • 52

InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles

Paper • 2508.16072 • Published Aug 22 • 4

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

Paper • 2509.22647 • Published Sep 26 • 32

upvoted 2 collections 2 months ago

UnifiedReward 2.0 Qwen2.5VL Models

10 items • Updated 2 days ago • 1

Pref-GRPO & UniGenBench

6 items • Updated 2 days ago • 1

upvoted a paper 2 months ago

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89

upvoted 2 papers 3 months ago

SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience

Paper • 2508.04700 • Published Aug 6 • 52

Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1 • 62

upvoted a collection 5 months ago

UnifiedReward 1.0 Qwen2.5 Models GGUF

9 items • Updated 2 days ago • 2

upvoted a paper 5 months ago

GeometryZero: Improving Geometry Solving for LLM with Group Contrastive Policy Optimization

Paper • 2506.07160 • Published Jun 8 • 3

upvoted 3 papers 6 months ago

Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models

Paper • 2505.02686 • Published May 5 • 16

MagicFace: Training-free Universal-Style Human Image Customized Synthesis

Paper • 2408.07433 • Published Aug 14, 2024 • 1

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published May 6 • 93

upvoted a collection 7 months ago

UnifiedReward 1.0 Qwen2.5VL Models

6 items • Updated 2 days ago • 10