We just released TRL v0.20 with major multimodal upgrades!
- VLM support for GRPO (highly requested by the community!)
- New GSPO trainer (from @Qwen, released last week, VLM-ready)
- New MPO trainer (multimodal by design, as in the paper)