My AI - a malkesh2911 Collection

My AI

updated Nov 26, 2025

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 114

Kwaipilot/KAT-Dev-72B-Exp

Text Generation • 73B • Updated Oct 13, 2025 • 624 • 160

Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16, 2025 • 104

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published Nov 17, 2025 • 17