RL Compositionality (collection)
From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones. https://huggingface.co/papers/2509.25123
5 items
The model after Stage 1 RFT.
Base model: meta-llama/Llama-3.1-8B