Terminal-World: Scaling Terminal-Agent Environments via Agent Skills Paper • 2605.20876 • Published 4 days ago • 7
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 6 days ago • 48
The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models Paper • 2605.06196 • Published 17 days ago • 9
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published 21 days ago • 162
SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Paper • 2604.18982 • Published Apr 21 • 4
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play Paper • 2604.17696 • Published Apr 20 • 6
GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization Paper • 2410.04087 • Published Oct 5, 2024
Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation Paper • 2511.05923 • Published Nov 8, 2025
Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management Paper • 2601.08435 • Published Jan 13
ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models Paper • 2604.08064 • Published Apr 9 • 8
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22