Article mem-agent: Equipping LLM Agents with Memory Using RL By driaforall and 1 other • 19 days ago • 32
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8 • 186
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings Paper • 2508.00632 • Published Aug 1 • 3
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24 • 40
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement Paper • 2507.18742 • Published Jul 24 • 5
Article Automated Discovery of High-Performance GPU Kernels with OpenEvolve By codelion • Jun 27 • 23
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8 • 43
Configurable Preference Tuning Collection CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17 • 1
Configurable Preference Tuning with Rubric-Guided Synthetic Data Paper • 2506.11702 • Published Jun 13 • 1
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Paper • 2506.06607 • Published Jun 7 • 2
Atropos Artifacts Collection A collection of experimental artifacts created with Atropos, Nous' RL Environments framework - https://github.com/NousResearch/Atropos • 9 items • Updated Sep 8 • 11
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 96
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17 • 34