MLGYM

community

Activity Feed

AI & ML interests

None defined yet.

authored a paper 4 months ago

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Paper • 2604.00842 • Published Apr 1 • 15

authored a paper 4 months ago

Procedural Generation of Algorithm Discovery Tasks in Machine Learning

Paper • 2603.17863 • Published Mar 18 • 5

authored 3 papers 4 months ago

WildSci: Advancing Scientific Reasoning from In-the-Wild Literature

Paper • 2601.05567 • Published Jan 9

Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing

Paper • 2602.04837 • Published Feb 4 • 10

Procedural Generation of Algorithm Discovery Tasks in Machine Learning

Paper • 2603.17863 • Published Mar 18 • 5

authored a paper 5 months ago

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

Paper • 2602.06855 • Published Feb 6 • 83

authored 3 papers 8 months ago

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 196

The State and Fate of Linguistic Diversity and Inclusion in the NLP World

Paper • 2004.09095 • Published Apr 20, 2020

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Paper • 2511.15593 • Published Nov 19, 2025 • 59

authored a paper 8 months ago

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 196

authored a paper 8 months ago

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published Nov 17, 2025 • 140

movb

authored a paper 10 months ago

ARE: Scaling Up Agent Environments and Evaluations

Paper • 2509.17158 • Published Sep 21, 2025 • 36

published a dataset about 1 year ago

mlgym/coco-captioning

Updated Dec 1, 2024 • 56.5k • 140

authored 4 papers over 1 year ago

A Family of Pretrained Transformer Language Models for Russian

Paper • 2309.10931 • Published Sep 19, 2023 • 7

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

Paper • 2010.15925 • Published Oct 29, 2020 • 1

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

Paper • 2202.07791 • Published Feb 15, 2022

Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian

Paper • 2206.01583 • Published Jun 3, 2022 • 1