Remote Labor Index: Measuring AI Automation of Remote Work
Abstract
AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.
Community
- What: Introduces the Remote Labor Index (RLI), a multi-sector benchmark of real, paid remote-work projects that tests end-to-end AI agents on economically valuable tasks.
- Why: Existing benchmarks don’t reflect true workplace automation/value; RLI aims to measure actual completion of freelance-style projects. 
- Scale: Projects total more than 6,000 human hours and more than $140k of real work across areas like game dev, product design, architecture, data analysis, and animation.
- How scored: Reports an “automation rate”, the share of projects an agent completes to acceptable quality. Frontier agents perform near the floor.
- Results (v1): Best agents reach only ~2.5% automation (Manus 2.5%, Grok-4/Sonnet-4.5 ~2.1%, GPT-5 1.7%, ChatGPT agent 1.3%, Gemini 2.5 Pro 0.8%). 
- Takeaway: Despite strong scores on research benchmarks, today’s agents barely automate real freelance-style work; RLI provides a concrete yardstick to track progress.
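
To make the "automation rate" in the summary above concrete, here is a minimal Python sketch that computes it as the percentage of projects judged acceptable. It assumes the metric is a simple fraction of pass/fail judgments, as the summary describes. The names `ProjectResult` and `automation_rate` are hypothetical, and the toy data is invented for illustration only; it is not the benchmark's data or evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectResult:
    """One agent attempt at one project (hypothetical structure)."""
    project_id: str
    acceptable: bool  # True if the deliverable was judged acceptable


def automation_rate(results: list[ProjectResult]) -> float:
    """Return the percentage of projects completed to acceptable quality."""
    if not results:
        raise ValueError("automation rate is undefined for zero projects")
    completed = sum(1 for r in results if r.acceptable)
    return 100.0 * completed / len(results)


# Toy illustration only: 40 made-up projects, exactly one judged acceptable.
toy_results = [ProjectResult(f"project-{i}", acceptable=(i == 0)) for i in range(40)]
rate = automation_rate(toy_results)
print(f"Automation rate: {rate:.1f}%")  # prints "Automation rate: 2.5%"
```

In this toy case, one acceptable deliverable out of 40 projects gives a rate of 2.5%, which shows how few completed projects a rate in that range corresponds to.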