arxiv:2510.26787

Remote Labor Index: Measuring AI Automation of Remote Work

Published on Oct 30

Abstract

AIs have made rapid progress on research-oriented benchmarks of knowledge and reasoning, but it remains unclear how these gains translate into economic value and automation. To measure this, we introduce the Remote Labor Index (RLI), a broadly multi-sector benchmark comprising real-world, economically valuable projects designed to evaluate end-to-end agent performance in practical settings. AI agents perform near the floor on RLI, with the highest-performing agent achieving an automation rate of 2.5%. These results help ground discussions of AI automation in empirical evidence, setting a common basis for tracking AI impacts and enabling stakeholders to proactively navigate AI-driven labor automation.

Community

taesiri (paper submitter), 4 days ago, edited 4 days ago

What: Introduces the Remote Labor Index (RLI), a multi-sector benchmark of real, paid remote-work projects for testing end-to-end AI agents on economically valuable tasks.
Why: Existing benchmarks do not measure workplace automation or economic value; RLI measures whether agents actually complete freelance-style projects.
Scale: The projects represent over 6,000 human hours and more than $140k of real work across areas such as game dev, product design, architecture, data analysis, and animation.
How scored: Reports an “automation rate”, the share of projects an agent completes to acceptable quality. Frontier agents perform near the floor.
Results (v1): The best agent reaches only ~2.5% automation (Manus 2.5%, Grok-4/Sonnet-4.5 ~2.1%, GPT-5 1.7%, ChatGPT agent 1.3%, Gemini 2.5 Pro 0.8%).

Takeaway: Despite strong scores on research benchmarks, today’s agents barely automate real freelance-style work; RLI provides a concrete yardstick to track progress.
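
To make the "automation rate" concrete, here is a minimal sketch of how such a share could be computed. It assumes each project has already been judged as a simple pass/fail on acceptable quality; the function name, the input format, and the project counts in the example are hypothetical and are not taken from the paper, whose actual evaluation procedure is not described here.

```python
from typing import Sequence


def automation_rate(accepted: Sequence[bool]) -> float:
    """Return the percentage of projects judged acceptable.

    `accepted` holds one boolean per project: True if the agent's
    deliverable was judged to be of acceptable quality (assumed to be
    decided elsewhere, e.g. by human evaluators).
    """
    if not accepted:
        raise ValueError("need at least one project to compute a rate")
    passed = sum(1 for ok in accepted if ok)
    return 100 * passed / len(accepted)


# Made-up illustration: 5 accepted deliverables out of 200 projects.
judgments = [True] * 5 + [False] * 195
print(f"automation rate: {automation_rate(judgments):.1f}%")
```

With these illustrative counts, 5 accepted projects out of 200 gives a rate of 2.5%, which shows how few completed projects a figure of that size corresponds to.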
