MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research Paper • 2605.26114 • Published 3 days ago • 47
Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World Paper • 2605.26086 • Published 3 days ago • 20
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo Paper • 2605.16257 • Published 13 days ago • 51
CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion Paper • 2602.10999 • Published Feb 11 • 11
FeatureBench: Benchmarking Agentic Coding for Complex Feature Development Paper • 2602.10975 • Published Feb 11 • 18
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System Paper • 2602.02488 • Published Feb 2 • 36
vectara/hallucination_evaluation_model Text Classification • 0.1B • Updated Oct 20, 2025 • 142k • 354
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface Paper • 2503.01342 • Published Mar 3, 2025 • 8
Explicit Shape Encoding for Real-Time Instance Segmentation Paper • 1908.04067 • Published Aug 12, 2019 • 1