AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published 3 days ago • 82
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue Paper • 2510.13747 • Published Oct 15 • 29
PyBench: Evaluating LLM Agent on various real-world coding tasks Paper • 2407.16732 • Published Jul 23, 2024 • 1
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published about 1 month ago • 119
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence Paper • 2510.20579 • Published Oct 23 • 55
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding Paper • 2508.21496 • Published Aug 29 • 54