zzfive's Collections
 - AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning
   Paper • 2402.15506 • Published • 18
 - AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
   Paper • 2404.03648 • Published • 29
 - Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
   Paper • 2405.19893 • Published • 33
 - Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
   Paper • 2405.19888 • Published • 7
 - Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
   Paper • 2406.01014 • Published • 34
 - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
   Paper • 2406.04151 • Published • 23
 - τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
   Paper • 2406.12045 • Published • 9
 - Agentless: Demystifying LLM-based Software Engineering Agents
   Paper • 2407.01489 • Published • 63
 - Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
   Paper • 2407.07061 • Published • 27
 - Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
   Paper • 2407.10956 • Published • 7
 - Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning
   Paper • 2407.10718 • Published • 19
 - POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation
   Paper • 2407.14931 • Published • 22
 - AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?
   Paper • 2407.15711 • Published • 9
 - CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
   Paper • 2407.13301 • Published • 55
 - OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
   Paper • 2407.16741 • Published • 73
 - LAMBDA: A Large Model Based Data Agent
   Paper • 2407.17535 • Published • 37
 - AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
   Paper • 2407.18901 • Published • 35
 - MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
   Paper • 2407.20183 • Published • 43
 - GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
   Paper • 2408.01584 • Published • 10
 - Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
   Paper • 2408.03615 • Published • 31
 - CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases
   Paper • 2408.03910 • Published • 18
 - Automated Design of Agentic Systems
   Paper • 2408.08435 • Published • 40
 - Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
   Paper • 2409.04593 • Published • 26
 - 
   Paper • 2409.07429 • Published • 31
 - SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
   Paper • 2409.07440 • Published • 8
 - HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
   Paper • 2409.16299 • Published • 12
 - MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making
   Paper • 2409.16686 • Published • 10
 - Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise
   Paper • 2410.03017 • Published • 29
 - Agent S: An Open Agentic Framework that Uses Computers Like a Human
   Paper • 2410.08164 • Published • 26
 - MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
   Paper • 2410.11710 • Published • 20
 - Agent-as-a-Judge: Evaluate Agents with Agents
   Paper • 2410.10934 • Published • 23
 - Revealing the Barriers of Language Agents in Planning
   Paper • 2410.12409 • Published • 27
 - MobA: A Two-Level Agent System for Efficient Mobile Task Automation
   Paper • 2410.13757 • Published • 33
 - Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
   Paper • 2410.13232 • Published • 44
 - AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
   Paper • 2410.18603 • Published • 32
 - AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
   Paper • 2410.20424 • Published • 40
 - OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
   Paper • 2410.19609 • Published • 18
 - Teaching Embodied Reinforcement Learning Agents: Informativeness and Diversity of Language Use
   Paper • 2410.24218 • Published • 6
 - OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
   Paper • 2410.23218 • Published • 49
 - Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
   Paper • 2411.00412 • Published • 10
 - AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
   Paper • 2410.24024 • Published • 49
 - WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
   Paper • 2411.02337 • Published • 36
 - Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model
   Paper • 2411.04496 • Published • 23
 - GazeGen: Gaze-Driven User Interaction for Visual Content Generation
   Paper • 2411.04335 • Published • 15
 - The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
   Paper • 2411.10323 • Published • 34
 - Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
   Paper • 2411.06559 • Published • 16
 - BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
   Paper • 2411.13543 • Published • 19
 - SketchAgent: Language-Driven Sequential Sketch Generation
   Paper • 2411.17673 • Published • 19
 - Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment
   Paper • 2411.17188 • Published • 21
 - Large Language Model-Brained GUI Agents: A Survey
   Paper • 2411.18279 • Published • 31
 - MALT: Improving Reasoning with Multi-Agent LLM Training
   Paper • 2412.01928 • Published • 45
 - Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
   Paper • 2412.04454 • Published • 70
 - Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
   Paper • 2412.06531 • Published • 72
 - The BrowserGym Ecosystem for Web Agent Research
   Paper • 2412.05467 • Published • 22
 - AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
   Paper • 2412.09605 • Published • 29
 - Large Action Models: From Inception to Implementation
   Paper • 2412.10047 • Published • 36
 - Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
   Paper • 2412.09645 • Published • 36
 - Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
   Paper • 2412.13194 • Published • 12
 - TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
   Paper • 2412.14161 • Published • 51
 - 
   Paper • 2412.13501 • Published • 29
 - PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
   Paper • 2412.17589 • Published • 14
 - Agent-SafetyBench: Evaluating the Safety of LLM Agents
   Paper • 2412.14470 • Published • 13
 - Training Software Engineering Agents and Verifiers with SWE-Gym
   Paper • 2412.21139 • Published • 24
 - OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
   Paper • 2412.19723 • Published • 87
 - A3: Android Agent Arena for Mobile GUI Agents
   Paper • 2501.01149 • Published • 22
 - Agent Laboratory: Using LLM Agents as Research Assistants
   Paper • 2501.04227 • Published • 94
 - Search-o1: Agentic Search-Enhanced Large Reasoning Models
   Paper • 2501.05366 • Published • 102
 - InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
   Paper • 2501.04575 • Published • 25
 - PaSa: An LLM Agent for Comprehensive Academic Paper Search
   Paper • 2501.10120 • Published • 52
 - Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
   Paper • 2501.11425 • Published • 108
 - UI-TARS: Pioneering Automated GUI Interaction with Native Agents
   Paper • 2501.12326 • Published • 65
 - Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
   Paper • 2501.11733 • Published • 28
 - FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
   Paper • 2501.12909 • Published • 71
 - IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems
   Paper • 2501.11067 • Published • 13
 - CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation
   Paper • 2501.16609 • Published • 7
 - QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
   Paper • 2502.02584 • Published • 17
 - Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
   Paper • 2502.00674 • Published • 13
 - MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
   Paper • 2502.05957 • Published • 16
 - InSTA: Towards Internet-Scale Training For Agents
   Paper • 2502.06776 • Published • 9
 - Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training
   Paper • 2502.06589 • Published • 20
 - EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
   Paper • 2502.09560 • Published • 35
 - OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
   Paper • 2502.11271 • Published • 18
 - Autellix: An Efficient Serving Engine for LLM Agents as General Programs
   Paper • 2502.13965 • Published • 19
 - TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
   Paper • 2502.15425 • Published • 9
 - Self-Taught Agentic Long Context Understanding
   Paper • 2502.15920 • Published • 3
 - WebGames: Challenging General-Purpose Web-Browsing AI Agents
   Paper • 2502.18356 • Published • 14
 - ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
   Paper • 2502.18017 • Published • 21
 - PodAgent: A Comprehensive Framework for Podcast Generation
   Paper • 2503.00455 • Published • 6
 - MPO: Boosting LLM Agents with Meta Plan Optimization
   Paper • 2503.02682 • Published • 28
 - Agent models: Internalizing Chain-of-Action Generation into Reasoning models
   Paper • 2503.06580 • Published • 19
 - API Agents vs. GUI Agents: Divergence and Convergence
   Paper • 2503.11069 • Published • 37
 - STEVE: A Step Verification Pipeline for Computer-use Agent Training
   Paper • 2503.12532 • Published • 17
 - Survey on Evaluation of LLM-based Agents
   Paper • 2503.16416 • Published • 95
 - Verbal Process Supervision Elicits Better Coding Agents
   Paper • 2503.18494 • Published • 2
 - Large Language Model Agent: A Survey on Methodology, Applications and Challenges
   Paper • 2503.21460 • Published • 83
 - UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning
   Paper • 2503.21620 • Published • 62
 - Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
   Paper • 2503.18809 • Published • 9
 - Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
   Paper • 2504.00906 • Published • 25
 - Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
   Paper • 2504.01990 • Published • 300
 - AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
   Paper • 2504.08942 • Published • 28
 - Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
   Paper • 2504.10127 • Published • 17
 - SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
   Paper • 2504.10157 • Published • 17
 - The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
   Paper • 2504.08066 • Published • 14
 - 
   Paper • 2504.11442 • Published • 29
 - MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
   Paper • 2504.09702 • Published • 18
 - Exploring Expert Failures Improves LLM Agent Tuning
   Paper • 2504.13145 • Published • 12
 - UFO2: The Desktop AgentOS
   Paper • 2504.14603 • Published • 29
 - InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
   Paper • 2504.14239 • Published • 13
 - LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
   Paper • 2504.16078 • Published • 21
 - Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
   Paper • 2504.17192 • Published • 120
 - LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects
   Paper • 2504.19838 • Published • 22
 - Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
   Paper • 2504.19413 • Published • 24
 - RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
   Paper • 2504.20073 • Published • 13
 - Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
   Paper • 2505.01441 • Published • 39
 - Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents
   Paper • 2505.02156 • Published • 18
 - Multi-Agent System for Comprehensive Soccer Understanding
   Paper • 2505.03735 • Published • 25
 - OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
   Paper • 2505.03570 • Published • 8
 - LLM-Independent Adaptive RAG: Let the Question Speak for Itself
   Paper • 2505.04253 • Published • 13
 - AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge
   Paper • 2505.10468 • Published • 9
 - Creating General User Models from Computer Use
   Paper • 2505.10831 • Published • 5
 - Visual Agentic Reinforcement Fine-Tuning
   Paper • 2505.14246 • Published • 32
 - NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
   Paper • 2505.16938 • Published • 120
 - Distilling LLM Agent into Small Models with Retrieval and Code Tools
   Paper • 2505.17612 • Published • 81
 - UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
   Paper • 2505.21496 • Published • 38
 - WebDancer: Towards Autonomous Information Seeking Agency
   Paper • 2505.22648 • Published • 33
 - Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
   Paper • 2505.24878 • Published • 22
 - GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
   Paper • 2506.03143 • Published • 52
 - TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems
   Paper • 2506.04133 • Published • 3
 - ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
   Paper • 2506.05010 • Published • 79
 - Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights
   Paper • 2506.02865 • Published • 33
 - MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale
   Paper • 2506.04405 • Published • 7
 - Agents of Change: Self-Evolving LLM Agents for Strategic Planning
   Paper • 2506.04651 • Published • 8
 - DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
   Paper • 2506.11763 • Published • 71
 - Scaling Test-time Compute for LLM Agents
   Paper • 2506.12928 • Published • 63
 - OAgents: An Empirical Study of Building Effective Agents
   Paper • 2506.15741 • Published • 35
 - SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
   Paper • 2506.24119 • Published • 50
 - WebSailor: Navigating Super-human Reasoning for Web Agent
   Paper • 2507.02592 • Published • 120
 - PresentAgent: Multimodal Agent for Presentation Video Generation
   Paper • 2507.04036 • Published • 10
 - Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
   Paper • 2507.06229 • Published • 75
 - MIRIX: Multi-Agent Memory System for LLM-Based Agents
   Paper • 2507.07957 • Published • 74
 - GUI-G^2: Gaussian Reward Modeling for GUI Grounding
   Paper • 2507.15846 • Published • 132
 - MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models
   Paper • 2507.12806 • Published • 20
 - LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra
   Paper • 2507.15815 • Published • 6
 - MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
   Paper • 2507.19478 • Published • 30
 - A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
   Paper • 2507.21046 • Published • 81
 - GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
   Paper • 2507.21035 • Published • 3
 - ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
   Paper • 2507.22827 • Published • 98
 - Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
   Paper • 2508.00414 • Published • 91
 - SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
   Paper • 2507.23348 • Published • 11
 - CellForge: Agentic Design of Virtual Cell Models
   Paper • 2508.02276 • Published • 39
 - RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
   Paper • 2508.01415 • Published • 7
 - AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
   Paper • 2508.00890 • Published • 6
 - LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?
   Paper • 2508.01780 • Published • 19
 - HyCodePolicy: Hybrid Language Controllers for Multimodal Monitoring and Decision in Embodied Agents
   Paper • 2508.02629 • Published • 5
 - Efficient Agents: Building Effective Agents While Reducing Cost
   Paper • 2508.02694 • Published • 85
 - SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
   Paper • 2508.04700 • Published • 52
 - Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning
   Paper • 2508.03501 • Published • 56
 - Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success
   Paper • 2508.04280 • Published • 35
 - Agent Lightning: Train ANY AI Agents with Reinforcement Learning
   Paper • 2508.03680 • Published • 84
 - Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
   Paper • 2508.01858 • Published • 20
 - CoAct-1: Computer-using Agents with Coding as Actions
   Paper • 2508.03923 • Published • 14
 - OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
   Paper • 2508.04482 • Published • 9
 - WideSearch: Benchmarking Agentic Broad Info-Seeking
   Paper • 2508.07999 • Published • 109
 - A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems
   Paper • 2508.07407 • Published • 97
 - BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
   Paper • 2508.06600 • Published • 40
 - WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
   Paper • 2508.05748 • Published • 137
 - Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
   Paper • 2508.07976 • Published • 51
 - OpenCUA: Open Foundations for Computer-Use Agents
   Paper • 2508.09123 • Published • 31
 - Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
   Paper • 2508.09736 • Published • 56
 - AWorld: Dynamic Multi-Agent System with Stable Maneuvering for Robust GAIA Problem Solving
   Paper • 2508.09889 • Published • 32
 - UI-Venus Technical Report: Building High-performance UI Agents with RFT
   Paper • 2508.10833 • Published • 43
 - Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
   Paper • 2508.13167 • Published • 127
 - MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
   Paper • 2508.13186 • Published • 18
 - CAMAR: Continuous Actions Multi-Agent Routing
   Paper • 2508.12845 • Published • 7
 - Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
   Paper • 2508.12800 • Published • 5
 - MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers
   Paper • 2508.14704 • Published • 42
 - Mobile-Agent-v3: Foundamental Agents for GUI Automation
   Paper • 2508.15144 • Published • 64
 - AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
   Paper • 2508.16153 • Published • 154
 - PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs
   Paper • 2508.17188 • Published • 17
 - Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
   Paper • 2508.18370 • Published • 3
 - ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks
   Paper • 2508.15804 • Published • 15
 - MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
   Paper • 2508.20453 • Published • 63
 - AWorld: Orchestrating the Training Recipe for Agentic AI
   Paper • 2508.20404 • Published • 38
 - UItron: Foundational GUI Agent with Advanced Perception and Planning
   Paper • 2508.21767 • Published • 12
 - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
   Paper • 2509.02547 • Published • 219
 - UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
   Paper • 2509.02544 • Published • 123
 - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
   Paper • 2509.08755 • Published • 56
 - MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools
   Paper • 2509.09734 • Published • 15
 - QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
   Paper • 2509.09995 • Published • 14
 - WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
   Paper • 2509.13312 • Published • 105
 - Scaling Agents via Continual Pre-training
   Paper • 2509.13310 • Published • 112
 - WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
   Paper • 2509.13305 • Published • 89
 - Towards General Agentic Intelligence via Environment Scaling
   Paper • 2509.13311 • Published • 70
 - WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents
   Paper • 2509.13309 • Published • 67
 - ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
   Paper • 2509.13313 • Published • 77
 - ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
   Paper • 2509.15221 • Published • 109
 - Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech
   Paper • 2509.14627 • Published • 1
 - LIMI: Less is More for Agency
   Paper • 2509.17567 • Published • 100
 - ARE: Scaling Up Agent Environments and Evaluations
   Paper • 2509.17158 • Published • 34
 - SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?
   Paper • 2509.16941 • Published • 20
 - 
   Paper • 2509.17336 • Published • 10
 - GEM: A Gym for Agentic LLMs
   Paper • 2510.01051 • Published • 87
 - Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
   Paper • 2509.25301 • Published • 17
 - JoyAgent-JDGenie: Technical Report on the GAIA
   Paper • 2510.00510 • Published • 3
 - Multi-Agent Tool-Integrated Policy Optimization
   Paper • 2510.04678 • Published • 30
 - Don't Just Fine-tune the Agent, Tune the Environment
   Paper • 2510.10197 • Published • 28
 - AlphaQuanter: An End-to-End Tool-Orchestrated Agentic Reinforcement Learning Framework for Stock Trading
   Paper • 2510.14264 • Published • 9
 - DeepAnalyze: Agentic Large Language Models for Autonomous Data Science
   Paper • 2510.16872 • Published • 90