
Collections

Discover the best community collections!

Collections including paper arxiv:2503.16416

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21 • 46
LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

Paper • 2508.01780 • Published Aug 3 • 19
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Paper • 2304.08244 • Published Apr 14, 2023 • 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 154

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Paper • 2505.04921 • Published May 8 • 185
Survey of User Interface Design and Interaction Techniques in Generative AI Applications

Paper • 2410.22370 • Published Oct 28, 2024 • 12
Survey of Hallucination in Natural Language Generation

Paper • 2202.03629 • Published Feb 8, 2022

Fun journal papers I've read

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published Mar 5 • 232
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 169
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Paper • 2503.16419 • Published Mar 20 • 75
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

Agent Evaluation

MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

Paper • 2507.12806 • Published Jul 17 • 20
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 300
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 298
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

Paper • 2503.24235 • Published Mar 31 • 54
Seedream 3.0 Technical Report

Paper • 2504.11346 • Published Apr 15 • 70

All about agents including models, datasets, evals

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95
Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems

Paper • 2504.01990 • Published Mar 31 • 300
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models

Paper • 2403.13372 • Published Mar 20, 2024 • 159

Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

Paper • 2503.10613 • Published Mar 13 • 79
BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 35
Survey on Evaluation of LLM-based Agents

Paper • 2503.16416 • Published Mar 20 • 95

