FreeAskWorld: An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI
Abstract
As embodied intelligence emerges as a core frontier of artificial intelligence research, simulation platforms must evolve beyond low-level physical interaction to capture complex, human-centered social behaviors. We introduce FreeAskWorld, an interactive simulation framework that integrates large language models (LLMs) for high-level behavior planning and semantically grounded interaction, informed by theories of intention and social cognition. Our framework supports scalable, realistic human-agent simulation and includes a modular data generation pipeline tailored to diverse embodied tasks. To validate the framework, we extend the classic Vision-and-Language Navigation (VLN) task into an interaction-enriched Direction Inquiry setting, in which agents can actively seek and interpret navigational guidance. We present and publicly release the FreeAskWorld benchmark, a large-scale dataset comprising reconstructed environments, six diverse task types, 16 core object categories, 63,429 annotated sample frames, and more than 17 hours of interaction data to support the training and evaluation of embodied AI systems. We benchmark VLN models and human participants under both open-loop and closed-loop settings. Experimental results demonstrate that models fine-tuned on FreeAskWorld outperform their original counterparts, achieving stronger semantic understanding and interaction competence. These findings underscore the efficacy of socially grounded simulation frameworks in advancing embodied AI toward sophisticated high-level planning and more naturalistic human-agent interaction. Importantly, our work shows that interaction itself serves as an additional information modality.
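The closed-loop Direction Inquiry setting described above can be pictured as a simple control loop: the agent acts on its own policy while confident and queries a nearby helper for directions when its uncertainty is high. The Python sketch below is purely illustrative; none of the names (`SimulatedHelper`, `closed_loop_episode`, `ask_threshold`) come from FreeAskWorld, and the coordinate oracle merely stands in for the LLM-driven pedestrians the paper describes.

```python
# Illustrative sketch of a closed-loop "Direction Inquiry" episode.
# Hypothetical names throughout; this is NOT the FreeAskWorld API.

import random


class SimulatedHelper:
    """Stands in for an LLM-driven pedestrian who answers direction queries."""

    def __init__(self, goal):
        self.goal = goal

    def answer(self, position):
        # A real helper would ground its reply in scene semantics; here we
        # just compare grid coordinates and return a templated instruction.
        dx = self.goal[0] - position[0]
        dy = self.goal[1] - position[1]
        if abs(dx) >= abs(dy):
            return "go east" if dx > 0 else "go west"
        return "go north" if dy > 0 else "go south"


MOVES = {
    "go east": (1, 0),
    "go west": (-1, 0),
    "go north": (0, 1),
    "go south": (0, -1),
}


def closed_loop_episode(start, goal, max_steps=30, ask_threshold=0.4):
    """Walk toward `goal`, asking for directions whenever confidence is low."""
    helper = SimulatedHelper(goal)
    position = list(start)
    queries = 0
    for step in range(max_steps):
        if tuple(position) == tuple(goal):
            return True, step, queries
        confidence = random.random()  # stand-in for the policy's uncertainty
        if confidence < ask_threshold:
            instruction = helper.answer(position)  # interaction as an extra modality
            queries += 1
        else:
            instruction = random.choice(list(MOVES))  # noisy self-directed action
        dx, dy = MOVES[instruction]
        position[0] += dx
        position[1] += dy
    return False, max_steps, queries


if __name__ == "__main__":
    random.seed(0)
    success, steps, queries = closed_loop_episode(start=(0, 0), goal=(3, 2))
    print(f"reached goal: {success}, steps: {steps}, direction queries: {queries}")
```

Even in this toy form, the loop shows why asking helps: the helper's answers correct the agent's noisy self-directed actions, which mirrors the paper's claim that interaction supplies information the agent's own observations do not.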
Community
Everyone is welcome to discuss how to simulate a world more realistically than GTA5.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities (2025)
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning (2025)
- Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning (2025)
- PhysiAgent: An Embodied Agent Framework in Physical World (2025)
- Beyond Pipelines: A Survey of the Paradigm Shift toward Model-Native Agentic AI (2025)
- PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments (2025)
- PFEA: An LLM-based High-Level Natural Language Planning and Feedback Embodied Agent for Human-Centered AI (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend