arxiv:2305.17568

Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

Published on May 27, 2023
AI-generated summary

A primal-dual method with shadow reward and neighbor truncation is proposed for safe multi-agent reinforcement learning, achieving convergence to a first-order stationary point and demonstrating effectiveness through experiments.

Abstract

We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints. The objective and constraints are described by general utilities, i.e., nonlinear functions of the long-term state-action occupancy measure, which encompass broader decision-making goals such as risk, exploration, or imitation. The exponential growth of the state-action space size with the number of agents presents challenges for global observability, further exacerbated by the global coupling arising from agents' safety constraints. To tackle this issue, we propose a primal-dual method utilizing shadow reward and κ-hop neighbor truncation under a form of correlation decay property, where κ is the communication radius. In the exact setting, our algorithm converges to a first-order stationary point (FOSP) at the rate of O(T^{-2/3}). In the sample-based setting, we demonstrate that, with high probability, our algorithm requires O(ε^{-3.5}) samples to achieve an ε-FOSP with an approximation error of O(φ_0^{2κ}), where φ_0 ∈ (0,1). Finally, we demonstrate the effectiveness of our model through extensive numerical experiments.
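
To make the algorithmic idea concrete, below is a minimal, hypothetical Python sketch of one ingredient the abstract describes: each agent performs a primal gradient step computed only from its κ-hop neighborhood (the truncation), while dual variables for the safety constraints are updated by projected gradient ascent. It is not the authors' implementation; the helper functions local_grad and constraint_violation, the toy parameter shapes, and the step sizes are placeholder assumptions standing in for the shadow-reward policy gradients and constraint estimates of the paper.

# Illustrative sketch (not the paper's code): one primal-dual loop with
# kappa-hop truncated local gradients and projected dual ascent.
import numpy as np

rng = np.random.default_rng(0)

n_agents, kappa = 6, 1                    # agents on a line graph; communication radius
theta = rng.normal(size=(n_agents, 4))    # toy local policy parameters
lam = np.zeros(n_agents)                  # dual variables, one safety constraint per agent
eta_theta, eta_lam = 0.1, 0.05            # primal / dual step sizes

def local_grad(i, theta, lam):
    # Hypothetical placeholder: a stochastic gradient for agent i's Lagrangian
    # term, using information from the kappa-hop neighborhood only. In the
    # paper this would be a policy gradient under the "shadow reward"
    # (derivative of the general utility w.r.t. the occupancy measure).
    nbrs = range(max(0, i - kappa), min(n_agents, i + kappa + 1))
    return sum(rng.normal(size=theta[i].shape) - 0.01 * lam[j] for j in nbrs)

def constraint_violation(i, theta):
    # Hypothetical placeholder for agent i's estimated safety-constraint slack.
    return float(np.tanh(theta[i].sum()))

for t in range(100):
    # Primal step: each agent updates its parameters with its truncated gradient.
    theta += eta_theta * np.stack([local_grad(i, theta, lam) for i in range(n_agents)])
    # Dual step: projected gradient ascent keeps the multipliers nonnegative.
    lam = np.maximum(0.0, lam + eta_lam * np.array(
        [constraint_violation(i, theta) for i in range(n_agents)]))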
