On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

Xiaotie Deng; Yuhao Li; David Mguni; Jun Wang; Yaodong Yang

Electronic Colloquium on Computational Complexity

Under the auspices of the Computational Complexity Foundation (CCF)

REPORTS > DETAIL:

Paper:

TR21-128 | 4th September 2021 07:49

On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games

TR21-128 Authors: Xiaotie Deng, Yuhao Li, David Mguni, Jun Wang, Yaodong Yang
Publication: 4th September 2021 09:57
Downloads: 884

Keywords:

Markov Perfect Equilibrium, PPAD-Completeness, Stochastic Games

Abstract:

Similar to the role of Markov decision processes in reinforcement learning, Markov Games (also called Stochastic Games)lay down the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we introduce the solution concept, approximate Markov Perfect Equilibrium (MPE), to finite-state Stochastic Games repeated in the infinite horizon, and prove its $\mathbf{PPAD}$-completeness in computational complexity. Technically, we adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation to a fixed-point problem, even though the stochastic game may demand an exponential number of pure strategies, in the number of states, for each agent. The completeness result follows the reduction of the fixed-point problem to \textsc{End of the Line}.

Past works on the stochastic games are mostly zero-sum MARL algorithms. A $P^{PPAD}$ algorithm for the general sum stochastic games in the finite horizon can be derived to establish an approximation algorithm for the general-sum stochastic games. That implies an approximate NE solution to the infinite-horizon setting. Such a possible extension suffers from three weaknesses: 1. the solution is time-dependent and hence not a perfect equilibrium; 2. the time-dependent solution suffers a weakness of noncredible threats; 3. the time complexity is not tight (lower bound $\mathbf{PPAD}$ and upper bound $P^{PPAD}$). Our result beats such a solution in all those three properties.

ISSN 1433-8092 | Imprint