Multiagent reinforcement learning (MARL) has been widely applied to engineering problems. However, strictly constrained problems, such as distributed optimization in engineering applications, remain a great challenge for MARL. In particular, strict global constraints on agents' actions easily lead to sparse rewards. Moreover, existing studies cannot resolve the instability caused by partial observability while keeping the algorithm fully distributed, and algorithms that rely on centralized training may encounter significant obstacles in real-world deployment. For the first time, we provide a theoretical analysis of MARL that characterizes the adverse effects of partial observability on convergence, and we propose a fully distributed and convergent MARL algorithm based on a Reward Recorder. Each agent runs an independent reinforcement learning algorithm and uses an average-consensus protocol to estimate the global state-action value locally, thereby achieving global optimization. To verify the performance of the algorithm, we propose a novel generalized constrained optimization model that includes local inequality constraints and strict global constraints. The proposed distributed reinforcement learning algorithm is validated on several simulation examples. The results reveal that the proposed algorithm has high stability and excellent decision-making ability.
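A generic form of the constrained model the abstract describes might look as follows. This is our reading, not the paper's exact formulation; the local objectives f_i, local constraints g_i, coupling terms h_i, and target d are placeholders.

\begin{align}
  \min_{x_1,\dots,x_n} \quad & \sum_{i=1}^{n} f_i(x_i) \\
  \text{s.t.} \quad & g_i(x_i) \le 0, \quad i = 1,\dots,n \quad \text{(local inequality constraints)} \\
  & \sum_{i=1}^{n} h_i(x_i) = d \quad \text{(strict global constraint)}
\end{align}

Below is a minimal sketch of the average-consensus idea behind the local estimation of a global quantity such as the state-action value, assuming a fixed communication graph with doubly stochastic mixing weights. All names here (q_local, W, n_agents) are illustrative; the paper's Reward Recorder mechanism is not reproduced.

import numpy as np

# Hypothetical sketch: each agent keeps a local estimate of a global
# value and repeatedly averages it with its neighbors' estimates over
# a communication graph with doubly stochastic weight matrix W.

n_agents = 4

# Doubly stochastic weights for a ring graph: each agent mixes its own
# estimate with those of its two neighbors.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

# Local value estimates observed by each agent for the current joint
# state-action pair (placeholder values).
q_local = np.array([1.0, 3.0, 2.0, 6.0])

# Consensus iterations: q <- W q drives every entry toward the average
# of the initial local values, (1/n) * sum_i q_local[i].
q_hat = q_local.copy()
for _ in range(50):
    q_hat = W @ q_hat

print(q_hat)            # all entries close to q_local.mean() == 3.0
print(q_local.mean())

Because W is doubly stochastic and the graph is connected, every agent's estimate converges to the network-wide average using only neighbor-to-neighbor communication, which is the property that makes a fully distributed scheme (no centralized training) plausible.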