As a model-based reinforcement learning technique, the linearly solvable Markov decision process (LMDP) provides an efficient way to find an optimal policy by making the Bellman equation linear under certain assumptions. Because LMDP is a model-based approach, its performance is sensitive to the accuracy of the environmental model. To overcome this sensitivity, the linearly solvable Markov game (LMG) has been proposed as a game-theoretic extension of LMDP. This paper investigates the robustness of LMDP- and LMG-based controllers against modeling errors in both discrete and continuous state-action problems. When there was a discrepancy between the model used to build the control policy and the dynamics of the tested environment, the LMG-based control policy maintained good performance, whereas the performance of the LMDP-based control policy deteriorated drastically. The experimental results support the usefulness of the LMG framework when acquiring an accurate model of the environment is difficult.
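The linearization referred to above is, in the standard LMDP formulation due to Todorov, obtained by writing the value function through a desirability variable z(s) = exp(-v(s)), which turns the Bellman equation into the linear relation z = diag(exp(-q)) P z, where q is the state cost and P the passive dynamics. The following is a minimal sketch of a discrete first-exit LMDP solver under that standard formulation; it is not the paper's implementation, and the names (solve_lmdp, P, q, terminal) are illustrative assumptions.

```python
import numpy as np

def solve_lmdp(P, q, terminal, n_iters=10000, tol=1e-10):
    """Solve the linear Bellman equation z = diag(exp(-q)) P z (first-exit case).

    P        : (N, N) passive transition matrix, P[s, s'] = p(s' | s)
    q        : (N,) state costs
    terminal : (N,) boolean mask of absorbing terminal states
    Returns the desirability z(s) = exp(-v(s)) and the optimal controlled
    transitions u*(s' | s) proportional to p(s' | s) z(s').
    """
    G = np.exp(-q)          # diagonal of exp(-q)
    z = np.ones(len(q))
    for _ in range(n_iters):
        z_new = G * (P @ z)
        z_new[terminal] = G[terminal]   # on terminal states v(s) = q(s), so z(s) = exp(-q(s))
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    u = P * z[None, :]                   # reweight passive dynamics by desirability
    u /= u.sum(axis=1, keepdims=True)
    return z, u
```

Because the equation is linear in z, the optimal policy follows from a fixed-point (or eigenvector) computation on the model P and q rather than from iterating a nonlinear Bellman backup, which is exactly why an inaccurate model of P directly degrades the resulting LMDP policy.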