Abstract The action-dependent heuristic dynamic programming (ADHDP) method for nonlinear multi-input multi-output (MIMO) systems needs different forms to suit different practical plants. Because of an inappropriate network structure or training algorithm, unsuccessful designs and undesirable control performance are common in practice. This paper therefore first addresses the chain-rule problem of the compound derivative that arises in training the nonlinear MIMO ADHDP. It then systematically proposes four actor-critic algorithms corresponding to four typical classes of nonlinear systems: the action-network extension method, the sub-network method, the cascaded action-network method, and the combined method. For each method, the detailed structure, derivation procedure, and training algorithm are given. The Lyapunov stability of the nonlinear MIMO ADHDP is also proved. Simulations of an idling engine and of aircraft control show the effectiveness of these methods. In addition, the properties, advantages, disadvantages, and applicability of the methods are compared and highlighted. The four methods can meet the design requirements of almost all nonlinear MIMO ADHDP control systems, and the four actor-critic structures and algorithms can serve as a reference for researchers new to the field who seek a nonlinear MIMO ADHDP design with the best control performance.
               