Abstract In this paper, we study multi-agent systems with asymmetric information structure. Due to limited channel capacity in communication network, the information in routing path suffers a transmission delay. Instead… Click to show full abstract
Abstract In this paper, we study multi-agent systems with asymmetric information structure. Due to limited channel capacity in communication network, the information in routing path suffers a transmission delay. Instead of the game theoretic setting, we formulate the problem as an online quadratic optimization problem subject to stochastic systems involving input delay. Since the probability statistics of system noise is unknown, the decision-maker can not utilize the traditional optimal control strategies. Motivated by online convex optimization theory, we introduce the notion of regret, which measures the cumulative performance difference between the optimal statistics known (offline) index value and the statistics unknown (online) index value. The contributions of this paper are twofold. First, utilizing the linear minimum mean square biased estimate, we derive a learning based control policy and then characterize its behavior. Second, under some basic assumptions, we further prove that the regret grows at a sub-linear rate and it is explicitly bounded by O ( ln T ) .
               
Click one of the above tabs to view related content.