Unmanned-aerial-vehicle (UAV)-assisted communication has attracted increasing attention recently. This article investigates an air–ground coordinated communications system in which the trajectories of aerial UAV base stations (UAV-BSs) and the access control of ground users (GUs) are jointly optimized. We formulate this optimization problem as a mixed cooperative–competitive game: each GU competes for the limited resources of the UAV-BSs, accessing a suitable UAV-BS to maximize its own throughput, while the UAV-BSs cooperate in designing their trajectories to maximize a defined fair throughput, thereby improving the total throughput while maintaining fairness among GUs. Moreover, the action space of the GUs is discrete, whereas that of the UAV-BSs is continuous. To handle this hybrid action space, we transform the discrete actions into continuous action probabilities and propose a multiagent deep reinforcement learning (MADRL) approach named air–ground probabilistic multiagent deep deterministic policy gradient (AG-PMADDPG). With well-designed rewards, AG-PMADDPG coordinates the two types of agents, UAV-BSs and GUs, so that each achieves its own objective based on local observations. Simulation results demonstrate that AG-PMADDPG outperforms the benchmark algorithms in terms of both throughput and fairness.
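The paper's own implementation is not shown here, but as a minimal, hypothetical sketch of the discrete-to-continuous relaxation the abstract describes, one could give each GU an actor network that maps its local observation to a continuous probability vector over the discrete UAV-BS choices, so a deterministic policy gradient (MADDPG-style) can be applied to a discrete access decision. All names below (GUActor, obs_dim, num_uav_bs) are illustrative assumptions, not the authors' code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GUActor(nn.Module):
    """Hypothetical GU actor: maps a local observation to a continuous
    probability vector over the discrete UAV-BS access choices."""
    def __init__(self, obs_dim: int, num_uav_bs: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_uav_bs),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Softmax relaxes the discrete choice into a continuous point on the
        # probability simplex; the critic can consume this differentiable
        # vector instead of a non-differentiable one-hot action.
        return F.softmax(self.net(obs), dim=-1)

if __name__ == "__main__":
    actor = GUActor(obs_dim=8, num_uav_bs=3)
    obs = torch.randn(1, 8)
    probs = actor(obs)                       # continuous action probabilities
    chosen_bs = torch.argmax(probs, dim=-1)  # discrete access decision at execution
    print(probs, chosen_bs)
```

In a sketch like this, the centralized critic would take the soft probability vector as the GU's action during training (keeping the policy differentiable), while at execution the GU accesses the highest-probability UAV-BS; a Gumbel-softmax sample would be a common alternative relaxation.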