Existing reinforcement learning-based downlink power allocation (PA) schemes mostly treat the power optimization space as a discrete value space. In ultra-dense networks, however, their results deviate from the optimum, and the deviation grows as the network size increases. This letter proposes a PA model based on the deep deterministic policy gradient (DDPG), in which policy-based power selection, assisted by value-based evaluation, explores the optimal result over a continuous power space. Specifically, the model uses two CNNs, named actors, to form a continuous deterministic PA policy function instead of sampling from a discrete power distribution, and designs two further CNNs, named critics, to evaluate the PA policy and supervise the actor CNNs' updates. Additionally, to reduce interference, a tunable serving base-station set is designed for each user and incorporated into model training. Experiments demonstrate that the proposed DDPG-based PA model achieves 116.2% and 95.9% of the iterative algorithm's sum-rate in small- and large-scale networks, respectively.
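To make the actor-critic structure concrete, the following is a minimal sketch of a DDPG setup with CNN actors and critics in PyTorch, assuming the network state is encoded as a small multi-channel "image" (e.g., channel gains, interference, previous powers). All shapes, layer sizes, and hyperparameters (STATE_CHANNELS, GRID, learning rate, etc.) are illustrative assumptions, not values from the letter.

```python
# Minimal sketch of the two-actor / two-critic DDPG structure, under the
# assumptions stated above. PyTorch is assumed; shapes are illustrative.
import torch
import torch.nn as nn

STATE_CHANNELS = 3   # assumed: e.g., channel gains, interference, prior powers
GRID = 8             # assumed spatial size of the network-state "image"

class Actor(nn.Module):
    """CNN policy: maps the network state to a continuous power level."""
    def __init__(self, p_max: float = 1.0):
        super().__init__()
        self.p_max = p_max
        self.net = nn.Sequential(
            nn.Conv2d(STATE_CHANNELS, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * GRID * GRID, 1), nn.Sigmoid(),  # output in (0, 1)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.p_max * self.net(state)  # scale to [0, p_max]

class Critic(nn.Module):
    """CNN Q-network: scores a (state, power) pair to supervise the actor."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(STATE_CHANNELS, 16, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(16 * GRID * GRID + 1, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state: torch.Tensor, power: torch.Tensor) -> torch.Tensor:
        return self.head(torch.cat([self.conv(state), power], dim=1))

# DDPG keeps an online and a slowly-updated target copy of each network,
# matching the "two actor CNNs" and "two critic CNNs" in the abstract.
actor, actor_target = Actor(), Actor()
critic, critic_target = Critic(), Critic()
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())

# Actor update: ascend the critic's estimate of Q(s, actor(s)),
# i.e., the critic supervises the actor's parameter update.
state = torch.randn(4, STATE_CHANNELS, GRID, GRID)  # dummy batch
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
actor_loss = -critic(state, actor(state)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```

Because the actor outputs a deterministic power value in a continuous range rather than a probability over discrete power levels, the model avoids the discretization error that the abstract identifies in prior RL-based PA schemes.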