Efficient exploration is critical when deploying Deep Reinforcement Learning (DRL) for joint power control and beamforming in mmWave networks. This letter proposes Bootstrapped and Bayesian Deep Q-Network (B2DQN), a DRL algorithm based on both Bootstrap Sampling (BS) and Thompson Sampling (TS). BS induces diversity to prevent exploration from becoming trapped in local optima, and TS enables targeted exploration by selecting actions with higher estimated returns with higher probability. B2DQN synthesizes the advantages of BS and TS by building a Bayesian linear regression model on each bootstrapped Q-function, improving the diversity of BS and leveraging that diversity for targeted exploration. The experimental results demonstrate that B2DQN outperforms DQN variants based only on BS or TS, often learning policies that attain 78%–360% higher final performance, measured at the convergence episode, in an open-source simulated 5G mmWave network.
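To make the mechanism concrete, the sketch below illustrates the core idea in NumPy: each bootstrapped Q-head carries a Bayesian linear regression (BLR) posterior over its last-layer weights, and action selection proceeds by picking a head at random (bootstrap diversity) and Thompson-sampling weights from its posterior (targeted exploration). This is a minimal illustration under assumed names and dimensions (`BLRHead`, `feature_dim`, `n_heads`, etc.), not the authors' implementation.

```python
import numpy as np


class BLRHead:
    """Bayesian linear regression posterior over the last-layer weights
    of one bootstrapped Q-head, modeling Q(s, a) ~ w^T phi(s, a).
    Hypothetical sketch; hyperparameters are illustrative."""

    def __init__(self, feature_dim, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        # Gaussian prior N(0, prior_var * I) over the weights w,
        # stored as precision (inverse covariance) for conjugate updates.
        self.precision = np.eye(feature_dim) / prior_var
        self.b = np.zeros(feature_dim)  # precision @ posterior_mean

    def update(self, phi, target):
        # Standard conjugate BLR update with one observation (phi, target).
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * target / self.noise_var

    def sample_weights(self, rng):
        # Thompson sampling: draw w ~ N(mean, cov) from the posterior.
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)


def select_action(features, heads, rng):
    """Pick one bootstrapped head uniformly at random, sample weights
    from its BLR posterior, and act greedily w.r.t. the sampled Q.

    features: array of shape (n_actions, feature_dim), phi(s, a) per action.
    """
    head = heads[rng.integers(len(heads))]  # diversity from bootstrapping
    w = head.sample_weights(rng)            # targeted exploration via TS
    q_values = features @ w
    return int(np.argmax(q_values))


# Usage example with made-up dimensions.
rng = np.random.default_rng(0)
n_heads, n_actions, feature_dim = 10, 4, 8
heads = [BLRHead(feature_dim) for _ in range(n_heads)]
phi = rng.normal(size=(n_actions, feature_dim))
action = select_action(phi, heads, rng)
```

In this reading, the bootstrapped heads supply diverse feature representations while the per-head BLR posteriors make each head's exploration probability-matched to its estimated returns, which is how the letter describes combining the strengths of BS and TS.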