We investigate the problem of dynamic spectrum anti-jamming access against an intelligent jammer using game theory and opponent modeling. Previous work has formulated the interaction between the user and the intelligent jammer as an adversarial game and aimed to find the Nash Equilibrium (NE). However, adhering to the NE leads to overcautious behavior and cannot achieve the best performance when the jammer is sub-optimal. This letter therefore seeks to exploit the adaptive jammer by finding the Best Response (BR) rather than the NE. We propose a minimax deep Q network (DQN) to approximate the anti-jamming utility, and apply imitation learning to infer the jammer's policy. Based on the approximated utility and the imitated jamming policy, the user can find a policy that goes beyond equilibrium solutions and improves anti-jamming performance. Numerical results demonstrate that our scheme achieves a 30% improvement in access success rate over the NE-based and single-user DRL schemes.
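
The gap between playing the NE and playing a BR can be illustrated with a toy example. The sketch below is not the minimax DQN of this letter: it assumes a hypothetical one-shot, three-channel game in which the user succeeds whenever it picks a channel the jammer does not jam, and it replaces imitation learning with simple frequency counting of observed jammer actions. The names `success_matrix`, `estimate_jammer_policy`, and `best_response`, and the jammer policy `[0.6, 0.3, 0.1]`, are illustrative assumptions.

```python
import numpy as np

def success_matrix(n_channels):
    # U[u, j] = 1 if the user's channel u differs from the jammed channel j, else 0.
    return 1.0 - np.eye(n_channels)

def estimate_jammer_policy(observed_actions, n_channels):
    # Stand-in for imitation learning (assumption): smoothed empirical frequencies.
    counts = np.bincount(observed_actions, minlength=n_channels).astype(float)
    counts += 1.0  # Laplace smoothing so no channel gets probability zero
    return counts / counts.sum()

def best_response(U, jammer_policy):
    # Expected success of each user channel against the estimated jammer policy.
    expected = U @ jammer_policy
    return int(np.argmax(expected)), expected

n_channels = 3
rng = np.random.default_rng(0)

# Hypothetical sub-optimal jammer that favors channel 0.
true_policy = np.array([0.6, 0.3, 0.1])
observed = rng.choice(n_channels, size=2000, p=true_policy)

U = success_matrix(n_channels)
estimated_policy = estimate_jammer_policy(observed, n_channels)
br_channel, _ = best_response(U, estimated_policy)

# In this symmetric toy game the equilibrium user strategy is uniform.
ne_strategy = np.full(n_channels, 1.0 / n_channels)
ne_value = float(ne_strategy @ U @ true_policy)
br_value = float(U[br_channel] @ true_policy)

print(f"NE success rate: {ne_value:.3f}")
print(f"BR success rate: {br_value:.3f} (channel {br_channel})")
```

Under these assumptions the uniform equilibrium strategy guarantees the same success rate against any jammer, while the best response exploits the jammer's bias toward channel 0 by moving to the least-jammed channel. The sequential setting of the letter additionally has to account for a jammer that adapts over time, which this one-shot sketch ignores.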
               