As deep reinforcement learning (DRL) is adopted across a growing range of fields, studying the vulnerability of DRL agents has become an important research topic for improving their robustness. Adversarial attacks based on white-box models, where the adversary has full access to the victim's internals, have been investigated intensively. In most practical settings, however, the adversary cannot obtain the internal information of the victim's neural network. Moreover, in reward-based attacks the agent can run anomaly detection on the perturbed rewards to determine whether it is under attack. In this paper, we propose a black-box attack method with corrupted rewards that exploits DRL exploration mechanisms to attack agents more effectively. The adversary first trains a deep neural network to learn the successor representation (SR) of each state; it then uses the SR values to decide when to attack and to generate imperceptible adversarial perturbations. Experimental results show that the proposed SR-based black-box attack effectively attacks agents with fewer adversarial samples.
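The successor representation mentioned in the abstract, M(s, s'), estimates the expected discounted number of future visits to state s' when starting from state s. The paper learns it with a deep network; the sketch below is only a simplified tabular illustration of the quantity and of how SR values could gate attack timing. The TD update rule is standard for SR learning, but the norm-based timing heuristic and all names (`sr_td_update`, `attack_now`, `threshold`) are hypothetical, not the authors' exact procedure.

```python
import numpy as np

# Tabular successor-representation (SR) sketch. M[s, s'] approximates the
# expected discounted count of future visits to s' starting from s.
n_states, gamma, alpha = 16, 0.95, 0.1
M = np.zeros((n_states, n_states))

def sr_td_update(s: int, s_next: int) -> None:
    """One TD(0) update of the SR after observing the transition s -> s_next:
    M(s) <- M(s) + alpha * (1_s + gamma * M(s_next) - M(s))."""
    onehot = np.zeros(n_states)
    onehot[s] = 1.0
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])

def attack_now(s: int, threshold: float = 2.0) -> bool:
    """Illustrative timing rule (an assumption, not from the paper): attack
    only in states whose SR row has a large norm, i.e. states that strongly
    shape the agent's future state occupancy, so fewer perturbed samples
    are needed for the same effect."""
    return float(np.linalg.norm(M[s])) > threshold

# Usage: fit the SR from observed transitions, then query the timing rule.
rng = np.random.default_rng(0)
for _ in range(5000):
    s = rng.integers(n_states)
    s_next = (s + rng.integers(1, 3)) % n_states  # toy transition dynamics
    sr_td_update(s, s_next)
print(attack_now(0))
```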
               