Natural language processing (NLP) models are known to be vulnerable to adversarial examples, as are image processing models. Studying adversarial texts is an essential step toward improving the robustness of NLP models. However, existing studies mainly focus on generating adversarial texts for English, and it remains unknown whether those attacks can be applied to Chinese. After analyzing the differences between Chinese and English, we propose Argot, a novel solution for generating adversarial Chinese texts that combines the method used for adversarial English examples with several novel methods based on characteristics of the Chinese language. Argot can effectively and efficiently generate adversarial Chinese texts with good readability in both white-box and black-box settings. Argot can also automatically generate targeted Chinese adversarial texts, achieving a high success rate while preserving the readability of the generated texts. Furthermore, we apply Argot to the spam detection task, against both local detection models and a public toxic content detection system from a well-known security company. Argot achieves a relatively high bypass success rate while keeping the texts fluent and readable, which demonstrates that the real-world toxic content detection system is vulnerable to adversarial example attacks. We also evaluate several available defense strategies, and the results indicate that Argot can still achieve high attack success rates.