Adversarial examples can evade detection by text classification models based on Deep Neural Networks (DNNs), posing a potential security threat to such systems. To address this problem, we propose WordRevert, an adversarial example defense method for Chinese text classification. The method first obtains the "positive text" containing the adversarial words by filtering out the clauses that do not contribute to the current classification label. A detection network, combined with a position importance calculation function, then locates the adversarial words. Finally, the adversarial words are restored to the original words by combining a candidate score with the detection score. Experiments show that this method effectively defends against currently popular Chinese text adversarial attack algorithms: it significantly increases accuracy on adversarial examples with only a small reduction in classification accuracy on clean samples, while achieving strong precision, recall, and F1 scores for adversarial word detection and restoration.
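The three-stage pipeline sketched in the abstract (clause filtering, adversarial word detection, word restoration) can be illustrated in simplified form. The sketch below is purely hypothetical: the toy keyword-based classifier, the leave-one-out contribution scores, and all function names are assumptions standing in for the paper's actual detection network and scoring functions, which the abstract does not specify.

```python
# Hypothetical sketch of a WordRevert-style pipeline; the real method uses a
# trained detection network and learned scores, not this toy classifier.

POSITIVE_KEYWORDS = frozenset({"good", "great"})

def classify_prob(words):
    # Toy stand-in classifier: P(positive label) grows with keyword hits.
    hits = sum(1 for w in words if w in POSITIVE_KEYWORDS)
    return hits / max(len(words), 1)

def clause_contribution(clauses, idx):
    # Leave-one-out contribution: drop in label probability when one clause
    # is removed from the text.
    full = [w for c in clauses for w in c]
    reduced = [w for i, c in enumerate(clauses) if i != idx for w in c]
    return classify_prob(full) - classify_prob(reduced)

def positive_text(clauses, threshold=0.0):
    # Stage 1: keep only clauses that contribute to the current label,
    # yielding the "positive text" that contains the adversarial words.
    return [c for i in range(len(clauses))
            if clause_contribution(clauses, i) > threshold
            for c in [clauses[i]]]

def position_importance(words):
    # Stage 2 (stand-in for the detection network + importance function):
    # importance of each position = probability change when it is masked out.
    base = classify_prob(words)
    return [base - classify_prob(words[:i] + words[i + 1:])
            for i in range(len(words))]

def restore(words, idx, candidates):
    # Stage 3 (stand-in for candidate score + detection score): replace the
    # detected adversarial word with the candidate maximizing label probability.
    best = max(candidates,
               key=lambda c: classify_prob(words[:idx] + [c] + words[idx + 1:]))
    return words[:idx] + [best] + words[idx + 1:]
```

For example, given clauses `[["good", "movie"], ["bad", "film"]]`, only the first clause contributes positively to the label and survives Stage 1; and `restore(["g00d", "movie"], 0, ["good", "grand"])` reverts the perturbed token back to `"good"` under the toy classifier.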