
TextFirewall: Omni-Defending Against Adversarial Texts in Sentiment Classification


Sentiment classification has been broadly applied in real life, such as in product recommendation and opinion-oriented analysis. Unfortunately, the widely employed sentiment classification systems based on deep neural networks (DNNs) are susceptible to adversarial attacks that introduce imperceptible perturbations into legitimate texts, producing so-called adversarial texts. Adversarial texts can cause erroneous outputs even without access to the target model, raising security concerns for systems deployed in safety-critical applications. However, research on defending against adversarial texts is still at an early stage and is not ready to tackle emerging threats, especially unknown attacks. Investigating the subtle differences between adversarial and legitimate texts and enhancing the robustness of target models are the two mainstream ideas for defending against adversarial texts; however, both suffer from generalization issues when dealing with unknown adversarial attacks. In this paper, we propose a general method, called TextFirewall, for defending against adversarial texts crafted by various adversarial attacks, which shows potential for identifying newly developed adversarial attacks in the future. Given a piece of text, TextFirewall identifies adversarial text by examining the inconsistency between the target model's output and the impact value calculated from important words in the text. TextFirewall can be deployed as a third-party tool without modifying the target model and is agnostic to the specific type of adversarial text. Experimental results demonstrate that TextFirewall effectively identifies adversarial texts generated by three state-of-the-art (SOTA) attacks and outperforms previous defense techniques. Specifically, TextFirewall achieves an average accuracy of 90.7% on IMDB and 96.9% on Yelp in defending against the three SOTA attacks.

Keywords: sentiment classification; adversarial texts; defending against adversarial texts; TextFirewall

Journal Title: IEEE Access
Year Published: 2021
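
For intuition, the following minimal sketch illustrates the consistency-check idea described in the abstract: compare the target model's predicted label against an impact value aggregated from important words, and flag the text when the two disagree. The lexicon, the word-level scores, the aggregation, and the threshold here are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of TextFirewall's consistency check (assumptions throughout).
from typing import Callable, Dict, List

# Toy word-level lexicon standing in for the paper's "important words" and
# their impact scores (assumption; the paper derives these differently).
IMPACT_LEXICON: Dict[str, float] = {
    "great": 1.0, "excellent": 1.0, "love": 0.8,
    "terrible": -1.0, "awful": -1.0, "boring": -0.6,
}


def impact_value(text: str) -> float:
    """Aggregate the impact scores of important words found in the text."""
    tokens: List[str] = text.lower().split()
    return sum(IMPACT_LEXICON.get(tok, 0.0) for tok in tokens)


def is_adversarial(
    text: str,
    target_model: Callable[[str], int],  # black-box classifier: 0 = negative, 1 = positive
    threshold: float = 0.5,              # assumed decision margin
) -> bool:
    """Flag the text if the model's label is inconsistent with the impact value."""
    label = target_model(text)
    score = impact_value(text)
    # Inconsistency: model says positive but the impact value is clearly
    # negative, or model says negative but the impact value is clearly positive.
    if label == 1 and score < -threshold:
        return True
    if label == 0 and score > threshold:
        return True
    return False


if __name__ == "__main__":
    # Stub model that always predicts "positive", mimicking a fooled classifier.
    fooled_model = lambda text: 1
    print(is_adversarial("a terrible awful boring film", fooled_model))  # True
```

Because the check only queries the target model for its output, this style of detector can sit in front of an existing classifier as a third-party filter, matching the deployment setting described in the abstract.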
