
Membership Inference Attacks With Token-Level Deduplication on Korean Language Models


The confidentiality of training data has become a significant security concern for neural language models. Recent studies have shown that memorized training data can be extracted from generative language models by injecting well-chosen prompts. While these attacks have achieved remarkable success against English-based Transformer architectures, it is unclear whether they remain effective in other language domains. This paper studies the effectiveness of such attacks against Korean models and explores attack improvements that may benefit future defense studies. The contribution of this study is two-fold. First, we perform a membership inference attack against the state-of-the-art Korean GPT model. We recovered approximate training data with 20% to 90% precision among the top-100 samples, confirming that attack techniques proposed for naive GPT remain valid across language domains. Second, in this process, we observed that the existing attack method can hardly detect redundancy among the selected sentences. Since information that appears in only a few documents is more likely to be meaningful, increasing the uniqueness of the selected sentences improves the effectiveness of the attack. We therefore propose a deduplication strategy that replaces the traditional word-level similarity metric with one computed at the BPE token level. Our strategy removes 6% to 22% of the underestimated samples among those selected, improving precision by up to 7 percentage points. These results show that accounting for both language- and model-specific characteristics is essential to improving the effectiveness of attack strategies. We also discuss possible mitigations against membership inference (MI) attacks on general language models.
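The abstract describes a pipeline of ranking model generations as membership-inference candidates and then deduplicating them at the BPE token level rather than the word level. The following is a minimal illustrative sketch of that idea, not the paper's actual implementation: the model name (skt/kogpt2-base-v2), the perplexity-based ranking score, the n-gram size, and the Jaccard threshold are all assumptions, using the Hugging Face transformers and PyTorch APIs.

```python
# Illustrative sketch only: model name, n-gram size, and similarity
# threshold are assumptions, not values taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "skt/kogpt2-base-v2"  # assumed Korean GPT; the paper's target may differ
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

def perplexity(text: str) -> float:
    """Score a candidate by its perplexity under the target model;
    lower perplexity suggests the text may be memorized training data."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def bpe_ngrams(text: str, n: int = 3) -> set:
    """N-grams over BPE token IDs rather than whitespace words, so that
    sentences sharing subword content are caught as near-duplicates."""
    ids = tokenizer.encode(text)
    return {tuple(ids[i:i + n]) for i in range(max(len(ids) - n + 1, 0))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def dedup_token_level(candidates, threshold: float = 0.5):
    """Greedily keep candidates whose BPE-token n-gram overlap with every
    previously kept candidate stays below the threshold."""
    kept, kept_grams = [], []
    for text in candidates:
        grams = bpe_ngrams(text)
        if all(jaccard(grams, g) < threshold for g in kept_grams):
            kept.append(text)
            kept_grams.append(grams)
    return kept

# Rank generated samples by perplexity, deduplicate at the token level,
# and inspect the top-100 survivors as membership-inference candidates.
samples = ["...generated Korean text..."]  # placeholder for model generations
ranked = sorted(samples, key=perplexity)
top100 = dedup_token_level(ranked)[:100]
```

A token-level metric is plausibly important for Korean because whitespace-delimited words combine stems with particles, so two sentences carrying the same memorized content can look dissimilar at the word level while sharing most of their BPE subword tokens.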

Keywords: token level; language; attack; language models; membership inference

Journal Title: IEEE Access
Year Published: 2023


