SECURING AI-DRIVEN TEXT CLASSIFICATION AGAINST ADVERSARIAL NLP ATTACKS
Keywords:
Natural Language Processing, Cyber Security, Text Classification, Adversarial Attacks, Deep Learning, Ensemble Models

Abstract
The integration of Artificial Intelligence (AI) has revolutionized Natural Language Processing (NLP), enabling advanced text classification tasks such as sentiment analysis, spam detection, and news categorization. However, the widespread adoption of AI in NLP has introduced significant cybersecurity risks, as these systems are highly vulnerable to adversarial attacks. Such attacks exploit weaknesses in NLP models by making small perturbations to input data that skew predictions and compromise accuracy and integrity. We analyse and assess adversarial attacks on text classification models using the AG News dataset. We show how relatively simple transformations, such as word substitution, paraphrasing, and syntactic alteration, can degrade model performance without being noticeable to a human reader. These attacks expose fundamental weaknesses in NLP systems and demonstrate how easily they can be manipulated for malicious ends. We propose ensemble models that integrate three deep learning architectures, Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and Recurrent Neural Networks (RNN), achieving up to 97% resilience against adversarial attacks. The CNN performed best at identifying localized features, while the LSTM and RNN models showed strong sequential processing ability; combining these complementary strengths in an ensemble framework substantially increased robustness. The results demonstrate that ensemble strategies mitigate adversarial manipulation while preserving high classification accuracy.
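As a minimal sketch of the word-substitution attack described above, the following Python outline swaps words greedily and keeps any swap that lowers the classifier's confidence in the original label. The synonym table and the confidence scorer here are simplified stand-ins (a real attack would query the victim model and draw candidates from a lexical resource such as WordNet), not the paper's exact procedure:

    # Greedy word-substitution attack sketch.
    # SYNONYMS and fake_confidence are illustrative stand-ins, not real resources.
    from typing import Callable, Dict, List

    SYNONYMS: Dict[str, List[str]] = {      # tiny hand-made synonym table
        "good": ["fine", "decent"],
        "movie": ["film", "picture"],
        "great": ["superb", "solid"],
    }

    def greedy_substitution_attack(
        tokens: List[str],
        confidence: Callable[[List[str]], float],  # P(original label | text)
    ) -> List[str]:
        tokens = list(tokens)
        for i, word in enumerate(tokens):
            best, best_score = word, confidence(tokens)
            for candidate in SYNONYMS.get(word.lower(), []):
                trial = tokens[:i] + [candidate] + tokens[i + 1:]
                score = confidence(trial)
                if score < best_score:       # swap lowers model confidence
                    best, best_score = candidate, score
            tokens[i] = best
        return tokens

    # Usage with a stand-in scorer (a real attack queries the victim model):
    fake_confidence = lambda toks: 0.9 - 0.2 * sum(t in {"fine", "film"} for t in toks)
    print(greedy_substitution_attack("a good movie".split(), fake_confidence))
    # -> ['a', 'fine', 'film']

The point of the sketch is that each substitution preserves surface meaning for a human reader while steadily eroding the model's confidence, which is exactly the imperceptibility property the attacks above exploit.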
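The ensemble defence can likewise be sketched as soft voting over the three member architectures. The PyTorch code below uses illustrative layer sizes and assumes a shared token vocabulary; it is not the paper's exact configuration:

    # Soft-voting ensemble of CNN, LSTM, and RNN text classifiers (sketch).
    # Architectures and hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, EMB, HID, CLASSES = 20000, 128, 64, 4   # AG News has 4 classes

    class CNNClassifier(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.conv = nn.Conv1d(EMB, HID, kernel_size=3, padding=1)
            self.fc = nn.Linear(HID, CLASSES)
        def forward(self, x):                       # x: (batch, seq_len)
            h = self.emb(x).transpose(1, 2)         # (batch, EMB, seq_len)
            h = F.relu(self.conv(h)).max(dim=2).values  # global max pooling
            return self.fc(h)

    class RecurrentClassifier(nn.Module):
        def __init__(self, cell):                   # cell: nn.LSTM or nn.RNN
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.rnn = cell(EMB, HID, batch_first=True)
            self.fc = nn.Linear(HID, CLASSES)
        def forward(self, x):
            out, _ = self.rnn(self.emb(x))
            return self.fc(out[:, -1])              # last hidden state

    def ensemble_predict(models, x):
        """Average softmax probabilities across members (soft voting)."""
        with torch.no_grad():
            probs = torch.stack([F.softmax(m(x), dim=1) for m in models])
        return probs.mean(dim=0).argmax(dim=1)

    models = [CNNClassifier(), RecurrentClassifier(nn.LSTM),
              RecurrentClassifier(nn.RNN)]
    tokens = torch.randint(0, VOCAB, (2, 32))       # two dummy token sequences
    print(ensemble_predict(models, tokens))         # predicted class indices

Averaging probabilities rather than hard votes lets the localized-feature strength of the CNN and the sequential strengths of the LSTM and RNN compensate for one another, which is the complementary behaviour the abstract credits for the ensemble's robustness.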