Date of Graduation
5-2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Engineering (PhD)
Degree Level
Graduate
Department
Electrical Engineering and Computer Science
Advisor/Mentor
Wu, Xintao
Committee Member
Zhang, Lu
Second Committee Member
Le, Thi Hoang Ngan
Third Committee Member
Shen, Haoming
Keywords
Adaptive training; Adversarial attack; Adversarial defense; Multimodal hate mitigation; Robust training; Vision-Language Models
Abstract
With the rapid deployment of machine learning in real-world applications, ensuring the security of these systems has become increasingly important. Adversarial machine learning focuses on understanding the malicious actions of attackers and on developing defensive techniques against such threats when deploying machine learning systems. Attacks can arise at different stages of the pipeline, such as poisoning attacks during training and evasion attacks during testing. Although extensive research has explored defense strategies against these harmful attacks, further work is needed in areas such as counteracting malicious attacks with healthy noise and training adaptive defenses against mixtures of attacks. Additionally, developing novel attack methodologies is essential for uncovering underexplored vulnerabilities in the training and testing pipelines, thereby providing defenders with deeper insight into the inherent weaknesses of model architectures. Most adversarial attacks primarily aim to degrade overall classification accuracy; however, there is a notable lack of attack strategies that target models designed for fair prediction or multimodal retrieval. Furthermore, malicious users continuously devise subtle methods to disseminate harmful content on social media, necessitating intelligent systems capable of detecting and mitigating such content. Vision-Language Models, which are widely used in real-world applications, hold significant potential for fostering safer and more respectful online environments.
The goal of this dissertation is to address critical challenges in ensuring the safety of machine learning models, with a focus on developing novel attack and defense methods. We begin by investigating defenses against two specific attack types: poisoning attacks and evasion attacks. Next, we examine the threats posed by adversaries targeting the fairness of machine learning models, introducing a novel poisoning attack on fair machine learning systems. We then analyze the vulnerabilities of multimodal pre-trained models under adversarial attacks. Finally, we explore methods for detecting and mitigating hateful content in multimodal memes using Vision-Language Models. In this dissertation, we present the following frameworks and algorithms: we develop a defense against data poisoning attacks that leverages the influence function to reduce the harmful effect of poisoned training data; we introduce a defense against evasion attacks via adaptive training, making the model robust to attacks unseen at test time; we design an attack on fair machine learning models that not only degrades accuracy but also undermines the model's fairness objective; we investigate adversarial attacks that degrade the multimodal retrieval capability of pre-trained models; we introduce a definition-guided, prompting-based method for detecting hateful memes; and we develop a unified framework for transforming hateful memes into non-hateful versions.
Citation
Van, M. (2025). Adversarial Machine Learning: Methods for Attacks and Defenses. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/5674