Date of Graduation

5-2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Engineering (PhD)

Degree Level

Graduate

Department

Electrical Engineering and Computer Science

Advisor/Mentor

Wu, Xintao

Committee Member

Zhang, Lu

Second Committee Member

Le, Thi Hoang Ngan

Third Committee Member

Shen, Haoming

Keywords

Adaptive training; Adversarial attack; Adversarial defense; Multimodal hate mitigation; Robust training; Vision-Language Models

Abstract

As machine learning is rapidly adopted in real-world applications, ensuring its security plays an increasingly important role. Adversarial machine learning studies the malicious actions attackers can take against deployed machine learning systems and develops defensive techniques against such threats. Attacks can occur at different stages, such as poisoning attacks during training and evasion attacks during testing. Although extensive research has explored defense strategies against these harmful attacks, further work is needed in areas such as counteracting malicious attacks with healthy noise or training an adaptive defense against a mixture of attacks. In addition, developing novel attack methodologies is essential for uncovering underexplored vulnerabilities in the training and testing pipelines, thereby giving defenders deeper insight into the inherent weaknesses of model architectures. Most adversarial attacks aim primarily to degrade overall classification accuracy; far fewer attack strategies target models designed for fair prediction or multimodal retrieval. Furthermore, malicious users continuously devise subtle ways to disseminate harmful content on social media, necessitating intelligent systems capable of detecting and mitigating such content. Vision-Language Models, which are now widely used in real-world applications, hold significant potential for fostering safer and more respectful online environments.

The goal of this dissertation is to address critical challenges in ensuring the safety of machine learning models, focusing on the development of novel attack and defense methods. We begin by investigating two defenses against specific attack types: poisoning attacks and evasion attacks. Next, we examine the threats posed by adversaries targeting the fairness of machine learning models, introducing a novel poisoning attack on fair machine learning systems. We then analyze the vulnerabilities of multimodal pre-trained models under adversarial attacks. Finally, we explore methods for detecting and mitigating hateful content in multimodal memes using Vision-Language Models. In this dissertation, we present the following frameworks and algorithms. First, we develop a defense against data poisoning attacks that leverages the influence function to reduce the harmful effect of poisoned training data. Second, we introduce a defense against evasion attacks based on adaptive training, which makes the model adaptive and robust to unseen attacks at test time. Third, we design an attack on fair machine learning models that not only degrades accuracy but also undermines the model's fairness objective. Fourth, we investigate adversarial attacks that degrade the multimodal retrieval capability of pre-trained models. Fifth, we introduce a definition-guided, prompting-based method for detecting hateful memes. Finally, we develop a unified framework for transforming hateful memes into non-hateful versions.
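To illustrate the kind of influence-function-based filtering mentioned above, the sketch below shows the standard influence-function recipe (in the style of Koh and Liang) applied to a toy logistic-regression model: it scores each training point by how much upweighting it would raise validation loss and discards the most harmful points. This is only a minimal, generic sketch, not the defense developed in the dissertation; all function names and the label-flip setup are illustrative assumptions.

```python
# Minimal sketch: influence-function scoring of training points for a toy
# L2-regularized logistic-regression model. Generic illustration only, not
# the dissertation's algorithm; names and data are made up for the example.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=1e-2, lr=0.1, steps=2000):
    """Plain gradient descent on the L2-regularized logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def per_example_grads(X, y, w, l2):
    """Gradient of the regularized loss at each example (one row per point)."""
    p = sigmoid(X @ w)
    return (p - y)[:, None] * X + l2 * w

def influence_on_val(X_tr, y_tr, X_val, y_val, w, l2=1e-2):
    """I(z_i) = -grad_val^T H^{-1} grad_i. A large positive score means that
    upweighting z_i increases validation loss, so z_i is a poison candidate."""
    n, d = X_tr.shape
    p = sigmoid(X_tr @ w)
    # Exact Hessian of the training objective (cheap for logistic regression).
    H = (X_tr * (p * (1 - p))[:, None]).T @ X_tr / n + l2 * np.eye(d)
    g_val = per_example_grads(X_val, y_val, w, l2).mean(axis=0)
    g_tr = per_example_grads(X_tr, y_tr, w, l2)
    return -g_tr @ np.linalg.solve(H, g_val)

# Usage: score the training set, drop the k most harmful points, retrain.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(float)
y[:10] = 1 - y[:10]                          # crude stand-in for label-flip poison
X_tr, y_tr = X[:150], y[:150]
X_val, y_val = X[150:], (X[150:, 0] > 0).astype(float)   # clean validation split
w = fit_logreg(X_tr, y_tr)
scores = influence_on_val(X_tr, y_tr, X_val, y_val, w)
keep = np.argsort(scores)[: len(scores) - 10]            # discard 10 most harmful
w_clean = fit_logreg(X_tr[keep], y_tr[keep])
```

The exact Hessian solve is only feasible for small models; for deep networks the same idea is typically approximated with Hessian-vector products or low-rank estimates.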
