Author ORCID Identifier

https://orcid.org/0009-0009-9246-8340

Date of Graduation

5-2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science (PhD)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Wu, Xintao

Committee Member

Panda, Brajendra

Second Committee Member

Zhang, Lu

Third Committee Member

Arnold, Mark

Keywords

Causality; Generative Modeling; Representation Learning; Trustworthy Artificial Intelligence

Abstract

The hallmark of human intelligence is causal reasoning, the ability to infer relationships between causes and effects through observation and intervention. While modern deep learning has excelled at identifying statistical patterns, current generative models often struggle to capture the underlying structural causal mechanisms of the data-generating process, leaving them vulnerable to shortcut learning and spurious associations. To achieve true generalizability and interpretability, artificial intelligence must transition from simple association to higher-level causal reasoning so that it is capable of scheduling and planning in the real world. This dissertation develops fundamental methodologies for causal generative modeling by integrating Pearl’s Structural Causal Model (SCM) formalism with deep generative architectures. Specifically, this research aims to address critical gaps in generative modeling, including the following questions: (1) How can we develop a new notion of disentanglement for causally related generative factors and a flexible causal representation learning framework with theoretical guarantees? (2) How can we integrate causal modeling into the training process of state-of-the-art diffusion generative models to enable high-fidelity counterfactual generation? (3) How can we robustly frame and evaluate the causal reasoning of pre-trained large vision-language models? (4) How can we utilize the strengths of generative foundation models to develop a unified inference-time framework for causal generative modeling, from concept discovery to counterfactual generation? To address these questions, we develop the following methods: (1) We introduce ICM-VAE, a variational Bayes framework that leverages structured priors inspired by the Principle of Independent Causal Mechanisms to learn disentangled latent causal factors. We theoretically and empirically demonstrate that this approach enables the recovery of modular and disentangled causal mechanisms. (2) We propose CausalDiffAE, a novel framework integrating the learning of causal mechanisms into the training process of diffusion probabilistic models to enable high-fidelity counterfactual image generation. (3) We investigate the reasoning abilities of pre-trained large vision-language models through a causal lens. We propose CausalVLBench, a benchmarking framework that evaluates the formal causal reasoning capabilities of Large Vision-Language Models (LVLMs) across three novel tasks: causal structure inference, intervention target prediction, and counterfactual prediction. (4) We present a unified paradigm, the Foundation Model Powered Causal Generative Model (FM-CGM), which uses LVLMs for concept inference and text-to-image diffusion models for counterfactual generation. Within this paradigm, we develop Causal Semantic Guidance, an inference-time diffusion-based editing method that performs minimal and faithful counterfactual image edits. Collectively, these methodologies provide a robust framework for building AI systems that do not merely mimic data distributions but understand and manipulate causal variables. This shift has significant implications for high-stakes domains such as healthcare and scientific discovery.