Date of Graduation
5-2025
Document Type
Thesis
Degree Name
Master of Science in Computer Science (MS)
Degree Level
Graduate
Department
Electrical Engineering and Computer Science
Advisor/Mentor
Luu, Khoa
Committee Member
Cothren, Jackson D.
Second Committee Member
Gauch, John M.
Keywords
Computer Vision; Cyclic Graph Transformer; Hierarchical Interlacement Graph; HyperGraph; Video Scene Graph Generation; Video Understanding
Abstract
This thesis advances video understanding by enhancing Video Scene Graph Generation (VidSGG) through improved temporal modeling, the integration of long-range temporal dependencies via continuous updates to interaction histories, and the use of Large Language Models (LLMs) for scene graph reasoning. To this end, three novel datasets and corresponding approaches are introduced. First, the ASPIRe dataset incorporates interactivity annotations and is paired with the Hierarchical Interlacement Graph (HIG), which models temporal structure hierarchically, providing deep insights into scene changes and effectively capturing intricate interactions. Next, the AeroEye dataset, focused on drone videos, is paired with the Cyclic Graph Transformer (CYCLO), which establishes circular connectivity among video frames to model both direct and long-range temporal relationships. Finally, the VSGR dataset, a new large-scale benchmark for scene graph reasoning, is paired with the Multimodal LLMs on a Scene HyperGraph (HyperGLM) approach, which integrates hypergraph-based representations with LLMs to enable more nuanced multimodal reasoning. Together, these contributions enhance dataset diversity, strengthen relationship modeling, and improve causal reasoning in VidSGG, achieving state-of-the-art performance.
Citation
Nguyen, T. (2025). Towards Multimodal Scene Graph Generation Approaches to Video Understanding. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/5621