Date of Graduation

5-2025

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Electrical Engineering and Computer Science

Advisor/Mentor

Luu, Khoa

Committee Member

Cothren, Jackson D.

Second Committee Member

Gauch, John M.

Keywords

Computer Vision; Cyclic Graph Transformer; Hierarchical Interlacement Graph; HyperGraph; Video Scene Graph Generation; Video Understanding

Abstract

This thesis advances video understanding by enhancing Video Scene Graph Generation (VidSGG) through improved temporal modeling, the integration of long-range temporal dependencies via continuous updates to interaction histories, and the utilization of Large Language Models (LLMs) for scene graph reasoning. To this end, three novel datasets and corresponding approaches are introduced. First, the ASPIRe dataset incorporates interactivity annotations and leverages the Hierarchical Interlacement Graph (HIG) for hierarchical temporal modeling, providing deep insights into scene changes and effectively capturing intricate interactions. Next, the AeroEye dataset, focusing on drone videos, is paired with the Cyclic Graph Transformer (CYCLO), which establishes circular connectivity among video frames to model direct and long-range temporal relationships. Finally, the VSGR dataset, a new large-scale benchmark for advancing scene graph reasoning tasks, is introduced. Notably, the Multimodal LLMs on a Scene HyperGraph (HyperGLM) approach integrates hypergraph-based representations with LLMs, enabling more nuanced multimodal reasoning. These contributions significantly enhance dataset diversity, strengthen relationship modeling, and improve causal reasoning in VidSGG, resulting in state-of-the-art performance.
