Date of Graduation
12-2024
Document Type
Thesis
Degree Name
Bachelor of Science in Computer Science
Degree Level
Undergraduate
Department
Computer Science and Computer Engineering
Advisor/Mentor
Luu, Khoa
Committee Member
Gauch, John
Second Committee Member
Gauch, Susan
Abstract
Video Question Answering (VideoQA) focuses on developing models capable of engaging in natural language conversations about video content. Current state-of-the-art models typically analyze videos frame by frame, a process that is both computationally and memory-intensive. Integrating the Atkinson-Shiffrin memory model with Video Language Models has demonstrated potential for enhancing video understanding capabilities. A crucial operation in this approach is reducing the number of frames processed by the model, which is achieved by a memory consolidation algorithm. This algorithm condenses a video sequence into a small set of representative frames that capture the essence of the video content. However, due to the complexity of events in videos, selecting keyframes efficiently and effectively remains a challenge. This work aims to address this challenge by comparing video understanding capabilities across different memory consolidation algorithms. Specifically, we present experiments evaluating simple but effective memory consolidation algorithms on the ActivityNet-QA dataset. Through this analysis, we aim to construct an optimal memory consolidation algorithm to improve model performance in VideoQA tasks.
Keywords
Video Understanding; Multimodal Large Language Models; Video Question Answering; Atkinson-Shiffrin Memory Model
Citation
Couts, M. (2024). Reducing Token Redundancy in Video-Language Models via Memory Consolidation Algorithm. Electrical Engineering and Computer Science Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/elcsuht/17