Mentor

Khoa Luu

Keywords

Artificial Intelligence, Machine Learning, Large Language Model

Abstract

Video Question Answering (VideoQA) is a field of research focused on developing models that can hold natural conversations with humans about the content of videos. Currently, the most successful approaches analyze videos frame by frame, which is computationally and memory-intensive. To imitate human memory, the Atkinson-Shiffrin memory model can be used to structure a Vision-Language Model's video understanding: incoming frames populate a short-term memory, and a memory consolidation algorithm reduces the number of frames the model must process by determining which keyframes to transfer from short-term to long-term memory. However, because events in videos are complex, this consolidation step must preserve critical information through efficient and well-chosen operations. This paper compares video understanding capabilities by analyzing memory consolidation algorithms. Specifically, we present experiments evaluating simple but effective memory consolidation operations on the ActivityNet-QA dataset in order to construct an optimal memory consolidation process.
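For concreteness, one simple consolidation operation of the kind the abstract describes is to repeatedly merge the most similar adjacent frames until short-term memory fits the long-term capacity. The sketch below is illustrative only: the function name `consolidate_memory`, the feature shapes, and the mean-merge rule are assumptions for exposition, not the paper's exact method.

```python
import numpy as np

def consolidate_memory(frames: np.ndarray, capacity: int) -> np.ndarray:
    """Greedily merge the most similar adjacent frame features until the
    buffer fits within the long-term memory capacity.

    frames: (N, D) array of per-frame feature vectors (hypothetical input).
    capacity: maximum number of consolidated frames to retain.
    """
    memory = [f for f in frames]
    while len(memory) > capacity:
        # Cosine similarity between each adjacent pair of frames.
        sims = [
            np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
            for a, b in zip(memory[:-1], memory[1:])
        ]
        # The most similar adjacent pair is the most redundant one.
        i = int(np.argmax(sims))
        # Merge the pair into its mean, freeing one memory slot.
        memory[i] = (memory[i] + memory[i + 1]) / 2.0
        del memory[i + 1]
    return np.stack(memory)

# Example: consolidate 64 short-term frame features into 8 long-term slots.
short_term = np.random.rand(64, 512).astype(np.float32)
long_term = consolidate_memory(short_term, capacity=8)
print(long_term.shape)  # (8, 512)
```

Merging adjacent frames (rather than simply dropping low-scoring ones) is one plausible design choice because it preserves temporal order while discarding redundancy; the experiments described in the abstract compare operations of this kind.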
