Date of Graduation


Document Type


Degree Name

Bachelor of Science in Computer Science


Department

Computer Science and Computer Engineering


Advisor

Luu, Khoa

Committee Member/Reader

Jin, Kevin

Committee Member/Second Reader

Gauch, John


This paper examines recent advances and open challenges in multi-object tracking (MOT), a core task in computer vision, with particular emphasis on referring understanding. Referring understanding integrates natural-language queries into multi-object tracking, which broadens the range of practical applications. Referring multi-object tracking (RMOT) has emerged as a promising approach to this problem. RMOT was evaluated on Refer-KITTI, a dataset of traffic scenes. The evaluation showed that RMOT handles a diverse range of referent objects, copes well with temporal dynamics, and is highly adaptable. While the paper acknowledges the significant progress made with this approach, it also identifies several inherent limitations and new challenges, such as multi-object prediction and cross-frame association. To address these issues, the paper attempts to retrain an end-to-end differentiable RMOT framework built on the recent DETR framework, with promising implications for future work in this area. The ultimate goal of this paper is to further refine the RMOT model, deepen understanding of the computer vision landscape, and highlight the technology's potential for future research and applications.


Computer vision