Graduate Theses and Dissertations

A Multimodal Retrieval-Augmented Generation Pipeline Integrating Computer Vision and Biomechanical Modeling for Tennis Serve Analysis

Olga Bienzobas, University of Arkansas-Fayetteville

Author ORCID Identifier:

https://orcid.org/0009-0002-1477-7753

Date of Graduation

12-2025

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Nelson, Alexander

Committee Member

Panda, Brajendra

Second Committee Member

Gauch, John

Keywords

Applied Machine Learning; Athlete Performance Analysis; Computer Vision in Sports; Multimodal AI Systems; Retrieval Augmented Generation (RAG); Tennis Serve Analysis

Abstract

Recent advances in large language models (LLMs) and Retrieval-Augmented Generation (RAG) have enabled AI systems to provide interpretable, domain-specific, and evidence-grounded feedback. Extending these capabilities to human movement analysis remains difficult because text-based knowledge must be fused with individualized biomechanical information extracted from video. This thesis examines whether a multimodal pipeline that integrates computer vision with RAG can generate personalized, literature-grounded feedback for a highly technical motor skill: the tennis serve. The proposed system processes a single slow-motion smartphone video of a serve, together with basic player metadata, to produce natural-language coaching guidance anchored in established biomechanics. The video analysis branch performs two-dimensional pose estimation, tennis-ball detection and tracking, temporal serve-phase segmentation, biomechanical feature extraction, exemplar-based comparison, and hybrid cue retrieval. These kinematic signals condition a RAG module that retrieves relevant coaching principles from a curated corpus of federation manuals, peer-reviewed research, and expert instructional material. A lightweight language model synthesizes the retrieved evidence with cues describing the player’s movement patterns to produce specific, interpretable, and actionable feedback. Experimental evaluation demonstrates that consumer-grade 240 fps video provides sufficiently detailed motion information to support accurate serve-phase segmentation and meaningful retrieval alignment. For novice, intermediate, and advanced players, the system consistently generates feedback that is both personalized and grounded in validated expert knowledge, narrowing the accessibility gap between high-cost motion-capture systems and everyday training capabilities.
This work highlights the feasibility of combining computer vision and retrieval-based language models for automated skill assessment and establishes a foundation for broader applications in AI-assisted coaching and human movement analysis.
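
As an illustration of the biomechanical feature extraction step mentioned in the abstract, the following is a minimal sketch of how a joint angle could be derived from three two-dimensional pose keypoints. The abstract does not state which features the thesis computes or how, so the function name `joint_angle`, the choice of the elbow as the example joint, and the pixel coordinates are all assumptions made purely for illustration.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b formed by segments b->a and b->c.

    a, b, c are (x, y) keypoints in image coordinates, for example
    shoulder, elbow, and wrist. Hypothetical helper, not the thesis code.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("keypoints must not coincide with the joint")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Made-up keypoints (pixels) for one video frame: shoulder, elbow, wrist.
shoulder, elbow, wrist = (100.0, 200.0), (100.0, 300.0), (200.0, 300.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Evaluated per frame across the segmented serve phases, an angle of this kind would give one time series per joint. Whether the thesis uses this exact formulation is not stated in the abstract.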

Citation

Bienzobas, O. (2025). A Multimodal Retrieval-Augmented Generation Pipeline Integrating Computer Vision and Biomechanical Modeling for Tennis Serve Analysis. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/6096