Author ORCID Identifier

https://orcid.org/0009-0002-1477-7753

Date of Graduation

12-2025

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Nelson, Alexander

Committee Member

Panda, Brajendra

Second Committee Member

Gauch, John

Keywords

Applied Machine Learning; Athlete Performance Analysis; Computer Vision in Sports; Multimodal AI Systems; Retrieval Augmented Generation (RAG); Tennis Serve Analysis

Abstract

Recent advances in large language models (LLMs) and Retrieval-Augmented Generation (RAG) have enabled AI systems to provide interpretable, domain-specific, and evidence-grounded feedback. Extending these capabilities to human movement analysis remains difficult because text-based knowledge must be fused with individualized biomechanical information extracted from video. This thesis examines whether a multimodal pipeline that integrates computer vision with RAG can generate personalized, literature-grounded feedback for a highly technical motor skill: the tennis serve. The proposed system processes a single slow-motion smartphone video of a serve, together with basic player metadata, to produce natural-language coaching guidance anchored in established biomechanics. The video analysis branch performs two-dimensional pose estimation, tennis-ball detection and tracking, temporal serve-phase segmentation, biomechanical feature extraction, exemplar-based comparison, and hybrid cue retrieval. These kinematic signals condition a RAG module that retrieves relevant coaching principles from a curated corpus of federation manuals, peer-reviewed research, and expert instructional material. A lightweight language model synthesizes the retrieved evidence with the cues describing the player’s movement patterns to produce specific, interpretable, and actionable feedback. Experimental evaluation demonstrates that consumer-grade 240 fps video provides sufficiently detailed motion information to support accurate serve-phase segmentation and meaningful retrieval alignment. For novice, intermediate, and advanced players, the system consistently generates feedback that is both personalized and grounded in validated expert knowledge, narrowing the accessibility gap between high-cost motion-capture systems and everyday training capabilities.
This work highlights the feasibility of combining computer vision and retrieval-based language models for automated skill assessment and establishes a foundation for broader applications in AI-assisted coaching and human movement analysis.

Available for download on Sunday, February 13, 2028
