Date of Graduation

5-2020

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Susan Gauch

Committee Member

Qinghua Li

Second Committee Member

Khoa Luu

Keywords

Edit Distance, Information Retrieval, Quote Identification, String Matching

Abstract

Before the beginning of the eighteenth century, authors seldom quoted borrowed excerpts of text within their own literary works. After that, quoting other texts, particularly Shakespeare, became quite common. Our work develops automatic approaches to identify that trend. Initial work focuses on identifying exact and modified passages taken from the works of Shakespeare in novels spanning the eighteenth century. We then introduce a novel approach to identifying modified quotes by adapting the Edit Distance metric, which is character-based, to a word-based approach. This thesis surveys previous uses of the metric in a range of fields, describes the implementation of the different methodologies used for quote identification, and then shows that combining both Edit Distance methods achieves higher accuracy in quote identification than any one method alone, an overall increase of roughly 10 percentage points: from 0.638 and 0.609 for the individual methods to 0.737 for the combination. Although we demonstrate our approach using Shakespeare quotes in eighteenth-century novels, the techniques can be generalized to locate exact and/or partial matches between any set of text targets in any corpus. This work would be of value to literary scholars who want to track quotations over time, and it could also be applied to other languages.
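
To illustrate the difference between character-based and word-based Edit Distance mentioned in the abstract, the following is a minimal sketch and not the implementation used in the thesis. It uses the standard Levenshtein dynamic-programming recurrence over arbitrary sequences; the function names `edit_distance` and `word_edit_distance`, the lowercasing, and the whitespace tokenization are all assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Works on strings (character-based) or on lists of tokens (word-based),
    because it only compares elements for equality.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (or match)
            ))
        prev = curr
    return prev[-1]


def word_edit_distance(s, t):
    """Word-based variant: hypothetical helper that tokenizes on whitespace."""
    return edit_distance(s.lower().split(), t.lower().split())


if __name__ == "__main__":
    quote = "to be or not to be that is the question"
    variant = "to be or not to be that was the question"
    print("character-based:", edit_distance(quote, variant))
    print("word-based:", word_edit_distance(quote, variant))
```

In this example the modified quote differs from the original by a single word, so the word-based distance counts one substitution, while the character-based distance counts the individual character edits needed to turn "is" into "was". How the two scores are thresholded and combined is specific to the thesis and is not reproduced here.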
