Date of Graduation


Document Type


Degree Name

Master of Science in Computer Science (MS)

Degree Level



Computer Science & Computer Engineering


Susan Gauch

Committee Member

Qinghua Li

Second Committee Member

Khoa Luu


Edit Distance, Information Retrieval, Quote Identification, String Matching


Quoting a borrowed excerpt of text within another literary work was uncommon before the beginning of the eighteenth century. After that, quoting other texts, particularly Shakespeare, became quite common. Our work develops automatic approaches to identify that trend. Initial work focuses on identifying exact and modified passages taken from the works of Shakespeare in novels spanning the eighteenth century. We then introduce a novel approach to identifying modified quotes by adapting the Edit Distance metric, which is character-based, to a word-based approach. This paper reviews previous uses of this metric in a variety of fields, describes the implementation of the different methodologies used for quote identification, and then shows that a combination of both Edit Distance methods achieves higher accuracy in quote identification than any single method implemented alone: 0.737, compared with 0.638 and 0.609, an overall increase of about 10%. Although we demonstrate our approach using Shakespeare quotes in eighteenth-century novels, the techniques can be generalized to locate exact and/or partial matches between any set of text targets in any corpus. This work would be of value to literary scholars who want to track quotations over time, and it could also be applied to other languages.
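
To illustrate the difference between the character-based and word-based forms of Edit Distance mentioned in the abstract, the following is a minimal sketch and not the implementation used in this work. It assumes the standard Levenshtein formulation with unit costs for insertion, deletion, and substitution. The function names (edit_distance, char_distance, word_distance) and the tokenization (lowercasing and splitting on whitespace) are hypothetical choices made only for this example.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences with unit costs.

    Works on any sequence whose items can be compared with ==,
    so the same routine serves characters (str) and words (list of str).
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


def char_distance(s: str, t: str) -> int:
    """Character-based Edit Distance: each unit is a single character."""
    return edit_distance(s, t)


def word_distance(s: str, t: str) -> int:
    """Word-based Edit Distance: each unit is a whole word.

    Assumption for this sketch: words are lowercased,
    whitespace-separated tokens.
    """
    return edit_distance(s.lower().split(), t.lower().split())


if __name__ == "__main__":
    original = "to be or not to be"
    modified = "to be or not be"
    print(char_distance(original, modified))  # three characters removed
    print(word_distance(original, modified))  # one word removed
```

In this sketch, dropping a single word counts as one edit at the word level but as several edits at the character level, while a one-letter spelling variant counts as one edit under both. This suggests why the two measures can complement each other when looking for modified quotes.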