Date of Graduation
12-2019
Document Type
Thesis
Degree Name
Master of Science in Computer Science (MS)
Degree Level
Graduate
Department
Computer Science & Computer Engineering
Advisor/Mentor
Gauch, Susan E.
Committee Member
Panda, Brajendra N.
Second Committee Member
Li, Qinghua
Keywords
Graph Matching; Graph Similarity; NLP; Social network; Social network analysis
Abstract
This thesis develops an approach to extract social networks from literary prose, namely, Jane Austen’s published novels from eighteenth- and nineteenth- century. Dialogue interaction plays a key role while we derive the networks, thus our technique relies upon our ability to determine when two characters are in conversation. Our process involves encoding plain literary text into the Text Encoding Initiative’s (TEI) XML format, character name identification, conversation and co-occurrence detection, and social network construction. Previous work in social network construction for literature have focused on drama, specifically manually TEI-encoded Shakespearean plays in which character interactions are much easier to track in due to their dialogue-driven narrative structure. In contrast, prose is structured quite differently; character speeches are not very clearly formatted, making it more difficult to assign specific dialogue to each character. We implement two different parsing strategies based on context size (chapter scope and paragraph scope) to detect character interactions. To check the accuracy of our methods, we conduct one evaluation that is based on network statistics and another evaluation that involves measuring similarity (edit distance) between the networks constructed from manually encoded novels versus our constructed graphs. Our findings suggest that the choice of context size is non-trivial and can have a substantial influence on the resulting networks. In general, the paragraph level interaction approach seemed to be more accurate.
Citation
Bipasha, T. (2019). Extracting Social Network from Literary Prose. Graduate Theses and Dissertations Retrieved from https://scholarworks.uark.edu/etd/3415
Included in
Databases and Information Systems Commons, Digital Communications and Networking Commons