Date of Graduation
Master of Science in Computer Science (MS)
Computer Science & Computer Engineering
Brajendra Nath Panda
Second Committee Member
Graph Matching, Graph Similarity, NLP, Social network, Social network analysis
This thesis develops an approach to extracting social networks from literary prose, namely Jane Austen’s published novels from the eighteenth and nineteenth centuries. Dialogue interaction plays a key role in deriving the networks, so our technique relies on our ability to determine when two characters are in conversation. Our process involves encoding plain literary text into the Text Encoding Initiative’s (TEI) XML format, character name identification, conversation and co-occurrence detection, and social network construction. Previous work on social network construction for literature has focused on drama, specifically manually TEI-encoded Shakespearean plays, in which character interactions are much easier to track due to their dialogue-driven narrative structure. In contrast, prose is structured quite differently: character speeches are not clearly formatted, making it more difficult to assign specific dialogue to each character. We implement two parsing strategies based on context size (chapter scope and paragraph scope) to detect character interactions. To check the accuracy of our methods, we conduct one evaluation based on network statistics and another that measures similarity (edit distance) between the networks constructed from manually encoded novels and our automatically constructed graphs. Our findings suggest that the choice of context size is non-trivial and can have a substantial influence on the resulting networks. In general, the paragraph-level interaction approach appeared to be more accurate.
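The effect of context size described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it skips TEI encoding and dialogue detection entirely and assumes that plain name co-occurrence within a window stands in for an interaction. The function names `cooccurrence_edges` and `edge_edit_distance`, the blank-line paragraph convention, and the toy text are all hypothetical, and the distance shown is a simplified edge-set difference rather than the edit distance measure used in the thesis.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, characters, scope="paragraph"):
    """Count how often each pair of character names shares a context window.

    Assumption: paragraphs are separated by blank lines, and the whole
    input string is a single chapter.
    """
    if scope == "paragraph":
        windows = [w for w in re.split(r"\n\s*\n", text) if w.strip()]
    elif scope == "chapter":
        windows = [text]
    else:
        raise ValueError(f"unknown scope: {scope!r}")

    edges = Counter()
    for window in windows:
        # Names that appear as whole words in this window, in sorted order
        # so that each undirected pair has one canonical key.
        present = sorted(
            name for name in characters
            if re.search(rf"\b{re.escape(name)}\b", window)
        )
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


def edge_edit_distance(edges_a, edges_b):
    """Edge insertions plus deletions needed to turn one edge set into the
    other (simplification: node sets are assumed identical, weights ignored)."""
    return len(set(edges_a) ^ set(edges_b))


# Toy text written for this example; it is not a quotation from any novel.
TOY_CHAPTER = (
    "Elizabeth spoke with Darcy at the ball.\n\n"
    "Jane wrote a letter to Bingley.\n\n"
    "Elizabeth read the letter aloud to Jane."
)
CHARACTERS = ["Elizabeth", "Darcy", "Jane", "Bingley"]

paragraph_net = cooccurrence_edges(TOY_CHAPTER, CHARACTERS, scope="paragraph")
chapter_net = cooccurrence_edges(TOY_CHAPTER, CHARACTERS, scope="chapter")

print(len(paragraph_net), len(chapter_net))            # 3 6
print(edge_edit_distance(paragraph_net, chapter_net))  # 3
```

On this toy input, chapter scope links every pair of characters, while paragraph scope keeps only the three pairs that actually share a paragraph, which shows how a wider window can add edges that do not correspond to a direct interaction.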
Bipasha, T. (2019). Extracting Social Network from Literary Prose. Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/3415