Date of Graduation

12-2019

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Susan Gauch

Committee Member

Brajendra Nath Panda

Second Committee Member

Qinghua Li

Keywords

Graph Matching, Graph Similarity, NLP, Social network, Social network analysis

Abstract

This thesis develops an approach to extracting social networks from literary prose, namely Jane Austen’s published novels from the eighteenth and nineteenth centuries. Dialogue interaction plays a key role in deriving the networks, so our technique relies on our ability to determine when two characters are in conversation. Our process involves encoding plain literary text into the Text Encoding Initiative’s (TEI) XML format, character name identification, conversation and co-occurrence detection, and social network construction. Previous work on social network construction for literature has focused on drama, specifically manually TEI-encoded Shakespearean plays, in which character interactions are much easier to track due to their dialogue-driven narrative structure. In contrast, prose is structured quite differently: character speeches are not clearly formatted, making it more difficult to assign specific dialogue to each character. We implement two parsing strategies based on context size (chapter scope and paragraph scope) to detect character interactions. To check the accuracy of our methods, we conduct two evaluations: one based on network statistics, and another that measures the similarity (edit distance) between the networks constructed from manually encoded novels and the graphs we construct. Our findings suggest that the choice of context size is non-trivial and can have a substantial influence on the resulting networks. In general, the paragraph-level interaction approach appeared to be more accurate.
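
The effect of context size described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes plain substring matching in place of TEI encoding and character name identification, counts co-occurrence only (no dialogue detection), and uses the number of edges present in exactly one graph as a simplified stand-in for edit distance. The function names and the sample text are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(chapters, names, scope="paragraph"):
    """Count character pairs that appear in the same context window.

    chapters: list of chapters, each a list of paragraph strings.
    names: character names to search for (plain substring match, an assumption).
    scope: "paragraph" or "chapter", the size of the context window.
    """
    if scope not in ("paragraph", "chapter"):
        raise ValueError("scope must be 'paragraph' or 'chapter'")
    edges = Counter()
    for paragraphs in chapters:
        # Chapter scope treats the whole chapter as a single context.
        contexts = paragraphs if scope == "paragraph" else [" ".join(paragraphs)]
        for text in contexts:
            present = sorted(name for name in names if name in text)
            for pair in combinations(present, 2):
                edges[pair] += 1
    return edges


def edge_set_distance(graph_a, graph_b):
    """Number of edges found in exactly one of the two graphs."""
    return len(set(graph_a) ^ set(graph_b))


# Hypothetical sample text, written for this illustration only.
chapters = [
    ["Elizabeth spoke to Darcy.", "Jane waited for Bingley."],
    ["Elizabeth wrote to Jane."],
]
names = ["Elizabeth", "Darcy", "Jane", "Bingley"]

by_paragraph = cooccurrence_edges(chapters, names, scope="paragraph")
by_chapter = cooccurrence_edges(chapters, names, scope="chapter")
print(len(by_paragraph), len(by_chapter))
print(edge_set_distance(by_paragraph, by_chapter))
```

On this toy input the chapter scope links every pair of characters named anywhere in the first chapter, while the paragraph scope links only those named together in a paragraph, so the two scopes yield different networks from the same text.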

Citation

Bipasha, T. (2019). Extracting Social Network from Literary Prose. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/3415

Included in

Databases and Information Systems Commons, Digital Communications and Networking Commons
