Date of Graduation
12-2019
Document Type
Thesis
Degree Name
Bachelor of Science
Degree Level
Undergraduate
Department
Computer Science and Computer Engineering
Advisor/Mentor
Gauch, Susan
Committee Member/Reader
Gauch, John
Committee Member/Second Reader
Li, Qinghua
Abstract
Word embedding is the process of representing words from a corpus of text as real-valued vectors. These vectors are often derived from frequency statistics of the source corpus. In the GloVe model proposed by Pennington et al., these vectors are generated from a word-word co-occurrence matrix. However, the GloVe model does not explicitly take into account the order in which words appear within the contexts of other words. In this paper, multiple methods of incorporating word order into GloVe word embeddings are proposed. The most successful method directly concatenates several word vector matrices, one for each position in the context window. An improvement of 9.7% in accuracy is achieved by using this explicit representation of word order with GloVe word embeddings.
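The concatenation idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes one co-occurrence matrix is built per context offset, and it swaps GloVe's weighted least-squares training for a truncated SVD of log(1 + count) so that the example stays short and runnable. The function names, the toy corpus, and the `window` and `dim` values are all hypothetical.

```python
import numpy as np

def positional_cooccurrence(tokens, vocab, window):
    """Build one V x V count matrix per context offset (-window..-1, 1..window)."""
    index = {w: i for i, w in enumerate(vocab)}
    offsets = [o for o in range(-window, window + 1) if o != 0]
    counts = {o: np.zeros((len(vocab), len(vocab))) for o in offsets}
    for i, word in enumerate(tokens):
        for o in offsets:
            j = i + o
            if 0 <= j < len(tokens):
                counts[o][index[word], index[tokens[j]]] += 1.0
    return counts

def embed(matrix, dim):
    """Stand-in for GloVe training (an assumption): truncated SVD of log(1 + counts)."""
    u, s, _ = np.linalg.svd(np.log1p(matrix), full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

def concatenated_embeddings(tokens, window=2, dim=2):
    """Embed each positional matrix separately, then concatenate the results per word."""
    vocab = sorted(set(tokens))
    counts = positional_cooccurrence(tokens, vocab, window)
    blocks = [embed(counts[o], dim) for o in sorted(counts)]
    return vocab, np.hstack(blocks)

# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab, vectors = concatenated_embeddings(tokens, window=2, dim=2)
print(vectors.shape)  # one row per word, 2 * window * dim columns
```

Each word's final vector has `2 * window * dim` components, so a word that appears one position to the left of a context word is represented separately from the same word appearing one position to the right. A single symmetric co-occurrence matrix would merge those two cases.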
Keywords
GloVe; word embedding; text
Citation
Cox, B. (2019). Incorporating word order explicitly in GloVe word embedding. Computer Science and Computer Engineering Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/csceuht/71
Included in
Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons, Other Computer Sciences Commons