Date of Graduation

12-2019

Document Type

Thesis

Degree Name

Bachelor of Science

Degree Level

Undergraduate

Department

Computer Science and Computer Engineering

Advisor/Mentor

Gauch, Susan

Committee Member/Reader

Gauch, John

Committee Member/Second Reader

Li, Qinghua

Abstract

Word embedding is the process of representing words from a corpus of text as real-valued vectors. These vectors are often derived from frequency statistics computed over the source corpus. In the GloVe model proposed by Pennington et al., these vectors are generated using a word-word co-occurrence matrix. However, the GloVe model fails to explicitly take into account the order in which words appear within the contexts of other words. In this paper, multiple methods of incorporating word order into GloVe word embeddings are proposed. The most successful method directly concatenates several word vector matrices, one for each position in the context window. An improvement of 9.7% in accuracy is achieved by using this explicit representation of word order with GloVe word embeddings.
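The abstract only sketches the winning approach, so the toy example below illustrates the general idea of counting co-occurrences separately for each context-window offset and concatenating the per-offset vectors into one order-aware embedding. This is a minimal sketch, not the thesis implementation: the example corpus, the window size, and the SVD used as a stand-in for actual GloVe training are all assumptions made for illustration.

    # Hypothetical illustration (not the thesis code): one co-occurrence
    # matrix per signed context-window offset, then concatenation.
    import numpy as np

    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "sat", "on", "the", "rug"]]
    window = 2  # offsets -2, -1, +1, +2

    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    offsets = [d for d in range(-window, window + 1) if d != 0]
    # One V x V co-occurrence matrix per signed offset, instead of the single
    # order-agnostic matrix used by standard GloVe.
    counts = {d: np.zeros((V, V)) for d in offsets}

    for sent in corpus:
        for i, w in enumerate(sent):
            for d in offsets:
                j = i + d
                if 0 <= j < len(sent):
                    counts[d][idx[w], idx[sent[j]]] += 1.0

    # Stand-in for GloVe training: reduce each positional matrix with a
    # truncated SVD of the log counts to get low-dimensional vectors.
    def embed(matrix, dim=2):
        u, s, _ = np.linalg.svd(np.log1p(matrix), full_matrices=False)
        return u[:, :dim] * s[:dim]

    # Concatenate the per-offset vectors into one order-aware vector per word.
    word_vectors = np.hstack([embed(counts[d]) for d in offsets])
    print(word_vectors.shape)  # (V, dim * number_of_offsets)

In the thesis itself the per-position matrices feed a GloVe-style objective rather than an SVD; the sketch only shows how position information can be kept separate until the final concatenation step.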

Keywords

glove; word embedding; text
