Date of Graduation
12-2022
Document Type
Thesis
Degree Name
Master of Science in Computer Science (MS)
Degree Level
Graduate
Department
Computer Science & Computer Engineering
Advisor/Mentor
Gauch, Susan E.
Committee Member
Zhan, Justin
Second Committee Member
Nakarmi, Ukash
Third Committee Member
Pan, Yanjun
Keywords
BERT; BiLSTM; Deep learning; Movie Reviews; Sentiment Analysis
Abstract
Sentiment analysis (SA), or opinion mining, is the analysis of emotions and opinions in text. It is one of the active research areas in Natural Language Processing (NLP). Various approaches have been deployed in the literature to address the problem. These techniques devise complex and sophisticated frameworks to attain optimal accuracy, focusing on polarity (binary) classification. In this paper, we aim to fine-tune BERT in a simple but robust approach for movie review sentiment analysis to provide better accuracy than state-of-the-art (SOTA) methods. We start by conducting sentiment classification for every review, followed by computing the overall sentiment polarity for all the reviews. Both polarity classification and fine-grained classification (multi-scale sentiment distribution) are implemented and tested on benchmark datasets in our work. To optimally adapt BERT for sentiment classification, we concatenate it with a Bidirectional LSTM (BiLSTM) layer. We also implemented and evaluated accuracy improvement techniques, including Synthetic Minority Over-sampling TEchnique (SMOTE) and NLP Augmenter (NLPAUG), to improve the model's prediction of multi-scale sentiment distribution. We found that including NLPAUG improved accuracy; however, SMOTE did not work well. Lastly, a heuristic algorithm is applied to compute the overall polarity of the predicted reviews from the model output vector. We call our model BERT+BiLSTM-SA, where SA stands for Sentiment Analysis. Our best-performing approach comprises BERT and BiLSTM on binary, three-class, and four-class sentiment classifications, and SMOTE augmentation, in addition to BERT and BiLSTM, on five-class sentiment classification. Our approach performs on par with SOTA techniques on both classification tasks. For example, on binary classification, we obtain 97.67% accuracy, while the best-performing SOTA model, NB-weighted-BON+dvcosine, has 97.40% accuracy on the popular IMDb dataset.
The baseline, Entailment as Few-Shot Learners (EFL), is outperformed on this task by 1.30%. On the other hand, for five-class classification on SST-5, the best SOTA model, RoBERTa+large+Self-explaining, has 55.5% accuracy, while we obtain 59.48% accuracy. We outperform the baseline on this task, BERT-large, by 3.6%.
Citation
Nkhata, G. (2022). Movie Reviews Sentiment Analysis Using BERT. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/4768
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons, Human Ecology Commons