Date of Graduation


Document Type


Degree Name

Master of Science in Computer Science (MS)

Degree Level



Department
Computer Science & Computer Engineering


Susan E. Gauch

Committee Member

Justin Zhan

Second Committee Member

Ukash Nakarmi

Third Committee Member

Yanjun Pan


Keywords
BERT, BiLSTM, Deep learning, Movie Reviews, Sentiment Analysis


Abstract
Sentiment analysis (SA), or opinion mining, is the analysis of emotions and opinions expressed in text. It is one of the active research areas in Natural Language Processing (NLP). Various approaches have been proposed in the literature to address the problem; these techniques devise complex and sophisticated frameworks to attain optimal accuracy, focusing mainly on polarity (binary) classification. In this paper, we fine-tune BERT in a simple but robust approach to movie-review sentiment analysis that provides better accuracy than state-of-the-art (SOTA) methods. We first classify the sentiment of every review, then compute the overall sentiment polarity across all the reviews. Both polarity classification and fine-grained classification (multi-scale sentiment distribution) are implemented and tested on benchmark datasets in our work. To optimally adapt BERT for sentiment classification, we concatenate it with a Bidirectional LSTM (BiLSTM) layer. We also implemented and evaluated accuracy-improvement techniques, including the Synthetic Minority Over-sampling TEchnique (SMOTE) and the NLP Augmenter (NLPAUG), to improve the model's prediction of the multi-scale sentiment distribution. We found that including NLPAUG improved accuracy, whereas SMOTE generally did not work well. Lastly, a heuristic algorithm is applied to compute the overall polarity of the predicted reviews from the model's output vector. We call our model BERT+BiLSTM-SA, where SA stands for Sentiment Analysis. Our best-performing approach comprises BERT and BiLSTM for binary, three-class, and four-class sentiment classification, with SMOTE augmentation added to BERT and BiLSTM for five-class sentiment classification. Our approach performs on par with SOTA techniques on both classification settings. For example, on binary classification we obtain 97.67% accuracy, while the best-performing SOTA model, NB-weighted-BON+dv-cosine, has 97.40% accuracy on the popular IMDb dataset.
The baseline, Entailment as Few-Shot Learners (EFL), is outperformed on this task by 1.30%. On the other hand, for five-class classification on SST-5, the best SOTA model, RoBERTa-large+Self-explaining, has 55.5% accuracy, while we obtain 59.48% accuracy. We outperform the baseline on this task, BERT-large, by 3.6%.
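The core architecture described above, a BiLSTM layer stacked on BERT token representations feeding a classification head, can be sketched as follows. This is an illustrative PyTorch sketch, not the thesis's actual implementation: the layer sizes are assumed, and a stand-in embedding layer replaces the pretrained BERT encoder so the example is self-contained.

```python
import torch
import torch.nn as nn

class BertBiLSTMSA(nn.Module):
    """Sketch of a BERT+BiLSTM sentiment classifier.

    A BiLSTM runs over per-token encoder outputs, and a linear layer maps
    the final BiLSTM state to class logits. The `encoder` here is a plain
    embedding layer standing in for a pretrained BERT model (which would
    produce 768-dim contextual token vectors); hidden sizes are assumed.
    """

    def __init__(self, vocab_size=30522, hidden=768, lstm_hidden=256, num_classes=2):
        super().__init__()
        self.encoder = nn.Embedding(vocab_size, hidden)  # stand-in for BERT
        self.bilstm = nn.LSTM(hidden, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids):
        x = self.encoder(input_ids)        # (batch, seq_len, hidden)
        out, _ = self.bilstm(x)            # (batch, seq_len, 2 * lstm_hidden)
        pooled = out[:, -1, :]             # final time step of the BiLSTM
        return self.classifier(pooled)     # (batch, num_classes) logits
```

Setting `num_classes` to 2, 3, 4, or 5 covers the binary through five-class settings; fine-tuning would update the (real) BERT encoder jointly with the BiLSTM and classifier.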