Date of Graduation
Bachelor of Science
Computer Science and Computer Engineering
Committee Member/Second Reader
Over the past two decades, online discussion has skyrocketed in scope and scale. However, so has the amount of toxic and offensive content on social media and other discussion sites. Despite this rise in prevalence, the ability to automatically moderate online discussion platforms has seen minimal development. Recently, though, as the capabilities of artificial intelligence (AI) continue to improve, AI-based detection of harmful internet content has become a real possibility. In the past few years, there has been a surge in performance on tasks in the field of natural language processing, mainly due to the development of the Transformer architecture. One Google-developed Transformer-based model known as BERT has served as a core component of much current research in the field of detecting toxic language. The method presented in this paper ensembles multiple BERT models trained on three classification tasks in order to improve the detection of abusive language in particular. The ensemble combines sub-models, each based on the BERT architecture, trained on datasets labeled for hate speech, offensive language, and abusive language. The approach presented in this paper outperforms both the standard BERT model and HateBERT, a re-trained variant of BERT developed for abusive language detection and similar tasks.
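The core ensembling idea described above can be illustrated with a minimal soft-voting sketch. This is a hypothetical illustration, not the thesis's actual architecture: it assumes each sub-model (trained for hate speech, offensive language, or abusive language) emits per-class logits, converts each to probabilities, and averages them before picking a label. The logit values below are invented for demonstration.

```python
import math

def softmax(logits):
    # Convert raw logits to probabilities (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(logits_per_model):
    # Soft voting: average each sub-model's class probabilities,
    # then choose the class with the highest mean probability.
    probs = [softmax(lg) for lg in logits_per_model]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return mean.index(max(mean)), mean

# Hypothetical logits from three sub-models over [not-abusive, abusive]:
outputs = [[0.2, 1.1], [1.5, -0.3], [0.1, 0.9]]
label, mean_probs = ensemble_predict(outputs)
```

In this toy case, two of the three sub-models lean toward "abusive", so the averaged probabilities favor class 1 even though one sub-model strongly disagrees; averaging probabilities rather than taking a majority vote lets confident sub-models carry proportional weight.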
Machine Learning, Natural Language Processing, BERT, Abusive Language
Ballinger, N. (2022). Using a BERT-based Ensemble Network for Abusive Language Detection. Computer Science and Computer Engineering Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/csceuht/108
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons, Information Security Commons, Programming Languages and Compilers Commons, Systems Architecture Commons