Date of Graduation
Bachelor of Science
Computer Science and Computer Engineering
Committee Member/Second Reader
Over the past two decades, online discussion has skyrocketed in scope and scale. However, so has the amount of toxic and offensive content on social media and other discussion sites. Despite this rise in prevalence, the ability to automatically moderate online discussion platforms has seen minimal development. Recently, though, as the capabilities of artificial intelligence (AI) continue to improve, AI-based detection of harmful internet content has become a real possibility. In the past few years, there has been a surge in performance on tasks in the field of natural language processing, mainly due to the development of the Transformer architecture. One Google-developed Transformer-based model known as BERT has served as a core component of much current research in the field of detecting toxic language. The method presented in this paper ensembles multiple BERT models trained on three classification tasks in order to improve the detection of abusive language in particular. The ensemble combines sub-models, each based on the BERT architecture, trained on datasets labeled for hate speech, offensive language, and abusive language. The approach presented in this paper outperforms both the standard BERT model and HateBERT, a re-trained variant of BERT developed for abusive language detection and similar tasks.
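The core ensembling idea described above can be illustrated with a minimal soft-voting sketch. This is a hypothetical illustration, not the thesis's actual architecture: it assumes each sub-model (trained for hate speech, offensive language, or abusive language) emits per-class logits, converts each to probabilities, and averages them before picking a label. The logit values below are invented for demonstration.

```python
import math

def softmax(logits):
    # Convert raw logits to probabilities (shifted by the max for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(logits_per_model):
    # Soft voting: average each sub-model's class probabilities,
    # then choose the class with the highest mean probability.
    probs = [softmax(lg) for lg in logits_per_model]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    return mean.index(max(mean)), mean

# Hypothetical logits from three sub-models over [not-abusive, abusive]:
outputs = [[0.2, 1.1], [1.5, -0.3], [0.1, 0.9]]
label, mean_probs = ensemble_predict(outputs)
```

In this toy case, two of the three sub-models lean toward "abusive", so the averaged probabilities favor class 1 even though one sub-model strongly disagrees; averaging probabilities rather than taking a majority vote lets confident sub-models carry proportional weight.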
Machine Learning, Natural Language Processing, BERT, Abusive Language
Ballinger, N. (2022). Using a BERT-based Ensemble Network for Abusive Language Detection. Computer Science and Computer Engineering Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/csceuht/108
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons, Information Security Commons, Programming Languages and Compilers Commons, Systems Architecture Commons