Date of Graduation
5-2019
Document Type
Dissertation
Degree Name
Doctor of Philosophy in Educational Statistics and Research Methods (PhD)
Degree Level
Graduate
Department
Rehabilitation, Human Resources and Communication Disorders
Advisor/Mentor
Turner, Ronna C.
Committee Member
Liang, Xinya
Second Committee Member
Lo, Wen-Juo
Third Committee Member
Robinson, Samantha E.
Keywords
item response theory; psychometrics
Abstract
Statistical models used for estimating skill or ability levels often vary by field; however, their underlying mathematical models can be very similar. Differences in the underlying models can arise from the need to accommodate data with different formats and structures. As the models from different fields increase in complexity, the range of data types to which they can be applied may also increase. Models applied to educational or psychological data have advanced to accommodate a wide range of data formats, including improved estimation accuracy with sparsely populated data matrices. Likewise, the field of online gaming has expanded over the last two decades to include more complex statistical models that provide real-time game matching based on ability estimates. It can be useful to see how statistical models from the educational and gaming fields compare, because different datasets may benefit from different ability estimation procedures. This study compared statistical models typically used in game matchmaking systems (Elo, Glicko) with models used in psychometric modeling (item response theory [IRT] and Bayesian IRT) using both simulated and real data under a variety of conditions. Results indicated that, in conditions with small numbers of items or matches, the Bayesian IRT one-parameter logistic (1PL) model produced the most accurate skill estimates, regardless of whether educational or gaming data were used. This held true for all sample sizes with small numbers of items. However, the estimates from the Elo and non-Bayesian IRT 1PL models were close to those of the Bayesian IRT 1PL model for both gaming and educational data. While the two-parameter logistic (2PL) models were not shown to be accurate for the gaming study conditions, the IRT 2PL and Bayesian IRT 2PL models outperformed the 1PL models when 2PL educational data were generated with the larger sample size and item condition.
Overall, the Bayesian IRT 1PL model seemed to be the best choice across the smaller sample and match size conditions.
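The similarity between the gaming and psychometric models noted in the abstract can be illustrated with a minimal Python sketch. It is not taken from the dissertation. It assumes the conventional Elo formula with a scale constant of 400 and an illustrative K-factor of 32, and the textbook 1PL (Rasch) formula; the function names are hypothetical. Under those assumptions, the Elo expected score and the 1PL success probability are the same logistic function once ratings are rescaled to logits.

```python
import math

# Assumption: conventional Elo scale constant; the dissertation's settings are not stated here.
ELO_SCALE = 400.0


def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / ELO_SCALE))


def rasch_prob(theta: float, b: float) -> float:
    """1PL probability that a person with ability theta succeeds on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def elo_to_logit(rating: float) -> float:
    """Rescale an Elo rating to the logit metric used by the 1PL model."""
    return rating * math.log(10) / ELO_SCALE


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> float:
    """Return A's rating after one match; k=32 is an illustrative choice, not the study's."""
    return r_a + k * (score_a - elo_expected(r_a, r_b))


if __name__ == "__main__":
    # The two probabilities agree up to floating-point error.
    print(elo_expected(1600, 1400))
    print(rasch_prob(elo_to_logit(1600), elo_to_logit(1400)))
    # A win between equally rated players moves the winner up by k / 2.
    print(elo_update(1500, 1500, 1.0))
```

In this framing, a match opponent plays the role of a test item: the opponent's rating corresponds to the item difficulty, and a win corresponds to a correct response. The main difference in the sketch is that Elo updates ratings sequentially after each match rather than estimating all parameters jointly.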
Citation
Morrison, B. (2019). Comparing Elo, Glicko, IRT, and Bayesian IRT Statistical Models for Educational and Gaming Data. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/3201