Date of Graduation


Document Type


Degree Name

Doctor of Philosophy in Educational Statistics and Research Methods (PhD)

Degree Level



Rehabilitation, Human Resources and Communication Disorders


Allison A. Boykin

Committee Member

Ronna C. Turner

Second Committee Member

Xinya Liang

Third Committee Member

Brandon L. Crawford


Keywords

Constructed-response, Hierarchical rater model, Model-data fit, Performance assessments, Posterior predictive model checking, Rater


Fitting misspecified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates a polytomous item response theory (IRT) model, such as the partial credit model (PCM) or the generalized partial credit model (GPCM), at the second level of its hierarchy to model examinees’ responses to performance assessments. To date, the HRM has not been rigorously evaluated using PPMC techniques. Monte Carlo simulations were employed to explore the effectiveness of 13 discrepancy measures in detecting model-data misfit of the HRM. Misfit was assessed at the test, item, and rater levels. Data were generated from the HRM-GPCM while varying the rating design (fully crossed and spiral), the proportion of aberrant raters (no rater effects, or 25% of the raters with rater effects), and the number of examinees (250 and 500). The generated data, which involved eight raters and four items, were analyzed using both the HRM-PCM and the HRM-GPCM. Type I error and power rates were computed for each discrepancy measure.

The results indicate that the standard deviation of the total score was the only useful discrepancy measure at the test level. At the item level, the item-total correlation and the odds ratio were powerful in detecting misspecification of the HRM-PCM. Of the three rater-level discrepancy measures, only the score-estimate correlation and the rater-total correlation were adequate in detecting misfit of the HRM-PCM. The performance of the discrepancy measures in detecting misfit of the HRM-PCM varied with the magnitude of the item discrimination parameters. The impact of the simulation factors on detecting misfit of the HRM-PCM and the implications of these findings are further discussed.
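
As a rough illustration of the PPMC logic summarized above, the following Python sketch computes a posterior predictive p-value (PPP-value) for one test-level discrepancy measure, the standard deviation of the total score, and then turns a set of PPP-values into a flag rate of the kind reported as Type I error or power. It is not the dissertation's implementation: the function names, the 0.05 and 0.95 flagging cutoffs, and the randomly generated stand-in data are all assumptions, and no HRM is actually fitted here. In a real analysis the replicated data sets would be generated from posterior draws of the fitted HRM-PCM or HRM-GPCM.

```python
import numpy as np


def sd_total_score(scores):
    """Discrepancy measure: standard deviation of examinee total scores.

    scores: array of shape (n_examinees, n_items) with item scores.
    """
    return np.sum(scores, axis=1).std(ddof=1)


def ppp_value(observed, replicated, discrepancy):
    """Proportion of replicated data sets whose discrepancy is at least
    as large as the discrepancy of the observed data.

    replicated: iterable of data sets, each shaped like `observed`.
    """
    d_obs = discrepancy(observed)
    d_rep = np.array([discrepancy(rep) for rep in replicated])
    return float(np.mean(d_rep >= d_obs))


def flag_rate(ppp_values, lower=0.05, upper=0.95):
    """Share of simulated data sets flagged as misfitting.

    The cutoffs are assumed for illustration. When the analysis model
    matches the generating model this share estimates the Type I error
    rate; when the analysis model is misspecified it estimates power.
    """
    p = np.asarray(ppp_values)
    return float(np.mean((p < lower) | (p > upper)))


# Stand-in data only: 250 examinees, 4 items, scores in 0..3.
rng = np.random.default_rng(0)
observed = rng.integers(0, 4, size=(250, 4))
# Hypothetical "replicated" data sets; a real PPMC run would simulate
# each one from a posterior draw of the model parameters.
replicated = rng.integers(0, 4, size=(200, 250, 4))

ppp = ppp_value(observed, replicated, sd_total_score)
print(f"PPP-value for SD of total score: {ppp:.3f}")
```

PPP-values near 0 or 1 suggest that the model fails to reproduce the feature captured by the discrepancy measure, while values near 0.5 suggest adequate fit on that feature. The measure used here depends on the data alone; discrepancy measures that also depend on model parameters would need the corresponding posterior draw passed in as well.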