Journal of educational measurement vol:38 issue:2 pages:121-145
In this study, patterns of variation in severities of a group of raters over time or so-called "rater drift" was examined when raters scored an essay written under examination conditions. At the same time feedback was given to rater leaders (called "table leaders") who then interpreted the feedback and reported to the raters. Rater severities in five successive periods were estimated using a modified linear logistic test model (LLTM, Fischer 1973) approach. It was found that the raters did indeed drift towards the mean, but a planned comparision of the feedback with a control condition was not successful; it was believed that this was due to contamination at the table leader level. A series of models was also estimated designed to detect other types of rater effects beyond severity: a tendency to use extreme scores, and tendency to prefer certain categories. The models for these effects were found to be showing significant improvement in fit, implying that these effects were indeed present, although they were difficult to detect in relatively short time periods.