Detection of caries experience is prone to misclassification. For this reason, calibration exercises are organized to assess and improve the scoring behavior of dental raters. During a calibration exercise, a sample of children is examined by the benchmark scorer and by the dental examiners. This produces a 2 × 2 contingency table of the true and the possibly misclassified responses. The entries of this misclassification table allow the sensitivity and the specificity of the raters to be estimated. However, many dental studies do not report the uncertainty with which sensitivity and specificity are estimated. Furthermore, caries experience data have a hierarchical structure, since the data are recorded for surfaces nested in teeth within the mouth. It is therefore important to report the uncertainty using confidence intervals and to take the clustering into account. Here we apply a Bayesian multilevel logistic model to estimate the sensitivity and specificity. The main goal of this research is to identify the factors that influence the correct scoring of caries experience while accounting for the hierarchical structure of the data. Our analysis shows that the dentition type and the tooth or surface type affect the quality of caries experience detection.
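
As a minimal illustration of how sensitivity and specificity follow from the entries of a 2 × 2 misclassification table, the sketch below computes both together with Wilson score confidence intervals. The counts, the function names, and the choice of the Wilson interval are assumptions made for illustration only. The intervals treat surfaces as independent, so they ignore the clustering that a multilevel model is meant to handle.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


def sensitivity_specificity(tp, fn, fp, tn):
    """Point estimates and Wilson intervals from a 2 x 2 table.

    tp: benchmark caries, examiner caries
    fn: benchmark caries, examiner sound
    fp: benchmark sound,  examiner caries
    tn: benchmark sound,  examiner sound
    """
    sens = tp / (tp + fn)   # P(examiner scores caries | benchmark caries)
    spec = tn / (tn + fp)   # P(examiner scores sound  | benchmark sound)
    return {
        "sensitivity": (sens, wilson_interval(tp, tp + fn)),
        "specificity": (spec, wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, not taken from any study.
result = sensitivity_specificity(tp=40, fn=10, fp=5, tn=145)
for name, (estimate, (lower, upper)) in result.items():
    print(f"{name}: {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Because observations from the same tooth and mouth are correlated, intervals computed this way will typically be too narrow, which is the motivation for modelling the hierarchy explicitly.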