For measurements to be accurate and precise, measurement errors should be small. In the anthropometry and craniofacial identification literature, four methods are commonly used for assessing measurement error: Pearson's product moment correlation coefficient (r), intra-class correlation coefficients (ICC), statistical significance tests (often reported as P-values) and the technical error of measurement (TEM; also known as Dahlberg's error/ratio). In this paper, the performance of all four statistics was evaluated using maximum cranial lengths (g-op) from Howells (n=2524), by duplicating the dataset and mathematically adding known degrees of error to the second set. This was repeated across a broad array of trials (2000 in total), each with a slightly different amount of simulated error, to comprehensively assess the descriptive power and utility of the four error metrics, using the same data for each method. Data simulations included the addition of random and systematic errors of different sizes, with absolute differences ranging from 1 to 50mm (or, in relative terms, up to 28% of the original measurement). Two sample sizes (n=25 and n=2524 individuals) were explored and all analyses were conducted in R. P-values from Student's t-tests showed significant differences (P<0.05) only for the larger sample size when the error was systematic. Small samples, and any samples with random error, did not yield low or significant P-values (P<0.05). When raw differences were <4mm for 95% of the sample (n=2524), the ICC and r were high (>0.97) and remained so even after tripling the error, such that 95% of the sample possessed raw differences up to 12mm (r=0.8). In contrast, the TEM was low initially (<2mm or r-TEM<1%), and then increased (<4.5mm and 2.5%, TEM and r-TEM respectively). These data show that P-values, ICC and r values have substantial limitations for describing error, as they do not always flag error well.
In contrast, TEM appears to covary with error more closely and holds the advantage that changes are reported in the units of the original measurement. For these reasons, TEM is recommended in preference to P-values, ICC and r.
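The contrast between r and TEM can be illustrated with a minimal sketch. It assumes the commonly used definitions TEM = sqrt(sum(d^2) / 2n) and r-TEM = 100 x TEM / grand mean, which the abstract does not spell out, and it uses synthetic cranial lengths rather than the Howells data. The study itself was run in R. The Python function names, the 180 mm mean, the 8 mm spread and the error sizes below are all illustrative assumptions.

```python
import math
import random


def tem(first, second):
    """Technical error of measurement: sqrt(sum(d^2) / (2n)), in measurement units."""
    n = len(first)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))


def relative_tem(first, second):
    """TEM expressed as a percentage of the grand mean of both measurement sets."""
    grand_mean = (sum(first) + sum(second)) / (2 * len(first))
    return 100 * tem(first, second) / grand_mean


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
# Hypothetical cranial lengths in mm (synthetic, NOT the Howells measurements).
original = [rng.gauss(180, 8) for _ in range(2524)]

# Systematic error (assumed size): every repeat measurement is 4 mm too long.
systematic = [x + 4 for x in original]
# Random error (assumed size): zero-mean noise with a 2 mm standard deviation.
noisy = [x + rng.gauss(0, 2) for x in original]

for label, repeat in [("systematic", systematic), ("random", noisy)]:
    print(
        label,
        "r =", round(pearson_r(original, repeat), 3),
        "TEM (mm) =", round(tem(original, repeat), 2),
        "r-TEM (%) =", round(relative_tem(original, repeat), 2),
    )
```

For the constant 4 mm offset, r stays at 1 because a uniform shift does not change the correlation, while the TEM equals sqrt(8), roughly 2.83 mm, and so reports the error directly in millimetres.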
               