Guenole and Brown (2014) have shown how failure to meet invariance criteria affects path coefficients in SEM. For applied research contexts, these authors suggest testing for non-invariance to detect possible undesired effects in the subsequent model evaluation. Following this line of argument, this work intends to show the negative consequences of ignoring the property of invariance when a scale is used for selection or diagnostic purposes. A scale is invariant when subjects from different groups with the same level on the latent variable have the same probability of obtaining the same test score. However, invariance is not an all-or-nothing judgment. In multi-group Confirmatory Factor Analysis (CFA), four levels of invariance are defined (Meredith, 1993): configural invariance (the prerequisite of an identical factorial structure), metric invariance (MI) or weak invariance (equality of factor loadings), scalar or strong invariance (equality of factor loadings and intercepts), and strict invariance (equality of factor loadings, intercepts, and residuals). When a multi-group CFA is conducted, the evaluation of these types of invariance consists of a stepwise procedure from the least restrictive comparisons (configural vs. MI) to the most restrictive (MI vs. strong and strong vs. strict), using nested χ² tests (Brown, 2015). Consequently, the evaluation of MI is a necessary requirement for comparing group scores (Millsap, 2011). In the parallel model of Classical Test Theory (CTT), MI is directly related to reliability. In this model all items have the same standardized factor loading (λ), and the communality (λ²) is equal to the average inter-item correlation of the scale. Consequently, for a scale of n items, the reliability for a given value of λ can be calculated from the standardized alpha coefficient: α = nλ² / (1 + (n − 1)λ²). The relationship between reliability and predictive validity was first established by Gulliksen (1950) and his attenuation formula.
However, the effect that a loss of reliability in one of the groups of the sample has on predictive validity is not sufficiently known. What happens when the discriminability of some items (i.e., their factor loadings) differs between groups and the instrument is used to make predictions on a dichotomous pass/fail test criterion? How can this MI problem interfere with the correct classification of subjects? This paper aims to explore common practices in applied research that usually ignore MI evaluation (Borsboom, 2006). In this paper, we will try to show the need to reconsider the practical usefulness of psychological tests and scales in decision-making, given the bias that a lack of MI introduces in the classification of subjects.
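As a rough illustration of the standardized alpha formula quoted above, the following Python sketch computes reliability under the parallel model for two groups whose items differ only in their common standardized loading. It then applies the classical attenuation relation (observed validity = true validity × √(reliability of the test × reliability of the criterion)), which is assumed here to be the relation the text refers to. The function names, the loadings (0.7 and 0.5), the scale length (10 items), and the true validity (0.5) are hypothetical values chosen for illustration, not figures from the study.

```python
import math


def standardized_alpha(n_items: int, loading: float) -> float:
    """Standardized alpha under the parallel model.

    Every item is assumed to share the same standardized loading, so the
    average inter-item correlation equals loading ** 2.
    """
    communality = loading ** 2
    return n_items * communality / (1 + (n_items - 1) * communality)


def attenuated_validity(true_validity: float,
                        test_reliability: float,
                        criterion_reliability: float = 1.0) -> float:
    """Observed test-criterion correlation after attenuation.

    Classical relation: r_obs = r_true * sqrt(r_xx * r_yy).
    The criterion is assumed perfectly reliable unless stated otherwise.
    """
    return true_validity * math.sqrt(test_reliability * criterion_reliability)


if __name__ == "__main__":
    n_items = 10          # hypothetical scale length
    true_validity = 0.5   # hypothetical error-free validity

    # Hypothetical loadings: a reference group and a group with weaker loadings.
    for group, loading in [("reference", 0.7), ("focal", 0.5)]:
        alpha = standardized_alpha(n_items, loading)
        r_obs = attenuated_validity(true_validity, alpha)
        print(f"{group}: loading={loading:.2f}, "
              f"alpha={alpha:.3f}, observed validity={r_obs:.3f}")
```

Under these assumed values, the group with the lower loading obtains a lower alpha and therefore a lower observed validity, which is the kind of between-group difference that could distort pass/fail classification.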
               