Advancing the identification of reading disabilities and the prediction of risk has been the focus of reading disabilities research for over three decades. Despite considerable research effort, approaches to identification and classification that rely primarily on a single factor have not proved sufficiently reliable (Schatschneider et al., 2016). No single risk factor or risk assessment approach has demonstrated superiority. This finding is not surprising: when cut points are placed on an underlying continuous distribution, measurement error will cause cases to end up on opposite sides of the cut point upon repeated assessment (Francis et al., 2005).

In recent years, new models have emerged that combine multiple risk factors, yielding better accuracy and positive predictive values than approaches that rely primarily on single factors (Schatschneider et al., 2016). One possible explanation for why the identification and risk prediction of reading disabilities have remained inaccurate may lie in how reading disabilities risk has traditionally been conceptualized and modeled. Until recently, most studies have examined risk factors in isolation or within circumscribed sets of risk models combined in seemingly intuitive ways (e.g., sum scores). The implicit assumption of such approaches is that a linear combination of a relatively small number of factors is sufficient for risk prediction. Yet, based on research findings to date, this approach might need to be complemented by an alternative approach that can accurately classify children who might be at risk for reading disabilities, thereby capturing the complexity of the disorder.

We posit that accurate identification and risk prediction of reading disabilities is a complex classification problem because many unique factors must be simultaneously integrated and weighed in an algorithm for an accurate prediction. Traditional statistical techniques (e.g., logistic regression) may be less well suited to modeling such complex algorithms. Machine learning and Bayesian inference, on the other hand, hold advantages that circumvent some of the drawbacks of traditional statistics. Here, we describe three such advantages.

First, traditional statistical techniques require an a priori model with a prespecified type and number of factors and prespecified relations among them, and they can test only those exact specifications. In contrast, machine learning techniques do not require risk factors or predictors to be prespecified. These methods consider highly complex relations among many potential factors to determine the optimal classification algorithm, thereby accounting for the multidimensional space in which children might differ in reading outcomes. Although parameters can be adjusted and constrained by the researcher, the machine largely determines the optimal path through the data.

Second, machine learning techniques are highly flexible, and the resulting models can accommodate highly complex combinations of factors, including nonlinear relations. Given recent advances in computing power, simultaneous consideration of thousands of different factors is now possible within a single machine learning model, as sketched below.
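To make this contrast concrete, the following minimal sketch (not a model from this article) shows how an ensemble classifier can combine several risk factors and pick up nonlinear interactions without the analyst prespecifying the functional form. The feature names and the synthetic data are illustrative assumptions only.

```python
# Minimal sketch: a multi-factor machine-learning classifier for risk prediction.
# Feature names and data are hypothetical; this is not the authors' model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000

# Hypothetical risk factors measured at school entry (illustrative only).
X = np.column_stack([
    rng.normal(size=n),   # phonological awareness
    rng.normal(size=n),   # rapid automatized naming
    rng.normal(size=n),   # letter-sound knowledge
    rng.normal(size=n),   # vocabulary
])

# Simulated outcome: risk rises nonlinearly when several factors are low together,
# the kind of interaction a prespecified linear model can miss.
logit = -2.0 - 1.2 * X[:, 0] - 0.8 * X[:, 1] + 1.5 * (X[:, 0] < -1) * (X[:, 2] < 0)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The ensemble searches over many candidate splits and interactions itself,
# rather than requiring the researcher to prespecify relations among factors.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

risk = model.predict_proba(X_test)[:, 1]
print(f"AUC on held-out data: {roc_auc_score(y_test, risk):.2f}")
```

The same pipeline scales to many more candidate predictors, which is the sense in which such models can consider the full multidimensional space of factors at once.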
Third, classification models based on Bayesian inference are relatively common in medicine but comparatively rare in risk prediction for reading disabilities and in education more broadly. Bayesian prediction models can provide more accurate predictions than traditional frequentist approaches when informative priors (e.g., prevalence rates) exist and when the data alone are not fully determinative for accurate classification. An example comes from mammography for detecting breast cancer.
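The logic of that example can be sketched with Bayes' theorem: the posterior probability of the condition given a positive test depends heavily on the prior (prevalence), not just on the test's accuracy. The prevalence, sensitivity, and specificity values below are assumed, textbook-style numbers, not figures taken from this article.

```python
# Illustrative Bayes computation for a mammography-style screening example.
# All numbers are assumed for illustration.
prevalence = 0.01      # prior probability of the condition in the screened population
sensitivity = 0.90     # P(positive test | condition present)
specificity = 0.91     # P(negative test | condition absent)

# Total probability of a positive test (true positives plus false positives).
p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Posterior probability of the condition given a positive test.
posterior = sensitivity * prevalence / p_pos
print(f"P(condition | positive test) = {posterior:.2f}")  # roughly 0.09
```

Even with a fairly accurate test, the low base rate drives the positive predictive value down, which is why incorporating an informative prior such as prevalence can matter as much as the screening measure itself.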