Motivated by the Medical Expenditure Panel Survey, which contains data from individuals' medical providers and employers across the United States, we propose a new semiparametric procedure for predicting whether a patient will incur high medical expenditure. Problems of the same nature arise in many other important applications where one would like to predict whether a future response falls in the upper (or lower) tail of the response distribution. The common practice is to artificially dichotomize the response variable and then apply an existing classification method such as binomial regression or a classification tree. Our prediction rule instead classifies directly whether a future response falls in the upper tail of the response distribution. The new method can be considered a semiparametric estimator of the Bayes rule for classification and has several advantages. It does not require an artificially dichotomized response and so makes better use of the information contained in the data. It does not require any parametric distributional assumptions and tends to be more robust. It incorporates nonlinear covariate effects, and it can be adapted to construct a prediction interval, thereby providing more information about the future response. We provide an R package, plaqr, to implement the proposed procedure and demonstrate its performance in Monte Carlo simulations. We illustrate the application of the new method on a subset of the Medical Expenditure Panel Survey data.
               