OBJECTIVE The All of Us Research Program makes individual-level data available to researchers while protecting the participants' privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data were transformed to meet generally accepted reidentification risk levels. METHODS At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate reidentification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the reidentification risk for each participant using a state-of-the-art adversarial model that specifically assumes the adversary knows that someone is a participant in the program. We confirmed that the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics. RESULTS The results indicated that the 95th percentile of the reidentification risk across all participants is below current thresholds. At the same time, we observed that risk levels were higher for certain race, ethnicity, and gender groups. CONCLUSIONS While the reidentification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring of data misuse, and penalization mechanisms for users who violate terms of service.
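To make the risk computation in METHODS concrete, the following is a minimal sketch of one common way to estimate per-participant reidentification risk when the adversary is assumed to know that the target is in the dataset: each record's risk is taken as 1 divided by the number of records sharing its quasi-identifier values. This is not the adversarial model used in the study, which the abstract does not specify. The field names (`age_band`, `state`), the toy records, and the nearest-rank percentile helper are all illustrative assumptions; only the 0.09 threshold comes from the abstract.

```python
import math
from collections import Counter

# Threshold quoted in the abstract; everything else here is hypothetical.
RISK_THRESHOLD = 0.09


def reidentification_risks(records, quasi_identifiers):
    """Return one risk value per record: 1 / size of its equivalence class.

    An equivalence class is the set of records that share the same values
    on all quasi-identifiers. This assumes the adversary already knows the
    target is present in the dataset.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Toy data (hypothetical): two groups of 12 and one unique record.
records = (
    [{"age_band": "30-39", "state": "TN"}] * 12
    + [{"age_band": "40-49", "state": "TN"}] * 12
    + [{"age_band": "80+", "state": "TN"}]
)

risks = reidentification_risks(records, ["age_band", "state"])
expected_risk = sum(risks) / len(risks)   # average risk over all records
p95_risk = percentile(risks, 95)          # risk at the 95th percentile
max_risk = max(risks)                     # worst case (a unique record)

print(f"expected={expected_risk:.4f} p95={p95_risk:.4f} max={max_risk:.4f}")
print("p95 within threshold:", p95_risk <= RISK_THRESHOLD)
print("expected within threshold:", expected_risk <= RISK_THRESHOLD)
```

In this toy dataset the single unique record pulls the average risk above the threshold even though the 95th percentile stays below it, which illustrates why it matters which summary statistic is compared against the threshold and why small demographic groups can carry higher risk.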
               