
Privacy-preserving statistical analyses in Learning Health Systems

A Learning Health System (LHS) is one in which “internal data and experience are systematically integrated with external evidence, and that knowledge is put into practice”. To accomplish this goal, we will need to analyze large volumes of routinely collected health data. However, creating data sets that span clinical populations poses significant problems of privacy and data governance. The article by Toh et al. demonstrates a possible way around these privacy and governance challenges.

To advance personalized medicine, we need to develop tools that can predict how the outcomes of diseases or treatments will vary based on a profile of individual patient characteristics. Developing predictive models with sufficient precision to guide the tailoring of treatments to individuals requires large data sets. However, assembling large data sets is a challenge, in part because the relevant data are often held by independent stakeholders, as in the case considered by Toh et al., where data on BMI and antibiotic exposure have been collected by PEDSnet, a data-sharing consortium of pediatric hospitals.

One way to assemble such a data set is to export the data tables from each hospital and pool them in a common table (see Fig. 1, panel a). However, pooling individual data across hospital boundaries requires the fortification of the data pool to protect patient privacy, as well as procedures to control who is authorized to view the data. This is expensive and risky.

But as Toh et al. demonstrate, for analyses based on ordinary least squares regression and some generalized linear models (hereafter, “standard regression”), it is possible to analyze a multi-institution data set without pooling the data across institutions. They do this by exploiting a mathematical fact: standard regressions do not require the analysis of individual data. You can estimate standard regressions from summary statistics (e.g., for ordinary least squares regression, the variable means and the covariance matrix).
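
The claim that ordinary least squares can be estimated from summary statistics alone can be checked on a toy case. The following is a minimal sketch, assuming a single predictor and two hospitals; the function names and the data values are hypothetical illustrations and are not taken from Toh et al. Each site exports only counts and sums, and the pooled sums yield the means, variance, and covariance that determine the fit.

```python
def site_summary(xs, ys):
    """Summary statistics a site could export instead of its rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def pool_summaries(summaries):
    """Add the per-site sums; no individual records are needed."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    return {k: sum(s[k] for s in summaries) for k in keys}


def ols_from_summary(s):
    """Simple OLS (intercept, slope) from pooled means and covariances."""
    mean_x = s["sx"] / s["n"]
    mean_y = s["sy"] / s["n"]
    var_x = s["sxx"] / s["n"] - mean_x ** 2
    cov_xy = s["sxy"] / s["n"] - mean_x * mean_y
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def ols_from_rows(xs, ys):
    """The same regression fitted on pooled individual records."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data for two hypothetical hospitals.
a_x, a_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
b_x, b_y = [4.0, 5.0], [8.1, 9.8]

summary_fit = ols_from_summary(
    pool_summaries([site_summary(a_x, a_y), site_summary(b_x, b_y)])
)
row_fit = ols_from_rows(a_x + b_x, a_y + b_y)
print(summary_fit, row_fit)
```

The two fits agree up to floating-point rounding, which is the equivalence the text describes. With several predictors the same idea would use the full cross-product matrix in place of these scalar sums.
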
Figure 1 (panel b) illustrates this approach. Each hospital calculates the statistics summarizing its local data. The summary statistics are then exported and used to calculate pooled summary statistics, from which the analysts estimate the regression. Toh et al. showed that the results of the pooled individual data (panel a) and pooled summary data (panel b) approaches were identical. Although this was never in doubt, the demonstration illustrates the value of the method.

The contrast between pooled individual data analysis and pooled summary statistics analysis is closely related to the difference between individual participant data meta-analysis (IPDMA) and standard meta-analysis. IPDMA pools individual-level data from all the controlled trials of an intervention to estimate a common treatment effect, while standard meta-analysis harvests means and standard deviations from each trial to the same end.

Given that pooling individual participant data is expensive and time-consuming, why would we ever do it? Is there ever a need to construct pooled, cross-hospital individual-level pediatric data sets? Unfortunately, unlike standard regressions, many analyses require more than pooled summary statistics. As Toh et al. note, these analytical computations use iterative optimization algorithms that repeatedly use individual-level data. Examples include nonlinear models, models involving clustering and nesting of subjects, Bayesian statistics, and nearly every species of machine learning. Iterative optimization is often required in predictive analytics, genomics, health geography, psychometrics, and population health. Unlike standard regressions, in these analyses you cannot estimate the parameters only from the summary statistics. Instead, you estimate them with an algorithm like this:
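
The algorithm that follows in the article is not part of this excerpt. As a stand-in, here is a minimal sketch of the general shape of an iterative fit, assuming a one-parameter nonlinear model y = exp(b * x) fitted by Gauss-Newton least squares. The model, the function name, and the noise-free records are hypothetical choices made for illustration, not the article's own procedure. The point to notice is that every pass of the loop goes back over every individual record.

```python
import math


def gauss_newton_rate(records, b=0.0, max_iter=100, tol=1e-12):
    """Fit y = exp(b * x) by least squares using Gauss-Newton steps."""
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        # Each iteration revisits every individual record, because the
        # predictions and derivatives depend on the current value of b.
        for x, y in records:
            pred = math.exp(b * x)
            deriv = x * pred  # derivative of the prediction w.r.t. b
            num += deriv * (y - pred)
            den += deriv * deriv
        step = num / den
        b += step
        if abs(step) < tol:
            break
    return b


# Made-up, noise-free records generated with a true rate of 0.5.
records = [(x, math.exp(0.5 * x)) for x in (0.0, 1.0, 2.0, 3.0)]
b_hat = gauss_newton_rate(records)
print(b_hat)
```

Because the quantities summed inside the loop change with each new value of b, a single set of statistics exported once from each hospital is not enough to run this fit.
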

Keywords: data sets; analysis; learning health; privacy; health; summary statistics

Journal Title: Pediatric Research
Year Published: 2020
