
Generating synthetic mixed discrete-continuous health records with mixed sum-product networks

OBJECTIVE Privacy is a concern whenever individual patient health data are exchanged for scientific research. We propose using mixed sum-product networks (MSPNs) as private representations of data and sampling from the network to generate synthetic data that can be shared for subsequent statistical analysis. This anonymization method was evaluated with respect to privacy and information loss.

MATERIALS AND METHODS In a simulation study, information loss was quantified by assessing whether synthetic data could reproduce regression parameters obtained from the original data. Predictor variable types were varied among continuous, count, categorical, and mixed discrete-continuous. Additionally, we measured whether the MSPN approach successfully anonymizes these datasets by removing associations between background and sensitive information.

RESULTS The synthetic data generated with MSPNs yielded regression results highly similar to those obtained from the original data, differing by less than 5% in most simulation scenarios. Standard errors increased compared with the original data. Particularly for smaller datasets (1000 records), this resulted in a discrepancy between the estimated and empirical standard errors. Sensitive values could no longer be inferred from background information for at least 99% of tested individuals.

DISCUSSION The proposed anonymization approach yields very promising results. Further research is required to evaluate its performance with other types of data and analyses, and to predict how user parameter choices affect the bias-privacy trade-off.

CONCLUSION Generating synthetic data from MSPNs is a promising, easy-to-use approach for anonymizing sensitive individual health data that yields informative and private data.
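
The core step, drawing synthetic records from a sum-product network, can be illustrated with top-down (ancestral) sampling: at a sum node, pick one child according to the node's weights; at a product node, sample every child and merge the results, since the children cover disjoint sets of variables; at a leaf, draw from that variable's univariate distribution. The following is a minimal sketch under stated assumptions, not the authors' implementation: the network structure, the variable names (`smoker`, `age`, `visits`), the weights, and the simple parametric leaves (categorical, Gaussian, Poisson) are all hypothetical and hard-coded, whereas an MSPN would be learned from the original data.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy node types for illustration; a real MSPN is learned from data.


@dataclass
class CategoricalLeaf:
    var: str
    probs: Dict[str, float]  # category -> probability

    def sample(self, rng: random.Random) -> Dict[str, object]:
        cats = list(self.probs)
        weights = [self.probs[c] for c in cats]
        return {self.var: rng.choices(cats, weights=weights)[0]}


@dataclass
class GaussianLeaf:
    var: str
    mean: float
    sd: float

    def sample(self, rng: random.Random) -> Dict[str, object]:
        return {self.var: rng.gauss(self.mean, self.sd)}


@dataclass
class PoissonLeaf:
    var: str
    lam: float

    def sample(self, rng: random.Random) -> Dict[str, object]:
        # Knuth's multiplication method; adequate for small rates.
        limit = math.exp(-self.lam)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return {self.var: k}
            k += 1


@dataclass
class ProductNode:
    children: list  # children must have disjoint scopes (variable sets)

    def sample(self, rng: random.Random) -> Dict[str, object]:
        record: Dict[str, object] = {}
        for child in self.children:
            record.update(child.sample(rng))
        return record


@dataclass
class SumNode:
    weights: List[float]
    children: list  # children share the same scope

    def sample(self, rng: random.Random) -> Dict[str, object]:
        # Choose one mixture component, then sample only from it.
        child = rng.choices(self.children, weights=self.weights)[0]
        return child.sample(rng)


def toy_network() -> SumNode:
    # Two hypothetical patient "clusters", each factorised into independent leaves.
    cluster_a = ProductNode([
        CategoricalLeaf("smoker", {"yes": 0.7, "no": 0.3}),
        GaussianLeaf("age", 62.0, 8.0),
        PoissonLeaf("visits", 4.0),
    ])
    cluster_b = ProductNode([
        CategoricalLeaf("smoker", {"yes": 0.1, "no": 0.9}),
        GaussianLeaf("age", 41.0, 10.0),
        PoissonLeaf("visits", 1.0),
    ])
    return SumNode([0.4, 0.6], [cluster_a, cluster_b])


def sample_records(root, n: int, seed: int = 0) -> List[Dict[str, object]]:
    rng = random.Random(seed)
    return [root.sample(rng) for _ in range(n)]


if __name__ == "__main__":
    for rec in sample_records(toy_network(), 3):
        print(rec)
```

The sum node is what carries dependence between variables here: within each cluster the variables are independent, but mixing the clusters makes smoking, age, and visit counts correlated in the sampled records. In an actual workflow the network would first be fitted to the original records, and only the sampled records would be shared.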

Keywords: health; synthetic data; mixed discrete; sum product; product networks; mixed sum

Journal Title: Journal of the American Medical Informatics Association : JAMIA
Year Published: 2022
