PURPOSE The generalizability and trustworthiness of deep learning (DL)-based algorithms depend on the size and heterogeneity of training datasets. However, because of patient privacy concerns and ethical and legal issues,… Click to show full abstract
PURPOSE The generalizability and trustworthiness of deep learning (DL)-based algorithms depend on the size and heterogeneity of training datasets. However, because of patient privacy concerns and ethical and legal issues, sharing medical images between different centers is restricted. Our objective is to build a federated DL-based framework for PET image segmentation utilizing a multicentric dataset and to compare its performance with the centralized DL approach. METHODS PET images from 405 head and neck cancer patients from 9 different centers formed the basis of this study. All tumors were segmented manually. PET images converted to SUV maps were resampled to isotropic voxels (3 × 3 × 3 mm3) and then normalized. PET image subvolumes (12 × 12 × 12 cm3) consisting of whole tumors and background were analyzed. Data from each center were divided into train/validation (80% of patients) and test sets (20% of patients). The modified R2U-Net was used as core DL model. A parallel federated DL model was developed and compared with the centralized approach where the data sets are pooled to one server. Segmentation metrics, including Dice similarity and Jaccard coefficients, percent relative errors (RE%) of SUVpeak, SUVmean, SUVmedian, SUVmax, metabolic tumor volume, and total lesion glycolysis were computed and compared with manual delineations. RESULTS The performance of the centralized versus federated DL methods was nearly identical for segmentation metrics: Dice (0.84 ± 0.06 vs 0.84 ± 0.05) and Jaccard (0.73 ± 0.08 vs 0.73 ± 0.07). For quantitative PET parameters, we obtained comparable RE% for SUVmean (6.43% ± 4.72% vs 6.61% ± 5.42%), metabolic tumor volume (12.2% ± 16.2% vs 12.1% ± 15.89%), and total lesion glycolysis (6.93% ± 9.6% vs 7.07% ± 9.85%) and negligible RE% for SUVmax and SUVpeak. No significant differences in performance (P > 0.05) between the 2 frameworks (centralized vs federated) were observed. CONCLUSION The developed federated DL model achieved comparable quantitative performance with respect to the centralized DL model. Federated DL models could provide robust and generalizable segmentation, while addressing patient privacy and legal and ethical issues in clinical data sharing.
               
Click one of the above tabs to view related content.