Research Article | COMPUTER SCIENCE

Breaking medical data sharing boundaries by using synthesized radiographs

Science Advances  02 Dec 2020:
Vol. 6, no. 49, eabb7973
DOI: 10.1126/sciadv.abb7973
  • Fig. 1 Concept of constructing a public database without disclosing patient-sensitive data.

    The GAN in each hospital consists of a generator G and a discriminator D. During training, patient-sensitive data (shown in red) are never exposed to the generator G directly. They are shown only to the discriminator D while it tries to differentiate between real and synthesized radiographs. After training is completed, only the generators G are transferred to a public database, where they can be used to generate synthesized radiographs.

  • Fig. 2 Pooled GAN training can improve pneumonia detection by enriching the diversity of the dataset.

    (A) Distributions of MS-SSIM for 2450 randomly selected pneumonia-positive pairs. Higher diversity of pneumonia cases in the GAN-augmented dataset is confirmed by a lower MS-SSIM (GAN-augmented MS-SSIM: 0.18 ± 0.09 versus RSNA subset MS-SSIM: 0.24 ± 0.12). (B) A classifier trained on 1000 x-rays from the GAN-enriched dataset (healthy: 450 real and 400 synthesized; pneumonia: 50 real and 100 synthesized) reaches an AUC of 0.81 in pneumonia detection, outperforming a classifier trained on 1000 real x-rays (healthy, 950; pneumonia, 50), which reaches an AUC of 0.74. (C) Similarly, improved performance measures were found for sensitivity (Sens), specificity (Spec), accuracy (Accu), PPV, NPV, and F1 score. We used a test set of 6000 x-rays randomly sampled from the RSNA dataset to calculate those scores. The GANs used to generate the synthesized x-rays were trained on the NIH and Stanford datasets.
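
    The measures named in (B) and (C) follow from standard definitions. The sketch below is not the authors' evaluation code; it is a minimal illustration, assuming binary labels (1 = pneumonia, 0 = healthy) and using hypothetical function names and toy data.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = pneumonia, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. This O(n^2) form is for
    illustration only; it assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the paper).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

    Sensitivity, specificity, PPV, NPV, and F1 depend on the decision threshold applied to the classifier output, whereas AUC is computed from the raw scores.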

  • Fig. 3 Using pooled synthesized data from different sites, classification performance can be increased.

    To simulate the scenario in Fig. 1, two classifiers were trained and compared: a classifier solely trained on anonymous radiographs generated with the NIH-GAN (blue) and a classifier trained on the pooled anonymized dataset generated with the NIH-GAN and the Stanford-GAN (red). The schematic of the data selection process is shown in (A). AUC, sensitivity, and specificity for the seven diseases are given in (B). In particular, the classification performances of formerly problematic cases such as edema, consolidation, and pneumonia were boosted by merging data from multiple sites (red arrows).

  • Fig. 4 Federated learning facilitates GAN training when facing insufficient amounts of local data.

    Hospitals can use federated learning algorithms to train a global GAN, and the central GAN deposit can serve as a hub. (A) Illustration of the GAN-related federated learning system. After local model initialization, local hospitals B and C (in red frames) were selected to update their local models. The global generator and discriminator were updated by the weights (w) transferred to the aggregation server (red arrows). All local models were subsequently redefined by the updated global GAN (blue arrows). The exchange of local and global weights continued until the global GAN converged. (B) Discriminator loss curves for three trained Wasserstein GANs. The Wasserstein GAN trained by federated averaging algorithm (federated 20 × 1k) outperformed the centralized GAN trained on only 1000 x-rays (centralized 1k) and performed comparably to the centralized 20k GAN. (C) FID evaluations of the GAN training process.
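
    The weight exchange in (A) can be sketched as one round of federated averaging. This is a simplified illustration under stated assumptions, not the authors' implementation: model weights are flat lists of floats, client updates are weighted by local sample count (the usual federated averaging convention, assumed here), and `toy_local_update` is a hypothetical stand-in for local GAN training.

```python
def federated_average(client_weights, client_sizes):
    """Sample-count-weighted mean of per-client parameter vectors."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def toy_local_update(weights, data, lr=0.5):
    """Hypothetical stand-in for local training.

    Moves every weight toward the mean of the local data. A real system
    would run generator and discriminator updates here instead.
    """
    mean = sum(data) / len(data)
    return [w + lr * (mean - w) for w in weights]


def federated_round(global_weights, clients, selected, local_update):
    """One round: selected clients train locally, the server aggregates.

    Returning the new global weights models the step in which all local
    models are reset to the updated global model.
    """
    updates, sizes = [], []
    for name in selected:
        data = clients[name]
        # Each selected client starts from a copy of the current global model.
        updates.append(local_update(list(global_weights), data))
        sizes.append(len(data))
    return federated_average(updates, sizes)


# Toy setup (hypothetical): three hospitals, B and C selected this round.
hospitals = {"A": [0.0, 0.0], "B": [2.0, 2.0], "C": [4.0, 4.0]}
global_model = [0.0]
global_model = federated_round(global_model, hospitals, ["B", "C"],
                               toy_local_update)
print(global_model)
```

    In practice the round is repeated, with a fresh selection of hospitals each time, until the global model converges.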

  • Fig. 5 Learned pathological features.

    (A) Generation of the disease-specific pixel map. A randomly chosen vector with 512 Gaussian-distributed entries characterizes one specific patient. The GAN was tasked with generating a healthy and a diseased radiograph of that patient (cardiomegaly in this example). A subtraction map was generated to denote the changes brought about by the disease and was superimposed as a colormap over the generated healthy radiograph. (B) Disease-specific patterns generated by the generator for an exemplary randomly drawn pseudopatient. Red denotes higher signal intensity in the pathological radiograph, while blue denotes lower signal intensity. Note that for some diseases such as cardiomegaly and edema, the pattern looks realistic, while the GAN struggled with diseases that have a variable appearance and where ground truth data are limited, e.g., pneumonia. (C) Revealing correlations within generated pathological radiographs by the classifier trained on the real dataset. For each pathology, 5000 random synthesized radiographs with a pathology label drawn from a uniform distribution between 0.0 and 1.0 were generated. The images were then rated by the classifier network, and Pearson’s correlation coefficient was calculated for each pairing of pathologies [shown in (C) with the GAN cardiomegaly label on the x axis and the cardiomegaly and fibrosis classifier output on the y axis in red and blue, respectively]. (D) Resulting correlation coefficients for all 14 × 14 pairings, displayed and color coded. Clustering on the x axis (i.e., the GAN label axis) was performed to group related diseases. The obtained clustered blocks are marked with white-bordered boxes.
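
    The correlation analysis in (C) and (D) can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function names and the nested-list data layout are assumptions, and the toy numbers are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined if either sequence is constant (zero variance).
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(gan_labels, classifier_outputs):
    """Correlate each GAN pathology label with each classifier output.

    gan_labels[i]            labels used to generate the images for pathology i
    classifier_outputs[i][j] classifier scores for pathology j on those images
    Entry [i][j] of the result pairs GAN label i with classifier output j.
    """
    return [
        [pearson_r(gan_labels[i], scores_j) for scores_j in classifier_outputs[i]]
        for i in range(len(gan_labels))
    ]


# Toy data (hypothetical): one GAN label axis, two classifier outputs.
labels = [[0.0, 0.25, 0.5, 0.75, 1.0]]
outputs = [[
    [0.1, 0.3, 0.5, 0.7, 0.9],  # rises with the label
    [0.4, 0.6, 0.5, 0.6, 0.4],  # no linear relation to the label
]]
print(correlation_matrix(labels, outputs))
```

    With 14 pathologies, `gan_labels` would hold 14 label lists of 5000 values each, and the result would be the 14 × 14 matrix described in (D).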

  • Table 1 Real/synthesized radiographs test.

    Accuracy and interreader agreement for the group of three CV experts, three radiologists, and all readers when differentiating whether the presented radiograph is real or synthesized.

    Readers       256 × 256                     512 × 512                     1024 × 1024
                  Accuracy, %   Fleiss’ kappa   Accuracy, %   Fleiss’ kappa   Accuracy, %   Fleiss’ kappa
    CV experts    60 ± 5        −0.03           67 ± 17       0.07            82 ± 4        0.46
    Radiologists  51 ± 5        0.10            65 ± 5        0.18            77 ± 13       0.39
    All readers   55 ± 7        0.00            67 ± 14       0.07            80 ± 10       0.37
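
    Fleiss' kappa measures agreement among a fixed number of readers beyond what chance would produce. The sketch below uses the standard formula; the helper names and the toy votes are hypothetical, and it assumes every radiograph is rated by the same number of readers.

```python
def votes_to_counts(votes, categories=("real", "synthesized")):
    """Turn per-image reader votes into per-image category counts."""
    return [[row.count(c) for c in categories] for row in votes]


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of category counts.

    counts[i][j] is the number of readers who assigned image i to
    category j. Every row must sum to the same number of readers (>= 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every image needs the same number of ratings")

    n_cats = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    # Observed agreement per image, then averaged over images.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]
    p_observed = sum(p_item) / n_items
    # Agreement expected by chance; kappa is undefined if this equals 1.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Toy votes (hypothetical): three readers rating four radiographs.
toy_votes = [
    ["real", "real", "synthesized"],
    ["synthesized", "synthesized", "real"],
    ["real", "synthesized", "real"],
    ["synthesized", "real", "synthesized"],
]
print(fleiss_kappa(votes_to_counts(toy_votes)))
```

    A value near 0 indicates agreement no better than chance, which is consistent with the low-resolution columns of the table, and a value of 1 indicates perfect agreement.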

Supplementary Materials

    Breaking medical data sharing boundaries by using synthesized radiographs

    Tianyu Han, Sven Nebelung, Christoph Haarburger, Nicolas Horst, Sebastian Reinartz, Dorit Merhof, Fabian Kiessling, Volkmar Schulz, Daniel Truhn

    This PDF file includes:

    • Preprocessing steps in CheXpert dataset
    • Network training details
    • Experimental subset selection details
    • Diversity of generated radiographs
    • Advantages of the proposed GAN-based approach
    • Privacy concerns in imaging modalities other than x-rays
    • Alternative GMs
    • Adversarial robustness
    • Domain adaption using a Cycle-GAN
    • Figs. S1 to S5
    • Tables S1 to S5
    • Algorithm S1
    • References
