Research Article: BIOENGINEERING

Universal microbial diagnostics using random DNA probes

Science Advances  28 Sep 2016:
Vol. 2, no. 9, e1600025
DOI: 10.1126/sciadv.1600025
  • Fig. 1 Schematic of UMD platform.

    (A) Genomic DNA is extracted from a bacterial sample and thermal-cycled with M random DNA probes. The genome-probe binding is quantified, producing a probe-binding vector y; in this study, the random probes are in the form of MBs, and the DNA-probe binding is quantified by the ratio of open/hybridized to closed/nonhybridized MBs. (B) The hybridization binding level of each probe to a potentially large reference database of N bacterial genomes (B1, B2, …, BN) is predicted using a thermodynamic model and stored in an M × N hybridization affinity matrix Φ. NCBI, National Center for Biotechnology Information. (C) Assuming that K bacterial species comprise the sample, the probe-binding vector y is a sparse linear combination of the corresponding K columns of the matrix Φ weighted by the bacterial concentrations x, that is, y = Φx + n, where the vector n accounts for noise and modeling errors. When K is small enough and M is large enough, Φ can be effectively inverted using techniques from compressive sensing, yielding an estimate of the microbial makeup x of the sample; in this illustration, the K = 2 bacteria labeled B2 and B7 are present in the sample.
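
The sparse model y = Φx + n in panel C can be illustrated with a small recovery sketch. The code below uses orthogonal matching pursuit (OMP), a standard greedy compressive sensing technique chosen here for illustration; it is not necessarily the algorithm used in the article, whose formulation is given in the Supplementary Materials. The matrix values, the helper names (`solve_linear`, `least_squares`, `omp`), and the noise-free setting are all assumptions.

```python
def solve_linear(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial
    pivoting. Assumes A is nonsingular (the selected columns are independent)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def least_squares(Phi, y, support):
    """Least-squares weights for the columns in `support` (normal equations)."""
    cols = [[row[j] for row in Phi] for j in support]
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve_linear(gram, rhs)


def omp(Phi, y, K):
    """Greedily select K columns of Phi that best explain y; return sparse x."""
    M, N = len(Phi), len(Phi[0])
    norms = [sum(Phi[i][j] ** 2 for i in range(M)) ** 0.5 for j in range(N)]
    residual, support, weights = y[:], [], []
    for _ in range(K):
        # Score each unused column by its normalized correlation with the residual.
        scores = [
            abs(sum(Phi[i][j] * residual[i] for i in range(M))) / norms[j]
            if j not in support else -1.0
            for j in range(N)
        ]
        support.append(max(range(N), key=scores.__getitem__))
        weights = least_squares(Phi, y, support)
        residual = [
            y[i] - sum(w * Phi[i][j] for w, j in zip(weights, support))
            for i in range(M)
        ]
    x = [0.0] * N
    for w, j in zip(weights, support):
        x[j] = w
    return x


# Toy affinity matrix (hypothetical values): 4 probes (rows) x 4 genomes (columns).
Phi = [
    [1.0, 0.0, 0.0, 0.1],
    [0.1, 1.0, 0.0, 0.0],
    [0.0, 0.1, 1.0, 0.0],
    [0.0, 0.0, 0.1, 1.0],
]
x_true = [0.0, 2.0, 0.0, 1.0]  # K = 2 genomes present, the rest absent
y = [sum(p * c for p, c in zip(row, x_true)) for row in Phi]  # noise-free y = Phi x
x_hat = omp(Phi, y, K=2)
```

In this toy case the two nonzero entries of `x_hat` land on the same genomes as in `x_true`. With measurement noise or strongly correlated columns of Φ, a greedy method like this can select the wrong genomes, which is one reason the number of probes M matters.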

  • Fig. 2 Random probe design and hybridization affinity computation process.

    (A) DNA sequence structure of five test random DNA probes. (B) Both strands of the bacterial genome (blue lines) are first thermodynamically aligned with the probe sequence. The sequence of the bacteria is segmented into fragments of roughly equal length (~100 to 200 nt), each containing a significant hybridization affinity with the probe. Then, all of the bacterial fragments and probe sequences along with the experimental conditions are fed into the DNA software (18) to predict all stable probe-bacteria complexes and concentrations. These concentrations, in aggregate, determine the concentration of opened MBs, which is defined as the hybridization affinity of the probes to the bacterial genome. (C) Example of a predicted probe-bacteria fragment binding with many base pair mismatches.

  • Fig. 3 Binding patterns of five random probes correctly identify the bacteria present in nine diverse bacterial samples.

    (A) Experimentally measured FRET ratios to quantify hybridization between bacterial DNA and probes 1 to 5. (B) Hybridization affinity between DNA samples and probes converted from FRET ratio through the probe characteristic curve fit equations (table S1 and fig. S2). (C) Heat map of normalized inner products between the experimentally obtained hybridization affinity and predicted hybridization affinities (by thermodynamic model) for nine DNA samples as a measure of the similarity of the probe measurements to the bacteria in the data set. DNA samples are clustered into three groups: (i) exact sequence known, (ii) exact sequence unknown, and (iii) clinical isolates (whose exact sequence is unknown). UMD correctly recovers the diagonally highlighted bacterium (with inner product >0.9). (D) The average ROC curve of UMD in detecting nine bacteria, assuming the independence of the different experiments. Each point on the curve corresponds to a threshold value in the interval [−1, 1]. UMD achieves high values of the AUC (AUC > 0.9). (E) Correlation of measured and simulated hybridization affinities and the NRMSE of the prediction (straight line corresponds to maximum correlation). All experiments were performed in triplicate, and the results shown here are averaged over the trials, with error bars representing SEM.
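
The similarity measure in panel C can be sketched as a normalized inner product (cosine similarity) between a measured five-probe affinity vector and each predicted vector. This is a minimal illustration: the affinity values, the labels "B1" to "B3", and the function names are hypothetical, and the article may preprocess the vectors differently before taking the inner product.

```python
import math


def normalized_inner_product(u, v):
    """Inner product of u and v divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(measured, predicted):
    """Return the name of the most similar predicted vector and all scores."""
    scores = {
        name: normalized_inner_product(measured, vec)
        for name, vec in predicted.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical predicted affinities of three genomes to five probes.
predicted = {
    "B1": [0.9, 0.1, 0.4, 0.2, 0.7],
    "B2": [0.2, 0.8, 0.1, 0.6, 0.3],
    "B3": [0.5, 0.5, 0.9, 0.1, 0.1],
}
# Hypothetical measurement that lies close to the B1 prediction.
measured = [0.85, 0.15, 0.35, 0.25, 0.75]

name, scores = best_match(measured, predicted)
```

Here `name` is "B1", and only its score exceeds the 0.9 level that the caption quotes for correctly recovered bacteria.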

  • Fig. 4 Performance of the UMD platform in genus-level recovery of 40 species listed by the CDC as the most common human infectious genera, for different numbers of random probes M and noise variances σ.

    (A) ROC curves for detecting a single bacterium (K = 1) at different noise levels. σ0 = 2.4 × 10^−8 M denotes the variance of the additive white Gaussian noise used in the simulation. This value is obtained from the experiments in Fig. 3 by calculating the propagated variance of measured FRET ratios. UMD performs more accurately with lower noise variance. The detection is almost perfect (AUC > 0.95) under noise variance σ = σ0/5. (B) The average ROC curve for detecting a single bacterium using different numbers of random probes M and fixed noise variance σ = σ0. Detection performance improves for all 40 species as the number of random probes increases. With 15 random probes, UMD achieves almost perfect detection performance (AUC > 0.95). (C) The percentage of simulated trials in which the K bacteria present in the samples were recovered correctly with zero false positives, among all possible K-bacteria mixtures (blue and red curves corresponding to K = 2 and K = 3 bacteria, respectively). Simulations were repeated 1000 times with randomly selected MBs, and error bars represent SD. (D and E) Confusion matrices illustrating the detection result of UMD using M = 3 and M = 10 probes selected by the GPS algorithm.
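
The ROC curves and AUC values quoted in Fig. 3D and Fig. 4, A and B, follow the usual construction: sweep a detection threshold over the similarity scores and record the false-positive and true-positive rates at each step. The sketch below shows that construction on hypothetical scores and labels; it is a generic illustration, not the article's evaluation code.

```python
def roc_points(scores, labels):
    """ROC points (false-positive rate, true-positive rate), one per threshold.

    `labels` holds 1 where the bacterium is truly present and 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and not l)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical similarity scores and ground-truth labels for six trials.
scores = [0.95, 0.9, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

area = auc(roc_points(scores, labels))  # 8/9 for this toy data
```

An AUC of 1 means every present bacterium scores above every absent one; in the toy data one absent case (0.6) outranks a present one (0.4), so the area drops to 8/9.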

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/2/9/e1600025/DC1

    Mathematical formulation of the compressive sensing (CS) detection and estimation algorithms

    Complete list of bacterial strains used in UMD simulations

    fig. S1. Comparison of the sloppiness of random probes to other MBs.

    fig. S2. Random probes’ characteristic curves.

    fig. S3. Experimentally measured FRET ratios to quantify hybridization between 11 bacterial DNA samples and probes 1 to 5.

    fig. S4. Hybridization affinity between 11 bacterial DNA samples and probes 1 to 5.

    fig. S5. Detection performance of 11 bacterial samples using five random probes.

    fig. S6. Comparison of the predicted concentrations of bacterial DNA with the experimentally measured values.

    fig. S7. Performance of UMD in species-level recovery of 24 strains of Staphylococcus and 23 strains of Vibrio.

    fig. S8. Performance of UMD in identifying pathogens at the genus level using 15 GPS probes.

    fig. S9. Performance of UMD in identifying eight pathogenic and one nonpathogenic E. coli strains using GPS probes.

    fig. S10. Performance of UMD in identifying the composition of several complex samples.

    table S1. The fitted parameters of the probes’ characteristic curves.

    References (31–35)
