Research Article: IMMUNOLOGY

Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting


Science Advances  11 Mar 2016:
Vol. 2, no. 3, e1501371
DOI: 10.1126/sciadv.1501371
  • Fig. 1 Assessment of errors and bias in Ig-seq using synthetic antibody spike-ins.

    Colored dots refer to V-genes represented by spike-in clones. (A and B) Clonal (A) and intraclonal (B) diversity errors of spike-ins shown in relation to spike-in clonal frequency. (C) Mispriming of biological data during multiplex PCR is shown by plotting the number of unique primers found to be associated with a V-gene and the number of read counts per V-gene. (D) A statistically significant correlation (Pearson, two-tailed, P < 0.0001) is observed between the melting temperature (Tm) of primers in the multiplex PCR primer set and read counts associated with primers in Ig-seq data. (E) A correlation of spike-in clonal frequencies from library preparation with multiplex PCR versus singleplex PCR results in an R² = 0.56. Amplification bias was systematic, as indicated by the very small error bars across replicate sequencing runs: clones with the same V-gene were consistently under- or overamplified. Ig-seq data are from replicate library sample preparations (n = 3; data sets consisted of 4 × 10⁵ preprocessed full-length antibody reads) from mouse splenic cDNA with synthetic spike-ins [for (A), (B), (D), and (E), data are presented as means ± SD and are from replicate data sets Reddy-PS-1, Reddy-PS-2, and Reddy-PS-3; data set Reddy-PS-1 was used for (C); see table S2]. Relative spike-in frequencies are mean values obtained from replicate libraries (n = 5) generated by singleplex PCR (see fig. S2 and table S1).

  • Fig. 2 Ig-seq with MAF.

    (A) Workflow for library preparation by MAF consists of reverse transcription (RT), multiplex PCR, and adapter extension PCR and results in amplicons ready for Ig-seq. (B) Following Ig-seq, nucleotide sequence logos show FID and RID regions with predicted levels of variability and nonvariability in degenerate and spacer regions, respectively. (C) Schematic shows the principle of MAF bias correction and its ability to provide improved accuracy of clonal frequencies. The MAF % is based on the normalized RID count (NRID), which is equal to the RID clonal count divided by the MAF bias factor (FID clonal count/RID clonal count). In the example shown, the NRID for clone Y = 2/(2/2) = 2; clone X = 3/(4/3) = 2.25 (see Results for more details).
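
The NRID arithmetic in (C) can be sketched in a few lines of Python. This is only an illustration of the formula stated in the caption, not the authors' pipeline: the function names `normalized_rid_count` and `maf_percent` are hypothetical, and expressing the MAF % as each clone's share of the summed NRID values is an assumption, since the caption says only that the MAF % is "based on" the NRID.

```python
def normalized_rid_count(fid_count: int, rid_count: int) -> float:
    """NRID = RID clonal count / (FID clonal count / RID clonal count)."""
    if fid_count <= 0 or rid_count <= 0:
        raise ValueError("FID and RID clonal counts must be positive")
    bias_factor = fid_count / rid_count  # MAF bias factor from the caption
    return rid_count / bias_factor


def maf_percent(counts: dict) -> dict:
    """Map clone -> percentage share of total NRID.

    ASSUMPTION: the percentage is each clone's NRID divided by the sum of
    all NRID values; the caption does not spell out this step.
    """
    nrid = {clone: normalized_rid_count(fid, rid)
            for clone, (fid, rid) in counts.items()}
    total = sum(nrid.values())
    return {clone: 100.0 * value / total for clone, value in nrid.items()}


# Counts taken from the caption's example, given as (FID count, RID count).
example = {"X": (4, 3), "Y": (2, 2)}

for clone, (fid, rid) in example.items():
    # Clone X evaluates to roughly 2.25 and clone Y to 2, as in the caption.
    print(clone, normalized_rid_count(fid, rid))

print(maf_percent(example))
```

Under this reading, a clone that accumulates many FIDs per RID (that is, one that was overamplified) has its molecule count scaled down relative to a clone with one FID per RID.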

  • Fig. 3 MAF error correction validation with spike-ins shows the removal of nearly all erroneous clonal and intraclonal variants.

    (A) Phylogenetic trees before and after MAF error correction of intraclonal variants for a single spike-in example clone. (B) Uncorrected and MAF error–corrected intraclonal variant values compared with spike-in frequency through linear regression with a 95% prediction band. (C) Intraclonal diversity index (clonal read or RID count, uncorrected and MAF error–corrected, respectively) showing reduced dependence on frequency plotted along with a slope = 0 line and 95% prediction bands. (D) Phylogenetic trees before and after MAF error correction of clonal variants (CDR3 amino acid sequences) for a single spike-in example clone. (E) Erroneous clonal variants (uncorrected) and accurate clonal identification (MAF error–corrected) plotted as a function of spike-in frequency with linear regression fits and 95% prediction bands. The Ig-seq data sets used in this figure consisted of 1 × 10⁶ preprocessed full-length antibody reads and were obtained from replicate library sample preparations (n = 3) from mouse splenic cDNA with synthetic spike-ins [for (B), (C), and (E), data are presented as means ± SD and are from replicate data sets IM_1a, IM_1b, and IM_1c; data set IM_1a was used for (A) and (D); see table S7]. Relative spike-in frequencies are mean values obtained from replicate libraries (n = 5) generated by singleplex PCR (see fig. S2 and table S1).

  • Fig. 4 MAF bias correction validation with spike-ins shows highly accurate clonal frequencies.

    (A) Correlation of uncorrected spike-in clonal frequencies from multiplex PCR (with new reduced primer set) with singleplex PCR results in an R² = 0.42. (B) Correlation of spike-in clonal frequencies based on FID counting or clonal frequencies based on RID counting with singleplex PCR. Data show that FID residuals are always larger than RID residuals. (C) Correlation of MAF bias–corrected spike-in clonal frequencies from multiplex PCR with singleplex PCR results in a significantly improved R² = 0.98. MAF bias–corrected counts were based on the normalized RID count and MAF bias factor (see Fig. 2C). (D) Correlation of uncorrected spike-in clonal frequencies using two different multiplex PCR primer sets during library preparation results in an R² = 0.08. (E) MAF-corrected spike-in clonal frequencies using two different multiplex PCR primer sets result in a significantly improved R² = 0.84. The Ig-seq data sets used in this figure consisted of 1 × 10⁶ preprocessed full-length antibody reads and were obtained from replicate library sample preparations (n = 3) from mouse splenic cDNA with synthetic spike-ins [for (A) to (C), data are presented as means ± SD and are from replicate data sets IM_1a, IM_1b, and IM_1c, see table S7; data sets Reddy-PS-Compare and TAK-PS-Compare were used for (D) and (E); see table S2]. Singleplex spike-in frequencies are mean values obtained from replicate libraries (n = 5) generated by singleplex PCR (see fig. S2 and table S1).

  • Fig. 5 MAF error and bias correction substantially alters Ig-seq data from mice.

    (A) Correlation of clonotype frequencies before and after MAF correction results in an R² = 0.52. Red dots indicate flagged hotspot error clonotypes present in uncorrected data but removed after MAF correction in all replicate data sets. (B) The top 500 clonotypes ranked according to frequency are shifted after MAF correction. Spike-in frequencies cover most of the biological frequency range. (C) Correlation of V-gene frequencies of uncorrected versus MAF-corrected data results in an R² = 0.42. (D) The normalized intraclonal variants are decreased substantially across clones after MAF correction. (E) The MAF bias factor shows grouping based on V-genes. The Ig-seq data sets used in this figure consisted of 1 × 10⁶ preprocessed full-length antibody reads and were obtained from replicate library sample preparations (n = 3) from mouse splenic cDNA with synthetic spike-ins [for (A) and (C), data are presented as means ± SD and are from replicate data sets IM_1a, IM_1b, and IM_1c; data set IM_1a was used for (B), (D), and (E); see table S7].

  • Fig. 6 Immunological clonal prediction status improves significantly after MAF error and bias correction.

    (A and B) Comparison of the top 100 frequency-ranked clonotypes with their corresponding somatic hypermutation and intraclonotype diversity index values. Bubble size represents the median number of nonsilent nucleotide somatic hypermutations per clonotype. Uncorrected data show poor separation based on immune status (red, hyperimmunized; blue, untreated mice). MAF error– and bias–corrected clonotypes are clearly separated on the basis of these three parameters. (C and D) Applying a stepwise nominal logistic regression, we determined the significant model parameters that describe the separation of clonotype data based on immune status in a multivariate fashion (see Supplementary Materials and Methods). Using three combinations of four training data sets (top 100 frequency-ranked clonotypes, n = 2 untreated, and n = 2 hyperimmunized) and two test data sets (top 100 frequency-ranked clonotypes, n = 1 untreated, and n = 1 hyperimmunized), we show the combined results from the test data sets (n = 3 untreated and n = 3 hyperimmunized). The y axis represents the model prediction probability of whether a given clonotype belongs to the hyperimmunized group. The uncorrected data have a low resolving power, whereas the MAF error– and bias–corrected data show significant separation. (E and F) Comparison of the sensitivity and specificity of the nominal logistic regression models. The receiver operating characteristics and area under the curve (AUC) for nominal logistic regression models are shown for the significant factors using uncorrected and MAF-corrected data (for model performance using all factors, see fig. S20). The Ig-seq data sets used in this figure consisted of 1 × 10⁶ preprocessed full-length antibody reads and were obtained from library sample preparations of splenic cDNA with synthetic spike-ins from hyperimmunized mice (n = 3) and untreated mice (n = 3) (the data sets used for hyperimmunized are IM_1a, IM_2, and IM_3; the data sets used for untreated are UM_1, UM_2, and UM_3; see table S7).
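
For readers less familiar with the AUC reported in (E and F), the following minimal sketch shows how an area under the ROC curve can be computed from per-clonotype prediction probabilities and known immune status. It uses the standard pairwise (Mann-Whitney) formulation rather than the authors' software, the function name `roc_auc` is hypothetical, and the labels and probabilities are invented purely for illustration; they are not data from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for hyperimmunized, 0 for untreated (illustrative encoding).
    scores: model prediction probability of the hyperimmunized class.
    The AUC equals the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example values, NOT taken from the article.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to no resolving power, whereas a value of 1.0 means every hyperimmunized clonotype scores above every untreated one.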

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/2/3/e1501371/DC1

    Materials and Methods

    Fig. S1. The 5′UTR lengths of mouse IGHV transcripts.

    Fig. S2. Antibody synthetic spike-in genes.

    Fig. S3. Nucleotide sequence logos of the primer-binding regions of selected spike-in clones.

    Fig. S4. Precise library quantification by linking qPCR to ddPCR.

    Fig. S5. Annotated example of biological sequence obtained from MAF library preparation.

    Fig. S6. Design of experiments (DoE) for library preparation optimization.

    Fig. S7. Response surface methodology analysis of clonal frequency bias with uncorrected data.

    Fig. S8. Response surface methodology analysis of CDR3 diversity.

    Fig. S9. Response surface methodology analysis of clonal frequency bias with MAF-corrected data.

    Fig. S10. Comparison of V-gene coverage using new reduced primer set (TAK) and previously published primer set (Reddy-2010).

    Fig. S11. Schematic of multistage error correction pipeline.

    Fig. S12. Flow chart of multistage error correction pipeline.

    Fig. S13. Error correction effects on various bias correction methods.

    Fig. S14. Bias correction using MAF V-gene bias factor.

    Fig. S15. Comparison of bias correction with a new reduced primer set (TAK) and a previously published primer set (Reddy-2010).

    Fig. S16. Comparison of V-gene (germlines) before and after MAF correction.

    Fig. S17. The MAF bias factor across V-genes.

    Fig. S18. Correlation of MAF bias correction factor across data sets.

    Fig. S19. Nominal logistic regression modeling based on Ig-seq clonotype measurements.

    Fig. S20. Comparison of the sensitivity and specificity of the nominal logistic regression models.

    Fig. S21. Comparison of factor correlations with prediction probabilities of the nominal logistic regression models.

    Fig. S22. Various immune profiling metrics from MAF-corrected Ig-seq data.

    Fig. S23. Processing time of reads for MAF error and bias correction pipeline.

    Fig. S24. Effect of the number of reads analyzed using final MAF sample preparation conditions.

    Table S1. Ig-seq read count statistics for spike-ins following replicate library preparation by singleplex PCR (see fig. S2, B and C).

    Table S2. Ig-seq read count statistics following MAF library preparation by multiplex PCR (see Fig. 2A).

    Table S3. A comparison of the VDJ annotation tool used in this study [modified from Laserson et al. (12)] with IMGT HighV-Quest.

    Table S4. Ig-seq read count statistics for DoE for library preparation optimization.

    Table S5. A complete list of primers and sequences used in this study.

    Table S6. Error correction statistics for spike-in clones.

    Table S7. Expanded Ig-seq processing statistics.

    Table S8. Synthetic genes used in this study.
