Research Article | GENETICS

Structure-specific DNA recombination sites: Design, validation, and machine learning–based refinement

Science Advances  24 Jul 2020:
Vol. 6, no. 30, eaay2922
DOI: 10.1126/sciadv.aay2922
  • Fig. 1 Integron attC recombination sites.

    (A) Schematic of the integron system. Pint, integrase promoter; intI, integrase gene; Pc, cassette promoter; attI, cassette insertion site; attC, cassette attachment site. (B) Schematic of a folded bottom strand of attC recombination site.

  • Fig. 2 Synthetic attC site recombination.

    (A and B) Schematic of the bottom strands of attC sites with constraints used by the two versions of the algorithm generating synthetic sites: with constraints based on empirical results (A) and with constraints based on bioinformatic analysis of wild-type attC sites (B). N, arbitrary base. *Base generated according to the probability distribution of each base in the sequence of the R box in wild-type attC sites (table S1). (C) Predicted structures of the paradigmatic attCaadA7 site and synthetic attC sites, with seven sites generated by each version of the algorithm. All structural predictions were performed using the ViennaRNA 2.1.8 package. (D) Schematic of the suicide conjugation assay to measure attC site recombination frequency. A plasmid carrying an attC site is conjugated into a strain that does not have the machinery for its replication. However, the plasmid and the chloramphenicol resistance marker (CmR) that it carries can be maintained in the recipient strain through attI1×attC recombination. The recombination frequency can be measured as the ratio of chloramphenicol-resistant cells to all recipient cells. (E) Recombination frequencies of an empty vector in the presence or absence of IntI1 integrase (negative controls), attCaadA7 (positive control), and synthetic attC sites. Values represent means of three independent experiments; error bars represent mean absolute error. Asterisks (*) indicate that the recombination frequency was below the detection level, which is indicated by the bar height.

  • Fig. 3 Synthetic attC sites embedded into protein coding regions.

    (A) In silico directed evolution approach that was used to generate synthetic attC sites with peptide linker properties (B-E) and to embed synthetic attC sites into lacZ (F-H). (B) Structures of the three synthetic attC sites encoding peptide linkers L1, L2, and L3, predicted using ViennaRNA 2.1.8. (C) Protein sequences of encoded peptide linkers. (D) Recombination frequencies of attCaadA7 (positive control) and synthetic attC sites encoding peptide linkers. Values represent means of three independent experiments; error bars represent mean absolute error. Asterisk (*) indicates that the recombination frequency was below the detection level, which is indicated by the bar height. (E) Results of a bacterial two-hybrid assay with the two domains of Bordetella pertussis adenylate cyclase fused either with a natural linker (p5, positive control), a natural linker with a frameshift mutation (p5FS, negative control), or synthetic attC sites encoding peptide linkers. Values represent means of three independent experiments; error bars represent mean absolute error. (F) Predicted structures of the synthetic attC sites embedded into four regions of the lacZ gene encoding β-galactosidase, predicted using ViennaRNA 2.1.8. (G) Protein sequences of the four β-galactosidase target regions and sequences after attC site embedding. Blue, mutations that preserve the amino acid physicochemical properties; red, other nonsilent mutations. (H) Strains with the four synthetic attC sites embedded into lacZ and streaked on an LB agarose plate with X-gal and isopropyl-β-D-thiogalactopyranoside. Blue color indicates a functional β-galactosidase, as in lacZwt. White color indicates that an embedded attC site perturbed the function of the β-galactosidase, as in lacZ::attCaadA7.

  • Fig. 4 Enrichment of attCr0 mutant library in sites with higher recombination frequencies.

    (A) Schematic of the folded bottom strand of attCr0 used for library construction. The region subjected to mutational analysis is shown in blue. (B) Recombination frequencies of the library throughout cycles of recombination (blue). The recombination frequency of attCr0 is used for comparison (black). Values represent means of three independent experiments; error bars represent mean absolute error. (C) Heat map of enrichment values for all pairwise attCr0 mutants. At the intersection of nucleotides depicted along each axis, a series of points represents enrichment values corresponding to all pairwise mutants of these nucleotides. Inset: Example for all pairwise mutants of C50 and T52.

  • Fig. 5 Analysis of ML results.

    (A) Performance of four different machine learning (ML) algorithms in regression, measured as described in the Supplementary Materials. (B) Correlation between the measured and the predicted enrichment values given by the random forest regression (RFR) algorithm for attCr0 site mutants from the library test dataset (Pearson r = 0.81, P < 0.01). (C) Features used by RFR, ranked according to their importance measure. Inset: Most important features (score > 0.01). (D) Mapping of the most important nonglobal features (score > 0.01) onto the predicted structure of attCr0.

  • Fig. 6 Confirmation of ML-derived hypotheses for three synthetic attC sites.

    Structural predictions, positional entropies of bases, and recombination frequencies of initial and mutated sites attCr2 (A and B), attCr6 (C and D), and attCr11 (E and F). Mutations are shown in red. Arrows indicate extrahelical bases (EHBs) that were stabilized in a low-entropy state through mutations. All structural predictions were performed using the ViennaRNA 2.1.8 package. Recombination values represent means of three independent experiments; error bars represent mean absolute error.
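
The recombination frequencies reported in the figure legends above are defined in Fig. 2D as the ratio of chloramphenicol-resistant cells to all recipient cells, and are summarized as the mean of three independent experiments. The following is a minimal sketch of that arithmetic, not the authors' analysis code. It assumes that "mean absolute error" refers to the mean absolute deviation of the replicates from their mean; the legends do not spell this out. The function names and colony counts are hypothetical.

```python
from statistics import mean


def recombination_frequency(cm_resistant_cells, total_recipient_cells):
    """Ratio of chloramphenicol-resistant cells to all recipient cells."""
    if total_recipient_cells <= 0:
        raise ValueError("total recipient count must be positive")
    return cm_resistant_cells / total_recipient_cells


def summarize(frequencies):
    """Return (mean, mean absolute deviation from the mean) of the replicates.

    Assumption: the legends' "mean absolute error" is read here as the
    mean absolute deviation around the replicate mean.
    """
    m = mean(frequencies)
    spread = mean(abs(f - m) for f in frequencies)
    return m, spread


# Hypothetical counts: (CmR cells, all recipient cells) for three replicates.
replicates = [(120, 1_000_000), (80, 1_000_000), (100, 1_000_000)]
freqs = [recombination_frequency(r, t) for r, t in replicates]
avg, err = summarize(freqs)
print(f"mean = {avg:.2e}, error bar = {err:.2e}")
```

Because recombination frequencies span several orders of magnitude, plots of such values usually use a logarithmic axis; the sketch leaves plotting out.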

Supplementary Materials

    Structure-specific DNA recombination sites: Design, validation, and machine learning–based refinement

    Aleksandra Nivina, Maj Svea Grieb, Céline Loot, David Bikard, Jean Cury, Laila Shehata, Juliana Bernardes, Didier Mazel

    This PDF file includes:

    • Sections S1 to S13
    • Figs. S1 to S7
    • Tables S1 to S3
    • References
