Research Article | BIOCHEMISTRY

Blind prediction of noncanonical RNA structure at atomic accuracy

Science Advances  25 May 2018:
Vol. 4, no. 5, eaar5316
DOI: 10.1126/sciadv.aar5316
  • Fig. 1 SWM efficiently searches the complex energy landscapes of noncanonical RNA loops.

    (A and B) SWM trajectories solve a GCAA tetraloop [Protein Data Bank (PDB) ID: 1ZIH] (A) and a two-strand GG-mismatch two-way junction (1F5G) (B) in 10 moves or less (left). Final structures achieve low free energies and sub-angstrom RMSD accuracies; numerous such structures appear in simulations involving 100 models (right-hand panels). (C) Significantly reduced CPU time is required for convergence of SWM compared to enumeration by SWA (11), except for loops drawn from the 23S ribosomal RNA (rRNA) (red). (D to F) SWM models for J2/3 (D) from group II intron (3G78), modeled with the energy function previously used for SWA, and 23S rRNA loop (1S72) (E) and L2 loop (F) of viral pseudoknot (1L2X), both modeled with updated Rosetta free energy function, illustrate sub-angstrom recovery of irregular single-stranded loops excised from crystal structures.

  • Fig. 2 SWM recovers noncanonical base pairs ab initio for complex RNA motifs.

    From left to right in each panel: 2D diagram with problem definition, 2D diagram with experimental noncanonical base pairs, experimental 3D model, SWM 3D model, and 3D overlay (experimental, marine; SWM model, salmon). (A to H) Motifs are (A) most conserved domain of human signal-recognition particle (PDB ID: 1LNT); (B) noncanonical junction from human thymidylate synthase regulatory motif, RNA-Puzzle 1 (PDB ID: 3MEI); (C) irregular J5/5a hinge from the P4-P6 domain of the Tetrahymena group I self-splicing intron (PDB ID: 2R8S); (D) P2-P3-P6 three-way A-minor junction from the Varkud satellite nucleolytic ribozyme, RNA-Puzzle 7 (PDB ID: 4R4V); (E) tertiary contact stabilizing the Schistosoma hammerhead nucleolytic ribozyme (PDB ID: 2OEU); (F) tetraloop/receptor tertiary contact from the P4-P6 domain of the Tetrahymena group I self-splicing intron (PDB ID: 2R8S); (G) T-loop/purine interaction from yeast tRNAPhe involving three chemically modified nucleotides (PDB ID: 1EHZ); and (H) RNA quadruplex including an inosine tetrad (PDB ID: 2GRB). Colors indicate accurately recovered noncanonical features (pastel colors), accurately recovered extrahelical bulges (wheat with white side chains), flanking helices built de novo (violet), parts of experimental structure used for modeling but allowed to minimize (dark violet), fixed context from experimental structure (black in 2D and white in 3D), and additional helical context not included in modeling (gray in 2D and white in 3D).

  • Fig. 3 SWM modeling and prospective experimental tests of previously unsolved tetraloop/receptor motifs.

    (A) Ab initio SWM models for canonical 11-nt tetraloop/receptor motif and alternative motifs discovered through in vitro selection that have resisted crystallization. Lavender, salmon, lime, and teal colorings highlight homologous structural features. During modeling, the bottom flanking helix (white) was allowed to move relative to the top helices of the receptor and tetraloop (gray), which were held fixed. (B) Canonical 11-nt tetraloop receptor module from the P4-P6 domain of the Tetrahymena group I self-splicing intron (PDB ID: 2R8S). In (A) and (B), red asterisks mark uracil residues predicted to be bulged. (C) CMCT mapping of the receptors installed into the P4-P6 domain of the Tetrahymena ribozyme (tetraloop and receptor indicated by black boxes) supports the bulged uracils in the predicted models (black asterisks). (D) Selective tests of each R(1) receptor base pair by compensatory mutagenesis in tectoRNA dimer. Rescue by double and triple mutants (black bars) was compared to energetic perturbations predicted based on the sum of effects (white bars) of component mutations or, more conservatively, to the single mutants. *P < 0.05, **P < 0.001, and ****P < 1 × 10^−6 (computed by Student’s t test for difference of means); n.s., not significant. (E) Overall 3D model of tectoRNA dimer with SWM model for R(1) receptor. WT, wild-type.
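
The comparison described for panel (D) can be illustrated with a short sketch. It assumes that each mutant is summarized by a free energy perturbation relative to wild type (ΔΔG, in kcal/mol), that the "sum of effects" prediction is the plain sum of the component single-mutant values, and that the conservative test asks whether the multiple mutant is less perturbed than every one of its component single mutants. The function names and the numbers are hypothetical and are not taken from the article or from table S5.

```python
def additive_prediction(single_ddgs):
    """Perturbation expected if the component mutations act independently."""
    return sum(single_ddgs)


def rescue_energy(observed_multi_ddg, single_ddgs):
    """How much less perturbed the multiple mutant is than the additive prediction.

    A positive value means the combined mutations compensate for one another.
    """
    return additive_prediction(single_ddgs) - observed_multi_ddg


def rescued_vs_singles(observed_multi_ddg, single_ddgs):
    """Conservative check (an assumed reading of the caption): the multiple
    mutant is less perturbed than each of its component single mutants."""
    return observed_multi_ddg < min(single_ddgs)


# Hypothetical values in kcal/mol, for illustration only.
singles = [1.5, 2.0]   # two single mutants that each break one base pair
double = 0.5           # double mutant that restores the pair

print(additive_prediction(singles))         # predicted perturbation
print(rescue_energy(double, singles))       # rescue relative to additivity
print(rescued_vs_singles(double, singles))  # conservative comparison
```

The significance tests quoted in the caption (Student's t test on replicate measurements) are outside the scope of this sketch.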

  • Fig. 4 Blind prediction of a complex RNA tertiary fold during RNA-Puzzle 18.

    (A) Two-dimensional diagram of the RNA-Puzzle 18 (Zika xrRNA) modeling problem, highlighting motifs that needed to be built de novo in red (left) and SWM-predicted pairings (pastel colors; right). WC, Watson-Crick; HG, Hoogsteen. (B) Structures discovered by SWM (green) are lower in energy and ~4 Å from models from conventional fragment assembly (FARFAR; blue); note that x axis is RMSD to the lowest free energy SWM model, not the experimental structure (unavailable at the time of modeling). (C and D) Magnified view of noncanonical region built de novo for SWM model submitted for RNA-Puzzle competition (C) and the subsequently released crystal structure (D). (E) and (F) give overlays in magnified and global views, respectively (SWM, salmon; crystal, marine). (G) Fraction of noncanonical base pairs recovered and RMSD to native model obtained by Rosetta modeling (black; larger and smaller symbols are SWM and FARFAR, respectively) and other laboratories (gray) for RNA-Puzzle 18. Points recovering zero noncanonical pairs are given a small vertical perturbation to appear visually distinct.

  • Table 1 Benchmark of SWM compared to previous Rosetta FARFAR over different classes of RNA structure motifs.
    Category                 Motif properties                  Best of five cluster centers
                                                               RMSD (Å)*        F_NWC†,*
                             No. of motifs  Length*  Strands*  SWM    FARFAR    SWM    FARFAR

    Single helix or multiple helices with crystallographic context provided
      Trans-helix loop       15             6        1         0.83   3.29      1.00   0.77
      Apical loop            4              4.5      1         1.14   2.96      1.00   1.00
      Two-way junction       14             7.5      2         0.74   1.15      1.00   1.00
      Multi-helix junction   5              11       3         1.91   1.93      0.80   0.33
      Tertiary contact       10             8.5      2         1.25   1.78      0.83   0.50

    Multiple helices without crystallographic context provided
      Two-way junction       15             7        2         1.59   1.40      1.00   0.55
      Multi-helix junction   5              10       3         2.60   3.45      0.40   0.20
      Tertiary contact       8              8.5      2         2.89   2.13      0.36   0.20
      Non-helix embedded     5              10       4         2.81   4.30      0.80   0.71

    Overall                  82             7        2         1.49   1.93      0.96   0.67

    *Median values reported. Mean values given in tables S3 and S4.

    †Fraction of non–Watson-Crick pairs from experimental structure observed in computational model.
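
As an illustration of the footnote's definition, the following minimal sketch computes the fraction of experimental non-Watson-Crick pairs that also appear in a model. It assumes each pair is given as a tuple of two residue numbers and that a pair counts as recovered when the same two residues are paired in the model, ignoring base edges and orientation; the exact matching criterion used for Table 1 is not stated here, and the function name and residue numbers are hypothetical.

```python
def fraction_nwc_recovered(experimental_pairs, model_pairs):
    """Fraction of experimental non-Watson-Crick pairs also found in the model.

    Each pair is a tuple of two residue numbers. Returns None when the
    experimental structure has no non-Watson-Crick pairs, because the
    fraction is undefined in that case.
    """
    # Sort each pair so that (i, j) and (j, i) compare equal.
    experimental = {tuple(sorted(pair)) for pair in experimental_pairs}
    model = {tuple(sorted(pair)) for pair in model_pairs}
    if not experimental:
        return None
    return len(experimental & model) / len(experimental)


# Hypothetical residue numbers, for illustration only.
experimental = [(3, 10), (4, 9), (5, 8)]
model = [(10, 3), (4, 9), (6, 7)]
print(fraction_nwc_recovered(experimental, model))  # two of three pairs recovered
```

Extra pairs that occur only in the model do not lower the value, since the denominator counts experimental pairs only.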

    • TRP4P6_WT2_0000: DMS, CMCT, and 1M7 for GAAA/11-nt (wild type) receptor
    • TRP4P6_C72_0000: DMS, CMCT, and 1M7 for GAAA/C7.2 receptor
    • TRP4P6_C7X_0000: DMS, CMCT, and 1M7 for GAAA/C7.10 receptor
    • TRP4P6_R1J_0000: DMS, CMCT, and 1M7 for GAAA/C7.2 receptor
    • TRP4P6_R1J_0001: R(1) compensatory mutants tested by 1M7 and DMS

    Supplementary Materials

    • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/5/eaar5316/DC1

      Supplementary Text

      Supplementary Methods

      fig. S1. Illustrated descriptions and modeling constraints of all 82 benchmark test cases.

      fig. S2. Rosetta free energy versus RMSD summaries of SWM modeling runs for 82 complex RNA motifs.

      fig. S3. Comparison of model accuracy between SWM and fragment assembly of RNA with FARFAR over an 82 motif benchmark.

      fig. S4. Potential routes to overcome limitations in Rosetta free energy function.

      fig. S5. Compensatory mutagenesis of the R(1) receptor read out through chemical mapping.

      fig. S6. Comprehensive single mutant analysis of the tetraloop receptor R(1).

      fig. S7. Global fold changes between the template viral xrRNA and the Zika xrRNA structure prediction challenge.

      fig. S8. Other models of RNA-Puzzle 18 (Zika xrRNA).

      table S1. A comparison of the SWA and SWM methods using the same energy function as the original SWA benchmark set of trans-helix single-stranded loops, and SWM results using the updated Rosetta free energy function (SWM*).

      table S2. Updates to the Rosetta energy function.

      table S3. Detailed performance of the stepwise Monte Carlo algorithm on 82 benchmark cases.

      table S4. Detailed performance of the FARFAR algorithm on 82 benchmark cases.

      table S5. Measurements of interaction free energy between R(1) mutant tetraloop receptors and GGAA tetraloop.

      data file S1. Three-dimensional SWM models of canonical 11-nt:GAAA, R(1):GGAA, C7.2:GAAA, and C7.10:GAAA tetraloop/receptors in PDB format.

      References (35–87)
