Research Article | STRUCTURAL BIOLOGY

# Native proteins trap high-energy transit conformations


Vol. 1, no. 9, e1501188

## Abstract

During protein folding and as part of some conformational changes that regulate protein function, the polypeptide chain must traverse high-energy barriers that separate the commonly adopted low-energy conformations. How distortions in peptide geometry allow these barrier-crossing transitions is a fundamental open question. One such important transition involves the movement of a non-glycine residue between the left side of the Ramachandran plot (that is, ϕ < 0°) and the right side (that is, ϕ > 0°). We report that high-energy conformations with ϕ ~ 0°, normally expected to occur only as fleeting transition states, are stably trapped in certain highly resolved native protein structures and that an analysis of these residues provides a detailed, experimentally derived map of the bond angle distortions taking place along the transition path. This unanticipated information lays to rest any uncertainty about whether such transitions are possible and how they occur, and in doing so lays a firm foundation for theoretical studies to better understand the transitions between basins that have been little studied but are integrally involved in protein folding and function. Also, the context of one such residue shows that even a designed highly stable protein can harbor substantial unfavorable interactions.

Keywords
• Ramachandran plot
• protein folding
• dipeptide conformation
• conformational transition
• peptide geometry
• strain
• protein stability
• transition state
• disallowed conformation

## INTRODUCTION

Proteins carry out a myriad of functions that are enabled by their three-dimensional structures, and decades of research have led to more than 100,000 structures in the Protein Data Bank [PDB (1)] and substantial understanding of protein folding and dynamics [for example, (2, 3)]. In pioneering work, Ramachandran and co-workers (4) introduced the ϕ and ψ torsion angles to describe protein backbone conformations (see Fig. 1, A and B), defining some conformations as “allowed” and others as “disallowed” due to collisions between atoms. Now, state-of-the-art energetics calculations (5) and the distributions of ϕ,ψ angles seen in high-resolution protein structures (6, 7) recapitulate the main features of the original ϕ,ψ plots remarkably well. For alanine-like residues (Fig. 1D), these include two well-populated low-energy regions—typically called the α and β basins—on the left-hand side of the plot (having ϕ < 0°) and a single, smaller, reasonably populated low-energy basin—called αL—on the right-hand side (having ϕ ~ +60°).

Much study has been devoted to the geometries and relative energetics of the well-populated basins [for example, (8)]. In contrast, how alanine-like residues cross the high-energy barriers near ϕ = 0° or +135° (Fig. 1D), which match classically disallowed regions and separate the common conformations having ϕ < 0° from those having ϕ ~ +60°, has proven much more difficult to study [for example, (5, 9)] and remains poorly understood. As estimated by Faller et al. (5), the heights of the barriers between the basins are about 5 to 7 kcal/mol (Fig. 1D). These barriers are much lower than the barrier (~20 kcal/mol) associated with cis-trans isomerization of proline that can be rate-limiting for folding (10), and thus, the transitions should not be rate-limiting but rather common occurrences during protein folding. Such transitions have also been seen to be important for regulatory conformational switches that govern the function of certain proteins, such as modulating peptide binding by an Src homology 2 domain (SH2 domain) (11) or switching between the low- and high-affinity states of the cell adhesion mediator CD44 (12).

As noted above, a residue must cross one of the two high-energy swaths near ϕ = 0° or +135° to transition between the populated conformations having ϕ < 0° or ϕ ~ +60°. These regions were classically described as disallowed because of collisions between the carbonyl carbon (C) or the Cβ carbon, respectively, and the peptide oxygen of the previous residue (O−1). For example, with standard peptide geometry (13), the O−1…C approach at ϕ = 0° is 2.32 Å (Fig. 1, A to C), which is much closer than the expected extreme contact limit of 2.7 Å (14). Like all transition states, these high-energy transit conformations are expected to be only fleetingly populated and inaccessible to direct experimental characterization, so that there cannot be certainty about what the transition structures really look like. Contrary to this expectation, we have discovered and describe here high-resolution observations of a series of conformations that have been trapped in native protein structures deposited in the PDB and that cover the full range of the ϕ ~ 0° transitions. The analysis of these observations provides an experimentally derived detailed map of the geometric distortions that take place during these conformational transitions.
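The 2.32 Å figure can be checked with simple trigonometry, because at ϕ = 0° with a trans peptide bond the atoms O−1, C−1, N, Cα, and C all lie in one plane and both torsions along the chain are cis. The following is a minimal sketch under stated assumptions: the bond lengths and angles are approximate standard peptide values assumed here for illustration (the exact values of (13) are not reproduced in this text), and the function name is hypothetical.

```python
import math

# Approximate standard peptide geometry (assumed values; Å and degrees).
BONDS = [1.231, 1.329, 1.458, 1.525]   # O-1=C-1, C-1-N, N-CA, CA-C
ANGLES = [122.7, 121.7, 111.0]         # O-1-C-1-N, C-1-N-CA, N-CA-C

def planar_cis_distance(bonds, angles_deg):
    """End-to-end distance of a planar chain in which every torsion is 0 (cis).

    The chain is walked bond by bond; at each atom the heading turns by
    (180 - bond angle) degrees, always in the same rotational sense.
    """
    x = y = 0.0
    heading = 0.0
    for i, bond in enumerate(bonds):
        x += bond * math.cos(heading)
        y += bond * math.sin(heading)
        if i < len(angles_deg):
            heading += math.pi - math.radians(angles_deg[i])
    return math.hypot(x, y)

d_phi0 = planar_cis_distance(BONDS, ANGLES)
print(f"O-1...C distance at phi = 0: {d_phi0:.2f} Å")
```

With these assumed values the distance comes out at about 2.32 Å, well inside the 2.7 Å extreme contact limit.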

## RESULTS

### Reliably modeled residues exist in the two high-energy passes near ϕ = 0°.

While surveying the conformations of residues in high-resolution (≤1.5 Å) protein structures, we were surprised to discover two narrow strings of observations that span completely across the classically disallowed transition regions near ϕ = 0° (an upper one with ψ ~ +90° and a lower one with ψ ~ −90°) as well as a few sporadic observations in the regions near ϕ = +135° (Fig. 1D). The existence of residues adopting conformations in the two “mountain passes” through the ϕ ~ 0° high-energy landscape can be seen in some previously published Ramachandran plots [for example, (15, 16)], but, to our knowledge, the reliability and potential importance of these residues have not been investigated. Even a recent paper that explicitly focused on describing residues in sparsely populated regions of the Ramachandran plot made no mention of these residues, which is consistent with them not being considered as reliably observable (15). We carried out visual checks of each of the putative transition residues against its electron density (for example, Fig. 2 and fig. S1) and found that most are reliably defined (Fig. 1D, circles). All the reliably defined residues have ϕ,ψ angles roughly falling within the predicted lowest-energy passes through the high-energy terrain (Fig. 1D). Furthermore, as might be anticipated, the observed residues having ϕ ~ 0° that are not in the low-energy passes were found to be the result of incorrect or unreliable modeling (Fig. 1D, triangles). Because the reliably determined residues with high-energy conformations near ϕ = 0° are real and relatively abundant (146 observations in the −35° < ϕ < 35° transit zone; see table S1), they represent fortuitous “natural experiments” that provide an unprecedented ability to experimentally define at high resolution exactly how the standard peptide geometry becomes distorted as a residue passes through these highly strained conformational transition states. 
Although 15 residues in the passes near ϕ ~ +135° are also well defined (Fig. 1D), those populations are not yet large enough to enable an accurate description of the pathways they represent.

### The ϕ ~ 0° transition residues exist in diverse contexts

The ϕ ~ 0° transition residues trapped in native proteins exist in a variety of conformational contexts (fig. S2) and are distributed among 17 of the 20 standard residue types (table S2), implying that they are not special cases but represent realistic snapshots along a transition pathway. Many of these residues are present in or near active sites, but others are not (for example, fig. S3). The cases occurring in two proteins are particularly instructive. In one case, the occurrence proves that even a small, highly stable, designed helical bundle with a melting temperature of 105°C can accommodate a residue with such high local strain energy (fig. S3A). In the second case, it has been shown that a simple Cys-to-Ala mutation that removes a single hydrogen bond in the active site of an isocyanide hydratase (fig. S3B) leads to the rearrangement of a short backbone segment and the loss of the high-energy conformation (17). It was also shown that a Cys-to-Ser mutation that strengthened the hydrogen bond actually enhanced the stability of the segment in the native conformation (17). This example implies that the energy cost for a residue adopting a high-energy transition conformation can apparently be offset by the formation of a single hydrogen bond and the rearrangement of a few residues.

### Systematic ϕ-dependent bond-angle distortions allow passage through the transition region

On a Ramachandran plot, the strip of observations near ψ = −90° matches, nearly perfectly through inversion symmetry, the strip near ψ = +90° (see Fig. 1D, green lines, and table S3), making it reasonable to treat the two passes as a single phenomenon, roughly doubling the density of observations available for mapping the barrier crossing. To define the patterns of distortion that allow peptides to traverse this barrier, we calculated ϕ-dependent average values for the O−1…C distance and all backbone bond angles. Given the diverse contexts of the residues, treating them as an aggregate should average out specific features due to each particular context and provide a view of the generic transition properties that are solely due to local factors and are generally relevant. This is supported by previous studies showing that the average conformation dependence of backbone bond angles and planarity, found in ultrahigh-resolution protein structures, agrees well with those from quantum mechanics calculations of simple model compounds and those from structures of small peptides (18–21).

The behavior of the O−1…C distance is striking (Fig. 3A). The average values near ϕ = ±60° track with the distance expected for standard geometry, until the distance reaches 2.8 Å (near ϕ ~ ±50°), and then distortion begins and the average distance decreases much less rapidly than predicted by standard geometry, until it reaches ~2.7 Å (near ϕ = ±25°). Then, between ϕ of −25° and +25°, the observed distance is remarkably flat, with the average distance of 2.68 ± 0.02 Å over that range matching remarkably well with the 2.7 Å “extreme approach limit” for these atom types defined nearly 50 years ago (14).

The ϕ-dependent variations of the backbone bond angles are also systematic, with each angle roughly matching its standard value at ϕ = ±60° and varying smoothly to its maximal deformation at ϕ = 0°. Only three bond angles—∠O−1-C−1-N, ∠C−1-N-Cα, and ∠N-Cα-C—expand substantially, with expansions of roughly 2°, 6°, and 4°, respectively (Fig. 3B). The lesser expansion of ∠O−1-C−1-N is consistent with the expectation that as a purely sp2-hybridized center, it would have a higher force constant for resisting distortion. Given the expanding ∠O−1-C−1-N angle, the ∠Cα−1-C−1-O−1 and ∠Cα−1-C−1-N angles decrease in a coordinated fashion by ~1° and 1.5°, respectively, to keep the C−1 carbonyl group largely planar.

To check the validity of treating the ψ ~ +90° and ψ ~ −90° passes as equivalent, we analyzed the data from the two passes separately and found that all angles behaved similarly, except that for the ψ ~ −90° transition, ∠Cα-C-N+1 also expands ~2° (fig. S4), as makes sense to minimize the clash between the N+1 hydrogen and Cβ (Fig. 1B). We note that, for two reasons, these empirical bond angle distortions may slightly underestimate the actual average distortions: first, structures in the 1.0 to 1.5 Å resolution range are still somewhat influenced by refinement restraints tethering them to the standard values (18, 22), and second, at ϕ = 0°—the point of expected maximal distortion—because of limited data, the empirical value is an average over the broad ϕ range of −12.5° to +12.5° (table S4).
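The binned averaging described here can be sketched as follows. This is an illustration rather than the analysis script itself: the 25°-wide bins centered on multiples of 25° are an assumption inferred from the quoted −12.5° to +12.5° range (the actual bins are in table S4), the function name is hypothetical, and the numbers are made up purely to exercise the code.

```python
import numpy as np

def bin_mean_sem(phi, values, width=25.0):
    """Mean, SEM and count of `values` in phi bins centred on multiples of `width`."""
    phi = np.asarray(phi, dtype=float)
    values = np.asarray(values, dtype=float)
    centres = np.round(phi / width) * width   # nearest bin centre for each residue
    result = {}
    for c in np.unique(centres):
        v = values[centres == c]
        # SEM = sample standard deviation / sqrt(n); undefined for a single point.
        sem = v.std(ddof=1) / np.sqrt(v.size) if v.size > 1 else float("nan")
        result[float(c)] = (float(v.mean()), float(sem), int(v.size))
    return result

# Synthetic O-1...C distances (Å) for six imaginary residues.
phi_obs = [-10.0, -5.0, 0.0, 8.0, 20.0, 30.0]
dist_obs = [2.66, 2.70, 2.68, 2.68, 2.71, 2.73]
bins = bin_mean_sem(phi_obs, dist_obs)
for centre, (mean, sem, n) in sorted(bins.items()):
    print(f"phi bin {centre:+.0f}: mean {mean:.3f} Å, SEM {sem:.3f}, n = {n}")
```

Because each value is an average over a finite ϕ window, any sharp extremum at the bin center is smoothed, which is the second source of underestimation noted in the text.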

### An analytical model for the transition is not matched by predictions from molecular mechanics

These observed ϕ dependencies of the backbone bond angles were modeled as a set of smooth conformation-dependent functions (Fig. 3B, green curves; table S3) that could be used to generate prototype models for the conformational transition. That these yield O−1…C distances (Fig. 3A, green line) matching reasonably well with the empirical averages supports the validity of these functions as capturing a realistic general model for how the ϕ ~ 0° transition is traversed. As noted above, the variations of the individual observations from the average behavior (such as in the examples shown in Fig. 2) are not primarily due to experimental uncertainty but are real variations reflecting the forces caused by the unique tertiary environments that stabilize the transit conformations.
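A prototype calculation of this kind can be sketched as below. It is a minimal illustration under stated assumptions, not the table S3 model: the standard geometry values are approximate, the amplitudes are the rough expansions quoted above (about 2°, 6°, and 4°), the cosine form is the one given in Materials and Methods, and all names are hypothetical. Atoms are placed from internal coordinates, and the O−1…C distance is compared with and without the ϕ-dependent angles for −60° ≤ ϕ ≤ +60°.

```python
import numpy as np

# Assumed approximate standard geometry (Å, degrees) and rough expansions at phi = 0.
BONDS = {"C-1=O-1": 1.231, "C-1-N": 1.329, "N-CA": 1.458, "CA-C": 1.525}
STD = {"O-1-C-1-N": 122.7, "C-1-N-CA": 121.7, "N-CA-C": 111.0}
EXPANSION = {"O-1-C-1-N": 2.0, "C-1-N-CA": 6.0, "N-CA-C": 4.0}

def angle_at(name, phi_deg, distorted=True):
    """Bond angle at phi: standard value plus amplitude * cos(phi * pi / 120)."""
    extra = EXPANSION[name] * np.cos(phi_deg * np.pi / 120.0) if distorted else 0.0
    return STD[name] + extra

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Place d with |cd| = bond, angle(b, c, d) = angle_deg, torsion(a, b, c, d) = torsion_deg."""
    angle, torsion = np.radians(angle_deg), np.radians(torsion_deg)
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    return (c - bond * np.cos(angle) * bc
            + bond * np.sin(angle) * np.cos(torsion) * m
            + bond * np.sin(angle) * np.sin(torsion) * n)

def o_c_distance(phi_deg, distorted=True):
    """O-1...C distance for a trans peptide (torsion O-1, C-1, N, CA = 0) at the given phi."""
    ang = {k: angle_at(k, phi_deg, distorted) for k in STD}
    c_prev = np.zeros(3)
    n_atom = np.array([BONDS["C-1-N"], 0.0, 0.0])
    t = np.radians(ang["O-1-C-1-N"])
    o_prev = BONDS["C-1=O-1"] * np.array([np.cos(t), np.sin(t), 0.0])
    ca = place_atom(o_prev, c_prev, n_atom, BONDS["N-CA"], ang["C-1-N-CA"], 0.0)
    c = place_atom(c_prev, n_atom, ca, BONDS["CA-C"], ang["N-CA-C"], phi_deg)
    return float(np.linalg.norm(c - o_prev))

for phi in (0, 15, 30, 45, 60):
    print(phi, round(o_c_distance(phi, False), 2), round(o_c_distance(phi, True), 2))
```

Under these assumptions the modeled distance at ϕ = 0° rises from about 2.32 Å with fixed standard angles to roughly 2.7 Å with the expanded angles, in line with the plateau described for Fig. 3A.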

To assess how accurately a state-of-the-art molecular mechanics force field handles these high-energy transition conformations, we used AMBER and the FF99SB force field [recently demonstrated (23) to perform best in a protein modeling test] to minimize conformational energy while restraining ϕ and ψ to the values along the upper narrow transit path. The energy-minimized O−1…C separation distances (Fig. 3A, orange line) and backbone bond angles (fig. S5) showed qualitative similarity to the empirical variations but were not in good quantitative agreement: the limiting O−1…C approach was ~0.15 Å too high and four bond angles had notable systematic displacements from the empirical values, with the largest difference of ~4° occurring for ∠C−1-N-Cα (fig. S5). These discrepancies imply that the empirical conformational details defined here for the ϕ ~ 0° high-energy conformations provide a real advance in our understanding of this transition, and represent a rare resource for enhancing force field parameterizations.

## DISCUSSION

The observation of these conformations and their conformation-dependent bond angles represents a remarkably detailed experimental characterization of two important conformational transition states that had not been thought to be accessible to direct observation. Importantly, the residues adopting these conformations are not transition-state analogs, artificially held in place by a covalent modification that might alter the pathway; rather, they are authentic residues that are free to transition through the barrier, yet are stabilized partway through by noncovalent interactions with their environment. The fact that the ϕ,ψ angles of the observed transition residues match so well with the low-energy pathway calculated for an isolated dipeptide (Fig. 1D) supports the conclusion that neither the specific protein environments nor the cryogenic temperatures at which most of the structures were determined have changed the nature of the pathway.

In one sense, these images contribute to our understanding of how this transition occurs in the same way that Muybridge’s striking “series of instantaneous photographs” of horses provided information previously considered unobservable and showed “with absolute accuracy the motions of horses when walking, trotting, and running” (24). These proved that all four legs of the horse are off the ground roughly half of the time even during a trot. Similarly, the observations presented here provide indisputable evidence that proteins can truly adopt these unfavorable ϕ ~ 0 conformations and, on the basis of direct observation, can reveal, in high-resolution detail, the nature of the bond angle deformations that are involved. Just as Muybridge’s photographs strung together could provide an observation-based movie of a horse in motion, our empirically derived analytical functions allow us to generate such a movie of a peptide traversing the mountain pass (movie S1).

In contrast, although molecular simulations are powerful, if simulations were the only source of information, many uncertainties would remain. One illustration of this is the discrepancies between the approach distances and distortions observed here and those predicted by the AMBER force field (Fig. 3A and fig. S5). Another illustration is a molecular dynamics study of the conformational switch in the SH2 domain for which a residue goes from the αL basin (ϕ,ψ ~ +60°, +60°) to the β basin (ϕ,ψ ~ −60°, +120°). Acknowledging they could not be certain which was the preferred path, and on the basis of a lower predicted energy in their molecular mechanics calculations, the authors proposed that the residue traversed the longer path through the high-energy pass near ϕ ~ +135° (11). Our results suggest that the shorter path through the mountain pass at ϕ,ψ ~ 0°, +90° should be reconsidered as an a priori more likely path.

In terms of the larger picture of protein folding and function, these analyses bring new clarity on how this fundamentally important transition occurs and the level of distortions that peptides are subject to. As such, they provide a foundation for future investigations of the important but little studied area of high-energy barrier crossings and open the door for a richer understanding of folding routes and conformational transitions. On a practical level, this work also provides conformation-dependent restraints, similar to those previously developed for well-populated areas of the Ramachandran plot (18, 22) that can both guide force field development and enhance the accuracy that can be achieved in experimental [for example, (25)] and predictive [for example, (26, 27)] modeling of proteins having residues adopting these rare but important conformations in the ϕ ~ 0° transition region. Finally, this study holds the promise that other high-energy transition conformations can be similarly characterized as the size of the PDB increases and more such observations accumulate.

This work also provides some insight into the thermodynamics of native proteins. It is well known that naturally occurring proteins are not optimized for stability, and this has recently been dramatically illustrated by the creation of a set of designed proteins adopting five different folds and having melting temperatures above 95°C (28) and of a similarly designed set of superstable helical bundles, one of which had a stability of ~60 kcal/mol and a melting temperature above 135°C (29). In the latter report, it was concluded that “low-energy structures must have unstrained backbone conformations…” (29), but this is not the case given that one of the proteins in our sample was a highly stable designed protein (fig. S3A). That example and the other example noted above in which the high-energy conformation was apparently stabilized by the folding of just a small segment of the protein (fig. S3B) emphasize two things: first, the potential stability achievable by a folded protein is so high that even highly stable proteins may still contain many suboptimal and even some highly unfavorable interactions, and second, suboptimal interactions (that is, “frustration”) present in native proteins need not only be present in the form of many slightly unfavorable interactions but can also include individual interactions that are even as high as 5 to 7 kcal/mol destabilizing.

## MATERIALS AND METHODS

### Protein Geometry Database searches

The data set plotted as small dots in Fig. 1D was created using the Protein Geometry Database (PGD) (30), and it includes 616,212 non-glycine residues. Each of these residues is at the center of a three-residue segment that has backbone, average side-chain, and γ-atom B-factors ≤ 25 Å² and is present in a protein crystal structure refined to Rwork/Rfree ≤ 0.2/0.25 at a resolution of 1.5 Å or better and from a protein having ≤90% sequence identity to any other structure in the set. To obtain amino acid frequencies that are representative of diverse sets of proteins (table S2), another smaller data set was generated using a ≤25% sequence identity cutoff.
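The selection criteria can be summarized as a single predicate. The sketch below is only an illustration of the thresholds stated above: the record type and field names are hypothetical and do not correspond to the PGD interface or schema, and the sequence-identity culling, which requires comparing structures with one another, is not covered.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical record for one three-residue segment (not a PGD schema)."""
    central_residue: str   # three-letter code of the middle residue
    resolution: float      # Å
    r_work: float
    r_free: float
    backbone_b: float      # B-factors in Å^2
    sidechain_b: float
    gamma_b: float

def passes_filters(seg: Segment) -> bool:
    """Apply the stated quality thresholds to one segment."""
    return (
        seg.central_residue != "GLY"
        and seg.resolution <= 1.5
        and seg.r_work <= 0.20
        and seg.r_free <= 0.25
        and max(seg.backbone_b, seg.sidechain_b, seg.gamma_b) <= 25.0
    )

good = Segment("ALA", 1.2, 0.15, 0.18, 12.0, 15.0, 18.0)
low_res = Segment("ALA", 1.6, 0.15, 0.18, 12.0, 15.0, 18.0)
glycine = Segment("GLY", 1.2, 0.15, 0.18, 12.0, 15.0, 18.0)
print([passes_filters(s) for s in (good, low_res, glycine)])
```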

### Manual curating of the observations in the high-energy passes

On the basis of the above search, all observations having their ϕ torsion angle in one of the high-energy pass regions, either −35° < ϕ < +35° or 110° < ϕ < 160°, were manually curated as to the reliability of their conformation on the basis of a visual assessment of the fit to their electron density map. Using conservative criteria, each residue was designated as either reliable (shown as large black dots in Fig. 1D) or unreliable (shown as triangles in Fig. 1D). Residues designated as reliable had to have a strong and well-defined electron density that is not highly anisotropic and a model that was well fit in that density. Observations that led residues to be deemed unreliable also included the presence of alternate conformations or a close association with uninterpreted density that might indicate alternate conformations. These criteria erred on the side of possibly excluding residues that may have been accurately modeled, rather than including any residues that might not be accurately modeled.

### Generation of modeled peptide structures

All peptides were generated using the PeptideBuilder Python program and library (31), which was slightly modified to be able to handle ϕ-dependent equations instead of single-value standard geometries.

### Calculations of the protein geometries

The set of curated observed residues in the −35° < ϕ < +35° range output by the PGD was used as input for a custom script written in R, which made use of the Bio3D (32) package to read PDB files and then calculate specific geometric details for all residues of interest for each protein, excluding any residue not having at least two residues on both sides of it without a chain break. The quantities calculated included all of the relevant backbone torsion angles and bond angles and the O−1…C distances. Bond lengths were not analyzed because it has been shown that their variations are too small to be reliably determined in crystal structures at these resolutions, and also because the conformation-dependent variations are too small to substantially affect modeling accuracy (18, 25). Even those quantities available from the PGD search were recalculated so that the information could also be obtained for the noncrystallographic symmetry (ncs) mates of the PGD hits (which are not present in the PGD). This allowed the 100 unique and curated residues with ϕ in the −35° to +35° range to be expanded by the addition of 46 ncs chains (which were also manually curated and deemed as reliable), making for 146 total observations.
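For readers who want to reproduce such quantities, the core calculation is a torsion angle from four atomic positions. The sketch below is a generic NumPy version, not the R/Bio3D script used here; the function name and the test coordinates are illustrative. For ϕ the four atoms would be C−1, N, Cα, and C, and the O−1…C distance is simply the norm of the difference between the two position vectors.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points, IUPAC sign convention."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n1 = np.cross(b1, b2)            # normal to the plane of the first three atoms
    n2 = np.cross(b2, b3)            # normal to the plane of the last three atoms
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))

# Simple geometric checks with made-up coordinates.
a, b, c = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = dihedral_deg(a, b, c, (1.0, 0.0, 1.0))
plus90 = dihedral_deg(a, b, c, (0.0, 1.0, 1.0))
minus90 = dihedral_deg(a, b, c, (0.0, -1.0, 1.0))
trans = dihedral_deg(a, b, c, (-1.0, 0.0, 1.0))
print(cis, plus90, minus90, abs(trans))
```

Using `arctan2` rather than `arccos` keeps the sign of the angle and avoids loss of precision near 0° and 180°, which matters for residues sitting close to ϕ = 0°.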

### Statistical analyses and least-squares modeling of the data

All averages and standard errors of the mean (SEM) were calculated using conventional formulae written in R. Best-fit lines in Fig. 1D were generated using principal components analysis to fit an orthogonal linear regression, due to experimental uncertainty in both x and y values. The function prcomp() in R was used, where

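    x = phi
    y = psi
    # first principal component of the centered (phi, psi) data
    r = prcomp(~x + y)
    slope = r$rotation[2,1]/r$rotation[1,1]
    # the fitted line passes through the centroid of the data
    intercept = r$center[2] - slope * r$center[1]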
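The same orthogonal (total least-squares) line fit can be sketched in Python with NumPy, for readers not working in R. This is an assumed equivalent rather than the code used for Fig. 1D: the function name is hypothetical, and the data are synthetic points placed exactly on a line so that the result is easy to check.

```python
import numpy as np

def orthogonal_fit(x, y):
    """Slope and intercept of the first principal axis through centred (x, y) data."""
    pts = np.column_stack([x, y]).astype(float)
    centre = pts.mean(axis=0)
    # Rows of vt are the principal directions; vt[0] has the largest variance.
    _, _, vt = np.linalg.svd(pts - centre, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx                      # undefined for a perfectly vertical line
    intercept = centre[1] - slope * centre[0]
    return float(slope), float(intercept)

# Synthetic points lying exactly on psi = -1.5 * phi + 90.
phi_demo = np.array([-30.0, -10.0, 0.0, 10.0, 30.0])
psi_demo = -1.5 * phi_demo + 90.0
slope, intercept = orthogonal_fit(phi_demo, psi_demo)
print(slope, intercept)
```

Unlike ordinary least squares, this fit minimizes perpendicular distances to the line, so it treats ϕ and ψ symmetrically, which suits data with comparable uncertainty in both coordinates.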

Independent best-fit lines were calculated for each transit region separately, and there was negligible difference between the two (table S3 and Fig. 1D). The dependence of bond angles on ϕ was fit using the geom_smooth() function from the R package “ggplot2,” while specifying “formula = (y ~ I(cos(x * pi/120))),” where x is the central value in the ϕ bin and y is the mean value of the bond angle.
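The functional form in that formula is y = a + b·cos(πϕ/120) with ϕ in degrees, so that a is the value at ϕ = ±60° (where the cosine vanishes) and a + b is the value at ϕ = 0°. A minimal Python sketch of such a fit follows, assuming an ordinary linear least-squares fit of the two coefficients; the function name is hypothetical and the input is synthetic, generated from known coefficients so the recovery can be checked.

```python
import numpy as np

def fit_cosine(phi_centres, mean_angles):
    """Least-squares fit of y = a + b * cos(phi * pi / 120), phi in degrees."""
    phi = np.asarray(phi_centres, dtype=float)
    y = np.asarray(mean_angles, dtype=float)
    design = np.column_stack([np.ones_like(phi), np.cos(phi * np.pi / 120.0)])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(a), float(b)

# Synthetic bin centres and bin means built from a = 121.7, b = 6.0.
centres_demo = np.array([-50.0, -25.0, 0.0, 25.0, 50.0])
means_demo = 121.7 + 6.0 * np.cos(centres_demo * np.pi / 120.0)
a_fit, b_fit = fit_cosine(centres_demo, means_demo)
print(a_fit, b_fit)
```

Because the model is linear in a and b, no iterative optimizer or starting guess is needed.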

### AMBER minimizations

AMBER calculations were performed using AMBER12 (33) and the FF99SB force field. Peptides were capped with N-terminal acetyl and C-terminal N-methyl amide groups. SANDER minimizations were done every 1° in ϕ over the range of −60° to +60° with the ϕ and ψ torsion angles restrained using “NMR restraints” of 300.0 kcal/mol*rad and the ψ target value set according to the best-fit line given in table S3 for the ψ > 0 pass. Minimizations were carried out two ways, once starting from standard backbone bond angles and another time starting from the ϕ-dependent backbone bond angles as defined in the equations in table S3. For all calculations, the dielectric constant was set to 80 and minimizations were run for 2000 cycles. The results based on both starting points were equivalent; hence, only one is shown in Fig. 3A and fig. S5.

## SUPPLEMENTARY MATERIALS

Fig. S1. Electron density evidence for a reliable residue adopting a conformation in the +110° < ϕ < +160° range.

Fig. S2. ϕ,ψ angles describing the local conformational context of the mountain pass residues.

Fig. S3. Four diverse examples showing the contexts of residues adopting a ϕ ~ 0° conformation.

Fig. S4. How the average bond angle variations obtained by treating the ψ ≤ 0° and ψ ≥ 0° transitions separately compare with each other and with those based on the combined data.

Fig. S5. AMBER minimizations of alanine dipeptides distort bond angles to alleviate the O−1 … C steric clash in ϕ ~ 0 conformations.

Table S1. Complete list of analyzed ϕ ~ 0 mountain pass residues.

Table S2. Frequency of amino acid types in the mountain pass transition region.

Table S3. Equations governing ϕ-dependent changes in geometry during transition through the mountain pass.

Table S4. Further details of data plotted in Fig. 3 including the ranges for and numbers of observations in each ϕ bin and the average distances and angles.

Movie S1. An alanine dipeptide animation generated according to the “general” model of the ψ ~ +90° conformational transition described in this paper.

References (34, 35)

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

Acknowledgments: We thank O. Guvench for providing the data for the energy contours in Fig. 1D. Funding: This work was supported by NIH grant R01-GM083136 (to P.A.K.). Author contributions: The project was conceived by P.A.K.; experiments, analyses, and figure preparation were carried out by A.E.B.; and writing was done by A.E.B. and P.A.K. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data are available in the PDB, with specific details reported in table S1.