Research ArticleCORONAVIRUS

The SARS-CoV-2 Spike variant D614G favors an open conformational state

See allHide authors and affiliations

Science Advances  16 Apr 2021:
Vol. 7, no. 16, eabf3671
DOI: 10.1126/sciadv.abf3671

Abstract

The COVID-19 (coronavirus disease 2019) pandemic underwent a rapid transition with the emergence of a dominant viral variant (from the “D-form” to the “G-form”) that carried an amino acid substitution D614G in its “Spike” protein. The G-form is more infectious in vitro and is associated with increased viral loads in the upper airways. To gain insight into the molecular-level underpinnings of these characteristics, we used microsecond all-atom simulations. We show that changes in the protein energetics favor a higher population of infection-capable states in the G-form through release of asymmetry present in the D-form inter-protomer interactions. Thus, the increased infectivity of the G-form is likely due to a higher rate of profitable binding encounters with the host receptor. It is also predicted to be more neutralization sensitive owing to enhanced exposure of the receptor binding domain, a key target region for neutralizing antibodies. These results are critical for vaccine design.

INTRODUCTION

The coronavirus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is responsible for the COVID-19 (coronavirus disease 2019) pandemic, a global emergency. The virus infects human cells through a process initiated by the binding of the Spike protein on the viral surface to its receptor on human cells, the angiotensin-converting 2 (ACE2) protein. The Spike protein comprises three identical protomers, noncovalently bonded protein subunits, which can adopt different conformations (Fig. 1) that can enable ACE2 interactions through the exposure of the receptor binding domain (or RBD). Once bound, the S1 subunit (Fig. 1A, red) of the protein detaches (“sheds”), and the S2 subunit (Fig. 1A, blue) triggers membrane fusion and mediates viral entry. In the “all-down” Spike conformation, all three RBDs of the protomers are packed together on the top, each one being in what is referred to as a “closed” conformation (Fig. 1A). Binding of the RBD to ACE2 is thought to require the transition of one of the protomers from a closed to a more accessible open conformation (Fig. 1B) (1, 2). This infection-capable configuration is referred to as the “one-up” Spike conformation. The “two-up” and “three-up” states, in which more than one protomer is open, are less stable and can be captured in the related SARS-CoV-1 Spike only by using stabilizing mutations in SARS-CoV-1 (3) or in the SARS-CoV-2 Spike by introducing non-native disulfide bonds (4).

Fig. 1 Structural representation of the Spike protein.

(A) The Spike complex is shown in the all-down conformation. Its S1 and S2 subunits are depicted in red and blue. (B) The Spike complex is shown in the one-up conformation. It is composed of three protomers, with the L-down protomer depicted in purple, the Up protomer in orange, and the R-down protomer in cyan. The white dashed circle indicates a D614G site. Here, only one of the three D614G sites is shown. (C) Definition of 12 domains and their residues used for analysis of simulations. It was necessary to define domains for every region for analysis, so some regions may have arbitrarily assigned names or the defined regions may not follow canonical sequence ranges. We display the domains highlighted in the Spike structure, shown from two different perspectives. Definitions of domain abbreviations: NTD, N-terminal domain; CT0, C-terminal domain 0; CT1, C-terminal domain 1; CT2, C-terminal domain 2; S2S2′, S1/S2 cleavage to S2′; FP, fusion peptide; FPR, fusion peptide region; HR1, heptad repeat 1; CH, center helix; CD, connector domain; and CD1, connector domain 1. Images in (A) and (B) were created with PyMOL (51); images in (C) were prepared using VMD (45).

A viral variant has recently emerged, carrying a single amino acid substitution in Spike at residue 614 from an aspartic acid (D) to a glycine (G) (D614G). This G-form (also referred to as G614) is now the globally prevalent form and is potentially more transmissible than the originally observed D-form (also referred to as D614) (5). It is associated with increased viral nucleic acid in the upper respiratory tract (5, 6), which potentially indicates more viruses attaching to ACE2 receptors and has higher infectivity in pseudotype virus assays in multiple cell types (5, 7). In addition, natural viruses that carry D614G are more infectious in primary human cells and are more infectious and outcompete the ancestral D614 form in animal models (8). Although there have been several studies validating the increase on the infectivity and transmissibility of D614G substitution in laboratory settings, the transition from the ancestral to the D614G form at the population level is still being investigated (6). To identify the molecular factors that determine these experimentally observed differences, we performed 20-μs-long all-atom molecular dynamics simulations of the full Spike protein in explicit solvent.

This extensive atomistic study of the differences between the globally most dominant G-form of the COVID-19 pandemic and the original D-form is of immediate relevance for ongoing vaccine and antibody-based therapeutic studies. We generated four separate sets of simulations, each with five replicas, to study both the D614 and G614 forms, in both the all-down and one-up states. Using these simulations, we compared interactions formed between 12 defined regions of the Spike protein (Fig. 1C). To identify the factors that lead to distinct dynamics for each of the four models, we compared residue-residue interactions and correlations, hydrogen bonding, and exposure of ACE-2 binding and RBD antibody epitope sites. We found that the D614G substitution was directly associated with far-reaching alterations in interactions between different protomers and indirectly associated with changes in binding site exposure. Specifically, in the G614 form, there is a rearrangement in residue-residue contacts and hydrogen bonding in comparison to the D614 form that can lead to an increase in the population of the infection-capable one-up state. A careful analysis of RBD and receptor binding motif (RBM) exposure showed that both the ACE2 binding site and critical sites for the binding of neutralizing antibodies (epitopes) in the RBD are more accessible in the one-up state (1, 9, 10).

RESULTS

Individual protomers of the Spike undergo distinct fluctuations (fig. S1). To distinguish between protomers when discussing the results, we refer to the protomer that transitions between the up and down orientations as the “Up” protomer. The other two protomers that are in the down configurations are referred to as “L-down” and “R-down,” depending on whether the protomer is on the left or right of the Up protomer, respectively (Fig. 1B).

Protein contacts are more symmetric between protomers in the G614 one-up state

To elucidate how specific interactions are altered by the D614G substitution, we calculated residue-residue contacts (11) that are formed between subunits of all three protomers. Specifically, these contacts include (i) S1-S2 interactions within a protomer (i.e., intra-protomer contacts) and (ii) S1-S1, S2-S2, and S1-S2 interactions where the subunits are from different protomers (i.e., inter-protomer contacts). Two residues were considered to be in contact if their minimum distance was within a cutoff of 6 Å. Contacts that occurred with probabilities of at least 50% were defined as “persistent” contacts.

The average number of persistent intra-protomer contacts between S1 and S2 was not affected by the D614G substitution in either the open or closed Spike conformation (Fig. 2, A and B, first three data points). Similarly, regardless of the conformational state, the inter-protomer interactions between S1-S1 and between S2-S2 subunits were largely unaffected by the substitution (Fig. 3, A and B). In contrast, the inter-protomer S1-S2 interactions are strongly affected by the D614G substitution in the open conformation (Fig. 2, A and B, last three data points). Specifically, in the all-down case, both the G- and D-forms lead to persistent numbers of contacts that are similar (within error bars) across all three protomer interfaces. This inter-protomer contact signature of the all-down Spike protein can be described as “symmetric.” In contrast, the D614 form loses the symmetrical contact signature when the Spike protein adopts the one-up conformation (Fig. 2B and Table 1), while the one-up G-form complex maintains the symmetry in the number of persistent contacts between the three protomers.

Fig. 2 D614G substitution alters inter-protomer interactions between S1 and S2 subunits.

(A and B) Total number of contacts formed at S1-S2 interfaces (A) in the all-down system and (B) in the one-up system. Appropriate labeling and dashed lines indicate intra-protomer and inter-protomer S1-S2 interactions. For each set of simulations, error bars were calculated as standard error across five replicas. In the x axis of all panels, “L” denotes the L-down protomer, “U” represents the Up protomer, and “R” denotes the R-down protomer. For instance, LS1-U2 represents the interactions between the S1 region of the L-down protomer and the S2 region of the Up protomer.

Fig. 3 S1-S1 and S2-S2 contacts.

Average total number of contacts at the S1-S1 and S2-S2 interfaces in (A) the all-down system and (B) the one-up system. For each set of simulations, error bars were calculated as standard error across five replicas. In the x axis of all panels, “L” denotes the L-down protomer, “U” represents the Up protomer, and “R” denotes the R-down protomer. For instance, LS1-US1 represents the interactions between the S1 region of the L-down protomer and the S1 region of the Up protomer.

Table 1 Average number of inter-protomer contacts between S1 and S2 regions.

View this table:

G-form RBD dynamics influenced by symmetric inter-protomer S1-S2 interactions

To further explore the factors that lead to the observed contact asymmetry, we calculated the cross-correlation matrix C of the Cα atoms for the entire complex (Fig. 4 and figs. S2 and S3). For an equilibrium system, the cross-correlation matrix is a measure of the extent to which different parts of the protein move in a coordinated fashion in the same direction (12) or in a coordinated fashion in opposing directions (blue). This interpretation holds best for small fluctuations, such as our equilibrium simulations, which obey the quasi-harmonic approximation. The greatest differences between the all-down state and the one-up state lie in the RBD-RBD and N-terminal domain (NTD)–RBD correlations (fig. S2). Consistent with the contact analysis above, we observe greater symmetry in the G614 form one-up state, relative to the D614 form one-up state. The magnitude of the correlations between and within all three RBDs are highly similar for the G614 form (Fig. 4A), but the intra-domain correlations and the U-RBD/R-RBD (anti)–correlations are stronger than the other inter-domain correlations in the D614 form (Fig. 4B, four squares in the lower right). There is also a slight asymmetry in the inter-domain correlations of the G-form all-down system (Fig. 5A in the off-diagonal elements shows slight segregation of blue and red compared with Fig. 5B in the off-diagonal elements). In other words, there are slight anticorrelations of the U protomer motions with respect to the other two in the G-form, compared to slight correlations between the R and L in the G-form, whereas there is almost no such correlation in the D-form. This observation is consistent with a subtle deviation from symmetry observed for the all-down G-form (Fig. 2A and Table 1). Overall, atomistic correlations indicate that in the one-up state, the motions of the RBDs in the G614 form are more synchronized than those in the D614 form. Such large positional correlations distal to the 614 substitution site are a signature of allostery (13, 14).

Fig. 4 Residue-residue cross-correlation matrices and hydrogen bonding.

(A) G614 form one-up. (B) D614 form one-up. In the x and y axes, L-RBD, U-RBD, and R-RBD denote RBD regions of the L-down, Up, and R-down protomers, respectively. (C) Hydrogen bond occupancy for critical residue pairs located between protomers. Occupancies were calculated for the Asp614/Gly614-Thr859 pair and the Gln613-Thr859 pair. For each system, error bars were calculated as standard error over five replicas. In all subpanels, “L” represents the L-down protomer, “U” denotes the Up protomer, and “R” represents the R-down protomer. For example, “LU” denotes the bonding between L and U protomers. In (A) and (B), red denotes positive and blue denotes negative cross-correlations. The color scale shows the range of correlations that increases from no correlation (0.00, white) to perfect positive correlation (dark blue, 1.00) or perfect negative correlation (dark red, −1.00). The higher the intensity of the pixels, the stronger the magnitude of correlations.

Fig. 5 Residue-residue cross-correlation matrices.

(A) G-form all-down. (B) D-form all-down. In the x and y axes, L-RBD, U-RBD, and R-RBD denote RBD regions of the L-down, Up, and R-down protomers, respectively. Red denotes positive and blue denotes negative cross-correlations. The color scale shows the range of correlations that increases from no correlation (0.00, white) to perfect positive correlation (dark blue, 1.00) or perfect negative correlation (dark red, −1.00). The higher the intensity of the pixels, the stronger the magnitude of correlations. The diagonal blocks (top left to bottom right) denote high intra-domain correlation and therefore have intense red pixelation. Inter-domain off-diagonal blocks have relatively stronger (though slightly asymmetric) correlations in G-form down (left) as compared to D-form down (right).

Contacts in C-terminal domains 1 and 2 (528-685) and the fusion peptide and fusion peptide region (816-911) domains are major contributors to the symmetrization in the G614 form

To capture the specific regions where the inter-protomer S1-S2 contacts are most affected by the D614G substitution, we carried out a global differential contact analysis, in which we identified persistent contacts that existed in one form but not the other in the all-down or one-up states. Although the overall number of contacts does not change for most of the interfaces, specific contacts do, with different contacts being gained at different interfaces (Fig. 6, purple asterisks, and fig. S4). Furthermore, the equalization of inter-protomer S1-S2 contacts in the one-up G614 form arises largely from (i) the overall loss of contacts between the C-terminal 1 (CT1) domain of the L-down protomer and the fusion peptide domain of the Up protomer and (ii) the overall gain of contacts between the CT1 and the CT2 domains of the Up protomer and the fusion peptide domain of the R-down protomer (fig. S5). The mutation of residues in the CT1, fusion peptide, and fusion peptide region domains to more hydrophobic residues was experimentally linked to an increase of protomers adopting the up conformation (4). This provides experimental support for the importance of these regions in determining the conformational state of the Spike.

Fig. 6 Domain-domain contact gain and loss for LS1-US2 and US1-RS2 interfaces.

(A and C) Average number of contacts gained between forms by each system in terms of domain-domain interactions. Each bar represents the average number of contacts gained by the system in question between the two domains listed on the x axis. To calculate this, we sum up the number of newly-formed contacts in each form in comparison to the other. Purple asterisks represent the three interfaces with the greatest differences. (B and D) Image of the two interfaces in question, with all three protomers shown as different colored surfaces, the domains involved in gain or loss shown as transparent surfaces with cartoon representations beneath, and the specific residues involved in the highest-frequency changing contact shown as spheres. Images rendered with PyMOL (51).

We next studied the hydrogen bonding interactions of the speculated critical residues D/G 614 with T859 (5). The occupancy of the D614─T859 hydrogen bond increases for the one-up versus all-down state in the D614 form (Fig. 4C, last two panels), but because the hydrogen bond no longer exists in the G614 form (Fig. 4C, first two panels), this bias likewise does not exist. This is another type of “symmetrization.” In addition, there is a concomitant increase in the flexibility of residues 610 to 650 in the G614 form (fig. S5). In general (fig. S4), we do not observe compensatory hydrogen bond formation with Thr859. Thus, it is conceivable that this hydrogen bond reduction leads to a conformational relaxation.

Symmetrization of inter-protomer interactions in the G614 form can lead to higher population of infection-capable (one-up) conformation

The distinct contact signatures associated with the D614G substitution may be associated with alterations in the relative population of the all-down and one-up ensembles of the Spike protein (Fig. 2). For the G-form, the symmetry in the number of contacts among the inter-protomer interfaces suggests that the energetics between the protomers are similar (15, 16). The connection between contacts and energetics is further supported by Molecular Mechanics Poisson Boltzmann Surface Area (MM-PBSA) calculations (see Materials and Methods), which demonstrate that inter-protomer S1-S2 interaction energy follows a similar pattern, in which the interaction energies as approximated are asymmetric in the one-up D-form, but are driven toward symmetric in the one-up G-form (fig. S6). Since the G-form preserves these interactions in both the all-down and one-up conformations, the interaction energy between up- and down-protomers would be similar to the energetics between two down-protomers. Inspired by this rationale, we use an Ising model, an established and straightforward method (17) that has been applied to many different applications in physics, to predict how the D614G substitution can affect the relative stability of the open and closed Spike conformations. In this model, each protomer can adopt an “up” or “down” state, and each protomer interacts with its two neighbors with an energy that depends only on the states of both protomers. Defining the Spike as a periodic three-spin system, the Ising model suggests that symmetrization of the energetics of the G-form will lead to an ensemble ratio of 75:25 of the one-up state to the all-down state, as compared to the 50:50 population ratio for the original D-form (see the Supplementary Materials for details). This demonstrates how a single residue substitution can facilitate the adoption of the open Spike configuration, thereby enhancing viral infectivity.

In summary, we find three contributions to symmetrizations associated with the D614G substitution: (i) symmetrization in the inter-protomer contacts, (ii) symmetrization in the correlations between the RBDs, and (iii) symmetrization of a specific inter-protomer hydrogen bond. We hypothesize that these three symmetrizations lead to a decrease in the energy differences between the all-down and one-up states, leading to an increase in infectivity through an increase in the one-up population of the G-form.

One-up state enhances exposure of RBD to ACE2 binding and epitope sites

The Spike trimer is a glycoprotein, meaning it is covered in short flexible sugars called glycans. We modeled the impact of those highly flexible glycan ensembles on the exposure of the protein surface, particularly focusing on the glycan shielding of RBD epitope regions. To do this, we use a quantitative measure called the glycan encounter factor (GEF) (18, 19). Given that glycans are highly dynamic, they transition between covering and revealing proximal protein surfaces. A low GEF value indicates a high exposure of the protein surface in a given region. We perform a GEF analysis here to assess changes in exposure between all-down and one-up states and between the D- and G-forms. Independent of D- or G-forms, the ACE2 binding site and epitope regions are more exposed in the one-up state than in the all-down state (Fig. 7, A and B) (for further details on local changes, see Fig. 8 and the Supplementary Materials). Specifically, the exposure increases by ~40% at the RBM (residues 438 to 506). This is accompanied by larger exposure of both the ACE2 binding site and RBD targeting antibody epitopes. For example, there is a ~64% increase in exposure of the C105 epitope region (9) when the RBD adopts the open state, as opposed to the closed orientation where it is buried by surrounding protein and glycans (Fig. 7, C and D). Thus, in the one-up state, which is favored by the G614 form, key epitopes of the RBD become more accessible compared to the D614 form.

Fig. 7 RBD is more exposed in the one-up conformation.

(A) Surface representation of the all-down Spike complex. Colors ranging between red, white, and blue represent values for the GEF. GEF describes relative exposure and coverage of protein surfaces. (B) Surface representation of the one-up Spike. Colors ranging between cyan, white, and magenta indicate GEF differences between the closed and open structures. The RBD in the up orientation is indicated by the dotted circle, where the originally buried (magenta) region around 458 to 478 is now exposed. (C and D) Top view of the Spike protein apex in (C) all-down and (D) one-up conformations. Protein surface is shown in gray. Glycan ensembles covering the protein surface are represented by point densities. Fucosylated 2 and 3 antennae complex (FA2 and FA3) glycans and oligomannose (OM) glycans are depicted in dark red, orange, and green. Binding site region, which includes RBM (residues 438 to 506) and C105 antibody binding residues (i.e., 403, 405, 406, 408, 409, 415 to 417, 420, 421, 449, 453, 455 to 460, 473 to 477, 486, 487, 489, 493 to 496, 498, and 500 to 505), is colored yellow. These binding site residues are more exposed in the one-up conformation, compared to the all-down case. (E) Representative configuration describing S309 Fab binding to an up-RBD in G-form that shows no rotation toward NTD of the neighboring protomer on the right (~0° rolling angle) (top view of the Spike is shown). This structure shows that the Fab domain does not sterically clash with the Spike protein. (F) Representative configuration showing S309 Fab binding to an up-RBD in D-form where the RBD is rotated toward the NTD of the neighboring protomer on the right [rolling angle is <−10°; same view as in (E) is shown]. The Fab domain starts to sterically clash with the Spike protein NTD of the neighboring protomer. See Supplementary Materials and Methods and fig. S7 for details on RBD rotations.

Fig. 8 Potential immunological effects of SARS-CoV-2 Spike protein glycosylation.

(A) Residue-wise GEF of the RBD domain in D-form, down conformation. Buried residues are colored blue. Exposure increases as the color changes toward white and red. (B and C) Glycan ensembles over Spike ectodomain, D-form. Spike protein surface given in gray. The surface glycans are highly dynamic. Glycans overlaid from 250 different conformations are shown as points; this gives a “cloud”-like rendition of the area most heavily occupied where the points are dense, and space that glycans sample less frequently. Fucosylated 2 and 3 antennae complex (FA2 and FA3) glycans, oligomannose (OM) glycans, and hybrid (Hyb) glycans are colored according to key. (B) All-down and (C) one-up conformations shown. Glycans directly affecting RBD and NTD coverage change due to RBD opening are marked. Removal of glycan 165 has experimentally been shown to increase the population of protomers in the single-protomer up state. Removal of glycan 234 has experimentally been shown to decrease the population of protomers in the single-protomer up state (4). FP residue range 816 to 855 shown in yellow. (D) Effect of D614G substitution on FP coverage. N-glycans surrounding FP are numbered. Glycans at 282, 603, and 801 from the same protomer and glycan 616 from the neighboring protomer contribute to partial shielding of FP. On the left is the point cloud of glycans from 250 different conformations for the four different systems. The white dashed circle shows the areas of change between D-form and G-form. Numbers label all glycans that touch the FP domain. On the right is the change in GEF superimposed on the relevant sequence.

In addition to affecting the binding site exposure, allostery-induced changes in rotations or tilting of RBDs between the D- and G-forms leads to differential neutralization (20); the G-form is more sensitive to polyclonal vaccine and convalescent sera, as well as a subset of neutralizing antibodies. The most notable example of this was the monoclonal antibody S309, isolated from a recovered SARS-CoV–infected subject, that cross-neutralizes both SARS-CoV and SARS-CoV-2 (21). Notably, S309 displays a notable increase in sensitivity for the G-form of the SARS-CoV-2 Spike [median inhibitory concentration (IC50) = 0.073 μg/ml] as compared to the D-form (IC50 = 11.83 μg/ml) (20). Structural and binding studies showed that its epitope lies outside the RBM of the RBD and that S309 binds to both the one-open and all-down conformations. Biolayer interferometry confirmed the absence of competition between S309 and ACE2 binding (21), suggesting that the molecular mechanism of action was not direct interference with ACE2 binding. Our GEF analysis shows that there is no difference in S309 epitope exposure between the D- and G-forms. However, some differences arise between the up and down orientations of the RBD, as expected. Reorientation of glycans N331 and N343 leads to altered exposure over S309 epitope residues 333 to 335 and 343 to 359 in the one-up conformation as compared to the all-down state (fig. S7A).

We further compared the orthogonal rotational space sampled by the RBD between the D- and G-forms (fig. S7, B to F). To describe RBD motions, we defined two axes of rotation that are roughly orthogonal to each other, “roll” and “pitch” (see Materials and Methods and fig. S7B). Although this space remains similar in the all-down state of the D- and G-forms (fig. S7C), it exhibited differences in the one-up state between the forms (fig. S7D). Using the known structure of S309 bound RBD as a reference, mapping various RBD configurations, we find that when the RBD has a negative rotation (clockwise) more than 10° toward the NTD of the neighboring protomer on the right (−10° and beyond in Fig. 7, E and F, and fig. S7, C and D), it could hinder the binding of S309 due to steric clashes from the neighboring NTD. Considering that the MD simulation ensemble sampled higher populations in this sterically hindered region for the D-form (~38%) compared to the G-form (15%) (fig. S7, E and F), it suggests that S309 experiences less hindrance when binding to the G-form compared to the D-form in the one-up conformation.

Last, the paratope of S309 is aligned almost parallel to the Spike longitudinal axis when the RBD of one protomer is in the up conformation, making it more feasible for the two arms of the antibody to cross-link with two different Spikes on the same viral surface in tandem (Supplementary Materials and fig. S8). Thus, the increase in the population of one-up states that we propose in the G-form would additionally favor mechanisms of neutralization such as cross-linking.

DISCUSSION

The D614G substitution occurs on the inter-protomer interface of the Spike protein mediating critical contacts. We have performed extensive all-atom MD simulations of the trimeric Spike protein and assessed the effects of this substitution on both all-down and one-up conformations. The fluctuations about the native contacts studied here provide a crucial foundation for how molecular-level interactions could facilitate the adoption of one state versus another (15). Although the longer time scales of the up and down conformational transitions do not allow us to directly probe the all-down to one-up dynamics, our findings are nonetheless of great significance.

Our simulations provide mechanistic details that indicate and can explain an increased occupancy of the one-up state in the G614 form, which, in turn, accounts for both its enhanced infectivity (5, 7) and neutralization sensitivity (20) and provides predictions for experiment, some of which have already been verified (8, 22). We found that the different interactions between protomers in the D614 form are different in the one-up and all-down states. By contrast, in the G614 form, there is a symmetrization in the number of inter-protomer contacts between S1 and S2 subunits, which is associated with allosteric fluctuations in RBD, distal to the 614 site. We have identified the regions of greatest difference between the two forms and the regions of high-magnitude correlations as areas of focus for experimental efforts.

Thermodynamically, the effect of symmetrization can be captured using a periodic three-spin Ising model. Using the observed 50:50 ratio of all-down to one-up conformations in the D614 form (4), we predict that symmetrization of the G-form energetics in comparison to the D-form leads to a 75:25 ratio of one-up to all-down. We do note a small tendency toward a concomitant deviation from symmetry in the all-down state of the G614 form, which would further increase this ratio. In parallel, Weissman et al. (20) obtained structural evidence using negative stain electron microscopy that showed that the G614 form Spike populates the one-up state 82% of the time (versus 18% of the time all-down), while the D614 form adopts this open conformation with only 42% frequency (versus 52% of the time all-down), largely confirming our predictions (20, 23). Furthermore, they found that the G614 form is more sensitive to neutralization by a range of neutralizing agents: (i) sera raised from D614 form Spike and Spike RBD vaccinations in mice, primates, and humans, (ii) convalescent COVID-19 sera, and (iii) a subset of RBD monoclonal antibodies.

A recent cryo–electron microscopy (cryo-EM) study points to a “looser” Spike protein with a higher probability of taking on an open structure with a more flexible RBD (22). According to that study, the G-form demonstrates increased infectivity in pseudovirus assays and across multiple species ACE2 orthologs, does not have any alterations in the synthesis or processing of the Spike trimer, and, despite the increased infectivity, does not have stronger binding to the ACE2 receptor. Therefore, this paper supports the idea that the dominant factors in the increased infectivity are likely to be, as we argued, conformational changes and/or an increase in the population of open states in the G-form. They observe conformational changes in the S1 domain; however, they are not able to resolve the RBD. We also observe conformational changes in the S1 domain between D- and G-forms, which we are able to localize to a difference in the rotational states occupied by the RBD. As we show, such a rotational effect of RBD in G-form can explain the notable increase in sensitivity of antibody S309 for the G-form of SARS-CoV-2 Spike as compared to the D-form.

The increased occupancy of the one-up state in the G614 form predicted by our simulations is further supported by a recent study by Hou et al. (8). That study demonstrates that D614G substitution exhibits faster transmission phenotypes in vivo and shows increased replication in the nasal epithelium and large airway epithelium. They conclude that the infectivity, replication fitness, and early transmission are enhanced in the G-form and their data are consistent with a mechanistic model that accounts for both its enhanced infectivity and neutralization sensitivity as proposed here, which is the increased occupancy of the one-up state in the G614 form. The shift toward one-up Spike trimers would increase the likelihood of binding events of RBD and ACE2, thus explaining the experimentally observed increase in infectivity (5). Since many neutralizing antibodies target this RBD region (9), it would also explain the enhanced sensitivity of the D614G form to neutralization activity in vaccine and convalescent sera (8, 20).

Our dynamic model also indicates that there is a transition in the glycan shield when going from all-down to one-up; the glycan coverage disappears at the apex of the trimer when in the one-up conformation. Thus, regardless of the D- or G-form, the one-up conformation exposes the RBD for binding to the ACE2 receptor while simultaneously exposing more of the RBD protein surface for antibody binding to RBD epitopes. There are no significant global changes coming from the D614G substitution on the ACE2 binding site exposure in RBD once the Spike trimer is in the one-up conformation. Thus, given the natural preference for a more open Spike conformation in the G-form, it is possible that this form may have advantages as a vaccine antigen.

In summary, our findings suggest that the overall protein exposure remains globally similar between D- and G-forms, but predicts a marked increase in the one-up state with the G-form, meaning that both ACE2 binding and RBD-targeting antibody binding are likely to increase in the one-up state. Therefore, a change toward a higher one-up state population is likely the dominant effect of the D614G substitution. The mechanistic studies presented here, the structural data (20, 22), the experimentally determined increase in infectivity (5, 8), and the enhanced neutralizing antibody sensitivity (8, 20) all come together to form a consistent story.

MATERIALS AND METHODS

Structure preparation

Atomic structures derived from cryo-EM [Protein Data Bank (PDB) IDs: 6VXX and 6VYB; Walls et al. (10)] were used to prepare the all-down and one-up configurations for simulation. As described in (10), residues 986 to 987 were kept as Proline-Proline, and fusion peptide residues 682 to 685 were kept as SGAG instead of the viral form residues KV and RRAR, to maintain the stable soluble form of the Spike extracellular domain protein. In the original 6VXX and 6VYB models, several flexible regions were unresolved, which vary in lengths from 2 to 27 residues. We used a data-driven structure-based modeling approach to build a complete model that accurately captures secondary structures of these missing regions, without introducing artificial “knotting” of loops. For this, we used homology modeling using numerous SARS-CoV-1 structures as templates (see the Supplementary Materials). Missing residues in the RBD were built using an ACE2-bound SARS-CoV-2 structure (PDB: 6M0J) (24). Structure-based sequence alignment was performed using the 3D-Coffee program (12) and homology modeling was performed using the MODELLER 9.20 suite (25). We generated 10 models for each configuration and selected the top model based on DOPE and Procheck scores (26, 27). After finding out that cryo-EM fitting did not greatly improve the model using Chimera 1.13.0 (28) rigid fitting, and the MDFF (29) flexible fitting, we arbitrarily chose as starting structures the all-down model after 2 ns in vacuum with gscale = 0.3 and a further 2 ns in vacuum with gscale = 0.2 and the one-up model after 2 ns in vacuum with gscale = 0.3. To avoid artificial charges at the protein ends, we introduced N-terminal acetylated and C-terminal N-methylamide capping groups. All histidines were modeled as the neutral tautomer where the epsilon nitrogen is protonated. Fifteen disulfide bonds found in 6VXX and 6VYB were modeled in the force field for simulation.

MD simulation details

All-atom explicit-solvent simulations were performed with the Gromacs v5.1.2 and 2018.3 software packages (30, 31), using the CHARMM36m (32) force field and TIP3P water model (33). Each configuration was solvated and centered in a cubic box. The side length of the box was defined such that there is at least 15 Å padding around the molecule. Each system was neutralized with an excess of 150 mM KCl. Energy minimization was conducted using the steepest descent algorithm. Equilibration simulations were performed under constant number-volume-temperature (NVT, 2 ns) and constant number-pressure-temperature (NPT, 10 ns) ensembles. During the equilibration stages, harmonic position restraints were imposed on all heavy atoms of the molecule. Temperature coupling was achieved using Langevin dynamics at 310 K with a relaxation time of 1 ps (34). The Berendsen barostat with isotropic coupling was used to maintain a constant pressure of 1 bar, with a relaxation time of 4 ps and a compressibility of 4.5 × 10−5/bar (35). Covalent bonds were constrained by implementing the LINCS algorithm (36). van der Waals interactions were evaluated using a cutoff where forces smoothly switch to zero between 1.0 and 1.2 nm. Coulomb interactions were calculated using the particle mesh Ewald (PME) method, with a cutoff of 1.2 nm, a Fourier spacing of 0.12 nm, an interpolation order of 4, and a tolerance of 1 × 10−5 (37). Unrestrained production simulations were performed in the NPT ensemble, with an integration time step of 2 fs. For each configuration, we simulated five replicas. Each replica was run for 1.1 μs. The first 100 ns of each run was considered as further equilibration and was not included for analysis. In total, we generated 20 1-μs production simulations.

Contact analysis

We used the g_contacts plugin for Gromacs (11) to identify contact frequencies of all residue-residue contacts formed, both intra- and inter-protomer. A contact was defined as any heavy atom of one residue coming within 6 Å of the heavy atom of another residue. For the contact analysis, we defined “persistent contacts” as any of the identified contacts that appeared with a frequency of greater than or equal to 0.5, that is, that appeared in at least 50% of analyzed simulation frames. Average contacts with region S2 excluded the CD1 domain because of observations that it “frayed” in simulation and was extremely flexible, which we presume to be an artifact because of the lack of embedding of the Spike in the viral membrane.

Inter-protomer S1-S2 interaction energy

The inter-protomer S1-S2 interaction energy was approximated using the MM-PBSA method (3840). This end-point approach estimates binding energy from the combination of molecular mechanical electrostatic and van der Waals energies, and the polar and nonpolar solvation energy terms. While this method has been known to overestimate energies, it performs reasonably well for relative comparison of binding strength (41, 42). It must also be noted that conformational entropy was not included in the energy calculation. MM-PBSA calculation was implemented through the “g_mmpbsa” plugin (43) for GROMACS. Because the PB calculations are computationally very costly, a selection of 80 snapshots equally distributed over each of the five equilibrated trajectories (total 400 snapshots per ensemble) was used for these calculations to obtain robust sampling. Standard errors over the five trajectories were calculated and shown in fig. S6.

Correlation analysis

The cross-correlation matrix is defined asCij=Δri(t)·Δrj(t)Δri(t)·Δri(t)Δrj(t)·Δrj(t)(1)which is the normalized covariance where Δri(t) is the fluctuations of atom i with respect to its average coordinates. We used the -cov, -norm, and -dot flags of the program carma to perform the analysis (44).

Hydrogen bond analysis

We performed the hydrogen bond analysis with VMD 1.9.3 Hbonds plugin (45), where the definition of a hydrogen bond was the requirement of donor-acceptor atoms being polar, a distance cutoff of 4 Å, and an angle cutoff of 30°. This is a much more stringent criterion than that used for the contact identification. A cutoff of 10% occupancy was used to identify extant hydrogen bonds.

Glycan modeling and exposure calculation

There are 22 potential N-glycosylation sites (PNGS) in the Spike, each of which has a unique glycoform distribution. We chose glycan types with the highest occurrence at each site, as determined by mass spectrometry studies (46) for 19 of these glycans present within our model range (see Fig. 8, B and C, and table S2). There are two putative O-glycan sites on the Spike; however, they have been reported to have less than 2% occupancy (47) and were therefore not included in our models.

We performed a Jarvis Patrick–based cluster analysis where a structure is added to a cluster if it has at least six neighbors in common with another neighbor. Clustering was performed on all five replicas, for each of the four different systems, using the “gmx cluster” command with a cutoff of 0.2 nm to identify a subset of distinct conformations based on Cα root mean square deviations (RMSDs). We identified 63 clusters for D614 all-down, 78 clusters for D614 one-up, 54 clusters for D614G all-down, and 103 clusters for D614G one-up. We took the closest to the mean structure from each of the top 20 largest clusters of each system and used these snapshots as the basis for building glycan models. Glycan ensemble modeling was performed using a previously established pipeline (18) with the ALLOSMOD (48) suite of MODELLER. An initial glycan conformation was added at each PNGS with randomized atomistic deviations from CHARMM36 ideal carbohydrate geometries and random orientations. This was followed by simulated annealing relaxation of the entire glycoprotein. Fifty glycosylated conformations were built on each of the 20 selected models, resulting in an ensemble of 1000 distinct glycoprotein conformations for each of the four systems. Glycoform for each PNGS was chosen as the one with the highest probability of occurrence from site-specific mass spectrometry results (46).

The GEF was calculated for each residue exposed on the surface of the protein as established in our previous study (18). It is defined as the number of glycan heavy atoms encountered by an approaching probe of 6 Å diameter mimicking a typical hairpin loop of antibodies interacting with epitopes. Geometric mean of this value measured in the three cardinal directions (perpendicular to the surface and along the plane) was taken to cover different orientations.

Reorientational dynamics of RBD

Treating the RBD domain as a rigid ellipsoid, rotations along the Euler angles roll and pitch were measured through the equilibrated simulation trajectories. The rotation axes are indicated in fig. S7B. Roll was defined as rotation around the axis that is spanned by residues 392 and 401. Pitch was defined as rotation around the axis that is spanned by residues 377 and 357 of the initial model conformations. The roll-pitch potential of mean force, PMF(X), was calculated using PMF(X) = −RT*log[p(X)], where X denotes values along the roll and pitch directions, and p(X) is the probability. Rotation of RBD-down ensemble included the L-RBD and R-RBD (down) protomers. S309 Fab binding to up-RBD was modeled by rigid-body alignment to closed-RBD and Fab interactions from PDB structure 6WPT (21), using the backbone of residues 331 to 527 for least squares fitting.

Molecular modeling of antibody mediated cross-linking of Spikes

We modeled an intact immunoglobulin G (IgG) structure based on PDB entry 1IGT (49), in combination with the S309 Fab domain to visualize the relative orientations between the Spike ectodomain and IgG (fig. S8). The S309 Fab and IgG Fab domains were aligned to model the intact IgG. These two had an RMSD of ~1 Å for the aligned and matched sequence residues. The Fab domain residues were replaced by S309 to obtain an accurate binding interface. In the RBD-down conformation, the orientation of the two IgG arms is such that Spike trimers from the same virion can only bind if there is large reorientation and structural reorganization of the antibody arms (fig. S8B). Since the inter-domain linkers in antibodies do impart flexible reorganizations (50), this pathway cannot be ruled out. However, we have tested all three available intact IgG structures in PDB (1IGT, 1IGY, and 1HZH), and they all indicate that large-scale reorientation would be necessary for inter-Spike binding in the closed-RBD conformation.

Limitations of the study

Although the results from these extensive MD simulations represent an extensive investment of computational resources, large-scale conformational shifts in the dynamics, such as the actual transition between the all-down and one-up states, are beyond the microsecond time scales considered here. Specifically, by using an all-atom explicit solvent model (as described in the manuscript), it is currently not possible to simulate large-scale up-down transitions because of the massive computational cost. According to statistical mechanics, the fluctuations about equilibrium studied here provide the motions of the system under linear response, and it has been well established that protein transitions between folded and unfolded states can be probed purely based on native contacts.

The work performed here lays the necessary groundwork for future simulations, which, out of computational necessity, must be coarse-grained, to study these transitions directly and indirectly as done here. To study large movements, one could use more simplified energetic descriptions (e.g., Gō-models) that would effectively elucidate the role of structural features during this process. However, these are additional questions that may be explored in future efforts and that cannot be performed without a detailed understanding of the system modeled accurately at an all-atom scale.

Also, the current structure does not include the heptad repeat 2, transmembrane, or cytoplasmic regions that could differentially alter the presentation of Spike on the membrane in the G-form. We cannot comment directly on the S1/S2 cleavage site conformation or the effect of two prolines used for stabilization, as we used the sequence from Walls et al. (10) that was mutated to stabilize a soluble protein. In addition, we cannot comment on effects such as hyperfusogenicity, which may affect the virus’s ability to fuse with the host cell. Such large-scale effects would require a coarse-grained or steered MD approach to be able to probe the interactions of the full trimer with the host cell and its receptor.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/7/16/eabf3671/DC1

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES

Acknowledgments: We thank D. Weissman for sharing unpublished neutralization data and being willing to consider cosubmission of our papers. We thank P. Acharya for sharing unpublished negative stain EM data on G614. We are grateful to P. Weber for securing extensive computational resources that made this study possible. Funding: R.A.M. was supported by a Los Alamos National Laboratory (LANL) Director’s Postdoctoral Fellowship. S.C. was supported by the Center of Nonlinear Studies Postdoctoral Program. K.N. was supported by the Spatiotemporal Modeling Center at the University of New Mexico (NIH P50GM085273). B.K. and S.G. were supported by LANL LDRD project 20200706ER. This research used computational resources provided by the LANL Institutional Computing Program, which was supported by the U.S. Department of Energy National Nuclear Security Administration under contract no. 89233218CNA000001 to Triad National Security LLC. Author contributions: R.A.M., S.C., K.N., and S.G. designed the study. S.C. and K.N. prepared the structures. R.A.M. ran the simulations. S.C. calculated the GEF and MMPBSA energies. S.C., K.N., R.A.M., B.K., D.C.M., and S.G. performed data analysis and interpretation. R.A.M. and S.C. prepared the figures. R.A.M. wrote the initial manuscript draft. R.A.M., S.C., K.N., B.K., D.C.M., and S.G. rewrote and edited the manuscript. B.K. and S.G. secured the funding. Competing interests: The authors declare that they have no competing interests. Data and materials availability: Molecular simulation data and raw materials are available from the corresponding author upon reasonable request. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Simulation trajectories and additional data related to this paper are available on: https://www.osti.gov/dataexplorer/biblio/dataset/1760388.

Stay Connected to Science Advances

Navigate This Article