Research Article | COMPUTATIONAL BIOLOGY

Metainference: A Bayesian inference method for heterogeneous systems

Science Advances  22 Jan 2016:
Vol. 2, no. 1, e1501177
DOI: 10.1126/sciadv.1501177
  • Fig. 1 Schematic illustration of the metainference method.

    (A and B) To generate accurate and precise models from input information (A), one must recognize that data from experimental measurements are always affected by random and systematic errors and that the theoretical interpretation of an experiment may also be inaccurate (B; green). Moreover, data collected on heterogeneous systems depend on a multitude of states and their populations (B; purple). (C) Metainference can treat all of these sources of error and thus it can properly combine multiple experimental data with prior knowledge of a system to produce ensembles of models consistent with the input information.

  • Fig. 2 Metainference of a model heterogeneous system.

    (A) Equilibrium measurements on mixtures of different species or states do not reflect a single species or conformation but are instead averaged over the whole ensemble. (B to D) We describe such a scenario using a model heterogeneous system composed of multiple discrete states on which we tested metainference (B), the maximum entropy approach (C), and standard Bayesian modeling (D), using synthetic data. We assess the accuracy of these methods in determining the populations of the states as a function of the number of data points used and the level of noise in the data. Among these approaches, metainference is the only one that can deal with both heterogeneity and errors in the data; the maximum entropy approach can treat only the former, whereas standard Bayesian modeling can treat only the latter.

  • Fig. 3 Scaling of the metainference harmonic restraint intensity in the absence of noise in the data.

    We verified numerically that, in the absence of noise in the data and with a Gaussian noise model, the intensity of the metainference harmonic restraint, which couples the average of the forward model over the N replicas to the experimental data point (Eq. 7), scales as N^2. This test was carried out in the model system at five discrete states, with 20 data points and with the prior at 16% accuracy. For each of the 20 data points, we report the average restraint intensity over the entire Monte Carlo simulation and its SD when using 8, 16, 32, 64, 128, and 256 replicas. The average Pearson’s correlation coefficient on the 20 data points is 0.999991 ± 3 × 10^−6, showing that metainference coincides with the replica-averaging maximum entropy modeling in the limit of the absence of noise in the data.

  • Fig. 4 Analysis of the inferred uncertainties.

    (A and B) Distributions of inferred uncertainties (PDF) in the presence of systematic errors, using (A) a Gaussian data likelihood with one uncertainty per data point and (B) the outliers model with one uncertainty per data set. This test was carried out in the model system at five discrete states, with 20 data points (of which eight were outliers), 128 replicas, and the prior at 16% accuracy. For the Gaussian noise model, we report the distributions of three representative points not affected by noise and of two representative points affected by systematic errors. For the outliers model, we report the distribution of the typical data set uncertainty.

  • Fig. 5 Example of the application of metainference in integrative structural biology.

    (A) Comparison of the metainference and maximum entropy approaches by modeling the structural fluctuations of the protein ubiquitin in its native state using NMR chemical shifts and RDC data. (B) The metainference ensemble supports the finding (36) that a major source of dynamics involves a flip of the backbone of residues D52-G53 (B; left scatterplot), which interconverts between an α state with a 65% population and a β state with a 35% population. This flip is coupled with the formation of a hydrogen bond between the side chain of E24 and the backbone of G53 (B; right scatterplot); the state in which the hydrogen bond is present (β_HB+) is populated 30% of the time, and the state in which the hydrogen bond is absent (β_HB−) is populated 5% of the time. By contrast, the NMR structure (Protein Data Bank code 1D3Z) provides a static picture of ubiquitin in this region in which the α state is the only populated one (black triangle). (C) Validation of the metainference (MI; red) and maximum entropy principle (MEP; green) ensembles, along with the NMR structure (blue) and the MD ensemble (purple), by the back-calculation of experimental data not used in the modeling: ^3J_HNC and ^3J_HNHA scalar couplings and two independent sets of RDCs (RDC sets 2 and 3).

  • Fig. 6 Distributions (PDF) of restraint intensities for different chemical shifts of ubiquitin.

    When combining data from different experiments, metainference automatically determines the weight of each piece of information. In the case of ubiquitin, the NH and HN chemical shifts were identified as the less reliable data and were therefore downweighted in the construction of the ensemble of models. From this procedure, it is not possible to determine whether these two specific data sets have a higher level of random or systematic noise or whether the CAMSHIFT predictor (38) is instead less accurate for these specific nuclei.
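
The ensemble averaging described in the Fig. 2 caption and the N^2 scaling reported in the Fig. 3 caption can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the synthetic observables, and the random populations are hypothetical. The restraint intensity is assumed to take the form k = sum over replicas of 1 / (sigma_B^2 + sigma_SEM^2), with the standard error of the mean assumed to scale as sigma_SEM = s / sqrt(N); neither expression appears in the captions above, so both should be checked against Eq. 7 of the article.

```python
import numpy as np


def ensemble_average(populations, state_observables):
    """Population-weighted average of per-state observables (as in Fig. 2A)."""
    p = np.asarray(populations, dtype=float)
    f = np.asarray(state_observables, dtype=float)  # shape: (n_states, n_data)
    return p @ f


def replica_average(populations, state_observables, n_replicas, rng):
    """Average the observables over n_replicas states drawn from the populations."""
    p = np.asarray(populations, dtype=float)
    f = np.asarray(state_observables, dtype=float)
    states = rng.choice(len(p), size=n_replicas, p=p)
    return f[states].mean(axis=0)


def restraint_intensity(n_replicas, sem_scale=1.0, sigma_b=0.0):
    """ASSUMED form: k = sum_r 1 / (sigma_b^2 + sigma_sem^2),
    with sigma_sem = sem_scale / sqrt(n_replicas) (hypothetical parameters)."""
    sigma_sem_sq = sem_scale**2 / n_replicas
    return n_replicas / (sigma_b**2 + sigma_sem_sq)


rng = np.random.default_rng(0)
n_states, n_data = 5, 20                       # sizes quoted in the Fig. 3 caption
populations = rng.dirichlet(np.ones(n_states))  # synthetic populations
observables = rng.normal(size=(n_states, n_data))  # synthetic per-state data
exact = ensemble_average(populations, observables)

replica_counts = np.array([8, 16, 32, 64, 128, 256])
for n in replica_counts:
    approx = replica_average(populations, observables, n, rng)
    rmsd = np.sqrt(np.mean((approx - exact) ** 2))
    print(f"N={n:4d}  RMSD to exact average={rmsd:.3f}  k={restraint_intensity(n):.1f}")

# Log-log slope of k versus N when sigma_b = 0 (no noise in the data)
k_values = np.array([restraint_intensity(n) for n in replica_counts])
slope = np.polyfit(np.log(replica_counts), np.log(k_values), 1)[0]
print(f"log-log slope of k vs N: {slope:.3f}")
```

Under these assumptions, setting `sigma_b` well above the standard error of the mean makes the intensity grow only linearly with the number of replicas, which is one way to see why the quadratic scaling is specific to the noise-free limit.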

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/2/1/e1501177/DC1

    Derivation of the basic metainference equations

    Details of the model system simulations

    Details of the ubiquitin MD simulations

    Fig. S1. Effect of prior accuracy on the error of the metainference method.

    Fig. S2. Scaling of metainference error with the number of replicas at varying levels of noise in the data.

    Fig. S3. Scaling of metainference error with the number of states.

    Fig. S4. Accuracy of the outliers model.

    Table S1. Comparison of the quality of the ensembles obtained using different modeling approaches in the case of the native state of the protein ubiquitin.

    Table S2. Comparison of the stereochemical quality of the ensembles or single models generated by the approaches defined in table S1.

    References (39–45)
