Belief propagation for networks with loops

See allHide authors and affiliations

Science Advances  23 Apr 2021:
Vol. 7, no. 17, eabf1211
DOI: 10.1126/sciadv.abf1211


Belief propagation is a widely used message passing method for the solution of probabilistic models on networks such as epidemic models, spin models, and Bayesian graphical models, but it suffers from the serious shortcoming that it works poorly in the common case of networks that contain short loops. Here, we provide a solution to this long-standing problem, deriving a belief propagation method that allows for fast calculation of probability distributions in systems with short loops, potentially with high density, as well as giving expressions for the entropy and partition function, which are notoriously difficult quantities to compute. Using the Ising model as an example, we show that our approach gives excellent results on both real and synthetic networks, improving substantially on standard message passing methods. We also discuss potential applications of our method to a variety of other problems.


Many complex phenomena can be modeled using networks, which provide powerful abstract representations of systems in terms of their components and interactions (1). Phenomena of interest are often modeled using probabilistic formulations that capture the probabilities of states of network nodes. Examples include the spread of epidemics through networks of social contacts (2), cascading failures in power grids (3), and the equilibrium behavior of spin models such as the Ising model (4). Networks are also used to represent pairwise dependencies between variables in statistical models that do not otherwise have a network component as a convenient tool for bookkeeping and visualization of model structure (5). Such “graphical models,” which allow us to represent the conditional dependencies between variables in a nonparametric manner, form the foundation for many modern machine learning techniques (6).

The solution of probabilistic models like this presents a challenge. Analytic methods such as those used for regular lattices do not generalize to the more complex topologies of networks, and mean field and other standard approximations often fail to take crucial details of network structure into account. Numerical methods can be successful but are computationally demanding on larger networks and sometimes give results of poor accuracy. Message passing or “belief propagation” methods offer an alternative and promising approach that straddles the line between analytic and numerical techniques (7, 8). Message passing works by deriving a set of self-consistent equations satisfied by the variables or probabilities of interest and then solving those equations by numerical iteration. The name “message passing” comes from the fact that the equations can be thought of as representing messages passed between neighboring nodes in the network.

Standard formulations of message passing, however, have a crucial weakness: They rely on the assumption that the states of the neighbors are uncorrelated with one another, which is only true if the network contains no loops. Unfortunately, almost all real-world networks do contain loops, and usually many of them (9), so standard message passing can give quite poor results in practical situations. Here, we propose a solution to this problem in the form of a new class of message passing methods for probabilistic models on “loopy” networks. These methods open up a host of possibilities for novel network calculations, many of which we discuss here.

The limitations of traditional message passing have been widely noted in the past, and a number of previous attempts have been made to remedy them. The only truly loopless networks are trees, but standard message passing methods have been shown to give good results on networks that satisfy the weaker condition of being “locally tree-like,” meaning that local regions of the network take the form of trees, although the network as a whole is not a tree. In effect, this means that the network can contain long loops but not short ones (1). However, realistic networks often fail to satisfy even this weaker condition and contain many short loops. Message passing has been extended to certain classes of random graphs with short loops, such as Husimi graphs (1012) and other tree-like agglomerations of small loopy subgraphs (13, 14), but these techniques are not generally applicable to real-world networks. Alternatively, one can incorporate the effect of loops by using a perturbative expansion around the loopless case (15, 16), although this approach becomes progressively less accurate as the number of loops increases and is therefore best suited to networks with a low loop density, which rules out a large fraction of real networks, whose loop density is often high (9, 17). Perhaps the best known extension of belief propagation, and the one most similar to our own approach, is the method known as generalized belief propagation (18), which is based on the idea of passing messages not just between pairs of nodes but between larger groups. This method is, however, quite complicated to implement and requires explicit construction of the groups, which involves nontrivial preprocessing steps (19). The method we propose requires no such steps.

In (20), we previously described message passing schemes for percolation models and spectral calculations on loopy networks. Here, we extend this approach to the solution of general probabilistic models. We derive a factorization of the probability of states for such models that allows us to write self-consistent message passing equations for the marginal probabilities on sets of nodes in a neighborhood around a given reference node. From these equations, we can then calculate a range of quantities of interest such as single-site marginals, partition functions, and entropies. To ground our discussion, we use the Ising model as an example of our approach, showing how our improved message passing methods can produce better estimates for this model than regular message passing. We show that our methods are asymptotically exact on networks whose loop structure satisfies certain general conditions and give good approximations for networks that deviate from these conditions. We give example results for the Ising model on both real and artificial networks and also discuss applications of our method to a range of other problems, emphasizing its wide applicability.


Our first step is to develop the general theory of message passing for probabilistic models on loopy networks. With an eye on the Ising model, our discussion will be in the language of spin models, although the methods that we describe can be applied to any probabilistic model with pairwise dependencies between variables, making it suitable for a broad range of calculations in probabilistic modeling.

Model description

Consider a general undirected, unweighted network G composed of a set V of nodes or vertices and a set E of pairwise edges. The network can be represented mathematically by its adjacency matrix A with elements Aij = 1 when nodes i and j are connected by an edge and 0 otherwise. On each node of the network, there is a variable or spin si, which is restricted to some discrete set of values S. In a compartmental model of disease propagation, for instance, siS = {0 (susceptible),1 (infected),2 (removed)} could be the infection state of a node (21, 22). In a spatial model of segregation, siS = {0 (unoccupied),1 (occupied)} could represent land occupation (23).

Spins si and sj interact if and only if there is an edge between nodes i and j, a formulation sufficiently general to describe a large number of models in fields as diverse as statistical physics, machine learning, economics, psychology, epidemiology, and sociology (2430). Interactions are represented by an interaction energy gij(si, sj∣ωij), which controls the preference for any particular pair of states si and sj to occur together. The quantity ωij represents any external parameters, such as temperature in a classical spin system or infection rate in an epidemiological model, which control the nature of the interaction. We also allow for the inclusion of an external field fi(si∣θi) with parameters θi, which controls the intrinsic propensity for si to take an particular state. This could be used, for instance, to encode individual risk of catching a disease in an epidemic model.

Given these definitions, we write the probability P(s∣ω, θ) that the complete set of spins takes value s in the Boltzmann formP(s|ω,θ)=eH(s|ω,θ)Z(ω,θ)(1)where the HamiltonianH(s|ω,θ)=(i,j)Egij(si,sj|ωij)iVfi(si|θi)(2)is the log probability of the state to within an arbitrary additive constant, and the partition functionZ(ω,θ)=seH(s|ω,θ)(3)is the appropriate normalizing constant, ensuring that P(s∣ω, θ) sums to unity. Here, we will primarily be concerned with computing the single-site (or one-point) marginal probabilitiesqi(si)=ssiP(s|ω,θ)(4)where ssi denotes all spins with the exception of si. For convenience, we have dropped ω and θ from the notation on the left of the equation, but it should be clear contextually that qi depends on both of these variables.

The one-point marginals reveal useful information about physical systems, such as the magnetization of classical spin models or the position of a phase transition. They are important for statistical inference problems, where they give the posterior probability of a variable taking a given state after averaging over contributions from all other variables (e.g., the total probability of an individual being infected with a disease at a given time). Unfortunately, direct computation of one-point marginals is difficult because the number of terms in the sum in Eq. 4 grows exponentially with the number of spins. The message passing method gives us a way to get around this difficulty and compute qi accurately and rapidly.

Message passing can also be used to calculate other quantities. For instance, we will show how to compute the average energy (also called the internal energy), which is given byU(ω,θ)=sH(s|ω,θ)P(s|ω,θ)(5)The average energy is primarily of interest in thermodynamic calculations, although it may also be of interest for statistical inference, where it corresponds to the average log likelihood.

We can also compute the two-point correlation function between spinsP(si=x,sj=y)=P(sj=y|si=x) qi(si=x)(6)This function can be computed by first calculating the one-point marginal qi(si = x), then fixing si = x and repeating the calculation for sj. The same approach can also be used to compute n-point correlation functions.

Message passing equations

Our method operates by dividing a network into neighborhoods (20). A neighborhood Ni(r) around node i is defined as the node i itself and all of its edges and neighboring nodes, plus all nodes and edges along paths of length r or less between the neighbors of i. See Fig. 1 for examples. The key to our approach is to focus initially on networks in which there are no paths longer than r between the neighbors of i, meaning that all paths are inside Ni(r). This means that all correlations between spins within Ni(r) are accounted for by edges that are also within Ni(r), which allows us to write exact message passing equations for these networks. Equivalently, we can define a primitive cycle of length r starting at node i to be a cycle (i.e., a self-avoiding loop) such that at least one edge in the cycle is not on any shorter cycle beginning and ending at i. Our methods are then exact on any network that contains no primitive cycles of length greater than r + 2.

Fig. 1 Hamiltonian expansion diagram, with r = 2.

The focal node is in red, while the rest of its neighborhood N0 is in green. Nodes and edges in purple represent the neighborhood N1 ∖ 0 excluding node 1. We also label the corresponding spin and message variables used in Eqs. 11 and 12.

This approach gives us a series of methods where the rth member of the series is exact on networks that contain primitive cycles of length r + 2 and less only. The calculations become progressively more complex as r gets larger: They are very tractable for smaller values but become impractical when r is large. In many real-world networks, the longest primitive loop will be relatively long, requiring an infeasible computation to reach an exact solution. However, long loops introduce smaller correlations between variables than short ones, and moreover, the density of long loops is, in many cases, low: The network is “locally dense but globally sparse.” In this situation, we find that the message passing equations for low values of r, while not exact, give excellent results. They account correctly for the effect of the short loops in the network while making only a small approximation by omitting the long ones.

In practice, quite modest values of r can give good results. The smallest possible choice is r = 0, which corresponds to assuming that there are no loops in the network at all and that the network is a tree. This is the assumption made by traditional message passing methods, and it gives poor results on many real-world networks. The next approximation after this, however, with r = 1, which correctly accounts for the effect of loops of length three in the network (i.e., triangles), produces substantially better results, and the r = 2 approximation (which includes loops of length three and four) is, in many cases, impressively accurate. In the following developments, we drop r from our notation for convenience—the same equations apply for all values of r.

Having defined the initial neighborhood Ni, we further define a neighborhood Nji to be node j plus all edges in Nj that are not contained in Ni and the nodes at their ends. Our method involves writing the marginal probability distribution on the spin at node i in terms of a set of messages received from nodes j that are in Ni, including nodes that are not immediate neighbors of i. (This contrasts with traditional message passing in which messages are received only from the immediate neighbors of i.) These messages are then, in turn, calculated from further messages j receives from nodes kNji and so forth.

When written in this manner, the messages i receives are independent of one another in any network with no primitive cycles longer than r + 2. Messages received from any two nodes j1 and j2 within Ni are necessarily independent, since they are calculated from the corresponding neighborhoods Nj1i and Nj2i, which are disconnected from one another: If they were connected by any path, then that path would create a primitive cycle starting at i but passing outside of Ni, of which by hypothesis there are none. By the same argument, we also know that Nji and Ni only overlap at the single node j for any jNi.

This much is in common with our previous approach in (20), but to apply these ideas to the solution of probabilistic models, we need to go further. Specifically, we now show how this neighborhood decomposition allows us to factorize the Hamiltonian into a product of independent sums over the individual neighborhoods, with interactions that can be represented by messages passed between neighborhoods. Consider Ni as comprising a central set of nodes and edges surrounding i, then we can think of the set of neighborhoods Nji for all jNi as comprising the next “layer” in the network, the sets Nkj for all kNji as a third layer, and so forth until all nodes and edges in the network are accounted for. In a network with no primitive cycles longer than r + 2, this procedure counts all interactions exactly once, allowing us to rewrite our Hamiltonian as a sum of independent contributions from the various layers, thusH(s)=HNi(sNi)+jNiHNji(sNji)+jNikNjiHNkj(sNkj)+jNikNjilNkjHNlk(sNlk)+,(7)where sNi and sNji are the sets of spins for the nodes in the neighborhoods Ni and Nji, and we have defined the local HamiltoniansHNi(sNi)=(j,k)Nigjk(sj,sk|ωjk)fi(si|θi)(8)HNji(sNji)=(k,l)Njigkl(sk,sl|ωkl)fj(sj|θj)(9)The decomposition of Eq. 7 is illustrated pictorially in Fig. 1.

The essential feature of this decomposition is that it breaks sums over spins such as those in Eqs. 3 and 4 into a product of sums over the individual neighborhoods {Nji}jNi. Because these neighborhoods are, as we have said, independent, this means that the partition function and related quantities factorize into products of sums over a few spins each, which can easily be performed numerically. For instance, the one-point marginal of Eq. 4 takes the formqi(si=x)sNi:si=xeHNi(sNi)jNi sNjijeHNji(sNji)×kNji sNkjkeHNkj(sNkj),(10)which can be written recursively asqi(si=x)=1ZisNi:si=xeHNi(sNi)jNiqij(sj)(11)withqij(sj=y)=1ZijsNji:sj=yeHNji(sNji)kNjijqjk(sk)(12)where the normalization constants Zi and Zij ensure that the marginals qi and messages qij are normalized so that they sum to 1. (In practice, we simply normalize the messages by dividing by their sum.) The quantity qij(sj) is equal to the marginal probability of node j having spin sj when all the edges in Ni are removed. Alternatively, one can think of it as a local external field on node j that influences the probability distribution of sj. To make this more explicit, one could rewrite Eq. 11 asqi(si=x)=1ZisNi:si=xeHNi(sNi)+jNilog qij(sj)(13)where log qij(sj) plays the role of the external field.

Equations 11 and 12 define our message passing algorithm and can be solved for the messages qij by simple iteration, starting from any suitable set of starting values and applying the equations repeatedly until convergence is reached.

With only slight modification, we can use the same approach to calculate the internal energy as well. The contribution to the internal energy from the interactions of a single node i is 12j:Aij=1g(si,sj|ωij)+f(si|θi), where the factor of 12 compensates for double counting of interactions. Summing over all nodes i and weighting by the appropriate Boltzmann probabilities, the total internal energy isU=iV1ZisNi[12j:Aij=1g(si,sj|ωij)+f(si|θi)]eHNi(sNi)jNiqij(sj)(14)All of the quantities appearing here are known a priori, except for the messages qij(sj) and the normalizing constants Zi, which are calculated in the message passing process. Performing the message passing and then using the final converged values in Eq. 14 then gives us our internal energy.


For less dense networks, those with node degrees up to about 20, the message passing equations of Eqs. 11 and 12 can be implemented directly and work well. The method is also easily parallelizable, as we can update all messages asynchronously based on their values from the previous iteration and compute the final marginals in parallel.

For networks with higher degrees, the calculations can become unwieldy, the huge reduction in complexity due to the factorization of the Hamiltonian notwithstanding. For a model with t distinct spin states at every node, the sum over states in the neighborhood of i has tNi terms, which can quickly become computationally expensive to evaluate. Moreover, if just a single node has too large a neighborhood, then it can make the entire computation intractable, as that single neighborhood can consume more computational power than is available.

In such situations, therefore, we take a different approach. We note that Eq. 12 is effectively an expectationqij(sj=y)=δsj,yNji(15)where we use the shorthandANji=sNjiA(sNji)eHNji(sNji)kNjijqskjkZij(16)

We approximate this average using Markov chain Monte Carlo importance sampling over spin states, and, after convergence of the messages, the final estimates of the marginals qi can also be obtained by Monte Carlo, this time on the spins in Ni. We describe the details in Results.

Calculating the partition function

The partition function Z is perhaps the most fundamental quantity in equilibrium statistical mechanics. From a knowledge of the partition function, one can calculate virtually any other thermodynamic variable of interest. Objects equivalent to Z also appear in other fields, such as Bayesian statistics, where the quantity known as the model evidence, the marginal likelihood of observed data given a hypothesized model, is mathematically analogous to the partition function and plays an important role in model fitting and selection (3133).

Unfortunately, the partition function is difficult to calculate in practice. The calculation can be done analytically in some special cases (34, 35), but direct numerical calculations are difficult due to the need to sum over an exponentially large number of states, and Monte Carlo is challenging because of the difficulty of correctly normalizing the Boltzmann distribution.

Another concept central to statistical mechanics is the entropyS=sP(s)ln P(s)(17)which has broad applications not just in physics but across the sciences (3638). Like the partition function, entropy is difficult to calculate numerically, and for exactly the same reasons, since the two are closely related. For the canonical distribution of Eq. 1, the entropy is given in terms of Z by S = ln Z + βU. Although we know the internal energy U therefore (which is relatively straightforward to compute), the entropy is at least as difficult to calculate as the partition function. The fundamental difficulty of normalizing the Boltzmann distribution is equivalent to establishing the zero of the entropy, a well-known hard problem (unsolvable within classical thermodynamics, requiring the additional axiom of the Third Law).

As we now show, the entropy can be calculated using our message passing formalism by appropriately factorizing the probability distribution over spin states. Since we have already developed a prescription for computing U (see Eq. 14), this also allows us to calculate the partition function. The details of the procedure are quite involved and do not follow straightforwardly from the previous discussion, so we defer the derivation to the Supplementary Materials, section S4. As shown there, the state probability P(s) in Eq. 1 can be rewritten in the factorized formP(s)=iGP(sNi)((i,j))GP(sij)2/|ij|(18)where P(sNi) is the joint marginal distribution of the variables in the neighborhood of node i, P(sij) is the joint marginal distribution in the intersection ∩ij = NiNj of the neighborhoods Ni and Nj, and ((i, j)) denotes pairs of nodes that are contained in each other’s neighborhoods.

By a series of manipulations, this form can be further expressed as the pure productP(s)=[((i,j))GP(sij)1/(|ij|2)] [(i,j)GP(si,sj)Wij] [iGP(si)Ci](19)whereWij=1((l,m))G1(|lm|2)1{(i,j)lm}(20)with 1{…} being the indicator function, andCi=1(jNi1|ij|1)(jNi(0)Wij)(21)

Substituting Eq. 19 into Eq. 17, we get an expression for the entropy, thusS=1(|ij|2)((i,j))GP(sij)ln P(sij)(i,j)G WijP(si,sj)ln P(si,sj)iGCiP(si)ln P(si)(22)

Note that, like the well-known Bethe approximation for the entropy (39), this expression not only has contributions from the one- and two-point marginals P(si) and P(si, sj) of Eqs. 6 and 11 but also contains a term that depends on the joint marginal P(sij) in the intersection ∩ij, which may be nontrivial if r > 0. As shown in the Supplementary Materials, section S4, we can calculate this joint marginal using the message passing equationP(sij)=1ZijeβH(sij)qij(sj)kijjqjk(sk)(23)where H(sij) denotes the terms of the Hamiltonian of Eq. 2 that fall within ∩ij and Zij is the corresponding normalizing constant. For ∣∩ij∣ sufficiently small, Zij can be computed exactly. In other cases, we can calculate it using Monte Carlo methods similar to those that we used previously for the marginals P(si).

Ising model calculations

As an archetypal application of our methods, we consider the Ising model on various example networks. The ferromagnetic Ising model in zero external field is equivalent in our notation to the choicesgij(si,sj)=βAijsisj,fi(si)=0(24)where β = 1/T is the inverse temperature. Note that temperature in this notation is considered a part of the Hamiltonian. It is more conventional to write temperature separately, so that the Hamiltonian has dimensions of energy rather than being dimensionless as here, but absorbing the temperature into the Hamiltonian is notationally convenient in the present case. It effectively makes the temperature a parameter ωij in Eq. 2 (and all ωij are equal).

As example calculations, we will compute the average magnetization M, which is given byM=|1Ni=1Nsi|=1N|i=1N[2qi(si=+1)1]|(25)and the heat capacity C, given byC=dUdT=β2dUdβ(26)

As detailed in section S1, we use an extension of the message passing equations to compute C that avoids having to use a numerical derivative to evaluate Eq. 26. Briefly, we consider the messages qij to be a function of β, then define their derivatives with respect to β as their own set of messagesηij=dqijdβ(27)with their own associated message passing equations derived by differentiating Eq. 12. We then compute the heat capacity C by differentiating Eq. 14, expressing the result in terms of the ηij, and substituting it into Eq. 26.

Behavior at the phase transition

In many geometries, the ferromagnetic Ising model has a phase transition at a nonzero critical temperature between a symmetric state with zero average magnetization and a symmetry broken state with nonzero magnetization. Substituting Eq. 24 into Eqs. 11 and 12, we can show that the message passing equations for the Ising model always have a trivial solution qij(sj)=12 for all i, j. This choice is a fixed point of the message passing iteration: When started at this point, the iteration will remain there indefinitely. Looking at Eq. 25, we see that this fixed point corresponds to magnetization M = 0. If the message passing iteration converges to this trivial fixed point, therefore, it tells us that the magnetization is zero, and we are above the critical temperature; if it settles elsewhere, then the magnetization is nonzero, and we are below the critical temperature. Thus, the phase transition corresponds to the point at which the fixed point changes from being attracting to being repelling.

This behavior is well known in standard belief propagation, where it has been shown that, on networks with long loops only, there is a critical temperature TBP below which the trivial fixed point becomes unstable and hence the system develops nonzero magnetization and that this temperature corresponds precisely to the conventional zero-field continuous phase transition on these networks (40). Extending the same idea to the present case, we expect the phase transition on a loopy network to fall at the corresponding transition point between stable and unstable in our message passing formulation.

Moreover, because the values of the messages at the trivial fixed point are known, we can compute an expression for the phase transition point without performing any message passing. We treat the message passing iteration as a dynamical system and perform a linear stability analysis of the trivial fixed point. Perturbing around q=12 (shorthand for setting all qij=12) and keeping terms to linear order, we find that the dynamics is governed by the JacobianJji,νμ=qijqμν|q=1/2=B˜ji,νμDji,νμ(28)where B is a generalization of the so-called non-backtracking matrix (41) to our loopy message passing formulationB˜ji,νμ={ 1if j=μ and νNji0otherwise,(29)and Dji, ν → μ is a correlation function between the spins sμ and sν within the neighborhood Nji—see section S3 for details.

When the magnitude of the leading eigenvalue λmax of this Jacobian is less than 1, the trivial fixed point is stable; when it is greater than 1, the fixed point is unstable. Hence, we can locate the phase transition temperature by numerically evaluating the Jacobian and locating the point at which ∣λmax∣crosses 1, for instance, by binary search.

Equation 29 is also useful in its own right. The non-backtracking matrix has numerous applications within network science, for instance in community detection (41), centrality measures (42), and percolation theory (43). The generalization defined in Eq. 29 could be used to extend these applications to loopy networks, although we will not explore such calculations here.


A model network

As a first example application, we examine the behavior of our method on a model network created precisely to have short loops only up to a specified maximum length. The network has short primitive cycles only of length r + 2 and less for a given choice of r, although it can also have long loops—it is locally dense but globally sparse in the sense discussed previously. This turns out to be a crucial point. The Ising model does not have a normal phase transition on a true tree, because, at any finite temperature, there is always a nonzero density of defects in the spin state (pairs of adjacent spins that are oppositely oriented), which, on a tree, divide the network into finite sized regions, imposing a finite correlation length and hence no critical behavior. Similarly, in the case of a network with only short loops and no long ones, there is no true phase transition. The long loops are necessary to produce criticality, a point discussed in detail in (44).

To generate networks that have short primitive cycles only up to a certain length, we first generate a random bipartite network—a network with two types of nodes and connections only between unlike kinds—then “project” down onto one type of node, producing a network composed of a set of complete subgraphs or cliques. In detail, the procedure is as follows.

1) We first specify the degrees of all the nodes, of both types, in the bipartite network.

2) We represent these degrees by “stubs” of edges emerging, in the appropriate numbers, from each node, then we match stubs at random in pairs to create our random bipartite network.

3) We project this network onto the nodes of type 1, meaning that any two such nodes that are both connected to the same neighbor of type 2 are connected directly with an edge in the projection. After all such edges have been added, the type 2 nodes are discarded.

4) Last, we remove a fraction p of the edges in the projected network at random.

If p = 0, then the network is composed of fully connected cliques, but when p > 0, some cliques will be lacking some edges; hence, the network is composed of a collection of subgraphs of size equal to the degrees of the corresponding nodes of type 2 from which they were projected. If we limit these degrees to a maximum value of r + 2, then there will be no short loops of length longer than this.

Figure 2 shows the magnetization per spin, entropy, and heat capacity for the ferromagnetic Ising model on an example network of 9447 nodes and 13,508 edges generated using this procedure with r = 2 and p = 0.6. We also limit the degrees of the type 1 nodes in the bipartite graph to a maximum of 5, which ensures that no neighborhood in the projection is too large to prevent a complete summation over states and hence that Monte Carlo estimation of the sums in the message passing equations is unnecessary.

Fig. 2 Ferromagnetic Ising model critical behavior on synthetic network.

The top panel shows the average magnetization, while the bottom one shows the heat capacity and the entropy (the latter shifted up for visualization purposes). Results are shown for r = 0, 1, 2 and direct Monte Carlo simulation (denoted MCMC). The magnitude of the leading eigenvalue for the Jacobian is also shown in the top panel for all three values of r, and we can see that the apparent positions of the phase transition, revealed by discontinuities in the physical quantities or their gradients, correspond closely to the temperatures at which the associated eigenvalues are equal to 1.

Results are shown for belief propagation calculations with r = 0, 1, and 2, the last of which should, in principle, be exact except for the weak correlations introduced by the presence of long loops in the network. We also show in the figure the magnitude of the leading eigenvalue of J for each value of r. The points at which this eigenvalue equals 1, which give estimates of the critical temperature for each r, are indicated by the vertical lines. Also shown in the figure for comparison are results from direct Monte Carlo simulations of the system, with the entropy calculated from values of the heat capacity computed from energy fluctuations and then numerically integrated using the identityS=0TC(T)T dT(30)

The message passing simulations offer significantly faster results for this system: For r = 2, message passing was about 100 times faster than the Monte Carlo simulations.

Looking at Fig. 2, we can see that as we increase r, the message passing results approach those from the direct Monte Carlo, except close to the phase transition, where the Monte Carlo calculations suffer from finite size effects that smear the phase transition, to which the message passing approach appears largely immune. While the results for conventional belief propagation (r = 0) are quite far from the direct Monte Carlo results, most of the improvement in accuracy from our method is already present even at r = 1. Going to r = 2 offers only a small additional improvement in this case.

The apparent position of the phase transition aligns well with the predictions derived from the value of the Jacobian for each value of r. The transition is particularly clear in the gradient discontinuity of the magnetization. For r = 1 and 2, the heat capacity appears to exhibit a discontinuity at the transition, which differs from the divergence we expect on low-dimensional lattices but bears a resemblance to the behavior seen on Bethe lattices and other homogeneous tree-like networks (7, 45, 46).

Real-world networks

For our next example, we look at an application on a real-world network, where we do not expect the method to be exact, although as we will see it nonetheless performs well. The network that we examine has larger local neighborhoods than our synthetic example, which means that we are not able to sum exhaustively over all configurations of the spins sNji in Eq. 12 (and similarly sNi in Eq. 11) so, as described in Implementation, we instead make use of Monte Carlo sampling to estimate the messages qij and marginals qi.

The summation over local spins in Eq. 12 is equivalent to computing the expectation in Eq. 15. To calculate qij(sj = y), we fix the values of the incoming messages {qjk} and perform Monte Carlo sampling over the states of the spins in the neighborhood Nji with the Hamiltonian of Eq. 9. Then, we compute the average in Eq. 15 separately for the cases y = 1 and −1 and normalize to ensure that the results sum to one. The resulting values for qij can then be used as incoming messages for calculating other messages in other neighborhoods. We perform the Monte Carlo using the Wolff cluster algorithm (47), which makes use of the Fortuin-Kasteleyn percolation representation of the Ising model to flip large clusters of spins simultaneously and can significantly reduce the time needed to obtain independent samples, particularly close to the critical point. Once the messages have converged to their final values, we compute the marginals qi by performing a second Monte Carlo, this time over the spins sNi with the Hamiltonian of Eq. 8. More details on the procedure are given in section S2.

The Monte Carlo approach combines the best aspects of message passing and traditional Monte Carlo calculations. Message passing reduces the sums we need to perform to sets of spins much smaller than the entire network, while the Monte Carlo approach markedly reduces the number of spin states that need to be evaluated. The approach has other advantages too. For instance, because of the small neighborhood sizes, it shows improved performance in systems with substantial energy barriers that might otherwise impede ergodicity, such as antiferromagnetic systems. But perhaps its biggest advantage is that it effectively allows us to sample very large numbers of states of the network without taking very large samples of individual neighborhoods. If we sample k configurations from one neighborhood and k configurations from another, then, in effect, we are summing over k2 possible combinations of states in the union of the two neighborhoods. Depending on the value of r, there are at least 2m neighborhoods Nji in a network, where m is the number of edges, and hence we are effectively summing over at least k2m states overall, a number that increases exponentially with network size. Effective sample sizes of 101000 or more are easily reachable, far beyond what is possible with traditional Monte Carlo methods.

Figure 3 shows the results of applying these methods with r = 0…4 to a network from (48) representing the structure of an electric power grid, along with results from direct Monte Carlo simulations on the same network. As the figure shows, the magnetization is again poorly approximated by the traditional (r = 0) message passing algorithm but improves as r increases. In particular, the behavior in the region of the phase transition is quite poor for r = 0 and does not provide a good estimate of the position of the transition. For r = 1 and 2, however, we get much better estimates, and for r = 3 and 4, the method approaches the Monte Carlo results quite closely, with the critical temperature falling somewhere in the region of T = 1.6 in this case. We also see a much clearer phase transition in the message passing results than in the standard Monte Carlo because of finite size effects in the latter. These results all suggest that, for real systems, our method can give substantial improvements over both ordinary belief propagation and direct Monte Carlo simulation and, in some cases, show completely different behavior altogether.

Fig. 3 Ferromagnetic Ising model critical behavior on a power grid network.

Message passing and Monte Carlo calculations of the average magnetization, entropy, and specific heat on the “494 bus power system” network from (48). Results are shown for r = 0…4 and direct Monte Carlo simulation (MCMC). Again, the message passing results approximate the real solution progressively better as r grows larger.


Here, we have presented a new class of message passing algorithms for solving probabilistic models on networks that contain a high density of short loops. Taking the Ising model as an example, we have shown that our methods give substantially improved results in calculations of magnetization, heat capacity, entropy, marginal spin probabilities, and other quantities over standard message passing methods that do not account for the presence of loops. Our methods are exact on networks with short loops up to a fixed maximum length, which we can choose, and can give good approximations on networks with loops of any length.

Message passing methods for probabilistic models on loopy networks have been proposed in the past, the best known being the generalized belief propagation method of Yedidia et al. (18). Generalized belief propagation uses a region-based approximation (49), in which the free energy ln Z is approximated by a sum of independent local free energies of regions within the network. Once the regions are defined, it is straightforward to write down belief propagation equations, which can be used to calculate marginals and other quantities of interest, including approximations to the partition function and entropy. Perhaps the best known example of generalized belief propagation, at least within the statistical physics community, is the cluster variational method, in which the regions are defined so as to be closed under the intersection operation (24), and the resulting free energy is called the Kikuchi free energy (50).

The accuracy and complexity of generalized belief propagation is determined by the specific choice of regions, which has been described as being “more of an art than a science” (39). Loops contained within regions are correctly accounted for in the belief propagation, while those that span two or more regions are not and introduce error. At the same time, the computational complexity of the belief propagation calculations increases exponentially with the size of the regions (39), so choosing the right regions is a balancing act between enclosing as many loops as possible while not making the regions too large. A number of heuristics have been proposed for choosing the regions (19, 51, 52), but real-world networks can pose substantial difficulties because they often contain both high degrees and many loops (1), which effectively forces us to compromise either by leaving loops out or by using very large regions. Our method can have a substantial advantage in these systems because it can accommodate large, tightly connected neighborhoods through local Monte Carlo sampling. Our method also has the benefit that the neighborhoods are constructed automatically based on the network structure rather than being chosen by the user.

There are many ways in which the methods and results of this paper could be extended. We have studied only one application in detail, the Ising model, but the formalism that we present is a general one that could be applied to many other models. In principle, any model with sparse pairwise interactions (i.e., interactions whose number scales subquadratically with the number of variables) could be studied using these methods. For example, there is a large class of generative models of networks in which edges appear with probabilities that depend on the properties of the adjacent nodes. Examples include the Chung-Lu model (53) and the stochastic block model and its variants (54, 55). If we assume an observed network to be drawn from such a model, then we can use statistical inference to estimate the values of hidden node attributes that influence edge probability, such as community membership. Our message passing methods could be applied to such inference calculations and could, in principle, give more accurate results in the common case where the observed network contains many short loops.

Another potential application in the realm of statistical inference is the inverse Ising model, the problem of inferring the parameters of an Ising or Ising-like model from an observed sequence of spin states, which has numerous applications including the reconstruction of neural pathways (56), the inference of protein structure (57), and correlations within financial markets (58). It can be shown that the one- and two-point correlation functions of the observed spins are sufficient statistics to reliably estimate coupling and external field parameters (59), and our method could be used to compute these statistics on loopy networks to greater accuracy than with traditional message passing and faster than standard Monte Carlo simulation. Other potential applications, further afield from traditional statistical physics, include the solution of constraint satisfaction problems, coding theory, and combinatorial optimization.


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: Funding: This work was funded in part by the U.S. Department of Defense NDSEG fellowship program (to A.K.) and by the National Science Foundation under grants NSF IIS-1838251 (to G.T.C.), DMS–1710848 and DMS–2005899 (to M.E.J.N.). Author contributions: A.K., G.T.C., and M.E.J.N. designed the study; A.K. performed the simulations; and A.K. and G.T.C. performed the mathematical analysis. All authors contributed to writing the manuscript. Competing interests: The authors declare that they have no competing interests. Data materials and availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials, except for the power grid data, which are available from the original source in (48): Additional data related to this paper may be requested from the authors.

Stay Connected to Science Advances

Navigate This Article