## Abstract

Reconstruction of the structure and parameters of an Ising model from binary samples is a problem of practical importance in a variety of disciplines, ranging from statistical physics and computational biology to image processing and machine learning. The focus of the research community shifted toward developing universal reconstruction algorithms that are both computationally efficient and require the minimal amount of expensive data. We introduce a new method, interaction screening, which accurately estimates model parameters using local optimization problems. The algorithm provably achieves perfect graph structure recovery with an information-theoretically optimal number of samples, notably in the low-temperature regime, which is known to be the hardest for learning. The efficacy of interaction screening is assessed through extensive numerical tests on synthetic Ising models of various topologies with different types of interactions, as well as on real data produced by a D-Wave quantum computer. This study shows that the interaction screening method is an exact, tractable, and optimal technique that universally solves the inverse Ising problem.

## INTRODUCTION

The Ising model is a renowned model in statistical physics that was originally introduced to study the phase transition phenomenon in ferromagnetic materials (*1*). In modern applications, the Ising model is regarded as the most general graphical model describing stationary statistics of binary variables, called spins, that admit a pairwise factorization. The spins are associated with the nodes of a graph, and the edges specify pairwise interactions. Given a graph *G* = (*V*, *E*), where *V* is the set of *N* nodes and *E* is the set of edges, the probability measure of an Ising model reads**σ** = {σ_{i}}_{i∈V} denotes the vector of spin variables σ_{i} ∈ {−1, +1}, *Z*, called the partition function, is a normalization factor that ensures * J** and

**. Regimes corresponding to small and large interactions and magnetic field intensities are respectively known as high- and low-temperature phases. Models in which couplings or magnetic fields are positive, negative, or have mixed signs are traditionally referred to as ferromagnet, antiferromagnet, and spin glass, respectively. In numerous application fields, such as statistical physics (*

**H***2*,

*3*), neuroscience (

*4*,

*5*), biopolymers (

*6*), gene regulatory networks (

*7*), quantum computing (

*8*), image segmentation (

*9*), deep learning (

*10*), and sociology (

*11*), the underlying interaction graph and the values of couplings are often unknown a priori and have to be reconstructed from the data, which takes the form of several observed spin configurations. The learning problem that we consider in this paper, called the inverse Ising problem, is stated as follows: given

*M*statistically independent samples {

**σ**

^{(m)}}

_{m=1,…,M}generated by an unknown probability measure

*G*and the parameters {

**,*

**J****}.*

**H**Over the past several decades, a considerable number of techniques have been developed in statistical physics, machine learning, and computer science communities to carry out this reconstruction task (*12*–*24*). A direct maximization of the log-likelihood of the data is generally intractable because it requires a repeated evaluation of the partition function *Z* for different trial values of the parameters {* J*,

*}. Computing*

**H***Z*is, in general, a task of exponential complexity in the number of spins (

*25*), with the exception of some special cases such as tree-structured Ising models (

*26*) and planar Ising models with zero magnetic fields (

*27*). Despite this difficulty, one may still try to maximize the log-likelihood using, for instance, Monte Carlo simulations, as done in the study by Ackley

*et al.*(

*12*) via the so-called learning for Boltzmann machines. In this method, one estimates all the magnetizations and pairwise correlation functions from samples and then maximizes the log-likelihood using a gradient ascent procedure over all couplings and magnetic fields. The Monte Carlo nature of the method makes it exponentially expensive in the number of runs required to achieve a predefined accuracy. Note, however, that this method is asymptotically exact as the number of samples goes to infinity, thus illustrating that “sufficient statistics”–based approaches that use only estimates of first moments and pair correlations of spins can achieve exact reconstruction albeit through computations with exponential complexity (

*28*).

Following the observation that the first and second moments are sufficient to reconstruct Ising models, a number of mean-field approximations have been suggested to circumvent the difficulty of an analytical evaluation of magnetizations and pair correlation functions [see the study by Roudi *et al*. (*14*) for a review]. The applicability of these methods is limited: They perform weakly on systems embedded in a low-dimensional space and in the spin glass regime, where fluctuations are important and cannot be neglected. Some of the limitations of these naïve mean-field methods (*13*) are addressed in more advanced mean-field methods: The small correlation expansion (*15*) considers corrections to the mean-field in the high-temperature regime; Nguyen and Berg (*16*) exploits clustering of samples in the configuration space according to their mutual overlaps; and the Bethe approximation (*17*) is based on the tree-like approximation of the interaction graph. Nevertheless, the applicability of these approximate techniques remains limited to Ising models pertaining to specific classes.

Although sufficient statistics consisting of the first and second moments of the data carry all the information needed for estimating the couplings, the computations required to extract this information are expensive and prohibitive for large systems (*28*). This leaves the use of higher-order moments of the spin statistics as the only way to improve computational complexity. Several heuristic algorithms that use higher-order moments have been proposed on the basis of statistical physics arguments. Among other approximate methods, let us mention the adaptive cluster expansion (*18*), which controls the accuracy of the approximation at a cost of a higher computational complexity involving computation of entropies of growing clusters, and the probabilistic flow method (*19*), which introduces a relaxation dynamics to a certain trial distribution. However, both schemes remain computationally expensive and thus not suitable for large systems and rely on fine tuning of auxiliary parameters. An alternative method, which uses the full information contained in the samples, has been suggested and rigorously analyzed in the study by Ravikumar *et al*. (*20*). Although it has been shown by Montanari and Pereira (*29*) that this estimator is unable to correctly reproduce the underlying graph of the original model at low temperatures, until lately with certain modifications it remained the state-of-the-art practical method (*21*, *22*, *30*). Partly anticipating on our results, we show later in this paper that this regularized pseudolikelihood estimator (RPLE) can be turned into an exact and universal method if completed with a rather natural, but key, ingredient: a post-inference thresholding of reconstructed couplings.

The problem of designing a universal learning algorithm with polynomial computational complexity (*28*) that achieves exact graph topology reconstruction for arbitrary Ising models in all regimes was resolved only recently in the studies by Bresler *et al.* (*23*, *24*). The biggest challenges addressed were the low-temperature regime and long-range correlations, which are known to be particularly difficult for learning. Nonetheless, the computational cost of these algorithms is still high and scales as a polynomial of high degree in the number of nodes (*23*) or double exponential in the maximum node degree *d*_{max} and in the maximum interaction strength (*24*). Moreover, both algorithms require prior information on the bounds on the interaction strengths, that is, positive α and β such that α ≤ |*J*_{ij}| ≤ β for all (*i*, *j*) ∈ *E*, as well as the knowledge of *d*_{max}.

In an attempt to determine the optimal number of samples needed for reconstructing the graph, information-theoretic bounds were derived by Santhanam and Wainwright (*31*). We emphasize three salient features of these bounds. First, the optimal number of samples *M*_{opt} for perfect graph recovery scales exponentially with the maximum interaction value and node degree, *M*_{opt} ∝ *e*^{cγ}, where γ = β*d*_{max} + *h*_{max}, and *h*_{max} denotes an upper bound on the absolute values of magnetic fields. Although it was shown that *c* ∈ [1, 4], the precise value of *c* remains unknown; here, we refer to this range of *c* as to the optimal regime with respect to the dependence of the number of samples on γ. Intuitively, this exponential scaling requirement can be attributed to the typical waiting time for collecting a sufficient number of “nontrivial” samples, that is, those that are different from the ground state configurations. This waiting time is more pronounced in the low-temperature regime when γ is large. Second, for finite *d*_{max}, the dependence on the number of variables *N* is very weak: *M*_{opt} ∝ ln *N*. This logarithmic dependence represents the amount of information needed for hypothesis testing over the set of *32*). Third, the number of required configurations grows as α decreases, because it is difficult to distinguish between the presence of a very weak coupling and its absence. In particular, in the limit of small α, *M*_{opt} ∝ 1/α^{2}.

In what follows, we discuss two exact methods for solving the inverse Ising problem. The first method is based on the RPLE of Ravikumar *et al*. (*20*) supplemented with a post-optimization parameter thresholding procedure. We prove that this ingredient makes the estimator exact, meaning that the algorithm can reconstruct an arbitrary Ising model with an appropriate number of samples. The second algorithm that we introduce is an exact estimator based on the interaction screening method. By setting up a framework for an empirical assessment of the performance of the algorithms guided by the information-theoretic arguments presented above, we show that our new estimator outperforms the pseudolikelihood-based algorithm and requires in all test cases a number of samples lying within the information-theoretically optimal regime.

## RESULTS

### Regularized pseudolikelihood estimator

A widely used approach aiming at achieving the optimal scalings was suggested in the study by Ravikumar *et al*. (*20*), where the estimation of model parameters is performed on the basis of the so-called pseudolikelihood acting as a surrogate for the intractable log-likelihood function. The method is based on maximizing a set of local RPLEs. Each of them can be interpreted as a regularized probability of a single spin *i* conditioned on the remaining *N* − 1 spins in the system given by**J**_{i} and *H*_{i} are the optimization parameters; and **J**_{i} is the shorthand notation for {*J*_{ij}}_{j≠i}. The sparsity promoting **J**_{i} ∥_{1} = ∑_{j≠i} |*J*_{ij}| is important because it discourages the minimizer *20*). Models known to satisfy these conditions are particular ferromagnetic models at high temperature, but this procedure provably fails in other regimes, most noticeably at low temperatures (*29*). A natural extension of this algorithm that uses a postestimation thresholding of a part of nonzero couplings was introduced in the study by Aurell and Ekeberg (*21*). In this scheme, all recovered *J*_{ij} satisfying |*J*_{ij}| < δ, where δ is a chosen threshold, are declared to be zero. However, the performance of the RPLE-based algorithm with thresholding has never been rigorously analyzed, and until now, it was believed that any RPLE scheme fails in the low-temperature regime, following theoretical indications (*29*) and experimental studies conducted in a framework that does not fully account for the sample complexity structure of the inverse Ising problem (*21*). The reason that previous numerical studies show failure of the RPLE with thresholding at low temperatures is most likely due to the hidden dependence of the required number of samples *M** on the strength of the couplings (inverse temperature β) in the original analysis (*20*), which resulted in tests of the reconstruction quality as a function of inverse temperature assuming a constant number of samples only. At the same time, it is clear that in the low-temperature regime, the Boltzmann probability measure concentrates on the ground state samples, that is, most of the samples in a typical batch would correspond to less-informative ground state configurations. Hence, an assessment of empirical performance should be based on a setting where the number of provided samples is exponentially increasing, in agreement with information-theoretic dependencies (*31*). We take this fact into account in the numerical experiments presented below.

In the Supplementary Text, we prove that there exists a minimum number of samples *M** for which the error on the estimated couplings is bounded by α/2, so that choosing δ = α/2 leads to a perfect reconstruction of the graph topology. Hence, our first result states that the RPLE with a post-evaluation thresholding is exact: In the worst case, the required number of samples scales at most as *M** ∝ exp (8γ)ln *N*/α^{2}, (see Supplementary Text for details). Note that the parameter estimation problem for each vertex is independent, and the optimization can be carried out separately for each spin. As we explain below, the symmetrized estimate of coupling associated with the edge (*i*, *j*) is obtained as an average of local estimates *22*).

### Interaction screening method

Recently, we introduced the first exact reconstruction algorithm having the same parametric dependence as the information-theoretic bound and termed the regularized interaction screening estimator (RISE) (*33*). Our theoretical analysis showed that the RISE has a lower theoretical sample complexity for perfect graph recovery compared to the one derived here for the RPLE with thresholding, guaranteeing that a number of samples *M** ∝ exp (6γ)ln *N*/α^{2} is sufficient for reconstruction of the graph structure [see the Supplementary Text and (*33*) for details]. However, factors 6 and 8 in the exponents of the RISE and the RPLE, respectively, are likely to be an artifact of the used proof techniques and are not tight as indicated by the computational experiments in this paper.

The RISE is based on the minimization of the interaction screening objective (ISO)*J*_{i} and the probe magnetic field *H*_{i} for a given spin *i*. The ISO, as its name suggests, is constructed on the basis of the property of “interaction screening,” which is illustrated in Fig. 1. As a consequence of this property, in the limit of a large number of samples, the unique minimizer of the convex ISO objective is achieved at *33*). Here, we introduce a modification to the RISE that leads to a new exact learning method for the inverse Ising problem, which we call the logRISE and which takes the following form

The name logRISE comes from the fact that instead of the ISO itself, we use its logarithm to form the logRISE objective (4). Obviously, in the absence of the regularizer (for λ = 0), taking the logarithm of the ISO does not change its minimizer. However, this difference is crucial for nonzero values of the regularization term, which suggests that logRISE might have good properties for the reconstruction problem due to a particular form of its first and second derivatives (see the Supplementary Text for additional explanations and details).

Unfortunately, the proof techniques used for deriving bounds on scaling for the RPLE and the RISE provide less tight expressions when applied to the estimator logRISE, because it no longer can be represented in a form of finite functional sum over individual samples. Our analysis states that the number of required samples for logRISE in the worst case scales as *M** ∝ exp (10γ)ln *N*/α^{2} for guaranteeing the reconstruction of the structure of the underlying Ising model with a high probability. Given the looseness of the theoretical analysis in this case, the empirical assessment of the performance of the logRISE and its comparison with the RPLE are required. We provide a detailed numerical study of the quality of different estimators below.

As we show through a rigorous analysis in the Supplementary Text, the *N*. However, the performance of the RPLE, the RISE, and the logRISE and hence the number of required samples *M** are dependent on the regularization coefficient λ. The choice of λ needs to account for the following tradeoff: If λ is too small, then the estimation is prone to noise, and if λ is too large, then it introduces a bias in the estimated couplings toward zero. The optimal value of λ is unknown a priori. In the Supplementary Text, we present detailed simulations for different topologies, which show that for achieving correct graph reconstruction with probability 1 − ε, the choice *c*_{λ} ≃ 0.2 for the RPLE, *c*_{λ} ≃ 0.4 for the RISE, and *c*_{λ} ≃ 0.8 for the logRISE. We use these values for λ in all numerical experiments reported below. Given a sufficient number of samples, other techniques such as consistency cross-validation can be used for selecting the optimal value of the regularization coefficient on a case-by-case basis. An illustration of this approach alongside some practical remarks is provided in the Supplementary Text.

### Learning structure and parameters of the model

We state our three-step algorithm for learning the underlying graph and the parameter values of the Ising model using the RPLE or the logRISE (the same algorithm applies to the RISE). First, given *M* samples, we find the optimizer of the objective (2) or (4), respectively, at each node *i* ∈ *V* and obtain a collection of estimated parameters *34*); however, as we comment in the Supplementary Text, better choices such as composite-type gradient descent methods exist for experiments with very large networks (*35*, *36*).

Given a sufficient number of samples *M*, a typical histogram of couplings estimated by the RPLE, the RISE, or the logRISE takes the form shown in Fig. 2A. Notice the emergence of gaps separating a group of inferred couplings that are close to zero from those with significantly bigger intensities in absolute value. In the second step, we threshold the inferred couplings below the observed gaps to zero. The edges associated with the remaining nonzero couplings form the reconstructed graph *N* = 25 nodes and spin glass couplings, where the scatter plot of predicted versus true values of the model parameters is presented, and only parameters over the already reconstructed graph from *M* = *M** = 5000 samples have been accounted for. We see that even using a small number of samples (in this example, the minimal amount required for a correct structure recovery), the numerical values of the parameters are also reconstructed with a very good accuracy that increases when more samples are provided.

To have statistical confidence in our results, we determine *M** as follows. Progressively increasing values of *M*, the reconstruction experiment runs *L* times, using *L* independent sets of *M* samples. On the basis of the number of successful topology reconstructions *L*_{succ}, one can define the empirical probability of reconstruction *P*_{emp} = *L*_{succ}/*L*. We define *M** as the minimum *M* for which *P*_{emp} = 1 (see Fig. 2C for a typical example). The value *L* that we use in our computations, *L* = 45, comes from the requirement of a perfect topology reconstruction with probability greater than 1 − ε, where we fix ε = 0.05. That is, it is essential to get *L* = 45 successful reconstructions in a row to make sure that the probability of correct topology recovery is above 0.95 with confidence of at least 90%, as we explain in the Supplementary Text. We use this value of *L* in the computations throughout the text.

We performed extensive numerical experiments to empirically obtain the minimal number of samples *M** required for perfect graph reconstruction for different topologies and types of interactions. We carried out numerical experiments for all of the three estimators considered in this paper. However, for the sake of simplicity and for the clarity of presentation, in what follows in the main text, we present numerical results only for the logRISE, which is the central object of the present study, and for the RPLE, which is the state-of-the-art method for the inverse Ising problem. Note that throughout the manuscript, we present comparisons of the logRISE with the exact and universal version of the RPLE, that is, corrected through our thresholding procedure. The corresponding scalings for the RISE are available in the Supplementary Text.

We first verify the logarithmic scaling of *M**, claimed in our theoretical analysis for RPLE and logRISE, with respect to the number of spins *N* in ferromagnetic Ising models without magnetic fields (*N*, and given that for spin glass models, this is a nontrivial task (*37*). For the two aforementioned topologies, we generate independent samples using Glauber dynamics for different values of *N* in the low-temperature regime, where the correlations are long-range: we have used *M** on both topologies is presented in Fig. 3. We see that *M** exhibits a logarithmic dependence on *N* for both estimators: the logRISE and the RPLE.

The major difference in performance between the estimators is observed in the scaling with respect to γ = β*d*_{max} + *h*_{max}. This is critical because a favorable exponent allows the algorithm to have a lower sample complexity in the low-temperature regime, where known algorithms either do not work or exhibit poor scaling. An extensive numerical study is presented in Fig. 4, where we study quasi-homogeneous systems with ferromagnetic-type couplings (Fig. 4A, B, and E) and spin glass–type couplings (Fig. 4C and D) on two topologies: square lattice with double-periodic boundary conditions (Fig. 4A, C, and E) and random regular graphs (Fig. 4B and D). This choice of topologies eliminates fluctuations with respect to the heterogeneity of node degrees, so that it becomes easier to extract the right scaling with respect to β and *d*. To disentangle the effects of α and β, we always fix one (for ferromagnets) or two (for spin glass systems) couplings to α and −α, which is different from the interaction values ±β carried by the rest of the edges. Therefore, β can be conveniently thought of as the inverse temperature of the model. To investigate the effect of temperature on the scalings, we deliberately set magnetic fields to zero and fix the thresholding parameter to δ = α/2. The test cases represented in Fig. 4 (A to D) show that, overall, the RPLE and the logRISE demonstrate similar scaling properties. Notice that there exists a qualitative difference in the scaling behavior between the low- and high-temperature regimes, with an exponential scaling for both estimators observed for large β. Our numerical study shows that from the learning perspective, the ferromagnetic model on the two-dimensional lattice appears to be the most challenging class of Ising models for both the logRISE and the RPLE. It has the highest scaling exponent with respect to γ and, hence, the largest sample complexity for the inverse Ising problem. This observation supports theoretical evidence that this case belongs to the hardest class of models with respect to learning (*38*). In particular, this finding shows that, paradoxically, the inverse Ising problem is significantly harder for planar ferromagnetic models compared to spin glass models on random graphs, whereas the direct problem of drawing independent samples from the former can be incomparably easier than from the latter (*37*).

The ultimately hardest case for the reconstruction problem is unknown. However, we were able to construct a slight variant of the ferromagnetic model on a lattice that appears to be even harder for all algorithms considered: a ferromagnetic model with a weak antiferromagnetic interaction, that is, an edge carrying a negative coupling −α. In Discussion, we present intuitive arguments why this case should be fundamentally hard. The results for the extraction of the *M** in this model instance are presented in Fig. 4E. We see that the logRISE has a markedly better scaling exponent compared to the RPLE. In this test case, the scaling exponent of the RPLE is significantly larger than the information-theoretic upper bound, whereas the corresponding value for the logRISE lies within the optimal regime in terms of the information-theoretic predictions. We summarize the scaling behavior of the estimators in Discussion.

### Application to a real system: D-Wave quantum computer

To evaluate the performance and robustness of the estimators in a nonsynthetic case, we apply the logRISE and the RPLE to real data produced by the D-Wave 2X quantum annealer “Ising” at Los Alamos National Laboratory. The D-Wave computer (*39*) has been designed for solving binary quadratic optimization problems in the form of Ising models through quantum annealing, that is, slowly transforming an initially prepared state of the system to the ground state of the desired input Ising Hamiltonian encoded on its chip. Because of the thermal noise in the system, a single annealing run may end in one of the excited states instead of the desired ground state. In practice, the device attempts to find the target ground state by rerunning the annealing multiple times and producing as output the best solution found. Previous experiments with D-Wave report that the produced samples are distributed according to the Boltzmann distribution at some effective temperature (*40*) related but not equal to the native temperature at which D-Wave operates. This effective temperature is naturally low because D-Wave contains superconducting elements as a part of its architecture. Because of the temperature rescaling effect, as well as inevitable biases present in this analog device, the effective Ising model from which the samples are produced does not exactly correspond to the input Ising model. It then becomes interesting to see how the structure of the distorted effective Ising model is related to the one encoded in the chip. This task is exactly what the methods presented in our paper are designed to solve, making it a good real-world application for testing their performance.

Let us describe the procedure that we followed for generating the data. Our goal was to check the performance of the algorithms on a noisy heterogeneous instance, both in node degrees and couplings as well as in magnetic fields. Hence, we encoded an Ising model with random couplings and magnetic fields, distributed uniformly in the range [−0.16, −0.02] ∪ [0.02, 0.16]. We also chose to encode these couplings in a region of the chip with the highest concentration of broken qubits that are inevitably present and can potentially create additional noise. The topology of this portion of the chip containing *N* = 62 qubits is illustrated in gray in Fig. 5A. We observed that the initial Ising model got distorted while being implemented on the chip. From several trial tests, we inferred that the effective rescaling factor in this regime roughly fluctuates around β_{eff} ≈ 12, although this factor is different for individual model parameters. Because the precise values of the couplings and magnetic fields actually implemented on the chip are unknown, the only “ground truth” available to us in this experiment is the topology of the portion of the chip that we encoded our model on. However, let us point out that because of the complexity of the D-Wave architecture and a possible conflict of superconducting loops representing couplers between qubits, it is a priori unclear whether the resulting topology of the effective Ising model will necessarily remain unchanged.

The maximum number of annealing runs for a given Ising model implementation is limited to 10^{4} by standard system settings on the D-Wave. We collected 5 × 10^{5} samples corresponding to the same input model specified above by obtaining 50 batches of 10^{4} samples each and provided them as an input to the logRISE and the RPLE. Notice that each additional implementation of the same chosen Ising model for each batch in principle corresponds to a different actual Ising Hamiltonian owing to a different concrete realization of random biases; this creates an additional source of noise in our data. The reconstructed model parameters are presented in Fig. 5 (A and B). We emphasize that it is difficult to disentangle the effects of statistical errors due to the finiteness of the number of samples and the errors due to noise.

For structure reconstruction, we chose to threshold the parameters *J*_{ij} in the tail of a set of couplings reconstructed in the vicinity of zero. Given this choice of threshold, we found that both algorithms are quite robust to noise and are able to accurately reconstruct the graph topology, making only a few false positives and false negatives. The reconstructed topologies are shown in the left of Fig. 5 (A and B). Notice that although the RPLE makes local errors, detecting one false-positive and one false-negative connections between neighboring spins, the logRISE misclassifies a nonexisting edge as existing in a clearly nonlocal fashion, meaning that the vertices it misclassifies as neighbors are far away in graph theoretic distance on the D-Wave chip. Although in the case of the logRISE it is possible to choose an optimal threshold that allows one to completely separate zero couplings from nonzero ones and, thus, reconstruct the structure of the chip perfectly, no such thresholding is possible for the result produced by the RPLE, suggesting that the RPLE needs more samples before this separation becomes possible. Finally, notice that according to the histograms on reconstructed magnetic fields in the right insets of Fig. 5 (A and B), the RPLE seems to make larger errors in the reconstruction of magnetic fields that should be of the same order as couplings according to our input Hamiltonian.

As we pointed out in the Introduction, a plethora of other methods have been proposed for the inverse Ising problem, but the majority of them either are too computationally expensive for practical applications or fail at low temperatures, sometimes even when an infinite number of samples are provided. To illustrate the value of exact algorithms, especially for problems at low temperatures (such as this application), we compare the results obtained from the logRISE and the RPLE to those from mean-field–type methods (see Fig. 5C). The particular scheme that we used for comparison is obtained from a high-temperature expansion of our estimators and is closely related to the naïve mean-field method of statistical physics, which performs well at high temperatures. See Materials and Methods for a detailed description of the method and related discussions. As expected for such systems with strong and long-range correlations, this method using only information contained in magnetization and pairwise correlations behaves poorly, incurring a very large number of false positives and false negatives. This illustrates an importance of taking into account higher-order interaction in data samples for a reliable reconstruction in the low-temperature regime.

## DISCUSSION

In Fig. 6, we comparatively present the algorithmic scalings that summarize the main theoretical and empirical results of our paper. All of the three considered estimators for the inverse Ising problem have a better worst-case empirical scaling compared to their theoretical estimates. The empirical sample complexity of the logRISE algorithm introduced in this paper lies in the optimal regime with respect to the information-theoretic predictions, outperforming all existing methods. The worst-case scalings are based on the hardest case for the learning problem that we were able to construct. To describe the logic behind this case, we first mention observations in existing literature and then provide intuitive arguments regarding the way the structure of the underlying graph and the nature of interactions affect the hardness of the reconstruction. There are strong theoretical indications that ferromagnetic-type spin systems are among the models requiring a maximum number of samples to be learned. Information-theoretic bounds suggest that these models are at least as hard to learn as any other model (*38*). Moreover, the presence of strong long-range correlations is known to be a challenging situation to deal with (*24*). This hardness of learning of ferromagnetic models is consistent with our numerical studies in which ferromagnetic random graphs and especially ferromagnetic lattices are the cases requiring the largest amount of samples. Intuitive explanations for this behavior are twofold. As mentioned earlier, ferromagnetic models are more prompt to develop strong long-range correlations at low temperatures, especially on lattices, and they tend to favor two configurations that are the ground states. Long-range correlations make it less likely to obtain nontrivial samples, that is, fluctuations around ground states that are crucial to obtain information about the detailed structure of the graph that is crucial for the reconstruction. This translates into a need for a larger number of samples, proportional to the likeliness of such fluctuations that is typically exponentially suppressed in γ. Moreover, when several similar models share identical ground states, it becomes very hard to make a distinction between them solely based on configurations close to their ground states. This mechanism can be illustrated very simply using an extreme example of three spins with homogeneous couplings forming a chain that is either open or closed, forming a triangle. Deciding which chain is formed is impossible for a ferromagnetic system when only the ground states ±(1, 1, 1) are observed. However, it is an easy task for an antiferromagnetic system because an open chain has two ground states ±(1, −1, 1), whereas the close chain has six ground states ±(1, 1, −1), ±(1, −1, 1), and ±(−1, 1, 1).

The hardest test case studied in our numerical experiments contains an extra ingredient that makes the inverse Ising problem even more challenging: an additional weak negative coupling or an “antiferromagnetic impurity” added on top of the ferromagnetic model on a lattice. This weak antiferromagnetic bond has the effect of weakening or cancelling the correlation between the two spins that it connects. Consequently, it becomes difficult to distinguish between the presence of this weak negative coupling from its absence. Although we do not claim with certainty that this model is the hardest to learn, we believe that any such difficult-to-learn model is likely to include the features outlined above.

We proved that the three techniques explored in this paper, the logRISE, the RISE, and the RPLE, are exact and universal methods to solve the inverse Ising problem. Exactness and universality in this context mean that these methods reconstruct couplings and magnetic fields up to any given accuracy with a sufficient but finite number of samples and for every Ising model regardless of its structure, density, temperature, or any other property that characterizes it. Although in the present article we focused on the quantification of the scaling of the number of required samples with structural properties and temperature of sparse systems, it remains an interesting question left for exploration in dense models, for instance, in the Curie-Weiss or the Sherrington-Kirkpatrick type (*37*). In these models, the exponential scaling with coupling intensities and degrees, denoted by γ for sparse models, will be more intricate. It seems reasonable to expect that the sample requirement scales exponentially with the typical “energy per spin.” For instance, in the Curie-Weiss–type models with all *J*_{ij} ≥ 0, this quantity is *J*_{ij} are centered random variables, it reads *31*); instead, we expect it to exhibit a polynomial dependence. Note that, in this case, the inclusion of the

In conclusion, in this paper, we showed both theoretically and experimentally that an arbitrary Ising model can be reconstructed exactly with an information-theoretically minimum number of samples using the introduced interaction screening method. In addition, no prior knowledge on the graph and associated parameters is required to implement the algorithm, making it a very practical choice for applications. The practical advantages of our methods have been illustrated on real data coming from a D-Wave quantum computer. We also provided a sample complexity analysis of the popular RPLE showing the logarithmic scaling in system size for arbitrary Ising models, albeit with a higher worse-case scaling with respect to the inverse temperature when compared to the logRISE. We demonstrated the paradoxical relation between sampling and learning, showing that the instances that are easier for one task are harder for the other. In Materials and Methods, we point out a curious connection to the mean-field approximation at high temperatures. The second-order high-temperature expansion of all exact estimators considered in this paper provides an identical reconstruction scheme, valid in the limit of weak couplings. This high-temperature regime is related to learning methods based on the well-known naïve mean-field approximation in statistical physics. Finally, although this paper is dedicated to the reconstruction of Ising models, the interaction screening method can be generalized to graphical models with higher-order interactions and nonbinary alphabets, including those described by Hamiltonians over continuous variables. Exploration of these research directions is underway.

## MATERIALS AND METHODS

### Interaction screening property

Here, we presented a simple argument that illustrated the fact that in the limit of a large number of samples, the unique minimizer of the convex ISO objective (Eq. 3) was achieved at *S*(*J*_{i}, *H*_{i}) → *S**(*J*_{i}, *H*_{i}) = 〈1/F_{i}(*J*_{i}, *H*_{i})〉. The derivative of the ISO corresponded to weighted pairwise correlations, ∂*S**/∂*J*_{ij} = 〈σ_{i}σ_{j}/F_{i}(*J*_{i}, *H*_{i})〉, and this shed light on its key property. When *M* → ∞.

### High-temperature expansion of exact estimators and connection to the mean-field method

Among all heuristics undertaken to solve the inverse Ising problem, a large fraction of methods is based on mean-field approximations using various level of sophistication [see the study by Roudi *et al*. (*14*) for a review]. In particular, the first such attempt to solve the inverse Ising problem was based on a naïve mean-field approach, where inferred couplings are related to the inverse reduced correlation matrix (*13*). Although these techniques provided satisfactory estimates in the high-temperature regime, they were known to exhibit poor behaviors at low temperatures when the model developed long-range correlations, even for an infinite number of samples (*29*).

It is interesting to observe that there exists a connection between mean-field approaches and the high-temperature expansion of the exact estimators RISE and RPLE. A second-order Taylor expansion of the pseudolikelihood objective function (without regularizer) and the ISO around the high-temperature point (*J*_{i}, *H*_{i}) = (**0**, 0) produced an explicitly solvable minimization problem (see Supplementary Text for an exact derivation). It is remarkable that, in this regime, both objective functions produced identical estimates for the model parameters. Couplings and magnetic fields reconstructed in this mean-field regime (MFR) were expressed as functions of the inverse connected correlation matrix and local magnetizations*m*_{i} = 〈σ_{i}〉_{M}. Note that there was a subtle difference between the MFR estimates in Eq. (5) and the naïve mean-field estimates in (*13*). The values produced by the naïve mean-field method were directly equal to the inverse connected correlation matrix, whereas the MFR estimates were rescaled by the diagonal entries of this matrix. As a result, although both estimators provided similar answers, the choice of symmetrization and thresholding procedures for the reconstructed parameters can lead to significant discrepancies in the final estimates of the graph structure. It is worth noticing that the exact same expression producing the MFR estimates arose in the context of reconstructing multivariate Gaussian distributions (*41*). This parallel suggests that an optimal thresholding and symmetrization procedure for the MFR estimates was likely to be based on the geometrical mean rather than the arithmetic average.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/3/e1700791/DC1

Supplementary Text

section S1. Analysis of the estimators RPLE, RISE, and logRISE

section S2. On optimization techniques for minimizing the estimators

section S3. On the procedure for *M** selection

section S4. On the theoretical predictions for λ selection

section S5. Empirical selection of the regularization parameter λ

section S6. Hyperparameter λ selection through cross-validation

section S7. Scalings of the RISE with respect to γ

section S8. High-temperature expansion of the RISE and the RPLE

fig. S1. Schematic representation of the proof strategy for bounding the reconstruction error.

fig. S2. Dependence of the number of samples *M** required for a correct structure recovery on the regularization coefficient.

fig. S3. Selection of the hyperparameter λ through the *K*-fold cross-validation method on a 4 × 4 spin glass system on a square lattice.

fig. S4. Values of *M** and γ exponents for the RISE across different test cases.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:**The authors are grateful to G. Bresler, C. Coffrin, A. Montanari, N. Uvarov, and M. Zamparo for fruitful discussions and valuable comments.

**Funding:**The work at Los Alamos National Laboratory (LANL) was carried out under the auspices of the National Nuclear Security Administration of the U.S. Department of Energy under contract no. DE-AC52-06NA25396.

**Author contributions:**All authors designed the research, developed the theory, performed the experiments, and wrote the manuscript.

**Competing interests:**The authors declare that they have no competing interests.

**Data and materials availability:**All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. The code implementing all exact estimators presented in this paper as well as real data produced by the D-Wave quantum computer “Ising” at LANL and used in this work can be found at https://github.com/lanl-ansi/inverse_ising.

- Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).