## Abstract

Quickly obtaining optimal solutions of combinatorial optimization problems has tremendous value but is extremely difficult. Thus, various kinds of machines specially designed for combinatorial optimization have recently been proposed and developed. Toward the realization of higher-performance machines, here, we propose an algorithm based on classical mechanics, which is obtained by modifying a previously proposed algorithm called simulated bifurcation. Our proposed algorithm allows us to achieve not only high speed by parallel computing but also high solution accuracy for problems with up to one million binary variables. Benchmarking shows that our machine based on the algorithm achieves high performance compared to recently developed machines, including a quantum annealer using a superconducting circuit, a coherent Ising machine using a laser, and digital processors based on various algorithms. Thus, high-performance combinatorial optimization is realized by massively parallel implementations of the proposed algorithm based on classical mechanics.

## INTRODUCTION

Combinatorial optimization problems appear in various social and industrial situations, so quickly solving such problems makes society and industry more efficient. However, these problems are notoriously hard because of combinatorial explosion, an exponential increase in the number of candidate solutions with the problem size (*1*). Thus, novel computational approaches to combinatorial optimization have been sought. A well-known example is a quantum annealer (QA), which is based on quantum annealing (*2*–*4*) and its superconducting circuit implementation (*5*, *6*). The QA is an Ising machine designed to find ground states of Ising spin models (*7*). Such Ising machines are believed to be broadly useful, because the Ising problem belongs to the nondeterministic polynomial time (NP)–complete class (*7*) and, consequently, many combinatorial optimization problems can be reduced to the Ising problem (*8*). Various Ising machines other than QAs have been developed, including coherent Ising machines (CIMs) implemented with pulse lasers (*9*–*14*) and other kinds of optical Ising machines (*15*–*18*), as well as digital processors based on simulated annealing (SA) (*19*–*23*), which is a standard heuristic algorithm for combinatorial optimization (*1*, *24*, *25*), or on other recently proposed algorithms (*26*–*29*).

A heuristic algorithm called simulated bifurcation (SB) has recently been proposed for accelerating combinatorial optimization (*30*). SB is a quantum-inspired algorithm: it was derived from a classical-mechanical model corresponding to a quantum computer called a quantum bifurcation machine (QbM) (*30*–*33*), which is based on quantum adiabatic optimization using nonlinear oscillators exhibiting quantum-mechanical bifurcation phenomena. Consequently, SB is based on numerical simulation of adiabatic evolutions in classical nonlinear Hamiltonian systems exhibiting bifurcations (*34*). Different dynamical approaches have also recently been proposed (*35*–*41*). Unlike other dynamical approaches such as the Hopfield-Tank model (*42*), simulated CIM (SimCIM) (*35*, *39*), and their variants, SB is based not on the gradient method but on adiabatic evolutions of energy-conserving systems, like purely adiabatic QA and QbM. [This interesting contrast between QbM and CIM has been summarized in a review paper on this topic (*33*).]

Unlike SA in general cases, SB allows simultaneous updating of variables and therefore can easily accelerate combinatorial optimization through massively parallel processing using modern many-core processors such as field-programmable gate arrays (FPGAs) (*30*, *43*, *44*) and graphics processing units (GPUs) (*30*). An SB-based machine (SBM) implemented with a single FPGA, where about 8000 operations are performed in parallel, was able to find good approximate solutions of a 2000-spin Ising problem in 0.5 ms, about 10 times faster than a CIM (*30*). This result suggests that parallelizability is a key property of optimization algorithms for their acceleration by fully exploiting modern high-performance computing systems. In this direction of research, other parallelizable algorithms have also recently been proposed by mapping a given problem to a bipartite one and applying parallel SA updating to each group of spins (*26*–*28*). These new algorithms essentially rely on the same mechanism, although they are given different names: momentum annealing (MA) (*26*), stochastic cellular automata annealing (SCA) (*27*), and restricted Boltzmann machine (RBM)’s parallel stochastic sampling (*28*).

The previous results on SB demonstrate that SB is useful for quickly finding good approximate solutions. However, it remains unclear whether SB can find optimal solutions of large-scale problems. For enhancing the power of SB in terms of solution accuracy, in this work, we introduce two SB variants, named ballistic SB (bSB) and discrete SB (dSB), in addition to the original adiabatic SB (aSB). We solve various problems to compare the performance of bSB and dSB with that of aSB and other recently developed machines, including a QA, a CIM, and digital processors based on various algorithms. This benchmarking shows that bSB and dSB provide faster and more accurate optimizations than does aSB and that our new SBMs achieve high performance compared to the other machines. dSB can find optimal or near-optimal solutions of problems with up to one million spins, which aSB and bSB cannot achieve. Thus, high-performance machines for combinatorial optimization are realized by massively parallel implementations of the proposed algorithms based on classical mechanics.

## RESULTS

### bSB and dSB algorithms

The Ising problem is to find a spin configuration minimizing the Ising energy, defined as

$$E_{\mathrm{Ising}} = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} J_{i,j}\, s_i s_j - \sum_{i=1}^{N} h_i s_i \tag{1}$$

where *s*_{i} denotes the *i*th spin taking 1 or −1, *N* is the number of spins, *J*_{i,j} is the coupling coefficient between the *i*th and *j*th spins (*J*_{i,j} = *J*_{j,i} and *J*_{i,i} = 0), and *h*_{i} is the local field on the *i*th spin. Since introducing an ancillary spin reduces the Ising problem to one without local fields (see section S1), here we focus on the Ising problem with no local fields (*h*_{i} = 0).
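The ancillary-spin reduction mentioned above (detailed in section S1) can be sketched as follows; this is our reconstruction of the standard argument:

```latex
% Ising energy with local fields:
E(\mathbf{s}) \;=\; -\tfrac{1}{2}\sum_{i,j} J_{i,j}\, s_i s_j \;-\; \sum_i h_i s_i .
% Introduce an ancillary spin s_{N+1} and absorb the local fields into
% extended couplings \tilde{J}_{i,N+1} = \tilde{J}_{N+1,i} = h_i :
\tilde{E}(\mathbf{s}, s_{N+1})
  \;=\; -\tfrac{1}{2}\sum_{i,j} J_{i,j}\, s_i s_j \;-\; s_{N+1}\sum_i h_i s_i .
% \tilde{E} has no local fields and is invariant under a global spin flip,
% so a ground state with s_{N+1} = 1 always exists, where \tilde{E} = E;
% if a solver returns s_{N+1} = -1, the flip s_i -> s_i * s_{N+1}
% recovers a ground state of the original problem.
```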

To solve the Ising problem, QbM uses quantum-mechanical nonlinear oscillators called Kerr-nonlinear parametric oscillators (KPOs), each of which can generate a Schrödinger cat state, i.e., a quantum superposition of two oscillating states, via a quantum-mechanical bifurcation (*31*). Such a KPO has recently been realized experimentally using superconducting circuits (*45*, *46*). However, the realization of a large-scale QbM will require a long time. On the other hand, it was found that a classical-mechanical model corresponding to QbM, referred to as a classical bifurcation machine (CbM), also works as an Ising machine (*31*, *33*). In this case, we can efficiently simulate such a classical machine using present digital computers, instead of building real machines. [This is not the case for QbM, because QbM can also be used as a universal quantum computer (*47*, *48*), which is classically unsimulatable.] This simulation approach paves the way for large-scale Ising machines with all-to-all connectivity.

By modifying the equations of motion for CbM such that computational costs are reduced and the symplectic Euler method (*49*) is applicable, we obtain the following Hamiltonian equations of motion for aSB (*30*)

$$\dot{x}_i = \frac{\partial H_{\mathrm{aSB}}}{\partial y_i} = a_0 y_i \tag{2}$$

$$\dot{y}_i = -\frac{\partial H_{\mathrm{aSB}}}{\partial x_i} = -\left[x_i^2 + a_0 - a(t)\right] x_i + c_0 \sum_{j=1}^{N} J_{i,j}\, x_j \tag{3}$$

$$H_{\mathrm{aSB}} = \sum_{i=1}^{N} \frac{a_0}{2}\, y_i^2 + V_{\mathrm{aSB}} \tag{4}$$

$$V_{\mathrm{aSB}} = \sum_{i=1}^{N}\left[\frac{x_i^4}{4} + \frac{a_0 - a(t)}{2}\, x_i^2\right] - \frac{c_0}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} J_{i,j}\, x_i x_j \tag{5}$$

where *x*_{i} and *y*_{i} are, respectively, the position and momentum of a particle corresponding to the *i*th spin, dots denote time derivatives, *a*(*t*) is a control parameter increased from zero, *a*_{0} and *c*_{0} are positive constants, and *V*_{aSB} is the potential energy in aSB.

To qualitatively explain the operation principle of aSB, we show an example of the dynamics in aSB in Fig. 1 (A and B), where the ferromagnetic two-spin Ising problem (*J*_{1,2} = *J*_{2,1} = 1) with solutions *s*_{1} = *s*_{2} = ±1 is solved as the simplest problem. All positions and momenta are randomly set around zero at the initial time. The initial potential has a single local minimum at the origin (top and middle of Fig. 1B), so the particles circulate around the origin (Fig. 1A and middle of Fig. 1B). When *a*(*t*) becomes sufficiently large, bifurcations occur, that is, multiple local minima of the potential appear. Then, the particles adiabatically follow one of the minima. Consequently, each *x*_{i} bifurcates to a positive or negative value (Fig. 1A and bottom of Fig. 1B). Since the two local minima corresponding to the two solutions have lower energies and appear earlier than the other two local minima, the particles successfully find one of the solutions. Last, the sign of *x*_{i}, sgn(*x*_{i}), gives the *i*th spin, *s*_{i}, for the solution of the Ising problem. It has been empirically found that aSB works well for much larger-scale and more complex problems (*30*).
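The dynamics just described can be condensed into a short NumPy sketch (our own illustration, not the authors' FPGA/GPU implementation; the function name, step count, and initialization scale are ours, while the linear *a*(*t*) schedule and the *c*_{0} setting follow Methods):

```python
import numpy as np

def asb(J, n_steps=1000, dt=0.5, a0=1.0, seed=0):
    """Sketch of adiabatic simulated bifurcation (aSB): symplectic Euler
    integration of dx/dt = a0*y, dy/dt = -(x^2 + a0 - a(t))*x + c0*(J @ x),
    with a(t) increased linearly from 0 to a0."""
    N = J.shape[0]
    rng = np.random.default_rng(seed)
    x = 0.1 * (rng.random(N) - 0.5)   # positions set randomly around zero
    y = 0.1 * (rng.random(N) - 0.5)   # momenta set randomly around zero
    c0 = 0.5 / (np.sqrt((J ** 2).sum() / (N * (N - 1))) * np.sqrt(N))
    for k in range(n_steps):
        a = a0 * k / n_steps          # linear schedule for a(t)
        y += (-(x ** 2 + a0 - a) * x + c0 * (J @ x)) * dt  # momentum update
        x += a0 * y * dt                                   # position update
    return np.sign(x).astype(int)     # read out spins s_i = sgn(x_i)
```

For the ferromagnetic two-spin problem of Fig. 1 (A and B), the symmetric mode bifurcates first, so both spins end up aligned.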

This aSB relies on the fact that the second term in *V*_{aSB} is approximately proportional to the Ising energy at the final time (*30*). In this approximation, analog errors arise from the use of continuous variables (positions) instead of discrete variables (spins). These analog errors in aSB may degrade solution accuracy and result in approximate solutions. Such analog errors in different dynamical approaches have also been discussed (*38*, *40*).

To suppress analog errors, we introduce perfectly inelastic walls at *x*_{i} = ±1. That is, at each time, we replace *x*_{i} with its sign, sgn(*x*_{i}) = ±1, and set *y*_{i} = 0 if ∣*x*_{i}∣ > 1. These walls force positions to be exactly 1 or −1 when *a*(*t*) becomes sufficiently large. Moreover, we drop the fourth-order terms in *V*_{aSB}, because the inelastic walls play a role similar to that of the nonlinear potential walls. We thus obtain the following equations

$$\dot{x}_i = a_0 y_i \tag{6}$$

$$\dot{y}_i = -\left[a_0 - a(t)\right] x_i + c_0 \sum_{j=1}^{N} J_{i,j}\, x_j \tag{7}$$

$$V_{\mathrm{bSB}} = \sum_{i=1}^{N} \frac{a_0 - a(t)}{2}\, x_i^2 - \frac{c_0}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} J_{i,j}\, x_i x_j \tag{8}$$

We numerically solve these equations using the symplectic Euler method with time step Δ_{t}, together with the updating for the inelastic walls (see Methods for a detailed algorithm). (If we solve these equations by the standard Euler method instead of the symplectic Euler method, solution accuracy becomes lower; see section S2.)

A similar modification to the above walls has been proposed for SimCIM (*39*), but that algorithm is based on the gradient method, like the Hopfield-Tank model (*42*), and also uses stochastic processes. In contrast, bSB is based on a classical-mechanical system conserving energy (except for the inelastic walls and adiabatic changes of energy) and uses deterministic processes except for initial value setting. As a result, the performance of bSB is quite different from that of SimCIM (see section S3 for a comparison between bSB and SimCIM).

In bSB, it is sufficient to increase *a*(*t*) to *a*_{0}. Then, the final potential has only the second term, related to the Ising energy. Consequently, the following condition is satisfied for all *i* at the final time (see section S4)

$$\Delta E_i = 2 s_i \sum_{j=1}^{N} J_{i,j}\, s_j \ge 0 \tag{10}$$

where *s*_{i} = sgn(*x*_{i}) is the sign of *x*_{i} and Δ*E*_{i} represents the change in the Ising energy for a flip of *s*_{i}. Note that Eq. 10 is a sufficient condition to show that the spin configuration is a local minimum of the Ising problem. Hence, solutions obtained by bSB are at least local minima of the Ising problem. In contrast, this is not necessarily guaranteed in aSB because of its nonlinear potential terms. (This means that solutions obtained by aSB can sometimes be improved by a naïve local search based on sequential spin flips.) This is another reason why bSB should achieve higher accuracy than aSB. Throughout this work, we linearly increase *a*(*t*) from 0 to *a*_{0} and set *a*_{0} to 1.

Here, we show an example of the bSB dynamics using the same two-spin problem as above. The initial potential has a single local minimum at the origin (top of Fig. 1D), and particles circulate around the origin (Fig. 1C and middle of Fig. 1D), as in aSB. In bSB, however, stable points suddenly jump from the origin to the walls at *x*_{i} = ±1, which prevents adiabatic evolution. Instead, particles move toward the walls in a ballistic manner (Fig. 1C and bottom of Fig. 1D). This ballistic (nonadiabatic) behavior in bSB leads to fast convergence to a local minimum of *V*_{bSB} and, consequently, to fast optimization.
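A corresponding sketch of bSB, again our own illustration under the conventions of Methods (Eqs. 6 and 7 plus the perfectly inelastic walls; names and step counts are ours):

```python
import numpy as np

def bsb(J, n_steps=1000, dt=1.0, a0=1.0, seed=0):
    """Sketch of ballistic simulated bifurcation (bSB): symplectic Euler on
    dx/dt = a0*y, dy/dt = -(a0 - a(t))*x + c0*(J @ x), with perfectly
    inelastic walls at x = +-1."""
    N = J.shape[0]
    rng = np.random.default_rng(seed)
    x = 0.1 * (rng.random(N) - 0.5)
    y = 0.1 * (rng.random(N) - 0.5)
    c0 = 0.5 / (np.sqrt((J ** 2).sum() / (N * (N - 1))) * np.sqrt(N))
    for k in range(n_steps):
        a = a0 * k / n_steps
        y += (-(a0 - a) * x + c0 * (J @ x)) * dt
        x += a0 * y * dt
        wall = np.abs(x) > 1.0        # inelastic walls: clamp x, kill momentum
        x[wall] = np.sign(x[wall])
        y[wall] = 0.0
    return np.sign(x).astype(int)
```

The only changes relative to the aSB equations are the dropped quartic term and the wall update inside the loop.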

For further improvement, we introduce another variant of SB, dSB, by discretizing *x*_{j} to sgn(*x*_{j}) in the second term of Eq. 7

$$\dot{x}_i = a_0 y_i \tag{11}$$

$$\dot{y}_i = -\left[a_0 - a(t)\right] x_i + c_0 \sum_{j=1}^{N} J_{i,j}\, \mathrm{sgn}(x_j) \tag{12}$$

Note that the singularity on the boundaries between positive and negative regions has been intentionally neglected. This leads to a violation of conservation of energy across the boundaries and, hence, to escape from local minima over potential barriers, as shown below. In this sense, dSB goes beyond naïve algorithms based on classical-mechanical systems conserving energy (except for adiabatic changes), such as aSB and bSB. We also increase *a*(*t*) to *a*_{0} for convergence to a local minimum of the Ising problem at the final time, as in bSB (see section S4).

Figure 1 (E and F) shows an example of the dSB dynamics using the same two-spin problem as above. Unlike aSB and bSB, the particles go back and forth between two local minima through the potential barriers (Fig. 1E and middle of Fig. 1F). This is similar to quantum tunneling, as depicted by the inset in Fig. 1I. This tunneling-like behavior is possible due to the above-mentioned neglect of the singularity on the boundaries; otherwise, the potential walls on the boundaries prevent this tunneling. In contrast, conservation of energy prevents such tunneling in aSB and bSB, as suggested by the insets in Fig. 1 (G and H). Thus, it is expected that this tunneling-like behavior will help dSB to escape local minima of the potential, and hence, dSB will outperform aSB and bSB in terms of solution accuracy.
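For completeness, a sketch of dSB under the same conventions (our illustration): relative to the bSB equations, only the coupling term changes, from *x*_{j} to sgn(*x*_{j}) as in Eq. 12.

```python
import numpy as np

def dsb(J, n_steps=1000, dt=1.0, a0=1.0, seed=0):
    """Sketch of discrete simulated bifurcation (dSB): identical to bSB
    except that the coupling term uses sgn(x_j) instead of x_j (Eq. 12),
    which breaks energy conservation at sign flips and enables the
    tunneling-like escapes from local minima described in the text."""
    N = J.shape[0]
    rng = np.random.default_rng(seed)
    x = 0.1 * (rng.random(N) - 0.5)
    y = 0.1 * (rng.random(N) - 0.5)
    c0 = 0.5 / (np.sqrt((J ** 2).sum() / (N * (N - 1))) * np.sqrt(N))
    for k in range(n_steps):
        a = a0 * k / n_steps
        y += (-(a0 - a) * x + c0 * (J @ np.sign(x))) * dt  # discretized coupling
        x += a0 * y * dt
        wall = np.abs(x) > 1.0
        x[wall] = np.sign(x[wall])
        y[wall] = 0.0
    return np.sign(x).astype(int)
```

Because dSB keeps hopping between minima while *a*(*t*) is small, intermediate configurations fluctuate; only the final configuration is read out.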

Note that both bSB and dSB maintain the advantage of aSB over SA, namely, high parallelizability. Therefore, they are expected to realize both high speed and high accuracy simultaneously.

### Performance for a 2000-spin Ising problem with all-to-all connectivity

To compare the performance of bSB and dSB with that of aSB, we solved a 2000-spin Ising problem with all-to-all connectivity. This problem was named K_{2000} and previously solved by aSB (*30*), a CIM (*11*), and a recently developed digital chip called STATICA (*27*), which is based on the above-mentioned SCA. This problem can be regarded as a 2000-node MAX-CUT problem (*11*, *30*); so, here, we evaluate performance using cut values (see section S6 for the definition of the cut value and the relation between MAX-CUT and the Ising problem). The best cut value for K_{2000} is estimated to be 33,337 (see section S7).
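For reference, the cut value and its relation to the Ising energy can be sketched as follows (section S6 is not reproduced here, so this sketch assumes the common convention that edge weights *w*_{ij} map to couplings *J*_{i,j} = −*w*_{ij}; helper names are ours):

```python
import numpy as np

def cut_value(w, s):
    """Cut value of the partition encoded by s (entries +1/-1) for a
    symmetric weight matrix w with zero diagonal:
    cut(s) = (1/2) * sum_{i<j} w_ij * (1 - s_i * s_j)."""
    return 0.25 * (w.sum() - s @ w @ s)

def ising_energy(J, s):
    """Ising energy of Eq. 1 with no local fields."""
    return -0.5 * s @ J @ s
```

With *J* = −*w*, the two quantities are related by cut(*s*) = (*W* − *E*_{Ising})/2, where *W* is the total edge weight, so minimizing the Ising energy maximizes the cut.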

The lines and symbols in Fig. 2A show average and maximum cut values, respectively, for 1000 trials as functions of the number of time steps, *N*_{step}. (Throughout the paper, *N*_{step} denotes the total number of time steps for each trial, and the values of cost functions (cut values or Ising energies) plotted as functions of *N*_{step} are final values, not intermediate values, of each trial.) The results clearly show that both bSB and dSB outperform aSB in terms of both speed and accuracy. In addition, only dSB obtained the best value. On the other hand, the best values obtained by bSB and aSB become lower for larger *N*_{step}. This result suggests that the best values may be obtained accidentally by nonadiabatic processes in bSB and aSB. For large *N*_{step}, the dynamics becomes more adiabatic, and the chance to obtain better solutions by nonadiabatic processes may be lost.

We implemented 2048-spin-size bSB and dSB machines (bSBM and dSBM) using single FPGAs (see section S8 for details) and used them to solve K_{2000}. Figure 2B shows the comparison between our machines and the above three other machines (*11*, *27*, *30*), where the lines and the crosses show the average values over 100 trials for our machines and the others, respectively, and the bars show the maximum and minimum values among the 100 trials. (The bars for our machines are shown only at typical computation times.) Only our dSBM obtained the best value in a short time (2 ms), thereby realizing both high speed and high accuracy simultaneously. Also, our bSBM is remarkably fast, about three times faster than STATICA (*27*), the previously fastest machine for K_{2000}. Note that the results of STATICA for K_{2000} are values predicted by a simulator, and the real STATICA chip is still of 512-spin size (*27*). In this work, on the other hand, we have implemented faster real machines.

### Benchmarking using time-to-solution and time-to-target

To evaluate the computation speed more quantitatively, here, we introduce two metrics: time-to-solution (TTS) and time-to-target (TTT). TTS is a standard metric for evaluating Ising machine speeds (*14*, *23*, *28*, *29*), defined as the computation time for finding an optimal or best known value with 99% probability. TTT uses a target value, instead of an optimal value, as a good approximate solution. In this work, we define the target as 99% of the optimal or best known value. TTS and TTT are formulated as *T*_{com} log (1 − 0.99)/ log (1 − *P*_{S}) (*14*, *23*), where *T*_{com} is the computation time per trial and *P*_{S} is the success probability for finding the optimal (TTS) or target (TTT) value. *P*_{S} is estimated from experimental results with many trials. When *P*_{S} > 0.99, TTS and TTT are defined as *T*_{com}.
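The formula above translates directly into code; a small helper (name ours), assuming 0 < *P*_{S} ≤ 1:

```python
import math

def time_to_solution(t_com, p_s, target_prob=0.99):
    """TTS (or TTT): computation time needed to reach the optimal (or
    target) value with 99% probability, given the computation time per
    trial t_com and the per-trial success probability p_s."""
    if p_s > target_prob:
        return t_com              # convention used in the text
    return t_com * math.log(1 - target_prob) / math.log(1 - p_s)
```

For example, with *T*_{com} = 2 ms and *P*_{S} = 0.5, about 6.6 trials are needed on average, giving TTS ≈ 13.3 ms.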

In the following, we compare TTS and TTT of our 2048-spin-size bSBM and dSBM with those of other recently developed machines shown in Fig. 3A. Since the bSBM can quickly find good approximate solutions and the dSBM can find optimal solutions of large-scale problems, we use the bSBM and dSBM for evaluations of TTT and TTS, respectively. TTS and TTT of other machines are cited or estimated from the data in the literature (see section S9 for details), because we could not use such machines for the present work. This limits the range of instances that can be used for this benchmarking. Also note that some machines are not the latest ones, as mentioned below.

Figure 3B shows the results of TTT. For K_{2000}, the TTT of our bSBM (0.26 ms) is much shorter than those of STATICA (*27*) (1.50 ms, a predicted value by a simulator) and the CIM (*11*) (1.1 s). As a 2000-spin-size instance with sparse connectivity, we also solved G22, which is one of the well-known MAX-CUT benchmark instances called G-set and was solved by the CIM (*11*). For G22, the TTT of our bSBM is two orders of magnitude shorter than that of the CIM. These results demonstrate that our bSBM can find good approximate solutions faster than other recently developed machines of the same spin size. (TTTs of our machines for other G-set instances are provided in table S2.)

Next, we show the results of TTS in Fig. 3C. We start with the same two instances, namely, K_{2000} and G22. The TTSs of our dSBM for them are 1.3 and 2.7 s, respectively. While a TTS for K_{2000} has not been reported so far, the TTS for G22 was evaluated with a SimCIM implemented on an FPGA (*29*), which is based on a recently proposed algorithm called chaotic amplitude control (CAC) (*29*, *40*). The TTS of the SimCIM is estimated at 12 s (see section S9). Thus, our dSBM has achieved a shorter TTS for G22 than that of this state-of-the-art machine. (TTSs of our machines for other G-set instances and the comparison with those of the SimCIM are provided in table S2 and fig. S6, respectively.)

We also solved other various instances of the Ising problem (MAX-CUT) by dSBM to compare it with other machines shown in Fig. 3A. For small-scale problems, we can simultaneously perform multiple trials using the 2048-spin-size machine by a block-diagonal structure of the *J* matrix, as done using a CIM (*14*). This so-called batch processing improves the success probability *P*_{S} by selecting the best result among multiple trials, while *T*_{com} is defined as the computation time per batch. In the limit as the number of trials per batch *N*_{batch} goes to infinity, *P*_{S} may exceed 0.99, and then TTS and TTT become *T*_{com} from the above definitions. In this sense, the TTS and TTT are well defined even for batch processing.
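The block-diagonal batch trick can be sketched in one line (helper name ours): independent copies of *J* placed on the diagonal make one large problem whose blocks evolve independently.

```python
import numpy as np

def batch_coupling(J, n_batch):
    """Embed n_batch independent copies of an N-spin problem into a single
    (n_batch * N)-spin problem via a block-diagonal coupling matrix, so
    that one run of the machine performs n_batch trials at once."""
    return np.kron(np.eye(n_batch, dtype=J.dtype), J)
```

Reading out the best block of the final configuration then implements the selection of the best result among the batched trials.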

As shown in Fig. 3C, for two small-scale problems with 60 spins (all-to-all connectivity) and 200 spins (sparse connectivity), our dSBM achieved much shorter TTSs than those of a QA and a CIM (*14*). (This QA is not the latest version.) For 100-spin and 700-spin problems with all-to-all connectivity, the TTSs of the dSBM are much shorter than those of the SimCIM with CAC (*29*). These short TTSs of the dSBM compared to the SimCIM come not from computation speed or implementations but from the algorithmic advantage of dSB over CAC. That is, dSB needs fewer matrix-vector multiplications (MVMs), the most computation-intensive part of both algorithms, to reach solutions than does the SimCIM with CAC. [The numbers of MVMs to solutions of the 100-spin and 700-spin problems are 9.4 × 10^{2} and 8.1 × 10^{4}, respectively, for dSB but 5.6 × 10^{3} and 7.8 × 10^{5}, respectively, for CAC (*29*).] For two 1024-spin problems (all-to-all connectivity) with different ranges of *J*_{i,j}, the dSBM achieved shorter TTSs than those of a Digital Annealer (DA) (*23*), which is based on an FPGA implementation of “Digital Annealer’s algorithm,” developed from SA, and which outperformed CPU implementations of SA (*25*) and parallel tempering (*50*). (This DA is not the latest version.) Last, for 60-spin and 100-spin problems with all-to-all connectivity, the TTSs of the dSBM are comparable to those of another state-of-the-art machine based on an FPGA implementation of the above-mentioned RBM’s stochastic sampling (*28*), whose size, however, is limited to 200 spins. Overall, we conclude that our bSBM and dSBM have achieved remarkably high performance on the present benchmark instances compared to the recently developed machines.

### Performance for ultralarge-scale Ising problems

Last, we present the results for two ultralarge-scale Ising problems: a 100,000-spin problem with all-to-all connectivity and a 1,000,000-spin problem with a sparse connectivity of 1% (see section S7 for their detailed definitions for reproduction). Using a GPU cluster with 16 GPUs, we solved these by aSB, bSB, dSB, and our best implementation of SA (see section S10 for details). For comparison, we also solved them by aSB and SA (*25*) running on a CPU core. Figure 4 shows the results, where the result obtained by the four-GPU implementation of the above-mentioned MA (*26*) is also shown. The horizontal lines show optimal (dashed) and target (dotted) values estimated using a formula based on statistical mechanics (see section S7 for details) (*51*, *52*).

Figure 4A shows that all three GPU-cluster SBMs outperformed the MA machine (*26*) in terms of both speed and accuracy. Figure 4 (A and B) also shows that the GPU-cluster implementation achieved about a 1000-times speedup over a CPU core for aSB but only about a 100-times speedup for SA. This difference comes from the higher parallelizability of aSB compared with SA. The GPU-cluster bSBM and dSBM are faster than the GPU-cluster aSBM because of their algorithmic advantage. Figure 4B shows that the dSBM achieved the value closest to the estimated optimal value, that is, the highest accuracy. As shown in Fig. 4 (C and D), similar results also hold for the 1,000,000-spin Ising problem with sparse connectivity.

## DISCUSSION

In this work, we have proposed two new variants of the SB algorithm, named bSB and dSB, both of which outperform the original aSB in terms of both speed and solution accuracy. dSB allows us to find optimal solutions of large-scale problems, which aSB and bSB cannot achieve. We have implemented 2048-spin-size bSBM and dSBM using single FPGAs. Our benchmarking with TTS and TTT has shown that the bSBM and dSBM can achieve remarkably high performance compared to other recently developed machines. GPU-cluster implementations of bSB and dSB also allow us to find nearly optimal solutions of ultralarge-scale problems with up to one million spins.

Last, we discuss possible future work on SB. First, it is important to check the performance of SB for a broader range of instances than those evaluated in this work, which were chosen to compare our machines with previously developed machines reported in the literature. It is known that a single solver cannot achieve the highest performance for all kinds of instances (*53*). Thus, we should examine what kinds of instances can be solved well by SB. Second, it is desirable to develop a technique for auto-tuning the parameters in SB, such as the constant *c*_{0} and the time step Δ_{t}. In this work, we used the definition of *c*_{0} based on random matrix theory (*30*) and chose the best value of Δ_{t} among five values (see Methods for details). Such a preliminary search for the best parameter values is often used in benchmarking to evaluate the potential performance of solvers. In practical applications, however, it is desirable to determine appropriate parameter values automatically, without preliminary search. Last, the development of other variants of SB is an interesting possibility. In this work, we have focused on the Ising problem (MAX-CUT). The generalization of SB to broader classes of problems, e.g., combinatorial optimization with higher-order polynomial cost functions, is an interesting next target.

## METHODS

### Ballistic simulated bifurcation

In bSB, we numerically solve the Hamiltonian equations of motion given by Eqs. 6 and 7 using the symplectic Euler method (*49*), as in aSB (*30*). The updating rule for bSB is as follows

$$y_i(t_{k+1}) = y_i(t_k) + \left\{-\left[a_0 - a(t_k)\right] x_i(t_k) + c_0 \sum_{j=1}^{N} J_{i,j}\, x_j(t_k)\right\} \Delta_t$$

$$x_i(t_{k+1}) = x_i(t_k) + a_0\, y_i(t_{k+1})\, \Delta_t$$

where Δ_{t} is the time step and *t*_{k} is the discretized time satisfying *t*_{0} = 0 and *t*_{k + 1} = *t*_{k} + Δ_{t}. After updating *x*_{i}, we check whether ∣*x*_{i}∣ > 1. If so, we replace *x*_{i} with sgn(*x*_{i}) and set *y*_{i} = 0, which implements the perfectly inelastic walls at *x*_{i} = ±1.

### Discrete simulated bifurcation

In dSB, we numerically solve Eqs. 11 and 12. The updating rule for dSB is as follows

$$y_i(t_{k+1}) = y_i(t_k) + \left\{-\left[a_0 - a(t_k)\right] x_i(t_k) + c_0 \sum_{j=1}^{N} J_{i,j}\, \mathrm{sgn}\left[x_j(t_k)\right]\right\} \Delta_t$$

$$x_i(t_{k+1}) = x_i(t_k) + a_0\, y_i(t_{k+1})\, \Delta_t$$

As in bSB, we replace *x*_{i} with sgn(*x*_{i}) and set *y*_{i} = 0 if ∣*x*_{i}∣ > 1.

### Efficient implementations of bSB and dSB

The bSB and dSB can be implemented more efficiently as follows. [A similar speedup technique has been used for SA (*25*).]

Storing the product-sum term *f*_{i}(*t*_{k}) = *c*_{0} Σ_{j} *J*_{i,j} *x*_{j}(*t*_{k}), we update it incrementally as

$$f_i(t_k) = f_i(t_{k-1}) + c_0 \sum_{j=1}^{N} J_{i,j}\, \Delta x_j(k) \tag{20}$$

where Δ*x*_{j}(*k*) = *x*_{j}(*t*_{k}) − *x*_{j}(*t*_{k − 1}). Note that Δ*x*_{j}(*k*) often becomes zero around the final time. (This is not the case for aSB, making this implementation ineffective for aSB.) Hence, the product-sum operation in Eq. 20 can be accelerated by neglecting the terms corresponding to Δ*x*_{j}(*k*) = 0.
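In code, the incremental update can be sketched as follows (our illustration; *f* is assumed to hold the current product-sum *c*_{0} Σ_{j} *J*_{i,j} *x*_{j}, and names are ours):

```python
import numpy as np

def update_product_sum(f, J, dx, c0):
    """Update f_i = c0 * sum_j J_ij * x_j in place using only the entries
    of dx = x(t_k) - x(t_{k-1}) that are nonzero (cf. Eq. 20)."""
    changed = np.nonzero(dx)[0]       # indices j with dx_j != 0
    if changed.size:
        f += c0 * (J[:, changed] @ dx[changed])
    return f
```

The cost per step is proportional to the number of changed variables rather than to *N*, which is what makes the trick effective once most positions stop moving.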

In the case of dSB, instead of Δ*x*_{j}(*k*), we use Δ*s*_{j}(*k*) = sgn[*x*_{j}(*t*_{k})] − sgn[*x*_{j}(*t*_{k − 1})], because Eq. 12 contains sgn(*x*_{j}) rather than *x*_{j}. Since Δ*s*_{j}(*k*) becomes zero more often than Δ*x*_{j}(*k*), this implementation is more effective for speedup in dSB than in bSB.

### Parameter setting

In the SB algorithms, the time step Δ_{t} and the constants *a*_{0} and *c*_{0} are set appropriately in advance. In this work, we set the constants as *a*_{0} = 1 and

$$c_0 = \frac{0.5}{\langle J \rangle \sqrt{N}}$$

where ⟨*J*⟩ is defined as

$$\langle J \rangle = \sqrt{\frac{\sum_{i=1}^{N}\sum_{j=1}^{N} J_{i,j}^2}{N(N-1)}}$$

This setting of *c*_{0} is based on random matrix theory (*30*). Although it may not be optimal for some instances, it is enough to achieve the high performance presented in this work. On the other hand, performance is more sensitive to the setting of Δ_{t}. We therefore selected the best value among five values: 0.25, 0.5, 0.75, 1, and 1.25. In Figs. 2A and 4, Δ_{t} is set to 0.5 (aSB) or 1 (bSB and dSB). In Fig. 2B, Δ_{t} is set to 1.25 for the bSBM and dSBM. The Δ_{t} values used in Fig. 3 are provided in table S1. As mentioned in Discussion, automatic setting of *c*_{0} and Δ_{t} is an important issue but beyond the scope of the present work, so it is left for future work.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/7/6/eabe7953/DC1

This is an open-access article distributed under the terms of the Creative Commons Attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:**We thank Y. Kaneko, O. Hori, M. Watabiki, M. Iwasaki, M. Ootomo, M. Tomoda, and Y. Izumi for comments and support.

**Funding:**The authors acknowledge no funding in support of this research.

**Author contributions:**H.G. described the dynamical properties of three SBs by producing Fig. 1; estimated optimal solutions for ultralarge instances with T.K.; wrote the manuscript and the Supplementary Materials with data from K.E., Y.H., and R.H.; and supervised the project. K.E. conceived bSB and dSB through discussion with M.S., implemented SBs and SA on a GPU and a 16-GPU cluster, and collected performance data; Y.S. pointed out the mapping of the Ising problem with local fields to one without local fields and convergence to a local minimum of the Ising problem in bSB and dSB; Y.H. and R.H. implemented dSB and bSB, respectively, on an FPGA and collected performance data; M.Y. optimized the FPGA implementations; and K.T. developed the basis of the FPGA implementations, wrote the part of FPGA implementations in the Supplementary Materials, and supervised the FPGA team.

**Competing interests:**H.G., K.E., M.S., Y.S., and K.T. are inventors on a Japanese patent application related to this work filed by the Toshiba Corporation (no. P2019-164742, filed 10 September 2019). The authors declare that they have no other competing interests.

**Data and materials availability:**All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested to the authors.

- Copyright © 2021 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).