Abstract
Combinatorial optimization problems are ubiquitous but difficult to solve. Hardware devices for these problems have recently been developed by various approaches, including quantum computers. Inspired by recently proposed quantum adiabatic optimization using a nonlinear oscillator network, we propose a new optimization algorithm simulating adiabatic evolutions of classical nonlinear Hamiltonian systems exhibiting bifurcation phenomena, which we call simulated bifurcation (SB). SB is based on adiabatic and chaotic (ergodic) evolutions of nonlinear Hamiltonian systems. SB is also suitable for parallel computing because of its simultaneous updating. Implementing SB with a field-programmable gate array, we demonstrate that the SB machine can obtain good approximate solutions of an all-to-all connected 2000-node MAX-CUT problem in 0.5 ms, which is about 10 times faster than a state-of-the-art laser-based machine called a coherent Ising machine. SB will accelerate large-scale combinatorial optimization harnessing digital computer technologies and also offer a new application of computational and mathematical physics.
INTRODUCTION
In social life and industry, one often encounters problems for finding the best combination among an enormous number of candidates. These combinatorial optimization problems are mathematically formulated as minimization or maximization of certain functions of discrete variables, which are called objective functions or cost functions. These problems are notoriously difficult because of combinatorial explosion (1, 2). Thus, special-purpose hardware devices for these problems are expected to be useful. In particular, “Ising machines” designed for finding a ground state of the Ising spin model (3) have recently attracted much attention because many combinatorial optimization problems can be mapped to the Ising problem (4), including very-large-scale integrated circuit design (5), drug design (6), and financial portfolio management (7). These machines have been developed by various approaches: quantum computers based on quantum annealing (8, 9) or quantum adiabatic optimization (10–12) implemented with superconducting circuits (13, 14), coherent Ising machines (CIMs) implemented with laser pulses (15–20), and Ising machines based on simulated annealing (SA) (1, 21, 22) implemented with digital circuits such as field-programmable gate arrays (FPGAs) (23–29).
Inspired by recently proposed quantum adiabatic optimization using a nonlinear oscillator network (30–34), here, we propose a new heuristic algorithm for the Ising problem. In this algorithm, which we call simulated bifurcation (SB), we numerically simulate adiabatic evolutions of classical nonlinear Hamiltonian systems exhibiting bifurcation phenomena (35), where two branches of the bifurcation in each nonlinear oscillator correspond to two states of each Ising spin (30, 34). [The CIMs also use two branches of a bifurcation for two states of an Ising spin (15–20), but they are not Hamiltonian systems.] Unlike simulations of quantum systems, we can efficiently simulate the Hamiltonian systems (36) using current digital computers. This enables treatment of large-scale problems with dense connectivity, which are challenging for near-term quantum computers. Moreover, SB allows one to update variables simultaneously at each time step. This is in contrast to SA, because it generally requires one-by-one updating to guarantee convergence. [Simultaneous updating is allowed only for isolated spins (22, 24).] This difference between SB and SA implies that the acceleration of SB by massively parallel processing is easier than that of SA. Exploiting these advantages, we implement an ultrafast all-to-all connected 2000-spin Ising machine based on SB with a single FPGA and demonstrate that our machine is about 10 times faster than a state-of-the-art CIM (17, 19). In addition, we also solve an all-to-all connected 100,000-spin Ising problem with continuous parameters by SB using a graphics processing unit (GPU) cluster, which is about 10 times and 1000 times faster than our best SA and the software of SA provided by (22), respectively. This suggests that SB can accelerate large-scale combinatorial optimization without specially designed hardware devices or custom circuits.
While quantum adiabatic optimization is based on the quantum adiabatic theorem (37, 38), SB is based on adiabatic and chaotic (ergodic) evolutions of classical nonlinear Hamiltonian systems. We will discuss this mechanism of SB using the simulation results of a simple example instead of giving a mathematically rigorous proof of the operational principle of SB, which is left as an interesting problem for future work.
RESULTS
SB algorithm
The N-spin Ising problem without external magnetic fields is formulated as follows. A dimensionless Ising energy is given by
For the Ising problem, a new kind of quantum adiabatic optimization using a network of Kerr-nonlinear parametric oscillators (KPOs) has recently been proposed (30–34). The quantum mechanical Hamiltonian in this approach is given by (30, 34)
The corresponding classical Hamiltonian system is derived from a classical approximation where the expectation value of ai is approximated as a complex amplitude xi + iyi (where i denotes the imaginary unit) (30, 34). Here, the real and imaginary parts, xi and yi, are a pair of canonical conjugate variables, such as a position and a momentum, for the ith KPO. The classical mechanical Hamiltonian and the equations of motion for this classical system are given by (30, 34)
However, Eqs. 4 and 5 are not suitable for fast numerical simulation. By disregarding some terms proportional to the momenta y, which varies around zero (30, 34), we simplify Eqs. 4 and 5 as follows
Hereafter, we treat all the parameters with the dimension of frequency (correspondingly, also time) as dimensionless quantities because we use Eqs. 7 and 8 as an algorithm for solving mathematical problems, and therefore, the physical meanings of the parameters are unimportant. x and y are also dimensionless by definition.
The SB algorithm is based on Eqs. 7 and 8. Note that the new Hamiltonian in Eq. 6 is separable with respect to positions and momenta, unlike the previous one in Eq. 3. Therefore, Eqs. 7 and 8 can be solved by the explicit symplectic Euler method (see also Methods) (36) instead of implicit methods or, e.g., the explicit fourth-order Runge-Kutta method. The simplicity of the explicit symplectic Euler method is crucial for hardwiring of the SB algorithm with custom circuits, such as FPGAs. Its stability is also important, because this allows one to set the time step to a large value, which results in high-speed simulation. [The symplectic Euler method is first-order accurate. Higher-order explicit symplectic methods, such as the Störmer-Verlet method (36), can also be used, but this is not effective not only for the speed and stability of the present simulation but also for the solution accuracy for the Ising problem. Hence, we adopt the simplest first-order method. See Methods for details.]
The SB algorithm is as follows. All the variables, x and y, are initially set around zero. Gradually increasing p(t) from zero, we solve Eqs. 7 and 8 numerically by the explicit symplectic Euler method. (For further speedup, we modify the method from its standard form. See Methods for details.) The sign of the final xi provides the ith spin of an approximate solution of the Ising problem.
SB has further algorithmic advantages: Eqs. 7 and 8 have only one product-sum operation with respect to the coupling matrix J, which is the most computation-intensive part in this algorithm, unlike Eqs. 4 and 5 including two such product-sum operations; we can update the variables simultaneously, and therefore, we can fully exploit massively parallel processing for accelerating SB; and SB includes only addition and multiplication, and no probabilistic processes, and therefore can be easily hardwired, e.g., with FPGAs.
Mechanism of SB
The mechanism of SB is qualitatively explained as follows. When p(t) is gradually increased from zero to a sufficiently large value pf, each oscillator exhibits a bifurcation with two stable branches whose positions are approximately given by
Note that the first term is independent of s, and the second term is proportional to the Ising energy regarding s as a spin configuration. In the adiabatic evolution with slowly varying p(t), the state will follow one of the potential energy minima near to the present state. Assuming that minima with relatively low final energies appear earlier and are lower during the adiabatic evolution than those with relatively high final energies, we expect that the state will arrive at a minimum with a relatively low final energy. Thus, we will find a spin configuration with a low Ising energy as an approximate solution.
Instead of showing the general validity of the above qualitative description, here, we show the situation of a simple example: the two-spin Ising model with ferromagnetic coupling (J1,2 = J2,1 = 1), the ground states of which are (up, up) and (down, down). This will give us an intuitive understanding of SB, which is enough to use SB as a heuristic algorithm.
Figure 1 (A and B) shows the simulation results for this problem with a linearly increased pumping amplitude p(t) = εpt. As shown in this figure, after two bifurcations, the potential energy finally has four minima corresponding to the four configurations of the two-spin Ising model.
(A) Time-dependent pumping amplitude, p(t) = εpt (εp = 0.01), in the simulation (top) and time evolutions of x and y (oscillating thin lines in middle and bottom panels). Bold lines represent x of stable fixed points (potential energy minima): x1 = x2 = 0 (p ≤ Δ − ξ0 = 0.4),
Before the first bifurcation, the potential energy has a single minimum at the origin, around which the trajectory circulates, as shown in the top panel of Fig. 1B. After the first bifurcation, two stable fixed points (potential energy minima) corresponding to the ground states, (up, up) and (down, down), appear earlier than the other two minima corresponding to (up, down) and (down, up). The state adiabatically follows one of the two minima that appear earlier. Thus, SB successfully finds a ground state (down, down).
This adiabatic evolution is explained as follows. First, the total energy E(t) and the potential energies at the minima, V±,±(t) (+ and − correspond to up and down, respectively), are formulated from their time derivatives as
The above mechanism of SB suggests that solutions obtained by SB are determined at the first bifurcation point. Since x(t) at the first bifurcation point is given by the maximum-eigenvalue eigenvector of the coupling matrix J (see the Supplementary Materials), SB may provide the signs of the elements of the eigenvector as an approximate solution, which is actually an approximate solution obtained by a continuous reduction method (30, 34). As shown in the next subsection, however, SB can provide much better solutions than this. This fact implies that there may exist another mechanism of SB; that is, SB may continue to search better solutions after the first bifurcation.
To investigate this new mechanism of SB, we also did another simulation where p(t) is given in the top panel of Fig. 1C. The results are shown in Fig. 1 (C and D). In this case, the initial value of p(t) is higher than the two bifurcation points, and therefore, there are four potential energy minima even at the initial time. While p(t) is constant, x(t) moves in the energetically allowable region including the four minima, as shown in the top panel of Fig. 1D. After that, the allowable region separates into four parts, as shown in the middle panel of Fig. 1D. Then, x(t) is confined in one of the four regions. Eventually, SB find a ground state (up, up) successfully.
Since x(t) also moves around the other potential energy minima, as shown in the top and middle panels of Fig. 1D, the failure probability may be finite. To evaluate the success probability, we repeat this simulation 104 times with different initial conditions, where x1(0) = x2(0) = 0 and y1(0) and y2(0) are set randomly from the interval (−0.1, 0.1). As a result, we obtain ground states with a probability of about 0.83. This result indicates that lower energy states (better solutions) are obtained with higher probabilities even in the initial condition where the energetically allowable region includes multiple potential energy minima.
This interesting result can be explained by assuming the ergodicity of this nonlinear Hamiltonian system, implying that a trajectory will eventually visit all states in the phase-space allowable region (36). This is suggested by the top panel of Fig. 1D, where x(t) moves around two low minima more frequently than two high minima. From the ergodicity, the success probability may be proportional to the phase-space volume of the allowable region satisfying x1x2 > 0. The success probability numerically estimated by the phase-space volume for the middle panel of Fig. 1D is about 0.841, which is remarkably close to the actual success probability of 0.83. This numerical fact supports the mechanism based on the ergodicity. In general, the ergodicity suggests that x(t) moves around low potential energy minima in longer time than high minima, which results in lower Ising energies. In the next subsection, we provide additional numerical evidence for this mechanism; that is, SB can find much better solutions for a 2000-spin Ising problem than that obtained at the first bifurcation. Thus, we conclude that SB finds lower minima among many minima harnessing the ergodicity. [Very recently, a new approach to the Ising problem using a classical Hamiltonian system with “classical spins” mimicking quantum annealing has been proposed (42), where a fixed point is tracked by the technique called shortcut to adiabaticity. Although this approach uses classical Hamiltonian dynamics as well, it is quite different from SB, because it will not exploit the ergodic search but just track a fixed point. In addition, it has been unclear so far whether it can achieve a high-speed computation with its complicated equations.]
Solving an all-to-all connected 2000-node MAX-CUT problem by a single-FPGA SB machine
From the advantages and properties of SB, we expect that SB will be useful for approximately solving large-scale Ising or MAX-CUT problems with dense connectivity as fast as possible. To check this expectation, we compare our SB machine with a state-of-the-art CIM (17, 19), because this machine has achieved outstanding performance for these problems. For this comparison, we solved an all-to-all connected 2000-node MAX-CUT problem named K2000, which was solved by the CIM (17, 19).
We implemented an SB-based 2000-spin Ising machine with a single FPGA. The product-sum operation,
One of the main results in (17) is that the CIM reached a target energy given by the Goemans-Williamson semidefinite programming (GW-SDP) algorithm (17, 38) in about 70 μs in the best case among 100 trials, which is about 50 times shorter than that by the SA highly tuned in (17). [The SA in (17, 19) is very fast by putting all the elements of the coupling matrix J on a cache of a single core. Thus, parallel computing with multiple cores cannot accelerate the SA because of communication overheads.] In comparison with this, our single-FPGA SB machine reaches the GW-SDP value in only 58 μs even in the worst case among 100 trials, as shown in Fig. 2A.
(A) Time evolutions of Ising energies. (The computation times do not include the loading of data and the output of the results; see the Supplementary Materials.) Constants in SB are set as K = Δ = 1 and
Another main result in (17) is that the CIM took only 5 ms to obtain as good solutions (large cut values) as the highly tuned SA obtained in 50 ms, where the cut value of the MAX-CUT problem is given by
In Fig. 2A, we also show the energy obtained by the Hopfield neural network (HNN) of two-state neurons (19, 43). The HNN is the most naïve local search algorithm, and therefore, more sophisticated algorithms should outperform the HNN. (The HNN is equivalent to SA at zero temperature; see the Supplementary Materials.) Note that the HNN energy is lower than the GW-SDP energy. Our SB machine reaches the HNN energy about 13 times faster than the CIM on average. From these results, we conclude that our single-FPGA SB machine is about 10 times faster than the CIM developed in (17).
As mentioned in the last subsection, at the first bifurcation point, SB may find an approximate solution determined by the maximum-eigenvalue eigenvector of the coupling matrix J (30, 34). In the case of K2000, this approximate solution is EIsing = −55724 or the cut value of 27,342. This is much worse than those actually obtained by SB, as shown in Fig. 2. This fact supports the mechanism of SB based on the ergodicity discussed in the last subsection.
Solving an all-to-all connected 100,000-node MAX-CUT problem by a GPU-cluster SB machine
We have further solved an all-to-all connected 100,000-node MAX-CUT problem with continuous weights set randomly from the interval [−1, 1], which is named M100000. [See the Supplementary Materials for its definition with a pseudo-random number generator in (44).] Note that in M100000, there are about 5 billion edges, the data size of which is too large for FPGAs. We instead use two large computational systems and compare the performance from each. System 1 is a GPU cluster using eight GPUs communicating with one another via a fast link. System 2 is a PC cluster composed of multiple nodes with two 14-core processors each communicating with one another via a fast link. We also solved M100000 by the HNN and the SA developed by us based on (22). Our SA is remarkably fast, e.g., about 100 times faster than the software provided by (22). See the Supplementary Materials for details.
The comparison of computation times for SB and SA is shown in Fig. 3A. The fastest implementation is the GPU cluster for SB and the 50-core PC cluster for SA. This difference comes from the fact that the massively parallel processing of the GPU cluster is effective for SB, but not for SA, due to the lack of parallelizability of the spin update and resultant large communication overheads. Figure 3B shows the computation times of SB and SA in their fastest implementations. The dotted lines roughly give the envelopes of the time-evolution curves. This means that the dotted lines indicate the best cases with respect to Nstep and Nsweep. From this comparison at the same Ising energies below the HNN value, it turns out that the GPU-cluster SB machine is about 10 times faster than our fastest SA or 1000 times faster than the software provided by (22). This demonstrates that SB allows one to accelerate large-scale combinatorial optimization without specially designed hardware devices or custom circuits.
(A) Computation times of SB (Nstep = 100) and SA (Nsweep = 100) for M100000. (The computation times do not include the loading of data and the output of the results; see the Supplementary Materials.) Constants in SB are set as K = Δ = 1 and
DISCUSSION
We have proposed a new heuristic algorithm for the Ising problem, which we call SB, inspired by quantum adiabatic optimization using a nonlinear oscillator network. Exploiting the advantages of SB, such as simultaneous updating and simple calculations, we have realized an ultrafast all-to-all connected 2000-spin Ising machine based on SB using a single FPGA, which is about 10 times faster than a state-of-the-art CIM. We have also solved an all-to-all connected 100,000-node MAX-CUT problem with continuous weights using a GPU-cluster SB machine, which is about 10 times faster than our fastest SA. We found that the computation speed of the GPU-cluster SB machine is limited by the memory bandwidth of each GPU (see the Supplementary Materials). This indicates that hardware devices with broader memory bandwidths specially designed for SB, such as application-specific integrated circuits or multi-FPGA systems, will substantially improve the performance. The mechanism of SB is based on adiabatic and chaotic (ergodic) evolutions of the nonlinear Hamiltonian system. We have discussed the mechanism using simulation results of a simple model. The mechanism of SB seems to be related to adiabatic invariants of classical integrable systems (36, 40, 45, 46) and also ergodic adiabatic invariants of classical nonintegrable systems (47–49). The mathematically rigorous treatment of the mechanism using these concepts is left for future work. Another interesting direction of research is to extend SB to thermal states at finite temperature, e.g., using the Nosé-Hoover method (36), which will relate SB to nonequilibrium statistical mechanics (50).
METHODS
Modified explicit symplectic Euler method
The symplectic method (36) is a stable numerical method for solving the Hamiltonian equations of motion. Hereafter, the flow map by a Hamiltonian H with time step Δt is denoted by Φ(Δt, H). Since the flow map by the kinetic energy alone Φ(Δt, H − V) or the potential energy alone Φ(Δt, V) is obviously symplectic, their composition Φ(Δt, V) ∘ Φ(Δt, H − V) is also symplectic (36). Since the present Hamiltonian is separable with respect to x and y, we obtained the explicit symplectic Euler method for the present Hamiltonian
For realizing further speedup, we modified this method as follows. In the SB algorithm, the product-sum operation,
The second-order symplectic method called the Störmer-Verlet method is given by the flow map Φ(Δt/2, HSB − V) ∘ Φ(Δt, V) ∘ Φ(Δt/2, HSB − V) (36). Because of Φ(Δt, V), the stability of this method is equivalent to the explicit symplectic Euler method. We found that this method is stable for M100000 when Δt ≤ 0.5. Thus, this method does not contribute to a speedup. We also found that the second-order accuracy for the Hamiltonian equations does not contribute to the solution accuracy for the Ising problem. This may be because we need only the signs of the positions, which will be robust against numerical errors. Therefore, we adopted the above modified explicit symplectic Euler method.
SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/4/eaav2372/DC1
Solving MAX-CUT with Ising machines
First bifurcation in SB
Setting of ξ0
HNN of two-state neurons
Simulated annealing
FPGA implementation of SB for K2000
Definition of M100000
Eight-GPU implementation of SB and SA for M100000
PC-cluster implementation of SB and SA for M100000
Fig. S1. Computation times of our best SA and the SA software provided by (22) for M100000.
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.
REFERENCES AND NOTES
- Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).