## Abstract

Protecting quantum information from errors is essential for large-scale quantum computation. Quantum error correction (QEC) encodes information in entangled states of many qubits and performs parity measurements to identify errors without destroying the encoded information. However, traditional QEC cannot handle leakage from the qubit computational space. Leakage affects leading experimental platforms, based on trapped ions and superconducting circuits, which use effective qubits within many-level physical systems. We investigate how two-transmon entangled states evolve under repeated parity measurements and demonstrate the use of hidden Markov models to detect leakage using only the record of parity measurement outcomes required for QEC. We show the stabilization of Bell states over up to 26 parity measurements by mitigating leakage using postselection and correcting qubit errors using Pauli-frame transformations. Our leakage identification method is computationally efficient and thus compatible with real-time leakage tracking and correction in larger quantum processors.

## INTRODUCTION

Large-scale quantum information processing hinges on overcoming errors from environmental noise and imperfect quantum operations. Fortunately, the theory of quantum error correction (QEC) predicts that the coherence of single degrees of freedom (logical qubits) can be better preserved by encoding them in ever-larger quantum systems (Hilbert spaces), provided the error rate of the constituent elements lies below a fault-tolerance threshold (*1*). Experimental platforms based on trapped ions and superconducting circuits have achieved error rates in single-qubit (SQ) gates (*2*–*4*), two-qubit gates (*2*, *4*, *5*), and qubit measurements (*3*, *6*–*8*) at or below the threshold for popular QEC schemes such as surface (*9*, *10*) and color codes (*11*). They therefore seem well poised for the experimental pursuit of quantum fault tolerance. However, a central assumption of textbook QEC that error processes can be discretized into bit flips (*X*), phase flips (*Z*), or their combination (*Y* = *iXZ*) only is difficult to satisfy experimentally. This is due to the prevalent use of many-level systems as effective qubits, such as hyperfine levels in ions and weakly anharmonic transmons in superconducting circuits, making leakage from the two-dimensional computational space of effective qubits a threatening error source. In quantum dots and trapped ions, leakage events can be as frequent as qubit errors (*12*, *13*). However, even when leakage is less frequent than qubit errors as in superconducting circuits (*2*, *5*), if ignored, then leakage can produce the dominant damage to encoded logical information. To address this, theoretical studies propose techniques to reduce the effect of leakage by periodically moving logical information and removing leakage when qubits are free of logical information (*14*–*17*). Alternatively, more hardware-specific solutions have been proposed for trapped ions (*18*) and quantum dots (*19*). 
In superconducting circuits, recent experiments have demonstrated single- and multiround parity measurements to correct qubit errors with up to nine physical qubits (*20*–*28*). Parallel approaches encoding information in the Hilbert space of single resonators using cat (*29*) and binomial codes (*30*) used transmon-based photon parity checks to approach the breakeven point for a quantum memory. However, no experiment has demonstrated the ability to detect and mitigate leakage in a QEC context.

Here, we experimentally investigate leakage detection and mitigation in a minimal QEC system. Specifically, we protect an entangled state of two transmon data qubits (*Q*_{DH} and *Q*_{DL}) from qubit errors and leakage during up to 26 rounds of parity measurements via an ancilla transmon (*Q*_{A}). Performing these parity checks in the *Z* basis protects the state from *X* errors, while interleaving checks in the *Z* and *X* bases protects it from general qubit errors (*X*, *Y*, and *Z*). Leakage manifests itself as a round-dependent degradation of data-qubit correlations ideally stabilized by the parity checks: 〈*Z* ⊗ *Z*〉 in the first case and 〈*X* ⊗ *X*〉, 〈*Y* ⊗ *Y*〉, and 〈*Z* ⊗ *Z*〉 in the second. We introduce hidden Markov models (HMMs) to efficiently detect data-qubit and ancilla leakage, using only the string of parity outcomes, demonstrating restoration of the relevant correlations. Although we use postselection here, the low technical overhead of HMMs makes them ideal for real-time leakage correction in larger QEC codes.

## RESULTS

### A minimal QEC setup

Repetitive parity checks can produce and stabilize two-qubit entanglement. For example, performing a *Z* ⊗ *Z* parity measurement (henceforth a ZZ check) on two data qubits prepared in the unentangled state ∣++〉 = (∣0〉 + ∣1〉) ⊗ (∣0〉 + ∣1〉)/2 will ideally project them to either of the two (entangled) Bell states ∣Φ^{+}〉 = (∣00〉 + ∣11〉)/√2 or ∣Ψ^{+}〉 = (∣01〉 + ∣10〉)/√2, as signaled by the ancilla measurement outcome *M*_{A}. Subsequent ZZ checks will ideally leave the entangled state unchanged. However, qubit errors will alter the state in ways that may or may not be detectable and/or correctable. For instance, a bit-flip (*X*) error on either data qubit, which transforms ∣Φ^{+}〉 into ∣Ψ^{+}〉, will be detected because *X* anticommutes with a ZZ check. The corruption can be corrected by applying a bit flip on either data qubit because this cancels the original error (*X*^{2} = *I*) or completes the operation *X* ⊗ *X*, of which ∣Φ^{+}〉 and ∣Ψ^{+}〉 are both eigenstates. The correction can be applied in real time using feedback (*20*, *21*, *28*, *31*) or kept track of using Pauli frame updating (PFU) (*24*, *32*). We choose the latter, with PFU strategy “*X* on *Q*_{DH}.” Phase-flip errors are not detectable since *Z* on either data qubit commutes with a ZZ check. These errors transform ∣Φ^{+}〉 into ∣Φ^{−}〉 = (∣00〉 − ∣11〉)/√2 and ∣Ψ^{+}〉 into ∣Ψ^{−}〉 = (∣01〉 − ∣10〉)/√2. Finally, *Y* errors produce the same signature as *X* errors. Our PFU strategy above converts them into *Z* errors. Crucially, by interleaving checks of type ZZ and XX (measuring *X* ⊗ *X*), arbitrary qubit errors can be detected and corrected. The ZZ check will signal either an *X* or a *Y* error, and the XX check will signal *Z* or *Y*, providing a unique signature in combination.
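The projection and the bit-flip signature described above can be checked numerically. The following is a minimal sketch in plain Python, assuming ideal (noiseless) states and an arbitrary basis ordering ∣00〉, ∣01〉, ∣10〉, ∣11〉; the function names are illustrative and are not taken from the experiment.

```python
import math

# Basis order: |00>, |01>, |10>, |11> (an arbitrary choice for this sketch).
PLUS_PLUS = [0.5, 0.5, 0.5, 0.5]
ZZ_EIGENVALUE = [+1, -1, -1, +1]  # eigenvalue of Z (x) Z on each basis state


def project_zz(state, outcome):
    """Project onto the ZZ = outcome subspace; return (probability, state)."""
    kept = [a if ev == outcome else 0.0 for a, ev in zip(state, ZZ_EIGENVALUE)]
    prob = sum(abs(a) ** 2 for a in kept)
    norm = math.sqrt(prob)
    return prob, [a / norm for a in kept]


def apply_x_on_first(state):
    """Bit flip on the first qubit: swaps |0b> and |1b>."""
    return [state[2], state[3], state[0], state[1]]


def zz_expectation(state):
    """Expectation value of Z (x) Z."""
    return sum(ev * abs(a) ** 2 for a, ev in zip(state, ZZ_EIGENVALUE))


p_even, phi_plus = project_zz(PLUS_PLUS, +1)  # (|00> + |11>)/sqrt(2)
p_odd, psi_plus = project_zz(PLUS_PLUS, -1)   # (|01> + |10>)/sqrt(2)
flipped = apply_x_on_first(phi_plus)          # X error maps Phi+ to Psi+
```

In this idealized picture both outcomes occur with probability 1/2, and the bit flip changes 〈*Z* ⊗ *Z*〉 from +1 to −1, which is exactly what the next ZZ check detects.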

### Generating entanglement by measurement

Our parity check is an indirect quantum measurement involving coherent interactions of the data qubits with *Q*_{A} and subsequent *Q*_{A} measurement (Fig. 1A) (*33*). The coherent step maps the data-qubit parity onto *Q*_{A} in 120 ns using SQ and two-qubit controlled-phase (CZ) gates (*5*). Gate characterizations (see the Supplementary Materials) indicate state-of-the-art gate errors *e*_{SQ} = {0.08 ± 0.02,0.14 ± 0.016,0.21 ± 0.06}% and *e*_{CZ} = {1.4 ± 0.6,0.9 ± 0.16}% with leakage per CZ *L*_{1} = {0.27 ± 0.12,0.15 ± 0.07}%. We measure *Q*_{A} with a 620-ns pulse including photon depletion (*7*, *34*), achieving an assignment error *e*_{a} = 1.0 ± 0.1%. We avoid data-qubit dephasing during the *Q*_{A} measurement by coupling each qubit to a dedicated readout resonator (RR) and a dedicated Purcell resonator (PR) (fig. S1) (*8*). The parity check has a cycle time of 740 ns, corresponding to only 2.5 ± 0.2% and 5.0 ± 0.3% of the data-qubit echo dephasing times (see the Supplementary Materials).

The parity measurement performance can be quantified by correlating its outcome with input and output states. We first quantify the ability to distinguish even-parity (∣00〉, ∣11〉) from odd-parity (∣01〉, ∣10〉) data-qubit input states, finding an average parity assignment error *e*_{a,ZZ} = 5.1 ± 0.2%. Second, we assess the ability to project onto the Bell states by performing a ZZ check on ∣++〉 and reconstructing the most likely physical data-qubit output density matrix ρ, conditioning on *M*_{A} = ±1. When tomographic measurements are performed simultaneously with the *Q*_{A} measurement, we find Bell-state fidelities *F*_{∣Φ+〉∣MA=+1} = 〈Φ^{+}∣ρ_{MA=+1}∣Φ^{+}〉 = 94.7 ± 1.9% and *F*_{∣Ψ+〉∣MA=−1} = 94.5 ± 2.5% (Fig. 1, C and D). We connect ∣Ψ^{+}〉 to ∣Φ^{+}〉 by incorporating the PFU into the tomographic analysis, obtaining *F*_{∣Φ+〉} = 94.6 ± 0.9% without any postselection (Fig. 1E). The nondemolition character of the ZZ check is then validated by performing tomography only once the *Q*_{A} measurement completes. We include an echo pulse on both data qubits during the *Q*_{A} measurement to reduce intrinsic decoherence and negate residual coupling between data qubits and *Q*_{A} (fig. S3). The degradation to *F*_{∣Φ+〉} = 91.8 ± 0.5% is consistent with intrinsic data-qubit decoherence under echo and confirms that measurement-induced errors are minimal.

### Protecting entanglement from bit flips and the observation of leakage

QEC stipulates repeated parity measurements on entangled states. We therefore study the evolution of *F*_{∣Φ+〉} = (1 + 〈*X* ⊗ *X*〉 − 〈*Y* ⊗ *Y*〉 + 〈*Z* ⊗ *Z*〉)/4 and its constituent correlations as a function of the number *M* of checks (Fig. 2A). When performing PFU using the first ZZ outcome only (ignoring subsequent outcomes), we observe that *F*_{∣Φ+〉} witnesses entanglement (>0.5) during 10 rounds and approaches randomization (0.25) by *M* = 25 (Fig. 2B). The constituent correlations also decay with simple exponential forms. A best fit of the form 〈*Z* ⊗ *Z*〉[*M*] = *a* · *e*^{−M/υZZ} + *b* gives a decay time υ_{ZZ} = 9.0 ± 0.9 rounds; similarly, we extract υ_{XX} = 11.7 ± 1.0 rounds (Fig. 2, C and D). By comparison, we observe that Bell states evolving under dynamical decoupling only (no ZZ checks; see fig. S4) decay similarly (υ_{ZZ} = 8.6 ± 0.3, υ_{XX} = 12.8 ± 0.4 rounds). These similarities indicate that intrinsic data-qubit decoherence is also the dominant error source in this multiround protocol.
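As a small worked illustration of the two formulas in this paragraph, the sketch below evaluates the Bell-state fidelity from the three correlations and the exponential fit form used for the decay. The function names and the sample values are assumptions made for illustration only.

```python
import math


def bell_fidelity(xx, yy, zz):
    """Fidelity to |Phi+> from the correlations <XX>, <YY>, and <ZZ>."""
    return (1 + xx - yy + zz) / 4


def decaying_correlation(m, a, tau, b):
    """Fit form a * exp(-m / tau) + b for a correlation after m checks."""
    return a * math.exp(-m / tau) + b


ideal = bell_fidelity(xx=1.0, yy=-1.0, zz=1.0)  # perfect |Phi+>
mixed = bell_fidelity(xx=0.0, yy=0.0, zz=0.0)   # fully randomized state
```

The ideal state gives a fidelity of 1, while vanishing correlations give 0.25, the randomization value quoted above; values above 0.5 witness entanglement.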

To demonstrate the ability to detect *X* and *Y* but not *Z* errors, we condition the tomography on signaling no errors during *M* rounds. This boosts 〈*Z* ⊗ *Z*〉 to a constant, while the undetectability of *Z* errors only allows slowing the decay of 〈*X* ⊗ *X*〉 to υ_{XX} = 33.2 ± 1.7 rounds (and of 〈*Y* ⊗ *Y*〉 to υ_{YY} = 31.3 ± 1.9 rounds). Naturally, this conditioning comes at the cost of the postselected fraction *f*_{post} reducing with *M* (fig. S5).

Moving from error detection to correction, we consider the protection of ∣Φ^{+}〉 by tracking *X* errors and applying corrections in postprocessing. The correction relies on the final two *M*_{A} only, concluding even parity for equal measurement outcomes and odd parity for unequal. For this small-scale experiment, this strategy is equivalent to a decoder based on minimum-weight perfect matching (MWPM) (*10*, *35*), justifying its use. Because our PFU strategy converts *Y* errors into *Z* errors, one expects a faster decay of 〈*X* ⊗ *X*〉 compared to the no-error conditioning; we observe υ_{XX} = 11.8 ± 1.0 rounds. Correction should lead to a constant 〈*Z* ⊗ *Z*〉. While 〈*Z* ⊗ *Z*〉 is clearly boosted, a weak decay to a steady state 〈*Z* ⊗ *Z*〉 = 0.73 ± 0.03 is also evident (Fig. 2D). As previously observed in (*31*), this degradation is the hallmark of leakage [see also (*21*, *24*)]. We additionally compare the experimental results to simulations using a model that assumes ideal two-level systems (*35*) (no leakage) based on independently calibrated parameters of table S1 (fig. S8, A to D). At *M* = 1, the model and the experiment coincide for all correction strategies. At larger *M*, “first” and “final” correction strategies deviate substantially, consistent with a gradual buildup of leakage, which we now turn our focus to.

### Leakage detection using HMMs

Both ancilla and data-qubit leakage in our experiment can be inferred from a string of measurement outcomes. Leakage of *Q*_{A} to the second excited transmon state ∣2〉 produces *M*_{A} = −1 because measurement does not discern it from ∣1〉. This leads to the pattern *M*_{A} = (…, −1, −1, −1, …) until *Q*_{A} seeps back to ∣1〉 (coherently or by relaxation), as it is unaffected by subsequent π/2 rotations (Fig. 3C). Leakage of a data qubit (Fig. 3B) leads to apparent repeated errors, signaled by *s*_{D}[*m*] ≔ *M*_{A}[*m*] · *M*_{A}[*m* − 2], producing the pattern *s*_{D} = (…, −1, −1, −1, …). (We call *s*_{D}[*m*] = −1 an error signal because, in the absence of noise, *s*_{D}[*m*] = +1, while the measurements *M*_{A}[*m*] themselves still depend on the ZZ parity.)
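The error signal defined above is simple to compute from a record of ancilla outcomes. The sketch below assumes outcomes are stored as a list of ±1 values and that the signal is only defined from the third round onward; the example records are constructed by hand to illustrate the patterns and are not experimental data.

```python
def zz_syndromes(outcomes):
    """Error signals s_D[m] = M_A[m] * M_A[m - 2] for every m >= 2."""
    return [outcomes[m] * outcomes[m - 2] for m in range(2, len(outcomes))]


# A single bit flip between rounds 2 and 3 switches the parity from even to
# odd, so the ancilla outcome starts alternating: one isolated error signal.
single_error = zz_syndromes([+1, +1, +1, -1, +1, -1])

# A hand-built record whose error signal fires every round, the pattern
# associated with a leaked data qubit in the ZZ-check experiment.
repeated_errors = zz_syndromes([+1, +1, -1, -1, +1, +1, -1, -1])

# A constant run of -1 outcomes (the ancilla-leakage pattern) gives no error
# signal at all, which is why ancilla leakage is read from M_A directly.
ancilla_like = zz_syndromes([-1, -1, -1, -1, -1])
```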

Neither of the above patterns is entirely unique to leakage; each may also be produced by some combination of qubit errors. Therefore, we cannot unambiguously diagnose corruption by leakage in an individual experimental run. However, given a set of ancilla measurements *M*_{A}[0], …, *M*_{A}[*m*], the likelihood *L*_{comp,Q} that a qubit *Q* is in the computational subspace during the final parity checks is well defined. In this work, we infer *L*_{comp,Q} using an HMM (*36*), which treats the system as leaking out of and seeping back to the computational subspace in a stochastic fashion between each measurement round (a leakage HMM in its simplest form is shown in Fig. 3A and further described in the “Hidden Markov models,” “HMMs for QEC experiments,” and “Simplest models for leakage discrimination” sections). This may be extended to scalable leakage detection (for the purposes of leakage mitigation) in a larger QEC code, by using a separate HMM for each data qubit and ancilla. To improve the validity of the HMMs, we extend their internal states to allow the modeling of additional noise processes in the experiments (detailed in the “Modeling additional noise” and “HMMs used in Figs. 2 and 4” sections).

Before assessing the ability of our HMMs to improve fidelity in a leakage mitigation scheme, we first validate and benchmark them internally. A common method to validate the HMM’s ability to model the experiment is to compare statistics of the experimentally generated data to a simulated dataset generated by the model itself. As we are concerned only with the ability of the HMM to discriminate leakage, we compare histograms of 10^{5} experimental and simulated experiments, binned according to *L*_{comp,Q}. To further validate our models, we calculate the Akaike information criterion (*37*), *A*(*H*) = 2 log *L*_{max}(*H*) − 2*n*_{p,M}, where *L*_{max}(*H*) is the likelihood of the observations given the model *H* (maximized over all parameters *p*_{i} in the model, as listed in Table 1), and *n*_{p,M} is the number of parameters *p*_{i}. The number *A*(*H*) is rather meaningless by itself; we require a comparison model *H*^{(comp)} for reference. Our model is preferred over the comparison model whenever *A*(*H*) > *A*(*H*^{(comp)}). For comparison, we take the target HMM *H*, remove all parameters describing leakage, and re-optimize. We find the difference *A*(*H*) − *A*(*H*^{(comp)}) = 1.1 × 10^{5} for the data-qubit HMM and 2.1 × 10^{4} for the ancilla HMM, giving significant preference for the inclusion of leakage in both cases. [The added internal states beyond the simple two-state HMMs clearly improve the overlap in histograms (fig. S10, A and B). The added complexity is further justified by the Akaike information criterion (see the Supplementary Materials).]
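The model comparison can be sketched in a few lines. This example assumes the sign convention in which a larger score is preferred, as the comparison in the text implies (the textbook definition of the Akaike information criterion carries the opposite sign). The numbers are hypothetical and chosen only to show the bookkeeping.

```python
def akaike_score(max_log_likelihood, n_params):
    """Score 2 * log(L_max) - 2 * n_params; larger means preferred here."""
    return 2.0 * max_log_likelihood - 2.0 * n_params


def is_preferred(model_score, comparison_score):
    """True when the model beats the comparison model."""
    return model_score > comparison_score


# Hypothetical values, for illustration only.
with_leakage = akaike_score(max_log_likelihood=-1000.0, n_params=8)
without_leakage = akaike_score(max_log_likelihood=-1100.0, n_params=6)
difference = with_leakage - without_leakage
```

The extra parameters of the leakage model are penalized, so it is preferred only if the gain in log-likelihood outweighs the added parameter count.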

The above validation suggests that we may assume that the ratio of actual leakage events at a given *L*_{comp,Q} is well approximated by *L*_{comp,Q} itself (which is true for the simulated data). Under this assumption, we expose the HMM discrimination ability by plotting its receiver operating characteristic (ROC) (*38*). The ROC (Fig. 3F) is a parametric plot (sweeping a threshold on *L*_{comp,Q}) of the true positive rate (TPR) versus the false positive rate (FPR). Random rejection follows the line *y* = *x*; the better the detection, the greater the upward shift. Both ROCs indicate that most of the leakage (TPR = 0.7) can be efficiently removed with FPR ∼ 0.1. Individual mappings of TPR and FPR as a function of […] (*15*).
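A ROC curve of this kind can be traced with a simple threshold sweep. The sketch below assumes that a run is flagged as leaked when its computational likelihood falls below the threshold and that ground-truth leakage labels are available (as they would be for simulated data). The toy scores and labels are invented for illustration.

```python
def roc_points(l_comp, leaked, thresholds):
    """Return one (FPR, TPR) pair per threshold.

    l_comp: computational likelihood of each run.
    leaked: True for runs that actually leaked (ground truth).
    """
    n_leaked = sum(leaked)
    n_clean = len(leaked) - n_leaked
    points = []
    for threshold in thresholds:
        flagged = [value < threshold for value in l_comp]
        true_pos = sum(f and k for f, k in zip(flagged, leaked))
        false_pos = sum(f and not k for f, k in zip(flagged, leaked))
        points.append((false_pos / n_clean, true_pos / n_leaked))
    return points


# Toy data: three clean runs and three leaked runs.
scores = [0.98, 0.95, 0.40, 0.90, 0.20, 0.05]
labels = [False, False, False, True, True, True]
points = roc_points(scores, labels, thresholds=[0.0, 0.5, 1.01])
```

Raising the threshold moves along the curve from (0, 0), where nothing is rejected, to (1, 1), where everything is rejected; a useful detector passes well above the diagonal in between.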

We now verify and externally benchmark our HMMs by their ability to improve 〈*Z* ⊗ *Z*〉 by rejecting data with a high probability of leakage. To do this, we set a threshold on *L*_{comp,Q} and reject runs that fall below it. This restores 〈*Z* ⊗ *Z*〉 to its first-round value across the entire curve (Fig. 3G), mildly reducing *f*_{post} to 0.82 (averaged over *M*). This restoration from leakage is confirmed by the “final + HMM” data matching the no-leakage model results in fig. S8 (A to D). As low *L*_{comp,Q} can also result from qubit errors, the increase in 〈*Z* ⊗ *Z*〉 is partly due to false positives. Of the ∼0.13 increase at *M* = 25, we attribute 0.07 to actual leakage (estimated from the ROCs). By comparison, the simple two-state HMM leads to a lower improvement while rejecting a larger part of the data (fig. S10G), ultimately justifying the increased HMM complexity in this particular experiment.

### Protecting entanglement from general qubit errors and mitigation of leakage

We lastly demonstrate leakage mitigation in the more interesting scenario where ∣Φ^{+}〉 is protected from general qubit errors by interleaving ZZ and XX checks (*28*, *31*). ZZ may be converted to XX by adding π/2 *y* rotations on the data qubits simultaneous with those on *Q*_{A}. This requires that we change the definition of the syndrome to *s*_{D}[*m*] = *M*_{A}[*m*] · *M*_{A}[*m* − 1] · *M*_{A}[*m* − 2] · *M*_{A}[*m* − 3], as we need to “undo” the interleaving of the ZZ and XX checks to detect errors. For an input state […] *X* and/or *Z* on *Q*_{DH}, we find *F*_{∣Φ+〉} = 83.8 ± 0.8% (fig. S6). For subsequent rounds, the “final” strategy now relies on the final three *M*_{A}. We observe a decay toward a steady state *F*_{∣Φ+〉} = 73.7 ± 0.9% (Fig. 4), consistent with previously observed leakage. We battle this decay by adapting the HMMs (detailed in the “Modeling additional noise” and “HMMs used in Figs. 2 and 4” sections). We find an improved ROC for *Q*_{A} leakage (fig. S7). For data-qubit leakage, however, the ROC is degraded. This is to be expected: when one data qubit is leaked in this experiment, the ancilla effectively performs interleaved *Z* and *X* measurements on the unleaked qubit. This leads to a signal of random noise *P*(*s*_{D}[*m*] = −1) = 0.5, which is less distinguishable from unleaked experiments *P*(*s*_{D}[*m*] = −1) ∼ 0 than the signal of a leaked data qubit during the 〈*Z* ⊗ *Z*〉-only experiment *P*(*s*_{D}[*m*] = −1) ∼ 1. Thresholding to TPR = 0.7 restores 〈*X* ⊗ *X*〉 and 〈*Z* ⊗ *Z*〉, leading to an almost constant *F*_{∣Φ+〉} = 82.8 ± 0.2% with *f*_{post} = 0.81 (averaged over *M*), as expected from the no-leakage model results in fig. S8 (E to H). In this experiment, the simple two-state HMMs perform almost identically to the complex HMM, achieving Bell-state fidelities within 2% while retaining the same amount of data (fig. S10N).

## DISCUSSION

This HMM demonstration provides exciting prospects for leakage detection and correction. In larger systems, independent HMMs can be dedicated to each qubit because leakage produces local error signals (*16*). An HMM for an ancilla only needs its measurement outcomes, while a data-qubit HMM only needs the outcomes of the nearest-neighbor ancillas (see the Supplementary Materials for details). Therefore, the required computational power grows linearly with the number of qubits, making the HMMs a small overhead when running parallel to MWPM. HMM outputs could be used as inputs to MWPM, allowing MWPM to dynamically adjust its weights. The outputs could also be used to trigger leakage reduction units (*14*–*17*) or qubit resets (*39*).

In summary, we have performed the first experimental investigation of leakage detection during repetitive parity checking, successfully protecting an entangled state from qubit errors and leakage in a circuit quantum electrodynamics processor. Future work will extend this protection to logical qubits, e.g., the 17-qubit surface code (*35*, *40*). The low technical overhead and scalability of HMMs are attractive for performing leakage detection and correction in real time using the same parity outcomes as traditionally used to correct qubit errors only.

## MATERIALS AND METHODS

### Device

Our quantum processor (fig. S1) follows a three-qubit–frequency extensible layout with nearest-neighbor interactions that is designed for the surface code (*41*). Our chip contains low- and high-frequency data qubits (*Q*_{DL} and *Q*_{DH}) and an intermediate-frequency ancilla (*Q*_{A}). SQ gates around axes in the equatorial plane of the Bloch sphere are performed via a dedicated microwave drive line for each qubit. Two-qubit interactions between nearest neighbors are mediated by a dedicated bus resonator (extensible to four per qubit) and controlled by individual tuning of qubit transition frequencies via dedicated flux-bias lines (*42*). For measurement, each qubit is dispersively coupled to a dedicated RR, which is itself connected to a common feedline via a dedicated PR. The RR-PR pairs allow frequency-multiplexed readout of selected qubits with negligible backaction on untargeted qubits (*8*).

### Hidden Markov models

HMMs provide an efficient tool for indirect inference of the state of a system given a set of output data (*36*). An HMM describes a time-dependent system as evolving between a set of *N*_{h} hidden states {*h*} and returning one of *N*_{o} outputs {*o*} at each timestep *m*. The evolution is stochastic: the system state *H*[*m*] at timestep *m* depends probabilistically on the state *H*[*m* − 1] at the previous timestep, with probabilities determined by an *N*_{h} × *N*_{h} transition matrix *A*

$$A_{h,h'} = P(H[m] = h \mid H[m-1] = h')$$

The user cannot directly observe the system state and must infer it from the outputs *O*[*m*] ∈ {*o*} at each timestep *m*. This output is also stochastic: *O*[*m*] depends on *H*[*m*] as determined by an *N*_{o} × *N*_{h} output matrix *B*

$$B_{o,h} = P(O[m] = o \mid H[m] = h)$$

If the *A* and *B* matrices are known, along with the expected distribution π of the system state over the *N*_{h} possibilities at the first timestep, one may infer the distribution of the system state at any later timestep *m* from the outputs observed up to that point.
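The inference step can be illustrated with the standard HMM forward (filtering) recursion. This is a minimal sketch, not necessarily the authors' implementation: it assumes `A[h][h_prev]` holds the probability of moving from `h_prev` to `h`, `B[o][h]` the probability of output `o` in state `h`, outputs encoded as integers, and a prior that applies at the first timestep. The matrix values are invented for illustration.

```python
def forward_filter(A, B, prior, outputs):
    """Return the distribution over hidden states after each output.

    A[h][h_prev] = P(H[m] = h | H[m-1] = h_prev)
    B[o][h]      = P(O[m] = o | H[m] = h)
    prior[h]     = P(H[0] = h) before any output is seen
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for m, o in enumerate(outputs):
        if m > 0:
            # Propagate the belief through one stochastic transition.
            belief = [sum(A[h][hp] * belief[hp] for hp in range(n))
                      for h in range(n)]
        # Weight by the likelihood of the observed output, then normalize.
        weighted = [B[o][h] * belief[h] for h in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


# Illustrative two-state example: state 0 is "quiet", state 1 tends to emit 1.
A = [[0.99, 0.10],
     [0.01, 0.90]]
B = [[0.95, 0.05],
     [0.05, 0.95]]
history = forward_filter(A, B, prior=[1.0, 0.0], outputs=[0, 1, 1, 1])
```

Each repeated output of 1 shifts more weight onto the second hidden state, which is the mechanism by which a run of error signals raises the inferred leakage probability.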

### HMMs for QEC experiments

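To maximize the discrimination ability of HMMs in the various settings studied in this work, we choose different quantities to use for our output vectors {*o*}. Data-qubit leakage is signaled by repeated error signals *s*_{D}[*m*] = −1, so we choose *O*[*m*] = *s*_{D}[*m*] for the data-qubit HMMs. Ancilla leakage is signaled by repeated outcomes *M*_{A}[*m*] = −1, and so, we choose *O*[*m*] = *M*_{A}[*m*] for the ancilla HMMs.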

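One may predict the computational likelihood for data-qubit (D) leakage at timestep *M* in the ZZ-check experiment given the inferred distribution of the hidden state at that timestep. In particular, if a subset of the hidden states *h* correspond to leakage, we may write

$$L_{\mathrm{comp,D}} = 1 - \sum_{h \in \mathrm{leaked}} P(H[M] = h \mid O[0], \ldots, O[M])$$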

However, in the repeated ZZ-check experiment, the ancilla (A) needs to be within the computational subspace for two rounds to perform a correct parity measurement. Therefore, the computational likelihood is slightly more complicated to calculate

In the interleaved ZZ- and XX-check experiment, the situation is more complicated as we require data from the final two parity checks to fully characterize the quantum state. This implies that we need unleaked data qubits for the last two rounds and unleaked ancillas for the last three. The likelihood of the latter may be calculated by similar means to the above.

### Simplest models for leakage discrimination

One need not capture the full dynamics of the quantum system in an HMM to infer whether a qubit is leaked. This is of critical importance if we wish to extend this method for the purposes of leakage mitigation in a large QEC code (as we discuss in the Supplementary Materials). The simplest possible HMM (Fig. 3A) has two hidden states: *H*[*m*] = 1 if the qubit(s) in question are within the computational subspace and *H*[*m*] = 2 if *Q*_{A} (or either data qubit) is leaked. The labels 1 and 2 are arbitrary here and explicitly have no correlation with the qubit states ∣1⟩ and ∣2⟩. Then, the 2 × 2 transition matrix simply captures the leakage and seepage rates of the system in question (columns index the previous state, rows the new state)

$$A = \begin{pmatrix} 1 - p_{\mathrm{leak}} & p_{\mathrm{seep}} \\ p_{\mathrm{leak}} & 1 - p_{\mathrm{seep}} \end{pmatrix}$$

The 2 × 2 output matrices then capture the different probabilities of seeing output *O*[*m*] = 0 or *O*[*m*] = 1 when the qubit(s) are leaked or unleaked (rows index the output, columns the hidden state)

$$B = \begin{pmatrix} 1 - p_{0,1} & p_{1,0} \\ p_{0,1} & 1 - p_{1,0} \end{pmatrix}$$

When studying data-qubit leakage, *p*_{0,1} simply captures the rate of errors within the computational subspace. Then, in the repeated ZZ-check experiment, *p*_{1,0} captures events such as ancilla or measurement errors that cancel the error signal of a leakage event. However, in the interleaved ZZ and XX experiments, a leaked qubit causes the syndrome to be random, so we expect *p*_{1,0} ∼ 0.5. When studying ancilla leakage, *p*_{1,0} is simply the probability of the ∣2⟩ state being read out as ∣0⟩ and is also expected to be close to 0. However, *p*_{0,1} ∼ 0.5, as we do not reset *Q*_{A} or the logical state between rounds of measurement, and thus, any measurement in isolation is roughly equally likely to be 0 or 1. In all situations, we assume that the system begins in the computational subspace: π_{n}(0) = δ_{n,0}. With this fixed, we may choose the parameters *p*_{leak}, *p*_{seep}, *p*_{0,1}, and *p*_{1,0} to maximize the likelihood of observing the recorded outputs {*o*}. Note that this likelihood is distinct from the computational likelihood *L*_{comp,Q}.
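To make the two-state model concrete, the sketch below builds the 2 × 2 transition and output matrices from the four parameters, then evaluates both the log-likelihood of a record of outputs (the quantity maximized during fitting) and the final probability of being unleaked. It assumes outputs encoded as 0 (no signal) and 1 (signal), a column-stochastic matrix convention, and invented parameter values. Taking the final unleaked probability as the computational likelihood is a simplification that corresponds to the data-qubit case of the ZZ-check experiment.

```python
import math


def two_state_hmm(p_leak, p_seep, p01, p10):
    """Transition matrix A[new][old] and output matrix B[output][state]."""
    A = [[1 - p_leak, p_seep],
         [p_leak, 1 - p_seep]]
    B = [[1 - p01, p10],
         [p01, 1 - p10]]
    return A, B


def log_likelihood_and_l_comp(outputs, p_leak, p_seep, p01, p10):
    """Return (log-likelihood of the outputs, final unleaked probability)."""
    A, B = two_state_hmm(p_leak, p_seep, p01, p10)
    belief = [1.0, 0.0]  # the system starts in the computational subspace
    log_like = 0.0
    for m, o in enumerate(outputs):
        if m > 0:
            belief = [A[0][0] * belief[0] + A[0][1] * belief[1],
                      A[1][0] * belief[0] + A[1][1] * belief[1]]
        weighted = [B[o][0] * belief[0], B[o][1] * belief[1]]
        total = sum(weighted)
        log_like += math.log(total)
        belief = [w / total for w in weighted]
    return log_like, belief[0]


# Illustrative parameter values, not the fitted values of Table 1.
params = dict(p_leak=0.005, p_seep=0.1, p01=0.05, p10=0.05)
ll_quiet, l_comp_quiet = log_likelihood_and_l_comp([0] * 6, **params)
ll_noisy, l_comp_noisy = log_likelihood_and_l_comp([0, 0, 1, 1, 1, 1], **params)

# The same noisy record under a model with leakage switched off.
ll_noisy_no_leak, _ = log_likelihood_and_l_comp(
    [0, 0, 1, 1, 1, 1], p_leak=0.0, p_seep=0.1, p01=0.05, p10=0.05)
```

A run of repeated signals is explained far better by the model that includes leakage, which is why the fitted likelihood can distinguish the two models, while the final unleaked probability serves as the per-run quantity to threshold.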

### Modeling additional noise

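The simple model described above does not completely capture all of the details of the stabilizer measurements. To model additional noise processes, we extend the set of hidden states and write the transition and output matrices as error-free matrices *A*_{0} and *B*_{0} perturbed by error channels with rates *p*_{i}

$$A = A_0 + \sum_i p_i D_i^{(A)}, \qquad B = B_0 + \sum_i p_i D_i^{(B)}$$

[Here, we label the *D* matrices by (A) and (B) to emphasize that each error channel only appears in one of the two above equations.]

The error generators *D*^{(A)} and *D*^{(B)} may be identified as derivatives of *A* and *B* with respect to these error rates

$$D_i^{(A)} = \frac{\partial A}{\partial p_i}, \qquad D_i^{(B)} = \frac{\partial B}{\partial p_i}$$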

This may be extended to calculate derivatives of the likelihood with respect to the parameters *p*_{i}. This allows us to obtain the maximum likelihood model within our parametrization via gradient descent methods [in particular the Newton-Conjugate Gradient (CG) method] instead of resorting to more complicated optimization algorithms such as the Baum-Welch algorithm (*36*). All models were averaged over between 10 and 20 optimizations using the Newton-CG method in SciPy (*43*), calculating likelihoods, gradients, and Hessians over 10,000 to 20,000 experiments per iteration and rejecting any failed optimizations. As the signal of ancilla leakage is identical to the signal for even ZZ and XX parities with ancilla in ∣1⟩ and no errors, we find that the optimization is unable to accurately estimate the ancilla leakage rate, and so, we fix this in accordance with independent calibration to 0.0040 per round using averaged homodyne detection of ∣2⟩ (making use of a slightly different homodyne voltage for ∣1⟩ and ∣2⟩).

### HMMs used in Figs. 2 and 4

Different Markov models (with independently optimized parameters) were used to optimize ancilla and data-qubit leakage estimation for both the ZZ experiment and the experiment interleaving ZZ and XX checks. This leads to a total of four HMMs, which we label *H*_{ZZ}-D, *H*_{ZZ}-A, *H*_{ZZ, XX}-D, and *H*_{ZZ, XX}-A. A complete list of parameter values used in each HMM is given in Table 1. We now describe the features captured by each HMM. As we show in the Supplementary Materials, these additional features are not needed to increase the error mitigation performance of the HMMs but rather to ensure their closeness to the experiment and increase trust in their internal metrics.

To go beyond the simple HMM in the ZZ-check experiment when modeling data-qubit leakage (*H*_{ZZ}-D), we need to include additional states to account for the correlated signals of ancilla and readout errors. If we assume data-qubit errors (that remain within the logical subspace) are uncorrelated in time, then they are already well captured in the simple model. This is because any single error on a data qubit may be decomposed into a combination of *Z* errors (which commute with the measurement and thus are not detected) and *X* errors (which anticommute with the measurement and thus produce a single error signal *s*_{D}[*m*] = −1) and is thus captured by the *p*_{0,1} parameter. When one of the data qubits is leaked, uncorrelated *X* errors on the other data qubit cancel the constant *s*_{D}[*m*] = −1 signal for a single round and are thus captured by the *p*_{1,0} parameter. However, errors on the ancilla, and readout errors, give error signals that are correlated in time (separated by one or two timesteps, respectively). This may be accounted for by including extra “ancilla error states.” These may be most easily labeled by making the *h* labels a tuple *h* = (*h*_{0}, *h*_{1}), where *h*_{0} keeps track of whether or not the qubit is leaked, and *h*_{1} ∈ {0, 1, 2, 3} keeps track of whether or not a correlated error has occurred. In particular, we encode the future syndrome for two cycles in the absence of error on *h*_{1}, allowing us to account for any correlations up to two rounds in the future. This extends the model to a total of 4 × 2 = 8 states. The transition and output matrices in the absence of error for the unleaked *h*_{0} = 0 states may then be written in a compact form (noting that leakage errors cancel out with correlated ancilla and readout errors to give *s*_{D}[*m*] = +1)

Let us briefly demonstrate how the above works for an ancilla error in the system. Suppose the system is in the state *h* = (0,3) at time *m*. It would output *M*_{A}[*m*] = −1 and then evolve to *h* = (0,3//2) = (0,1) at time *m* + 1 (in the absence of additional error). Then, it would output a second error signal [*M*_{A}[*m* + 1] = −1] and lastly decay back to the *h* = (0,1//2) = (0,0) state. This gives the HMM the ability to model the ancilla error as an evolution from *h* = (0,0) to *h* = (0,3). Formally, we assign the matrix
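The error-free evolution in this worked example behaves like a two-bit shift register: the lowest bit of *h*_{1} decides the current output, and integer division by 2 moves the remaining bit forward. The sketch below covers only this error-free update for the unleaked states, as described in the example; the function names are illustrative, and the error channels themselves are not modeled.

```python
def step(h1):
    """One error-free round: emit a signal from the low bit, then shift."""
    signal = -1 if h1 % 2 else +1
    return signal, h1 // 2


def emitted_signals(h1, rounds):
    """Signals produced over several error-free rounds starting from h1."""
    signals = []
    for _ in range(rounds):
        signal, h1 = step(h1)
        signals.append(signal)
    return signals


# Starting in h1 = 3 (the state reached after an ancilla error): two
# consecutive error signals, then the register is empty again.
after_ancilla_error = emitted_signals(3, rounds=3)

# Starting in h1 = 2: the error signal is delayed by one round.
delayed_signal = emitted_signals(2, rounds=3)
```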

The corresponding error rate *p*_{ancilla} is then an additional free parameter to be optimized to maximize the likelihood. To finish the characterization of this error channel, we need to consider the effect of the ancilla error in states other than *h* = (0,0). Two ancilla errors in the same timestep cancel, but two ancilla errors in subsequent timesteps will cause the signature *s*_{D} = …, −1, +1, −1, …. This may be captured by an evolution from *h* = (0,2) to *h* = (0,3) [instead of *h* = (0,1)], which implies that we should set

[Note that *A*_{0} already captures a decay from *h* = (0,2) → (0,1) → (0,0), which will give the desired signal.] We note that this also matches the signature of readout error, which can then be captured by a separate error channel

One can then check that ancilla errors in *h* = (0,3) should cause the system to remain in *h* = (0,3) and that ancilla or readout errors in *h* = (0,1) should evolve the system to *h* = (0,2). We note that this model cannot account for the *s*_{D} = … −1, +1, +1, +1, −1 signature of readout error at time *m* and *m* + 2, but adjusting the model to include this has negligible effect.

An ancilla error in the ZZ-check experiment when the data qubits are leaked has the same correlated behavior as when they are not but may occur at a different rate. This requires that we define a new error-generator matrix with its own rate *p*_{ancilla,leaked}. As we do not expect the readout of the ancilla to be significantly affected by whether the data qubit is leaked, we do not add an extra parameter to account for this behavior and instead simply set

We also assume that the leakage (*p*_{leak}) and seepage (*p*_{seep}) rates are independent of these correlated errors […] [corresponding to an evolution to (*h*_{0},1)] or not [corresponding to an evolution to (*h*_{0},0)]. We lastly account for data-qubit error in the output matrices in the same way as in the simple model but with different error rates *p*_{data,leaked} for the leaked states (1, *h*_{1}) and *p*_{data} for the unleaked states (0, *h*_{1}).

There are a few key differences between the interleaved ZZ-XX and ZZ experiments that need to be captured in the data-qubit HMM *H*_{ZZ,XX}-D. First, as the syndrome is now given by *s*_{D}[*m*] = *M*_{A}[*m*] · *M*_{A}[*m* − 1] · *M*_{A}[*m* − 2] · *M*_{A}[*m* − 3], ancilla and classical readout errors can generate a signal stretching up to four steps in time. This implies that we require 2^{4} possibilities for *h*_{1} to keep track of all correlations. However, as a leaked data qubit makes the ancilla output random in principle, we no longer need to keep track of the ancilla output upon leakage. This implies that we can accurately model the experiment with 16 + 1 = 17 states, which we can label by *h* ∈ {2, (1, *h*_{1})}. The *A*_{0} and *B*_{0} matrices in the unleaked states (1, *h*_{1}) follow Eq. 15, and we fix [*A*_{0}]_{2,2} = 1 (as in the absence of *p*_{seep}, a leaked state stays leaked). However, we allow for some bias in the leaked state error rate: *B*_{−1,2} = *p*_{data,leaked} is not fixed to 0.5. For example, this accounts for a measurement bias toward a single state, which will reduce the error rate below 0.5. The nonzero elements in the matrices

Here, *a* ⊕ *b* refers to the addition of each binary digit of *a* and *b* modulo 2. We may use this formalism to additionally keep track of *Y* data-qubit errors, which show up as correlated errors on subsequent XX and ZZ stabilizer checks, by introducing a new error channel with rate *p*_{data,Y}. As before, we assume that leakage occurs at a rate *p*_{leak} independently of *h*_{1} and that seepage takes the system, with a rate *p*_{seep}, to the state with either no error signal, *h* = (1,0), or one error signal, *h* = (1,1).

As the output used for the H_{ZZ}-A HMM is the pure measurement outcome *M*_{A}, the dominant signal that must be accounted for is that of the stabilizer ZZ itself. This causes either a constant signal *M*_{A}[*m*] = *M*_{A}[*m* − 1] or a constant flipping signal *M*_{A}[*m*] = −*M*_{A}[*m* − 1]. This cannot be accounted for in the simple HMM, as it cannot contain any history in a single unleaked state. To deal with this, we extend the set of states in the H_{ZZ}-A HMM to include both an estimate of the ancilla state *a* ∈ {0,1,2} at the point of measurement and the stabilizer state *s* ∈ {0,1} and label the states by the tuple (*a*, *s*). The ancilla state then immediately defines the device output in the absence of any error

The only thing that affects the output matrices is readout error

Data-qubit errors flip the stabilizer with probability *p*_{data}

Ancilla errors flip the ancilla with probability *p*_{ancilla}, but these are dominated by *T*_{1} decay and so are highly asymmetric. To account for this, we used different error rates *p*_{anc,a,a′} for the four possible combinations of ancilla measurement at time *m* − 1 and expected ancilla measurement at time *m*

(Note that this asymmetry could not be accounted for in the data-qubit HMM as the state of the ancilla was not contained within the output vector.) As with the data-qubit HMMs, we assume that ancilla leakage is HMM state independent, as it is dominated by CZ gates during the time that the ancilla is either in ∣+⟩ or ∣−⟩. We also assume that leakage (with rate *p*_{leak}) and seepage (with rate *p*_{seep}) have equal chances to flip the stabilizer state, as ancilla leakage has a good chance to cause additional error on the data qubits.

The ancilla-qubit HMMs need little adjustment between the ZZ-check experiment and the experiment interleaving ZZ and XX checks. The H_{ZZ,XX}-A HMM behaves almost identically to the H_{ZZ}-A HMM, but the state now includes information on both the XX stabilizer and the ZZ stabilizer. This leaves the states indexed as (*a*, *s*_{1}, *s*_{2}). The HMM also needs to keep track of which stabilizer is being measured. This may be achieved by shuffling the stabilizer labels at each time step: For *a* = 0,1, we set

Other than this, the HMM follows the same equations as above (with the additional index added as expected).
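Each HMM in this section is specified by a transition matrix and an output matrix, from which the probability of being in a leaked state can be updated after every measurement. The sketch below shows such an update for a generic two-state model (unleaked, leaked). It is only an illustration under stated assumptions: the index convention `A[h][g]` = P(next state *h* | current state *g*), the order of the propagate and update steps, the 0/1 output encoding, and all numerical rates are hypothetical and are not the fitted values or the exact models used in the experiment.

```python
def forward_filter(A, B, prior, record):
    """Return the state distribution after each observation in `record`.

    A[h][g] = P(next state h | current state g)   (assumed convention)
    B[o][h] = P(output o | state h)
    """
    n = len(prior)
    belief = list(prior)
    history = []
    for o in record:
        # Propagate one time step through the transition matrix.
        predicted = [sum(A[h][g] * belief[g] for g in range(n)) for h in range(n)]
        # Weight by the likelihood of the observed output, then normalize.
        weighted = [B[o][h] * predicted[h] for h in range(n)]
        norm = sum(weighted)
        belief = [w / norm for w in weighted]
        history.append(belief)
    return history


# Hypothetical rates, chosen only to make the example readable.
p_leak, p_seep, p_data = 0.01, 0.1, 0.05

# State 0 = unleaked, state 1 = leaked.
A = [[1 - p_leak, p_seep],
     [p_leak, 1 - p_seep]]
# Output 0 = no error signal, output 1 = error signal.
# A leaked state is assumed here to give an unbiased random output.
B = [[1 - p_data, 0.5],
     [p_data, 0.5]]
prior = [1.0, 0.0]

for belief in forward_filter(A, B, prior, [1, 1, 1, 1]):
    print(f"P(leaked) = {belief[1]:.3f}")
```

With these illustrative numbers, the leaked-state probability grows with each consecutive error signal, while a record without error signals keeps it small. The cost per measurement is fixed by the number of states, so the update can run alongside the measurement record.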

### Uncertainty calculations

All quoted uncertainties are estimates of the SEM. SEMs for the independent device characterizations (see the “Generating entanglement by measurement” section and table S1) are obtained either from at least three individually fitted repeated experiments (*T*_{1}, *e*_{a}, *e*_{a,ZZ}) or, when the quantity is measured only once (*e*_{SQ}, *e*_{CZ}, *L*_{1}), from least-squares fitting with the LmFit fitting module using the covariance matrix (*44*).

SEMs in the first-round Bell-state fidelities (Fig. 1 and fig. S6 and see the “Generating entanglement by measurement,” “Protecting entanglement from bit flips and the observation of leakage,” “Leakage detection using HMMs,” and “Protecting entanglement from general qubit errors and mitigation of leakage” sections) are obtained through bootstrapping. For bootstrapping, a dataset (4096 runs in total, each with 36 tomographic elements and 28 calibration points) is subdivided into four subsets, and tomography is performed on each of these subsets individually. As verification, the subdivision was repeated with eight subsets, leading to similar SEMs. SEMs in the multiround experiment parameters (steady-state fidelities and decay constants) are also estimated from least-squares fitting with the LmFit fitting module using the covariance matrix (see the “Protecting entanglement from bit flips and the observation of leakage,” “Leakage detection using HMMs,” and “Protecting entanglement from general qubit errors and mitigation of leakage” sections) (*44*).
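The subdivision procedure can be sketched as follows. The text does not state the formula used to combine the subset results, so this example assumes the usual one: the sample standard deviation of the subset estimates divided by the square root of the number of subsets. The function `sem_from_subsets` and the `estimator` argument are hypothetical; `estimator` stands in for the full state tomography, which is not reproduced here.

```python
import math
import statistics


def sem_from_subsets(runs, estimator, n_subsets=4):
    """Estimate the SEM of `estimator` by splitting `runs` into equal subsets.

    `estimator` maps a list of runs to a single number (for example a
    fidelity). Runs that do not fit into equal subsets are dropped.
    """
    size = len(runs) // n_subsets
    estimates = [
        estimator(runs[i * size:(i + 1) * size]) for i in range(n_subsets)
    ]
    # Assumed formula: sample standard deviation over sqrt(number of subsets).
    return statistics.stdev(estimates) / math.sqrt(n_subsets)


# Toy data: the subset means are 1, 3, 1, 3.
toy_runs = [1, 1, 1, 1, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3]
sem = sem_from_subsets(toy_runs, statistics.mean, n_subsets=4)
print(round(sem, 4))  # 0.5774
```

Repeating the call with `n_subsets=8` on the same data mirrors the consistency check described above.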

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/6/12/eaay3050/DC1

Supplementary Materials and Methods

Supplementary Text

Fig. S1. Quantum processor.

Fig. S2. Complete wiring diagram of electronic components inside and outside the ^{3}He/^{4}He dilution refrigerator (Leiden Cryogenics CF-CS81).

Fig. S3. Study of data-qubit coherence and phase accrual during ancilla measurement.

Fig. S4. Quantum circuit for Bell-state idling experiments under dynamical decoupling.

Fig. S5. Postselected fractions for the “no error” conditioning in Figs. 2 and 4.

Fig. S6. Generating entanglement by sequential ZZ and XX parity measurements and PFU.

Fig. S7. ROCs for mitigation of data-qubit and ancilla leakage during interleaved ZZ and XX checks.

Fig. S8. Comparison of experimental data and no-leakage modeling of the repeated parity check experiments of Figs. 2 and 4.

Fig. S9. Leakage mitigation for the repeated parity check experiments as a function of the chosen threshold.

Fig. S10. Leakage mitigation for the simple, two-state HMMs for repeated parity check experiments as a function of the chosen threshold.

Fig. S11. ROCs for leakage mitigation as in fig. S7 but using simple two-state HMMs.

Table S1. Measured parameters of the three-transmon device.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:** We thank W. Oliver and G. Calusine for providing the parametric amplifier; J. van Oven and J. de Sterke for experimental assistance; and F. Battistel, C. Beenakker, C. Eichler, F. Luthi, B. Terhal, and A. Wallraff for discussions.

**Funding:** This research is supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the U.S. Army Research Office Grant No. W911NF-16-1-0071, and by Intel Corporation. T.E.O. is funded by Shell Global Solutions BV. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, or the U.S. Government. X.F. was funded by China Scholarship Council (CSC).

**Author contributions:** C.C.B. performed the experiment. R.V. and M.W.B. designed the device with input from C.C.B. and B.T. N.M. and A.B. fabricated the device. T.E.O. devised the HMMs with input from B.T., B.V., and V.O. X.F. and M.A.R. contributed to the experimental setup and tune-up. C.C.B., T.E.O., and L.D.C. co-wrote the manuscript with feedback from all authors. L.D.C. supervised the project.

**Competing interests:** The authors declare that they have no competing interests.

**Data and materials availability:** All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).