## Abstract

Investigating the role of causal order in quantum mechanics has recently revealed that the causal relations of events may not be a priori well defined in quantum theory. Although this has triggered a growing interest on the theoretical side, creating processes without a causal order is an experimental task. We report the first decisive demonstration of a process with an indefinite causal order. To do this, we quantify how incompatible our setup is with a definite causal order by measuring a “causal witness.” This mathematical object incorporates a series of measurements that are designed to yield a certain outcome only if the process under examination is not consistent with any well-defined causal order. In our experiment, we perform a measurement in a superposition of causal orders—without destroying the coherence—to acquire information both inside and outside of a “causally nonordered process.” Using this information, we experimentally determine a causal witness, demonstrating by almost 7 SDs that the experimentally implemented process does not have a definite causal order.

- Quantum Information
- Quantum Optics
- Quantum Foundations

## INTRODUCTION

The notion of causality is an innate concept, which defines the link between physical phenomena that temporally follow one another, with one phenomenon manifestly being the cause of the other. Nevertheless, in quantum mechanics, the concept of causality is not as straightforward. For example, when the superposition principle is applied to causal relations, situations without a definite causal order can arise (*1*, *2*). Although this can lead to disconcerting consequences, forcing one to question concepts that are commonly viewed as the main ingredients of our physical description of the world (*3*), these effects can be exploited to achieve improvements in computational complexity (*4*–*6*) and quantum communications (*7*–*9*). Recently, this computational advantage was experimentally demonstrated in the study of Procopio *et al.* (*10*). However, the absence of a causal order was inferred from the success of an algorithm rather than being directly measured. Here, we explicitly demonstrate the realization of a causally nonordered process by measuring a so-called “causal witness” (*11*).

To make our results stronger (that is, make the causal witness more robust to noise), we performed a superposition of the orders of a unitary gate and a measurement operation. In other words, we made a measurement inside a quantum process with an indefinite order of operations [the quantum SWITCH (*1*)]. Performing a standard measurement inside the quantum SWITCH would destroy its coherence, because it would reveal the time at which the measurement is performed and would thus also reveal whether it is performed before or after other operations. In other words, such a measurement would reveal the causal order between the operations. However, in our scheme, the measurement outcomes are read out only “at the end” of the process, thus preserving its coherence. Because applications of indefinite causal orders will most likely require the superposition of orders of complex quantum operations, we believe that, in addition to the first direct demonstration of an indefinite causal order, our measurement in a quantum SWITCH can also be considered a technological step toward these applications (*4*–*9*).

In our usual understanding of causal relations, if we consider two events *A* and *B*, which are connected by a time-like curve, then we will have one of two cases: Either “*A* is in the past of *B*,” or “*B* is in the past of *A*.” However, the application of the superposition principle to these causal relations calls into question the interpretation of causal orders as a preexisting property. The causal order can become genuinely indefinite. To see this, consider a two-qubit quantum state |ϕ〉 lying in the composite Hilbert space $\mathcal{H}^{C} \otimes \mathcal{H}^{T}$, with $\mathcal{H}^{C}$ and $\mathcal{H}^{T}$ each being two-dimensional Hilbert spaces. It is possible to condition the order in which operations are applied to a target state $|\psi\rangle^{T} \in \mathcal{H}^{T}$ on the value of a control state $|\phi\rangle^{C} \in \mathcal{H}^{C}$. For example, if the state of the control qubit is |0〉^{C}, then the two operators will be applied in the order *A* and then *B* on the state of the target qubit |ψ〉^{T}, and vice versa if the state of the control qubit is |1〉^{C}. Therefore, if the control qubit is in a superposition state $\frac{1}{\sqrt{2}}(|0\rangle^{C} + |1\rangle^{C})$, then a controlled quantum superposition of the situations “*A* is in the past of *B*” and “*B* is in the past of *A*” is established (Fig. 1). In the above situation, the causal order is not merely in a superposition. It is entangled with the state of the control qubit.
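The conditional ordering described above can be simulated in a few lines. The following is a minimal state-vector sketch of a quantum SWITCH acting on two *unitary* gates. It assumes a control-major ordering of the tensor product, and it assumes that control |0〉 means "*A* then *B*" (the operator product is `B @ A`). The function names and the choice of Pauli gates are illustrative and not taken from the experiment. With anticommuting gates the control qubit ends up in |−〉 with certainty. With commuting gates it ends up in |+〉. This kind of order-dependent interference has no counterpart in a circuit with a fixed gate order.

```python
import numpy as np

KET0 = np.array([1, 0], dtype=complex)
KET1 = np.array([0, 1], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli X (example gate)
Z = np.array([[1, 0], [0, -1]], dtype=complex)  # Pauli Z (example gate)


def quantum_switch(A, B, psi):
    """Control (x) target state after a SWITCH with control prepared in |+>.

    Control |0>: A acts first, then B  -> B @ A @ psi
    Control |1>: B acts first, then A  -> A @ B @ psi
    """
    branch_ab = np.kron(KET0, B @ A @ psi)  # "A is in the past of B"
    branch_ba = np.kron(KET1, A @ B @ psi)  # "B is in the past of A"
    return (branch_ab + branch_ba) / np.sqrt(2)


def control_minus_probability(state):
    """Probability of finding the control qubit in |-> = (|0> - |1>)/sqrt(2)."""
    amps = state.reshape(2, 2)                  # amps[control, target]
    target = (amps[0] - amps[1]) / np.sqrt(2)   # apply <-| to the control
    return float(np.vdot(target, target).real)


psi = KET0
print(control_minus_probability(quantum_switch(X, Z, psi)))  # anticommuting: ~1
print(control_minus_probability(quantum_switch(X, X, psi)))  # commuting: ~0
```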

From this simple example, we can see that the causal order between events is not always definite in quantum mechanics. One could, in the spirit of hidden-variable theories, insist that there might nonetheless be a well-defined causal order. However, such a claim requires, in general, a theory to be nonlocal and contextual because of the Bell and Kochen-Specker theorems (*12*, *13*).

The case described above, called the quantum SWITCH, is the first explicit example wherein it was shown that quantum mechanics does not allow for a well-defined causal order (*1*). The SWITCH was recently experimentally implemented (*10*) by superposing the order in which two unitary operations acted. That experiment confirmed that a causally nonordered quantum circuit can solve a specific computational problem more efficiently than an ordered quantum circuit. However, only indirect evidence of indefinite causal order was observed through the demonstration of this computational advantage. Therefore, the primary goal of our current experiment is to provide direct experimental proof of the causal nonseparability of the quantum SWITCH. For this purpose, we used a recently developed theoretical tool: the causal witness (*11*).

## RESULTS

### Theory

A causal witness is a carefully designed set of measurements, whose outcome will tell us if a given process is causally ordered or not. An intuitive way to introduce causal witnesses is through the well-known concept of an entanglement witness (*14*). First, recall that a composite quantum system ρ lying in a Hilbert space $\mathcal{H}^{A} \otimes \mathcal{H}^{B}$ is separable or entangled depending on whether it can be written in the form $\rho = \sum_{i} p_{i}\, \rho_{i}^{A} \otimes \rho_{i}^{B}$ (with $\rho_{i}^{A}$ and $\rho_{i}^{B}$ states of the subsystems *A* and *B* and 0 ≤ *p*_{i} ≤ 1, ∑_{i}*p*_{i} = 1) or not. Then, it can be shown that for all entangled states ρ^{ent}, there exists a Hermitian operator *S*, called an “entanglement witness,” such that Tr(Sρ^{ent}) < 0, but Tr(Sρ^{sep}) ≥ 0 for all separable states ρ^{sep}. Hence, it follows that if one measures an entanglement witness on a state and finds a negative value, then the state must be entangled.
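For a concrete instance of the entanglement-witness idea, consider the textbook witness for the Bell state |Φ⁺〉 = (|00〉 + |11〉)/√2, which is *S* = ½𝟙 − |Φ⁺〉〈Φ⁺|. Any product state has an overlap of at most ½ with |Φ⁺〉, so Tr(*S*ρ^{sep}) ≥ 0 for every separable ρ. The Bell state itself gives −½. The sketch below checks both properties numerically on randomly generated separable mixtures. The random-state construction is an illustrative assumption and is not part of the original text.

```python
import numpy as np

# Witness for |Phi+> = (|00> + |11>)/sqrt(2): S = 1/2 * Identity - |Phi+><Phi+|
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
S = 0.5 * np.eye(4) - np.outer(phi_plus, phi_plus.conj())


def expectation(witness, rho):
    """Tr(S rho); a negative value certifies entanglement."""
    return float(np.trace(witness @ rho).real)


def random_pure_qubit(rng):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)


def random_separable_state(rng, n_terms=5):
    """Convex mixture sum_i p_i rho_i^A (x) rho_i^B of random product states."""
    weights = rng.dirichlet(np.ones(n_terms))
    rho = np.zeros((4, 4), dtype=complex)
    for p in weights:
        product = np.kron(random_pure_qubit(rng), random_pure_qubit(rng))
        rho += p * np.outer(product, product.conj())
    return rho


rng = np.random.default_rng(seed=1)
rho_ent = np.outer(phi_plus, phi_plus.conj())
print(expectation(S, rho_ent))                           # about -0.5
print(expectation(S, random_separable_state(rng)) >= 0)  # True
```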

A similar quantity was recently introduced to determine whether a process matrix *W* is causally separable or not (*2*). A process matrix (the counterpart of the density matrix in the entanglement witness example) describes causal relations between local laboratories (*15*). Consider two observers Alice and Bob who perform local operations *M*^{A} and *M*^{B} (*M*^{A} and *M*^{B} can be arbitrary quantum operations, from simple unitary operations to more complex measurement channels). By local operations, we mean that the only connection that Alice and Bob have with the external world is given by the quantum state that they receive from it and the state that they return to it. The process matrix *W* then details how this quantum state moves between the two local laboratories (Fig. 2). Hence, it is independent of the individual operations that Alice and Bob perform. In the case of the quantum SWITCH, the process matrix first routes the input state to Alice and Bob in superposition, then connects Alice’s output to Bob’s input and vice versa, and finally coherently recombines their outputs.

Because a causal witness characterizes a process rather than a state (unlike an entanglement witness), it requires a procedure akin to “process tomography” (that is, “causal tomography,” see Materials and Methods). Namely, we must probe the process with several different input states ρ^{(in)}. Then, for each input state, Alice and Bob implement several different known operations, and then, we perform a final measurement *D*^{(out)} (Fig. 2). In general, *M*^{A} and *M*^{B} can include measurement operations; thus, each could have additional measurement outcomes associated with it. We denote the outcomes of Alice and Bob’s local operations by *a* and *b*, and their choice of operation by *x* and *y*, respectively. We label the specific choice of an input state with *z* and the output of a detection operation with *d*. With this in mind, the probability of obtaining the outcomes *a*, *b*, and *d* (associated with the operators $M^{A}_{a|x}$, $M^{B}_{b|y}$, and $D^{(out)}_{d}$), with the input state $\rho^{(in)}_{z}$, can be written, using the Choi-Jamiołkowski isomorphism (*16*) (see the Supplementary Materials), as

$$p(a, b, d \mid x, y, z) = \mathrm{Tr}\left[\left(\rho^{(in)}_{z} \otimes M^{A}_{a|x} \otimes M^{B}_{b|y} \otimes D^{(out)}_{d}\right) W\right] \tag{1}$$

with ∑_{a,b,d}*p*(*a*, *b*, *d*| *x*, *y*, *z*) = 1 for all the possible settings *x*, *y*, *z* and where *W* is the process matrix (*11*).

To calculate these probabilities for the quantum SWITCH, we must construct its process matrix, which we will call *W*_{SWITCH}. To do this, we will again use the Choi-Jamiołkowski isomorphism, which is a way of representing a linear operator that maps $\mathcal{H}^{X}$ to $\mathcal{H}^{Y}$ as a state in the composite Hilbert space $\mathcal{H}^{X} \otimes \mathcal{H}^{Y}$. As a first step, consider the identity channel from the output space of a party *P*_{1} to the input space of a second party *P*_{2}. To describe this as a process matrix, we can write it as a projector onto a process vector in the “double-ket notation” (*17*, *18*)

$$|1\rangle\rangle^{P_{1}^{O} P_{2}^{I}} = \sum_{j} |j\rangle^{P_{1}^{O}} \otimes |j\rangle^{P_{2}^{I}} \tag{2}$$

where *j* labels a basis over the spaces. We can now use this process matrix to describe an input state passing first to Alice (from the input space $S$ to $A^{I}$), then to Bob ($A^{O} \to B^{I}$), and finally to the output space ($B^{O} \to T$). This process is described by

$$|w^{A\to B}\rangle = |1\rangle\rangle^{S A^{I}}\, |1\rangle\rangle^{A^{O} B^{I}}\, |1\rangle\rangle^{B^{O} T} \tag{3}$$
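The double-ket vector of Eq. 2 is the (unnormalized) Choi vector of the identity channel. The minimal sketch below builds |1⟫ = Σ_j |j〉|j〉 and the Choi matrix of a channel given by Kraus operators. It then recovers the channel's action through ℰ(ρ) = Tr_in[(ρ^T ⊗ 𝟙)C]. This transpose-on-input convention is one common choice and is an assumption here, because several Choi conventions exist in the literature. The helper names are hypothetical.

```python
import numpy as np


def double_ket_identity(dim):
    """|1>> = sum_j |j>|j>: unnormalized process vector of the identity channel."""
    return np.eye(dim, dtype=complex).reshape(dim * dim)


def choi_matrix(kraus_ops):
    """C = sum_k |K_k>><<K_k|, with |K>> = sum_j |j> (x) K|j>."""
    d_out, d_in = kraus_ops[0].shape
    one = double_ket_identity(d_in)
    C = np.zeros((d_in * d_out, d_in * d_out), dtype=complex)
    for K in kraus_ops:
        vec = np.kron(np.eye(d_in), K) @ one  # (1 (x) K)|1>>
        C += np.outer(vec, vec.conj())
    return C


def apply_via_choi(C, rho, d_out):
    """Recover the channel output E(rho) = Tr_in[(rho^T (x) 1) C]."""
    d_in = rho.shape[0]
    M = np.kron(rho.T, np.eye(d_out)) @ C
    M = M.reshape(d_in, d_out, d_in, d_out)
    return np.einsum("iaib->ab", M)  # partial trace over the input factor


rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
identity_choi = choi_matrix([np.eye(2)])
print(np.allclose(apply_via_choi(identity_choi, rho, 2), rho))  # True
```

For the identity channel, `choi_matrix([np.eye(2)])` coincides with the projector |1⟫⟨⟨1| of Eq. 2. Chaining such vectors for successive links gives the structure of Eq. 3.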

Alice and Bob are free to perform measurements $M^{A}$ and $M^{B}$, respectively, but they are not part of the above process vector. Note that swapping the order of Alice and Bob is as simple as swapping the labels *A* and *B*. The vectors |*w*^{A→B}〉 (describing “Alice acts before Bob”) and |*w*^{B→A}〉 (describing “Bob acts before Alice”) both have a well-defined causal order (Fig. 1, A and B).

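We are now in the position to construct the process matrix of the quantum SWITCH. Recall that for the quantum SWITCH, the control qubit’s state sets the relative amplitudes of Alice → Bob and Bob → Alice. Thus, the process vector of the quantum SWITCH [with the control qubit initially in the state $\frac{1}{\sqrt{2}}(|0\rangle^{C} + |1\rangle^{C})$] is simply

$$|w_{SWITCH}\rangle = \frac{1}{\sqrt{2}}\left(|w^{A\to B}\rangle\, |0\rangle^{C} + |w^{B\to A}\rangle\, |1\rangle^{C}\right) \tag{4}$$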

For the causal witness we will consider here, we will only measure the state of the control qubit after the SWITCH. Thus, we need to construct the process matrix taking an input state and returning the state of the control qubit. This is done by tracing over the SWITCH output (that is, the target qubit) and fixing the state of the control qubit to be $|+\rangle^{C} = \frac{1}{\sqrt{2}}(|0\rangle^{C} + |1\rangle^{C})$. Thus, the process matrix that computes the final state of the control qubit is

$$W_{SWITCH} = \mathrm{Tr}_{T}\left(|w_{SWITCH}\rangle\langle w_{SWITCH}|\right) \tag{5}$$

where $\mathrm{Tr}_{T}$ is the partial trace over the output system qubit.

Using the same formalism, one can also concisely describe all causally separable processes. Consider two general process matrices linking the two local laboratories *A* and *B*, *W*^{A→B} and *W*^{B→A}. Here, contrary to Eq. 3, the link between the laboratories is in general no longer the identity channel. Then, by simply taking an incoherent mixture of the two, one can describe all possible causally separable processes (*11*)

$$W^{sep} = p\, W^{A\to B} + (1 - p)\, W^{B\to A} \tag{6}$$

where 0 ≤ *p* ≤ 1. Physically, this can be understood as each run of the process having a well-defined order, with Alice acting first with probability *p* and Bob acting first with probability 1 − *p*. From this definition, it is apparent that every convex combination of causally separable process matrices is still a causally separable process matrix; thus, the set of causally separable process matrices is convex.

Causal witnesses are designed to distinguish between causally separable (Eq. 6) and causally nonseparable process matrices (such as Eq. 5). For all causally nonseparable process matrices *W*^{n−sep}, there exists a Hermitian operator *S*, called a causal witness, such that

$$\mathrm{Tr}(S\, W^{n\text{-}sep}) < 0 \tag{7}$$

but Tr(*SW*^{sep}) ≥ 0 for all causally separable process matrices *W*^{sep} (*11*), just as in the entanglement witness example. As we show in Materials and Methods, such an operator is always guaranteed to exist. This is because the convexity of the set of causally separable process matrices ensures that there is always a hyperplane that separates the set from a given causally nonseparable process *W*^{n−sep} (*19*).

To implement a causal witness experimentally, we need to decompose it in terms of operations that we can realize in the laboratory: preparation of states, applying quantum channels, and doing measurements. This can always be done, because tensor products of these operations span the whole real vector space of Hermitian operators, which includes the space of process matrices. Using the notation defined in Eq. 1, a causal witness can be expanded as

$$S = \sum_{a,b,d,x,y,z} \alpha_{a,b,d,x,y,z}\; \rho^{(in)}_{z} \otimes M^{A}_{a|x} \otimes M^{B}_{b|y} \otimes D^{(out)}_{d} \tag{8}$$

where the coefficients α_{a,b,d,x,y,z} are real numbers that define (together with the input states, operations, and measurements) a particular witness. From the definition in Eq. 1, it follows that

$$\mathrm{Tr}(SW) = \sum_{a,b,d,x,y,z} \alpha_{a,b,d,x,y,z}\; p(a, b, d \mid x, y, z) \tag{9}$$

and, therefore, the evaluation of the quantity Tr(*SW*) for a given process *W* translates into a determination of probabilities *p*(*a*, *b*, *d*|*x*, *y*, *z*) for several input states and measurement choices.
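Eq. 9 turns the witness into a weighted sum of measured frequencies. The sketch below assumes a hypothetical storage layout in which both the coefficients α and the estimated probabilities are NumPy arrays indexed as `[a, b, d, x, y, z]`. It evaluates Tr(*SW*) and also checks the normalization ∑_{a,b,d} *p* = 1 for every setting. The function names are illustrative only.

```python
import numpy as np


def is_normalized(probs, atol=1e-9):
    """Check sum over outcomes (a, b, d) equals 1 for every setting (x, y, z)."""
    probs = np.asarray(probs, dtype=float)
    return bool(np.allclose(probs.sum(axis=(0, 1, 2)), 1.0, atol=atol))


def witness_expectation(alpha, probs):
    """Tr(SW) = sum_{a,b,d,x,y,z} alpha * p(a, b, d | x, y, z)  (Eq. 9).

    Both arrays are assumed to share the axis order [a, b, d, x, y, z].
    """
    alpha = np.asarray(alpha, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if alpha.shape != probs.shape:
        raise ValueError("alpha and probs must index the same (a, b, d, x, y, z) grid")
    return float(np.sum(alpha * probs))


# Toy example: 2 x 2 x 2 outcomes, 3 settings of x, uniform probabilities.
probs = np.full((2, 2, 2, 3, 1, 1), 1 / 8)
alpha = np.ones_like(probs)
print(is_normalized(probs), witness_expectation(alpha, probs))  # True, ~3.0
```

A negative return value of `witness_expectation` certifies causal nonseparability, and its negative is the quantity maximized below.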

In the case where there are no restrictions on which operations we can implement, we choose the coefficients α_{a,b,d,x,y,z} by maximizing the quantity −Tr(*SW*) over the set of all possible causal witnesses, as described in Materials and Methods. This quantity, for such an optimal witness, corresponds to the maximum “amount of worst-case noise” that the process under examination can tolerate while remaining causally nonseparable (*11*). More precisely, it is the minimal λ ≥ 0 for which the process matrix

$$\frac{W^{n\text{-}sep} + \lambda\, \Omega}{1 + \lambda} \tag{10}$$

becomes causally separable, where Ω is any other process that could have been prepared instead of the desired *W*^{n−sep}. We will refer to this quantity as the “causal nonseparability” (CNS) of a process *W*

$$\mathrm{CNS}(W) = \max_{S}\left[-\mathrm{Tr}(SW)\right] \tag{11}$$

where the maximum runs over the set of causal witnesses.

When −Tr(*SW*) < 0, we define CNS(*W*) to be zero.

However, in practice, we may not be able to maximize −Tr(*SW*) over the whole set of causal witnesses, because there can be restrictions on which operations Alice and Bob have access to. To fully assess the CNS, Alice and Bob must be able to implement a complete basis of operators, which gives them access to the maximal amount of information about the process. Therefore, we define the experimentally certifiable CNS [hereafter referred to as CNS_{exp}(*W*) = −Tr(*S*_{exp}*W*)] as the maximum of −Tr(*SW*) over the restricted set of operators. In this case, CNS_{exp}(*W*) is no longer the amount of noise that the process can tolerate before becoming causally separable but the maximal amount of noise for which this restricted class of witnesses can still detect its causal nonseparability.

If Alice and Bob could only implement unitaries, for example, then this would drastically diminish the attainable CNS_{exp}(*W*); this was the approach taken by Procopio *et al.* (*10*). Because a unitary operation cannot extract any explicit information from the manipulated state (and, consequently, from the process), neither Alice nor Bob can gain any knowledge about their received state when applying only these gates, and consequently, the attainable CNS_{exp}(*W*) is lower. However, if the unitary operations are replaced with projective measurements, then, roughly speaking, information about the process at different points throughout the SWITCH can be extracted. If both Alice and Bob have access to measure-and-reprepare operations, then one can achieve CNS_{exp}(*W*) = CNS(*W*).

Because of the experimental challenges of coherently adding measure-and-reprepare operations, Alice performs a measure-and-reprepare operation and Bob implements a unitary channel in our experiment. It turns out that giving one party a measure-and-reprepare operation and the other a unitary operation still increases CNS_{exp}(*W*) substantially. Thus, the causal witness we will measure depends both on Alice’s outcome (performed inside the SWITCH) and on our final measurement outcome.

### Experiment

To experimentally implement the quantum SWITCH, we need a control and a target qubit. In our experiment, we encode a control qubit in a path degree of freedom of a photon and a target qubit in the same photon’s polarization. The technique of using multiple degrees of freedom has enabled many previous quantum technologies (*20*–*22*). For our present experiment, this is convenient because Bob’s unitary gate can be implemented easily with three wave plates, whereas Alice can perform a projective measurement with wave plates and a polarizing beam splitter. Note that there are other proposals to coherently control the causal orders of events (*11*, *23*, *24*). In these proposals (as in ours), the target and control systems are encoded in the same particle. In principle, it is also possible to use different particles. With photons, this could be done using a so-called controlled path gate (*25*) or potentially by using a spin qubit to control the causal order acting on a photon (*26*).

In our experiment, the realization of the unitary channel is straightforward, but a short remark is necessary concerning Alice’s measurement. It is clear that a polarizing beam splitter enables one to distinguish the polarization of an incoming photon. However, a polarizing beam splitter gives rise to additional spatial modes (that is, there are two output paths after the polarizing beam splitter). These two spatial modes can be considered as a new spatial qubit. Then, the action of the polarizing beam splitter is to couple the polarization qubit to this additional qubit. This is formally equivalent to a von Neumann system-probe coupling, which can model the interaction necessary for any projective measurement (*27*) and has been used between path and polarization degrees of freedom in the experiment reported by Rozema *et al.* (*28*). In our experiment, the polarization qubit is the system, and it is coupled (via the polarizing beam splitter) to an additional spatial qubit, which is the probe. We can read out information about the system by measuring the probe (with a photon detector) at a later time. This solves the nontrivial problem of realizing a measurement operation inside a quantum SWITCH. Most approaches to acquire information inside the SWITCH would lead to distinguishing information about the order in which the operations were applied, destroying the quantum superposition. However, in our solution, because the probe qubit is not measured until the information about the order of application of the operations is erased, the entire process can remain coherent. This solution also works deterministically; that is, both of Alice’s outcomes are retained. It also allows Alice to implement a measurement-dependent repreparation by placing different wave plates in each of the two outcome modes.
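The system-probe picture can be made concrete with a small sketch. Here the polarizing beam splitter is modeled as a controlled-NOT from the polarization qubit (the system, with *H* = |0〉 and *V* = |1〉) onto a path qubit (the probe, initially |0〉). This idealized model is an assumption, and it ignores phases a real polarizing beam splitter may add. The probe's populations reproduce the Born probabilities of an *H*/*V* measurement. Until the probe is detected, however, the coupling is unitary and can be undone or interfered. The function names are hypothetical.

```python
import numpy as np

# Ordering: system (polarization) (x) probe (path); |s, p> -> |s, p XOR s>.
CNOT_SYSTEM_TO_PROBE = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]], dtype=complex)


def pbs_coupling(polarization):
    """Couple polarization to a fresh path probe (idealized PBS as a CNOT)."""
    probe = np.array([1, 0], dtype=complex)
    return CNOT_SYSTEM_TO_PROBE @ np.kron(polarization, probe)


def probe_outcome_probabilities(joint):
    """Probability of each probe (path) outcome, summing over the system."""
    amps = joint.reshape(2, 2)  # amps[system, probe]
    return np.sum(np.abs(amps) ** 2, axis=0)


theta = np.pi / 6
pol = np.array([np.cos(theta), np.sin(theta)], dtype=complex)  # cos|H> + sin|V>
joint = pbs_coupling(pol)
print(probe_outcome_probabilities(joint))  # approximately [0.75, 0.25]
# The coupling is reversible while the probe is unread:
print(np.allclose(CNOT_SYSTEM_TO_PROBE @ joint, np.kron(pol, [1, 0])))  # True
```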

Our implementation of the quantum SWITCH draws inspiration from the study of Procopio *et al.* (*10*), in which only orders of unitary operations were superimposed. Therefore, as in the study of Procopio *et al.* (*10*), our experimental skeleton consists of a Mach-Zehnder interferometer (MZI) with a loop in each arm. However, because Alice’s measure-and-reprepare channel adds an additional path degree of freedom, we need an extra interferometric loop.

A scheme of our experimental apparatus is presented in Fig. 3. The first step is to set the state of the system qubit (encoded in the polarization) with a polarizer and a half–wave plate. Then, the photon impinges on a 50/50 beam splitter; this sets the state of the control qubit (encoded in a path degree of freedom) in |+〉. Depending on the state of the path qubit, the photon is sent to either Alice (who performs *M*^{A}) and then Bob (who performs *U*^{B}) or vice versa. As described above, *M*^{A} is a projective measurement (a sequence of two wave plates and a polarizing beam splitter) and a corresponding repreparation (a sequence of two wave plates in only one of the polarizing beam splitter outputs), and *U*^{B} is a unitary gate (a sequence of three wave plates). Because the polarizing beam splitter adds a second path qubit, this results in four path modes, encoding both the state of the control qubit and the outcome of the measure-and-reprepare channel. Referring to Fig. 3, the external (yellow) interferometer arises from the outcome *H*—also referred to as a logical 0—and the internal (purple) one arises from the outcome *V*—a logical 1. We finalize the SWITCH by erasing the information about the order of the gates. This can be done by applying a Hadamard gate to the control qubit. Because the control qubit is a path qubit, a Hadamard gate can be implemented with a 50/50 beam splitter. However, in our experiment there are two path qubits (the control qubit and Alice’s ancilla measurement qubit). Thus, we must use two 50/50 beam splitters: one beam splitter to interfere the control qubit when Alice’s ancilla qubit is in the state |0〉, and one beam splitter when it is in the state |1〉. Finally, each of the four outputs is coupled into single-mode fibers, which are each connected to single-photon detectors (SPDs). 
Then, detecting a photon in one of the four modes yields the result of both the measurement of the control qubit in the superposition basis and Alice’s measurement (see the detector labels in Fig. 3).
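An idealized model of the four detector probabilities is sketched below. It assumes perfect visibility, and it assumes that control |0〉 corresponds to "Alice first." Alice's channel has Kraus operators *K*_0 = *R* Π_0 *V* and *K*_1 = Π_1 *V*. Here *V* is her premeasurement unitary, Π_a projects onto |a〉 with *H* = 0, and *R* is the repreparation applied only for outcome *H*. After the final Hadamard on the control, the amplitude for outcome *d* is ½[*U*_B *K*_a ± *K*_a *U*_B]|ψ〉. The names are hypothetical and the model is not the experiment's calibrated description.

```python
import numpy as np


def switch_outcome_probabilities(premeasurement, repreparation, bob_unitary, psi):
    """Return p[a, d]: Alice's outcome a (0 = H, 1 = V), final control outcome d.

    d = 0 corresponds to the control found in |+>, d = 1 to |->.
    """
    proj0 = np.diag([1, 0]).astype(complex)
    proj1 = np.diag([0, 1]).astype(complex)
    kraus = [repreparation @ proj0 @ premeasurement,  # outcome H: reprepare
             proj1 @ premeasurement]                  # outcome V: identity
    probs = np.zeros((2, 2))
    for a, K in enumerate(kraus):
        alice_first = bob_unitary @ K @ psi  # control |0>: Alice, then Bob
        bob_first = K @ bob_unitary @ psi    # control |1>: Bob, then Alice
        for d, sign in enumerate((1, -1)):
            amp = 0.5 * (alice_first + sign * bob_first)
            probs[a, d] = np.vdot(amp, amp).real
    return probs


hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
pauli_z = np.array([[1, 0], [0, -1]], dtype=complex)
psi = np.array([1, 0], dtype=complex)
print(switch_outcome_probabilities(hadamard, pauli_x, pauli_z, psi))
```

One sanity check follows directly from the formula. If Bob's unitary is the identity, the two orders coincide, so the *d* = 1 port never fires.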

We wish to evaluate the CNS of our quantum SWITCH by experimentally estimating the expectation value of a causal witness *S* (Eq. 8). In other words, we want to assess Tr(*S*_{exp}*W*_{SWITCH}), where *W*_{SWITCH} here refers to the process matrix describing our experiment. Because the trace is linear, this can be done by implementing one term in the sum of *S* (Eq. 8) at a time. To estimate a single term, we injected an input state into the SWITCH, Alice and Bob each performed an operation inside it, and then we measured the outputs of the overall process. Because the control qubit measurement and Alice’s measurement are both single-qubit projective measurements, there are a total of four possible outcomes. For each measurement setting, different input states are sent into the SWITCH, and the probabilities of each outcome are experimentally estimated by sending multiple copies of the same input state. To compute the final value of the CNS_{exp}(*W*_{SWITCH}), the results of these measurements are weighted by the corresponding α_{a,b,d,x,y,z} and summed.

The number of terms in the sum of Eq. 8 is determined by the specific witness we wish to evaluate. In general, Alice and Bob must each implement a set of operators forming a basis over their channels. For Bob’s unitary channel, this requires 10 elements, and for Alice’s measure-and-reprepare channel, this requires 16 (*11*). In our case, we formed Alice’s basis with four (noncommuting) projection operators and three unitary repreparation operators when the outcome was *H* and one operator (the identity operator) when the outcome was *V*. This corresponds to 12 measure-and-reprepare channels when the outcome of Alice’s measurement is *H* and 4 when it is *V*, for a total of 16 measure-and-reprepare operators. For Bob, we implement all 10 unitaries.

Varying the input state can make CNS_{exp}(*W*_{SWITCH}) more robust to noise. Hence, for our experiment, we used three different input states: |*H*〉, |*V*〉, and |+〉. Finally, we implemented two different measurement operators *D*^{(out)} on the control qubit (corresponding to the two outcomes of the projection onto the basis $\{|+\rangle^{C}, |-\rangle^{C}\}$). Thus, for our experiment, the calculation of CNS_{exp}(*W*_{SWITCH}) translates into

$$\mathrm{CNS}_{exp}(W_{SWITCH}) = -\sum_{a,d,x,y,z} \alpha_{a,d,x,y,z}\; p(a, d \mid x, y, z) \tag{12}$$

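Here, we do not need the sum over *b*, because Bob’s unitaries do not have an outcome. The probability in Eq. 12 is defined as

$$p(a, d \mid x, y, z) = \mathrm{Tr}\left[\left(\rho^{(in)}_{z} \otimes M^{A}_{a|x} \otimes U^{B}_{y} \otimes D^{(out)}_{d}\right) W_{SWITCH}\right] \tag{13}$$

where $U^{B}_{y}$ denotes (the Choi representation of) Bob’s *y*th unitary.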

We must experimentally estimate all of these probabilities to evaluate CNS_{exp}(*W*_{SWITCH}). There are 1440 terms in this sum. However, four outcomes (two from Alice’s measurement and two from the final detection) are collected simultaneously (experimentally, this means the counts of four SPDs are collected in one setting). Therefore, we need 360 different experimental settings. However, for our witness of the 360 prefactors α_{a,d,x,y,z}, 101 are equal to zero; thus, there are actually only 259 relevant experimental settings.
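The counts quoted above can be reproduced with simple bookkeeping. This requires one interpretation, which is an assumption rather than a statement in the text. Alice's setting *x* is taken to be a pair of (measurement basis, repreparation used on outcome *H*). The *V* branch always uses the identity, so it does not add settings. Under that reading, 4 × 3 = 12 Alice settings, 10 Bob unitaries, and 3 input states give 360 settings. Each setting yields four simultaneous outcomes.

```python
# Assumed bookkeeping: Alice's setting x = (basis, repreparation applied on outcome H).
ALICE_BASES = 4
REPREPARATIONS_ON_H = 3
REPREPARATIONS_ON_V = 1        # identity only
BOB_UNITARIES = 10             # index y
INPUT_STATES = 3               # index z: |H>, |V>, |+>
OUTCOMES_PER_SETTING = 2 * 2   # Alice's a times final d, read on four SPDs at once
ZERO_WEIGHT_SETTINGS = 101     # value quoted in the text

alice_operators = ALICE_BASES * (REPREPARATIONS_ON_H + REPREPARATIONS_ON_V)
alice_settings = ALICE_BASES * REPREPARATIONS_ON_H
settings = alice_settings * BOB_UNITARIES * INPUT_STATES
terms = settings * OUTCOMES_PER_SETTING
relevant_settings = settings - ZERO_WEIGHT_SETTINGS

print(alice_operators, settings, terms, relevant_settings)  # 16 360 1440 259
```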

With this in place, we can experimentally measure the CNS_{exp}(*W*_{SWITCH}) (for information relating to experimental visibility, stability, and data taking procedure, see Materials and Methods). Figure 4 shows some of the probabilities *p*(*a*, *d*|*x*, *y*, *z*) (Eq. 13) for the four outcomes; that is, for Alice, *a* = 0, 1, and our final measurement, *d* = 0, 1 (the remainder are shown in the Supplementary Materials). In Fig. 4, the experimentally obtained values are denoted by blue dots, and the theoretical predictions are represented by bars.

Our main source of error is phase fluctuations in the two interferometers. Therefore, we performed a separate measurement (presented in Materials and Methods) to characterize this error. The error bars in Fig. 4 represent both these phase errors and Poissonian errors due to finite counts. These errors do not take into account systematic errors, such as wave plate miscalibration, because these systematic errors represent a deviation of our experimental SWITCH from the ideal SWITCH.

We can now obtain a value for the CNS of our process by weighting the data presented in Fig. 4 (and figs. S1 to S3) by α_{a,d,x,y,z} and then summing them. The result is(14)

The error bar on CNS_{exp}(*W*_{SWITCH}) was calculated by Gaussian error propagation from the errors of the individual probabilities. The theoretical maximum value for CNS_{exp}(*W*_{SWITCH}) is 0.2842. The disagreement between this and our measured result is caused primarily by two effects. First, given the reduced visibility of the interferometers (which we will discuss in detail shortly), the maximal value for CNS_{exp}(*W*_{SWITCH}) is 0.2523, when the visibility is 0.9539. The remaining discrepancy comes from systematic errors, such as wave plate miscalibration, which effectively mean that the unitaries Alice and Bob implement differ slightly from their targets. For example, we estimate, using a simple Monte Carlo simulation, that a wave plate calibration error of 3° would explain this discrepancy, leading to a drop in the CNS of approximately 0.043. Still, given our measured result, we can conclude that our process is causally nonseparable by a margin of approximately 7 SDs. This large margin demonstrates the effectiveness of performing a measurement operation inside the quantum SWITCH.
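Because CNS_{exp} is a linear combination of probabilities (Eq. 12), Gaussian propagation reduces to σ = √(∑ α² σ_p²). This holds if the errors of the individual probabilities are treated as independent. That independence is a simplifying assumption: the four probabilities of one setting share a normalization and are therefore correlated. The sketch also expresses a result as a number of standard deviations from zero. The function names are hypothetical, and the example inputs are toy numbers, not data from this experiment.

```python
import numpy as np


def propagate_linear_error(alpha, sigma_p):
    """Std. dev. of sum(alpha * p) for independent errors sigma_p on each p."""
    alpha = np.asarray(alpha, dtype=float)
    sigma_p = np.asarray(sigma_p, dtype=float)
    return float(np.sqrt(np.sum((alpha * sigma_p) ** 2)))


def significance(value, sigma):
    """Distance of a witness value from zero, in units of its standard deviation."""
    return value / sigma


sigma = propagate_linear_error([3.0, 4.0], [1.0, 1.0])  # toy numbers
print(sigma, significance(1.0, 0.5))  # 5.0 2.0
```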

As mentioned above, the causal nonseparability (as measured using a causal witness) can be considered as a measurement of how much noise can be added to the process before it becomes causally separable. The CNS_{exp} we have discussed so far refers to a worst-case noise model (*11*), wherein the desired process is replaced with the process that can do the most damage to its causal nonseparability with a probability

$$p_{worst\text{-}case} = \frac{\mathrm{CNS}_{exp}(W)}{1 + \mathrm{CNS}_{exp}(W)} \tag{15}$$

Because the replacement is done with the worst-case process, this is a lower bound on the “probability of noise” that can be tolerated (see Materials and Methods). For our process *p*_{worst−case} = 0.168 ± 0.001.
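In the mixture of Eq. 10, the weight of the noise process Ω is λ/(1 + λ). Eq. 15 is this weight evaluated at λ = CNS_{exp}. The small sketch below converts between the two quantities and clips negative witness values to zero, following the convention that CNS is zero when −Tr(*SW*) < 0. The function names are hypothetical.

```python
def worst_case_noise_probability(cns):
    """p = CNS / (1 + CNS): weight of the worst-case noise process (Eq. 15)."""
    cns = max(cns, 0.0)  # CNS is defined to be zero when -Tr(SW) < 0
    return cns / (1.0 + cns)


def cns_from_noise_probability(p):
    """Inverse of Eq. 15 for 0 <= p < 1."""
    return p / (1.0 - p)


# Theoretical maximum CNS_exp quoted in the text.
print(round(worst_case_noise_probability(0.2842), 4))  # 0.2213
```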

We studied the effect of the noise most relevant to our experiment, namely, dephasing the control qubit but not the system qubit. This noise is the strongest in our setup because the control qubit is encoded in a path degree of freedom, which must remain interferometrically stable [see the study of Branciard (*29*) for the formal definition of this noise model]. We realized this noise by unbalancing the path length of the interferometers by more than the photons’ coherence length. The experimental signature of this imbalance is a reduced visibility of the interferometer. We measured the CNS for several visibilities between 0.95 and 0.06. Figure 5 shows a decrease in the expectation value of $-\mathrm{Tr}(S_{exp} W_{SWITCH})$ as the noise increases. There is an offset between the experimental data and the theoretical prediction due to systematic errors. However, both theory and experiment follow the same trend. By extrapolating our fit of the experimental data to $\mathrm{CNS}_{exp}(W_{SWITCH}) = 0$ (where the process becomes causally separable), we observe a “noise tolerance” of 0.342 for this type of noise. As expected, this is larger than our experimentally measured *p*_{worst−case}, indicating that it is a lower bound.
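The effect of control-qubit dephasing can be illustrated with a simplified model. Assume the off-diagonal elements of the control qubit's reduced density matrix are multiplied by a coherence factor *v* (loosely, the interferometer visibility), while the target is left untouched. Consider the unitary SWITCH with the anticommuting gates *X* and *Z*. The probability of the |−〉 outcome then drops from 1 to (1 + *v*)/2, reaching the order-insensitive value ½ at *v* = 0. This is a toy model, not the formal noise model of Branciard (*29*), and it does not include Alice's measurement. The function name is hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def dephased_switch_minus_probability(A, B, psi, coherence):
    """P(control in |->) when the control coherence is scaled by `coherence`."""
    branch0 = B @ A @ psi  # control |0>: A then B
    branch1 = A @ B @ psi  # control |1>: B then A
    # Reduced control density matrix: rho[i, j] = <branch_j|branch_i> / 2,
    # with off-diagonal terms damped by the coherence factor.
    rho_c = 0.5 * np.array(
        [[np.vdot(branch0, branch0), coherence * np.vdot(branch1, branch0)],
         [coherence * np.vdot(branch0, branch1), np.vdot(branch1, branch1)]])
    minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
    return float(np.vdot(minus, rho_c @ minus).real)


psi = np.array([1, 0], dtype=complex)
for v in (1.0, 0.6, 0.0):
    print(v, dephased_switch_minus_probability(X, Z, psi, v))  # (1 + v) / 2
```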

## DISCUSSION

Our experiment demonstrates how to perform a measurement inside a quantum SWITCH without destroying the superposition of causal orders. The task was only assumed to be possible in the study of Araújo *et al.* (*11*), but no method to accomplish it was proposed. The difficulty is that performing a standard measurement reveals the time at which it is performed and, thus, whether it is performed before or after the partner’s operation. Consequently, the superposition of causal orders becomes incoherent. Our way around this is to break the measurement into two steps: First, the system coherently interacts with an ancilla through a unitary operation (namely, the additional path modes introduced by the local operation in our experiment). Second, after finalizing the quantum SWITCH (interfering these modes), the ancilla is measured. This allows us to make a “coherent measurement at different times” and then erase the ordering information.

We demonstrated the causal nonseparability of our experimental apparatus by measuring a causal witness. With the ability to perform a measurement inside the SWITCH, we could increase the robustness of the causal witness to noise. Previous experimental work only indirectly accessed the causal nonseparability of the SWITCH and, moreover, only used unitary gates in the SWITCH (*10*). Although some other experiments (*30*–*33*) have also studied the topic of causal relations in quantum mechanics, they focused on a different aspect. For example, in previous studies (*30*, *31*, *33*), instead of creating a genuinely indefinite causal order, as in our work, the authors distinguished between different causal structures. The incoherent mixture (*30*) and a quantum superposition (*31*) of different causal relations reported previously are both compatible with one party in the past and the other in the future. Thus, in our language, they correspond to causally ordered processes.

Our work represents the first experimental realization of a quantum superposition of orders of nonunitary channels and the first measurement of a causal witness. We believe that this will be an important step toward the realization of quantum superpositions of the order of more elaborate processes. Because it has been theoretically demonstrated that causally nonordered processes can give rise to a reduction in the query complexity of certain tasks (*4*–*6*) and lead to more efficient communication channels (*7*, *8*), it is important to study new techniques to create more complex, causally nonordered processes. We already see an advantage in our current work. Making a measurement inside the quantum SWITCH made our experiment more robust to noise and allowed us to demonstrate, by approximately 7 SDs, that our setup cannot be described by a causally ordered process.

## MATERIALS AND METHODS

### Single-photon source

We generated heralded single photons using a type II spontaneous parametric down-conversion (SPDC) process in a Sagnac loop (*34*). The Sagnac loop was realized using a dual-wavelength polarizing beam splitter and two mirrors. The SPDC crystal was a 20-mm-long periodically poled potassium titanyl phosphate crystal. The crystal was pumped by a 23.7-mW diode laser centered at 395 nm. The polarization of the laser was set to be horizontal. With this, we generated degenerate pairs of single photons centered at 790 nm, in a separable polarization state |*H*〉|*V*〉. Polarizers in the signal and idler modes were used to ensure that the polarization was in a well-defined state. The down-converted photons were coupled into single-mode fibers. One photon was sent directly to an SPD and used to herald the presence of the other photon, which was sent to our experiment. After passing through the experiment, we observed a coincidence rate between the herald detector and the four final-measurement detectors of 3750 pairs per second.

### Implementing Alice and Bob’s channels

As discussed in the main text, to experimentally measure a causal witness, Alice and Bob need to implement a series of quantum channels on a polarization qubit inside the quantum SWITCH. Alice must perform a measure-and-reprepare channel, whereas Bob must implement a unitary channel. Alice measures in four different bases, each defined by a unitary operator applied before a projective measurement in the basis {|0〉, |1〉}. Alice's premeasurement operators are listed in the first column of Table 1. When her outcome is |0〉 (in a given basis), Alice implements one of three different repreparation operators (second column of Table 1); when her outcome is |1〉, she performs the identity channel. Thus, she has 16 different measure-and-reprepare maps. Bob simply implements 10 different unitary operators (third column of Table 1).
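The measure-and-reprepare structure described above can be sketched as a two-outcome instrument acting on a single polarization qubit. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the premeasurement operator `U` and the repreparation operator `R` are 2 × 2 unitaries, that outcome |1〉 leaves the qubit in |1〉 (the identity channel after the projection), and it uses a Hadamard and a Pauli *X* gate as hypothetical stand-ins for the entries of Table 1. The function name `alice_instrument` is likewise illustrative.

```python
import numpy as np

KET0 = np.array([[1], [0]], dtype=complex)
KET1 = np.array([[0], [1]], dtype=complex)


def alice_instrument(rho, U, R):
    """Unnormalized output states for Alice's two outcomes (hypothetical sketch).

    rho: 2x2 density matrix of the polarization qubit.
    U:   premeasurement unitary defining the measurement basis (assumed).
    R:   repreparation unitary applied when the outcome is 0 (assumed).
    """
    rotated = U @ rho @ U.conj().T          # rotate into the measurement basis
    p0 = float(np.real(rotated[0, 0]))      # probability of outcome |0>
    p1 = float(np.real(rotated[1, 1]))      # probability of outcome |1>
    reprepared = R @ KET0                   # outcome 0: reprepare R|0>
    out0 = p0 * (reprepared @ reprepared.conj().T)
    out1 = p1 * (KET1 @ KET1.conj().T)      # outcome 1: identity after projection
    return out0, out1


# Hypothetical example: Hadamard premeasurement, X repreparation.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
out0, out1 = alice_instrument(rho, H, X)
print(np.trace(out0 + out1).real)  # total probability of the two outcomes
```

The two outcome maps sum to a trace-preserving channel, because *p*_0 + *p*_1 = Tr(ρ) for any unitary `U`.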

We experimentally implemented both Alice's measurement operators and her repreparation operators with a sequence of two wave plates (a quarter-wave plate followed by a half-wave plate); Alice's projective measurement was performed by a polarizing beam splitter in the {|*H*〉, |*V*〉} basis. Bob's operators were implemented with three wave plates (a quarter-wave plate, a half-wave plate, and then another quarter-wave plate). Table 2 lists the specific wave plate angles used for each operator.
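Wave plate sequences like these are commonly modeled with Jones matrices. The sketch below assumes the textbook form of a linear retarder, R(θ) diag(1, e^{iφ}) R(−θ), with φ = π/2 for a quarter-wave plate and φ = π for a half-wave plate; sign and global-phase conventions differ between references, and all helper names are hypothetical. The actual angles of Table 2 are not reproduced here.

```python
import numpy as np


def rotation(theta):
    """2x2 rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def wave_plate(theta_deg, retardance):
    """Jones matrix of a linear retarder with its axis at theta_deg degrees."""
    theta = np.deg2rad(theta_deg)
    core = np.diag([1.0, np.exp(1j * retardance)])
    return rotation(theta) @ core @ rotation(-theta)


def qwp(theta_deg):
    return wave_plate(theta_deg, np.pi / 2)


def hwp(theta_deg):
    return wave_plate(theta_deg, np.pi)


def alice_plates(q, h):
    # Light meets the QWP first, so its matrix acts first (rightmost).
    return hwp(h) @ qwp(q)


def bob_plates(q1, h, q2):
    # QWP -> HWP -> QWP along the beam path, composed right to left.
    return qwp(q2) @ hwp(h) @ qwp(q1)


H_POL = np.array([1, 0], dtype=complex)
print(np.round(hwp(22.5) @ H_POL, 3))  # a HWP at 22.5 deg maps H to diagonal
```

A quarter-half-quarter sequence is a common choice for Bob because it can realize an arbitrary single-qubit polarization unitary up to a global phase.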

### Experimentally estimating probabilities

Because Alice makes a two-outcome measurement and our final measurement also has two outcomes, there are four different outcomes for each setting of Alice and Bob. Experimentally, each outcome corresponds to a different SPD. For each setting, we collected approximately 7500 counts in total during 2 s of data acquisition. From these counts, we estimated the four corresponding output probabilities through the formula

$$ p_{mn} = \frac{C_{mn}}{\eta_{m}\,\eta_{mn}\,C_{\text{tot}}} \tag{16} $$

where *C*_{mn} is the number of counts collected at one of the detectors, and the η factors are relative detector efficiencies, described below. Here, *m* labels Alice's outcome [experimentally, whether the photon exits the internal (purple) or external (yellow) interferometer], and *n* labels the outcome of the final measurement (experimentally, port 0 or port 1 of either interferometer). The total number of efficiency-corrected counts appearing in Eq. 16 is

$$ C_{\text{tot}} = \sum_{m,n \in \{0,1\}} \frac{C_{mn}}{\eta_{m}\,\eta_{mn}} \tag{17} $$
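The efficiency correction and normalization of Eqs. 16 and 17 can be written in a few lines. In this sketch, `eta_m` holds the per-interferometer efficiencies and `eta_mn` the per-port efficiencies; the function name, the counts, and the efficiency values are all hypothetical, chosen only so that the counts total about 7500, as in one 2-s acquisition.

```python
import numpy as np


def probabilities(counts, eta_m, eta_mn):
    """Efficiency-corrected outcome probabilities p_mn (Eqs. 16 and 17).

    counts[m][n]: raw counts C_mn (m: Alice's outcome, n: final outcome)
    eta_m[m]:     relative efficiency of interferometer m
    eta_mn[m][n]: relative efficiency of port n of interferometer m
    """
    counts = np.asarray(counts, dtype=float)
    efficiency = np.asarray(eta_m, dtype=float)[:, None] * np.asarray(eta_mn, dtype=float)
    corrected = counts / efficiency   # C_mn / (eta_m * eta_mn)
    c_tot = corrected.sum()           # Eq. 17
    return corrected / c_tot          # Eq. 16


# Hypothetical counts for one setting (7500 in total).
C = [[1800, 2100], [1500, 2100]]
p = probabilities(C, eta_m=[1.0, 0.9], eta_mn=[[1.0, 0.95], [1.0, 1.05]])
print(p)
```

Because every probability is divided by the same efficiency-corrected total, multiplying all efficiencies by a common factor leaves the result unchanged, so only relative efficiencies are needed.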

The efficiency factors in the above equations are defined as follows. The single-subscript factor η_{m} refers to the relative efficiency between the internal (*m* = 1) and external (*m* = 0) interferometers (Fig. 3). The double-subscript factors η_{mn} refer to the relative efficiencies between the two ports, *n* = 0 and *n* = 1, of interferometer *m*. The absolute efficiency of detector (*m*, *n*) is then proportional to η_{m}η_{mn}. Roughly speaking, to estimate the relative efficiencies, we must send the same number of photons to each detector and compare the measured count rates.

To estimate the relative port efficiencies η_{m0} and η_{m1} within each interferometer, we distributed the photons between the two ports by scanning the phase with a piezo-electrically driven translation stage while all of the internal wave plates were set to 0°. Plots of representative interference fringes (already efficiency corrected) for each interferometer are shown in Fig. 6. By requiring the total efficiency-corrected counts out of both ports to be constant, we can obtain the relative efficiency between the two ports of each interferometer. In practice, we obtain this efficiency by plotting the counts out of one port versus the counts out of the other. If the two efficiencies were equal, the slope of this line would be −1. Because the coupling and detector efficiencies differ, we instead enforce

$$ \frac{C_{m0}}{\eta_{m0}} + \frac{C_{m1}}{\eta_{m1}} = K_{m}, \qquad m \in \{0, 1\} \tag{18} $$

where *K*_{0} and *K*_{1} are constants. Because we are interested only in relative efficiencies, we set one efficiency of each pair to 1. Setting (arbitrarily) η_{m0} = 1 means that the slope of *C*_{m1} versus *C*_{m0} is −η_{m1}. These plots, for both interferometers, are shown in Fig. 7.
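The slope method can be illustrated with a straight-line fit. The sketch below assumes an ordinary least-squares fit (the text does not specify the fitting procedure) and uses a synthetic, noise-free fringe with 95% visibility in which port 1 is detected with 80% of the efficiency of port 0; `port_efficiency` is a hypothetical helper name.

```python
import numpy as np


def port_efficiency(c0, c1):
    """Estimate eta_m1 (with eta_m0 = 1) and K_m from one phase scan.

    With eta_m0 = 1, Eq. 18 gives C_m1 = eta_m1 * (K_m - C_m0), so a
    least-squares line through (C_m0, C_m1) has slope -eta_m1 and
    intercept eta_m1 * K_m.
    """
    slope, intercept = np.polyfit(np.asarray(c0, float), np.asarray(c1, float), 1)
    eta_m1 = -slope
    return eta_m1, intercept / eta_m1


# Synthetic, noise-free fringe: 95% visibility, port 1 at 80% relative efficiency.
phase = np.linspace(0.0, 2.0 * np.pi, 50)
n_photons = 1000.0
c0 = n_photons * (1 + 0.95 * np.cos(phase)) / 2
c1 = 0.8 * n_photons * (1 - 0.95 * np.cos(phase)) / 2
eta, k = port_efficiency(c0, c1)
```

With real data, Poisson noise in the counts would scatter the points around the line, and the fit would return an estimate rather than the exact value.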

Next, we estimate η_{m}, the relative efficiency between the two interferometers, which provides the last quantity needed for the probabilities in Eq. 16. To do this, we used the state preparation wave plate (Fig. 3) to send all of the photons to one interferometer or the other, scanning the phase in each case. Using the previously determined port efficiencies, each scan gives the corresponding constant *K*_{0} or *K*_{1} (Eq. 18). As before, we set one of the relative efficiencies to 1, choosing η_{0} = 1. The remaining efficiency is then

$$ \eta_{1} = \frac{K_{1}}{K_{0}} \tag{19} $$

This works because by using the wave plates and the polarizing beam splitter, we can send nearly all of the incident photons one way or the other.
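The ratio of efficiency-corrected totals can be sketched as follows. This assumes, as the procedure implies, that the same input photon flux reaches each interferometer in its respective scan; `corrected_total` and `interferometer_efficiency` are hypothetical names, and the test data are synthetic.

```python
import numpy as np


def corrected_total(c0, c1, eta_m1):
    """Average of C_m0 + C_m1 / eta_m1 over a scan (K_m of Eq. 18, eta_m0 = 1)."""
    return float(np.mean(np.asarray(c0, float) + np.asarray(c1, float) / eta_m1))


def interferometer_efficiency(scan_external, scan_internal, eta_01, eta_11):
    """Relative efficiency eta_1 of the internal interferometer, with eta_0 = 1 (Eq. 19).

    Each scan is a (C_m0, C_m1) pair of arrays recorded while all photons were
    routed into that interferometer; equal input flux in both scans is assumed.
    """
    k0 = corrected_total(*scan_external, eta_01)
    k1 = corrected_total(*scan_internal, eta_11)
    return k1 / k0
```

Averaging over the whole scan, rather than using a single phase point, makes the estimate of each *K*_{m} less sensitive to count fluctuations.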

Using this procedure, we obtained relative efficiencies between all of the detectors. Note that these are only relative efficiencies (detector *m* = *n* = 0 serves as the reference, with efficiency 1) rather than absolute ones; however, this does not matter, because even if we knew the absolute efficiency of each detector, the common scale factor would cancel in the calculation of the probability (Eq. 16), since we normalize by *C*_{tot}. After evaluating *p*_{00}, *p*_{01}, *p*_{10}, and *p*_{11} for each of Alice and Bob's settings, we weighted each by the corresponding coefficient α_{a,d,x,y,z} (Eq. 12) and summed them. This gave us our experimental value of Tr(*S*_{exp}*W*_{SWITCH}), whose negative is the experimentally accessible causal nonseparability.
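The weighted sum is a simple dot product over all settings and outcomes. In the sketch below, probabilities and coefficients are stored in dictionaries keyed by (*a*, *d*, *x*, *y*, *z*); the two entries are hypothetical placeholders for the coefficients listed in table S1. The uncertainty function assumes independent errors on each probability, which the text does not state; it only illustrates how a number of standard deviations could be formed.

```python
import math


def witness_value(alpha, probs):
    """Tr(S_exp W) estimated as the sum of alpha * p over all (a, d, x, y, z)."""
    return sum(alpha[key] * probs[key] for key in alpha)


def witness_sigma(alpha, sigmas):
    """Standard deviation of the weighted sum, ASSUMING independent errors."""
    return math.sqrt(sum((alpha[key] * sigmas[key]) ** 2 for key in alpha))


# Hypothetical coefficients, probabilities, and errors keyed by (a, d, x, y, z).
alpha = {(0, 0, 0, 0, 0): 0.5, (0, 1, 0, 0, 0): -1.0}
probs = {(0, 0, 0, 0, 0): 0.2, (0, 1, 0, 0, 0): 0.8}
sigmas = {(0, 0, 0, 0, 0): 0.01, (0, 1, 0, 0, 0): 0.01}

value = witness_value(alpha, probs)   # a negative value certifies nonseparability
cns = -value                          # experimentally accessible CNS
n_sd = cns / witness_sigma(alpha, sigmas)
```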

### Stability and visibility of the interferometers

Central to our experiment were two interferometers with an overall footprint of approximately 80 cm × 120 cm. The visibility of both interferometers was 95%, as is apparent in the interferograms shown in Fig. 7. This reduced visibility can be interpreted as dephasing noise on the control qubit (see the discussion in the main text).

In addition to the reduced visibility, the phase of the interferometers fluctuated. Phase fluctuations on the time scale of the acquisition time would further decrease the visibility. However, we found that the phase drifted rather slowly, by approximately 0.01 rad/min. To measure the causal witness, we needed 259 different wave plate settings. Moving the wave plates from one setting to the next took approximately 30 s, so that, including the 2-s measurement time, each measurement setting took just over 30 s. Therefore, after 30 measurements, the phase drifted enough to cause a noticeable error. To combat this, we automatically reset the phase to 0 rad every 20 measurement settings by setting the wave plates to 0°, scanning the piezo-electrically driven translation stage, and moving to the maximum of the fringe.

Despite this procedure, there was still residual phase drift. To characterize it, we performed a separate measurement mimicking our experimental procedure: we counted for 2 s per setting and reset the phase to 0 rad every 20 measurements, but the wave plates remained set to 0° the entire time so that we could observe the phase drift directly. In the absence of phase drift, the fringe would therefore have remained at its maximum. By measuring the deviation from the ideal values, we estimated a residual phase fluctuation of approximately 0.04 rad over the course of our entire data run. We then propagated this error to estimate an error on each measured probability. These are the error bars drawn in Fig. 4 and figs. S1 to S3.
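The timing argument above is simple arithmetic, reproduced here using only the numbers quoted in the text. It assumes a constant drift rate and ignores the time spent on the phase resets themselves, so the run duration is a lower bound; the constant and function names are illustrative.

```python
DRIFT_RATE_RAD_PER_MIN = 0.01   # measured slow phase drift
MOVE_TIME_S = 30.0              # moving the wave plates between settings
COUNT_TIME_S = 2.0              # acquisition time per setting
N_SETTINGS = 259                # settings needed for the witness


def drift_after(n_settings):
    """Phase drift (rad) accumulated over n_settings consecutive settings."""
    minutes = n_settings * (MOVE_TIME_S + COUNT_TIME_S) / 60.0
    return DRIFT_RATE_RAD_PER_MIN * minutes


between_resets = drift_after(20)   # drift between automatic phase resets
without_reset = drift_after(30)    # drift after 30 settings
run_hours = N_SETTINGS * (MOVE_TIME_S + COUNT_TIME_S) / 3600.0
```

This gives about 0.16 rad after 30 settings and about 0.11 rad between resets, for a run of at least roughly 2.3 hours.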

### Causal witness derivation for our setup

Here, we define what a causal witness is and sketch the algorithm that was used to compute the witness suitable for our experimental setup. See the study of Araújo *et al.* (*11*) for an exhaustive introduction to the subject. Throughout this section, we will use the Choi-Jamiołkowski isomorphism, which we introduce briefly in the Supplementary Materials.

A process matrix *W* ∈ ℒ(ℋ^{P} ⊗ ℋ^{A_{I}} ⊗ ℋ^{A_{O}} ⊗ ℋ^{B_{I}} ⊗ ℋ^{B_{O}} ⊗ ℋ^{F}) (where the Hilbert spaces refer both to the input and the output of the laboratories) is “causally separable” if it can be written as a convex combination of processes compatible with the causal orders *A* → *B* and *B* → *A*, that is, as *W*^{sep} ≔ *pW*^{A→B} + (1 − *p*)*W*^{B→A} with 0 ≤ *p* ≤ 1. A causal witness is a Hermitian operator *S* such that Tr(*SW*^{sep}) ≥ 0 for every causally separable process *W*^{sep}, whereas Tr(*SW*^{n−sep}) < 0 for at least one “causally nonseparable” process matrix *W*^{n−sep}, whose causal nonseparability *S* is then said to witness. The existence of such a Hermitian operator *S* is justified by the separating hyperplane theorem (*19*). As a consequence of this theorem, and because the set of causally separable processes is convex, for every causally nonseparable process *W*^{n−sep}, there exists a causal witness *S* such that Tr(*SW*^{n−sep}) < 0. This is illustrated graphically in Fig. 8.
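Why convexity is the key property can be seen in one line. Because the trace is linear, a Hermitian operator *S* that has nonnegative trace with every process compatible with *A* → *B* and with every process compatible with *B* → *A* automatically has nonnegative trace with any causally separable mixture:

$$
\operatorname{Tr}\!\left(S\,W^{\text{sep}}\right) = p\,\operatorname{Tr}\!\left(S\,W^{A\to B}\right) + (1-p)\,\operatorname{Tr}\!\left(S\,W^{B\to A}\right) \geq 0, \qquad 0 \le p \le 1.
$$

A negative value of Tr(*SW*) can therefore arise only from a process that lies outside this convex set.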

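The optimal causal witness *S*_{opt} for a given process *W* can be computed efficiently using a “semidefinite program” (SDP) (*11*)

$$ \min_{S}\ \operatorname{Tr}(S\,W) \quad \text{subject to} \quad S \in \mathcal{S}, \quad \mathbb{1}^{\circ} - S \in \mathcal{W}^{*} \tag{20} $$

where 𝒮 and 𝒲* are, respectively, the set of causal witnesses and the set of Hermitian operators that have nonnegative trace with process matrices, as defined in the study of Araújo *et al.* (*11*), and 𝟙° = 𝟙/*d*_{O} is the identity operator on ℋ^{P} ⊗ ℋ^{A_{I}} ⊗ ℋ^{A_{O}} ⊗ ℋ^{B_{I}} ⊗ ℋ^{B_{O}} ⊗ ℋ^{F} divided by the dimension *d*_{O} of the output spaces, for normalization.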

The causal nonseparability CNS(*W*^{n−sep}) = −Tr(*S*_{opt}*W*^{n−sep}) is the minimal λ ≥ 0 such that the process matrix

$$ \frac{W^{\text{n-sep}} + \lambda\,\Omega}{1 + \lambda} \tag{21} $$

is causally separable, where the minimization is over λ and over all valid process matrices Ω. This means that it is the minimum amount of worst-case noise necessary to make *W*^{n−sep} causally separable or, equivalently, the maximum (or rather the supremum) amount of worst-case noise that *W*^{n−sep} can tolerate before becoming causally separable. Noting that (*W*^{n−sep} + λΩ)/(1 + λ) = [1/(1 + λ)]*W*^{n−sep} + [λ/(1 + λ)]Ω, we see that λ/(1 + λ) can be interpreted as the probability that the worst-case process Ω is prepared instead of the desired process *W*^{n−sep}, and therefore that CNS(*W*^{n−sep})/[1 + CNS(*W*^{n−sep})] is the maximal probability that still allows us to see causal nonseparability.
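The conversion from a causal nonseparability value to a maximal noise probability, and the mixture of Eq. 21, can be sketched directly. The function names are hypothetical, and any CNS value passed in would come from the SDP rather than from this code.

```python
import numpy as np


def noisy_process(w, omega, lam):
    """Mixture (W + lambda * Omega) / (1 + lambda) of Eq. 21."""
    return (w + lam * omega) / (1.0 + lam)


def max_noise_probability(cns):
    """Largest probability CNS / (1 + CNS) of preparing the worst-case
    process that still leaves causal nonseparability detectable."""
    if cns < 0:
        raise ValueError("causal nonseparability must be nonnegative")
    return cns / (1.0 + cns)
```

For example, a causal nonseparability of 1 corresponds to tolerating worst-case noise with probability up to 1/2.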

Any witness *S* (particularly *S*_{opt}) can be decomposed with respect to a basis for the space ℒ(ℋ^{P} ⊗ ℋ^{A_{I}} ⊗ ℋ^{A_{O}} ⊗ ℋ^{B_{I}} ⊗ ℋ^{B_{O}} ⊗ ℋ^{F}). Such a basis consists of the Choi-Jamiołkowski representations of general state preparations on ℋ^{P}, general measurement and repreparation operations on ℋ^{A_{I}} ⊗ ℋ^{A_{O}} and ℋ^{B_{I}} ⊗ ℋ^{B_{O}}, and general measurements on ℋ^{F}. Having access to such a basis of operations means being able to perform full causal tomography.
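Decomposing an operator with respect to a set of operations is a linear-algebra problem. The toy sketch below works with a single qubit instead of the full six-party space: it finds real coefficients expressing a Hermitian target (here the Pauli *Y* operator) as a combination of projectors, using least squares. With an informationally complete set of four projectors the residual vanishes; with only the two superposition-basis projectors it does not, which mirrors why a restricted set of operations cannot reach every witness. The function names and the choice of projectors are illustrative assumptions.

```python
import numpy as np


def decompose(target, operators):
    """Real coefficients alpha with target ≈ sum_k alpha_k * operators[k].

    Hermitian operators form a real vector space, so each operator is
    flattened into its real and imaginary parts and the system is solved
    by least squares. Returns (alpha, residual_norm).
    """
    columns = [np.concatenate([op.ravel().real, op.ravel().imag]) for op in operators]
    matrix = np.stack(columns, axis=1)
    vector = np.concatenate([target.ravel().real, target.ravel().imag])
    alpha, *_ = np.linalg.lstsq(matrix, vector, rcond=None)
    return alpha, float(np.linalg.norm(matrix @ alpha - vector))


def projector(ket):
    """Rank-one projector |ket><ket| for a normalized ket."""
    ket = np.asarray(ket, dtype=complex).reshape(-1, 1)
    return ket @ ket.conj().T


s = 1 / np.sqrt(2)
full = [projector([1, 0]), projector([0, 1]), projector([s, s]), projector([s, 1j * s])]
restricted = [projector([s, s]), projector([s, -s])]   # |+> and |-> only
Y = np.array([[0, -1j], [1j, 0]])
alpha_full, res_full = decompose(Y, full)
alpha_restricted, res_restricted = decompose(Y, restricted)
```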

However, in our experimental setup, Alice could implement general measure-and-reprepare operations, but Bob could implement only unitary operations, and measurements were carried out only in the superposition basis. Thus, *S*_{opt} will not necessarily be experimentally achievable, and in our case, it was not. To compute the best witness that we could experimentally implement, we added a restriction on the decomposition of the witness as an additional constraint in the SDP, which then outputs the optimal experimentally accessible witness *S*_{exp}

$$ \min_{S}\ \operatorname{Tr}(S\,W_{\text{SWITCH}}) \quad \text{subject to} \quad S \in \mathcal{S}, \quad \mathbb{1}^{\circ} - S \in \mathcal{W}^{*}, \quad S = \sum_{a,d,x,y,z} \alpha_{a,d,x,y,z}\, A_{a|x} \otimes B_{y} \otimes D_{d|z} \tag{22} $$

where *A*_{a|x} are the 24 Choi-Jamiołkowski representations of measurement-repreparation maps, of which 16 were linearly independent; *B*_{y} are the 10 linearly independent Choi-Jamiołkowski representations of unitaries, which are described under the heading Implementing Alice and Bob's channels; and *D*_{d|z} are the two projectors onto the superposition basis.
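The count of 10 linearly independent unitaries can be checked numerically. Choi matrices of single-qubit unitary channels span a 10-dimensional space (one dimension for the trace part plus nine for the 3 × 3 block acting on the Bloch vector), whereas all linear maps from a qubit to a qubit span 16 dimensions, consistent with the 16 linearly independent measure-and-reprepare maps. The sketch below draws random unitaries rather than the experimental ones, and its helper names are hypothetical.

```python
import numpy as np


def random_unitary(rng):
    """Haar-random 2x2 unitary from the QR decomposition of a complex Gaussian matrix."""
    z = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases            # rescale column j of q by phases[j]


def choi_of_unitary(u):
    """Unnormalized Choi matrix |U>><<U|, with |U>> = sum_i |i> ⊗ U|i>."""
    max_entangled = np.array([1, 0, 0, 1], dtype=complex)
    vec = np.kron(np.eye(2), u) @ max_entangled
    return np.outer(vec, vec.conj())


def span_dimension(operators):
    """Number of linearly independent operators in the list."""
    return int(np.linalg.matrix_rank(np.stack([op.ravel() for op in operators])))


rng = np.random.default_rng(seed=1)
chois = [choi_of_unitary(random_unitary(rng)) for _ in range(12)]
dimension = span_dimension(chois)
```

Even though 12 unitaries are drawn, only 10 of their Choi matrices are linearly independent, so adding more unitary settings for Bob cannot enlarge the accessible span.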

The SDP in Eq. 22 returns the coefficients α_{a,d,x,y,z}, which were used to weight the experimental probabilities *p*(*a*, *d*|*x*, *y*, *z*) corresponding to each term of the decomposition, to compute the experimental value of Tr(*S*_{exp}*W*_{SWITCH}).

Analogously to the ideal case, the “experimentally accessible causal nonseparability” [that is, CNS_{exp}(*W*_{SWITCH}) = −Tr(*S*_{exp}*W*_{SWITCH})] is the maximal amount of worst-case noise that can be admixed to *W*_{SWITCH} before our experimental setup becomes incapable of certifying that *W*_{SWITCH} is causally nonseparable, and CNS_{exp}(*W*_{SWITCH})/[1 + CNS_{exp}(*W*_{SWITCH})] is the maximal probability of preparing the worst-case noise process instead of the ideal *W*_{SWITCH}.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/3/3/e1602589/DC1

section A. Choi-Jamiołkowski isomorphism

fig. S1. Experimentally estimated probabilities.

fig. S2. Experimentally estimated probabilities.

fig. S3. Experimentally estimated probabilities.

table S1. List of all the experimental measurement settings and the corresponding coefficients.

Reference (*35*)

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:** We thank I. Alonso Calafell for assisting with the electronics and C. Branciard, F. Costa, F. Massa, and M. Zych for useful discussions.

**Funding:** G.R. acknowledges support from the uni:docs fellowship programme. L.A.R. acknowledges support from the Templeton World Charity Foundation (fellowship no. TWCF0194). Č.B. acknowledges support from the John Templeton Foundation, and Individual Project (no. 24621). Č.B. and P.W. acknowledge support from the Doctoral Programme CoQuS (no. W1210-3). P.W. also acknowledges support from the European Commission, Emulators of Quantum Frustrated Magnetism (EQuaM) (no. 323714), Photonic Integrated Compound Quantum Encoding (PICQUE) (no. 608062), Graphene-Based Single-Photon Nonlinear Optical Devices (GRASP) (no. 613024), Quantum Simulation on a Photonic Chip (QUCHIP) (no. 641039), the Austrian Science Fund (FWF) through the START Program (Y585-N20), and the U.S. Air Force Office of Scientific Research (FA9550-16-1-0004). L.M.P. acknowledges partial support from Consejo Nacional de Ciencia y Tecnología–Mexico, from 1 November 2015 to 31 October 2016; the corresponding project is 10010-2015-02.

**Author contributions:** G.R., L.A.R., M.A., A.F., and L.M.P. designed the experiment. G.R. and L.A.R. built the setup and carried out data collection. G.R., L.A.R., A.F., and M.A. performed data analysis. J.M.Z. designed and built the automated components. G.R. and M.A. created the figures. P.W. and Č.B. supervised the project. All authors contributed to writing the paper.

**Competing interests:** The authors declare that they have no competing interests.

**Data and materials availability:** All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

- Copyright © 2017, The Authors