Experimental learning of quantum states

A photonic system is used to demonstrate that quantum states can be approximately learned using a linear number of measurements.

The exponential scaling of the wavefunction, arising from the tensor product description of multi-particle states, is one of the remarkable properties of quantum systems. If exploited correctly it can be used to achieve the computational advantages theorised in quantum information processing, but it can also lead one to question the consistency of quantum mechanics itself: does it make sense at all to talk about objects with more parameters than the number of atoms in the universe?
One of the problems arising from the exponential scaling of the wavefunction can be formalised in quantum tomography [1][2][3][4][5][6][7][8]. The central task of quantum tomography is to produce a description of an n-qubit state given the ability to prepare and measure k of its copies [6]. Characterising an unknown quantum state is a fundamental tool in quantum information processing. A survey of the major applications and present challenges in state tomography can be found in the review by Banaszek, Cramer, and Gross [1]. State estimation is, in general, an expensive procedure. For an n-qubit quantum state it can be shown that estimating the ideal state up to an approximation parameter ε requires $\Omega(4^n/\epsilon^2)$ operations [2]. Although prior information, such as the state being low-rank, can be used to reduce the computational cost of quantum tomography [3,7,8], there is no hope of overcoming the exponential scaling for general unknown quantum states. Given this difficulty, it is valuable to interpret quantum tomography as a learning problem, with the hope of using the well-developed machinery of computational learning theory to optimise the number of required measurements.
Computational learning theory [9,10] is a research field devoted to studying the design and analysis of machine learning algorithms. Particularly relevant for our purposes is supervised machine learning. Here the learner is presented with a number of examples consisting of input-output pairs and is subsequently assigned the task of predicting the output of a new input. This model of learning was formalised in computational learning theory by Valiant in 1984 [11] with the introduction of the Probably Approximately Correct (PAC) model. This framework provides two indicators of the efficiency of a learner: the sample complexity and the time complexity. The first is the worst-case number of examples it uses to reach some target competency, while the second is the worst-case running time of the learner. In this article we are concerned with the sample complexity of the problem of learning quantum states.
Quantum state tomography can be rephrased as a learning problem in the following sense. A full tomography requires a complete set of measurements. Consider a learner that, by looking at only a few measurements, can predict the outcome of any measurement made on the state. It is easy to see that generating this hypothesis is equivalent to reconstructing the density matrix of the state. Because quantum tomography requires an exponentially large number of measurements, we might assume that the same applies to the learning problem.
This apparent exponential scaling of the learning problem for quantum states can be interpreted as formalising the objections of quantum mechanics sceptics (for a critical discussion see [12]). Indeed, one of the fundamental tasks of science is to come up with hypotheses that, by explaining past observations, let us predict future observations. A theory that requires an exponentially growing number of observations to produce its hypothesis may signal a problem with the theory itself.
Computational learning theory, and in particular the PAC model, can help to address these conundrums. By analysing quantum tomography from a computational learning perspective, Aaronson [13] proved that quantum states can be PAC-learned with a linearly scaling training set. Here we present the first experimental demonstration of such linear scaling. Our contributions also include developing a testable model for the main theorem proved in [13] and estimating an important scaling constant. We run the experiments on a photonic platform including up to 6 qubits. Our results demonstrate experimentally an important property of quantum states and highlight the power of computational learning theory in the quantum information framework.
FIG. 1. Schematic of the learning procedure. In the learning phase (top panel) measurements drawn randomly from D are performed on the physical state ρ. Based on the measurement outcomes the learning algorithm outputs a hypothesis σ. In the prediction phase (bottom panel) the goal is to predict the experimental outcome of a measurement E drawn from D using σ as hypothesis.

QUANTUM LEARNABILITY THEORY
Let us recall some standard definitions in quantum theory. A generic n-qubit state ρ is a trace-one, positive semidefinite matrix acting on a Hilbert space of dimension $2^n$. Every observation of a state is mathematically described by a positive operator valued measurement (POVM), $E = \{E^{(j)}\}$, where each $E^{(j)}$ is a Hermitian positive semidefinite operator such that $\sum_j E^{(j)} = I$. The probability of measurement outcome j is $p(j) = \mathrm{Tr}(E^{(j)}\rho)$. For our purposes, we refer to a measurement of ρ as a two-outcome POVM $\{E^{(1)} = E, E^{(2)} = I - E\}$ with eigenvalues in [0, 1]. We denote by S the set of all measurements on n qubits. Following Ref. [13] we define the learning of ρ as the task of processing a training set composed of m tuples $\{(E_i, \mathrm{Tr}(E_i\rho))\}$, drawn from a probability distribution D, in order to predict the "behaviour" of ρ on most measurements drawn from D.
This concept of learning is defined in the context of Valiant's PAC model [11]. In this framework, originally developed for Boolean functions but then extended to real-valued ones by Bartlett and Long [14], a learning algorithm (the learner) tries to approximate with high probability an unknown function $f : X \to Y$ from a training set of random labelled examples. Each labelled example is of the form $(x, f(x))$, where x is distributed according to some unknown distribution D. In order to make learning possible we restrict the hypotheses that the learner can use to approximate f to a set of functions $H = \{h : X \to Y\}$. We refer to H as the hypothesis class. The learning algorithm takes as input the training set and generates a hypothesis $h \in H$ that approximates f. The PAC model makes use of two approximation parameters, ε and δ. The accuracy parameter ε determines how far the hypothesis h can be from f. The confidence parameter δ gives the probability of sampling a training set that is not representative of the underlying distribution D.
A hypothesis class H is said to be PAC-learnable if there exists an algorithm that, for every probability distribution D and function f and for every ε, δ ∈ (0, 1), when run on $m \ge m_H$ examples drawn from D, outputs with probability at least 1 − δ a hypothesis h such that
$$\Pr_{x \sim D}\left[h(x) \neq f(x)\right] \le \epsilon.$$
Here by ∼ we indicate that x is drawn from D. The value $m_H$ determines the minimum number of examples required to PAC-learn the class H. We refer to $m_H$ as the sample complexity of the hypothesis class H. We note that the learner must test the predictions under the same distribution D that determines the elements in the training set.
The PAC model has been adapted to quantum states in [13]. Here the learner tries to approximate a function $F_\rho : S \to [0, 1]$, where $F_\rho$ is defined as $F_\rho(E_i) = \mathrm{Tr}(E_i^{(1)}\rho)$, from a training set $\{(E_i, F_\rho(E_i))\}$. Notice that we always take the first element $E_i^{(1)}$ of each POVM $E_i$. For this reason, in the following, we take $E_i^{(1)} = E_i$. The POVM measurements $\{E_i\}$ are drawn from an unknown distribution D and the $F_\rho(E_i)$ values are determined experimentally. After processing the training set the learner outputs a hypothesis state σ. A quantum state is considered to be learned if, with probability 1 − δ, a training set generated according to the distribution D can be used to predict with probability 1 − ε and accuracy γ any other measurement drawn from D:
$$\Pr_{E \sim D}\left[\,\left|\mathrm{Tr}(E\sigma) - \mathrm{Tr}(E\rho)\right| > \gamma\,\right] \le \epsilon. \qquad (1)$$
A pictorial description of this learning procedure is shown in Fig. 1. Because σ is a $2^n \times 2^n$-dimensional matrix we would expect that the number of examples in the training set required to learn ρ also scales exponentially. However, it has been proved [13] that the number of examples required to learn $F_\rho$ scales linearly with n and inverse polynomially with the relevant error parameters (a full statement of the theorem is given in Methods; in the following we shall refer to it as Theorem 1). More specifically, for fixed error parameters ε, γ and δ, we can PAC-learn a quantum state provided
$$m \ge \frac{K}{\gamma^4\epsilon^2}\left(\frac{n}{\gamma^4\epsilon^2}\log^2\frac{1}{\gamma\epsilon} + \log\frac{1}{\delta}\right), \qquad (2)$$
where K is a constant. This result provides an upper bound on the number of measurements required to learn a quantum state with respect to any probability measure over two-outcome POVM measurements. The value of K is left unbounded but it is critical for applying the theorem in an experimental setting.
The learning procedure prescribed by Theorem 1 is simple: it involves finding a hypothesis state σ such that $\mathrm{Tr}(E_i\sigma) \approx \mathrm{Tr}(E_i\rho)$ for all i. Then, with high probability, that hypothesis will generalise in the sense that $\mathrm{Tr}(E\sigma) \approx \mathrm{Tr}(E\rho)$ for most E's drawn from D. It is then possible to interpret the problem of finding a mixed n-qubit state which approximately agrees with the measurements as an optimisation problem.
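The optimisation problem takes as input m POVM measurements described by Hermitian matrices $\{E_1, \dots, E_m\}$ and their corresponding measurement outcomes $\{\mathrm{Tr}(E_1\rho), \dots, \mathrm{Tr}(E_m\rho)\}$. The goal is to find a Hermitian positive semidefinite matrix σ that minimises
$$f(\sigma) = \sum_{i=1}^{m}\left(\mathrm{Tr}(E_i\sigma) - \mathrm{Tr}(E_i\rho)\right)^2 \quad \text{subject to} \quad \mathrm{Tr}(\sigma) = 1,\ \sigma \succeq 0, \qquad (3)$$
where by $\sigma \succeq 0$ we denote the positive semidefiniteness of σ.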
The above formulation is a convex program whose solution is known to be computable in polynomial time in the dimension of σ using interior point methods [15,16] or the ellipsoid method [17]. However, because the dimension of σ scales exponentially with n, the problem of finding the minimum of $f(\sigma)$ is in practice not efficiently computable. This is still compatible with the linear scaling of Theorem 1 (see Methods) because the results proved in [13] are purely information-theoretic and are concerned only with the sample size m. For any given class of quantum states, the question of whether hypothesis states can be produced efficiently is still open. In this context, Rocchetto recently proved that stabiliser states are efficiently PAC-learnable [18].
Finally, we note that learning a quantum state is not a complete replacement for standard quantum state tomography. The PAC-learning framework of Theorem 1 tests the predictions over the same distribution as the training set; a good hypothesis state could be arbitrarily far from the true state in the usual trace distance metric, but hard to distinguish from the true state with respect to the given distribution over measurements.

EXPERIMENTAL SETUP
We test the learning Theorem 1 over different Greenberger-Horne-Zeilinger (GHZ) states [19] (see Methods for a definition). There are several methods to produce GHZ states [20][21][22][23][24][25][26][27][28] in photonic systems. In order to scale up to 6 qubits we use two different approaches: the first one aims to increase the number of degrees of freedom per photon while the second one exploits an increasing number of photons (see Fig. 2). In setup (I) we generate 2-photon states, encoding up to 4 qubits, and perform a full set of measurements in the computational basis. In setup (II) we generate four-photon states, able to encode up to 6 qubits. Both setups exploit spontaneous parametric down conversion (SPDC) in order to generate polarisation-entangled photon pairs (see Methods).
In setup (I), depicted in Fig. 2, we use the q-plate [29], a birefringent patterned slab, to entangle polarisation and orbital angular momentum (OAM) of single photons [30][31][32][33]. This makes it possible to encode 2 qubits per photon, exploiting their polarisation and OAM degrees of freedom. In order to obtain a 4-qubit GHZ state, the q-plate acts on the Bell state $|\psi^-\rangle = \frac{1}{\sqrt{2}}(|RL\rangle - |LR\rangle)$, where R and L denote, respectively, the right and left circular polarisation of the two photons, allowing a polarisation-controlled variation of the OAM. More specifically, states with right or left polarisation become OAM eigenstates with $\ell = -1$ or $\ell = +1$ respectively. Conditioned on the measurements of a subset of qubits, we can also generate 3- and 2-qubit states, as summarized in Table I. In order to perform a complete quantum state tomography in both Hilbert spaces, the analysis is carried out using two series of quarter-wave plates (QWP), half-wave plates (HWP) and polarising beam splitters (PBS), separated by another q-plate, to transfer the information from the OAM to the polarisation subspace [33]. The photons are then sent to single mode fibres (SMF), which can be coupled only with states carrying null OAM.
In setup (II), depicted in Fig. 2, we encode the qubits in the polarisation and path degrees of freedom. Through this encoding we can generate 4-photon states of up to 6 qubits. This setup involves two separate SPDC sources, which generate two pairs of polarisation-entangled photons, (A,B) and (C,D), with the same pulse of the laser. We can then obtain a 4-qubit GHZ state encoded in polarisation by simultaneously injecting one photon from each source (B and C) into the two inputs of a fibre-based PBS. In this configuration, each photon carries one qubit. The dimension of the system can be increased to 5 qubits by sending one of the two output modes of the fibre-based PBS into a Sagnac interferometer (shown in Fig. 2). This allows us to entangle and measure the polarisation and path degrees of freedom of a single photon while retaining phase stability. This scheme can be easily extended to 6 qubits by sending the other output mode of the fibre-based PBS into another Sagnac interferometer. In this case, two out of the four photons carry 2 qubits, which are encoded in the polarisation and path degrees of freedom, as shown in Table I. Through the above procedures we can generate the state for n = 3, 4, 5, 6 qubits. The polarisation analysis is performed with a HWP and a PBS for each path.

EXPERIMENTAL DEMONSTRATION
We demonstrate numerically and experimentally, through two photonic systems able to encode from 2 to 6 qubits, that quantum states can be PAC-learned with a linearly scaling training set: that is, we demonstrate that the number of elements m in the training set required to learn an n-qubit quantum state ρ scales linearly with n.
Although Theorem 1 can be applied under any distribution D, it is interesting to test its prediction under distributions that include measurements that are difficult to predict. If, for example, one were to take the uniform distribution over all possible measurement bases, with high probability no measurement drawn from this distribution would be able to distinguish the state from the completely mixed one. We define the completely mixed state as the state described by the density matrix $I/2^n$, where by I we denote the identity matrix.
All of our experiments are performed on GHZ states, a type of stabiliser state (see Methods for further details). We remark that the validity of Theorem 1 extends to all quantum states. The advantage of using GHZ states is the possibility of clearly identifying a set of measurements and a probability distribution that make the predictions of the theorem "interesting" in the sense that they cannot be reproduced using the completely mixed state as hypothesis. Depending on the experimental setup we use two probability distributions, $D^{(I)}$ for setup (I) and $D^{(II)}$ for setup (II), that are uniform over a subset of the stabilisers of the state (details on the distributions can be found in Methods). Under these distributions the completely mixed state is never a good hypothesis (unless γ > 0.5) because the stabiliser measurements performed on the state will always return 1 as an outcome. On the completely mixed state the same measurements will output 1 or 0 with equal probability.
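To make the role of the threshold γ concrete, the following minimal Python sketch computes the fraction of test measurements on which a hypothesis misses the true expectation value by more than γ. It uses the 2-qubit Bell state as a small stand-in for the GHZ states of the experiment; the function name `prediction_error_rate` and the choice of test measurements are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Single-qubit Pauli matrices used to build the stabilisers.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def prediction_error_rate(sigma, rho, measurements, gamma):
    """Fraction of measurements E with |Tr(E sigma) - Tr(E rho)| > gamma."""
    misses = [
        abs(np.real(np.trace(E @ sigma)) - np.real(np.trace(E @ rho))) > gamma
        for E in measurements
    ]
    return float(np.mean(misses))


# Bell state (|00> + |11>)/sqrt(2), i.e. the 2-qubit GHZ state.
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# Non-identity stabilisers of this state: XX, ZZ and their product.
XX, ZZ = np.kron(X, X), np.kron(Z, Z)
stabilisers = [XX, ZZ, XX @ ZZ]

# First POVM element of each stabiliser measurement, E = (I + P)/2.
measurements = [(np.eye(4) + P) / 2 for P in stabilisers]

# Completely mixed hypothesis I/2^n.
mixed = np.eye(4, dtype=complex) / 4

rate_tight = prediction_error_rate(mixed, rho, measurements, gamma=0.1)
rate_loose = prediction_error_rate(mixed, rho, measurements, gamma=0.6)
print(rate_tight, rate_loose)
```

In this toy case every stabiliser measurement has true value 1 while the completely mixed state predicts 0.5, so the hypothesis fails on all test measurements for γ = 0.1 and on none once γ exceeds 0.5.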
In the case of learning with experimental data we have to take into account two factors that can invalidate Theorem 1: noise in the measurements and the lack of access to the true value of $\mathrm{Tr}(E\rho)$. Both issues can be positively addressed. We examine the noise problem first. As discussed in [13], if the noise that corrupts E to E′ is governed by a known probability distribution such as a Gaussian, then E′ is still just a POVM, so Theorem 1 applies directly. If the noise is adversarial, then we can also apply Theorem 1 directly, provided we have an upper bound on $|\mathrm{Tr}(E'_i\rho) - \mathrm{Tr}(E_i\rho)|$. As for the second issue, approximate values of the expectation values are also within the validity of the theorem. A discussion is provided in the Methods section.
We begin our experimental analysis with a full characterisation of the PAC-learnability of a 4-qubit GHZ state generated with setup (I). The complete set of measurements available with setup (I) allows us to compare the quality of the hypothesis σ not only in terms of the learning theorem but also from a tomographic perspective. The results, presented in Fig. 3, show that, by increasing the number of measurements in the training set, the hypothesis σ gets closer, in terms of fidelity, to the ideal state and to the experimental state (right panel). In the same figure it is possible to see that the predictions (left panel, red dots) obtained by minimising $f(\sigma)$ are always better than those obtained by taking the completely mixed state (black line) as hypothesis. This confirms that the distributions we selected are "interesting" from a learning perspective because it is not possible to make good predictions using random guessing.
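The fidelity comparison described above can be sketched in a few lines of Python. The snippet assumes the Uhlmann root-fidelity convention, F(σ, ρ) = Tr√(σ^{1/2} ρ σ^{1/2}); whether the figure uses this quantity or its square is not recoverable from the text, so the convention, like the helper names `psd_sqrt` and `fidelity`, should be read as an assumption.

```python
import numpy as np


def psd_sqrt(A):
    """Square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T


def fidelity(sigma, rho):
    """Root fidelity Tr sqrt( sqrt(sigma) rho sqrt(sigma) ) (assumed convention)."""
    s = psd_sqrt(sigma)
    inner = s @ rho @ s
    inner = (inner + inner.conj().T) / 2  # enforce Hermiticity numerically
    vals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(vals)))


# Ideal 4-qubit GHZ state (|0000> + |1111>)/sqrt(2).
n = 4
psi = np.zeros(2**n, dtype=complex)
psi[0] = psi[-1] = 1 / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# Completely mixed state, the starting guess of the optimisation.
mixed = np.eye(2**n, dtype=complex) / 2**n

f_mixed = fidelity(mixed, rho)
f_self = fidelity(rho, rho)
print(f_mixed, f_self)
```

Under this convention the completely mixed state has fidelity 2^(−n/2) = 0.25 with any pure 4-qubit state, which gives a baseline against which a learned hypothesis can be compared.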
Still using GHZ states generated from setup (I) we test the dependency of the measurement complexity on the error parameters ε, δ, γ. This kind of test is necessary in order to ensure that the hardness of the learning problem used in the experimental demonstration of the theorem is representative of a typical learning scenario. The numerical simulations on the scaling of the error parameters are shown in Fig. 4 and indicate that, as expected from Eq. 2, the hardness of the learning problem does not change abruptly with the error parameters (unless they introduce pathological cases; for example, for γ > 0.5 random guessing becomes a good prediction strategy).
We demonstrate the linear scaling of Theorem 1 over a GHZ state of the type described in Eq. 4 and generated by exploiting setup (II). Our algorithm takes as input the error parameters ε, γ, δ and, for a given n, outputs the minimum m such that a training set that respects Eq. 1 is generated with probability p = 1 − δ. We present the results in Fig. 5 for both numerical and experimental data. The experimental data demonstrate that quantum states are PAC-learnable. A linear fit performed on the experimental data returns a slope value of 1.1. This implies that the value of the scaling constant K in Eq. 2, left undetermined in Theorem 1, is compatible with learning in an experimental setting. The values obtained from the linear fit in Fig. 5 show that learning a 20-qubit state would require ∼ 23 measurements. Notice that a 20-qubit stabiliser state has $2^{20} = 1048576$ stabilisers.
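The extrapolation to 20 qubits is simple arithmetic and can be checked directly. The short Python sketch below uses the fit coefficients quoted in the caption of Fig. 5 (m = 1.19n − 0.34); the function name is an illustrative assumption.

```python
def fitted_training_set_size(n, slope=1.19, intercept=-0.34):
    """Training-set size predicted by the linear fit reported for Fig. 5."""
    return slope * n + intercept


n = 20
m_fit = fitted_training_set_size(n)  # extrapolated number of measurements
num_stabilisers = 2**n               # size of the stabiliser group for n qubits

print(f"n = {n}: fit predicts m = {m_fit:.2f}; stabilisers = {num_stabilisers}")
```

The fit gives roughly 23 measurements, to be compared with the 1048576 stabilisers of a 20-qubit stabiliser state.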

DISCUSSION
Our work experimentally demonstrates that quantum states, as a hypothesis class, are PAC-learnable. This result, first proved in [13], constitutes an important advance in our understanding of quantum information. The line of research that seeks to establish how much information is really contained in a quantum state, and thereby to gain insight about the reality of the wavefunction, has recently found a new addition in the "shadow tomography" protocol proposed by Aaronson [34]. This protocol can predict the outcomes of M different two-outcome measurements on a D-dimensional state, to high accuracy, by measuring only poly(log(D), log(M)) copies of the state. An experimental demonstration of this protocol is a natural future direction, and would be a valuable addition to our physical comprehension of these theoretical results.
From a broader perspective, our work constitutes an example of how the techniques developed in the framework of computational learning theory can be used within quantum information. The interplay of these two fields, recently surveyed by Arunachalam and de Wolf [35], can offer new tools to investigate properties of quantum states and circuits and can help to identify cases in machine learning where classical and quantum computation behave differently. This is particularly important in light of the recent advances in quantum algorithms for machine learning (recently reviewed by Biamonte et al. [36] and by Ciliberto et al. [37]) where, despite the growing interest in the topic, it is still unclear whether caveat-free speedups can be attained (for a critical discussion see [37,38]).

The learning theorem
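The theorem proved in Ref. [13] is the following.
Theorem 1. Let ρ be an n-qubit state, let D be a distribution over two-outcome measurements, and let $E = (E_1, \dots, E_m)$ consist of m measurements drawn independently from D. Suppose we are given bits $B = (b_1, \dots, b_m)$, where each $b_i$ is 1 with independent probability $\mathrm{Tr}(E_i\rho)$ and 0 with probability $1 - \mathrm{Tr}(E_i\rho)$. Suppose also that we choose a hypothesis state σ to minimise the quadratic functional $f(\sigma) = \sum_{i=1}^{m}(\mathrm{Tr}(E_i\sigma) - b_i)^2$. Then there exists a positive constant K such that
$$\Pr_{E \sim D}\left[\,\left|\mathrm{Tr}(E\sigma) - \mathrm{Tr}(E\rho)\right| > \gamma\,\right] \le \epsilon$$
with probability at least 1 − δ over E and B, provided that
$$m \ge \frac{K}{\gamma^4\epsilon^2}\left(\frac{n}{\gamma^4\epsilon^2}\log^2\frac{1}{\gamma\epsilon} + \log\frac{1}{\delta}\right).$$
In this article, rather than working with single measurement outcomes $b_i$, we are concerned with estimated expected values $\bar{b}_i = \frac{1}{m_S}\sum_{j=1}^{m_S} b_i^{(j)}$, where each $b_i^{(j)}$ is the j-th measurement outcome corresponding to $E_i$. In order to show that the hypothesis σ generated by considering the expected values is equivalent to that obtained by taking the measurement outcomes $b_i$, we define $f(\sigma) = \sum_{i=1}^{m}\sum_{j=1}^{m_S}(\mathrm{Tr}(E_i\sigma) - b_i^{(j)})^2$ and $f'(\sigma) = \sum_{i=1}^{m}(\mathrm{Tr}(E_i\sigma) - \bar{b}_i)^2$. If we take $m_S$ outcomes for each $E_i$ and solve for σ the equations $df/d\sigma = 0$ and $df'/d\sigma = 0$, it is possible to verify that the hypothesis that minimises the function $f$ also minimises $f'$.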
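The equivalence between fitting single outcomes and fitting their averages can be made explicit with a short calculation. The derivation below is a sketch under stated assumptions: each measurement $E_i$ is repeated $m_S$ times, $\bar{b}_i$ denotes the sample mean of its outcomes $b_i^{(j)}$, and $f$, $f'$ are the single-shot and averaged quadratic functionals; these definitions are our reading of the text, not a quotation of the original equations.

```latex
% Assumed definitions: t_i = Tr(E_i \sigma), \bar b_i = (1/m_S) \sum_j b_i^{(j)},
% f  = \sum_i \sum_j (t_i - b_i^{(j)})^2   (single-shot functional),
% f' = \sum_i (t_i - \bar b_i)^2           (averaged functional).
\begin{align}
f(\sigma)
  &= \sum_{i=1}^{m}\sum_{j=1}^{m_S}
     \left[(t_i - \bar b_i) + (\bar b_i - b_i^{(j)})\right]^2 \\
  &= m_S \sum_{i=1}^{m} (t_i - \bar b_i)^2
     + 2\sum_{i=1}^{m} (t_i - \bar b_i)
       \underbrace{\sum_{j=1}^{m_S} (\bar b_i - b_i^{(j)})}_{=\,0}
     + \sum_{i=1}^{m}\sum_{j=1}^{m_S} (\bar b_i - b_i^{(j)})^2 \\
  &= m_S\, f'(\sigma) + \text{const}.
\end{align}
```

Since the last term does not depend on σ and $m_S > 0$, the two functionals share the same minimisers over any feasible set, in particular over trace-one positive semidefinite matrices.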

The learning distributions
We use different learning distributions for the two experimental setups, $D^{(I)}$ and $D^{(II)}$. The distribution $D^{(I)}$ is uniform over the set of stabiliser measurements [39] of the GHZ state minus the identity matrix. The distribution $D^{(II)}$ is uniform over the set of stabiliser measurements in X and Z of the GHZ state minus the identity matrix. A GHZ state [19],
$$|\mathrm{GHZ}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle^{\otimes n} + |1\rangle^{\otimes n}\right), \qquad (4)$$
is a type of stabiliser state. A stabiliser state $|\psi\rangle$ is the unique eigenstate with eigenvalue +1 of a set of N commuting multi-local Pauli operators $P_i$. That is, $P_i|\psi\rangle = |\psi\rangle$, where $P_i = \bigotimes_j w_j$ and $w_j \in \{I, \sigma_x, \sigma_y, \sigma_z\}$ are the Pauli matrices. We define the $P_i$ as the stabilisers of the state.
There are $2^n$ different stabilisers for an n-qubit stabiliser state. Because one of the stabilisers is always the identity (whose eigenvalue is 1 for every state) we chose not to include this measurement in those sampled by D.
Each $P_i$ is a two-outcome observable (with eigenvalues +1 or −1). We construct the POVM elements $E_i^{(1)}$ and $E_i^{(2)}$ of the observable $P_i$ by noting that $E_i^{(1)} = (I + P_i)/2$ and $E_i^{(2)} = (I - P_i)/2$. The set of stabilisers of a state forms a group under the operation of matrix multiplication. To represent a state it is then sufficient to consider the n stabilisers that generate this group. For an n-qubit state there are n elements in the set of generators.
The high variance around m = 4 in Fig. 3 can be explained in the following way: each datapoint is obtained by averaging over a number of different configurations sampled from $D^{(I)}$. It is then likely to sample a configuration that includes 2 generators and 2 other stabilisers that can be obtained by the product of the generators. It is easy to see how the information content of such a configuration is less than the one where 4 independent stabilisers are sampled. This will in turn limit the ability of σ to output good predictions and will generate the high variance in the data.
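The stabiliser construction described in this section can be sketched numerically. The Python snippet below builds the stabiliser group of an n-qubit GHZ state from n generators (all-X, plus ZZ on neighbouring qubits), converts each stabiliser P into the POVM element (I + P)/2, and checks the expectation values on the GHZ state and on the completely mixed state. The choice of generators and the helper names are assumptions made for illustration.

```python
import itertools
from functools import reduce

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def kron_all(ops):
    """Tensor product of a list of single-qubit operators."""
    return reduce(np.kron, ops)


def ghz_generators(n):
    """One common generating set: X on every qubit, and Z_k Z_{k+1}."""
    gens = [kron_all([X] * n)]
    for k in range(n - 1):
        ops = [I2] * n
        ops[k], ops[k + 1] = Z, Z
        gens.append(kron_all(ops))
    return gens


def stabiliser_group(gens):
    """All 2^n products of subsets of the generators (identity first)."""
    dim = gens[0].shape[0]
    group = []
    for bits in itertools.product([0, 1], repeat=len(gens)):
        P = np.eye(dim)
        for bit, g in zip(bits, gens):
            if bit:
                P = P @ g
        group.append(P)
    return group


n = 3
dim = 2**n
psi = np.zeros(dim)
psi[0] = psi[-1] = 1 / np.sqrt(2)
rho = np.outer(psi, psi)          # GHZ density matrix
mixed = np.eye(dim) / dim         # completely mixed state

group = stabiliser_group(ghz_generators(n))
povm = [(np.eye(dim) + P) / 2 for P in group[1:]]  # drop the identity

on_ghz = [np.trace(E @ rho) for E in povm]
on_mixed = [np.trace(E @ mixed) for E in povm]
print(len(group), on_ghz, on_mixed)
```

Only n of the 2^n group elements are independent, which is why a training set containing products of already sampled generators carries less information than one made of independent stabilisers.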

Numerical simulations
We minimise the function f over the positive semidefinite matrices of unit trace with a variant of the Frank-Wolfe algorithm [40] developed by Hazan [41]. All our simulations are performed using 300 iterations of the Hazan algorithm.
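As an illustration of this kind of minimisation, the following Python sketch runs a Frank-Wolfe iteration over trace-one positive semidefinite matrices on a toy 2-qubit problem. It is not the authors' implementation: it replaces the approximate eigenvector routine of Hazan's algorithm with an exact eigendecomposition (`numpy.linalg.eigh`), uses the standard step size 2/(k + 2), and starts from the completely mixed state; the function names are hypothetical.

```python
import numpy as np


def loss(sigma, Es, ys):
    """f(sigma) = sum_i (Tr(E_i sigma) - y_i)^2."""
    preds = np.array([np.real(np.trace(E @ sigma)) for E in Es])
    return float(np.sum((preds - np.asarray(ys)) ** 2))


def gradient(sigma, Es, ys):
    """Hermitian gradient of f: 2 * sum_i (Tr(E_i sigma) - y_i) E_i."""
    G = np.zeros_like(sigma)
    for E, y in zip(Es, ys):
        G = G + 2.0 * (np.real(np.trace(E @ sigma)) - y) * E
    return G


def frank_wolfe_density_matrix(Es, ys, dim, iterations=300):
    """Frank-Wolfe over {sigma >= 0, Tr sigma = 1} (simplified sketch)."""
    sigma = np.eye(dim, dtype=complex) / dim  # completely mixed start
    for k in range(iterations):
        G = gradient(sigma, Es, ys)
        # The linear subproblem over density matrices is solved by the
        # eigenvector of G with the smallest eigenvalue (eigh sorts ascending).
        _, vecs = np.linalg.eigh(G)
        v = vecs[:, 0]
        step = 2.0 / (k + 2)
        sigma = (1.0 - step) * sigma + step * np.outer(v, v.conj())
    return sigma


# Toy training set: stabiliser measurements of the Bell state, all with value 1.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
XX, ZZ = np.kron(X, X), np.kron(Z, Z)
Es = [(np.eye(4) + P) / 2 for P in (XX, ZZ, XX @ ZZ)]
ys = [1.0, 1.0, 1.0]

initial_loss = loss(np.eye(4, dtype=complex) / 4, Es, ys)
sigma = frank_wolfe_density_matrix(Es, ys, dim=4, iterations=300)
final_loss = loss(sigma, Es, ys)
print(initial_loss, final_loss)
```

Each update is a convex combination of the current iterate and a rank-one projector, so the iterate remains a valid density matrix without any explicit projection step, which is the main appeal of this family of methods for the problem at hand.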

Experimental details
For the experimental setups of Fig. 2, a pump laser with λ = 397.5 nm is produced by a second harmonic generation (SHG) process from a Ti:Sapphire mode-locked laser with repetition rate of 76 MHz. Photon pairs entangled in the polarisation degree of freedom are generated exploiting type-II SPDC in 2 mm-thick beta-barium borate (BBO) crystals. The photons generated by SPDC are filtered in wavelength and spatial mode by using narrow band interference filters and SMF, respectively. After coupling into SMF, the spatial mode becomes a fundamental Gaussian mode (TEM$_{00}$) with null associated OAM.

FIG. 2. Experimental setups for generating the 3, 4, 5 and 6-qubit GHZ states. Pictorial representation of the two different experimental setups used to generate the quantum states learned with Theorem 1. In setup (I) we make use of two photons and encode up to 4 qubits. In setup (II) we make use of four photons and encode up to 6 qubits. (I) In the generation stage, the state of each of the two entangled photons (1 and 2) is locally manipulated via QWPs, HWPs and q-plates, set to generate a specific GHZ state. The analysis is performed using QWPs, HWPs and PBSs. The OAM analysis requires a q-plate to transfer the information encoded in the OAM space to the polarisation degree of freedom, which can then be analysed with standard techniques. After the analysis, both photons are sent to single mode fibres connected to single photon detectors. (II) Two polarisation-entangled photon pairs are generated via SPDC in two separated non-linear crystals. Photons A and D of the first and second pair respectively are sent directly to a HWP and a PBS for polarisation analysis. Photons B and C instead are sent to a 50/50 in-fibre PBS followed by another PBS which realises the polarisation-path entanglement. The two paths go through two HWPs and are rejoined in the same PBS, forming a Sagnac-like configuration which allows us to perform polarisation and path analysis without worrying about phase instabilities. A motorised delay line is adopted to change the photons' wave-packet temporal overlap in the PBS. The path analysis section is composed of a HWP and a PBS, after which the photons are coupled into SM fibres connected to single photon detectors. Generation and analysis sections are represented by cyan and grey zones, respectively.

FIG. 3. Learning of a 4-qubit GHZ state. Numerical simulations (blue curves) and experimental data (red curves) of the learning of the state $(|0000\rangle + |1111\rangle)/\sqrt{2}$. (left) The probability of predicting a measurement with an error smaller than γ = 0.1. The black line represents the predictions made using the completely mixed state as hypothesis. Clearly, the informed predictions are always better than a random guess. (right) The distance, in terms of the fidelity $F = \mathrm{Tr}\sqrt{\sigma^{1/2}\rho\,\sigma^{1/2}}$, between the hypothesis state σ and ρ and between σ and the completely mixed state $I/2^n$, the starting guess of the optimisation algorithm. A discussion on the high variance of the datapoints with m = 4 is provided in the Methods section. The learning distribution $D^{(I)}$ is uniform over the set of stabiliser measurements of the state minus the identity matrix (see Methods). Every datapoint is an average of 20 different, randomly generated, sets of measurement configurations.

FIG. 4. Measurement complexity of error parameters. Dependence of m on the error parameters for learning 4-qubit GHZ states generated with setup (I). Learning is performed under the distribution $D^{(I)}$ (see Methods for further details) and each data-point is an average over 4 different GHZ states. When a given error parameter is changed the other ones are kept constant at the following values: δ = 0.1, γ = 0.1, and ε = 0.05. (left) Scaling of δ. (center) Scaling of γ. (right) Scaling of ε.

FIG. 5. Experimental demonstration of Theorem 1. Scaling of the size of the training set m required to learn a GHZ state as a function of the number of qubits n. Experimental data-points (red crosses) are obtained using the experimental setup (II). Each data-point is obtained using 50 different, randomly generated sets of measurement configurations drawn from $D^{(II)}$ (see Methods for further details). Error bars show the standard deviation for an average of 10 different runs of the algorithm to estimate m. The red line is a linear fit on the experimental data-points with equation m = 1.19n − 0.34. The learning parameters are ε = 0.15, γ = 0.2 and δ = 0.2.

TABLE I. Qubit encoding. The table shows the encoding map between logical states and photons. Photons are labeled with capital letters A, B, C and D. Two photons (A and B in the table) are used in setup (I) to encode states up to 4 qubits in the polarisation and OAM basis. For setup (II) states up to 6 qubits are generated by adding two extra photons (C and D) and using an encoding in path and polarisation. The states $|H\rangle$, $|V\rangle$, $|R\rangle$, $|L\rangle$ denote the polarisation degree of freedom while $|+1\rangle$ and $|-1\rangle$ represent the eigenstates of the OAM with $\ell = +1$ and $\ell = -1$, respectively. To identify the two possible paths of the photons in setup (II) we use the labels $|a\rangle$ and $|b\rangle$.