Research ArticleINFERENCE

Inferring propagation paths for sparsely observed perturbations on complex networks

See allHide authors and affiliations

Science Advances  21 Oct 2016:
Vol. 2, no. 10, e1501638
DOI: 10.1126/sciadv.1501638
  • Fig. 1 Shortest paths, label propagation, and the exposure model for inferring the propagation path of perturbations.

    (A) A perturbation spread from node 1 to node 6 through nodes 2 to 5, additionally branching out from node 4 to node 9 (perturbed nodes are light red, whereas unperturbed ones are gray). To the observer, the only available information is that nodes 1 and 6 are perturbed and that nodes 15, 17, and 18 are not (observed nodes are dark-colored, whereas unobserved nodes are light-colored). (B to D) The color and size of the nodes indicate the ranking assigned by the shortest paths approach (B), label propagation (C), and the exposure model (D); larger red nodes are deemed more likely to be perturbed, whereas smaller darker nodes are deemed less likely. (B) By using shortest paths to infer the perturbation propagation path, nodes 2 to 5 and 11 to 14 are all assigned the same probability of being perturbed (because both paths connecting 1 and 6 are equally short), whereas all other nodes are deemed equally unlikely of being perturbed (because no shortest path traverses them). Therefore, by considering shortest paths, all information provided by the observed unperturbed nodes 15, 17, and 18 is ignored. (C) Label propagation assigns the maximum probability of being perturbed not only to nodes in the 2-to-5 path but also to nodes 7 to 10, because no observed node in that region of the network is unperturbed. (D) The exposure model approach exploits the information provided by the observed unperturbed nodes and assigns a higher probability to the 1-to-5 path than to the 11-to-14 path. Additionally, it acknowledges that branching out of this path into nodes 7 to 10 is possible and more likely than branching out into node 16.

  • Fig. 2 Accuracy of the node state inference for simulated perturbations with homogeneous perturbation probability c.

    We plot the AUC (see Materials and Methods) as a function of the fraction of nodes whose state is observed, for the real networks described in the text. We compare the results of our algorithm on the basis of the exposure model (filled circles) with those obtained using shortest paths (dashed line) and label propagation (solid line). All points are averages over 100 perturbation repetitions (error bars are smaller than the markers). Different colors correspond to different values of the probability c of contagion for the perturbation (the larger the value of c, the larger the perturbation). (A) Metabolic network of the bacterium E. coli (4, 30, 31), (B) reconstruction of the 1999 Internet at the autonomous system level (34), (C) global air transportation network (32), and (D) email communication network (5). Note that because observed nodes are picked at random, the observation is unbiased (see fig. S3); nonetheless, the performance of our method is not greatly affected if the observation is biased toward perturbed nodes (see fig. S4).

  • Fig. 3 Accuracy of the node state inference for simulated perturbations with heterogeneous perturbation probability.

    We fix the fraction of observed nodes and plot the AUC as a function of the range δc allowed for the probability of propagation c, for synthetic perturbations generated with a heterogeneous SI model (described more in detail in the Supplementary Materials). We show here the accuracy of the exposure model (filled circles) and of label propagation (solid line). Each point is an average over 100 different perturbations from distinct root nodes Embedded Image; error bars are smaller than the markers.

  • Fig. 4 Results on human metabolism.

    (A) The giant connected component of the human metabolic network (30, 31), which we use to infer the real metabolic perturbation, as explained in Materials and Methods. Each node of the network is a metabolite; nodes are connected if a metabolic reaction processes both of them. For the sake of clarity, nodes are not shown. Large red circles indicate metabolites that changed concentration after the perturbation (observed perturbed nodes). Large green circles show metabolites whose concentration was not affected by the perturbation (observed unperturbed nodes). The observed nodes are just a minor fraction scattered all around the underlying network. (B) Upon hiding the state of a fraction of the observed nodes, we used the remaining set to infer the state of all the nodes. The graph shows the AUC (see Materials and Methods) obtained when using the exposure model (filled circles; error bars indicate 1 SE), shortest paths (dashed line; the shaded area indicates 1 SE), and label propagation (solid line; the shaded area indicates 1 SE) as a function of the fraction of nodes actually fed to the algorithm.

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/2/10/e1501638/DC1

    Supplementary Materials and Methods

    fig. S1. Belief propagation method used in our approach.

    fig. S2. Topological effects on the accuracy of the model.

    fig. S3. Dependence of the algorithm performance on perturbation size and fraction of observed nodes.

    fig. S4. Dependence of the performance of our approach versus the fraction of perturbed nodes in the observation.

    fig. S5. Expectation maximization-estimated model parameter η versus perturbation parameter c.

    fig. S6. Accuracy of a k-neighbor classifier at detecting perturbations.

    fig. S7. Perturbations on synthetic networks.

    References (4244)

  • Supplementary Materials

    This PDF file includes:

    • Supplementary Materials and Methods
    • fig. S1. Belief propagation method used in our approach.
    • fig. S2. Topological effects on the accuracy of the model.
    • fig. S3. Dependence of the algorithm performance on perturbation size and fraction of observed nodes.
    • fig. S4. Dependence of the performance of our approach versus the fraction of perturbed nodes in the observation.
    • fig. S5. Expectation maximization-estimated model parameter η* versus perturbation parameter c.
    • fig. S6. Accuracy of a k-neighbor classifier at detecting perturbations.
    • fig. S7. Perturbations on synthetic networks.
    • References (4244)

    Download PDF

    Files in this Data Supplement:

Navigate This Article