Research Article | MICROBIOLOGY

Scaffolding bacterial genomes and probing host-virus interactions in gut microbiome by proximity ligation (chromosome capture) assay


Science Advances  17 Feb 2017:
Vol. 3, no. 2, e1602105
DOI: 10.1126/sciadv.1602105
  • Fig. 1 Meta3C analysis of the mice gut microbiome.

    (A) Flowchart representing the computational analysis steps of a meta3C experiment. First, the reads from two sequenced meta3C libraries are assembled de novo into contigs. The meta3C contact information from both data sets is then used to generate a contact network between all contigs. The Louvain algorithm is then applied iteratively to segment the global network into CCs. (B) MG-RAST taxonomy analysis of the contigs generated from the de novo assembly step. (C) Evolution of the distribution of CC sizes over 100 Louvain iterations (x axis). Triangles, CCs with 10 to 99 contigs; squares, CCs with 100 to 499 contigs; diamonds, CCs with 500 contigs or more. (D) Stacked bar chart of the distribution of CC sizes for 1, 50, or 100 Louvain iterations. Categories of CCs are indicated under the histograms. (E) Contact maps of the 100 largest CCs recovered after a single and 100 Louvain iterations (1 vector = 200 kb). The x and y axes are labeled with the cumulated DNA size and the index of the community, respectively. (F) Violin plot of different functional contig annotations as a function of their CC size (in number of contigs) (y axis = log scale). The number of annotated elements is indicated for each category.
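As a rough illustration of the iterative segmentation in (A) and the size categories in (C), here is a minimal Python sketch. It assumes that a CC is a group of contigs that receive the same community label in every clustering iteration; the caption does not spell out this rule, and the function names and input format are hypothetical. The clustering itself is not reimplemented: the partitions are taken as given.

```python
from collections import defaultdict


def core_communities(partitions):
    """Group contigs that co-cluster in every partition (assumed CC rule).

    partitions: list of dicts mapping contig name -> community label,
    one dict per clustering iteration (hypothetical input format).
    """
    if not partitions:
        return []
    # Only consider contigs that were labeled in every iteration.
    contigs = set(partitions[0])
    for partition in partitions[1:]:
        contigs &= set(partition)
    groups = defaultdict(list)
    for contig in sorted(contigs):
        # The tuple of labels across iterations identifies the group.
        signature = tuple(partition[contig] for partition in partitions)
        groups[signature].append(contig)
    # Largest groups first; ties broken by the first contig name.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


def size_category(n_contigs):
    """Size classes used in panel (C) of the figure."""
    if n_contigs >= 500:
        return "500+"
    if n_contigs >= 100:
        return "100-499"
    if n_contigs >= 10:
        return "10-99"
    return "<10"


if __name__ == "__main__":
    runs = [
        {"c1": 0, "c2": 0, "c3": 1, "c4": 1},
        {"c1": 5, "c2": 5, "c3": 5, "c4": 7},
    ]
    for cc in core_communities(runs):
        print(cc, size_category(len(cc)))
```

Under this assumption, adding iterations can only split groups, never merge them, which would be consistent with a CC size distribution that changes as iterations accumulate.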

  • Fig. 2 De novo scaffolding of bacterial genomes from large CCs.

    (A) Pipeline describing the computational processing of CCs. Contigs pooled together within a CC are used to build a genome index (step 1). All PE reads from meta3C libraries are aligned against this index (step 2). If one read of a pair maps onto these contigs, then both reads are retained for the de novo assembly using IDBA-UD (step 3). If the cumulated size of the newly assembled contigs of 5 kb or more reaches at least 500 kb, then they are processed with the GRAAL scaffolding program (step 4). For each CC, the resulting scaffolds and/or contigs are then annotated for taxonomy or the presence of phage sequences (step 5). (B) Example of CC63: The 3264 newly assembled contigs [step 2 in (A)] are processed by GRAAL [step 4 in (A)]. Left: Contact map of the newly assembled contigs. Right: Contact map of the 3.2-Mb scaffold obtained after GRAAL processing. Pink triangles point to the circularization signal in the map, consistent with a bacterial circular chromosome. (C) Schematic representation of the typical primary and secondary features found on a bacterial contact map (left), alongside a diagram of the corresponding chromosome organization (right). Besides the circularization signal (purple triangles), a secondary diagonal is often found (dotted black lines) as a result of contacts between the left (violet) and right (green) replichores. The secondary diagonal crosses the main diagonal at the origin of replication (blue triangles). (D) Contact maps (10-kb bins) of the largest (>500 kb) GRAAL scaffolds retrieved in four CCs, displaying patterns characteristic of bacterial chromosomes [with (i, ii, and iv) or without (iii) a secondary diagonal]. Taxonomic annotation, distribution of read coverage, and position of dnaA (blue triangles) are indicated for each scaffold. The read coverage distribution can be used to infer the growth state of the corresponding bacterium. When present, putative prophage loci are represented on the right vertical axis with green (complete prophage) or red (incomplete prophage) rectangles. (E) Same analysis as in (D) but for two CCs each containing two large and distinct scaffolds [core 22 (v); core 6 (vi)]. Scaffold 2 from core 6 (vi) exhibits a discrete region with higher read coverage (see red rectangle on the coverage distribution) annotated as an incomplete prophage. (F) Comparison of the positions of orthologous genes in the scaffolds obtained in (E). Orthologous genes are displayed as dots based on their position along scaffolds 1 and 2 represented in the x and y axes, respectively (top, core 22; bottom, core 6). The conservation of synteny between the two scaffolds is apparent from the higher density of orthologous genes (dots) in the diagonal of the graph.
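The two decision rules in (A), read-pair retention (step 3) and the size threshold for scaffolding (step 4), can be sketched in Python as follows. This is an illustration only: the alignment, IDBA-UD assembly, and GRAAL steps are not reproduced, and the function names and the tuple-based input format are hypothetical.

```python
def retain_pairs(pairs, cc_contigs):
    """Keep a read pair when at least one mate maps to a contig of the CC.

    pairs: iterable of (pair_id, contig_of_read1, contig_of_read2),
    with None for an unmapped read (hypothetical input format).
    Returns the ids of the retained pairs, in input order.
    """
    cc = set(cc_contigs)
    return [pid for pid, c1, c2 in pairs if c1 in cc or c2 in cc]


def eligible_for_scaffolding(contig_lengths, min_contig=5_000, min_total=500_000):
    """Apply the caption's thresholds: sum the lengths of contigs of
    5 kb or more and require a cumulated size of at least 500 kb."""
    total = sum(length for length in contig_lengths if length >= min_contig)
    return total >= min_total


if __name__ == "__main__":
    pairs = [("p1", "a", None), ("p2", "x", "y"), ("p3", None, "b")]
    print(retain_pairs(pairs, {"a", "b"}))
    print(eligible_for_scaffolding([300_000, 200_000, 4_999]))
```

Contigs shorter than 5 kb are ignored when computing the cumulated size, so a CC made of many short contigs does not reach the scaffolding step in this sketch.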

  • Fig. 3 Analysis of phage-bacteria interactions.

    (A and B) Putative prophage sequences in bacterial scaffolds. Magnification of the main diagonal and annotations of the two genomic loci characterized as intact prophages by Phaster in the core 25 scaffold (green rectangles, Fig. 2D). GC content, read coverage distribution, and the predicted ORF annotations (six-frame translation) are indicated under each matrix. Orange genes encode hypothetical proteins and are enriched in this genomic region. The peculiar contact signals displayed by prophages in contact matrices (see fig. S6) suggest that the border of the prophage locus predicted by Phaster (green double arrows) can be refined using the meta3C data (dotted black lines and blue double arrows). (C and D) Representative contact maps between large independent phage contigs (cores 129 and 151) and bacterial scaffolds of interest that either (i) display enriched contacts or (ii) present clustered regularly interspaced short palindromic repeats (CRISPR) spacer sequences also found in the phage sequence (scaffold labeled with an asterisk). The read coverage of the bacterial scaffolds and the normalized contact frequencies between the phage contigs and the bacterial scaffolds are plotted under the maps (black and blue graphs, respectively). “#” indicates a set of contigs not scaffolded by GRAAL. (E and F) Cis contact map and read coverage distribution for the candidate phage contigs from (C) and (D), respectively. A circularization signal appears on the large (235 kb) core 129 contig. The corresponding coverage also points to the possible multiplication of this genomic structure from a discrete position.
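Criterion (ii) in (C and D), a CRISPR spacer from a bacterial scaffold that also occurs in a phage sequence, can be illustrated with a simple exact-match search on both strands. This is a simplified stand-in, not the method used in the study, which reports BLAST output (fig. S9 and table S4); the function names and toy sequences are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spacers_in_phage(spacers, phage_seq):
    """Return the names of spacers found exactly on either strand.

    spacers: dict mapping spacer name -> spacer sequence.
    Exact matching is an assumption made for this sketch; an
    alignment tool would also tolerate mismatches.
    """
    phage = phage_seq.upper()
    hits = []
    for name, spacer in spacers.items():
        query = spacer.upper()
        if query in phage or reverse_complement(query) in phage:
            hits.append(name)
    return hits


if __name__ == "__main__":
    phage = "TTGACCGTAAGGCTA"
    spacers = {"sp1": "ACCGTAAG", "sp2": "CTTACGGT", "sp3": "GGGGGGGG"}
    print(spacers_in_phage(spacers, phage))
```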

  • Fig. 4 Overview of phage-bacteria interactions through meta3C.

    Normalized contact map between the 40 candidate phage contigs in the x axis (obtained from the reassembly of small CCs) and the 47 bacterial genome scaffolds/assemblies in the y axis. An interaction had to represent at least 10% of the total contacts made by a candidate phage with a bacterial genome scaffold/assembly to be retained. Bacterial genome scaffolds/assemblies were ordered according to their phylogenetic relationships (tree on the left of the map). Main taxonomic annotations based on genetic marker analysis are indicated with colored circles next to each predicted bacterial genome. The color scale reflects the contact frequencies, in % of total contacts made by the phage sequence. The stars point to CCs of bacterial genome scaffolds emphasized in Figs. 2 and 3 and fig. S8. The phage contigs outlined along the x axis correspond to those described in Fig. 3 and fig. S8.
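The 10% retention rule described above can be sketched in Python as follows. The sketch assumes that "total contacts" means the sum of a phage contig's contacts over the bacterial genomes in the map; the denominator used in the study may differ. The function name and the nested-dict input format are hypothetical.

```python
def phage_host_links(contacts, min_fraction=0.10):
    """Keep phage-bacterium interactions that reach a minimum share of
    the phage's total contacts.

    contacts: dict mapping phage contig -> dict mapping bacterial
    genome -> contact count (raw or normalized; hypothetical format).
    Returns phage contig -> {bacterial genome: fraction of contacts}.
    """
    links = {}
    for phage, per_genome in contacts.items():
        total = sum(per_genome.values())
        if total == 0:
            # A phage contig without any contact has no retained link.
            links[phage] = {}
            continue
        links[phage] = {
            genome: count / total
            for genome, count in per_genome.items()
            if count / total >= min_fraction
        }
    return links


if __name__ == "__main__":
    demo = {"phageA": {"g1": 80, "g2": 15, "g3": 5}}
    print(phage_host_links(demo))
```

Expressing each retained link as a fraction of the phage's total matches the color scale of the figure, which is given in % of total contacts made by the phage sequence.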

Supplementary Materials

  • Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/3/2/e1602105/DC1

    fig. S1. Generation of raw CCs.

    fig. S2. Iterative Louvain procedure and characterization of CCs.

    fig. S3. Comparison of CAGs and meta3C approaches.

    fig. S4. Scaffolding of dozens of bacterial chromosomes.

    fig. S5. Example of post-GRAAL scaffold correction.

    fig. S6. Structural behavior of phage SPβ in B. subtilis genome.

    fig. S7. Schematic representation of the phiKZ-like genome.

    fig. S8. Interactions of phages with bacterial genomes.

    fig. S9. CRISPR spacers’ blast output.

    table S1. Description of the 140 largest genomic structures (>500 kb) detected in the mice gut microbiome and their assembly/scaffolding statistics.

    table S2. Description of the 59 contigs corresponding to candidate phages hailing from the unscaffolded output of the GRAAL software.

    table S3. Description of the 43 contigs hailing from the reassembly of small CCs and corresponding to candidate phages.

    table S4. CRISPR spacers’ blast output (format #6).

    data set S1. Contig data (contigs_id, contig_name, GC content, coverage, core_community_index, core_size).

    data set S2. Normalized contig network (contig_1, contig_2, normalized interaction).

    data set S3. This file contains all the GRAAL scaffolds larger than 300 kb (FASTA format).

    data set S4. This file, in complement of data set S3, contains all the contigs not included in the scaffolds larger than 300 kb (FASTA format).

    data set S5. This file contains all the CC assemblies (contigs >5 kb, FASTA format) that were not scaffolded by GRAAL because of their small size (cumulated size, <500 kb; see steps 4 and 5 in fig. S2).

  • Supplementary Materials (file formats)

    The PDF file includes figs. S1 to S9, the legends for tables S1 to S4, and data sets S1 to S5, as listed above.

    Tables S1 to S4 are provided separately in Microsoft Excel format.
