Research Article | LIFE SCIENCES

A unique chromatin complex occupies young α-satellite arrays of human centromeres

Science Advances  12 Feb 2015:
Vol. 1, no. 1, e1400234
DOI: 10.1126/sciadv.1400234
  • Fig. 1 CENP-A and CENP-C enrichment decreases with α-satellite divergence in pericentric heterochromatin.

    Log-ratio CENP-A, CENP-C, and H3 enrichment profiles spanning the 40-kb most proximal annotated segment of chromosome arm Xp, which spans the DXZ1 α-satellite HOR gradient (3). Dense CENP-A and CENP-C enrichment diminishes with distance from the centromere-proximal edge, and depletion of H3 diminishes ~20 kb from the edge. Diverged α-satellite occupies the Xp arm punctuated by LINE-1 and other elements where centromere protein enrichment is low.
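As a rough illustration of what a log-ratio enrichment profile like the one in Fig. 1 involves, the sketch below takes per-bin ChIP and input counts and returns the log of their ratio. The base-2 logarithm, the pseudocount, and the function name `log_ratio_profile` are assumptions made for illustration; the legend does not specify them, and this is not the authors' pipeline.

```python
import math


def log_ratio_profile(chip, inp, pseudocount=1.0):
    """Per-bin log2(ChIP / input) for two equal-length lists of normalized counts.

    The pseudocount (an assumption, not taken from the article) avoids
    division by zero and log(0) in empty bins.
    """
    if len(chip) != len(inp):
        raise ValueError("ChIP and input profiles must cover the same bins")
    return [
        math.log2((c + pseudocount) / (i + pseudocount))
        for c, i in zip(chip, inp)
    ]


# Hypothetical counts: positive values mean enrichment over input,
# negative values mean depletion.
profile = log_ratio_profile([3, 1], [1, 3])
print(profile)
```

Under this convention, dense CENP-A or CENP-C signal appears as positive values and H3 depletion as negative values.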

  • Fig. 2 Variable CENP-A, CENP-C, and H3 occupancies at annotated α-satellite arrays.

    Occupancy profiles for the most centromere-proximal 5-kb regions of eight HORs and monomeric α-satellite arrays present on BAC clones that have been tested for artificial centromere function (4), and for four selected HORs from the hg38 genomic assembly (2). The DXZ1 profile represents an enlargement of the rightmost 5 kb of Xp shown in Fig. 1. HORs are classified on the basis of localization by FISH (centromeric) (10, 35) or by an artificial chromosome assay (competent or inactive) (4). Within each segment, normalized count occupancies were scaled to the maximum occupancy of CENP-A ChIP using the IGV Genome Browser (46). The number in parentheses indicates the fold enrichment of the maximum relative to that of the D19Z1 HOR, which is set at 1, such that the maximum (CENP-A) peak in the D5Z2 HOR is 1376-fold higher than the maximum (H3) peak in the D19Z1 HOR, and the maximum (CENP-A) peak in the D11Z1 HOR is 93.1-fold higher than that in the D19Z1 HOR. Significant BLAST matches to the 17-bp CENP-B box consensus sequence (CTTCGTTGGAAACGGAA) are indicated (magenta lines).
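The Fig. 2 legend marks matches to the 17-bp CENP-B box consensus that were found with BLAST. As a simplified stand-in, the sketch below scans both strands of a sequence for windows within a small Hamming distance of the consensus string quoted in the legend. The mismatch threshold `max_mismatches` and the function names are hypothetical; they are not a substitute for BLAST significance and are not taken from the article.

```python
CENP_B_BOX = "CTTCGTTGGAAACGGAA"  # 17-bp consensus as quoted in the legend

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_box_matches(seq, motif=CENP_B_BOX, max_mismatches=2):
    """Return (start, strand, mismatches) for each window close to the motif.

    A plain sliding-window scan, assumed here for illustration in place of
    the BLAST search used for the figure.
    """
    seq = seq.upper()
    k = len(motif)
    targets = (("+", motif), ("-", reverse_complement(motif)))
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, target in targets:
            d = hamming(window, target)
            if d <= max_mismatches:
                hits.append((start, strand, d))
    return hits


# Hypothetical test sequence with one perfect forward-strand box at offset 4.
print(find_box_matches("AAAA" + CENP_B_BOX + "TTTT"))
```

Scanning the reverse complement as well matters because α-satellite units can be annotated in either orientation.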

  • Fig. 3 Centromere proteins from multiple human individuals occupy the same subsets of α-satellite units.

    (A) Clustering strategies for identifying the most abundant CENP-A ChIP-enriched sequences. (B) Phylogenetic tree representing the 20 ChIP and input reference sequences that were most abundantly enriched for CENP-A ChIP. Bootstrap percentages are shown for the earliest divergences, defining four branches on the basis of a 70% bootstrap threshold. The same four branches were obtained using only ChIP or only input reference sequences in the alignment. (C) Phylogeny representing the 10 most abundant CENP-A ChIP reference sequences from each of five individuals.

  • Fig. 4 Young α-satellite dimers are the basic units of expansion and homogenization.

    (A) Phylogenetic tree of the 20 most abundantly CENP-A-enriched input sequences, numbered by decreasing abundance and color-coded by clade. (B) Top: MegaBLAST alignments of 11 reference sequences to GenBank NW_001835979.1, where gray horizontal bars represent 100% identity and vertical red lines represent mismatches. Bottom: Same as top except for one of 11 HOR units of NT_167220.1. Numbers on the left are color-coded to correspond to clades in (A). (C) Overlaps of Cen-like and annotated α-satellites for CENP-A ChIP merged pairs.

  • Fig. 5 Long tandem repeats of the Cen1-like consensus are detected in PacBio single sequence reads.

    (A) Maps of BLASTN hits (boxes, where gray horizontal bars represent 100% identity, vertical red lines represent mismatches, and vertical black lines represent indels) in raw PacBio reads. Displayed are the 10 PacBio single sequence reads (indicated by their sequence read identifier) with the highest bit scores in a MegaBLAST search of SRR1304331 using the Cen1-like 340-bp query. Alternating hits are shown in two tiers for visual clarity. We attribute gaps in the array to the ~15% mostly indel error rate characteristic of PacBio raw data, an interpretation that is supported by the near-perfect alignment of BLAST hits to the 340-bp tiling shown as tandem black diamonds at bottom. (B) A consensus sequence was derived for each of the raw sequences indicated in (A) by automated alignment of the tandem BLAST hits, and a dendrogram was produced, rooting the tree with the Cen1-like consensus. (C) Alignment of the Cen1-like consensus (top sequence) identifies 44 ambiguous residues (indicated as “u” or “s”) and six indels (indicated as dashes) in the overall PacBio-derived consensus (bottom sequence) over the 340-bp sequence.

  • Fig. 6 Two 100-bp CENP-A nucleosomes are precisely positioned over young, but not old, α-satellite units.

    (A) Normalized count profiles of CENP-A and CENP-C ChIP occupancies mapped to the 340-bp Cen1-like consensus. (B) Same as (A) except mapped to the most abundantly enriched 340-bp noncentromeric α-satellite dimer derived from a centromere-competent chromosome 11 HOR (Fig. 2). (C) Same as (A) except for a Cen13-like dimer. (D) Same as (A) except for a Y-chromosome dimer, which lacks a CENP-B box.

  • Fig. 7 Two distinct chromatin complexes occupy specific α-satellite arrays of human centromeres.

    (A) Sequence divergence of selected dimeric units relative to the Cen1-like consensus dimers. (B) ChIP occupancy profiles for a composite 38-mer with dimers rank-ordered by divergence (green dots with indels indicated as triangles). (C) Same as (A) except for Cen13-like dimers. (D) Same as (C) except for a 16-mer Cen13-like composite sequence.

  • Fig. 8 Young α-satellite dimers precisely position ~100-bp CENP-A nucleosomes.

    (A to C) Size distributions of fragments mapping to the Cen1-like (A) and Cen13-like (B) composites and the most proximal 6-kb region of DXZ1 (C). Graphs on the right are expansions of graphs on the left (indicated by brackets). The y-axis scale is for input normalized counts, and the areas under the other curves were equalized to that for input.
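The Fig. 8 legend states that the areas under the ChIP curves were equalized to that of the input curve. One minimal way to do this, assuming fragment-length bins of uniform width so that area is proportional to the sum of counts, is to rescale each distribution by the ratio of the totals. The function name `equalize_area` and the example counts are hypothetical.

```python
def equalize_area(counts, reference):
    """Rescale `counts` so that its total equals the total of `reference`.

    Assumes uniform bin widths, so the sum of counts stands in for the
    area under the curve.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot rescale an empty distribution")
    scale = sum(reference) / total
    return [c * scale for c in counts]


# Hypothetical size distributions (counts per fragment-length bin).
input_counts = [2, 2, 4]
chip_counts = [1, 3]
print(equalize_area(chip_counts, input_counts))
```

After rescaling, differences between curves reflect the shape of the size distribution rather than sequencing depth.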

  • Fig. 9 Satellite DNA evolution by mutation and unequal crossing over [based on (6) and (47)].

    In this toy example, a three-unit tandem array undergoes an out-of-register pairing event and unequal crossing over to produce a four-unit duplication and a two-unit deletion. Because the blue mutation is close to the left edge of the array, crossing-over events are most likely to occur to its right, and it will be inherited in both the duplication and deletion daughter chromosomes, whereas the red mutation is near the middle, and so it will be duplicated and deleted with similar expected frequencies. Further unequal crossing-over events within the four-unit array will result in expansion and contraction of the array, with corresponding gains and losses of the red mutation, leading to homogenization, but without consequence for the blue mutation. Other mutations that arise near the middle of the array will undergo homogenization like the red mutation, and those that arise near the edge will accumulate without gain or loss like the blue mutation. Over evolutionary time, the edges of the array will diverge, and longer-period out-of-register pairing and crossing-over events will result in HORs encompassing multiple tandem repeat units that are diverged from one another (3). Successive mutations and homogenization events in the middle of the array will result in divergence of homogeneous satellite sequences from the ancestral repeat unit.
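The toy example in the Fig. 9 legend can be sketched at the level of whole repeat units. The code below assumes two identical copies of an array paired `offset` units out of register, with the exchange falling on a unit boundary strictly inside the paired region; both the boundary restriction and the equal weighting of all exchanges are simplifying assumptions, and the function names are hypothetical.

```python
def unequal_crossover(array, offset, cut):
    """Exchange between two copies of `array` paired `offset` units out of register.

    `cut` is a unit boundary on the first copy; the second copy is cut at
    `cut - offset`. Returns the (duplication, deletion) products.
    """
    n = len(array)
    if not (0 < offset < n and offset < cut < n):
        raise ValueError("cut must lie strictly inside the paired region")
    duplication = array[:cut] + array[cut - offset:]
    deletion = array[:cut - offset] + array[cut:]
    return duplication, deletion


def copy_change_counts(n):
    """For each unit of an n-unit array, count the (offset, cut) choices
    that duplicate it in one product and delete it from the other."""
    counts = [0] * n
    for offset in range(1, n):
        for cut in range(offset + 1, n):
            for j in range(cut - offset, cut):
                counts[j] += 1
    return counts


# Three-unit array: "blue" mutation in the edge unit, "red" in the middle unit.
dup, dele = unequal_crossover(["blue", "red", "plain"], offset=1, cut=2)
print(dup)   # ['blue', 'red', 'red', 'plain']
print(dele)  # ['blue', 'plain']
print(copy_change_counts(6))  # [0, 4, 6, 6, 4, 0]
```

In this simplified model the edge units never change copy number while the middle units change most often, which mirrors the legend's contrast between the blue (edge) and red (middle) mutations.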

  • Table 1 Merged pairs mapping to annotated α-satellites [chromosome-specific α-satellite units catalogued by Hayden et al. (4)].

    Merged pairs aligned with multiple sites were counted only once. Intersection percentages are of the catalogued α-satellite.

    No. of merged pairs              CENP-A             CENP-C            Input
    Total merged pairs               3,652,730          4,432,991         21,929,193
    Catalogued α-satellite*          539,990            165,420           194,925
    Cen1-like intersecting           207,267 (38.4%)    54,910 (33.2%)    61,567 (31.5%)
    Cen13-like intersecting α-sat    7064 (1.3%)        1834 (1.2%)       1737 (0.9%)
    Cen1-like intersecting           62 (0.01%)         40 (0.02%)        54 (0.03%)
    All three intersecting           61 (0.01%)         39 (0.02%)        53 (0.03%)

    *Total merged pairs mapping to concatenated α-satellite units in the catalog.

    †Total merged pairs mapping to the 38-mer concatenated array.

    ‡Total merged pairs mapping to the 16-mer concatenated array.

    Supplementary Materials

    • Supplementary material for this article is available at content/full/1/1/e1400234/DC1

      Fig. S1. Size distribution of input library fragments.

      Fig. S2. CENP occupancies in a male and female cell line.

      Fig. S3. Joint phylogeny of the most frequent CENP-A ChIP sequences for five human individuals.

      Fig. S4. Cen1-like repeat units in an unplaced clone.

      Fig. S5. Cen1-like and Cen13-like alignments.

      Fig. S6. Normalized count profiles mapped to individual clones.
