Research ArticleBIOCHEMISTRY

Coordinated regulation of cellular identity–associated H3K4me3 breadth by the COMPASS family

See allHide authors and affiliations

Science Advances  24 Jun 2020:
Vol. 6, no. 26, eaaz4764
DOI: 10.1126/sciadv.aaz4764


Set1A and Set1B, two members of the COMPASS family of methyltransferases that methylate the histone H3 lysine 4 (H3K4) residue, have been accredited as primary depositors of global H3K4 trimethylation (H3K4me3) in mammalian cells. Our previous studies in mouse embryonic stem cells (ESCs) demonstrated that deleting the enzymatic SET domain of Set1A does not perturb bulk H3K4me3, indicating possible compensatory roles played by other COMPASS methyltransferases. Here, we generated a series of ESC lines harboring compounding mutations of COMPASS methyltransferases. We find that Set1B is functionally redundant to Set1A in implementing H3K4me3 at highly expressed genes, while Mll2 deposits H3K4me3 at less transcriptionally active promoters. While Set1A-B/COMPASS is responsible for broad H3K4me3 peaks, Mll2/COMPASS establishes H3K4me3 with narrow breadth. Additionally, Mll2 helps preserve global H3K4me3 levels and peak breadth in the absence of Set1A-B activity. Our results illustrate the biological flexibility of such enzymes in regulating transcription in a context-dependent manner to maintain stem cell identity.


Epigenetic posttranslational modifications on histones are highly dynamic; extensive changes of these histone states can affect recruitment of effector proteins that fundamentally shape gene expression programs underlying processes that govern development and disease (16). Specifically, methylation at histone lysine residues has received considerable attention due to the important involvement with transcriptional regulation. One such mark, histone H3 lysine 4 trimethylation (H3K4me3), is an evolutionarily conserved modification consistently found at transcription start sites (TSSs) and serves as a hallmark of active gene promoters (68). Studies have conferred that H3K4me3 interacts with chromatin remodelers (9, 10) and promotes recruitment of basic transcription factors to facilitate transcriptional activation (11, 12). Other functional implications ascribed to H3K4me3 include DNA damage repair response (13, 14), marking of bivalent genes comarked with the repressive H3K27me3 modification (15), and cell identity specification (16, 17). Thus, there is a consensus that H3K4me3 is critically linked to transcriptional output and cellular response.

Deposition of methylation on H3K4 is catalyzed by lysine methyltransferases (KMTs), namely, the family of COMPASS (COMplex of Proteins ASsociated with Set1) complexes, which is widely conserved from yeast to human (1820). While Set1/COMPASS is solely responsible for implementing all three methylation patterns in yeast (1921), our laboratory and others have described the division of labor in methylating H3K4 among the COMPASS members in higher organisms (2128). In mammals, there are six Set1-related enzymes that reside in COMPASS-like complexes: Set1A, Set1B, Mll1, Mll2, Mll3, and Mll4 (18, 29, 30). Evolutionary expansion of the COMPASS family in higher metazoans denotes functional diversification of H3K4 methylation, showcasing the underlying complexity of epigenetic regulation (31). Mammalian Set1A/Set1B have been accredited as mainly responsible for bulk H3K4me3 genome-wide (6, 28, 30, 32, 33); Mll1/Mll2 catalyze H3K4me3 in a locus-specific manner (e.g., targeting Hox genes) (25, 27, 3436); and Mll3/Mll4 are key H3K4 monomethyltransferases at enhancers (24, 29, 37). Recent studies have begun to reassess the biological significance of the catalytic activity as the primary function of COMPASS. For instance, the enzymatic SET domain of Set1A was shown to be nonessential for embryonic stem cell (ESC) self-renewal (38), although deletion of the full-length protein impairs viability (39, 40). Likewise, inactivating mutations of the SET domain of Trr, the Drosophila homolog of Mll3/Mll4, does not result in clear developmental defects, and Mll3/Mll4 catalytically deficient ESCs have less transcriptional aberrations than seen in cells with total protein loss (41, 42). Our laboratory also determined Mll4/COMPASS could regulate enhancers without its enzymatic activity (43). These studies suggest that the COMPASS family function independent of H3K4 methylation is context dependent. However, deletion of any individual COMPASS member in mice results in embryonic or prenatal lethality with distinct phenotypes (37, 39, 4446), signifying these proteins have partially nonredundant functions. Although research has provided insight into how each COMPASS member operates in unique contexts, mechanisms underlying their roles in cellular and developmental regulation remain elusive.

We previously reported that ablating the SET domain of Set1A (Set1AΔSET) does not disrupt bulk H3K4me3 in ESCs (38), indicating the likelihood of other COMPASS members having functionally redundant roles to sustain global H3K4me3 in Set1AΔSET ESCs. Multiple lines of evidence thus far have suggested that both Set1A and its paralog Set1B contribute to genome-wide H3K4me3 deposition (18, 28, 32, 33). However, recent studies suggest Set1A and Set1B have functionally distinct roles, as evidenced by Set1B overexpression not being able to mitigate proliferation defects caused by loss of Set1A protein in ESCs (39), and Set1B is localized mostly in the cytoplasm (47). Given the existing perplexity in their function, we sought to elucidate the extent to which Set1A and Set1B, as well as their familial relatives Mll1 and Mll2, may have distinct versus overlapping responsibilities in H3K4me3 regulation in ESCs in the current study. Through generating an array of ESC lines containing compounding mutations of COMPASS family, we found that Set1A, Set1B, and Mll2 engage in an epigenetic collaborative circuit to modulate the H3K4me3 signature and breadth in ESCs. Our findings provide evidence for both functional specialization and redundancy of the mammalian COMPASS family members to direct transcriptional regulation and cell identity in a context-specific manner, shedding novel insights into mechanisms underlying disease pathogenesis associated with mutations of such critical epigenetic modifiers of chromatin.


Our previous study demonstrated that deleting the C-terminal catalytic SET domain of Set1A does not disrupt global H3K4 methylation in ESCs, which suggested that other enzymes implement bulk H3K4 methylation in the absence of Set1A activity (38). Because Set1B is structurally homologous to Set1A (fig. S1A), we investigated whether Set1B could compensate for Set1AΔSET function to sustain global H3K4 methylation in ESCs. Therefore, we generated Set1B knockout (Set1BKO) ESCs by deleting the first four exons of the Set1B genomic locus via CRISPR-Cas9 gene editing. Polymerase chain reaction (PCR) genotyping and RNA sequencing (RNA-seq) confirmed the deletion of the intended genomic region and Set1B transcript (fig. S1, B and C), and Western blotting substantiated the complete loss of the Set1B protein in ESCs (Fig. 1A). Ablating Set1B did not alter bulk H3K4 methylation as observed by Western blotting (Fig. 1B and fig. S1D). ChIP-seq [chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing] analyses exhibited similar H3K4me3 levels in wild-type (WT) and Set1BKO ESCs (Fig. 1C). In addition, the expressions of key pluripotency factors Sox2, Nanog, and Oct4 remain invariable when comparing WT and Set1BKO cells (fig. S1E). These findings thus far agree with prior studies demonstrating that Set1B is not essential for ESC self-renewal and that Set1B removal does not affect bulk H3K4 methylation (39).

Fig. 1 Set1B compensates Set1A in depositing H3K4me3 in ESCs.

(A) Western blot of Set1B in wild-type (WT) and Set1BKO ESCs, with Rbbp5 as the loading control. Samples were loaded at a 1:2 ratio. Whole-cell extract lysates of MCF7 cells with Set1B knockdown and overexpression (OE) (47) were used to identify the correct Set1B band. (B) Western blot comparing H3K4me3 levels in Set1BKO with WT cells. Total H3 was used as the loading control. Samples were loaded at a 1:2 ratio. (C) Left: Heatmaps of H3K4me3 ChIP-seq levels in WT and Set1BKO ESCs. Occupancy levels were aligned to WT peaks sorted by decreasing peak width. Right: Log2 fold changes (log2FC) in H3K4me3 binding comparing Set1BKO and WT ESCs for the same ordered peaks. Two biological replicates of each genotype were analyzed. (D) Cytoplasmic and nuclear fractions were extracted from WT, Set1AΔSET, and Set1BKO ESCs, and endogenous levels of Set1B and Rbbp5 were detected by Western blot. HSP90 and total H3 served as cytoplasmic and insoluble nuclear fractionation marker, respectively. (E) RNA-seq results validate CRISPR-Cas9–mediated deletion of exons 1 to 4 of Set1B (top) and the sequence coding for the SET domain of Set1A (bottom) in ESCs. Vertical red bars indicate targeted genomic regions in Set1BKO-Set1AΔSET ESCs. RPM, reads per million. (F) Western blot of H3K4me3 in WT and Set1BKO-Set1AΔSET cells, with H3 serving as loading control. Samples were loaded at a 1:2 ratio. (G) Heatmaps showing H3K4me3 levels in WT and Set1BKO-Set1AΔSET ESCs. Occupancy was aligned to WT peaks, sorted in descending width, and displayed on the left. Log2FC of H3K4me3 between double-mutant and WT cells is shown on the right. Two biological replicates of each genotype were analyzed.

Our laboratory has recently established that Set1B, unlike the rest of its COMPASS family relatives, resides predominantly in the cytoplasm (47); however, we find that a small fraction of Set1B still binds to chromatin in ESCs, prompting us to determine the potential role of Set1B in H3K4 methylation (Fig. 1D). Because independent abrogation of Set1B or the catalytic SET domain of Set1A did not change overall H3K4 methylation levels (38), we used CRISPR-Cas9 to establish Set1BKO-Set1AΔSET ESCs. Successful generation of homozygous double-mutant ESCs was verified by PCR genotyping and RNA-seq (fig. S1F and Fig. 1E). Expressions of pluripotency markers Oct4, Nanog, and Sox2 are comparable between WT and Set1BKO-Set1AΔSET ESCs at the significance level of P < 0.01 (fig. S1H). We assessed H3K4 methylation in the double-mutant cells compared with that in WT ESCs and found that while bulk H3K4me1/me2 levels did not differ, H3K4me3 was somewhat decreased in Set1BKO-Set1AΔSET ESCs by Western blotting (fig. S1G and Fig. 1F). The reduction in H3K4me3 levels in the double mutant compared with WT ESCs was also consistently detected by ChIP-seq (Fig. 1G). The diminished H3K4me3 in the combined deletions of the SET domain of Set1A and knockout of Set1B, compared with either mutation alone, suggests functional redundancy between the two Set1 paralogs in regulating H3K4 methylation in ESCs.

Several studies have implicated a role for H3K4me3 in transcriptional activation by RNA polymerase II (RNAP II) (11, 12). Given the noticeable decrease in H3K4me3 in Set1BKO-Set1AΔSET compared with WT cells, we used ChIP-seq to evaluate RNAP II occupancy genome-wide. Reduced RNAP II is evident at sites with H3K4me3 loss, and accompanying reduction in gene expression can be seen in representative track examples (Fig. 2A). Such decrease in H3K4me3 and RNAP II is not due to the loss of Set1A SET domain or Set1B alone (38), further supporting their functionally redundant roles in H3K4me3 implementation and transcription regulation at these regions (fig. S2, A and B). Genome-wide analyses in the Set1BKO-Set1AΔSET double-mutant cells revealed that H3K4me3 is lost across regions proximal to TSSs, with a corresponding decrease in RNAP II occupancy compared with WT ESCs (fig. S2C). Partitioning of these H3K4me3 peaks using K-means clustering revealed that H3K4me3 loss in the double mutant occurs in the second and third clusters, where Set1A binding is the strongest (Fig. 2, B and C). Both clusters also exhibit somewhat diminished RNAP II levels in Set1BKO-Set1AΔSET compared with WT cells (Fig. 2C). The accompanying RNA-seq data demonstrate moderate differences in gene expression pattern, with a stronger decrease in expression pertinent to cluster 3, which has a relatively more severe H3K4me3 loss (Fig. 2C). Cluster 3 also contains genes that are more highly expressed in WT ESCs (fig. S2D) and is especially enriched for housekeeping factors such as ribosomal-related proteins as shown by gene ontology (GO) annotation (fig. S2E). Collectively, these data signify that H3K4me3 at promoters of more highly expressed genes are more perturbed by the combinatorial deletions of Set1B and the SET domain of Set1A in ESCs. These findings are consistent with previous studies reporting that H3K4me3 at genes with higher expression are more sensitive to loss of CXXC1, a key subunit of the Set1/COMPASS complexes (48, 49).

Fig. 2 H3K4me3 is implemented by Set1/COMPASS at more transcriptionally active promoters, while Mll2 catalyzes H3K4me3 at lowly expressed genes.

(A) Genome browser track examples showing ChIP-seq tracks of H3K4me3 and RNAP II, with corresponding RNA-seq tracks for Gpt2 (left) and Hexb (right). (B) TSS-proximal H3K4me3 (13,486) peaks were identified and partitioned into three groups via K-means clustering, and corresponding Set1A (left), H3K4me3 (middle), and RNAP II (right) levels were plotted for WT and double-mutant ESCs. (C) Log2FC changes in H3K4me3 (left) and RNAP II (middle) occupancy, as well as in RNA-seq (right), were determined in Set1BKO-Set1AΔSET relative to WT cells for the three clusters of peaks identified and ordered in (B). (D) Mll2 (left), H3K4me3 (middle), and RNAP II (right) occupancy levels were plotted for WT and Mll2ΔSET ESCs at the clustered 13,486 H3K4me3 peaks identified in (B). (E) Log2FC changes in H3K4me3 (left) and RNAP II (middle) occupancy, as well as in RNA-seq (right), were assessed in Mll2ΔSET compared with WT ESCs for the three clusters of peaks determined and ordered in (B).

We previously found that Mll2, another member of the COMPASS family of H3K4 methyltransferases, also catalyzes H3K4me3 in ESCs (25); therefore, we deleted the SET domain of Mll2 using CRISPR-Cas9 (fig. S2F). Through Western blotting, we noted that disrupting the SET domain of Mll2 adversely affects the protein’s stability (fig. S2G), indicating that Mll2 protein level in Mll2ΔSET ESCs is quite comparable to Mll2KO cells, which were previously generated in our laboratory (34). Histone H3K4me3 ChIP-seq analysis in Mll2ΔSET ESCs revealed loss of H3K4me3 at sites distinct from those seen in Set1BKO-Set1AΔSET when compared with WT cells (fig. S2, C and H). Particularly, the greatest H3K4me3 reduction in Mll2ΔSET cells is confined mainly to cluster 1 peaks, where there is also a corresponding decrease in RNAP II level and overall expression of genes nearest to these sites (Fig. 2, D and E). Cluster 1 contains the TSS of genes that are typically less transcriptionally active and are linked to function in proper development (fig. S2, D and E), concordant with previous findings that Mll2/COMPASS implements H3K4me3 primarily at specific loci such as bivalent genes (25). Analysis of H3K4me3 ChIP-seq in ESCs without Mll1 (50), also known to deposit H3K4me3 in mammals, showed comparable levels of global H3K4me3 between Mll1KO and WT cells (fig. S2I), signifying that Mll1/COMPASS is not a crucial regulator of bulk H3K4me3 in ESCs.

The K-means clustering of H3K4me3 in Fig. 2 effectively divided peaks by their width, which also correlated with gene expression, such that wider H3K4me3 coincided with higher nearest gene expression, and narrower H3K4me3 concurred with lower nearest gene expression (Fig. 2, B to E, and fig. S2D), suggesting an underlying biological significance to H3K4me3 breadth. Multiple studies have recently explored such functional implications, reporting a positive relationship between H3K4me3 breadth and gene expression (17, 51) and attributing the role of breadth to determining specific cell identity during development and disease (16, 52, 53). We noted the predilection of Set1/COMPASS mutant cells to present H3K4me3 loss in clusters 2 and 3, which have greater H3K4me3 breadth, while mutant Mll2/COMPASS primarily affected cluster 1 with the narrowest breadth (Fig. 2, B to E). To investigate the function of H3K4me3 breadth further, we classified broad and narrow WT peaks, annotated to the nearest TSS, and sorted these regions from wide to narrow. We retrieved 500 sites with either the broadest or narrowest peaks and examined H3K4me3 levels in WT, Set1BKO-Set1AΔSET, and Mll2ΔSET ESCs. In comparison to WT, Set1BKO-Set1AΔSET ESCs show diminished H3K4me3 at genes primarily with broader peaks, while Mll2ΔSET cells exhibit abated H3K4me3 levels at genes with mainly narrower peaks (Fig. 3A). These unequivocal differences of altered ChIP density by peak width in the two mutant cell lines are further illustrated quantitatively in composite profiles (Fig. 3B) and representative track examples (Fig. 3C). GO term analyses indicate that genes with broader H3K4me3 peaks are enriched for biological processes that are relatively more important for proper maintenance of stem cell identity (fig. S3A), supporting previous studies associating H3K4me3 breadth with cell identity specification (16, 17). In addition, narrow peaks are enriched for processes that are more pertinent to neuronal development (fig. S3A), consistent with published studies showing Mll2 implementing H3K4me3 at development-related genes in ESCs (25, 54, 55).

Fig. 3 H3K4me3 peak breadth is coordinately controlled by Set1 and Mll2/COMPASS.

(A) Left: Heatmaps of H3K4me3 occupancy in WT, Set1BKO-Set1AΔSET, and Mll2ΔSET ESCs at 500 genes with the broadest (top) and 500 genes with the narrowest (bottom) H3K4me3 peaks in WT. TSSs are ordered by descending peak width of their nearest H3K4me3 peak. Right: Log2FC in H3K4me3 levels comparing either Set1BKO-Set1AΔSET or Mll2ΔSET relative to WT cells for the same ordered set of most broad (top) and narrow (bottom) peaks. (B) Profiles of log2FC of average H3K4me3 signal density (RPM) at the loci with either 500 most broad (top) or narrow (bottom) H3K4me3 peaks in Set1BKO-Set1AΔSET or Mll2ΔSET compared with WT ESCs. (C) Example genome tracks of broad (right) and narrow (left) H3K4me3 peaks in WT, Set1BKO-Set1AΔSET, and Mll2ΔSET ESCs. (D) Box plot contrasting normalized H3K4me3 ChIP-seq coverage for the top 20% broadest H3K4me3 peaks (n = 2626) and top 20% narrowest H3K4me3 peaks (n = 2627) in WT, Set1BKO-Set1AΔSET, and Mll2ΔSET ESCs. Coverage was normalized to peak width. P values (indicated by lowercase letters) were determined using the Wilcoxon rank sum test with continuity correction (paired). Calculated P values are as follows: a, P = 0.01901; b, P < 6.173 × 10−282; c, P = 6.173 × 10−282; d, P = 1.7914 × 10−74. (E) Log2FC heatmaps of RNAP II levels at H3K4me3-proximal TSSs for each indicated mutant ESC line relative to WT cells for the top 20% broadest (top) or narrowest (bottom) H3K4me3 peaks initially defined in (D).

To further investigate the consequence of H3K4me3 peak width, the top and bottom breadth quintiles were evaluated. For the top 20% H3K4me3 peaks, the Set1BKO-Set1AΔSET mutant featured a significantly drastic decrease in H3K4me3 coverage compared with WT, while the Mll2ΔSET mutant exhibited comparable H3K4me3 occupancy relative to WT (Fig. 3D). In contrast, Mll2ΔSET ESCs showed a more significant reduction in H3K4me3 for the 20% narrowest H3K4me3 peaks compared with WT, while the Set1BKO-Set1AΔSET mutant is much less altered for H3K4me3 at these narrow peak regions (Fig. 3D). Furthermore, when assessing peak width of differential H3K4me3 TSS-proximal peaks for each mutant, Set1/COMPASS mainly affected wider peaks, while Mll2ΔSET mutant primarily affected narrower peaks (fig. S3B). By examining the levels of RNAP II at H3K4me3-proximal TSSs in Set1BKO-Set1AΔSET and the Mll2ΔSET mutants compared with WT, we observe decreased RNAP II levels at broader peaks for the Set1BKO-Set1AΔSET mutant and reduced RNAP II levels at narrower peaks for the Mll2ΔSET mutant (Fig. 3E). On the basis of these findings, we speculate the following: (i) transcription of cell identity genes, which are marked by broad H3K4me3 (16), is regulated by Set1/COMPASS and, thus, illuminates the importance of Set1/COMPASS in maintaining stem cell viability (38, 39), and (ii) the observed phenomenon of Mll2 affecting narrow H3K4me3 peaks, which mark developmental genes, supports the critical role of Mll2 in development (34, 56, 57). There are several plausible explanations for the differential recruitment and activity of these COMPASS methyltransferases at distinct groups of loci. Studies have shown that Set1, the yeast homolog to mammalian Set1A and Set1B, is recruited to chromatin through its association with elongating RNAP II (5860). One study has recently proposed that repeated passaging of elongation complexes containing Set1/COMPASS contributes to the widening or broadening of H3K4me3 levels, which correlate with increased transcription frequency (61). While this model has yet to be demonstrated for Mll2/COMPASS, it is possible that Mll2 is recruited to chromatin by other factors that include Menin, which also resides in the Mll/COMPASS complexes, and LEDGF (27, 6265). In addition, different DNA-binding specificities of the CXXC motif found in the CXXC1 protein, a key subunit in the Set1/COMPASS complex, and in Mll2 have been reported to contribute to selective targeting of various COMPASS complexes to their respective genomic loci (6668). In sum, these findings demonstrate that the difference in H3K4me3 patterns established by Set1/COMPASS and Mll2/COMPASS is indicative of functional significance in stem cells.

We noted that Set1A and Mll2 are both highly enriched at H3K4me3 TSS-proximal peaks depicted in the K-means clustered heatmaps, indicating colocalization of Set1A and Mll2 on chromatin (Fig. 2, B and D). In addition, at annotated TSS sites, there is >98% overlap in Mll2 and Set1A binding regions (fig. S4A). This colocalization could be explained by the fact that both CXXC1, a key subunit in the Set1A/COMPASS complex as discussed earlier, and Mll2 harbor a CXXC motif that recognizes and binds to unmethylated CpG-containing DNA (34). It is therefore possible that Mll2 could compensate for Set1/COMPASS loss in H3K4me3 deposition. Consequently, we depleted Mll2 using a short hairpin RNA (shRNA) in Set1BKO-Set1AΔSET lines and compared to WT ESCs (fig. S4B). Knockdown of Mll2 resulted in a marked decrease in H3K4me3 in Set1BKO-Set1AΔSET ESCs (Fig. 4, A and B). To investigate this observation in further detail, we therefore generated a triple-mutant cell line, where we deleted the SET domain of Mll2 in Set1BKO-Set1AΔSET ESCs by CRISPR-Cas9 (fig. S4, C and D). Simultaneously, we removed Mll1 in Set1BKO-Set1AΔSET ESCs using previously reported guide RNAs (gRNAs) for targeting Mll1 (50), in the event that Mll1 may manifest a compensatory role in implementing H3K4me3 under conditions of Set1/COMPASS mutation (fig. S4, C and D). We were able to successfully retrieve homozygous triple-mutant ESCs harboring the intended Set1BKO-Set1AΔSET-Mll2ΔSET and Set1BKO-Set1AΔSET-Mll1KO mutations, as validated by PCR genotyping and RNA-seq (fig. S4, C and D). However, we noted that the Set1BKO-Set1AΔSET-Mll2ΔSET cells proliferate more slowly than WT, Set1BKO-Set1AΔSET, and Mll2ΔSET cells (data not shown). To ascertain if the appended deletion of either SET domain of Mll2 or of Mll affected H3K4 methylation, we performed Western blotting and ChIP-seq analyses to evaluate H3K4me3 levels. Consistent with the effects seen in knocking down Mll2 in the double-mutant ESCs (Fig. 4, A and B), bulk H3K4me3, including the level at TSS-proximal regions, is substantially lowered in the Set1BKO-Set1AΔSET-Mll2ΔSET cells (fig. S4, E to G). However, no additive perturbation effect on H3K4me3 levels was observed in the Set1BKO-Set1AΔSET-Mll1KO triple mutant relative to the double mutant (fig. S4, E to G), affirming the minimal contribution to H3K4me3 deposition by Mll1 in ESCs.

Fig. 4 Mll2 is functionally redundant to Set1/COMPASS in sustaining global H3K4me3 level and breadth.

(A) Heatmaps of H3K4me3 occupancy in WT and Set1BKO-Set1AΔSET ESCs upon control (shNT) and Mll2 (shMll2) knockdowns. Profiles are plotted in decreasing H3K4me3 peak width. Two replicates of each genotype were analyzed. (B) Log2FC heatmaps comparing differences in H3K4me3 levels in WT-shMll2, Set1BKO-Set1AΔSET-shNT, or Set1BKO-Set1AΔSET-shMll2 and WT-shNT conditions for the same ordered peaks as in (A). (C) Heatmaps of clustered H3K4me3 occupancy (left) and corresponding log2FC (right) comparing Set1BKO-Set1AΔSET-Mll2ΔSET to WT cells at the 13,486 H3K4me3 TSS-proximal peaks initially determined in Fig. 2B. (D) Track examples of broad (top) and narrow (bottom) H3K4me3 peaks in WT, Set1BKO-Set1AΔSET, Mll2ΔSET, and Set1BKO-Set1AΔSET-Mll2ΔSET ESCs. (E) Heatmaps of H3K4me3 occupancy (left) and corresponding log2FC (right) at genes with the 500 most broad (top) and 500 most narrow (bottom) H3K4me3 peaks in Set1BKO-Set1AΔSET-Mll2ΔSET and WT ESCs. Broad and narrow peaks were initially determined in Fig. 3A. (F) Average log2FC H3K4me3 signal density plots (RPM) at genes with the 500 most broad (top) and 500 most narrow (bottom) H3K4me3 peaks in Set1BKO-Set1AΔSET-Mll2ΔSET, Set1BKO-Set1AΔSET, or Mll2ΔSET relative to WT ESCs. (G) Box plot comparing H3K4me3 ChIP-seq density for the top 20% broadest or narrowest H3K4me3 peaks in WT, Set1BKO-Set1AΔSET, Mll2ΔSET, and Set1BKO-Set1AΔSET-Mll2ΔSET cells. Normalized coverage, peak number per group, and P values (indicated by lower case letters) were calculated as in Fig. 3D. Calculated P values are as follows: a, c, d, e, f, and i, P < 6.173 × 10−282; b, P = 0.01901; g, P = 6.1730 × 10−282; h, P = 1.7914 × 10−74; j, P = 1.1087 × 10−206. (H) Box plot showing RNAP II coverage at H3K4me3-proximal TSSs for the top 20% broadest or narrowest H3K4me3 peaks in WT, Set1BKO-Set1AΔSET, Mll2ΔSET, and Set1BKO-Set1AΔSET-Mll2ΔSET ESCs. Peak number per group and P values (indicated by lower case letters) were calculated as in (G). Calculated P values are as follows: a, P = 9.0609 × 10−256; b, P = 7.5707 × 10−219; c, P = 5.0143 × 10−54; d, P = 4.0960 × 10−204; e, P < 9.0609 × 10−256; f, P = 1.7272 × 10−21; g, P = 1.0845 × 10−6; h, P = 0.0017; i, P = 6.9636 × 10−38; j, P = 0.4571.

Extending our analyses by evaluating H3K4me3 changes in the previously defined clusters, we observe a robust and synergistic decrease in H3K4me3 occupancy in all three clusters in the Set1BKO-Set1AΔSET-Mll2ΔSET triple-mutant cells (Fig. 4C), evincing that Mll2/COMPASS is functionally redundant to Set1/COMPASS in sustaining global H3K4me3 levels in ESCs. When we analyzed H3K4me3 changes at Set1- versus Mll2-dependent H3K4me3 regions in Set1BKO-Set1AΔSET, Mll2ΔSET, and Set1BKO-Set1AΔSET-Mll2ΔSET compared with WT, we noted the following: The significant decrease in H3K4me3 in the triple mutant compared with that in the double mutant at Set1-dependent H3K4 methylation is unquestionably greater than the significant reduction in H3K4me3 in the triple mutant compared with that in the Mll2ΔSET at Mll2-dependent sites (fig. S4, H and I). This supports the interpretation that while Mll2 is functionally redundant to Set1 at Set1-controlled regions, Set1 does not appear to reciprocate redundancy to Mll2 at Mll2-controlled sites, suggesting a unidirectional compensatory relationship between Mll2 and Set1. The added mutation of Mll2ΔSET in Set1BKO-Set1AΔSET ESCs also further perturbed H3K4me3 breadth, such that the reduction in H3K4me3 density is seen at both genes with broad or narrow breadth, with a more radical decrease at broader peaks in the triple mutant (Fig. 4, D to G, and fig. S4J; also compare Fig. 3A to Fig. 4E). These data pinpoint the role of Mll2/COMPASS to be a key compensator for Set1/COMPASS function in bolstering overall H3K4me3 level and breadth in ESCs.

On the basis of global analysis, we noted an additional decrease in RNAP II at broader and narrower H3K4me3 peaks for the Set1BKO-Set1AΔSET-Mll2ΔSET versus Set1BKO-Set1AΔSET cells (Fig. 4H). The severe loss of H3K4me3 in the Set1BKO-Set1AΔSET-Mll2ΔSET mutant at these regions correlates with a significant decrease in the transcription of cell identity genes compared with WT and Set1BKO-Set1AΔSET cells and as a result may contribute to the proliferation defect in these triple-mutant cells. Thus, in retrospect, the less marked decrease in RNAP II levels in the double mutant at these broader peak regions depicted in Fig. 3E may indicate that the remaining H3K4me3 at these loci is adequate to maintain expression of such cell identity genes to the levels that cell proliferation is not impaired. Furthermore, we previously demonstrated that proper ESC differentiation requires the catalytic activity of Set1A (38), which is a key contributor to the formation of broad H3K4me3. Therefore, we deduce that loss of H3K4me3 at cell identity genes in Set1-mutant cells has a direct effect on cellular differentiation. Because self-renewal and differentiation are hallmarks of pluripotent stem cells, the loss of differentiation potential in Set1-mutant ESCs reflects the impairment of their identity. Together, these data signify that H3K4me3 levels at broad domains play an instructive role in maintaining ESC identity.

In summary, we present findings delineating the functionally redundant roles of the COMPASS family members in H3K4me3 implementation in mammalian stem cells. By investigating H3K4me3 enrichment and breadth in a series of ESC lines harboring compounding mutations of COMPASS enzymes relative to WT cells, we report that COMPASS members Set1A, Set1B, and Mll2 coordinate regulation of H3K4me3 level and peak breadth across the mammalian genome. We also establish that Mll2 plays an important compensatory role in sustaining global H3K4me3 level and breadth in the absence of Set1/COMPASS. This study enhances our current knowledge, shedding new light on the extraordinary ability of our cells to adapt to contextual changes for sustainment. Moreover, with numerous studies indicating the role of COMPASS in disease pathogenesis—Set1A/Set1B in cancer (69, 70), Set1A in schizophrenia (71, 72), and Mll2 in childhood-onset dystonia (73, 74)—our work offers insight into discovering potential targets for future therapy against these relevant diseases through characterization and assessment of the broad/narrow H3K4me3 epigenetic signature.


Murine ESC culture, CRISPR-Cas9–mediated gene editing, and shRNA knockdown

V6.5 ESCs were cultured in N2B27 media supplemented with MEK inhibitor and GSK3 inhibitor and LIF (2i/LIF) as previously described (38). Vectors expressing Cas9 and CRISPR single guide RNAs (sgRNAs) were generated by annealing and cloning oligos encoding the desired sgRNA sequences into the pX459 plasmid following a previously published protocol (75). Subsequent ESC transfection was performed as follows: 5 million ESCs were electroporated with the CRISPR sgRNA targeting vectors using Lonza’s Nucleofector2b (following the manufacturer’s instructions). One day after transfection, ESCs were selected with puromycin (2 μg/ml) (Life Technologies) for 2 days and grown in 2i/LIF ESC medium for 10 days until clone picking. gRNA sequences used in this study are listed as follows: Set1BKO, ccagatccacgcgaaaagcc (left) and ctaccggtgccccttccctg (right); Mll2ΔSET, ggtcagaaagggtcctaaag (left) and tgccccttgggtggacagat (right); and Set1BKO-Set1AΔSET-Mll2ΔSET, ggtcagaaagggtcctaaag (left) and gtaagtggcgtgaaggtttg (right). gRNA sequences to generate Mll1KO were previously reported (50). The following primers were used to genotype clones: Set1B—gaatctggaccaaaaacaag (WT forward), acttgcccagcagttaaaaa (WT reverse), tcagcgtctaataactcaagc (mutant forward), and cccaagatccactatcaatg (mutant reverse); Mll2—gaggtgcagcttcggtccaccag (forward) and ggaacatgtagcacccaatacc (reverse); Set1BKO-Set1AΔSET-Mll2ΔSET—tccaccaggtgtgcagataa (forward) and ccatggacaggaaggtagga (reverse), with Set1A primers previously reported (38). Knockdown was performed as described in Cao et al. (43). The lentiviral construct against Mll2 was reported previously (25).


The following antibodies were generated in-house: anti-Set1B (47), anti-Mll2, anti-Set1A, anti-H3, anti-H3K4me1, anti-H3K4me2, and anti-H3K4me3. Other antibodies used in this study were H3 (CST #1B1B2), H3K4me3 (CST #9727), Rpb1 (CST #D8L4Y), HSP90 (Santa Cruz #7947), and Rbbp5 (Bethyl Laboratories #A300-109A). Set1A ChIP-seq data in Fig. 2B were from the Sze et al. (38).

RNA-seq, ChIP-seq, and next-generation sequencing data processing

RNA was extracted using the RNeasy mini kit following Qiagen’s instructions, and RNA-seq libraries were prepared using the TruSeq Stranded Total RNA Preparation Kit (Illumina). ChIP was performed as previously described (38, 76). ChIP-seq libraries were prepared using the KAPA HTP library preparation kit (Kapa Biosystems). Libraries were single read sequenced on the Illumina NextSeq 500 Sequencing System, and raw BCL output files were processes using bcl2fastq (Illumina, version before quality trimming using Trimmomatic (77). Trimmed ChIP-seq and RNA-seq reads were aligned to the mouse genome [University of California at Santa Cruz (UCSC) mm9] using Bowtie v1.1.2 (78) and TopHat v2.1.0 (79), respectively. Only uniquely mapped reads satisfying the two-mismatch maximum threshold within the entire length of the gene were considered for ensuing analyses. Mapped ChIP-seq reads were extended to 150 base pairs toward the interior of the sequenced fragment, and raw read counts were normalized to total reads per million (RPM). Output BAM files were converted into bigwig coverage plots to generate UCSC genome browser tracks. In the case of RNA-seq, exonic reads were assigned to genes from Ensembl release 67 using the python package HTSeq-0.6.0 (80). At least two biological replicates were performed for RNA-seq and ChIP-seq under each experimental condition.

ChIP-seq and RNA-seq analyses

For ChIP-seq, peak calling for H3K4me3 was performed using MACS v1.4.2. with default parameters. Heatmaps and metaplots were generated using deepTools 2.0 v3.1.1 (81). Occupancy levels in RPM were observed over peak regions as denoted and were either ranked by peak width or partitioned by K-means clustering. For the classification of broad and narrow H3K4me3 peaks in Figs. 3 and 4, we overlapped broad peaks called using SICER v1.1 (82) and narrow TSS-proximal peaks called using MACS, retaining the peaks called using SICER that overlapped with narrow TSS-proximal peaks. Peaks were mapped to a TSS using the GREAT algorithm (83), where TSS-proximal peaks were required to be located within 5 kb from the identified TSS. To determine the H3K4me3 levels for breadth analyses shown in Figs. 3D and 4(G and H), and figs. S3B and S4J, the BEDTools map function was used to quantify the coverage over peak regions, and custom R scripts were used for data plotting. The differential H3K4me3 peaks shown in figs. S3B and S4J for the respective mutant lines were defined as having a log2 fold change (log2FC) of H3K4me3 peak coverage of <−1 relative to WT peak coverage. To identify Set1-dependent H3K4me3 and Mll2-dependent H3K4me3 peaks in fig. S4 (H and I), H3K4me3 peaks were first called from at least two WT biological replicates using MACS, and overlapping regions determined using BEDTools intersect were chosen for subsequent differential binding analysis. To define sites with differential binding between WT and the indicated mutant line, BAM files of H3K4me3 ChIP-seq from WT and mutant samples were converted to a bed file using BEDTools v.2.29.1. BEDTools coverage was then used to quantify read counts within the initially identified overlapping H3K4me3 regions in WT and mutant samples. Using the edgeR v3.12.1 (84) package from Bioconductor, the quantified counts were normalized to counts per million, and differential peaks were then identified by fitting a negative binomial generalized log linear model to the normalized counts data. Statistically significantly decreased peaks in the mutant cell lines were identified using Benjamini-Hochberg adjusted P values <0.01 and log2FC <−1. Set1- or Mll2-dependent H3K4me3 peaks were determined by overlapping the respective decreased H3K4me3 regions with Set1A or Mll2 binding. For RNA-seq, for differential gene expression, gene count tables from HTSeq were used as input for edgeR v3.12.1 (84), and genes with Benjamini-Hochberg adjusted P values <0.01 were regarded as differentially expressed and used for downstream GO functional analysis with Metapscape (85). Log2FC heatmaps of nearest gene expression relevant to the clustered peaks were determined using custom scripts and visualized using Java TreeView.


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We express gratitude to members of the Shilatifard laboratory for constructive criticism toward this manuscript. We thank E. R. Smith for the critical reading and editing of this manuscript. We thank M. Iwanaszko for insightful feedback on statistical tests and bioinformatics analyses in this manuscript. We are thankful to K. Chen of The Methodist Hospital Research Institute in Houston, Texas, for instructive advice on performing breadth analyses. We also thank B. K. Cenik for inspiring scientific quip. Funding: C.C.S. was supported, in part, by NIH/NCI Kirschstein-NRSA F31CA228149 and is currently supported, in part, by NIH/NCI Predoctoral to Postdoctoral Transition Award F99CA234945. K.C. is supported, in part, by NIH/NICHD Pathway to Independence Award K99HD094906. Studies in the Shilatifard laboratory regarding the role of the COMPASS family in development and cancer are supported by the NCI’s Outstanding Investigator Award R35CA197569. Author contributions: C.C.S. and A.S. conceived and initiated the project. C.C.S., M.U., L.W., C.A.R., S.A.M., E.J.R., D.Z., and D.D. performed the experiments. C.C.S., P.A.O., K.C., F.X.C., and S.D. analyzed the data. C.C.S. and A.S. wrote the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Next-generation sequencing data have been deposited at Gene Expression Omnibus database under accession number GSE152595. Additional data related to this paper may be requested from the authors.

Stay Connected to Science Advances

Navigate This Article