Research ArticleHUMAN GENETICS

Transcriptome and regulatory maps of decidua-derived stromal cells inform gene discovery in preterm birth

See allHide authors and affiliations

Science Advances  02 Dec 2020:
Vol. 6, no. 49, eabc8696
DOI: 10.1126/sciadv.abc8696


While a genetic component of preterm birth (PTB) has long been recognized and recently mapped by genome-wide association studies (GWASs), the molecular determinants underlying PTB remain elusive. This stems in part from an incomplete availability of functional genomic annotations in human cell types relevant to pregnancy and PTB. We generated transcriptome (RNA-seq), epigenome (ChIP-seq of H3K27ac, H3K4me1, and H3K4me3 histone modifications), open chromatin (ATAC-seq), and chromatin interaction (promoter capture Hi-C) annotations of cultured primary decidua-derived mesenchymal stromal/stem cells and in vitro differentiated decidual stromal cells and developed a computational framework to integrate these functional annotations with results from a GWAS of gestational duration in 56,384 women. Using these resources, we uncovered additional loci associated with gestational duration and target genes of associated loci. Our strategy illustrates how functional annotations in pregnancy-relevant cell types aid in the experimental follow-up of GWAS for PTB and, likely, other pregnancy-related conditions.


Spontaneous preterm birth (PTB), defined as spontaneous labor and birth before 37 weeks of gestation, is associated with considerable infant mortality and morbidity, as well as long-term health consequences into adulthood (1). A genetic component to PTB has long been recognized, but the significant role of environmental factors and the etiologic heterogeneity of birth before 37 weeks (24) have made it challenging to discover genetic associations and causal genes. For example, recent genome-wide association studies (GWASs) of gestational duration in 43,568 women (3331 with a preterm delivery) (5) and in 84,689 infants (4775 born preterm) (6) reported six and one genome-wide significant associations, respectively, with gestational duration considered as a continuous variable. Three loci were also associated with PTB (defined as a categorical variable of birth) in the maternal GWAS (5), but no loci were associated with PTB in the infant GWAS (6). These studies highlight the challenges of such complex and multifactorial phenotypes and the need for additional approaches to facilitate discovery of genes contributing to gestational duration and PTB.

Integrating GWAS that results with genomic and epigenomic annotations is a promising approach for assigning function to variants discovered by GWAS, as well as for identifying additional associations that do not reach stringent genome-wide significance threshold (7, 8). While large consortia [e.g., ENCODE (Encyclopedia of DNA Elements) (9), GTEx (Genotype-Tissue Expression Project) (10), and Roadmap Epigenomics (11)] have generated annotations of putative functional elements and genetic variants for many human cell types and tissues, there is a remarkable absence in these databases for the cell types and tissues that are relevant to pregnancy in general and to PTB in particular. Because the regulation of transcription has strong cell type–specific components and because annotations in disease-relevant tissues or cells tend to be most enriched among GWAS signals for those specific diseases (10, 12), follow-up studies of GWASs of pregnancy-associated conditions have been disadvantaged compared to most other complex diseases due to the paucity of functional annotations in cells relevant to pregnancy. To fill this gap in knowledge, we characterized the transcriptional and chromatin landscapes of cultured mesenchymal stromal/stem cells (MSCs) collected from human placental membranes and decidualized MSCs, also known as decidual stromal cells (DSCs). These cells play critical roles in promoting successful pregnancy, interfacing with fetal cells throughout pregnancy, and the timing of birth (13, 14). We then built a computational framework that integrated these decidua-derived stromal cell annotations with the results of a large GWAS of gestational duration to facilitate discovery of PTB genes.

This integrated analysis revealed a significant enrichment of heritability estimates for gestational duration in decidua-derived stromal cell genomic regions marked by open chromatin or histone marks. Leveraging those functional annotations in a Bayesian statistical framework, we discovered additional loci associated with gestational duration and improved fine mapping in regions associated with gestational duration. Last, using promoter capture Hi-C (pcHi-C), we linked functionally annotated gestational age-associated variants to their putative target genes. More generally, these functional annotations should prove a valuable resource for studying other pregnancy-related conditions, such as preeclampsia and recurrent miscarriage, as well as conditions associated with endometrial dysfunction, such as endometriosis and infertility.


Generation of transcriptome and epigenome maps of untreated (MSC) and in vitro differentiated DSCs

Decidualization is the process of transformation of endometrial MSCs into DSCs that is induced by progesterone production that begins during the luteal phase of the menstrual cycle and then increases throughout pregnancy when successful implantation occurs [reviewed in (15)]. Using progesterone and estrogen or cyclic adenosine 5′-monophosphate (cAMP) to induce decidualization of MSCs in culture has been used in cells derived from endometrial biopsies in nonpregnant women to characterize their transcriptomes and epigenomes and to identify genes and molecular pathways involved in this process (1621).

Because obtaining endometrial cells in nonpregnant women through biopsies requires an invasive procedure that carries some risk and MSCs can also be obtained from human placentas (2224), we isolated these cells from the decidua parietalis of three women who had delivered at term and established one primary MSC line from each to model the process of decidualization (see Materials and Methods). Briefly, cells were treated with medroxyprogesterone acetate (MPA) and cAMP for 48 hours, and a paired set of untreated samples was cultured in parallel for 48 hours. Three replicates of treated/untreated sets of each cell line were studied to assess experimental variability in the two conditions. Each of the 18 samples (3 individual lines × 3 replicates × 2 conditions) were assayed to generate transcriptome [RNA sequencing (RNA-seq)], open chromatin [assay for transposase-accessible chromatin sequencing (ATAC-seq)], and histone modification [chromatin immunoprecipitation sequencing (ChIP-seq)] maps. A summary of those data is shown in table S1, and a representative example of the full set of annotations for one primary cell line is shown in Fig. 1. The number of reads generated for each sample in each condition and other descriptive data are provided in data file S1.

Fig. 1 Schematic of RNA-seq, ATAC-seq, ChIP-seq, and pcHi-C maps centered on the prolactin (PRL) gene, as an example.

Each histone modification and RNA-seq track shows read counts per base pair for each experiment. The pcHi-C signal track shows the number of reads per MboI restriction fragment. Arcs in the pcHi-C interactions track show significant interactions between the promoter of the PRL gene and putative distal regulatory elements identified with pcHi-C. Pooled data (three replicates) for one cell line are shown for untreated cells (MSCs, in green) and decidualized cells (DSCs, in purple). pcHi-C data were generated in a fourth cell line that was decidualized.

Robust gene expression changes occur in decidualized stromal cells

Analysis of the RNA-seq data using DESeq2 (25) revealed 1135 differentially expressed genes after decidualization (table S1). Genes with decreased expression after 48 hours of treatment were highly enriched for cell cycle genes (data file S2), consistent with observations from endometrial biopsies from nonpregnant women that decidualization is associated with cell cycle arrest (19, 26). Genes with increased expression after treatment were enriched for insulin-related terms, also consistent with previous results from endometrial biopsies (26), and for glucose metabolism (18).

Identification of regulatory elements associated with decidualization

To identify putative regulatory elements in MSCs and DSCs, we assayed H3K27ac, H3K4me1, and H3K4me3 histone modifications, which are markers of active enhancers, poised enhancers, and active promoters, respectively [reviewed in (27)]. We also used ATAC-seq to identify open chromatin regions to complement ChIP-seq data. To identify regulatory regions that might be altered in response to, and potentially regulate decidualization, we compared read counts of ATAC-seq and ChIP-seq peaks in untreated and decidualized cells, revealing tens of thousands of regions that differed between untreated and treated samples (table S1). Most of the differential peaks were marked with H3K27Ac and H3K4me1, indicating that the epigenetic changes underlying alterations in gene expression during decidualization predominantly occur in distant regulatory elements, such as enhancers.

We observed a moderate degree of overlap between the differential peaks across ATAC-seq and ChIP-seq data, with the two enhancer marks, H3K27ac and H3K4me1, showing the most overlap (Fig. 2A). In addition, putative regulatory regions that showed chromatin changes in response to decidualization were associated with genes whose expression also changed in response to decidualization (Fig. 2B). Regulatory regions with increased read counts clustered around genes that were more highly expressed after decidualization, indicating increased chromatin accessibility or activation of enhancers of those genes. Conversely, genes that were more lowly expressed after decidualization were enriched for enhancers that became less accessible or active. These observations indicate that the differential peaks of open chromatin and histone marks observed after decidualization correspond to regulatory elements that become more or less active, resulting in correlated gene expression changes of the nearby genes.

Fig. 2 Differential histone modification and ATAC-seq peaks are associated with differential expression and enriched for transcription factors with roles in decidualization.

(A) Plot showing the overlap between the different histone modifications and ATAC-seq maps (intersection between annotations). Peaks were assigned to 100-bp bins to avoid ambiguity in overlap due to different peak borders. Black circles indicate overlap with other annotations; light gray circles indicate that the annotation does not overlap others. (B) Each data point shows the ratio between the number of increased/decreased differential peaks nearby genes that increase expression after decidualization (blue, positive log ratios; upper half of the figure) or decrease expression after decidualization (orange, negative log ratios; lower half of the figure). Genes that were more highly expressed in decidualized cells were flanked by a higher number of ChIP-seq and ATAC-seq peaks that displayed increased read counts in decidualized samples compared to peaks that displayed decreased read counts (top inset). Genes that were down-regulated in decidualized cells showed the opposite trend (bottom inset). All enrichments: P < 10−25. (C) DNA binding motifs of transcription factors relevant in decidualization are enriched in peaks that change following decidualization treatment. Motifs are color-coded by similarity. (D) Colocalization of PGR, FOSL2, FOXO1, GATA2, and NR2F2 with ATAC-seq and ChIP-seq peaks. Transcription factor binding sites co-occur with ATAC-seq and ChIP-seq peaks in both untreated (green) and decidualized (purple) cells more often than with random peaks. Enrichment of the co-occurrences of PGR, FOXO1, GATA2, and NR2F2 are higher when co-occurring with peaks that have increased read counts (navy blue) and lower with peaks that have decreased read counts (orange) in decidualized compared to untreated cells. Enrichment of co-occurrences with peak sets was calculated as the fold difference between the number of transcription factor peaks overlapping with ATAC-seq/ChIP-seq peaks and with a random set of peaks (see Materials and Methods).

Previous work identified transcription factors that play critical roles in decidualized stromal cells (2832). Several of the DNA binding motifs that were enriched in peaks with increased or decreased read counts in our data correspond to transcription factors previously implicated in decidualization (Fig. 2C), such as CAAT-enhancer binding protein (CEBP) (33), progesterone receptor (PGR) (28) that shares the same motif with androgen response element, and glucocorticoid receptor, FOSL2 (Fos-related antigen 2) (28), that shares the same motif with Fra1 (Fos-related antigen 1), Atf3 (Activating transcription factor 3), and BATF (basic leucine zipper ATF-like transcription factor), and TEA (transcriptional enhancer factor) domain transcription factors (21, 34). Whereas CEBP and PGR were exclusively enriched in peaks with increased read counts in decidualized cells, the FOSL2 motif was present in peaks that both changed positively and negatively in decidualized cells.

To better understand the role of these transcription factors in decidualization, we obtained publicly available ChIP-seq data for PGR (28) and FOSL2 (28) from endometrial biopsies and analyzed the colocalization of their binding locations with the putative regulatory elements identified by ATAC-seq and ChIP-seq identified in our study (Fig. 2B). We additionally analyzed FOXO1 (Forkhead box O1) (29), NR2F2 (nuclear receptor subfamily 2 group F member 2) (30), and GATA2 (GATA binding protein 2) (31) ChIP-seq data because these transcription factors have also been implicated in decidualization (2931). With the exception of FOSL2, the colocalization enrichments of PGR, FOXO1, GATA2, and NR2F2 with ATAC-seq and ChIP-seq peaks were higher (9 to 16 folds) among peaks that were increased in decidualized cells (more open chromatin or increased histone modification levels) compared to all peaks (7.5 to 12.8 folds) and to peaks that decreased in decidualized cells (2 to 5 folds). This observation supports the notion that these transcription factors are involved in regulation of decidualization (2831, 35). Although FOSL2 has been reported as a positive coregulator of PGR (28), the presence of FOSL2 motifs in peaks that both increased and decreased in decidualized cells (Fig. 2C) and the lack of difference in the colocalization enrichment between these two sets of peaks (Fig. 2D) suggests that FOSL2 may have a dual role in decidualization.

Together, our results support a model of decidualization that involves changes in the regulatory landscape during the differentiation of MSCs into DSCs, including alterations in chromatin accessibility and in the activation levels of distant regulatory elements, accompanied by the differential binding of key transcription factors, resulting in increases or decreases in gene expression.

Chromatin interactions aid in the identification of target genes of distal regulatory elements

As shown in Fig. 2B, the surrounding regions of differentially expressed genes were enriched for differential ChIP-seq and ATAC-seq peaks that changed in the same direction as the genes in decidualized samples. Accordingly, when we paired differential peaks with the nearest expressed gene as its putative gene target, we observed that these pairs were more likely to have matching directions of change (i.e., both the peak and the gene have increased or decreased read counts in decidualized samples) than nonmatching directions when compared with pairs that were assigned randomly (Fig. 3A).

Fig. 3 pcHi-C connects predicted regulatory elements to their putative target genes.

(A) Randomly assigning a gene to a peak (see Materials and Methods) resulted in fewer peaks that matched the direction of change with that of differentially expressed genes than when using pcHi-C interactions or the nearest gene to pair peaks to genes. (B) The FOXO1 gene is more highly expressed in decidualized samples (fourfold increase, P = 7 × 10−22) and its promoter physically interacts (red arcs) with distal regulatory elements (yellow highlights) that show increased activation in decidualized samples. The nearest expressed gene to these differential peaks is COG6.

In many cases, however, the target gene for a regulatory element is not the nearest gene (36), and therefore, information about distal chromatin interactions can be useful in prioritizing candidate gene targets of variants identified in GWAS. To this end, we generated a pcHi-C map of a decidualized cell line, thus enriching for the identification of long-range chromatin interactions between promoters and distant regulatory elements (3739). We identified a total of 161,337 interactions, of which 53,211 were between promoters and distal regions of accessible chromatin assayed by ATAC-seq and ChIP-seq, suggestive of their regulatory role. We used the significant interactions between promoters and distal regions that we identified to pair differential peaks with putative target genes. As shown in Fig. 3A, using pcHi-C interactions as a pairing method resulted in enhanced identification of differential peak/differential target gene pairs that have matching directions of change compared to random assignment of gene-target pairs.

Whereas assigning peaks to the nearest expressed gene also led to enhanced assignment of differential peaks to target genes with matching directions of change (Fig. 3A), pcHi-C was helpful in identifying less obvious target genes, as shown in Fig. 3B. In this example, several pcHi-C interactions link distal regulatory elements up to 847 kb away that became more active in decidualized cells to the promoter of a gene (FOXO1) that was up-regulated in decidualized cells and is known to be involved in decidualization (32). The nearest expressed gene method assigned those differential peaks to COG6, a gene that does not change expression in decidualized samples and is therefore a less likely target.

In conclusion, by combining pcHi-C interactions with the epigenome maps and transcriptome data, we were able to identify genes and putative regulatory elements that respond to, or regulate, the decidualization process. We next used these functional genomic maps and datasets to fine map GWAS loci for gestational duration and identify new candidate genes with a potential role in PTB.

The heritability of gestational duration is enriched for functional annotations in DSCs

To identify candidate genes that may play a role in gestational duration and PTB, we used summary data from a GWAS of gestational duration based on a meta-analysis of a 23andMe GWAS (n = 42,121) (5) and the results from six European datasets (n = 14,263). A detailed description of the GWAS is in the Supplementary Materials and figs. S1 and S2. After filtering for single-nucleotide polymorphism (SNPs) that are present in the 1000 Genomes Project data and minor allele frequency of >0.01, we identified SNPs at six autosomal loci, defined as approximately independent blocks by LDetect (40), that were associated with gestational duration at genome-wide significance of P < 5 × 10−8 (table S2). We then created a computational pipeline to assess enrichment of GWAS signals in functional annotations that we generated in untreated (MSCs) and decidualized (DSCs) stromal cells to fine map GWAS loci and discover candidate causal genes and to potentially provide support for additional loci that did not reach genome-wide significance in the GWAS (Fig. 4A). Each step of this procedure is explained below and described in details in Materials and Methods.

Fig. 4 GWAS analysis pipeline and heritability enrichment in functional annotations.

(A) Computational pipeline for analyzing GWAS of gestation duration. Yellow boxes (input data): GWAS summary statistics and functional annotations from endometrial stromal cells (in both untreated and decidualized cells). Green boxes: Stages of statistical analysis (see Materials and Methods). (B) Stratified LDSC heritability analysis of GWAS of gestational duration using functional annotations. Left: Fold enrichment of heritability in each annotation. Dashed line shows values at 1, i.e., no enrichment. Center: Proportion of heritability explained by each annotation. Right: Proportion of SNPs across the genome that fall within an annotation. For each annotation, enrichment (left) is the ratio of h2 proportion (center) divided by the SNP proportion (right). Error bars represent 95% confidence intervals.

We first used stratified linkage disequilibrium (LD) score regression (S-LDSC) (41) to assess enrichment of GWAS signals in functional annotations in endometrial stromal cells. S-LDSC takes as input GWAS summary statistics across the genome and functional annotations of SNPs, e.g., whether an SNP is in ATAC-seq peak, and returns as output heritability enrichment of each annotation. S-LDSC is a commonly used tool for estimating the proportion of heritability of complex phenotypes that is explained by variants in certain functional annotations. The heritability enrichment is defined by the proportion of heritability explained by annotations divided by the expected proportion, which is the percent of SNPs genome wide that are in these functional annotations. To account for possible systematic bias in this analysis, i.e., SNPs within annotations of interest may differ from background SNPs in systematic ways such as their LD structure and epigenomic properties, we included a range of baseline annotations (default S-LDSC setting), including LD-related annotations, deoxyribonuclease (DNase) hypersensitivity, enhancer annotation, H3K27ac, H3K4me1, and other histone marks (the union across cell types). Thus, if an annotation is shared by many cell types, then it would not show the enrichment in S-LDSC analysis (see Materials and Methods).

Using S-LDSC, we found 5- to 10-fold enrichments of GWAS heritability for gestational duration in our functional annotations compared to the baseline model of S-LDSC (Fig. 4). The enrichment of enhancer marks H3K27ac and H3K4me1 was higher in decidualized than in untreated cells, but the opposite pattern was observed for the promoter mark H3K4me3, which was more enriched in untreated (MSCs) than in decidualized (DSCs) cells. These findings are consistent with previous observations that enhancers are often more dynamic and condition- or tissue-specific than promoters (10). We observed weaker heritability enrichments of open chromatin regions defined by ATAC-seq and of interaction regions in pcHi-C. However, because we performed joint analysis of all annotations together, the enrichment of one annotation (e.g., ATAC-seq peaks) will be reduced if the enrichment is partially explained by other, overlapping annotations (e.g., H3K27ac). Although the promoter mark H3K4me3 in untreated cells showed the highest enrichment, the annotations that contributed most to the heritability of gestational duration were enhancers (Fig. 4) due to the much larger number of enhancer histone marks than promoters in the genome. Our results thus highlight the importance of functional annotations in endometrial stromal cells at GWAS loci for gestational duration.

Integrated analysis of GWAS and decidual cell functional annotations improves fine mapping of causal variants of gestational duration and identifies putative target genes

We next developed a computational procedure, based on fine mapping, to integrate the decidua stromal cell functional maps with a GWAS of gestational duration to identify putative causal variants (Fig. 4A). Because of extensive LD in the human genome, the causal variants driving the associations are unknown at most loci discovered by GWAS. Fine mapping is a Bayesian statistical procedure that takes as input GWAS summary statistics and patterns of LD at trait-associated loci and computes the probability of each variant at a locus to be a causal variant (7). These probabilities, known as posterior inclusion probabilities (PIPs), reflect our confidence of certain SNPs being causal variants. The PIP of a variant ranges from 0 to 1, with 1 indicating full confidence that the SNP is a causal variant. If a region contains a single causal variant, the PIPs of all SNPs in the region should approximately sum to 1.

While fine mapping has been commonly used in identifying putative causal variants from GWAS of complex traits (7), it is often difficult to narrow down causal signals to one or a small number of variants in most GWAS loci. Standard fine mapping treats all SNPs at a locus equally. Recent work suggests that incorporating Bayesian prior probabilities that favor functional SNPs improves fine mapping (8, 42). We posited that integrating functional annotations in pregnancy-relevant cells in a statistical fine-mapping framework would aid in (i) identifying candidate causal variants at each locus associated with gestation duration, (ii) linking those variants to their target genes, and (iii) discovering additional loci and genes associated with gestational duration that may have failed to reach the stringent threshold for significance in GWAS.

We first leveraged the enrichments of DSC annotations to create Bayesian prior probabilities for a variant being causal. On the basis of the results of S-LDSC, we chose H3K27ac, H3K4me1, and pcHi-C interactions from the decidualized cells, and H3K4me3 from untreated cells, as functional genomic annotations to create informative priors using TORUS (42). TORUS takes as input genome-wide summary statistics from GWAS and the functional annotations of SNPs and computes enrichment parameters of annotations, which reflect how much more likely an SNP is a causal variant than randomly chosen SNPs (table S3). SNPs associated with functional annotations are generally assigned higher prior probabilities. In addition, TORUS computes statistical evidence at the level of genomic blocks, defined as the probability that a block (determined by LD) contains at least one causal SNP. Without including any histone marks or chromatin accessibility annotations, TORUS implicated six autosomal blocks in the genome at false discovery rate (FDR) of < 0.05, including five of the six genome-wide significant autosomal loci identified in the GWAS (P < 5 × 10−8). One locus on chromosome 3 had an FDR = 0.11 and was therefore not identified by TORUS, and one locus on chromosome 9 that was not identified in the GWAS was implicated by TORUS (data file S3). By including the functional genomic annotations from endometrial stromal cells, the number of high confidence blocks increased to 10, including all 6 that were significant in the gestational duration GWAS and 4 that were not significant in the GWAS (data file S3).

We next performed computational fine mapping on these 10 blocks, with the informative priors learned by TORUS, using sum of single effects (SuSiE) regression (43). Conceptually, SuSiE is a Bayesian version of the stepwise regression analysis commonly used in GWAS (i.e., conditioning on one variant and testing if there is any remaining signal in a region). SuSiE accounts for the uncertainty of causal variants in each step and reports the results in the form of PIPs. Including the priors defined by TORUS using DSC functional annotations significantly improved fine mapping (Fig. 5A, table S3, and data file S4). For example, only one SNP reached PIP > 0.3 across all 10 blocks using the default setting under SuSiE (uniform prior, treating all SNPs in a block equally). This reflects the general uncertainty of pinpointing causal variants due to LD, e.g., a strong GWAS SNP in close LD with nine other SNPs would have PIP about 0.1. By using the annotation-informed priors, eight SNPs in six different blocks reached PIP > 0.3 (Fig. 5A). In some blocks, we were able to fine-map a single high-confidence SNP, e.g., the FOXL2 locus on chromosome 3, while in other blocks, we had considerable uncertainty of the causal variants, as shown by large credible sets, i.e., the minimum set of SNPs to include the causal SNP with 95% probability (Fig. 5B). Table 1 summarizes the most probable causal variants in eight blocks (fine mapping in the remaining two blocks produced large credible sets with no high-PIP SNPs) and their likely target genes based on promoter assignment or chromatin interactions from pcHi-C. We note that our results of the WNT4 locus identified rs3820282 as the likely causal variant. This is consistent with our previous results demonstrating experimentally that the T allele of this SNP disrupts the binding of estrogen receptor 1 (5). This SNP was among the three most likely SNPs in our fine-mapping study, with a PIP of 0.27 (Table 1).

Fig. 5 Fine-mapping GWAS loci of gestational duration.

(A) PIPs of SNPs using uniform vs. functional priors in SuSiE (each dot is an SNP). The functional prior of an SNP is based on SNP annotations and is estimated using TORUS. (B) Summary of fine-mapping statistics of all 10 regions. X axis: The size (number of SNPs) of credible set. Y axis: The maximum PIP in a region. We label each region by its top SNP (by PIP) and the likely causal gene, according to Table 1 or the nearest gene of the top SNP. (C) Likely causal variants near HAND2 and their functional annotations. The top panel shows the significance of SNP association in the GWAS and the middle panel shows fine-mapping results (PIPs) in the region. The vertical yellow bar highlights the two SNPs with high PIPs. These SNPs are located in a region annotated with ATAC-seq, H3K27ac, H3K4me1, and H3K4me3 peaks (bottom). This putative enhancer also had increased ATAC-seq, H3K27ac, and H3K4me1 levels in decidualized samples and interacts with the HAND2 promoter (red arc).

Table 1 Most probable SNPs identified from computational fine mapping of regions associated with gestational duration.

Functional annotations are based on data from endometrial stromal cells. We list an annotation if the SNP is located in a sequence with that annotation in either untreated or decidualized condition. Functional prior is the prior probability of an SNP being a causal variant. For an SNP without any functional annotation, its prior probability is 3.6 × 10−6. We list the pcHi-C annotation if the SNP is within 1 kb of a region involved in a pcHi-C interaction. We call a gene the target of an SNP if (i) the SNP is located in the promoter (< 1 kb of transcription start site) of that gene or (ii) the promoter of that gene has a pcHi-C interaction with a region within 1 kb of the SNP. In the case of rs147843771 at the FOXL2 locus, the target was defined by literature evidence (69). The number of credible SNPs at each region is shown in Fig. 5B. SNPs in bold are discussed in the text. FOXL2 (69), forkhead box L2; GATA2, GATA-binding protein 2; HAND2, heart and neural crest derivatives expressed 2; KCNAB1, potassium voltage-gated channel subfamily A member regulatory beta subunit 1; WNT4, Wnt family member 4.

View this table:

We highlight the results from two regions. In both cases, we were able to identify putative risk genes with relatively high confidence, and neither is the nearest gene of lead SNPs in GWAS. In the first case, two adjacent SNPs [311–base pair (bp) apart], rs13141656 and rs7663453, on chromosome 4q34 did not reach genome-wide significance in the GWAS (P = 3.9 × 10−7 and 4.5 × 10−7, respectively). After using functional annotations in decidua-derived stromal cells, the block containing these SNPs was highly significant (TORUS q = 0.02), suggesting the presence of at least one causal variant in this block. The two SNPs together explained most of the PIP signal in the block (PIP 0.38 and 0.33, respectively, Table 1). The two SNPs are located in a region of open chromatin in endometrial stromal cells, with enhancer activity marked by both H3K27ac and H3K4me1 (Fig. 5C). Only 9 of the 129 tissues from the Epigenome Roadmap (11) also had H3K27ac, H3K4me1, or H3K4me3 peaks spanning the rs13141656 locus and only 2 spanning the rs7663453 locus. In addition, this putative enhancer is bound by multiple transcription factors, including GATA2, FOXO1, NR2F2, and PGR, based on ChIP-seq data. The only physical interaction of this enhancer in the pcHi-C data in decidualized stromal cells is with the promoter of the HAND2 gene, located 277 kb away (Fig. 5C). Summing over the PIPs of all SNPs whose nearby sequences interact with HAND2 (heart and neural crest derivatives expressed 2) via chromatin looping gives an even higher probability, 0.89, suggesting that HAND2 is very likely to be the causal gene in this region (table S4). HAND2 is an important transcription factor that mediates the effect of progesterone on uterine epithelium (44). Thus, in this example, we identified a previously unknown locus, the likely causal variant(s), the enhancers they act on, and an outstanding candidate gene for gestational duration and PTB.

The second example focuses on the locus showing a strong GWAS association with gestational duration on chromosome 3q21. The lead SNP, rs144609957 (GWAS P = 4 × 10−13), is located upstream of the EEFSEC (eukaryotic elongation factor, selenocysteine-tRNA–specific) gene. There is considerable uncertainty of the causal variants in this region, with 50 SNPs in the credible set and the lead SNP explaining only a small fraction of signal (PIP = 0.02). Among all 12 SNPs with PIP > 0.01, 11 have functional annotations, most commonly H3K4me1 and pcHi-C interactions. For nine SNPs (first three shown in Table 1), the sequences in which they are located physically interact with the promoter of GATA2 in the pcHi-C data but not with any other promoters in the region (fig. S3). The PIPs of all SNPs in the genomic regions that likely target GATA2 through chromatin looping sum to 0.68 (table S5). Thus, despite uncertainty of causal variants in this region, our results implicate GATA2 as a candidate causal gene in endometrial stromal cells. GATA2 is a master regulator of embryonic development and differentiation of tissue-forming stem cells (45). As support for the possible role of GATA2 in pregnancy, GATA2 deficient mice show defects in embryo implantation and endometrial decidualization (35), making this another excellent candidate causal gene for gestational duration and PTB.


The molecular processes that signal the onset of parturition in human pregnancies, and how perturbation of those processes result in PTB, are largely unknown. Yet, understanding these processes would reveal important insights into the potential causes of adverse pregnancy outcomes, including spontaneous labor before 37 weeks’ gestation, and potentially lead to the identification of biomarkers and therapeutic targets for PTB. Although it is experimentally challenging to link decidualization processes directly to parturition in humans, it is well accepted that shallow implantation due to suboptimal decidualization is associated with poor pregnancy outcomes in general (4648) and that the decidua is key in triggering parturition (13, 14). Thus far, however, specific genes that perturb decidualization processes and lead to PTB are poorly defined.

Unbiased GWASs do not require prior knowledge of molecular processes underlying disease phenotypes and have the potential to identify novel genes and pathways contributing to common diseases. However, the significant heterogeneity of most common diseases and small effects of most common disease-associated variants lead to the requirement for very large sample sizes (in the tens to hundreds of thousands of cases) to discover more than a handful of associated loci that meet stringent criteria for genome-wide significance. To address this limitation and provide orthogonal evidence for assessment of associations, we characterized the transcriptional and chromatin landscapes in decidua-derived stromal cells and integrated those functional annotations with a GWAS of gestational duration to discover novel loci and genes. The primary motivation for these studies was the notable paucity of genomic and epigenomic functional annotations in pregnancy-relevant primary cells among those studied by large consortia (911). Here, we filled a significant gap by providing maps in untreated and decidualized stromal cells and used these maps for annotating GWAS of pregnancy-related traits.

We chose to focus these studies on endometrial stromal cells because of their central importance in both the establishment and maintenance of pregnancy, as well as their intimate juxtaposition to fetal trophoblast cells throughout pregnancy. Of particular relevance are the roles that decidualized stromal cells play in regulating trophoblast invasion, modulating maternal immune and inflammatory responses at the maternal-fetal interface, and controlling remodeling of the endometrium (48). Defects in all of these processes have been considered a contributing factor to pregnancy disorders (48, 49). Moreover, we showed that the SNPs in regions with endometrial stromal cell functional annotations explained more of the heritability of gestational duration compared to just using baseline annotations. Among all annotations, enhancer marks H3K4me1 (in both decidualized and untreated stromal cells) and H3K27ac (in decidualized cells) were 8- to 10-folds enriched at GWAS loci after adjusting for the general annotations and accounted for 50 to 70% of the GWAS heritability. The lack of complete independence between these marks makes it difficult to delineate their individual effects but, nonetheless, highlights the importance of enhancers and of gene regulation in endometrial stromal cells in modulating the effects of GWAS variants on gestational duration. This is consistent with both the known tissue-specific roles of enhancers and the observation that more than 90% of GWAS loci reside outside of the coding portion of the genome and are enriched in regions of open chromatin and enhancers (12, 41).

Integrating transcriptional and chromatin annotations of gene regulation from MSCs and DSCs improved our ability to discover novel GWAS loci and identify likely causal SNPs and genes associated with gestational duration. We illustrate how our integrated platform identified a novel causal locus and candidate gene (HAND2) associated with gestational duration, as well as refined the annotation of loci that had been previously identified. Our data suggest that in endometrial stromal cells, GATA2 is likely the target gene of enhancers harboring SNPs associated with gestational duration. This does not exclude the possibility that the nearest gene to the associated SNPs, EEFSEC, may be a target gene in other cell types. Both HAND2 (50) and GATA2 (51) are involved in decidualization processes in humans, and perturbations in this process have been linked to poor pregnancy outcomes (4648). Neither GATA2 nor HAND2 was identified as potential candidate genes in previous GWASs of gestational duration, or PTB supports our approach and the importance of using functional annotations from cell types relevant to pregnancy to fine map and identify candidate genes for the pregnancy-related traits. Overall, the integrated analyses performed in this study resulted in the identification of both novel GWAS loci and novel candidate genes for gestational duration, as well as maps of the regulatory architecture of these cells and their response to decidualization.

However, there are some limitations. Our results are based on cells from only three individuals, which may not fully capture the regulatory landscape of endometrial stromal cells. For pcHI-C, we used cells from a single individual to generate the chromatin interactions map. Another limitation is that we focused on only one cell type, albeit one that plays a central role in pregnancy and only one exposure (hormonal induction of decidualization) at one time point (48 hours). Furthermore, it is unclear how our model of in vitro decidualization mimics the endogenous decidualization of endometrial cells during pregnancy. While we chose decidualization as a perturbation to ascertain the dynamic features of functional genomic annotations, we fully anticipate that obtaining annotations in other cell types and in response to other relevant perturbations will improve the ability to identify novel loci, variants, and genes associated with PTB. Future studies that include fetal cells from the placenta and uterine or cervical myometrial cells could reveal additional processes that contribute to gestational duration and PTB, such as those related to fetal signaling and the regulation of labor, respectively. Inclusion of additional exposures, such as trophoblast conditioned media (52) and additional exposure times, may further reveal processes that are pregnancy specific. Second, to maximize power, we focused on a GWAS of gestational duration and not PTB per se. While previous GWAS have shown that all PTB loci were among the gestational age loci (5), we realize that some of the loci that we identified could be related to normal variation in gestational duration and not specifically to PTB. Nonetheless, our findings contribute to our understanding of potential mechanisms underlying the timing of human gestation, about which we still know little. Last, although our ChIP-seq results revealed an association between GATA2 binding and decidualization, confirming the role of this transcription factor in decidual cell biology (53, 54), and studies in murines support its role in endometrial processes (35), we do not yet have direct evidence showing that perturbations in the expression of GATA2, or any of the other target genes identified, influence the timing of parturition in humans. Future studies will be needed to directly implicate the expression of these genes in gestational duration or PTB. Our study highlights the importance of generating functional annotations in pregnancy-relevant cell types to inform GWASs of pregnancy-associated conditions. Our results suggest that the expression of two transcription factors, GATA2 and HAND2, in endometrial stromal cells may regulate transcriptional programs that influence the timing of parturition in humans, which could lead to the identification of biomarkers of or therapeutic targets for PTB.


Ethics statement

This study was approved by the Institutional Review Boards at the University of Chicago, Northwestern University, and Duke University Medical School. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC (Avon Longitudinal Study of Parents and Children) Ethics and Law Committee at the time. Informed consent for the use of genetic data in the other six GWASs used in this study was also obtained from participants. Details are available in the Supplementary Materials.

Sample collections

Placentas were collected from three African American women (≥18 years old) who delivered at term (≥37 weeks) following spontaneous labor; all were vaginal deliveries of singleton pregnancies. Within 1 hour of delivery, 5 cm by 5 cm pieces of the membranes were sampled from a distant location of the rupture site. Pieces were placed in Dulbecco’s modified Eagle’s medium (DMEM)-Ham’s F12 media containing 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin. Samples were kept at 4°C and processed within 24 hours of tissue collection. This study was approved by the Institutional Review Boards at the University of Chicago, Northwestern University, and Duke University Medical School.

Isolation of mesenchymal stromal cells from human placental membranes

Third trimester placental tissue was enzymatically digested by a modification of previously described methods (55, 56). Decidua tissue was gently scraped from chorion, and tissue was enzymatically digested in a solution (1× Hanks’ balanced salt solution, 20 mM Hepes, 30 mM sodium bicarbonate, and 1% bovine serum albumin fraction V) containing collagenase type IV (200 U/ml; Sigma-Aldrich, C-5138), hyaluronidase type IS (1 mg/ml; Sigma-Aldrich, H-3506), and DNase type IV (0.45 KU/ml, Sigma-Aldrich, D-5025) at 37°C, until a single-cell suspension was obtained (usually three rounds of 30 min digestion using fresh digestion media each round). Epithelial cells were removed by filtering through a 75 μM nylon membrane and RPMI (Sigma-Aldrich) containing 10% FBS was added for enzyme inactivation. Dissociated cells were collected by centrifugation at 400g for 10 min and washed in RPMI/10% FBS. Erythrocytes were removed by cell pellet incubation with 1× red blood cell lysis buffer (Sigma-Aldrich) for 2.5 min at room temperature. The resulting cells were counted and resuspended in seeding media [1× phenol red-free high-glucose DMEM (Gibco)] supplemented with 10% FBS (Thermo Fisher Scientific), 2 mM l-glutamine (Life Technologies), 1 mM sodium pyruvate (Fisher Scientific), 1× insulin-transferrin-selenium (ITS; Thermo Fisher Scientific), 1% penicillin/streptomycin, and 1× antibiotic-antimycotic (Thermo Fisher Scientific). Dissociated cells were plated into a T75 flask and incubated at 37°C and 5% CO2 for 15 to 30 min (enrichment by attachment). The supernatant was carefully removed, and loosely attached cells were discarded. Plates were allowed to grow in fresh media containing 10% charcoal-stripped FBS (CS-FBS), and 1× antibiotic-antimycotic until the plate was 80% confluent. The antibiotic-antimycotic was removed from the culture media after 2 weeks of culture. We obtained >99% vimentin-positive cells after three passages (fig. S4). Cells were expanded, harvested in 0.05% trypsin, and cryopreserved in 10% dimethylsulfoxide culture media for subsequent use. Each cell line was defined as coming from a different sample collection (different pregnancy).

Decidualization of mesenchymal stromal cells in vitro

Cells were plated and grown for 2 days in cell culture media (1× phenol red-free high-glucose DMEM, 10% CS-FBS, 2 mM l-glutamine, 1 mM sodium pyruvate, and 1× ITS). After 2 days, cells were treated either with control media (1× phenol red-free high-glucose DMEM, 2% CS-FBS, 2 mM l-glutamine) or decidualization media (1× phenol red-free high-glucose DMEM, 2% CS-FBS, 2 mM % l-glutamine, 0.5 mM 8-Br-cAMP, and 1 μM MPA) for 48 hours. Cells were incubated at 37°C and 5% CO2 and harvested for ATAC-seq, ChIP-seq, and RNA-seq, and prolactin (PRL) and insulin-like growth factor-binding protein 1 (IGFBP1) mRNA were assessed by quantitative real-time polymerase chain reaction (PCR) before each downstream assay was performed.

RNA sequencing

Total RNA was extracted from approximately 1 million cells using the AllPrep DNA/RNA Kit (QIAGEN) according to manufacturer’s instructions. RNA quality (RNA integrity number) and concentration was assessed by Bioanalyzer 2100 (Agilent technology). RNA-seq libraries were generated by a TruSeq stranded total RNA library prep kit (Illumina) and TruSeq RNA CD Index Plate.

Chromatin immunoprecipitation sequencing

For ChIP experiments, cells were cross-linked by adding to the media 37% formaldehyde to a final concentration of 1%, gently mixed, incubated for 10 min, and quenched for 5 min with 2.5 M glycine for a final of 0.125 M per plate. Cells were washed using cold 1× phosphate-buffered saline and scraped in 15 ml of cold Farnham lysis buffer and protease inhibitor (Roche, 11836145001), and cell pellets were flash frozen and kept at −80°C. Thawed pellets were resuspended in radioimmunoprecipitation assay buffer on ice, aliquoted into 20 million cells per tube, and sonicated by Bioruptor (three 15-min rounds of 30 s ON, 30 s). ChIP was performed on 10 million cells using antibodies to H3K27ac, H3K4me3, and H3K4me1 histone marks (ab4729/lot no. GR274237, ab8580/lot no. GR273043, and ab8895/lot no. GR262515, respectively). M-280 sheep anti-rabbit immunoglobulin G Dynabeads (Invitrogen, 11203D) was used for chromatin immunoprecipitation. DNA was purified using the Qiagen MinElute PCR Purification Kit, quantified by Qubit, and prepared for sequencing using the Kapa Hyper Prep Kit. All libraries were pooled to 10 nM per sample before sequencing.

Assay for transposase-accessible chromatin sequencing

Approximately, 50,000 cells were harvested and used for ATAC-seq library preparation as described in the Fast-ATAC protocol (57). ATAC-seq libraries were uniquely indexed with Nextera PCR Primers and amplified with 9 to 12 cycles of PCR amplification. Amplified DNA fragments were purified with 0.8:1 ratio of Agencourt AMPure XP (Beckman Coulter) to sample. Libraries were quantified by Qubit, and size distribution was inspected by Bioanalyzer (Agilent Genomic DNA chip, Agilent Technologies). All libraries were pooled to 10 nM per sample before sequencing.

Promoter capture Hi-C

In situ Hi-C was performed as described previously (58). Briefly, 5 million decidualized cells were treated with formaldehyde 1% to cross-link interacting DNA loci. Cross-linked chromatin was treated with lysed and digested with MboI endonuclease (New England Biolabs). Subsequently, the restriction fragment overhangs were filled in and the DNA ends were marked with biotin-14-dATP (Life Technologies). The biotin-labeled DNA was sheared and pulled down using Dynabeads MyOne Stretavidin T1 beads (Life Technologies, 65602) and prepared for Illumina paired-end sequencing. The in situ Hi-C library was amplified directly off of the T1 beads with nine cycles of PCR using Illumina primers and protocol (Illumina, 2007). Promoter capture was performed as described previously (39). The Hi-C library was hybridized to 81,735 biotinylated 120-bp custom RNA oligomers (Custom Array) targeting promoter regions (four probes/RefSeq transcription start sites). After hybridization, postcapture PCR was performed on the DNA bound to the beads via biotinylated RNA.

Differential expression

Read counts per gene were calculated with Salmon (59) version 0.12.0 on transcripts from human Gencode release 19 ( and Estimated counts were used in exploratory analysis (transformed with DESeq2’s rlog function) and in DESeq2 (25) version 1.24.0 to identify differentially expressed genes (adjusted P ≤ 0.05 and absolute fold change of ≥1.2). After observing that replicates for each cell lines clustered together, we pooled reads for each cell line, combining three decidualization experiments in each sample. We then performed a paired analysis to obtain genes that were differentially expressed between untreated and decidualized samples. The six samples clustered by treatment and by cell line and analysis with svaseq (60) showed that the two surrogate variables identified correlated with cell line, and therefore, a paired analysis was enough to correct the data.

Peak calling

ATAC-seq reads were trimmed with cutadapt and aligned with bowtie2 (61) version Reads with mapping quality lower than 10 were discarded. ChIP-seq reads were also aligned with bowtie2. Peaks were called using MACS2 (62) version 2.1.2 with parameters --llocal 20000 --shift -100 --extsize 200 -q 0.05 for ATAC-seq and default parameters for ChIP-seq. Peaks overlapping coordinates blacklisted by Kundaje were excluded (

Differential ATAC-seq and ChIP-seq peaks

Similarly to RNA-seq, we pooled reads from replicates for each cell line. We called peaks for each of the six samples using MACS2 and converted peak coordinates into 100-bp contiguous bins. Bins covered by less than 60% of their extension were excluded. To identify reproducible peaks, we only kept bins that were present in at least two of the three cell lines in each condition, allowing for condition-specific peaks. See table S7 for an assessment of the contribution of each cell line to the universe of peaks obtained. We then merged all adjacent bins, expanding them back into longer peaks. We counted the number of reads in all peaks and in all samples and compared the read counts using DESeq2 (adjusted P < 0.05 and absolute fold change >1.2).

Statistical analysis of the frequencies of differential peaks near differentially expressed genes

The P values in Fig. 2B were calculated with a chi-square test of the number of peaks with increased or decreased numbers of reads observed and an expected probability based on the number of peaks in each category for each dataset. Bonferroni correction was performed to correct for multiple testing.

Transcription factor ChIP-seq

ChIP-seq reads were downloaded from National Center for Biotechnology Information Gene Expression Omnibus and processed locally. HOMER 4.9 (63) was used to call peaks for the following samples: PGR (GSE94038); NR2F2 (GSE52008); FOSL2 (GSE94038), FOXO1 (GSE94037); and NR2F2 input (GSE52008); and FOXO1, PGR, and FOSL2 input (GSE94038).

Overlap between ATAC-seq and ChIP-seq peaks

Reproducible peaks were converted into 100 bp bins and those with >60% of their extension covered by a peak was retained. Common bins were counted, and the number of counts was plotted with UpSetR 1.4.0.

Motif enrichment

We used HOMER 4.9 to identify DNA binding motifs enriched in peaks with parameters -len 8,10,12 -size 200 -mask.

Enrichment of overlap between peaks

Enrichment was calculated as the observed number of overlapping peaks divided by the expected number of overlapping peaks using bedtools intersectBed with a 1 bp minimum. The expected number of overlapping peaks was obtained by averaging 100 random samples of peaks with bedtools shuffle excluding gaps annotated by the University of California, Santa Cruz Genome Browser (64). While shuffling peaks does not account for mapping and other biases that make peak locations nonuniform and may result in overestimation of enrichment, our results are limited to comparisons between enrichments, which should cancel any biases.

Hi-C interaction calling

We used HiCUP v0.5.9 (65) to align and filter Hi-C reads. HiCUP used bowtie2 version 2.2.3 to align reads. Unique reads were used as input by CHiCAGO (66) version 1.2.0, and significant interactions were called with default parameters. We only kept interactions identified by CHiCAGO that were in cis and with an end located at least 10 kb from a capture probe.

Pairing differential peaks with putative target genes

To pair peaks using pcHi-C, significant interactions identified by CHiCAGO that overlapped an ATAC-seq or ChIP-seq peak and were less than 300 kb away from a promoter were used. We chose 300 kb because the mean distance between interacting promoters and other regions was 280 kb (median, 200 kb). To pair peaks to the nearest gene, BEDTools closest -t first -d was used to find the gene closest to a peak, up to 300 kb away. To pair peaks to a random gene, all genes up to 300 kb from a peak were selected and one gene was randomly assigned to each peak. For each of these sets of pairs, we calculated the fraction of peak/gene pairs that had the same direction of change according to differential read count analysis with DESeq2, of the total number of peak/gene pairs. Only genes expressed at >1 transcript per million across all samples were used in the nearest and random gene assignments.

P values were calculated with a chi-square test comparing the number of cases in the matched and unmatched categories observed in the random set (average from 200 iterations) and in the two peak/gene pairing methods: nearest gene and pcHi-C interactions.

Gestational duration GWAS

The GWAS results used in this study was an extension of our previously published results (5). Like our previous study, we used summary results from 23andMe, which were obtained from GWAS of gestational duration in 42,121 mothers of European ancestry. In addition, we performed GWA analyses in 14,263 European mothers from six academic datasets. To increase the power of GWA discovery, we performed meta-analysis between the results from 23andMe and the results from the six datasets. See the Supplementary Materials for a full description of the GWAS.

GWAS enrichment analysis with S-LDSC

We assessed how much of the heritability of gestational duration is contained within ATAC-seq, H3K4me1, H3K4me3, H3K27ac, and pcHi-C peaks using S-LDSC (41). S-LDSC is a generalization of LD score regression, a method for estimating the heritability of a trait using SNP-level GWAS summary statistics and SNP-level estimates of the amount of genetic variation tagged at each variant, known as LD scores. Under the LD score regression model, the expected value of the GWAS summary statistic for a variant (specifically, the expected value of the χ2 statistic) is a linear function of the LD score at that site, and h2, the per-SNP heritability, and a an intercept parameter. Under the S-LDSC model, rather than estimating a single per-SNP heritability parameter, a parameter is estimated for each of several functional annotations. In a standard S-LDSC analysis, user-provided annotations are combined with a “baseline” set of genomic annotations from publicly available datasets. For this analysis, LD scores were calculated using the peaks identified as reproducible across either treated or untreated samples as annotations and the genotype data from the European individuals from phase 3 of the 1000 Genomes project (obtained from the Price Lab website: as a reference LD panel, using only the HapMap3 SNP list (also from the Price lab website). S-LDSC was performed on the gestational duration GWAS using the endometrial-tissue derived LD scores and the baseline LD scores contained in version 2.2 of the LD score regression baseline LD model. We include all annotations from the baseline LD model except those “flanking” annotations. This resulted in a total of 64 baseline annotations used in our S-LDSC analysis.

Fine-mapping GWAS loci associated with gestational length

Fine mapping proceeded in three stages. In the first stage, we partitioned the genome into 1703 regions approximately independent regions using breakpoints derived by Berisa et al. (40). Next, we constructed an SNP-level prior probability of being causal variant, informed by the functional genomic data that we collected. We used a Bayesian hierarchical model [TORUS (42)]. TORUS takes as input GWAS summary statistics and genomic annotations and estimates the extent to which SNPs with functional genomic annotations are likely to be causal for a trait of interest. Specifically, under TORUS, each SNP has a small prior probability of being a causal variant, which is a logistic function of the annotations of the SNP. Then, TORUS estimates the parameters of this logistic function using genome-wide summary statistics. Once these parameters are estimated, each SNP will have a prior causal probability based on its unique functional annotations. We ran TORUS with the gestational age GWAS summary statistics and the reproducible H3K27ac and H3K4me1 peaks from the treated samples along with the pcHi-C contact regions to obtain an SNP-level prior.

Last, fine mapping was performed using a summary statistics-based version of the “sum of single effects” model (43) using 1000 Genome as reference panel. SuSiE (as implemented in the R package “susieR”) was run on the 10 regions believed to have one or more causal variants with an FDR of 0.1 as estimated by TORUS. For each region, SuSiE was run with a uniform prior (default setting of SuSiE) and with an informed prior learned by TORUS. The parameter L of SuSiE (maximum number of causal variants) is set at 1 when running SuSiE (67, 68).

SNPs in Epigenome roadmap histone modification peaks

H3K27ac, H3K4me1, and H4K4me3 histone modification peak coordinates were downloaded from the Epigenome Roadmap data website, and bedtools intersect was used to find peaks that overlapped SNPs coordinates.


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: We acknowledge C. Billstrand, K. Naughton, M. Soliai, and R. Nicolae for assistance with sample processing; R. Nasim, S. Chinthala, and R. Loth for help with sample collection; R. Minhas for help with clinical questions and data entry; and B. Furner and S. Choi for designing and maintaining the sample tracking database. ALSPAC GWAS data: We are grateful to all the families who took part in this study, the midwives for help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses. Funding: The UChicago-Northwestern-Duke Prematurity Research Center was supported by a research grant from the March of Dimes to C.O., M.A.N., G.E.C., and J.K. This work was also supported by the March of Dimes Prematurity Research Center Ohio Collaborative and Bill and Melinda Gates Foundation (OPP1113966) to L.J.M. and G.Z. The UK Medical Research Council and Wellcome (grant reference 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. A comprehensive list of grants funding is available on the ALSPAC website ( This research was specifically funded by Wellcome Trust WT088806 (Maternal genotype). This work was partially funded from grants from the National Institutes of Health, R01HL129735 (C.O.), 1R01MH110531 (X.H.), R01HL128075, R01119577, and R01DK114661 (M.A.N.). Ethics statement: This study was approved by the Institutional Review Boards at the University of Chicago, Northwestern University and Duke University Medical School. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Informed consent for the use of genetic data in the other six GWAS used in this study was also obtained from participants. Details are available in the Supplementary Materials. Author contributions: C.O., M.A.N., G.E.C., and X.H. designed and conceptualized the work. N.J.S., N.K., and J.M. analyzed data and interpreted results. I.A. coordinated all experiments. N.C., D.R.S., C.P., C.H., R.Z., H.K., and I.A. performed experiments. R.A. and S.R. facilitated sample collection. G.Z., B.J., M.H., K.T., and L.J.M. contributed the GWAS data. X.L., V.C.C., J.K., S.R., W.G., A.M., C.G., and I.A. contributed to discussions on study design. S.R., G.Z., L.J.M., V.J.L., G.E.C., C.O., X.H., and M.A.N. supervised aspects of the study. N.J.S., I.A., C.O., G.E.C., X.H., and M.A.N. wrote the manuscript. All authors read and commented on the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data generated or analyzed during this study are included in this published article and available at All the peaks sets, pcHi-C, read count, and TPM data are available. An expanded set of 2530 differentially expressed genes and sets of differential peaks called without statistically testing for fold change is also made available. Source code for the GWAS enrichment analyses can be found at Blacklisted regions: Price Lab website: Epigenome Roadmap data website: Additional data related to this paper may be requested from the authors.

Stay Connected to Science Advances

Navigate This Article