Research ArticleGENETICS

Cardiac cell type–specific gene regulatory programs and disease risk association

See allHide authors and affiliations

Science Advances  14 May 2021:
Vol. 7, no. 20, eabf1444
DOI: 10.1126/sciadv.abf1444

Abstract

Misregulated gene expression in human hearts can result in cardiovascular diseases that are leading causes of mortality worldwide. However, the limited information on the genomic location of candidate cis-regulatory elements (cCREs) such as enhancers and promoters in distinct cardiac cell types has restricted the understanding of these diseases. Here, we defined >287,000 cCREs in the four chambers of the human heart at single-cell resolution, which revealed cCREs and candidate transcription factors associated with cardiac cell types in a region-dependent manner and during heart failure. We further found cardiovascular disease–associated genetic variants enriched within these cCREs including 38 candidate causal atrial fibrillation variants localized to cardiomyocyte cCREs. Additional functional studies revealed that two of these variants affect a cCRE controlling KCNH2/HERG expression and action potential repolarization. Overall, this atlas of human cardiac cCREs provides the foundation for illuminating cell type–specific gene regulation in human hearts during health and disease.

INTRODUCTION

Disruption of gene regulation is an important contributor to cardiovascular disease, the leading cause of morbidity and mortality worldwide (1). Cis-regulatory elements such as enhancers and promoters are crucial for regulating gene expression (25). Mutations in transcription factors and chromatin regulators can result in heart disease (6, 7), and genetic variants associated with risk of cardiovascular disease are enriched within annotated candidate cis-regulatory elements (cCREs) in the human genome (8). However, a major barrier to understanding the genetic and molecular basis of cardiovascular diseases is the paucity of maps and tools to interrogate gene regulatory programs in the distinct cell types of the human heart. Recent single-cell/nucleus RNA sequencing (RNA-seq) (911) and spatial transcriptomic (12) studies have revealed gene expression patterns in distinct cardiac cell types across developmental and adulthood stages in the human heart, including some that display gene expression patterns that are cardiac chamber/region specific (10, 11). However, the transcriptional regulatory programs responsible for cell type–specific and chamber-specific gene expression, and their potential links to noncoding risk variants for cardiovascular diseases and traits, remain to be fully defined.

cCREs have been annotated in the human genome with the use of chromatin immunoprecipitation sequencing (ChIP-seq), deoxyribonuclease sequencing (DNase-seq), assay for transposase-accessible chromatin using sequencing (ATAC-seq), global run-on sequencing, etc. in a broad spectrum of human tissues including in bulk heart tissues and in purified cardiomyocytes (25, 1318). These maps have provided important insights into dynamic gene regulation during heart failure (15, 16, 18) and begun to shed light on the function of noncoding cardiovascular disease variants (8, 13, 16, 18, 19). However, major limitations of these studies including their focus on particular chambers/regions of the heart and failure to interrogate cis-regulatory elements across all distinct cardiac cell types have restricted their utility in understanding how specific gene regulatory mechanisms may affect distinct cell types and regions of human hearts in health and disease. Although recent single-cell genomic tools provide the opportunity to interrogate cis-regulatory elements at single-cell resolution (2023), their application to mammalian hearts has been limited to one single-cell ATAC-seq dataset from adult mouse heart (24), fewer than 200 total cells from mouse fetal hearts (25), and fetal human heart (26). Thus, to comprehensively investigate cis-regulatory elements in the specific cell types of the adult human heart, we profiled chromatin accessibility in ~80,000 heart cells using single-nucleus ATAC-seq (snATAC-seq) (21, 27) and created a comprehensive cardiac cell atlas of cCREs annotated by cell type and putative target genes. Integration of these data with single-nucleus RNA-seq (snRNA-seq) datasets from matched specimens revealed gene regulatory programs in nine major cardiac cell types. Using this human cardiac cCRE atlas, we further observed the remodeling of cell type–specific candidate enhancers during heart failure and the enrichment of cardiovascular disease–associated genetic variants in cCREs of specific cell types. Last, we showed that a cardiomyocyte-specific enhancer harboring risk variants for atrial fibrillation (AF) is necessary for cardiomyocyte KCNH2 expression and regulation of cardiac action potential repolarization.

RESULTS

Single nucleus analysis of chromatin accessibility and transcriptome in adult human hearts

To assess the accessible chromatin landscape of distinct cardiovascular cell types, we performed snATAC-seq (27), also known as sci-ATAC-seq (21), on all cardiac chambers from four adult human hearts without known cardiovascular disease (table S1). We obtained accessible chromatin profiles for 79,515 nuclei, with a median of 2682 fragments mapped per nucleus (Fig. 1, A and B; fig. S1; and table S2). We also performed snRNA-seq for a subset of the above heart samples to complement the accessible chromatin data and obtained 35,936 nuclear transcriptomes, with a median of 2184 unique molecular identifiers and 1286 genes detected per nucleus (Fig. 1, A and C; fig. S2, A to F; and table S3). Using SnapATAC (28) and Seurat (29), we identified nine clusters from snATAC-seq (Fig. 1B) and 12 major clusters from snRNA-seq (Fig. 1C and fig. S2, G and H), which were annotated on the basis of chromatin accessibility at promoter regions or expression of known lineage-specific marker genes, respectively (Fig. 1, D and E, and table S4) (10, 11). For example, chromatin accessibility and gene expression of atrial and ventricular cardiomyocyte markers such as NPPA and MYH7 (30) were used to classify these two cardiomyocyte subtypes (Fig. 1, D and E). Although gene expression patterns of lineage markers strongly correlated with accessibility at promoter regions across annotated cell types (Fig. 1F) and single-cell integration analysis (29) revealed 93% concordance in annotation between snATAC-seq and snRNA-seq datasets (fig. S3 and table S3), some cellular subtypes identified from snRNA-seq including endocardial cells and myofibroblasts were not detected by snATAC-seq (Fig. 1F). Notably, cluster correlation and integration analysis showed that these cell types are present within the snATAC-seq data as part of the endothelial and smooth muscle clusters, respectively (Fig. 1F, fig. S3, and table S3). The discrepancy in clustering may be attributable to the conservative snATAC-seq clustering parameters or the sparse nature of snATAC-seq data (31). In addition, atrial and ventricular cardiomyocyte nuclei from the left and right regions of the heart could be further clustered by transcriptome but not chromatin accessibility (fig. S2, I and J). We noted that cell type composition varied substantially between biospecimens and donors, highlighting the importance of single-cell approaches to limit biases due to cell proportion differences in bulk assays (fig. S4 and tables S2 and S3). In summary, we identified and annotated cardiac cell types using both chromatin accessibility and nuclear transcriptome profiles.

Fig. 1 Single-nucleus chromatin accessibility and transcriptome profiling of human hearts.

(A) snATAC-seq and snRNA-seq were performed on nuclei isolated from cardiac chambers from four human donors without cardiovascular pathology. snATAC-seq: n = 4 (left ventricle), n = 4 (right ventricle), n = 3 (left atrium), and n = 2 (right atrium); snRNA-seq: n = 2 (left ventricle), n = 2 (right ventricle), n = 2 (left atrium), and n = 1 (right atrium). (B) Uniform manifold approximation and projection (UMAP) (108) and clustering analysis of snATAC-seq data reveal nine clusters. Each dot represents a nucleus colored by cluster identity. (C) UMAP (108) and clustering analysis of snRNA-seq data reveal 12 major clusters. Each dot represents a nucleus colored by cluster identity. Nerv., nervous. Art. sm. musc., arterial smooth muscle. (D) Genome browser tracks (141) of aggregate chromatin accessibility profiles [scale: reads per million (RPM)] at selected representative marker gene examples for individual clusters and for all nuclei pooled together into an aggregated heart dataset (top track, gray). Black genes below tracks represent the indicated marker genes, nonmarker genes are grayed. (E) Dot plot illustrating expression of representative marker gene examples in individual snRNA-seq clusters. (F) Heatmap illustrating the correlation between clusters defined by chromatin accessibility and transcriptomes. Pearson correlation coefficients were calculated between chromatin accessibility at cCREs within 2 kbp of annotated promoter regions (76) and expression of the corresponding genes for each cluster.

Identification of cCREs in distinct cell types of the human heart

To discover the cCREs in each cell type of the human heart, we aggregated snATAC-seq data from nuclei comprising each cell cluster individually and determined accessible chromatin regions with MACS2 (32). We then merged the peaks from all nine cell clusters into a union of 287,415 cCREs, which covered 4.7% of the human genome (Fig. 2A and table S5). Sixty-seven percent of the cCREs identified in the current study overlapped previously annotated cCREs from a broad spectrum of human tissues and cell lines (fig. S5A) (5), and the union of heart cCREs captured 98.6 and 95.4% of candidate human heart enhancers reported in two previous bulk studies (fig. S5, B and C) (13, 15). Furthermore, 75% of cCREs in the union were at least 2 kilo–base pairs (kbp) away from annotated promoter regions, and 19,447 displayed high levels of cell type specificity [false discovery rate (FDR) < 0.01; Fig. 2B and table S6]. Gene ontology analysis (33) revealed that these cell type–specific cCREs were proximal to genes involved in relevant biological processes, including collagen fibril organization for cardiac fibroblast–specific cCREs (K1) and myofibril organization for ventricular cardiomyocyte–specific cCREs (K2; Fig. 2C and table S7). Using chromVAR (table S8) (34) and HOMER (table S9) (35), we detected cell type–dependent enrichment for 231 transcription factor–binding signatures, such as MEF2A/B, NKX2.5, and THR-β sequence motifs in cardiomyocyte-specific cCREs and TCF21 motifs in cardiac fibroblast–specific cCREs (Fig. 2D). To discover the transcription factors that may bind to these sites, we combined corresponding snRNA-seq data with sequence motif enrichments to correlate the expression of these transcription factors with motif enrichment patterns across cell types (Fig. 2E). As an example, we found strong enrichment of the binding motif for the macrophage transcription factor SPI1/PU.1 (36) in macrophage-specific cCREs, and SPI1 was exclusively expressed in macrophages (Fig. 2E and table S4). In addition, we observed that transcription factor family members were expressed in cell type–specific combinations. For instance, while GATA Family of transcription factors (GATA-binding factor) displayed similar motif enrichment patterns across sets of cell type–specific cCREs, we found that endothelial cells and cardiac fibroblasts expressed GATA2 and GATA6, respectively, whereas cardiomyocytes expressed both GATA4 and GATA6 and endocardial cells expressed GATA2, GATA4, and GATA6 (Fig. 2E and table S4). In summary, these results establish a resource of cCREs for interrogation of cardiac cell type–specific gene regulatory programs.

Fig. 2 Characterization of gene regulatory programs in cardiac cell types.

(A) Heatmap illustrating row-normalized chromatin accessibility values for the union of 287,415 cCREs. K-means clustering was performed to group cCREs on the basis of relative accessibility patterns. (B) Heatmap showing row-normalized chromatin accessibility of 19,447 cell type–specific cCREs (FDR < 0.01 after Benjamini-Hochberg correction; fold change > 1.2). K-means clustering was performed to group cCREs on the basis of relative accessibility patterns. Number of cCREs per K can be found in brackets. (C) Genomic Regions Enrichment of Annotations Tool (GREAT) ontology analysis (33) of cell type–specific cCREs. Q value for enrichment indicates Bonferroni-adjusted P value. NK, natural killer; GO, gene ontology; BP, blood pressure. (D) Transcription factor motif enrichment (35) for known and de novo motifs within cell type–specific cCREs. The heatmap in shows motifs with an enrichment P value of <10−5 in at least one cluster. For de novo transcription factor motifs, the best matches for the top motifs are displayed. Statistical test for motif enrichment: hypergeometric test. P values were not corrected for multiple testing. (E) Combination of transcription factor motif enrichment and gene expression shows cell type–specific roles for members of transcription factor families. Displayed are heatmaps for known motif enrichment in cell type–specific cCREs (left) and gene expression across clusters (right). (Fb., fibroblast; vCm., ventricular cardiomyocyte; aCm., atrial cardiomyocyte; Ec., endothelial; Sm., smooth muscle; Mac., macrophage; Lc., lymphocyte; Ad., adipocyte; Nr., nervous.)

Cardiac cell type–specific gene regulatory programs implicated in chamber-specific structure and function

Each cardiac chamber performs a unique role that is crucial to system-level heart function (37). To investigate the gene regulatory programs underlying chamber-specific gene expression and cellular functions in distinct cardiac cell types, we tested cCREs for differential accessibility across five of the most abundant cell types of the heart: cardiomyocytes, cardiac fibroblasts, endothelial cells, smooth muscle cells, and macrophages. We found 16,451 differentially accessible (DA) cCREs between pooled atria and ventricles, most of which were detected in cardiomyocytes (Fig. 3, A to C, and table S10). Specifically, 11,159 cCREs displayed differential accessibility between the right atrium and the right ventricle, and 12,962 cCREs exhibited differential accessibility between the left atrium and the left ventricle (fig. S6, A to C, and table S10). Comparing the left and right sides of the heart, we identified 101 DA cCREs between the right and left ventricles (fig. S6D) and 2687 DA cCREs between left and right atria, which, in contrast to comparisons between atria and ventricles, were found primarily in cardiac fibroblasts (fig. S6E and table S10).

Fig. 3 Cardiomyocyte cCREs display chamber-dependent differences in chromatin accessibility.

(A) Scheme for comparison of major cell types across heart chambers. (B) Volcano plot showing DA cCREs in each cell type between atria and ventricles. Each dot represents a cCRE, and the color indicates the cell type. cCREs with log2(fold change) > 1 and FDR < 0.05 after Benjamini-Hochberg correction (outside the shaded area) were considered as DA. (C) Number of DA cCREs between atria and ventricles by cell type. (D) Heatmaps showing normalized gene expression levels of differentially expressed genes between atrial (aCM) and ventricular cardiomyocytes (vCM) that were linked by coaccessibility to distal DA cCREs that were more accessible in atrial cardiomyocytes (Atrial CMDA) or ventricular cardiomyocytes (Ventr. CMDA), respectively. (E) GREAT ontology analysis (33) of DA cCREs between atrial and ventricular cardiomyocytes. AV, atrioventricular. P values shown are Bonferroni adjusted (n.d., not detected). (F) Genome browser tracks (141) showing chromatin accessibility (scale, RPM) and gene expression [scale, reads per kilobase per million mapped reads (RPKM)] in atrial and ventricular cardiomyocytes as well as DA cCREs that were coaccessible with the promoter of MYL2. Gray dotted line indicates coaccessibility threshold (>0.1). Red shaded areas, distal DA cCREs coaccessible with MYL2 promoter. Gray shaded areas, DA cCREs overlapping the promoter region of MYL2. (G) Transcription factor motif enrichment analysis (35) of DA cCREs between atrial and ventricular cardiomyocytes. The best matches for the top de novo motifs (score > 0.7) are shown. Statistical test for motif enrichment: hypergeometric test. P values were not corrected for multiple testing.

Using coaccessibility analysis (38) to link distal DA cCREs (~88% of all DA cCREs) to their putative target genes (table S11; median distance, 88.7 kbp), we observed that distal DA cCREs in cardiomyocytes between atria and ventricles were associated with chamber-specific gene expression of their putative target genes (Fig. 3D; fig. S6, B to E; and table S12), and genes near these DA cCREs were enriched for chamber-specific biological processes (Fig. 3E; fig. S6, B to E; and table S13). Specifically, distal DA cCREs with higher accessibility in atrial cardiomyocytes were associated with genes such as PITX2, a transcriptional regulator of cardiac atrial development (39), as well as the ion channel subunit SCN5A, which regulates cardiomyocyte action potential (Fig. 3E and table S13) (40). Furthermore, we found distal DA cCREs with higher accessibility in atrial cardiomyocytes at the HAMP gene locus, which encodes a key regulator of ion homeostasis and was recently described as a potential previously unknown cardiac gene in the right atrium by single-nucleus transcriptomic analysis (10, 11). Conversely, genes near distal DA cCREs with higher accessibility in ventricular cardiomyocytes were enriched for biological processes such as trabecula formation and ventricular cardiac muscle cell differentiation. For example, several distal DA cCREs with increased accessibility in ventricular cardiomyocytes compared to atrial cardiomyocytes were linked to the promoter region of MYL2, which encodes the ventricular isoform of myosin light chain 2 (Fig. 3F and table S4) (41), a regulator of ventricular cardiomyocyte sarcomere function.

In addition, analysis of distal DA cCREs in cardiac fibroblasts revealed that putative target genes were involved in distinct biological processes between right and left atria. In particular, we found that DA cCREs with higher accessibility in right atrial cardiac fibroblasts were proximal to genes involved in heart development, heart growth, and tube development, whereas DA cCREs with higher accessibility in left atrial cardiac fibroblasts were adjacent to genes involved in biological processes such as wound healing and vasculature development (fig. S6E and table S13). We further found a cardiac fibroblast–specific DA cCRE with higher accessibility in left atria at the fibrinogen FN1 gene locus, potentially indicating a more activated fibroblast state (10, 42). Supporting these findings, we identified several other DA cCREs with higher accessibility in left atrial cardiac fibroblasts adjacent to genes involved in generation of extracellular matrix (ECM) such as MMP2 and FBLN2 (table S13). These observations are consistent with previous findings that a higher fraction of ECM is produced in fibroblasts of the left atrium (10).

Using motif enrichment analysis, we inferred candidate transcriptional regulators involved in chamber-specific cellular specialization, including TBX5 (T-Box Transcription Factor 5), GATA4, and TGIF1 (TGFB Induced Factor Homeobox 1) for atrial cardiomyocytes and nuclear factor of activated T cells (NFAT), ERRG (Estrogen-Related Receptor Gamma), HAND1 (Heart- and Neural Crest Derivatives-Expressed Protein 1), and HAND2 for ventricular cardiomyocytes (Fig. 3G and table S14). While the TBX5 DNA binding motif was strongly enriched in both right and left atrial cardiomyocyte DA cCREs, the NFAT5 motif ranked highest in left ventricular cardiomyocyte DA cCREs and the TBX20 motif was strongly enriched in right ventricular cardiomyocyte DA cCREs (fig. S6, B and C, and table S14). Furthermore, cardiac fibroblast DA cCREs with higher accessibility in the right atrium were enriched for the binding motif of forkhead transcription factors (fig. S6E), whereas cardiac fibroblast DA cCREs with higher accessibility in the left atrium were enriched for the homeobox transcription factor CUX1 motif (fig. S6E and table S14). Together, we identified cCREs and candidate transcription factors associated with specific cardiac chambers, particularly within cardiomyocytes and cardiac fibroblasts.

Cell type specificity of candidate enhancers associated with heart failure

Recent large-scale studies profiling the H3K27ac histone modification in human hearts have uncovered candidate enhancers associated with heart failure (15, 18). However, because these studies either examined heterogeneous bulk heart tissue (15, 18) or focused solely on enriched cardiomyocytes (16), it remains unclear what role, if any, additional cardiac cell types and cCREs may contribute to heart failure pathogenesis. Using our cell atlas of cardiac cCREs, we revealed the cell type specificity of candidate enhancers showing differential H3K27ac signal strength between human hearts from healthy donors and donors with dilated cardiomyopathy (heart failure) (Fig. 4 and fig. S7, A to E) (15). We observed that a large fraction of candidate enhancers that displayed increased activity (45%) during heart failure were accessible primarily in cardiac fibroblasts (Fig. 4A, K2-4up, and table S15), whereas most of those exhibiting decreased activity (67%) were accessible primarily in cardiomyocytes (Fig. 4B, K1-3down, and table S15). Candidate enhancers with increased activity in cardiac fibroblasts were proximal to genes involved in ECM organization and connective tissue development (Fig. 4A, K2-4up, and table S16), whereas those exhibiting decreased activity in cardiomyocytes were proximal to genes involved in the regulation of heart contraction and cation transport (Fig. 4B, K1-3down, and table S16). For example, several of these cardiac fibroblast candidate enhancers were present at loci encoding the ECM proteins lumican (LUM) and decorin (DCN) and coaccessible with the promoters of these genes (Fig. 4C). Consistent with these findings, both genes were primarily expressed in cardiac fibroblasts (fig. S7F and table S4), and LUM has been reported to exhibit increased expression in failing hearts compared to control hearts (15). On the other hand, several cardiomyocyte candidate enhancers displaying decreased activity in heart failure were coaccessible with the promoter region of IRX4 (Fig. 4D), which encodes a ventricle-specific transcription factor (43) and is primarily expressed in cardiomyocytes of the left ventricle (fig. S7G and table S4).

Fig. 4 Cell type specificity of candidate enhancers associated with heart failure.

(A) Cell type specificity of 4406 candidate enhancers with increased H3K27ac signal in failing left ventricles (15). Heatmap displays cell type–resolved chromatin accessibility RPKM values for cell types from left ventricular snATAC-seq datasets. Candidate enhancers were grouped on the basis of chromatin accessibility patterns across cell clusters using K-means. HF, heart failure. (B) Cell type specificity of 3101 candidate enhancers with decreased H3K27ac signal in failing left ventricles (15). (C) Genome browser tracks (141) showing several candidate enhancers with increased activity during heart failure that were primarily accessible in fibroblasts and coaccessible with the promoters of LUM and/or DCN. For visualization, linkages between cCREs within candidate enhancers and all gene promoters are shown (coaccessibility > 0.1, gray dotted line). Candidate enhancers coaccessible with gene promoters are indicated by red shaded boxes and promoter regions are indicated by gray shaded boxes (scale, RPM). (D) Genome browser tracks (141) showing several bulk candidate enhancers with decreased activity in heart failure that were primarily accessible in cardiomyocytes and coaccessible with the promoter of IRX4 (scale, RPM). (E and F) Transcription factor motif enrichment (35) in the candidate enhancers with (E) increased and (F) decreased activity in failing left ventricles. Analysis was performed on the indicated K cluster(s) from (A) and (B), respectively. The best matches for selected de novo motifs (score > 0.7) are shown. Statistical test for motif enrichment: hypergeometric test. P values were not corrected for multiple testing.

To identify potential transcription factors regulating these pathologic responses during heart failure, we performed motif enrichment analysis in cell type–specific subsets of enhancers showing differential H3K27ac signal strength between healthy and failing hearts (table S17). For candidate enhancers exhibiting increased activity in heart failure, we identified enrichment of not only bHLH motifs such as AP4 in cardiac fibroblast candidate enhancers, which matched previous bulk analysis (Fig. 4E, K2-4up) (15), but also TEAD3 and MYF6 motifs in cardiomyocyte candidate enhancers (Fig. 4E, K1up). Conversely, for candidate enhancers displaying decreased activity in heart failure, we observed enrichment of nuclear receptor motifs such as glucocorticoid response element in cardiomyocyte candidate enhancers, which is consistent with previous findings (Fig. 4F, K1-3down) (15), as well as other motifs that were not detected in bulk analyses, such as the bZIP transcription factor CEBPA for cardiac fibroblast candidate enhancers (Fig. 4F, K4down). Thus, these results show that this cardiac cell atlas of cCREs may be used to assign disease-associated candidate enhancers from bulk assays to their affected cell types and infer transcriptional regulators involved in lineage-specific disease pathogenesis.

Interpreting noncoding risk variants of cardiac diseases and traits

Noncoding genetic variants contributing to risk of complex diseases are enriched within cCREs in a tissue and cell type–dependent manner (24, 4447). To examine the enrichment of cardiovascular disease variants within cCREs active in cardiac cell types, we performed cell type–stratified LD (linkage disequilibrium) score regression analysis (48) using genome-wide association study (GWAS) summary statistics for cardiovascular diseases (Fig. 5A) (4953) and control traits (fig. S8A and table S18) by measuring the enrichment of disease-associated variants within all cCREs identified for each cell type. This analysis revealed significant enrichment of AF-associated variants in both atrial (Z = 5.61, FDR = 1.9 × 10−6) and ventricular cardiomyocyte cCREs (Z = 6.80, FDR = 2.8 × 10−9), varicose vein-associated variants in endothelial cell cCREs (Z = 4.36, FDR = 3.9 × 10−4), and coronary artery disease–associated variants in cardiac fibroblast cCREs (Z = 3.29, FDR = 1.7 × 10−2; Fig. 5A). Notably, except for AF, these associations were not significant in a pseudo-bulk heart dataset created by combining chromatin accessibility profiles from all cardiac cell types (Fig. 5A). Furthermore, cardiovascular disease variants were not significantly enriched in accessible chromatin from noncardiac tissues (3, 5, 17) or human lung cell types (54), with the exception of a significant enrichment of varicose vein–associated variants in endothelial cells (fig. S8, B and C).

Fig. 5 Identification and characterization of AF-associated variants at the KCNH2 locus.

(A) Enrichment of cardiovascular disease variants within cardiac cell type cCREs. z scores are shown and were used to compute two-sided P values for enrichments. *FDR < 0.05 and ***FDR < 0.001. (B) Cardiomyocyte differentiation model schematic. (C) Fine mapping (55) and molecular characterization of two AF variants. Genome browser tracks (141) for snATAC-seq (top; scale, RPM) and indicated molecular features during hPSC-cardiomyocyte differentiation (scale, RPKM). Coaccessibility track shows linkages between the AF variant–containing cCRE and promoters (cutoff > 0.1). PPA, posterior probability of association (55). (D) Transgenic mouse embryo showing LacZ reporter expression under control of a genomic region [hs2192, VISTA database (58)] overlapping the variant-cCRE pair. (E) Luciferase reporter assay for the AF variant–harboring cCRE in D15 cardiomyocytes. Genotypes for rs7789146 and rs7789585 were either both G (risk), both A (nonrisk), or a combination. Each dot represents luciferase activity (average of two transfections) in independent replicates of D15 cardiomyocytes. Data are means ± SD. ***P < 0.001 and **P < 0.01 [one-way analysis of variance (ANOVA) and Tukey post hoc test]. Control: minimal promoter. (F) Expression of KCNH2 and TNNT2 in D25 cardiomyocytes after CRIPSR-Cas9–mediated cCRE deletion. Each dot represents an independent cardiomyocyte differentiation. Data are means ± SD. ***P < 0.001, **P < 0.01, and *P < 0.05 (one-way ANOVA and Tukey post hoc test); WT, unperturbed control. FC, fold change. (G) Action potential recordings in hPSC-derived cardiomyocytes with and without cCRE deletion at D25-35. (H) APD90 (action potential duration at 90% depolarization) at 1-Hz pacing for four independent hPSC-derived cardiomyocytes with and without cCRE deletion at D25-35. Data are means ± SD. **P < 0.01 (unpaired two-sided t test).

Next, to identify likely causal AF risk variants in cardiomyocyte cCREs, we first determined the probability that variants were causal for AF [posterior probability of association (PPA)] at 111 known loci using Bayesian fine mapping (55). We then intersected fine-mapped AF variants with cCREs detected in atrial and/or ventricular cardiomyocytes and identified 38 variants with PPA > 10% in cardiomyocyte cCREs, including previously reported variants at the HCN4 (13) and SCN10A/SCN5A (56) loci (table S19). We further prioritized AF variants for molecular characterization on the basis of their overlap with cCREs that were primarily accessible in cardiomyocytes, evolutionarily conserved, and coaccessible with promoters of genes expressed in cardiomyocytes. To experimentally validate the molecular functions of cCREs containing AF variants, we used a human pluripotent stem cell (hPSC)–derived cardiomyocyte differentiation model system (Fig. 5B) (57). From the variant prioritization analysis, we found a cCRE in the second intron of the potassium channel gene KCNH2 (HERG), which was coaccessible with the KCNH2 promoter (Fig. 5C) and harbored two variants, rs7789146 and rs7789585, with a combined PPA of 28% (Fig. 5C and fig. S9A). KCNH2 was primarily expressed in ventricular and atrial cardiomyocytes in the adult human heart (fig. S9B). The cCRE appeared to be activated during hPSC-cardiomyocyte differentiation, as evidenced by an increase in H3K27ac signal that correlated with KCNH2 expression (Fig. 5C). Supporting its in vivo role in regulating gene expression in mammalian hearts, a genomic region (hs2192) (58) containing this cCRE was previously shown to drive LacZ reporter expression in mouse embryonic hearts (Fig. 5D) (58).

A cardiomyocyte enhancer of KCNH2 is affected by noncoding risk variants associated with AF

To investigate whether these AF variants may affect enhancer activity and thereby regulate KCNH2 expression and cardiomyocyte electrophysiologic function, we initially carried out reporter assays using a hPSC cardiomyocyte model system. Results from these studies confirmed that in D15 hPSC cardiomyocytes, the KCNH2 enhancer carrying the rs7789146-G/rs7789585-G AF risk allele displayed significantly weaker enhancer activity than when containing the nonrisk variants (Fig. 5E and fig. S9C), thus supporting the functional significance of these AF variants. We next used CRISPR-Cas9 genome editing strategies to remove the enhancer and performed quantitative polymerase chain reaction (qPCR) and electrophysiologic assays to examine its role in KCNH2 expression and function. Supporting the aforementioned findings, CRISPR-Cas9 genome deletion of this cCRE in hPSC cardiomyocytes resulted in decreased KCNH2 expression in an enhancer dosage–dependent manner (Fig. 5F and fig. S9D). Similar to human cardiomyocytes with loss of KCNH2 function due to mutations in the KCNH2 coding sequence (59) or gene knockdown (60), cellular electrophysiologic studies demonstrated that these cCRE-deleted hPSC cardiomyocytes displayed a significantly prolonged action potential duration (Fig. 5, G and H), thus suggesting that cardiac repolarization abnormalities in atrial cardiomyocytes may lead to AF in an analogous manner to ventricular arrhythmias due to long QT syndrome (59). Together, these results highlight the utility of this single-cell atlas for assigning noncoding cardiovascular disease risk variants to distinct cell types and affected cCREs and functionally interrogating how these variants may contribute to cardiovascular disease risk.

DISCUSSION

The limited ability to interrogate cell type–specific gene regulation in the human heart has been a major barrier for understanding molecular mechanisms of cardiovascular traits and diseases. Here, we report a cell type–resolved atlas of cCREs in the human heart, which was ascertained by profiling accessible chromatin in individual nuclei from all four chambers of multiple human hearts and includes both cell type–specific and heart chamber–specific cCREs. In particular, we observed chamber-specific differences in chromatin accessibility between ventricles and atria as well as left and right atria but notably detected few differences between left and right ventricles. This finding is consistent with recent snRNA-seq analysis in human hearts, which found few differentially expressed genes between left and right ventricles (11). We note that the power to detect chamber-specific differences in chromatin accessibility depends on the number of samples assayed and the total nuclei used as input for differential analysis. Thus, future studies with larger cohorts will likely reveal additional chamber-specific differences between cardiac cell types.

We further highlight the utility of this atlas of heart cCREs to provide previously unidentified insight into aberrant gene regulation during cardiovascular pathology. To this end, we delineated the cell type specificity of enhancers that were differentially active between healthy and failing heart tissue (15) and identified additional transcription factors that may be involved in the pathogenesis of specific cell types during heart failure. Such cell type–specific analysis is particularly important in the context of heart failure because cellular composition can differ between diseased and control hearts (16, 61). This change in cellular composition may, in part, explain the cell type bias that we observed between candidate enhancers exhibiting increased and decreased activity during heart failure (i.e., cardiac fibroblasts and cardiomyocytes, respectively). However, because of the large differences in H3K27ac signal, we suspect that measured changes in candidate enhancer activity could be due to a combination of both enhancer remodeling and shift in cell type composition. Thus, future studies profiling snATAC-seq and H3K27ac in parallel from the same cardiac sample or novel approaches to profile histone modifications in single nuclei (62, 63) will provide greater insight into the extent of changes in chromatin accessibility and enhancer activity in individual cardiac cell types from diseased hearts.

Last, we show how this atlas can be used to not only assign noncoding genetic variants associated with cardiovascular disease risk to cCREs in specific cardiac cell types but also illuminate their cellular and molecular consequences. In particular, we found significant enrichment of AF-associated variants within cardiomyocyte cCREs and functionally interrogated one of these cCREs by demonstrating its role in regulating KCNH2 expression and cardiomyocyte repolarization. Similar to electrophysiologic phenotypes of human cardiomyocytes exhibiting KCNH2 loss of function (59, 60), hPSC cardiomyocytes harboring deletions of this cCRE displayed action potential prolongation, suggesting that cardiac repolarization abnormalities may contribute to AF, possibly through similar mechanisms as to how they may contribute ventricular arrhythmias (59). On the other hand, we found no enrichment of variants associated with heart failure in any cardiac cell type. This finding may reflect the heterogeneous etiologies of cardiovascular diseases and the limited number of currently known risk loci for heart failure (49). Future GWAS in large cohorts with detailed phenotyping, including biobanks such as the UK Biobank (64) and the BioBank Japan Project (65) and whole-genome sequencing efforts such as the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine program (66), will help identify and refine disease association signals. Therefore, this atlas of cardiac cCREs will be a valuable resource for continued discovery of regulatory elements, target genes, and specific cell types that may be affected by noncoding cardiovascular genetic variants.

In summary, we created a human heart cell atlas of >287,000 cCREs, which may serve as a reference to further expand our knowledge of gene regulatory mechanisms underlying cardiovascular disease. To facilitate distribution of these data, we created a web portal at http://catlas.org/humanheart. Integrating this resource with genomic and epigenomic clinical cardiac datasets, we built a systematic framework to interrogate how cis-regulatory elements and genetic variants might contribute to cardiovascular diseases such as heart failure or AF. Overall, such information will have great potential to provide insight into the development of future cardiac therapies that are tailored to affected cell types and thus optimized for treating specific cardiovascular diseases.

MATERIALS AND METHODS

Experimental design

We performed snATAC-seq to define a comprehensive catalog of cCREs for the cell types in four regions of nonfailing human hearts and generated in parallel snRNA-seq datasets for a subset to delineate gene expression patterns. We used the cCRE catalog to computationally assign dynamic enhancers in failing hearts to cell types and to assign cardiovascular disease risk variants to cCREs in individual cardiac cell types. Last, we applied reporter assays, genome editing, and electrophysiological measurements in in vitro differentiated human cardiomyocytes to validate the molecular mechanisms of cardiovascular disease risk variants.

Human tissues

Adult human heart tissues were procured at the time of organ donation using an Institutional Review Board protocol (no. 101021) approved by the University of California, San Diego. Donated hearts were perfused with cold cardioplegia before cardiectomy and then explanted immediately into an ice-cold physiologic solution as we previously described (67). Full-thickness samples from each chamber were obtained, and epicardial fat was rapidly removed before immediately flash-freezing samples in liquid nitrogen. Samples were received from the United Network for Organ Sharing. Limited clinical data were obtained for each heart per approved Institutional Review Board protocol (table S1). All samples were stored at −80°C until processing.

Single-nucleus ATAC-seq

Combinatorial barcoding snATAC-seq was performed as described previously (21, 27, 28) with slight modifications. Nuclei were isolated in gentleMACS M tubes (Miltenyi) on a gentleMACS Octo dissociator (Miltenyi) using the “Protein_01_01” protocol in magnetic-activated cell sorting (MACS) buffer [5 mM CaCl2, 2 mM EDTA, 1× protease inhibitor (Roche, 05-892-970-001), 300 mM MgAc, 10 mM tris-HCl (pH 8), and 0.6 mM dithiothreitol (DTT)]. Nuclei were pelleted with a swinging-bucket centrifuge (500g, 5 min, 4°C; 5920R, Eppendorf) and resuspended in 1 ml of nuclear permeabilization buffer [1× phosphate-buffered saline (PBS), 5% bovine serum albumin (BSA), 0.2% IGEPAL CA-630 (Sigma-Aldrich), 1 mM DTT, and 1× protease inhibitor]. Nuclei were rotated at 4°C for 5 min before being pelleted again with a swinging-bucket centrifuge (500g, 5 min, 4°C; 5920R, Eppendorf). After centrifugation, permeabilized nuclei were resuspended in 500 μl of high-salt tagmentation buffer [36.3 mM tris acetate (pH 7.8), 72.6 mM potassium acetate, 11 mM Mg acetate, and 17.6% N,N′-dimethylformamide] and counted using a hemocytometer. Concentration was adjusted to 2000 nuclei/9 μl, and 2000 nuclei were dispensed into each well of a 96-well plate per sample (96 tagmentation wells per sample; samples were processed in batches of two to four samples). For tagmentation, 1 μl of barcoded Tn5 transposomes (table S20) was added using a BenchSmart 96 (Mettler Toledo), mixed five times, and incubated for 60 min at 37°C with shaking (500 rpm). To inhibit the Tn5 reaction, 10 μl of 40 mM EDTA (final 20 mM) was added to each well with a BenchSmart 96 (Mettler Toledo), and the plate was incubated at 37°C for 15 min with shaking (500 rpm). Next, 20 μl of 2× sort buffer (2% BSA and 2 mM EDTA in PBS) were added using a BenchSmart 96 (Mettler Toledo). All wells were combined into a separate fluorescence-activated cell sorter tube for each sample and stained with DRAQ7 at 1:150 dilution (Cell Signaling Technology). Using an SH800 (Sony), 20 nuclei per sample were sorted per well into eight 96-well plates (total of 768 wells) containing 10.5 μl of Elution Buffer (EB; Qiagen) [25 pmol of primer i7, 25 pmol of primer i5, and 200 ng of BSA (Sigma-Aldrich)]. During the sort, nuclei with two to eight copies of DNA (2-8n) were included because cardiomyocyte nuclei in human hearts are often polyploid (16). Preparation of sort plates and all downstream pipetting steps were performed on a Biomek i7 automated workstation (Beckman Coulter). After addition of 1 μl of 0.2% SDS, samples were incubated at 55°C for 7 min with shaking (500 rpm). One microliter of 12.5% Triton X was added to each well to quench the SDS. Next, 12.5 μl of NEBNext High-Fidelity 2× PCR Master Mix (New England Biolabs) were added, and samples were PCR-amplified [72°C 5 min, 98°C 30 s, (98°C 10 s, 63°C 30 s, 72°C 60 s) × 12 cycles, held at 12°C]. After PCR, all wells were combined. Libraries were purified according to the MinElute PCR Purification Kit manual (Qiagen) using a vacuum manifold (QIAvac 24 Plus, Qiagen), and size selection was performed with SPRISelect reagent (Beckman Coulter; 0.55× and 1.5×). Libraries were purified one more time with SPRISelect reagent (Beckman Coulter; 1.5×). Libraries were quantified using a Qubit fluorimeter (Life Technologies), and a nucleosomal pattern of fragment size distribution was verified using a TapeStation (High Sensitivity D1000, Agilent). Libraries were sequenced on a NextSeq 500 sequencer (Illumina) using custom sequencing primers with the following read lengths: 50 + 10 + 12 + 50 (read 1 + index 1 + index 2 + read 2). Primer and index sequences are listed in table S20.

Single-nucleus RNA-seq

Nuclei were isolated from heart tissue using a gentleMACS dissociator (Miltenyi). Frozen heart tissue (~40 mg) was suspended in 2 ml of MACS dissociation buffer [5 mM CaCl2 (G-Biosciences, R040), 2 mM EDTA (Invitrogen, 15575-038), 1× protease inhibitor (Roche, 05-892-970-001), 3 mM MgAc (Grow Cells, MRGF-B40), 10 mM tris-HCl (pH 8) (Invitrogen, 15568-075), 0.6 mM DTT (Sigma-Aldrich, D9779), and ribonuclease (RNase) inhibitor (0.2 U/μl; Promega, N251B) in water (Corning, 46-000-CV)] and placed on wet ice. Next, samples were homogenized using a gentleMACS dissociator (Miltenyi) with gentleMACS M tubes (Miltenyi, 130-096-335) and the Protein_01_01 protocol. Suspension was filtered through a 30 μM CellTrics filter (Sysmex, 04-0042-2316). M tube and filter were washed with 3 ml of MACS dissociation buffer and combined with the suspension. Suspension was centrifuged in a swinging-bucket centrifuge (Eppendorf, 5920R) at 500g for 5 min (4°C, ramp speed of 3/3). Supernatant was carefully removed and pellet was resuspended in 500 μl of nuclei permeabilization buffer[(0.1% Triton X-100 (Sigma-Aldrich, T8787), 1× protease inhibitor (Roche, 05-892-970-001), 1 mM DTT (Sigma-Aldrich, D9779), RNase inhibitor (0.2 U/μl; Promega, N251B), and 2% BSA (Sigma-Aldrich, SRE0036) in PBS]. Sample was incubated on a rotator for 5 min at 4°C and then centrifuged at 500g for 5 min (Eppendorf, 5920R; 4°C, ramp speed of 3/3). Supernatant was removed and pellet was resuspended in 600 to 1000 μl of sort buffer [1 mM EDTA and RNase inhibitor (0.2 U/μl) in 2% BSA (Sigma-Aldrich, SRE0036) in PBS] and stained with DRAQ7 (1:100; Cell Signaling Technology, 7406). Seventy-five thousand nuclei were sorted using an SH800 sorter (Sony) into 50 μl of collection buffer [RNase inhibitor (1 U/μl) and 5% BSA (Sigma-Aldrich, SRE0036) in PBS]. Sorted nuclei were then centrifuged at 1000g for 15 min (Eppendorf, 5920R; 4°C, ramp speed of 3/3), and supernatant was removed. Nuclei were resuspended in 18 to 25 μl of reaction buffer [RNase inhibitor (0.2 U/μl) and 1% BSA (Sigma-Aldrich, SRE0036) in PBS] and counted using a hemocytometer. Twelve thousand nuclei were loaded onto a Chromium controller (10x Genomics). Libraries were generated using the Chromium Single Cell 3′ Library Construction Kit v3 (10x Genomics, 1000078) according to the manufacturer specifications. Complementary DNA was amplified for 12 PCR cycles. SPRISelect reagent (Beckman Coulter) was used for size selection and cleanup steps. Final library concentration was assessed by the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific), and fragment size was checked using TapeStation High Sensitivity D1000 (Agilent) to ensure that fragment sizes were distributed normally around 500 bp. Libraries were sequenced using a NextSeq 500 or HiSeq 4000 (Illumina) using these read lengths: read 1, 28 cycles; read 2, 91 cycles; and index 1, 8 cycles.

hPSC culture

An engineered H9-hTnnTZ-pGZ-D2 hPSC transgenic reporter line was purchased from WiCell and maintained on Geltrex (Gibco) precoated tissue culture plates in E8 medium (68) containing Dulbecco’s modified Eagle’s medium/F12, l-ascorbic acid-2-phosphate magnesium (64 mg/liter), sodium selenium (14 μg/liter), fibroblast growth factor 2 (100 μg/liter), insulin (19.4 mg/liter), NaHCO3 (543 mg/liter), transferrin (10.7 mg/liter), and transforming growth factor–β1(2 μg/liter). Cells were passaged every 3 to 5 days upon reaching ~80% confluency. For single-cell passaging experiments, cells were incubated with prewarmed TrypLE Select Enzyme, no phenol red (1 ml per well of a six-well plate) for 2 to 3 min in a 37°C, 5% CO2 incubator. Following incubation, cells were triturated to create a single-cell suspension and cultured in E8 medium supplied with Rock inhibitor (69) for 18 to 24 hours after split, followed by daily feeding with E8 medium.

In vitro cardiomyocyte differentiation

The H9-hTnnTZ-pGZ-D2 cell line was differentiated into beating cardiomyocytes using a previously reported Wnt-based monolayer differentiation protocol (70). Briefly, the H9-hTnnTZ-pGZ-D2 cell line was cultured in E8 medium for 3 to 10 passages. Before differentiation, hPSCs were seeded at a density of 350,000 to 400,000 cells per well of a 12-well plate and cultured for 2 days. For direct differentiation, cells were treated with 10 μM CHIR99021 (Fisher Scientific, no. 442350) in RPMI/B-27 without insulin. Fresh RPMI/B-27 without insulin media was replaced at post 24 hours, and cells were then cultured for 2 days. At day 3, cells were treated with 5 μM IWP2 (TOCRIS, no. 353310) in conditional medium and RPMI/B-27 without insulin 1:1 mix medium for another 2 days. At day 5, cells were exposed to fresh RPMI/B-27 without insulin media again for 2 days. Then, fresh RPMI/B-27 with insulin media was used and replenished every 2 days. Contracting cardiomyocytes were usually observed at days 7 and 8. D25 in vitro cardiomyocytes were purified using a PSC-derived cardiomyocyte isolation kit, human (Miltenyi Biotec, 130-110-188) and used for real-time qPCR (RT-qPCR).

Luciferase reporter assay

A genomic region harboring the KCNH2 intronic enhancer (containing the risk allele rs7789146-G/rs7789585-G) was amplified by nested PCR (KCNH2-E-cF, CTGGCTGAAGACACCTTACTTT; KCNH2-E-cR, ACGGAGCAGTCAAGGAAAC and KCNH2-In-cF, CGGGGTACCCCTCCGTAAATGAGGTGCTATC; KCNH2-In-cR, CCCTCGAGACGGAGCAGTCAAGGAAAC) using the genomic DNA of H9-hTnnTZ-pGZ-D2 transgenic cells as a template and cloned into pGL4.23 [luc2/minP] (Promega, catalog no. E8411) luciferase reporter vector. Synthetic DNA containing the KCNH2 intronic 5′-half enhancer (rs7789045-rs7789690) with the nonrisk/nonrisk allele (rs7789146-A/rs7789585-A), nonrisk/risk allele (rs7789146-A/rs7789585-G), and risk/nonrisk allele (rs7789146-G/rs7789585-A) were purchased from Integrated DNA Technologies. KCNH2 intronic 3′-half enhancer (rs7789654-rs7790480) was amplified by PCR (KCNH2-R-In-cF, GCTGTGCAGTGTCAGGTTAT; KCNH2-In-cR, CCCTCGAGACGGAGCAGTCAAGGAAAC). Then, the whole KCNH2 intronic enhancer with the nonrisk/nonrisk allele (rs7789146-A/rs7789585-A), nonrisk/risk allele (rs7789146-A/rs7789585-G), and risk/nonrisk allele (rs7789146-G/rs7789585-A) were generated by second PCR (KCNH2-In-cF, CGGGGTACCCCTCCGTAAATGAGGTGCTATC; KCNH2-In-cR, CCCTCGAGACGGAGCAGTCAAGGAAAC). One day before transfection, 3 × 105 of D15 in vitro differentiated cardiomyocytes were plated in a Geltrex-coated 24-well plate. Cardiomyocytes were transfected with 500 ng of pGL4.23 plasmid (either empty, KCNH2 enhancer with G/G allele, A/A allele, or mix) and 10 ng of TK:Renilla-luc as an internal control using Lipofectamine Stem Transfection Reagent (Invitrogen, no. STEM00003). Media were replaced with fresh media at 24 hours after transfection. At 72 hours after transfection, media were removed and the cells were washed with PBS. Luminescence was measured using a Dual-Luciferase Reporter Assay System (Promega, no. E2920) according to the manufacturer’s protocol.

CRISPR-mediated genome editing experiments

To interrogate the functional significance of the AF-associated risk variant–containing cCRE at the KCNH2 locus, the cCRE sequence was genetically deleted in H9-hTnnTZ-pGZ-D2 transgenic hPSCs using an efficient CRISPR-Cas9–mediated knockout system (57, 71). Two adjacent guide RNAs (gRNAs) (KCNH2-enh gRNA-1, CTCATTTACGGAGGAGCGCA; KCNH2-enh gRNA-2, TACAGTGGCCTTCTAGACGA) targeting the cCRE were designed using a web-based software tool CRISPOR (72) based on targeting region of interest and minimizing potential off-target effects. The identified gRNAs were then synthesized in vitro using the GeneArt Precision gRNA Synthesis kit (Invitrogen) according to the manufacturer’s protocol. One day before transfection, 1.5 × 105 H9-hTnnTZ-pGZ-D2 hPSCs were seeded in 12-well plates. A pair of ribonucleoprotein complexes containing 1.2 μg of Cas9 protein (New England Biolabs) and 400 ng of in vitro transcribed gRNA was then transfected (73, 74) using Lipofectamine stem transfection reagent (Invitrogen). Seventy-two hours after the transfection, cells were diluted and clonally expanded another 7 days. Colonies were picked and lysates were prepared after the first passage for genotyping (75) (KCNH2-enh extended forward primer, ACACCTTACTTTGGGTGAGAAG; KCNH2-enh extended reverse primer, AGACAGAGCACAGACCTAGAA; KCNH2-enh internal forward primer, GCTGTGCAGTGTCAGGTTAT; KCNH2-enh internal reverse primer, TCTCCCTCCTTCTCTCTCATTC). After confirmation of genome-edited clones by Sanger sequencing, two transfected wild-type (WT) clones, two heterozygote clones, and two homozygote clones were selected for further functional analysis.

Real-time quantitative polymerase chain reaction

Total RNA was isolated from the cells using TRIzol reagent (Invitrogen). One microgram of total RNA was reverse-transcribed using the iScript Reverse Transcription Supermix kit (Bio-Rad) for RT-qPCR. RT-qPCR was performed using PowerUP SYBR Green Master Mix (Applied Biosystems) in the CFX Connect Real-Time System (Bio-Rad). The results were normalized to the TBP gene. The primers used for RT-qPCR are listed in table S21.

Electrophysiology of cardiomyocytes

Both WT and KCNH2 enhancer knockout D15 in vitro cardiomyocytes were purified using the PSC-derived cardiomyocyte isolation kit, human (Miltenyi Biotec, 130-110-188) and cultured for another 10 to 20 days in a low density before electrophysiological measurements. The single-pipette, whole-cell patch current-clamp technique was used for recordings. Action potentials were recorded with a patch-clamp amplifier (Axopatch 200B, Axon), and experiments were performed at a temperature of 35° ± 0.5°C. Current-clamp command pulses were generated by a digital-to-analog converter (DigiData 1440, Axon), which was controlled by the pCLAMP software (10.3; Axon). Pipettes (resistance, 3 to 5 megohm) were pulled using a micropipette puller (Model P-87, Sutter Instrument Co.). Several minutes after seal formation, the membrane was ruptured by gentle suction to establish the whole-cell configuration for voltage clamping. Subsequently, the amplifier was switched to the current-clamp mode. Cells were paced with 1 Hz and injected current stimuli from 3 to 15 nA for 5-ms duration. Cells were superfused with extracellular solution containing the following: 140 mM NaCl, 5.4 mM KCl, 1.8 mM CaCl2, 1.0 mM MgCl2, 5.5 mM glucose, and 5.0 mM Hepes (pH 7.4 adjusted with NaOH). Pipette solution contained the following: 120 mM K-gluconate, 10 mM KCl, 5 mM NaCl, 10 mM Hepes, 5 mM phosphocreatine, 5 mM ATP-Mg2, and amphotericin (0.44 μM; pH 7.2 adjusted with KOH).

Data analysis

Demultiplexing of snATAC-seq reads. For each sequenced snATAC-Seq library, we obtained four FASTQ files, two for paired-end DNA reads as well as the combinatorial indexes for i5 (768 different PCR indices) and T7 (96 different tagmentation indices; table S20). We selected all reads with ≤2 mistakes per individual index (Hamming distance between each pair of indices is 4) and subsequently integrated the full barcode at the beginning of the read name in the demultiplexed FASTQ files (https://gitlab.com/Grouumf/ATACdemultiplex/).

Filtering of snATAC-seq profiles by transcriptional start site enrichment and unique fragments. TSS (transcriptional start site) positions were obtained from the GENCODE database v31 (76). Tn5-corrected insertions were aggregated ±2000 bp around each TSS genome wide. Then, this profile was normalized to the mean accessibility ± (1900 to 2000) bp from the TSS and smoothed every 11 bp. The maximum value of the smoothed profile was taken as the TSS enrichment. We selected all nuclei that had at least 1000 unique fragments and a TSS enrichment of at least 7 for all datasets.

Clustering strategy for snATAC-seq datasets. We used two rounds of clustering analysis to identify clusters. The first round of clustering analysis was performed on individual samples. We divided the genome into 5-kbp consecutive bins and then scored each nucleus for any insertions in these bins, generating a bin-by-cell binary matrix for each sample. We filtered out those bins that were generally accessible in all nuclei for each sample using a z score threshold of 1.65 (equivalent to a one-tailed P value < 0.05). On the basis of the filtered matrix, we then carried out dimensionality reduction followed by graph-based clustering to identify cell clusters. We called peaks using MACS2 (32) for each cluster using the aggregated profile of accessibility and then merged the peaks from all clusters to generate a union peak list. On the basis of the peak list, we generated a cell-by-peak count matrix and used Scrublet (77) to remove potential doublets with default parameters. Doublet scores returned by Scrublet (77) were then used to fit a two-component Gaussian mixture model using the BayesianGaussianMixture function from the python package scikit-learn (78). Nuclei in the component with the larger mean doublet score were removed from downstream analysis because they likely reflected doublets.

Next, to carry out the second round of clustering analysis, we merged peaks called from all samples to form a reference peak list. We generated a binary cell-by-peak matrix using nuclei from all samples and again performed dimensionality reduction followed by graph-based clustering to obtain the final cell clusters across the entire dataset.

Dimensionality reduction and batch correction of snATAC-seq data. For processing of snATAC-seq data, we adapted our previously published method, SnapATAC (28). To reduce the dimensionality of the peak by cell count matrix, SnapATAC uses spectral embedding for dimensionality reduction. To further increase the performance and scalability of spectral embedding, we applied the Nyström method (79) to enable handling of large datasets. Specifically, we first randomly sampled 35,000 nuclei as training data. We computed the matrix P = D−1S S, where D is the diagonal matrix such that Dii=jSij. The eigendecomposition was performed on P, and the eigenvector with eigenvalue 1 was discarded. From the rest of the eigenvectors, we took k of them corresponding to the largest eigenvalues as the spectral embedding of the training data. We used the Nyström method (79) to extend the embedding to the data outside the training set. Given a set of unseen samples, we computed the similarity matrix S′ between the new samples and the training set. The embedding of the new samples is given by ′ = SUΛ−1, where U and Λ are the eigenvectors and eigenvalues of P obtained in the previous step. To correct for donor/batch-specific effects, after dimensionality reduction, we performed cell grouping on individual samples using k-mean clustering with k equal to 20. We then constructed k-NN graphs for each sample and used the MNN (Mutual Nearest Neighbor) correction method to identify mutual nearest neighbors (80). These mutual nearest neighbors were used as anchors to match the cells between different samples and correct for donor/batch effects as described previously (80).

Clustering of snATAC-seq data. We constructed the k-nearest neighbor graph (k-NNG) using low-dimensional embedding of the nuclei with k equal to 50. We then applied the Leiden algorithm (81) with constant Potts model to find communities in the k-NNG corresponding to the cell clusters. The Leiden algorithm can be configured to use different quality functions. The modularity model is a popular choice, but it is hampered by the resolution limit, particularly when the network is large (82). Therefore, we used the modularity model only in the first round of clustering analysis to identify initial clusters. In the final round of clustering, we chose the constant Potts model as the quality function because it is resolution limit free and is better suited for identifying rare populations in a large dataset (82). Nuclei from two small clusters (280 and 254 nuclei) with low reproducibility and stability were discarded from downstream analysis. Thirty-four nuclei that formed clusters of 1 and 2 nuclei were discarded as well.

Processing and clustering analysis of snRNA-seq datasets. Raw sequencing data were demultiplexed and preprocessed using the Cell Ranger software package v3.0.2 (10x Genomics). Raw sequencing files were first converted from Illumina BCL files to FASTQ files using cellranger mkfastq. Demultiplexed FASTQs were aligned to the GRCh38 reference genome (10x Genomics), and reads for exonic and intronic reads mapping to protein coding genes, long noncoding RNA, antisense RNA, and pseudo-genes were used to generate a counts matrix using cellranger count; expect-cells parameter was set to 5000. A separate counts matrix for each sample was also generated using only reads mapped to intronic regions.

Next, exon and intron count matrices for individual datasets were processed using the Seurat v3.1.4 R package (29) (https://satijalab.org/seurat/) to assess dataset quality. Features represented in at least three cells and barcodes with between 500 and 4000 genes were used for downstream processing. In addition, barcodes with mitochondrial read percentages greater than 5% were removed. Counts were log-normalized and scaled by a factor of 10,000 using NormalizeData. To identify variable genes, FindVariableFeatures was run with default parameters except for nfeatures = 3000 to return the top 3000 variable genes. All genes were then scaled using ScaleData, which transforms the expression values for downstream analysis. Next, principal components analysis was performed using RunPCA with default parameters and the top 3000 variable features as input. The first 20 principal components were used to run clustering using FindNeighbors and FindClusters (parameter res = 0.4). To generate uniform manifold approximation and projection (UMAP) coordinates, RunUMAP was run using the first 20 principal components and with parameters umap.method = “umap-learn” and metric = “correlation”. Doublet scores [pANN (proportion of Artificial Nearest Neighbors)] were generated for cell barcodes using DoubletFinder (https://github.com/chris-mcginnis-ucsf/DoubletFinder) (83) using the parameters pN = 0.15 and pK = 0.005; the anticipated collision rate was set by specifying 2% collisions per thousand nuclei for individual datasets.

Individual datasets were merged together using the merge function in Seurat to combine the count matrices and designate unique barcodes. Cell barcodes with pANN scores greater than 0 were removed from downstream analysis. Metadata were also encoded for each barcode, and the merged dataset was processed in a similar manner as described above; clusters were identified using FindNeighbors and FindClusters (res = 0.8). To generate the UMAP coordinates, the first 14 principal components were used in RunUMAP; the UMAP algorithm for Seurat v3.1.4 uses the uwot R package and that setting was used to generate the coordinates here. To regress out donor specific effects, the Harmony R package (https://github.com/immunogenomics/harmony) (84) was used, and the recomputed principal components were used to recluster the cells and rerun UMAP using the above parameters. For downstream analysis and comparison to snATAC-seq data, we manually combined ventricular cardiomyocyte clusters, atrial cardiomyocyte clusters, fibroblast clusters, and endothelial cell clusters on the basis of shared gene expression patterns (fig. S2, G and H). Cluster-specific genes in the all-transcripts dataset were identified in a global differential gene expression test using FindAllMarkers with parameters logFC = 0.25, min.pct = 0.25, and only.pos = FALSE.

Integration of snRNA-seq and snATAC-seq data. The snRNA-seq and snATAC-seq datasets were used to perform label transfer from the RNA cells onto the snATAC-seq dataset using the Seurat v3.1.4 R package (https://satijalab.org/seurat/) (29). Gene activity scores were calculated using chromatin accessibility in regions from the promoter up to 2 kb upstream for each ATAC nucleus. Activity scores were log-normalized and scaled using NormalizeData and ScaleData. To compare the snRNA and snATAC datasets and identify anchors, FindTransferAnchors was run considering the top 3000 variable features from the snRNA-seq dataset. Anchor pairs were used to assign RNA-seq labels to the snATAC-seq cells using TransferData, with the weight.reduction parameter set to the principal components used in snATAC-seq clustering. The efficacy of integration was assessed by examining the distribution of the maximum prediction scores output by TransferData and the distribution of annotated snATAC-seq identities to the corresponding predicted label.

Creation of a consensus list of heart cCREs. MACS2 (v2.1.2) (32) was used to identify accessible chromatin sites for each cluster with the following parameters: -q 0.01 --nomodel --shift -100 --extsize 200 -g 2789775646 --call-summits --keepdup-all. Estimated genome size was determined to be 2,789,775,646 bp and was indicated by the -g parameter. We next filtered out peaks overlapping with the ENCODE blacklist (hg38; https://github.com/Boyle-Lab/Blacklist/) (85). To generate the union of heart cCREs, we merged the blacklist-filtered peaks obtained for each cluster using the BEDtools merge command with default settings (v2.25.0) (86).

Computing relative accessibility scores for cCREs. To correct biases arising from differential read depth among cells and cell types, we derived a procedure that normalizes chromatin accessibility at cCREs identified by MACS2 peak calling (v2.1.2) (32). We define the set of accessible loci by L, and we define a peak p as a subset of related loci l from L. Let al be the accessibility of accessible locus l and P the set of nonoverlapping peaks used to define the loci. For a given cell type SiS, we computed the median medj number of reads sequenced per cell. For each feature pjP, we computed mij, the average number of reads sequenced from Si, and overlapping pj. We then defined the activity aij of loci pj in Si as aij=106×(1mij)1/medjjP(1mij)1/medj. We then define the relative accessibility score (RAS) Aij=aijiSaij.

K-means clustering of cCREs. We clustered the union of 287,415 cCREs using a K-means clustering procedure. We first created a sparse cell × peak matrix that was transformed into a RAS-normalized cell type × peak matrix. We then performed K-means on the normalized matrix with K from 2 to 12 and computed the Davies-Bouldin (DB) index for each K (87). Let Rxy=(sx+sy)dxy with sx the average distance of each cell of cluster x and dxy the distance between the centroids of clusters x and y. The DB index is defined as DB=1Kx,yKmaxxy(Rxy). We selected K = 9 because it resulted in the lowest DB index that indicates the best partition. We used the python library scikit-learn (78) to compute the K-means algorithm and the DB index (87).

Cell type annotation. We annotated snATAC-seq and snRNA-seq clusters on the basis of chromatin accessibility at promoter regions or expression of known lineage marker genes, respectively. We annotated atrial and ventricular cardiomyocytes on the basis of differential chromatin accessibility and gene expression at NPPA, MYH6, KCNJ3, MYL7, MYH7, HEY2, MYL2, and other reported markers of atrial and ventricular cardiomyocytes (30, 88, 89). We used, for example, the gene DCN to annotate cardiac fibroblasts (90), VWF and EGFL7 for endothelial cells (91, 92), GJA4 and TAGLN for smooth muscle cells (93, 94), CD163 and MS4A6A for macrophages (95, 96), IL7R and THEMIS for lymphocytes (97, 98), ADIPOQ and CIDEA for adipocytes (99, 100), and NRXN3 and GPM6B for a cluster of nervous cells with neuronal and Schwann-like gene expression and chromatin accessibility signatures (10, 11, 101). From snRNA-seq, we identified a population of endothelial-like cells with specific expression of endocardial cell markers NRG3 and NPR3 (102, 103). We also identified subtypes of mesenchymal cells that included myofibroblasts with characteristic expression of embryonic smooth muscle actin MYH10 (104, 105) as well as arterial smooth muscle cells with preferential expression of ACTA2 and TAGLN relative to a larger cluster of pericytes (table S4) (106). snRNA-seq annotations were consistent with recent single-cell transcriptomic analyses of adult human heart tissue (10, 11).

Identification of cell type–specific cCREs. We used edgeR (version 3.24) in R (107) to identify cell type–specific cCREs. For each cCRE, accessibility within a cell type was compared to average accessibility in all other clusters. For each cell type, we created a count table for each cCRE using the following strategy: Each sample was described with a donor and a chamber ID. For each sample ID, we reported read count within (i) the cell type and (ii) the rest of the cell types in aggregate. We used this count matrix as input for edgeR analysis (107) and performed a likelihood ratio test. We considered cCREs with fold change > 1.2 and FDR < 0.01 after Benjamini-Hochberg correction as cell type specific.

Coaccessibility analysis using Cicero. We used the R package Cicero (38) to infer coaccessible chromatin loci. For each chromosome, we used as input the corresponding peaks from our 287,415 cCRE union set and the coordinates of the snATAC-seq UMAP (108). We randomly subsampled 15,000 cells from our aggregate snATAC-seq dataset to construct input matrices for Cicero analysis. We used ±250 kbp as cutoff for coaccessibility interactions. All other settings were default.

Correlation of gene expression and promoter accessibility. We defined promoter regions as TSS ±2 kbp. TSS were extracted from annotation files from GENCODE release 33 (76). We identified promoter-overlapping peaks using BEDtools (86) and a custom script (see the “Code availability” section). For each overlapping pair (peak and promoter) identified, we kept only the open chromatin site closest to the TSS to obtain a 1:1 correspondence between genes and open chromatin peaks. We then used the RAS and the cluster-scaled gene expression (29) to create feature × cell type matrices for RNA-seq and ATAC-seq datasets. We then used these matrices to create heatmaps and to perform ATAC-seq/RNA-seq cluster correlation analysis using the Pearson similarity metric. For each cell type, we computed the Pearson correlation score between the RAS vector of the 7081 promoters and the scaled expression vector of the corresponding 7081 genes identified via the 1:1 correspondence method described above.

DA cCREs between heart chambers. Between-heart chamber differential accessibility analysis was performed for five cell types from our aggregated snATAC-seq dataset. We considered only cell types that had a representation of at least 50 nuclei per dataset and at least 300 nuclei across each tested condition. The cell types that met these inclusion criteria included cardiomyocytes, fibroblasts, endothelial cells, smooth muscle cells, and macrophages. Within each cell type, a generalized linear model framework was used using the R package edgeR (107). All fragments for a given cell type were aggregated in the .bed format. MACS2 (32) was used to call peaks on the aggregate .bed file for each cell type with the parameters specified above. NarrowPeak output bed files were used for differential accessibility testing. The aggregate .bed file for each cell type was then partitioned on the basis of dataset of origin using nuclear barcodes. The “coverage” option of the BEDtools package (86) was applied with default settings to count the total number of chromatin fragments from each dataset overlapping narrowPeaks called on the aggregate .bed file for the corresponding cell type. This yielded a raw count matrix in the format of snATAC-seq datasets (columns) by narrowPeaks (rows) for each cell type. The raw count matrix was used as input for edgeR analysis. To filter low-coverage peaks from our analysis, we used the “filterByExpr” command within edgeR with default settings. We applied an average prior count of one during fitting of the generalized linear model to avoid inflated fold changes in instances for which peaks lacked coverage for one but not both tested conditions. We modeled chromatin accessibility at each peak as a function of heart chamber (group) with sex as a covariate. The generalized linear model was expressed as follows in edgeR notation:design<model.matrix(sex+group)y<estimateDisp(y,design,prior.count=1)glmFit(y,design)

Significance was tested using a likelihood ratio test. To account for testing multiple hypotheses, a Benjamini-Hochberg significance correction was applied for all cCREs tested within each considered cell type. Any cCRE with an absolute log2(fold change) > 1 and an FDR-corrected P value < 0.05 was considered significant.

Gene expression analysis of genes coaccessible with DA cCREs. To compare the expression of genes coaccessible with heart chamber–dependent distal DA cCREs (outside ±2 kb of TSS) in cardiomyocytes and fibroblasts, we performed differential expression testing for all genes between indicated heart chambers using Wilcoxon rank sum test in Seurat (29). Genes with an absolute fold change > 1.5 and an FDR-adjusted P value < 0.05 were considered differentially expressed. We then tested the resulting genes for coaccessibility (38) with distal DA cCREs at a coaccessibility score threshold of 0.1 and displayed scaled gene expression values from Seurat for the indicated differentially expressed genes linked to chamber-dependent distal DA cCREs.

Genomic Regions Enrichment of Annotations Tool ontology analysis. The Genomic Regions Enrichment of Annotations Tool (GREAT; http://great.stanford.edu/public/html/index.php) (33) was used with default settings for indicated cCREs or candidate enhancers in the .bed format. Biological process enrichments are reported. P values shown for enrichment are Bonferroni-corrected binomial P values.

Motif enrichment analysis. For de novo and known motif enrichment analysis of cluster-specific cCREs, the findMotifsGenome.pl utility of the HOMER package was used with default settings (35). For display of enrichment patterns for motifs from the JASPAR (109) database with evidence of enrichment in at least one set of cell type–specific cCREs, motifs with an enrichment P value < 10−5 in at least one set of cluster-specific cCREs were selected. For motif enrichment within DA cCREs, narrowPeak calls from MACS2 were used as input, with peaks called on the corresponding cell type (as described above) used as background. For enrichment of motifs within cell type–attributed bulk enhancers, snATAC-seq peaks from the union of snATAC-seq peaks were used. Summits were extracted from peaks that overlapped bulk enhancer annotations and extended by 250 bp on either side to obtain fixed-width peaks. We also computed motif enrichment scores at single-cell resolution using chromVAR (34).

For input to chromVAR, we used the summits of the 287,415 peaks in our consensus list extended by 250 bp in either direction and a set of 870 nonredundant motifs as input (https://github.com/GreenleafLab/chromVARmotifs). To identify differentially enriched motifs in each cell type, we used the following strategy: For each cell type and each motif, we computed a rank sum test between the chromVAR z score distributions from cells within the cell type and outside of the cell type. Tests were run using a random sampling of 40,000 cells. Then, for each cell type, we used 1 × 10−8 as P value cutoff. In addition, we applied a Bonferroni correction to account for multiple testing, which resulted in selection of significant motifs with P value < 1 × 10−11.

Measuring single-cell chromatin accessibility signal within bulk candidate heart enhancers. We obtained published candidate heart enhancers annotated by H3K27ac ChIP-seq from a recently reported bulk survey of healthy left ventricular tissue from 18 human donors (15). Candidate enhancers were defined per the study as H3K27ac ChIP-seq peaks that were at least 1 kb away from a TSS and present in two or more donors. Because these reference annotations were derived from bulk profiling of healthy left ventricles, we selected only left ventricular nuclei from our aggregate dataset for comparison. We limited our analysis to cell types that comprised at least 5% of nuclei by proportion in our aggregate dataset. These included cardiomyocytes, fibroblasts, endothelial cells, smooth muscle cells, and macrophages. We first combined all fragments for each cell type from left ventricular datasets. The coverage option of BEDtools (86) was applied with default settings to count the total number of chromatin fragments from each ventricular cell type overlapping the candidate enhancer annotations. This yielded a raw count matrix in the format of snATAC-seq cell types (columns) by candidate enhancers (rows). The raw count matrix was normalized to RPKM (reads per kilobase per million mapped reads) for each candidate enhancer. We next used Cluster3.0 (110) to k-means cluster the 31,033 healthy heart candidate enhancers into K groups between 2 and 12 with the following settings (Method = k-Means, Similarity Metric = Euclidian distance, number of runs = 100). We calculated the DB index (87) as described above for each clustering using the index.DB function of the R package clusterSim (http://keii.ue.wroc.pl/clusterSim/). We selected a k-means of 8, which yielded the lowest DB index, indicating the best partitioning.

We repeated the above analysis for 4406 candidate enhancers reported to have increased bulk H3K27ac ChIP signal and 3101 candidate enhancers reported to have decreased signal in 18 late-stage idiopathic dilated cardiomyopathy (heart failure) left ventricles versus 18 healthy control left ventricles reported in the same study. We again clustered the candidate enhancers for both groups into k groups between 2 and 12 as above and selected the clustering that yielded the lowest DB index (87).

GWAS variant enrichment analysis. We used LD score regression (48, 111) to estimate genome-wide enrichment for GWAS traits using annotation sets from single-cell chromatin accessibility from the heart or lung (54) or bulk DNase hypersensitivity sites for tissues from ENCODE (3, 5, 17). For bulk DNase-seq datasets, peak annotations were merged across biological replicates from the same tissue type. We compiled published GWAS summary statistics for cardiovascular diseases (4953), other diseases (112122), and nondisease traits (123132) using the European subset from transethnic studies where applicable. We created custom LD score files by using peaks from each cell type or tissue as a binary annotation. As background, we used baseline annotations included in the baseline-LD model v2.2. For each trait, we used LD score regression to estimate enrichment coefficient z scores for each annotation relative to the background. Using these z scores, we computed two-sided P values for enrichment and used the Benjamini-Hochberg procedure to correct for multiple tests within each set of annotations.

Fine mapping for AF. We obtained published AF GWAS summary statistics and index variants for 111 disease-associated loci (50). To construct credible sets of variants for each locus, we first extracted all variants in LD (r2 > 0.1 using the EUR (European) subset of 1000 Genomes phase 3) (133) in a large window (±2.5 Mb) around each index variant. We next calculated approximate Bayes factors (ABF) (55) for each variant using effect size and SE estimates. We then calculated PPAs for each variant by dividing its ABF by the sum of ABF for all variants within the locus. For each locus, we then defined 99% credible sets by sorting variants by descending PPA and retaining variants that added up to a cumulative PPA of >0.99. This resulted in an output of 6014 candidate causal variants.

Variant prioritization for functional validation. To prioritize variants for functional validation, we refined our list of candidate causal variants from fine mapping analysis to only those with a PPA > 0.1 (216 remaining of 6014). We used BEDtools (86) to intersect these variants with ATAC-seq peaks called on an aggregate .bed file for atrial and ventricular cardiomyocyte snATAC-seq clusters (cardiomyocyte cCREs). This resulted in 40 fine-mapped variants that resided within 38 candidate cardiomyocyte cCREs (38 cCRE-variant pairs).

We assessed each remaining cCRE-variant pair via the following criteria:

1) cCREs primarily accessible in cardiomyocytes

2) presence of a corresponding ATAC-seq peak at a testable time point in the in vitro hPSC-cardiomyocyte differentiation model system

3) sequence conservation in 100 vertebrates [genome browser track generated using phyloP of the PHAST5 package downloaded from UCSC genome browser (134), http://hgdownload.soe.ucsc.edu/goldenPath/hg38/phyloP100way/]

4) predicted coaccessibility of candidate enhancer with a gene promoter

5) expression of putative target gene associated with cCRE appearance (chromatin accessibility and H3K27ac) during hPSC-cardiomyocyte differentiation (57)

A candidate cCRE-variant pair at the KCNH2 locus was prioritized for functional experimentation.

ChIP-seq data processing. Reads were mapped to the human genome reference GRCh38 using Bowtie2 (version 2.2.6) (135) and reads with MAPQ (mapping quality) > 30 selected using SAMtools (version 1.3.1) (136). PCR duplicates were removed using the MarkDuplicates function of Picard tools (version 1.119) (137). RPKM-normalized signal tracks were generated using the BamCoverage function in deepTools (version 2.4.1) (138).

RNA-seq data processing. Reads were mapped to the human genome reference GRCh38 using STAR (version 020201) (139) and reads with MAPQ > 30 selected using SAMtools (version 1.3.1) (136). PCR duplicates were removed using the MarkDuplicates function of Picard tools (version 1.1.19) (137). RPKM-normalized signal tracks were generated using the BamCoverage function in deepTools (version 2.4.1) (138).

ATAC-seq data processing. Reads were mapped to the human genome reference GRCh38 using Bowtie2 (version 2.2.6) (135) and reads with MAPQ > 30 selected using SAMtools (version 1.3.1) (136). PCR duplicates were removed using SAMtools (version 1.3.1) (136). RPKM-normalized signal tracks were generated using the BamCoverage function in deepTools (version 2.4.1) (138).

Statistical analysis. No statistical methods were used to predetermine sample sizes. There was no randomization of the samples, and investigators were not blinded to the specimens being investigated. However, clustering of single nuclei based on chromatin accessibility was performed in an unbiased manner, and cell types were assigned after clustering. Low-quality nuclei and potential barcode collisions were excluded from downstream analysis as outlined above. Cluster specificity at each cCRE was tested using edgeR (107) as described above, with P values corrected via the Benjamini-Hochberg procedure. To identify DA sites between the heart chambers and for each cell type, a likelihood ratio test was used and the resulting P value was corrected using the Benjamini-Hochberg procedure. For significance of ontology enrichments using GREAT, Bonferroni-corrected binomial P values were used (33). For significance testing of enrichment of de novo and known motifs, a hypergeometric test was used without correction for multiple testing (35). For luciferase and qPCR data, we performed one-way ANOVA (analysis of variance) analysis with post hoc Tukey test using GraphPad Prism version 8.0.0 for Windows (GraphPad Software, San Diego, CA, USA; www.graphpad.com).

External datasets. Cardiomyocyte differentiation: RNA-seq, H3K27ac day 0 (hPSC); day 5 (cardiac mesoderm); and day 15 (primitive cardiomyocytes) were downloaded from GSE116862 (57). Signal tracks for heart H3K27ac ChIP-seq data were downloaded from https://portal.nersc.gov/dna/RD/heart/. List of candidate enhancers was downloaded from supplementary tables (15). H3K27ac ChIP-seq data for cardiomyocyte nuclei from nonfailing donors (NF1) were downloaded from National Center for Biotechnology Information Sequence Read Archive BioProject ID PRJNA353755 (140). We acquired snATAC-seq data for human lung from GSE161383 (54) and bulk DNase-seq datasets for human tissues from ENCODE (3, 5, 17) with the following identifiers: ENCSR053ZKP, ENCSR259GYP, ENCSR277KRY, ENCSR458AOS, ENCSR597NVK, ENCSR422IIZ, ENCSR968TPO, ENCSR788IZL, ENCSR859LTL, ENCSR060HPL, ENCSR783OCW, ENCSR171ADO, ENCSR171ETY, ENCSR520BAD, ENCSR686WJL, ENCSR791BHE, ENCSR856XLJ, ENCSR361DND, ENCSR579KDC, ENCSR693UHT, ENCSR365NDK, ENCSR712PYJ, ENCSR930PDT, ENCSR178JBL, ENCSR595HZQ, ENCSR261RWJ, ENCSR455GUW, ENCSR689DSM, ENCSR954AJK, ENCSR564FZH, ENCSR000ELO, ENCSR909HFI, ENCSR931UQB, ENCSR128GBN, ENCSR006IMH, ENCSR163PKT, ENCSR641ZPF, ENCSR782SSS, ENCSR101QXF, ENCSR709IYR, ENCSR866ODX, ENCSR195ONB, ENCSR450PWF, ENCSR549NRK, ENCSR749MUH, ENCSR080ISA, ENCSR090IDV, ENCSR102RSU, ENCSR401ESD, ENCSR484UAU, ENCSR508FVM, ENCSR340MRJ, ENCSR760QZM, ENCSR763AKE, ENCSR923JYH, ENCSR164WOF, ENCSR323UTX, ENCSR650FLQ, ENCSR702DPD, ENCSR129BZE, and ENCSR437AYW.

Code availability. The pipeline for processing snATAC-seq data is available as a part of the Taiji software: https://taiji-pipeline.github.io/. Custom code used for demultiplexing and downstream analysis for snATAC data is available here: https://gitlab.com/Grouumf/ATACdemultiplex/-/tree/master/ATACdemultiplex and https://gitlab.com/Grouumf/ATACdemultiplex/-/blob/master/scripts/.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/7/20/eabf1444/DC1

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

REFERENCES AND NOTES