Research Article: DISEASES AND DISORDERS

Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis

Science Advances  08 Jul 2020:
Vol. 6, no. 28, eaba1983
DOI: 10.1126/sciadv.aba1983
  • Fig. 1 Profiling human lung heterogeneity with scRNA-seq.

    (A) Overview of experimental design. (i) Disease lung explants and unused donor lungs collected. (ii) Lungs dissociated to single-cell suspension. (iii) Droplet-based scRNA-seq library preparation. (iv) Sequencing. (v) Exploratory analysis. (vi) Spatial localization with IHC. (B) Uniform Manifold Approximation and Projection (UMAP) representation of 312,928 cells from 32 IPF, 18 COPD, and 28 control donor lungs; each dot represents a single cell, and cells are labeled as one of 38 discrete cell varieties. AT, alveolar type; cDC, classical dendritic cell; pDC, plasmacytoid dendritic cell; M, macrophage; NK, natural killer; ILC, innate lymphoid cell; PNEC, pulmonary neuroendocrine cell; SMC, smooth muscle cell; VE, vascular endothelial. (C) Heat map of marker genes for all 38 identified cell types, categorized into four broad cell categories. Each cell type is represented by its top five genes, ranked by false discovery rate (FDR)-adjusted P value of a Wilcoxon rank sum test comparing the per-subject average expression for that cell type against the per-subject average expression of the other cell types in its grouping. Each column represents the average expression value for one subject, hierarchically grouped by disease status and cell type. Gene expression values are unity normalized from 0 to 1 across rows within each categorical cell type group.
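
The FDR adjustment and unity normalization mentioned in (C) can be illustrated with a minimal sketch. It assumes the FDR adjustment is the Benjamini-Hochberg procedure (the caption does not name the method) and that "unity normalized" means min-max scaling of each row to the range 0 to 1. The function names, P values, and expression values are hypothetical and are not taken from the article.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg FDR-adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def unity_normalize_rows(matrix):
    """Min-max scale each row (gene) to [0, 1] across its columns (subjects)."""
    normalized = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        # A constant row has no dynamic range; map it to zeros by convention.
        normalized.append([(v - lo) / span if span else 0.0 for v in row])
    return normalized


# Hypothetical raw P values from per-gene rank sum tests (assumed inputs).
raw_p = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw_p)

# Hypothetical per-subject averages: rows are genes, columns are subjects.
avg_expr = [
    [2.0, 4.0, 6.0],
    [0.5, 0.5, 0.5],
]
scaled = unity_normalize_rows(avg_expr)
```

Scaling each gene row independently keeps lowly and highly expressed markers comparable within one heat map, at the cost of hiding absolute expression levels.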

  • Fig. 2 Identification of aberrant basaloid cells in IPF and COPD lungs.

    (A) UMAPs of 21,184 epithelial cells from 32 IPF, 18 COPD, and 28 control lungs labeled by cell type (top left), disease status (bottom right), and subject (bottom right). In the subject plot, each color depicts a distinct subject. (B) Boxplots representing the nonzero percent makeup distributions of epithelial cell types as a proportion of all sampled epithelial cells per subject within each disease group. Each dot represents a single subject, and whiskers represent 1.5 × interquartile range (IQR). FDR-adjusted Wilcoxon rank sum test results comparing IPF and control proportions are reported in data S12. (C) Heat map of average gene expression and predicted transcription factor activity per subject across each of the identified epithelial cell types. Columns are hierarchically ordered by disease status and cell type. The average gene expression per subject per cell type is unity normalized between 0 and 1 across samples. Top (green): Transcription factor signatures predicted by analysis with pySCENIC (43); z scores are calculated across samples. Right: Zoom annotation of distinguishing markers for aberrant basaloid cells. (D) IHC staining of aberrant basaloid cells in IPF lungs: epithelial cells covering fibroblast foci are p63+ KRT17+ basaloid cells staining COX2-, p21-, and HMGA2-positive, while basal cells in bronchi do not. (E) Correlation matrix comparing epithelial cell populations identified and reannotated in an independent dataset (3) with analogous cell types from our data. Matrix cells are colored by Spearman’s rho, and cell populations are ordered with hierarchical clustering. The origin dataset for each cell population is denoted in the annotation bars.
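
As a small illustration of the Spearman’s rho used to color the matrix in (E), the sketch below computes rho between two expression profiles as the Pearson correlation of their ranks, with tied values sharing an average rank. The profiles and function names are hypothetical; the article does not state which software computed the correlations.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical average expression of four genes in two cell populations.
profile_ours = [0.1, 2.3, 1.2, 4.0]
profile_other = [0.2, 3.1, 0.9, 8.5]
rho_same_order = spearman_rho(profile_ours, profile_other)
rho_reversed = spearman_rho(profile_ours, profile_other[::-1])
```

Because rho depends only on rank order, it is insensitive to scale differences between datasets, which is one plausible reason to prefer it when comparing populations across independent studies.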

  • Fig. 3 Identification of disease-enriched VE cell population.

    (A) UMAPs of 14,985 endothelial and mesenchymal cells from 32 IPF, 18 COPD, and 27 control lungs labeled by cell type, disease status, and subject. In the subject plot, each subject is represented by a unique color. The dotted line delineates the VE cells from the other stromal and endothelial cell types. (B) Heat map representing characteristics of five subtypes of VE cells. Gene expression is unity normalized between 0 and 1 across VE cells. Each column represents an individual cell; subject, disease state, and VE type are represented in the colored annotation bars above. Each subject is represented by a unique color. (C) Boxplots representing the nonzero percent makeup distributions of each VE cell type among all VE cells per subject organized by disease group. Each dot represents a single subject, and whiskers represent 1.5 × IQR. FDR-adjusted Wilcoxon rank sum test results comparing IPF and control proportions are reported in data S12. (D) IHC staining for CD31 (PECAM1) and COL15A1 in control distal lung, control proximal lung, and affected regions of distal IPF lungs. Arrows indicate vessels with positive COL15A1 staining. (E) Violin plots of expression of pan-VE markers and pVE-specific markers across VE cells from distal and airway lung samples from an independent dataset (20).
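
The boxplot quantities in (C) can be sketched as follows. This is an illustrative reading, assuming that "nonzero percent makeup" means the percentage of a subject's cells belonging to each cell type, with subjects lacking that type omitted, and that whiskers follow the usual Tukey convention of extending to the most extreme observation within 1.5 × IQR of the quartiles. The subject labels, cell type labels, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import quantiles


def percent_makeup(cells):
    """cells: iterable of (subject, cell_type) pairs, one per cell.

    Returns {cell_type: {subject: percent}}; subjects with zero cells of a
    type simply do not appear, which gives the nonzero distribution.
    """
    per_subject = defaultdict(Counter)
    for subject, cell_type in cells:
        per_subject[subject][cell_type] += 1
    makeup = defaultdict(dict)
    for subject, counts in per_subject.items():
        total = sum(counts.values())
        for cell_type, n in counts.items():
            makeup[cell_type][subject] = 100.0 * n / total
    return makeup


def whisker_limits(values):
    """Most extreme observations lying within 1.5 x IQR of the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return min(inside), max(inside)


# Hypothetical cells: subject S1 has two VE types, subject S2 has one.
cells = [
    ("S1", "VE_a"), ("S1", "VE_a"), ("S1", "VE_a"), ("S1", "VE_b"),
    ("S2", "VE_a"), ("S2", "VE_a"),
]
makeup = percent_makeup(cells)

# Hypothetical per-subject percentages with one outlier beyond the fence.
whiskers = whisker_limits([1.0, 2.0, 3.0, 4.0, 100.0])
```

Plotting software differs in how it interpolates quartiles, so whisker positions from another tool may differ slightly from this sketch.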

  • Fig. 4 IPF fibroblast and myofibroblast archetype analysis.

    (A) Heat map of unity-normalized gene expression of curated markers observed to delineate myofibroblasts and fibroblasts; each column is representative of the average expression value per cell type for one subject. (B) Top: UMAPs of 6166 myofibroblast and fibroblast cells from 32 IPF, 18 COPD, and 26 control lungs labeled by cell type, disease, and unsupervised Louvain subclusters. Bottom: Partition-based graph abstraction (PAGA) analysis. Nodes represent subclusters, and edges represent the probability of internode overlap based on the underlying network of cell neighborhoods. (C) UMAPs of myofibroblast and fibroblast cells following diffusion map implementation labeled by cell type, disease status, and subject. In the subject plot, each color represents a unique subject. (D and E) Heat maps of myofibroblasts and fibroblasts, respectively, with cells ordered by diffusion pseudotime (DPT) distances along UMAP manifolds representing the continuum of observed cellular phenotypes spanning from control-enriched phenotypes toward IPF-enriched archetypes (left to right). The arrows on each UMAP indicate the orientation of DPT cell ordering, matching the arrows above each heat map’s annotation bars. The heat map’s annotation bars represent the DPT distance, subject, and disease status for each cell. Expression values are centered and scaled across cells within each cell type.
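
The heat map layout in (D and E) combines two simple operations: ordering cells by their DPT distance and centering and scaling each gene across cells. The sketch below shows both for a single gene. It assumes DPT distances have already been computed elsewhere and that scaling uses the population standard deviation (the caption does not say which variant was used). All values and function names are hypothetical.

```python
from statistics import mean, pstdev


def order_by_pseudotime(cells):
    """cells: list of (dpt_distance, expression); returns them sorted by DPT."""
    return sorted(cells, key=lambda cell: cell[0])


def center_and_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    # A gene with no variation carries no signal; map it to zeros.
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Hypothetical cells for one gene: (precomputed DPT distance, expression).
cells = [(0.9, 6.0), (0.1, 2.0), (0.5, 4.0)]
ordered = order_by_pseudotime(cells)
z_scores = center_and_scale([expression for _, expression in ordered])
```

Read left to right, the resulting z scores rise along pseudotime for a gene whose expression increases toward the late end of the ordering.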

  • Fig. 5 Immune analysis confirms profibrotic macrophage archetype in IPF.

    (A) UMAP of 271,481 immune cells from 32 IPF, 18 COPD, and 28 control lungs labeled by cell type. (B) Boxplots representing the nonzero percent makeup distributions of individual varieties of immune cell as a proportion of all immune cells per subject within each disease group. Each dot represents a single subject, and whiskers represent 1.5 × IQR. FDR-adjusted Wilcoxon rank sum test results comparing IPF and control proportions are reported in data S12. (C) UMAPs of 124,470 classical monocytes and monocyte-derived macrophage cells from 32 IPF, 18 COPD, and 28 control lungs labeled by cell type, disease, and subject. Each color represents a unique subject. (D) Archetype analysis of classical monocytes and two macrophage archetypes. Cells are assigned colors along three gradients with a ternary plot based on each cell’s relative DPT distance from three reference points in the UMAP: a monocyte terminus (1, cyan), an inflammatory macrophage archetype (2, yellow), and a profibrotic, IPF-enriched macrophage (3, magenta). Distance color assignments are also projected onto cells for UMAP visualization. UMAP arrows represent the orientation of DPT cell ordering at each terminus and match the arrows above the heat map annotation bar. In the heat map, each column represents a single cell whose respective subject, disease, and DPT color assignment are in the annotation bar above. Macrophage cells belonging to two separate, single-subject–enriched archetypes are removed from analysis. (E) UMAPs of monocyte and macrophage cells colored by gene expression values of features that are associated with increasing aberrancy along the IPF archetype manifold.
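
One way to picture the ternary coloring in (D) is to turn a cell's three DPT distances into weights that sum to 1 and then blend the three reference colors by those weights. The sketch below does this with inverse-distance weights; that weighting, the small `eps` constant, and the RGB blending are assumptions made for illustration, since the caption does not specify how relative distances were mapped to colors.

```python
# RGB values for the three reference colors named in the caption.
CYAN = (0.0, 1.0, 1.0)      # 1: monocyte terminus
YELLOW = (1.0, 1.0, 0.0)    # 2: inflammatory macrophage archetype
MAGENTA = (1.0, 0.0, 1.0)   # 3: profibrotic, IPF-enriched macrophage


def ternary_weights(d_monocyte, d_inflammatory, d_profibrotic, eps=1e-9):
    """Convert three distances into closeness weights that sum to 1.

    Inverse-distance weighting is an assumption of this sketch; eps avoids
    division by zero for a cell sitting exactly on a reference point.
    """
    distances = (d_monocyte, d_inflammatory, d_profibrotic)
    closeness = [1.0 / (d + eps) for d in distances]
    total = sum(closeness)
    return [c / total for c in closeness]


def ternary_color(weights):
    """Blend the three reference colors by the given weights (RGB in 0..1)."""
    references = (CYAN, YELLOW, MAGENTA)
    return tuple(
        sum(w * ref[channel] for w, ref in zip(weights, references))
        for channel in range(3)
    )


# Hypothetical cells: one equidistant from all termini, one at the monocyte end.
middle_weights = ternary_weights(0.5, 0.5, 0.5)
middle_color = ternary_color(middle_weights)
monocyte_color = ternary_color(ternary_weights(0.0, 1.0, 1.0))
```

A cell equidistant from all three reference points blends to a neutral gray, while a cell at a terminus takes that terminus's pure color.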

  • Fig. 6 GRN analysis of IPF and control lungs.

    (A) Overview of bigSCale method for computing GRN. (i) Cells are recursively clustered down to subclusters. (ii) Z scores are calculated on the basis of differential expression between subclusters. (iii) Correlations between all genes are computed via Pearson and cosine. (iv) Correlation edges are thresholded at the 99.3% quantile of correlation coefficients; only edges where at least one node has a Gene Ontology (GO) annotation as a gene regulator are retained. (B) Summary of network structure for control and IPF GRNs. (C) GRNs of control and IPF lung cells. Nodes represent genes, and edges represent correlations of putative regulatory relationships. Node sizes correspond to PageRank centralities, and the largest clusters are assigned colors to their nodes, with each color representing a distinct cluster. The top cell types relevant to each highlighted cluster are shown. Behind each highlighted cluster is a polygon shape covering the domain of the cluster colored by the category of cell type that is predominantly relevant to the community. (D) The same GRNs with the top 300 nodes ranked by differential PageRank centrality between IPF and control highlighted in red. Node sizes correspond to PageRank centralities. (E) Selected results from GO gene set enrichment of the top 300 differential PageRank nodes between IPF and controls, with all nodes used as a reference. TGF-β, transforming growth factor–β; BMP, bone morphogenetic protein.
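
Steps (iii) and (iv) of (A) and the PageRank node sizing in (C) can be sketched on a toy network. This is not the bigSCale implementation: it assumes gene-gene correlation coefficients are already available, uses a simple linearly interpolated quantile, treats the network as undirected, and runs a plain power-iteration PageRank with a damping factor of 0.85. The gene names, regulator set, and function names are hypothetical, and the toy example uses a 0.5 quantile so that a six-edge network still retains edges.

```python
def quantile_cutoff(values, q):
    """Value at quantile q (0..1), linearly interpolating sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def build_edges(correlations, regulators, q=0.993):
    """Keep edges at or above the quantile cutoff that touch a regulator."""
    cutoff = quantile_cutoff(list(correlations.values()), q)
    return [
        pair for pair, r in correlations.items()
        if r >= cutoff and (pair[0] in regulators or pair[1] in regulators)
    ]


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected graph given as gene pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    rank = {gene: 1.0 / n for gene in neighbors}
    for _ in range(iterations):
        updated = {}
        for gene in neighbors:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[h] / len(neighbors[h]) for h in neighbors[gene])
            updated[gene] = (1 - damping) / n + damping * incoming
        rank = updated
    return rank


# Hypothetical correlations; TF1 stands in for a GO-annotated gene regulator.
correlations = {
    ("TF1", "G1"): 0.90, ("TF1", "G2"): 0.80, ("TF1", "G3"): 0.85,
    ("G1", "G2"): 0.95, ("G2", "G3"): 0.10, ("G1", "G3"): 0.20,
}
edges = build_edges(correlations, regulators={"TF1"}, q=0.5)
centrality = pagerank(edges)
```

In this toy network the G1 to G2 edge passes the correlation cutoff but is dropped because neither gene is a regulator, and TF1 ends up with the highest centrality because both retained edges pass through it.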

Supplementary Materials

    Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis

    Taylor S. Adams, Jonas C. Schupp, Sergio Poli, Ehab A. Ayaub, Nir Neumark, Farida Ahangari, Sarah G. Chu, Benjamin A. Raby, Giuseppe DeIuliis, Michael Januszyk, Qiaonan Duan, Heather A. Arnett, Asim Siddiqui, George R. Washko, Robert Homer, Xiting Yan, Ivan O. Rosas, Naftali Kaminski

    The PDF file includes:

    • Figs. S1 to S9
    • Table S1
    • Legends for data S1 to S13
    • References
