Gene expression signatures of target tissues in type 1 diabetes, lupus erythematosus, multiple sclerosis, and rheumatoid arthritis

See allHide authors and affiliations

Science Advances  06 Jan 2021:
Vol. 7, no. 2, eabd7600
DOI: 10.1126/sciadv.abd7600


Autoimmune diseases are typically studied with a focus on the immune system, and less attention is paid to responses of target tissues exposed to the immune assault. We presently evaluated, based on available RNA sequencing data, whether inflammation induces similar molecular signatures at the target tissues in type 1 diabetes, systemic lupus erythematosus, multiple sclerosis, and rheumatoid arthritis. We identified confluent signatures, many related to interferon signaling, indicating pathways that may be targeted for therapy, and observed a high (>80%) expression of candidate genes for the different diseases at the target tissue level. These observations suggest that future research on autoimmune diseases should focus on both the immune system and the target tissues, and on their dialog. Discovering similar disease-specific signatures may allow the identification of key pathways that could be targeted for therapy, including the repurposing of drugs already in clinical use for other diseases.


The incidence of autoimmune diseases is increasing on a worldwide basis, and the prevalence of some of the most severe autoimmune diseases, i.e., type 1 diabetes (T1D), systemic lupus erythematosus (SLE), multiple sclerosis (MS), and rheumatoid arthritis (RA), has reached levels of prevalence ranging from 0.5 to 5% in different regions of the world (1). There is no cure for these autoimmune diseases, which are characterized by the activation of the immune system against self-antigens. This is, in most cases, orchestrated by autoreactive B and T cells that trigger and drive tissue destruction in the context of local inflammation (25). While the immune targets of T1D, SLE, MS, and RA are distinct, they share several similar elements, including common variants that pattern disease risk, local inflammation with contribution by innate immunity, and downstream mechanisms mediating target tissue damage. In addition, disease courses are characterized by periods of aggressive autoimmune assaults followed by periods of decreased inflammation and partial recovery of the affected tissues (3, 611). Endoplasmic reticulum stress (1215), reactive oxygen species (1619), and inflammatory cytokines, such as interleukin-1β (IL-1β) and interferons (IFNs), are also shared mediators of tissue damage in these pathologies (2023).

Despite these common features, autoimmune disorders are traditionally studied independently and with a focus on the immune system rather than on the target tissues. There is increasing evidence that the target tissues of these diseases are not innocent bystanders of the autoimmune attack but participate in a deleterious dialog with the immune system that contributes to their own demise, as shown by our group and others in the case of T1D [reviewed in (3, 24, 25)]. Furthermore, in T1D, several of the risk genes for the disease seem to act at the target tissue level—in this case, pancreatic β cells—regulating the responses to “danger” signals, the dialog with the immune system, and apoptosis (20, 25, 26). Against this background, we hypothesize that key inflammation-induced mechanisms, potentially shared between T1D, SLE, MS, and RA, may drive similar molecular signatures at the target tissue level. Discovering these similar (or, in some cases, divergent) disease-specific signatures may allow the identification of key pathways that could be targeted for therapy, including the repurposing of drugs already in clinical use for other diseases.

To test this hypothesis, we obtained RNA sequencing (RNA-seq) datasets from pancreatic β cells from controls or individuals affected by T1D (27), from kidney cells from controls or individuals affects by SLE (28), from optic chiasm from controls or individuals affected by MS (29), and from joint tissue from controls or individuals affected by RA (30). In some cases, we also compared these datasets against our own datasets of cytokine-treated human β cells (31). These unique data were mined to identify similar and dissimilar gene signatures and to search for drugs that may potentially revert the observed signatures. Furthermore, we searched for the expression of candidate genes for the different autoimmune diseases at the target tissue level and the signaling pathways downstream of these candidate genes.

These studies indicate major common gene expression changes at the target tissues of the four autoimmune disease evaluated, many of them downstream of types I and II IFNs, and massive expression of candidate genes (>80% in all cases). These findings support the importance of studying the target tissue of autoimmune diseases, in dialog with the immune system, to better understand the genetics and natural history of these devastating diseases.


Metadata and global gene expression in the target tissues of different autoimmune diseases

The metadata of the tissue donors evaluated in the present analysis are shown in Table 1. The number of samples is proportional to the accessibility of the target tissues, with the highest number of samples available for joint tissue in RA. The age and sex of the patients reflect the natural history of the different diseases, with younger patients in the T1D group and a higher proportion of female patients in the MS and SLE groups. Sex information was obtained from the original metadata and, when not available, was inferred using chromosomal marker information present in the transcriptome (see Materials and Methods). Of note, while some of the samples used for RNA-seq were obtained following fluorescence-activated cell sorting (FACS) purification (e.g., pancreatic β cells) (27), other samples comprised a mixture of target cells and infiltrating immune cells. Determination of the leukocyte marker CD45 expression in the different samples indicated a trend for higher presence of immune-derived cells among samples obtained in T1D, MS, and RA, but not in SLE (table S1). This contribution by immune cells was, however, modest. For instance, while in the β cell preparation the number of transcripts per million (TPM) for CD45 in the patient group was 16.4 (mean), the TPM values for the following β cell markers were as follows: INS (Insulin), 125.359; Sodium/potassium-transporting ATPase gamma chain (FXYD2γa), 65; GCK (Glucokinase), 20; Homeobox protein Nkx-2.2 (NKX2-2), 28; Synaptotagmin 4 (SYT4), 36; Neurogenic Differenciation 1 (NEUROD1), 27; Homeobox protein Nkx-6.1 (NKX6-1), 27; and MAF BZIP Transcription Factor B (MAFB), 23, indicating that the observed responses are driven, at least in part, by the constitutive cells of the target tissues. Of note, proinflammatory cytokines decrease the expression of several of the β cell markers (3, 20, 32) described above.

Table 1 Summary of the metadata for the RNA-seq samples of the four autoimmune diseases.

RNA-seq data from four studies of target tissues in autoimmune diseases were retrieved from the Gene Expression Omnibus (GEO) portal (, reanalyzed, and quantified with Salmon using GENCODE 31 as the reference. N/A, data nonavailable. For the sex column: M, male; F, female.

View this table:

In the T1D and SLE datasets, but not in the MS and RA ones, there was a trend for more up-regulated than down-regulated genes in the target tissues, which was particularly marked in the T1D dataset, with more than twofold higher number of up-regulated genes as compared with the down-regulated ones (Fig. 1A). Of note, because of a statistically significant difference in the age of patients with RA and their respective controls, we have included age as an independent variable when determining the differentially expressed genes in the joint tissue samples (see Materials and Methods).

Fig. 1 Overview of the number of differentially expressed genes and the signaling pathways activated in the target tissues of four autoimmune diseases.

(A) Number of protein-coding genes differentially expressed in four autoimmune diseases. Each RNA-seq data set was quantified with Salmon using GENCODE 31 as the reference. Differential expression was assessed with DESeq2. The numbers within the bars represent the protein-coding genes with |fold change| >1.5 and an adjusted P value <0.05. RNA-seq sample numbers (n) are as follows: T1D (n = 4 for patients, n = 10 for controls), SLE (n = 20 for patients, n = 7 for controls), MS (n = 5 for patients, n = 5 for controls), and RA (n = 56 for patients, n = 28 for controls). Results for the RA samples were adjusted by age as an independent variable. (B to E) Gene set enrichment analysis (GSEA) of T1D (B), SLE (C), MS (D), and RA (E) target tissues. After quantification using Salmon and differential expression with DESeq2, genes were ranked according to their fold change. Then, the fGSEA algorithm (76) was used along with the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome databases to determine significantly modified pathways. Bars in red and blue represent, respectively, a positive and negative enrichment in the associated pathway. The x axis shows the normalized enrichment score (NES) of the fGSEA analysis, and the y axis the enriched pathways. The numbers at the end of the signaling pathway names represent, respectively, (i) the number of genes present in the leading edge of the GSEA analysis and (ii) the total number of genes present in the gene set.

Analysis of the gene patterns in target tissues of autoimmune diseases indicates up-regulation of IFN-related pathways

Enrichment analysis of these disease-modified genes (Fig. 1, B to E) indicated similarities and differences between the different autoimmune diseases. Thus, both T1D and SLE have several up-regulated IFN-related pathways among the top up-regulated ones (Fig. 1, B and C); IFN pathways were also detected as enriched for MS and RA, but not among the 20 top ones [e.g., MS: IFN-αβ signaling normalized enrichment score (NES) = 2.26 (P adj. < 0.007); RA, IFN-αβ signaling NES = 2.64 (P adj. < 0.004)]. This similar enrichment in IFN-related genes can also explain the appearance of SLE as the top up-regulated pathway in T1D (Fig. 1B). Up-regulated pathways related to antigen presentation or antigen-related activation of immune cells were present for the four diseases (Fig. 1, B to E), in line with their autoimmune nature, while complement cascades were preeminent in MS (Fig. 1D) and RA (Fig. 1E), but less so in T1D and SLE. To evaluate whether these observed IFN-induced signatures originate, at least in part, from nonimmune cells in the target tissues, we reanalyzed available single-cell(sc)/nucleus(sn)–RNA-seq data focusing on nonimmune cells in affected tissues in T1D [pancreatic β cells (33)], SLE [kidney epithelial cells (34)], MS [brain neurons (35)], and RA [synovial fibroblasts (36)] (fig. S1A), confirming that there is a significant IFN signature in the target of the four autoimmune diseases as measured by an IFN response score, defined as the average expression of known IFN-stimulated genes (ISGs; see Materials and Methods) (34, 37).

The down-regulated pathways tended to be more disease specific and related to the dysfunction of the target organ. Thus, for T1D, there was down-regulation of pathways involved in “integration of energy metabolism,” a key step for insulin release, and in “regulation of gene expression in β cells,” which reflects the down-regulation of several transcription factors (TFs) critical for the maintenance of β cell phenotype and function (e.g., PDX1 and MAFA) (38) (Fig. 1B), while in RA, there was a decrease in collagen chain trimerization, an important step for proper collagen folding (Fig. 1E) (39). Moreover, down-regulation of pathways involved in lipid metabolism was enriched in MS samples (Fig. 1D). Supporting that, disruption of lipid metabolism in oligodendrocytes compromises the lipid-rich myelin production/regeneration, a hallmark of MS, both in in vitro studies (40) and in samples obtained from individuals with MS (41).

Gene set enrichment analysis (GSEA) of the sc/sn–RNA-seq data of nonimmune cells from the four autoimmune diseases (fig. S1, B to E) confirmed several up-regulated pathways in common, including IFN signaling (present for all diseases, although not always among the top 20 shown), T1D (which appears in three of the four diseases), allograft rejection, etc. As observed in the bulk RNA-seq analysis, there were less similarities between diseases regarding the down-regulated pathways.

We also analyzed the intersection between significantly up- and down-regulated genes of the bulk RNA-seq of the four diseases using another criterion, namely, considering genes as significantly modified if they presented a false discovery rate <0.10 without a fold change threshold (fig. S2, A and B). This showed a higher similarity among up- than down-regulated genes, but there were few genes in common between the four diseases. On the basis of a hypergeometric test to search for gene set enrichment for the cases where there were >50 genes in common between two and three diseases, we identified IFN signaling, antigen processing, and presentation and cytokine signaling, among others. It was, however, difficult to find common pathways among the down-regulated genes. A limitation of this approach is that we can only analyze genes that pass a fixed statistical cutoff, which makes the results very susceptible to the number of samples studied, as presently observed for the higher intersection between RA (a disease with a much higher number of samples) and the other autoimmune diseases. This type of analysis must thus be redone as more samples become available for the different diseases.

To obtain more detailed information on the (dis)similarities between the different autoimmune diseases, avoiding the pitfalls mentioned above for threshold-based analysis, we performed the rank-rank hypergeometric overlap (RRHO) analysis (Fig. 2) (42), a genome-wide approach that compares two equally ranked datasets using a threshold-free algorithm (see Materials and Methods). The main similarities between the diseases were observed among up-regulated genes, while there was no major intersection of commonly down-regulated genes between datasets (Fig. 2). This finding is in line with the above-described observation that down-regulated genes tended to be target-tissue related (Fig. 1, B to E). β cells in T1D, in particular, showed a strong correlation with regard to up-regulated genes with SLE, RA, and MS (Fig. 2). The functional enrichment analysis of these up-regulated overlapping pathways showed concordance for both types I and II IFN signaling for nearly all disease pairs (Fig. 3). Pathways related to neutrophil degranulation were highly up-regulated when comparing MS against T1D (Fig. 3B), SLE (Fig. 3D), or RA (Fig. 3F); this pathway also appeared highly in common between T1D and RA (Fig. 3C).

Fig. 2 RRHO analysis comparing the gene expression signatures of target tissues among four autoimmune diseases demonstrates a high degree of similarity between up-regulated genes.

(A) Genes were ranked by their fold change from the most down- to the most up-regulated ones and then submitted to the RRHO algorithm. The level map colors display the adjusted log P values of the overlap (the P values were adjusted using the Benjamini and Yekutieli method) between genes up-regulated in both datasets (bottom left quadrant), down-regulated in both (top right quadrant), up-regulated in the left-hand pathology and down in the bottom part (top left quadrant), and down in the left-hand pathology and up-regulated in the bottom part pathology (bottom right quadrant). (B) The panel displays the number of genes significantly overlapping in each pairwise analysis (A). NS, not significant quadrant.

Fig. 3 Functional enrichment analysis of overlapping genes among four autoimmune diseases demonstrates signaling pathways concordance.

(A to F) Genes significantly overlapping between different pairs of autoimmune diseases in the RRHO analysis (Fig. 2B) were selected for enrichment analysis using the clusterProfiler tool with the Reactome database. The top 20 gene sets are represented according to their adjusted P values (Benjamini and Hochberg correction) and their gene ratio (no. of modified genes/total gene set size). Diseases were analyzed in pairs. Enrichment analysis of genes significantly up-regulated in the target tissues of both (A) T1D and SLE, (B) T1D and MS, (C) T1D and RA, (D) SLE and MS, (E) SLE and RA, and (F) MS and RA.

We next investigated the potential TFs controlling the observed interdisease similarities. For this purpose, we evaluated the enrichment of TF binding site motifs in the promoter region of up-regulated genes from the pairwise analysis of autoimmune diseases (fig. S3). In line with the predominance of IFN-related pathways observed in Fig. 3, there was a high prevalence of common binding site motifs for IFN-induced TFs, including IFN-stimulated response element (ISRE), IFN regulatory factor 1 (IRF1), and IRF2, particularly when comparing T1D versus SLE (fig. S3A) and T1D versus RA (fig. S3C). To examine whether these TFs are expressed by constitutive cells of the target tissues, we have reevaluated the TF expression in nonimmune cells present in sc/sn–RNA-seq of the target tissues from the four autoimmune diseases. Since the presently available methods for sc/sn–RNA-seq only detect on average 1000 to 5000 genes per cell (43), which is 75 to 80% lower than the total number of genes identified by bulk cell RNA-seq (>20,000 genes), we selected for this analysis the top 10 TFs presenting the highest expression in the affected target tissues. By this approach, we observed that the majority of these TFs are expressed by nonimmune cells from the target tissues (fig. S3G). In agreement with this observation, we have previously shown that exposure of the human β cell line EndoC-βH1 to INFα leads to the activation of several of the same TFs identified, including signal transducer and activator of transcription 1 (STAT1), STAT2, STAT3, IRF1, and IRF9 (31, 48).

To assess whether a putative in vivo type I IFN signaling in the context of different autoimmune diseases activates similar pathways in the target tissues, we compared gene expression of primary human islets (31) and skin keratinocytes (44) exposed in vitro to IFN-α for 8 and 6 hours, respectively (fig. S4). There were approximately 40% differentially expressed genes in common between these two tissues (fig. S4A), leading to the induction of pathways such as IFN signaling and antigen presentation/processing (fig. S4B) that were similar to the pathways observed in target tissues from patients affected by T1D (Fig. 1B and fig. S5) or SLE (Fig. 1C and fig. S5).

It is noteworthy that when comparing SLE versus T1D and SLE versus RA (Fig. 2, A and B), there were a large number of genes up-regulated in one disease but down-regulated in the other. A more detailed analysis of these oppositely regulated genes (fig. S6) indicated that neutrophil degranulation and signaling by RHO GTPases (guanosine triphosphatases) were among the most enriched gene sets. A similar observation was made regarding SLE versus RA, where neutrophil degranulation was also the most represented gene set. This apparent disagreement between genes regulating neutrophil degranulation in SLE and other autoimmune diseases may reflect the presence of two distinct populations of neutrophils in patients with SLE that have functional differences in pathways controlling chemotaxis, phagocytosis, and degranulation (45). Other dissimilarities include the anti-inflammatory IL-10 signaling and groups related to the regulation of the dialog between immune and resident cells, such as “immunoregulatory interaction between a lymphoid and nonlymphoid cell” and “PD-1 (programmed cell death protein 1) signaling.”

The availability of the above-described datasets allowed us to mine the overlapping genes in the target tissues of the different autoimmune diseases to search for common therapeutic targets, with the potential to find drugs to be repurposed (Fig. 4). As a proof of concept, we identified dihydrofolate reductase inhibitors as a potential therapeutic target for several pairs of autoimmune diseases (Fig. 4, B to D and F), and methotrexate, a member of this class, is already routinely used for the treatment of different autoimmune diseases, including RA (46) and SLE (47). Bromodomain inhibitors were also observed as common perturbagens between T1D and SLE (Fig. 4A) and SLE versus RA (Fig. 4E). This is in line with our recent observations that two of these bromodomain inhibitors, JQ1 and I-BET-151, protect human β cells against the deleterious effects of IFN-α (31). There were additional interesting candidates, some with a profile covering multiple diseases, such as phosphoinositide 3-kinase (PI3K) (T1D versus SLE, SLE versus RA, and MS versus RA) and janus kinase (JAK) inhibitors (SLE versus RA and MS versus RA), while others acting on specific pairs of diseases, namely, bile acids (T1D versus MS) and fibroblast growth factor receptor (FGFR) inhibitors (SLE versus MS) (Fig. 4). Of note, clinical trials are currently evaluating the effects of the bile acid tauroursodeoxycholic acid (TUDCA) in patients with recent-onset T1D (, NCT02218619) and MS (, NCT03423121).

Fig. 4 Mining overlapping genes among target tissues in four autoimmune diseases allows the identification of potential common therapeutic targets.

(A to F) After determining statistically which genes were overlapped in pairs of autoimmune diseases from the RRHO analysis (Fig. 2), the top 150 overlapping genes were submitted to the Connectivity Map database to identify perturbagen classes driving an opposite signature (negative tau score) to the one present in the target tissues of the four autoimmune diseases. Only classes with a median tau score <−80 were considered. (A to F) Perturbagen classes driving down the genomic signatures of up-regulated genes. The same methodology and conditions have been applied for subsequent analysis: (A) T1D and SLE, (B) T1D and MS, (C) T1D and RA, (D) SLE and MS, (E) SLE and RA, and (F) MS and RA. EGFR, epidermal growth factor receptor. LOF, Loss of Function; GOS, Gain of Function; IAP, inhibitor of Apoptosis; FGFR, Fibroblast Growth Factor Receptor; MDM, Murine Double Minute; HIF, Hypoxia Inducible Factor; BCL, B-Cell Lymphoma.

Expression of candidate genes for the different autoimmune diseases at the target tissue level

We have previously shown that isolated human pancreatic islets express a large number of risk genes for T1D (20, 24, 26, 48), and we presently examined whether this is also the case for the target tissues in other autoimmune diseases (table S2). Confirming our previous findings, 81% of risk genes for T1D were expressed in human β cells; similar findings were observed for the target tissues for SLE (92%), MS (83%), and RA (88%). The autoimmune assault changed the expression of >65% of these candidate genes for joint tissue RA (table S2), but the number of disease-induced and significantly modified genes was much smaller for the other autoimmune diseases, probably because of limited statistical power associated to the number of samples analyzed (>80 samples studied in the case of RA and between 10 and 27 for the other diseases). The list of risk genes expressed in the target tissues is available in data file S1. An overview of these candidate genes and their coexpression in different autoimmune diseases is provided in Fig. 5. Genes related to antigen presentation [human lymphocyte antigen (HLA)–DQB1 and HLA-DRB1] and to type I IFN signaling (TYK2) are present in all target tissues for the four autoimmune diseases. Reactome (49) analysis of risk genes in T1D (data file S2) identified ILs and IFN signaling as important pathways. IFN signaling also appears pro-eminently for kidney tissue in SLE, optic chiasm in MS, and joint tissue in RA (data file S2), but there are also clusters related to defense against the autoimmune assault, including PD-1 (for all diseases) and IL-10 signaling (for SLE and MS only); PD-1–PDL1 (programmed death ligand 1) is probably also an important defense mechanism of human β cells in T1D (50).

Fig. 5 Venn diagrams of risk genes expressed in the target tissues of the four autoimmune diseases shows shared candidate genes among them.

Venn diagram representing risk genes identified in GWAS studies in target tissues for T1D, SLE, MS, and RA. For each disease, the risk genes were extracted from the GWAS Catalog ( and selected as described in Materials and Methods. In brief, each list was curated according to their relationship to the disease, and only genes with a P value <0.5 × 10−8 for their SNP-trait relationship were kept. Last, an intersection between the four lists was performed and represented as a Venn diagram. Numbers in the diagram represent the numbers of genes present in each subgroup, and genes overlapping between diseases are displayed by their HGNC symbols. A gene was considered as expressed if it presents a mean TPM > 0.5 in either the patient or control group. N/A, not applicable (no gene in common).

To evaluate whether the observed candidate genes are expressed in nonimmune cells from the target tissues studied, we have used a similar approach as done for the TF analysis (fig. S3G) and revised sc/sn–RNA-seq data from nonimmune cells in affected tissues in T1D (33), SLE [kidney epithelial cells (34)], MS [brain neurons (35)], and RA [synovial fibroblasts (36)]. This confirmed that >80% of the top 50 risk genes are expressed by the target cells (fig. S1, F to I). Of note, the present limitations of the sc–RNA-seq method regarding the number of genes detected (commented upon above) may explain why less candidate genes are observed in single cells (fig. S1, F to I) than in whole tissue or FACS-sorted bulk cells (data file S1).


In the present study, we tested the hypothesis that target tissues from four different autoimmune diseases, namely, T1D, SLE, MS, and RA, engage in a “dialog” with the invading immune cells that leaves “molecular footprints.” These footprints may share similarities, as local inflammation is a common outcome of these diseases, and point to common mechanisms that can be targeted by therapy.

The analysis of the gene expression patterns of the target tissues in the different diseases showed up-regulation of type I and II IFN–related pathways, which is in line with observations made in the peripheral blood of individuals with T1D (51), SLE (52, 53), MS (54), and RA (55). These descriptive similarities were confirmed by comparing the ranking of the up-regulated genes via RRHO, a method that allows the comparison between differentially expressed genes in control and diseased tissue from two different diseases, outlining the similarities and/or dissimilarities between the modified genes in both diseases. Here, we observed clear but different degrees of overlap between the diseases mostly regarding the up-regulated expression patterns. In support of the robustness of the present findings, these similarities were present despite the fact that the original RNA-seq data were obtained by different research teams, using different extraction and sequencing processes, and that there were major differences between the studies regarding age and sex of the patients and respective controls (many of these differences were inherent to the diseases studied, e.g., SLE is more common in females).

The observed similarities in pathway activation between target tissues were translated into the identification of several classes of drugs that could potentially be used to treat more than one autoimmune disease (Fig. 4). Among them, JAK inhibitors, which act downstream of the types I and II IFN receptors by blocking activation of the kinases JAK1 and JAK2, are of particular interest. These inhibitors were recently approved for the treatment of RA (56) and had promising results in a phase 2 clinical trial of patients with SLE (57). In line with this, JAK inhibitors prevent the proinflammatory and proapoptotic effects of IFN-α on human pancreatic β cells (31) and revert established insulitis in diabetes-prone NOD (nonobese diabetic) mice (58). Another class of drugs presently identified for potential use in several autoimmune diseases are the PI3K inhibitors. These drugs target a family of lipid kinases that phosphorylate phosphoinositides from cell membranes, modulating cellular processes such as cell growth, metabolism, and immune responses. In agreement with our analysis, inhibitors of the PI3K isoforms δ and γ have beneficial effects in animal models of MS (59), SLE (60), and RA (61). PI3K inhibitors, however, may have opposite effects on different tissues. Thus, PI3Kδ inhibitors exacerbate inflammatory responses in the airways and gut, tissues often exposed to pathogens, leading to severe cases of pneumonitis and colitis (62). This indicates that selection of potential new therapeutic agents needs to consider also the specific characteristics of the target tissue(s). This is in agreement with our present observations of tissue-specific down-regulated pathways in different diseases, such as pathways related to maintenance of the β cell phenotype in T1D, or down-regulation of pathways involved in collagen folding in joint tissues from RA.

There have been previous attempts to perform individual drug repurposing on these pathologies [e.g., (63, 64)]. Our present study attempts to expand this approach, potentially leading to drug repurposing for multiple autoimmune diseases, for instance, in the case of JAK inhibitors. Repurposing already-studied drugs provides the benefits of having their pharmacodynamic and pharmacokinetic profiles already well studied, which considerably reduces the bench-to-bedside time frame (65), and helping the treating physicians to survey for previously detected side effects.

More than 80% of candidate genes for which a single-nucleotide polymorphism (SNP)–trait link has been deemed significant are expressed in the target tissues of the different autoimmune diseases studied. This is in line with our previous observations in T1D (20, 26, 48), where these candidate genes probably regulate β cell responses to “danger signals,” such as viral infections, and the signal transduction of type I IFNs (23). The fact that similar observations are now made in the target tissues of SLE, MS, and RA (present data) suggests that future studies in these diseases should also consider the impact of candidate genes acting at the target tissue level. Of note, and to detect eQTL (Expression quantitative trait loci) in target tissues, it may be necessary to expose them to relevant stimuli, such as proinflammatory cytokines in the case of T1D (26).

The present observations, showing the expression of candidate genes in the target tissues of autoimmune diseases, may contribute to explain why certain people have different innate immune responses at the tissue level to seemingly similar triggers (such as viral infections or other danger signals), leading to different outcomes, e.g., progressive tissue damage or resolution of inflammation and return to homeostasis. For instance, diverse polymorphisms in candidate genes for T1D may contribute to disease at the β cell level by regulating antiviral responses, innate immunity, activation of apoptosis, and, at least for a few of them, β cell phenotype (24, 25, 66).

The candidate genes presently observed as overlapping between target tissues of two or more diseases are mostly related to inflammatory mediators, particularly the signal transduction of IFNs, suggesting that similarities between these diseases are dependent, at least in part, on the genetically mediated regulation of local immune responses. These findings may have therapeutic implications. For instance, one of the candidate genes in common between all the four autoimmune diseases is TYK2, a key component of the JAK-STAT signaling pathway. TYK2 inhibitors are already in phase 3 clinical trial for another autoimmune disease, psoriasis (67), and two different TYK inhibitors protect human β cells against the deleterious effects of IFN-α (68). Targeting IFN pathways at an early step of its signal transduction may not be, however, a sufficiently specific approach, and the role of IFNs may vary according to the stage of disease and the genetic background of the affected individuals. The success of IFN-blocking therapies in human SLE and other rheumatic diseases remains to be proven (69).

The data generated in the present study contribute to a better understanding of the communication between the immune system and the target tissues in T1D, SLE, MS, and RA, and strengthen the putative implication of the target tissues in these autoimmune diseases. These findings also indicate a role for similar candidate genes expressed in target tissues of two or more diseases and indicate potential new therapeutic agents to target key similar pathways. As a whole, these observations suggest that future research on the genetics and pathogenesis of autoimmune diseases should focus on both the immune system and their target tissues and on their dialog.


The study’s first limitation relates to the scarcity of RNA-seq data for target tissues in autoimmune diseases, particularly in the cases where these tissues are difficult to access, such as in T1D or MS. This decreases the power of the analysis and may bias the data in favor of diseases where a larger number of samples were available (e.g., RA). Another issue is the stage of the disease, as the impact of the immune system on the target tissues may differ in the early and late phases of the disease [for instance, in the case of T1D, innate rather than adaptive immunity may have a major role at earlier stages (3, 25, 70)]. Unfortunately, and because of the scarcity of samples in, for instance, T1D or MS, this stage issue is difficult to address. It is noteworthy that despite these limitations, it was still possible to obtain clear conclusions from the available data.

Another potential limitation is that immune cells are present in the target tissue preparations analyzed (although there was a statistically significant increase in the expression of the immune marker CD45 only in T1D and RA), which may affect the gene expression pathways described above. The facts that (i) an IFN signature is present in nonimmune cells of the diseased tissues analyzed and these nonimmune cells express several candidate genes for the diseases studied (fig. S1); (ii) at least in the case of a pure human β cell line, EndoC-βH1 cells, exposure to IFN-α induces a gene signature that is similar to that observed in β cells obtained from patients affected by T1D (31); and (iii) histological analysis of pancreatic islets from patients with T1D show expression of HLA class I (ABC) (71), HLA-E (31), PDL1 (50), CXCL10 (72), and STAT1 (71) in pancreatic β cells, taken as a whole, suggest that at least part of the observed gene signatures originate from the target tissues and cannot be explained by the immune infiltration alone. Future follow-up studies based on direct histological staining of the specific cells involved are required to define the exact contribution of immune and nonimmune cells in the affected target tissues.


Target tissue bulk RNA-seq processing and analysis

For each dataset, control and patient target tissue gene expressions were quantified using Salmon version 0.13.2 (73) with parameters “--seqBias –gcBias --validateMappings.” GENCODE version 31 (GRCh38) (74) was chosen as the reference genome and has been indexed with the default k-mer values. Differential expression was performed with DESeq2 version 1.24.0 (75). For each gene included in DESeq2’s model, a log2 fold change was computed and a Wald test statistic was assessed with a P value and an adjusted P value. In this study, we consider a gene as differentially expressed when |fold change| >1.50 and adjusted P value <0.05. Since there was a statistical difference in the age between patients with RA and controls, for this particular dataset, we have taken age as an independent variable in the general linear model performed by DESeq2. To introduce age as a confounding factor in the analysis, we performed a binning on the ages and assigned each donor a group, respectively: 10 to 29, 30 to 49, 50 to 69, and >70 years old. All the other parameters of the DESeq2 analysis were the same as for the others target tissues.

sc/sn–RNA-seq processing and analysis

We have obtained the expression matrices containing the processed reads from transcriptome studies of the following target tissues: (i) sc–RNA-seq from cryo-banked islets obtained from three donors with T1D and three controls matched for body mass index, age, sex, and storage time, performed using the SmartSeq-2 protocol as described in (33) and accessible under the Gene Expression Omnibus (GEO) number GSE124742; (ii) sc–RNA-seq from kidney biopsies from 24 patients with LN and 10 control samples acquired from living donor kidney biopsies using a modified CEL-Seq2 protocol as described in (34) and accessible in the ImmPort repository (accession code SDY997); (iii) sc–RNA-seq from snap-frozen brain tissue blocks obtained at autopsies from 10 patients with MS (1 primary progressive MS, 9 secondary progressive MS) and 9 nonaffected individuals processed using the 10x Genomics Single-Cell 3′ system as described in (35) and accessible on Sequence Read Archive (SRA; accession number PRJNA544731); and (iv) sc–RNA-seq of synovial tissues from ultrasound-guided biopsies or joint replacements of 36 patients with RA and 15 patients with osteoarthritis, as reference controls, using the CEL-Seq2 protocol as described in (34) and available at ImmPort (accession code SDY998). After that, we normalized the gene expression levels by transforming the counts to log2(CPM + 1) (counts per million).

For the purpose of reproducibility, we have kept the same cell identity classification defined in the original sc/sn–RNA-seq study (3336). To represent nonimmune cells on the target tissues, we have selected (i) in T1D, the β cells isolated from pancreatic islets; (ii) in SLE, all the kidney epithelial cells from the kidney biopsy; (iii) in MS, all the cells from different clusters of brain neurons; and (iv) in RA, all the cells from the fibroblast clusters of joint synovial tissues.

Sex determination

For most, but not all, target tissues, sex information was available in the metadata on the GEO website. To compensate for this lack of information, we inferred the sex based on the expression of 40 genes exclusively coded on the Y chromosome and the female-expressed XIST (X-inactive specific transcript) (data file S1). We created a machine learning model on the basis of a linear discriminant analysis algorithm that we trained on the expression of both controls and patient expression matrices in RA. The training was supervised with the sex described in the metadata as the desired outcome. We then tried our model to predict the sex of patients on different target tissues (i.e., T1D and MS) where the outcome was known, according to their metadata, which provided only one prediction different from the expected outcome (96% accuracy). This allowed us to estimate the sex ratio in the studies missing this information in the available metadata.

Risk genes

Risk genes associated with each disease were identified using genome-wide association study (GWAS) catalog (; consulted January 2020). The candidate genes were selected on the basis of the following criteria: (i) T1D, SLE, MA, and RA as the disease/trait evaluated by the study; (ii) a P value of <0.5 × 10−8 for the lead SNP; (iii) selecting the reported genes linked to the lead SNP described by the original study; and (iv) expression of the reported genes in the target tissue (TPM > 0.5). An overlap between the four lists of genes was then performed and represented as a Venn diagram.

ISG score

To evaluate for the presence of types I and II IFN signatures on the target tissues of the four autoimmune diseases, we have calculated for each cell from the sc/sn–RNA-seq an ISG score. This ISG score was calculated as the average expression of known ISGs listed on data file S1. The statistical difference between groups was determined using a two-tailed Mann-Whitney U test.

RRHO analysis

To compare the genomic signatures of the target tissues of the four autoimmune diseases, we used an RRHO (42) mapping, an unbiased method to uncover the concordances and discordances between two similarly ranked lists. Briefly, for a pair of diseases, the full list of genes is ranked according to their fold change from the most down-regulated to the most up-regulated gene. Then, an intersection of shared genes is performed, and the analysis of the ranking order of genes is performed with a hypergeometric test.

The visual output of this analysis is an RRHO level map (Fig. 2A), where the hypergeometric P value for enrichment of k overlapping genes is calculated for all possible threshold pairs for each experiment, generating a matrix where the indices are the current rank in each experiment. P values for each test are then log transformed and reported on a heatmap to display the degrees of similarities according to four quadrants representing the concordance or the discordance in gene ranking in the two differential expression analysis (e.g., up-regulated in one disease and down-regulated in the other).

Functional enrichment analysis

The functional enrichment analysis was based on results from the differential expression analysis. Genes from bulk RNA-seq data were preranked according to the Wald test statistic of the differential expression results from DESeq2. For sc/sn–RNA-seq data, we filtered out genes that were expressed in less than 10% of all cells to minimize the dropout impact on the overall gene expression. The remaining genes were then preranked according to the log2 fold change of the differential expression results from DESeq2. We used fGSEA (76) along with the Kyoto Encyclopedia of Genes and Genomes (KEGG) (77) and Reactome (49) databases as the references to determine which pathways were positively or negatively enriched in the target tissue of each disease. Default parameters were used, except for the number of permutations (10,000) for the most accurate P values. For bulk RNA-seq data, results with an adjusted P value <0.05 (Benjamini-Hochberg correction) were then sorted according to their NES. For sc/sn–RNA-seq data, results with an adjusted P value <0.15 (Benjamini-Hochberg correction) were then sorted according to their NES.

To determine the functional enrichment in genes up-regulated in pairs of diseases, we used a hypergeometric test included in the clusterProfiler package (78) on the genes overlapping significantly in the RRHO mapping. The Reactome (49) database was used as the reference for the gene sets. Default parameters were used, and P values were adjusted with the Benjamini-Hochberg correction.

Venn diagrams

Genes differentially expressed with an adjusted P value <0.10 (Benjamini-Hochberg correction) were selected. The gene lists of all diseases were then overlapped and represented as a Venn diagram of up- or down-regulated genes. In case of an overlap of >50 genes, the gene list was processed using a hypergeometric test with the Reactome database as the reference. Defaults parameters were used, and P values were adjusted with the Benjamini-Hochberg correction.

TF binding site analysis

Motif discovery for TF binding site in the promoter regions of up-regulated genes was done using the script from the HOMER (79) tools suite with parameters “-start -2000 -end 2000.” The promoter regions were considered as ±2000 base pairs from the gene transcription start site. Known TF binding site motifs uncovered and included in the study have a P value <0.05.

Therapeutic target identification

For each RRHO analysis result, we picked the top 150 up-regulated genes shared between two diseases and processed this list with the Connectivity Map dataset (80) using the cloud-based CLUE software platform ( This allowed us to query the database for compounds that are driving down the input genomic signatures, revealing potential drugs that could be repurposed to treat one or more diseases. We focused then on perturbagen classes that displayed a negative median tau score and retained as potential drug candidates only classes with a median tau score <−80.

Statistical analysis

TPM values are given according to their means ± SD. Results considered as significant in this study have a P value (or an adjusted P value when applicable) <0.05. For gene expression, we considered that a gene is differentially expressed if |fold change| >1.5 and adjusted P value <0.05, unless explicitly stated.


Supplementary material for this article is available at

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.


Acknowledgments: Funding: D.L.E. acknowledges the support of a grant from the Welbio–FNRS (Fonds National de la Recherche Scientifique), Belgium, the Dutch Diabetes Fonds (DDFR), Holland, startup funds from the Indiana Biosciences Research Institute (IBRI), Indianapolis, Indiana, USA, and the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement nos. 115797 (INNODIA) and 945268 (INNODIA HARVEST), supported by the European Union’s Horizon 2020 Research and Innovation Programme. These Joint Undertakings receive support from the Union’s Horizon 2020 research and innovation programme and “EFPIA,” “JDRF,” and “The Leona M. and Harry B. Helmsley Charitable Trust.” C.E.-M. acknowledges the support of NIH grants R01 DK093954, VA Merit Award I01BX001733, JDRF 2-SRA-2018-493-A-B, and JDRF 2-SRA-2019-834-S-B (to C.E.-M. and D.L.E.); M.M. is supported by the JDRF. Author contributions: M.L.C., F.S., M.J.M., C.E-M., and D.L.E. conceived the analysis. M.L.C. and D.L.E. supervised the analysis. F.S. and M.L.C. performed the bioinformatics analyses. F.S., M.L.C., and D.L.E. wrote the manuscript, and all authors revised it. D.L.E. provided the main funding. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Raw RNA-seq data are accessible from the GEO repository ( via their GSE codes as stated in Table 1. sc/sn–RNA-seq data are accessible from the repositories indicated in Materials and Methods (see above). Additional data related to this paper may be requested from the authors.

Stay Connected to Science Advances

Navigate This Article