Research ArticleIMMUNOLOGY

Glycosylation of immunoglobulin G is regulated by a large network of genes pleiotropic with inflammatory diseases

See allHide authors and affiliations

Science Advances  19 Feb 2020:
Vol. 6, no. 8, eaax0301
DOI: 10.1126/sciadv.aax0301


Effector functions of immunoglobulin G (IgG) are regulated by the composition of a glycan moiety, thus affecting activity of the immune system. Aberrant glycosylation of IgG has been observed in many diseases, but little is understood about the underlying mechanisms. We performed a genome-wide association study of IgG N-glycosylation (N = 8090) and, using a data-driven network approach, suggested how associated loci form a functional network. We confirmed in vitro that knockdown of IKZF1 decreases the expression of fucosyltransferase FUT8, resulting in increased levels of fucosylated glycans, and suggest that RUNX1 and RUNX3, together with SMARCB1, regulate expression of glycosyltransferase MGAT3. We also show that variants affecting the expression of genes involved in the regulation of glycoenzymes colocalize with variants affecting risk for inflammatory diseases. This study provides new evidence that variation in key transcription factors coupled with regulatory variation in glycogenes modifies IgG glycosylation and has influence on inflammatory diseases.


Glycosylation, a series of reactions that creates complex carbohydrate structures (glycans) attached to a polypeptide backbone, is among the most common and complex posttranslational protein modifications. Highly regulated attachment of different glycans to the same glycosylation site (alternative glycosylation) can greatly contribute to variability in glycoprotein structure and influence function in a way that is analogous to changes in protein sequence (1). Immunoglobulin G (IgG) is a simple glycoprotein with only one conserved N-glycosylation site per heavy chain and glycans that have no more than two antennae, but even this simple glycosylation has the capacity to generate hundreds of different glycoforms of IgG. It is well established that alternative glycosylation of IgG can act as a molecular switch between different immune response outcomes and thus significantly affect the function of the immune system (2). Aberrant protein glycosylation has been observed in many physiological states, from aging- and age-related diseases to autoimmune diseases and cancer (3).

Glycans are synthesized in the endoplasmic reticulum and the Golgi apparatus by a complex interplay of glycosyltransferases—enzymes that add monosaccharides to a growing glycan chain—and various other enzymes and transporters involved in synthesis and delivery of donors and substrates needed for chemical reactions (4). However, our understanding of the mechanisms by which alternative glycosylation is achieved and regulated is very limited.

N-linked glycans attached to a protein can be released, and their abundance can be quantified using a range of analytical methods. These measurements represent a set of quantitative traits characterizing the glycome of the studied protein. As with any quantitative trait, the genetic determinants of the glycome can be studied by associating levels of glycans to polymorphic sequence variants measured in large cohorts of individuals [genome-wide association studies (GWAS)]. The first GWAS of the glycome identified associated variants in or near 7 glycosyltransferase genes and in 15 additional loci with no apparent role in protein glycosylation (5, 6). The list was recently extended by six additional loci (7, 8). However, there is still limited understanding of how these genes are functionally related or what their role is in the genetic regulation of IgG N-glycosylation in health and disease.

To address these questions, we performed the largest GWAS of the total IgG N-glycome to date, on 8090 samples from individuals of European ancestry, and more than doubled the number of associated loci. To prioritize genes with a plausible role in IgG glycosylation, we assessed whether variants were located in the coding regions of genes, showed pleiotropy with gene expression in biologically relevant cell types, and assessed enrichment in gene sets from different biological pathways. We explored how these genes are connected in a functional network and confirmed some of the network connections with in vivo functional follow-up. Last, we investigated pleiotropy with other complex traits and diseases by interrogating the overlap of glycosylation associations with susceptibility loci for other phenotypes. Where available, we also assessed whether IgG N-glycosylation, diseases, and gene expression are likely to be controlled by the same underlying causal variant using summary-level Mendelian randomization (SMR). This strategy not only resulted in the discovery of new candidate genes but also suggested how some of these genes could regulate glycosylation enzymes and how they could influence the aberrant glycosylation observed in diseases with an inflammatory signature.


Discovery and replication meta-analyses

We performed a discovery IgG N-glycosylation GWAS on four cohorts of European descent (N = 8090). Associations of 77 ultraperformance liquid chromatography (UPLC) IgG N-glycan traits with HapMap2 (release 22) imputed genetic data were studied. We assumed an additive linear model for each glycan trait, followed by a fixed-effect inverse-variance meta-analysis.

Overall, associations in 27 loci reached genome-wide significance [P ≤ 2.4 × 10−9; Bonferroni corrected for 21 independent glycan traits; (6)], and another 6 loci were suggestively significant (2.4 × 10−8P < 2.4 × 10−9) (Table 1). Eight of the genome-wide significant loci confirmed previous findings from Lauc et al. (6), 5 confirmed those from Shen et al. (7) (one of which, AZI1, maps to the same locus as the suggestive association near SLC38A10 from Lauc et al.), and 1 confirmed those from Wahl et al. (8) (Fig. 1 and table S1), while 14 loci were not previously associated with IgG glycosylation. Individual associations and Manhattan plots can be found in the online resource available at For 19 of 27 significant loci (9 of 14 previously unassociated), the same single-nucleotide polymorphism (SNP)–glycan association was replicated in a meta-analysis of four independent European cohorts (N = 2388) with P ≤ 0.05/27 ≈ 1.9 × 10−3. None of the genes in novel loci have a known role in glycosylation. For all loci, the direction of effect estimates obtained for the UPLC data (effect either increasing or decreasing with the same allele) was the same in the discovery and the replication parts of the study (Table 1).

Table 1 Loci associated with IgG glycosylation.

Each locus is represented by the SNP with the strongest association in the region. Locus, coded by “chromosome: locus start–locus end” (GRCh37); No. of SNPs, maximum number of SNPs independently contributing to trait variation; SNP, variant with the strongest association in the locus; position, position of the SNP on the NCBI (National Center for Biotechnology Information) build 37; EA, allele for which effect estimate is reported; OA, other allele; EAF, frequency of the effect allele; R2, sample size weighted average of imputation quality for the SNP with the strongest association in the locus; glycan, glycan associated with the reported SNP; No. of glycans, number of other glycans suggestively or significantly associated with variants in given locus; β, effect estimate for the SNP and glycan with the strongest association in the locus in the discovery; P, P value for the discovery effect estimate; β UPLC repl, effect estimate for the SNP and glycan with the strongest association in the locus in replication; P UPLC repl, P value for the effect estimate in replication. The loci replicated in UPLC replication at P ≤ 0.0019 are in bold. The loci from Lauc et al. (6), Shen et al. (7), and Wahl et al. (8) are in italics.

View this table:
Fig. 1 Gene prioritization in loci associated with IgG N-glycosylation.

The Manhattan plot was created by taking the lowest P value at every genomic position from all 77 GWAS. For simplicity, the plot was trimmed at the equivalent of P = 10−50. The lowest observed P value in this analysis was 4.65 × 10−276 at ST6GAL1. Known loci, loci detected in previous IgG N-glycosylation GWAS; replicated, UPLC replication GWAS. Bottom: Summary of support for prioritization of every gene in the Manhattan plot. DEPICT, genes from enriched gene sets; expression, genes whose expression is pleiotropic with IgG N-glycosylation; CD19, B cells; PB, peripheral blood; coding, genes for which IgG N-glycosylation–associated SNP results in a changed amino acid sequence.

To further validate our findings, we analyzed all significantly associated SNPs in a cohort where glycans were measured with liquid chromatography coupled to electrospray mass spectrometry (LCMS; N = 1842). From the 27 genome-wide significant SNPs, 17 were associated with at least one of the LCMS-measured IgG glycopeptide levels at Bonferroni-corrected P ≤ 3.7 × 10−5. Of these 17, 14 loci were also replicated in the UPLC replication study (table S2).

We performed an approximate conditional and joint analysis using GCTA-COJO (genome-wide complex trait conditional and joint analysis) software (9) to investigate secondary associations in the 27 loci. We found evidence of multiple SNPs independently contributing to glycan level variation for four loci. Three of these loci span glycosyltransferase genes, coding for enzymes directly involved in biosynthesis of glycans (4) (table S3). The greatest number of independently associated SNPs (six) was observed for the fucosyltransferase locus, FUT8, followed by three SNPs in the galactosyltransferase locus, B4GALT1, and two SNPs in the sialyltransferase gene, ST6GAL1. Two SNPs in the fourth locus, spanning the human leukocyte antigen (HLA) region on chromosome 6, are likely to be due to the complexity of this region, where multiple antigen-presenting genes are in close proximity and in high linkage disequilibrium (LD).

Using only the marginal effects of top SNPs, we were able to explain up to 20% of the genetic variance (for a single glycan trait). Uncovering additional variants in the conditional analysis resulted in a maximum of 22% variance explained. These maxima were reached for IGP29, the percentage of monosialylation of fucosylated digalactosylated structures without bisecting N-acetylglucosamine, GlcNAc (table S3).

Prioritizing genes associated with IgG N-glycosylation

We applied the following strategy to prioritize the most likely functional genes. The Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) framework (10) was used to perform a gene set and tissue/cell type enrichment analysis and to provide additional evidence for gene prioritization. Next, for all candidate genes in every locus, we explored evidence for genes containing SNPs affecting amino acid sequence using the Variant Effect Predictor (VEP) (11) and genes whose expression is likely to be regulated by the same underlying variant as N-glycosylation of IgG. For the latter, we applied SMR with heterogeneity in dependent instruments (HEIDI) (12) on expression quantitative trait loci (eQTLs) from five different immune cells (neutrophils, macrophages, B cells, and two types of T cells) from the CEDAR dataset [N = 350; Momozawa et al. (13)] and peripheral blood from Westra et al. (14) (N = 5311). This test assesses whether the coassociation of two traits to the same region may be due to pleiotropic action of the same variants to both traits or due to the variant being in LD with two independent causal variants, each exhibiting independent effects on each trait. More specifically, a pleiotropic scenario would assume that the same unobserved variant is responsible for the association with both gene expression and glycan levels, while an LD scenario would assume that coassociation is due to two or more underlying variants that are in LD and independently contribute to either gene expression or glycan levels.

The DEPICT gene prioritization tool provided evidence of prioritization for 18 genes in 14 loci at false discovery rate (FDR) <0.05 and another 3 genes at FDR <0.2 (Supplementary Note, Appendix Table 9). In four genes, associated variants were predicted to result in a potentially deleterious amino acid change, and in five genes, probably benign amino acid changes. SMR/HEIDI indicated that IgG N-glycosylation–associated variants in 9 loci had pleiotropic effects on expression of 13 genes in different tissues, including B cells (4 genes in 3 loci), the cell lineage responsible for IgG synthesis. For 11 loci, we did not find any additional evidence to prioritize target genes and report the gene closest to the strongest association in the region (Fig. 1 and the Supplementary Note;

To obtain insights into the biological pathways that these genes are involved in, we performed FUMA’s GENE2FUNC (15) for Gene Ontology (GO) and DEPICT for gene set enrichment analyses. In total, our prioritized genes were significantly enriched in 75 different GO gene sets. These gene sets reflect three higher-level biological processes: glycosylation (23 gene sets), immune system processes (22 gene sets), and transcription (7 gene sets) (table S4). DEPICT tests for enrichment in preconstructed gene sets that are based on information from protein-protein interactions (PPIs), molecular and biochemical pathways, gene coexpression, and gene sets based on mouse gene knockout studies. No gene sets were enriched at FDR <0.05, but 397 gene sets were enriched at FDR <0.2. These gene sets are connected to various processes involved in immunity, B cell life cycle, and antibody production and quantity (table S5).

Functional network of loci associated with IgG N-glycosylation

Next, we investigated how genes in associated loci are functionally connected with each other. We assume that two loci with a similar glycome-wide effect (effect of the top SNP in the locus on all glycan traits) are likely to have a similar role in the regulation of glycosylation and may be a part of the same biological pathway. Our assumption is based on the following. We analyze the glycosylation of a single protein (IgG) and determine 77 different glycan traits. These traits arise as a result of the activity of different enzymes in the glycosylation biosynthesis pathway. It can be expected that changing the rate of a specific reaction would lead to specific changes of glycan profile, regardless of the specific mechanism (e.g., alteration in substrate availability, lower activity of enzyme as a result of a mutation changing its structure, a regulatory variation changing the level of expression of a gene encoding relevant enzyme), leading to the rate change. For example, it can be expected that a regulatory variant that is associated with expression levels of glycosyltransferase gene MGAT3 [mannosyl (β-1,4-)-glycoprotein β-1,4-N-acetylglucosaminyltransferase] will have an influence on all glycan traits that contain bisecting GlcNAc. This could be a variant changing the structure of this acetylglucosaminyltransferase, a regulatory variant in the promoter of the MGAT3 gene, a variant disrupting an enhancer of MGAT3, or it could be a regulatory variant that changes expression of a transcription factor (TF) binding to the enhancer of MGAT3. Either way, the rate of reaction controlled by the product of MGAT3 would change, leading to a similar effect on all glycan traits that contain bisecting GlcNAc.

To create specific hypotheses regarding the regulatory functions of our loci, we defined the glycome-wide effect of an SNP as a vector of effects [in the form of z scores, where z score = β/se(β)] of the given SNP on each of the glycans. As can be seen in fig. S1, some loci have notably similar effects on all glycan traits. We therefore constructed a functional network by computing Spearman pairwise correlations of glycome-wide effects of all 27 SNPs that tag each locus. Given that the direction of the GWAS estimates depends on the allele used as a reference, the sign of correlation is not informative in this analysis. Therefore, we report absolute values of correlation coefficients. Although nodes in this network represent effects of top SNPs, we marked nodes by using the names of candidate genes in each locus, and we used these names to denote loci throughout the text. Below, we focus our discussion only on clusters that contain glycosyltransferases (MGAT3, FUT8, B4GALT1, and ST6GAL1), because these enzymes catalyze the transfer of monosaccharides to a growing glycan chain and therefore have a known and well-established role in the biosynthesis of glycans. As outlined above, we hypothesize that any locus with a similar glycome-wide effect to a glycosyltransferase locus is likely to either regulate the expression levels of the enzyme or modulate its activity through other indirect effects.

To validate the functional network, we applied both computational and experimental approaches. We first pruned the correlation network for significant correlations [P ≤ 0.05/(27 × 26)/2 ≈ 1.4 × 10−4] and performed permutation analysis with 100,000 random SNPs. We also compared our network with the STRING human PPI database (16).

The strongest overall correlation, a Spearman’s ρ of 0.97, was observed between IgG–N-glycome–wide effects of the top SNP in the MGAT3 locus and the top SNP in the SMARCB1-DELR3-CHCHD10-VPREB3 locus (Fig. 2A). None of the randomly selected SNPs exhibited such a strong correlation with these two loci in the permutation analysis (Table 2 and table S6). Recent work by Sharapov et al. (17) on genetics of the total plasma proteome N-glycosylation, of which IgG is a major part, suggested that the likely candidate gene is DERL3. DERL3 encodes a protein involved in endoplasmic reticulum–associated degradation for misfolded luminal glycoproteins. Data provided in this study shift the balance of evidence toward the SMARCB1, as it is unlikely that modulation of glycoprotein degradation would result in effects similar to those observed when perturbing MGAT3.

Fig. 2 Functional network of loci associated with IgG N-glycosylation.

Correlation estimates are computed on the basis of squared pairwise Spearman’s correlation of SNP effects. The loci are denoted with names of genes that were prioritized in regions tagged by the given lead SNP. (A) Functional network of loci associated with IgG N-glycosylation. In this network, each node represents a lead SNP in the locus, and each edge represents the squared correlation of glycome-wide effects of the two nodes. Only significant correlations after multiple testing correction (P ≤ 1.4 × 10−4) are shown. The thickness and intensity of edges depend on variation in one locus explained by the effect estimates in the second locus. Round-edged rectangular nodes denote genes that are, according to GO, involved in glycosylation; purple-edged nodes denote genes involved in immune system processes; green nodes denote loci containing genes involved in transcription regulation; orange nodes denote glycosyltransferases; and blue rectangles indicate diseases pleiotropic with IgG glycans in the given locus (Table 3). (B) Hierarchical clustering of pairwise Spearman’s locus-effect correlations.

Table 2 In silico evidence for edges from functional network of IgG N-glycosylation loci.

Permutation analysis was used to assess the distribution of glycome-wide SNP effect pairwise correlations and the STRING PPI database (16) to search for biological validation of observed links. All reported correlations are statistically significant after Bonferroni correction. Corr, Spearman’s correlation coefficient of z scores of SNP1 and z scores of SNP2; quant, quantile of distribution at which the given ρ2 was observed in permutation analysis; STRING score, probability that the link exists obtained by combining evidence from different sources. SNPs that were replicated in UPLC replication are in bold.

View this table:

The second strongest association, a Spearman’s ρ of 0.94, was between lead SNPs in TFs RUNX3 and RUNX1 loci. This link was validated in the permutation analysis and observed in the STRING PPI network. RUNX3 and RUNX1 loci were also strongly correlated with both the MGAT3 and SMARCB1-DELR3-CHCHD10-VPREB3 loci (Table 2). A proxy for the top SNP in the MGAT3 locus, SNP rs8137426 (LD R2 = 1 with rs5750830), is located in a region bound by TF RUNX3, although not directly within its binding motif (fig. S2A). In our SMR/HEIDI analysis, we found evidence of the same association having a pleiotropic effect on both MGAT3 expression in CD19+ B cells and IGP40, a glycan trait related with bisecting GlcNAc ( This suggests that the associated SNP could influence binding of TFs (RUNX1 or RUNX3), which, together with the chromatin remodeling protein SMARCB1, regulate expression of MGAT3, resulting in an increased incidence of bisecting GlcNAc in all fucosylated disialylated structures of IgG (IGP40; Table 1).

The FUT8 locus had the strongest glycome-wide association with the ORMDL3-GSDMB-IKZF3-ZPBP2 and IKZF1 loci (ρ = 0.71 and 0.62). The latter two loci also show highly similar glycome-wide effects with each other (ρ = 0.78) and were observed as interacting in the STRING PPI network (Table 2 and table S7).

Two remaining galactosyltransferases had the strongest correlations with loci where the underlying hypothesis of regulation is less straightforward. The most strongly correlated loci with the galactosyltransferase B4GALT1 locus were SPINK4 and HIVEP2 (ρ = 0.8 and 0.71). Top SNPs from the NXPE1-NXPE4 and ELL2 loci had the most similar glycome-wide effects to the top SNP from the sialyltransferase ST6GAL1 locus (ρ = 0.77 and 0.58). Full results of the permutation analysis and STRING PPI are available at the online resource (

Last, we assessed whether there is evidence of a subnetwork of genes regulating each specific glycosyltransferase by performing hierarchical clustering on the glycome-wide SNP effect correlation matrix. We observed three clusters: the cluster with MGAT3, the cluster with FUT8, and the cluster with B4GALT1 and ST6GAL1. Several loci containing genes coding for TFs cluster together with the MGAT3 locus, suggesting a complex regulatory network for this glycosyltransferase gene (and/or for the factors regulating this gene, i.e., RUNX1 is repressed by RUNX3) (Fig. 2B).

To further explore this possibility, we analyzed whether IgG glycosylation–associated SNPs were predicted to disrupt binding sites of TFs encoded by genes from this network more often than by chance. To assess the influence of glycosylation SNPs on binding of TFs from our network (IKZF1, IKZF3, RUNX1, RUNX3, IRF1, SMARCB1, and TBX21), we used Regulatory Sequence Analysis Tools (18). Briefly, if an associated SNP maps to the conserved position in the TF-binding motif and its effect allele is different than the conserved allele at that position, the SNP will be predicted to strongly affect binding of the TF. Alternatively, the allele can also completely disrupt or create the binding motif, resulting in a loss of a known or introduction of a new TF-binding site. In four loci spanning the glycosyltransferase genes, the associated SNPs resulted in either introduction or disruption of binding sites for TFs RUNX1, RUNX3, IKZF1, SMARCB1, TBX21, and IRF1. We observed the same disruption/introduction of TF-binding sites for all these TFs too and therefore confirmed some well-known regulation feedback loops between TFs RUNX1 and RUNX3 and IKZF1 and IKZF3 (table S8) (19, 20). To assess whether these TF-binding site alterations are specific for associated SNPs, we compared the frequency of TF-binding alterations of associated and nonassociated SNPs. In seven glycosylation loci, the associated SNPs were at least two times more likely to affect binding of TF than nonassociated SNPs from the same region. The highest difference in the number of SNPs affecting the TF-binding site was observed for FUT8, MGAT3, and loci coding for TFs that are members of their cluster. Associated SNPs from the FUT8 and MGAT3 loci have a significant effect on binding of TF from our network, while nonassociated SNPs from their proximity have no effect on binding (table S9). In summary, we not only found TFs associated with IgG glycosylation but also showed that glycosylation-associated SNPs can potentially alter their binding in other glycosylation-associated loci. These results not only reinforce some of the links suggested in our network but also confirm some known feedback loops between different TFs involved in B cell differentiation and biology (e.g., interactions of IKZF1 and IKZF3 and RUNX1 and RUNX3) (19, 20).

Experimental validation of the functional links between IKZF1 and FUT8

Our network-based approach allowed us to show that variants in or near IKZF1 and IKZF3 share glycome-wide similarities with the top SNP in the FUT8 locus. We hypothesized that the TFs IKZF1 and IKZF3 may regulate the expression of FUT8. We further observed notable pleiotropy between variants in the IKZF3-ORMDL3-GSDMB-ZPBP2 locus with inflammatory conditions including inflammatory bowel disease (IBD), ulcerative colitis (UC), and rheumatoid arthritis (RA) (Fig. 2). We therefore aimed to disrupt this node of our network in our cellular model. Microarray and RNA sequencing data from the lymphoblastoid cell line (LCL) used in this study showed that IKZF1 was more highly expressed than IKZF3. Because these genes are known to regulate one another, we decided to target IKZF1. To investigate the potential link between IKZF1 and FUT8, we identified IKZF1-binding sites on chromosome 14 near FUT8 in the LCL GM12878 Encode chromatin immunoprecipitation sequencing (ChIP-seq) data. We performed ChIP followed by polymerase chain reaction (PCR) to confirm IKZF1 binding at one of these sites in an IgG-secreting human B cell–derived LCL, MATAT6 (Fig. 3A). We then investigated the role of IKZF1 in the regulation of FUT8 by conducting stable short hairpin RNA (shRNA)–mediated knockdown of IKZF1 using shRNA in MATAT6 cells. This resulted in depletion of the IKZF1 transcript and protein (Fig. 3, B and C). Knockdown of IKZF1 was accompanied by a significant down-regulation of IKZF3 (Fig. 3D) and a 3.5-fold up-regulation in the expression of FUT8 (t test, P = 0.04; Fig. 3E), corroborating a link between IKZF1 and FUT8 and highlighting the dynamic relationships between TFs in B cells. Next, we looked at changes in the glycosylation of secreted IgG in cells with depleted IKZF1 at two different time points and observed a small but significant (P = 0.02; table S10) increase in fucosylation (Fig. 3F) resulting from IKZF1 knockdown. While this change is small, the low SE of triplicate measurements and independent replication of the experiment (second time point) demonstrate its robustness (table S10). In addition, one must consider that most of the glycans are fucosylated and that further fucosylation might be limited by specificities of glycosyltransferases involved in the process—the addition of bisecting GlcNAc to a growing glycan chain by MGAT3 inhibits fucosylation of these glycans (21). This change also results in a 20% decrease (from 5 to 4%) in afucosylated glycans [glycoforms activating antibody-dependent cellular cytotoxicity (ADCC)]. These data are consistent with decreased expression of IKZF1 resulting in increased FUT8 expression and, thus, increased IgG fucosylation. Hence, although there is no in silico evidence that the lead SNP in the FUT8 locus overlaps with a predicted IKZF1-binding motif, and there is no evidence for the associated SNP being a FUT8 eQTL, we show that SNPs that are in LD with the lead FUT8 SNP can alter binding of IKZF1 (table S8). Furthermore, this TF can bind to regulatory regions of FUT8 and may act cooperatively with additional factors to modify its expression. To explore this further, we used publicly available Hi-C profiling performed at 1-kb resolution in the LCL GM12878. These data indicate that SNPs in the FUT8 locus lie in the same chromosomal topologically associating domain as the transcription start site of FUT8 (fig. S2B). CCCTC-binding factor (CTCF) plays a crucial role in maintaining the three-dimensional structure of the genome, and disruption of CTCF binding by SNPs in LD with rs4400971 may modify the interactions formed between enhancer and promoter regions of FUT8 to affect expression of the glycosyltransferase enzyme (for details, see Supplementary Note).