Glycosylation of immunoglobulin G is regulated by a large network of genes pleiotropic with inflammatory diseases

Variation in key transcription factors and glycogenes modifies IgG glycosylation and has an influence on inflammatory diseases.


The PDF file includes:
Supplementary Note Appendix Table 1. Participating studies. Appendix Table 2. Genotyping overview. Appendix Table 3. Overview of imputation software and reference panels. Appendix Table 4. Description of structures for UPLC IgG glycans. Appendix Table 5. Details of genome-wide association analyses. Appendix Table 6. Individual GWAS file-level quality control. Appendix Table 7. Description of structures for LCMS glycans. Appendix Table 8. Summary-level statistics downloaded for SMR or HEIDI test. Appendix Table 9. DEPICT gene prioritization. Appendix Table 10. SNPs associated with IgG glycosylation with nonsynonymous amino acid change. Appendix Table 11. Strongest eQTL for each probe in each cell type in the CEDAR dataset. Appendix Figure 1. Phenotypic correlation of UPLC IgG N-glycans. Fig. S1. Glycome-wide effect estimates of genome-wide significant loci. Fig. S2A. RUNX3 binding in the proximity of MGAT3 rs8137426, SNP strongly associated with IgG N-glycosylation is the region bound by RUNX3, in the proximity of MGAT3.

Methods
Descriptions of the glycan data can be found in Huffman et al (46).

Isolation of IgG and glycan analysis
Isolation of IgG and glycan analysis has been described in a greater detail in the following studies: CROATIA-Korcula and CROATIA-Vis(36), ORCADES(47), TWINSUK(48), COGS, EGCUT,FINRISK(49). Details of the liquid chromatography electrospray mass spectrometry used to quantify Fc bound IgG glycopeptides can be found in Plomp et al(46). IgG isolation. IgG was isolated from blood plasma samples (70-80 uL) by affinity chromatography binding to CIM® Protein G 96-well plate (BIA Separations, Ajdovščina, Slovenia), using vacuum manifold (Millipore Corporation, Billerica, MA, USA). All steps during the isolation procedure were performed at around 380 mm Hg, except plasma sample application and IgG elution steps (around 200 mm Hg). Protein G plate was washed with 2 mL ultra-pure water (18 MΩ cm at 25 °C), 2 mL 1x PBS pH 7.4, 1 mL 0.1 mol L-1 formic acid; 2 mL 10x PBS; and equilibrated with 4 mL 1x PBS pH 7.4. Plasma samples, ψ(sample, 1x PBS pH 7.4) = 1:7, were applied to the protein G plate.
Unbound proteins were washed away with 3x 2 mL 1x PBS pH 7.4. Bound IgG was eluted with 1 mL 0.1 mol L-1 formic acid and neutralized with 1 mol L-1 ammonium hydrogencarbonate to pH 7.0.
Protein G plate was then washed with 1 mL 0.1 mol L-1 formic acid, 2 mL 10x PBS, 4 mL 1x PBS pH 7.4, 2 mL storage buffer (ethanol φ = 20 %, 20 mmol L-1 Tris, 0.1 mol L-1 NaCl, titrated with HCl to pH 7.4), and plate stored at 4 °C. N-glycan release and labelling. The whole procedure was done in a 96-well plate manner and ultra-pure water was used throughout. Dried IgG samples (150-200 ug) were denatured with the addition of 30 μl SDS (γ = 13.3 g L-1) and by incubation at 65 °C for 10 min. After cooling down to room temperature for 30 min, 10 μl of Igepal-CA630 (φ = 4 %) was added and mixture was shaken for 15 min on a plate shaker. N-glycans were released with the addition of 1.2 U of PNGase F (Promega, Madison, WI, USA) in 10 μL 5x PBS and incubation at 37 °C for 18 hours. Released N-glycans were labeled with 2-AB. The labeling mixture was freshly prepared by dissolving 2-AB (final γ = 19.2 mg mL-1) and 2-PB (final γ = 44.8 mg mL-1) in the mixture of DMSO and glacial acetic acid, ψ(DMSO, CH3COOH) = 7:3. To each N-glycan sample in the 96-well plate 25 μl of labeling mixture was added and the plate was sealed using adhesive seal. Mixing was achieved by shaking for 10 min, followed by 2 hour incubation at 65 °C.
Clean-up of 2-AB labeled glycans. Free fluorescent label, excess of reagents and proteins were removed from the samples using hydrophilic interaction liquid chromatography solid phase extraction (HILIC-SPE). After cooling down to room temperature for 30 min, 700 μL of acetonitrile (previously cooled down to 4 °C) was added to each sample (total volume of 775 μl). Clean-up procedure was performed on hydrophilic 0.2 μm AcroPrep GHP filter plate. Solvent was removed by application of vacuum using a vacuum manifold at around 25 mm Hg. All wells were prewashed with 200 μL ethanol in water (φ = 70 %), 200 μL ultrapure water and 200 μL acetonitrile in water (φ = 96 %, previously cooled down to 4 °C). The samples were loaded to the wells, and after short incubation subsequently washed 5x 200 μL acetonitrile in water (φ = 96 %, previously cooled down to 4 °C). Glycans were eluted with 2x 90 μL of ultra-pure water after 15 min shaking at room temperature and combined eluates were stored at -20 °C until ultra performance liquid chromatography (UPLC) analysis.
Glycan analysis by ultra performance liquid chromatography. Fluorescently labeled Nglycans were separated by HILIC on an Acquity UPLC instrument (Waters, Milford, MA, USA) consisting of a quaternary solvent manager, sample manager and a FLR fluorescence detector set with excitation and emission wavelengths of 250 and 428 nm, respectively. The instrument was under the control of Empower 3 software, build 3471 (Waters). Labeled Nglycans were separated on a Waters BEH Glycan chromatography column, 100 × 2.1 mm i.d., 1.7 μm BEH particles, with 100 mmol L-1 ammonium formate, pH 4.4, as solvent A and acetonitrile as solvent B. The samples were prepared in acetonitrile, ψ(sample, acetonitrile) = 20:80. The separation method used a linear gradient of 75% to 62% acetonitrile (vol/vol) at flow rate of 0.4 mL/min in a 25-minute analytical run. Samples were maintained at 10 °C before injection, and the separation temperature was 60 °C. The system was calibrated using an external standard of hydrolyzed and 2-AB labeled glucose oligomers from which the retention times for the individual glycans were converted to glucose units. Data processing was performed using an automatic processing method with a traditional integration algorithm after which each chromatogram was manually corrected to maintain the same intervals of integration for all the samples. The chromatograms were all separated in the same manner into 24 peaks and the amount of glycans in each peak was expressed as percentage of total integrated area (% Area) (36).

Participating studies and genetic data pre-processing
All genotyping, quality control, imputation and genome-wide association studies were performed in the respective research centres. Details of participating studies can be seen in Appendix Table 1 and details of genetic analyses can be seen in Appendix Tables 2, 3, and 4. Glycan data pre-processing

Appendix
Each of the 24 raw glycan intensities was divided by the total chromatographic area of the sample, followed by log transformation and exclusion of extreme values that most likely present technical outliers. Batch correction was performed with an empirical Bayes method (61), as implemented in the "ComBat" function of "sva" (37) R package. Derived traits, traits combining glycans with similar structural and chemical properties (for examplefucosylationsum of all glycans that have core fucose) were then computed on exponentiated measured glycans, using the "iudt" and "ildt" functions from the glycanr package (62), resulting in a final list of 77 glycan traits(36). The list of directly measured and derived glycan traits, together with formulas used for their calculation can be seen in the Appendix Table 4. The same procedure, with the difference of normalising glycopeptide data within subclass rather than on the whole dataset was applied to LCMS glycan data. The percentage of sialylation of fucosylated galactosylated structures without bisecting GlcNAc in total IgG glycans IGP25 SUM(GP19 + GP24) / SUM(GP19 + GP24 + GP10 + GP11 + GP15) * 100

Genome-wide association studies
Prior to analyses, each cohort received quality controlled glycan data (glycan data were preprocessed by the phenotype provider) and an analysis plan for genome-wide association studies. GWAS were performed on age, sex and cohort specific covariate corrected glycan data and imputed genotypes, taking into account relatedness and population stratification and assuming an additive model of association. as well as verification of trait transformations. A summary of quality control outcomes can be seen in Appendix Table 6.
Each cleaned set of individual GWA studies was pooled independently by two analysts.
Meta-analyses were performed using fixed effect inverse-variance method implemented in METAL (39). Effect estimates and p-values of the final meta-analyses were compared and showed no inconsistencies.
Prior to meta-analysis, each GWA study was corrected for genomic control inflation factor.
The genomic control inflation factor λ varied from 0.95 to 1.05, suggesting little residual influence of population stratification. Additional genomic control was performed on the aggregated meta-analysis results.

SMR/HEIDI analysis for pleiotropy with gene expression and complex traits
We used Summary data-based Mendelian Randomization (SMR) analysis followed by the

Functional network of genes associated with IgG N-glycosylation
Only 23 glycans are directly measured and all others are a function of these measured traits and are, therefore, correlated to different extent. Especially high phenotypic correlation can be observed between IGP1-IG14 and IGP41-IGP54 (Appendix Figure 1). These traits represent the same glycan traits but normalised in a different fashion. Since charged glycans are more prone to experimental error, glycan traits IGP41-IGP54 are defined as IGP1-IGP15 normalised by the total area of these peaks (instead of the total area of the whole chromatogram). Because of the redundancy of these glycan traits, they were removed prior the association analysis. Permutation analysis was then performed to assess the impact of correlations between remaining glycans on the glycome-wide pairwise SNP effect associations and additionally validate the observed network edges as described in the main Methods.

Appendix Figure 1. Phenotypic correlation of UPLC IgG N-glycans.
ChIP-seq data (.narrowPeak files) for the transcription factors (TFs) that we reported as associated with IgG glycosylation (Supplementary Allele specific binding of the TFs was tested using the matrix-scan tool (80) in RSAT.
Sequences surrounding the top SNPs from the GWAS were obtained from ensemble (81).
Position weight matrices (PWMs) of the discovered motifs outputted from peak-motifs were used to determine whether these SNPs fall within the binding sites of the TFs. The same analysis was performed for all significantly and suggestively associated IgG glycosylation SNPs (N=1723) and non-associated SNPs within 50kb of glycosylation loci (N=4389).
Allele specific binding was estimated by comparing binding weights of the motifs to the sequence were between reference and alternate alleles. A SNP was considered to influence TF binding if one of the alleles significantly affected its binding (p≤ 0.05/2764712 ~ 1.8 × 10 −8 , corrected for the total number of performed tests).

In vitro validation of the functional links between IKZF1 and FUT8
ShRNA cloning, cell culture, transfections and shRNA knockdown were determined using paired t-test.
Secondary antibodies were horseradish peroxidase conjugated anti-rabbit IgG, or anti-mouse IgG (Sigma Aldrich). Antibody detection was achieved using the enhanced chemiluminescent (ECL) detection system (Amersham Biosciences, UK).

Chromatin Immunoprecipitation (ChIP)
ChIP assays were performed using ExactaChIP buffers (R&D Systems), as described by the manufacturer except for the following modifications. MATAT6 cells cultured at 1 × 10 6 cells/ml of media were collected by centrifugation and resuspended in fresh media. Cells were cross-linked in a final formaldehyde concentration of 1% (w/v) for 5 minutes, followed by the addition of glycine to 125mM and incubation for 10 min. After washing with cold PBS, 5 × 10 6 cells were sonicated 15 times in 30 s pulses at maximum power in 500 μl Lysis Buffer (ExactaChIP, R&D Systems) and Complete EDTA-free protease inhibitor (Roche) to shear chromatin to an average length of 500 bp. Lysates were cleared by centrifugation for 10 minutes at 12,000 g, and then the supernatant was diluted in Dilution Buffer (ExactaChIP, R&D Systems). ChIP input was incubated with either anti-IKZF1 (AF4984, R&D Systems) or Goat IgG Isotype Control Antibody (AB108C, R&D Systems) overnight at 4ºC with rotation. 5 μg of biotinylated anti-goat IgG (BAF109, R&D Systems) was added for a further 2 hours, prior to the addition of Streptavidin-agarose beads (Sigma Aldrich) for 1 hour.
Agarose beads were collected by centrifugation, and sequential washes performed with Wash Buffer 1 -Wash Buffer 4 (R&D Systems). After the final wash, Chelating Resin solution was added to the beads and the samples boiled for 10 minutes. Immunoprecipitated DNA or input material was incubated at 65°C to remove cross-links, and purified using the Qiagen PCR purification kit. ChIP-PCR was performed using primers flanking a binding site upstream of to confirm Ikaros-DNA complexes also occur in MATAT6 cells. PCR products were analysed on 2% (w/v) agarose gel by electrophoresis in Tris-borate EDTA buffer.

Prioritising genes associated with IgG N-glycosylation
First, we investigated each candidate gene's functional similarity to members of enriched gene-sets. Data-driven Expression-Prioritized Integration for Complex Traits (DEPICT) (10) significantly prioritised genes in 14 loci at FDR ≤ 0.05 and another 3 at FDR ≤ 0.2 (Appendix Table 9). Five loci span glycosyltransferase genes and three of these were selected  Table 9.
We next checked if any of the associated SNPs ( Table 10).
The  Table 11). Full results of this analysis can be seen in the online resource available at https://shiny.igmm.ed.ac.uk/igg_glycans_gwas/.

Additional validation of the functional network regulating IgG N-glycosylation
We hypothesised that the transcription factors in the functional network may be involved directly in regulation of the expression of glycosyltransferase genes in IgG secreting cells.
The regulatory potential of the associated variants in B-cells, and the impact upon transcription factor binding, was inferred by interrogation of ENCODE chromatinimmunoprecitation sequencing (ChIP-seq) experiments performed in the lymphoblastoid cell line (LCL) GM12878, or by comparison to known binding site motifs. As shown in Supplementary Table 9, a striking number of the variants tested were predicted to have strong impact on binding weight scores obtained for RUNX1, RUNX3 and IRF1, providing further evidence to corroborate the links in the functional network in Figure 2a.
FUT8 had the strongest glycome-wide association with the ORMDL3-GSDMB-IKZF3-ZPBP2 and IKZF1 loci. Unlike rs5750830, the lead SNP for the MGAT3 loci, we found no evidence that the six variants contributing to variation in IGP42 glycan level in the region of FUT8 (Supplementary Table 3) are eQTLs for FUT8, or that they alter binding sites for IKZF1 using RSAT for motif discovery in GM12878 Encode IKZF1 Chip-Seq data. We next identified IKZF1 binding sites on chromosome 14 near to FUT8 in the GM12878 Encode ChIP-Seq data, and performed chromatin immunoprecipitation (ChIP) assays to investigate occupancy of these sites. In the GM12878 Encode ChIP-Seq data, 10 sites upstream of FUT8 were occupied by IKZF1. In an IgG secreting LCL, IKZF1 bound most strongly to chr14: 65757755-65758024 (Figure 3a).
Interrogation of sequences surrounding SNPs in high LD with rs11847263 (top FUT8 SNP): rs1054218, rs8006608, rs4073416, rs4902409 or rs4400971 predicted that variants on the same haplotype will affect IRF, RUNX1, RUNX3 and CTCF binding sites (Supplementary Table 9). Due to the low frequency of this haplotype, we were unable to identify a lymphoblastoid cell line (LCL) with appropriate genotype to assess the functional consequences of these putative alterations in transcription factor and CTCF binding.
Publically available Hi-C profiling (45)   LD -linkage disequilibrium between the given and the SNP in the next row, as estimated with Broad Institute's SNAP tool. p-value -the lowest p-value for the given SNP in the discovery meta-analysis. In bold -replicated loci.