## Abstract

Genotype-by-environment interaction (GEI) is a fundamental component in understanding complex trait variation. However, it remains challenging to identify genetic variants with GEI effects in humans largely because of the small effect sizes and the difficulty of monitoring environmental fluctuations. Here, we demonstrate that GEI can be inferred from genetic variants associated with phenotypic variability in a large sample without the need of measuring environmental factors. We performed a genome-wide variance quantitative trait locus (vQTL) analysis of ~5.6 million variants on 348,501 unrelated individuals of European ancestry for 13 quantitative traits in the UK Biobank and identified 75 significant vQTLs with *P* < 2.0 × 10^{−9} for 9 traits, especially for those related to obesity. Direct GEI analysis with five environmental factors showed that the vQTLs were strongly enriched with GEI effects. Our results indicate pervasive GEI effects for obesity-related traits and demonstrate the detection of GEI without environmental data.

## INTRODUCTION

Most human traits are complex because they are affected by many genetic and environmental factors as well as potential interactions between them (*1*, *2*). Despite the long history of effort (*3*–*5*), there has been limited success in identifying genotype-by-environment interaction (GEI) effects in humans (*5*–*8*). This is likely because many environmental exposures are unknown or difficult to record during the life course and because the effect sizes of GEI are small, given the polygenic nature of most human traits (*9*–*11*), so that the sample sizes of most previous studies are not large enough to detect the small GEI effects. For model complex traits such as body mass index (BMI), GEI analyses have been limited to GEI tests at known BMI loci (*12*–*14*) or estimation of GEI variance captured by all common SNPs (*15*, *16*).

GEI effect of a genetic variant on a quantitative trait could lead to differences in variance of the trait among groups of individuals with different variant genotypes (Fig. 1, A and B, and note S1). GEI can therefore be inferred from a variance quantitative trait locus (vQTL) analysis (*17*), although there are other explanations for an observed vQTL such as direct effect on phenotypic dispersion [e.g., induced by selection (*18*)], epistasis (*17*), and phantom vQTL (*19*, *20*). Unlike the classical QTL analysis that tests the allelic substitution effect of a variant on the mean of a phenotype (Fig. 1C), vQTL analysis tests the allelic substitution effect on the trait variance (Fig. 1, B or D). In comparison to the analyses that perform direct GEI tests, vQTL analysis is more flexible because it does not require measures of environmental factors and thus can be performed in a very large sample where the environmental factors are unknown, unavailable, or incomplete (*21*). Of course, the vQTL test is less powerful than the direct GEI test if the corresponding environmental factor has been measured on all the genotyped individuals in the sample (*17*). Although there had been empirical evidence for the genetic control of phenotypic variance in livestock for decades (*22*, *23*), it was not until recent years that genome-wide vQTL analysis was applied in humans (*17*, *24*, *25*), and only a handful of vQTLs have been identified for a limited number of traits [e.g., the *FTO* locus for BMI (*25*)] owing to small effect sizes of vQTLs. The availability of data from large biobank-based genome-wide association studies (GWAS) (*26*, *27*) provides an opportunity to interrogate the genome for vQTLs for a range of phenotypes in cohorts with unprecedented sample size.

On the other hand, statistical methods for vQTL analysis are not entirely mature (*21*). There have been a series of classical nonparametric methods (*28*), originally developed to detect violation of the homogeneous variance assumption in linear regression model, which can be used to detect vQTLs, including the Bartlett’s test (*29*), the Levene’s test (*30*, *31*), and the Fligner-Killeen (FK) test (*32*). Recently, more flexible parametric models have been proposed, including the double generalized linear model (DGLM) (*33*–*35*) and the likelihood ratio test for variance effect (LRT_{V}) (*19*). In addition, it has been suggested that the transformation of phenotype that alters phenotype distribution also has an influence on the power and/or false-positive rate (FPR) of a vQTL analysis (*24*, *36*).

In this study, we calibrated the most commonly used statistical methods for vQTL analysis by extensive simulations. We then used the best performing method to conduct a genome-wide vQTL analysis for 13 quantitative traits in 348,501 unrelated individuals using the UK Biobank (UKB) data (*26*). We further investigated whether the detected vQTLs are enriched for GEI by conducting a direct GEI test for the vQTLs with five environmental factors (or covariates).

## RESULTS

### Evaluation of the vQTL methods by simulation

We used simulations to quantify the FPR and power (i.e., true-positive rate) for the vQTL methods and phenotype processing strategies (Methods). We first simulated a quantitative trait based on a simulated single-nucleotide polymorphism (SNP), i.e., a single-SNP model, under a number of different scenarios, namely, (i) five different distributions for the random error term (i.e., individual-specific environment effect) and (ii) four different types of SNP with or without the effect on mean or variance (Methods). We used the simulated data to compare the four most widely used vQTL methods, namely, Bartlett’s test (*29*), Levene’s test (*30*, *31*), FK test (*32*), and DGLM (*33*–*35*). We observed no inflation in FPR for the Levene’s test under the null (i.e., no vQTL effect) regardless of the skew or kurtosis of the phenotype distribution or the presence or absence of SNP effect on the mean (fig. S1A). These findings are in line with the results from previous studies (*24*, *28*, *37*) that the Levene’s test is robust to the distribution of the phenotype. The FPR of the Bartlett’s test or DGLM was inflated if the phenotype distribution was skewed or heavy-tailed (fig. S1A). The FK test seemed to be robust to kurtosis but vulnerable to skewness of the phenotype distribution (fig. S1A). Because the Levene’s test performed the best in the simulations, for this test, we investigated the impact of nonlinear transformations of the phenotype by considering logarithm [log(*y*)], square (*y*^{2}), cube (*y*^{3}), and rank-based inverse-normal transformation (RINT) and found that these nonlinear transformations could result in inflated FPR (fig. S1B).

To simulate more complex scenarios, we used a multiple-SNP model with two covariates (age and sex) and different numbers of SNPs (Fig. 2). The results were similar to those described above, although the power of the Levene’s test decreased with an increase of the number of causal SNPs (Fig. 2A). Nonlinear transformations led to an inflated FPR when the variance explained by a QTL effect (i.e., SNP effect on mean) was relatively large and a loss of power of vQTL detection when the per-QTL variance explained was relatively small, although logarithm transformation did not seem to affect the power (Fig. 2B). These results also suggested that pre-adjusting the phenotype by covariates slightly increased the power (Fig. 2B). On the basis of the results of these simulations, we used the Levene’s test, a one-way analysis of variance (ANOVA) to test for absolute deviations from the medians (Methods), for real data analysis, with the phenotypes pre-adjusted for covariates without any nonlinear transformation.

### Genome-wide vQTL analysis for 13 UKB traits

We performed a genome-wide vQTL analysis using the Levene’s test with 5,554,549 genotyped or imputed common variants on 348,501 unrelated individuals of European ancestry for 13 quantitative traits in the UKB (Methods; table S1A and fig. S2A) (*26*). For each trait, we pre-adjusted the phenotype for age and the first 10 principal components (PCs; derived from SNP data) and standardized the residuals to *z* scores in each gender group (Methods). This process removed not only the effects of age and the first 10 PCs on the phenotype but also the differences in mean and variance between the two genders. We excluded individuals with adjusted phenotypes more than 5 SDs from the mean and removed SNPs with minor allele frequency (MAF) smaller than 0.05 to avoid potential false-positive associations due to the coincidence of a low-frequency variant with an outlier phenotype (see fig. S3A for an example). We acknowledge that this process could potentially result in a loss of power, but this can be compensated for by the use of a very large sample (*n* ~ 350,000).

With an experiment-wise significant threshold of 2.0 × 10^{−9} [i.e., 1 × 10^{−8}/5.0, with 1 × 10^{−8} being a more stringent genome-wide significant threshold recommended by recent studies (*38*, *39*) and 5.0 being the effective number of independent traits (note S4)], we identified 75 vQTLs [independent to linkage disequilibrium (LD) *r*^{2} < 0.01 within trait] across the nine traits (Fig. 3, Table 1, and table S2A). There was no vQTL for height, consistent with the observation in a previous study (*25*). We identified more than 15 vQTLs for each of the three obesity-related traits, i.e., BMI, waist circumference (WC), and hip circumference (HC) (Table 1). The 75 vQTLs were located at 41 near-independent loci after excluding one of each between-trait pair of top vQTL SNPs (i.e., the SNP with lowest vQTL *P* value at each vQTL association peak) with LD *r*^{2} > 0.01, suggesting that some of the loci were associated with the phenotypic variance of multiple traits. For example, the *FTO* locus was associated with the phenotypic variance of WC, HC, BMI, body fat percentage (BFP), and basal metabolic rate (BMR) (fig. S3B), and the vQTL associations were likely to be driven by a shared causal variant with pleiotropic vQTL effects on multiple traits (fig. S3C). For the lung function–related traits, there was no significant vQTL for forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) but there were three vQTLs for the FEV1/FVC ratio (FFR). There was no evidence for an effect of MAF on vQTL test statistic at the 41 independent loci (fig. S3D), consistent with the observation in a previous study (*25*).

The Levene’s test assesses the difference in variance among three genotype groups free from the assumption about additivity (i.e., the vQTL effect of carrying two copies of the effect allele is not assumed to be twice that carrying one copy). We found two vQTLs (i.e., rs141783576 and rs10456362) potentially showing nonadditive genetic effect on the variance of HC and BMR, respectively (table S2A).

To demonstrate the vulnerability of vQTL analysis to nonlinear transformations in real data, we performed genome-wide vQTL analysis for height squared and cubed. There was no genome-wide significant vQTL for height squared but one genome-wide significant vQTL for height cubed, which was very likely to be driven by a strong QTL signal for height [*P*_{QTL(Height)} = 4.35 × 10^{−150}] (fig. S3, E and F), consistent with our simulation results that nonlinear transformations could inflate the vQTL test statistics in the presence of a strong QTL signal (Fig. 2B and fig. S1B). Although we have not applied any nonlinear transformation to the UKB traits, some of them are nonlinear functions of other traits, i.e., BMI (= WT/HT^{2}), FFR (= FEV1/FVC), and WHR (= WC/HC). We therefore explored whether the BMI, FFR, and WHR vQTLs were driven by the nonlinear functions by testing the variance effects of the BMI, FFR, and WHR vQTLs on 1/HT2, 1/FVC, and 1/HC, respectively. There were 26 tests in total, none of which reached the experiment-wise significance level (i.e., 2.0 × 10^{−9}) used to claim vQTLs in this study and 23 of which had a *P* value larger than 0.05 (table S2B), suggesting that the BMI, FFR, and WHR vQTLs were not driven by the nonlinear functions. Although the variance effect of an FFR vQTL (rs56077333) on 1/FVC was significant after correcting for 26 tests (*P* = 5.11 × 10^{−6}; table S2B), the effect of rs56077333 on the variance of 1/FVC was not large enough to drive the vQTL signal for FFR and rs56077333 has a known GEI effect on lung function (see below for more details).

### GWAS analysis for the 13 UKB traits

To investigate whether the SNPs with effects on variance also have effects on mean, we performed GWAS (or genome-wide QTL) analyses for the 13 UKB traits described above. We identified 3973 QTLs at an experiment-wise significance level (i.e., *P*_{QTL} < 2.0 × 10^{−9}) for the 13 traits in total, a much larger number than that of the vQTLs (Fig. 4 and Table 1). Among the 75 vQTLs, the top vQTL SNPs at nine loci did not pass the experiment-wise significance level in the QTL analysis (table S2A). For example, the *CCDC92* locus showed a significant vQTL effect but no significant QTL effect on WC (table S2A and fig. S3G), whereas the *FTO* locus showed both significant QTL and vQTL effects on WC (fig. S3G). For the 66 vQTLs with both QTL and vQTL effects, the vQTL effects were all in the same directions as the QTL effects, meaning that, for any of these SNPs, the genotype group with a larger phenotypic mean also tends to have a larger phenotypic variance than the other groups. For the nine loci with vQTL effects only, it is equivalent to a scenario where a QTL has a GEI effect with no (or a substantially reduced) effect on average across different levels of an environmental factor (Fig. 1B).

### vQTL and GEI

To further investigate whether the associations between vQTLs and phenotypic variance can be explained by GEI, we performed a direct GEI test based on an additive genetic model with an interaction term between a top vQTL SNP and one of five environmental factors/covariates in the UKB data (Methods). The five environmental factors/covariates are sex, age, physical activity (PA), sedentary behavior (SB), and ever smoking (note S5, fig. S2B, and table S1B). We observed 16 vQTLs showing a significant GEI effect with at least one of five environmental factors after Bonferroni correction for multiple tests [*P* < 1.33 × 10^{−4} = 0.05/(75 × 5); Fig. 5A and table S2C].

To test whether the GEI effects are enriched among vQTLs in comparison with the same number of QTLs, we performed GEI test for the 75 top GWAS SNPs randomly selected from all the QTLs and repeated the analysis 1000 times. Of the 75 top SNPs with QTL effects, the number of SNPs with significant GEI effects was 2.25, averaged from the 1000 repeated samplings with an SD of 1.49 (Fig. 5B), significantly lower than the number (*16*) observed for the vQTLs (the difference is larger than 9 SDs, equivalent to *P* = 1.4 × 10^{−20}). This result shows that SNPs with vQTL effects are much more enriched with GEI effects compared to those with QTL effects. To exclude the possibility that the GEI signals were driven by phenotype processing (e.g., the adjustment of phenotype for sex and age), we repeated the GEI analyses using raw phenotype data without covariate adjustment; the results remain largely unchanged (fig. S5).

## DISCUSSION

In this study, we leveraged the genetic effects associated with phenotypic variability to infer GEI. We calibrated the most commonly used vQTL methods by simulation. We found that the FPR of the Levene’s test was well calibrated across all simulation scenarios, whereas the other methods showed an inflated FPR if the phenotype distribution was skewed or heavy-tailed under the null hypothesis (i.e., no vQTL effect), although the Levene’s test appeared to be less powerful than the other methods particularly when the per-variant vQTL effect was small (Fig. 2 and fig. S1). Parametric bootstrap or permutation procedures have been proposed to reduce the inflation in the test statistics of DGLM and LRTv, both of which are expected to be more powerful than the Levene’s test (*19*, *37*), but bootstrap and permutation are computationally inefficient and thus are not practically applicable to biobank data such as the UKB. We observed inflated FPR for the Levene’s test in the absence of vQTL effects but in the presence of QTL effects if the phenotype was nonlinearly transformed (e.g., logarithm transformation or RINT). We therefore recommend the use of the Levene’s test in practice without nonlinear transformation of the phenotype. In addition, a very recent study by Young *et al*. (*40*) developed an efficient algorithm to perform a DGLM analysis and proposed a method [called dispersion effect test (DET)] to remove confounding in vQTL associations (identified by DGLM) due to QTL effects. We showed by simulation that, when the number of simulated causal variants was relatively large (note that the DET test is not applicable to oligogenic traits), the Young *et al*. method (DGLM followed by DET) performed similarly as the Levene’s test, with differences depending on how the phenotype was processed (fig. S6).

We demonstrated in the analysis of the UKB data that a number of vQTLs (with enriched GEI effects) can be detected by an appropriate analytical strategy in a very large sample. Traits with a larger number of vQTLs detected at the experiment-wise significance level tended to have a higher genomic inflation factor (defined as the mean or median χ^{2} statistic divided by its expected value) even after excluding the top vQTLs as well as SNPs in LD with them (fig. S4), consistent with a polygenic model of variance effect (*41*, *42*), suggesting a large number of vQTLs with small variance effects yet to be discovered in larger samples in the future.

There are several vQTLs for which the GEI effect has been reported in previous studies. The first example is the interaction effect of the *CHRNA5-A3-B4* locus (rs56077333) with smoking for lung function (as measured by FFR, i.e., FEV1/FVC), *P*_{vQTL} = 1.1 × 10^{−14} and *P*_{GEI(smoking)} = 4.6 × 10^{−25} (table S3A). The *CHRNA5-A3-B4* gene cluster is known to be associated with smoking and nicotine dependence (*43*–*45*). However, results from recent GWAS (*46*–*48*) do not support the association of this locus with lung function. We hypothesize that the effect of the *CHRNA5-A3-B4* locus on lung function depends on smoking (table S3A) (*49*). The vQTL signal at this locus remained (*P*_{vQTL} = 5.2 × 10^{−12}) after adjusting the phenotype for array effect, which was reported to affect the QTL association signal at this locus (*26*). The second example is the interaction of the *WNT16-CPED1* locus with age for bone mineral density (BMD) [rs10254825: *P*_{vQTL} = 2.0 × 10^{−45} and *P*_{GEI(age)} = 1.2 × 10^{−7}]. The *WNT16-CPED1* locus is one of the strongest BMD-associated loci identified from GWAS (*50*, *51*). We observed a genotype-by-age interaction effect at this locus for BMD (table S3B), in line with the results from previous studies that the effect of the top SNP at *WNT16-CPED1* on BMD in humans (*52*) and the knockout effect of *Wnt16* on bone mass in mice (*53*) are age dependent. The third example is the interaction of the *FTO* locus with PA and SB for obesity-related traits [*P*_{vQTL} < 1 × 10^{−10} for BMI, WC, HC, BFP, and BMR; *P*_{GEI(PA)} = 1.3 × 10^{−10} for BMI, 1.4 × 10^{−7} for WC, 5.3 × 10^{−7} for HC, and 2.6 × 10^{−7} for BMR]. The *FTO* locus was one of the first loci identified by the GWAS of obesity-related traits (*54*), although subsequent studies (*55*, *56*) show that *IRX3* and *IRX5* (rather than *FTO*) are the functional genes responsible for the GWAS association. The top associated SNP at the *FTO* locus is not associated with PA, but its effect on BMI decreases with the increase of PA level (*12*, *57*), consistent with the interaction effects of the *FTO* locus with PA or SB for obesity-related traits identified in this study (table S3, C and D). In addition, 5 of the 22 BMI vQTLs were in LD (*r*^{2} > 0.5), with the variants (identified by a recently developed multiple-environment GEI test) showing significant interaction effects at a false discovery rate (FDR) of <5% (corresponding to *P* < 1.16 × 10^{−3}) with at least 1 of 64 environmental factors for BMI in the UKB (*58*).

It should be noted that GEI is sufficient but not necessary to generate a vQTL. For the vQTLs that did not show a direct GEI effect in our GEI analysis, we cannot distinguish whether they are due to undetected GEI or direct effects on phenotypic dispersion, although GEI is a more likely explanation because of the enrichment of GEI (Fig. 5); hence, these traits and loci are candidates for follow-up studies to identify putative environmental risk factors that may be amendable to lifestyle modification. We also explored two other interpretations of the observed vQTLs, i.e., “phantom vQTLs” (*19*, *20*) and epistasis (genotype-by-genotype interaction) (*17*). If the underlying causal QTL is not well imputed or not well tagged by a genotyped/imputed variant, then the untagged variation at the causal QTL will inflate the vQTL test statistic, potentially leading to a spurious vQTL association, i.e., the so-called phantom vQTL. We showed by theoretical deviations that the Levene’s test statistic due to the phantom vQTL effect was a function of sample size, effect size of the causal QTL, allele frequency of the causal QTL, allele frequency of the phantom vQTL, and LD between the causal QTL and the phantom vQTL (note S6 and fig. S7A). From our deviations, we computed the numerical distribution of the expected phantom vQTL *F*-statistics, given a number of parameters including the sample size (*n* = 350,000), variance explained by the causal QTL (*q*^{2} = 0.005, 0.01, or 0.02), and MAFs of the causal QTL and the phantom vQTL (MAF = 0.05 to 0.5). The result showed that, for a causal QTL with *q*^{2} < 0.005 and MAF > 0.05, the largest possible phantom vQTL *F*-statistic was smaller than 2.69 (corresponding to a *P* value of 6.8 × 10^{−2}; fig. S7, B to D). This explains why there were thousands of genome-wide significant QTLs but no significant vQTL for height (Fig. 3 and Table 1). This result also suggests that the vQTLs detected in this study are very unlikely to be phantom vQTLs because the estimated variance explained by their QTL effects were all smaller than 0.005, except for rs10254825 at the *WNT16* locus on BMD (*q*^{2} = 0.014) (fig. S7E). However, our numerical calculation also indicated that, for a QTL with MAF > 0.3 and *q*^{2} < 0.02, the largest possible phantom vQTL *F*-statistic was smaller than 5.64 (corresponding to a *P* value of 3.6 × 10^{−3}), suggesting that rs10254825 is also unlikely to be a phantom vQTL. Note that we used the variance explained estimated at the top GWAS SNP to approximate *q*^{2} of the causal QTL so that *q*^{2} was likely to be underestimated because of imperfect tagging. However, considering the extremely high imputation accuracy for common variants (*59*), the strong LD between the causal QTLs and the GWAS top SNPs observed in a previous simulation study based on whole-genome sequence data (*38*), and the overestimation of variance explained by the GWAS top SNPs because of winner’s curse, the underestimation in causal QTL *q*^{2} is likely to be small. In addition, we reran the vQTL analysis, with the phenotype adjusted for the top GWAS variants within 10 Mb of the top vQTL SNP; the vQTL signals after this adjustment were highly concordant with those without adjustment (fig. S7F). We further showed that there was no evidence for epistatic interactions between the top vQTL SNPs and any other SNP located more than 10 Mb away or on a different chromosome (fig. S7G). Note that we did not perform epistatic test for SNP pairs within 10 Mb to avoid possible spurious epistatic signals caused by LD (*60*).

In conclusion, we systematically quantified the FPR and power for four commonly used vQTL methods by extensive simulations and demonstrated the robustness of the Levene’s test. We also showed that, in the presence of QTL effects, the Levene’s test statistic could be inflated if the phenotype was nonlinearly transformed. We implemented the Levene’s test as part of the OSCA software package (http://cnsgenomics.com/software/osca) (*61*) for efficient genome-wide vQTL analysis. We applied OSCA-vQTL to 13 quantitative traits in the UKB and identified 75 vQTL (at 41 near-independent loci) associated with 9 traits, 9 of which did not show a significant QTL effect. As a proof of principle, we performed GEI analyses in the UKB with five environmental factors/covariates and demonstrated the enrichment of GEI effects among the detected vQTLs. Hence, the vQTL trait-loci combinations we have identified could be investigated for as-yet-undetermined but measurable environmental risk factors generating GEI, as these factors could be amenable to lifestyle change interventions. We further derived the theory to compute the expected “phantom vQTL” test statistic due to untagged causal QTL effect and showed by numerical calculation that our observed vQTLs were very unlikely to be driven by imperfectly tagged QTL effects. Our theory is also consistent with the observation of pervasive phantom vQTLs for molecular traits with large-effect QTLs [e.g., DNA methylation (*20*)]. However, the conclusions from this study may only be applicable to quantitative traits of polygenic architecture. We caution vQTL analysis for binary or categorical traits, or molecular traits (e.g., gene expression or DNA methylation), for which the methods need further investigation.

## METHODS

### Simulation study

We used a DGLM (*33*–*35*) to simulate the phenotype based on two models with simulated SNP data in a sample of 10,000 individuals, i.e., a single-SNP model and multiple-SNP model with two covariates (i.e., age and sex). The single-SNP model can be written as*y* is a simulated phenotype; *w* or *w _{k}* is a standardized SNP genotype, i.e.,

*x*being the genotype indicator variable coded as 0, 1, or 2, generated from binomial(2,

*f*) and

*f*being the MAF generated from uniform(0.01, 0.5);

*c*is a standardized covariate with

_{j}*c*

_{1}(sex) generated from binomial(1, 0.5) and

*c*

_{2}(age) generated from uniform(20, 60);

*e*is an error term with mean 0 and variance

*e*from five different distributions, including normal distribution,

*t*-distribution with df = 10 or 3, and χ

^{2}distribution with df = 15 or 1. β (ϕ) is the effect on mean (variance) generated from

*N*(0,1) if exists, 0 otherwise. Log(σ

^{2}) is the intercept of the second linear model, which was set to 0. We rescaled the different components to control the variance explained, i.e., 0.1 and 0.9 for the genotype component and error term, respectively, in the single-SNP model, and 0.2, 0.4, and 0.4 for the covariate component, genotype component, and error term, respectively, in the multiple-SNP model. We simulated the SNP effects in four different scenarios: (i) effect on neither mean nor variance (nei), (ii) effect on mean only (mean), (iii) effect on variance only (var), or (iv) effect on both mean and variance (both). We simulated only one causal SNP in the single-SNP model and 4, 40, or 80 causal SNPs in the multiple-SNP model.

We performed vQTL analyses using the simulated phenotype and SNP data to compare four vQTL methods, including the Bartlett’s test (*29*), the Levene’s test (*31*), the FK test (*32*), and the DGLM (note S2). We also performed the Levene’s test with six phenotype process strategies, including raw phenotype (raw), raw phenotype adjusted for covariates (adj), RINT after adj (rint) (note S3), logarithm transformation after adj (log), square transformation after adj (sq), and cube transformation after adj (cub). We repeated the simulation 1000 times and calculated the FPR and power at *P* < 0.05 at a single-SNP level.

### The UKB data

The full release of the UKB data consisted of genotype and phenotype data for ~500,000 participants across the United Kingdom (*26*). The genotype data were cleaned and imputed to the Haplotype Reference Consortium (*59*) and UK10K (*62*) reference panels by the UKB team. Genotype probabilities from imputation were converted to hard-call genotypes using PLINK2 (--hard-call 0.1) (*63*). We excluded genetic variants with MAF < 0.05, Hardy-Weinberg equilibrium test *P* value < 1 × 10^{−5}, missing genotype rate > 0.05, or imputation INFO score < 0.3 and retained 5,554,549 variants for further analysis.

We identified a subset of individuals of European ancestry (*n* = 456,422) by projecting the UKB PCs onto those of the 1000 Genome Project (1KGP) (*64*). We then removed one of each pair of individuals with SNP-derived (based on HapMap 3 SNPs) genomic relatedness >0.05 using GCTA-GRM (*65*) and retained 348,501 unrelated European individuals for further analysis.

We selected 13 quantitative traits for our analysis (table S1A and fig. S2A). We adjusted the raw phenotype values for age and the first 10 PCs, excluded from the analysis phenotype values that were more than 5 SDs from the mean, and then standardized to *z* scores with mean 0 and variance 1 in each gender group.

### Genome-wide vQTL analysis

The genome-wide vQTL analysis was conducted using the Levene’s test implemented in the software tool OSCA (http://cnsgenomics.com/software/osca) (*61*). The Levene’s test used in the study [also known as the median-based Levene’s test or the Brown-Forsythe test (*31*)] is a modified version of the original Levene’s test (*30*) developed in 1960 that is essentially a one-way ANOVA test of the variable *y _{ij}* is the phenotype of the

*j*th individual in the

*i*th group and

*i*th group. The Levene’s test statistic

*F*distribution with

*k*− 1 and

*n*−

*k*degrees of freedom under the null hypothesis, where

*n*is the total sample size,

*k*is the number of groups (

*k*= 3 in vQTL analysis),

*n*is the sample size of the

_{i}*i*th group, i.e.,

The experiment-wise significance level was set to 2.0 × 10^{−9}, which is the genome-wide significance level (i.e., 1 × 10^{−8}) (*38*, *39*) divided by the effective number of independent traits (i.e., 5.00 for our 13 traits). The effective number of independent traits was estimated on the basis of the phenotypic correlation matrix (note S4) (*66*). To determine the number of near-independent vQTLs, we performed an LD clumping analysis for each trait using PLINK2 (--clump option with parameters --clump-p1 2.0e-9 --clump-p2 2.0e-9 --clump-r2 0.01 and --clump-kb 5000) (*63*). To visualize the results, we generated the Manhattan and regional association plots using the ggplot2 package in R.

### GWAS analysis

The GWAS (or genome-wide QTL) analysis was conducted using PLINK2 (*63*) (--assoc option) using the same data as used in the vQTL analysis (note that the phenotype had been pre-adjusted for covariates and PCs). The other analyses, including LD clumping and visualization, were performed using the same pipelines as those for genome-wide vQTL analysis described above.

### GEI analysis

Five environmental factors/covariates (i.e., sex, age, PA, SB, and smoking) were used for the direct GEI tests. Sex was coded as 0 or 1 for female or male. Age was an integer number ranging from 40 to 74. PA was assessed by a three-level categorical score (i.e., low, intermediate, and high) based on the short form of the International Physical Activity Questionnaire (IPAQ) guideline (*67*). SB was an integer number defined as the combined time (hours) spent driving, non–work-related computer using, or TV watching. The smoking factor “ever smoked” was coded as 0 or 1 for never or ever smoker. More details about the definition and derivation of environmental factor PA, SB, and smoking can be found in note S5, fig. S2B, and table S1B.

We performed a GEI analysis to test the interaction effect between the top vQTL SNP and one of the five environmental factors based on the following model*y* is the phenotype, μ is the mean term, *x _{g}* is the mean-centered SNP genotype indicator, and

*x*is the mean-centered environmental factor. We used a standard ANOVA analysis to test for β

_{E}*and applied a stringent Bonferroni-corrected threshold 1.33 × 10*

_{gE}^{−4}[i.e., 0.05/(75 × 5)] to claim a significant GEI effect.

## SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/5/8/eaaw3538/DC1

Note S1. Theoretical derivation of vQTL as a consequence of GEI

Note S2. The Bartlett’s test, the FK test, and the DGLM test

Note S3. Rank-based inverse-normal transformation

Note S4. The effective number of independent traits

Note S5. Definitions of the three environmental factors—PA, SB, and smoking

Note S6. Expected inflation in the Levene’s test statistic due to phantom vQTL effect

Note S7. Acknowledgments

Fig. S1. Evaluation of statistical methods and phenotype processing strategies for the vQTL analysis by simulation based on a single-SNP model.

Fig. S2. Phenotypic correlations among 13 quantitative traits, and PA and SB measures in the UKB.

Fig. S3. Genome-wide vQTL and QTL analyses for 13 traits in the UKB.

Fig. S4. Quantile-quantile plots of vQTL associations for the 13 UKB traits.

Fig. S5. Enrichment of GEI effects among the 75 vQTLs compared with a random set of QTLs using the raw phenotypic values.

Fig. S6. Comparison of the Young *et al*. method with the Levene’s test by vQTL simulation.

Fig. S7. Excluding two alternative explanations for vQTL signals: Phantom vQTLs and epistasis.

Table S1. Descriptive summary of (A) the quantitative traits and (B) the environmental data used in this study from the UKB.

Table S2. Seventy-five experiment-wise significant vQTLs for nine UKB traits.

Table S3. GEI examples.

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is **not** for commercial advantage and provided the original work is properly cited.

## REFERENCES AND NOTES

**Acknowledgments:**

**Funding:**This research was supported by the Australian Research Council (DP160101343 and DP160101056, FT180100186), the Australian National Health and Medical Research Council (1078037, 1078901, 1113400, 1107258, and 1083656), and the Sylvia & Charles Viertel Charitable Foundation.

**Author contributions:**J.Y. and A.F.M. conceived the study. J.Y., H.W., and A.F.M. designed the experiment. F.Z. developed the software tool. H.W. performed simulations and data analyses under the assistance or guidance from J.Y., J.Z., Y.W., K.E.K., A.X., and M.Z. J.E.P., M.E.G., N.R.W., and P.M.V. provided critical advice that significantly improved the experimental design and/or interpretation of the results. P.M.V., N.R.W., and J.Y. contributed resources and funding. H.W. and J.Y. wrote the manuscript with the participation of all authors.

**Competing interests:**The authors declare that they have no competing interests.

**Data and material availability:**This study makes use of data from the UKB (project ID: 12514). A full list of acknowledgments of this data set can be found in note S7. The individual-level genotype and phenotype data used in this study can be provided by the UKB (http://www.ukbiobank.ac.uk/) pending scientific review and a completed material transfer agreement (requests for these data should be submitted to the UKB). The vQTL summary statistics of all SNPs for the 13 UKB traits are available at the OSCA website (http://cnsgenomics.com/software/osca). All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials. Additional data related to this paper may be requested from the authors.

- Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).