Delineating the Genetic Component of Gene Expression in Major Depression

Background Major depression (MD) is determined by a multitude of factors including genetic risk variants that regulate gene expression. We examined the genetic component of gene expression in MD by performing a transcriptome-wide association study (TWAS), inferring gene expression–trait relationships from genetic, transcriptomic, and phenotypic information. Methods Genes differentially expressed in depression were identified with the TWAS FUSION method, based on summary statistics from the largest genome-wide association analysis of MD (n = 135,458 cases, n = 344,901 controls) and gene expression levels from 21 tissue datasets (brain; blood; thyroid, adrenal, and pituitary glands). Follow-up analyses were performed to extensively characterize the identified associations: colocalization, conditional, and fine-mapping analyses together with TWAS-based pathway investigations. Results Transcriptome-wide significant differences between cases and controls were found at 94 genes, approximately half of which were novel. Of the 94 significant genes, 6 represented strong, colocalized, and potentially causal associations with depression. Such high-confidence associations include NEGR1, CTC-467M3.3, TMEM106B, LRFN5, ESR2, and PROX2. Lastly, TWAS-based enrichment analysis highlighted dysregulation of gene sets for, among others, neuronal and synaptic processes. Conclusions This study sheds further light on the genetic component of gene expression in depression by characterizing the identified associations, unraveling novel risk genes, and determining which associations are congruent with a causal model. These findings can be used as a resource for prioritizing and designing subsequent functional studies of MD.

SNP-weights were derived from several studies, including the PsychENCODE cohort. The PsychENCODE SNP-weights were downloaded from the PsychENCODE resources page Furthermore, PsychENCODE SNP-weights were based on all variants available in the PsychENCODE cohort after HRC imputation, instead of being restricted to HapMap3 variants like FUSION-released SNP-weights. To address this, a 1KG Phase 3 reference that was not restricted to HapMap3 variants was used in the PsychENCODE TWAS analysis, and downstream conditional and FOCUS analyses across PsychENCODE and FUSION-released SNP-weights sets.
Most common variants in the HRC reference are available in the 1KG Phase 3 reference, although the few missing variants may lead to a small decrease in gene expression imputation accuracy.

Calculation of the transcriptome-wide significance threshold
Firstly, as expected in the FUSION protocol, GE levels were inferred for all heritable features based on the selected SNP-weight sets and the 1000 Genomes LD reference data. GWAS summary statistics were not used as the calculation of the significance threshold should be based on a null phenotype. A thousand permutations were performed, with each permutation randomly generating a normally-distributed null phenotype. Such phenotype was subsequently tested in association with predicted GE levels. Given the use of a random phenotype, the identified feature -trait relationships should constitute false-positive findings, uniquely attributable to chance. To develop a significance threshold able to distinguish between chance and meaningful findings, the minimum p-value of all individual permutations (Npermutations = 1,000) was collected to form a normal distribution of minimum p-values. The five percent quantile of this distribution, equalling to a false positive rate of α = .05 (21), was considered as our transcriptome-wide significance threshold, which corresponded to p = 1.37×10 -06 . We additionally calculated a more stringent threshold to capture genes of high significance (α = 0.001): p = 3.69×10 -08 .

TWAS FUSION
Gene expression-depression associations were obtained only for features with a non-zero cis-SNP heritability (p < 0.01). This was calculated with the Average Information Restricted Maximum Likelihood (AI-REML) algorithm of Genome-wide Complex Trait Analyses (GCTA). SNPs were selected if they resided within the gene boundaries ±500kb.
A TWAS FUSION analysis was run for all significantly heritable features, for each feature, based on one of several predictive models (BLUP, LASSO, elastic net, or BSLMM), with the best-fitting model being used. TWAS Z-scores were estimated, for each feature separately, based on the following linear model: Where wi = the correlation of a SNP within a feature with the gene expression of such feature, zi = the standardized effect from the GWAS which tested the association between that SNP and a trait, and i = the SNP within the feature.
Outputs from all chromosomes and SNP-weight sets were merged and subsequently filtered based on our transcriptome-wide significant threshold.

Colocalization
This method used a Bayesian approach estimating the posterior probability (PP) of five models concerning GWAS and TWAS associations. These involve a SNP being in: By comparing values for models three and four (PP3 vs. PP4), we can distinguish whether the GWAS and TWAS associations are colocalized, i.e. whether the signal for an association with the trait and with GE at a locus results from the same causal polymorphism. To perform colocalization, the coloc R package (22), available in the FUSION software, was employed. Colocalization was implemented only for genes surpassing transcriptome-wide significance.

Conditional Analysis
With a conditional analysis, we could identify loci of co-expression as well as distinguish between independent and conditioned features. Independent/jointly significant features remain associated with the phenotype, at a nominal significance level (p < 0.05), after adjustments. Conditioned or marginally significant features are those whose association with depression is solely reliant on the expression of other nearby features. Conditioned features are thus significantly associated with depression in the unadjusted model only.
A conditional analysis can additionally show to what extent GWAS signal is attributable to functional associations. In fact, in such analysis, the top SNP in a locus is conditioned on the GE patterns of the most significant feature in the same locus. Subsequently, the variance in a SNP signal accounted for by such functional associations was calculated, with the following formula:

unconditioned GWAS association
Where R 2 = variance explained, χ 2 conditioned GWAS association = SNP-phenotype association after adjusting for the GE of the most significant feature in the locus, χ 2 unconditioned GWAS association = SNP-phenotype association before adjusting.
The conditional analysis was performed for chromosomal regions with multiple statistically significant features, within and across SNP-weight sets, as described in the FUSION webpage (http://gusevlab.org/projects/fusion/). Features were defined as pertaining to a shared locus when boundaries overlapped within 1.5Mb±0.5.

TWAS-GSEA
For the TWAS-GSEA, a linear mixed model was run. Gene set membership was regressed on the TWAS feature z-score indicating non-zero association. Gene set membership was developed as a dichotomous variable with information on whether a gene pertains to a given functional annotation. Analyses were performed with the R package lme4qtl (23), which permits the fitting of linear mixed models.
We conducted a hypothesis-free TWAS-GSEA testing all annotations from the gene ontology (GO) resource, a database where gene and gene set functions are specified. Candidate gene sets were also tested. These contained pathways previously selected by Wray et al. (24) due to their involvement in psychiatric disorders. We attempted to validate these with a functionally-informed gene set enrichment analysis.

Comparison with previous literature
We additionally contrasted our results to previous studies of observed gene expression and of predicted gene expression (TWASs) in MD. The largest study to date on observed gene expression

Results after exclusion of the Major Histocompatibility Complex
Here, we explain results from the TWAS and follow-up analyses after excluding features from the Major Histocompatibility Complex (MHC). This was done as associations in the MHC region are generally difficult to interpret due to extensive linkage disequilibrium.
Firstly, we identified 121 transcriptome-wide significant features. Of these, 77 were colocalized and 40 were jointly significant features. When using FOCUS fine-mapping, 23 features were likely causal. All six of the high-confidence associations were outside the MHC. Moreover, 38 features were novel compared to the Wray et al. GWAS (24).
Levels of expression for differentially expressed genes throughout developmental stages and across (A) all tissues, (B) groups of tissues (brain, blood, HPT and HPA axes), and (C) single tissues. No Bonferroni significant enrichment or depletion was shown. Nominally significant enrichment is however depicted in red, while nominally significant depletion is depicted in blue.
Enrichment indicates that differentially expressed genes were particularly expressed at a given developmental period. On the contrary, depletion shows that differentially expressed genes were generally expressed at lower rates at a given developmental period.