Gene Expression Imputation Across Multiple Tissue Types Provides Insight Into the Genetic Architecture of Frontotemporal Dementia and Its Clinical Subtypes

BACKGROUND
The etiology of frontotemporal dementia (FTD) is poorly understood. To identify genes with predicted expression levels associated with FTD, we integrated summary statistics with external reference gene expression data using a transcriptome-wide association study approach.


METHODS
FUSION software was used to leverage FTD summary statistics (all FTD: n = 2154 cases, n = 4308 controls; behavioral variant FTD: n = 1337 cases, n = 2754 controls; semantic dementia: n = 308 cases, n = 616 controls; progressive nonfluent aphasia: n = 269 cases, n = 538 controls; FTD with motor neuron disease: n = 200 cases, n = 400 controls) from the International FTD-Genomics Consortium with 53 expression quantitative trait loci tissue type panels (n = 12,205; 5 consortia). Significance was assessed using a 5% false discovery rate threshold.


RESULTS
We identified 73 significant gene-tissue associations for FTD, representing 44 unique genes in 34 tissue types. Most significant findings were derived from dorsolateral prefrontal cortex splicing data (n = 19 genes, 26%). The 17q21.31 inversion locus contained 23 significant associations, representing 6 unique genes. Other top hits included SEC22B (a gene involved in vesicle trafficking), TRGV5, and ZNF302. A single gene finding (RAB38) was observed for behavioral variant FTD. For other clinical subtypes, no significant associations were observed.


CONCLUSIONS
We identified novel candidate genes (e.g., SEC22B) and previously reported risk regions (e.g., 17q21.31) for FTD. Most significant associations were observed in dorsolateral prefrontal cortex splicing data despite the modest sample size of this reference panel. This suggests that our findings are specific to FTD and are likely to be biologically relevant, highlighting genes at different FTD risk loci that contribute to the disease pathology.

Frontotemporal dementia (FTD) is a heterogeneous neurodegenerative disorder characterized by frontal and/or temporal patterns of atrophy. Clinically, patients with FTD present with the behavioral variant of FTD (bvFTD) or language variants such as semantic dementia (SD) and progressive nonfluent aphasia (PNFA) (1). In 10% of all cases, FTD co-occurs with motor neuron diseases (FTD-MND) (2).
Whereas FTD is mostly sporadic (80%), approximately 20% of all FTD cases are familial, with the most common Mendelian mutations including the hexanucleotide repeat expansion at the C9ORF72 locus on chromosome 9 and mutations in microtubule-associated protein tau (MAPT) and progranulin (GRN) genes in and near the chromosome 17q21 inversion locus (3–7). Genome-wide association studies (GWASs) in FTD have also identified genetic risk variants, each having small associations with disease risk (8–11). The number of known FTD disease susceptibility loci remains small owing to limited power for discovery in the relatively small sample sizes of the GWASs so far (n cases < 5000). At this time, it is poorly understood how genetic risk variants for FTD exert effects on etiology, although such knowledge is essential for understanding disease pathology and developing therapeutic interventions.
Genetic risk variants identified in GWASs are often located in noncoding regions, with or without regulatory motifs, outside the protein-coding sequences (12). These risk variants are likely to predispose individuals to disease susceptibility by modulating messenger RNA expression levels through local (cis) or distal (trans) expression quantitative trait loci (eQTL) (13). The FTD risk variant rs302652 near RAB38 is a local eQTL, decreasing RAB38 gene expression in monocytes (11) and potentially influencing bvFTD disease risk by modulating RAB38 gene expression levels in specific brain areas. However, the joint effects of genetic risk loci for FTD on (differential) gene expression across multiple tissue types are unclear.
Transcriptome-wide association studies (TWASs) have emerged as a way to identify associations between traits and gene expression. The most common TWAS methods include PrediXcan, summary data-based Mendelian randomization, and FUSION (14–16). TWASs leverage the combined effects of multiple single nucleotide polymorphisms (SNPs), on either the individual level (PrediXcan and summary data-based Mendelian randomization) or the summary level (s-PrediXcan and FUSION), on gene expression, thereby increasing power to find novel associations over a traditional GWAS when gene expression mediates risk (14–16). Imputation of the genetic control of gene expression is now widely used to decipher how GWAS-identified alleles may contribute to disease risk, and to identify specific candidate genes through which this effect is regulated. In this study, we performed a multitissue TWAS on sporadic FTD and its clinical subtypes to identify genes whose changes in expression play a role in FTD and to identify tissue types relevant to FTD. As a secondary aim of the study, we performed a TWAS-based enrichment analysis and explored whether FTD shows overlap in differential expression with neuropsychiatric disorders that show clinical overlap with FTD.
Preprocessing and quality check procedures have been described previously (11). SNPs were converted from chr:bp to rsID coordinates using Phase 3 1000 Genomes Project data (17). Summary statistics were quality checked and converted to LD-score format using the munge_sumstats.py utility from LDSC, leaving 1,068,995 SNPs for final analysis for all phenotypes (18) (22), and the Genotype Tissue Expression Project (GTEx) version 7 (https://gtexportal.org/home/datasets) (n = 752). Local eQTLs were calculated by leveraging gene expression with genetic variation data (i.e., SNPs within ±1 Mb of the transcriptional start site of the gene). More detailed information on genotyping and gene expression analyses for these datasets has been described previously: CMC (23), NTR, YFS, METSIM (15), and GTEx (24).
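The ±1 Mb local window described above can be illustrated with a minimal sketch. The `Snp` record, the `cis_snps` helper, and all identifiers and positions below are hypothetical and serve only to show the selection rule; they are not taken from the reference panels used in this study.

```python
from typing import List, NamedTuple


class Snp(NamedTuple):
    rsid: str   # hypothetical identifier
    chrom: str
    pos: int    # base-pair position


WINDOW_BP = 1_000_000  # ±1 Mb around the transcriptional start site (TSS)


def cis_snps(snps: List[Snp], gene_chrom: str, tss: int,
             window: int = WINDOW_BP) -> List[Snp]:
    """Return SNPs on the gene's chromosome within ±window bp of the TSS."""
    return [s for s in snps
            if s.chrom == gene_chrom and abs(s.pos - tss) <= window]


# Made-up SNPs and a made-up TSS, for illustration only.
snps = [
    Snp("rs_a", "17", 45_000_000),
    Snp("rs_b", "17", 46_200_000),
    Snp("rs_c", "17", 48_500_000),
    Snp("rs_d", "1", 45_900_000),
]
local = cis_snps(snps, "17", 45_900_000)
print([s.rsid for s in local])  # ['rs_a', 'rs_b']
```

Only SNPs on the same chromosome and inside the window are retained as candidate local eQTL variants for the gene.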
Local eQTL datasets from tissue types less relevant to FTD (e.g., blood) were included in this study because local eQTLs are highly conserved across tissues (25) and eQTL datasets with nonbrain tissues consist of substantially larger sample sizes, thereby maximizing power to detect significant associations between local gene expression and FTD GWAS SNPs.

Functional Mapping and Annotation
To examine the proportion of noncoding variants among FTD-risk SNPs, we annotated SNPs from the IFGC GWAS on FTD using Functional Mapping and Annotation (FUMA) (https://fuma.ctglab.nl) (26). The most significant (p < 5 × 10⁻⁶) SNPs and SNPs in linkage disequilibrium (LD) (r² ≥ .6) with these were used for further inspection using 1000 Genomes Project data (17). Lead SNPs were defined as being independent from each other at r² < .1. LD blocks of independent SNPs were merged into a genomic locus if they were closely located to each other (i.e., <250 kb).
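The merging of nearby LD blocks into genomic loci can be sketched as a simple interval merge. This is an illustrative sketch only, not FUMA's implementation; the `merge_loci` helper and the block coordinates are hypothetical, and the 250 kb distance is the only value taken from the text.

```python
from typing import List, Tuple

Block = Tuple[str, int, int]  # (chromosome, start bp, end bp)


def merge_loci(blocks: List[Block], max_gap: int = 250_000) -> List[Block]:
    """Merge LD blocks on the same chromosome separated by less than max_gap bp."""
    merged: List[Block] = []
    for chrom, start, end in sorted(blocks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] < max_gap:
            prev = merged[-1]
            # Extend the previous locus to cover the current block.
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up block coordinates, for illustration only.
blocks = [
    ("17", 43_000_000, 43_100_000),
    ("17", 43_300_000, 43_400_000),  # 200 kb from the previous block: merged
    ("17", 44_000_000, 44_050_000),  # 600 kb away: separate locus
    ("7", 38_000_000, 38_020_000),
]
loci = merge_loci(blocks)
print(len(loci))  # 3
```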
Lead and correlated SNPs were annotated for potential regulatory functions (RegulomeDB) (27), 15-core chromatin state predicted by ChromHMM (28), functional consequences on gene functions annotated by ANNOVAR (29), and deleteriousness score (Combined Annotation Dependent Depletion) (30). To test for enrichment of functional consequences of lead and correlated SNPs (as estimated with ANNOVAR), we performed a Fisher's exact test using a 5% false discovery rate (FDR) significance threshold (see https://fuma.ctglab.nl/tutorial#annov). The enrichment value was calculated as the proportion of lead and correlated SNPs with an annotation divided by the proportion of SNPs with that annotation among all available SNPs in Phase 3 1000 Genomes Project data (17).
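As a worked illustration of the enrichment value and the accompanying Fisher's exact test, the sketch below uses made-up counts. The function names are hypothetical, and a two-sided test on a 2 × 2 table is assumed here; the exact table construction used by FUMA may differ.

```python
from math import comb


def enrichment(k: int, n: int, K: int, N: int) -> float:
    """(k/n) / (K/N): annotated fraction among candidate SNPs vs. the reference."""
    return (k / n) / (K / N)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are as probable as, or less probable than, the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(total - row1, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - (total - row1))
    hi = min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Made-up counts: 30 of 100 candidate SNPs carry the annotation,
# versus 200 of 1000 SNPs in the reference set (candidates included).
k, n, K, N = 30, 100, 200, 1000
value = enrichment(k, n, K, N)
p_value = fisher_exact_two_sided(k, n - k, K - k, (N - n) - (K - k))
print(round(value, 2), p_value)
```

An enrichment value above 1 indicates that the annotation is more frequent among the candidate SNPs than in the reference set; the resulting p-values would then be corrected across annotations at the 5% FDR threshold.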
Statistical Analysis
TWAS Analysis. To identify genes whose locally regulated expression is associated with FTD and its clinical subtypes (i.e., bvFTD, SD, PNFA, and FTD-MND), we performed TWAS analyses using FUSION software (http://gusevlab.org/projects/fusion) with default settings (15). FUSION estimates the genetic correlation between local gene expression and FTD by integrating GWAS summary statistics with external gene expression reference panel data while accounting for LD structure among SNPs [using Phase 3 1000 Genomes Project data (17)]. To account for LD structure, we used 1000 Genomes (all ancestries) data as the LD reference panel.
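To make the role of the LD reference panel concrete, the sketch below computes a summary-statistic TWAS test statistic of the form Z = wᵀz / √(wᵀΣw), where w are the expression prediction weights, z the GWAS z-scores, and Σ the SNP correlation (LD) matrix. This reflects our reading of the method description in (15) and is not the FUSION code itself; the `twas_z` helper and all numbers are hypothetical.

```python
from math import sqrt
from typing import List


def twas_z(weights: List[float], gwas_z: List[float],
           ld: List[List[float]]) -> float:
    """Weighted sum of GWAS z-scores, scaled by its standard deviation under LD."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return numerator / sqrt(variance)


# Made-up weights, z-scores, and LD matrix for three SNPs near one gene.
weights = [0.5, -0.2, 0.1]
gwas_z = [3.0, -1.0, 0.5]
ld = [
    [1.0, 0.3, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
z_gene = twas_z(weights, gwas_z, ld)
print(round(z_gene, 3))
```

The denominator is where LD enters: correlated SNPs carry partly redundant information, so the variance of the weighted sum depends on Σ rather than on the weights alone.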
To study whether GWAS SNPs colocalized with eQTLs, we performed a Bayesian colocalization analysis for all associations with an uncorrected TWAS p < .05 using the COLOC package in R (https://cran.r-project.org/web/packages/coloc) (31) implemented in FUSION. A joint analysis was performed to identify which genes are conditionally independent.
TWAS results are presented including the major histocompatibility (MHC) locus because the FTD GWAS included genome-wide significant loci within the MHC region (11). Results on gene-tissue associations per phenotype (i.e., FTD, bvFTD, SD, PNFA, and FTD-MND) were corrected for multiple comparisons using a 5% FDR significance threshold. Significant TWAS loci were identified as novel if the strongest FTD-associated SNP was not nominally significant (p > .05) in the IFGC GWAS (11) within ±1 Mb of the transcriptional start site of the gene's region.
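The 5% FDR correction can be sketched as follows. The text does not name the specific procedure, so the Benjamini-Hochberg step-up method is assumed here; the `benjamini_hochberg` helper and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up TWAS p-values for five gene-tissue associations.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in adjusted]
print(significant)  # [True, True, False, False, False]
```

In this toy example, two associations with nominal p < .05 no longer pass once the number of tests is taken into account.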
Transcriptome-wide Analysis on Frontotemporal Dementia  (32). Here, we define h²_med as heritability mediated by local gene expression, h²_g as disease heritability, and h²_med/h²_g as the proportion of heritability mediated by local gene expression. First, for each gene, local heritability scores were estimated while accounting for LD structure. Genes were partitioned into bins according to their local heritability because this has been shown to provide unbiased h²_med/h²_g estimates. Second, we estimated h²_med/h²_g from expression scores estimated in the previous step and GWAS summary statistics on FTD. Because MESC produces biased estimates for eQTL reference panels with small sample sizes, only eQTL datasets with sample sizes >300 (n = 17) were included.
Enrichment Analysis. Competitive enrichment analysis on FTD TWAS results was performed using TWAS-based gene set enrichment analysis (TWAS-GSEA) (https://github.com/opain/TWAS-GSEA) (33). TWAS-GSEA is an adapted method of GWAS-based enrichment analysis implemented in MAGMA software (34). In brief, this method examines whether TWAS results are enriched for specific pathways while accounting for LD structure. Per phenotype, TWAS-GSEA was performed simultaneously for all 53 eQTL datasets. The file used as eQTL reference panel for the TWAS-GSEA analysis included unique gene identifiers only; if genes were present in multiple local eQTL datasets, the gene with the best prediction of expression (as estimated by cross-validated R² [MODELCV.R2]) was used in the GSEA. Gene identifiers in TWAS result files were converted to Entrez ID format using the biomaRt package in R, resulting in 15,004 (14,813 non-MHC) unique Entrez IDs for FTD and all clinical FTD subtypes. TWAS results were tested for enrichment across 6778 Gene Ontology biological process gene sets. Per phenotype, results were corrected for the number of gene sets using a 5% FDR significance threshold.
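The selection of one model per gene across eQTL datasets can be sketched as below. The row layout (keys `ID`, `PANEL`, and `MODELCV.R2`) is assumed to mirror the column names of a TWAS result file, and the `best_model_per_gene` helper, panel names, and R² values are hypothetical.

```python
from typing import Dict, List


def best_model_per_gene(rows: List[dict]) -> Dict[str, dict]:
    """Keep, per gene ID, the row with the highest cross-validated R2."""
    best: Dict[str, dict] = {}
    for row in rows:
        gene = row["ID"]
        if gene not in best or row["MODELCV.R2"] > best[gene]["MODELCV.R2"]:
            best[gene] = row
    return best


# Made-up rows: MAPT is predicted in two panels, SEC22B in one.
rows = [
    {"ID": "MAPT", "PANEL": "panel_A", "MODELCV.R2": 0.05},
    {"ID": "MAPT", "PANEL": "panel_B", "MODELCV.R2": 0.12},
    {"ID": "SEC22B", "PANEL": "panel_A", "MODELCV.R2": 0.08},
]
unique = best_model_per_gene(rows)
print({gene: row["PANEL"] for gene, row in unique.items()})
# {'MAPT': 'panel_B', 'SEC22B': 'panel_A'}
```

Each gene then enters the enrichment analysis once, represented by its best-predicted model.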
Data Availability. The GWAS summary statistics on FTD can be acquired via the IFGC (https://ifgcsite.wordpress.com/data-access). Local eQTL reference weights can be downloaded from the FUSION website (http://gusevlab.org/projects/fusion).

Predicted Gene Expression Levels Show 73 Associations With FTD
Predicted gene expression levels in 53 tissue types (range of genes per tissue type = 1505-9229) were tested for association with FTD. We identified 73 significant gene-tissue associations for FTD, representing 44 (40 non-MHC) unique genes in 34 tissue types (Table 1, Table S3 in Supplement 2, Figure 1, and Figure 2). In total, 39.7% (29/73) of these transcriptome-wide significant associations had supporting evidence from colocalization analyses (Table S4 in Supplement 2). The strongest genic FTD TWAS associations included ARL17B on chromosome 17 (brain cerebellar hemisphere p FDR = 9.
One region of interest is 17q21.31 on chromosome 17, which contained 23 significant associations, representing 6 unique genes (i.e., ARL17B, KANSL1-AS1, LRRC37A, MAPT, MAPT-AS1, and NSFP1). This locus is an inversion polymorphism that has been associated previously with neurodegenerative tauopathies but also with psychiatric disorders such as autism spectrum disorder (33,35). Gene expressions of most gene-tissue pairs were highly correlated except for KANSL1-AS1, MAPT, and MAPT-AS1 (Figure S2 in Supplement 1). For the majority of significant associations in 17q21.31 (n = 16, 69.6%), colocalization analysis provided evidence for a shared causal genetic variant between gene expression and FTD (Table S4 in Supplement 2).
Another region was 7p14.1, for which predicted gene expression of TRGV5 and its pseudogene TRGV5P achieved transcriptome-wide significance in 4 different tissue types. Colocalization analyses suggested that FTD and 7p14.1 gene expression share a single causal association (Table S4 in Supplement 2).

Most TWAS Associations Were Detected in DLPFC Splicing Data
The brain-derived reference panels contributed the most to the significant associations between gene expression and FTD (43.8%, 32 gene-tissue associations), with the majority derived from DLPFC splicing data (19 splicing variants, 13 unique, all outside MHC). A previous study showed that a larger sample size and a greater number of measured genes in the eQTL reference panel correlate with a higher number of significant hits (36). Despite the modest sample size (n sample = 452) and number of measured genes (n genes unique = 3221, n genes total = 7514), DLPFC splicing data accounted for 26% of all transcriptome-wide hits, thereby exceeding the number of significant hits from eQTL tissue types with larger sample sizes (e.g., 0% for YFS whole blood, n sample = 1264) and more measured genes (e.g., 3% for thyroid, n genes unique = 9225, n genes total = 9229) (Figure S3 and Figure S4 in Supplement 1). Accordingly, FTD TWAS results showed significant enrichment for DLPFC splicing data (p = 7.31 × 10⁻³) (Table S6 in Supplement 2).
MESC analysis showed that a substantial proportion of […] splicing data, the mean h²_med was 43.8 ± 8.5%, whereas for the eQTL panel with the largest sample size (i.e., YFS whole blood data) this was 12.6 ± 7.4%. A full overview of local mediated heritability is presented in Figure S5 in Supplement 1 (see SNP heritability estimates in Table S1 in Supplement 2).

Predicted Gene Expression Levels on Clinical Subtypes Separately Show Association With bvFTD Only
Predicted gene expression levels in 53 tissue types (range of genes per tissue type = 1505-9229) were tested for association with bvFTD, SD, PNFA, and FTD-MND. Gene expression

Implicated Genes Highlight Involvement of Amino Acid Transport in FTD Pathogenesis
Full competitive results for the enrichment analysis on FTD and its clinical subtypes are presented in Tables S15 to S24 in Supplement 2. TWAS results for FTD were significantly enriched for sulfur amino acid transport (with MHC p FDR = .04, without MHC p FDR = .03) (Figures S11 and S12 in Supplement 1). For all other gene sets and traits, no gene sets were significant after FDR correction.

No Genetic Correlations Were Observed Between Predicted Gene Expression for FTD and Alzheimer's Disease, Amyotrophic Lateral Sclerosis, and Primary Psychiatric Disorders
Given the similarities between FTD and several neuropsychiatric disorders, we explored the genetic correlations between the predicted gene expression for FTD and Alzheimer's disease (AD), amyotrophic lateral sclerosis (ALS), schizophrenia, autism spectrum disorder, and major depressive disorder using RHOGE (37) (see Supplemental Methods in Supplement 1). No significant correlations were observed after FDR correction (Tables S25 and S26 in Supplement 2 and Figure S13 in Supplement 1).

DISCUSSION
In this study, we aimed to better understand the genetic etiology of sporadic FTD by identifying genes whose expression plays a role in FTD, using a TWAS approach with increased power to detect loci compared with a traditional GWAS. We identified 73 significant gene-tissue associations for FTD, representing 44 unique genes in 34 tissue types. The 17q21.31 inversion region was replicated as a risk region for FTD. SEC22B was identified as a likely novel risk gene for FTD. Interestingly, most associations were derived from splicing data of the DLPFC, a brain region that is nearly universally involved in FTD, thereby providing some biological validation to the multitissue TWAS approach in FTD. Moreover, these findings highlight the importance of splicing events for disease risk (38). Our results indicate that a large proportion of FTD risk loci modulate gene expression levels, and we highlight these genes as potential candidates for functional follow-up studies.
The majority of FTD risk variants were located in noncoding regions, suggesting that these variants likely have regulatory functions. A total of 44 genes were identified as differentially expressed in FTD. We replicated the 17q21.31 locus as a risk factor for FTD. This region contained 23 significant associations from 6 different genes: ARL17B, KANSL1-AS1, LRRC37A, NSFP1, MAPT-AS1, and MAPT. Mutations in the latter gene, MAPT, are among the most common Mendelian mutations implicated in familial FTD (6). The 17q21.31 region contains a common inversion polymorphism and has been associated not only with several neurodegenerative disorders (e.g., progressive supranuclear palsy, corticobasal degeneration, AD, FTD) but also with psychiatric disorders such as autism spectrum disorder (33,35,39–42). Previous research has shown that different haplotypes of the 17q21.31 inversion affect expression of 17q21.31 genes in blood and different brain regions (43). Here, we highlight the role of differential gene expression of 17q21.31 genes across several tissue types in the pathogenesis of FTD.
Another implicated gene was SEC22B on chromosome 1, which showed evidence for differential gene expression in FTD without reaching even nominal significance in the corresponding FTD GWAS (p > .05 within ±1 Mb of SEC22B). SEC22B codes for a protein that plays an important role in vesicle trafficking between the Golgi apparatus and the endoplasmic reticulum, autophagy, and membrane fusion. The latter is essential for the development of the nervous system, including axonal and dendritic growth (44). Little is known about the precise role of SEC22B in neurodegeneration, but differential expression of this gene in the brain has been associated with normal aging and AD (45,46).
We found increased C4A gene expression to be significantly associated with FTD. The C4 gene has two functionally different isoforms (i.e., C4A and C4B, both of which can vary in structure and copy number) and is located in the MHC locus, a locus strongly associated with immune-related processes. Structural variation in C4A/B has been associated with schizophrenia, probably affecting synaptic pruning (47,48). The potential role of C4 (structure) in the etiology of FTD is not yet fully understood. Human postmortem and mouse model studies on FTD demonstrate an association between upregulated C4A gene expression and aggregation of transactive response DNA-binding protein 43 (TDP-43), one of the most common pathological subtypes underlying FTD (49,50). Although this would suggest a specific relationship between upregulated C4A gene expression and FTD pathology, increased C4A gene expression has also been observed in AD and schizophrenia (51).
We explored the genetic correlation between predicted gene expression for FTD and that for primary psychiatric disorders, AD, and ALS. Although FTD and psychiatric disorders overlap with respect to symptoms and affected neuroanatomical regions, we found no indications for an overlapping expression profile (52,53). We further did not observe a significant overlap of predicted gene expression for FTD with either AD or ALS. Although previous studies have reported a shared genetic architecture between FTD and ALS (54), our results suggest that the known clinical association between FTD and ALS (in ~10% of all cases) might not be driven by an overlap in gene expression. Altogether, this suggests that at least part of the FTD TWAS signal is specific for FTD rather than generic to neuropsychiatric disorders.
Proteins differentially expressed in FTD showed enrichment for the transport of sulfur amino acids (e.g., methionine, cysteine), a process essential for the synthesis of antioxidants. For example, transport of L-cystine (i.e., oxidized form of cysteine) is needed for the production of antioxidant glutathione in the brain (55). Sulfur amino acids are sensitive to oxidative modifications by reactive oxygen-containing species. A balance between the production of reactive oxygen-containing species and antioxidants protects cells against invaders. However, an imbalance leads to increased oxidative stress, which is particularly damaging to cells in high demand of oxygen such as neuronal cells (56). Increased oxidative stress has been associated with aging and has been observed in several disorders, including FTD (56–58).
Despite the modest sample size of the DLPFC CMC reference panel, the DLPFC contributed to significantly more transcriptome-wide findings compared to other tissue types, thereby highlighting the topology-specific neurodegenerative nature of FTD. MESC analysis, an approach to examine the genome-wide distribution of heritability, showed that the tibial nerve had the largest proportion of heritability mediated by local gene expression, which may reflect the comorbidity of FTD with motor neuron disorders. However, motor neuron disorders typically present with the degeneration of both upper and lower motor neurons, while most (but not all) studies indicate that sensory neurons are spared (59,60). While tibial nerve degeneration has been observed in motor neuron disorders, this nerve contains both motor and sensory axons and Schwann cells, making it possibly less specific as tissue of interest for motor neuron disorders (61). Therefore, current MESC results should be validated using reference weights of upper and lower motor neuron tissue types.

[Figure: Transcriptome-wide marginally significant associated genes are highlighted in blue, and those that are jointly significant (i.e., RAB38 in colon sigmoid) are highlighted in green. The bottom panel shows a Manhattan plot of the genome-wide association study data before (gray) and after (blue) conditioning on the green genes. This locus goes from being genome-wide significant to being nonsignificant after conditioning on the predicted expression of RAB38. chr, chromosome.]
We also observed various associations outside the brain, potentially highlighting the importance of other organ systems in FTD. In line with this, other organ systems, such as the gastrointestinal and musculoskeletal systems, have been associated with FTD (62,63). On the other hand, we included local eQTL data from many tissue types (including those that are seemingly less disease relevant) to increase power and to include as many genes as possible in this exploratory study. As a result, we might not have detected the true mechanism of disease owing to a shared cross-tissue regulatory architecture of eQTLs between the tissue types related and unrelated to FTD (25,64). This is illustrated by our finding on bvFTD, for which we identified differential regulatory gene expression only of RAB38 in tissue types outside the brain. Because RAB38 is expressed throughout the brain (https://gtexportal.org/home/gene/RAB38) but is not available in the brain tissue panels we used, we hypothesize that differential expression of RAB38 in the brain contributes to bvFTD disease risk as well. To gain a deeper understanding of molecular mechanisms underlying FTD, future TWAS studies should increase the sample sizes of eQTL reference panels of disease-relevant tissue types and refine tissue-specific information with, for instance, cell type-specific features.
This study is a starting point for bridging the gap between genetic variation and disease pathogenesis involving specific genes in FTD. Nevertheless, several limitations should be taken into account. First, although TWAS increases power over a traditional GWAS, the small sample size of the current FTD GWAS (n = 2154 cases, n = 4308 controls) still reduces the power to find novel transcriptome-wide associations. As such, future TWASs on FTD should be performed using FTD GWAS summary statistics with a larger sample size, because this would increase not only the power to detect true associations, but also the robustness of results on tissue enrichment and genetic correlations. A second major limitation is that this study does not address the pathological heterogeneity in FTD. The most common pathological subtypes of FTD include abnormal aggregation of tau (frontotemporal lobar degeneration [FTLD]-tau) and FTLD-TDP (65). Because we performed a TWAS on the clinical entity of FTD, this study provides insights only into generic mechanisms underlying FTD but not into specific mechanisms underlying pathological subtypes. Additional studies in postmortem verified FTD cases are required to gain more insight into distinct mechanisms underlying pathological subtypes of FTD. Moreover, our results should be replicated using independent cis eQTL datasets to exclude the possibility that the presented findings are false positives. Finally, it should be noted that TWAS or colocalization analysis cannot be used for causal inference (64). Therefore, it is essential that our efforts be extended to functional validation to further understand the relationship between FTD and genes reported in this study.
Results presented in this study could be used as a point of reference in future genetic association studies on FTD. We provide evidence for the contribution of many genes, with both tissue-shared and tissue-specific effects, to the pathogenesis of FTD, including potentially novel (i.e., SEC22B) and previously reported (e.g., 17q21.31 inversion region, C4A) FTD risk loci. Most associations were detected in DLPFC splicing data, but tissues outside the brain may be involved in FTD as well. However, functional validation is needed because TWASs are prone to detecting associations not relevant for disease if the disease-relevant tissue is not well represented across reference panels. Identifying which biological processes are genetically influenced in FTD is important for understanding the disease etiology, and eventually for the development of treatments.