Enhancing Discovery of Genetic Variants for Posttraumatic Stress Disorder Through Integration of Quantitative Phenotypes and Trauma Exposure Information

BACKGROUND: Posttraumatic stress disorder (PTSD) is heritable and a potential consequence of exposure to traumatic stress. Evidence suggests that a quantitative approach to PTSD phenotype measurement and incorporation of lifetime trauma exposure (LTE) information could enhance the discovery power of PTSD genome-wide association studies (GWASs). METHODS: A GWAS on PTSD symptoms was performed in 51 cohorts followed by a fixed-effects meta-analysis (N = 182,199 European ancestry participants). A GWAS of LTE burden was performed in the UK Biobank cohort (N = 132,988). Genetic correlations were evaluated with linkage disequilibrium score regression. Multivariate analysis was performed using Multi-Trait Analysis of GWAS. Functional mapping and annotation of leading loci was performed with FUMA. Replication was evaluated using the Million Veteran Program GWAS of PTSD total symptoms. RESULTS: GWASs of PTSD symptoms and LTE burden identified 5 and 6 independent genome-wide significant loci, respectively. There was a 72% genetic correlation between PTSD and LTE. PTSD and LTE showed largely similar patterns of genetic correlation with other traits, albeit with some distinctions. Adjusting PTSD for LTE reduced PTSD heritability by 31%. Multivariate analysis of PTSD and LTE increased the effective sample size of the PTSD GWAS by 20% and identified 4 additional loci. Four of these 9 PTSD loci were independently replicated in the Million Veteran Program. CONCLUSIONS: Through using a quantitative trait measure of PTSD, we identified novel risk loci not previously identified using prior case-control analyses. PTSD and LTE have a high genetic overlap that can be leveraged to increase discovery power through multivariate methods.


GWAS Quality Control
Genotyping, quality control (QC), and imputation methods for the included studies have been described in detail (8). In brief, participating cohorts provided phenotype and genotype data or GWAS summary statistics to the PGC-PTSD for quality control and analysis. For studies in which the PGC-PTSD analyst had direct access to genotype data, RICOPILI (17) was used to perform QC and imputation. QC included standard filters for single nucleotide polymorphism (SNP) call rates (exclusion of SNPs with call rate <98% or a missing difference >0.02 between cases and controls), call rate for participant genotypes (samples with <98% call rate excluded), Hardy-Weinberg equilibrium (p < 1 × 10 −6 in controls), and heterozygosity (within ± 0.2). Datasets were phased using SHAPEIT (18) and imputed using IMPUTE2 (19) with the 1000 Genomes Phase 3 reference panel data (20). For the UKBB, QC and imputation were carried out centrally by UKBB investigators as previous described (16) and GWAS was carried out by the PGC-PTSD analyst. For cohorts with data-sharing restrictions, analyses were performed using similar protocols by the study team that had individual-level data access, and GWAS summary statistics were provided to the PGC-PTSD.

Genome-wide Association Study
Only unrelated (π < 0.2) participants were retained for analysis. Principal components (PCs) were calculated within each cohort using EIGENSOFT v6.3.4 (21). The PTSD GWAS was performed within cohorts using PLINK 2.0 alpha with the −glm option, with the exception of UKBB and VETSA (Vietnam Era Twin Study of Aging) data, which were analyzed using BOLT-LMM v2.3.4 (22). Where available, PTSD symptom scores were analyzed using linear regression (n = 36 cohorts); PTSD case-control status was used if symptom scores were not available, using logistic regression (n = 15 cohorts). In both cases, 5 PCs were included as covariates to account for population stratification and genotyping artifacts. The UKBB PTSD GWAS included an additional PC as well as batch and assessment center covariates. Studies providing summary data used similar analytic strategies, as previously described (8). For each GWAS, SNPs with minor allele frequency <1% or imputation information score <0.6 were excluded. To perform a GWAS of PTSD conditioned on LTE, the GWAS was performed with LTE included as an additional covariate as either a count of LTEs or a binary variable, depending on data availability. The GWAS of the LTE count phenotype in the UKBB sample was performed in BOLT-LMM using 6 PCs, batch, and assessment center as covariates.

PTSD Meta-analysis
Sample size-weighted fixed-effects meta-analysis was performed using METAL (23). To account for different analytic methods and measure scales, effect estimates were converted into z scores by dividing effect sizes by standard errors (24). Case-control and quantitative GWAS subsets were evaluated for r g to determine if they could be meta-analyzed. To account for differences in ascertainment, heritability, and power between case-control and quantitative subsets, modified sample size weights were derived as previously described (25), assuming 10% population prevalence of PTSD, the estimates of SNP-based heritability (h 2 SNP ), r g , and sample PTSD prevalence. Meta-analysis was conducted on the reweighted z scores. Only SNPs available in >90% of all samples (N ≥ 163,979) were included in analyses. Regional annotation plots of genome-wide significant loci were produced using Locus-Zoom (26).

Heritability and Genetic Correlation Estimation With Linkage Disequilibrium Score Regression
Trait h 2 SNP and r g were estimated from GWAS summary statistics using linkage disequilibrium score regression (27). The linkage disequilibrium score intercept was used to test for inflation of test statistics owing to residual population stratification or other artifacts, and the attenuation factor {[intercept − 1]/[mean (χ 2 ) − 1]} was used to determine the proportion of inflation of test statistics owing to residual population stratification (Table  S2 in Supplement 2). Heritabilities were contrasted using a z test where standard errors were estimated using the block-jackknife approach. To estimate r g with other disorders, the LD Hub web interface was used (28). To identify genetic differences between PTSD and LTE, the r g s observed for PTSD and LTE were contrasted using z tests, where significance level was determined using Bonferroni correction for the 772 traits tested (p < 6.47 × 10 −5 ).

Cis-Quantitative Trait Locus Mapping
The effects of GWAS loci on transcriptomic regulation of surrounding genes (locus within ± 1 Mb of the gene transcription starting site) were tested for 49 tissues in GTEx v8 with genome-wide false discovery rate correction applied. Using the same criteria, GTEx v8 data were also used to investigate the effects of GWAS loci on the regulation of alternative splicing isoforms. A detailed description regarding GTEx v8 quantitative trait locus (QTL) mapping data by the GTEx Consortium is available (32). Briefly, cis-expression QTL (eQTL) and cis-splicing QTL mapping was performed using FastQTL (33) including the top 5 genotyping PCs, probabilistic estimation of expression residuals factors (34), sequencing platform, sequencing protocol, and sex as covariates.

Replication Analysis
Summary data from MVP TOT (dbGaP study accession phs001672.v4.p1) was used to replicate GWAS results. MVP TOT included 186,689 European ancestry participants who completed the PTSD Checklist-Civilian Version and passed QC. Details of MVP TOT have been published (35). SNPs were deemed replicated in MVP TOT if they had matching effect direction and were nominally significant after Bonferroni correction for the 9 SNPs tested (p < .006).

Multi-Trait Analysis of GWAS
Multi-Trait Analysis of GWAS (MTAG) (36) performs multivariate analysis of genetically correlated traits to increase discovery power for each input trait, providing trait-specific effect estimates and p values. MTAG was used to perform multivariate analysis with PTSD and LTE GWASs. The maxFDR statistic was used to test for MTAG model assumptions (Supplement 1).

Phenome-wide Association Study
To understand further how functional changes of significant loci are associated with human traits and diseases, we conducted a PheWAS of leading SNPs from PTSD and LTE loci using data from the GWAS Atlas (available at https://atlas.ctglab.nl/) (37). Bonferroni correction was applied to account for the 4756 phenotypes available that were tested (p < 1.05 × 10 −5 ).

RESULTS
The PTSD GWAS meta-analysis included 182,199 participants of European ancestry from 51 cohorts (Table S1 in Supplement 2). The largest cohort was the UKBB (N = 134,586 participants). Across the cohorts, PTSD was assessed using a variety of different methods (n = 19 methods); the most common methods were versions of the Clinician-Administered PTSD Scale (n = 18 studies) and PTSD Checklist (n = 14 studies). The majority of participants (91%, n = 165,825, 36 studies) were analyzed based on PTSD symptom scores; the remaining participants (9%, n = 16,374, 15 studies) did not have symptom scores available and were analyzed based on PTSD case-control status.

PGC-PTSD GWAS Meta-analysis
The h 2 SNP of meta-analysis of cohorts analyzed by symptom scores was 0.0547 (SE = 0.0042, p = 8.9 × 10 −39 ) ( Table S2 in Supplement 2). The h 2 SNP was similar, albeit not significant, in the smaller meta-analysis of case-control cohorts (observed scale h 2 SNP = 0.0580, SE = 0.0259, p = .17). The r g between the symptom score and case-control analyses was very high (r g = 0.9646, SE = 0.36, p = .0074). Thus, symptom score and case-control GWASs were meta-analyzed. We identified 5 genome-wide significant loci (Table 1, Figure  1A). Leading variants in significant loci mapped to an intergenic locus on chromosome 1, the intronic region of the GABBR1 gene on chromosome 6, the intronic regions of MPP6 and DFNA5 on chromosome 7, an intron of FOXP2 on chromosome 7, and the intronic region of FAM120A on chromosome 9. Gene-based analysis identified 6 significant genes (DCAF5, EXD2, FAM120A, FOXP2, GALNT16, and PHF2) ( Table S3 in Supplement 2).

PGC-PTSD GWAS Covariate Adjustment for LTE
We repeated the GWAS of PTSD with covariate adjustment for LTE. h 2 SNP was 0.0389 (SE = 0.00340, p = 2.6 × 10 −30 ), 31% lower than the PTSD GWAS without LTE covariate adjustment (p = 8.6 × 10 −20 ). There was a genome-wide significant locus in an uncharacterized region, CTC-340A15.2, on chromosome 5 that was not identified in the PTSD GWAS (Table S4 in Supplement 2). Effects changed slightly for the loci previously identified in the unadjusted PTSD GWAS (Table S4 in Supplement 2). Gene-based analysis identified no significant genes.

Multivariate Analysis of PTSD and Trauma Exposure
MTAG analysis that combined PTSD GWAS meta-analysis and UKBB LTE GWAS reported an effective sample size increase of PTSD GWAS from 182,199 to 217,491. There were 8 genome-wide significant loci for the MTAG PTSD analysis, including 4 loci not identified in the PTSD GWAS meta-analysis (Table 1, Figure 1C). Leading variants from additional loci mapped to an intergenic region in chromosome 2, the intron of SGCD on chromosome 5, an intergenic region on chromosome 16 near ZKSCAN2 and AQP8, and the intron of STAU1 on chromosome 20. In gene-based analysis, there were 8 significant genes, including 5 genes not identified from the original GWAS gene-based analysis (CSE1L, DFNA5, FOXP1, SGCD, TRIM26) ( Table S3 in Supplement 2).

Cross-Replication in MVP TOT
Of the 9 loci identified across the PTSD GWASs (5 from the PGC GWAS and 4 from the MTAG), 4 replicated significantly in MVP TOT (p < .006) ( Table 1, Figures S2-S10 in Supplement 1). Of the 11 genes identified in gene-based analyses (6 in the GWAS and 5 in the MTAG), 7 replicated at least at a nominally significant level in MVP TOT (Table  S3 in Supplement 2). Additionally, of 15 loci identified in MVP TOT GWASs, 9 nominally replicated in PGC-PTSD (Table S7 in Supplement 2). Overall, r g between PGC-PTSD and MVP TOT was high (r g = 0.8359, SE = 0.0376, p = 2.5 × 10 −109 ).

Functional Consequences of Risk Loci
We examined the functional impact of the 9 GWAS variants associated with PTSD (5 from the GWAS and 4 from the MTAG) ( Table 1). We observed that 7 loci were related to multiple tissue-specific eQTLs ( Table S8 in Supplement 2), where 11% of false discovery rate-significant eQTLs were in brain regions. A similar trend was present for splicing QTLs (Table S9 in Supplement 2), where only 7% of gene-tissue combinations were related to brain regions. Further details of the eQTL analysis are provided in Supplement 1.

DISCUSSION
Our GWASs aimed to advance understanding of PTSD genetics by integrating quantitative PTSD phenotypes and LTE exposure information in 182,199 participants of European ancestry from 51 cohorts. Overall, quantitative PTSD phenotyping captured similar genetic signal to our prior case-control analysis (r g = 0.92-1.14) (8), but with substantially higher power. However, by using LTE as a covariate, which hypothetically accounts for unexpressed genetic vulnerability among unexposed participants (12), we found a significant reduction in heritability and gene discovery. As high r g between PTSD and LTE would be one hypothetical explanation for this result (i.e., multicollinearity), we performed a GWAS of LTE and contrasted it to GWAS results for PTSD. We found that LTE has h 2 SNP comparable to PTSD and high r g compared with PTSD. We leveraged the r g to significantly enhance PTSD discovery power using a multivariate approach (36).
One explanation for h 2 SNP of PTSD adjusted for LTE being lower than the unadjusted estimate is that it may have removed genetic effects on PTSD mediated by trauma exposure (12,13). Given that trauma is a prerequisite for PTSD, genetic effects on trauma exposure can have mediated (i.e., indirect) effects on PTSD. Indeed, this seems plausible, as our LTE GWAS suggested a substantial amount of h 2 SNP related to trauma exposure. Therefore, the estimated h 2 SNP of PTSD conditional on LTE would theoretically reflect only nonmediated (i.e., direct) effects and thus would be smaller.
We used r g to quantify the genetic overlap between LTE and PTSD, finding similar magnitude to findings from twin studies (5,6). At the same time, incomplete r g between these two phenotypes also suggested meaningful genetic differences. To investigate this, we contrasted the magnitudes of r g that PTSD and LTE shared with other traits. For most traits, r g with PTSD was quite similar in magnitude to r g with LTE. However, we also found that negative affect traits, such as neuroticism and irritability, were more strongly correlated with PTSD than LTE, whereas risk-taking behavior showed higher correlation with LTE than PTSD. This suggests that some variants influence PTSD and LTE through somewhat distinct psychological and behavioral mechanisms (5).
The high r g between PTSD and LTE facilitates the application of multivariate approaches to PTSD GWASs. Whereas the r g between PTSD and LTE induces loss of power in the PTSD analysis when conditioned on LTE, a multivariate approach can benefit from it. Our multivariate (36) analysis resulted in a 19% increase in the effective sample size by adding LTE count data from the UKBB and identified replicable loci and patterns of tissue expression not identified in a standard PTSD GWAS.
The biological mechanisms associated with several of the protein products of identified genes have been linked to PTSD pathophysiology in animal and cell models: amygdalamediated fear extinction [FAM120A (38) (47)]. Blood and brain transcription-wide association and differential gene expression studies of PTSD have also implicated some of these genes, including a bloodbased prediction of downregulation of ARFGEF2 in the dorsolateral prefrontal cortex (48) and a postmortem study of human PTSD cortex indicating downregulation of CTSS expression in the dorsal anterior cingulate cortex and downregulation of OSBPL3 expression in the dorsolateral prefrontal cortex (49).
Interestingly, PTSD loci show widespread pleiotropic associations in PheWAS (50)(51)(52). Some loci point to factors associated with existing clinical presentations of PTSD (e.g., sleep), while others point to potential risk/protective factors for PTSD, such as educational attainment and cognitive functioning. Loci may affect PTSD through their direct influence on these risk/protective factors. Alternatively, the high degree of pleiotropy shown by these loci suggests that they could influence PTSD risk through a more general alteration of biological function (37), such as general predisposition to psychiatric illness (53). In particular, metabolic phenotypes such as height and body mass also appeared to be enriched in our PheWAS. This could be the influence of these loci on previously implicated inflammatory mechanisms for PTSD (8) or simply an artifact of their overrepresentation in the GWAS Atlas. Nevertheless, the broad variety of behavioral and clinical domains associated with these loci suggest complex etiologic heterogeneity of PTSD that could relate to subtypes (54).
Further characterization of significant loci via eQTL analyses identified expression across a variety of tissue types. Given the high degree of shared eQTL architecture between tissues, the presence of some of these tissues might not be directly related to PTSD pathogenesis. Indeed, on the genome-wide level, our tissue enrichment analysis suggests that only brain tissues are relevant. The brain regions implicated are consistent with functional magnetic resonance imaging and structural magnetic resonance imaging findings of PTSD. BA 24 (as part of the ventral anterior cingulate cortex) is implicated in PTSD response to trauma-, fear-, and threat-related stimuli (55,56). BA 9 (as part of the dorsomedial prefrontal cortex) reflects response to self-referential thought, theory of mind, empathy, and moral judgments and shows greater engagement in people with PTSD and trauma-exposed individuals (55,57,58). Nucleus accumbens expression is consistent with the neuroimaging evidence of its role in the reward system, which is prominently affected with emotional numbing symptoms of PTSD (59)(60)(61)(62).

Limitations
Stress-related disorders are phenotypically complex and heterogeneous (63), which limits discovery power and complicates translation to clinical application. The strategies proposed for understanding and addressing heterogeneity in major depressive disorder, such as harmonization of measures, additional phenotypic measures, and investigations of subtypes, could be applied to PTSD as additional avenues to enhance discovery power (64). Sex differences may also contribute a significant source of heterogeneity (8,(65)(66)(67)(68). Our analyses were restricted to participants of European ancestry given power limitations for other ancestry groups. However, urgent scientific and ethical reasons call for extending analyses to individuals of non-European ancestry (69). The PGC-PTSD group has actively been gathering data to increase representation from diverse ancestry and developing methods to optimize analyses in admixed populations (70). As sample sizes increase, future investigations will be powered to investigate ancestry and sex-specific genetic influences on PTSD and trauma exposure. In performing a GWAS of cumulative LTE, we identified several significant loci, including loci previously identified in GWASs of childhood trauma exposure (14). A full investigation of the genetic basis of LTE is clearly warranted. Future work could also examine the relationship between PTSD and specific types or numbers of trauma exposure, as they plausibly have different relationships with PTSD (6) and may therefore be more informative than our cumulative measure for LTE. Finally, trauma was assessed via participant self-report, which may vary with mood and PTSD symptoms at the time of reporting (71) and could inflate genetic associations with PTSD.

Conclusions
Novel replicable risk loci for PTSD identified by incorporating quantitative symptom data and trauma exposure information into GWASs offer us new insights into the genetic architecture of PTSD. Beyond the nature of LTE as an environmental exposure, there is a heritable component to LTE that overlaps highly with PTSD to impart an enhanced understanding of PTSD genetics. In future investigations, the genetic architectures of PTSD and LTE could be further delineated using causal mediation analysis (72), which can provide estimates of LTE-related mediation and gene-by-environment interaction. Our results reinforce the notion that in addition to larger samples, more detailed phenotyping and sophisticated modeling are needed to account for the role of environmental exposure in developing PTSD, as these influence GWAS discovery power. Widespread pleiotropy of significant loci suggests that cross-disorder analysis with PTSD (73,74) will enhance our understanding of how these loci modify risk for PTSD and related disorders.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.  Comparison of the genetic correlations of posttraumatic stress disorder (PTSD) and lifetime trauma exposure (LTE) with other traits. The x-axis is the genetic correlation between LTE and a given trait from the LD Hub. The y-axis is the genetic correlation between PTSD and a given trait. Each dot depicts a given trait. Colored (black, red, or blue) dots indicate traits with significant genetic correlation to both PTSD and LTE after Bonferroni adjustment. Noncolored (gray) dots indicate traits where genetic correlation is not significant after Bonferroni adjustment. Blue dots indicate traits with significantly higher genetic correlation with PTSD than with LTE. Red dots indicate traits with significantly higher correlation with LTE than with PTSD. The top 5 traits with a significantly higher correlation to PTSD than LTE and top trait with significantly higher correlation to LTE than PTSD have been labeled.   Table 2. Biol Psychiatry. Author manuscript; available in PMC 2022 April 01.