Targeted Sequencing of 10,198 Samples Confirms Abnormalities in Neuronal Activity and Implicates Voltage-Gated Sodium Channels in Schizophrenia Pathogenesis

Background: Sequencing studies have pointed to the involvement in schizophrenia of rare coding variants in neuronally expressed genes, including activity-regulated cytoskeleton-associated protein (ARC) and N-methyl-D-aspartate receptor (NMDAR) complexes; however, larger samples are required to reveal novel genes and specific biological mechanisms.
Methods: We sequenced 187 genes, selected for prior evidence of association with schizophrenia, in a new dataset of 5207 cases and 4991 controls. Included among these genes were members of ARC and NMDAR postsynaptic protein complexes, as well as voltage-gated sodium and calcium channels. We performed a rare variant meta-analysis with published sequencing data for a total of 11,319 cases, 15,854 controls, and 1136 trios.
Results: While no individual gene was significantly associated with schizophrenia after genome-wide correction for multiple testing, we strengthen the evidence that rare exonic variants in the ARC (p = 4.0 × 10⁻⁴) and NMDAR (p = 1.7 × 10⁻⁵) synaptic complexes are risk factors for schizophrenia. In addition, we found that loss-of-function variants and missense variants at paralog-conserved sites were enriched in voltage-gated sodium channels, particularly the alpha subunits (p = 8.6 × 10⁻⁴).
Conclusions: In one of the largest sequencing studies of schizophrenia to date, we provide novel evidence that multiple voltage-gated sodium channels are involved in schizophrenia pathogenesis and confirm the involvement of ARC and NMDAR postsynaptic complexes.


See Supplement 2 (Excel file) for tables.
Table S1. Genes targeted for sequencing. Gene IDs are presented for 187 targeted genes, along with the criteria used to select them for sequencing.
Table S2. Targeted sequence sample case-control LoF variants. All loss of function (LoF) variants observed in the new targeted sequence data, for alleles < 0.1% in frequency. Singletons are indicated in the 'Is Singleton' column. The 'DDG2P Gene' column indicates whether the affected gene is a confirmed developmental disorder gene associated with monogenic LoF variants (defined using data downloaded from DECIPHER (https://decipher.sanger.ac.uk/about#downloads/data) on 21/08/2018 (1)).
Table S3. Total variant burden analysis. Case-control association results for all 187 targeted genes and 106 LoF intolerant genes (genes with pLi scores > 0.9). The minor allele frequency threshold used for the given test is shown in the 'Variant frequency' column.
Table S4. Primary gene set analysis. Gene set association results for three case-control data sets (Targeted sequence sample, Swedish, UK10K) and case-control-de novo meta-analysis (Fisher's combined method). The minor allele frequency threshold used for the given test is shown in the 'Variant frequency' column.
Table S5. Secondary gene set analysis of LoF and paralog conserved missense variants (<0.1% frequency) in the ion channel gene sets. Paralog conservation scores (para_zscores) were downloaded from https://zenodo.org/record/817898.
Table S6. Single-gene meta-analysis of sodium channel genes for LoF and paralog conserved missense variants (<0.1% frequency).
Table S7. Primary single-gene meta-analysis of LoF variants (<0.1% frequency). Single-gene results for three case-control data sets (Ion Torrent, Swedish, UK10K) and de novo mutations.

Sample description
Cases
Targeted sequence sample: 5,724 schizophrenia cases (pre-QC) were sequenced using Ion Torrent instruments. The majority of these cases were from the CLOZUK cohort (n=4,647), which consists of individuals diagnosed with treatment-resistant schizophrenia. Here, DNA was extracted from anonymised whole blood samples. All CLOZUK samples had received a clinician-reported diagnosis of treatment-resistant schizophrenia. The CLOZUK cohort has been extensively used in previous common allele (2,3) and rare CNV genetic studies (4,5), and a validation of the use of a clinician diagnosis of treatment-resistant schizophrenia against research diagnostic criteria for schizophrenia can be found in Pardiñas et al 2017, Supplementary Note (2). We sequenced additional cases from the UK belonging to the CardiffCOGs cohort (n=521).
These cases were assessed with a SCAN interview (6) and case note review followed by consensus research diagnostic procedures. All CardiffCOGs cases had a DSM-IV diagnosis of schizophrenia or schizoaffective disorder-depressive type. Further details on the CardiffCOGs cohort can be found in previously published studies (2,5). 335 cases were recruited from Ireland (Dublin cohort). These cases were all over 18 years of age and had a diagnosis of Schizophrenia or Schizoaffective Disorder after a structural clinical assessment (as described in (7)). Diagnosis was made based on the consensus lifetime best estimate method using all available information (interview, family or staff report, chart review) with DSM-IV criteria as per the Structured Clinical Interview for DSM-IV, research edition (SCID-P). Each referral centre obtained local Research Ethics Committee (REC) approval. 221 additional cases were recruited from the Netherlands (GROUP cohort). The GROUP cohort has been described previously (8). Cases were between 16 and 50 years of age, and had received a diagnosis of schizophrenia according to DSM-IV criteria.

Published trio data sets used to derive schizophrenia de novo variants
Published schizophrenia exome sequencing trio studies used to derive schizophrenia de novo mutations are shown in Table S8. We included data sets that reported all exome-wide nonsynonymous de novo mutations. Fig S1). Both cases and controls had at least 95% of target bases covered at ≥10X.

Gene selection criteria
We sequenced the coding regions of 187 genes, 129 of which belong to four gene-sets that have been previously implicated in schizophrenia ('Schizophrenia-associated biological gene-sets' selection criteria in Table S9). We also sequenced 58 additional genes that have at least two lines of evidence for association with schizophrenia. These lines of evidence were based on gene-sets previously implicated in schizophrenia, for example FMRP (21) and miR-137 (3) targets, genes disrupted by de novo SNVs/indels in patients diagnosed with schizophrenia, autism spectrum disorder or intellectual disability (12), genes within schizophrenia-associated CNV loci (5) and genes disrupted by schizophrenia de novo CNVs (22) (full list of criteria described in Table S1). All genes belonging to the ARC, NMDAR, voltage-gated calcium channel (VGCC) and voltage-gated sodium channel (VGSC) gene-sets were sequenced (rows above dashed line), as well as genes that were members of two or more gene-sets listed below the dashed line.

Data processing and quality control
Sequence data were independently processed for each Ion Torrent wave according to GATK best practice guidelines (23,24). Reads were aligned to the human reference genome (GRCh37) using bwa (25 (35)). However, our study design precludes the use of random gene sets as a means for testing null distributions, as the genes are, by design, selected as candidate genes for schizophrenia. Nevertheless, we tested the enrichment of sodium channel genes for LoF and paralog conserved missense variants in schizophrenia by comparing its significance to that observed from 100,000 random sets of 14 genes (the number of genes in our voltage-gated sodium channels gene set), sampled from our 137 targeted genes that had paralog conserved sites. We performed two random gene set tests: the first excluded sodium channels from the random gene draws to test whether the sodium channel result reflected a general increased burden for LoF variants and missense variants at paralog sites across all targeted genes; the second included sodium channels among the random gene draws to test whether the enrichment is due to the genes being sodium channels, as opposed to being other pathways containing the most significant genes.
Significance was determined as the fraction of random gene sets as, or more, significant than the original sodium channel P value. This is likely to be a more conservative test than comparing sodium channels to random gene sets selected from the whole genome, since one would expect the 187 schizophrenia candidate genes as a whole to show an excess of LoF variants in schizophrenia.
Case-control de novo meta-analysis: Coefficients and standard errors from independent case-control regression tests (targeted, UK10K and Swedish) were meta-analysed as fixed effects using the inverse-variance method (implemented in R using the rma.uni() function as part of the metafor package). To obtain a single enrichment statistic for meta-analysed case-control and de novo tests, we followed the method described in Singh et al 2016 (30), which combined a one-tailed case-control P value with the de novo Poisson test P value using Fisher's combined method. For the combined case-control-de novo meta-analysis of nonsynonymous damaging mutations, we included all de novo nonsynonymous mutations (i.e. not just those with a CADD score ≥ 20), given that they are a priori more likely to be deleterious than inherited variation (39) and were the class of mutation most strongly associated with schizophrenia candidate genes in our previous publication (12).
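The two combination steps described above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study, which relied on the metafor package in R; the function names and all numeric inputs below are hypothetical and chosen only for illustration. The sketch pools per-study regression coefficients with inverse-variance weights, converts the pooled estimate to a one-tailed P value under a normal approximation (an assumption of this sketch), and combines that with a second P value using Fisher's method, for which the chi-square survival function with 4 degrees of freedom has a closed form.

```python
import math

def inverse_variance_fixed(betas, ses):
    """Fixed-effects pooled estimate and standard error (inverse-variance weights)."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

def one_tailed_p(beta, se):
    """Upper-tail P value for beta > 0 under a normal approximation (assumption)."""
    z = beta / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def fisher_combined_two(p1, p2):
    """Fisher's combined P for two independent P values.

    X = -2 * (ln p1 + ln p2) follows a chi-square distribution with 4 df,
    whose survival function is exp(-X / 2) * (1 + X / 2).
    """
    x = -2.0 * (math.log(p1) + math.log(p2))
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Hypothetical coefficients and standard errors for three case-control samples.
betas = [0.30, 0.25, 0.40]
ses = [0.15, 0.20, 0.30]
pooled_beta, pooled_se = inverse_variance_fixed(betas, ses)
p_case_control = one_tailed_p(pooled_beta, pooled_se)

# Hypothetical de novo Poisson test P value.
p_de_novo = 0.01
p_combined = fisher_combined_two(p_case_control, p_de_novo)
print(pooled_beta, pooled_se, p_case_control, p_combined)
```

The one-tailed conversion matters because Fisher's method is direction-agnostic: using a one-tailed case-control P value ensures that only enrichment in cases, and not depletion, contributes to the combined statistic.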

Approach to hypothesis testing and multiple testing
Our targeted sequencing study was designed to test three broad questions:
1. Do we observe significant evidence for enrichment of rare variants in 187 genes previously implicated in schizophrenia?
2. Do we support enrichment of rare variants in four candidate gene-sets previously implicated in schizophrenia?
3. Do we observe significant evidence for enrichment of rare variants in single genes?
For each of these questions, we describe below our rationale for the classes of variant tested and the number of multiple tests. those that are not (pLi ≤ 0.9). Since this single analysis was designed to further characterize the source of an association signal, no multiple testing correction was applied.
2. Do we support enrichment for rare variants in four candidate gene-sets previously implicated in schizophrenia?
Primary gene-set analysis: Among the targeted genes are four gene-sets previously implicated in schizophrenia; these include the synaptic sets ARC (n=28) and NMDAR (n=61) (12), and the ion-channel sets voltage-gated calcium channels (n=26) (21) and voltage-gated sodium channels (n=14) (40). Given that rare (< 0.1% frequency) LoF variants were the only class of mutation from question 1 that survived correction for multiple testing (main text Table 1), we tested these variants in a primary gene-set analysis. In the new targeted sequencing sample, P values from the primary gene-set analysis were Bonferroni corrected for 4 tests (four gene-sets x one mutation class).
For meta-analysis, we note the inclusion of ARC, NMDAR, and calcium channel gene sets in the present study was predicated on previous associations from exome-wide de novo and case-control studies that are included in the present meta-analysis (12,21).
This ascertainment bias makes it impossible to generate meaningful and appropriately conservative study-wide multiple-testing corrections. We therefore consider those meta-analyses as representing an appraisal of the current sequencing evidence for those gene-sets. The case-control meta-analysis of sodium channels does not include any previously reported data, and therefore it does not suffer from such an ascertainment bias; accordingly, we calculate study-wide corrected P-values as we did for the new sequencing data (four gene-sets x one mutation class).

Secondary gene-set analysis:
The two ion-channel gene sets are largely composed of paralogous genes; therefore, we leveraged information from a recently described metric We performed a number of exploratory tests to assess the robustness of the enrichment of LoF and paralog conserved missense variants in sodium channels, and to dissect which genes might be driving the signal. In this latter test, we partitioned sodium channel genes into alpha and beta subunits. Aiming to favour caution in the light of the novelty of the finding, we conservatively Bonferroni corrected the derived P-values for 12 potential tests (two mutation classes tested against four gene-sets plus the two sub-sets of sodium channels, alpha and beta subunits).
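One of the robustness checks for the sodium channel signal, the comparison against random gene sets described under 'Data processing and quality control', can be sketched in Python as follows. This is an illustrative sketch only: the study's set-level statistic came from regression-based tests, whereas here a Stouffer combination of hypothetical per-gene z-scores stands in for it, and the names `stouffer_p` and `empirical_set_p` and the toy data are assumptions introduced for the example. The `exclude_target` flag switches between the two variants of the test (random draws excluding or including the sodium channel genes).

```python
import math
import random

def stouffer_p(z_scores):
    """Stand-in set-level P value (assumption): Stouffer-combined upper-tail P."""
    z = sum(z_scores) / math.sqrt(len(z_scores))
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def empirical_set_p(gene_z, target_genes, n_draws=100_000, exclude_target=True, seed=0):
    """Fraction of random gene sets as, or more, significant than the target set.

    gene_z: dict mapping gene name -> per-gene z-score (hypothetical input).
    target_genes: list of genes in the set of interest (e.g. 14 sodium channels).
    exclude_target: if True, target genes are removed from the random draws.
    """
    rng = random.Random(seed)
    target = set(target_genes)
    observed = stouffer_p([gene_z[g] for g in target_genes])
    pool = [g for g in gene_z if not (exclude_target and g in target)]
    k = len(target_genes)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(pool, k)  # random set of the same size, without replacement
        if stouffer_p([gene_z[g] for g in draw]) <= observed:
            hits += 1
    return hits / n_draws

# Toy data: 137 hypothetical genes, the first 14 carrying a stronger signal.
toy_rng = random.Random(1)
genes = [f"GENE{i}" for i in range(137)]
gene_z = {g: toy_rng.gauss(0.0, 1.0) for g in genes}
sodium_like = genes[:14]
for g in sodium_like:
    gene_z[g] += 1.0

p_excluding = empirical_set_p(gene_z, sodium_like, n_draws=1000, exclude_target=True)
p_including = empirical_set_p(gene_z, sodium_like, n_draws=1000, exclude_target=False)
print(p_excluding, p_including)
```

With a finite number of draws, the smallest non-zero P value this procedure can return is 1 divided by the number of draws, which is one reason a large number of random sets (100,000 in the study) is useful.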
3. Do we observe significant evidence for enrichment of rare variants in single genes?
Single gene analysis: Our primary single-gene enrichment analysis tested rare (< 0.1% frequency) LoF variants, again because this was the only class of mutation from question 1 that survived correction for multiple testing. In our meta-analysis, we applied exome-wide criteria for multiple testing correction by applying Bonferroni correction for testing 20,000 genes.
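The numbers of tests used for Bonferroni correction in the gene-set and single-gene analyses can be summarised in a short Python sketch. The test counts are taken from the text above; the significance level of 0.05 and the example P value are assumptions made for illustration, and the helper names are hypothetical.

```python
def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a given family-wise alpha."""
    return alpha / n_tests

# Number of tests per analysis, as described in the text.
N_TESTS = {
    "primary gene-set": 4 * 1,          # four gene-sets x one mutation class
    "secondary gene-set": 2 * (4 + 2),  # two mutation classes x (four gene-sets + alpha and beta subsets)
    "single-gene meta-analysis": 20_000,  # exome-wide correction
}

ALPHA = 0.05  # assumed family-wise significance level (not stated in the text)

for analysis, n in N_TESTS.items():
    print(analysis, n, bonferroni_threshold(ALPHA, n))

# Hypothetical uncorrected P value adjusted for the secondary analysis.
example_p = 0.001
print(bonferroni_adjust(example_p, N_TESTS["secondary gene-set"]))
```

Under the assumed alpha of 0.05, correcting for 20,000 genes corresponds to a per-gene threshold of 2.5 × 10⁻⁶.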

Swedish exome sequencing sample
To test our hypothesis that our 187 targeted genes are more likely to contain rare (frequency < 1%), non-singleton schizophrenia risk alleles compared with all genes, we compared effect sizes between these tests. As expected, we reproduced the published Swedish result (29) in finding a significant effect size difference between singleton and rare (frequency < 1%), non-singleton tests at the exome-wide level (Table S10).
However, no significant difference was observed when the analysis was restricted to the 187 targeted genes (Table S10), suggesting the existence of risk alleles within these genes that have a minor allele count > 1.