The Penetrance of Copy Number Variations for Schizophrenia and Developmental Delay


A number of rare genomic rearrangements, called copy number variants (CNVs), have been shown to increase the risk of developing early-onset neurodevelopmental disorders. These were first identified in patients with characteristic and recognizable syndromic features (e.g., Williams-Beuren syndrome, Smith-Magenis syndrome, Sotos syndrome, DiGeorge/velo-cardio-facial syndrome [VCFS]). Over the past few years, with the introduction of high-throughput microarray technologies, more CNVs of smaller size and incomplete penetrance have also been identified. Some of these have been shown to also increase the risk of developing schizophrenia (SCZ), autism spectrum disorders (ASD), and other neuropsychiatric disorders. For example, in 2008-2009, a deletion at 15q13.3 was shown to increase the risk of developing developmental delay (DD) (1), SCZ (2,3), epilepsy (4), and autism (5). Similar findings of increased risk for developing SCZ, DD, and ASD were made for deletions at 1q21.1 and 15q11.2 and duplications at 16p11.2 and 16p13.11 (reviewed by Malhotra and Sebat [6]). A number of CNVs have now been consistently associated with SCZ, and each of them also increases the risk for the group of DD/ASD/congenital malformations (CM) (6-9).
The number of CNVs known to increase the risk of developing a disorder from the group of DD/ASD/CM is higher than the number implicated in SCZ. For example, Girirajan et al. (7) tested 32,587 samples from children who had DD or ASD with or without CM for 72 CNV regions (39 deletions and 33 reciprocal duplications) that had previously been implicated in neurodevelopmental phenotypes or genomic disorders, including nine of uncertain pathogenic significance. When compared with a set of 8329 healthy control subjects, 38 of these regions (25 deletions and 13 duplications) were nominally significantly associated with the disorders (p < .05), and several more showed trends that might also represent true associations if tested in larger samples. Similar results were reported by Kaminsky et al. (8) in 15,749 individuals who presented for diagnostic array testing with abnormal clinical phenotypes including DD, intellectual deficit, ASD, and/or multiple CM. These authors reported that 21 CNV regions (14 deletions and 7 duplications) were significantly associated with one or more of these disorders.
It is clinically important to know the risk to carriers of these CNVs of developing each of the possible associated disorders (i.e., their penetrance). Vassos et al. (9) were the first to estimate the penetrance for SCZ for seven CNVs that had been shown to increase risk for this disorder. They found rather modest rates of 2% to 7.4%, except for the VCFS deletion at 22q11.2, which had a much higher penetrance of 55%, although with broad confidence intervals because no such CNV was observed in control subjects. The authors concluded that these CNVs were neither necessary nor sufficient to cause the disorder and that their level of penetrance was not sufficient for them to be considered useful clinical tools in genetic counselling, diagnosis, and testing. However, they pointed out that the overall penetrance for any neuropsychiatric disorder was likely to be much higher. The penetrance of 12 CNVs for DD/ASD/CM was estimated by Rosenfeld et al. (10). Estimates of the risk for an abnormal phenotype ranged from 10.4% for 15q11.2 deletions to 62.4% for distal 16p11.2 deletions. These values are much higher than those for SCZ. The most highly penetrant CNVs were not tested because their absence in control subjects prevented accurate estimates.
Here we estimate the penetrance of all CNVs listed in the Girirajan et al. (7) article. Nearly all SCZ-associated CNVs are on this list as well, and we added only exonic deletions at NRXN1, a gene consistently implicated in SCZ (11-13). We performed estimates both for SCZ and for the group of early-onset developmental disorders, DD/ASD/CM. This joint analysis allowed us to provide estimates for each CNV, even for those that are never found in SCZ or in healthy control subjects. We use a large new sample of cases and controls and add to it the data from the two largest previous studies on SCZ or from previous meta-analyses to derive more reliable estimates.

Choice of CNVs
We analyzed CNVs previously associated with SCZ or severe neurodevelopmental phenotypes. These were taken from the list of CNV regions proposed by Girirajan et al. (7): 37 deletions and 32 reciprocal duplications, after excluding an overlapping segment at 17p13.3 and a CNV on the X chromosome, which was not analyzed in our samples. Most of the strongly implicated SCZ-associated CNVs are on this list, and we added exonic deletions at NRXN1. For some of the analyses, we focus on CNVs that we regard as associated with SCZ (Table 1), based on the review by Malhotra and Sebat (6), with the addition of NRXN1 and newly implicated loci (Table S2 in Supplement 1). We did not analyze other SCZ-implicated loci (e.g., VIPR2 duplications) because they have not been tested in sufficiently large samples of DD/ASD/CM or have not received consistent support.

Estimating the Rate of CNVs in Different Disorders
To simplify the presentation, we included data only from the largest studies and samples available. The numbers that follow are those after exclusion of poorly performing arrays and duplicate samples. A CNV was counted as covering a CNV locus if it spanned more than 50% of the commonly affected region (Table S2 in Supplement 1). For loci that include only single genes (NRXN1, SIM1, YWHAE, PAFAH1B1, and NF1), we accepted CNVs that intersected at least one exon of the gene.
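The two overlap rules above can be expressed as a small sketch. It assumes that CNVs, loci and exons are given as 0-based, half-open genomic intervals on the same genome build, and that deletions and duplications are matched only to loci of the same type elsewhere; the names Interval, covers_region and hits_gene are hypothetical and not part of any published pipeline.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open interval)


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two half-open intervals."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def covers_region(cnv: Interval, region: Interval, min_fraction: float = 0.5) -> bool:
    """Multigenic loci: the CNV must span more than 50% of the commonly affected region."""
    return overlap_bp(cnv, region) > min_fraction * (region.end - region.start)


def hits_gene(cnv: Interval, exons: list[Interval]) -> bool:
    """Single-gene loci (e.g., NRXN1): the CNV must intersect at least one exon."""
    return any(overlap_bp(cnv, exon) > 0 for exon in exons)
```

The coordinates used with these functions would come from the locus definitions in Table S2; none are reproduced here.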
The rates of CNVs in DD/ASD/CM are taken from the largest study on these phenotypes: 32,587 patients referred for genetic testing to one laboratory (Signature Genomics), described by Girirajan et al. (7). For some of the loci, the reported sample numbers are smaller (23,380); for others, they are larger (33,226), because the same team subsequently published data on several CNVs in an enlarged data set (10).
For SCZ cases, we analyzed three large data sets for which we had access to the raw data, totaling at least 13,465 cases (more for the SCZ-associated loci; see below): 1) 6882 patients from our new Clozapine UK and Cardiff University Cognition in Schizophrenia samples (Supplement 1), 2) 3391 cases from the International Schizophrenia Consortium study (ISC, 2008) (2), and 3) 3192 cases from the Molecular Genetics of Schizophrenia (MGS) study (dbGaP accession numbers phs000167.v1.p1 and phs000021.v3.p2).
For controls, we analyzed samples from four publicly available data sets genotyped with high-resolution Illumina arrays (San Diego, California), similar to our new SCZ sample, and analyzed by us with the same methods. These include individuals who took part in a study on smoking cessation in the United States (n = 1488), a study on melanoma in the United States (n = 2971), a study on refractive error from the Kooperative Gesundheitsforschung in der Region Augsburg (KORA) study in Germany (n = 1857), and the Wellcome Trust Case Control Consortium (WTCCC2) in the United Kingdom (n = 4939). To those we added 3181 control subjects from the ISC and 3437 control subjects from the MGS studies listed above, for a total of 17,873 control subjects.
For the SCZ-associated loci, we added data from previous studies, as reviewed by Malhotra and Sebat (6) or presented in the relevant articles that implicated them (Table S2 in Supplement 1).For these loci we excluded our WTCCC2 control subjects because they are completely or partially included in the previous reviews.
Estimation of the penetrance was performed with an adaptation of the method proposed by Vassos et al. (9). These authors estimated the penetrance as the probability of developing the disease (D) for individuals carrying the CNV (G) with the following formula:
$$P(D \mid G) = \frac{P(G \mid D)\,P(D)}{P(G \mid D)\,P(D) + P(G \mid \bar{D})\,P(\bar{D})}$$
where $\bar{D}$ denotes control subjects who do not have SCZ, and $P(D)$ is the lifetime morbid risk for SCZ. Instead of using a single disease population and controls, we replace the denominator with an estimate of the CNV frequency in the general population, $P(\mathrm{CNV}_{\text{general}})$, which includes patients with SCZ and patients from the group of DD/ASD/CM. The frequency in the general population is therefore likely to be higher than the rate among control subjects.
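As a minimal sketch of the two-group formula of Vassos et al. (9), assuming CNV frequencies are supplied as proportions (0-1) and the morbid risk P(D) as a proportion; the function name and the illustrative frequencies are hypothetical. It also shows why loci absent from controls need confidence intervals: a zero control frequency forces the estimate to exactly 1.

```python
def penetrance_two_group(p_cnv_case: float, p_cnv_control: float, morbid_risk: float) -> float:
    """P(D|G) by Bayes' theorem using cases (D) and controls (not D) only.

    Undefined (ZeroDivisionError) if the CNV is absent from both groups.
    """
    if not 0.0 < morbid_risk < 1.0:
        raise ValueError("morbid_risk must lie strictly between 0 and 1")
    carriers_with_disease = p_cnv_case * morbid_risk
    carriers_without_disease = p_cnv_control * (1.0 - morbid_risk)
    # The denominator is P(G), the CNV frequency in a population made up of cases and controls only
    return carriers_with_disease / (carriers_with_disease + carriers_without_disease)


# Illustrative (hypothetical) frequencies of the order quoted for 1q21.1 deletions
print(penetrance_two_group(p_cnv_case=0.0017, p_cnv_control=0.00021, morbid_risk=0.01))
```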

Estimating the Frequency of the CNV in the General Population
This method was described in our previous publication (14). Briefly, if a CNV has a high penetrance for disorders that are underrepresented among populations recruited as healthy control subjects (e.g., DD/SCZ/CM), then its measured frequency among healthy control subjects will underestimate its population frequency. To minimize this effect, we took into account the rate of these CNVs in all disorders that are likely to be excluded from "control" populations. The overall frequency of a CNV in the general population is therefore
$$P(\mathrm{CNV}_{\text{general}}) = \sum_{i} P(\mathrm{CNV} \mid \mathrm{phenotype}_i)\,P(\mathrm{phenotype}_i),$$
that is, the sum over phenotypes of the probability of being a carrier of the CNV given the phenotype, multiplied by the proportion of people with this phenotype in the population.
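The sum can be sketched as a weighted average over a partition of the population into phenotype groups. The group labels, the dictionary layout and the example numbers are hypothetical; the check that prevalences sum to 1 encodes the assumption that the groups are mutually exclusive and together cover the whole population.

```python
def population_frequency(freq_by_group: dict[str, float], prevalence_by_group: dict[str, float]) -> float:
    """P(CNV general) = sum over groups of P(CNV | group) * P(group)."""
    if set(freq_by_group) != set(prevalence_by_group):
        raise ValueError("every group needs both a CNV frequency and a prevalence")
    if abs(sum(prevalence_by_group.values()) - 1.0) > 1e-9:
        raise ValueError("group prevalences must sum to 1 (a partition of the population)")
    return sum(freq_by_group[g] * prevalence_by_group[g] for g in freq_by_group)


# Hypothetical CNV never seen in controls but carried by 0.1% of a disorder affecting 5% of people
freq = {"control": 0.0, "disorder": 0.001}
prevalence = {"control": 0.95, "disorder": 0.05}
print(population_frequency(freq, prevalence))  # nonzero even though the control frequency is 0
```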
We accepted a frequency of SCZ in the population of 1%. The frequency of DD/ASD/CM was approximated at 4%. The 4% figure is a compromise based on the figure of 5.12% for the total frequency of diseases with an important genetic component proposed by Baird et al. (15). This latter figure was used by Rosenfeld et al. (10) for calculating the penetrance of CNVs. However, it includes some individuals with psychosis and some with disorders that, unlike those of the patients studied by Girirajan et al. (7), are unlikely to lead to referral for genetic testing. A lower estimate of the frequency of these disorders is the sum of the widely accepted rates of 2% for DD (16) and 1% for ASD (17,18), that is, 3%, the figure that we used in our previous publication (14). We therefore accepted 4% as a reasonable compromise that also includes some congenital malformations. As shown later in the article, even large errors in these estimates make little difference to our conclusions, because the combined rate of DD/ASD/CM in the population is still several times higher than that of SCZ under any of these assumptions.
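The claim that errors in the assumed prevalence of DD/ASD/CM matter little can be checked with a short sketch. It assumes a hypothetical CNV found at 0.02% in controls and at the same rate (0.2%) in SCZ and DD/ASD/CM, keeps SCZ at 1%, and varies the DD/ASD/CM prevalence over the 3%, 4% and 5.12% figures discussed above. Because both penetrances share the denominator P(CNV general), their ratio is (f_DD / f_SCZ) times the ratio of the prevalence weights, which for equal CNV rates is simply 3, 4 or 5.12.

```python
SCZ_PREVALENCE = 0.01  # assumed lifetime morbid risk of SCZ


def penetrances(f_control: float, f_scz: float, f_dd: float, dd_prevalence: float) -> tuple[float, float]:
    """Return (penetrance for SCZ, penetrance for DD/ASD/CM) for one CNV."""
    control_share = 1.0 - SCZ_PREVALENCE - dd_prevalence
    p_general = f_control * control_share + f_scz * SCZ_PREVALENCE + f_dd * dd_prevalence
    return f_scz * SCZ_PREVALENCE / p_general, f_dd * dd_prevalence / p_general


# Hypothetical CNV: 0.02% in controls, 0.2% in both SCZ and DD/ASD/CM
for dd_prevalence in (0.03, 0.04, 0.0512):
    pen_scz, pen_dd = penetrances(0.0002, 0.002, 0.002, dd_prevalence)
    print(f"DD/ASD/CM prevalence {dd_prevalence:.2%}: "
          f"SCZ {pen_scz:.1%}, DD/ASD/CM {pen_dd:.1%}, ratio {pen_dd / pen_scz:.2f}")
```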
Therefore, the frequency of a CNV in the general population can be expressed as the sum of its frequency among healthy control subjects (the 95% of the general population without these disorders), its frequency among patients with SCZ (1% of the general population), and its frequency among DD/ASD/CM patients (the remaining 4% of the general population), each weighted by the size of the group:
$$P(\mathrm{CNV}_{\text{general}}) = P(\mathrm{CNV} \mid \mathrm{con}) \times 0.95 + P(\mathrm{CNV} \mid \mathrm{SCZ}) \times 0.01 + P(\mathrm{CNV} \mid \mathrm{DD/ASD/CM}) \times 0.04$$
The penetrance for SCZ (in the range 0-1) then simplifies to
$$P(\mathrm{SCZ} \mid G) = \frac{P(\mathrm{CNV} \mid \mathrm{SCZ}) \times 0.01}{P(\mathrm{CNV}_{\text{general}})}$$
and the penetrance for DD/ASD/CM simplifies to
$$P(\mathrm{DD/ASD/CM} \mid G) = \frac{P(\mathrm{CNV} \mid \mathrm{DD/ASD/CM}) \times 0.04}{P(\mathrm{CNV}_{\text{general}})}$$
The total penetrance for any of these disorders is simply the sum of those for SCZ and DD/ASD/CM. To illustrate the method, we provide the example of the 1q21.1 deletion. It is found in .021% of reported control subjects, in .17% of SCZ patients, and in .29% of patients affected with DD/ASD/CM (Table 1). This results in a frequency in the general population of .033%: (.00021 × .95) + (.0017 × .01) + (.0029 × .04) = .00033. (Note that this rate is higher than the frequency among healthy control subjects.) The penetrance values are expressed as percentages in the text and tables.
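The worked example can be reproduced with a short sketch that applies the three formulas above to the frequencies used in the 1q21.1 arithmetic (.00021, .0017 and .0029). The function name and default weights (1% SCZ, 4% DD/ASD/CM, 95% controls) are our illustrative choices; because the inputs are rounded, the outputs need not match Table 1 exactly.

```python
def cnv_penetrance(f_control: float, f_scz: float, f_dd: float,
                   scz_prevalence: float = 0.01, dd_prevalence: float = 0.04) -> dict[str, float]:
    """Penetrance of one CNV for SCZ, for DD/ASD/CM, and for either (all as proportions)."""
    control_prevalence = 1.0 - scz_prevalence - dd_prevalence  # 0.95 with the defaults
    p_general = (f_control * control_prevalence
                 + f_scz * scz_prevalence
                 + f_dd * dd_prevalence)
    pen_scz = f_scz * scz_prevalence / p_general
    pen_dd = f_dd * dd_prevalence / p_general
    return {"p_general": p_general, "SCZ": pen_scz, "DD/ASD/CM": pen_dd, "total": pen_scz + pen_dd}


result = cnv_penetrance(f_control=0.00021, f_scz=0.0017, f_dd=0.0029)
for key, value in result.items():
    print(f"{key}: {value:.5f}" if key == "p_general" else f"{key}: {value:.1%}")
```

With these rounded inputs the sketch gives roughly 5% for SCZ, 35% for DD/ASD/CM and 40% in total.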
Because some of the CNVs are extremely rare or even absent in control subjects, we provide 95% confidence intervals (CIs) for the penetrance values, which can be wide in such instances. These were estimated by first producing binomial CIs for the frequencies of CNVs in each population, using the Wilson score interval (19). Upper 95% bounds for penetrance were estimated from the upper bounds of CNV frequencies in patients and the lower bounds of the frequencies in the general population, and lower 95% bounds from the lower bounds in patients and the upper bounds in the general population. The details for each CNV are presented in Table S4 of Supplement 1.
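The interval construction can be sketched as follows. The Wilson score interval itself is standard; how the general-population bounds are assembled is not spelled out above, so this sketch assumes they are the prevalence-weighted sums of the lower (or upper) Wilson bounds of each group, and it caps penetrance at 1. Function names and any carrier counts passed to them are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(carriers: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion carriers / n."""
    if n <= 0 or not 0 <= carriers <= n:
        raise ValueError("need n > 0 and 0 <= carriers <= n")
    p_hat = carriers / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def penetrance_ci(counts: dict[str, tuple[int, int]], prevalence: dict[str, float],
                  group: str) -> tuple[float, float]:
    """Approximate 95% bounds on the penetrance of one CNV for `group`.

    counts maps each population group to (carriers, sample size);
    prevalence maps each group to its share of the general population.
    """
    bounds = {g: wilson_interval(k, n) for g, (k, n) in counts.items()}
    general_low = sum(bounds[g][0] * prevalence[g] for g in prevalence)
    general_high = sum(bounds[g][1] * prevalence[g] for g in prevalence)
    # Lower bound: low patient frequency over high general-population frequency
    lower = bounds[group][0] * prevalence[group] / general_high
    # Upper bound: high patient frequency over low general-population frequency, capped at 1
    if general_low == 0:
        return lower, 1.0
    return lower, min(1.0, bounds[group][1] * prevalence[group] / general_low)
```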
Selection coefficients acting against CNVs were estimated with the method we presented previously (14). Briefly, the selection coefficient equals the proportion of observed de novo CNVs out of the total number of CNVs observed in that population (de novo + inherited). We updated our previous estimates with data published since then and added loci that were not part of our previous paper (Table S3 in Supplement 1).
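The de novo ratio can be computed as below. This sketch assumes that inheritance counts per locus have already been tallied from systematically conducted studies; the five-observation minimum mirrors the threshold applied when comparing selection with penetrance, and the function name and counts are hypothetical.

```python
from typing import Optional

MIN_INHERITANCE_OBSERVATIONS = 5  # minimum number of CNVs with known inheritance status


def selection_coefficient(n_de_novo: int, n_inherited: int,
                          min_observations: int = MIN_INHERITANCE_OBSERVATIONS) -> Optional[float]:
    """Proportion of de novo CNVs among all carriers with known inheritance status.

    Returns None when there are too few observations for a usable estimate.
    """
    if n_de_novo < 0 or n_inherited < 0:
        raise ValueError("counts must be non-negative")
    total = n_de_novo + n_inherited
    if total < min_observations:
        return None
    return n_de_novo / total


print(selection_coefficient(n_de_novo=3, n_inherited=9))  # 0.25
print(selection_coefficient(n_de_novo=1, n_inherited=2))  # None (too few observations)
```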

Results
The rates of CNVs among subjects affected with SCZ, those with DD/ASD/CM, and healthy control subjects are presented in Figure 1, with full details in Tables S2 and S4 in Supplement 1. Almost all CNVs have higher rates in the DD/ASD/CM group than in SCZ. Significant differences are indicated with asterisks in Figure 1. The only CNVs with higher rates in SCZ patients are the 16p11.2 duplication, the 3q29 deletion, the 16p13.11 duplication, and the "smaller 15q13.3 (CHRNA7)" duplication, but these differences are small and not significant. In contrast, numerous CNVs are much more frequent in DD/ASD/CM, some of these differences are highly significant, and some CNVs have not yet been reported in SCZ cases.
The differences between the penetrance of CNVs for the different disorders are even more striking (Figure 2 and Table S4 in Supplement 1). For CNVs that are never observed in control subjects, the joint penetrance reaches 100%, but for some of the cases with 100% penetrance the 95% CIs are wide (Supplement 1). The penetrance for some syndromic disorders, such as Prader-Willi syndrome/Angelman syndrome (PWS/AS), is known to be nearly complete (i.e., the CNVs are not found in healthy control subjects), and they are not found in SCZ subjects either. Furthermore, their 95% CIs are tighter.
380 BIOL PSYCHIATRY 2014;75:378-385 G. Kirov et al. www.sobp.org/journal
Psychiatrists are likely to be more interested in the penetrance of SCZ-associated CNVs. Therefore, we present them separately in Figure 3 and Table 1. All of them have much higher penetrance for DD/ASD/CM than for SCZ. The frequency of a CNV (and therefore its penetrance) separately for ASD, for DD, or for specific CM will of course differ for each CNV and is not known in each case. This remains to be established in the future and is not a topic of this article. The total penetrance for any disorder (including SCZ) for this set of CNVs ranges from 10.6% for the 16p13.11 duplication to 100% for the VCFS deletion (mean of 41%). The 95% CIs for this set of CNVs are much tighter because they have been tested in larger numbers of control subjects, and therefore the penetrance estimates are more reliable.
The selection coefficients for the CNVs and the sources we used to derive them are presented in Table S3 of Supplement 1. Our data are insufficient for a confident estimate of the selection coefficients for many of the CNVs, so for our comparison with the penetrance data, we use only CNVs for which at least five observations on their inheritance status are available from systematically conducted studies. The results are shown in Figure 4A. The strength of selection against CNVs correlated strongly with their overall penetrance for any disorder: Pearson correlation r = .51, p = .001. There are some obvious exceptions to the rule: CNVs not seen in control subjects (which therefore have a penetrance of 100%) that have only modest selection coefficients. As a rule, these exceptions are based on smaller numbers of observations and have wide 95% CIs, up to 0% to 100% (Table S4 in Supplement 1). We therefore excluded CNVs with a penetrance of 100% whose lower 95% CI bound was below 10% (an arbitrarily chosen cutoff). Most of the outliers disappeared (Figure 4B), and the correlation increased to r = .83, p < 10^-6.
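The outlier rule and the correlation can be sketched as follows. The record fields, the hand-written Pearson coefficient and the demo loci are assumptions for illustration (a library routine such as scipy.stats.pearsonr would additionally return a p-value); the 10% cutoff is the arbitrary one described above.

```python
import math
from dataclasses import dataclass


@dataclass
class CnvSummary:
    name: str
    penetrance: float         # overall penetrance for any disorder, 0-1
    penetrance_ci_low: float  # lower 95% confidence bound of the penetrance
    selection: float          # selection coefficient (de novo ratio), 0-1


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences of at least 3 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def keep_for_correlation(cnv: CnvSummary, cutoff: float = 0.10) -> bool:
    """Drop CNVs with 100% penetrance whose lower 95% bound is below the cutoff."""
    return not (cnv.penetrance >= 1.0 and cnv.penetrance_ci_low < cutoff)


# Hypothetical loci; cnv_D is an outlier with an uninformative interval
cnvs = [
    CnvSummary("cnv_A", 0.20, 0.12, 0.10),
    CnvSummary("cnv_B", 0.45, 0.30, 0.35),
    CnvSummary("cnv_C", 1.00, 0.60, 0.80),
    CnvSummary("cnv_D", 1.00, 0.02, 0.15),
]
kept = [c for c in cnvs if keep_for_correlation(c)]
print(len(kept), pearson_r([c.penetrance for c in kept], [c.selection for c in kept]))
```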

Discussion
The role of CNVs in the pathogenesis of SCZ and developmental disorders is well established (6-8,20). The penetrance of some of these CNVs has been estimated before, but separately for these disorders (9,10). The estimates for SCZ (9) produced modest rates of 2% to 7.4% for seven SCZ-associated CNVs (excluding the 22q11.2 deletion). The estimates for DD/ASD/CM (10), for a small subset of CNVs, produced higher values of 10% to 62%, but only three loci overlapped between these studies. Our estimates for the penetrance and the 95% CIs for the two phenotypes are reassuringly similar to these previous reports: Pearson correlation of .82 for SCZ and .68 for DD/ASD/CM (Table S5 in Supplement 1). The only exception is the 22q11.2 deletion, for which the penetrance for SCZ was estimated at 55% by Vassos et al. (9) and at 12% by us. However, these authors pointed out that the credible intervals for this CNV were broad because no such CNV was observed in a control subject, and they relied on simulations. Accurate data on DD/ASD/CM were not available at that time, and these greatly help the estimates. Because our results are based on larger sample sizes for every CNV tested, they are likely to be more accurate. Even for CNVs that are found at similar rates in the two phenotypes, the penetrance is several times higher for the DD/ASD/CM group. This is because the DD/ASD/CM group is approximately 4 times more frequent in the general population than SCZ, so that even when the rate of a CNV is similar in the two groups, about 4 times more CNV carriers will develop a DD/ASD/CM phenotype than SCZ. Even large errors in our assumptions for the population frequencies of SCZ and the group of DD/ASD/CM cannot change the conclusion that the penetrance is higher for the group of DD/ASD/CM.
Different arrays have been used in these studies, and therefore we need to ensure that this did not create the differences we observe. The CNV frequencies in SCZ cases and controls are based on similar or identical arrays. Thus, the ISC and MGS samples have both cases and controls in similar numbers, and they have been genotyped on the same (Affymetrix, Santa Clara, California) arrays, whereas the Clozapine UK/Cardiff University Cognition in Schizophrenia samples and the corresponding controls from the smoking, melanoma, KORA, and WTCCC2 studies were analyzed with only the 520,766 probes shared by the Illumina arrays (Table S1 in Supplement 1). The data on the DD/ASD/CM samples are based on different, custom-made, whole-genome arrays, in bacterial artificial chromosome or oligonucleotide-based versions (7,10). These arrays have fewer probes than those used for the SCZ cases and controls and, as a consequence, could have lower resolution. Therefore, if CNVs had been underdetected on the custom-made arrays, the true differences would be even larger than those we find. In any case, most CNVs tested are large and should be detected on any of these arrays. We made sure that even the limited number of small CNVs analyzed in the DD/ASD/CM samples (those for single genes) were covered by a sufficient number of probes from the list of probes common to all Illumina arrays used in our study and would thus be detected on these arrays as well. As presented in Table S2 of Supplement 1, only a small number of CNVs are covered by fewer than 15 probes on the Illumina arrays, and these CNVs are not relevant for our conclusions. Even more reassuringly, the most striking differences between DD/ASD/CM and SCZ are found for very large CNVs, which should be detected on any array: the deletions at the AS/PWS and Williams-Beuren syndrome regions and at 1p36, 16p11.2, and 17q21.31, and the duplications at 22q11.2 and 22q13, all of which show differences at p < 10^-5, are covered by at least 80 probes (Table S2 in Supplement 1).
The rather modest penetrance values produced for SCZ in the previous literature have been taken as evidence that these CNVs have low penetrance and are neither sufficient nor necessary for the development of SCZ. The current data highlight the point that most of these CNVs are in fact highly pathogenic, but the phenotype that they produce is more likely to be another developmental disorder, such as DD or ASD. Thus, the average penetrance for the SCZ-associated CNVs from Table 1 is 41% for developing any of the disorders discussed here, ranging from 10.6% (95% CI = 7%-17%) for the 16p13.11 duplication to 100% (95% CI = 60%-100%) for the VCFS deletion. These are very substantial increases in the risk of developing a serious disorder, such as SCZ, DD, ASD, and certain CM. Because not all controls have been screened for neuropsychiatric phenotypes, it is possible that some CNV carriers in the control populations also have subtle phenotypes; if so, the true penetrance would be higher and our figures would be underestimates. The high pathogenicity of these CNVs is supported by the estimates of the strong selection pressure that operates against them, and the two measures show a striking correlation (Figure 4B), despite being derived with different methods (one based on frequencies, the other on de novo ratios). The increase in the risk of developing one of these disorders appears to result in a similar increase in the selection pressure against carriers. This indicates that the selection coefficient (the de novo ratio) is a good predictor of the penetrance of a CNV, and vice versa. For many of the 70 CNVs discussed here, the numbers of observations used for the de novo ratios or for the frequencies are too small, leading to unreliable estimates of the penetrance and selection coefficients. These estimates should be refined in future studies.
Additional disorders that have not been specifically discussed here but are also reported in carriers of these CNVs include, among others, epilepsy (4) and attention-deficit/hyperactivity disorder (21). It is not within the scope of this article to discuss the exact range of phenotypic presentations of each of these CNVs. They are variable and in some cases not yet reliably established. More important is the fact that the presence of one of these CNVs has consequences for genetic counselling and diagnosis. Because the penetrance of these CNVs for a severe neurodevelopmental disorder is greater than 10%, the offspring of carriers who inherit one of them (Table 1) will have a risk of between 10% and 100% of developing one of these disorders (ignoring any additional risk from other inherited genetic variants), which raises important questions in a genetic counselling setting. In most instances, this will be higher than the 12.8% to 15% risk among children of SCZ patients in general (22). The presence in a patient of most other CNVs from the list proposed by Girirajan et al. (7) should also be considered important for the diagnosis and management of that person. There are some notable exceptions; for example, the "smaller 15q13.3" duplication is present at equal rates in cases and controls and is not considered pathogenic by us or by Girirajan et al. (7).
A question that arises is what determines the neurodevelopmental trajectory toward SCZ or toward severe developmental delay/intellectual disability for carriers of the same CNV. One possible explanation is the presence of a second large and rare CNV among carriers of pathogenic CNVs (7). We tested this hypothesis on the subset of SCZ-associated CNVs, because these are the CNVs for which we have sufficient numbers to produce valid results (many of the most pathogenic CNV loci analyzed by Girirajan et al. (7) are not hit by CNVs in SCZ patients). We used the same criteria to define a "second hit" as Girirajan et al. (7): large (>500 kb) and rare (<.1% frequency in control populations) CNVs, or a known pathogenic CNV (from the list of 70 CNVs, Table S2 in Supplement 1) even if <500 kb. The rate of such "second hits" (Table S6 in Supplement 1) was, however, nearly identical for patients with SCZ and those with DD/ASD/CM, at 10% versus 9.3% (p = .74), indicating that the presence of a second-hit CNV is not the factor that usually determines the phenotype of carriers.
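The second-hit definition can be written as a predicate. This is a sketch under the assumption that each carrier's additional CNVs have already been annotated with size, control-population frequency and membership of the 70-locus list; the field and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExtraCnv:
    size_kb: float
    control_frequency: float  # frequency in control populations, 0-1
    known_pathogenic: bool    # True if the CNV is one of the 70 loci (Table S2)


def is_second_hit(cnv: ExtraCnv, min_size_kb: float = 500.0, max_control_freq: float = 0.001) -> bool:
    """Large (>500 kb) and rare (<0.1% in controls), or a known pathogenic CNV of any size."""
    large_and_rare = cnv.size_kb > min_size_kb and cnv.control_frequency < max_control_freq
    return large_and_rare or cnv.known_pathogenic


def second_hit_rate(carriers: list[list[ExtraCnv]]) -> float:
    """Fraction of carriers of a primary CNV who have at least one second hit."""
    if not carriers:
        raise ValueError("no carriers supplied")
    with_hit = sum(any(is_second_hit(c) for c in extra) for extra in carriers)
    return with_hit / len(carriers)
```

Applied separately to SCZ and DD/ASD/CM carriers, `second_hit_rate` yields the two proportions that are then compared.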
The current study also strengthens the now-established evidence of a genetic overlap among DD, ASD, and SCZ, at least for a subset of CNVs. It appears that some CNVs are so highly pathogenic and penetrant that they cause earlier-onset disorders (DD/ASD) and not SCZ. Indeed, severe DD or ASD, particularly in the presence of a clear chromosomal syndrome, is likely to preclude a clinical diagnosis of SCZ. Examples are the Angelman/Prader-Willi syndrome, Williams-Beuren syndrome, and the 1p36 deletion syndrome. Other CNVs can present with DD/ASD or lead later in life to SCZ (Table 1), but they still have a much higher penetrance for an early-onset disorder (Figure 3). No CNV from this list specifically increases the risk of developing SCZ rather than DD/ASD/CM.

Figure 1. Frequencies of copy number variants among individuals with schizophrenia (gray) and the combined group of developmental delay, autism spectrum disorders, and various congenital malformations (black). Deletions are on the left and duplications on the right of the figure. VCFS, velo-cardio-facial syndrome. *p < .05; **p < .001; ***p < .00001.

Figure 2. Penetrance of copy number variants.The layout is the same as for Figure 1.VCFS, velo-cardio-facial syndrome.

Figure 4. Correlation between the overall penetrance and selection coefficients of copy number variants.(A) All data.(B) Excluding data with wide 95% confidence intervals, see text.