Genome-wide Association Study of Dimensional Psychopathology Using Electronic Health Records

Open Access. Published: February 26, 2018. DOI: https://doi.org/10.1016/j.biopsych.2017.12.004

      Abstract

      Background

      Genetic studies of neuropsychiatric disease strongly suggest an overlap in liability. There are growing efforts to characterize these diseases dimensionally rather than categorically, but the extent to which such dimensional models correspond to biology is unknown.

      Methods

      We applied a newly developed natural language processing method to extract five symptom dimensions based on the National Institute of Mental Health Research Domain Criteria definitions from narrative hospital discharge notes in a large biobank. We conducted a genome-wide association study to examine whether common variants were associated with each of these dimensions as quantitative traits.

      Results

      Among 4687 individuals, loci in three of five domains exceeded a genome-wide threshold for statistical significance. These included a locus spanning the neocortical development genes RFPL3 and RFPL3S for arousal (p = 2.29 × 10⁻⁸) and one spanning the FPR3 gene for cognition (p = 3.22 × 10⁻⁸).

      Conclusions

      Natural language processing identifies dimensional phenotypes that may facilitate the discovery of common genetic variation that is relevant to psychopathology.

      Family studies of psychiatric illnesses demonstrated decades ago the overlap in risk for these disorders, a finding that has now been confirmed by genome-wide association studies (Bulik-Sullivan et al., 2015, Nat Genet 47: 1236-1241; Lee et al., 2013; Gilman et al., 2015). Such overlap highlights the limitations of a nosologic system focused on categories of symptoms rather than dimensions. For this reason, recent initiatives emphasize the utility of identifying symptom domains that may better correspond to underlying neurobiology (Insel et al., 2010; Sanislow et al., 2010).
      The rise of biobanks embedded in health care systems or national registries provides an opportunity to investigate the impact of genomic variation in a less biased fashion than traditional disease case-control designs. However, such biobanks typically capture primarily coded clinical data, i.e., categorical diagnoses. We have recently developed multiple methods to examine narrative clinical notes to extract symptom dimensions as a means of augmenting these coded data (McCoy et al., 2015; McCoy Jr. et al., 2018).
      We hypothesized that symptom dimensions based on expert-curated terms capturing National Institute of Mental Health Research Domain Criteria (RDoC) domains would be associated with common genomic variation and could thereby implicate novel sets of genes related to psychopathology. As proof of concept, we therefore applied a newly described natural language processing (NLP) method for extracting dimensional phenotypes to hospital discharge summaries drawn from the genomic biobank of an academic medical center (McCoy Jr. et al., 2018) and used standard genome-wide association studies to investigate these novel phenotypes as quantitative traits.

      Methods and Materials

      Overview and Data Set Generation

      We drew on three waves of participants in the Partners Biobank from the Brigham and Women’s Hospital network and the Massachusetts General Hospital network, representing approximately the first 15,000 individuals genotyped as part of the Partners HealthCare Biobank initiative (Gainer et al., 2016). Narrative discharge summaries were extracted from the longitudinal electronic health record of the Massachusetts General Hospital. We included any individuals 18 years of age or older who had at least one hospitalization between 2010 and 2015.
      A datamart containing all clinical data was generated with i2b2 server software (version 1.6; i2b2, Boston, MA), a computational framework for managing human health data (Murphy et al., 2007; Murphy et al., 2009; Murphy et al., 2010). The Partners HealthCare System Institutional Review Board approved both the study protocol and the release of biobank data, which were collected after acquiring written informed consent from participants and explicitly allowed identifiable data to be shared with qualified investigators.

      Study Design and Analysis

      Primary analyses used a cohort design with all patients admitted for any reason during the time period noted above. Discharge documentation was used to estimate dimensional psychopathology scores for one encounter per individual; when an individual was hospitalized on multiple occasions during the study period, a single hospitalization was selected at random to minimize bias resulting from other means of ascertainment. The derivation of dimensional psychopathology has been described elsewhere (McCoy Jr. et al., 2018); in brief, it began with a set of seed terms for each of the five National Institute of Mental Health RDoC definitions drawn from National Institute of Mental Health workgroup statements, then expanded these term lists to include synonyms (National Institute of Mental Health, 2017). This second expansion step is important because it reduces potential bias introduced by a given specialty or set of providers who may use specific terminology to characterize symptoms, yielding a broader set of terms that should better generalize across providers and hospitals. Each note is assigned a score corresponding to a simple count of term appearance. We have developed simple code to facilitate dimension extraction in other data sets (McCoy Jr. et al., 2018).
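      The scoring step described above, a simple count of term appearances per domain, can be illustrated with a minimal sketch. The term lists below are hypothetical placeholders chosen for illustration; they are not the published RDoC-derived lists, and the tokenization (lowercased alphabetic words) is an assumption rather than the method used in the cited work.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; the published lists were
# derived from NIMH RDoC workgroup statements and expanded with synonyms.
DOMAIN_TERMS = {
    "negative": ["anxious", "sad", "fear"],
    "positive": ["reward", "motivation", "pleasure"],
    "cognitive": ["memory", "attention", "confusion"],
    "social": ["isolated", "withdrawn", "family"],
    "arousal": ["insomnia", "agitation", "sleep"],
}


def score_note(text, domain_terms=DOMAIN_TERMS):
    """Return one score per domain: the number of times any term appears."""
    # Assumed tokenization: lowercase the note and keep alphabetic words.
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        domain: sum(tokens[term] for term in terms)
        for domain, terms in domain_terms.items()
    }


note = "Patient anxious and sad overnight; poor sleep, insomnia persists. Memory intact."
scores = score_note(note)
print(scores)
```

      Because the score is a raw count, longer notes will tend to score higher on every domain; whether and how to normalize for note length is a design choice this sketch does not address.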

      Genotyping and Quality Control

      DNA was extracted from buffy coat, and genotyping was done using three versions of the Illumina Multi-Ethnic Global (MEG) array (Illumina, Inc., San Diego, CA) (MEGA, n = 4927; MEGA EX, n = 5353; and MEG, n = 4784; mappable variants available for each were 1,411,334, 1,710,339, and 1,747,639, respectively). These common variant arrays all incorporate content from the 1000 Genomes Project Phase 3. Single nucleotide polymorphism (SNP) coordinates were remapped based on the TopGenomicSeq provided by Illumina; all reference SNP cluster IDs correspond to build 142 of the Single Nucleotide Polymorphism Database. To determine the forward strand of the SNP, we aligned both SNP sequences (alleles A and B) to hg19 using the BLAST-like alignment tool (BLAT) with default parameters set by the University of California Santa Cruz Genome Browser (Wang et al., 2017).
      Each cohort was cleaned, imputed, and analyzed separately to avoid batch effects. In each batch we included subjects with genotyping call rates exceeding 99%; no related individuals based on identity by descent were included (Henn et al., 2012). From these individuals, any genotyped SNP with a call rate of at least 95% and a Hardy-Weinberg equilibrium p value of at least 1 × 10⁻⁶ was included. Imputation used the Michigan Imputation Server implementing Minimac3 (Fuchsberger et al., 2015; Sala et al., 2009; Wenger et al., 1983). Imputation used all population subsets from 1000 Genomes Project Phase 3 version 5 as reference panel; haplotype phasing was performed using SHAPEIT (Delaneau et al., 2012).
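      As an illustration of the SNP-level filters just described, the following sketch applies a call-rate threshold and a Hardy-Weinberg equilibrium (HWE) test to one variant. It assumes the conventional reading of such a filter, in which SNPs whose HWE p value falls below the threshold are dropped. It also substitutes a 1-df chi-square goodness-of-fit test for the exact test that tools such as PLINK use, so p values for rare variants will differ. Function names and the genotype encoding (0, 1, or 2 copies of one allele, None for a missing call) are assumptions made for this example.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) HWE p value from the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))


def passes_qc(genotypes, min_call_rate=0.95, hwe_threshold=1e-6):
    """Keep a SNP if its call rate and HWE p value both meet the thresholds."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    if len(called) / len(genotypes) < min_call_rate:
        return False
    counts = (called.count(0), called.count(1), called.count(2))
    return hwe_chisq_p(*counts) >= hwe_threshold


in_equilibrium = [0] * 250 + [1] * 500 + [2] * 250
no_heterozygotes = [0] * 500 + [2] * 500
print(passes_qc(in_equilibrium), passes_qc(no_heterozygotes))
```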
      For each batch, we applied principal components analysis of a linkage-disequilibrium-pruned set of genotyped SNPs to characterize population structure, based on EIGENSTRAT as implemented in PLINK software version 1.9 (Purcell and Chang, 2013). We then plotted these components with superimposition of HapMap samples to confirm location of Northern European individuals. The present analysis included only individuals of Northern European genomic ancestry to minimize the risk for confounding by ancestry (i.e., population stratification) and because the power to detect association in other ancestry groups would be limited (Price et al., 2006; Chang et al., 2015; Purcell et al., 2007).

      Analysis

      We examined single-locus associations in each batch, then combined the results in an inverse-variance-weighted fixed-effects meta-analysis. In all analyses, only biallelic SNPs with minor allele frequencies of at least 1% in all batches were retained. Tests for association used linear regression assuming an additive allelic effect and examined each of the five dimensional measures as a quantitative trait, with adjustment for the first 10 principal components a priori. (In previous work, analyses incorporating five or 20 components did not yield meaningfully different results.) Association results are presented in terms of independent loci after pruning using the clump command in PLINK, with a 250-kb window and r² = .2. Locus plots were generated using LocusZoom software (Purcell and Chang, 2013; Pruim et al., 2010).
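      The inverse-variance-weighted fixed-effects combination named above can be written out as a short sketch. For one SNP, each batch contributes an effect estimate and its standard error; the pooled estimate weights each batch by 1/SE², and the pooled standard error is the square root of the reciprocal of the summed weights. The function name and the example numbers are hypothetical, and the two-sided p value uses a normal approximation.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Pool per-batch effect estimates for one SNP.

    betas: per-batch regression coefficients
    ses:   per-batch standard errors
    Returns (pooled beta, pooled SE, z score, two-sided p value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p value under a standard normal: erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Hypothetical per-batch estimates for a single SNP across three batches.
beta, se, z, p = ivw_fixed_effects([0.42, 0.35, 0.51], [0.12, 0.13, 0.12])
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
```

      A fixed-effects model assumes the same underlying effect in every batch; it does not model between-batch heterogeneity.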
      Reported p values are not adjusted for lambda or linkage disequilibrium scores; in previous work, adjustment for lambda-1000 or linkage disequilibrium score regression intercept did not meaningfully change relative results. Lambdas ranged from 0.998 to 1.003 (Bulik-Sullivan et al., 2015, Nat Genet 47: 291-295).
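      For readers unfamiliar with lambda, the genomic inflation factor is conventionally computed as the median of the observed 1-df chi-square association statistics divided by the median expected under the null (approximately 0.4549). The sketch below assumes per-SNP z scores are available and uses simulated null quantiles as input; it illustrates the general definition and is not the study's own code.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHISQ1_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Lambda: median observed chi-square (z squared) over the null median."""
    return statistics.median(z * z for z in z_scores) / CHISQ1_MEDIAN


# Evenly spaced standard-normal quantiles stand in for null test statistics.
nd = statistics.NormalDist()
n = 9999
null_z = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
lam = genomic_inflation(null_z)
print(round(lam, 3))
```

      Values near 1, such as the range reported above, indicate little inflation of test statistics from sources such as population stratification.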

      Results

      We examined 4687 individuals of Northern European ancestry across the three batches (wave 1, 1589; wave 2, 1547; wave 3, 1551), with meta-analysis of 893,900 SNPs with minor allele frequency of 0.01 or greater. The cohorts included 2363 females (50.4%), and the mean age was 64.3 years (SD, 14.9 years). Figure 1 shows Manhattan plots for each of the five dimensional phenotypes (Q-Q plots are shown in Supplemental Figure S1).
      Figure 1. Manhattan plots from genome-wide association studies for each of the five dimensions of psychopathology.
      For each of the dimensions, the 10 independent loci with strongest evidence of association are described in Table 1. Overall, one locus was associated with arousal, two with social, and one with cognition at a standard genome-wide significance threshold (p < 5 × 10⁻⁸); these four regions are depicted in Figure 2. Notably, for arousal, the associated locus spans RFPL3 and RFPL3S; this family of proteins has been suggested to be important in primate neocortical evolution (Bonnefont et al., 2008). For cognition, the associated locus spans FPR3, a chemoattractant (15623572) that has been suggested to be relevant in immune response in Alzheimer’s disease (Iribarren et al., 2005).
      Table 1. Independent Loci With Strongest Evidence of Association for Each Dimension of Psychopathology
      Domain | CHR | SNP | p Value | SNPs, n (a) | Locus Span | Locus Size, kb | Genes in Locus | A1 | A2 | MAF
      Negative | 22 | 22:32750463 | 7.00e−7 | 8 | chr22:32738156..32800705 | 62.55 | LOC339666, RFPL3, RFPL3S, RTCB | A | C | 0.013
      Negative | 17 | 17:14495883 | 9.03e−7 | 12 | chr17:14382314..14497857 | 115.544 |  | A | C | 0.012
      Negative | 1 | 1:27963259 | 1.12e−6 | 23 | chr1:27963259..27981314 | 18.056 |  | T | A | 0.011
      Negative | 19 | 19:51371390 | 1.13e−6 | 1 | chr19:51371390..51371390 | 0.001 |  | T | C | 0.033
      Negative | 6 | 6:155563548 | 1.16e−6 | 18 | chr6:155562206..155733894 | 171.689 | CLDN20, NOX3, TFB1M, TIAM2 | G | A | 0.014
      Negative | 5 | 5:155373160 | 1.24e−6 | 4 | chr5:155221281..155397446 | 176.166 |  | T | C | 0.015
      Negative | 6 | 6:5821150 | 1.42e−6 | 3 | chr6:5815134..5851189 | 36.056 |  | G | T | 0.014
      Negative | 20 | 20:15042429 | 1.54e−6 | 1 | chr20:15042429..15042429 | 0.001 | MACROD2 | C | T | 0.013
      Negative | 5 | 5:86039654 | 2.18e−6 | 4 | chr5:86037590..86040114 | 2.525 |  | G | T | 0.010
      Negative | 16 | 16:83664928 | 2.71e−6 | 38 | chr16:83656037..83753512 | 97.476 | CDH13 | A | T | 0.084
      Positive | 6 | 6:5821150 | 3.91e−7 | 3 | chr6:5815134..5851189 | 36.056 |  | G | T | 0.014
      Positive | 1 | 1:6039258 | 3.97e−7 | 3 | chr1:5953811..6039453 | 85.643 | NPHP4 | T | C | 0.012
      Positive | 16 | 16:56251428 | 4.32e−7 | 2 | chr16:56095547..56251428 | 155.882 | DKFZP434H168, GNAO1, LOC283856 | G | A | 0.013
      Positive | 8 | 8:132532229 | 4.51e−7 | 121 | chr8:132404887..132532936 | 128.05 |  | A | G | 0.157
      Positive | 18 | 18:77374268 | 1.40e−6 | 1 | chr18:77374268..77374268 | 0.001 |  | G | C | 0.011
      Positive | 20 | 20:15696084 | 1.83e−6 | 3 | chr20:15690854..15696084 | 5.231 | MACROD2 | C | A | 0.093
      Positive | 3 | 3:54508115 | 1.98e−6 | 71 | chr3:54488508..54575770 | 87.263 | CACNA2D3 | T | C | 0.338
      Positive | 20 | 20:16560345 | 2.54e−6 | 51 | chr20:16509803..16605627 | 95.825 | KIF16B | C | T | 0.167
      Positive | 4 | 4:127370341 | 2.88e−6 | 102 | chr4:127360862..127402924 | 42.063 |  | T | C | 0.100
      Positive | 7 | 7:47324136 | 3.55e−6 | 3 | chr7:47324136..47328060 | 3.925 | TNS3 | T | C | 0.081
      Arousal | 22 | 22:32750463 | 2.29e−8 | 8 | chr22:32738156..32800705 | 62.55 | LOC339666, RFPL3, RFPL3S, RTCB | A | C | 0.013
      Arousal | 3 | 3:167741670 | 1.44e−7 | 81 | chr3:167544555..167741670 | 197.116 | GOLIM4, LOC646168 | A | G | 0.016
      Arousal | 5 | 5:150327474 | 5.28e−7 | 4 | chr5:150115979..150327474 | 211.496 | DCTN4, IRGM, SMIM3, ZNF300, ZNF300P1 | T | G | 0.057
      Arousal | 6 | 6:5821150 | 6.75e−7 | 2 | chr6:5821150..5851189 | 30.04 |  | G | T | 0.014
      Arousal | 16 | 16:83664928 | 7.59e−7 | 49 | chr16:83656037..83753512 | 97.476 | CDH13 | A | T | 0.084
      Arousal | 17 | 17:14496077 | 7.69e−7 | 11 | chr17:14495883..14497857 | 1.975 |  | C | T | 0.012
      Arousal | 8 | 8:118469770 | 8.49e−7 | 86 | chr8:118379461..118588575 | 209.115 | MED30 | C | T | 0.371
      Arousal | 6 | 6:155563548 | 8.55e−7 | 20 | chr6:155562206..155733894 | 171.689 | CLDN20, NOX3, TFB1M, TIAM2 | G | A | 0.014
      Arousal | 13 | 13:43496853 | 1.13e−6 | 1 | chr13:43496853..43496853 | 0.001 | EPSTI1 | C | T | 0.014
      Arousal | 20 | 20:45375674 | 1.24e−6 | 17 | chr20:45315786..45385268 | 69.483 | SLC2A10, TP53RK | T | A | 0.063
      Social | 5 | 5:52629643 | 1.77e−8 | 22 | chr5:52564100..52661007 | 96.908 |  | A | G | 0.012
      Social | 9 | 9:137405964 | 3.42e−8 | 3 | chr9:137341500..137405964 | 64.465 |  | T | C | 0.021
      Social | 14 | 14:97095154 | 7.45e−8 | 4 | chr14:97087772..97117785 | 30.014 |  | A | G | 0.043
      Social | 18 | 18:77374268 | 8.48e−8 | 5 | chr18:77365764..77396240 | 30.477 |  | G | C | 0.011
      Social | 7 | 7:2472517 | 1.89e−7 | 2 | chr7:2230076..2472517 | 242.442 | CHST12, EIF3B, FTSJ2, MAD1L1, MIR6836, NUDT1, SNX8 | T | C | 0.023
      Social | 12 | 12:27167220 | 3.60e−7 | 8 | chr12:27145587..27333632 | 188.046 | C12orf71, MED21, TM7SF3 | A | G | 0.014
      Social | 4 | 4:2483900 | 3.94e−7 | 7 | chr4:2483900..2732557 | 248.658 | FAM193A, RNF4 | A | C | 0.013
      Social | 11 | 11:125064877 | 4.10e−7 | 65 | chr11:125052718..125110079 | 57.362 | PKNOX2 | T | A | 0.127
      Social | 7 | 7:14309510 | 4.37e−7 | 3 | chr7:14294006..14309510 | 15.505 | DGKB | T | C | 0.014
      Social | 6 | 6:125802803 | 4.93e−7 | 1 | chr6:125802803..125802803 | 0.001 |  | A | G | 0.013
      Cognitive | 19 | 19:52351965 | 3.22e−8 | 94 | chr19:52306547..52377699 | 71.153 | FPR3, ZNF577 | T | C | 0.321
      Cognitive | 7 | 7:3627391 | 1.24e−7 | 3 | chr7:3610381..3662960 | 52.58 | SDK1 | G | A | 0.019
      Cognitive | 17 | 17:13683929 | 1.63e−7 | 3 | chr17:13680505..13806459 | 125.955 |  | A | G | 0.014
      Cognitive | 11 | 11:73586112 | 2.49e−7 | 211 | chr11:73340835..73672187 | 331.353 | COA4, DNAJB13, MRPL48, PAAF1, PLEKHB1, RAB6A | G | C | 0.303
      Cognitive | 5 | 5:22528391 | 4.49e−7 | 5 | chr5:22365713..22706775 | 341.063 | CDH12 | T | G | 0.018
      Cognitive | 17 | 17:167312 | 5.02e−7 | 11 | chr17:149460..172591 | 23.132 | RPH3AL | T | C | 0.023
      Cognitive | 14 | 14:57183567 | 5.40e−7 | 6 | chr14:57182182..57194970 | 12.789 |  | C | A | 0.018
      Cognitive | 6 | 6:169616423 | 5.79e−7 | 31 | chr6:169596595..169622263 | 25.669 | THBS2 | T | C | 0.257
      Cognitive | 18 | 18:77374268 | 6.33e−7 | 1 | chr18:77374268..77374268 | 0.001 |  | G | C | 0.011
      Cognitive | 8 | 8:6034653 | 6.48e−7 | 43 | chr8:6021491..6061234 | 39.744 |  | A | G | 0.363
      A1, allele 1; A2, allele 2; CHR, chromosome; MAF, minor allele frequency; SNP, single nucleotide polymorphism.
      (a) Number of SNPs in the linkage disequilibrium block with nominal p < .01. See text for details.
      Figure 2. Region plots for four loci with genome-wide significance. chr, chromosome; cM/Mb, recombination rate.

      Discussion

      In this analysis of 4687 individuals drawn from a biobank spanning two academic medical centers, we identified four loci associated with dimensional psychopathology at a standard genome-wide threshold based on natural language processing of narrative hospital discharge notes. Two of these loci span genes associated with neurodevelopment (RFPL3) or neurodegeneration (FPR3). While both are known to be brain expressed, neither has previously been strongly associated with neuropsychiatric disease, suggesting the potential utility of the approach we describe in understanding brain function in a manner that is unbiased by traditional nosology.
      While not achieving a genome-wide threshold for significance, we also note the observed association between the calcium channel subunit CACNA2D3 and positive valence. This locus has previously been associated with pain sensitivity, which may impact reward responsiveness, suggesting convergent validity (i.e., assay sensitivity) (Neely et al., 2010). This family of subunits represents the target for multiple anticonvulsants used to treat neuropathic pain and has recently been shown to regulate accumulation of voltage-gated calcium channels and exocytosis at the synapse (Hoppa et al., 2012).
      While these loci are promising as candidates for follow-up study, multiple limitations in this proof-of-concept study should be considered. First, while we exceed a standard threshold for genome-wide studies, replication will increase confidence in these results. (At a more stringent experiment-wide threshold, based upon correlation between these domains, one could also argue that a threshold of 2 × 10⁻⁸ would be appropriate.) We elected to meta-analyze all data available to us, rather than holding out a replication set, and present these results in the hope that they will encourage other hospital-linked biobanks to consider our approach. Second, as with any common variant study, none of these variants can be considered causal, and biological studies will be required to characterize their effect.
      More broadly, it is entirely possible, indeed likely, that other dimensional features or extraction methods, as well as incorporation of other data types, would lead to identification of other loci. We adopted a new method for identifying dimensional psychopathology from narrative clinical notes based on seed terms extracted from RDoC workgroup statements, which we have recently described in more detail along with initial validation (McCoy Jr. et al., 2018). These scores do not yet address subdomains; sensitivity likely varies by domain, and indeed, as with RDoC itself, the presence of terms loading on a given domain does not necessarily represent psychopathology and may instead capture normal or subsyndromal variation. We note that the present study represents an example of transfer learning: a model trained in one type of cohort (psychiatric hospitalizations) is applied to distinguish features of another (all-cause hospitalizations), but further investigations of portability will be important. In particular, this approach complements rather than replaces analysis of more traditional curated phenotypes (O’Dushlaine et al., 2014; Castro et al., 2015). Beyond investigating other strategies for concept extraction, it will be valuable to understand the extent to which incorporating other types of notes or integrating these data with coded clinical data improves the identification of dimensions of psychopathology [for further discussion of general methodologic considerations, please see (McCoy Jr. et al., 2018)].
      With these caveats in mind, our results suggest an approach to identifying genes associated with psychopathology beyond traditional diagnostic categories, and they demonstrate the feasibility and potential utility of this broad class of approaches, aiming to be both transparent and portable. Narrative clinical notes may contain a wealth of clinical detail relevant to developing dimensional representations of brain diseases. With increasing availability of biobanks and registries as a resource for genomic discovery and translation, natural language processing represents a way to amplify their utility for investigating complex phenotypes that avoids the constraint of traditional psychiatric nosology.

      Acknowledgments and Disclosures

      This work was supported by National Human Genome Research Institute (NHGRI) Grant No. 1P50MH106933-04 and National Institute of Mental Health (NIMH) Grant No. 1R01MH106577-01A1 (to RHP) and the Broad Institute Stanley Center Fellowship and Brain and Behavior Foundation Grant No. 26489 (to THM). The sponsors had no role in study design, writing of the report, or data collection, analysis, or interpretation. The corresponding and senior authors had full access to all data and made the decision to submit for publication.
      We thank the participants and administrators of the Partners HealthCare Biobank for their contribution to this work.
      RHP serves on the scientific advisory board for Perfect Health, Genomind, and Psy Therapeutics and is a consultant for RID Ventures. The other authors report no biomedical financial interests or potential conflicts of interest.

      Supplementary Material

      References

        • Bulik-Sullivan B.
        • Finucane H.K.
        • Anttila V.
        • Gusev A.
        • Day F.R.
        • Loh P.R.
        • et al.
        An atlas of genetic correlations across human diseases and traits.
        Nat Genet. 2015; 47: 1236-1241
        • Lee S.H.
        • Ripke S.
        • Neale B.M.
        • Faraone S.V.
        • Purcell S.M.
        • et al.
        • Cross-Disorder Group of the Psychiatric Genomics Consortium
        Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.
        Nat Genet. 2013; 45: 984-994
        • Gilman S.E.
        • Ni M.Y.
        • Dunn E.C.
        • Breslau J.
        • McLaughlin K.A.
        • Smoller J.W.
        • et al.
        Contributions of the social environment to first-onset and recurrent mania.
        Mol Psychiatry. 2015; 20: 329-336
        • Insel T.
        • Cuthbert B.
        • Garvey M.
        • Heinssen R.
        • Pine D.S.
        • Quinn K.
        • et al.
        Research domain criteria (RDoC): Toward a new classification framework for research on mental disorders.
        Am J Psychiatry. 2010; 167: 748-751
        • Sanislow C.A.
        • Pine D.S.
        • Quinn K.J.
        • Kozak M.J.
        • Garvey M.A.
        • Heinssen R.K.
        • et al.
        Developing constructs for psychopathology research: Research Domain Criteria.
        J Abnorm Psychol. 2010; 119: 631-639
        • McCoy T.H.
        • Castro V.M.
        • Rosenfield H.R.
        • Cagan A.
        • Kohane I.S.
        • Perlis R.H.
        A clinical perspective on the relevance of research domain criteria in electronic health records.
        Am J Psychiatry. 2015; 172: 316-320
        • McCoy Jr., T.H.
        • Yu S.
        • Hart K.L.
        • Castro V.M.
        • Brown H.E.
        • Rosenquist J.N.
        • et al.
        High throughput phenotyping for dimensional psychopathology in electronic health records.
        Biol Psychiatry. 2018; 83: 997-1004
        • Gainer V.S.
        • Cagan A.
        • Castro V.M.
        • Duey S.
        • Ghosh B.
        • Goodson A.P.
        • et al.
        The Biobank Portal for Partners Personalized Medicine: A query tool for working with consented biobank samples, genotypes, and phenotypes using i2b2.
        J Pers Med. 2016; 6: 11
        • Murphy S.N.
        • Mendis M.
        • Hackett K.
        • Kuttan R.
        • Pan W.
        • Phillips L.C.
        • et al.
        Architecture of the open-source clinical research chart from Informatics for Integrating Biology and the Bedside.
        AMIA Annu Symp Proc. 2007: 548-552
        • Murphy S.
        • Churchill S.
        • Bry L.
        • Chueh H.
        • Weiss S.
        • Lazarus R.
        • et al.
        Instrumenting the health care enterprise for discovery research in the genomic era.
        Genome Res. 2009; 19: 1675-1681
        • Murphy S.N.
        • Weber G.
        • Mendis M.
        • Gainer V.
        • Chueh H.C.
        • Churchill S.
        • et al.
        Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2).
        J Am Med Inform Assoc. 2010; 17: 124-130
      1. National Institute of Mental Health (2017): RDoC matrix. Available at: https://www.nimh.nih.gov/research-priorities/rdoc/constructs/rdoc-matrix.shtml. Accessed November 17, 2017.

        • Wang C.
        • Ward M.E.
        • Chen R.
        • Liu K.
        • Tracy T.E.
        • Chen X.
        • et al.
        Scalable production of iPSC-derived human neurons to identify tau-lowering compounds by high-content screening.
        Stem Cell Rep. 2017; 9: 1221-1233
        • Henn B.M.
        • Hon L.
        • Macpherson J.M.
        • Eriksson N.
        • Saxonov S.
        • Pe'er I.
        • et al.
        Cryptic distant relatives are common in both isolated and cosmopolitan genetic samples.
        PLoS One. 2012; 7: e34267
        • Fuchsberger C.
        • Abecasis G.R.
        • Hinds D.A.
        minimac2: Faster genotype imputation.
        Bioinformatics. 2015; 31: 782-784
        • Sala M.
        • Lazzaretti M.
        • De Vidovich G.
        • Caverzasi E.
        • Barale F.
        • d’Allio G.
        • et al.
        Electrophysiological changes of cardiac function during antidepressant treatment.
        Ther Adv Cardiovasc Dis. 2009; 3: 29-43
        • Wenger T.L.
        • Cohn J.B.
        • Bustrack J.
        Comparison of the effects of bupropion and amitriptyline on cardiac conduction in depressed patients.
        J Clin Psychiatry. 1983; 44: 174-175
        • Delaneau O.
        • Marchini J.
        • Zagury J.-F.
        A linear complexity phasing method for thousands of genomes.
        Nat Methods. 2012; 9: 179-181
      2. Purcell S, Chang C (2013): PLINK 1.90 beta. Available at: https://www.cog-genomics.org/plink/1.9/. Accessed November 17, 2017.

        • Price A.L.
        • Patterson N.J.
        • Plenge R.M.
        • Weinblatt M.E.
        • Shadick N.A.
        • Reich D.
        Principal components analysis corrects for stratification in genome-wide association studies.
        Nat Genet. 2006; 38: 904-909
        • Chang C.C.
        • Chow C.C.
        • Tellier L.C.
        • Vattikuti S.
        • Purcell S.M.
        • Lee J.J.
        Second-generation PLINK: Rising to the challenge of larger and richer datasets.
        Gigascience. 2015; 4: 7
        • Purcell S.
        • Neale B.
        • Todd-Brown K.
        • Thomas L.
        • Ferreira M.A.
        • Bender D.
        • et al.
        PLINK: A tool set for whole-genome association and population-based linkage analyses.
        Am J Hum Genet. 2007; 81: 559-575
        • Pruim R.J.
        • Welch R.P.
        • Sanna S.
        • Teslovich T.M.
        • Chines P.S.
        • Gliedt T.P.
        • et al.
        LocusZoom: Regional visualization of genome-wide association scan results.
        Bioinformatics. 2010; 26: 2336-2337
        • Bulik-Sullivan B.K.
        • Loh P.-R.
        • Finucane H.K.
        • Ripke S.
        • Yang J.
        • Patterson N.
        • et al.
        LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.
        Nat Genet. 2015; 47: 291-295
        • Bonnefont J.
        • Nikolaev S.I.
        • Perrier A.L.
        • Guo S.
        • Cartier L.
        • Sorce S.
        • et al.
        Evolutionary forces shape the human RFPL1,2,3 genes toward a role in neocortex development.
        Am J Hum Genet. 2008; 83: 208-218
        • Iribarren P.
        • Zhou Y.
        • Hu J.
        • Le Y.
        • Wang J.M.
        Role of formyl peptide receptor-like 1 (FPRL1/FPR2) in mononuclear phagocyte responses in Alzheimer disease.
        Immunol Res. 2005; 31: 165-176
        • Neely G.G.
        • Hess A.
        • Costigan M.
        • Keene A.C.
        • Goulas S.
        • Langeslag M.
        • et al.
        A genome-wide Drosophila screen for heat nociception identifies alpha2delta3 as an evolutionarily conserved pain gene.
        Cell. 2010; 143: 628-638
        • Hoppa M.B.
        • Lana B.
        • Margas W.
        • Dolphin A.C.
        • Ryan T.A.
        alpha2delta expression sets presynaptic calcium channel abundance and release probability.
        Nature. 2012; 486: 122-125
        • O’Dushlaine C.
        • Ripke S.
        • Ruderfer D.M.
        • Hamilton S.P.
        • Fava M.
        • Iosifescu D.V.
        • et al.
        Rare copy number variation in treatment-resistant major depressive disorder.
        Biol Psychiatry. 2014; 76: 536-541
        • Castro V.M.
        • Minnier J.
        • Murphy S.N.
        • Kohane I.
        • Churchill S.E.
        • Gainer V.
        • et al.
        Validation of electronic health record phenotyping of bipolar disorder cases and controls.
        Am J Psychiatry. 2015; 172: 363-372