Spurious Genetic Associations

  • Patrick F. Sullivan
    Address reprint requests to Patrick F. Sullivan, M.D., Department of Genetics, CB#7264, 4109D Neurosciences Research Building, University of North Carolina, Chapel Hill, NC 27599-7264
    Department of Genetics, University of North Carolina, Chapel Hill, North Carolina; and the Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden.


      Genetic association studies are widely used in biomedical research, yet only a minority of positive findings withstand replication. I explored the capacity of association studies to produce false positive findings and the impact of different definitions of replication.


      I generated genetically realistic simulated data for a typical genotyping and analytic approach applied to 10 single nucleotide polymorphisms (SNPs) in COMT, a commonly studied candidate gene.
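      The study's own simulation was genetically realistic; the sketch below is a much simpler stand-in meant only to show the mechanics of a null candidate-gene study. It assumes 10 independent SNPs (no linkage disequilibrium, unlike real COMT SNPs) in Hardy-Weinberg equilibrium, hypothetical minor-allele frequencies, 300 cases and 300 controls drawn from the same population, and a Cochran-Armitage trend test per SNP, which is one common choice for case-control SNP analysis rather than necessarily the test used in this study. The names `trend_test_p`, `simulate_null_study`, and `fraction_with_false_positive` are hypothetical.

```python
import math
import random


def trend_test_p(case_counts, control_counts):
    """Two-sided p-value of the Cochran-Armitage trend test with additive scores 0, 1, 2.

    case_counts and control_counts are genotype counts (0, 1, 2 minor alleles).
    """
    scores = (0, 1, 2)
    r = sum(case_counts)       # number of cases
    s = sum(control_counts)    # number of controls
    n = r + s
    col = [a + b for a, b in zip(case_counts, control_counts)]  # genotype totals
    # T = sum_i t_i * (cases_i * s - controls_i * r)
    t_stat = sum(t * (a * s - b * r) for t, a, b in zip(scores, case_counts, control_counts))
    # Var(T) = (r * s / n) * [n * sum_i t_i^2 C_i - (sum_i t_i C_i)^2]
    spread = n * sum(t * t * c for t, c in zip(scores, col)) - sum(t * c for t, c in zip(scores, col)) ** 2
    if spread <= 0:
        return 1.0  # monomorphic SNP: no information, treat as non-significant
    var = r * s / n * spread
    z = t_stat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


def simulate_null_study(n_cases, n_controls, allele_freqs, rng):
    """Return one trend-test p-value per SNP for a study with no true association."""
    pvals = []
    for q in allele_freqs:
        cases = [0, 0, 0]
        controls = [0, 0, 0]
        for _ in range(n_cases):
            cases[(rng.random() < q) + (rng.random() < q)] += 1  # minor-allele count under HWE
        for _ in range(n_controls):
            controls[(rng.random() < q) + (rng.random() < q)] += 1
        pvals.append(trend_test_p(cases, controls))
    return pvals


def fraction_with_false_positive(n_studies, alpha=0.05, seed=1):
    """Fraction of simulated null studies with at least one SNP at p <= alpha."""
    rng = random.Random(seed)
    freqs = [0.1, 0.2, 0.3, 0.4, 0.5] * 2  # hypothetical frequencies for 10 SNPs
    hits = 0
    for _ in range(n_studies):
        if min(simulate_null_study(300, 300, freqs, rng)) <= alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print(f"studies with >= 1 false positive: {fraction_with_false_positive(200):.2f}")
```

      Because every SNP is null by construction, each hit is a false positive; with independent SNPs the fraction should land near 1 - 0.95^10, or about 0.40. Correlated SNPs would lower that figure, while extra comparisons (genetic models, subgroups, haplotypes) would raise it.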


      Candidate gene studies like those simulated here are highly likely to produce one or more false positive findings at α ≤ .05; the pattern of findings can often appear “compelling” or “intriguing”; and false positive findings propagate through and confuse the literature unless replication is precisely defined.
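      A quick calculation shows why one or more false positives is the expected outcome. Assuming m independent tests of true null hypotheses, the chance of at least one p ≤ α is 1 - (1 - α)^m, and the expected number of false positives is m·α (this second quantity holds even when tests are correlated). The value m = 30 below is purely hypothetical, for example 10 SNPs each tested under three genetic models; such tests are correlated, so the independence formula overstates the familywise rate in that case.

```python
def prob_any_false_positive(m, alpha=0.05):
    """Probability of at least one p <= alpha among m independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(m, alpha=0.05):
    """Expected number of false positives among m null tests (valid under correlation too)."""
    return m * alpha


if __name__ == "__main__":
    # m = 1: a single test; m = 10: one test per SNP; m = 30: hypothetical 10 SNPs x 3 models.
    for m in (1, 10, 30):
        print(m, round(prob_any_false_positive(m), 3), expected_false_positives(m))
```

      For 10 independent SNPs this gives roughly a 40% chance of at least one nominally significant result even when no SNP is associated.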


      Findings from a single association study constitute “tentative knowledge” and must be interpreted with exceptional caution. For the association method to function as intended, every statistical comparison must be tracked and reported, and integrated replication is essential. When interpreting multiple association studies, precise replication (the same SNPs, phenotype, and direction of association) is required.
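      The difference between a precise and a loose definition of replication can be made concrete. The sketch below encodes precise replication as defined above (same SNP, same phenotype, same direction, and significant) and contrasts it with a hypothetical loose definition (any significant SNP in the gene for the same phenotype, either direction). It then estimates how often a follow-up study with no true association would "replicate" a false positive under each definition, assuming 10 independent SNPs, uniform null p-values, and a coin-flip direction of effect. The `Finding` type, the function names, and the labels `snp1` and `trait_A` are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    snp: str        # SNP label (hypothetical labels below)
    phenotype: str  # phenotype label (hypothetical)
    direction: int  # +1 if the tested allele is associated with higher risk, -1 if lower
    p: float        # p-value of the association test


def precise_replication(original, followup, alpha=0.05):
    """Same SNP, same phenotype, same direction, and p <= alpha in the follow-up study."""
    return any(
        f.snp == original.snp
        and f.phenotype == original.phenotype
        and f.direction == original.direction
        and f.p <= alpha
        for f in followup
    )


def loose_replication(original, followup, alpha=0.05):
    """Loose definition: any significant SNP for the same phenotype, in either direction."""
    return any(f.phenotype == original.phenotype and f.p <= alpha for f in followup)


def null_replication_rates(n_sims=20000, n_snps=10, alpha=0.05, seed=7):
    """Fraction of null follow-up studies that 'replicate' under each definition."""
    rng = random.Random(seed)
    snps = [f"snp{i}" for i in range(1, n_snps + 1)]
    # A false positive from the first study: snp1, risk-increasing, p = 0.01.
    original = Finding("snp1", "trait_A", +1, 0.01)
    precise = loose = 0
    for _ in range(n_sims):
        # Under the null, p-values are uniform and the direction is a coin flip.
        followup = [Finding(s, "trait_A", rng.choice((-1, 1)), rng.random()) for s in snps]
        precise += precise_replication(original, followup, alpha)
        loose += loose_replication(original, followup, alpha)
    return precise / n_sims, loose / n_sims


if __name__ == "__main__":
    precise_rate, loose_rate = null_replication_rates()
    print(f"precise: {precise_rate:.3f}  loose: {loose_rate:.3f}")
```

      Under these assumptions a precise replication of a false positive occurs only about 2.5% of the time (α/2), whereas the loose definition is met about 40% of the time, which is how false positive findings can appear to be "replicated" and propagate.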


