If you don't remember your password, you can reset it by entering your email address and clicking the Reset Password button. You will then receive an email that contains a secure link for resetting your password
If the address matches a valid account an email will be sent to __email__ with instructions for resetting your password
Major Depressive Disorder Is Associated With Differential Expression of Innate Immune and Neutrophil-Related Gene Networks in Peripheral Blood: A Quantitative Review of Whole-Genome Transcriptional Data From Case-Control Studies
Whole-genome transcription has been measured in peripheral blood samples as a candidate biomarker of inflammation associated with major depressive disorder.
We searched for all case-control studies on major depressive disorder that reported microarray or RNA sequencing measurements on whole blood or peripheral blood mononuclear cells. Primary datasets were reanalyzed, when openly accessible, to estimate case-control differences and to evaluate the functional roles of differentially expressed gene lists by technically harmonized methods.
We found 10 eligible studies (N = 1754 depressed cases and N = 1145 healthy controls). Fifty-two genes were called significant by 2 of the primary studies (published overlap list). After harmonization of analysis across 8 accessible datasets (n = 1706 cases, n = 1098 controls), 272 genes were coincidentally listed in the top 3% most differentially expressed genes in 2 or more studies of whole blood or peripheral blood mononuclear cells with concordant direction of effect (harmonized overlap list). By meta-analysis of standardized mean difference across 4 studies of whole-blood samples (n = 1567 cases, n = 954 controls), 343 genes were found with false discovery rate <5% (standardized mean difference meta-analysis list). These 3 lists intersected significantly. Genes abnormally expressed in major depressive disorder were enriched for innate immune-related functions, coded for nonrandom protein-protein interaction networks, and coexpressed in the normative transcriptome module specialized for innate immune and neutrophil functions.
Quantitative review of existing case-control data provided robust evidence for abnormal expression of gene networks important for the regulation and implementation of innate immune response. Further development of white blood cell transcriptional biomarkers for inflamed depression seems warranted.
). Indeed, biomarker evidence for inflammatory disease would conventionally be regarded as prohibiting a diagnosis of MDD, instead implying an alternative diagnosis of secondary or comorbid depression. However, there is increasing interest in the concept that depressive symptoms, whether diagnosed as MDD or comorbid with bodily disorders, can be caused, in at least a proportion of cases, by inflammatory mechanisms (
). The evidence supporting this hypothesis has motivated a search for biomarkers of immune status that could be used to stratify MDD cases and to predict therapeutic response to anti-inflammatory drugs (
). It is already clear that to optimize the potential of anti-inflammatory interventions for depressive symptoms, new therapeutics must be precisely guided by development of immune biomarkers and companion diagnostics (
Blood is one of the most clinically convenient tissues to sample for immune biomarkers, whereas the brain is arguably the least convenient. For blood biomarkers to be relevant to the inflammatory pathogenesis of depression, it is assumed that the immune status of the central nervous system is correlated with, or caused by, the immune status of the periphery. Experimental work in animal models has demonstrated causal mechanisms that link peripheral inflammation to central inflammation, altered neuronal function, and quasi-depressive behaviors (
In the early 1990s, the search for inflammatory blood biomarkers of depression initially focused on acute phase proteins, such as C-reactive protein (CRP), and proinflammatory cytokines, such as interleukin-6 (
). Meta-analyses have demonstrated that CRP, interleukin-6, and some other cytokines are moderately but robustly increased on average in MDD cases versus controls (standardized mean difference [SMD] approximately 0.1–0.5) (
). However, the case-control difference in mean CRP concentration should not obscure the fact that only a minority of MDD cases (<30%) will have CRP greater than the upper limit of the normal range (3 mg/L) (
). Moreover, compared with the elevated proinflammatory cytokine levels observed in patients with autoimmune disorders, the levels found in MDD are at lower (pg/mL) concentrations, often below the lower limits of quantification or detectability for standard assays. Cytokine levels are responsive to exercise, diet, stress, time of day, annual season, and other potentially confounding factors. Most fundamentally, modestly increased blood levels of a few proteins do not specify causal pathways with cellular or subcellular precision (
In this context, gene expression in white blood cells has been increasingly investigated as an alternative class of immune biomarkers. Case-control studies have measured expression of a subset of preselected candidate genes in whole-blood, peripheral blood mononuclear cell (PBMC), and monocyte-sorted samples (
). Recent technical advances have enabled transcriptional measurement of the whole genome, by microarray or RNA sequencing (RNA-seq) methods, in blood samples and postmortem brain tissue samples from MDD case-control studies (
Here we endeavored to quantitatively review all published whole-genome transcriptional datasets from peripheral blood samples in case-control studies of MDD (Table 1). First, we simply compiled the published overlap list of genes that were called significant by 2 or more of 10 methodologically heterogeneous studies. Second, to mitigate statistical heterogeneity between studies, we reanalyzed the subject-level statistics from 8 openly accessible datasets by technically harmonized standards so we could compile the harmonized overlap list of genes that ranked in the top 3% of differentially expressed genes that were concordantly overexpressed or underexpressed in MDD for 2 or more primary studies. Third, we meta-analytically estimated the SMD between cases and controls for each gene on average over 4 harmonized datasets from whole-blood samples to compile the SMD meta-analysis list of genes that were concordantly and differentially expressed at a false discovery rate (FDR) of 5%. We used ontology enrichment analysis, protein-protein interaction (PPI) networks, and gene coexpression networks to explore functional specialization and network roles of these MDD case-control gene lists.
Table 1Primary Studies of Genome-wide Blood Gene Expression in MDD Cases Compared With Healthy Controls
Adjusted for 39 covariates including BMI, smoking, medication, etc.
Genes: 29 genes selected at FDR ≤ .25. 14 additional genes included at puncorrected < .05 as members of the significant interferon α/β signaling pathway Enrichments: Biological functions including innate immune processes, vesicle trafficking, cell cycle regulation, and splicing Pathway enrichment of interferon α/β signaling pathway
Data available from Depression Genes and Networks study (PI, DF Levinsin) through application to: https://nimhgenetics.org 462 MDD 458 CTL
Eligible studies were identified using PubMed, Google Scholar, and the Gene Expression Omnibus to search for peer-reviewed publications indexed by the terms “major depressive disorder,” “human,” “blood,” and “gene expression.” We found 28 studies reporting genome-wide transcriptional data, 18 of which used a case-control design. Some of these studies were subsequently excluded if 1) the transcriptional measurement was of microRNA, long noncoding RNA, or circular RNA (n = 3);) the study sample was composed solely of participants with geriatric onset depression (n = 2); 3) multiple patient samples were pooled before transcriptional measurement (n = 1); or 4) only candidate genes were reported (n = 2). One study (
) included data from cells at baseline and after lipopolysaccharide stimulation ex vivo; the lipopolysaccharide-stimulated data were excluded from further analysis, while the baseline data (i.e., before lipopolysaccharide stimulation) were included. This process resulted in 10 eligible datasets, originally published in 9 articles (Table 1), of which 8 datasets were accessible for harmonized reanalysis of subject-level data.
Harmonized Differential Expression Analysis
The harmonized workflow included 1) filtering out, rather than imputing, missing data or samples with missing metadata; 2) normalization of expression statistics between samples in the same dataset, either by quantile normalization for microarray data or by the DEseq2 median of ratios method for RNA-seq data (
); and 3) univariate statistical analysis of differential expression using a linear model that included the same covariates (age, gender, and batch, where available) for each study (see Supplement 1 for study-specific nuances of this procedure). A total of 24,976 genes were measured in at least 2 studies, which was the minimum criterion for a gene to be included in harmonized analysis.
Compilation of Harmonized Overlap List
For quantitative review of the 8 openly accessible datasets, we did not immediately discount smaller studies but leveraged consensus across studies to filter false positives in a way that is robust to outliers. We ranked genes within each study by their p value for MDD case versus control differential expression and thus defined the top 3% most differentially expressed genes, with the smallest p values, for each primary dataset. While any cutoff is somewhat arbitrary, the top 3% threshold includes approximately 500 genes per study, which is comparable to the median number of genes defined as statistically significant across all primary studies by multiple univariate analyses with p < .01 per gene. The harmonized overlap list comprised genes present in the top 3% list of 2 or more of the primary studies that were concordant in direction-of-effect or sign of differential (over- or under-) expression.
We assessed the similarity between studies in terms of the pairwise concordance of the sign of case-control differential expression for each gene. For each primary study, we estimated a direction-of-effect vector, with +1 coding overexpressed genes and −1 coding underexpressed genes. Then the pairwise concordance between studies was defined as the cosine similarity between their 2 direction-of-effect vectors.
Compilation of SMD Meta-analysis List
We estimated the SMD, or Cohen’s d, fitting a random-effects model using the R package metafor (
) to combine differential expression statistics across primary studies. This is technically appropriate for the meta-analysis of homogeneous datasets without obvious outliers. We therefore restricted this meta-analysis to the 4 large independent case-control studies of whole-blood samples (
) and to 16,302 genes measured in at least 3 of the studies. The meta-analytic effect size for each gene was tested under the null hypothesis of zero case-control difference in gene expression, with per-gene p values corrected for multiple comparisons using the Benjamini-Hochberg method (
), to compile the SMD meta-analysis list with FDR of 5% and low heterogeneity (τ2 < .01).
We used the enricher and cnetplot functions from the R package ClusterProfiler. A universe of background genes was defined by the set of 24,976 genes measured in 2 or more studies for the harmonized overlap analysis or the 16,302 genes measured in 3 or more studies in the SMD meta-analysis. Significant enrichment was defined probabilistically, controlling FDR < 5%, for GO Biological Processes (
) pathways and visualized as gene-pathway association networks. We used Fisher’s exact test to assess the enrichment of harmonized overlap or SMD meta-analysis gene lists for previously reported gene lists associated with obesity (
) application within Cytoscape v3.7.2 (cytoscape.org). The probability of the number of edges in each PPI network, under the null hypothesis that the proteins were coded by a random set of genes, was calculated by permutation testing.
Weighted Gene Coexpression Network Analysis
To provide some independent, normative context for the MDD case-control gene lists, we accessed a public database of whole-genome transcription measured by RNA-seq in whole-blood samples from 755 healthy participants in the GTEx (Genotype-Tissue Expression) project (
) to represent the pattern of correlations between all possible pairs of transcripts as a signed network or transcriptome. In this graph, each node is a gene, and each edge represents the coexpression or correlation between a pair of gene transcripts. Using WGCNA parameters of soft power = 14 and deep split = 4, the normative gene coexpression network was divided into 17 modules or communities (Tables S7 and S8 in Supplement 2).
We summarized each primary study by a set of 17 eigengenes, calculated by the WGCNA software function multiSetMEs, which represented the weighted average expression in each primary study of all genes affiliated to each of the normative network modules. The similarity of each pair of studies was then quantifiable by the plotEigengeneNetworks function in WGCNA, which calculates the preservation of normative network community structure between all pairs of modular eigengenes for each pair of primary datasets (see Supplement 1).
), genome-wide differential expression statistics were available for all participants. Thus, 8 out of 10 eligible studies could be reanalyzed or meta-analyzed by technically harmonized methods (Table 2).
Table 2Summary of Key Parameters and Results of 3 Analyses Used for Quantitative Review
Differentially Expressed Genes
Overexpressed Genes in MDD Cases
Underexpressed Genes in MDD Cases
Published Overlap List
Variable between primary studies
52 (45 concordant)
Harmonized Overlap List
24,976 in at least 2 studies
SMD Meta-analysis List
16,302 in at least 3 studies
MDD, major depressive disorder; SMD, standardized mean difference.
Seven studies measured transcription in whole-blood samples, including all cell types; 2 studies measured transcription in PBMCs; and 1 study measured transcription in lymphocytes. Eight studies used one of a variety of microarray platforms to measure messenger RNA (6 in whole blood, 1 in PBMCs, and 1 in lymphocytes); 2 studies used RNA-seq (1 in whole blood and 1 in PBMCs) (Table 1).
Case-control differences in gene expression were statistically controlled for different sets of confounding factors (
) across studies. Age, gender, and messenger RNA assay batch were most frequently included as covariates in the linear model used for estimation of differential gene expression; however, some studies included zero covariates, whereas one study included 39 potential confounds (body mass index, tobacco smoking, medication, etc.) (
). Two studies used multivariate methods, such as WGCNA, but most studies reported only mass or multiple univariate analysis of between-study differences with a genewise threshold for statistical significance. Three studies used the FDR to account for the large number of tests (approximately 10,000) required to survey the whole genome and to control the number of false-positive tests expected by chance (
Reading across all 10 studies, 1455 genes were called significant by at least one study (Table S1 in Supplement 2). Fifty-two genes were called significant by more than one study (Figure 1; Table S2 in Supplement 2) and therefore constituted the published overlap list. Of these genes, 45 had concordant sign of case-control differential expression: 19 genes were overexpressed and 26 were underexpressed in MDD cases. Only one gene, SETD6, was called significant by 3 studies (
). The published overlap list was significantly enriched for functional terms “cell activation,” “immune system processes,” and “defense response to other organisms” (Figure 1; Table S3 in Supplement 2). However, the number of edges in the PPI network (n = 11) was not significantly greater than expected by chance (n = 9; p > .05) (Figure 1).
Harmonized Overlap List
A total of 543 genes were ranked in the top 3% most differentially expressed genes for at least 2 studies; 52 genes were common to at least 3 studies; and 3 genes, LPCAT1, MS4A7, and TROVE2, were among the top 3% in 4 studies (Figure 2; Tables S4 and S5 in Supplement 2). The 543 genes included 133 genes that were concordantly underexpressed and 139 that were concordantly overexpressed in MDD cases, out of 24,976 gene transcripts measured in at least 2 studies. These 272 concordant genes constituted the harmonized overlap list, which included significantly more (n = 10) of the 52 genes in the published overlap list than expected by chance (p < 10−9, Fisher’s exact test).
The harmonized overlap list was significantly enriched for the functional terms “cell activation,” “endocytosis,” “granulocyte-activation,” “leukocyte-activation,” “neutrophil-activation,” and “degranulation” (Figure 2; Table S6 in Supplement 2). Genes in the harmonized overlap list were significantly overrepresented in the module of the normative transcriptional network specialized for neutrophil- and granulocyte-mediated immunity (Figure 3), and they coded for a PPI network comprising more known biochemical interactions (n = 259) than expected for a set of proteins coded by 272 randomly sampled genes (n = 179; p < 10−7) (Figure 2).
Between-Study Similarity of Harmonized Transcriptional Datasets
By pairwise concordance analysis of the sign of differential expression, we found the most similar datasets were based on the largest whole-blood samples (
). Studies based on PBMC samples were more similar to each other than to studies of whole-blood samples (Figure 3). By preservation analysis of normative transcriptome community structure, we again found that the whole-blood studies (
) were more similar to each other than to the PBMC-based studies (Figure 3).
SMD Meta-analysis List
To identify a set of genes that were significantly differentially expressed across a homogeneous subset of studies, we estimated the SMD for each gene on average over the 4 independent studies of whole-blood samples that were most similar to each other by concordance and preservation analysis (
). This analysis revealed 343 genes (229 overexpressed and 114 underexpressed in cases) for which the SMD between cases and controls was significant at FDR < .05 and the heterogeneity was low, with τ < .01 (SMD meta-analysis list) (Figure 4; Table S9 in Supplement 2). Of these genes, 21 were also included in the 272 genes of the harmonized overlap list, which was significantly more than expected by chance (p < 10−7, Fisher’s exact test).
The SMD meta-analysis gene list was highly enriched for similar pathways to those enriched in the harmonized overlap list: “cell activation involved immune response,” “granulocyte-activation,” “leukocyte-activation,” “neutrophil-activation,” and “degranulation.” Additional enrichments were seen for functional terms “exocytosis,” “cell death,” and “apoptotic processes” (Figure 4; Table S10 in Supplement 2). Collectively these genes coded for a PPI network comprising more known biochemical interactions (n = 362) than expected for a set of proteins coded by 343 randomly sampled genes (n = 296; p < .0002) (Figure 4). Overexpressed genes in the SMD meta-analysis list were significantly overrepresented only in the module of the normative network specialized for neutrophil- and granulocyte-mediated immunity; underexpressed genes were significantly overrepresented in the normative network modules specialized for RNA processing and other functions (Figure 5).
Assessment of Potentially Confounding Factors
MDD case-control differences in gene expression could be confounded by many factors that were not consistently measured in all primary studies, e.g., obesity and smoking, and can therefore not be statistically controlled as covariates in the linear model used for harmonized analysis. However, whole-blood gene expression studies of obesity (
) have previously published lists of genes associated with these potential confounds (Table S11 in Supplement 2). We tested the 272 genes in the harmonized overlap list and the 343 genes in the SMD meta-analysis list for intersection with these prior gene sets. There were no significant intersections between either of the 2 MDD-related lists and the smoking- or age-related gene sets; there were 2 obesity-related genes (RPS7 and RPS3A) in the harmonized overlap list (p = .03).
We reviewed 10 whole-genome transcriptional case-control studies of MDD, collectively including data on 1754 MDD cases and 1145 healthy controls (Table 1). We have reported 3 principal analyses (Table 2), of which only the first is descriptive: we described the set of genes reported as significant by one or more of the primary studies. This preliminary effort to “read across” primary studies highlighted the high degree of heterogeneity between studies on many important factors, including the blood tissue type (whole blood or PBMCs), the microarray or RNA-seq platforms used to measure whole-genome transcription, the diagnostic and eligibility criteria used to define depressed cases, and the methods used for statistical analysis. To assess and mitigate the statistical contribution to heterogeneity, we accessed individual whole-genome transcripts or differential expression statistics deposited in open repositories by 8 studies and completely reanalyzed these primary data using identical models and standards for statistical significance. Finally, to address the cellular contribution to heterogeneity, we restricted attention to 4 studies of whole-blood samples, which represented the majority of the available case-control data (Table 2).
We first compiled the published overlap list of 52 genes that were called significant by at least 2 primary studies. This list was enriched for cell activation and defense response to other organisms, but it did not code for a densely connected PPI network (Figure 1). It is a signal, but not a strong one, probably owing to high methodological heterogeneity. We therefore estimated differential expression statistics by technically harmonized methods and then compiled the harmonized overlap list of genes that were ranked in the top 3% of the most differentially expressed genes, with concordant sign of overexpression or underexpression, in at least 2 studies. This list comprised 272 genes that were enriched for neutrophil and other innate immune-related functions and coded a set of proteins with more known biochemical interactions between them than expected by chance (Figure 2).
However, harmonized data analysis disclosed another important cellular source of heterogeneity between studies: the use of whole-blood versus PBMC samples. Primary studies measuring differential expression in whole-blood samples were more consistent with each other and with the modular community structure of the normative transcriptome (
) than with the results of primary studies based on PBMCs (Figure 3). This is not surprising, as PBMC samples by definition exclude platelets and neutrophils (the largest single class of peripheral immune cells), so PBMC transcripts are expected to have lower levels of neutrophil-related gene expression than whole-blood transcripts.
To control cellular heterogeneity, we focused on 4 primary studies that had measured gene expression in whole blood. We meta-analytically estimated the SMD between MDD cases and healthy controls in expression of each of 16,302 genes measured in at least 3 studies and probabilistically thresholded these statistics with an FDR of 5%. The SMD meta-analysis list comprised 343 genes that were differentially expressed with concordant sign in depressed cases compared with healthy controls (Figure 4). There was a significant degree of convergence between this list and the harmonized overlap list. The SMD meta-analysis list was also significantly enriched for neutrophil activation and degranulation, apoptosis, and transmembrane signaling, and it coded a PPI network that was significantly more densely connected than expected by chance. The 229 genes that were concordantly overexpressed in the SMD meta-analysis list were significantly affiliated to the module of the normative transcriptome specialized for neutrophil functions (Figure 5).
These methodologically harmonized results more convincingly indicate that MDD is robustly associated with increases in expression of neutrophil-related and innate immune genes. It is a stronger signal, but what does it mean biologically and in relation to the pathogenesis of depression?
Biologically, this transcriptional signal from whole blood could represent either an MDD-related increase in the number of neutrophils, or relative overexpression of inflammatory genes by circulating neutrophils, or both an increased number and activation status of neutrophils owing to expansion of more developmentally immature and hypersegmented subclasses of neutrophils (
). We were unable to explore this issue any further immediately owing to lack of cell count data provided by the primary studies. There is prior evidence that MDD is associated with increased numbers of neutrophils (
). Flow cytometry and transcriptional analysis of sorted cell classes or single cells could be used in the future to resolve the immune cellular phenotype and its relationship to transcriptional changes more precisely.
In relation to pathogenesis of MDD, these case-control differences in innate immune gene expression (SMD approximately 0.2–0.5) (Figure 4) are of the same order of magnitude as previously reported case-control differences in CRP and inflammatory cytokines (SMD approximately 0.1–0.5). There is some prior evidence that increased neutrophil counts are positively correlated with increased inflammatory proteins in MDD (
). Thus, neutrophil expansion and/or activation may constitute at least one of the cellular sources of peripherally increased proinflammatory cytokines in MDD, which in turn could communicate across the blood-brain barrier to cause central immune state changes and depressive behaviors (
), included within the broad clinical syndrome of MDD.
The main technical focus of our review has been to mitigate statistical and cellular sources of heterogeneity when comparing or aggregating data between primary studies. We have been able to do this post hoc, from openly accessible or published data, but only to some extent. There are some aspects of methodological heterogeneity, such as microarray versus RNA-seq assays or differences in diagnostic or eligibility criteria for caseness, that we have not addressed. There are also many potential confounding factors, e.g., comorbid medical disorder, that were not consistently controlled a priori or easily controllable post hoc across this set of studies. We benchmarked the MDD-harmonized gene lists against prior lists of genes differentially expressed in association with age, smoking, and obesity and found little evidence for confounding effects on blood transcription. However, future biomarker studies of depression might endeavor to go beyond case-control binarization and collect richer clinical data to explore the relationships between immune profiles and subsyndromes of MDD and comorbid depression.
Current best practice in the bioinformatics community includes depositing genome-wide expression data in an accessible repository, such as the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) or the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/). Our results clearly demonstrate the value added to primary publications if the raw data on individual participants are openly accessible for harmonized reanalysis and meta-analysis. It is hoped that future studies will share measurement, analytic, and open science protocols to minimize unnecessary heterogeneity between studies and to accelerate collective convergence on optimal standards.
Overall, we consider that this quantitative review provides encouraging evidence of consistent and significant blood transcriptional changes, especially in neutrophil and other myeloid cell–related genes, which merit further investigation as candidate biomarkers of depression associated with inflammation.
Acknowledgments and Disclosures
This work was supported by the National Institute of Health Cambridge Biomedical Research Centre (UK) and by the Wellcome Trust-funded consortium for Neuroimmunology of Mood Disorders and Alzheimer's Disease (NIMA). ETB is a National Institute of Health Research Senior Investigator.
GMW and WCD are employed by Janssen Research & Development LLC and hold stock in Johnson & Johnson. ETB is now employed full-time by the University of Cambridge and was previously (until May 2019) employed half-time by GlaxoSmithKline; he is a member of the Scientific Advisory Board of Seiso Heptares; he receives research funding from Janssen, GlaxoSmithKline, and Lundbeck as part of the Wellcome Trust Consortium for the Neuroimmunology of Mood Disorders and Alzheimer’s Disease. PEV is a Fellow of MQ: Transforming Mental Health (Grant No. MQF17_24) and of the Alan Turing Institute funded by EPSRC Grant No. EP/N510129/1. JG and PEV report no biomedical financial interests or potential conflicts of interest.
Challenges to the diagnosis and treatment of patients with psychiatric disorders have long been acknowledged in the field. In recent years, efforts have been made to identify genomic biomarkers for psychiatric disorders. A link between immune function and major depressive disorder (MDD) has been suggested for decades (1), but the identification of differentially expressed genes (DEGs) that underlie immune function as biomarkers for MDD has been more recent. Hundreds of DEGs have been reported for transcriptome-wide association studies (TWASs) of MDD.
Erratum to: “Major Depressive Disorder Is Associated With Differential Expression of Innate Immune and Neutrophil-Related Gene Networks in Peripheral Blood: A Quantitative Review of Whole-Genome Transcriptional Data From Case-Control Studies,” by Wittenberg et al. (Biol Psychiatry 2020; 88:625–637); https://doi.org/10.1016/j.biopsych.2020.05.006 .