Department of Psychiatry and Psychotherapy, Ludwig-Maximilian-University Munich, Germany; Department of Education, Psychology, and Communication, University of Bari “Aldo Moro,” Bari, Italy
Address correspondence to Nikolaos Koutsouleris, M.D., Department of Psychiatry and Psychotherapy, Ludwig-Maximilian-University, Nussbaumstr. 7, D-80336 Munich, Germany.
The clinical high risk (CHR) paradigm has facilitated research into the underpinnings of help-seeking individuals at risk for developing psychosis, aiming at predicting and possibly preventing transition to the overt disorder. Statistical methods such as machine learning and Cox regression have provided the methodological basis for this research by enabling the construction of diagnostic models (i.e., distinguishing CHR individuals from healthy individuals) and prognostic models (i.e., predicting a future outcome) based on different data modalities, including clinical, neurocognitive, and neurobiological data. However, their translation to clinical practice is still hindered by the high heterogeneity of both CHR populations and methodologies applied.
Methods
We systematically reviewed the literature on diagnostic and prognostic models built on Cox regression and machine learning. Furthermore, we conducted a meta-analysis on prediction performances investigating heterogeneity of methodological approaches and data modality.
Results
A total of 44 articles were included, covering 3707 individuals for prognostic studies and 1052 individuals for diagnostic studies (572 CHR patients and 480 healthy control subjects). CHR patients could be classified against healthy control subjects with 78% sensitivity and 77% specificity. Across prognostic models, sensitivity reached 67% and specificity reached 78%. Machine learning models outperformed those applying Cox regression by 10% sensitivity. There was a publication bias for prognostic studies yet no other moderator effects.
Conclusions
Our results may be driven by substantial clinical and methodological heterogeneity currently affecting several aspects of the CHR field and limiting the clinical implementability of the proposed models. We discuss conceptual and methodological harmonization strategies to facilitate more reliable and generalizable models for future clinical practice.
The clinical high risk (CHR) designation describes a mental state characterized by subthreshold psychotic symptoms that differ quantitatively in their intensity from those of a full-blown psychosis (Supplement and Table 1). The CHR paradigm has become a well-established clinical avenue for the early detection and potential treatment of psychosis high-risk states. Based on the CHR paradigm, researchers have investigated the nature of the prepsychotic phase from both pathophysiological and epidemiological perspectives. Thus, the CHR designation delineates a mental condition that is burdensome per se and, in addition, is associated with a known set of comorbidities (e.g., depression, substance abuse, anxiety disorders). Therefore, predictive psychiatry has gradually broadened its scope from detecting disease transition to encompassing adverse outcomes more broadly (e.g., functional deficits).
Efforts have been made to identify potential risk factors for psychosis in several symptomatological and biological readouts, or biomarkers, of the disorder. Neural activity and connectivity anomalies have been consistently reported in people at risk for psychosis compared with healthy individuals. Some of these phenotypes have been associated with both disease course and transition to the overt disease. Therefore, the identification of reliable markers able to distinguish between at-risk and healthy populations may be useful in clinical practice to monitor disease development and treatment outcome and to obviate time-consuming CHR assessments.
The two prevailing statistical approaches to address the challenge of single-subject prediction are machine learning (ML) methods (e.g., support vector machine, LASSO [least absolute shrinkage and selection operator] regression, random forest), which can handle large databases and different data domains, and Cox proportional hazard regression, which is able to investigate time-to-conversion trajectories. Recent research applying these methods has produced prognostic models able to stratify CHR patients into different risk classes according to their pretest risk enrichment. Despite the great potential of these models, their applicability is still hindered by the methodological heterogeneity in the field. Indeed, CHR patients are identified by several clinical instruments and are characterized by subtypes with different levels of risk. Moreover, models’ generalizability has been assessed through discrepant validation strategies across studies, ranging from the less replicable (i.e., single-site cross-validation [CV]) to the most robust (i.e., validation in external samples). Thus, methodological approaches still lack standardized validation strategies testing clinical applicability under real-world conditions. One way to tackle these issues is to use a meta-analytic approach to quantitatively investigate models’ performance across different outcomes, algorithms, and data modalities. Although important contributions to this goal have been made, to the best of our knowledge, the field is still lacking such an analysis. Investigating the field’s heterogeneity would allow a comprehensive assessment of accuracy and validity of the existing diagnostic and prognostic models, an important prerequisite for establishing reliable tools for psychosis risk quantification in clinical care.
Our aim was to review the literature on ML-based and Cox regression–based diagnostic models (i.e., discriminating CHR individuals from healthy individuals) and prognostic models (i.e., predictive approaches for transition or negative outcomes). Furthermore, we performed a meta-analysis of models’ performance, with the aim of investigating the effects of 1) data modality, 2) type of algorithm, and 3) validation paradigms. We expected that our results would elucidate the complexity of methods and data domains currently used in the predictive analytics arm of CHR research. This will facilitate a deeper understanding of the state of the art within the field and may clarify the bottlenecks impeding clinical translation.
Methods and Materials
Literature Search
We conducted a systematic search of published original articles in English through June 30, 2019, using a range of search terms in PubMed and Scopus as well as reference lists of the included articles (Supplement). We selected studies that reported prognostic or diagnostic models constructed using ML or Cox proportional hazard regression. Concerning diagnostic models, we included only those that used healthy control subjects (HCs) as a reference group to enlarge the sample size by selecting comparable classification models across studies. CHR included patients with a psychosis risk syndrome categorized as CHR, ultra high risk (UHR), or at-risk mental states (Table 1) as well as those with a familial risk (FR) or 22q11.2 deletion syndrome (22q11.2DS). Studies were included if measures of performance accuracy were reported (i.e., true positives [TP], false positives [FP], true negatives [TN], and false negatives [FN]) or if they could be extracted. Results of the literature search are illustrated in the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) flowchart.
Table 1. Definitions of Different Psychosis Risk Syndromes Commonly Referred to as CHR States and Descriptions of the Abbreviations and Respective Clinical Diagnostic Instruments
Concept | Description | Instruments
CHR | Clinical high risk: psychosis risk syndrome operationalized by UHR, BS, or both diagnostic criteria | All instruments below
ARMS | At-risk mental state: same as the CHR state |
UHR | Ultra high risk: psychosis risk syndrome described by the fulfillment of APS, BLIP, or GRDS criteria |
BS | Basic symptoms: subjective disturbances of cognitive, affective, and perceptive nature | BSABS
COGDIS | Cognitive disturbances: 9 BS describing disturbances of cognitive nature | SPI-A/SPI-CY
COPER | Cognitive-perceptive symptoms: 10 BS describing disturbances of a cognitive-perceptual nature |
UPS | Unspecific prodromal symptoms: unspecific attenuated symptoms characterizing a low-risk state | BSIP
BSABS, Bonn Scale for the Assessment of Basic Symptoms; BSIP, Basel Screening Instrument for Psychosis; CAARMS, Comprehensive Assessment of the At-Risk Mental State; SIPS, Structured Interview for the Prodromal Syndrome; SOPS, Scale of Prodromal Symptoms; SPI-A/SPI-CY, Schizophrenia Proneness Instrument–Adult version/Schizophrenia Proneness Instrument–Child and Youth version.
a Drop in functioning is described 1) in the CAARMS as a Social and Occupational Functioning Assessment Scale (SOFAS) score ≤30% compared with the previous functioning, within the last year, and for at least 1 month and 2) in the SIPS/SOPS as a 30% decrease in the Global Assessment of Functioning scale score from premorbid baseline. A sustained low functioning is defined only in the CAARMS as a SOFAS score ≤50 in the past year or longer.
A comprehensive list of all variables extracted from each study is reported in the Supplement (second section). Performance accuracy measures used for analyses comprised TP, FN, TN, FP, sensitivity (SE) [TP/(TP + FN)], and specificity (SP) [TN/(TN + FP)].
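The two accuracy measures defined above can be illustrated with a minimal Python sketch. The class and function names are hypothetical, and the counts are invented purely to show the arithmetic; they do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one binary model (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fn: int  # false negatives
    tn: int  # true negatives
    fp: int  # false positives


def sensitivity(c: ConfusionCounts) -> float:
    # SE = TP / (TP + FN)
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # SP = TN / (TN + FP)
    return c.tn / (c.tn + c.fp)


# Invented counts, used only to illustrate the formulas.
example = ConfusionCounts(tp=30, fn=10, tn=45, fp=15)
print(f"SE = {sensitivity(example):.2f}, SP = {specificity(example):.2f}")
```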
Data Analysis
The meta-analysis of diagnostic models was conducted following previous work. Extracted SE and SP were converted to a confusion matrix tabulated across studies. Publication bias was assessed with both overall diagnostic odds ratio and SE following Deeks et al. The analysis used the bivariate approach in the mada R package (version 0.5.8), which permits the analysis of SE and SP separately by explicitly accounting for correlations between each measure, incorporating precision estimates arising from sample size differences (i.e., more precision with higher weight), and modeling normal distributions of each with a random effects approach. This bivariate method was used to produce summary estimates of SE, SP, and confidence intervals (CIs) that were used in forest plots, in addition to the analysis of moderators using mixed modeling. Moderators were age, sex, data modality, algorithm, presence of CV, type of CHR, being a multisite study, and year of publication. For prognostic studies, we also investigated follow-up time and prognostic target. Moderator analyses were conducted if a minimum of 10 models per variable were available to decrease the standard error and maximize power in case of high between-study variance and to control for sample size and CV scheme, the latter factor overlapping with the algorithm used. Results were corrected for false discovery rate. Likelihood ratios and diagnostic odds ratios were produced using a Markov chain Monte Carlo approach within the mada toolbox. All analyses were conducted with R (version 3.6.0).
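The conversion of reported SE and SP back into confusion-matrix counts, and the per-study point estimates of likelihood ratios and the diagnostic odds ratio, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the 0.5 continuity correction is a common convention rather than something stated in the text, and the code does not reproduce the bivariate random effects model or the Markov chain Monte Carlo procedure of the mada package.

```python
def counts_from_rates(se, sp, n_pos, n_neg):
    """Rebuild TP, FN, TN, FP from SE, SP and the two group sizes (rounded to integers)."""
    tp = round(se * n_pos)
    tn = round(sp * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def likelihood_ratios(se, sp):
    """Return (LR+, LR-) for a single study."""
    return se / (1.0 - sp), (1.0 - se) / sp


def diagnostic_odds_ratio(tp, fn, tn, fp, correction=0.5):
    """DOR = (TP * TN) / (FP * FN); a continuity correction (assumed convention)
    is added to every cell only when at least one cell is zero."""
    if 0 in (tp, fn, tn, fp):
        tp, fn, tn, fp = (x + correction for x in (tp, fn, tn, fp))
    return (tp * tn) / (fp * fn)


# Invented study: SE = 0.80 in 50 CHR individuals, SP = 0.75 in 40 control subjects.
tp, fn, tn, fp = counts_from_rates(0.80, 0.75, n_pos=50, n_neg=40)
lr_pos, lr_neg = likelihood_ratios(0.80, 0.75)
dor = diagnostic_odds_ratio(tp, fn, tn, fp)
print(tp, fn, tn, fp, round(lr_pos, 2), round(lr_neg, 2), dor)
```

Rounding is needed because published SE and SP are usually reported to two or three digits, so the reconstructed counts are approximations of the original table.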
Results
The systematic literature search detected 881 articles, of which 44 were considered eligible after screening for exclusion criteria, for a total of 12 diagnostic models (Table 2 and Figure S1) and 32 prognostic models (Table 3 and Figure S1). The final sample comprised 3707 patients for prognostic studies (mean age = 20.41 years; ∼58% male), of whom 320 (∼9%) were CHR patients investigated for nontransition outcomes (mean age = 19.25 years; 56% male), and 1052 individuals for diagnostic classification (mean age = 23.42 years; ∼59% male), of whom 480 (45%) were HCs. In addition, 26 studies used ML (including all diagnostic studies) and 18 were conducted with Cox regression (Tables 2 and 3 and Table S1).
Table 2. Summary of Diagnostic Studies Included in the Current Meta-analysis
CHR individuals could be classified against HCs with an overall SE of 78% (95% CI = 73%–83%) and an SP of 77% (95% CI = 68%–84%), while across all prognostic models SE reached 67% (95% CI = 63%–70%) and SP reached 78% (95% CI = 73%–82%). Prognostic studies showed a publication bias (R² = .26, p < .001), whereas diagnostic studies did not (R² = .07, p > .05) (Figure S2). Performance of both model categories is illustrated in two summary receiver operating characteristic curves (Figures 1 and 2) and forest plots (Figures 3 and 4). Within diagnostic models, moderator effects of type of CHR and algorithm, data modality, presence of CV, and being a multisite study were not investigated because fewer than 10 models per factor were available. We found no effects of moderator variables in either application domain (p > .10) (Table S2) even when splitting the sample based on CV (Supplement).
Figure 3. Forest plot of sensitivity and specificity for all diagnostic studies divided by data modality. CI, confidence interval; RE, random effects; RF, random forest; SVM, support vector machine.
Figure 4. Forest plot of sensitivity and specificity for all prognostic studies divided by algorithm and data modality. CHC, convex hull classification; CI, confidence interval; GPC, Gaussian process classifier; LASSO, least absolute shrinkage and selection operator regularized regression; RE, random effects; RF, random forest; SVM, support vector machine.
The included studies applied CV or bootstrapping, or lacked a validation procedure. Among the cross-validated studies, 58% applied leave-one-out CV (3 of which nested) and 7 used k-fold CV (3 in its repeated nested form). Only 1 study applied a leave-site-out CV. Within prognostic studies, we found a main effect of CV/algorithm on SE (p = .009; χ²(2) = 6.96, p = .031); that is, cross-validated ML models reached a higher SE (71%, 95% CI = 67%–74%) than Cox regression ones (61%, 95% CI = 54%–68%) (Figure 4).
Effect of Data Modality
Diagnostic models included the use of functional imaging data. Clinical models were trained on prodromal positive and negative symptoms, functioning, and family risk associated with functional decline; the neurocognitive modality was based on executive functions and verbal IQ.
Because of the imbalance between risk groups, we could not statistically test the effects of CHR type, yet results did not change when excluding patients with 22q11.2DS and FR (Supplement).
Furthermore, studies differed in their outcome definitions. Poor functional outcome was defined on the Global Assessment of Functioning scale (GAF) (cutoff: 70), and treatment response was operationalized as an increase of ≥15 points in the GAF. There were no significant effects on SE or false positive rate driven by prognostic target (p = .570 and .085, respectively) or the duration of time-to-follow-up examination (p = .637 and .305, respectively).
Discussion
We conducted a systematic review and meta-analysis on 44 studies reporting prognostic and diagnostic models for a total of 3707 and 572 CHR individuals, respectively, with the aim of quantitatively assessing their accuracy, validity, and heterogeneity. Our results point to good model performance overall and to a higher SE of ML models compared with Cox regression in prognostic studies. This effect was fully collinear with that of CV, mainly due to the complete overlap of this factor with algorithm type. Notably, there were no significant effects of data modality, CHR or CV type, prognostic target, or any other potential confounding variables (e.g., age distribution, sex, year of publication, follow-up interval time) on accuracy performance in our data. It is noteworthy that in prognostic studies we observed a publication bias, that is, the tendency for studies with smaller sample sizes to report higher, and potentially inflated, prediction accuracies, so that we cannot draw robust conclusions from our meta-analytical findings.
Methodological Differences and Pitfalls
Prognostic models employing ML outperformed those using Cox regression by 10% SE. This finding may have resulted from a complex interplay of cohort-related and methodological heterogeneity. Notably, there was a complete overlap between the statistical method chosen and implementation of CV, that is, all ML models were cross-validated, while only 6 Cox regression studies applied bootstrapping as the validation procedure. Because the choice of a reliable validation method strongly determines both performance and generalizability of models, this methodological discrepancy may have biased our findings. Validation issues were also present in studies employing ML for prognostic modeling. First, 53% of these studies applied CV without nesting and repetitions, which is known to generate overoptimistic results due to high variability and information leakage between training and testing data during model optimization. The extended use of this validation scheme may explain the higher SE found in ML studies.
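The difference between nested and non-nested CV can be made concrete with a small standard-library sketch. Everything here is an assumption made for illustration: a one-dimensional k-nearest-neighbour classifier stands in for the ML models, its neighbourhood size is the hyperparameter being tuned, the data are synthetic, and the function names are hypothetical. The point shown is structural: the hyperparameter is chosen in an inner loop that only ever sees the outer training data, so the outer test fold never informs model optimization.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Majority vote of the k training rows nearest to x; rows are (feature, label)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    positives = sum(label for _, label in nearest)
    return 1 if 2 * positives > len(nearest) else 0


def balanced_accuracy(train, test, k):
    """Mean of SE and SP on `test` for a k-NN model built on `train`."""
    tp = fn = tn = fp = 0
    for x, y in test:
        pred = knn_predict(train, x, k)
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    rates = []
    if tp + fn:
        rates.append(tp / (tp + fn))
    if tn + fp:
        rates.append(tn / (tn + fp))
    return mean(rates)


def split(rows, held_out_idx):
    """Return (training rows, held-out rows) for one fold."""
    held = set(held_out_idx)
    train = [r for i, r in enumerate(rows) if i not in held]
    return train, [rows[i] for i in held_out_idx]


def nested_cv(data, k_values, outer_k=5, inner_k=4, seed=0):
    """Outer loop estimates performance; inner loop selects the hyperparameter."""
    rng = random.Random(seed)
    outer_scores = []
    for test_idx in k_fold_indices(len(data), outer_k, rng):
        outer_train, outer_test = split(data, test_idx)
        inner_folds = k_fold_indices(len(outer_train), inner_k, rng)

        def inner_score(k):
            # Evaluated on outer_train only; outer_test is never touched here.
            return mean(
                balanced_accuracy(*split(outer_train, val_idx), k)
                for val_idx in inner_folds
            )

        best_k = max(k_values, key=inner_score)
        outer_scores.append(balanced_accuracy(outer_train, outer_test, best_k))
    return mean(outer_scores)


# Synthetic data: 40 "positive" and 40 "negative" cases with overlapping features.
data_rng = random.Random(42)
data = [(data_rng.gauss(1.0, 1.0), 1) for _ in range(40)]
data += [(data_rng.gauss(0.0, 1.0), 0) for _ in range(40)]
score = nested_cv(data, k_values=[1, 3, 5, 7])
print(f"nested-CV balanced accuracy: {score:.2f}")
```

Repeating the procedure with different seeds and averaging the results would correspond to the repeated nested form mentioned in the text.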
Second, several Cox regression studies included in this meta-analysis either did not report probability thresholds or chose a priori optimal thresholds from the data. While ML’s lack of homogeneous thresholds is mainly handled via CV schemes averaging performances across folds and repetitions, the use of p values or data-derived thresholds without a proper training–test separation might have inflated Cox regression models’ performance.
Third, preprocessing approaches varied across studies. In 3 cases, for instance, prognostic features were derived from univariate group comparisons or by applying principal component analysis outside the CV scheme. These approaches, as well as the use of stepwise methods in Cox regression models, entail sample-driven variance and, therefore, could lead to good predictive performance, but arguably they should be tested for generalizability in an external dataset. Valuable alternatives are literature-based feature selection and embedded feature optimization, where the intrinsic optimal feature configuration is learned by the model itself.
It should be noted that some of the studies included in our meta-analytic contribution had very low sample sizes. One study had N < 20, while 2 diagnostic and 21 prognostic models had, respectively, fewer than 20 CHR individuals or CHR individuals with poor outcome. Findings from these studies might be consistent with literature demonstrating a publication bias toward increased accuracy with reduced sample size, possibly caused by overfitting. This indicates the need for future ML research to employ larger, preferably multisite samples for both diagnostic and prognostic purposes.
Taken together, these issues may mirror the heterogeneity of methodological procedures within the field. Arguably, the application of ML techniques to diagnosis and prognosis in psychiatry is still relatively young, so conventions and standard operating procedures facilitating model comparability and replicability have not become generally accepted. Our findings highlight the urgency to develop such guidelines for the construction of prognostic and diagnostic models. As indicated in Table 4, the most important ones are 1) the implementation of repeated nested CV, internal–external, or external validation schemes and 2) the full and strict embedding of all preprocessing or feature engineering procedures within the CV scheme. Researchers, funding organizations, and journals should support efforts to standardize approaches, favoring the importance of thorough validation over model performance per se.
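The second recommendation, embedding preprocessing within the validation scheme, can be sketched with a leave-site-out split in which a standardization step is fitted on the training sites only and then applied unchanged to the held-out site. The site labels, values, and function names below are hypothetical and serve only to show where the preprocessing parameters are learned.

```python
from statistics import mean, pstdev


def fit_scaler(train_values):
    """Learn the mean and SD from training data only."""
    sd = pstdev(train_values)
    return mean(train_values), sd if sd > 0 else 1.0


def apply_scaler(values, scaler):
    """Standardize values with previously learned parameters."""
    mu, sd = scaler
    return [(v - mu) / sd for v in values]


def leave_site_out(records):
    """Yield (held-out site, scaled training values, scaled test values).

    `records` is a list of (site, value) pairs. The scaler never sees the
    held-out site, so no information leaks from test to training data.
    """
    for held_out in sorted({site for site, _ in records}):
        train = [v for s, v in records if s != held_out]
        test = [v for s, v in records if s == held_out]
        scaler = fit_scaler(train)
        yield held_out, apply_scaler(train, scaler), apply_scaler(test, scaler)


# Invented measurements from three hypothetical sites.
records = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", 4.0), ("C", 10.0), ("C", 12.0)]
for site, train_scaled, test_scaled in leave_site_out(records):
    print(site, [round(v, 2) for v in test_scaled])
```

Fitting the scaler on all records before splitting would give every training fold knowledge of the held-out site's distribution, which is the kind of leakage the guideline is meant to prevent. The same pattern applies to feature selection or dimensionality reduction steps.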
Table 4. Conceptual and Methodological Guidelines for Construction of Diagnostic and Predictive Models Implementable in Real-Life Clinical Practice
Guidelines | Practical Suggestions
Conceptual Guidelines
Harmonization of the CHR definition and diagnostic instruments | Create a harmonized early recognition instrument that encompasses those at-risk definitions and criteria from the existing diverse inventories that parsimoniously delineate the CHR state and also are predictive of its adverse outcomes
Broaden the scope of prediction to nontransition outcomes | Harmonize social and occupational outcomes, pharmacological and nonpharmacological treatment response criteria, and definitions of persistence or remission of symptoms and use these end points in future predictive studies
Methodological Guidelines
Increase in sample size | Facilitate collaborative science approaches that enable the harmonization of end-point definitions and the external validation of predictive models; get access to open-source databases
Study design harmonization | Employ reliable methodologies (CV and external validation are recommended); avoid leave-one-out CV; implement k-fold CV; embed all preprocessing or feature engineering procedures within the chosen CV scheme; enforce preregistration processes (as in clinical trials) to facilitate monitoring of standardized data acquisition, model discovery, and validation plan
Common modeling platforms and open-source model libraries | Large-scale, consortium-wide international model benchmarking
Overall, most models were constructed using biological (44%) and clinical (38%) data, with only 10 prognostic models based on more than one data modality. Most diagnostic models used MRI data (83%), whereas prognostic models showed a higher variability. Prognostic models of psychosis transition included molecular, neuroanatomical, electrophysiological, neuropsychological, and clinical data modalities, most of the latter trained on prodromal positive and negative symptoms, functioning, and FR associated with functional decline. We found no significant differences in predictive accuracy when comparing data modalities within and between algorithms.
This result may mirror a real lack of significant differences in biomarker type when distinguishing the CHR state from the norm or predicted outcome. However, because only 4 prognostic studies tested the relative and combined predictive ability of different data modalities on the same individuals, and because data modalities are overall under- or overrepresented, the currently available studies do not allow this conclusion to be drawn. Further research directly comparing performance across data modalities, followed by meta-analytic evaluation, is warranted.
Alternatively, our results may reflect the complexity of the multifaceted architecture of psychosis risk, which might be only partly captured by single data modalities. Indeed, a neuroanatomical biomarker might be informative for genetically or pathophysiologically driven mechanisms given that genes’ effects may be closer to the brain than to behavior. Hence, a multimodal approach may be a viable way to reconcile and leverage information from single risk domains. Powerful new methodologies able to combine multiple sources of data, such as similarity network fusion, might be suitable for this purpose. Indeed, research has shown that a combination of clinical variables and structural brain imaging data might represent a promising multimodal framework for psychosis prediction. A previous meta-analytical simulation devised a 3-stage sequential testing paradigm, which in theory reaches nearly perfect positive predictive value when individuals are tested on one multimodal modality (i.e., clinical and electroencephalography) and two biological data modalities (i.e., structural MRI and blood based). However, these findings are simulated, have not yet been confirmed in empirical studies, and did not follow a thorough meta-analytical approach like the one implemented here.
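The logic by which sequential testing raises positive predictive value can be sketched with Bayes' rule on the odds scale. This is a simplified illustration, not the simulation referred to above: it assumes the tests are conditionally independent, it uses a hypothetical pretest risk of 20%, and for convenience it plugs in the pooled prognostic SE (67%) and SP (78%) from this meta-analysis for all three stages. The function names are hypothetical.

```python
def positive_likelihood_ratio(se, sp):
    """LR+ = SE / (1 - SP)."""
    return se / (1.0 - sp)


def update_probability(prior, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def sequential_ppv(pretest_risk, tests):
    """Probability of the outcome after each successive positive test.

    `tests` is a list of (SE, SP) pairs; conditional independence between
    tests is assumed, which real data modalities may well violate.
    """
    probability = pretest_risk
    trajectory = []
    for se, sp in tests:
        probability = update_probability(probability, positive_likelihood_ratio(se, sp))
        trajectory.append(probability)
    return trajectory


# Hypothetical 20% pretest risk; pooled prognostic SE and SP reused for each stage.
trajectory = sequential_ppv(0.20, [(0.67, 0.78)] * 3)
print([round(p, 3) for p in trajectory])
```

Because each positive result multiplies the odds by the same likelihood ratio, the predictive value rises at every stage, but correlated tests would yield smaller gains than this sketch suggests.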
Alternatively, similar performance of tested data modalities may have resulted from the variability induced by higher-order algorithm–data validation interactions. To thoroughly compare models originating from different data spaces, methodological consensus guidelines are urgently needed in the precision psychiatry field. A strict cross-study standardization, in terms of both data definitions and algorithm implementations, may shed light on real phenotypic and neurobiological differences and thus lead to unique insights into the pathology of emerging psychosis.
At-Risk State/Sample Differences
Another source of heterogeneity affecting our results may be due to clinical sample definitions. Most of the at-risk individuals in our sample fulfilled the UHR criteria, while a minority (5.7%) had an FR or a 22q11.2DS diagnosis, which prevented us from quantitatively estimating the effects of risk group designation. However, it is noteworthy that two of the instruments operationalizing UHR criteria (i.e., SIPS [Structured Interview for Prodromal Syndromes] and CAARMS [Comprehensive Assessment of At-Risk Mental States]) include a genetic risk group (i.e., the genetic risk deterioration syndrome) and that two studies in our sample included FRs and deletion syndrome patients with subthreshold psychotic symptoms. This diagnostic overlap might create, on the one hand, a further source of variability and, on the other, a tangible bridge to the well-known heterogeneity among CHR individuals. This issue was tackled by a recent study that provided evidence of a differential risk level within the subcategories of the CHR construct. Hence, further research should put effort into revising the CHR paradigm toward a more parsimonious definition based on one gold-standard clinical instrument and clear-cut biological underpinnings.
Furthermore, in our sample, criteria to define transition to psychosis or poor functional outcome differed both in their operationalization and in the threshold used within a specific diagnostic instrument. Another issue in the variability of outcome definition is the dichotomization of continuous variables such as GAF and global functioning, which has proven to be a potential source of bias in prognostic models. One study addressed this point by conducting an additional analysis to investigate the continuous nature of functioning by using a support vector regression algorithm. The predictability of nontransition outcomes in at-risk individuals is still relatively unexplored. Therefore, there is a need for clinical consensus on relevant nontransition outcomes and how they should be assessed. Additionally, adaptive risk models, which capture the high variability of symptoms and risk factors over time, may be worth adopting.
Notably, American CHR individuals are usually younger (∼16–18 years) than their European counterparts (∼22–24 years). Interestingly, recent research has shown that neuroanatomical development and risk for developing psychosis are interconnected. Overall, our findings suggest that the gestalt of the CHR state might be successfully modeled only if multiple behavioral and neurobiological moderators are conjointly considered using standardized multivariate methods, thereby fully embracing the complexity of this risk paradigm.
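The concern about dichotomizing continuous outcomes, raised above, can be made concrete with a minimal sketch. The scores and the cutoff of 65 are hypothetical illustrations, not values taken from the reviewed studies. The point is only that a threshold separates near-identical individuals while grouping very different ones, information that a regression on the continuous score would retain.

```python
# Hypothetical GAF-like functioning scores (0-100); not data from any study.
scores = {"A": 64, "B": 66, "C": 95}
CUTOFF = 65  # assumed threshold separating "poor" from "good" outcome


def dichotomize(score, cutoff=CUTOFF):
    """Map a continuous functioning score to a binary outcome label."""
    return "good" if score >= cutoff else "poor"


labels = {pid: dichotomize(score) for pid, score in scores.items()}

# A and B differ by 2 points yet receive opposite labels,
# while B and C differ by 29 points yet receive the same label.
for pid, score in scores.items():
    print(pid, score, labels[pid])
```

A model trained on such labels is asked to treat individuals A and B as opposites and B and C as equivalent, which is one way in which dichotomization may bias prognostic estimates.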
Limitations
Our meta-analysis was driven by the primary aim to evaluate the potential applicability of diagnostic and prognostic models in real-life clinical practice. Therefore, we focused only on the two currently prevailing methodological approaches (i.e., ML and Cox regression). Importantly, we might have missed significant results by excluding other more traditional statistical methods such as logistic regression. Nevertheless, ML approaches enable the investigation of the intrinsic complexity of specific data types (e.g., brain features) and are devised for better generalizability.
Another limitation might be the lack of investigation into symptomatology, treatment, substance use, or additional comorbidities, which was due to missing or inconsistent information for several studies. Indeed, already in patients with first-episode psychosis, antipsychotic treatment has been shown to have neuroanatomical effects. Unmeasured variation in such factors, including comorbid mental disorders (“Persistence or recurrence of non-psychotic comorbid mental disorders associated with 6-year poor functional outcomes in patients at ultra high risk for psychosis”), may have further introduced spurious variance in our analyses.
Furthermore, the CHR paradigm has proven to have intrinsic limitations. On the one hand, its predictive power might be partly driven by the so-called pretest risk enrichment; that is, the assessment of at-risk criteria in a specific constellation of help-seeking individuals. On the other hand, it might not capture the full extent of risk in the population, as a recent study pointed out by reporting that most transitions occurred in patients with an unclear psychiatric diagnosis or no CHR status. Because most prognostic models have been developed for the CHR state, their usefulness outside of this category should be intensively investigated.
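The dependence of predictive value on pretest risk can be illustrated with Bayes' rule. The sketch below uses the pooled prognostic sensitivity (67%) and specificity (78%) reported in this meta-analysis; the two pretest risks (20% and 2%) are assumptions chosen purely for illustration and are not estimates from the reviewed studies.

```python
def positive_predictive_value(sensitivity, specificity, pretest_risk):
    """Bayes' rule: probability of transition given a positive prediction."""
    true_pos = sensitivity * pretest_risk
    false_pos = (1.0 - specificity) * (1.0 - pretest_risk)
    return true_pos / (true_pos + false_pos)


# Pooled prognostic performance reported in this meta-analysis.
SENSITIVITY = 0.67
SPECIFICITY = 0.78

# Hypothetical pretest risks (assumptions): a risk-enriched help-seeking
# sample versus a broader, non-enriched population.
scenarios = [("enriched sample", 0.20), ("broader population", 0.02)]

ppv = {}
for label, risk in scenarios:
    ppv[label] = positive_predictive_value(SENSITIVITY, SPECIFICITY, risk)
    print(f"{label}: pretest risk {risk:.0%}, PPV {ppv[label]:.1%}")
```

Under these assumed pretest risks, the same model yields a positive predictive value of roughly 43% in the enriched sample but only about 6% in the broader population, which is why performance established within the CHR state need not carry over outside it.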
Lastly, given the heterogeneity of our data and the publication bias detected, our meta-analysis is inherently limited to a description of, not an ultimate decision on, which diagnostic and prognostic models are sufficiently reliable to be applied in clinical settings.
Conclusions
A comprehensive paradigm shift is required to enable the clinical application of diagnostic and prognostic models for the CHR state. First, the field requires study design harmonization, which demands reliable methodological approaches such as CV or external validation to ensure generalizability. An approach to enhance the studies’ potential for real-life implementation could be a preregistration process similar to clinical trials, during which their validity in terms of standardized data acquisition, model discovery, and validation could be monitored. Furthermore, large-scale international model benchmarking at the level of external model validation can be achieved only by constructing common modeling platforms and open source model libraries. The National Institute of Mental Health’s Harmonization of At-Risk Multisite Observational Networks for Youth (HARMONY) is a first step in the above direction. Consortium-wise coordinated work will also allow strategic methodological testing; that is, controlled comparison of algorithms, preprocessing and feature optimization pipelines, and multiple data modalities (for an overview of conceptual and methodological guidelines, see Table 4). Multimodal ML carries the challenging responsibility to better disentangle the complex architecture of psychosis risk within a clinical consensus environment. This should involve efforts to unify the CHR definition, both theoretically and practically, and also to embrace relevant nontransition outcomes to broaden the prognostic scope. Future studies are warranted to investigate whether harmonizing procedures within precision psychiatry will lead to more reliable and reproducible translational research in the field.
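The distinction between internal cross-validation and external validation can be sketched as follows. Everything in this example is an assumption made for illustration: the data are synthetic, the "site effect" is a simple mean shift, and a one-feature threshold rule stands in for any ML or Cox model. It is not the procedure of any reviewed study.

```python
import random


def fit_threshold(samples):
    """Toy model: cut halfway between the two class means of one feature."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def sens_spec(threshold, samples):
    """Sensitivity and specificity of the rule 'predict 1 if x > threshold'."""
    n_pos = sum(1 for _, y in samples if y == 1)
    n_neg = len(samples) - n_pos
    tp = sum(1 for x, y in samples if y == 1 and x > threshold)
    tn = sum(1 for x, y in samples if y == 0 and x <= threshold)
    return tp / n_pos, tn / n_neg


def make_site(rng, n_per_class, shift):
    """Synthetic site; 'shift' mimics a site effect on the feature."""
    pos = [(rng.gauss(1.0 + shift, 1.0), 1) for _ in range(n_per_class)]
    neg = [(rng.gauss(0.0 + shift, 1.0), 0) for _ in range(n_per_class)]
    return pos, neg


rng = random.Random(0)
disc_pos, disc_neg = make_site(rng, 50, shift=0.0)  # discovery site
ext_pos, ext_neg = make_site(rng, 50, shift=0.5)    # independent site

# Internal validation: stratified 5-fold cross-validation on the discovery site.
K = 5
folds = [disc_pos[i::K] + disc_neg[i::K] for i in range(K)]
cv_results = []
for i in range(K):
    train = [s for j, fold in enumerate(folds) if j != i for s in fold]
    cv_results.append(sens_spec(fit_threshold(train), folds[i]))
cv_sens = sum(s for s, _ in cv_results) / K
cv_spec = sum(s for _, s in cv_results) / K

# External validation: fit on the whole discovery site, test on the other site.
final_threshold = fit_threshold(disc_pos + disc_neg)
ext_sens, ext_spec = sens_spec(final_threshold, ext_pos + ext_neg)

print(f"cross-validation: sensitivity {cv_sens:.2f}, specificity {cv_spec:.2f}")
print(f"external site:    sensitivity {ext_sens:.2f}, specificity {ext_spec:.2f}")
```

Cross-validation estimates performance on unseen individuals from the same site, whereas the external estimate also exposes the model to between-site differences, which is the setting that harmonized multisite designs aim to test.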
Acknowledgments and Disclosures
This work was supported by an EU-FP7-HEALTH grant for the project “PRONIA” (Personalized Prognostic Tools for Early Psychosis Management) (Grant No. 602152) and by the National Institute of Mental Health (NIMH) for the project “HARMONY” (Harmonization of At Risk Multisite Observational Networks for Youth) (Grant No. MH081928). PRONIA, BMBF (Federal Ministry of Education and Research), and the Max Planck Society funded RS.
NK received honoraria for two lectures from Otsuka. He has a patent issued related to adaptive pattern recognition for psychosis risk modeling (U.S. patent 20160192889A1). RS received honoraria for one lecture from Lundbeck. The other authors report no biomedical financial interests or potential conflicts of interest.
Prediction models of functional outcomes for individuals in the clinical high-risk state for psychosis or with recent-onset depression: A multimodal, multisite machine learning analysis.
Persistence or recurrence of non-psychotic comorbid mental disorders associated with 6-year poor functional outcomes in patients at ultra high risk for psychosis.
Association of neurocognition with transition to psychosis: Baseline functioning in the second phase of the North American Prodrome Longitudinal Study.
Improving prognostic accuracy in subjects at clinical high risk for psychosis: Systematic review of predictive models and meta-analytical sequential testing simulation.
Individual prediction of long-term outcome in adolescents at ultra-high risk for psychosis: Applying machine learning techniques to brain imaging data.
Abnormal regional homogeneity as potential imaging biomarker for psychosis risk syndrome: A resting-state fMRI study and support vector machine analysis.
Improved individualized prediction of schizophrenia in subjects at familial high risk, based on neuroanatomical data, schizotypal and neurocognitive features.
Predictive validity of clinical variables in the “at risk” for psychosis population: International comparison with results from the North American Prodrome Longitudinal Study.
Can neuropsychological testing facilitate differential diagnosis between at-risk mental state (ARMS) for psychosis and adult attention-deficit/hyperactivity disorder (ADHD)?.