How Real-World Data Can Facilitate the Development of Precision Medicine Treatment in Psychiatry

Precision medicine has the ambition to improve treatment response and clinical outcomes through patient stratification and holds great potential for the treatment of mental disorders. However, several important factors are needed to transform current practice into a precision psychiatry framework. Most important are 1) the generation of accessible large real-world training and test data including genomic data integrated from multiple sources, 2) the development and validation of advanced analytical tools for stratification and prediction, and 3) the development of clinically useful management platforms for patient monitoring that can be integrated into health care systems in real-life settings. This narrative review summarizes strategies for obtaining the key elements (well-powered samples from large biobanks integrated with electronic health records and health registry data using novel artificial intelligence algorithms) to predict outcomes in severe mental disorders and translate these models into clinical management and treatment approaches. Key elements are massive mental health data and novel artificial intelligence algorithms. For the clinical translation of these strategies, we discuss a precision medicine platform for improved management of mental disorders. We use cases to illustrate how precision medicine interventions could be brought into psychiatry to improve the clinical outcomes of mental disorders.

https://doi.org/10.1016/j.biopsych.2024.01.001

Mental disorders are among the leading causes of chronic illness, disability, morbidity (1), and mortality (2) and represent a major public health concern worldwide (1,2). People living with severe and enduring mental illness, with onset usually during childhood or adolescence, are reported to have a life expectancy that is reduced by 10 to 20 years compared to the general population (2,3). The main causes of the increased mortality rate are comorbidities, including additional psychiatric diagnoses (4) and somatic diseases such as type 2 diabetes, hypertension, and cardiovascular and respiratory diseases (5-7), as well as substance use and suicide (8,9).
A fundamental challenge in psychiatry is the treatment of psychotic and affective symptoms, which are core characteristics of the severe mental disorders schizophrenia (SCZ) (10), bipolar disorder (11), and major depressive disorder (MDD) (12). While current medications for psychotic symptoms (antipsychotics) and mood alterations (antidepressants and mood stabilizers) are effective for most patients (13), there is large variation in efficacy and adverse effects (14). Nonresponse to these medications is a significant clinical problem, with failure rates around 30% in SCZ (15) and similar rates in bipolar disorder (16) and MDD (12). Individuals with symptoms that do not meaningfully improve after ≥2 trials of psychotropic medications (assuming adequate dose and duration) are commonly defined as being treatment resistant (14). However, a significant challenge in the identification of factors related to psychopharmacological treatment response is the high clinical and biological heterogeneity that characterizes psychiatric disorders (17). In addition, adverse effects such as cardiometabolic alterations are common and often cause nonadherence (18,19). Additional complexity is added by the extensive polypharmacy in psychiatry, increasing the risk for drug-drug interactions and adverse effects (20,21). Psychopharmacological treatment often involves a trial-and-error approach to balance treatment effects and adverse effects (22).
Precision medicine, an approach for treatment and prevention (23,24), aims to develop and validate clinical prediction models for therapeutic stratification (23-26). For psychopharmacology, the goal of precision medicine is to guide psychopharmacological treatments by considering individual variability in genes, environment, and lifestyle (23). Progress in both psychiatric genetics (27) and pharmacogenomics (28) will create great opportunities for improving treatment outcomes by optimizing the use of existing medications based on the patient's genetic profile (29,30). While the application of genomics is crucial for future precision psychiatry, it is anticipated that genomic factors contribute to disease outcomes in concert with environmental factors such as socioeconomic status, education, nutrition, and adverse life events (26,31). Therefore, there is a need to include environmental exposures as well as nongenetic biomarkers and standard clinical data in prediction models to improve the predictive value of genomic information (31). However, the relevant datasets necessary to develop and validate precision treatment have only recently become available (23).
Real-world data (RWD) is defined by the European Medicines Agency as any type of data that is not collected in a randomized clinical trial (RCT) (32). The U.S. Food and Drug Administration defines RWD as "the data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources" (33). RWD provides a unique opportunity to obtain large datasets with sufficient statistical power to leverage novel analytical methods. This will enable the development of prediction and stratification tools with the precision required for translation into clinically useful decision support tools for precision treatment in psychiatry.
The aim of this narrative review is to summarize important factors that are needed to bring precision medicine interventions to psychiatry. The cornerstone of these is the use of RWD collected from routine clinical assessments, which remains an underexplored source of information that provides a unique opportunity to obtain massive datasets that can power both basic and applied research initiatives (23). As illustrated in Figure 1, the generation of large training and test data by integrating RWD from health care systems and biobanks, the development of advanced artificial intelligence tools for stratification and prediction, and finally the development of a management platform for clinical monitoring of patients are all required to translate precision psychiatry interventions from basic science into clinical practice.

METHODS
This is a narrative review focusing on the use of RWD, genetic information, and prediction tools for precision psychiatry. PubMed was used to identify articles (up to August 1, 2023) using the search terms "precision psychiatry," "genetics AND precision psychiatry," "real-world data AND precision psychiatry," "prediction models AND precision psychiatry," "electronic health records AND precision psychiatry," and "treatment stratification AND psychiatry." We qualitatively evaluated the relevance of the retrieved literature to the current objective and selected papers based on the expertise of the writing team.

RWD SOURCES
Deep phenotyping data of clinical information, including comorbidities and psychopharmacological treatment outcome data, are essential for stratification and prediction of clinical outcomes in mental disorders, but such data are difficult to obtain on a large, homogeneous scale. Structured and curated RWD from health registries and hospital records/electronic health records (eHRs) linked with genotype data from biobanks, as well as large-scale therapeutic drug monitoring databases or other large clinical samples of individuals with severe mental illness, can provide such data and the sample sizes needed to have adequate power for discovering genetic factors associated with treatment outcomes in mental disorders. Nationwide prescription records provide insight into individual treatment outcomes that can be deduced from, e.g., the duration and changes in type and dosage of medication (34,35). These proxy phenotypes can be used to estimate treatment response. The Nordic region, i.e., Denmark, Estonia, Finland, Iceland, Norway, and Sweden, offers large population-based genotyped cohorts with longitudinal data that are valuable for precision medicine (36). Examples of such cohorts include the Danish Neonatal Screening Biobank used by the iPSYCH project (http://www.ipsych.au.dk/), the Estonian Biobank (http://www.biobank.ee/), the FinnGen project (https://www.finngen.fi/), deCODE genetics (http://www.decode.com), and the MoBa (Norwegian Mother and Child Cohort Study) (http://www.fhi.no/MoBa), all of which have been linked to drug prescription data and/or self-reported medication use and related treatment response as well as registry data relevant for precision psychiatry.
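As a rough illustration of how a proxy phenotype might be derived from prescription records, the sketch below counts medication trials of adequate duration that were followed by a switch to a different drug and flags a patient once two such trials have accumulated. The record layout, the 42-day adequacy threshold, and all function names are assumptions made for illustration; they are not the definitions used in the cited registry studies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Prescription:
    """One continuous treatment episode (hypothetical record layout)."""
    drug: str
    start: date
    end: date


# Assumed minimum duration for an "adequate" trial; illustrative only.
MIN_TRIAL_DAYS = 42


def failed_adequate_trials(prescriptions):
    """Count adequate-duration episodes that were followed by a drug switch."""
    ordered = sorted(prescriptions, key=lambda p: p.start)
    failed = 0
    for current, following in zip(ordered, ordered[1:]):
        adequate = (current.end - current.start).days >= MIN_TRIAL_DAYS
        switched = following.drug != current.drug
        if adequate and switched:
            failed += 1
    return failed


def is_proxy_treatment_resistant(prescriptions, min_failed=2):
    """Proxy flag: at least `min_failed` adequate trials ended in a switch."""
    return failed_adequate_trials(prescriptions) >= min_failed


# Made-up history: two adequate trials, each followed by a switch.
history = [
    Prescription("drug_a", date(2020, 1, 1), date(2020, 3, 1)),
    Prescription("drug_b", date(2020, 3, 2), date(2020, 5, 15)),
    Prescription("drug_c", date(2020, 5, 16), date(2020, 12, 31)),
]
print(is_proxy_treatment_resistant(history))
```

A proxy of this kind ignores dose adequacy, adherence, and the reason for switching, which is one reason such definitions require curation and validation against clinical assessments.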
Combining existing genomics data from biobanks with these collections of RWD overcomes the limitations of data from RCTs, from which patients with multimorbid conditions are excluded because they often require multiple drugs (polypharmacy) and are thus at greater risk of developing adverse effects. Furthermore, treatment adherence is better in RCTs than in the real world (37). Several studies have shown that RWD such as data from eHRs can be utilized to identify individuals who are at risk for treatment resistance in MDD (38,39) or SCZ (40). Proxies for treatment response or resistance have been defined from prescription registries (38,39), and natural language processing has been used to refine eHR-derived treatment response definitions (41). In a meta-analysis on antipsychotic treatment discontinuation, it has been demonstrated that results from real-world studies and RCTs have good congruency (42). A recent study has shown that treatment-resistant depression can be reliably defined using primary care eHRs and utilized to assess genetic, clinical, and demographic characteristics of treatment-resistant depression (38). However, although eHRs may facilitate stratification of risk for treatment resistance (39,43), data from eHRs are subject to high variability and confounders and therefore require careful curation and validation (43). To combine information from RWD from multiple sources for integrated analysis, the RWD needs to fulfill the necessary quality of the measures related to treatment efficacy and adverse effects. Further quality control is required for data harmonization of different types of RWD, collected from registries and biobanks, medical health records, large clinical research data on mental disorder cohorts, and interviews or questionnaires. To apply RWD to precision psychiatry, data quality has to be evaluated, and the models have to be further improved using additional measures and external validation to evaluate their performance in real-world clinical settings (39,43).

GENOMIC DISCOVERY OF TREATMENT OUTCOMES
Severe mental disorders are complex chronic conditions with high heritability (40% to 80%) as estimated based on twin studies (44). Recent advances in genotyping technologies have led to the discovery, through genome-wide association studies (GWASs), of hundreds of regions in the human genome that harbor risk variants for psychiatric traits (45). Both mental disorders (46-48) and their comorbidities (49-51) are highly polygenic, meaning that they are influenced by many genes, with each genetic variant contributing a small effect toward the disorder. In aggregate, however, these variants explain a substantial portion of the variability in the phenotype (29). Polygenic risk scores (PRSs) can be used to study the cumulative effect of disorder-associated single nucleotide polymorphisms and may be useful for assessing disease risk. However, the predictive ability of psychiatric PRSs is still insufficient for clinical utility (52,53). With larger GWASs, improved phenotyping, and technological refinement, the predictive performance of PRSs is likely to improve in the coming years (52-54), and PRSs may become part of clinical psychiatry in the future (52,55).
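In its simplest form, a PRS is a weighted sum of risk-allele counts, with weights taken from GWAS effect sizes. The sketch below shows only that core calculation; the variant identifiers and effect sizes are made up, and steps used in practice (variant selection, linkage disequilibrium adjustment, standardization against a reference population) are deliberately left out.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages (0, 1, or 2) over shared variants.

    dosages: variant id -> allele count for one individual
    effect_sizes: variant id -> per-allele effect (e.g., log odds ratio)
    Variants missing from either input are skipped.
    """
    shared = sorted(dosages.keys() & effect_sizes.keys())
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical effect sizes and genotypes, for illustration only.
effect_sizes = {"variant_a": 0.05, "variant_b": -0.02, "variant_c": 0.10}
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_d": 1}

score = polygenic_risk_score(dosages, effect_sizes)
print(round(score, 4))
```

Because each weight is small, the score becomes informative only when summed over many variants, which is consistent with the polygenic architecture described above.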
Emerging evidence suggests that response to psychotropic medications may also have a genetic component (56,57). Pharmacogenomic studies investigate how genetic variation affects drug metabolism (pharmacokinetics) or the molecular, biochemical, and physiological effects of drugs (pharmacodynamics) and related adverse effects with the aim of guiding drug prescription to improve treatment response and reduce side effects (58). Several studies have shown that pharmacogenomic testing before starting drug treatment can lead to improved patient outcomes for specific drug-gene combinations (59-62). However, pharmacogenomic information is not widely used in clinical psychiatry (28,63), primarily due to a lack of evidence on therapeutic utility in mental health conditions (64). In addition, most genetic markers identified and validated in psychopharmacogenetic studies are related to variability in pharmacokinetics, in particular drug metabolism mediated by CYP2D6 and CYP2C19 (28,63,64), while knowledge on how genetic variation affects the pharmacodynamics of psychotropic medications is still weak (64). Therefore, to provide a pharmacogenetic basis for precision treatment of psychotropic drugs, large-scale studies are needed to discover genetic variants that significantly affect the pharmacotherapeutic outcomes in mental disorders (28,63).
Knowledge of common and rare variants associated with treatment efficacy and adverse effects may be highly useful for treatment stratification, but the genetics of drug treatment outcomes are poorly understood, making prediction of drug response difficult. In addition, the degree of polygenicity of a phenotype affects the power of the GWAS (65); given that psychotropic drug treatment outcomes are polygenic (56,57), gene discovery requires large samples. Large RWD samples with both genotypes and longitudinal treatment outcome data could allow for the identification of genetic factors associated with response and adverse effects from psychotropic medication. The robust identification of genetic associations in current psychopharmacogenetic studies is limited by insufficient sample sizes as well as variability in defining treatment-related phenotypes (28). For antidepressant response, no robustly replicated associations have been detected to date (66-70). The largest GWAS on antidepressant response (N = 5151), measured using depression symptom scores, did not identify any genome-wide significant loci, likely due to its limited sample size (57). In a GWAS of treatment-resistant SCZ (TRS) including the world's largest sample of antipsychotic nonresponders (TRS, n = 10,501; non-TRS, n = 20,325), no genome-wide significant loci were identified (56). The largest GWAS on lithium response (N = 2563), performed by the ConLiGen (International Consortium on Lithium Genetics), identified 1 replicable locus (71). While the ConLiGen sample size is even smaller than those of the GWASs of TRS (56) and antidepressant response (57), response to a specific drug, i.e., lithium, can probably be more robustly assessed than other treatment phenotypes.
While current GWASs on psychotropic drug treatment outcomes have not yielded genomic predictors that can be integrated into stratification and prediction of treatment outcomes, data from clozapine clinics in the United Kingdom and Norway have been used to conduct analyses that link genomic liability to SCZ with antipsychotic dosing, suggesting that individuals who are at high genomic risk for SCZ are less likely to respond to clozapine treatment at standard doses (72). A Swedish study demonstrated that lithium dose prediction was improved by using clinical and genomic data (73). Moreover, PRSs for SCZ and MDD have been used to predict lithium response (74), with improved prediction when PRSs were combined with clinical data using a cross-validated machine learning regression approach (75). These insights support the strategy of studies that combine genomic information with clinical data to optimize treatment outcome prediction in psychiatry.

BIG DATA TOOLS DEVELOPMENT
To transform psychiatric treatment into precision medicine, a main challenge is making multiple data sources and modalities accessible for the training of new prediction algorithms.
Identifying and harmonizing phenotypic data is a key initial step toward precision medicine. A solution for distributed data analysis has been developed in the Nordic countries by the Tryggve infrastructure (http://www.neic.no/tryggve), building on harmonized databases and container solutions (76) for secure and efficient cross-national health research utilizing large, sensitive data collections. Container technologies provide platforms to store, share, and analyze genomic data in compliance with the General Data Protection Regulation, which can be used by users from different countries and across projects to conduct genomic data analyses (76). Big data analysis tools, such as natural language processing (77) using artificial intelligence algorithms for extraction of data from eHRs, as well as sequence analysis (78) for capturing phenotypic trajectories, can be extended to include nationwide prescription records. Sequence analysis (78) has been used to systematically explore life-course disease trajectories (79).
After harmonized phenotypes and genotypes have been linked, the data can be used to identify common and rare risk factors for treatment response, adverse effects, and comorbidities. Differences in phenotype polygenicity and cross-trait genetic overlap motivate the development of tools such as MiXeR (80) that can improve our understanding of the genetic architecture of traits of interest and how they overlap with others. Although standard GWAS approaches can be used to investigate treatment-related phenotypes, the available sample sizes for these traits are often smaller than what is seen for disease phenotypes (27), highlighting a need for more advanced biostatistical tools, such as the following examples. MOSTest (81) exploits multivariate data to improve common variant discovery and replication rates (82-84). The conditional and conjunctional false discovery rate approach (85,86) can be utilized for the identification of polygenic risk factors shared between severe mental disorders and treatment response or comorbid diseases/factors (83,87), thereby improving prediction and stratification. Applying the conditional false discovery rate approach (86) to boost discovery of genetic variants associated with TRS after conditioning on body mass index, a largely comorbid trait, 2 novel loci for TRS were identified (none were found in the original GWAS of TRS) (88). Multi-trait analyses, e.g., using genomic structural equation modeling (89) and multi-trait analysis of GWASs (90), can also be applied for improved discovery of common variants associated with treatment outcomes by leveraging genetic overlap between related traits.
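The idea behind conditioning can be sketched with a simple empirical estimator: for each variant, the p value in the primary trait is divided by the fraction of variants that are at least as significant in the primary trait among those that are at least as significant in the conditional trait. The code below is a simplified illustration of that estimator on made-up p values; it is not the published implementation, which additionally handles issues such as linkage disequilibrium pruning and genomic inflation.

```python
def conditional_fdr(p_primary, p_conditional):
    """Simplified empirical conditional FDR for each variant.

    cFDR_i = p_i / F(p_i | P_cond <= q_i), where F is the empirical
    distribution of primary-trait p values among variants whose
    conditional-trait p value is at most q_i. Quadratic time; sketch only.
    """
    n = len(p_primary)
    estimates = []
    for p_i, q_i in zip(p_primary, p_conditional):
        # Variants at least as significant in the conditional trait.
        stratum = [k for k in range(n) if p_conditional[k] <= q_i]
        # Of those, variants at least as significant in the primary trait.
        hits = sum(1 for k in stratum if p_primary[k] <= p_i)
        # hits >= 1 because the variant itself is always counted.
        estimates.append(min(1.0, p_i * len(stratum) / hits))
    return estimates


# Made-up p values for four variants in two traits.
p_trs = [0.001, 0.5, 0.8, 0.02]
p_bmi = [0.01, 0.9, 0.6, 0.02]

cfdr = conditional_fdr(p_trs, p_bmi)
print(cfdr)
```

When significance in the primary trait is enriched among variants that are also significant in the conditional trait, the estimate for those variants falls below what an unconditional analysis would give, which is how the approach gains power from genetic overlap.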
The majority of existing GWAS approaches assess imputed rather than directly sequenced polymorphisms. For discovery of rare variants that confer risk for development of nonresponse or adverse effects in mental disorders, the long-range phasing method (91,92) can be applied. This method imputes variants from sequenced data to large population samples, thereby greatly improving the discovery of rare variants (91,92).
However, discoveries from GWASs may be difficult to interpret. Therefore, various fine-mapping methods aim to identify causal single nucleotide polymorphisms among the identified variants from GWASs (93). A recently developed variational Bayesian approach for fine mapping of genomic data, Finemap-MiXeR (94), has been shown to outperform most other methods in estimating the genotype-phenotype relationship because its fine-mapping algorithm detects more causal variants in real applications. Finemap-MiXeR enables the identification of a small number of genetic variants per locus, which are informative for predicting the phenotype in independent samples (94). Gene-set analysis (GSA) has become important to identify biological pathways and relevant tissue- and cell type-specific insights related to GWAS findings (46,48,95). GSA methods such as MAGMA (96), Fisher's exact (hypergeometric) test (97), and stratified linkage disequilibrium score regression (98) are widely used to understand the biological implications of GWAS findings (99). A novel GSA tool, GSA-MiXeR (100), estimates fold enrichment and identifies gene sets with greater biological specificity compared to standard GSA approaches, thus providing new insights into the pathobiology of complex polygenic disorders, which may help to advance the classification, diagnosis, and treatment of mental disorders (100).
Finally, phenotypic and genetic information obtained using the tools and methods described above can be integrated to improve prediction of treatment outcomes and comorbidities (101,102). The polygenic hazard score (103), a tool for prediction of age of disease onset initially applied to Alzheimer's disease (103), can be used for prediction of drug response and adverse effects. The polygenic hazard score (103) applies the Cox proportional hazards model to GWAS data of the disease and information on its age of onset to estimate instantaneous risk of disease development. Thus, the polygenic hazard score provides a fruitful framework to move polygenic information toward clinical utility.
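For orientation, the Cox proportional hazards model that underlies such a score can be written in its standard textbook form. The notation below is introduced here for illustration and is not taken from the cited work.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\sum_{i=1}^{m} \beta_i x_i\right),
\qquad
\mathrm{PHS} = \sum_{i=1}^{m} \hat{\beta}_i x_i
```

Here $h(t \mid \mathbf{x})$ is the instantaneous risk at age $t$ for an individual with genotype vector $\mathbf{x}$, $h_0(t)$ is the baseline hazard shared by all individuals, $x_i$ is the allele count at variant $i$, and $\hat{\beta}_i$ is the estimated log hazard ratio per allele. Under these assumptions, the score is the linear predictor of the model, so a one-unit increase in the score multiplies the hazard by $e$ at every age.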
Taken together, to reach the vision of precision treatment, gene discoveries must be leveraged by novel analytical algorithms to enable translation into clinical use. By combining genetic information with clinical and lifestyle data in the prediction of treatment outcomes, prediction accuracy can be improved. Novel artificial intelligence statistical approaches and improved prediction and stratification algorithms for both pharmacological treatment outcomes and multimorbid disease trajectories will open up new avenues of treatment of mental disorders and their accompanying comorbidities to identify an optimal treatment regimen and improve patients' quality of life.

VALIDATION BEFORE CLINICAL USE
To test the validity of the genotype-phenotype associations for genetic variants associated with treatment outcomes, replication in independent real-world samples is required. In a recent study (104), an interaction between a previously identified variant in the NFIB gene (105) and CYP1A on clozapine serum concentrations in smokers and nonsmokers was identified. Specifically, patients who smoke and carry the studied CYP1A and NFIB variants may need 3-fold higher doses of clozapine (104). Moreover, the previously mentioned study that showed that clozapine dosage was positively correlated with polygenic risk for SCZ found this association in 3 independent samples of patients with TRS, supporting the clinical impact of pharmacogenetics for precision dosing of clozapine (72). However, large real-world replication cohorts are needed to validate genetic discoveries from GWASs of psychotropic drug treatment outcomes.
RWD also offers opportunities for validation and refining of the prediction models (106), i.e., to determine treatment outcomes for patients for whom accurate prediction is currently not possible, and to identify additional data to improve the prediction capabilities for other clinical decisions. The ascertainment of individuals with specific genomic variants and subsequent evaluation in recall studies of real-world patients, known as reverse phenotyping (107), enables validation of a given prediction profile to ensure that the established genetic prediction models are valid. For genotype-phenotype associations of treatment outcomes, reverse phenotyping of patients who have started psychotropic drug treatment can be done. By splitting those cases into groups of patients with a high predicted likelihood of a positive treatment outcome, patients with a high predicted likelihood of a negative treatment outcome, and patients for whom the model could not accurately predict outcome status, the developed algorithms can be validated. Likewise, the prediction models can be refined through the collection of additional clinical and outcome information on individuals for whom accurate prediction was not possible. In that way, the outcome of interest can be determined, and additional data can be identified to improve the prediction capabilities of the model for these individuals. This will help to estimate the accuracy of methods and facilitate the collection of additional relevant data, potentially allowing for the development of more accurate prediction and stratification algorithms with clinical utility.
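The three-way split described above can be sketched as a simple thresholding of predicted probabilities. The cutoffs of 0.3 and 0.7, the group labels, and the patient identifiers below are arbitrary assumptions chosen for illustration; in practice, thresholds would need to be calibrated and justified for each model.

```python
def stratify_by_prediction(probabilities, low=0.3, high=0.7):
    """Split patients into three groups by predicted probability of response.

    probabilities: patient id -> predicted probability of a positive outcome
    low, high: hypothetical cutoffs; values in between count as uncertain
    """
    groups = {"likely_positive": [], "likely_negative": [], "uncertain": []}
    for patient_id, p in probabilities.items():
        if p >= high:
            groups["likely_positive"].append(patient_id)
        elif p <= low:
            groups["likely_negative"].append(patient_id)
        else:
            groups["uncertain"].append(patient_id)
    return groups


# Made-up model outputs for four patients.
predicted = {"pt_01": 0.91, "pt_02": 0.12, "pt_03": 0.55, "pt_04": 0.70}
groups = stratify_by_prediction(predicted)
print(groups)
```

Comparing observed outcomes within the two confident groups gives an estimate of model accuracy, while the uncertain group identifies the patients for whom additional clinical and outcome data would be most informative.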

CLINICAL IMPLEMENTATION AND UTILITY
Using and combining multidisciplinary RWD from biobanks, hospitals, registries, self-reports, and medical records, as well as data from clinical research, will help to advance the knowledge, clinical management, and pharmacological treatment of mental disorders. To implement precision medicine in clinical practice, especially across international borders, natural language processing tools (77) can be used for data extraction and harmonization across data sources and countries, and container technologies can be used as a platform for cross-border analysis, with tools available for standardizing various data in a unified manner across countries (76). Once large, deep-phenotyped RWD become available for clinical use, the prediction models can be trained and validated for different clinical and ethnic subgroups as well as stratified by age and sex to improve outcome prediction (108,109). By developing and validating advanced stratification and prediction tools based on measurable biomarkers, namely genotypes in combination with drug treatment outcomes and other response predictors (symptoms, disease history, cardiometabolic blood markers, body mass index, etc.), patients who do not respond to available pharmacological treatments can be identified. Identifying nonresponsive patients will enable economic savings while avoiding adverse effects from the administration of ineffective and unnecessary treatments. This will enable health and regulatory authorities to improve the standards of care in terms of safety, quality, and effectiveness of medication therapies.
Currently, there are no tools for prediction of treatment outcomes in psychiatry that are used clinically. A clinical decision support tool built on prediction and stratification algorithms integrated with digital tools could potentially improve disease outcomes. Such a clinical management platform should be designed as an integrated software solution that incorporates baseline information about risk factors and outcome predictors (clinical information, sociodemographics, genetics) with the prediction and stratification algorithms. These algorithms could be integrated with a software system for inclusion of follow-up and outcome data such as specific adverse effects (e.g., obesity, motor disturbance), self-reports (e.g., somnolence, sexual dysfunction), biomarkers (e.g., glucose levels, lipids), and socioeconomic factors collected from registries (e.g., socioeconomic status, education). To make the platform a clinically relevant tool, the monitoring system should build on the integrative clinical decision support analytics and include specific recommendations for interventions at critical time points during disease progress, such as change of medication type and dose, physical activity, a healthier diet, and referral to specialists in other disciplines (cardiology, endocrinology) when needed. The monitoring system should have a user-friendly dashboard where clinicians can quickly, easily, and securely access their patients' analytics and reports to inform clinical decision making for optimal monitoring. Such a platform could contain information that helps clinicians answer practical, ethical, and user-related questions that must be addressed to implement precision psychiatry. Combining multisource data and algorithms with new data retrieved from clinical practice while using the platform, the prediction models will be further improved. Through improved prediction, the development of a clinical management platform may ultimately enable earlier diagnosis, including diagnosis of comorbidities, facilitate planning of individual treatment, and improve clinical strategies to reduce adverse effects as well as prevent complications related to polypharmacy. In sum, a clinical management platform for monitoring patients with psychiatric disorders that integrates prediction tools with clinical information could have a strong impact on the quality of life of individuals with mental disorders. However, the platform should be used in accordance with the wishes of the patients, ensuring that data can be deleted when a patient revokes consent for data processing.

ETHICAL CONSIDERATIONS
The use of RWD and prediction models for precision psychiatry carries ethical challenges (23,110), in particular privacy protection for individuals who contribute to RWD. Ethical concerns have particularly been raised about using genomic information, including informed consent, sample collection, storage, identifiability of the samples, reidentification, sharing samples throughout the world, and privacy and confidentiality (111,112). Informed consent for genetic material should contain information about sample storage, anonymity, and an option for withdrawing the samples (113,114). Data protection issues must be addressed by data protection legislation (115) and the implementation of secure data systems to ensure that the RWD are impossible to identify and that the data are securely handled. In Europe, secure data handling environments must align with requirements from the General Data Protection Regulation and upcoming European Health Data Space, especially when databases are cross-linked. Software container technologies with tools for data capture, harmonization, and standard analysis can fulfill these requirements and be used across borders to conduct large-scale genomic and phenotypic data analyses (76).
For the clinical use of prediction and stratification tools, the requirements of regulations such as the European Union Medical Device Regulations must be fulfilled. In addition, the safety, performance, and risk-benefit ratios of the software tools need to be established prior to their clinical use. By applying secure cloud-based solutions in accordance with General Data Protection Regulation and clinical security systems, it is possible to build a versatile infrastructure that can support management platforms across health care systems.

CONCLUSIONS
To bring precision medicine interventions to psychiatry, RWD from health care systems combined with biobanks and research data can meet the need for large-scale data that are necessary for the training and testing of prediction models related to treatment outcomes in mental disorders. The implementation of an RWD infrastructure, novel tools to exploit these large datasets, and a clinical management platform with prediction algorithms for medication response and adverse effects offer large opportunities for precision psychiatry to improve treatment outcomes and quality of life of individuals with mental disorders.

Figure 1. The integration of multiple big real-world data sources and prediction algorithms into a clinical management platform for precision treatment and improved outcomes in psychiatry. eHealth, electronic health records.