Reduced Protein Stability of 11 Pathogenic Missense STXBP1/MUNC18-1 Variants and Improved Disease Prediction

BACKGROUND: Pathogenic variants in STXBP1/MUNC18-1 cause severe encephalopathies that are among the most common in genetic neurodevelopmental disorders. Different molecular disease mechanisms have been proposed, and pathogenicity prediction is limited. In this study, we aimed to define a generalized disease concept for STXBP1-related disorders and improve prediction.
METHODS: A cohort of 11 disease-associated and 5 neutral variants (detected in healthy individuals) was tested in 3 cell-free assays and in heterologous cells and primary neurons. Protein aggregation was tested using gel filtration and Triton X-100 insolubility. PRESR (predicting STXBP1-related disorder), a machine learning algorithm that uses both sequence- and 3-dimensional structure-based features, was developed to improve pathogenicity prediction using 231 known disease-associated and neutral variants and comparison to our experimental data.
RESULTS: Disease-associated variants, but none of the neutral variants, produced reduced protein levels. Cell-free assays demonstrated directly that disease-associated variants have reduced thermostability, with most variants denaturing around body temperature. In addition, most disease-associated variants impaired SNARE-mediated membrane fusion in a reconstituted assay. Aggregation/insolubility was observed for none of the variants in vitro or in neurons. PRESR outperformed existing tools substantially: Matthews correlation coefficient = 0.71 versus < 0.55.
CONCLUSIONS: These data establish intrinsic protein instability as the generalizable, primary cause for STXBP1-related disorders and show that protein-specific ortholog and 3-dimensional information improve disease prediction. PRESR is a publicly available diagnostic tool.

The first functional studies on disease-associated STXBP1 variants led to diverse conclusions as to the mechanism of pathogenesis. The first variant studied, C180Y (3), showed reduced binding to its binding partner syntaxin-1 (functional effect) and reduced thermostability. Subsequent studies showed lower protein levels in heterologous cells (24,25) and primary mouse neurons (26) for several variants (protein-level effect) as well as effects on synaptic transmission (26) (functional effects), while 2 other studies showed that four (27) or four of the five (28) disease-associated variants aggregate together with the wild-type (WT) protein, also in primary mouse neurons and in vivo in C. elegans, and are found in Triton X-100-insoluble fractions (dominant-negative effects). Functional effects may not be a general feature because they vary among disease-associated variants, they do not correlate with clinical symptoms (26), and the 500 disease-associated variants currently described are generally evenly distributed over the whole coding sequence (6). However, it remains to be resolved whether lower cellular levels are caused by intrinsic protein instability and/or aggregation, whether these properties are shared by all disease variants, and whether these features are exclusive to disease-associated variants and not seen in those observed in the healthy population. Resolving these issues will also improve prediction of pathogenicity for new variants and prognosis and will be crucial for therapy design. The possibility of dominant-negative effects and protein aggregation that may lead to cell death (27) is a particularly crucial issue given the impact on prognosis and treatment strategies. However, neuronal loss has not been reported as an obvious feature in patients (4,29,30).
Accurate prediction of pathogenicity and discrimination between disease-causing and population (neutral) variants are crucial for diagnostics. Improving this classification increases clinical efficiency, ends the diagnostic odyssey for families, supports cohort assembly for clinical trials, and ultimately improves treatment and is thus of utmost importance for the clinic, patients' families, and society. Several computational tools have been developed to predict pathogenicity caused by missense variants (31-34). However, prediction may still be improved. First, most current predictors [with exceptions; see FunNCion (35)] are generic for the whole proteome and not optimized to specific proteins. Second, they often use only sequence homology and/or physicochemical properties of amino acid changes as features to predict pathogenicity and do not take into account 3-dimensional (3D) structure information of the protein or predictions thereof (36). Third, they use a fixed set of features for all the proteins rather than determining protein-specific features (32). A reliable, protein-specific predictor of pathogenicity has the potential to increase prediction accuracy and provide deeper insights into underlying disease mechanisms.
Here, we investigated a cohort of 11 disease-associated variants in STXBP1 and 5 neutral variants observed in the healthy population side by side in several model systems: heterologous cells (human embryonic kidney [HEK] 293) that have low expression of established MUNC18-1 interactors, mouse homozygous null mutant primary neurons that provide a normal palette of established interactors (without the endogenous mouse protein competing), and cell-free assays that allow the assessment of (altered) intrinsic properties. We showed that protein instability, decreased expression levels, and impaired functionality underlie MUNC18-1 variant pathogenicity. In contrast, no evidence was observed for protein aggregation or insolubility for any of the variants in vitro or in neurons. In addition, we developed PRESR (predicting STXBP1-related disorder), a machine learning-based predictor of missense variant pathogenicity that uses a diverse set of sequence- and 3-dimensional structure-derived features. On an independent test set of MUNC18-1 missense variants, PRESR performed better than previously existing predictors. To assess individual variants on a 3-dimensional structure and interpret the contribution of the features involved, we developed the PRESR web server, freely available at PRESR.russelllab.org.

Dataset Assembly
We used literature searches, gnomAD (37), and UniProt (38) to create a list of 231 variants in MUNC18-1; see Supplemental Methods and Materials in Supplement 1 for details.

Other Methods
See Supplemental Methods and Materials in Supplement 1 for discussion of the following: Orthologs and paralogs alignments; feature generation and mutational information; logistic regression, cross-validation, and randomization test; prediction tools and web application; MUNC18-1 point variant modeling and solvent-accessible surface area; protein purification and gel filtration; animals and neuronal cultures; and immunocytochemistry and Western blotting.

Nano Differential Scanning Fluorimetry
Melting curves of WT and mutant MUNC18-1 were recorded as specified in Supplemental Methods and Materials in Supplement 1.

Small Unilamellar Vesicles/Giant Unilamellar Vesicles Fusion Assay
The lipid mixing assay was performed as described previously (17,19,39), with some modifications; see Supplemental Methods and Materials in Supplement 1 for details.

Triton X-100 Solubility Assay
The solubility assay was performed as described before (28); see Supplemental Methods and Materials in Supplement 1 for details.

Statistical Analysis
See Supplemental Methods and Materials in Supplement 1.

Disease-Associated Variants Reduce MUNC18-1 Protein Levels
Previous studies of disease-associated variants in STXBP1 have led to different conclusions, showing functional effects, protein-level effects, and dominant-negative effects (3,24-28). To help resolve these differences, we experimentally examined a cohort of disease-associated MUNC18-1 variants and variants found in the (healthy) population (Figure 1A, B) in heterologous cells, in primary neurons, and as purified proteins (in total, 11 disease and 5 neutral variants) and analyzed cellular expression levels, protein stability, protein aggregation, and SNARE-mediated membrane fusion.
First, we assessed the effect of this collection of variants on MUNC18-1 protein levels in heterologous HEK293T cells, which lack endogenous MUNC18-1 and syntaxin-1 (Figure 1C, E). Western blot analysis of all disease variants (P575S did not reach significance) showed lower cellular levels, as shown before for several of these (3,24,26). In contrast, neutral variants did not show lower cellular levels (Figure 1C, E).
Second, in mouse Stxbp1 knockout neurons, the disease variants (except P575S) showed significantly reduced expression levels whereas neutral variants V84I and P94L were not significantly different from WT protein (Figure 1D, F). Together, these data show that the majority of disease-associated variants led to reduced protein levels, whereas neutral variants did not.
Taken together, these findings suggest that a lower cellular protein level is a shared feature among disease-associated variants.

STXBP1 Disorder Mechanisms and Predictions
Biological Psychiatry

Disease-Associated Variants Impair Protein Thermostability
To assess whether a lower cellular protein level is the direct consequence of reduced intrinsic stability of these variants, we used nano differential scanning fluorimetry to measure thermostability of purified MUNC18-1 (Figure 2A). Because MUNC18-1 and syntaxin-1 depend on each other for correct cellular localization and stability (23,40-44), the effects of variants were assessed for MUNC18-1 alone (Figure 2B), as well as for the binary MUNC18-1/syntaxin-1 complex (Figure S1A in Supplement 1). WT MUNC18-1 alone showed a mean melting temperature of 45.4 °C (Figure 2B), consistent with previous studies (3). Melting temperature was decreased for disease-associated variants C180Y, D262V, H445P, P575S, and, although less pronounced, for L446F (Figure 2B). Thermostability of the neutral variants V84I, P94L, A430T, F153S, and R457; disease variant T361I; and unknown variant R536H (see Dataset Assembly in Supplement 1) was not impaired beyond 3 °C (Figure 2B). The mutants I77N, M443R, G544D, and T574P could be expressed using the bacterial E. coli expression host but could not be purified, probably as a consequence of impaired stability (indicated by N.D. [not determined] in Figure 2B). The presence of syntaxin-1 increased stability of WT and all variants, but most disease-associated mutants still showed decreased stability (Figure S1A in Supplement 1). Together, disease-associated MUNC18-1 variants, except T361I and unknown variant R536H, caused severely reduced thermostability whereas neutral variants did not. We also inspected biophysical characteristics of the test set by analyzing the surface exposure and calculating the biomolecular energy functions (45-47) (Figure S1B, C in Supplement 1). Both scores were variable between variants, with no clear discrimination between disease-associated and neutral variants.
To examine the relationship between the thermostability and expression levels of MUNC18-1 variants, the melting temperatures of all tested variants in Figure 2B were plotted against their expression levels in HEK293T cells (Figure 1E) and neurons (Figure 1F). Thermostability and protein levels were correlated in both HEK293T cells (r = 0.86) (Figure 2C) and neurons (r = 0.77) (Figure 2D). Thus, protein stability of MUNC18-1 variants was correlated with expression levels.
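The correlation analysis described above can be sketched as follows. This is a minimal illustration that assumes the reported r is Pearson's correlation coefficient (the text does not name the statistic); the function name `pearson_r` and the numeric inputs are hypothetical placeholders, not measurements from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values (NOT data from this study):
# melting temperature in degrees Celsius and expression as % of WT.
melting_temp_c = [45.0, 44.0, 38.0, 36.0, 41.0]
expression_pct_wt = [100.0, 95.0, 40.0, 30.0, 70.0]
r = pearson_r(melting_temp_c, expression_pct_wt)
```

With real data, one value per variant would be supplied for each axis, in the same variant order.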
We employed 2 distinct reconstituted lipid mixing assays to analyze the functional impact of MUNC18-1 variants on SNARE-mediated fusion (Figure 3; Figures S2 and S3 in Supplement 1). In the first assay, syntaxin-1 giant unilamellar vesicles (GUVs) were used (Figure 3A, B). The addition of WT MUNC18-1 led to a dose-dependent increase of lipid mixing that started to saturate at 250 nM and resulted in a 6-fold stimulation of the total fusion after 30 minutes (Figure 3C). To test whether variants show a similar concentration dependency as WT, we performed a titration curve for C180Y. Similar to WT MUNC18-1, C180Y saturated at 250 nM (Figure 3C). To exclude dose-dependent effects, all other MUNC18-1 mutants were tested at 750 nM, and their effect on total fusion after 30 minutes was normalized to WT. Neutral variants V84I and P94L, disease variants T361I and L446F, and unknown variant R536H increased fusion kinetics and efficiency comparable to WT, while the disease-associated variants C180Y, D262V, P575S, and H445P led to a decrease in MUNC18-1-dependent stimulation (Figure 3D; Figure S3A in Supplement 1). Thus, the majority of the tested disease variants were at least partially impaired in stimulating initial SNARE-complex assembly, likely compromising the formation of the syntaxin-1/MUNC18-1/VAMP2 template intermediate. In a second assay, GUVs contained already preassembled t-SNARE complexes, bypassing some of the rate-limiting steps in SNARE-complex assembly (Figure S2A in Supplement 1). As expected, fusion kinetics were faster, and WT MUNC18-1 stimulated lipid mixing 1.5-fold (Figures S2B and S3B in Supplement 1). V84I, C180Y, and D262V showed a tendency to reduced stimulation while the other variants supported lipid mixing comparable to WT MUNC18-1. Therefore, when t-SNAREs were already formed, only a few disease-associated variants had a rather minor impact on the facilitating role of MUNC18-1 in liposome fusion.
Lastly, we assessed potential dominant-negative effects of the disease variant MUNC18-1 C180Y in membrane fusion at a 1:1 ratio with WT. This mixture did not significantly reduce the fusion kinetics compared to WT alone (Figure S4C in Supplement 1). Therefore, under these in vitro conditions, no dominant-negative effects were observed.
To explore the relationship between MUNC18-1 stability and membrane fusion, melting temperature values (Figure 2B) were correlated to syntaxin-1 GUV fusion data (Figure 3D). Melting temperature was correlated with membrane fusion efficiency (r = 0.76) (Figure 3E), indicating that MUNC18-1 instability caused functional defects in SNARE-mediated membrane fusion.

MUNC18-1 Variants Do Not Show a Tendency to Aggregate at Physiological Expression Levels
To assess intrinsic aggregation propensities, purified MUNC18-1 was analyzed using gel filtration. The elution volume and the molecular weight of WT MUNC18-1 (70.1 kDa) corresponded to the monomeric form (Figure 4A). All variants exhibited elution volumes that were indistinguishable from WT with molecular weights of monomeric proteins (Figure 4A). When mixed with stoichiometric amounts of MUNC18-1 WT, the C180Y variant did not alter its behavior (Figure 4A, blue dotted line). Thus, all purified MUNC18-1 variants remained monomers. Four variants (I77N, M443R, G544D, T574P) could not be purified using the heterologous E. coli expression system.
Next, cellular solubility of MUNC18-1 variants was assessed using the Triton X-100 solubility assay, which has been used before to demonstrate increased insolubility of MUNC18-1 disease variants (28). WT and mutant MUNC18-1 were expressed in HEK293T cells via lentiviral delivery to ensure physiological expression levels (26) (Figure 4B). WT MUNC18-1 as well as all variants were largely found in the soluble fraction (~99%) (Figure 4C). These findings are in contrast with earlier studies (24,27,28). To test whether the differences arose from protein tags and/or expression methods, we repeated the Triton X-100 assay for the WT and a subset of variants (I77N, C180Y, and L446F) using the same methodology as previous studies (28) (Figure S4A, B in Supplement 1). Compared with lentiviral delivery, calcium transfection increased expression 88-fold, and disease variants showed profound insolubility (Figure S4A, B in Supplement 1). In contrast, the presence of a 6xMYC tag at the N-terminus did
Lastly, using lentiviral infection, we inspected cells for signs of MUNC18-1 aggregation (Figure 4D). For WT, neutral, and disease-associated variants, diffuse, cytosolic MUNC18-1 immunoreactivity was observed, without noticeable differences between conditions. Taken together, we found no indications for MUNC18-1 variants to be polymeric, to aggregate, or to have reduced solubility at physiological levels.

MUNC18-1-Specific Predictors of Pathogenicity
To improve disease prediction, we developed a protein-specific, logistic regression-based classifier of pathogenicity and tested its prediction strength on STXBP1-RD. To date, 231 unique missense variants have been described, 108 of which are associated with disease (Figure 5A, purple), and the remaining 123 are classified as neutral variants (Figure 5A, green; Table S1F in Supplement 2). The variants are distributed over the entire protein without clear hot spots (Figure 5A). Using ~90% of the known missense variants as a training set, we developed PRESR, a MUNC18-1-specific pathogenicity predictor in 3 steps. First, we defined 6 sequence- and 7 structure-derived features of MUNC18-1 that could potentially influence variant pathogenicity (see Methods and Materials) (Figure 5B; Table S1C in Supplement 2). Second, we applied lasso regression to select the features that could discriminate between disease and neutral variants, which left 7 features (Figure 5C). Third, we trained PRESR on the discriminative features using logistic regression (Figure 5C). PRESR returns the pathogenicity score of a MUNC18-1 variant in terms of probability, where a value >0.S1A in Supplement 2). We compared the performance of PRESR with other publicly available predictors on an independent test set of 11 disease and 5 neutral variants (Figure 5D). Based on Matthews correlation coefficient (MCC), a reliable metric to compare the performance of predictive tools (52,53), PRESR outperformed (MCC = 0.71) other predictors (MCC ≤ 0.54) (Figure 5E; Table S1B in Supplement 2). We also explored whether Rosetta (Table S1G in Supplement 2) could help distinguish disease from population variants; however, the inspection suggested that substantial weight is given to variants potentially disrupting structure (e.g., P94L) despite evidence from evolutionary conservation (which Rosetta does not use) that such a change would likely be tolerated.
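For readers unfamiliar with the metric, the MCC comparison above can be sketched as follows. This is a generic illustration of the standard MCC formula, not the authors' evaluation code. The 0.5 cutoff follows the score threshold mentioned in the Discussion; the function names and the example labels are hypothetical.

```python
import math


def classify(probabilities, threshold=0.5):
    """Label a variant as disease (1) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels, where 1 = disease and 0 = neutral."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical labels and scores (NOT the test set of this study).
true_labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.2, 0.1]
mcc = matthews_corrcoef(true_labels, classify(scores))
```

MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect agreement) and, unlike accuracy, remains informative when the two classes differ in size, as in an 11-versus-5 test set.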
The regression coefficients obtained after training a logistic regression-based model (PRESR) can be used to understand the relevance of each feature (54,55). We extracted the regression coefficients of PRESR to determine the contribution of each feature (Figure 5E). The conservation across MUNC18 orthologs and the number of residues in contact with the variant position along with its predicted backbone flexibility and torsion angles are the major contributors to pathogenicity prediction (Figure 5E). To further aid in the visualization of the

PRESR Predictions Correlate With Experimental Results
To assess how PRESR predictions compared with biological properties of MUNC18-1 variants, probabilities were correlated with outcomes on the thermostability, cellular expression, and functional fusion data (Figures 1-3). While the probabilities showed high correlations with both thermostability (r = −0.68) (Figure 6A) and neuronal expression levels (r = −0.74) (Figure 6B), they were less correlated with the membrane fusion assay (r = −0.40) (Figure 6C). This can be attributed to the outlier H445P, and its exclusion resulted in a correlation of greater magnitude (r = −0.82) (Figure 6C). We also assessed how well PRESR and the experimental data agreed on pathogenicity classification (Figure 6D). For 14 of the 16 variants, agreement was found between PRESR classification and experimental data. Conflicting results were found for 2 variants (visualized as mixed green-purple circles); the neutral variant F153S and the unknown variant R536H were classified as disease variants by PRESR, but F153S and R536H showed no experimental effect on protein stability (and R536H is in fact also present in gnomAD, suggesting that this is a neutral variant). Despite these few exceptions, PRESR probabilities showed an encouraging degree of agreement with our experimental data regarding pathogenicity.
Finally, given the high predictive value of PRESR, we visualized PRESR probability scores of all possible amino acid substitutions at all positions (Figure 6E). As expected, given the training approach above, the predicted pathogenicity was not randomly distributed across the protein, but instead showed clusters of increased pathogenicity (e.g., residues 242-252 and 175-183/536-545/563-567) and decreased pathogenicity (e.g., residues 297-345). This heatmap provides a useful guide for interpreting variants and selecting appropriate amino acids for targeted mutagenesis to further characterize MUNC18-1 function.
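The exhaustive substitution scan behind such a heatmap can be sketched as follows. This is an illustrative outline only: `toy_score` is a hypothetical stand-in for a trained predictor, the short input sequence is arbitrary, and locating clusters as contiguous runs of positions whose mean score exceeds 0.5 is an assumption made here, not the procedure used to identify the clusters reported above.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def all_substitutions(sequence):
    """Yield (1-based position, wild-type residue, mutant residue)."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield position, wild_type, mutant


def position_means(sequence, score_fn):
    """Mean score over all substitutions at each position."""
    totals = [0.0] * len(sequence)
    counts = [0] * len(sequence)
    for position, wild_type, mutant in all_substitutions(sequence):
        totals[position - 1] += score_fn(position, wild_type, mutant)
        counts[position - 1] += 1
    return [total / count for total, count in zip(totals, counts)]


def high_score_runs(means, threshold=0.5, min_length=2):
    """Contiguous 1-based position ranges whose mean score exceeds the threshold."""
    runs, start = [], None
    for position, mean in enumerate(means, start=1):
        if mean > threshold and start is None:
            start = position
        elif mean <= threshold and start is not None:
            if position - start >= min_length:
                runs.append((start, position - 1))
            start = None
    if start is not None and len(means) + 1 - start >= min_length:
        runs.append((start, len(means)))
    return runs


def toy_score(position, wild_type, mutant):
    # Hypothetical stand-in scorer: pretends positions 2 and 3 are intolerant.
    return 0.9 if position in (2, 3) else 0.1


means = position_means("MAPIG", toy_score)
runs = high_score_runs(means)
```

For a protein of length N, the scan evaluates 19 × N substitutions, so the per-position summary is what makes regional patterns visible.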

DISCUSSION
In this study, we investigated a cohort of 11 disease-associated missense variants in STXBP1 and 5 neutral variants in heterologous cells, in primary neurons, and in new cell-free assays. We showed that protein instability and decreased expression levels are a common principle underlying MUNC18-1 variant pathogenicity. No evidence was observed for protein aggregation or insolubility for any of the variants in vitro or in neurons. We then developed a machine learning-guided framework for predicting variant pathogenicity in monogenic diseases that relies on protein-specific, sequence- and 3D structure-based features. PRESR outperformed all existing, routinely used classifiers tested here. Conservation across orthologs, the number of contact residues, and protein backbone flexibility were most relevant to predicting variant pathogenicity. Moreover, we showed that the biochemical and cell biological findings were correlated with the in silico predictions, thereby experimentally validating predictions of the framework.

Protein Instability Reduces Cellular MUNC18-1 Levels and Affects Membrane Fusion
Our data show that intrinsic MUNC18-1 instability is a common feature among a collection of disease-associated missense variants and is therefore the probable cause for the lower expression levels that have been observed before for several disease-associated variants (3,24-26,28,56). In addition, the correlation between thermostability and MUNC18-1-dependent stimulation in the syntaxin-1 GUV assay suggests that MUNC18-1 instability also directly impairs membrane fusion, likely by impaired formation of MUNC18-1 template complexes, a major physiological intermediate during SNARE-complex assembly (14,16,57).
Lower protein levels alone may not be sufficient to fully explain STXBP1-RD pathogenicity of some missense mutations. All disease-associated variants retained at least some syntaxin-1 binding capacity, as demonstrated by the increased melting temperatures in the presence of syntaxin-1 (Figure S1A in Supplement 1), and were added in molar excess over syntaxin-1 in fusion assays. In these assays, 4 of 7 variants showed moderate to severe functional deficits (Figure 3D), which did not improve when adding even more MUNC18-1, at least for one mutant (Figure 3C). Therefore, at least some disease-associated missense variants had impaired functionality, in addition to impaired protein stability. The 2 effects together are expected to compromise SNARE-mediated membrane fusion and explain the observed haploinsufficiency in STXBP1-RD. The fact that protein truncation variants and 5′ or whole-gene deletions produce a similar symptom spectrum suggests that the latter, reduction in the amount of functional protein, is the most central aspect.
Protein instability is often presumed to be an explanation for causative variants that do not show any particular sequence or spatial concentration within a protein (58), although this has seldom been demonstrated with such rigor as we have done here. This notion has been challenged by the proposal that many disease variants are edgetic (59), or caused by disruption of specific protein-protein interactions. Regardless of the general predominance of edgetic variants across all diseases, our work suggests that they do not predominate in STXBP1-RD.

MUNC18-1 Aggregation and Dominant-Negative Phenotypes Were Not Observed
Oligomerization, insolubility, and aggregation of MUNC18-1 variants may lead to dominant-negative effects. We found no evidence for any variant to aggregate or increase insolubility in mammalian cells at physiological expression levels, and this was also true for some of the variants previously described to aggregate (27,28). Only upon strong overexpression did MUNC18-1 variants accumulate and become insoluble, likely forming aggregates by exceeding the processing capacity of the cellular protein quality control system (60).
Dominant-negative effects of missense variants are expected to be more pathogenic than nonsense variants, truncations, or full gene deletions, although this does not appear to be true for STXBP1-RD (1,4,6). Moreover, no dominant-negative effects on synaptic transmission were found when disease-associated variants were expressed at approximately physiological levels in heterozygous STXBP1 neurons (26).
Based on these findings together with the data presented in this study, we conclude that most evidence argues against dominant-negative effects as a shared feature of MUNC18-1 missense variants. However, we cannot exclude the possibility that specific variants not analyzed in this study harbor dominant-negative properties or that dominant-negative effects occur under specific conditions not modeled in the current study (e.g., during cell stress or aging).
Protein-Specific Sequence and 3D Structure Information Provide New Insights Into Disease Mechanisms
PRESR revealed that among the sequence-derived features, the conservation of the substituted amino acid across MUNC18-1 orthologs followed by the predicted backbone flexibility are the most discriminating features between disease and neutral variants (Figure 5E). Thus, evolutionary constraints provide a rich resource to identify vulnerable amino acids and specific exchanges, as has been described for other methods (32-34). The current study demonstrates that, in addition, 3D structural features, especially changes in the number of contacting residues and dihedral angles upon substitution, are also highly predictive. Furthermore, by assessing the predicted pathogenicity of all possible amino acid substitutions at any given position, we identified disease-associated clusters (Figure 6E). Interestingly, 2 of those clusters (amino acids 242-252 and 175-183/536-545/563-567) lie within the center of the 3D structure of MUNC18-1. Perturbations at these buried sites could either unfold the entire protein or cause partial functional impairments due to allosteric effects. The binding interface between MUNC18-1 and the closed conformation of syntaxin-1 (MUNC18-1 amino acids 26-59) is also largely predicted to be disease associated, which is consistent with previous studies (4,6). Surprisingly, helix 11 and helix 12, which comprise the VAMP2 binding interface (14), appear to mostly tolerate single amino acid substitutions. This is consistent with our previous mutagenesis study (14), the transient nature and the low affinity of this interaction, and potential avidity effects due to multiple MUNC18-1 template complexes at vesicle fusion sites. Our disease prediction across the whole protein indicates at least 2 distinct disease mechanisms: 1) allosteric effects caused by buried residues, and 2) mutations within the MUNC18-1/syntaxin-1 interface.
Although the experimental data largely correlated with PRESR, 2 variants displayed discrepancies. F153S was predicted to be disease causing (PRESR prediction score >0.5), and R536H was labeled disease associated in a previous study (6) but was also found in a control cohort (37). Even though both variants are buried in the MUNC18-1 structure (43), F153S does not impair protein stability in any assay, and R536H showed normal thermostability as a purified protein, but (slightly) reduced levels when expressed in neurons. Thus, the neutral variant F153S seems to be misclassified by PRESR. In contrast, the classification for R536H remains uncertain. This variant may be the first variant that is observed in both healthy individuals and patients with STXBP1-RD. Finally, the disease variant T361I is predicted by PRESR to be neutral, and all biological characterizations indicate that this variant is neutral. Because residue 361 is facing outward, it may impair (yet unknown) interactions, or alternatively, the nucleotide substitution in the STXBP1 gene may affect RNA stability or processing. Because we used complementary DNAs in our experiments, potential messenger RNA defects were not detected. Furthermore, the choice of model systems in this study does not rule out human-specific effects and/or interactions with the specific genomic background of the patient. Overall, the combined in silico and experimental investigation of pathogenicity not only provides insight into disease mechanisms but also identifies atypical variants, which could also help to explain the broad palette of patient phenotypes.
A Scalable, Machine Learning-Guided Framework Improves Pathogenicity Prediction
PRESR offers several advantages over previously developed predictors. Its unique framework relies on the protein-specific, sequence- and 3D structure-based features to determine variant pathogenicity, and it performs better than generic classifiers. The logistic regression model enables us to extract regression coefficients and rank features based on their relevance. The publicly available PRESR webserver (presr.russelllab.org) provides users with detailed information on a given variant, including distribution plots of the most relevant features and visualization of the variant on a 3D structure. This significantly improves variant pathogenicity interpretation and understanding of underlying disease mechanisms. The application of protein-specific predictors will contribute to increased diagnostic accuracy and reduced time to reach a clinical diagnosis, which is otherwise labor and cost intensive. Moreover, it will promote clinical research and the development of treatment strategies.
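How a logistic regression model turns features into a probability, and how its coefficients can be ranked, can be sketched as follows. This is a generic illustration rather than the PRESR implementation: the feature names, coefficient values, and intercept are hypothetical, and ranking by absolute coefficient assumes the features were scaled to comparable ranges before training.

```python
import math


def logistic_probability(features, coefficients, intercept):
    """Probability from a fitted logistic model: sigmoid(intercept + sum(w_i * x_i))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_features(coefficients):
    """Feature names ordered by absolute coefficient, largest first."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]),
                  reverse=True)


# Hypothetical coefficients and feature values (NOT those of PRESR).
coefficients = {
    "ortholog_conservation": 1.8,
    "contact_count_change": -0.9,
    "backbone_flexibility": 0.6,
}
variant_features = {
    "ortholog_conservation": 1.2,
    "contact_count_change": -0.5,
    "backbone_flexibility": 0.3,
}
probability = logistic_probability(variant_features, coefficients, intercept=-1.0)
ranking = rank_features(coefficients)
```

The sign of a coefficient indicates whether larger feature values push the prediction toward disease or toward neutral, which is what makes this model family straightforward to interpret.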
Given the ever-growing amount of genetic screening data, PRESR will be continuously updated as the number of known MUNC18-1 variants increases [see (6)], thereby improving prediction performance. As we extend our understanding of protein sequences and structures, new features can be added to our computational workflow that will broaden our understanding of the functional impacts of variants. This can help in the discovery of novel disease mechanisms, which can be experimentally validated by specialized assays such as the ones described in this framework. This framework can be further extended to other monogenic disorders. Previous predictors tend toward a generic framework, which is sensible when disease/neutral variants are relatively rare. More recent predictors have focused on individual genes or families with similarly improved performance [e.g., FunNCion (35)]. However, today, many diseases have hundreds of known genetic variants, and the explosion in high-throughput sequencing means that there are now hundreds or thousands of neutral variants for virtually all human proteins (37). There are currently 429 proteins for which at least 50 disease-associated variants are reported in either UniProt (38) or ClinVar (61) (see Methods and Materials) (Table S1F in Supplement 2). This means that it is conceivable for many diseases/genes to consider specific, rather than generic, pandisease, pangene approaches because they will have sufficient data to train a learning algorithm. The feature space can be automatically tailored to the protein of interest using a feature selection method (such as the lasso regression used in this study), and an interpretable machine learning-based algorithm (such as logistic regression) will rank the features based on their relevance. Moreover, the recent great improvement in the accuracy of protein 3D structure prediction by AlphaFold (36)

To maximize the potential of protein-specific prediction algorithms, variants of candidate proteins should harbor high penetrance for disease traits. The combination with biochemical/biological assays provides a more in-depth understanding of general pathogenicity mechanisms and enables the identification of atypical variants. Such tailored predictions could also be helpful for identifying critical interfaces for binding partners (proteins, lipids, and other ligands). The PRESR webserver (PRESR.russelllab.org) reveals critical surface residues in the 3D structure.
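
The protein-specific workflow outlined above (lasso-type feature selection, an interpretable logistic regression whose coefficients rank the features, 5-fold cross-validation, and evaluation by the Matthews correlation coefficient) can be illustrated with a minimal Python sketch. This is not the PRESR implementation. The data are synthetic, the three feature names are hypothetical placeholders, and the L1 (lasso) penalty is applied directly inside the logistic regression rather than as a separate selection step, an assumption made only to keep the example free of external dependencies.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, l1=0.02, lr=0.5, epochs=300):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(d):
            step = w[j] - lr * grad_w[j]
            # Soft-thresholding: the lasso step that shrinks weak weights to zero.
            w[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return w, b


def predict(X, w, b):
    # Class 1 (here: "disease-associated") if the predicted probability is >= 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cross_validated_mcc(X, y, k=5):
    """Mean MCC over k interleaved held-out folds."""
    scores = []
    for fold in range(k):
        held_out = list(range(fold, len(X), k))
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w, b = fit_l1_logistic([X[i] for i in train], [y[i] for i in train])
        y_pred = predict([X[i] for i in held_out], w, b)
        scores.append(mcc([y[i] for i in held_out], y_pred))
    return sum(scores) / k


# Synthetic toy data; feature names are hypothetical, not PRESR features.
FEATURES = ["stability_change", "conservation", "noise"]
rng = random.Random(0)
X, y = [], []
for _ in range(120):
    x = [rng.gauss(0, 1) for _ in FEATURES]
    X.append(x)
    # The label depends strongly on feature 0, weakly on feature 1, not on feature 2.
    y.append(1 if 3 * x[0] + x[1] + rng.gauss(0, 0.3) > 0 else 0)

w, b = fit_l1_logistic(X, y)
# Rank features by absolute coefficient, largest first.
ranking = sorted(zip(FEATURES, w), key=lambda fw: abs(fw[1]), reverse=True)
cv_score = cross_validated_mcc(X, y)

for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
print(f"5-fold cross-validated MCC: {cv_score:.2f}")
```

Because the toy features are drawn on a common scale, the absolute coefficients are directly comparable; with real sequence- and structure-based features, standardization would be needed before ranking by coefficient.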

Figure 3. Most disease-associated variants reduce MUNC18-1-dependent stimulation in a reconstituted membrane fusion assay. (A) Incubation scheme of the syntaxin-1 GUV-based lipid mixing assay. (B) Typical examples of the syntaxin-1 GUV lipid mixing assay. Lipid mixing kinetics were recorded at 37 °C for 30 minutes as the increase in Atto488 fluorescence. The fusion efficiency was plotted as the normalized fluorescence after membrane lysis by detergent (% of max). (C) Titration of MUNC18-1 WT and C180Y. The final lipid mixing signal after 30 minutes (% of max) is plotted in the presence or absence of varying concentrations of MUNC18-1 WT or C180Y, respectively. n = 4-11 technical replicates, 1-way analysis of variance with Dunn's post hoc comparison, *p < .05 and for all other conditions, ****p < .0001 (not displayed for clarity). (D) Membrane fusion stimulation by MUNC18-1 variants. Lipid mixing kinetics were recorded in the presence or absence of 750 nM MUNC18-1 or a respective disease variant and analyzed as described in (B) and (C). The effect size is normalized and plotted as % of WT. Disease-associated variants C180Y, D262V, P575S, and H445P led to a decrease in stimulation (67%, 71%, 73%, and 18% activity, respectively). n = 3-11 technical replicates, 1-way analysis of variance with Dunn's post hoc comparison, ****p < .0001. (E) Correlation plot of the normalized fluorescence at 30 minutes (% of max) and melting temperatures. GUV, giant unilamellar vesicle; n.s., nonsignificant; RT, room temperature; SUV, small unilamellar vesicle; WT, wild-type.

Figure 4. MUNC18-1 disease variants do not form oligomers. (A) Purified MUNC18-1 WT and variants show similar elution volumes in gel filtration experiments. Every mutant was analyzed once on a Superdex 75 Increase 10/300 column (GE Healthcare). The experimentally determined apparent molecular weights correspond to the theoretical molecular weight (74 kDa) of the monomeric protein. (B) Representative results of the Triton X-100 solubility of MUNC18-1 expressed in neurons. Composite panels are from the same Western blot analysis. (C) The MUNC18-1 WT and mutants show similar Triton X-100 solubility, with the exception of a minor insoluble G544D fraction. WT n = 38, mutants n = 8-15. Kruskal-Wallis test with Dunn's post hoc comparison, *p < .05. (D) Neuronal imaging of MUNC18-1-expressing neurons. A.U., arbitrary unit; GFP, green fluorescent protein; MW, molecular weight; n.s., nonsignificant; WT, wild-type.

Figure 5. PRESR: a MUNC18-1-specific predictor of pathogenicity. (A) The disease-associated variants (shown in purple) and neutral variants (shown in green) included in this study are broadly distributed within MUNC18-1. (B) Features included in the machine learning process. (C) Machine learning by logistic regression and 5-fold cross-validation. (D) Crystal structure of the MUNC18-1 (gray)/syntaxin-1 (tan) complex (PDB:3C98). The C-alpha atoms of the disease and neutral variants used as a test set are shown as purple and green spheres, respectively. Variants are associated with different clinical phenotypes. Please note that all disease-associated variants shown here (except D262V) are also summarized in Xian et al. (6). (E) Benchmarking of PRESR on the test set. (F) The variants comprising the test set were also examined in a multilevel experimental approach. MCC, Matthews correlation coefficient.

Figure 6. PRESR predictions correlate with experimental data. (A) Correlation plot of MUNC18-1 variant melting temperatures (°C) and PRESR scores. (B) Correlation plot of the neuronal expression levels (% of WT) and PRESR scores. (C) Correlation plot of membrane fusion signals (normalized fluorescence at 30 minutes [% of WT]) and PRESR scores. (D) Graphical representation of MUNC18-1 (gray) together with syntaxin-1 (tan) (PDB:3C98). The C-alpha atoms of the amino acids delineated in this study are shown as spheres in the 3-dimensional (3D) structure of MUNC18-1. The position of the respective C-alpha atom in the blowups is depicted by a black line. Within each blowup, all polar contacts between the disease mutant and neighboring residues are indicated by black dotted lines. All residues that are involved in the interaction are additionally shown as sticks. The circles around each blowup indicate the effect (unchanged or impaired) of a given variant as determined by a given method. (E) PRESR prediction scores for all possible amino acid exchanges at any given position in MUNC18-1 reveal positions vulnerable to causing disease. N.D., not determined; WT, wild-type.