(C) PLOS One
This story was originally published by PLOS One and is unaltered.



A comprehensive exploration of the druggable conformational space of protein kinases using AI-predicted structures [1]

Authors: Noah B. Herrington, Yan Chak Li, David Stein, Gaurav Pandey
Affiliations: Department of Pharmacological Sciences; Department of Genetics; Genomic Sciences; Icahn School of Medicine at Mount Sinai, New York, United States of America

Date: 2024-10

Lowering MSA depth diminished the bias of AI prediction methods favoring particular conformations

We first compared the current conformational space of experimentally determined kinase structures in the PDB with that of structural models deposited in the AlphaFold2 Protein Structure Database (‘AF2 Database’) [54] and models generated by ESMFold [41] (S1 Fig). In brief, we downloaded all available experimentally determined structures of 497 human kinase domains (from 484 kinases) from the PDB (5,136 structures for 331 proteins). While 153 kinases (32% of the kinome) had no experimentally determined structure, the 331 proteins with at least one known structure had over 15 structures each on average. These structures were classifiable [55] into one of six defined conformations: ɑC-Helix-in / DFG-in (CIDI), ɑC-Helix-in / DFG-out (CIDO), ɑC-Helix-out / DFG-in (CODI), ɑC-Helix-out / DFG-out (CODO), DFGinter (an in-between DFG conformation), and Unassigned (none of the above) (Fig 1A). Of the classifiable structures, 50.0% exhibited only one conformation, highlighting the limited coverage of kinase conformations.
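
The six-way labeling described above can be sketched as a small lookup over the two structural features. This is a minimal illustration, not the classifier cited as [55]: the function names, the state strings, and the rule that an intermediate DFG state yields DFGinter regardless of the ɑC-Helix state are assumptions made for the example, and the input is toy data.

```python
from collections import Counter

# Hypothetical state strings; the actual classifier [55] works on structural
# features of each kinase domain, which are not modeled here.
_LABELS = {
    ("in", "in"): "CIDI",
    ("in", "out"): "CIDO",
    ("out", "in"): "CODI",
    ("out", "out"): "CODO",
}

def conformation_label(c_helix, dfg):
    """Map an (aC-Helix state, DFG state) pair to one of the six labels."""
    if dfg == "inter":
        # Assumption: an in-between DFG state overrides the helix state.
        return "DFGinter"
    # Anything outside the four in/out combinations is Unassigned.
    return _LABELS.get((c_helix, dfg), "Unassigned")

def fractional_distribution(labels):
    """Return {label: fraction of all labels} for a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Toy input, not data from the study.
toy = [
    conformation_label(c, d)
    for c, d in [("in", "in"), ("in", "in"), ("in", "out"), ("out", "inter")]
]
print(fractional_distribution(toy))
```

The same `fractional_distribution` helper would apply to any list of labels, whether they come from experimental structures or predicted models.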


Fig 1. Kinase conformations in the PDB and AlphaFold2 (AF2) Protein Structure Database. (A) Prototypical structural conformations considered in our study: CIDI: ɑC-Helix-in/DFG-in; CIDO: ɑC-Helix-in/DFG-out; CODI: ɑC-Helix-out/DFG-in; CODO: ɑC-Helix-out/DFG-out; DFGinter (in between DFG-in and DFG-out); Unassigned (none of the above). Examples shown are CIDI: MAPK14 (PDB ID: 1BL6), CIDO: RIPK2 (PDB ID: 4C8B), CODI: CDK2 (PDB ID: 1H07), CODO: NTRK1 (PDB ID: 6D1Y), DFGinter: AURKA (PDB ID: 3FDN), Unassigned: PDGFRA (PDB ID: 6JOJ). Residues in lime green indicate the aspartic acid and phenylalanine (DF) of the DFG motif, while magenta indicates the ɑC-Helix, whose conformation is signaled by movement of the conserved Glu residue. Red and blue atoms indicate oxygen and nitrogen atoms, respectively. (B) Fractional distribution of kinase structures from the PDB (n = 5,024) classified into each conformation type. (C) Fractional distribution of kinase models from the AF2 Database (n = 469) classified by conformation. *p-value was calculated using Fisher’s exact test and indicated that the PDB and AF2 distributions were significantly different. **p(CIDI) was calculated using a one-sided Wilcoxon rank-sum test and indicated that AF2 had a significantly higher representation of CIDI models than the PDB. (D) Fractional distributions of PDB structures in different conformations by kinase group (AGC: PKA, PKG, PKC families; CAMK: Calcium/calmodulin-dependent; CK1: Casein kinase 1; CMGC: CDK, MAPK, GSK3, CLK families; STE: Sterile 7, Sterile 11, Sterile 20 kinases; TK: Tyrosine kinase; TKL: Tyrosine kinase-like). (E) Fractional distributions of models in different conformations by kinase group in the AF2 Database. https://doi.org/10.1371/journal.pcbi.1012302.g001

We also downloaded all computational models of these 497 kinase domains from the AF2 Database and predicted structures of the same domains using ESMFold; all of these models were also classified by conformation. We calculated the fractional distribution of the six conformation types present in the PDB and AF2 Database (Fig 1B and 1C) and in the models predicted by ESMFold (S2 Fig). Our analysis revealed a significant overrepresentation of the active ‘CIDI’ state in all three datasets. Interestingly, both AF2 and ESMFold exhibited an even greater fraction of CIDI models than the PDB (73.6%, 82.9% and 65.7%, respectively; p(PDB CIDI < AF2 CIDI) = 1.26 x 10^-34; p(PDB CIDI < ESMFold CIDI) < 2.2 x 10^-16). Furthermore, DFG-out conformations were underrepresented in all three datasets, more so in the AF2 Database and ESMFold models than in the PDB (2.3%, 0.8% and 9.7%, respectively; p(PDB DFG-out > AF2 DFG-out) < 2.2 x 10^-16; p(PDB DFG-out > ESMFold DFG-out) < 2.2 x 10^-16). To test whether these differences arose from the expanded coverage of the kinome by AF2 or ESMFold, we limited the analysis to kinase models with deposited structures in the PDB (n = 331) (i.e., “Overlap”). The distributions of AF2 and ESMFold kinase conformations showed similar trends when compared to their full datasets (S3A and S3B Fig, respectively; p(AF2 Overlap) = 0.751; p(ESMFold Overlap) = 0.913). This suggests that these ab initio predictors exhibit a bias for predicting structures of kinases in the active (CIDI) state.
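
One way to ask whether one dataset carries a larger CIDI fraction than another is a one-sided Fisher's exact test on a 2x2 table of CIDI versus non-CIDI counts. The sketch below uses only the Python standard library and made-up counts. It is not the authors' analysis code (the Fig 1 caption attributes the CIDI comparison to a one-sided Wilcoxon rank-sum test), and the function name and numbers are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are datasets, columns are (CIDI, non-CIDI) counts. Returns
    P(X >= a) under the hypergeometric null with all margins fixed; a small
    value suggests row 1 has a larger CIDI fraction than row 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / denom

# Toy counts, not data from the study: 18 of 20 predicted models are CIDI
# versus 12 of 20 experimental structures.
p_value = fisher_one_sided(18, 2, 12, 8)
print(f"one-sided p = {p_value:.4f}")
```

Exact integer arithmetic is used until the final division, which keeps the tail sum accurate even when the binomial coefficients are very large.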

Next, we investigated whether the bias for the CIDI conformation existed within individual kinase families or evolutionary groups. We therefore computed the fractional distributions of each conformation by kinase group (AGC, CAMK, CK1, CMGC, Other, RGC, STE, TK and TKL) [56] using our datasets from the PDB and AF2 Database (Fig 1D and 1E) and the models predicted by ESMFold. We observed that the active conformation, CIDI, consistently represented the greatest fraction in all kinase groups in all three datasets, ranging from 46.7% (Other) to 98.9% (CK1) in the PDB, 62.2% (TK) to 100% (CK1) in the AF2 Database, and 58% (STE) to 100% (CK1 and CMGC) by ESMFold (S4 Fig). These groups did not retain similar distributions in the ESMFold-predicted models (S4 Fig). Interestingly, the two groups with the highest fractions of CIDO structures in the PDB, TK and TKL, had lower CIDO fractions in the AF2 and ESMFold models. However, it is noteworthy that TK had the highest CIDO fraction in all three datasets. Intriguingly, the understudied CODO conformation was a minor population across all groups in the PDB, ranging from 0.0% (CK1) to 6.1% (TKL), but was even less frequent in the AI-predicted datasets, ranging from 0.0% (all but AGC) to 3.4% (AGC) for AF2 and 0.0% (all but AGC) to 1.7% (AGC) for ESMFold. Finally, among the ESMFold models, the sole CODO model belonged to AGC, the only two Unassigned models belonged to the Other group, and the only three CIDO models belonged to TK. Taken together, our results suggest that the conformation bias seen across the kinome in the PDB (Fig 1B) was also observed at the group level for the PDB and both predictive methods, and that some groups are more conformationally diverse than others.
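
The group-level breakdown above amounts to computing conformation fractions within each kinase group rather than across the whole dataset. A minimal sketch, assuming the input is a list of (group, conformation) pairs; the function name and the toy records are hypothetical and do not reproduce the study's numbers.

```python
from collections import Counter, defaultdict

def fractions_by_group(records):
    """Compute conformation fractions within each kinase group.

    records: iterable of (kinase_group, conformation) pairs.
    Returns {group: {conformation: fraction within that group}}.
    """
    counts = defaultdict(Counter)
    for group, conformation in records:
        counts[group][conformation] += 1
    return {
        group: {conf: n / sum(c.values()) for conf, n in c.items()}
        for group, c in counts.items()
    }

# Toy records, not data from the study.
toy_records = [
    ("TK", "CIDI"), ("TK", "CIDO"), ("TK", "CIDI"), ("TK", "CIDI"),
    ("CK1", "CIDI"), ("CK1", "CIDI"),
]
result = fractions_by_group(toy_records)
for group, fractions in result.items():
    print(group, fractions)
```

Normalizing within each group keeps a heavily sampled group such as TK from dominating the comparison.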

The AF2 algorithm includes a variety of parameters whose tuning can impact the predicted protein structural models. In particular, the number of sequences (‘depth’) in the multiple sequence alignment (MSA) used as input for AF2 correlates with the conformational diversity of the structural models generated for a given sequence [43], since the MSA generated by AF2 includes sequences evolutionarily related to the input kinase sequence. Therefore, to sample alternative kinase conformations, we used ColabFold [57] to run AF2 with various MSA depths as input (Methods). We generated five models for each kinase (default parameters), which were then classified into the same six conformations as earlier. We observed limited structural diversity in models generated at higher MSA depths of 512, 128, and 32, where most distributions appeared similar to those observed in the AF2 dataset, though only the distribution at a depth of 512 was statistically similar (Fig 2A; S1 Table). For example, at an MSA depth of 512, 76.8% of the models were in the CIDI conformation, while DFG-out states (CIDO and CODO) were under-represented (total of 1.7%).
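
To make the notion of MSA depth concrete, the sketch below reduces an alignment to a fixed number of sequences while always keeping the query. It only illustrates what a "shallow" MSA is; it is not how ColabFold or AF2 subsample alignments internally, and the function name, the seed parameter, and the toy alignment are assumptions.

```python
import random

def subsample_msa(msa, depth, seed=0):
    """Return at most `depth` sequences from `msa`, always keeping the query.

    msa: list of aligned sequences with the query at index 0.
    depth: number of sequences to keep, including the query.
    seed: seed for the random choice of the remaining sequences.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    query, others = msa[0], msa[1:]
    rng = random.Random(seed)
    k = min(depth - 1, len(others))
    # rng.sample draws without replacement, so no homolog is repeated.
    return [query] + rng.sample(others, k)

# Toy alignment of 512 placeholder strings, not real kinase sequences.
toy_msa = ["QUERYSEQ"] + ["HOMOLOG%03d" % i for i in range(511)]
shallow = subsample_msa(toy_msa, depth=8)
print(len(shallow), shallow[0])
```

Changing `seed` yields a different shallow alignment at the same depth.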


Fig 2. Exploration of the kinome’s conformational space by lowering MSA depth when predicting structures using AF2. (A) Fractional distribution of models in various conformations across different MSA depths (left plot) and at an MSA depth of 8 (donut plot). (B) Distributions of models predicted at an MSA depth of 8 by kinase group. (C) Coverage and conformational space representation across the human kinome tree, generated with KinMap, for structures from the PDB, where each node represents one kinase. Colors (crimson, cyan, lime, magenta, blue and coral, in order) and increasing circle sizes indicate greater counts of unique conformations. (D) Coverage and conformational space representation of the human kinome using models predicted by AF2 at an MSA depth of 8. The node color and size coding are the same as in C. (E-G) Representative models predicted by AF2 at an MSA depth of 8 for kinases from different groups in novel conformations not seen in the PDB: (E) CLK3 in the CIDO conformation, (F) NEK9 in the CIDO conformation, and (G) MAP3K4 in the CODI conformation. Residues in lime green represent the ‘DF’ of the DFG motif (‘DY’ of DYG for NEK9), while residues in magenta represent the ɑC-Helix. Red and blue atoms indicate oxygen and nitrogen atoms, respectively. https://doi.org/10.1371/journal.pcbi.1012302.g002

Interestingly, we observed greater fractions of non-CIDI conformations in models generated at shallow MSAs (i.e., MSAs with fewer sequences) (Fig 2A). For instance, at an MSA depth of 8, the CIDI fraction of models was 45.2%, while the fraction of DFG-out states (CIDO and CODO) was 12.3%. At lower MSA depths of 4 and 2, the models were even more diverse, with only 15.7% and 3.2% of the models, respectively, in the CIDI conformation. Furthermore, substantial fractions of the models (49.3% and 80.5%, respectively) were classified as Unassigned at these MSA depths. Statistical comparison of the fractional compositions of each conformation at each MSA depth showed that all depths differed statistically from one another, but distributions at higher MSA depths (512 through 16) were more similar to that of the AF2 Database, while those at lower depths showed greater statistical difference from the AF2 Database (S2 Table). We attempted to enhance inactive conformation predictions by (i) using custom MSAs and (ii) increasing the number of random seeds used as input to AF2 (p(AF2 Database = MSA 256_32seeds) = 1; p(MSA 8_1seed = MSA 8_32seeds) = 1). However, these models exhibited conformational distributions similar to that observed in the AF2 Database (S5 Fig) (Methods).

Additionally, we observed that lowering MSA depth impacted the distribution of model conformations by kinase group (Fig 2B). First, the CIDI fraction at an MSA depth of 8 was much lower for all groups than the corresponding fractions in the AF2 Database and ranged from 29.6% (STE) to 71.7% (CK1) (Figs 1E and 2B). Second, the fractions of CIDO models in most groups were higher at an MSA depth of 8 (3.3% (CK1) to 12.1% (TK)) than the corresponding fractions in the AF2 Database (0.0% (many) to 7.8% (TK)). A similar trend was observed in CODO models for all groups (range of 1.7% (CK1) to 6.8% (STE) at an MSA depth of 8), except for AGC. The levels of statistical significance of all these comparisons are provided in S3 Table.

To better understand these findings, we visualized the number of unique conformations for each kinase among PDB structures and AF2 (MSA depth = 8) models (Fig 2C and 2D, respectively) using KinMap [58], a tool for presenting data across the kinome. A comparison between these kinome trees demonstrated that lowering the MSA depth explored the conformational space of the kinome more broadly than the PDB and that, on average, an MSA depth of 8 yielded more predicted conformations per kinase than were observed for kinases with structures in the PDB (p-value = 5.05 x 10^-29). Part of this expansion consisted of conformations not previously reported for several kinases. For instance, CLK3, NEK9, and MAP3K4, all from different kinase groups, had no experimentally determined structures in the PDB, but were predicted confidently by AF2 in previously uncharacterized conformations (CIDI, CIDO, and CODI (CLK3); CIDI and CIDO (NEK9); CIDI, CODI, CODO, and DFGinter (MAP3K4)). Collectively, the above data suggested that conformational exploration owing to different AF2 parameters is not limited to any one kinase group.
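
Counting unique conformations per kinase, and flagging those absent from the experimental set, reduces to set operations. A minimal sketch: the CLK3 and NEK9 entries follow the conformations listed in the text, while `KIN_X`, its records, and both function names are hypothetical.

```python
from collections import defaultdict

def conformations_per_kinase(records):
    """Group (kinase, conformation) pairs into {kinase: set of conformations}."""
    out = defaultdict(set)
    for kinase, conformation in records:
        out[kinase].add(conformation)
    return dict(out)

def novel_conformations(predicted, experimental):
    """Return predicted conformations not seen experimentally, per kinase."""
    novel = {}
    for kinase, confs in predicted.items():
        unseen = confs - experimental.get(kinase, set())
        if unseen:
            novel[kinase] = unseen
    return novel

# CLK3 and NEK9 have no PDB structures according to the text above.
# KIN_X is a hypothetical kinase added to show a partial overlap.
predicted = conformations_per_kinase([
    ("CLK3", "CIDI"), ("CLK3", "CIDO"), ("CLK3", "CODI"),
    ("NEK9", "CIDI"), ("NEK9", "CIDO"),
    ("KIN_X", "CIDI"), ("KIN_X", "CODI"),
])
experimental = conformations_per_kinase([("KIN_X", "CIDI"), ("KIN_X", "CIDI")])

novel = novel_conformations(predicted, experimental)
for kinase in sorted(novel):
    print(kinase, sorted(novel[kinase]))
```

The size of each set in `predicted` or `experimental` is the per-kinase count of unique conformations that a kinome-tree view would encode as node size.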

[END]
---
[1] Url: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012302

Published and (C) by PLOS One
Content appears here under this condition or license: Creative Commons - Attribution BY 4.0.
