Machine Learning-Guided Identification of Mutations That Improve Protein Stability

Proteins engineered for enhanced stability are widely used in the development of biological therapeutics and diagnostic reagents, as well as in other commercial and industrial applications. For this reason, many experimental and computational methods have been developed, such as directed evolution, rational design, ancestral sequence reconstruction, and molecular dynamics simulation, which stabilize proteins by introducing favorable biophysical properties at appropriate positions in the protein structure. Another way to identify stabilizing mutations is the consensus approach, which compares homologs to find the most conserved, most common amino acid at each position in a protein sequence. This approach assumes that the most common amino acid at each position yields the most stable protein, and it compares the resulting consensus sequence with the sequence of the protein to be stabilized to determine candidate substitutions. However, the assumption that evolutionary fitness is equivalent to protein thermal stability is not always correct. In addition, using only the identity of the amino acid found at each position, without considering its biophysical properties, leaves much of the information inherent in amino acid properties unused.
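The consensus approach described above can be sketched in a few lines. This is a minimal illustration and not the implementation used in any published tool: it assumes the homologs are already aligned to equal length with "-" marking gaps, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def consensus_sequence(alignment):
    """Return the most common residue in each alignment column (gaps ignored)."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        # A column made only of gaps stays a gap in the consensus.
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


def consensus_substitutions(target, alignment):
    """List (position, target residue, consensus residue) where the two differ.

    Positions are 1-based and refer to alignment columns.
    """
    cons = consensus_sequence(alignment)
    return [
        (i + 1, t, c)
        for i, (t, c) in enumerate(zip(target, cons))
        if t != c and t != "-" and c != "-"
    ]


# Toy aligned homologs (made up for illustration, not real sequences).
homologs = ["MKVLA", "MRVLA", "MKILA", "MKVLG"]
target = "MRVIA"

for pos, old, new in consensus_substitutions(target, homologs):
    print(f"{old}{pos}{new}")
```

Note that the sketch looks only at residue identity and frequency, which is exactly the limitation discussed above: it carries no information about side-chain properties or about the growth temperature of the source organism.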

Figure 1. Machine learning enhanced target protein thermal stability (MEnTaT) flowchart

 

On December 1, 2023, Professor Adrian Whitty and Professor Karen N. Allen from the Department of Chemistry at Boston University in the United States published a research paper titled “MEnTaT: A machine learning approach for the identification of mutations to increase protein stability” in the journal PNAS. The team developed a machine learning method, MEnTaT, that identifies amino acid substitutions contributing to improved thermal stability by comparing the amino acid sequences of homologous proteins from bacteria that grow at different temperatures. The sequence comparison is based not simply on amino acid identity, but on the structural and physicochemical properties of the side chains. So far, the method has accurately identified stabilizing substitutions in three well-studied systems, and it has been validated experimentally by increasing the stability of a polyamine oxidase. The method outperforms the widely used bioinformatic consensus approach and also offers deeper insight into protein structure and evolution.

 

Figure 2. MEnTaT results for Bacillus subtilis adenylate kinase (Adk)

 

Drawing on the stability study of Bacillus subtilis adenylate kinase (Adk) by Kern et al. and its published dataset, the researchers carried out a retrospective evaluation of Adk using MEnTaT. First, 187 bacterial Adk sequences were collected, with mesophilic and thermophilic host organisms represented at a ratio of approximately 1:1. After filtering, the sequences were used to construct the input matrix, and the output was a stability classification of each sequence according to the growth temperature range of its host organism. The results indicate that the model explains 96% of the variance in host organism growth temperature. Next, variable importance in projection (VIP) scores were used to determine which specific sequence positions and properties contribute most to the model's discrimination. A total of 8 positions with the greatest impact were identified in Adk, of which 3 had already been tested experimentally as stabilizing mutation sites and 3 others showed structural potential for improving stability.
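To make the input matrix and VIP ranking more concrete, the following is a simplified sketch rather than the authors' pipeline. It assumes that each aligned residue is replaced by numeric descriptor values (here only two: Kyte-Doolittle hydropathy, which is assumed to be what "KDH" denotes, and a simple formal charge with histidine treated as neutral), that gaps are encoded as 0, and that a single-component partial least squares (PLS) model is enough for illustration. With one component, the VIP score of feature j reduces to sqrt(p) * |w_j| / ||w||, where w is the first PLS weight vector and p is the number of features. Columns are mean-centered but not scaled to unit variance, which a full analysis would typically also do. All function names and the toy sequences are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}
# Simplified formal charge (assumption: His treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}
DESCRIPTORS = ["KDH", "charge"]


def encode(seq):
    """Turn one aligned sequence into a flat row: two descriptors per position."""
    row = []
    for res in seq:
        row.append(KD.get(res, 0.0))      # gaps/unknown residues -> 0 (assumption)
        row.append(CHARGE.get(res, 0.0))
    return row


def vip_one_component(X, y):
    """VIP scores for a one-component PLS1 model on mean-centered data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # First PLS weight vector is proportional to Xc^T yc.
    w = [
        sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return [0.0] * p
    return [math.sqrt(p) * abs(v) / norm for v in w]


def top_features(vip, k):
    """Return the k highest-scoring (position, descriptor, VIP) triples."""
    ranked = sorted(range(len(vip)), key=lambda j: vip[j], reverse=True)[:k]
    n_desc = len(DESCRIPTORS)
    return [(j // n_desc + 1, DESCRIPTORS[j % n_desc], vip[j]) for j in ranked]


# Toy alignment: label 0 = mesophilic host, 1 = thermophilic host (made up).
sequences = ["AKGD", "AKGE", "AIGD", "AIGE"]
labels = [0, 0, 1, 1]

X = [encode(s) for s in sequences]
vip = vip_one_component(X, labels)
for position, descriptor, score in top_features(vip, 2):
    print(position, descriptor, round(score, 3))
```

In this toy alignment only position 2 differs systematically between the two temperature classes, so its hydropathy and charge features receive the highest scores, while position 4 (D or E in both classes) scores zero.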

 

Figure 3. Using subsets of the Adk dataset to determine the range and limitations of MEnTaT

 

Next, the researchers evaluated the accuracy of the top 10 predictions obtained from different subsets of the Adk sequence set, to assess how the performance of MEnTaT varies with the number of homologous sequences used, the ratio of mesophilic to thermophilic sequences, and the degree of sequence identity between sequences. They then analyzed the performance of the five amino acid descriptors used in the model, counting how often each descriptor appeared among the highest VIP scores and how often the resulting substitution was validated as stabilizing in the literature versus judged unlikely to increase thermal stability. The results indicate that descriptors TFE and GFE are associated with valid predictions in over 60% of instances. KDH and charge also perform reasonably well, with over 50% of instances associated with valid hits. Descriptor GSI, however, often leads to false positives and should be removed from the model. To further improve prediction accuracy, β-sheet propensity (BST) was introduced as the fifth descriptor for subsequent analyses.
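The per-descriptor tally described above amounts to a simple hit-rate calculation. The sketch below is illustrative only: the function name is hypothetical, and the positions and validation set are made-up placeholders, not values from the paper.

```python
from collections import defaultdict


def descriptor_hit_rates(predictions, validated_positions):
    """Fraction of top-ranked predictions per descriptor that hit a validated site.

    predictions: iterable of (position, descriptor) pairs taken from the top VIP scores.
    validated_positions: set of positions reported as stabilizing in the literature.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for position, descriptor in predictions:
        totals[descriptor] += 1
        if position in validated_positions:
            hits[descriptor] += 1
    return {d: hits[d] / totals[d] for d in totals}


# Placeholder data for illustration only.
predictions = [(12, "TFE"), (40, "TFE"), (40, "GSI"), (77, "GSI"), (12, "charge")]
validated = {12, 40}

rates = descriptor_hit_rates(predictions, validated)
for descriptor, rate in rates.items():
    print(f"{descriptor}: {rate:.0%}")
```

A descriptor whose hit rate stays low across many subsets, as reported for GSI, is a natural candidate for removal from the model.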

 

Figure 4. Comparison of cold and hot sequence models

 

At present, it is not clear whether the molecular mechanisms of adaptation to low growth temperatures (<15 ℃) are the same as those of adaptation to high growth temperatures. To test whether including homologs from psychrophilic bacteria affects the ability of MEnTaT to predict stabilizing substitutions, the researchers added 13 Adk homologs from psychrophilic bacteria to the complete sequence dataset and generated new subsets of Adk sequences, in order to compare the results of including psychrophilic and thermophilic sequences in the MEnTaT model. When cold-derived sequences replaced or supplemented the hot-derived sequences, the results were similar to those of the original method and possibly even slightly better. These results indicate that in MEnTaT, psychrophilic homologs can be used together with or instead of thermophilic homologs. Although the stabilizing substitutions identified from cold-derived sequences differ from those identified from hot-derived sequences, they can still increase the thermal stability of the protein.

 

Figure 5. Mutation prediction of polyamine oxidase (PAO) stability enhancement using MEnTaT

 

Because polyamine oxidase (PAO) from Micrococcus luteus aggregates and is unstable after recombinant expression in vitro, the researchers used MEnTaT to increase its stability and thereby facilitate structural and crystallographic studies. The input dataset included 23 homologous sequences from psychrophilic organisms and 28 homologous sequences from mesophilic organisms. Six mutations were selected for cloning, expression, and purification, and were compared with the wild type by differential scanning fluorimetry. Of the four highest-ranked mutations, three stabilized the protein, raising Tm by 2-8 °C relative to the wild type; the two mutations chosen from lower positions in the VIP table did not significantly alter Tm. The researchers also applied the consensus approach to the same PAO sequences, but none of the three stabilizing mutations identified by MEnTaT appeared among the top 25 mutations predicted by the consensus approach. These results indicate that MEnTaT successfully identified stabilizing mutations in the PAO of Micrococcus luteus, including mutations that the consensus approach overlooked.

In summary, MEnTaT is a simple, fast, and reliable method for identifying stabilizing mutations in bacterial proteins. It takes into account both the optimal growth temperature of the bacteria from which the homologous proteins originate and the physicochemical properties of the amino acids in their sequences, and it achieves high accuracy. The method may also help to promote protein crystallization and to improve the expression and purification of unstable proteins, and it provides a tool for exploring how proteins adapt to other environmental factors such as high salinity or low pH, giving it broad potential for application.
