Figure 1: PET plastic pollution. PET plastic products such as bottles accumulate in the natural environment and are difficult to degrade naturally, causing serious pollution problems.
PET (polyethylene terephthalate) has excellent material properties and is widely used in everyday products such as bottles and fibers. However, the same durability and resistance to degradation make it an environmental burden. Conventional plastics can take hundreds of years to decompose in nature, whereas an enzyme called PETase can break PET down into small molecules on a timescale of days under favorable laboratory conditions. In 2016, Japanese researchers reported the bacterium Ideonella sakaiensis 201-F6, isolated from soil near a plastic recycling site, which can use PET as its major carbon and energy source. The PETase enzyme secreted by the bacterium degrades PET. The discovery triggered a wave of research, with many studies aiming to reveal the mechanism of PETase and to engineer the enzyme for higher degradation efficiency.
Project Goal
The goal of this tutorial is to design a new PET-degrading enzyme that is more efficient than known PETase. In other words, we hope to use computer-assisted protein design to make the enzyme structure better suited to binding and hydrolyzing PET polymers. This involves introducing mutations or new sequences based on the known PETase and optimizing the active site so that it binds the PET substrate more tightly and catalyzes the reaction more efficiently. Finally, we use computational predictions to screen for new enzyme candidates with potentially higher activity, which can then be verified experimentally. Designing such enzymes matters: efficient PET hydrolases can be used to biodegrade and recycle plastics, easing the plastic pollution problem. This project combines protein structure prediction, sequence design, and molecular docking to show how AI tools such as AlphaFold and diffusion-based generative models can support bioengineering work.
In this tutorial, we will use the following openly available tools to complete the new enzyme design:
- AlphaFold/ColabFold: AlphaFold is a protein structure prediction AI developed by DeepMind that predicts 3D structures from amino acid sequences. ColabFold provides a simplified interface for running AlphaFold on Google Colab. We will use ColabFold to predict protein structures and compare them with known structures. It is easy to operate and suitable for beginners.
- ProteinMPNN: A deep learning sequence design tool developed by the Baker Laboratory at the University of Washington that generates amino acid sequences for a given backbone structure. Compared to traditional design methods, ProteinMPNN is very fast (on the order of a second per designed sequence) and performs well. We will use it to design new variant sequences on the PETase backbone.
- DiffDock: A molecular docking tool based on a diffusion generative model that predicts the binding pose of small molecules (such as substrates or ligands) on proteins. Unlike traditional docking, which searches exhaustively and scores each pose, DiffDock uses a diffusion model to sample and progressively refine the ligand pose. Its authors report higher docking accuracy than earlier methods, and it also outputs a confidence estimate for each prediction. We will use DiffDock to simulate the binding of newly designed enzymes to small-molecule fragments of PET.
- PyMOL/ChimeraX: Molecular visualization software for viewing and analyzing protein structures. We will use one of them (beginners can pick whichever is free and familiar to them, such as ChimeraX) to inspect changes in the enzyme structure, examine docking results, and generate pictures for comparison.
The above tools are all open source or available in free versions and are suitable for beginners.
In practice, ColabFold, ProteinMPNN, and DiffDock all have ready-made Colab notebooks or open source code that can be run directly; PyMOL and ChimeraX offer both a graphical interface and a command line.
Specific Steps
The following steps explain how to use the above tools to design and analyze new PET-degrading enzymes. Each step is explained with reference to the figures where available.
Step 1: Obtain the known structure of PETase and perform structural prediction comparison.
First, we need the known structure of PETase as a reference. This example uses the crystal structure of Ideonella sakaiensis PETase, PDB ID 5XJH, which you can download from the RCSB Protein Data Bank. The structure was solved at about 1.5 Å resolution and shows that PETase adopts a typical α/β hydrolase fold, with the classic serine hydrolase catalytic triad (Ser-His-Asp) at the active center. Its active pocket is thought to accommodate approximately four MHET units of a PET chain.
With the crystal structure in hand, we use ColabFold to predict the structure from the PETase sequence, both to check how accurate AlphaFold is on this protein and to practice the workflow. Paste the amino acid sequence of PETase (taken from the PDB file or from UniProt entry A0A0K8P6T7) into the Google Colab notebook provided by ColabFold. The run produces a predicted model of PETase together with a per-residue confidence score (pLDDT). Because AlphaFold is generally accurate on well-folded enzymes like this one, we expect the prediction to agree closely with the crystal structure. After the prediction finishes, superimpose the AlphaFold model on the PDB crystal structure (for example with the align command in PyMOL or matchmaker in ChimeraX). The backbone RMSD between the two is usually very small, which indicates that the AlphaFold prediction for PETase is accurate. This step familiarizes us with ColabFold and also provides a starting point for the design work: once the model is confirmed to be structurally reliable, it can be used for downstream modifications.
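The comparison described above can also be done outside a viewer. The following is a minimal sketch, not the tutorial's own procedure: it reads Cα atoms from two PDB-format files, superimposes them with the Kabsch algorithm, and reports the RMSD along with the mean pLDDT, which AlphaFold and ColabFold write into the B-factor column. The function names and the `offset` parameter are illustrative assumptions. The offset is needed when the model is numbered from 1 while the crystal structure keeps native residue numbering.

```python
import numpy as np

def parse_ca(pdb_text, chain=None):
    """Return {residue_number: (xyz, b_factor)} for CA atoms in PDB-format text."""
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        resnum = int(line[22:26])
        xyz = np.array([float(line[30:38]), float(line[38:46]), float(line[46:54])])
        atoms.setdefault(resnum, (xyz, float(line[60:66])))
    return atoms

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))      # guard against an improper rotation
    R = U @ np.diag([1.0, 1.0, d]) @ Vt     # rotation that maps P onto Q
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

def compare_models(crystal_text, model_text, chain="A", offset=0):
    """Return (CA RMSD, mean pLDDT, number of shared residues).

    offset is added to the model residue numbers so they line up with
    the crystal numbering (an assumption you must check for your files).
    """
    crystal = parse_ca(crystal_text, chain)
    model = {r + offset: v for r, v in parse_ca(model_text).items()}
    shared = sorted(set(crystal) & set(model))
    P = np.array([crystal[r][0] for r in shared])
    Q = np.array([model[r][0] for r in shared])
    mean_plddt = float(np.mean([model[r][1] for r in shared]))
    return kabsch_rmsd(P, Q), mean_plddt, len(shared)
```

To use it, read the downloaded crystal structure and the ColabFold output into strings and call `compare_models(crystal_text, model_text, chain="A", offset=...)`. Only residues present in both files are compared, so disordered termini missing from the crystal structure are skipped automatically.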
Figure 2: Structural model of the PETase enzyme. Schematic diagram of the crystal structure of PETase (5XJH), showing a typical α/β fold. The cyan helices and red strands represent the α-helices and β-sheet of the protein, and the purple loops are random coil. The active site contains the Ser-His-Asp catalytic triad (not shown), located in the pocket at the top of the structure. The figure also shows a PET analogue (an HEMT molecule, gray-black sticks) bound in the active pocket of the enzyme. The PETase structure predicted by ColabFold should agree closely with this crystal structure. Figure 2 shows the overall fold and the substrate binding position of PETase. For beginners, it helps locate the active pocket (the recessed area where HEMT sits) and the key regions we will focus on later.
Step 2: Design new enzyme sequences using ProteinMPNN
With the backbone structure of PETase in hand, we next use ProteinMPNN to design new sequences. Given a protein backbone (its three-dimensional coordinates), ProteinMPNN generates new amino acid sequences that are compatible with that structure. Simply put, we fix the backbone and let the AI “fill in the letters” to produce a protein sequence that is stable and potentially functions better.
Figure 3: Principle of fixed-backbone protein sequence design. The left side schematically shows fixed-backbone design: given a protein structure, the AI network proposes amino acid sequences that fit it. ProteinMPNN is a tool built on this idea. Compared with trial-and-error random mutagenesis, AI design can explore sequence space efficiently and may find sequences that outperform the natural one.
In practice, we provide the structural coordinates of 5XJH as input to ProteinMPNN. You can run it through the scripts provided by the ProteinMPNN project on GitHub or through a Colab notebook. We usually ask ProteinMPNN to generate multiple candidate sequences and steer the design toward the active site. ProteinMPNN keeps the backbone fixed, so it cannot reshape the pocket directly. What we can control is which positions are redesigned and which amino acids are favored. For example, if we want the pocket to have a stronger affinity for PET, we can keep the catalytic triad fixed, redesign the residues lining the pocket, and favor residues that can interact with the substrate (such as aromatic or positively charged amino acids that can interact with the terephthalate rings and ester groups). Smaller or larger side chains at pocket positions then change the effective pocket size.
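As a sketch of how such a constrained run might be set up, the code below maps PDB residue numbers to the 1-based chain positions that ProteinMPNN's fixed-position file uses, then assembles a command line for `protein_mpnn_run.py`. Several things here are assumptions to verify against your own checkout: the flag names follow the examples in the ProteinMPNN repository as I understand them, the helper names are hypothetical, and the triad numbering Ser160/Asp206/His237 comes from the PETase literature rather than from this tutorial.

```python
import json

def chain_indices(pdb_text, chain, pdb_resnums):
    """Map PDB residue numbers to 1-based positions along one chain."""
    order = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA" and line[21] == chain:
            resnum = int(line[22:26])
            if resnum not in order:
                order.append(resnum)
    # list.index raises ValueError if a requested residue is missing
    return [order.index(r) + 1 for r in pdb_resnums]

def fixed_positions_entry(pdb_name, chain, positions):
    """Build the {name: {chain: [positions]}} mapping for fixed residues."""
    return {pdb_name: {chain: sorted(positions)}}

def build_mpnn_command(pdb_path, chain, fixed_jsonl, out_dir, n_seqs=20, temperature=0.1):
    """Assemble (but do not run) a ProteinMPNN command line."""
    return [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        "--pdb_path_chains", chain,
        "--fixed_positions_jsonl", fixed_jsonl,   # residues that must not change
        "--out_folder", out_dir,
        "--num_seq_per_target", str(n_seqs),
        "--sampling_temp", str(temperature),      # lower values stay closer to native
        "--seed", "37",
        "--batch_size", "1",
    ]

def write_fixed_positions(path, entry):
    """Write the fixed-position mapping as a single JSON line."""
    with open(path, "w") as handle:
        handle.write(json.dumps(entry) + "\n")
```

A typical use would be `positions = chain_indices(pdb_text, "A", [160, 206, 237])`, then `write_fixed_positions("fixed.jsonl", fixed_positions_entry("5XJH", "A", positions))`, and finally passing the list from `build_mpnn_command(...)` to `subprocess.run`. Computing the positions from the file avoids off-by-N errors when the crystal structure does not start at residue 1.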
Running ProteinMPNN produces a series of new sequences. We select the sequences with the best model scores as candidates (ProteinMPNN reports an average negative log-probability, so lower is better). A good score means the sequence fits the original backbone well according to ProteinMPNN, which suggests it may fold stably. Next, we use ColabFold (or a local installation of AlphaFold) to predict the structure of each new sequence and check that the introduced mutations do not disrupt the overall fold. If AlphaFold predicts that the new sequence still folds into a PETase-like structure with high confidence (high pLDDT), the candidate moves on to the next step. If certain mutations appear to destabilize the structure, we can discard the sequence or adjust the design and run it again.
Suppose we select a newly designed sequence, “PETase-variant1” (with several mutations at key sites), and AlphaFold predicts that its structure is highly similar to the original PETase, with only local differences in the active pocket region. We now examine whether these differences may enhance binding and catalysis for PET substrates.
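Picking the best-scoring designs can be scripted. The sketch below assumes the FASTA output format I have seen from ProteinMPNN, in which each designed sequence has a header containing `sample=`, `score=`, and `global_score=` fields and the first record is the native sequence. If your version writes different headers, adjust the parsing. The function name is illustrative.

```python
import re

# Match "score=" but not "global_score="
SCORE = re.compile(r"(?<![A-Za-z_])score=(-?\d+(?:\.\d+)?)")

def rank_designs(fasta_text, top_n=5):
    """Return up to top_n (score, header, sequence) tuples, best (lowest) score first."""
    designs = []
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
        elif line and header is not None:
            match = SCORE.search(header)
            # Records without "sample=" are treated as the native sequence and skipped
            if "sample=" in header and match:
                designs.append((float(match.group(1)), header, line))
            header = None
    designs.sort(key=lambda d: d[0])
    return designs[:top_n]
```

The returned sequences can be pasted into ColabFold one at a time, or written to a multi-sequence input file, for the refolding check described above.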
Step 3: Use DiffDock to simulate the binding of the new enzyme to the PET fragment
To evaluate the function of the newly designed enzyme, we need to test its binding to PET substrates. Since PET is a polymer, we usually dock small molecules that represent PET segments. The basic repeating unit of PET is ethylene terephthalate, so molecules of one to two units can serve as substrate mimics. Common choices are MHET (mono(2-hydroxyethyl) terephthalate, a terephthalate esterified with one ethylene glycol) and BHET (bis(2-hydroxyethyl) terephthalate, a terephthalate esterified with ethylene glycol at both ends). In fact, natural PETase degrades PET mainly to MHET, with a small amount of BHET.
In this tutorial, we use MHET or a close analogue (such as HEMT in the 5XJH structure, an MHET whose free carboxyl group is capped as a methyl ester) as the example ligand. We now use DiffDock to dock it to the new enzyme model. DiffDock does not require the user to specify a binding site: it performs blind docking and predicts which pocket on the protein surface the ligand is likely to bind. The underlying diffusion model starts from random placements and progressively refines the position, orientation, and conformation of the ligand to generate a set of possible binding poses. We apply DiffDock to the “PETase-variant1” structure and the MHET substrate, requesting a number of candidate poses (for example, 20).
DiffDock outputs multiple ligand-protein complex structures, together with a confidence score for each predicted pose. Ideally, the active pocket of the new enzyme is the preferred binding site for the ligand. If the design is successful, DiffDock is likely to place MHET in the deep cleft at the active center of the enzyme (near the catalytic triad), oriented so that the ester bond points toward the Ser residue. We select the poses that rank highly and sit in a reasonable position for further analysis.
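A docking run like this is usually driven by a small input table. The following sketch writes such a table and assembles an inference command. It assumes the CSV column names and the `--protein_ligand_csv`, `--out_dir`, and `--samples_per_complex` arguments used by the DiffDock repository as I understand them. These have changed between releases (newer ones also expect a `--config` file), so check the README of the version you install. The SMILES string is MHET written by hand, and the file names are placeholders.

```python
import csv

# MHET: HO-CH2-CH2-O-C(=O)-C6H4-COOH
MHET_SMILES = "OCCOC(=O)c1ccc(cc1)C(=O)O"

DIFFDOCK_FIELDS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]

def write_diffdock_csv(csv_path, jobs):
    """Write one row per (name, protein_pdb_path, ligand_smiles_or_file) job."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=DIFFDOCK_FIELDS)
        writer.writeheader()
        for name, protein_path, ligand in jobs:
            writer.writerow({
                "complex_name": name,
                "protein_path": protein_path,
                "ligand_description": ligand,
                "protein_sequence": "",   # left empty because a structure file is given
            })

def build_diffdock_command(csv_path, out_dir, n_poses=20):
    """Assemble (but do not run) a DiffDock inference command line."""
    return [
        "python", "-m", "inference",
        "--protein_ligand_csv", csv_path,
        "--out_dir", out_dir,
        "--samples_per_complex", str(n_poses),
    ]
```

Listing the wild-type enzyme and each variant as separate rows with the same ligand keeps the runs directly comparable, for example `write_diffdock_csv("jobs.csv", [("wt", "petase_wt.pdb", MHET_SMILES), ("variant1", "petase_variant1.pdb", MHET_SMILES)])`.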
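Checking whether a pose is "positioned in the right direction" can be approximated numerically. The sketch below is a crude proxy rather than a validated criterion. It reads the confidence from the pose file name, assuming names of the form `rank1_confidence-0.52.sdf`, and measures the shortest distance from the catalytic serine Oγ to any ligand carbonyl carbon, assuming the pose is a V2000 mol block. MHET has two carbonyl carbons (the ester and the free acid), so a short distance still needs visual confirmation that the ester is the one facing the serine.

```python
import math
import re

POSE_NAME = re.compile(r"rank(\d+)_confidence(-?\d+(?:\.\d+)?)\.sdf$")

def parse_pose_name(filename):
    """Return (rank, confidence) from a pose file name, or None if it does not match."""
    match = POSE_NAME.search(filename)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))

def carbonyl_carbons(sdf_text):
    """Coordinates of carbons double-bonded to oxygen in a V2000 mol block."""
    lines = sdf_text.splitlines()
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))
    found = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        i, j, order = int(line[0:3]) - 1, int(line[3:6]) - 1, int(line[6:9])
        if order == 2 and {atoms[i][0], atoms[j][0]} == {"C", "O"}:
            found.append(atoms[i][1] if atoms[i][0] == "C" else atoms[j][1])
    return found

def serine_og(pdb_text, chain, resnum):
    """Coordinates of the OG atom of the given serine, or None if absent."""
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and line[12:16].strip() == "OG"
                and line[17:20] == "SER" and line[21] == chain
                and int(line[22:26]) == resnum):
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return None

def attack_distance(sdf_text, og_xyz):
    """Shortest Ser OG to carbonyl carbon distance, in the units of the files."""
    return min(math.dist(c, og_xyz) for c in carbonyl_carbons(sdf_text))
```

Poses whose distance is within a few ångströms and whose confidence is among the highest are the natural ones to open in a viewer first. The residue number passed to `serine_og` must follow the numbering of the model file, which may differ from the crystal structure.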
Step 4: Visualize structural changes and binding patterns with PyMOL/ChimeraX
Load the original enzyme and the top-ranked docked complex of the new enzyme into PyMOL or ChimeraX, superimpose them, and examine the following points:
- Substrate positioning: Does the substrate sit in the new enzyme at a position similar to that in the original enzyme, or is it shifted? If the new design adds interactions around the pocket, you may see the substrate “pulled” deeper or bound at a different angle.
- Interaction network: Check whether more hydrogen bonds or salt bridges form at the active site of the new enzyme. PyMOL can display the polar contacts around a ligand, which helps confirm whether the new residues take part in substrate binding.
- Pocket shape change: Compare surface views to see how the pocket size and shape of the new enzyme differ. The surface area or volume of the substrate binding pocket can be calculated (some plug-ins do this) to verify whether the design widens or narrows the binding cavity.
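The interaction comparison in the list above can be complemented with a simple contact count. This is a rough sketch, not a substitute for inspecting hydrogen-bond geometry in a viewer: it lists protein residues that have any heavy atom within a distance cutoff of the ligand and then reports which positions gain or lose contact between the original enzyme and the variant. The 4.0 Å cutoff and the function names are illustrative assumptions, and the ligand coordinates are passed in as plain (x, y, z) tuples.

```python
import math

def contact_residues(pdb_text, ligand_xyz, cutoff=4.0):
    """Return {(residue_number, residue_name)} with a heavy atom within cutoff of the ligand."""
    contacts = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip().startswith("H"):
            continue  # crude hydrogen filter based on the atom name
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            contacts.add((int(line[22:26]), line[17:20].strip()))
    return contacts

def compare_contacts(wild_type, variant):
    """Return (gained, lost) contact residues, compared by residue number."""
    wt_positions = {num for num, _ in wild_type}
    var_positions = {num for num, _ in variant}
    gained = sorted(c for c in variant if c[0] not in wt_positions)
    lost = sorted(c for c in wild_type if c[0] not in var_positions)
    return gained, lost
```

Comparing by residue number rather than by name means a mutated position that still touches the ligand is not reported as both gained and lost. The two sets can be printed side by side to see which substitutions sit in the contact shell.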
Step 5: Screening for new enzymes with potential high activity
A new enzyme is worth carrying forward if it meets the following criteria:
- Structural reliability: The pLDDT of the new enzyme structure predicted by AlphaFold is generally high, with no significant disordered regions, indicating that the sequence design is reasonable and the protein can fold stably.
- Substrate binding: DiffDock docking shows that the substrate enters the active pocket of the enzyme and approaches the catalytic residues in a reasonable orientation. In addition, the new enzyme forms more or stronger interactions with the substrate (such as additional hydrogen bonds or hydrophobic contacts), suggesting a higher affinity.
- Pocket matching: The shape and size of the active cavity of the new enzyme suit the substrate, with no large gaps and no obvious crowding. An appropriate induced fit favors catalysis.
- Multi-candidate comparison: If we design multiple variants, we prioritize the one or few with the best docking confidence that also meet the conditions above. Other properties of the enzyme are considered as well, such as whether the mutations may improve thermal stability (AlphaFold does not provide this, but we can reason about it from whether additional salt bridges or hydrophobic core packing have been introduced).
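The criteria above can be collected into a simple filter once the numbers for each variant are available. The sketch below only illustrates the bookkeeping. Every threshold in it is a hypothetical placeholder that should be set by comparison with the wild-type enzyme, and the field names are assumptions rather than outputs of any particular tool.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mean_plddt: float        # from the AlphaFold/ColabFold model
    backbone_rmsd: float     # to the PETase crystal structure, in angstroms
    top_confidence: float    # confidence of the best-ranked docking pose
    attack_distance: float   # Ser OG to ligand carbonyl carbon, in angstroms
    n_contacts: int          # residues in contact with the ligand

def shortlist(candidates, min_plddt=85.0, max_rmsd=2.0, max_attack_distance=4.5):
    """Keep candidates that pass all filters, best docking confidence first."""
    passing = [
        c for c in candidates
        if c.mean_plddt >= min_plddt
        and c.backbone_rmsd <= max_rmsd
        and c.attack_distance <= max_attack_distance
    ]
    # Ties in confidence are broken in favor of more ligand contacts
    return sorted(passing, key=lambda c: (-c.top_confidence, -c.n_contacts))
```

Including the wild-type enzyme as one of the candidates gives a baseline row, so a variant is only considered an improvement if it ranks above that row.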
The new enzyme candidates that pass this screening are worth synthesizing and testing in the laboratory. In practice, we would clone and express the candidate genes, measure their PET degradation rates, and check the results against the model predictions. Computational design cannot guarantee experimental success, but the method in this tutorial narrows the pool of candidates considerably and gives each proposed modification a structural rationale.
Summary and Outlook
This tutorial uses a step-by-step example to demonstrate how to design new enzymes by combining protein structure prediction (AlphaFold/ColabFold), AI-based sequence design (ProteinMPNN), and molecular docking (DiffDock). Taking a PET-degrading enzyme as the example, we showed how computational methods can be applied to a practical bioengineering challenge that arises from an environmental problem:
- Understand the background: Identify the problem and the direction for improvement (increase PETase activity)
- Obtain models: Use AlphaFold to obtain high-accuracy structures as the basis for design
- Design intelligently: Explore the vast space of new sequences on the backbone with ProteinMPNN
- Verify function: Predict a functional property of the new sequences (substrate binding) with tools such as DiffDock
- Analyze visually: Use molecular graphics software to interpret model results and guide decision-making
References:
- Joo S. et al. (2018) Nat Commun 9: Structural analysis of the mechanism of action of PET hydrolase
- PETase: Introduction to PET hydrolase (Wikipedia)
- Dauparas J. et al. (2022) Science: ProteinMPNN algorithm for efficient protein sequence design
- MIT News (2022): Coverage of the DiffDock paper, in which diffusion models applied to molecular docking improve the accuracy of docking predictions