AI Protein Design Tutorial: Using AlphaFold and Diffusion Model to Design New PET Plastic Degradation Enzymes

Figure 1: PET plastic pollution. PET products such as plastic bottles accumulate in the natural environment and resist natural degradation, causing serious pollution.

PET (polyethylene terephthalate) has excellent material properties and is widely used in everyday products such as bottles and fibers. However, the same durability that makes it useful also makes it an environmental burden. Conventional plastics can take hundreds of years to decompose in nature, whereas an enzyme called PETase can break PET down into small molecules within days to weeks under favorable laboratory conditions. In 2016, Japanese researchers reported the bacterium Ideonella sakaiensis 201-F6, isolated from samples collected at a plastic bottle recycling site, which can use PET as its major carbon and energy source. The PETase enzyme secreted by the bacterium degrades PET. This discovery set off a wave of research, with many studies working out the mechanism of PETase and engineering the enzyme to improve its degradation efficiency.

Subject Goal

The goal of this tutorial is to design a new PET-degrading enzyme that is more efficient than the known PETase. In other words, we want to use computer-assisted protein design to make the enzyme's structure better suited to binding and hydrolyzing PET polymers. This involves introducing mutations or new sequences based on the known PETase and optimizing the active site so that it binds the PET substrate more tightly and catalyzes the reaction more efficiently. Finally, we use computational prediction to screen for candidates with potentially higher activity, which can then go on to experimental verification. Designing such enzymes matters: efficient PET hydrolases can be used to biodegrade and recycle plastics, easing plastic pollution. This project combines protein structure prediction, sequence design, and molecular docking to show how AI tools such as AlphaFold and diffusion-based generative models can be applied to a bioengineering problem.

In this tutorial, we will use the following openly available tools to complete the new enzyme design:

AlphaFold/ColabFold: AlphaFold is a protein structure prediction AI developed by DeepMind that predicts 3D structures from amino acid sequences. ColabFold provides a simplified interface for running AlphaFold on Google Colab. We will use ColabFold to predict protein structures and compare them with known structures. It is easy to operate and suitable for beginners.
ProteinMPNN: A deep learning sequence design tool developed by the Baker Laboratory at the University of Washington that generates amino acid sequences optimized for a given backbone structure. Compared with traditional design methods, ProteinMPNN is extremely fast (designing a sequence takes on the order of a second) and performs very well. We will use it to design new variant sequences on the PETase backbone.
DiffDock: A molecular docking tool based on a diffusion generative model that predicts how small molecules (such as substrates or ligands) bind to proteins. Unlike traditional docking, which searches and scores poses exhaustively, DiffDock uses a diffusion model to sample ligand poses and refine them step by step. Its authors report higher blind-docking accuracy than earlier methods, and it also provides a confidence estimate for each prediction. We will use DiffDock to simulate the binding of newly designed enzymes to small-molecule fragments of PET.
PyMOL/ChimeraX: Professional molecular visualization software for viewing and analyzing protein structures. We will use one of them (beginners can pick whichever free option they find familiar, such as ChimeraX) to observe changes in the enzyme structure, inspect docking results, and generate pictures for comparison.

All of these tools are open source or have free versions and are suitable for beginners. In practice, ColabFold, ProteinMPNN, and DiffDock all have ready-made Colab notebooks or open-source code that can be run directly; PyMOL and ChimeraX offer both a graphical interface and a command line.

Specific Steps

Next, we walk through the steps for using the tools above to design and analyze new PET-degrading enzymes. Where a figure is available, the step is explained with reference to it.

Step 1: Obtain the known structure of PETase and perform structural prediction comparison.

First, we need the known structure of PETase as a reference. This example uses the crystal structure of Ideonella sakaiensis PETase, PDB ID 5XJH, which can be downloaded from the RCSB Protein Data Bank. The structure was solved at about 1.5 Å resolution and shows that PETase adopts a typical α/β hydrolase fold, with the classic serine hydrolase catalytic triad (Ser-His-Asp) at the active center. Its active-site cleft can accommodate roughly four MHET (monomer) units of PET.

With the crystal structure in hand, we use ColabFold to predict the structure from the PETase sequence, both to check how well AlphaFold reproduces the known structure and to practice the workflow. Enter the amino acid sequence of PETase (taken from the PDB entry or from UniProt accession A0A0K8P6T7) in the Google Colab notebook provided by ColabFold. The UniProt sequence is the full-length precursor and includes an N-terminal signal peptide that is not part of the crystal structure, so those residues will have no counterpart in the comparison. The run produces a predicted model of PETase along with a per-residue confidence score (pLDDT). Because AlphaFold is generally very accurate for well-folded proteins with known homologs, we expect the predicted structure to agree closely with the crystal structure. After the prediction finishes, superimpose the AlphaFold model on the crystal structure (for example with PyMOL's align command or ChimeraX's matchmaker command). The backbone RMSD between the two is usually very small, which indicates that the prediction is accurate. This step both familiarizes us with ColabFold and gives us a starting point for the design work: once we have confirmed that the model is structurally reliable, it can be used for downstream modifications.
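The comparison described above can also be checked outside the viewer. The following is a minimal sketch, not part of the original workflow: it reads Cα atoms from PDB-format text, averages the B-factor column (where ColabFold stores pLDDT), and computes a Cα RMSD over residues that share a chain ID and residue number. It assumes the two structures have already been superimposed and saved (for example after a PyMOL align) and that both use the same residue numbering; the function names are hypothetical.

```python
import math


def parse_ca_atoms(pdb_text):
    """Return {(chain, residue_number): (x, y, z, b_factor)} for C-alpha atoms."""
    atoms = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            values = (float(line[30:38]), float(line[38:46]),
                      float(line[46:54]), float(line[60:66]))
            atoms.setdefault(key, values)  # keep only the first alternate location
    return atoms


def mean_plddt(model_atoms):
    """Average the B-factor column, which holds pLDDT in ColabFold models."""
    return sum(values[3] for values in model_atoms.values()) / len(model_atoms)


def ca_rmsd(atoms_a, atoms_b):
    """RMSD over shared residues. No fitting is done: inputs must be pre-superimposed."""
    shared = sorted(set(atoms_a) & set(atoms_b))
    if not shared:
        raise ValueError("the two structures share no residue identifiers")
    total = sum(math.dist(atoms_a[k][:3], atoms_b[k][:3]) ** 2 for k in shared)
    return math.sqrt(total / len(shared)), len(shared)
```

Reading both files with `open(path).read()` and passing the text to `parse_ca_atoms` is enough to use it. If the model is numbered from 1 while the crystal structure uses full-length numbering, the keys will not match and one of the two must be renumbered first.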

Figure 2: Structural model of the PETase enzyme. Schematic of the PETase crystal structure (5XJH), showing the typical α/β fold. Cyan helices and red strands represent the α-helices and β-sheets of the protein, and the purple loops are random coil. The active site contains the Ser-His-Asp triad (not shown), located in the pocket at the top of the structure. The figure also shows a PET analogue (HEMT, gray-black sticks) bound in the active pocket of the enzyme. The PETase structure predicted by ColabFold should be highly consistent with this crystal structure.

Figure 2 shows the overall architecture of PETase and where the substrate binds. For beginners, it helps locate the active pocket (the recessed area occupied by HEMT in the picture), which is the key region we focus on in later steps.

Step 2: Design new enzyme sequences using ProteinMPNN

With the backbone structure of PETase in hand, we next use ProteinMPNN to design new sequences. Given a protein backbone (its three-dimensional coordinates), ProteinMPNN generates new amino acid sequences that are compatible with that structure. Put simply, the backbone is held fixed and the AI "fills in the letters", proposing a sequence that should fold stably and may function better.

Figure 3: Principle of fixed-backbone protein sequence design. The left side schematically shows "fixed-backbone design": given a protein structure, an AI network determines an amino acid sequence that matches it. ProteinMPNN is a tool built on this idea. Compared with trial and error by random mutation, AI design explores sequence space efficiently and can propose sequences that may outperform the natural one.

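In practice, we provide the structural coordinates of 5XJH as input to ProteinMPNN. You can run it with the scripts in the ProteinMPNN GitHub repository or with a Colab notebook. We usually ask ProteinMPNN to generate multiple candidate sequences and focus the redesign on the active site. Because ProteinMPNN works on a fixed backbone and optimizes how well a sequence fits that backbone, it cannot be told directly to bind PET better or to enlarge the pocket. What we can do is steer it: keep the catalytic triad and other essential residues fixed, restrict redesign to selected pocket positions, and bias or filter the output toward residues that can interact with the substrate, such as aromatic residues that can stack against the terephthalate ring or polar and positively charged residues that can contact the ester groups. Any change in pocket size then comes from swapping bulky side chains for smaller ones, or the reverse, at those positions.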
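One way to express "redesign only the pocket, keep everything else" is a list of fixed positions. The sketch below is illustrative only. It assumes the catalytic triad numbering commonly reported for this enzyme (Ser160, Asp206, His237), and it assumes that ProteinMPNN expects fixed positions as 1-based indices along the chain in a JSON object shaped like `{name: {chain: [positions]}}`; both should be checked against your structure file and the ProteinMPNN repository. The function names are hypothetical.

```python
import json

# Assumption: catalytic triad numbering commonly reported for I. sakaiensis PETase.
CATALYTIC_TRIAD = {160, 206, 237}


def fixed_positions(residue_numbers, designable, always_fixed=CATALYTIC_TRIAD):
    """Return 1-based chain indices of every residue that must NOT be redesigned.

    residue_numbers: PDB residue numbers of the chain, in file order.
    designable: PDB residue numbers that the design tool may change.
    """
    designable = set(designable)
    clash = designable & set(always_fixed)
    if clash:
        raise ValueError(f"refusing to redesign protected residues: {sorted(clash)}")
    missing = designable - set(residue_numbers)
    if missing:
        raise ValueError(f"residues not present in the chain: {sorted(missing)}")
    # Everything outside the designable set is fixed, which includes the triad.
    return [index for index, number in enumerate(residue_numbers, start=1)
            if number not in designable]


def to_jsonl_line(pdb_name, chain_id, positions):
    """Serialize in the assumed {name: {chain: [positions]}} layout."""
    return json.dumps({pdb_name: {chain_id: positions}})
```

For example, `fixed_positions(residue_numbers, designable={159, 238})` would open two pocket positions (chosen here purely for illustration) and fix the rest. The conversion from PDB numbering to chain index matters because the crystal structure does not start at residue 1.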

Running ProteinMPNN produces a series of new sequences. We keep the sequences with the best model scores as candidates (ProteinMPNN reports a negative log-likelihood, so a lower score is better). A good score means the sequence fits the original backbone well according to ProteinMPNN, which suggests it may fold stably. Next, we use ColabFold (or a local installation of AlphaFold) to predict the structure of each new sequence and check that the introduced mutations do not disrupt the overall fold. If AlphaFold predicts that a new sequence still folds into a PETase-like structure with high confidence (high pLDDT), it can move on to the next step. If certain mutations appear to destabilize the structure, we discard that sequence or adjust the design and run it again.
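Picking the best-scoring sequences can be scripted. This is a rough sketch under two assumptions that should be verified against your own output file: that ProteinMPNN writes a FASTA file whose headers contain a `score=` field (alongside a separate `global_score=` field), and that the first record is the input sequence rather than a design. The helper names are hypothetical.

```python
import re

# Match "score=..." but not "global_score=...".
SCORE_RE = re.compile(r"(?<!global_)score=([-+]?\d+(?:\.\d+)?)")


def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def rank_designs(fasta_text, top_n=5):
    """Return the top_n designs as (score, sequence), lowest score first."""
    designs = []
    for header, sequence in read_fasta(fasta_text)[1:]:  # skip the input sequence
        match = SCORE_RE.search(header)
        if match:
            designs.append((float(match.group(1)), sequence))
    return sorted(designs)[:top_n]
```

The shortlisted sequences are the ones to pass back to ColabFold for the fold check described above.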

Suppose we select a newly designed sequence, "PETase-variant1" (with several mutations at key sites), and AlphaFold predicts that its structure is highly similar to the original PETase, with only local differences in the active pocket region. We now examine whether these differences could improve binding and catalysis of PET substrates.

Step 3: Use DiffDock to simulate the binding of the new enzyme to the PET fragment

To evaluate the function of the newly designed enzyme, we need to test how it binds PET substrates. Since PET is a polymer, we usually pick small molecules that represent a PET segment for docking. The repeating unit of PET is ethylene terephthalate, so a molecule spanning one or two units is a reasonable substrate mimic. Commonly used fragments are MHET (mono(2-hydroxyethyl) terephthalate, a terephthalate carrying one ethylene glycol unit) and BHET (bis(2-hydroxyethyl) terephthalate, a terephthalate with an ethylene glycol unit on each side). Both also occur as reaction products: natural PETase breaks PET down mainly into MHET, with a small amount of BHET.

In this tutorial, we use an MHET analogue as the example ligand, such as HEMT (1-(2-hydroxyethyl) 4-methyl terephthalate, the ligand in the 5XH3 complex structure), which is MHET with its free acid capped as a methyl ester. We then dock the ligand into the new enzyme model with DiffDock. DiffDock does not require a user-specified binding site: it performs blind docking and predicts which pocket on the protein surface the ligand is likely to occupy. The underlying diffusion model starts from randomly perturbed ligand positions, orientations, and conformations and refines them step by step into a set of plausible binding poses. We apply DiffDock to the "PETase-variant1" structure and the MHET substrate and ask for several (for example, 20) candidate poses.
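Setting up the docking runs mostly means pairing each protein model with each ligand. The sketch below builds such a table as CSV text. The SMILES strings follow from the chemical names of MHET, BHET, and HEMT. The column names are an assumption based on the batch-input CSV described in the DiffDock repository and should be checked against the version you run; the file paths are hypothetical.

```python
import csv
import io
import os

# SMILES written from the chemical names of the three terephthalate fragments.
LIGAND_SMILES = {
    "MHET": "OC(=O)c1ccc(cc1)C(=O)OCCO",     # mono(2-hydroxyethyl) terephthalate
    "BHET": "OCCOC(=O)c1ccc(cc1)C(=O)OCCO",  # bis(2-hydroxyethyl) terephthalate
    "HEMT": "COC(=O)c1ccc(cc1)C(=O)OCCO",    # 1-(2-hydroxyethyl) 4-methyl terephthalate
}

# Assumed column layout for DiffDock batch input; verify against the repository.
COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def build_docking_csv(protein_paths, ligand_names):
    """Return CSV text with one row per (protein model, ligand) pair."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for path in protein_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        for ligand in ligand_names:
            writer.writerow({
                "complex_name": f"{stem}_{ligand}",
                "protein_path": path,
                "ligand_description": LIGAND_SMILES[ligand],
                "protein_sequence": "",  # left empty because a structure file is given
            })
    return buffer.getvalue()
```

Writing the returned text to a file gives one job list covering the wild type and every variant, so all enzymes are docked against the same ligands under the same settings.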

DiffDock outputs multiple ligand-protein complex structures together with a confidence score for each predicted pose. Ideally, the active pocket of the new enzyme is the preferred binding site for the ligand. If the design is successful, DiffDock should place MHET in the cleft at the active center (near the catalytic triad) in a productive orientation, with the ester bond pointing toward the catalytic Ser residue. We select the highest-ranked poses that sit in a sensible position for further analysis.
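Collecting the ranked poses can be automated from the output file names. This sketch assumes that DiffDock names its pose files like `rank1_confidence-0.45.sdf`, with the rank and the confidence value embedded in the name; that pattern, and the rough confidence bands in `confidence_label`, are assumptions to be checked against the DiffDock documentation for your version.

```python
import re

# Assumed file-name pattern: rank<N>_confidence<value>.sdf
POSE_RE = re.compile(r"^rank(\d+)_confidence(-?\d+(?:\.\d+)?)\.sdf$")


def parse_poses(filenames):
    """Return (rank, confidence, filename) tuples sorted by rank."""
    poses = []
    for name in filenames:
        match = POSE_RE.match(name)
        if match:  # files without a confidence value (e.g. rank1.sdf) are skipped
            poses.append((int(match.group(1)), float(match.group(2)), name))
    return sorted(poses)


def confidence_label(confidence):
    """Rough bands; the cutoffs 0 and -1.5 are assumptions, not hard rules."""
    if confidence > 0:
        return "high"
    if confidence > -1.5:
        return "moderate"
    return "low"
```

Passing `os.listdir(output_dir)` to `parse_poses` gives the ranked list. A high rank alone is not enough: the pose still has to sit in the active pocket, which is checked by inspecting the structure.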

Figure 4: Schematic of substrate binding on the enzyme surface. The structure of the new enzyme is displayed as a surface model, and the small gray molecule is a PET monomer analogue (MHET). The small molecule sits snugly in a pocket on the enzyme surface, that is, at the active site. This surface view helps judge visually whether the ligand fits into the binding cavity. A successfully designed enzyme should hold the substrate in its active pocket without significant steric clashes. If the pose generated by DiffDock resembles the one in Figure 4, the substrate can enter the active center of the new enzyme and be positioned there.

Using PyMOL or ChimeraX, we then examine the details of the interaction between the new enzyme and the substrate: for example, whether the aromatic ring of the substrate stacks against aromatic or hydrophobic residues of the enzyme, and whether a positively charged lysine or arginine sits near the ester bond where it could form hydrogen bonds that stabilize the transition state. If our design goals include expanding the pocket to accommodate longer PET segments, we can also dock a larger fragment such as BHET to see whether the new pocket can accommodate it.
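One geometric check that is easy to script is how close the catalytic serine's side-chain oxygen (atom OG) comes to the ligand. The sketch below is a simplification: it reports the distance from Ser OG to the nearest ligand carbon as a stand-in for the ester carbonyl carbon, since identifying that specific atom would need a chemistry toolkit. It assumes a PDB-format protein file and a V2000 SDF ligand file; the default residue number 160 is an assumption about the numbering in your model.

```python
import math


def find_atom(pdb_text, chain, residue_number, atom_name):
    """Return (x, y, z) of one protein atom from PDB-format text."""
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and line[21] == chain
                and int(line[22:26]) == residue_number
                and line[12:16].strip() == atom_name):
            return (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    raise KeyError(f"{atom_name} of residue {residue_number} in chain {chain} not found")


def parse_sdf_atoms(sdf_text):
    """Return [(element, (x, y, z))] from the first V2000 mol block."""
    lines = sdf_text.splitlines()
    atom_count = int(lines[3][:3])  # counts line: first three columns
    atoms = []
    for line in lines[4:4 + atom_count]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))
    return atoms


def nearest_carbon_distance(point, ligand_atoms):
    """Distance from a point to the closest ligand carbon atom."""
    carbons = [xyz for element, xyz in ligand_atoms if element == "C"]
    if not carbons:
        raise ValueError("ligand contains no carbon atoms")
    return min(math.dist(point, xyz) for xyz in carbons)


def serine_to_ligand(pdb_text, sdf_text, chain="A", serine_number=160):
    """Proxy for catalytic geometry: Ser OG to nearest ligand carbon, in Å."""
    og = find_atom(pdb_text, chain, serine_number, "OG")
    return nearest_carbon_distance(og, parse_sdf_atoms(sdf_text))
```

A distance of a few ångströms is consistent with a pose that could react, while a much larger value suggests the ligand is bound away from the catalytic residue. The number should be read together with a visual check of the ester orientation.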

Step 4: Visualize structural changes and binding patterns with PyMOL/ChimeraX

After obtaining the docking model, we use PyMOL or ChimeraX to compare the structure of the original PETase with the newly designed enzyme and understand the key differences. Open the two protein structures together in the visualization software and superimpose their backbones. Highlight the positions where mutations were introduced and observe how the side chains at these sites have changed. For example, if we replace a small residue such as glycine with a bulky tryptophan, we may see a prominent new side chain at that position that can interact with the aromatic ring of the substrate. Changes like this are exactly the mechanism by which we expect to increase enzyme-substrate affinity.
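To know which positions to highlight, it helps to list the mutations explicitly. The sketch below compares two sequences of equal length (fixed-backbone design does not add or remove residues) and also builds a PyMOL-style `resi` selection string. The `first_residue_number` offset is an assumption you must set to match the numbering of your structure file, and the function names are hypothetical.

```python
def list_mutations(wild_type, variant, first_residue_number=1):
    """Return mutations in the usual notation, e.g. ['G11W', 'W13F']."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    return [f"{old}{number}{new}"
            for number, (old, new) in enumerate(zip(wild_type, variant),
                                                start=first_residue_number)
            if old != new]


def pymol_selection(mutations):
    """Build a 'resi a+b+c' selection string from mutation labels."""
    if not mutations:
        return "none"
    return "resi " + "+".join(label[1:-1] for label in mutations)


if __name__ == "__main__":
    changes = list_mutations("MGSW", "MWSF", first_residue_number=10)
    print(changes)                   # ['G11W', 'W13F']
    print(pymol_selection(changes))  # resi 11+13
```

The selection string can be pasted into a PyMOL `select` or `show sticks` command to highlight the mutated side chains on both superimposed structures.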

We then load the new enzyme-MHET complex predicted by DiffDock and compare it with the original PETase-HEMT complex (for example, the ligand-bound structure in PDB entry 5XH3). Viewing the two side by side or toggling between them, we focus on the following:

Substrate positioning: Does the substrate sit in the new enzyme where it sits in the original enzyme, or has it shifted? If the new design adds interactions around the pocket, you may see the substrate "pulled" deeper or bound at a different angle.
Interaction network: Check whether more hydrogen bonds or salt bridges have formed at the active site of the new enzyme. PyMOL can display hydrogen bonds (polar contacts) around the ligand, which helps confirm whether the new residues take part in substrate binding.
Pocket shape change: Compare surface views to see how the size and shape of the pocket differ in the new enzyme. The surface area or volume of the substrate-binding pocket can be calculated (some plug-ins do this) to verify whether the design widened or narrowed the binding cavity.
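The interaction comparison in the list above can be approximated numerically by asking which residues have any atom within a cutoff distance of the ligand in each complex. This is a coarse sketch: it counts distance-based contacts only, does not classify hydrogen bonds or salt bridges, and uses a 4 Å cutoff that is a common but arbitrary choice. Ligand coordinates are passed in as a plain list of (x, y, z) tuples, and the function names are hypothetical.

```python
import math


def contact_residues(pdb_text, ligand_coords, cutoff=4.0):
    """Return {residue_number: residue_name} for residues near the ligand."""
    contacts = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if any(math.dist(xyz, atom) <= cutoff for atom in ligand_coords):
            contacts[int(line[22:26])] = line[17:20].strip()
    return contacts


def compare_contacts(original, redesigned):
    """Compare two contact maps by residue number."""
    before, after = set(original), set(redesigned)
    return {
        "gained": sorted(after - before),
        "lost": sorted(before - after),
        "kept": sorted(before & after),
    }
```

Residues listed under "gained" are candidates for the new interactions the design was meant to introduce, and are worth inspecting in the viewer for actual hydrogen bonds or ring stacking.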

These visual analyses let us judge whether the new design meets expectations. For example, if the substrate is bound more tightly in the new enzyme and the reactive ester group lies closer to the catalytic Ser, this suggests potentially higher catalytic efficiency. If instead substrate binding deteriorates or the substrate cannot enter the pocket, the design idea needs rethinking and other mutation options should be tried.

Step 5: Screening for new enzymes with potential high activity

Based on the prediction results above, we finally pick the most promising enzymes from the designed candidates. The screening criteria include:

Structural reliability: The pLDDT confidence of the new enzyme structure predicted by AlphaFold is high overall, with no significant disordered regions, indicating that the sequence design is reasonable and the protein is likely to fold stably.
Substrate binding: DiffDock docking shows that the substrate enters the enzyme's active pocket and approaches the catalytic residues in a reasonable orientation. In addition, the new enzyme forms more or stronger interactions with the substrate (such as additional hydrogen bonds or hydrophobic contacts), implying higher affinity.
Pocket matching: The shape and size of the new enzyme's active-site cavity suit the substrate, with no large gaps and no overcrowding. A well-matched pocket is conducive to catalysis.
Multi-candidate comparison: If we design multiple variants, we prioritize the one or few with the best docking scores that also meet the conditions above. Other properties of the enzyme are also considered, such as whether the mutations might improve thermal stability (AlphaFold does not report this, but we can speculate based on whether additional salt bridges or hydrophobic core contacts were introduced).
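The criteria above can be combined into a simple filter-then-rank step. The sketch below is one possible arrangement, not a validated scoring scheme: every threshold is a hypothetical placeholder to tune for your own project, and the per-candidate values are assumed to have been collected beforehand from the AlphaFold and DiffDock outputs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mean_plddt: float        # average pLDDT of the predicted model
    backbone_rmsd: float     # Å, relative to the reference structure
    top_confidence: float    # docking confidence of the best pose
    serine_distance: float   # Å, catalytic Ser OG to nearest ligand carbon


def shortlist(candidates, min_plddt=85.0, max_rmsd=2.0, max_serine_distance=4.5):
    """Drop candidates that fail any structural check, then rank by confidence.

    All three thresholds are hypothetical defaults, not established cutoffs.
    """
    passing = [c for c in candidates
               if c.mean_plddt >= min_plddt
               and c.backbone_rmsd <= max_rmsd
               and c.serine_distance <= max_serine_distance]
    return sorted(passing, key=lambda c: c.top_confidence, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("variant1", 92.0, 0.6, -0.3, 3.8),
        Candidate("variant2", 78.0, 0.9, 0.4, 3.5),   # fails the pLDDT check
        Candidate("variant3", 90.0, 0.7, 0.1, 4.0),
    ]
    print([c.name for c in shortlist(pool)])  # ['variant3', 'variant1']
```

Filtering before ranking keeps a well-docked but poorly folded design from rising to the top. Qualitative factors such as likely thermal stability still need a manual look.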

The new enzyme candidates selected in this way are worth synthesizing and testing in the laboratory. In a real project, we would clone and express the candidate genes, measure their PET degradation rates, and check the results against the model predictions. Computational design cannot guarantee experimental success, but the approach in this tutorial substantially narrows the pool of candidates and gives each proposed change a structural rationale.

Summary and Outlook

This tutorial uses a step-by-step example to show how to design new enzymes by combining protein structure prediction (AlphaFold/ColabFold), AI-based sequence design (ProteinMPNN), and molecular docking (DiffDock). Taking a PET-degrading enzyme as the example, we laid out how computational methods can be applied to a practical bioengineering challenge that starts from an environmental problem:

Understand the background: Identify the problem and the direction for improvement (increase PETase activity)
Obtain models: Use AlphaFold to obtain high-accuracy structures as the basis for design
Intelligent design: Explore the vast space of new sequences on the backbone with ProteinMPNN
Functional verification: Predict how the new sequences function (substrate binding) with tools such as DiffDock
Visual analysis: Use molecular graphics software to interpret model results and guide decisions

This workflow is a useful worked example for newcomers to protein design. All the software used is openly available and community-supported, so beginners can try it with confidence and deepen their understanding of protein structure and function through repeated practice. As the technology develops, similar methods can also be applied to engineering other pollutant-degrading enzymes and drug-target enzymes. Through computational design, we hope to develop more efficient and more stable enzymes to solve practical problems. For example, beyond improving PETase activity, researchers are already working on thermostable PET-degrading enzymes suited to industrial applications.

All in all, with AI tools such as AlphaFold, we are entering a new era of biomolecular design. With these tools, even novices can make meaningful explorations at the early stages of research. I hope this tutorial inspires you and helps you take the first step in protein design, contributing to a cleaner and greener future!

References:

Joo S. et al. (2018) Nat Commun 9: Structure analysis of the mechanism of action of PET hydrolase
PETase: Introduction to PET hydrolase (Wikipedia)
Dauparas J. et al. (2022) Science: ProteinMPNN algorithm for efficient protein sequence design
Interpretation of the DiffDock paper, MIT News (2022): Application of diffusion models in molecular docking improves the accuracy of docking predictions
