Research progress in DNA synthesis, assembly, and error correction techniques

DNA design and synthesis are key common underlying technologies that drive the development of life sciences and related fields. Conventional genetic manipulation techniques can only make limited modifications to existing DNA sequences, while DNA synthesis techniques can “write” life information from scratch, enhancing our ability to understand, predict, and manipulate living organisms from another perspective. DNA synthesis technology includes oligonucleotide synthesis technology, DNA assembly technology, and DNA error correction technology. This article summarizes the characteristics and development trends of the above key technologies. After more than 60 years of development, chemical synthesis is still the mainstream method for oligonucleotide synthesis. It is widely used in column and chip DNA synthesizers, and enzymatic DNA synthesis technology is expected to overturn traditional DNA chemical synthesis methods; The existing DNA synthesis techniques have limitations in synthesis ability and accuracy, making it difficult to directly and accurately synthesize DNA fragments of gene length. The reasonable combination of graded in vitro and in vivo assembly techniques can assemble segmented synthesized oligonucleotide fragments into long segments of DNA, achieving the synthesis of gene length and even genome length DNA sequences. Therefore, it has become the key to long segment DNA synthesis; The synthesis and assembly process of oligonucleotides inevitably introduce errors. The application of error correction techniques based on mismatch binding or mismatch cleavage at different stages of DNA synthesis can not only improve the accuracy of DNA synthesis, but also effectively reduce the quality control cost of long segment DNA synthesis. In recent years, the rapid development of synthetic biology and other related fields has put forward new requirements for DNA synthesis related technologies, which are driving the continuous improvement and innovation of DNA synthesis, assembly, and error correction related technologies towards high throughput, automation, and integration.


Synthetic biology is the third biotechnology revolution following the discovery of DNA double helix and the implementation of genomics by the Human Genome Project. DNA synthesis technology is one of the core enabling technologies in synthetic biology. With the deepening of our understanding of life systems, the redesign and creation of life systems has become the most imaginative and dynamic research field in the field of biology. The design and synthesis of large-scale genomic DNA endows us with the ability to modify cellular functions and even create artificial life, which helps improve our understanding, prediction, and manipulation of living organisms.

Since the 1950s, a large number of researchers have attempted to synthesize DNA through chemical and enzymatic methods. The first to achieve success is chemical synthesis technology. After years of optimization and improvement, DNA chemical synthesis has undergone a transformation and development from columnar synthesis to chip synthesis, and has been widely applied in the market. However, the coupling efficiency and side reactions of existing chemical methods limit the length of oligonucleotide synthesis to 200-300 nt, making it difficult to reach gene lengths at the kb level. Therefore, longer fragments require assembly techniques to splice oligonucleotide fragments until DNA of gene, chromosome, or genome length is obtained. However, the synthesis of oligonucleotide fragments and DNA assembly process can result in many errors, reducing the accuracy of long DNA fragments. The application of error correction technology can remove a large number of errors introduced in DNA synthesis and assembly processes, thereby reducing the cost of screening and sequencing correct DNA fragments. The author of this article will focus on summarizing the research progress related to DNA synthesis, assembly, and error correction technology, in order to promote the innovation and development of DNA synthesis related technologies in China.

1 DNA synthesis technology

1.1 Oligonucleotide Column Synthesis Technology

The chemical synthesis of oligonucleotides began in the 1950s. In the 1980s, a phosphoramide triester chemical synthesis method was developed and applied to column synthesis. In the 1990s, it was also applied to chip based high-throughput synthesis technology. The synthesis method of phosphoramide triesters consists of four chemical reactions: deprotection, coupling, capping, and oxidation, forming a cycle. Controlled synthesis is achieved by stepwise activation of the chemically active protective groups bound to the 3 ‘and 5’ positions of nucleotides. Oligonucleotide chains are synthesized by extending them one by one from the 3 ‘to 5’ direction on a solid-phase carrier.

[1- deprotection, using trichloroacetic acid to remove the DMT (dimethyltrityl) protecting group at the 5 ‘position of the nucleotide on the solid-phase carrier, obtaining the free 5’ hydroxyl group for the next condensation reaction; 2- coupling, mixing the nucleotide monomer protected by phosphoramide with a tetrazole activator to form the phosphoramide tetrazole active intermediate (whose 3 ‘end has been activated, but the 5’ end is still protected by DMT) , undergoes a condensation reaction with the 5 ‘hydroxyl group generated after deprotection, and the oligonucleotide chain extends forward by one nucleotide; 3- Capping, after condensation reaction, block the 5 ‘hydroxyl group that did not participate in the reaction through acetylation to prevent it from being extended in subsequent cyclic reactions, in order to reduce the proportion of single nucleotide missing fragments in the product; 4- Oxidation, using iodine tetrahydrofuran solution to oxidize unstable phosphorous amide triesters connected to 3 ‘and 5’ to phosphate triesters, obtaining stable oligonucleotide chains

A columnar DNA synthesis technique based on the synthesis of phosphoramide triesters, using a synthesis column filled with porous glass (CPG) or polystyrene (PS) sieve plates as a solid-phase carrier, with protected monomers gradually activated by chemical reagents and sequentially added to the initiator fixed on the synthesis column. At present, the coupling efficiency of commercialized columnar DNA synthesis reactions can reach 98%~99.8%, with an error rate of about 1/600 nt. The yield is generally within 1 µ mol, and a single cycle takes 6-8 minutes. After weighing efficiency and cost, the longest synthetic length is usually controlled around 100 nt. If the synthesis column is fixed on a porous loading plate, the single round synthesis flux can be increased to 1536, the average coupling efficiency is 99.5%, the error rate is about 1.53/717 nt, and the cost is about 0.277 cents per base.

Since the development of columnar DNA synthesis technology, automated synthesis equipment has matured. It can conveniently and flexibly extract any oligonucleotide fragment required for synthesizing a certain gene segment, meeting the requirements of general experiments. With the increasing demand for large-scale gene and genome synthesis in the field of synthetic biology, the shortcomings of column synthesis, such as weak sustained synthesis ability, high chemical reagent consumption, multiple side reactions, and low throughput, are becoming increasingly prominent. In order to break through the limitations of low efficiency, low throughput, and high cost, research seeking the sustainable development of DNA synthesis technology is mainly committed to: ① developing integrated chips with high reaction flux and multiple functions as solid-phase carriers for parallel synthesis of oligonucleotides; ② Develop enzyme catalyzed oligonucleotide synthesis technology based on template free single stranded DNA synthesis.

1.2 Chip DNA synthesis technology

Chips can serve as solid-phase carriers for DNA synthesis, conducting synthesis reactions at specific surface sites in a high-density and integrated manner, thereby achieving high-throughput synthesis while saving reagents. The chip DNA synthesis technology is still based on the four step reaction cycle of phosphoramide synthesis, but different “deprotection” fixed-point control methods have been used to develop DNA synthesis platforms such as photolithography synthesis, electrochemical synthesis, and inkjet printing synthesis.

The earliest photolithography synthesis achieved ordered addition of nucleotides at different synthesis sites by precisely controlling the projection of light onto the surface of the chip, decomposing photosensitive protective groups or photocatalysts to produce acid for deprotection. According to the lighting control method, it can be further divided into mask lithography synthesis (used by Affimetrix) and mask free lithography synthesis (used by NimbleGen and LC Sciences). Electrochemical synthesis utilizes electrochemical reactions to generate acids and control the addition of conventional phosphoramide monomers on polymer membranes, which has been adopted by CustomiArray (acquired by Genscript). Agilent’s inkjet printing synthesis involves catalyzing deprotection by spraying acid solutions or reagents onto the reaction site.

The flux of DNA synthesis on the chip is high, and through precise automation control, a single microreactor on the chip can perform synthesis reactions in a skin upgraded reaction system. Different chip synthesis fluxes at 3 × 103~3 × Between 106, the synthetic density ranges from 105 to 106/cm2, with a cost as low as 0.001 to 0.1 cents per base. Compared with columnar synthesis, chip synthesis method has an absolute advantage in large-scale gene synthesis. But the chip manufacturing process is complex, and with the increase of synthesis density, there are also high requirements for chip production technology and automation control technology of synthesizers. In addition, chip synthesis also faces issues such as edge effects, incomplete deprotection reactions, and multiple side reactions such as purine removal due to misplacement at designated sites or poor reagent isolation. These factors directly affect the integrity and correctness of the synthesized sequence, resulting in an efficiency of 90%~99% for DNA chip synthesis and a length limit of 25-200 nt for synthesis. The yield of oligonucleotides synthesized by a single trace reaction system on the chip is low (about 10-15 mol), and the error rate is higher than that of column synthesis, making it difficult to separate and purify them separately. Therefore, it is not suitable for single gene or small-scale gene, conventional probe, and primer synthesis.

1.3 Enzyme catalyzed DNA synthesis technology

The extensive use of toxic, flammable, and unstable organic reagents in DNA chemical synthesis results in low environmental friendliness, leading to a renewed focus on biosynthesis. Conventional DNA polymerase is template dependent and cannot be used for de novo DNA synthesis. Finding non template dependent DNA synthetases has become the primary task in the development of enzymatic DNA synthesis technology. In addition, adding specific nucleotides one by one to the enzyme under controlled conditions is another challenge in enzymatic oligonucleotide synthesis technology.

Mackey and Gilham used nucleotide phosphorylase (PNPase) to introduce 5 ‘- diphosphate-2’ – O with a 2 ‘terminal block-( α- Methoxyethyl) nucleotide was coupled to the 3 ‘end of the oligonucleotide primer and sequenced to synthesize an oligoribonucleotide chain. Gillam and Smith synthesized oligodeoxyribonucleotide chains using the same method. England and Uhlenbeck first used the ligation method for DNA synthesis, coupling 5 ‘, 3’ – ribonucleoside diphosphate substrates to the 3 ‘end of the initiator chain using T4 RNA ligase to synthesize oligonucleotide chains. In 1999, T4 RNA ligase was first used in solid-phase enzymatic DNA synthesis. Although PNPase and T4 RNA ligases can synthesize RNA and DNA, their coupling efficiency is low, single cycle time is long, and the limitations of using them for DNA synthesis technology outweigh their availability.

Bollum first discovered the terminal deoxyribonucleotide transferase (TdT) and proposed in 1962 that TdT can be used for single stranded oligonucleotide synthesis. In 1984, Schott and Schrade added dNTPs to trigger chains of different lengths using TdT. Research has found that TdT has small preference differences for four nucleotides and high coupling efficiency. Continuous synthesis and extension of single stranded DNA can produce homopolymers of up to 8000 nt. An effective reversible termination method is needed for TdT to be used for controllable enzymatic DNA synthesis. Using RT dNTP with blocking groups (RT is a reversible terminator that can be located at 3 ‘- OH or other positions of dNTP) as a substrate, TdT is expected to be used for the sequencing synthesis of long-chain oligonucleotides through coupling de blocking two-step cyclic iterations. Blocking groups with potential for development include amino, allyl, phosphate, 2-nitrobenzyl, 3 ‘- O – (2-cyanoethyl), etc. In 2018, the Keasling team took a different approach by connecting individual nucleotides on a single molecule TdT using a cleavable linker. After adding nucleotides to the primer chain using TdT, they still maintained their connection to the DNA strand, effectively preventing further extension of the DNA strand (Figure 2). After releasing TdT from the cleavage junction, DNA can undergo a new cycle of nucleotide addition. The average coupling efficiency of this method can reach 97.7%, and a single cycle only takes 2-3 minutes. Recently, DNA script company claimed to have synthesized up to 280 nt of oligonucleotides using enzymatic methods with a coupling efficiency of up to 99.5% through TdT modification of binding blocking groups. The enzymatic DNA synthesis technology gSynth promoted by Camena Bioscience has a coupling efficiency of up to 99.9%. At the same time, when gSynth synthesized 300 nt oligonucleotide fragments, the proportion of full-length sequences in the product reached 85.3%, much higher than the 22.7% of phosphoramide synthesis method. Although enzymatic DNA synthesis technology has not yet been commercialized, its potential predicts that enzymatic DNA synthesis technology will lead a new round of DNA synthesis technology revolution.

Enzymes, as key molecular machines for enzymatic DNA synthesis, control the formation of polymers. Therefore, the characteristics of enzymatic synthesis technology are closely related to enzyme function. The polymerization mode and mild reaction conditions of enzymes make the two-step coupling de blocking cycle based on enzymatic catalysis simpler, and can effectively reduce DNA damage and side reactions that occur during the synthesis process, with the potential for sustained synthesis of longer oligonucleotide fragments. Currently, DNA enzymatic synthesis technology is in a period of rapid development, and there are still many urgent problems to be solved in the synthesis process. This includes low efficiency of natural enzymes in modifying RT dNTP, removal of blocking groups, inability to start from scratch or requiring a certain length of initiator chains (such as TdT,>3 nt) for synthesis. The core of these issues lies in the design and modification of enzyme molecule machine functions, as well as the development of control strategies. By exploring new enzymes, modifying enzymes, and developing artificial enzymes, we aim to enhance the function of DNA synthase and efficiently incorporate RT dNTP; Developing new controllable strategies to adapt to enzymatic synthesis automation, such as controlling enzyme activity through fine control of cofactors, to improve controllable synthesis efficiency and length, will also help promote the application of enzymatic DNA synthesis.

1.4 Development and evaluation of DNA synthesis technology

The chemical and enzymatic synthesis techniques of DNA involve the de novo polymerization of nucleotides into oligonucleotide fragments based on artificially specified nucleic acid sequences and corresponding synthesis rules (forming 3 ‘, 5’ – phosphodiester bonds). The actual commercial application of DNA synthesis technology is still limited to the above-mentioned few. There are various factors that affect the marketization of these technologies, among which the two main technical evaluation indicators are controllability and efficiency. The key to template free DNA synthesis technology lies in the controllable addition of nucleotides one by one, thereby extending the polynucleotide chain. In theory, when a single nucleotide is added under controlled conditions, the entire sequence can be synthesized under controlled conditions. After achieving a controllable synthesis cycle, synthesis efficiency and coupling efficiency will be another evaluation indicator of whether DNA synthesis technology has industrial value. The synthesis efficiency is reflected in the time it takes to add a single nucleotide to the cycle (t), and the total time required for the synthesis of an oligonucleotide with a length of x nt using the column method is T=t × (x-1). The coupling efficiency is the probability that one of the 100 nt oligonucleotide molecules cannot react during the coupling step, which can be expressed as 99% coupling efficiency (CE). Therefore, the full-length product (FLP)=(CE) n, where n is the number of cyclic iterations (n=x-1). For example, a synthetic 200 nt chain with 99% CE, with a single cycle of 8 minutes, theoretically only 13% FLP is present in the product after 1592 minutes, and 87% are short or other erroneous chains. These two efficiencies are closely related to the cost of oligonucleotide synthesis and also affect the cost of assembly and detection screening processes for longer fragment synthesis.

2 Long fragment DNA assembly technology

At present, the length of DNA fragments synthesized from scratch is limited, and longer genes or genomes need to be obtained through enzymatic assembly or in vivo assembly of oligonucleotide fragments. There are two commonly used methods for oligonucleotide assembly: ligase chain reaction (LCR) and polymerase cycling assembly (PCA). The ligase assembly method uses DNA ligases to connect end-to-end and overlapping 5 ‘phosphorylated oligonucleotide fragments into double stranded DNA. The polymerase assembly rule utilizes overlapping oligonucleotide fragments of DNA polymerase extension hybridization to obtain mixtures of different lengths, and finally amplifies the successfully assembled full-length fragments using primers . PCA has good compatibility and is also used for oligonucleotide assembly in chip synthesis.

In order to improve gene synthesis throughput and reduce costs, new progress has been made in integrating miniaturized and automated gene synthesis technologies for synthesis and assembly. In 2011, Tian et al. developed a gene synthesis method using multifunctional chips and combinatorial enzyme technology, integrating all steps of gene synthesis from oligonucleotide library synthesis, library amplification, error correction to gene assembly onto the same chip, without the need to change the reaction system midway, greatly simplifying the gene synthesis process. Twist Bioscience has developed a docking silicon wafer reactor based on Agilent’s oligonucleotide in situ synthesis technology for automated gene synthesis. The enzymatic gene synthesis technology that integrates synthesis and assembly has also made new progress. According to reports, gSynth enzymatic DNA synthesis technology achieved de novo synthesis of 2.7 kb pUC19 plasmid through a cycle of synthesis and assembly.

For further splicing of double stranded DNA after oligonucleotide assembly, early methods relied on the viscous ends produced by restriction endonucleases to concatenate DNA fragments. As a result, the BioBrick and BglBrick systems, as well as the Golden Gate technology that utilizes IIS type restriction endonucleases to generate sticky ends for assembly, have emerged (Figure 3). However, the introduction of sequence dependencies and DNA residues, as well as the tedious operation process, limit the application of such methods. The assembly methods developed using the individual or synergistic effects of exonucleases, DNA polymerase, and ligases eliminate the dependence on restriction endonucleases. This type of method involves generating homologous single stranded complementary ends for assembly, including various efficient and simple assembly methods such as SLIC, SLiCE, LCR, CPEC, and Gibson assembly (Figure 3). Among them, Gibson assembly can seamlessly assemble genome level fragments with hundreds of thousands of base pairs through one-step splicing in vitro.

As the length of fragments increases, DNA becomes easily unstable in vitro due to conventional manipulation, and splicing of fragments exceeding 20 kb relies more on the recombination system within the organism (Figure 3). Escherichia coli, Bacillus subtilis, and brewing yeast are the main host cells for assembling long DNA fragments in the body. After recombining with bacterial artificial chromosomes (BAC), Bacillus subtilis genome (BGM), or yeast artificial chromosomes (YAC) in a recombination system, these host cells can stably carry large DNA fragments, among which BGM has a cloning ability of over 3 Mb. Compared to Escherichia coli and Bacillus subtilis, brewing yeast has a higher homologous recombination rate, good compatibility with long fragments, and is the preferred chassis for simultaneously assembling multiple DNA fragments. There are also more application methods developed based on this system. Gibson et al. assembled 25 DNA fragments in one step in brewing yeast to form a 592 kb long circular mycoplasma genome. Researchers from the Chinese Academy of Sciences even spliced the complete genome of nearly 12 Mb Saccharomyces cerevisiae into a single chromosome.

The limitations of existing DNA synthesis technologies make DNA assembly an indispensable process in gene synthesis. The assembly process is significantly affected by connection efficiency and the number of splices. The assembly of chromosomes or genome length DNA using short initial fragments requires a higher number of layered assemblies, and the cost of quality control such as clone selection and sequencing during the process will also increase accordingly. In addition, assembly technology still faces a series of problems, which require further overcoming the inhibitory effects of aggregation sequences, long repeat sequences, and non-standard DNA structures on assembly, and developing vectors that can stably obtain full-length error free genetic systems. The optimization of assembly processes or the development of new assembly technologies require a micro understanding and effective control of the aforementioned assembly influencing factors. The development of intelligent assembly technology through computer algorithms for splitting and sequence transformation of problem sequences that affect assembly will help solve the assembly problem of complex long DNA fragments. A microfluidic assembly system with low cost, automation, and integration characteristics will become the direction for the development of in vitro synthesis and assembly integration platforms for oligonucleotides; The screening and application of new hosts with efficient recombination systems will provide more options for DNA in vivo assembly.

Gene Assembly Tools

Catalog Number Product Name Product Size Applications Price
GE0011 High-Fidelity DNA Assembly Cloning Kit 10 Reactions Function as a DNA-guided endonuclease. Online Inquiry
GE0012S High-Fidelity DNA Assembly Master Mix 10 Reactions High-fidelity construct generation for CRISPR workflows. Online Inquiry
GE0012L High-Fidelity DNA Assembly Master Mix 50 Reactions High-fidelity construct generation for CRISPR workflows. Online Inquiry
GE0012X High-Fidelity DNA Assembly Master Mix 250 Reactions High-fidelity construct generation for CRISPR workflows. Online Inquiry

3 DNA error correction techniques

The synthesis of oligonucleotides and enzymatic assembly processes inevitably result in various types of errors. Common errors include insertion, deletion, and substitution of nucleotides. The use of error correction techniques can effectively remove different types of errors and improve the accuracy of synthesized products.

3.1 Removal of erroneous oligonucleotide chains

The chemically synthesized oligonucleotide chains contain a large number of errors, and the correction process for such oligonucleotide pools is mainly based on the molecular weight or functional group differences caused by synthesis errors for separation and purification. For column synthesis, incomplete fragments can be removed by high performance liquid chromatography (HPLC), polyacrylamide gel electrophoresis (PAGE) or hydrophobic purification column filtration. These methods have low throughput, limited accuracy in distinguishing errors, and significant losses in the separation process. The error correction of chip synthesized oligonucleotides can be achieved by directly hybridizing with the sequence correct oligonucleotide capture probe, or by strict hybridization screening using Tm homogenization (thermodynamic parameters) to filter out erroneous fragments in the oligonucleotide pool. Evonetix’s DNA synthesis platform integrates synthesis, assembly, and error correction through temperature control, and its error correction process involves precise temperature control to remove non perfectly matched DNA strands. NGS technology can also be used for error correction, combining DNA chip synthesis with high-throughput sequencing platforms to integrate synthesis, assembly, and sequencing error correction.

3.2 Error correction of double stranded DNA fragments containing errors

The nucleotide insertion, deletion, and substitution errors of double stranded DNA fragments after complementary pairing are mainly manifested as mismatches and protrusions. The removal of these errors is mainly achieved through DNA enzymatic correction techniques developed based on DNA repair systems in organisms. By annealing complementary sequences to expose mismatched signals, and then using enzymes with mismatched binding or mismatched cleavage activity to correct DNA double strands, the correct sequences are enriched.

MutS, a mismatch binding enzyme involved in DNA repair, and its homologous proteins can recognize and bind various DNA containing mismatch bases and single stranded rings, and then separate MutS binding mismatched heteroduplex and unbound homoduplex by means of gel electrophoresis, capillary electrophoresis, affinity beads or adsorption resins. After two rounds of repetition, the error rate can be reduced to 1/10 kb, which is more than 15 times lower than traditional gene synthesis techniques. The engineering modification of the mismatched binding enzyme MutS not only enhances its high stability but also contributes to its market-oriented application. For oligonucleotide pools with high error rates, Binkowski et al. further developed the same sequence shuffling method using MutS. By introducing restriction endonucleases to segment DNA double strands, MutS does not need to remove the entire incorrect double stranded DNA, thus retaining a large number of short segments containing the correct sequence. Finally, the full-length sequence is recovered through OE-PCR assembly. The test found that after two rounds of same sequence reorganization, the error rate of the 3.5 kb fragment decreased to 1/3.5 kb, and the accuracy increased by 3.5-4.3 times. The essence of MutS error correction method is the physical separation of double stranded DNA containing errors. To ensure that there are sufficient correct fragments in the sample after MutS treatment, a considerable portion of the sequence correct fragments are required in the oligonucleotide pool.

The use of mismatched cleavage enzymes can achieve the goal of correcting errors in the original treatment pool. Mismatch cleavage enzymes are a group of mismatch specific endonucleases that recognize DNA double stranded mismatch sites and cleave them near the mismatch sites. This mainly includes endonucleases that recognize single base mismatches and single strand specific nucleases, which work together with polymerase to hydrolyze and cleave the erroneous region using polymerase with exonuclease activity, and then assemble and recover the full-length sequence through OE-PCR. This method can eliminate errors at the single base level while retaining most of the sequence correct regions, and can perform multiple “error correction assembly” cycles until the desired purity of the product is achieved. T4 endonucleases VII, T7 endonucleases I, and E. coli endonucleases V can recognize and cleave single base mismatches, single nucleotide protrusions, and other types of errors in double stranded DNA, effectively reducing errors such as mismatches, deletions, and insertions in synthetic gene products. CEL in single stranded specific nucleases can specifically cleave different types of base mismatches and DNA distortions at neutral pH values. Combining the use of CEL nuclease in gene synthesis correction of chips can reduce the error rate of synthesized gene products from 1/526 bp to 1/3883 bp. Two enzymatic cleavage reactions can further reduce the error rate to 1/8700 bp, resulting in a reduction of errors by more than 16 times.

Most of the existing DNA error correction methods still focus on removing erroneous fragments. Developing error correction methods based on mismatch excision and error repair is expected to overturn traditional error correction techniques. Mismatch repair requires the correct template chain, and the in vivo mismatch repair system distinguishes the template chain from the newly synthesized sub chain through methylation modification, and repairs errors on the newly synthesized chain based on the template chain. Drawing on similar principles to distinguish between the correct segment and the segment to be repaired in vitro, identifying errors and performing single strand cutting on the incorrect area, and then using the correct segment as a template to repair errors. The error correction technology developed based on this will break free from the limitations of traditional error correction techniques and achieve true in vitro DNA error correction.