DNA Library Construction: DNA Fragmentation

1、 DNA source

The length of double-stranded DNA is commonly given in bp, kb, Mb, and Gb: 1 bp = one base pair; 1 kb = 1,000 bp; 1 Mb = 1,000 kb; 1 Gb = 1,000 Mb. The length of single-stranded DNA is given in nt (nucleotides); for example, 10 nt is a single strand 10 bases long.
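The unit relationships above can be sketched as a small helper. This is only an illustration; the function name `format_dsdna_length` is hypothetical and not part of any library.

```python
def format_dsdna_length(bp: int) -> str:
    """Render a double-stranded DNA length in the largest unit that fits.

    Uses the decimal steps given in the text: 1 kb = 1,000 bp,
    1 Mb = 1,000 kb, 1 Gb = 1,000 Mb.
    """
    units = [("Gb", 1_000_000_000), ("Mb", 1_000_000), ("kb", 1_000)]
    for name, size in units:
        if bp >= size:
            # ":g" drops trailing zeros, so 3000000000 renders as "3 Gb".
            return f"{bp / size:g} {name}"
    return f"{bp} bp"
```

For example, `format_dsdna_length(16569)` gives `"16.569 kb"`.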

1. Genomic DNA (gDNA)

Eukaryotes: linear double-stranded DNA in the nucleus. Prokaryotes: circular double-stranded DNA with a supercoiled structure. Viruses: single-stranded or double-stranded DNA, either linear or circular.

Eukaryotic genomes are usually much larger than prokaryotic genomes: the human genome has approximately 3 billion base pairs, while bacterial genomes have only millions to tens of millions of base pairs. Viral genomes are very small compared with those of bacteria or eukaryotic cells, but their size varies greatly between viruses. For example, hepatitis B virus DNA is only about 3 kb, while a poxvirus genome is about 300 kb.

2. Mitochondrial DNA (mtDNA)

Mitochondrial DNA is the circular double-stranded DNA inside the mitochondria of eukaryotes; human mtDNA is 16,569 bp long.

3. Chloroplast DNA (cpDNA)

The double-stranded circular DNA molecules in the chloroplasts of higher plants vary in length by species, ranging from 120 kb to 217 kb.

4. Plasmid DNA

Plasmids are small, naked, structurally simple double-stranded circular DNA molecules that are independent of the bacterial nucleoid DNA and can replicate autonomously.

5. ctDNA & cfDNA

Cell-free DNA (cfDNA) is extracellular DNA present in body fluids (such as plasma or cerebrospinal fluid). Circulating tumor DNA (ctDNA) is the tumor DNA fragments released into peripheral blood by necrotic or apoptotic tumor cells. Circulating tumor cells (CTCs) is a collective term for the various tumor cells present in peripheral blood.

6. Exosome DNA

Exosomes are small single-membrane vesicles, 30-140 nm in size, that are actively secreted into the extracellular space after intracellular vesicles fuse with the plasma membrane. Small single- or double-stranded DNA fragments generated during genomic DNA damage repair are packaged into exosomes and exist as DNA enclosed within the exosome membrane.

7. Extrachromosomal circular DNA (eccDNA)

eccDNA is a special class of circular DNA that has separated from the normal genome and exists free of the chromosomes. eccDNAs vary greatly in size, from tens of bases to hundreds of thousands of bases. By size and sequence they fall into four categories: spcDNA, telomeric circles, microDNA, and ecDNA.


8. Complementary DNA (cDNA)

cDNA is single-stranded DNA complementary to an RNA strand. It is synthesized by RNA-dependent DNA polymerase (reverse transcriptase), which uses the RNA as a template in the presence of appropriate primers.

2、 Quality inspection of DNA

1. Concentration detection

1) NanoDrop: a micro-volume spectrophotometer that measures sample absorbance across a spectral range covering ultraviolet and visible light and needs only a small sample volume (0.5-2 μl). The results can be used to determine nucleic acid concentration and purity.
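As a rough illustration of how an absorbance reading becomes a concentration, the sketch below applies the conventional conversion factors for a 1 cm path length (one A260 unit is about 50 ng/μl for dsDNA, 33 ng/μl for ssDNA, and 40 ng/μl for RNA). These factors are standard textbook values rather than something stated in this article, and the function name is hypothetical; the instrument's own software performs the path-length normalization.

```python
# Conventional conversion factors, in ng/µl per A260 unit at a 1 cm path length.
# These are textbook approximations, not values taken from the article.
A260_FACTORS = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}


def concentration_ng_per_ul(a260, kind="dsDNA", dilution=1.0, path_cm=1.0):
    """Estimate nucleic acid concentration from an A260 reading.

    a260     : absorbance at 260 nm (background-corrected)
    kind     : "dsDNA", "ssDNA" or "RNA"
    dilution : dilution factor applied before measuring (1.0 = undiluted)
    path_cm  : optical path length in cm used for the reading
    """
    if path_cm <= 0:
        raise ValueError("path_cm must be positive")
    normalized = a260 / path_cm  # scale the reading to a 1 cm path
    return normalized * A260_FACTORS[kind] * dilution
```

Because absorbance at 260 nm does not distinguish DNA from RNA or free nucleotides, an estimate of this kind is usually cross-checked against a dye-based measurement.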

2) Qubit: quantifies DNA with fluorescent dyes that bind specific target molecules. The dye fluoresces only when bound to its target (DNA, RNA, or protein) in the sample, which greatly reduces the influence of other molecules in the buffer (salts, solvents, detergents, and free nucleotides) on the result.

2. Purity and integrity testing

1) NanoDrop

The A260/A280 ratio is usually used to check nucleic acids for protein or phenol contamination; pure samples have A260/A280 greater than 1.8. The A260/A230 ratio is used to check for contaminants that absorb at 230 nm; pure nucleic acid has A260/A230 greater than 2.0.


If A260/A280 = 1.7 and A260/A230 = 0.5, protein residue is likely; A260/A280 = 1-1.5 with A260/A230 = 1-1.5 suggests phenol residue; A260/A280 ≥ 2 with A260/A230 < 1 suggests residual guanidine salt.

If the sample is contaminated, consider magnetic bead purification to remove the impurities.
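The rules of thumb above can be written as a small decision function. This is a loose sketch: the text gives single example values, so the ranges used here (for instance, treating any A260/A280 below 1.8 together with A260/A230 below 1 as possible protein carryover) are assumptions made for illustration, and the function name is hypothetical.

```python
def interpret_purity(a260_a280, a260_a230):
    """Map the two absorbance ratios to a tentative interpretation.

    The thresholds are an illustrative reading of the rules of thumb in the
    text, not a validated diagnostic.
    """
    if a260_a280 >= 1.8 and a260_a230 >= 2.0:
        return "clean"
    if a260_a280 >= 2.0 and a260_a230 < 1.0:
        return "possible guanidine salt carryover"
    if 1.0 <= a260_a280 <= 1.5 and 1.0 <= a260_a230 <= 1.5:
        return "possible phenol carryover"
    if a260_a280 < 1.8 and a260_a230 < 1.0:
        return "possible protein carryover"
    return "unclear; consider re-purification"
```

Any result other than "clean" would point back to the bead purification step mentioned above.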

2) Agarose gel electrophoresis

RNA contamination: treat with RNase A.

Protein contamination: treat with proteinase K.

Genomic DNA degradation: adjust the fragmentation conditions accordingly.

3) Agilent 2100/4200 or Qsep100/400

For example, cf-DNA quality inspection:


3、 Fragmentation of DNA

Nucleic acid fragmentation methods fall into instrument-based and reagent-based approaches. Instrument-based methods are either contact (mainly probe sonicators and water-bath sonicators) or non-contact (mainly instruments from Diagenode and Covaris).

1. Ultrasonic fragmentation

Ultrasonic fragmentation, which combines hydrodynamic shearing with ultrasonic breakage, is the most common method of DNA fragmentation. Under high-frequency pulses, the liquid develops countless points of high and low pressure. The resulting cavitation bubbles exist for a few milliseconds and then burst as the pressure changes. This creates an instantaneous, intense shear force in the liquid that can break cells and shear DNA.

Ultrasound: a mechanical wave with a very short wavelength, typically shorter than 2 cm in air. It needs a medium to propagate and cannot exist in a vacuum. It travels farther in water than in air.
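The 2 cm figure follows from the relation wavelength = speed of sound / frequency. The sketch below checks it, assuming approximate textbook sound speeds (about 343 m/s in air at 20 ℃ and about 1,480 m/s in water) and taking 20 kHz as the lower limit of ultrasound; none of these numbers come from the article itself.

```python
# Approximate speeds of sound in m/s (textbook values, assumed for illustration).
SPEED_OF_SOUND = {"air": 343.0, "water": 1480.0}


def wavelength_cm(medium, frequency_hz):
    """Return the acoustic wavelength in centimetres: lambda = v / f."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return SPEED_OF_SOUND[medium] / frequency_hz * 100.0


# At the 20 kHz threshold the wavelength in air is already below 2 cm,
# and it only gets shorter as the frequency rises.
air_at_20khz = wavelength_cm("air", 20_000)
water_at_20khz = wavelength_cm("water", 20_000)
```

At the same frequency the wavelength in water is longer than in air because sound travels faster in water.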

Cavitation effect: as an ultrasonic wave propagates through a medium, it alternates between positive and negative pressure phases. In the positive phase the wave compresses the molecules of the medium, increasing its density; in the negative phase the molecules are pulled apart and the density falls. When sufficiently intense ultrasound is applied to a liquid, the average distance between molecules in the negative phase exceeds the critical distance that holds the liquid together, so the liquid ruptures and forms microbubbles. These small cavities expand and collapse rapidly, driving liquid particles into violent collisions and generating pressures of several thousand to tens of thousands of atmospheres.

Features: fragmentation is random with no sequence preference, but the instruments are expensive and the DNA suffers considerable mechanical damage.
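To give a feel for what "random, no sequence preference" means for fragment sizes, here is a minimal simulation that places breaks uniformly at random along a linear molecule. It is an idealized model made up for illustration (real sonication depends on energy, time, and fragment length), and the function name is hypothetical.

```python
import random


def random_fragment_lengths(molecule_len, n_breaks, seed=0):
    """Break a linear molecule at n_breaks distinct random positions.

    Returns the list of fragment lengths in order along the molecule.
    Breaks fall between bases, so valid cut points are 1..molecule_len-1.
    """
    if not 0 <= n_breaks < molecule_len:
        raise ValueError("n_breaks must be between 0 and molecule_len - 1")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    cuts = sorted(rng.sample(range(1, molecule_len), n_breaks))
    edges = [0] + cuts + [molecule_len]
    return [right - left for left, right in zip(edges, edges[1:])]
```

With uniformly placed breaks the fragments come out in a broad range of sizes around the mean `molecule_len / (n_breaks + 1)`, which is one reason a size-selection step usually follows shearing.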

2. Endonuclease fragmentation

Typically, non-specific endonucleases are used; because they are not restricted to particular recognition sites, they cleave the template DNA at random. Current NGS products fall into two main types: standalone fragmentation reagents (enzyme + buffer) and integrated modules that combine fragmentation, end repair, and A-tailing.

1) Mixed-enzyme system: Fragmentase

NEB's patented two-enzyme fragmentation system consists of a mutant Vvn and a mutant T7 endonuclease I. Varying the digestion time yields DNA fragments of a chosen size, independent of the amount and length of the input DNA. The mutant Vvn introduces random nicks in dsDNA. The mutant T7 endonuclease I recognizes each nick and cleaves the opposite strand across from it, producing a double-strand break. The resulting fragments carry short overhangs, 5′-phosphate groups, and 3′-hydroxyl groups.

Vibrio vulnificus nuclease (Vvn): digests both DNA and RNA. It binds mainly in the minor groove of DNA, bends the duplex about 20 degrees towards the major groove, and hydrolyzes DNA through a general single-metal-ion mechanism.

T7 endonuclease I: recognizes and cleaves imperfectly matched DNA, cruciform DNA, Holliday structures or junctions, and heteroduplex DNA, and also cleaves nicked double-stranded DNA at a slower rate. The cleavage site is the first, second, or third phosphodiester bond 5′ to the mismatched base.

2) Single-enzyme system: Endonuclease V

DNA can be fragmented by randomly incorporating uracil during PCR: the fragment length is controlled by adjusting the uracil content of the DNA (the dTTP/dUTP ratio). However, because base composition differs between samples, the GC content of the sample can affect fragmentation.
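A back-of-the-envelope model shows why both the dTTP/dUTP ratio and the GC content matter. The sketch assumes, purely for illustration, that the polymerase incorporates dUTP and dTTP equally well, that thymine makes up (1 - GC) / 2 of the bases on a strand, and that every incorporated uracil is cut. It estimates the average spacing between cut sites on one strand, which is not the same as the final double-stranded fragment length.

```python
def expected_cut_spacing_nt(dutp, dttp, gc_fraction):
    """Estimate the mean distance (nt) between uracils on one strand.

    dutp, dttp  : relative amounts of dUTP and dTTP in the PCR mix
    gc_fraction : GC content of the template, between 0 and 1

    Assumptions (illustrative only): equal incorporation efficiency for
    dUTP and dTTP, and T frequency per strand of (1 - GC) / 2.
    """
    if dutp <= 0 or dttp < 0:
        raise ValueError("dutp must be positive and dttp non-negative")
    if not 0 <= gc_fraction < 1:
        raise ValueError("gc_fraction must be in [0, 1)")
    p_uracil_at_t = dutp / (dutp + dttp)      # chance a T position carries U
    t_fraction = (1.0 - gc_fraction) / 2.0    # share of positions that are T
    return 1.0 / (p_uracil_at_t * t_fraction)
```

Under these assumptions, a 1:9 dUTP:dTTP mix on a 50% GC template gives a uracil roughly every 40 nt, while the same mix on a 70% GC template gives one roughly every 67 nt. GC-rich samples therefore tend to yield longer fragments at the same ratio.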

Endonuclease V: a repair enzyme found in Escherichia coli, also known as deoxyinosine 3′ endonuclease. It recognizes double- or single-stranded DNA containing deoxyinosine (whether paired or not) and, less efficiently, DNA containing abasic (AP) sites or uracil, base mismatches, insertion/deletion mismatches, hairpin structures, unpaired loops, flaps, and pseudo-Y structures. Endonuclease V cleaves the second phosphodiester bond 3′ to the mismatched deoxyinosine, leaving a nick with a 3′-hydroxyl group and a 5′-phosphate group.

3) Single-enzyme system: DNase I

DNase I can fragment dsDNA in the presence of different ions, but in practice the reaction is sensitive to many conditions, such as enzyme amount, reaction temperature, and purity of the substrate DNA. In addition, DNase I has been reported to prefer sites adjacent to pyrimidine nucleotides, which may greatly affect the diversity of the final library.

DNase I: an endonuclease that randomly hydrolyzes dsDNA/ssDNA, producing mono- and oligodeoxyribonucleotides with 5′-phosphate and 3′-OH groups.

Cleavage by DNase I typically depends on calcium ions and is activated by magnesium or divalent manganese ions. With Mg2+, DNase I cleaves each strand of dsDNA independently and in a statistically random manner. With Mn2+, the enzyme cleaves both strands at almost the same site, producing fragments with blunt ends or with overhangs of only one or two nucleotides.

Reactions are incubated at 37 ℃. DNase I is inhibited by adding EDTA to a final concentration of 2.5 mM and inactivated by heating at 65 ℃ for 10 minutes; phenol-chloroform extraction also inactivates it. Metal-ion chelators, zinc ions at millimolar concentrations, 0.1% SDS, reducing agents such as DTT and mercaptoethanol, and salt concentrations above 50-100 mM all strongly inhibit DNase I.
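Reaching a 2.5 mM final EDTA concentration is a simple dilution calculation. The sketch below solves C_stock × V_add = C_final × (V_reaction + V_add) for the volume to add. The 50 μl reaction volume and 50 mM stock in the usage note are hypothetical values chosen for illustration, and the function name is likewise made up.

```python
def edta_volume_to_add_ul(reaction_ul, stock_mm, final_mm=2.5):
    """Volume of EDTA stock (µl) that brings a reaction to final_mm.

    Accounts for the added volume itself:
        stock_mm * v = final_mm * (reaction_ul + v)
    so  v = final_mm * reaction_ul / (stock_mm - final_mm).
    """
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    if reaction_ul <= 0:
        raise ValueError("reaction_ul must be positive")
    return final_mm * reaction_ul / (stock_mm - final_mm)
```

For a 50 μl reaction and a 50 mM stock, `edta_volume_to_add_ul(50.0, 50.0)` comes to roughly 2.6 μl. A much more concentrated stock would call for a sub-microlitre volume that is hard to pipette accurately, which is why an intermediate dilution can be more practical.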

3. Transposase fragmentation

Transposase fragmentation exploits transposition: the transposon inserts at random positions in the DNA, and these insertions generate random DNA fragments. The 9 bases on either side of the transposase insertion site show some sequence preference.

Transposition is the process by which a DNA sequence is copied or excised from its original position, circularized, and inserted at another site, where it can affect the regulation of downstream genes. Such a sequence is called a “jumping gene” or transposon.

Generally speaking, transposons are classified into “Type I transposons” and “Type II transposons” based on their different transposition modes.

Type I transposon: “copy and paste” mode. RNA pol II transcribes the transposon DNA into mRNA, which is reverse transcribed into cDNA and finally integrated at a new position in the genome under the catalysis of integrase.

Type II transposon: “cut and paste” mode. Under the action of transposase, the transposon is excised from its original position and integrated back into the chromosome at a new site. Tn5, commonly used in the NGS field, belongs to this type.

1) Tn5 transposon

The Tn5 transposon is a bacterial transposon about 5.8 kbp long, consisting of a core sequence encoding resistance to three antibiotics (neomycin, bleomycin, and streptomycin) flanked by two inverted IS50 sequences. IS50R and IS50L are highly homologous, differing by a single-base mutation in IS50L. Each IS50 is bounded by two 19 bp end sequences (ES), an outer end (OE) and an inner end (IE), which differ at 7 positions. These end sequences are where the transposase (Tnp) acts: during transposition, the transposon is excised completely from the donor DNA and inserted into the target DNA. Both IS50L and IS50R carry the genes for the transposase (Tnp) and the transposition inhibitor protein (Inh), but the mutation in IS50L terminates translation prematurely, so only IS50R produces active Tnp and Inh.

Tn5 transposition process:

① Two transposase molecules (Tnp) each bind a specific 19 bp ES recognition sequence at an end of the transposon, forming two Tnp-ES complexes, which then associate through interactions between the Tnp C-termini to form the Tn5 transposition complex.

② In the presence of Mg2+, Tnp activates a water molecule, which makes a nucleophilic attack at the OE end of the transposon. One DNA strand is hydrolyzed, exposing a 3′-OH group at the transposon end; this 3′-OH then makes a nucleophilic attack on the complementary strand, forming a hairpin structure, which is subsequently cleaved.

③ After the transposition complex binds non-specific target DNA, the activated 3′-OH groups at the transposon ends make nucleophilic attacks on the two strands of the target DNA at positions 9 bp apart, inserting the transposon into the target. The resulting 9 bp single-stranded gaps are filled in by DNA polymerase and sealed by ligase, producing a 9 bp direct repeat on each side of the insertion.
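The origin of the 9 bp direct repeat in step ③ can be illustrated with a toy string model: because the two strands are cut 9 bp apart, the 9 bases between the cuts end up on both sides of the insert once the gaps are filled. The sketch works on one strand only and ignores the chemistry entirely; the function name and example sequences are hypothetical.

```python
def insert_with_target_site_duplication(target, insert, pos, stagger=9):
    """Model a staggered-cut insertion on the top strand of `target`.

    pos     : 0-based index of the first base between the staggered cuts
    stagger : distance between the cuts on the two strands (9 bp for Tn5)

    After gap filling, the `stagger` bases starting at `pos` appear on
    both sides of the insert as a direct repeat.
    """
    if pos < 0 or pos + stagger > len(target):
        raise ValueError("staggered region must lie inside the target")
    left = target[:pos + stagger]   # ends with the bases between the cuts
    right = target[pos:]            # starts with the same bases again
    return left + insert + right
```

For a target `"AAAAA" + "CGTACGTAG" + "GGGG"` with `pos=5`, the product reads `AAAAA CGTACGTAG <insert> CGTACGTAG GGGG` (spaces added for readability), and is 9 bp longer than target plus insert.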

Tn5 shows a base preference across the 9 bp at the insertion (“cleavage”) site. Tn5 does have preferred target sites, but they are far less sequence-specific than those of restriction endonucleases. The current consensus target site for Tn5/IS50 is A-GNTYWRANC-T.
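A consensus written in IUPAC codes can be turned into a regular expression to scan a sequence for positions that fit it. The sketch below does this for A-GNTYWRANC-T, treating the hyphens as visual separators around the 9 bp core. Since the consensus is a preference rather than a strict requirement, matches mark favoured positions only and Tn5 also inserts elsewhere. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Convert an IUPAC consensus (hyphens ignored) to a regex string."""
    return "".join(IUPAC[base] for base in consensus.replace("-", ""))


def find_consensus_sites(seq, consensus="A-GNTYWRANC-T"):
    """Return 0-based start positions where `seq` fits the consensus.

    A lookahead is used so that overlapping matches are all reported.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

Only one strand is scanned here; a fuller treatment would also check the reverse complement.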

Another characteristic of the Tn5 transposon is that it integrates preferentially into actively transcribed or highly supercoiled DNA regions; this preference reflects the topology of the target rather than its sequence.

Researchers have found that in vitro Tn5 transposition needs only the core end sequence (ME: CTGTCTCTTATACACATCT), Tn5 transposase, and Mg2+ to carry out the cut-and-paste reaction. Building on this, sequencing adapter sequences are appended to the ME sequence. When the transposase cuts and pastes, the ME sequence is inserted into the target DNA and the sequencing adapter is added at the same time. In the subsequent library amplification, a single PCR step then yields a complete library ready for sequencing.

Nextera XT technology: partial P5/P7 sequencing adapter sequences (Adapter 1/2) are appended to the ME sequence, and Tnp recognizes the ME-containing Adapter 1 and Adapter 2 to form the Tn5 transposition complex.

The complex recognizes the target DNA, cuts it, and inserts the donor DNA it carries, forming DNA fragments with the partial P5 adapter (Adapter 1) at one end and the partial P7 adapter (Adapter 2) at the other. PCR then adds the barcode and the remaining parts of the adapters, producing a DNA library with complete P5 and P7 adapters at its ends.
During cut-and-paste by the Tn5 transposition complex, a 9 bp gap is left in the target DNA. The PCR stage of library construction therefore requires a 72 ℃ extension step to fill the gap.

2) Other transposons

Mu transposase: a Thermo Scientific transposase product based on the transposition mechanism of bacteriophage Mu, which replicates its genome by repeated transposition within the host genome during the lytic phase of its life cycle. In vitro, the Mu transposition system can produce random 15 bp insertions in any target DNA.