DNA sequencing for beginners
What is DNA sequencing?
DNA sequencing is a technique used to determine the sequence of nucleotides within a DNA molecule. The sequence can provide information about the genetic makeup of a particular DNA segment, for example, coding and non-coding regions, regulatory regions, genetic variations and mutations.
Why is DNA sequencing important?
The importance of DNA sequencing lies in its ability to:
- Find genes quickly and easily
- Know how genes direct the growth, development and maintenance of an organism
- Understand all aspects of the genome (coding DNA, non-coding or so-called junk DNA, and regulatory regions)
- Identify the causes of genetic diseases and how to correct them
Sequencing helps us understand how genes and genomes are structured and regulated, and how they vary, evolve and function. It can distinguish organisms at the species level and even at the level of individuals.
What are the types of DNA sequencing?
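Broadly, DNA sequencing methods rest on two main principles: sequencing by synthesis (also called polymerase-based sequencing) and single-molecule sequencing. The technologies built on these principles are commonly grouped into three generations.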
Sanger DNA sequencing follows the principle of DNA sequencing by synthesis, utilizing chain-terminating nucleotides. The chain-terminated fragments are separated by size using gel or capillary electrophoresis. It is widely used to sequence smaller fragments of DNA.
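The chain-termination idea can be illustrated with a small Python sketch. It is an idealized model, not a simulation of real chemistry: it assumes synthesis stops exactly once at every position and that each fragment's labeled end base is read without error. The function names are hypothetical and chosen only for this illustration.

```python
import random


def chain_terminated_fragments(new_strand):
    """Return one (length, terminal_base) pair per position of the strand.

    Idealized assumption: a chain-terminating nucleotide is incorporated
    once at every position, so every prefix of the newly synthesized
    strand exists as a fragment whose last base carries the label.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_by_size(fragments):
    """Order fragments from shortest to longest and read their end bases."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_terminated_fragments("ATGCCGTA")
random.shuffle(fragments)       # fragments are mixed together in the reaction
print(read_by_size(fragments))  # prints ATGCCGTA
```

Sorting by length plays the role of electrophoresis here: once the fragments are ordered by size, the labeled end bases spell out the sequence.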
Massively parallel DNA sequencing (NGS or next-generation sequencing) follows the principle of DNA sequencing by synthesis or other methods, allowing high-throughput sequencing of large volumes of DNA, including whole genomes or transcriptomes. Unlike Sanger sequencing, NGS platforms do not read the sequence by separating fragments with electrophoresis; instead, millions of fragments are sequenced in parallel.
Third-generation sequencing works on the principle of real-time sequencing of single DNA molecules without amplification, often providing longer read lengths.
Compare and contrast the common DNA sequencing methods
| Criterion | Sanger sequencing | NGS |
| --- | --- | --- |
| Accuracy | Highly accurate for sequencing small DNA fragments. High accuracy per read with low error rates makes it a gold standard for confirming variants. | Higher error rates per base than Sanger in certain genomic regions, but achieves overall accuracy through high coverage and consensus building. |
| Throughput and scale | Low throughput; not suitable for large-scale projects. | High throughput, capable of sequencing millions of DNA fragments simultaneously. |
| Cost and time | Expensive and time-consuming on a per-base basis and for large genomes. | Cost-effective for large-scale projects (whole-genome, exome, or transcriptome sequencing). Lower cost per base and faster for large genomes. |
| Read length | Produces longer reads (~700–1000 bp), which can be advantageous for certain applications. | Generates shorter reads (typically 50–300 bp with Illumina), but high coverage compensates for the shorter length. |
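The comparison above leans on the idea of coverage. As a rough illustration, mean coverage is commonly estimated as the number of reads times the read length, divided by the genome size. The Python sketch below applies that formula; the function names are hypothetical, and the genome size of about 3.1 billion bp for a human genome is an approximate figure assumed for the example.

```python
import math


def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    """Average number of times each base is expected to be sequenced."""
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumed values for illustration only.
HUMAN_GENOME_BP = 3_100_000_000   # approximate human genome size
SHORT_READ_BP = 150               # within the 50-300 bp range in the table

n = reads_needed(30, SHORT_READ_BP, HUMAN_GENOME_BP)
print(n, mean_coverage(n, SHORT_READ_BP, HUMAN_GENOME_BP))
```

With these assumed numbers, 30-fold coverage needs about 620 million reads of 150 bp. That scale is practical only on a platform that sequences millions of fragments at once.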
How does DNA sequencing work?
DNA sequencing typically involves these key steps: sample preparation, DNA fragmentation, library preparation, amplification (in some methods), sequencing reaction and detection, and data analysis.
- Sample preparation: DNA is extracted from cells or tissues and purified.
- DNA fragmentation: The DNA is broken into smaller fragments, and fragments of desired lengths are selected to facilitate sequencing.
- Library preparation: Short DNA sequences called adapters are attached to the ends of DNA fragments. These adapters are necessary to attach the fragments to a sequencing platform and amplify them.
- Amplification (optional): Polymerase chain reaction (PCR) is often used to generate sufficient quantities of DNA fragments for sequencing.
- Sequencing reaction and detection: The prepared DNA library is sequenced using a method such as Sanger sequencing or next-generation sequencing (NGS), and the sequencing instrument records signals corresponding to incorporation events (of nucleotides or probes).
- Data analysis: The raw sequence data is processed, aligned and compared to reference genomes, and the functional implications of the sequences and variants are interpreted.
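The data analysis step can be sketched in Python as a toy example: one read is placed on a reference sequence at the position with the fewest mismatches, and the differing bases are reported as candidate variants. This is a deliberately simplified illustration. It assumes no insertions or deletions and scans every position, whereas real aligners use indexes and allow gaps. The function name and the sequences are made up for the example.

```python
def best_ungapped_alignment(reference, read):
    """Slide the read along the reference without gaps.

    Returns (offset, mismatches) for the placement with the fewest
    mismatching bases, where mismatches is a list of
    (reference_position, reference_base, read_base) tuples.
    Returns (None, None) if the read is longer than the reference.
    """
    best_offset, best_mismatches = None, None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        # Keep the first placement that has strictly fewer mismatches.
        if best_mismatches is None or len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


reference = "ACGTTAGCCATGGATCCA"   # made-up reference sequence
read = "AGCTATGG"                  # made-up read with one differing base
print(best_ungapped_alignment(reference, read))
# prints (5, [(8, 'C', 'T')])
```

A single read that differs from the reference may reflect either a true variant or a sequencing error, which is why variant interpretation relies on many overlapping reads.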
How long does DNA sequencing take?
The time required for DNA sequencing depends on the run times and capacities of the sequencing technology used. Apart from that, the time can vary significantly depending on factors like:
- The type and size of DNA being sequenced
- Sample preparation and library construction steps
- Tools and methods to process the raw data
As a general rule:
- Sanger sequencing: Several hours to a few days for short DNA sequences
- Next-generation sequencing (NGS): 24–48 hours to sequence an entire human genome, but library preparation and data analysis may take additional time
- Third-generation sequencing: Even faster sequencing times, sometimes within hours for smaller genomes
How accurate is DNA sequencing?
Accuracy in DNA sequencing varies not only between different technologies but also across genomic regions, as certain stretches of the genome are inherently more challenging to read.
Read accuracy refers to the inherent error rate of individual sequencing reads produced by a DNA sequencing technology. Typical read accuracy ranges from approximately 90% for traditional long-read technologies to over 99% for short-read technologies and high-fidelity (HiFi) reads.
Consensus accuracy, on the other hand, is achieved by combining information from multiple reads within a dataset, which helps eliminate random errors from individual reads. High coverage from sequencing the same region multiple times provides more reads from which to build a consensus, improving accuracy in general.
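Consensus building can be illustrated with a short Python sketch that takes a simple majority vote at each position. It is a minimal sketch and assumes the reads are already aligned, cover the same region and have equal length; real pipelines also weigh base qualities and handle insertions and deletions. The function name and the reads are made up for the example.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority vote per position over equal-length, pre-aligned reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one column of bases per position.
    # On a tie, Counter keeps the base that was seen first in the column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Five made-up reads of the same region; three contain one random error each.
reads = [
    "ACGTACGT",
    "ACGAACGT",
    "ACGTACCT",
    "TCGTACGT",
    "ACGTACGT",
]
print(consensus(reads))  # prints ACGTACGT
```

Because the errors fall at different positions in different reads, each one is outvoted by the correct bases in the other reads. This is why higher coverage generally improves consensus accuracy for random errors.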
What are the applications of DNA sequencing?
DNA sequencing is a transformative technology that has applications in many fields of research, some of which are listed here:
Genetic diseases: Sequencing can identify genetic mutations that cause inherited diseases and, therefore, help understand disease mechanisms, identify therapeutic targets, personalize treatments and predict responses to therapy.
Evolutionary relationships: Comparing the genomes of different organisms shows how species are related, allowing us to trace evolutionary lineages and understand the genetic divergence between populations or species.
Forensic science: DNA profiling helps analyze genetic samples from crime scenes. Sequencing is also used to establish biological relationships in paternity cases and to identify disaster victims. It also helps trace ancestry by revealing genetic markers passed down through generations.
Agriculture: Sequencing supports genetic modification and breeding for better crops and livestock. It can also help identify genes responsible for desirable traits like pest resistance, drought tolerance, and higher yield, leading to improved crop varieties.
Microbial genomics and infectious disease control: Researchers can identify and characterize entire microbial communities by sequencing DNA from environmental samples. This has applications in ecology, agriculture and human health (for example, understanding the microbiome). Rapid sequencing of pathogens during outbreaks enables quick identification and response.