DNA sequencing for beginners

What is DNA sequencing?

DNA sequencing is a technique used to determine the order of nucleotides in a DNA molecule. The sequence provides information about the genetic makeup of a particular DNA segment, such as coding and non-coding regions, regulatory regions, genetic variations and mutations.

Why is DNA sequencing important?

The importance of DNA sequencing lies in its ability to:

  • Find genes quickly and easily
  • Show how genes direct the growth, development and maintenance of an organism
  • Cover all parts of the genome (coding DNA, non-coding "junk" DNA and regulatory regions)
  • Identify the causes of genetic diseases and point to ways of correcting them

Sequencing helps us understand how genes and genomes are structured and regulated, how they vary, evolve and function, and how organisms differ at the species and even individual level.

What are the types of DNA sequencing?

Broadly, there are two main types of DNA sequencing: sequencing by synthesis (polymerase-based sequencing) and single-molecule sequencing. The most common methods are described here.

The Sanger sequencing method follows the principle of DNA sequencing by synthesis, using chain-terminating nucleotides. The resulting fragments of different lengths are separated by gel or capillary electrophoresis. It is widely used to sequence smaller fragments of DNA.
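The chain-termination idea can be illustrated with a small Python sketch. It is an idealized model, not a simulation of the real chemistry: it assumes synthesis stops exactly once at every position, and the function names `chain_terminated_fragments` and `read_by_size` are hypothetical names chosen for this example.

```python
def chain_terminated_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Return (length, terminal base) for every terminated fragment.

    Idealized assumption: synthesis stops once at each position, so there is
    exactly one fragment per length, labeled by its terminating base.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by length, as electrophoresis does, and read the labels."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_terminated_fragments("GATTACA")
fragments.reverse()  # the fragments start out as an unordered mixture
print(read_by_size(fragments))  # GATTACA
```

Sorting by length stands in for electrophoresis here: once the fragments are ordered from shortest to longest, the terminating bases spell out the synthesized strand.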

Next-generation sequencing (NGS), also known as massively parallel sequencing, follows the principle of DNA sequencing by synthesis or other methods, allowing high-throughput sequencing of large volumes of DNA, including whole genomes or transcriptomes. Unlike Sanger sequencing, NGS does not separate fragments by electrophoresis; instead, millions of fragments are read in parallel.

Third-generation sequencing works on the principle of real-time sequencing of single DNA molecules without amplification, often providing longer read lengths.

Sanger sequencing compared with NGS:

Accuracy
  • Sanger sequencing: Highly accurate for sequencing small DNA fragments. High accuracy per read with low error rates makes it a gold standard for confirming variants.
  • NGS: Higher error rates per base compared to Sanger in certain genomic regions, but achieves overall accuracy through high coverage and consensus building.

Throughput and scale
  • Sanger sequencing: Low throughput; not suitable for large-scale projects.
  • NGS: High throughput, capable of sequencing millions of DNA fragments simultaneously.

Cost and time
  • Sanger sequencing: Expensive and time-consuming on a per-base basis for large genomes.
  • NGS: Cost-effective for large-scale projects (whole-genome, exome or transcriptome sequencing). Lower cost per base and faster for large genomes.

Read length
  • Sanger sequencing: Produces longer read lengths (~700–1000 bp), which can be advantageous for certain applications.
  • NGS: Generates shorter reads (typically 50–300 bp), but high coverage compensates for the shorter length (e.g., Illumina, Element Biosciences). However, long-read NGS methods (e.g., PacBio and Oxford Nanopore) are even better suited for repeat sequences, as their read lengths outperform Sanger sequencing.
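
The comparison above leans on the idea of "high coverage". A common back-of-the-envelope estimate is mean coverage = number of reads × read length / genome size. The Python sketch below applies it; the read count and read length are illustrative assumptions rather than figures from any specific instrument, and the human genome size is rounded to about 3.1 billion bases.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: int, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    total_bases = target_coverage * genome_size
    return -(-total_bases // read_length)  # integer ceiling division


HUMAN_GENOME = 3_100_000_000  # approximate size in bases

# Assumed run: 600 million reads of 150 bp each.
print(f"{mean_coverage(600_000_000, 150, HUMAN_GENOME):.1f}x")  # 29.0x
print(reads_needed(30, 150, HUMAN_GENOME))  # 620000000
```

This is a mean only; real coverage is uneven across the genome, so some regions receive far fewer reads than the average suggests.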

How does DNA sequencing work?

DNA sequencing typically involves these key steps: sample preparation, DNA fragmentation, library preparation, amplification (in some methods), sequencing reaction and detection, and data analysis.

  • Sample preparation: DNA is extracted from cells or tissues and purified.
  • DNA fragmentation: The DNA is broken into smaller fragments, and fragments of desired lengths are selected to facilitate sequencing.
  • Library preparation: Short DNA sequences called adapters are attached to the ends of DNA fragments. These adapters are necessary to attach the fragments to a sequencing platform and amplify them.
  • Amplification (optional): Polymerase chain reaction (PCR) is often used to generate sufficient quantities of DNA fragments for sequencing.
  • Sequencing reaction and detection: The prepared DNA library is sequenced using a method such as Sanger sequencing or next-generation sequencing (NGS), and the sequencing instrument records signals corresponding to incorporation events (nucleotides or probes).
  • Data analysis: The raw sequence data is processed, aligned and compared to reference genomes, and the functional implications of the sequences and variants are interpreted. 
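
The steps above can be sketched end to end on a toy scale in Python. Everything here is an assumption made for illustration: the sequences are invented, the reads are error-free, and `best_offset` is a brute-force stand-in for a real alignment tool, not how production aligners work.

```python
from collections import Counter


def best_offset(read: str, reference: str) -> int:
    """Toy aligner: the offset where the read has the fewest mismatches."""
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def call_variants(reads: list[str], reference: str) -> list[tuple[int, str, str]]:
    """Pile up aligned reads and report positions whose majority base differs."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        start = best_offset(read, reference)
        for i, base in enumerate(read):
            pileup[start + i][base] += 1
    variants = []
    for pos, counts in enumerate(pileup):
        if counts:
            base = counts.most_common(1)[0][0]
            if base != reference[pos]:
                variants.append((pos, reference[pos], base))
    return variants


reference = "ACGTTGCAAGGCTTACGGAT"
sample = "ACGTTGCAAGACTTACGGAT"  # differs from the reference at position 10 (G -> A)

# "Fragmentation": overlapping 8-base reads taken every 2 bases.
reads = [sample[i:i + 8] for i in range(0, len(sample) - 7, 2)]
print(call_variants(reads, reference))  # [(10, 'G', 'A')]
```

Positions are 0-based, and each variant is reported as (position, reference base, observed base).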

How long does DNA sequencing take?

The time required for sequencing DNA depends on the run times and capacities of the sequencing technology used. Apart from that, the time can vary significantly depending on factors like:

  • The type and size of DNA being sequenced
  • Sample preparation and library construction steps
  • Tools and methods to process the raw data

As a general rule:

  • Sanger sequencing: Several hours to a few days for a short DNA sequence
  • Next-generation sequencing (NGS): 5–24 hours to sequence an entire human genome, but library preparation and data analysis may take additional time
  • Third-generation sequencing: Even faster sequencing times, sometimes within hours for smaller genomes

How accurate is DNA sequencing?

Accuracy in sequencing DNA varies not only between different technologies but also across genomic regions, as certain stretches of the genome are inherently more challenging to read.

Read accuracy refers to the inherent error rate of individual sequencing reads produced by a DNA sequencing technology. Typical read accuracy ranges from approximately 90% for traditional long-read technologies to over 99% for short-read technologies and high-fidelity (HiFi) reads.
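
Read accuracy is often reported as a Phred quality score, Q = -10 × log10(p), where p is the probability that a base call is wrong. The text above does not use this scale, so the short Python sketch below is added only as a convenience for converting between the two; the function names are hypothetical.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred score for a given per-base error probability."""
    return -10 * math.log10(error_probability)


def accuracy_from_phred(q: float) -> float:
    """Per-base accuracy implied by a Phred score."""
    return 1 - 10 ** (-q / 10)


for accuracy in (0.90, 0.99, 0.999):
    q = phred_quality(1 - accuracy)
    print(f"{accuracy:.1%} accurate is about Q{q:.0f}")
```

On this scale, 90% accuracy corresponds to roughly Q10, 99% to Q20 and 99.9% to Q30, so every 10 points means a tenfold drop in the error rate.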

Consensus accuracy, on the other hand, is achieved by combining information from multiple reads within a dataset, which helps eliminate random errors from individual reads. High coverage from sequencing the same region multiple times provides more reads from which to build a consensus, improving accuracy in general.
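
Consensus building can be sketched as a per-position majority vote. The Python example below assumes the reads are already aligned, have equal length and contain only random, independent errors; real consensus callers also weigh base qualities and handle insertions and deletions.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Majority base at each position across equal-length aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


reads = [
    "ACGTAC",
    "ACGTAC",
    "ACCTAC",  # random error at position 2
    "ACGTAC",
    "TCGTAC",  # random error at position 0
]
print(consensus(reads))  # ACGTAC
```

Each read error is outvoted because it occurs in only one of the five reads, which is why higher coverage generally improves consensus accuracy.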

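Accuracy can be increased further by incorporating unique molecular indices (UMIs) into the library prior to amplification and sequencing. Because every copy of an original molecule carries the same UMI, UMI-aware bioinformatics can separate true variants from amplification and sequencing errors, which increases confidence in low-abundance DNA variants.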
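
A minimal Python sketch of UMI-based error correction follows. It assumes each read arrives as a (UMI, sequence) pair, that all reads cover the same region and that the UMIs themselves are error-free; real UMI tools also group by mapping position and tolerate errors in the UMI. The function name `collapse_by_umi` and the sequences are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_umi(tagged_reads: list[tuple[str, str]]) -> dict[str, str]:
    """Group reads by UMI and return one consensus sequence per molecule."""
    groups = defaultdict(list)
    for umi, read in tagged_reads:
        groups[umi].append(read)
    return {
        umi: "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))
        for umi, reads in groups.items()
    }


tagged_reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGAAC"),  # one copy has an error
    ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"),  # variant in every copy
]
print(collapse_by_umi(tagged_reads))
# {'AAGT': 'ACGTAC', 'CCTA': 'ACTTAC'}
```

The mismatch seen in a single copy of molecule AAGT is discarded as an error, while the change carried by every copy of molecule CCTA survives as a candidate variant.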

What are the applications of DNA sequencing?

Sequencing of DNA is a transformative technology that has applications in many fields of research, some of which are listed here:

Genetic diseases: Sequencing can identify genetic mutations that cause inherited diseases and, therefore, help understand disease mechanisms, identify therapeutic targets, personalize treatments and predict responses to therapy.

Evolutionary relationships: Comparing the genomes of different organisms shows how species are related, allowing us to trace evolutionary lineages and understand the genetic divergence between populations or species.
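
One simple way to quantify such divergence is the proportion of differing sites between aligned sequences (the p-distance). The Python sketch below uses invented sequences and hypothetical species names, and assumes the sequences are already aligned to equal length; real phylogenetic analyses use substitution models and many more sites.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAT",
    "species_C": "ACCTAGGTTT",
}

for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(name1, name2, p_distance(seq1, seq2))
```

In this toy data, species_A and species_B differ at 1 of 10 sites, while species_C differs from them at 4 and 3 sites respectively, suggesting A and B are the more closely related pair.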

Forensic science: DNA profiling helps analyze genetic samples from crime scenes. Sequencing is also used to identify individuals in paternity cases and for disaster victim identification. It also helps trace ancestry by revealing genetic markers passed down through generations.

Agriculture: Sequencing supports genetic modification and breeding for better crops and livestock. It can also help identify genes responsible for desirable traits like pest resistance, drought tolerance and higher yield, leading to improved crop varieties.

Microbial genomics and infectious disease control: Researchers can identify and characterize entire microbial communities by sequencing DNA from environmental samples. This has applications in ecology, agriculture and human health (for example, understanding the microbiome). Rapid sequencing of pathogens during outbreaks enables quick identification and response.