NGS library preparation

What is library preparation?

Library preparation is a critical step in any next-generation sequencing (NGS) workflow. It converts nucleic acid samples (gDNA or cDNA) into a library of uniformly sized, adapter-ligated DNA fragments that can then be sequenced on an NGS instrument.

For most commercially available sequencing platforms, each DNA fragment in the library must be clonally amplified, by methods such as bridge amplification or emulsion PCR, to generate sufficient copies of the sequencing template. Fragment libraries are obtained by ligating platform-specific adapters to fragments generated from a DNA source of interest, such as genomic DNA (gDNA), double-stranded cDNA, or PCR amplicons. The adapter sequences enable selective clonal amplification of the library molecules, so no bacterial cloning step is needed to amplify the genomic fragment in a bacterial intermediate, as in traditional sequencing approaches. The adapter sequence also contains a docking site for the platform-specific sequencing primers.

Why is library preparation important?

Data quality: Proper library preparation minimizes biases, ensures even coverage, and reduces errors, leading to high-quality sequencing data.

Customization: Libraries can be tailored for specific applications, such as whole-genome sequencing, exome sequencing, transcriptome analysis, metagenomics, or epigenomics.

Efficiency: Optimized protocols save time and resources, enabling high-throughput sequencing projects.

Sample preservation: Specialized protocols allow for the use of limited or degraded samples, such as formalin-fixed paraffin-embedded (FFPE) tissues.

What are the key steps in NGS library preparation?

A conventional library construction protocol typically consists of four steps:

  • Fragmentation of DNA
  • End repair of fragmented DNA
  • Ligation of adapter sequences (not for single-molecule sequencing applications)
  • Optional library amplification

Currently, four methods are commonly used to fragment gDNA: enzymatic digestion, sonication, nebulization, and hydrodynamic shearing. All have been used in library construction, but each has specific advantages and limitations. Enzymatic (endonucleolytic) digestion is easy and fast, but it is often difficult to control the fragment length distribution accurately, and the method tends to introduce bias in the representation of genomic DNA. The other three techniques use physical force to introduce double-strand breaks into DNA. These breaks are believed to occur randomly, resulting in an unbiased representation of the DNA in the library. The resulting fragment size distribution can be checked by agarose gel electrophoresis or automated DNA analysis.
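
The idea that physical shearing breaks DNA at random positions can be illustrated with a toy simulation. The sketch below is only a simplified model, not a description of any instrument: it assumes breaks are uniformly distributed along a single molecule, and the names random_shear and fraction_in_window, the molecule length, the number of breaks, and the size window are all hypothetical values chosen for illustration.

```python
import random


def random_shear(molecule_bp, n_breaks, seed=0):
    """Cut one molecule at n_breaks distinct, uniformly random positions.

    Toy assumption: every position is equally likely to break.
    Returns the list of fragment lengths in bp.
    """
    rng = random.Random(seed)
    cuts = sorted(rng.sample(range(1, molecule_bp), n_breaks))
    edges = [0] + cuts + [molecule_bp]
    return [end - start for start, end in zip(edges, edges[1:])]


def fraction_in_window(lengths_bp, min_bp, max_bp):
    """Fraction of fragments whose length lies in [min_bp, max_bp]."""
    if not lengths_bp:
        return 0.0
    inside = sum(1 for n in lengths_bp if min_bp <= n <= max_bp)
    return inside / len(lengths_bp)


# Hypothetical example: a 100 kb molecule broken at 249 random positions.
fragments = random_shear(molecule_bp=100_000, n_breaks=249)
print(len(fragments), "fragments, mean length",
      sum(fragments) / len(fragments), "bp")
print("fraction between 200 and 600 bp:",
      fraction_in_window(fragments, 200, 600))
```

Even with perfectly random breaks, the fragment lengths spread widely around the mean, which is one reason the size distribution is checked after fragmentation.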

Following fragmentation, the DNA fragments must be repaired to generate blunt-ended, 5'-phosphorylated DNA ends compatible with the sequencing platform-specific adapter ligation strategy. The efficiency of library generation depends directly on the efficiency and accuracy of these DNA end-repair steps.

The end-repair mix converts 5'- and 3'-protruding ends to 5'-phosphorylated blunt-ended DNA. In most cases, end repair is accomplished by exploiting the 5'–3' polymerase and the 3'–5' exonuclease activities of T4 DNA polymerase, while T4 polynucleotide kinase ensures the 5'-phosphorylation of the blunt-ended DNA fragments, preparing these fragments for subsequent adapter ligation.

Depending on the sequencing platform used, the blunt-ended DNA fragments can either directly be used for adapter-ligation, or need the addition of a single A overhang at the 3' ends of the DNA fragments to facilitate subsequent ligation of platform-specific adapters with compatible single T overhangs. Typically, this A-addition step is catalyzed by Klenow Fragment (minus 3' to 5' exonuclease) or other polymerases with terminal transferase activity.

T4 DNA ligase then joins the double-stranded adapters to the end-repaired library fragments. This is followed by reaction cleanup and DNA size selection to remove free library adapters and adapter dimers. Size selection methods include agarose gel isolation, magnetic beads, and advanced column-based purification. Adapter dimers can form during ligation and are subsequently co-amplified with the adapter-ligated library fragments. They must be depleted from the library before sequencing because they take up sequencing capacity that would otherwise go to real library fragments and reduce sequencing quality. Some sequencing platforms require a narrow distribution of library fragments for optimal results, which in many cases can only be achieved by excising the respective fragment range from the gel. Gel excision also serves to deplete adapter dimers.
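
The effect of size selection on adapter dimers can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a model of any particular cleanup chemistry: the fragment lengths, the 120 bp dimer size, the 250 to 600 bp selection window, and the function names size_select and dimer_fraction are all hypothetical.

```python
def size_select(lengths_bp, min_bp, max_bp):
    """Split fragments into those inside and outside [min_bp, max_bp]."""
    kept = [n for n in lengths_bp if min_bp <= n <= max_bp]
    removed = [n for n in lengths_bp if n < min_bp or n > max_bp]
    return kept, removed


def dimer_fraction(lengths_bp, dimer_bp, tolerance_bp=10):
    """Fraction of fragments whose length is within tolerance of the
    expected adapter-dimer size (two adapters, no insert)."""
    if not lengths_bp:
        return 0.0
    dimers = sum(1 for n in lengths_bp if abs(n - dimer_bp) <= tolerance_bp)
    return dimers / len(lengths_bp)


# Hypothetical fragment lengths (bp) after adapter ligation.
library = [120, 125, 310, 350, 420, 480, 900]
ASSUMED_DIMER_BP = 120  # assumption: depends on the adapter design

before = dimer_fraction(library, ASSUMED_DIMER_BP)
kept, removed = size_select(library, min_bp=250, max_bp=600)
after = dimer_fraction(kept, ASSUMED_DIMER_BP)

print(f"dimer fraction before selection: {before:.2f}")
print(f"dimer fraction after selection:  {after:.2f}")
print("removed fragments:", removed)
```

In this example the selection window removes both the short adapter dimers and the oversized fragment, which mirrors the dual purpose of size selection described above.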

After this step, DNA fragment libraries should be qualified and quantified. Depending on its concentration and adapter design, the sequencing library can either be diluted directly and used for sequencing or subjected to optional library amplification. In the amplification step, a DNA polymerase is used with overlapping PCR primers to complete the adapter sequence needed for subsequent clonal amplification and binding of sequencing primers, to produce higher library yields, or both. Optimal library amplification requires a DNA polymerase with high fidelity and minimal sequence bias.
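
Quantification results are usually reported as a mass concentration, while dilution for sequencing is planned in molar terms. The sketch below shows the standard conversion for double-stranded DNA, assuming an average mass of 660 g/mol per base pair. The function names, the example library (10 ng/µL, 400 bp mean fragment length), and the 4 nM target are hypothetical and only serve as an illustration; the loading concentration actually required depends on the platform.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate average mass of one dsDNA base pair


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    nM = (ng/uL * 1e6) / (660 g/mol/bp * mean fragment length in bp)
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and length must be > 0")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_volume_ul):
    """Return (library volume, diluent volume) in uL for a simple dilution."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    library_ul = final_volume_ul * target_nM / stock_nM
    return library_ul, final_volume_ul - library_ul


# Hypothetical library: 10 ng/uL with a mean fragment length of 400 bp.
stock = library_molarity_nM(10.0, 400)
lib_ul, diluent_ul = dilution_volumes(stock, target_nM=4.0, final_volume_ul=20.0)
print(f"stock: {stock:.1f} nM")
print(f"mix {lib_ul:.2f} uL library with {diluent_ul:.2f} uL diluent")
```

The mean fragment length should include the adapters, since they are part of every library molecule being counted.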

For an assessment of the library quality, see Library QC for NGS.

To use sequencing capacity efficiently, sequencing libraries generated from different samples can be pooled and sequenced in the same run. This is made possible by ligating DNA fragments to adapters carrying characteristic barcodes, i.e., short nucleotide sequences that are distinct for each sample.

Other methods are available to streamline library construction. One such method uses in vitro transposition by a transposase/DNA complex to fragment and tag DNA simultaneously in a single-tube reaction. A complete sequencing library can then be constructed by a limited number of PCR cycles on the tagged DNA fragments, which reduces handling steps and saves time. However, libraries generated using in vitro transposition may show higher sequence bias than those generated using conventional methods.
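
The sample barcodes mentioned above are what allow pooled reads to be assigned back to their samples after sequencing. The following is a minimal sketch of that assignment step, assuming each read arrives as a pair of barcode sequence and insert sequence. The barcode sequences, sample names, and the demultiplex function are hypothetical, and production demultiplexing software handles many more cases (dual indexes, quality scores, and so on).

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each (barcode_seq, insert_seq) read to a sample.

    A read is assigned only if exactly one sample barcode matches within
    max_mismatches; otherwise it goes to the 'undetermined' bin.
    """
    bins = defaultdict(list)
    for barcode_seq, insert_seq in reads:
        hits = [
            sample
            for sample, bc in barcodes.items()
            if len(bc) == len(barcode_seq)
            and hamming(bc, barcode_seq) <= max_mismatches
        ]
        sample = hits[0] if len(hits) == 1 else "undetermined"
        bins[sample].append(insert_seq)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}
reads = [
    ("ACGTAC", "AAAA"),  # exact match to sample_A
    ("ACGTAA", "CCCC"),  # one mismatch, still sample_A
    ("TGCATG", "GGGG"),  # exact match to sample_B
    ("GGGGGG", "TTTT"),  # matches nothing
]
print(demultiplex(reads, barcodes))
```

Tolerating a mismatch is only safe when the barcodes in a pool differ from each other at enough positions, which is why barcode sets are designed with a minimum pairwise distance.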

Challenges associated with library preparation include:

Sample quality: Degraded or contaminated samples can lead to poor library quality and sequencing failures.

Biases and artifacts: PCR amplification and fragmentation methods can introduce biases that affect downstream data interpretation.

Adapter dimer formation: Excess adapters can ligate to each other, creating small fragments that can interfere with sequencing.

Automation: High-throughput projects may require automated library preparation systems to increase reproducibility and efficiency.

Basic tips for successful library preparation:

Optimize input amounts: Use recommended amounts of DNA or RNA to ensure efficient library construction.

Maintain clean workspaces: Prevent contamination by using sterile techniques and dedicated equipment.

Follow protocols carefully: Adhere strictly to manufacturer instructions or validated protocols.

Include controls: Use positive and negative controls to monitor the success of library preparation steps.

Stay updated: Keep abreast of new kits and technologies that may improve efficiency and data quality.


DNA and RNA library preparation share common steps such as fragmentation, end repair, adapter ligation and amplification. However, RNA library prep involves unique challenges and additional steps like RNA enrichment and reverse transcription to generate cDNA. 

DNA library prep

Purpose: Analyze genomic DNA sequences, variants, and structural variations

Starting material: Genomic DNA extracted from cells, tissues, or environmental samples

Challenges:

  • Handling high GC-content regions that may lead to amplification biases
  • DNA damage in low-quality samples (e.g., FFPE tissues)
  • Potential contamination with RNA or proteins

Applications:

  • Whole genome sequencing (WGS)
  • Whole exome sequencing (WES)
  • Targeted gene panels
  • Copy number variation (CNV) analysis
  • Metagenomics
  • Structural variant detection

RNA library prep

Purpose: Analyze gene expression, transcript variants, and splicing events

Starting material: Total RNA or messenger RNA (mRNA) extracted from cells or tissues

Challenges:

  • RNA is prone to degradation, requiring careful handling to prevent RNase contamination
  • Efficient removal of rRNA is crucial to prevent it from dominating sequencing reads
  • Reverse transcription efficiency can vary, affecting the representation of transcripts
  • Preserving strand specificity adds complexity to the protocol

Applications:

  • Transcriptome profiling (RNA-seq)
  • Differential gene expression analysis
  • Detection of alternative splicing and transcript isoforms
  • Fusion gene detection in cancer
  • Non-coding RNA analysis
  • Single-cell RNA sequencing

What are the key challenges and benefits of NGS library prep automation?

Automating NGS library preparation offers significant benefits: higher throughput with less hands-on time, better reproducibility, fewer errors, and consistent, high-quality libraries produced in less time. However, automation also presents some challenges:

  • System cost, design and setup
  • Troubleshooting and training
  • Routine performance and maintenance
  • Quality control within system limitations

Various automated platforms are currently available for next-generation sequencing library preparation. Before committing, laboratories should assess their budget, facilities, and sequencing workflow to identify the platform that best meets their sequencing goals.