Identifying and minimizing biases in microbiome research
Bias. Is there another word in metagenomics that sends more chills down a researcher’s spine? Perhaps not. Biases attack the very core of objectivity; they threaten to destroy days, weeks or months of progress. No matter if biases are hidden, in your face, big or small, one thing’s for sure: They’ll stink up your research sooner or later.
Here are some of the scary side effects of biases in metagenomics. They can:
- Lead to incorrect conclusions about which taxa dominate different samples
- Result in wrong interpretations of which ecosystems are more similar
- Cause erroneous analyses of which taxa are associated with a given condition
- Limit our ability to make direct comparisons between taxon or gene abundance measurements from different experiments (1)
Fear not; being aware of biases in metagenomics is the first step in fighting against them. So here are five bias-prone steps in your workflow and some ways to minimize their effects in your microbiome research.
1. Biases during sample collection and stabilization
High variance in sample composition or variability in sample storage can easily introduce biases to the very first steps of the metagenomic workflow.
To prevent these biases in metagenomics, preserve your samples as soon as possible after collection. Deep freezing is a standard solution, but it can make storage and transport difficult. Instead, consider using stabilization chemistry, which preserves bacterial DNA and RNA yield even at room temperature. With such kits, you can maintain the microbial community and functional profiles of the sample from collection through transport and storage.
2. Biases during sample disruption
Efficient lysis preserves the diversity of a microbial sample. Optimized bead beating and lysis chemistry are critical for effective lysis and for recovering higher alpha diversity, measured as observed operational taxonomic units (OTUs), in downstream testing. Inhibitor removal is also important, because inhibitors reduce the efficiency of the PCR steps used downstream to detect microbial species. As quality control, bear in mind that an A260/A280 ratio near 1.8 indicates DNA with little protein contamination; on its own, it does not guarantee that every PCR inhibitor is absent.
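To make the alpha diversity comparison concrete, here is a minimal Python sketch that counts observed OTUs and computes the Shannon index from an OTU count table. The OTU names and counts are hypothetical, and the comparison assumes both samples were sequenced (or rarefied) to the same depth.

```python
import math
from typing import Dict


def observed_otus(counts: Dict[str, int]) -> int:
    """Count the OTUs that have at least one read."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts: Dict[str, int]) -> float:
    """Shannon diversity (natural log) computed from raw OTU counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical counts for one sample processed with two lysis protocols.
# Both tables sum to 1000 reads, so the depths are comparable.
gentle_lysis = {"OTU_1": 900, "OTU_2": 100, "OTU_3": 0, "OTU_4": 0}
thorough_lysis = {"OTU_1": 400, "OTU_2": 300, "OTU_3": 200, "OTU_4": 100}

for name, table in [("gentle", gentle_lysis), ("thorough", thorough_lysis)]:
    print(name, observed_otus(table), round(shannon_index(table), 3))
```

In this made-up example, the thorough protocol recovers more OTUs and a higher Shannon index. This is the pattern you would expect when incomplete lysis hides part of the community.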
3. Biases during nucleic acid extraction
Biases due to DNA extraction procedures exert some of the most substantial effects on microbiome analysis. One study found that bias due to DNA extraction protocols resulted in error rates of over 85% in some samples. This is mainly because your extraction method affects which species are recovered from a sample, inflating or suppressing the observed proportions of community members (2).
To address this bias, choose an extraction kit designed for your sample type. For example, dedicated DNA extraction kits exist for stool and for compost, clay and topsoil. Additionally, you could first analyze mock communities of known composition to facilitate data interpretation (2).
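A mock community is useful because its expected composition is known, so you can measure how far the observed proportions drift. The sketch below is one simple way to do that, using a log2 ratio of observed to expected relative abundance per taxon. The taxon names, counts and function names are hypothetical, and this is not the specific method of reference (2).

```python
import math
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()}


def log2_bias(expected: Dict[str, float],
              observed_counts: Dict[str, int]) -> Dict[str, float]:
    """log2(observed / expected) per taxon.

    Positive values mean the taxon is over-represented, negative values
    mean it is under-represented. A taxon that is expected but not
    observed gets -inf.
    """
    observed = relative_abundance(observed_counts)
    bias = {}
    for taxon, exp_p in expected.items():
        obs_p = observed.get(taxon, 0.0)
        bias[taxon] = math.log2(obs_p / exp_p) if obs_p > 0 else float("-inf")
    return bias


# Hypothetical even mock community: four taxa at 25% each.
expected = {"Taxon_A": 0.25, "Taxon_B": 0.25, "Taxon_C": 0.25, "Taxon_D": 0.25}
# Hypothetical read counts after extraction and sequencing.
observed_counts = {"Taxon_A": 500, "Taxon_B": 250, "Taxon_C": 125, "Taxon_D": 125}

for taxon, value in log2_bias(expected, observed_counts).items():
    print(f"{taxon}: {value:+.2f}")
```

Running the same mock community through two extraction protocols and comparing the per-taxon values gives a quick picture of which protocol distorts the community less.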
4. Biases during library preparation
One of the most likely sources of bias in metagenomics is PCR amplification. Fragments are not amplified uniformly, so coverage becomes uneven with respect to base composition. Fragments with very high GC or very high AT content are amplified less efficiently, and because PCR is exponential, small per-cycle differences snowball over several cycles. As a result, GC-rich and GC-poor regions alike can end up underrepresented compared to GC-optimal sequences (3).
Ultimately, your library diversity is reduced; in extreme cases, some genome regions may be completely absent from the final sequencing data. Moreover, GC bias can hinder de novo genome assembly and the discovery of single nucleotide polymorphisms (SNPs). Other amplification-related artifacts, such as PCR stochasticity, template switching and polymerase errors, also reduce library quality (3).
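One way to check for GC bias in your own data is to bin genome windows by GC content and compare the mean coverage per bin. The Python sketch below assumes you already have (sequence, coverage) pairs for fixed-size windows. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def gc_percent(seq: str) -> Optional[float]:
    """GC content in percent, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        return None  # no unambiguous bases in this window
    return 100.0 * gc / (gc + at)


def mean_coverage_by_gc(windows: Iterable[Tuple[str, float]],
                        bin_width: int = 10) -> Dict[int, float]:
    """Mean coverage per GC bin, keyed by the lower bound of the bin in percent."""
    sums = defaultdict(float)
    n = defaultdict(int)
    for seq, coverage in windows:
        pct = gc_percent(seq)
        if pct is None:
            continue
        # Windows with exactly 100% GC are placed in the last bin.
        start = min(int(pct // bin_width) * bin_width, 100 - bin_width)
        sums[start] += coverage
        n[start] += 1
    return {b: sums[b] / n[b] for b in sorted(sums)}


# Toy windows: (sequence, mean read depth). Real windows would be much longer.
windows = [
    ("ATATATATAT", 5.0),   # 0% GC
    ("ATGCATGCAT", 30.0),  # 40% GC
    ("ATGCATGCAA", 34.0),  # 40% GC
    ("GCGCGCGCGC", 8.0),   # 100% GC
]
print(mean_coverage_by_gc(windows))
```

A profile that drops off at both extremes, as in this toy data, is the signature you would expect from amplification that favors GC-optimal fragments.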
To minimize amplification biases in metagenomics, you can:
- Use a PCR master mix optimized for uniform amplification
- Use PCR additives, such as betaine to improve coverage of GC-rich regions or tetramethylammonium chloride (TMAC) to improve coverage of GC-poor regions
- Reduce temperature ramp rates in the thermocycler
- Try PCR-free library preparation methods (3)
Don’t skip out on size selection in NGS library preparation. This helps remove undesired DNA fragments and reduce off-target reads. If you pool libraries (or even if you don’t), perform a library normalization step to make sure your flow cells are not overloaded or underloaded. Both of these scenarios could skew your data later on.
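Library normalization usually comes down to a small calculation: convert each library's mass concentration to molarity, then pool equal molar amounts. The sketch below assumes double-stranded DNA at an average of 660 g/mol per base pair. The library names, concentrations and the 100 fmol target are hypothetical.

```python
from typing import Dict, Tuple

AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert ng/µL to nM for a dsDNA library of a given mean fragment size."""
    return conc_ng_per_ul / (AVG_BP_MASS * mean_fragment_bp) * 1e6


def pooling_volumes_ul(libraries: Dict[str, Tuple[float, float]],
                       fmol_per_library: float) -> Dict[str, float]:
    """Volume of each library to pipette so every library contributes equal fmol.

    1 nM equals 1 fmol/µL, so volume (µL) = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nm(conc, size)
        for name, (conc, size) in libraries.items()
    }


# Hypothetical libraries: name -> (concentration in ng/µL, mean fragment size in bp)
libraries = {
    "lib_A": (10.0, 400.0),
    "lib_B": (2.0, 500.0),
}

for name, vol in pooling_volumes_ul(libraries, fmol_per_library=100.0).items():
    print(f"{name}: {vol:.2f} µL")
```

Mean fragment size matters as much as concentration here, which is one more reason not to skip size selection and fragment analysis before pooling.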
5. Biases during data interpretation
The NGS pipeline isn’t immune to bias. Try paying closer attention to:
- Filtering: filter raw sequence reads in moderation during quality control (4)
- Mapping: most biases here stem from poor references, overclipping of 3’ ends, assembler discordances, regional DNA bias leading to non-uniform read coverage, repetitive DNA stretches and other factors (4)
- Error correction: perform quality control and attempt to correct errors after alignment (mismatches between a sequence read and the reference) and after assembly (consensus correction across all reads that cover an assembled location) (4)
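To see what filtering in moderation can look like in practice, here is a minimal Python sketch that keeps reads above a mean Phred quality and a minimum length, and reports the fraction retained. It assumes Phred+33 encoded quality strings. The thresholds and read records are hypothetical, and averaging Phred scores is a simplification rather than a full error model.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (read id, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads: List[Read],
                 min_mean_q: float = 20.0,
                 min_length: int = 4) -> Tuple[List[Read], float]:
    """Return the reads that pass both thresholds and the fraction retained."""
    kept = [
        r for r in reads
        if len(r[1]) >= min_length and mean_phred(r[2]) >= min_mean_q
    ]
    fraction = len(kept) / len(reads) if reads else 0.0
    return kept, fraction


# Toy reads; real reads are far longer and min_length would be set accordingly.
reads = [
    ("read_1", "ACGTACGT", "IIIIIIII"),  # mean Q40, kept
    ("read_2", "ACGTACGT", "!!!!!!!!"),  # mean Q0, dropped
    ("read_3", "ACG", "III"),            # too short, dropped
    ("read_4", "ACGTAC", "555555"),      # mean Q20, kept
]

kept, fraction = filter_reads(reads)
print([r[0] for r in kept], f"{fraction:.0%} retained")
```

Tracking the retained fraction per sample is a cheap sanity check: if one sample loses far more reads than the others, the filter itself may be introducing a difference between samples.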
To avoid introducing more bias in metagenomics, consider standardized analysis software. Bioinformatic tools can perform taxonomic identification, microbial species abundance profiling, and antimicrobial resistance (AMR) gene identification for you. You can easily visualize and analyze your metagenomics data with user-friendly interfaces. Such intuitive tools can help you reduce errors and save time during data interpretation.
Alternatively, consider tailor-made microbiome solutions covering the entire workflow from sample preparation to bioinformatics. Such sets can help you obtain high-quality DNA, minimize bias and maximize diversity for a truly representative view of the microbiome in stool or soil.
When we become conscious of biased decisions, we gradually stop making them. Biases in metagenomics are no different. Understanding where they can creep into your workflow can drastically improve the conclusions you draw from your sample analysis.
So bye-bye, be gone, you shiver-inducing biases.
References
1. McLaren MR, Willis AD, Callahan BJ. Consistent and correctable bias in metagenomic sequencing experiments. eLife. 2019; 8:e46923.
2. Brooks JP, et al. The truth about metagenomics: quantifying and counteracting bias in 16S rRNA studies. BMC Microbiol. 2015; 15:66.
3. Browne PD, et al. GC bias affects genomic and metagenomic reconstructions, underrepresenting GC-poor organisms. GigaScience. 2020; 9(2):giaa008.
4. Abnizova I, Boekhorst RT, Orlov YL. Computational Errors and Biases in Short Read Next Generation Sequencing. Journal of Proteomics & Bioinformatics. 2017; (10):1.