Variability and bias in microbiome metagenomic sequencing: an interlaboratory study comparing experimental protocols.
Author(s): Forry SP, Servetas SL, Kralj JG, Soh K, Hadjithomas M, Cano R, Carlin M, Amorim MG, Auch B, Bakker MG, Bartelli TF, Bustamante JP, Cassol I, Chalita M, Dias-Neto E, Duca AD, Gohl DM, Kazantseva J, Haruna MT, Menzel P, Moda BS, Neuberger-Castillo L, Nunes DN, Patel IR, Peralta RD, Saliou A, Schwarzer R, Sevilla S, Takenaka IKTM, Wang JR, Knight R, Gevers D, Jackson SA
Publication: Sci Rep, 2024, Vol. 14, Page 9785
PubMed ID: 38684791
PubMed Review Paper? No
Purpose of Paper
This paper compared metagenomic data from five fecal specimens that were analyzed by 44 different laboratories, each using their own 16S rRNA or whole genome shotgun (WGS) sequencing workflow.
Conclusion of Paper
Of the 44 participating laboratories, 30 conducted 16S rRNA sequencing and 14 performed WGS. On average, read counts were 10-fold higher with the WGS workflow than with the 16S rRNA workflow. Principal coordinate analysis (PCoA) of Bray–Curtis dissimilarities clustered the 16S and WGS data primarily by patient rather than by laboratory. The calculated ratio of Firmicutes to Bacteroidetes was highly variable between laboratories and specimens. When platforms were compared, 16S rRNA sequencing yielded higher Firmicutes to Bacteroidetes ratios than WGS: in the same specimens, 16S rRNA sequencing frequently detected relatively more Firmicutes than Bacteroidetes, whereas WGS detected the opposite. Importantly, the relative abundances of Blautia, Faecalibacterium, Lachnospiraceae, and Ruminococcus were all higher than that of Bacteroidetes when measured by 16S rRNA sequencing but all lower than Bacteroidetes when measured by WGS. The calculated Inverse Simpson alpha diversity was also higher in all five specimens when analyzed by 16S rRNA sequencing rather than WGS. When data were stratified by sequencing method, the DNA extraction kit manufacturer and the target gene affected the Firmicutes to Bacteroidetes ratio obtained by 16S rRNA sequencing, whereas the DNA extraction protocol, the DNA extraction kit manufacturer, and the sequencing library kit affected the ratio obtained by WGS. Interestingly, one of the two spiked-in bacteria (L. xyli) was not detected by 16S rRNA sequencing and was observed only at very low abundance by WGS.
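The Bray–Curtis ordination mentioned above can be illustrated with a minimal Python sketch. It assumes a samples × taxa matrix of relative abundances and uses classical (Gower) principal coordinate analysis via NumPy; the function names and the toy donor/laboratory values are hypothetical and do not represent the authors' pipeline or data.

```python
import numpy as np


def bray_curtis(x: np.ndarray, y: np.ndarray) -> float:
    """Bray–Curtis dissimilarity between two abundance profiles over the same taxa."""
    return float(np.abs(x - y).sum() / (x + y).sum())


def pcoa(dist: np.ndarray, n_axes: int = 2) -> np.ndarray:
    """Classical principal coordinate analysis of a square dissimilarity matrix."""
    n = dist.shape[0]
    # Gower double-centring of -0.5 * squared dissimilarities.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = centring @ (-0.5 * dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)
    # eigh returns eigenvalues in ascending order; keep the largest n_axes.
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Negative eigenvalues (possible because Bray–Curtis is non-Euclidean) are clipped to 0.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical relative abundances of four taxa: two donors, each profiled by two labs.
samples = {
    ("donorA", "lab1"): [0.50, 0.30, 0.15, 0.05],
    ("donorA", "lab2"): [0.45, 0.35, 0.15, 0.05],
    ("donorB", "lab1"): [0.10, 0.20, 0.30, 0.40],
    ("donorB", "lab2"): [0.15, 0.15, 0.30, 0.40],
}
labels = list(samples)
profiles = np.array([samples[k] for k in labels])
n_samples = len(labels)
d = np.array(
    [[bray_curtis(profiles[i], profiles[j]) for j in range(n_samples)] for i in range(n_samples)]
)
coords = pcoa(d)

for label, (pc1, pc2) in zip(labels, coords):
    print(label, round(float(pc1), 3), round(float(pc2), 3))
```

With these toy profiles the within-donor dissimilarity is 0.05 and the between-donor dissimilarity is 0.5, so the first principal coordinate separates donors rather than laboratories, the same qualitative pattern the study reports.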
Studies
Study Purpose
This study compared metagenomic data from five fecal specimens that were analyzed by 44 different laboratories, each using its own 16S rRNA or whole genome shotgun (WGS) sequencing workflow. Multiple fecal specimens were collected from donors as part of the MOSAIC project and shipped overnight to the laboratory. Specimens were frozen at -80°C, homogenized with dry ice, preserved with OMNIgene Stabilizing Solution, aliquoted, and stored at -80°C. Fecal specimens were spiked with Aliivibrio fischeri and Leifsonia xyli. Five fecal specimens were selected for the study based on their microbial composition: four from healthy donors and one from a patient with Parkinson’s disease. To evaluate homogeneity, ten aliquots of feces from each of these donors underwent 16S rRNA amplicon sequencing and shotgun metagenomic sequencing at CosmosID using proprietary protocols. Additionally, two DNA mixtures were prepared from 13 ATCC strains. Aliquots of each fecal specimen, as well as the two DNA mixtures, were sent to the 44 laboratories, where metagenomic profiling was conducted using laboratory-specific protocols.
Summary of Findings:
Of the 44 participating laboratories, 30 conducted 16S rRNA sequencing and 14 performed WGS. On average, read counts were 10-fold higher with the WGS workflow than with the 16S rRNA workflow (>10^6 vs ≥10^5 reads), although read counts varied between laboratories. PCoA of Bray–Curtis dissimilarities clustered the 16S and WGS data primarily by patient rather than by laboratory; however, at one laboratory, three of the five specimens appeared to have been misidentified, so all data from that laboratory were excluded from further analysis. The calculated ratio of Firmicutes to Bacteroidetes was highly variable between laboratories and specimens. When platforms were compared, 16S rRNA sequencing yielded higher Firmicutes to Bacteroidetes ratios than WGS: in the same specimens, 16S rRNA sequencing frequently detected relatively more Firmicutes than Bacteroidetes, whereas WGS detected the opposite. Importantly, the relative abundances of Blautia, Faecalibacterium, Lachnospiraceae, and Ruminococcus were all higher than that of Bacteroidetes by 16S rRNA sequencing but all lower than Bacteroidetes by WGS. The calculated Inverse Simpson alpha diversity was also higher in all five specimens when analyzed by 16S rRNA sequencing rather than WGS. When data were stratified by sequencing method, the DNA extraction kit manufacturer and the target gene affected the Firmicutes to Bacteroidetes ratio in at least two specimens analyzed by 16S rRNA sequencing. None of the following affected 16S rRNA sequencing results: use of magnetic beads during extraction, a shaking apparatus, a homogenizer, single versus multiple amplification steps, the primer set, the method used for DNA input quantification, the method used to determine the fragment size of input DNA, the PCR kit, the sequencing instrument manufacturer and mode, the library kit, or paired-end versus single-end reads and read length.
In contrast, WGS data showed an effect of DNA extraction protocol, DNA extraction kit manufacturer, and sequencing library kit on the Firmicutes to Bacteroidetes ratio. Interestingly, one of the two spiked-in bacteria (L. xyli) was not detected by 16S rRNA sequencing and was observed only at very low abundance by WGS. Analysis of the DNA mixtures on each platform by each laboratory revealed a high level of variability; importantly, results from the DNA mixtures did not reflect the true ratios of bacteria even when variability was low.
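The summary statistics behind most of the comparisons above (the Firmicutes to Bacteroidetes ratio and Inverse Simpson alpha diversity), together with a simple check of a measured DNA mixture against its known composition, can be sketched in Python as follows. This is a minimal illustration assuming relative abundances have already been assigned to taxa; the function names, strain labels, and all numbers are hypothetical placeholders, not values or methods from the paper. The log2 fold error is one common accuracy measure chosen here for illustration, not necessarily the one the authors used, and Inverse Simpson is applied to phylum-level toy values only for brevity.

```python
import math
from typing import Dict, Mapping, Sequence


def firmicutes_bacteroidetes_ratio(phyla: Mapping[str, float]) -> float:
    """Firmicutes-to-Bacteroidetes ratio from phylum-level relative abundances."""
    return phyla["Firmicutes"] / phyla["Bacteroidetes"]


def inverse_simpson(abundances: Sequence[float]) -> float:
    """Inverse Simpson index, 1 / sum(p_i^2), after renormalising abundances to proportions."""
    total = sum(abundances)
    return 1.0 / sum((a / total) ** 2 for a in abundances)


def log2_fold_error(
    observed: Mapping[str, float], expected: Mapping[str, float]
) -> Dict[str, float]:
    """Per-taxon log2(observed / expected); -inf flags an expected taxon that was not detected."""
    errors = {}
    for taxon, exp in expected.items():
        obs = observed.get(taxon, 0.0)
        errors[taxon] = math.log2(obs / exp) if obs > 0 else float("-inf")
    return errors


# Hypothetical phylum-level profiles of one specimen measured by each workflow.
by_16s = {"Firmicutes": 0.60, "Bacteroidetes": 0.30, "Other": 0.10}
by_wgs = {"Firmicutes": 0.25, "Bacteroidetes": 0.70, "Other": 0.05}
print(firmicutes_bacteroidetes_ratio(by_16s), firmicutes_bacteroidetes_ratio(by_wgs))
print(inverse_simpson(list(by_16s.values())), inverse_simpson(list(by_wgs.values())))

# Hypothetical equal-proportion DNA mixture of three strains and one lab's measured result.
expected_mix = {"strain_1": 1 / 3, "strain_2": 1 / 3, "strain_3": 1 / 3}
observed_mix = {"strain_1": 0.70, "strain_2": 0.30}
print(log2_fold_error(observed_mix, expected_mix))
```

A taxon missing from the measured profile, as L. xyli was in the 16S rRNA data, appears as -inf, which keeps non-detection distinct from a small but nonzero bias.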
Biospecimens
Preservative Types
- Frozen
Diagnoses:
- Normal
- Parkinson's Disease
Platform:
Analyte: Cell count/volume
Technology Platform: Next generation sequencing
Pre-analytical Factors:
Classification | Pre-analytical Factor | Value(s)
Analyte Extraction and Purification | Analyte isolation method | Effect of unspecified kits investigated; Effect of using magnetic beads investigated
Biospecimen Aliquots and Components | Biospecimen mixing | Effects of homogenization investigated; Effects of shaker apparatus investigated
Next generation sequencing Specific | Technology platform | Compared WGS and 16S rRNA sequencing; Compared unspecified library preparation kits; Compared paired-end versus single-end reads; Compared different read lengths
Next generation sequencing Specific | Nucleic acid amplification | Compared single versus multiple amplification steps; Compared use or omission of a primer set
Next generation sequencing Specific | Template/input amount | Investigated effects associated with unspecified quantification method