NIH, National Cancer Institute, Division of Cancer Treatment and Diagnosis (DCTD)

Variability and bias in microbiome metagenomic sequencing: an interlaboratory study comparing experimental protocols.

Author(s): Forry SP, Servetas SL, Kralj JG, Soh K, Hadjithomas M, Cano R, Carlin M, Amorim MG, Auch B, Bakker MG, Bartelli TF, Bustamante JP, Cassol I, Chalita M, Dias-Neto E, Duca AD, Gohl DM, Kazantseva J, Haruna MT, Menzel P, Moda BS, Neuberger-Castillo L, Nunes DN, Patel IR, Peralta RD, Saliou A, Schwarzer R, Sevilla S, Takenaka IKTM, Wang JR, Knight R, Gevers D, Jackson SA

Publication: Sci Rep, 2024, Vol. 14, Page 9785

PubMed ID: 38684791

Review Paper? No

Purpose of Paper

This paper compared metagenomic data from five fecal specimens that were analyzed by 44 different laboratories, each using their own 16S rRNA or whole genome shotgun (WGS) sequencing workflow. 

Conclusion of Paper

Of the 44 participating laboratories, 30 conducted 16S rRNA sequencing and 14 performed WGS. On average, the read count was 10-fold higher with the WGS workflow than with the 16S rRNA workflow. Bray–Curtis principal coordinate analysis (PCoA) of the 16S and WGS datasets clustered specimens primarily by patient rather than by laboratory. The calculated ratio of Firmicutes to Bacteroidetes was highly variable between laboratories and specimens. When platforms were compared, 16S rRNA sequencing showed a higher ratio of Firmicutes to Bacteroidetes than WGS, with 16S rRNA sequencing frequently identifying relatively more Firmicutes than Bacteroidetes and WGS identifying the opposite in the same specimens. Importantly, the relative abundances of Blautia, Faecalibacterium, Lachnospiraceae, and Ruminococcus were all higher than that of Bacteroidetes when identified by 16S rRNA sequencing, but all were lower than that of Bacteroidetes by WGS. The calculated Inverse Simpson alpha diversity was also higher in all five specimens when analysis was by 16S rRNA sequencing rather than WGS. When the data were stratified by sequencing method, the manufacturer of the DNA extraction kit and the target gene affected the Firmicutes to Bacteroidetes ratio when samples were analyzed by 16S rRNA sequencing, whereas the DNA extraction protocol, the DNA extraction kit manufacturer, and the sequencing library kit affected the Firmicutes to Bacteroidetes ratio when samples were analyzed by WGS. Interestingly, one of the two spiked-in bacteria (L. xyli) was not found by 16S rRNA sequencing and was only observed at very low abundance by WGS.
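The two summary statistics named above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names and the toy read counts are hypothetical, and the Inverse Simpson index is computed here on phylum-level counts only to keep the example short.

```python
def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum(p_i^2) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def firmicutes_bacteroidetes_ratio(phylum_counts):
    """Ratio of Firmicutes reads to Bacteroidetes reads in one profile."""
    bacteroidetes = phylum_counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("ratio is undefined when Bacteroidetes count is zero")
    return phylum_counts.get("Firmicutes", 0) / bacteroidetes


# Hypothetical phylum-level read counts for one specimen (illustrative only).
profile_16s = {"Firmicutes": 600, "Bacteroidetes": 300, "Proteobacteria": 100}
profile_wgs = {"Firmicutes": 300, "Bacteroidetes": 600, "Proteobacteria": 100}

ratio_16s = firmicutes_bacteroidetes_ratio(profile_16s)  # 2.0
ratio_wgs = firmicutes_bacteroidetes_ratio(profile_wgs)  # 0.5
diversity_16s = inverse_simpson(list(profile_16s.values()))
```

In this toy case the same specimen yields a ratio above 1 in one profile and below 1 in the other, which mirrors the kind of platform disagreement the paper describes.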

Studies

  1. Study Purpose

    This study compared metagenomic data from five fecal specimens that were analyzed by 44 different laboratories, each using their own 16S rRNA or whole genome shotgun (WGS) sequencing workflow. Multiple fecal specimens were collected from donors as part of the MOSAIC project and shipped overnight to the laboratory. Specimens were frozen at -80°C, homogenized with dry ice, preserved with OMNIgene Stabilizing Solution, aliquoted, and stored at -80°C. Fecal specimens were spiked with Aliivibrio fischeri and Leifsonia xyli. Five fecal specimens were selected for the study based on their microbial composition, including four from healthy donors and one from a patient with Parkinson’s disease. To evaluate homogeneity, ten aliquots of feces from each of these donors underwent rRNA amplicon sequencing and shotgun metagenomic sequencing at COSMOS ID using proprietary protocols. Additionally, two DNA mixtures were prepared from 13 ATCC strains. Aliquots of each fecal specimen as well as the two DNA mixtures were sent to 44 different laboratories, where metagenomic profiling was conducted using laboratory-specific protocols.

    Summary of Findings:

    Of the 44 participating laboratories, 30 conducted 16S rRNA sequencing and 14 performed WGS. On average, the read count was 10-fold higher with the WGS workflow than with the 16S rRNA workflow (>10^6 vs ≥10^5), but the authors reported variability between laboratories. Bray–Curtis PCoA of the 16S and WGS datasets clustered specimens primarily by patient rather than by laboratory; however, at one laboratory, three of the five specimens appeared to have been misidentified. To avoid confusion, all data from the laboratory with possible specimen misidentification were excluded from further analysis. The calculated ratio of Firmicutes to Bacteroidetes was highly variable between laboratories and specimens. When platforms were compared, 16S rRNA sequencing showed a higher ratio of Firmicutes to Bacteroidetes than WGS, with 16S rRNA sequencing frequently identifying relatively more Firmicutes than Bacteroidetes and WGS identifying the opposite in the same specimens. Importantly, the relative abundances of Blautia, Faecalibacterium, Lachnospiraceae, and Ruminococcus were all higher than that of Bacteroidetes by 16S rRNA sequencing, but all were lower than that of Bacteroidetes by WGS. The calculated Inverse Simpson alpha diversity was also higher in all five specimens when analysis was by 16S rRNA sequencing rather than WGS. When the data were stratified by sequencing method, the manufacturer of the DNA extraction kit and the target gene affected the Firmicutes to Bacteroidetes ratio in at least two specimens when analyzed by 16S rRNA sequencing. In contrast, 16S rRNA sequencing results were not affected by use of magnetic beads during extraction, a shaking apparatus, a homogenizer, a single versus multiple amplification step, the primer set, the method used for DNA input quantification, the method used to identify the fragment size of input DNA, the PCR kit, the sequencing machine manufacturer and mode, library kit choice, paired-end versus single read, or read length. WGS data, by comparison, showed an effect of DNA extraction protocol, DNA extraction kit manufacturer, and sequencing library kit on the Firmicutes to Bacteroidetes ratio. Interestingly, one of the two spiked-in bacteria (L. xyli) was not identified by 16S rRNA sequencing and was only observed at very low abundance by WGS. Analysis of the DNA mixtures with each platform by each laboratory revealed a high level of variability; importantly, results from the DNA mixtures did not reflect the true ratios of bacteria even when variability was low.
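As a rough illustration of the dissimilarity measure underlying the PCoA described above, the sketch below computes Bray–Curtis dissimilarity between two taxonomic profiles after converting read counts to relative abundances, so that differences in sequencing depth do not dominate the comparison. The function names and toy counts are hypothetical, and this is not the authors' analysis code.

```python
def relative_abundance(counts):
    """Convert a {taxon: read count} profile to {taxon: fraction of reads}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile must contain at least one read")
    return {taxon: c / total for taxon, c in counts.items()}


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i) over all taxa."""
    taxa = set(profile_a) | set(profile_b)
    numerator = sum(abs(profile_a.get(t, 0) - profile_b.get(t, 0)) for t in taxa)
    denominator = sum(profile_a.get(t, 0) + profile_b.get(t, 0) for t in taxa)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Hypothetical read counts for the same specimen from two laboratories;
# the second laboratory sequenced ten times deeper.
lab_1 = {"Firmicutes": 60, "Bacteroidetes": 30, "Proteobacteria": 10}
lab_2 = {"Firmicutes": 300, "Bacteroidetes": 600, "Proteobacteria": 100}

dissimilarity = bray_curtis(relative_abundance(lab_1), relative_abundance(lab_2))
```

A value of 0 means identical relative abundances and 1 means no shared taxa; a PCoA then embeds the matrix of pairwise dissimilarities in a few dimensions for plotting.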

    Biospecimens
    Preservative Types
    • Frozen
    Diagnoses:
    • Normal
    • Parkinson's Disease
    Platform:
    Analyte | Technology Platform
    Cell count/volume | Next generation sequencing
    Pre-analytical Factors:
    Classification | Pre-analytical Factor | Value(s)
    Analyte Extraction and Purification | Analyte isolation method | Effect of unspecified kits investigated; Effect of using magnetic beads investigated
    Biospecimen Aliquots and Components | Biospecimen mixing | Effects of homogenization investigated; Effects of shaker apparatus investigated
    Next generation sequencing | Specific Technology platform | Compared WGS and 16S rRNA sequencing; Compared unspecified library preparation kits; Compared paired-end versus single read; Compared different read lengths
    Next generation sequencing | Specific Nucleic acid amplification | Compared single versus multiple amplification steps; Compared use or omission of a primer set
    Next generation sequencing | Specific Template/input amount | Investigated effects associated with unspecified quantification method
