NIH, National Cancer Institute, Division of Cancer Treatment and Diagnosis (DCTD)

Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation.

Author(s): Costello M, Pugh TJ, Fennell TJ, Stewart C, Lichtenstein L, Meldrim JC, Fostel JL, Friedrich DC, Perrin D, Dionne D, Kim S, Gabriel SB, Lander ES, Fisher S, Getz G

Publication: Nucleic Acids Res, 2013, Vol. 41, Page e67

PubMed ID: 23303777

Review Paper? No

Suggested by: Ian Fore, NCI


Purpose of Paper

This paper sought to identify the source and cause of next-generation sequencing artifacts and to evaluate methods for eliminating them before and after sequencing.

Conclusion of Paper

Deep coverage sequencing revealed a high number of G>T mutations in the first read and C>A mutations in the second read, occurring at a fraction of <20% in both tumor and normal specimens from melanoma patients. When more than 50,000 libraries sequenced by the Broad Institute between 2009 and 2012 were investigated, the frequency of the artifact was highly variable between projects but increased over time; it was predominantly found in targeted capture specimens and was absent in whole genome samples. The authors determined that the artifact was due to the oxidation of DNA during shearing and could be reduced by shearing to 500 bp rather than 150 bp, by including a buffer exchange step using Ampure XP SPRI beads (P<0.05), or by adding 1 mM EDTA to the Tris HCl (pH 8.0) buffer prior to shearing. For existing data, the authors developed a bioinformatics tool that removed the artifactual changes, although an estimated 1.4% of true mutations were also removed.

Studies

  1. Study Purpose

    This study sought to identify the source of next-generation sequencing artifacts as well as methods to eliminate them pre- and post-sequencing. Specifically, oxidation during DNA shearing was identified as the source of a specific next-generation sequencing data artifact. The authors then investigated whether the artifact could be removed before sequencing, by buffer exchange or by adding chelators to the shearing buffer, or computationally after sequencing. Two hundred and twenty-one matched melanoma and normal specimens were used for deep coverage exome sequencing using the Illumina HiSeq 2000 v2 chemistry and sequencer. Details relating to specimen collection, processing and preservation were not specified. The presence of an artifact was determined by calculating the ArtQ score as -10 x log10((consistent errors - inconsistent errors)/all observations). Damaged DNA was defined as having an ArtQ score of <30, which corresponds to more than 1 in every 1000 observed bases being attributable to the artifact. ArtQ scores were calculated for all projects sequenced at the Broad Institute between 2009 and 2012. Two specimens with an ArtQ <30 and one with an ArtQ >30 were also sequenced using the Illumina HiSeq V3, MiSeq and Ion Torrent PGM chemistries and sequencers. Potential effects of shearing (where mean fragment sizes of 150 bp and 500 bp were compared) were investigated using six specimens with an ArtQ <30. Potential effects of DNA oxidation were investigated using an 8-oxoG ELISA and DNA from six specimens with an ArtQ <30. Potential effects of buffer exchange and buffer composition were investigated using DNA from three specimens with an initial ArtQ <30.
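
    The ArtQ calculation described above can be sketched in a few lines of Python. This is only an illustration of the formula as summarized here, not the authors' code: the function names, the reading of "consistent" and "inconsistent" errors as error counts in the artifact orientation and the opposite orientation, and the small floor that keeps the logarithm defined are all assumptions of this sketch.

```python
import math


def art_q(consistent_errors: int, inconsistent_errors: int,
          all_observations: int, floor: float = 1e-10) -> float:
    """ArtQ = -10 * log10((consistent - inconsistent) / all observations).

    `floor` is an assumption of this sketch: it keeps log10 defined when
    the inconsistent errors equal or outnumber the consistent ones.
    """
    if all_observations <= 0:
        raise ValueError("all_observations must be positive")
    excess_rate = (consistent_errors - inconsistent_errors) / all_observations
    return -10.0 * math.log10(max(excess_rate, floor))


def is_damaged(score: float, threshold: float = 30.0) -> bool:
    """Damaged DNA is defined in the summary as an ArtQ score below 30."""
    return score < threshold


if __name__ == "__main__":
    # Hypothetical counts: 2,000 excess artifact-consistent errors per
    # 1,000,000 observed bases, i.e. 2 in every 1000 bases.
    score = art_q(2100, 100, 1_000_000)
    print(round(score, 2), is_damaged(score))
```

    An excess error rate of exactly 1 in 1000 gives a score of about 30, so scores below 30 correspond to more than 1 artifact-attributable base per 1000, which matches the damage threshold used in the study.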

    Summary of Findings:

    Deep coverage sequencing revealed a high number of variants occurring at a fraction of <20%. Instead of the expected dominance of C-to-T (C>T) transitions in tumor specimens only, which would be indicative of ultraviolet damage, C>A and G>T transversion mutations were dominant and occurred at similar rates in tumor and normal specimens. Importantly, there was also a strand bias, with G>T mutations occurring in the first read and C>A mutations in the second read. The authors report that similar mutations at low allele frequency were identified in data from neuroblastoma and chronic lymphocytic leukemia. When more than 50,000 libraries sequenced by the Broad Institute between 2009 and 2012 were investigated, the presence of the artifact was highly variable between projects but increased over time; it was predominantly found in targeted capture specimens and was absent in whole genome samples. The artifact remained when specimens were resequenced using different chemistries and sequencers (HiSeq V2, HiSeq V3, MiSeq, and Ion Torrent) and when samples were sequenced pre- and post-enrichment. When DNA from specimens with low ArtQ scores was resequenced using libraries constructed from DNA sheared to 150 bp or 500 bp, mean ArtQ scores were significantly lower among samples with DNA sheared to 150 bp than among those sheared to 500 bp (27 versus 35, P<0.05). Effects were not observed in all specimens subjected to this shearing protocol, even among those on the same 96-well plate, but those affected did cluster by DNA collection site. ELISA identified a significantly higher level of 8-oxoG in specimens with the artifact when DNA was sheared to 150 bp than when it was sheared to 500 bp, or than in unaffected specimens regardless of shearing (P<0.05), indicating that the effect is due to DNA oxidation. The effect of shearing could be decreased by subjecting the specimen to buffer exchange using Ampure XP SPRI beads (P<0.05). Addition of 0.1 mM DFAM and/or 1 mM EDTA, but not 0.1 mM BHT, to the Tris HCl (pH 8.0) buffer prior to shearing significantly reduced the frequency of the artifact, but addition of DFAM also resulted in lower library yields, making EDTA the preferred chelator. For specimens that cannot be reprocessed, the authors developed a bioinformatics tool that identified and removed the artifact, although an estimated 1.4% of true mutations were also removed.
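
    The summary does not describe how the authors' bioinformatics tool works. The following Python sketch only illustrates how the read-orientation bias reported above (G>T in the first read, C>A in the second read, at a fraction of <20%) could in principle be used to flag a suspicious call. It is not the authors' method: the function names, the input format and the 0.9 orientation cutoff are hypothetical.

```python
from typing import Iterable, Tuple

# One alt-supporting read: (read number in the pair, reference base,
# alternate base), with bases given as seen on that read.
AltRead = Tuple[int, str, str]

# Orientation reported for the artifact: G>T in read 1, C>A in read 2.
ARTIFACT_ORIENTATION = {(1, "G", "T"), (2, "C", "A")}
# The same changes seen in the opposite read of the pair.
OPPOSITE_ORIENTATION = {(1, "C", "A"), (2, "G", "T")}


def orientation_fraction(alt_reads: Iterable[AltRead]) -> float:
    """Fraction of G>T / C>A alt reads that sit in the artifact orientation."""
    consistent = inconsistent = 0
    for read in alt_reads:
        if read in ARTIFACT_ORIENTATION:
            consistent += 1
        elif read in OPPOSITE_ORIENTATION:
            inconsistent += 1
    total = consistent + inconsistent
    return consistent / total if total else 0.0


def looks_like_artifact(alt_reads: Iterable[AltRead], allele_fraction: float,
                        max_allele_fraction: float = 0.20,
                        min_orientation_fraction: float = 0.9) -> bool:
    """Flag low-fraction calls whose support is almost all in one orientation.

    The 0.20 cutoff mirrors the <20% fraction reported in the summary;
    the 0.9 orientation cutoff is a hypothetical value for illustration.
    """
    return (allele_fraction < max_allele_fraction
            and orientation_fraction(alt_reads) >= min_orientation_fraction)


if __name__ == "__main__":
    biased = [(1, "G", "T")] * 6 + [(2, "C", "A")] * 4
    balanced = [(1, "G", "T")] * 5 + [(2, "G", "T")] * 5
    print(looks_like_artifact(biased, 0.05))
    print(looks_like_artifact(balanced, 0.05))
```

    A true G>T or C>A mutation would be expected to appear in both orientations, which is why balanced support is not flagged in this sketch. Any filter of this kind can also discard real mutations, consistent with the estimated 1.4% loss of true mutations reported for the authors' tool.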

    Biospecimens
    Preservative Types
    • Frozen
    Diagnoses:
    • Neoplastic - Melanoma
    • Neoplastic - Normal Adjacent
    • Not specified
    • Neoplastic - Leukemia
    Platform:
    Analyte    Technology Platform
    DNA        Next generation sequencing
    DNA        ELISA
    Pre-analytical Factors:
    Classification: Analyte Extraction and Purification
    Pre-analytical Factor: Analyte isolation method
    Value(s):
    • Different unspecified extraction methods used at different sites
    Classification: Next generation sequencing
    Pre-analytical Factor: Specific Technology platform
    Value(s):
    • HiSeq V2
    • HiSeq V3
    • MiSeq
    • Ion Torrent
    Classification: Next generation sequencing
    Pre-analytical Factor: Specific Template modification
    Value(s):
    • Exome enriched
    • Not enriched
    Classification: Next generation sequencing
    Pre-analytical Factor: Specific Reaction solution
    Value(s):
    • Tris HCl (pH 8.0)
    • Tris HCl (pH 8.0) and 0.1 mM DFAM
    • Tris HCl (pH 8.0) and 0.1 mM EDTA
    • Tris HCl (pH 8.0), 1 mM EDTA and 0.1 mM BHT
    • Tris HCl (pH 8.0), 1 mM EDTA and 0.1 mM DFAM
    • Tris HCl (pH 8.0), 1 mM EDTA, 0.1 mM DFAM and 0.1 mM BHT
    • Tris HCl (pH 8.0) and 0.1 mM BHT
