NIH, National Cancer Institute, Division of Cancer Treatment and Diagnosis (DCTD)

DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification.

Author(s): Chen L, Liu P, Evans TC Jr, Ettwiller LM

Publication: Science, 2017, Vol. 355, Page 752-756

PubMed ID: 28209900

Review Paper? No

Suggested by: Ian Fore, NCI


Purpose of Paper

This paper investigated whether the type of buffer used during DNA shearing, and whether DNA was repaired, can affect the detection of mutations by next-generation sequencing. It also assessed the prevalence of DNA damage in publicly available datasets and the impact of this damage on mutation discovery.

 

Conclusion of Paper

Shearing DNA in 10 mM Tris alone or in combination with EDTA resulted in fewer G-to-T variants than when the specimen was sheared in water alone or 1 mM Tris, but the difference was eliminated if DNA was repaired prior to sequencing. DNA repair eliminated 77-82% of G-to-T and C-to-A variants occurring at very low (<1%) to low (1-5%) frequency. Although 195 G-to-T and C-to-A variants with a frequency of <5% were detected in the cancer panel using unrepaired DNA, only 12 were detected using repaired DNA, indicating that, if DNA is not repaired prior to sequencing, approximately one false mutation will be detected for each cancer gene investigated. Importantly, an excess of G-to-T variants was also observed in the 1000 Genomes and The Cancer Genome Atlas (TCGA) datasets. For 78% of TCGA sequencing reads from tumor specimens, more than 50% of the G-to-T variants detected were false positives, and the percentage of false positives was strongly correlated with DNA damage (r=0.79). Notably, the fraction of germline G-to-T variants was unaffected by DNA damage.

Studies

  1. Study Purpose

    This study investigated whether the type of buffer used during DNA shearing, and whether DNA was repaired, can affect the detection of mutations by next-generation sequencing. It also assessed the prevalence of DNA damage in publicly available datasets and the impact of this damage on mutation discovery. Purchased genomic DNA from human liver tumor was buffered with different concentrations of Tris (pH 8.0) with or without EDTA, or left unbuffered in water (pH 5.8), then sheared to an average size of 200 bp. Libraries were constructed with and without DNA repair (NEBNext FFPE DNA Repair kit) using the NEBNext DNA Library Prep Master Mix Set for Illumina or the NEBNext Ultra II DNA Library Prep Kit for Illumina. Libraries enriched for the 151 genes represented on the ClearSeq Comprehensive Cancer panel were also sequenced. All constructed libraries were sequenced using an Illumina MiSeq sequencer. DNA damage was assessed by Global Imbalance Value (GIV), which is the imbalance in mutations of a particular type between read 1 and read 2. DNA that generated a GIV of greater than 1.5 (1.5-fold more mutations in read 1 than in read 2) was considered damaged. GIV and detected mutations were also investigated in The Cancer Genome Atlas (TCGA) and 1000 Genomes datasets.
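The GIV threshold described above can be illustrated with a minimal sketch. It assumes GIV is a simple ratio of the number of variants of one type (for example G-to-T) observed in read 1 to the number observed in read 2, as the "1.5-fold" wording suggests; the function names and the example counts are hypothetical and are not taken from the paper.

```python
def global_imbalance_value(read1_count: int, read2_count: int) -> float:
    """Ratio of variants of one type (e.g. G-to-T) in read 1 versus read 2.

    Assumption: GIV is treated here as a plain count ratio.
    """
    if read2_count <= 0:
        raise ValueError("read 2 count must be positive to form a ratio")
    return read1_count / read2_count


def is_damaged(giv: float, threshold: float = 1.5) -> bool:
    """Flag a library as damaged when GIV exceeds the threshold (1.5 in the summary)."""
    return giv > threshold


# Hypothetical counts, for illustration only.
giv = global_imbalance_value(read1_count=300, read2_count=100)
print(giv, is_damaged(giv))
```

A ratio near 1 means the variant type is seen equally often in both reads, which is what a true variant is expected to produce; damage-induced errors are expected to skew the ratio.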

    Summary of Findings:

    Shearing DNA in 10 mM Tris alone or in combination with EDTA resulted in fewer G-to-T mutations than when the specimen was sheared in water alone or 1 mM Tris, but the difference was eliminated if DNA was repaired prior to sequencing. DNA repair eliminated 77-82% of very low and low frequency G-to-T and C-to-A variants (<1% and 1-5%, respectively). When DNA enriched for the cancer panel was sequenced without repair, 195 G-to-T and C-to-A variants with a frequency of <5% were detected. In contrast, only 12 such variants were detected if DNA was repaired prior to sequencing, indicating that, if DNA is not repaired prior to sequencing, approximately one false mutation will be detected for every cancer gene investigated.
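The "approximately one false mutation per gene" estimate can be checked with simple arithmetic. The sketch below uses only figures stated in this summary (195 variants without repair, 12 with repair, 151 genes on the panel) and assumes that variants absent after repair are damage artifacts; the variable names are illustrative.

```python
unrepaired_variants = 195  # G-to-T and C-to-A variants (<5% frequency), no repair
repaired_variants = 12     # same variant types after DNA repair
panel_genes = 151          # genes on the ClearSeq Comprehensive Cancer panel

# Variants that disappear after repair are treated as damage artifacts (assumption).
artifact_variants = unrepaired_variants - repaired_variants

# Average number of artifact variants per gene on the panel.
artifacts_per_gene = artifact_variants / panel_genes

# Share of the unrepaired calls that were removed by repair.
fraction_removed = artifact_variants / unrepaired_variants

print(artifact_variants)             # 183
print(round(artifacts_per_gene, 2))  # 1.21
print(round(fraction_removed, 2))    # 0.94
```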

    An excess of G-to-T variants was also observed in the 1000 Genomes and The Cancer Genome Atlas (TCGA) datasets. A G-to-T Global Imbalance Value (GIV) of > 2 was found in 73% of TCGA sequencing runs, indicating that the majority of the G-to-T variants were a result of DNA damage. When a GIV threshold of 1.5 was applied, an A-to-T imbalance was found in 0.5% of TCGA sequencing runs and a C-to-T imbalance was found in 3% of TCGA sequencing runs. The authors conclude that stochastic damage is responsible for the G-to-T transversions, which suggests that such artifacts would be more prevalent among low allelic fractions. The occurrence of such artifacts could confound detection of true mutations with a similarly low allelic frequency. Importantly, for 78% of TCGA sequencing reads from tumor specimens, more than 50% of the G-to-T variants detected were false positives, and the percentage of false positives was strongly correlated with DNA damage (r=0.79). Notably, the fraction of germline G-to-T variants, which occur at higher frequency, was unaffected by DNA damage. In a subset of TCGA specimens (~10%), a higher number of high-confidence somatic C-to-T mutations were false positives. TCGA reference files also displayed a 9% increase in G-to-T and C-to-A variants identified with high confidence in specimens with heavily damaged DNA in comparison to those with no damage or weak damage.

    Biospecimens
    Preservative Types:
    • Frozen
    Diagnoses:
    • Neoplastic - Carcinoma
    • Neoplastic - Germ Cell
    • Neoplastic - Not specified
    • Neoplastic - Melanoma
    • Neoplastic - Sarcoma
    • Neoplastic - Benign
    Platform:
    Analyte: DNA
    Technology Platform: Next generation sequencing
    Pre-analytical Factors:
    Classification: Analyte Extraction and Purification
    Pre-analytical Factor: Rehydration of dried sample/specimen
    Value(s):
    • Water pH 5.6
    • 1 mM Tris (pH 8.0)
    • 1 mM Tris (pH 8.0) and 0.1 mM EDTA (0.1 X TE)
    • 10 mM Tris (pH 8.0)
    • 10 mM Tris (pH 8.0) and 1 mM EDTA (1 X TE)
    • 10 mM Tris (pH 8.0) and 0.1 mM EDTA
    Classification: Next generation sequencing
    Pre-analytical Factor: Specific Template modification
    Value(s):
    • Repaired with NEBNext FFPE DNA Repair kit
    • Not repaired
    Classification: Next generation sequencing
    Pre-analytical Factor: Specific Reaction solution
    Value(s):
    • Water pH 5.6
    • 1 mM Tris (pH 8.0)
    • 1 mM Tris (pH 8.0) and 0.1 mM EDTA (0.1 X TE)
    • 10 mM Tris (pH 8.0)
    • 10 mM Tris (pH 8.0) and 1 mM EDTA (1 X TE)
    • 10 mM Tris (pH 8.0) and 0.1 mM EDTA
