DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification.
Author(s): Chen L, Liu P, Evans TC Jr, Ettwiller LM
Publication: Science, 2017, Vol. 355, Page 752-756
PubMed ID: 28209900
Review Paper? No
Suggested by: Ian Fore, NCI
Purpose of Paper
This paper investigated whether the type of buffer used during DNA shearing and DNA repair can affect the detection of mutations by next-generation sequencing. It also assessed the prevalence of DNA damage in publicly available datasets and the impact of this damage on mutation discovery.
Conclusion of Paper
Shearing DNA in 10 mM Tris alone or in combination with EDTA resulted in fewer G-to-T variants than when the specimen was sheared in water alone or 1 mM Tris, but the difference was eliminated if DNA was repaired prior to sequencing. DNA repair eliminated 77-82% of G-to-T and C-to-A variants occurring at very low to low frequency (<1-5%). Although 195 G-to-T and C-to-A variants with a frequency of <5% were detected in the cancer panel using unrepaired DNA, only 12 were detected using repaired DNA, indicating that if DNA is not repaired prior to sequencing approximately one false mutation will be detected for each cancer gene investigated. Importantly, an excess of G-to-T variants was also observed in the 1000 Genomes and The Cancer Genome Atlas (TCGA) datasets. For 78% of TCGA sequencing reads from tumor specimens, more than 50% of the G-to-T variants detected were false positives, and the percentage of false positives was strongly correlated with DNA damage (r=0.79). Notably, the fraction of germline G-to-T variants was unaffected by DNA damage.
Studies
-
Study Purpose
This study investigated whether the type of buffer used during DNA shearing and DNA repair can affect the detection of mutations by next-generation sequencing. It also assessed the prevalence of DNA damage in publicly available datasets and the impact of this damage on mutation discovery. Purchased genomic DNA from a human liver tumor was buffered with different concentrations of Tris (pH 8.0), with or without EDTA, or left unbuffered in water (pH 5.8), then sheared to an average size of 200 bp. Libraries were constructed with and without DNA repair (NEBNext FFPE DNA Repair kit) using the NEBNext DNA Library Prep Master Mix Set for Illumina or the NEBNext Ultra II DNA Library Prep Kit for Illumina. Libraries enriched for the 151 genes represented on the ClearSeq Comprehensive Cancer panel were also sequenced. All constructed libraries were sequenced using an Illumina MiSeq sequencer. DNA damage was assessed by the Global Imbalance Value (GIV), which is the imbalance in mutations of a particular type between read 1 and read 2. DNA that generated a GIV greater than 1.5 (more than 1.5-fold more mutations in read 1 than in read 2) was considered damaged. GIV and detected mutations were also investigated in The Cancer Genome Atlas (TCGA) and 1000 Genomes datasets.
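The GIV criterion described above can be sketched as a simple ratio. This is a minimal illustration that assumes GIV is the plain ratio of variant counts of one type (for example G-to-T) in read 1 to those in read 2; any normalization the authors applied is not described in this summary. The function names and the example counts are hypothetical.

```python
DAMAGE_THRESHOLD = 1.5  # GIV cutoff for "damaged" DNA, as stated in the summary


def global_imbalance_value(read1_count: int, read2_count: int) -> float:
    """Ratio of variant counts of one type in read 1 versus read 2.

    Assumption: a plain count ratio; the paper may normalize differently.
    """
    if read1_count < 0 or read2_count < 0:
        raise ValueError("variant counts must be non-negative")
    if read2_count == 0:
        raise ValueError("read 2 count must be positive to form a ratio")
    return read1_count / read2_count


def is_damaged(giv: float, threshold: float = DAMAGE_THRESHOLD) -> bool:
    """A sample counts as damaged only when GIV is strictly above the threshold."""
    return giv > threshold


# Hypothetical counts for illustration only (not taken from the paper).
giv_g_to_t = global_imbalance_value(read1_count=300, read2_count=100)
print(giv_g_to_t, is_damaged(giv_g_to_t))
```

A GIV near 1 means the variant type appears equally often in both reads, which is what would be expected of true variants; only values above the threshold are flagged.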
Summary of Findings:
Shearing DNA in 10 mM Tris alone or in combination with EDTA resulted in fewer G-to-T mutations than when the specimen was sheared in water alone or 1 mM Tris, but the difference was eliminated if DNA was repaired prior to sequencing. DNA repair eliminated 77-82% of very low and low frequency G-to-T and C-to-A variants (<1% and 1-5%, respectively). When DNA enriched for the cancer panel was sequenced without repair, 195 G-to-T and C-to-A variants with a frequency of <5% were detected. In contrast, only 12 such variants were detected if DNA was repaired prior to sequencing, indicating that if DNA is not repaired prior to sequencing, approximately one false mutation will be detected for every cancer gene investigated.
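The "approximately one false mutation per gene" estimate follows from the counts reported here and the 151-gene panel size. The short calculation below is a sketch that assumes every variant that disappears after repair is a damage-induced false positive; the variable names are illustrative.

```python
GENES_ON_PANEL = 151       # ClearSeq Comprehensive Cancer panel size, per the summary
variants_unrepaired = 195  # G-to-T and C-to-A variants (<5%) without repair
variants_repaired = 12     # same variant class when DNA was repaired first

# Assumption: variants that vanish after repair are damage artifacts.
removed_by_repair = variants_unrepaired - variants_repaired
false_calls_per_gene = removed_by_repair / GENES_ON_PANEL

print(removed_by_repair)               # 183
print(round(false_calls_per_gene, 2))  # 1.21
```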
An excess of G-to-T variants was also observed in the 1000 Genomes and The Cancer Genome Atlas (TCGA) datasets. A G-to-T Global Imbalance Value (GIV) of > 2 was found in 73% of TCGA sequencing runs, indicating that the majority of these variants were a result of DNA damage. When a GIV threshold of 1.5 was applied, an A-to-T imbalance was found in 0.5% of TCGA sequencing runs and a C-to-T imbalance was found in 3% of TCGA sequencing runs. The authors conclude that stochastic damage is responsible for the G-to-T transversions, which suggests that such artifacts would be more prevalent among low allelic fractions. The occurrence of such artifacts could confound detection of mutations with a similarly low allelic frequency. Importantly, for 78% of TCGA sequencing reads from tumor specimens, more than 50% of the G-to-T variants detected were false positives, and the percentage of false positives was strongly correlated with DNA damage (r=0.79). Notably, the fraction of germline G-to-T variants, which occur at higher frequency, was unaffected by DNA damage. In a subset of TCGA specimens (~10%), a higher number of high-confidence somatic C-to-T mutations were false positives. TCGA reference files also displayed a 9% increase in G-to-T and C-to-A variants identified with high confidence in specimens with heavily damaged DNA in comparison to those with no damage or weak damage.
Biospecimens
- Tissue - Cervix
- Tissue - Esophagus
- Tissue - Bone
- Tissue - Lung
- Tissue - Liver
- Tissue - Testis
- Tissue - Uterus
- Tissue - Muscle (Skeletal)
- Tissue - Thymus Gland
- Tissue - Pancreas
Preservative Types
- Frozen
Diagnoses:
- Neoplastic - Carcinoma
- Neoplastic - Germ Cell
- Neoplastic - Not specified
- Neoplastic - Melanoma
- Neoplastic - Sarcoma
- Neoplastic - Benign
Platform:
Analyte: DNA
Technology Platform: Next generation sequencing
Pre-analytical Factors:
Classification: Analyte Extraction and Purification
Pre-analytical Factor: Rehydration of dried sample/specimen
Value(s):
- Water pH 5.6
- 1 mM Tris (pH 8.0)
- 1 mM Tris (pH 8.0) and 0.1 mM EDTA (0.1 X TE)
- 10 mM Tris (pH 8.0)
- 10 mM Tris (pH 8.0) and 1 mM EDTA (1 X TE)
- 10 mM Tris (pH 8.0) and 0.1 mM EDTA
Classification: Next generation sequencing
Pre-analytical Factor: Specific Template modification
Value(s):
- Repaired with NEBNext FFPE DNA Repair kit
- Not repaired
Classification: Next generation sequencing
Pre-analytical Factor: Specific Reaction solution
Value(s):
- Water pH 5.6
- 1 mM Tris (pH 8.0)
- 1 mM Tris (pH 8.0) and 0.1 mM EDTA (0.1 X TE)
- 10 mM Tris (pH 8.0)
- 10 mM Tris (pH 8.0) and 1 mM EDTA (1 X TE)
- 10 mM Tris (pH 8.0) and 0.1 mM EDTA