Quality control recommendations for RNASeq using FFPE samples based on pre-sequencing lab metrics and post-sequencing bioinformatics metrics.
Author(s): Liu Y, Bhagwate A, Winham SJ, Stephens MT, Harker BW, McDonough SJ, Stallings-Mann ML, Heinzen EP, Vierkant RA, Hoskin TL, Frost MH, Carter JM, Pfrender ME, Littlepage L, Radisky DC, Cunningham JM, Degnim AC, Wang C
Publication: BMC Med Genomics, 2022, Vol. 15, Page 195
PubMed ID: 36114500
Review Paper? No
Purpose of Paper
The purpose of this paper was to compare the performance of two library preparation methods (ribodepletion and exome capture) for next-generation RNA sequencing (RNAseq) and to use RNAseq results to identify input thresholds (RNA concentration and library concentration) for the successful RNAseq analysis of formalin-fixed, paraffin-embedded (FFPE) tissue.
Conclusion of Paper
The exome capture library preparation method outperformed the rRNA depletion (ribodepletion) method for paired frozen and FFPE breast specimens; although the exome capture method yielded fewer sequenced reads, it generated a higher proportion of reads that mapped to regions containing genes and exon-exon junctions and a higher number of canonical exon-exon junctions. Using three frozen specimens, the exome capture library preparation method also had fewer false positive calls and significantly higher SNP confirmation rates (based on whole exome sequencing (WES) results) than the ribodepletion method (p<2.2E-16). Based on gene expression profiles, all 5 FFPE specimens sequenced with the exome capture method clustered by individual with the corresponding frozen specimens sequenced by the TruSeq method with PolyA Selection (not by preservation/library preparation method), but only 2 of 5 samples clustered by individual when FFPE libraries were prepared with the rRNA depletion method.
The authors recommend a minimum RNA concentration of 25 ng/µl and a minimum library concentration of 1.7 ng/µl for RNAseq analysis using the exome capture method when studying FFPE tissue specimens. Based on trends observed between bioinformatic metrics and false positive rate (FPR), the following were identified by the authors as corresponding to a quality control (QC) failure: a median sample-wise correlation within the cohort of <0.75, <11,400 detectable genes, or <25 million reads mapped to gene regions. Samples that failed bioinformatic QC (i.e., did not meet all three criteria) had significantly lower library concentrations, as determined by Qubit, than those that passed (median 2.08 versus 5.82 ng/µl; p=2.8E-6) and lower RNA input concentrations (median 18.9 versus 40.8 ng/µl). As library concentration (determined by Qubit) increased, the local failure rate decreased, reaching 20% at a library concentration of 2-4 ng/µl. As RNA input concentration (determined by Qubit) increased, the local failure rate decreased, reaching 25% at an RNA concentration of 20-30 ng/µl, which is higher than the concentration recommended by the vendor for a DV200 value of 30-50% (≥4.7 ng/µl). The authors developed a decision tree model (based on the concentrations of RNA and library inputs) that resulted in an F score of 0.848 in predicting a QC status of “pass” when RNA input was ≥25 ng/µl and library input was ≥1.7 ng/µl.
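The two recommended input thresholds amount to a simple pre-sequencing rule, and the reported F score summarizes how well such a rule predicts QC status. The following is a minimal sketch, not the authors' model: the function names and the sample values are hypothetical, and it assumes the reported F score is the usual F1 (harmonic mean of precision and recall) for the "pass" class.

```python
# Thresholds reported in the summary above (Qubit concentrations, ng/µl).
RNA_MIN_NG_PER_UL = 25.0
LIBRARY_MIN_NG_PER_UL = 1.7


def predict_qc_pass(rna_ng_per_ul: float, library_ng_per_ul: float) -> bool:
    """Predict QC 'pass' only when both inputs meet the recommended minimums."""
    return (rna_ng_per_ul >= RNA_MIN_NG_PER_UL
            and library_ng_per_ul >= LIBRARY_MIN_NG_PER_UL)


def f1_score(predicted: list[bool], observed: list[bool]) -> float:
    """F1 for the 'pass' class (assumed definition of the reported F score)."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)        # predicted pass, did pass
    fp = sum(1 for p, o in pairs if p and not o)    # predicted pass, failed
    fn = sum(1 for p, o in pairs if not p and o)    # predicted fail, did pass
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical samples: (RNA ng/µl, library ng/µl, observed QC pass).
samples = [(40.8, 5.82, True), (18.9, 2.08, False),
           (30.0, 1.2, False), (26.0, 1.9, True)]
predicted = [predict_qc_pass(rna, lib) for rna, lib, _ in samples]
observed = [ok for _, _, ok in samples]
print(predicted, f1_score(predicted, observed))
```

Both conditions must hold for a predicted "pass", which mirrors the way the thresholds are stated; the actual tree structure used by the authors is not described in this summary.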
Studies
Study Purpose
The purpose of this paper was to compare the performance of two library preparation methods (ribodepletion and exome capture) for next-generation RNA sequencing (RNAseq) of paired formalin-fixed, paraffin-embedded (FFPE) and snap-frozen tissue and to use RNAseq results to identify input thresholds (RNA concentration and library concentration) for the successful analysis of FFPE tissue by RNAseq. Paired frozen and FFPE benign breast tumor specimens collected from seven women were used in the study; no further information on patient-related characteristics or specimen collection, preservation, or storage was provided. The concentration of total RNA (extraction method was not provided) was determined with a Qubit 2.0 Fluorometer and RNA HS Assay. RNA integrity was evaluated using DV200 values (the percentage of RNA fragments longer than 200 nucleotides) that were generated with a Bioanalyzer and an RNA 6000 Nano Kit. RNA exome libraries were prepared using 20 ng (frozen specimens) or 40-100 ng of total RNA (isolated from FFPE specimens or the FFPE control) and the TruSeq RNA Library Prep for Enrichment Kit (FFPE specimens) and the Illumina Exome Panel-Enrichment Oligos Kit (frozen specimens) according to the manufacturer’s instructions, with the exception that RNA from FFPE specimens was not fragmented. The concentration of libraries was quantified by Qubit; libraries were pooled based on pre-capture yield (200, 100, 50, 40, 20, 30 ng pools). rRNA depletion libraries were prepared using 20-100 ng of total RNA from FFPE or frozen specimens and the NEBNext rRNA Depletion Kit and Ultra II Directional RNA Library Prep Kit according to “the manufacturer’s protocol for highly degraded (RIN≤2) or intact (RIN>7) samples, respectively”. The quality of both exome capture and rRNA depleted libraries was assessed using the Qubit dsDNA HS Assay, the Bioanalyzer DNA 7500 Assay, and KAPA Library Quantification Kit.
Exome capture and rRNA depleted libraries were sequenced on an Illumina NextSeq 500 machine with a High Output flow cell at a coverage of ≥700 or 800 million reads. To assess the accuracy of single nucleotide polymorphisms (SNPs) identified by the two sequencing protocols, three frozen benign breast tumor specimens were sent to the Mayo Clinic for whole exome sequencing (WES), which entailed the generation of paired-end libraries from 1.0 µg of genomic DNA, whole-exome capture using 750 ng of the prepared library and the SureSelect Human All Exon v5+UTRs 75 Mb Kit, and sequencing with a HiSeq 3000/4000 PE Cluster Kit on a HiSeq 4000 machine. The bioinformatics quality metrics assessed included “sample-wise median correlation of gene expression”, the number of reads that were mapped to genes, and the “number of detectable genes with a transcript per million (TPM) larger than 4”.
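Two of the bioinformatics metrics named above can be illustrated with a short sketch. This is an assumption-laden example rather than the authors' pipeline: the function names are hypothetical, the correlation is taken to be Pearson on whatever expression values are supplied (the summary does not state the correlation type or any transformation), and each sample's value is the median of its correlations with every other sample in the cohort.

```python
from math import sqrt
from statistics import median


def count_detectable_genes(tpm_by_gene: dict[str, float], min_tpm: float = 4.0) -> int:
    """Count genes whose TPM is strictly larger than the cutoff (4 in the summary)."""
    return sum(1 for tpm in tpm_by_gene.values() if tpm > min_tpm)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def median_samplewise_correlation(expression: dict[str, list[float]]) -> dict[str, float]:
    """For each sample, the median correlation with all other samples.

    `expression` maps sample name to expression values in a shared gene order;
    at least two samples are required.
    """
    result = {}
    for name, profile in expression.items():
        others = [pearson(profile, other)
                  for other_name, other in expression.items()
                  if other_name != name]
        result[name] = median(others)
    return result
```

A sample whose profile disagrees with the rest of the cohort receives a low median correlation, which is why this metric can serve as a cohort-level outlier flag.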
Summary of Findings:
The exome capture library preparation method resulted in fewer sequenced reads but a higher proportion of reads mapped to regions containing genes and exon-exon junctions, and a higher number of canonical exon-exon junctions, than the ribodepletion method for both frozen and FFPE paired breast specimens. The exome capture library preparation method also had fewer false positive calls and significantly higher SNP confirmation rates (based on WES results obtained with 3 frozen specimens) than the ribodepletion method (p<2.2E-16). Based on gene expression profiles, all 5 FFPE specimens sequenced with the exome capture method clustered by individual with the corresponding frozen specimens sequenced by the TruSeq method with PolyA Selection (not by preservation/library preparation method), but only 2 of 5 samples clustered by individual when FFPE libraries were prepared with the rRNA depletion method.
Based on trends observed between bioinformatic metrics and false positive rate (FPR), the following were identified by the authors as corresponding to a quality control (QC) failure: a median sample-wise correlation within the cohort of <0.75, <11,400 detectable genes, or <25 million reads mapped to gene regions. Samples that failed bioinformatic QC (i.e., did not meet all three criteria) had significantly lower library concentrations than those that passed (median 2.08 versus 5.82 ng/µl; p=2.8E-6) and lower RNA input concentrations (median 18.9 versus 40.8 ng/µl). As library concentration (determined by Qubit) increased, the local failure rate decreased, reaching 20% at a library concentration of 2-4 ng/µl. As RNA input concentration increased, the local failure rate decreased, reaching 25% at an RNA concentration of 20-30 ng/µl, which is higher than the concentration recommended by the vendor for a DV200 value of 30-50% (≥4.7 ng/µl). The authors developed a decision tree model (based on the concentrations of RNA and library inputs) that resulted in an F score of 0.848 in predicting a QC status of “pass” when RNA input was ≥25 ng/µl and library input was ≥1.7 ng/µl. The authors recommend a minimum RNA concentration of 25 ng/µl and a minimum library concentration of 1.7 ng/µl for RNAseq analysis by the exome capture method when studying FFPE tissue specimens.
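The three post-sequencing criteria above can be written as a small check that reports which criteria a sample misses. This is an illustrative sketch with hypothetical function and constant names; it assumes, as the wording above suggests, that missing any one criterion is treated as a QC failure and that values exactly at a threshold pass.

```python
# Post-sequencing QC thresholds as stated in the summary above.
MIN_MEDIAN_CORRELATION = 0.75
MIN_DETECTABLE_GENES = 11_400
MIN_GENE_MAPPED_READS = 25_000_000


def qc_failures(median_correlation: float,
                detectable_genes: int,
                gene_mapped_reads: int) -> list[str]:
    """Return the criteria a sample misses; an empty list means QC pass."""
    reasons = []
    if median_correlation < MIN_MEDIAN_CORRELATION:
        reasons.append("median sample-wise correlation < 0.75")
    if detectable_genes < MIN_DETECTABLE_GENES:
        reasons.append("detectable genes < 11,400")
    if gene_mapped_reads < MIN_GENE_MAPPED_READS:
        reasons.append("reads mapped to gene regions < 25 million")
    return reasons


def passes_qc(median_correlation: float,
              detectable_genes: int,
              gene_mapped_reads: int) -> bool:
    """A sample passes only if it meets all three criteria."""
    return not qc_failures(median_correlation, detectable_genes, gene_mapped_reads)


# Hypothetical sample: good correlation and read depth, too few detectable genes.
print(qc_failures(0.82, 10_900, 31_000_000))
```

Returning the list of missed criteria, rather than a bare pass/fail flag, makes it easier to see which metric drives failures in a given cohort.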
Biospecimens
Preservative Types
- Frozen
- Formalin
Diagnoses:
- Neoplastic - Benign
Platform:
| Analyte | Technology Platform |
| --- | --- |
| RNA | Fluorometry |
| RNA | Next generation sequencing |
| RNA | Automated electrophoresis/Bioanalyzer |

Pre-analytical Factors:

| Classification | Pre-analytical Factor | Value(s) |
| --- | --- | --- |
| Biospecimen Preservation | Type of fixation/preservation | Formalin (buffered); Frozen |
| Next generation sequencing | Specific Template/input amount | <25 ng/µl (RNA input); ≥25 ng/µl (RNA input); <1.7 ng/µl (library input); ≥1.7 ng/µl (library input) |
| Next generation sequencing | Specific Technology platform | TruSeq RNA Exome (Illumina); NEBNext rRNA Depletion |