RNA-seq from archival FFPE breast cancer samples: molecular pathway fidelity and novel discovery.
Author(s): Pennock ND, Jindal S, Horton W, Sun D, Narasimhan J, Carbone L, Fei SS, Searles R, Harrington CA, Burchard J, Weinmann S, Schedin P, Xia Z
Publication: BMC Med Genomics, 2019, Vol. 12, Page 195
PubMed ID: 31856832
Review Paper? No
Purpose of Paper
This paper investigated the effects of number of sections used, block storage duration, epithelial tumor area, specimen procurement method (biopsy versus surgical resection), and library preparation method on RNA yield, RNA quality, and NGS success. Results using different NGS normalization strategies and data pipelines were also investigated.
Conclusion of Paper
In most cases, sufficient RNA for NGS was obtained from a single 10 µm section, and DV200 values were unaffected by input amount. Older blocks were less likely than more recently collected blocks to yield RNA with a DV200 suitable for RNA-seq (≥30%). The data also suggested that DV200 may differ by repository, a potential confounding factor. RNA yield was correlated with epithelial tumor area and was higher from resection than biopsy specimens, but DV200 values were unaffected.
A higher read count per gene and less variability, but fewer unique reads, were obtained when libraries were prepared with the TruSeq Kit rather than the Ovation Kit. A linear relationship between raw and transformed values was observed in data from both library preparation kits when normalized using Loess, VST, and Limma, but quantile and q-spline normalization distorted the data based on library preparation method. Using a soft-clip approach resulted in a larger increase in the percentage of aligned reads than increasing the number of permitted mismatches, indicating that the driver of unaligned reads in FFPE specimens was the quality at the end of reads. Specimens clustered first by preservation method and then by estrogen receptor (ER) status when data from FFPE ER+ and ER- cases were compared with database data for non-matched fresh specimens, but they largely clustered by ER status after application of the Oncotype Dx or PAM50 gene sets, though not the MammaPrint gene set. There was considerable overlap in genes and regulons differentially expressed by ER status in fresh and FFPE specimens. Genes unique to the FFPE dataset were also investigated and found to be related to UV response, Unfolded Protein Response, and other stress pathways.
Studies
Study Purpose
This study investigated the effects of number of sections used, block storage duration, epithelial tumor area, specimen procurement method (biopsy versus surgical resection), and library preparation method on RNA yield, RNA quality, and NGS success. Results using different NGS normalization strategies and data pipelines were also investigated. Fifty-eight archival FFPE invasive breast cancer specimens (resection and biopsy) stored for 2-23 years at room temperature at two different repositories were used for this study, but no details of specimen procurement or processing were provided. A pathologist reviewed H&E-stained sections for tumor content, and adjacent freshly cut 10 µm sections were chosen for RNA extraction. RNA was extracted from 1-4 sections using the miRNeasy FFPE kit. RNA yield and purity were determined spectrophotometrically and integrity (DV200) by Bioanalyzer. Sequencing libraries were prepared from 75 ng of RNA using the TruSeq RNA Access Library Prep Kit and from 150 ng of RNA using the Ovation Human FFPE RNAseq Library System, quantified using KAPA Library Quantification Kits, and then pooled and sequenced on a HiSeq 2500. Data were compared to those in public databases for 10 ER+ and 10 ER- cases. The effect of the number of sections was investigated using five archival breast cancer specimens.
Summary of Findings:
Sufficient RNA for NGS was obtained from a single 10 µm section in four of the five cases examined, and DV200 values were unaffected by input amount. Older blocks were less likely to yield RNA of a suitable DV200 for RNA-seq than those collected more recently: a DV200 ≥30% was observed in 13% of specimens stored for more than 11 years versus 82% of those stored ≤10 years. The data also suggested that DV200 may differ by repository, a potential confounding factor. RNA yield, but not DV200, was correlated with epithelial tumor area (r=0.52, P<0.001). RNA yield was significantly higher from 10 µm sections of surgical specimens than biopsy specimens but was sufficient for RNA-seq in all specimens with the exception of one of the five biopsy specimens.
In the two cases examined, a higher read count per gene but fewer unique reads were obtained when libraries were prepared with the TruSeq Kit rather than the Ovation Kit. Further, the data obtained using the TruSeq Kit lacked some gene families. A linear relationship between raw and transformed values was observed in data from both library preparation kits when normalized using Loess, VST, and Limma, but quantile and q-spline normalization distorted the data based on library preparation method. The authors chose VST for further use and found that reads per gene were comparable between the two library preparation methods after normalization, but specimens still grouped by library preparation method rather than by patient. Expression was less variable with the TruSeq than the Ovation platform (11.1% versus 23.2%), leading the authors to choose TruSeq for further analysis. Using a soft-clip approach resulted in a larger increase in the percentage of aligned reads than increasing the number of permitted mismatches, indicating that the driver of unaligned reads in FFPE specimens was the quality at the end of reads. Specimens clustered first by preservation method and then by ER status when data from FFPE ER+ and ER- cases were compared with database data for non-matched fresh specimens. When specimens were evaluated using the Oncotype Dx and PAM50 gene sets, they clustered largely by ER status, with one fresh specimen mis-clustering using the Oncotype Dx gene set. In contrast, specimens clustered first by preservation method and then by ER status when using the MammaPrint gene set, with two fresh specimens mis-clustering by ER status. Importantly, when lists of genes differentially expressed by ER status were compared between fresh and FFPE specimens, there was considerable overlap, with 278 upregulated and 324 downregulated genes common to both data sets, but three-fold more genes were differentially expressed in the fresh than in the FFPE data set.
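The storage-duration comparison above amounts to counting, within each age group, how many specimens meet the DV200 ≥30% cutoff. A minimal sketch of that tally follows; the specimen values, the function name, and the 10-year group boundary are hypothetical choices made for illustration and are not data or code from the paper.

```python
from collections import defaultdict

DV200_THRESHOLD = 30.0  # percent; the RNA-seq suitability cutoff quoted in the summary


def pass_rate_by_group(specimens, cutoff_years=10):
    """Return {group_label: fraction of specimens with DV200 >= threshold}.

    `specimens` is an iterable of (storage_years, dv200_percent) pairs.
    The grouping boundary `cutoff_years` is an assumption for this sketch.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [passing, total]
    for years, dv200 in specimens:
        group = f"<={cutoff_years} years" if years <= cutoff_years else f">{cutoff_years} years"
        counts[group][1] += 1
        if dv200 >= DV200_THRESHOLD:
            counts[group][0] += 1
    return {group: passing / total for group, (passing, total) in counts.items()}


# Hypothetical (storage_years, DV200 %) pairs, invented for illustration only.
toy_specimens = [(2, 55.0), (5, 41.0), (9, 28.0), (12, 33.0), (18, 19.0), (23, 12.0)]
rates = pass_rate_by_group(toy_specimens)
```

With real data, the same tally could be repeated per repository to check whether the apparent storage effect is confounded by where the blocks were kept.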
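One way to see why quantile normalization can behave differently from the other strategies is that it forces every sample onto the same reference distribution, so the mapping from raw to transformed values need not be linear. The sketch below is a generic pure-Python illustration of the technique, not the authors' pipeline; the count values and the names `kit_a` and `kit_b` are hypothetical.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length samples (lists of numbers).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical per-gene counts for the same four genes from two library kits.
kit_a = [10.0, 200.0, 35.0, 80.0]
kit_b = [4.0, 30.0, 90.0, 12.0]
norm_a, norm_b = quantile_normalize([kit_a, kit_b])
```

After normalization the two toy samples contain exactly the same set of values and differ only in which gene receives which value, so any kit-specific difference in the shape of the count distribution is removed rather than preserved.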
Genes unique to the FFPE dataset were also investigated and found to be related to UV response, Unfolded Protein Response, and other stress pathways. When regulons were compared, 24 of the 30 most differentially expressed regulons between ER+ and ER- specimens were common to fresh and FFPE specimens and these were strongly correlated (ccc=0.861).
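The summary does not define "ccc"; assuming it denotes Lin's concordance correlation coefficient, the statistic measures agreement between paired values by penalizing both scatter and any shift in mean or scale. A minimal sketch under that assumption is given below; the function name and the example scores are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired observations.

    Uses population (divide-by-n) variances and covariance:
        ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical regulon activity differences (ER+ minus ER-) in two data sets.
fresh_scores = [1.0, 2.0, 3.0]
ffpe_scores = [2.0, 3.0, 4.0]
ccc = concordance_correlation(fresh_scores, ffpe_scores)
```

In this toy case the two series are perfectly correlated but offset by one unit, so the coefficient falls below 1, which is the property that distinguishes it from a Pearson correlation.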
Biospecimens
Preservative Types
- Formalin
Diagnoses:
- Neoplastic - Carcinoma
Platform:
Analyte | Technology Platform
RNA | Next generation sequencing
RNA | Spectrophotometry
RNA | Automated electrophoresis/Bioanalyzer
Pre-analytical Factors:
Classification | Pre-analytical Factor | Value(s)
Biospecimen Acquisition | Method of tissue acquisition | Biopsy; Surgical resection
Storage | Storage duration | 2-23 years
Next generation sequencing | Specific Template modification | TruSeq RNA Access Library Prep Kit; Ovation Human FFPE RNAseq Library System
Biospecimen Aliquots and Components | Aliquot size/volume | One 10 µm section; Two 10 µm sections; Three 10 µm sections; Four 10 µm sections
Next generation sequencing | Specific Data handling | Loess normalization; VST normalization; Limma normalization; Quantile normalization; Q-spline normalization; Oncotype Dx gene set; PAM50 gene set; MammaPrint gene set; Permissive mismatch; Soft-clipped
Biospecimen Preservation | Type of fixation/preservation | None (fresh); Formalin (buffered)