RNA-seq from archival FFPE breast cancer samples: molecular pathway fidelity and novel discovery.

Author(s): Pennock ND, Jindal S, Horton W, Sun D, Narasimhan J, Carbone L, Fei SS, Searles R, Harrington CA, Burchard J, Weinmann S, Schedin P, Xia Z

Publication: BMC Med Genomics, 2019, Vol. 12, Page 195

PubMed ID: 31856832

Review Paper? No

Purpose of Paper

This paper investigated the effects of number of sections used, block storage duration, epithelial tumor area, specimen procurement method (biopsy versus surgical resection), and library preparation method on RNA yield, RNA quality, and NGS success. Results using different NGS normalization strategies and data pipelines were also investigated.

Conclusion of Paper

In most cases, sufficient RNA for NGS was obtained from a single 10 µm section and DV200 values were unaffected by input amount. Older blocks were less likely than more recently collected blocks to yield RNA with a DV200 suitable for RNA-seq (≥30%). The data suggested a possible effect of repository on DV200, pointing to potential confounding factors. RNA yield was correlated with epithelial tumor area and was higher from resection than biopsy specimens, but DV200 values were unaffected by either factor.

A higher read count per gene and less variability, but fewer unique reads, were obtained when libraries were prepared with the TruSeq Kit rather than the Ovation Kit. A linear relationship between raw and transformed values was observed in data from both library preparation kits when normalized using Loess, VST, and Limma, but quantile and q-spline normalization distorted the data based on library preparation method. Using a soft-clip approach resulted in a larger increase in the percentage of aligned reads than increasing the number of permitted mismatches, indicating that the driver of unaligned reads in FFPE specimens was the quality at the end of reads. Specimens clustered first by preservation method and then by estrogen receptor (ER) status when data from FFPE ER+ and ER- cases were compared with database data for non-matched fresh specimens, but they largely clustered by ER status after application of the Oncotype Dx or PAM50 gene sets, though not the MammaPrint gene set. There was considerable overlap in genes and regulons differentially expressed by ER status in fresh and FFPE specimens. Genes unique to the FFPE dataset were also investigated and found to be related to UV response, Unfolded Protein Response, and other stress pathways.

Studies

Study Purpose

This study investigated the effects of number of sections used, block storage duration, epithelial tumor area, specimen procurement method (biopsy versus surgical resection), and library preparation method on RNA yield, RNA quality, and NGS success. Results using different NGS normalization strategies and data pipelines were also investigated. Fifty-eight archival FFPE invasive breast cancer specimens (resection and biopsy) stored for 2-23 years at room temperature at two different repositories were used for this study, but no details of specimen procurement or processing were provided. A pathologist reviewed H&E stained sections for tumor content, and adjacent freshly cut 10 µm sections were chosen for RNA extraction. RNA was extracted from 1-4 sections using the miRNeasy FFPE kit. RNA yield and purity were determined spectrophotometrically and integrity (DV200) by Bioanalyzer. Sequencing libraries were prepared from 75 ng of RNA using the TruSeq RNA Access Library Prep Kit and from 150 ng of RNA using the Ovation Human FFPE RNAseq Library System, quantified using KAPA Library Quantification Kits, and then pooled and sequenced on a HiSeq 2500. Data were compared to those in public databases for 10 ER+ and 10 ER- cases. The effect of the number of sections was investigated using five archival breast cancer specimens.

Summary of Findings:

Sufficient RNA for NGS was obtained from a single 10 µm section in four of the five cases examined and DV200 values were unaffected by input amount. Older blocks were less likely than more recently collected blocks to yield RNA with a DV200 suitable for RNA-seq, with a lower percentage of specimens with a DV200 ≥30% observed in specimens stored for more than 11 years versus those stored ≤10 years (13% versus 82%). The data suggested a possible effect of repository on DV200, pointing to potential confounding factors. RNA yield was correlated with epithelial tumor area (r=0.52, P<0.001), but DV200 was not. RNA yield was significantly higher from 10 µm sections of surgical specimens than biopsy specimens but was sufficient for RNA-seq in all specimens with the exception of one of the five biopsy specimens.
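The storage-duration comparison above amounts to computing, for each storage group, the fraction of specimens that meet the DV200 ≥30% cutoff. The following is a minimal sketch of that calculation, not the authors' code: the specimen records are invented for illustration, the function names are hypothetical, and the split at 10 years is an assumption made for the example.

```python
# Hypothetical records: (storage_years, dv200_percent).
# These values are illustrative only and are NOT data from the study.
specimens = [(3, 55.0), (8, 41.0), (10, 29.0), (12, 18.0), (15, 33.0), (23, 12.0)]

DV200_CUTOFF = 30.0  # percent; the RNA-seq suitability threshold cited in the summary


def pass_rate(records, cutoff=DV200_CUTOFF):
    """Return the fraction of records whose DV200 meets or exceeds the cutoff."""
    if not records:
        return float("nan")
    passing = sum(1 for _, dv200 in records if dv200 >= cutoff)
    return passing / len(records)


def split_by_storage(records, max_recent_years=10):
    """Split records into (recent, older) groups by storage duration in years."""
    recent = [r for r in records if r[0] <= max_recent_years]
    older = [r for r in records if r[0] > max_recent_years]
    return recent, older


recent, older = split_by_storage(specimens)
print(f"stored <=10 years: {pass_rate(recent):.0%} with DV200 >= {DV200_CUTOFF:.0f}%")
print(f"stored >10 years:  {pass_rate(older):.0%} with DV200 >= {DV200_CUTOFF:.0f}%")
```

With real data, the same grouping could be repeated per repository to examine the possible confounding noted above.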

In the two cases examined, a higher read count per gene but fewer unique reads was obtained when libraries were prepared with the TruSeq Kit rather than the Ovation Kit. Further, the data obtained using the TruSeq Kit lacked some gene families. A linear relationship between raw and transformed values was observed in data from both library preparation kits when normalized using Loess, VST, and Limma, but quantile and q-spline normalization distorted the data based on library preparation method. The authors chose VST for further use and found that there were comparable reads per gene after normalization for the two library preparation methods, but specimens grouped based on library preparation method rather than by patient after VST normalization. There was less variability in expression using the TruSeq rather than the Ovation platform (11.1% versus 23.2%), leading the authors to choose this method for further analysis. Using a soft-clip approach resulted in a larger increase in the percentage of aligned reads than increasing the number of permitted mismatches, indicating that the driver of unaligned reads in FFPE specimens was the quality at the end of reads. Specimens clustered first by preservation method and then by ER status when data from FFPE ER+ and ER- cases were compared with database data for non-matched fresh specimens. When specimens were evaluated using the Oncotype Dx and PAM50 gene sets, they clustered largely by ER status, with one fresh specimen misclustering using the Oncotype Dx gene set. In contrast, specimens clustered first by preservation method and then by ER status when using the MammaPrint gene set, with two fresh specimens misclustering by ER status. Importantly, when lists of genes differentially expressed based on ER status were compared in fresh and FFPE specimens, there was considerable overlap, with 278 upregulated and 324 downregulated genes common to both data sets, but three-fold more genes were found to be differentially expressed in the fresh than in the FFPE data set. Genes unique to the FFPE dataset were also investigated and found to be related to UV response, Unfolded Protein Response, and other stress pathways. When regulons were compared, 24 of the 30 most differentially expressed regulons between ER+ and ER- specimens were common to fresh and FFPE specimens and these were strongly correlated (ccc=0.861).
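The summary does not define "ccc". Assuming it refers to Lin's concordance correlation coefficient, the sketch below shows how such a value could be computed for paired scores (for example, regulon activity differences in fresh versus FFPE data). The function name and the example vectors are hypothetical, and this is an illustration of the statistic rather than the authors' implementation.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for two paired sequences.

    ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2),
    using population (1/n) variances and covariance.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("ccc is undefined when both inputs are the same constant")
    return 2 * cov_xy / denom


# Hypothetical paired scores, for illustration only (not data from the study).
fresh_scores = [1.0, 2.0, 3.0]
ffpe_scores = [2.0, 3.0, 4.0]
print(concordance_ccc(fresh_scores, ffpe_scores))
```

Unlike Pearson's r, this statistic penalizes a systematic offset between the two data sets: the shifted example above is perfectly linearly related, yet its concordance is below 1.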

Biospecimens

Tissue - Breast

Preservative Types

Formalin

Diagnoses:

Neoplastic - Carcinoma

Platform:

Analyte	Technology Platform
RNA	Next generation sequencing
RNA	Spectrophotometry
RNA	Automated electrophoresis/Bioanalyzer

Pre-analytical Factors:

Classification	Pre-analytical Factor	Value(s)
Biospecimen Acquisition	Method of tissue acquisition	Biopsy; Surgical resection
Storage	Storage duration	2-23 years
Next generation sequencing Specific	Template modification	TruSeq RNA Access Library Prep Kit; Ovation Human FFPE RNAseq Library System
Biospecimen Aliquots and Components	Aliquot size/volume	One 10 µm section; Two 10 µm sections; Three 10 µm sections; Four 10 µm sections
Next generation sequencing Specific	Data handling	Loess normalization; VST normalization; Limma normalization; Quantile normalization; Q-spline normalization; Oncotype Dx gene set; PAM50 gene set; MammaPrint gene set; Permissive mismatch; Soft-clipped
Biospecimen Preservation	Type of fixation/preservation	None (fresh); Formalin (buffered)
