Large scale, robust, and accurate whole transcriptome profiling from clinical formalin-fixed paraffin-embedded samples.
Author(s): Newton Y, Sedgewick AJ, Cisneros L, Golovato J, Johnson M, Szeto CW, Rabizadeh S, Sanborn JZ, Benz SC, Vaske C
Publication: Sci Rep, 2020, Vol. 10, Page 17597
PubMed ID: 33077815
Review Paper? No
Purpose of Paper
This paper investigated whether whole transcriptome profiling of formalin-fixed, paraffin-embedded (FFPE) specimens is replicable by comparing results between case-matched FFPE and frozen specimens, macro- and micro-dissected FFPE specimens, extraction replicates, library preparation replicates, and replicates processed at two different laboratories. Additionally, data generated in this study using ribo-depleted RNA from 3,115 FFPE specimens were compared to data obtained from poly-A selected RNA from frozen and FFPE specimens in The Cancer Genome Atlas (TCGA) cohort.
Conclusion of Paper
RNA sequencing replicability was affected by transcript integrity numbers (TIN), GC content, fragment size, and exome coverage; with few exceptions, correlations between case-matched pairs (different sequencing labs, library replicates, replicate extractions, or macro- versus micro-dissected specimens) were very strong (r>0.95). Importantly, in all cases, replicate specimens were more similar to their case-matched specimen than to specimens from other patients.
Compared with case-matched frozen OCT-embedded specimens, FFPE specimens had shorter RNA fragments, higher GC content, and lower transcript integrity, but comparable transcriptome coverage. Oddly, the mapping rate was higher for FFPE specimens, which the authors attributed to the shorter fragment sizes and higher fraction of ribosomal reads. The median correlation between FFPE and case-matched frozen OCT-embedded specimens was 0.954, which was slightly lower than for macro- versus micro-dissected case-matched FFPE pairs. The correlation between matched FFPE and frozen OCT-embedded specimens depended on TIN, GC content, and template length, particularly in the FFPE specimen. Moderate systematic differences in coverage consistency between case-matched FFPE and frozen OCT-embedded specimens were observed. The per-gene TIN curves of frozen specimens were unimodal, but a bimodal distribution was observed for FFPE specimens, with more genes having low TIN. Zinc finger proteins were among the genes that tended to have low TIN in FFPE specimens. Nevertheless, while differences between case-matched FFPE and frozen specimens were larger than between microdissected and macrodissected replicates, pairs were more similar to each other than to specimens collected from other patients.
To assess ribo-depletion versus poly-A library preparation, the authors compared data from 3,116 FFPE specimens sequenced after ribo-depletion with the 10,379 poly-A-selected frozen specimens in the TCGA cohort. While the frozen specimens in TCGA had higher TIN (particularly for zinc-finger proteins), FFPE specimens from this study had higher coverage across most transcripts. As expected, the TCGA cohort had a strong 3’ bias, particularly for long transcripts, due to poly-A selection. In contrast, the ribo-depleted FFPE specimens had a slight 5’ bias and less dependence on transcript length. However, when gene expression data were directly compared using the computational mapping method developed by the authors, the largest differences were due to cancer type rather than cohort or preservation method (FFPE versus frozen). Importantly, this mapping methodology did a better job at removing platform effects while preserving gene-level quantification than the commonly used ComBat method (P≤2.2e−16). The ribo-depleted FFPE specimens in this cohort were more similar to the 113 poly-A selected FFPE specimens in the TCGA cohort than to the poly-A selected frozen specimens in the TCGA cohort. The majority of differences between the FFPE specimens of this study and the poly-A selected TCGA cohort affected mitochondrial proteins, which the authors stated are often inadvertently removed during ribo-depletion. The majority of differences between the FFPE specimens of this study and the frozen specimens in the poly-A selected TCGA cohort were in histones, which the authors stated are often not poly-adenylated and therefore were lost in the poly-A selection step employed by TCGA. Nevertheless, specimen stratification based on breast cancer molecular markers was consistent between the FFPE specimens of this study and the TCGA frozen specimens.
Studies
-
Study Purpose
This study investigated whether whole transcriptome profiling of FFPE specimens is replicable by comparing results between case-matched FFPE and frozen specimens, macro- and micro-dissected FFPE specimens, extraction replicates, library preparation replicates, and replicates processed at two different laboratories. Additionally, data generated in this study using ribo-depleted RNA from 3,115 FFPE specimens were compared to data obtained from poly-A selected RNA from frozen and FFPE specimens in The Cancer Genome Atlas (TCGA) cohort.
More than 3,200 FFPE tumor specimens (breast, colorectal, sarcoma, lung, brain, pancreatic, ovarian, head and neck, prostate, uterine, esophageal, stomach, cholangiocarcinoma, kidney, bladder, melanoma, liver, non-melanoma skin, thymus, lymphoma, adrenal and others) were obtained from multiple centers. Twenty-five specimens had a case-matched frozen specimen embedded in OCT. From each block, 10 μm tissue sections were cut, and tumor areas were macrodissected. Laser microdissection of an additional section was performed for 16 specimens. RNA was extracted using the RNeasy FFPE Kit and quantified by Qubit. Fifty-seven specimens underwent replicate RNA extraction. Sequencing libraries were constructed using the KAPA Stranded RNA-Seq Kit with RiboErase and sequenced on an Illumina HiSeq platform. Eighty-seven specimens had replicate libraries prepared, and 57 specimens were sequenced at two different laboratories. The transcript integrity numbers (TIN) were calculated using a previously published method and are based on the evenness of read coverage. Data were compared to data from TCGA FASTQ files processed using the same bioinformatics pipeline.
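To make "evenness of read coverage" concrete, the following is a minimal sketch of an entropy-based integrity score for a single transcript. It assumes the TIN is defined as 100 × exp(H) / k, where H is the Shannon entropy of the per-position coverage and k is the number of positions. This summary does not describe the implementation the authors used, so the function name, the input format, and the toy coverage vectors are illustrative assumptions.

```python
import math

def transcript_integrity_number(coverage):
    """Entropy-based evenness score (0-100) for one transcript.

    `coverage` is a list of read depths at sampled positions along the
    transcript (hypothetical input format, assumed for illustration).
    """
    total = sum(coverage)
    if total == 0:
        return 0.0  # no reads: treat as fully degraded / not expressed
    # Shannon entropy of the coverage distribution across positions
    entropy = -sum((c / total) * math.log(c / total) for c in coverage if c > 0)
    # exp(entropy) is the effective number of evenly covered positions
    return 100.0 * math.exp(entropy) / len(coverage)

# Toy examples: uniform coverage versus coverage confined to one end
even = transcript_integrity_number([10] * 50)
skewed = transcript_integrity_number([0] * 40 + [50] * 10)
```

Under this definition, uniformly covered transcripts score near 100, and transcripts with coverage confined to a fraction of their length score proportionally lower.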
Summary of Findings:
RNA sequencing replicability between the two processing labs was affected by RNA integrity, GC content, fragment size, and exome coverage. All FFPE specimens with low RNA sequencing reproducibility (r<0.95) also had a low TIN (<50), GC content >50%, or <10x average exome coverage. There was no bias in template length or exome coverage between the two labs, but one lab had greater ribosomal depletion. Library replicates were generally strongly correlated, with only a few low-TIN specimens having an r<0.95. Importantly, correlation coefficients were high both when the entire transcriptome was analyzed and when only COSMIC cancer genes were considered. While the median correlation between macrodissected and microdissected FFPE specimens was very strong (r=0.982) and replicate specimens were most similar to their counterpart, variability was greater than observed among extraction or library preparation replicates.
Compared with case-matched frozen, OCT-embedded specimens (hereafter, frozen), FFPE specimens had shorter RNA fragments, higher GC content, and lower transcript integrity, but comparable transcriptome coverage. Oddly, the mapping rate was higher for FFPE specimens, which the authors attributed to the shorter fragment sizes and higher fraction of ribosomal reads. The median correlation between FFPE and case-matched frozen specimens was 0.954, which is slightly lower than for macro- versus micro-dissected pairs. Importantly, the strength of the correlation between matched FFPE and frozen specimens was dependent on TIN, GC content, and template length, particularly in the FFPE specimen. Moderate systematic differences in coverage consistency between case-matched FFPE and frozen specimens were observed. The per-gene TIN curves of frozen specimens were unimodal, but a bimodal distribution was observed for FFPE specimens, as more genes had low TIN. Zinc finger proteins were among the genes that tended to have low TIN in FFPE specimens.
Nevertheless, while differences between case-matched FFPE and frozen specimens were larger than those between microdissected and macrodissected replicates, pairs were more similar to each other than to any other specimen.
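The two checks described above (whether a specimen's closest match is its own case-matched replicate, and which quality metrics co-occur with r<0.95) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the log2(x+1) transform, the function names, the specimen identifiers, and the toy expression values are all assumptions. Only the thresholds (TIN <50, GC content >50%, <10x exome coverage) come from the findings.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def log_expr(values):
    """log2(x + 1) transform (an assumed normalization, for illustration)."""
    return [math.log2(v + 1.0) for v in values]

def best_match(query_id, profiles):
    """Return the id of the other profile most correlated with query_id."""
    query = log_expr(profiles[query_id])
    scores = {
        other_id: pearson(query, log_expr(values))
        for other_id, values in profiles.items()
        if other_id != query_id
    }
    return max(scores, key=scores.get)

def qc_flags(tin, gc_fraction, exome_coverage):
    """List which of the reported low-reproducibility risk factors apply."""
    flags = []
    if tin < 50:
        flags.append("low_TIN")
    if gc_fraction > 0.50:
        flags.append("high_GC")
    if exome_coverage < 10:
        flags.append("low_exome_coverage")
    return flags

# Toy gene-expression profiles (hypothetical values, same gene order)
profiles = {
    "P1_FFPE": [100, 5, 40, 0, 900],
    "P1_frozen": [120, 4, 35, 1, 1000],
    "P2_FFPE": [3, 400, 10, 60, 20],
}
closest = best_match("P1_FFPE", profiles)
flags = qc_flags(tin=42, gc_fraction=0.55, exome_coverage=25)
```

In this toy data, the FFPE profile from patient P1 is closest to the frozen profile from the same patient, which is the pattern the study reports for case-matched pairs.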
To assess ribo-depletion versus poly-A library preparation, the authors compared data from 3,116 FFPE specimens sequenced after ribo-depletion with the 10,379 poly-A-selected frozen specimens in the TCGA cohort. While frozen specimens in the TCGA cohort had higher TINs, FFPE specimens from this study had higher coverage across most transcripts. As expected, the TCGA cohort had a strong 3’ bias, particularly for long transcripts. In contrast, the ribo-depleted FFPE specimens had a slight 5’ bias and less dependence on transcript length. When the gene expression data were directly compared using the computational mapping method developed by the authors, the largest differences were due to cancer type rather than cohort or preservation method (FFPE versus frozen). Importantly, this mapping methodology did a better job at removing platform effects while preserving gene-level quantification than the commonly used ComBat method (P≤2.2e−16). The ribo-depleted FFPE specimens in this cohort were more similar to the 113 poly-A selected FFPE specimens in the TCGA cohort than to the poly-A selected frozen specimens in the TCGA cohort. The majority of differences between the FFPE specimens in this study and the poly-A selected TCGA cohort affected mitochondrial proteins, which the authors stated are often inadvertently removed during ribo-depletion. The majority of differences between the FFPE specimens in this study and the frozen specimens in the TCGA cohort affected histones, which the authors stated are often lost during the poly-A selection step employed by TCGA. Nevertheless, specimen stratification based on breast cancer molecular markers was consistent between the FFPE specimens in this study and the TCGA frozen specimens.
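One simple way to quantify the 3’ versus 5’ coverage bias described above is to compare mean coverage at the two ends of a transcript. The sketch below uses a hypothetical metric (mean depth in the last 20% of positions divided by mean depth in the first 20%). The metric, the function name, and the toy coverage vectors are assumptions for illustration and are not taken from the paper.

```python
def end_bias(coverage, fraction=0.2):
    """Ratio of mean 3'-end coverage to mean 5'-end coverage.

    `coverage` lists read depth along a transcript ordered 5' to 3'.
    Values above 1 suggest 3' bias; values below 1 suggest 5' bias.
    """
    n = max(1, int(len(coverage) * fraction))
    five_prime = sum(coverage[:n]) / n
    three_prime = sum(coverage[-n:]) / n
    if five_prime == 0:
        # Avoid division by zero: all signal (if any) is at the 3' end
        return float("inf") if three_prime > 0 else 1.0
    return three_prime / five_prime

# Toy coverage vectors (hypothetical): rising toward 3' vs. mildly 5'-weighted
poly_a_like = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ribo_depleted_like = [12, 11, 10, 10, 10, 10, 10, 10, 10, 9]

poly_a_ratio = end_bias(poly_a_like)
ribo_ratio = end_bias(ribo_depleted_like)
```

A ratio well above 1 corresponds to the strong 3’ bias reported for poly-A selected libraries, while a ratio slightly below 1 corresponds to the slight 5’ bias reported for the ribo-depleted FFPE libraries.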
Biospecimens
- Tissue - Adrenal Gland
- Tissue - Other
- Tissue - Colorectal
- Tissue - Lung
- Tissue - Brain
- Tissue - Pancreas
- Tissue - Ovary
- Tissue - Head and Neck
- Tissue - Prostate
- Tissue - Uterus
- Tissue - Esophagus
- Tissue - Stomach
- Tissue - Bladder
- Tissue - Kidney
- Tissue - Liver
- Tissue - Skin
- Tissue - Thymus Gland
- Tissue - Lymph Node
- Tissue - Breast
Preservative Types
- Formalin
- Frozen
Diagnoses:
- Neoplastic - Carcinoma
- Neoplastic - Lymphoma
- Neoplastic - Sarcoma
- Neoplastic - Not specified
- Neoplastic - Melanoma
Platform:
Analyte | Technology Platform
RNA | Fluorometry
RNA | Next generation sequencing
Pre-analytical Factors:
Classification | Pre-analytical Factor | Value(s)
Biospecimen Aliquots and Components | Cell capture method | Macrodissected; Microdissected
Analyte Extraction and Purification | Analyte isolation method | Replicate extractions performed
Next generation sequencing | Specific Template modification | Poly-A selection; Ribosomal depletion
Biospecimen Preservation | Type of fixation/preservation | Formalin (buffered); Frozen
