RNA-seq: impact of RNA degradation on transcript quantification.
Author(s): Gallego Romero I, Pai AA, Tung J, Gilad Y
Publication: BMC Biol, 2014, Vol. 12, Page 42
PubMed ID: 24885439
Review Paper? No
Purpose of Paper
The purpose of this paper was to determine how decreased RNA integrity numbers (RIN), caused by room temperature storage of peripheral blood mononuclear cells (PBMC), affect whole transcriptome shotgun sequencing (WTSS) data. The effects of transcript characteristics and data handling were also investigated.
Conclusion of Paper
With increasing room temperature storage of PBMC, the RIN of the resultant RNA declined. RIN was positively associated with the number of uniquely mapped reads and the number of reads mapped to genes, and negatively associated with the proportion of reads derived from spiked-in control material. While specimens with high RIN (mean RIN ≥7.9) clustered by individual, those with a RIN <7.9 clustered with specimens with a similar level of degradation. While almost all transcripts were detected at all timepoints, degradation rates increased with increasing %GC content and with the length of either the 3' untranslated region (UTR) or the coding DNA sequence (CDS). Including RIN as a covariate in the generalized linear model, or regressing out RIN from the data, allowed for the identification of more genes that were differentially expressed between individuals.
The purpose of this study was to determine the effects of room temperature storage of PBMC on RIN and the effects of using specimens with a low RIN on whole transcriptome sequencing data. The effects of GC content, transcript length, and UTR length, as well as data-handling choices such as the mapping algorithm used, exclusion of regions >1000 nt from the 3' UTR, and accounting for RIN in statistical analysis, were also investigated. Aliquots of PBMC from 4 patients were stored at room temperature, lysed at the appropriate timepoint, and stored frozen until extraction using the RNeasy kit. 50 bp reads were obtained using the Illumina HiSeq2000.
Summary of Findings:
RINs declined from an average of 9.3 in specimens extracted immediately to 3.8 in specimens stored at room temperature for 84 h before RNA extraction. RIN was positively associated with the number of uniquely mapped reads (p<0.01) and the number of reads mapped to genes (p<0.00), and negatively associated with the proportion of reads derived from spiked-in control material. Importantly, 28.9% of the variance in gene expression of principal component 1 was associated with RIN score (p<0.000001). Further, specimens with high RIN (mean RIN ≥7.9) clustered by individual, but those with a RIN <7.9 were more correlated with specimens with a similar level of degradation than with intact specimens from the same individual. This effect was observed regardless of distance from the 3' UTR and of the mapping algorithm used. As RIN decreased, the mean reads per kilobase of transcript per million mapped reads (RPKM) increased (p<0.0001) but the median RPKM decreased, reflecting non-uniform degradation that resulted in a less complex library. While almost all transcripts were detected at all timepoints, the degradation rates were transcript dependent. The rate of degradation was correlated with CDS length (ρ = -0.068, p<10^-12), %GC content (ρ = -0.039, p<0.001), and 3' UTR length (ρ = -0.136, p<10^-15), with faster degradation occurring with higher %GC content and increased length of either the 3' UTR or the CDS. Degradation of pseudogenes tended to be slower than that of protein-coding genes (p<10^-16). When RIN was included as a covariate in the generalized linear model, the number of differentially expressed genes across timepoints decreased dramatically, and the number of genes found to be differentially expressed between individuals increased. Regressing out RIN from the data was better at eliminating the effects of degradation than including RIN as a covariate, but neither method, alone or together, was able to completely eliminate the effects.
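The observation that mean RPKM rose while median RPKM fell can be illustrated with a small sketch. It assumes the standard RPKM definition (reads × 10^9 / (transcript length in bp × total mapped reads)). The gene lengths and read counts are invented for illustration and are not taken from the paper. In the hypothetical "degraded" library, the same number of reads is concentrated in one short transcript, which is one way a less complex library could produce this pattern.

```python
from statistics import mean, median

def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]

# Hypothetical transcript lengths (bp) and read counts; not data from the study.
lengths_bp = [1000, 2000, 4000, 500]
intact_counts = [100, 200, 400, 50]    # reads spread in proportion to length
degraded_counts = [10, 20, 40, 680]    # same total, dominated by one transcript

intact = rpkm(intact_counts, lengths_bp)
degraded = rpkm(degraded_counts, lengths_bp)

print("intact   mean/median:", mean(intact), median(intact))
print("degraded mean/median:", mean(degraded), median(degraded))
```

Because RPKM normalizes by the total number of mapped reads, concentrating reads in a few transcripts inflates the mean through those transcripts while the typical (median) transcript loses signal.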
- None (Fresh)
- Not specified
Analyte | Technology Platform
RNA | Next generation sequencing
RNA | Automated electrophoresis/Bioanalyzer
Classification | Pre-analytical Factor | Value(s)
Storage | Time at room temperature | 0 h
Next generation sequencing | Specific Data handling | All reads considered; Only reads within 1000 nt of 3' UTR considered; Mapped with BWA 0.63; Mapped with TopHat 2.08; RIN included as covariate; Regressed for RIN
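The last two data-handling options listed above, including RIN as a covariate and regressing out RIN, can be sketched for a single gene as follows. This is a minimal illustration, not the authors' pipeline. It uses ordinary least squares from NumPy as a stand-in for the generalized linear model. Only the endpoint RIN values (9.3 and 3.8) and the count of 4 individuals come from the summary; the expression values, the number of timepoints, the evenly spaced intermediate RINs, and all effect sizes are simulated assumptions. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: 4 individuals x 5 timepoints, RIN falling over time.
n_ind, n_time = 4, 5
individual = np.repeat(np.arange(n_ind), n_time)
rin = np.tile(np.linspace(9.3, 3.8, n_time), n_ind)

# Simulated log-expression for one gene (all effect sizes are made up).
ind_effect = np.array([0.0, 0.5, 1.0, 1.5])
y = 5.0 + ind_effect[individual] + 0.3 * rin + rng.normal(0, 0.1, individual.size)

def regress_out(y, covariate):
    """Return residuals of y after an OLS fit on intercept + covariate."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def fit_with_covariate(y, individual, covariate):
    """OLS fit of y on individual indicators (first individual as baseline) plus the covariate."""
    n = individual.max() + 1
    dummies = (individual[:, None] == np.arange(1, n)[None, :]).astype(float)
    X = np.column_stack([np.ones(y.size), dummies, covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, offsets for individuals 2..n, covariate slope]

residuals = regress_out(y, rin)               # "regressed for RIN"
beta = fit_with_covariate(y, individual, rin)  # "RIN included as covariate"
rin_slope = beta[-1]

print("corr(residuals, RIN):", np.corrcoef(residuals, rin)[0, 1])
print("estimated RIN slope:", rin_slope)
```

In the first approach the RIN-corrected residuals are carried into downstream tests; in the second the individual effects and the RIN slope are estimated jointly in one model.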