Comparison of pre-processing methodologies for Illumina 450k methylation array data in familial analyses.
Author(s): Cazaly E, Thomson R, Marthick JR, Holloway AF, Charlesworth J, Dickinson JL
Publication: Clin Epigenetics, 2016, Vol. 8, Page 75
PubMed ID: 27429663
Review Paper? No
Purpose of Paper
This paper investigated how different methods of normalizing 450k methylation array data may affect technical bias and associations between methylation data and single nucleotide polymorphisms (SNPs) and patient age.
Conclusion of Paper
The best-performing normalization method for 450k methylation array data was a combination of stratified quantile normalization (QN) and batch correction using ComBat: it was the most successful at minimizing batch effects, array bias, and the differentially methylated region standard error (DMRSE), and it produced the lowest absolute median difference for five of the six technical replicate pairs assessed. This combined method also yielded the strongest association between methylation data and patient age. Importantly, methylation data remained significantly associated with SNPs after normalization, but the association was stronger when the data were normalized by stratified QN alone than by a combination of stratified QN and ComBat.
Studies
Study Purpose
This study investigated how different methods of normalizing 450k methylation array data may affect technical biases and the associations of methylation data with SNPs and with patient age. DNA was isolated with the Nucleon BACC3 kit from fifty peripheral blood specimens collected as part of the Tasmanian Familial Prostate Cancer study, which includes men with prostate cancer (16 individuals) and their close relatives (34 individuals). Blood was stored at 4˚C for up to 6 months before DNA isolation (DNA was isolated from most specimens within 1-2 months). DNA was quantified by spectrophotometer, and specimens with OD260:280 ratios of less than 1.8 were cleaned with the Zymo Clean & Concentrator (TM)-5 Kit. DNA (1 µg based on Qubit quantification) was bisulfite converted with the EZ DNA Methylation-Gold kit. Methylation was then analyzed using 450k arrays, and the data were subjected to different data-handling methods, including eight normalization methods: QN, stratified QN, beta-mixture quantile dilation (BMIQ), subset-quantile within array normalization (SWAN), functional normalization (FunNorm), Dasen, Noob, and ComBat batch correction (ComBat). The performance of each method was compared using eight metrics: (1) density plots; (2) multidimensional scaling (MDS) plots; (3) ANOVA of the first principal component of the MDS plot; (4) median absolute differences between replicate samples; (5) density plots of imprinted regions and the differentially methylated region standard error; (6) cluster dendrograms; (7) association between SNPs and methylation at cg17749961; and (8) association of epigenome-wide methylation with patient age. For some specimens, genotype data obtained using the Illumina HumanOmni2.5-8 BeadChip were also available.
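To illustrate the idea behind the quantile-based methods listed above, the following is a minimal sketch of plain quantile normalization across samples. It is not the implementation used in the paper: the function name and the example values are hypothetical, ties are broken by position rather than averaged, and stratified QN as implemented in array-analysis packages applies additional probe-level stratification that is not reproduced here.

```python
def quantile_normalize(samples):
    """Force every sample to share the same empirical distribution.

    `samples` is a list of equal-length lists, one list of probe values
    per sample (hypothetical layout chosen for this sketch).
    """
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("all samples must have the same number of probes")

    # Reference distribution: mean of the r-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [
        sum(col[rank] for col in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]

    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Each probe takes the reference value for its within-sample rank.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical beta values for three samples and three probes.
example = [[0.2, 0.8, 0.5], [0.1, 0.9, 0.6], [0.3, 0.7, 0.4]]
normalized_example = quantile_normalize(example)
```

After normalization every sample contains the same set of values, while the rank order of probes within each sample is preserved.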
Summary of Findings:
A substantial effect of processing batch was observed among methylation profiles. Processing batch effects on methylation were reduced when the data were normalized using stratified QN and/or ComBat batch correction, were worsened by subset-quantile within array normalization (SWAN), and were unaffected by beta-mixture quantile dilation (BMIQ). Within-array bias was best removed using stratified QN alone or with ComBat and, to a lesser extent, by QN, functional normalization, Dasen, and Noob, but was not substantially affected by SWAN or BMIQ. Based on hierarchical cluster dendrograms, normalization using stratified QN and ComBat generated results superior to those of the other normalization methods assessed. The absolute median difference between technical replicates was lowest for 5 of the 6 pairs when data were adjusted using stratified QN with ComBat, followed by ComBat alone and stratified QN alone. Absolute mean differences between technical replicates larger than those in the raw data were observed in all 6 pairs after normalization with BMIQ; in 4 of 6 pairs after normalization with SWAN; in 2 of 6 pairs after QN, functional normalization, or Noob; and in 1 of 6 pairs after Dasen normalization. Relative to raw data (0.0048), the differentially methylated region standard error (DMRSE) was reduced after stratified QN with ComBat (0.0012), stratified QN or ComBat alone (0.0028 for both), Dasen (0.0043), and SWAN (0.0046); it increased after QN, Noob, or functional normalization (0.0052, 0.0056, and 0.0052, respectively) and was unaffected by BMIQ normalization (0.0048).
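The replicate comparison described above can be sketched as follows. This assumes the metric is the median, across probes, of the absolute difference in beta value between the two members of a technical replicate pair; the paper's exact definition is not given in this summary, and the function name and values below are hypothetical.

```python
from statistics import median


def median_absolute_difference(replicate_a, replicate_b):
    """Median over probes of |beta_a - beta_b| for one replicate pair."""
    if len(replicate_a) != len(replicate_b):
        raise ValueError("replicates must cover the same probes")
    return median(abs(a - b) for a, b in zip(replicate_a, replicate_b))


# Hypothetical beta values for one replicate pair, before and after
# some normalization step (not data from the paper).
raw_pair = ([0.10, 0.50, 0.90, 0.30], [0.14, 0.47, 0.95, 0.30])
normalized_pair = ([0.11, 0.49, 0.91, 0.30], [0.12, 0.48, 0.93, 0.30])

raw_score = median_absolute_difference(*raw_pair)
normalized_score = median_absolute_difference(*normalized_pair)
# A lower score after normalization suggests the replicates agree more closely.
improved = normalized_score < raw_score
```

Under this reading, a normalization method performs well on a pair when its score falls below the score computed from the raw data.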
Raw methylation data were significantly associated with SNP data (P=7.29 x 10^-6), and the significance strengthened (P=3.53 x 10^-7) when stratified QN was applied. Interestingly, significance was not quite as strong when data were ComBat normalized (P=1.05 x 10^-5). The association between patient age and methylation increased when raw data (λ=0.838) were normalized by stratified QN (λ=1.402) or ComBat (λ=1.448).
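Assuming the λ values above are genomic inflation factors as commonly defined (the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null), the calculation can be sketched as below. This is an assumption about the metric, not a description of the paper's code; the function name and constant name are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(p_values):
    """Estimate lambda from two-sided p-values, each in the range (0, 1]."""
    standard_normal = NormalDist()
    # A two-sided p-value maps to a z-score at the p/2 lower-tail quantile;
    # squaring the z-score gives a chi-square statistic with 1 df.
    chi_square_stats = [standard_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi_square_stats) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic the null distribution, so lambda is close to 1.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = inflation_factor(null_like)
```

Values of λ above 1 indicate more small p-values than expected by chance, which in this context is read as a stronger epigenome-wide association with age.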
Biospecimens
Preservative Types
- Other Preservative
Diagnoses:
- Neoplastic - Carcinoma
- Not specified
- Normal
Platform:
Analyte    Technology Platform
DNA        DNA microarray
DNA        Bisulfite conversion assay
Pre-analytical Factors:
Classification    Pre-analytical Factor     Value(s)
DNA microarray    Specific Data handling    QN
                                            Stratified QN
                                            SWAN
                                            BMIQ
                                            FunNorm
                                            Dasen
                                            Noob
                                            ComBat
                                            Stratified QN and ComBat
Preacquisition    Patient age               23-92 years