Metabolomics is a powerful tool for understanding phenotypes and discovering biomarkers. Large cross-sectional epidemiology studies frequently combine multiple batches or data sets, but systematic biases can introduce both batch and injection order effects, which must be calibrated before chemometric analysis. We present a novel algorithm, Batch Normalizer, to calibrate large-scale metabolomic data. Batch Normalizer uses a regression model that takes the total abundance of each sample into account to improve calibration performance, and it removes both batch and injection order effects. The calibration method was tested using liquid chromatography/time-of-flight mass spectrometry (LC/TOF-MS) chromatograms of 228 plasma samples and 23 pooled quality control (QC) samples. We evaluated the performance of Batch Normalizer by examining the distribution of relative standard deviation (RSD) for all peaks detected in the pooled QC samples, the average Pearson correlation coefficient for all peaks between any two QC samples, and the distribution of QC samples in the scores plot of a principal component analysis (PCA). After calibration by Batch Normalizer, the number of peaks in QC samples with RSD less than 15% increased from 11 to 914, all of the QC samples were closely clustered in the PCA scores plot, and the average Pearson correlation coefficient for all peaks of QC samples increased from 0.938 to 0.976. The method was compared with seven commonly used calibration methods. Of the methods compared, Batch Normalizer produced the best calibration results for the LC/TOF-MS data.
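The two numerical QC criteria described above (per-peak RSD across pooled QC samples and the average Pearson correlation between pairs of QC samples) can be illustrated with a minimal sketch. This is not the Batch Normalizer algorithm itself, whose regression model is not specified here. The function names, the matrix layout (rows are QC samples, columns are peaks), the use of the sample standard deviation, and the synthetic intensities are all assumptions made for illustration.

```python
import numpy as np


def qc_rsd(qc):
    """Percent RSD of each peak across QC samples.

    Assumes qc has shape (n_qc_samples, n_peaks) with positive intensities.
    Uses the sample standard deviation (ddof=1), which is an assumption.
    """
    qc = np.asarray(qc, dtype=float)
    return 100.0 * qc.std(axis=0, ddof=1) / qc.mean(axis=0)


def count_peaks_below(qc, threshold=15.0):
    """Number of peaks whose QC RSD is below the threshold (in percent)."""
    return int(np.sum(qc_rsd(qc) < threshold))


def mean_pairwise_pearson(qc):
    """Average Pearson correlation over all distinct pairs of QC samples."""
    qc = np.asarray(qc, dtype=float)
    r = np.corrcoef(qc)                      # sample-by-sample correlation matrix
    upper = np.triu_indices_from(r, k=1)     # each pair once, diagonal excluded
    return float(r[upper].mean())


# Synthetic example: 3 hypothetical QC injections, 3 hypothetical peaks.
qc_demo = np.array([
    [100.0, 200.0, 300.0],
    [110.0, 190.0, 600.0],
    [ 90.0, 210.0, 150.0],
])
print(qc_rsd(qc_demo))
print(count_peaks_below(qc_demo))
print(mean_pairwise_pearson(qc_demo))
```

Running the same functions on the QC intensity matrix before and after calibration would give a before/after comparison of the kind reported above, assuming peaks have been aligned so that each column refers to the same feature in every QC sample.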