Several prognostic signatures have been identified for breast cancer. However, these signatures vary extensively in their gene compositions, and the poor concordance of the risk groups they define hinders their clinical applicability. In this study, breast cancer risk prediction was refined with a novel approach that finds concordant genes through leading edge analysis of prognostic signatures. Each signature was split into two gene sets containing either up-regulated or down-regulated genes, and leading edge analysis was performed within each array study for the up- and down-regulated gene sets of the same signature across all training datasets. The consensus of leading edge subsets among all training microarrays was used to synthesize a predictive model, which was then tested in independent studies by partial least squares regression. Only a small portion of the six prognostic signatures (Amsterdam, Rotterdam, Genomic Grade Index, Recurrence Score, and the Hu306 and PAM50 intrinsic subtype signatures) was significantly enriched in the leading edge analysis of five training datasets (n = 2,380). The concordant leading edge subsets (43 genes) identified the core signature genes that account for the enrichment signals and provide prognostic power across all assayed samples. The proposed concordant leading edge algorithm discriminated high-risk from low-risk patients in terms of relapse-free or distant metastasis-free survival in all training samples (hazard ratios: 1.84–2.20) and in three out of four independent studies (hazard ratios: 3.91–8.31). In some studies, the concordant leading edge subset remained a significant prognostic factor independent of clinical ER, HER2, and lymph node status. The present study provides a statistical framework for identifying core consensus across microarray studies with leading edge analysis, and establishes a predictive model of breast cancer risk.
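The concordance step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an unweighted (Kolmogorov-Smirnov style) running-sum enrichment score, a positive enrichment peak, and set intersection as the consensus rule. None of these is specified in the abstract, and the function names `leading_edge` and `concordant_leading_edge` are hypothetical.

```python
from functools import reduce


def leading_edge(ranked_genes, gene_set):
    """Return the gene-set members ranked at or before the running-sum peak.

    Assumption: unweighted running sum with a positive peak, i.e. the set
    is concentrated near the top of the ranked list.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0:
        return set()
    hit_step = 1.0 / n_hit
    miss_step = 1.0 / n_miss if n_miss else 0.0

    running, best, peak = 0.0, float("-inf"), -1
    for i, gene in enumerate(ranked_genes):
        running += hit_step if gene in members else -miss_step
        if running > best:  # keep the first position of the maximum
            best, peak = running, i
    return {g for g in ranked_genes[: peak + 1] if g in members}


def concordant_leading_edge(rankings, gene_set):
    """Intersect the leading edge subsets obtained from each dataset.

    Assumption: "consensus" is taken here as a strict intersection.
    """
    subsets = [leading_edge(r, gene_set) for r in rankings]
    return reduce(set.intersection, subsets) if subsets else set()
```

Here each element of `rankings` would be one training dataset's gene list, ordered by association with outcome, and `gene_set` would be the up-regulated or down-regulated half of one signature. A looser consensus rule, such as requiring a gene to appear in most rather than all datasets, could be substituted for the intersection.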