Background: The survival analysis of the Cancer Genome Atlas (TCGA) dataset is a well-known method to discover the gene expression-based prognostic biomarkers of head and neck squamous cell carcinoma (HNSCC). In order to utilize a continuous gene expression for survival analysis, it is necessary to determine a cutoff point by the dichotomization of the patients. There is some optimization software for cutoff determination. However, those predetermined cutoffs by software usually set at the median, 1/4 quantile, or 3/4 quantile of RNA sequencing (RNA -Seq) value to nd a significant P-value of the Kaplan-Meier curve. There are few clinicopathological features available on their pre-processed data sets. Methods: We developed a comprehensive work flow by R script, running on the Rstudio platform. It includes data retrieving and pre-processing, feature selection, cutoff mining engine, Kaplan-Meier survival analysis, Cox proportional hazard modeling, and biomarker discovery. Results: Using this work flow on the TCGA HNSCC cohort, we scanned human protein-coding genes (20,500) programmatically. After adjustment with other confounders, we found that the clinical tumor stage and the surgical margin involvement are independent risk factors in patient survival. According to the resulting tables with Bonferroni adjusted P-value under optimal cutoff as well as hazard ratio (>= 1:5), there were ten candidate biomarkers, named as DKK1, CAMK2N1, STC2, PGK1, SURF4, USP10, NDFIP1, FOXA2, STIP1, and DKC1, which are significantly associated with the poor prognosis of overall survival (OS). At the same time, the other ten genes were over-expressed in the better survival patients (with hazard ratio
Original languageEnglish
Publication statusUnpublished - Dec 4 2020

Cite this