Importance: A prediction model for new-onset nonmelanoma skin cancer could enhance prevention measures, but few tools driven by patient data exist for accurate prediction.
Objective: To use machine learning to develop a prediction model for incident nonmelanoma skin cancer based on large-scale, multidimensional, nonimaging medical information.
Design, Setting, and Participants: This study used a database comprising 2 million randomly sampled patients from the Taiwan National Health Insurance Research Database from January 1, 1999, to December 31, 2013. A total of 1829 patients with nonmelanoma skin cancer as their first diagnosed cancer and 7665 random controls without cancer were included in the analysis. A convolutional neural network, a deep learning approach, was used to develop a risk prediction model. This risk prediction model used 3-year clinical diagnostic information, medical records, and temporal-sequential information to predict the skin cancer risk of a given patient within the next year. Stepwise feature selection was also performed to investigate important and determining factors of the model. Statistical analysis was performed from November 1, 2016, to October 31, 2018.
Main Outcomes and Measures: Sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) were used to evaluate the performance of the models.
Results: A total of 1829 patients (923 women [50.5%] and 906 men [49.5%]; mean [SD] age, 65.3 [15.7] years) with nonmelanoma skin cancer and 7665 random controls without cancer (3951 women [51.5%] and 3714 men [48.4%]; mean [SD] age, 47.5 [17.3] years) were included in the analysis. The 1-year incident nonmelanoma skin cancer risk prediction model using sequential diagnostic information and drug prescription information as a time-incorporated feature matrix could attain an AUROC of 0.89 (95% CI, 0.87-0.91), with a mean (SD) sensitivity of 83.1% (3.5%) and mean (SD) specificity of 82.3% (4.1%). Carcinoma in situ of skin (AUROC, 0.867; -2.80% loss) and other chronic comorbidities (eg, degenerative osteopathy [AUROC, 0.872; -2.32% loss], hypertension [AUROC, 0.879; -1.53% loss], and chronic kidney insufficiency [AUROC, 0.879; -1.52% loss]) served as more discriminative factors for the prediction. Medications such as trazodone, acarbose, systemic antifungal agents, statins, nonsteroidal anti-inflammatory drugs, and thiazide diuretics were the top-ranking discriminative features in the model; each led to more than a 1% decrease of the AUROC when eliminated individually (eg, trazodone AUROC, 0.868; -2.67% reduction; acarbose AUROC, 0.870; -2.50% reduction; and systemic antifungal agents AUROC, 0.875; -1.99% reduction).
Conclusions and Relevance: The findings of this study suggest that the risk prediction model may identify potential predictive factors for nonmelanoma skin cancer. This model may help health care professionals target high-risk populations for more intensive skin cancer preventive methods.
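The evaluation measures named in the abstract, and the per-feature AUROC losses, can be illustrated with a minimal Python sketch. It is not the study's code. The function names, the 0.5 threshold in the test data, and the toy scores are hypothetical, and the abstract does not define how the loss percentages were computed: the sketch assumes a change relative to the full model's AUROC.

```python
from typing import Sequence, Tuple


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Tied scores count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def relative_auroc_change(full_auroc: float, ablated_auroc: float) -> float:
    """Percent change in AUROC after removing one feature, relative to the
    full model (assumed definition; negative values indicate a loss)."""
    return 100.0 * (ablated_auroc - full_auroc) / full_auroc
```

In this reading, a feature is ranked as more discriminative the more negative its relative change is when the model is re-evaluated without it.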