Regularized receiver operating characteristic-based logistic regression for grouped variable selection with composite criterion

Yang Li, Chenqun Yu, Yichen Qin, Limin Wang, Jiaxu Chen, Danhui Yi, Ben-Chang Shia, Shuangge Ma

Research output: Contribution to journal, Article

3 Citations (Scopus)

Abstract

It is well known that statistical classifiers trained from imbalanced data lead to low true positive rates and select inconsistent significant variables. In this article, an improved method is proposed to enhance the classification accuracy for the minority class by differentiating misclassification cost for each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach to estimate the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate on prediction and a good performance on variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative example of the analysis of the suboptimal health state data in traditional Chinese medicine is discussed to show the reasonable application of the proposed method.
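The abstract mentions choosing a cut-point under a composite criterion rather than the overall error rate, but does not define that criterion. As an illustration only, the sketch below assumes a weighted sum of true positive rate and true negative rate and picks the cut-point that maximizes it over a grid. The function names, the `weight` parameter, and the toy data are hypothetical, and this is not the authors' estimation procedure, which also tunes the penalty parameter.

```python
import numpy as np


def rates(y_true, scores, cut):
    """Return (TPR, TNR) when observations with score >= cut are labelled positive.

    Assumes both classes are present in y_true; otherwise a rate is NaN.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pred = scores >= cut
    pos = y_true == 1
    tpr = float(np.mean(pred[pos]))    # share of positives flagged positive
    tnr = float(np.mean(~pred[~pos]))  # share of negatives flagged negative
    return tpr, tnr


def best_cut_point(y_true, scores, weight=0.5, grid=None):
    """Grid-search the cut-point maximizing weight*TPR + (1 - weight)*TNR.

    The weighted sum is an ASSUMED stand-in for the paper's composite
    criterion, which the abstract does not spell out.
    Returns (cut, criterion_value, tpr, tnr); ties keep the smallest cut.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    best = None
    for cut in grid:
        tpr, tnr = rates(y_true, scores, cut)
        value = weight * tpr + (1.0 - weight) * tnr
        if best is None or value > best[1]:
            best = (float(cut), value, tpr, tnr)
    return best


# Toy imbalanced sample: 8 negatives, 2 positives, all fitted
# probabilities below 0.5 (hypothetical values, not from the paper).
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
p = np.array([0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.28, 0.36, 0.40])

default_tpr, default_tnr = rates(y, p, 0.5)
cut, value, tpr, tnr = best_cut_point(y, p, weight=0.5)
print("cut 0.5:", default_tpr, default_tnr)
print("chosen cut:", cut, "TPR:", tpr, "TNR:", tnr)
```

On this toy sample the conventional 0.5 cut-point labels every observation negative, so the overall error rate looks small while no minority case is detected; this is the behaviour that motivates replacing the error rate with a criterion that also rewards the true positive rate.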

Original language: English
Pages (from-to): 2582-2595
Number of pages: 14
Journal: Journal of Statistical Computation and Simulation
Volume: 85
Issue number: 13
DOI: 10.1080/00949655.2014.899362
Publication status: Published - Sep 2 2015
Externally published: Yes

Keywords

  • composite criterion
  • group lasso
  • imbalanced data
  • true positive rate

ASJC Scopus subject areas

  • Applied Mathematics
  • Statistics and Probability
  • Modelling and Simulation
  • Statistics, Probability and Uncertainty

Cite this

Regularized receiver operating characteristic-based logistic regression for grouped variable selection with composite criterion. / Li, Yang; Yu, Chenqun; Qin, Yichen; Wang, Limin; Chen, Jiaxu; Yi, Danhui; Shia, Ben-Chang; Ma, Shuangge.

In: Journal of Statistical Computation and Simulation, Vol. 85, No. 13, 02.09.2015, p. 2582-2595.

@article{15a4db8356b34178a163ce382e86be4d,
title = "Regularized receiver operating characteristic-based logistic regression for grouped variable selection with composite criterion",
abstract = "It is well known that statistical classifiers trained from imbalanced data lead to low true positive rates and select inconsistent significant variables. In this article, an improved method is proposed to enhance the classification accuracy for the minority class by differentiating misclassification cost for each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach to estimate the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate on prediction and a good performance on variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative example of the analysis of the suboptimal health state data in traditional Chinese medicine is discussed to show the reasonable application of the proposed method.",
keywords = "composite criterion, group lasso, imbalanced data, true positive rate",
author = "Yang Li and Chenqun Yu and Yichen Qin and Limin Wang and Jiaxu Chen and Danhui Yi and Ben-Chang Shia and Shuangge Ma",
year = "2015",
month = "9",
day = "2",
doi = "10.1080/00949655.2014.899362",
language = "English",
volume = "85",
pages = "2582--2595",
journal = "Journal of Statistical Computation and Simulation",
issn = "0094-9655",
publisher = "Taylor and Francis Ltd.",
number = "13",

}

TY - JOUR

T1 - Regularized receiver operating characteristic-based logistic regression for grouped variable selection with composite criterion

AU - Li, Yang

AU - Yu, Chenqun

AU - Qin, Yichen

AU - Wang, Limin

AU - Chen, Jiaxu

AU - Yi, Danhui

AU - Shia, Ben-Chang

AU - Ma, Shuangge

PY - 2015/9/2

Y1 - 2015/9/2

AB - It is well known that statistical classifiers trained from imbalanced data lead to low true positive rates and select inconsistent significant variables. In this article, an improved method is proposed to enhance the classification accuracy for the minority class by differentiating misclassification cost for each group. The overall error rate is replaced by an alternative composite criterion. Furthermore, we propose an approach to estimate the tuning parameter, the composite criterion, and the cut-point simultaneously. Simulations show that the proposed method achieves a high true positive rate on prediction and a good performance on variable selection for both continuous and categorical predictors, even with highly imbalanced data. An illustrative example of the analysis of the suboptimal health state data in traditional Chinese medicine is discussed to show the reasonable application of the proposed method.

KW - composite criterion

KW - group lasso

KW - imbalanced data

KW - true positive rate

UR - http://www.scopus.com/inward/record.url?scp=84930575780&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84930575780&partnerID=8YFLogxK

U2 - 10.1080/00949655.2014.899362

DO - 10.1080/00949655.2014.899362

M3 - Article

AN - SCOPUS:84930575780

VL - 85

SP - 2582

EP - 2595

JO - Journal of Statistical Computation and Simulation

JF - Journal of Statistical Computation and Simulation

SN - 0094-9655

IS - 13

ER -