ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features

Wen Lin Huang, Chun Wei Tung, Hui Ling Huang, Shiow Fen Hwang, Shinn Ying Ho

Research output: Contribution to journal › Article

56 Citations (Scopus)

Abstract

Accurate prediction of protein subnuclear localization depends on combining informative features with a well-designed classifier. Support vector machine (SVM) based learning methods have been shown to be effective for predicting protein subcellular and subnuclear localization. This study proposes ProLoc, a system for predicting protein subnuclear localization that uses an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features. ESVM combines an inheritable genetic algorithm with an SVM to simultaneously determine the best number m of PCC features and identify which m of the 526 PCC features to use. ESVM is evaluated on two datasets: SNL6, with 504 proteins localized in 6 subnuclear compartments, and SNL9, with 370 proteins localized in 9 subnuclear compartments. Under leave-one-out cross-validation, ProLoc using the selected m = 33 PCC features (SNL6) and m = 28 PCC features (SNL9) achieves accuracies of 56.37% on SNL6 and 72.82% on SNL9. These exceed the 51.4% of an SVM-based system using k-peptide composition features on SNL6 and the 64.32% of an optimized evidence-theoretic k-nearest neighbor classifier using pseudo amino acid composition on SNL9.
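The abstract describes a wrapper approach: a genetic algorithm searches both over the number m of features and over which m features to use, scoring each candidate subset by classifier accuracy. The code here is a minimal sketch of that general idea under stated assumptions, not the authors' ESVM or inheritable genetic algorithm. Assumptions: a 1-nearest-neighbour classifier stands in for the SVM so the code needs only the Python standard library; fitness is leave-one-out accuracy; the "inheritance" step is modelled simply as growing each m-feature subset by one random feature to seed the search for m + 1; `toy_data` is synthetic, not SNL6 or SNL9; all function names (`loocv_accuracy`, `make_fitness`, `inheritable_search`, and so on) are hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Subset = FrozenSet[int]


def loocv_accuracy(X: List[List[float]], y: List[int], features: Sequence[int]) -> float:
    """Leave-one-out accuracy of 1-nearest-neighbour on the given feature indices.

    1-NN is a hypothetical stand-in for the SVM so the sketch needs only the stdlib."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = -1, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # sample i is held out; classify it from the rest
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def make_fitness(X, y) -> Callable[[Subset], float]:
    """Fitness of a feature subset = its LOOCV accuracy (cached per subset)."""
    cache: Dict[Subset, float] = {}

    def fitness(subset: Subset) -> float:
        if subset not in cache:
            cache[subset] = loocv_accuracy(X, y, sorted(subset))
        return cache[subset]

    return fitness


def crossover(a: Subset, b: Subset, m: int, rng: random.Random) -> Subset:
    # Child draws exactly m features from the union of both parents.
    return frozenset(rng.sample(sorted(a | b), m))


def mutate(s: Subset, n_features: int, rng: random.Random) -> Subset:
    # Swap one selected feature for an unselected one, so the size stays m.
    out = rng.choice(sorted(s))
    new = rng.choice([f for f in range(n_features) if f not in s])
    return (s - {out}) | {new}


def evolve(pop: List[Subset], m: int, n_features: int, fitness,
           rng: random.Random, generations: int) -> List[Subset]:
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: max(2, len(pop) // 2)]
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, m, rng), n_features, rng))
        pop = parents + children  # elitist: the best half survives unchanged
    return pop


def inheritable_search(n_features: int, m_min: int, m_max: int, fitness,
                       pop_size: int = 8, generations: int = 10,
                       seed: int = 0) -> Tuple[int, Subset, float]:
    """Search m = m_min..m_max (m_max < n_features); each m's population seeds m + 1."""
    rng = random.Random(seed)
    pop = [frozenset(rng.sample(range(n_features), m_min)) for _ in range(pop_size)]
    best: Dict[int, Tuple[Subset, float]] = {}
    for m in range(m_min, m_max + 1):
        pop = evolve(pop, m, n_features, fitness, rng, generations)
        top = max(pop, key=fitness)
        best[m] = (top, fitness(top))
        if m < m_max:
            # "Inherit": grow every m-feature subset by one unselected feature.
            pop = [s | {rng.choice([f for f in range(n_features) if f not in s])} for s in pop]
    m_best = max(best, key=lambda k: best[k][1])  # ties favour the smaller m
    return m_best, best[m_best][0], best[m_best][1]


def toy_data(seed: int = 1):
    """3 classes, 8 features; only feature c is shifted for class c (the rest is noise)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(10):
            row = [rng.gauss(0.0, 1.0) for _ in range(8)]
            row[label] += 4.0
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data()
    m, subset, acc = inheritable_search(8, 1, 5, make_fitness(X, y))
    print(f"best m = {m}, features = {sorted(subset)}, LOOCV accuracy = {acc:.3f}")
```

Picking the m with the highest accuracy (the smaller m on ties) mirrors the idea of choosing m and the feature subset together. Because the accuracy returned is the same score used to select the features, it is an optimistic estimate; an SVM-based wrapper would also typically need kernel parameters chosen, which this sketch omits.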

Original language: English
Pages (from-to): 573-581
Number of pages: 9
Journal: BioSystems
Volume: 90
Issue number: 2
DOI: 10.1016/j.biosystems.2007.01.001
Publication status: Published, Sep 1 2007
Externally published: Yes

Keywords

  • Amino acid composition
  • Genetic algorithm
  • k-Nearest neighbor
  • Physicochemical property
  • Prediction
  • Subnuclear localization
  • Support vector machine

ASJC Scopus subject areas

  • Statistics and Probability
  • Modelling and Simulation
  • Biochemistry, Genetics and Molecular Biology (all)
  • Applied Mathematics

Cite this

ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. / Huang, Wen Lin; Tung, Chun Wei; Huang, Hui Ling; Hwang, Shiow Fen; Ho, Shinn Ying.

In: BioSystems, Vol. 90, No. 2, 01.09.2007, p. 573-581.

Huang, Wen Lin; Tung, Chun Wei; Huang, Hui Ling; Hwang, Shiow Fen; Ho, Shinn Ying. / ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features. In: BioSystems. 2007; Vol. 90, No. 2. pp. 573-581.
@article{ce6684598a4e4868ac92081944ffc1b3,
title = "ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features",
keywords = "Amino acid composition, Genetic algorithm, k-Nearest neighbor, Physicochemical property, Prediction, Subnuclear localization, Support vector machine",
author = "Huang, {Wen Lin} and Tung, {Chun Wei} and Huang, {Hui Ling} and Hwang, {Shiow Fen} and Ho, {Shinn Ying}",
year = "2007",
month = "9",
day = "1",
doi = "10.1016/j.biosystems.2007.01.001",
language = "English",
volume = "90",
pages = "573--581",
journal = "BioSystems",
issn = "0303-2647",
publisher = "Elsevier Ireland Ltd",
number = "2",

}

TY - JOUR

T1 - ProLoc: Prediction of protein subnuclear localization using SVM with automatic selection from physicochemical composition features

AU - Huang, Wen Lin

AU - Tung, Chun Wei

AU - Huang, Hui Ling

AU - Hwang, Shiow Fen

AU - Ho, Shinn Ying

PY - 2007/9/1

Y1 - 2007/9/1

AB - Accurate prediction methods of protein subnuclear localizations rely on the cooperation between informative features and classifier design. Support vector machine (SVM) based learning methods are shown effective for predictions of protein subcellular and subnuclear localizations. This study proposes an evolutionary support vector machine (ESVM) based classifier with automatic selection from a large set of physicochemical composition (PCC) features to design an accurate system for predicting protein subnuclear localization, named ProLoc. ESVM using an inheritable genetic algorithm combined with SVM can automatically determine the best number m of PCC features and identify m out of 526 PCC features simultaneously. To evaluate ESVM, this study uses two datasets SNL6 and SNL9, which have 504 proteins localized in 6 subnuclear compartments and 370 proteins localized in 9 subnuclear compartments. Using a leave-one-out cross-validation, ProLoc utilizing the selected m = 33 and 28 PCC features has accuracies of 56.37% for SNL6 and 72.82% for SNL9, which are better than 51.4% for the SVM-based system using k-peptide composition features applied on SNL6, and 64.32% for an optimized evidence-theoretic k-nearest neighbor classifier utilizing pseudo amino acid composition applied on SNL9, respectively.

KW - Amino acid composition

KW - Genetic algorithm

KW - k-Nearest neighbor

KW - Physicochemical property

KW - Prediction

KW - Subnuclear localization

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=34548458988&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548458988&partnerID=8YFLogxK

U2 - 10.1016/j.biosystems.2007.01.001

DO - 10.1016/j.biosystems.2007.01.001

M3 - Article

C2 - 17291684

AN - SCOPUS:34548458988

VL - 90

SP - 573

EP - 581

JO - BioSystems

JF - BioSystems

SN - 0303-2647

IS - 2

ER -