EuLoc

A web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC

Tzu Hao Chang, Li Ching Wu, Tzong Yi Lee, Shu Pin Chen, Hsien Da Huang, Jorng Tzong Horng

Research output: Contribution to journal (Article)

35 Citations (Scopus)

Abstract

The function of a protein is generally related to its subcellular localization. Therefore, knowing its subcellular localization is helpful in understanding its potential functions and roles in biological processes. This work develops a hybrid method for computationally predicting the subcellular localization of eukaryotic proteins. The method, called EuLoc, combines a Hidden Markov Model (HMM) method, a homology search approach and a support vector machine (SVM) method, fusing several new features into Chou's pseudo-amino acid composition (PseAAC). The SVM module overcomes a shortcoming of the homology search approach, which cannot reliably predict the localization of a protein that has only low-homology or non-homologous matches in a database of proteins with annotated subcellular localizations. The HMM modules overcome a shortcoming of the SVM, which performs poorly on subcellular localizations represented by few protein sequences. Several features of a protein sequence are considered, including sequence-based features, biological features derived from PROSITE, NLSdb and Pfam, post-translational modification features and others. The overall accuracy and location accuracy of EuLoc are 90.5 % and 91.2 %, respectively, a better predictive performance than reported for other tools. Although the number of proteins in the various subcellular location groups of the benchmark dataset differs markedly, the accuracies of EuLoc for the 12 subcellular localizations range from 82.5 % to 100 %, indicating that the tool is much more balanced than other tools. EuLoc thus offers high, balanced predictive power for each subcellular localization. EuLoc is available on the web at http://euloc.mbc.nctu.edu.tw/.
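The abstract relies on two quantities that are easy to state in code: Chou's pseudo-amino acid composition (PseAAC), which EuLoc extends with additional features, and the overall versus location accuracy used to argue that EuLoc is balanced across locations. The Python sketch below is illustrative only and is not EuLoc's implementation. It assumes the original type-1 PseAAC (20 normalized residue frequencies plus lam sequence-order correlation factors weighted by w), with lam = 2 and w = 0.05 chosen arbitrarily; the property scale in the demo is a hypothetical toy, not a published index; and "location accuracy" is assumed to be the mean of the per-location accuracies, a common definition in this literature that the paper may define differently. EuLoc's PROSITE, NLSdb, Pfam and modification features, and its HMM and homology-search modules, are not modeled.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale raw per-residue values to zero mean and unit (population) SD over the 20 residues."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(vals), pstdev(vals)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=2, w=0.05):
    """Type-1 PseAAC sketch: 20 composition terms followed by lam correlation terms."""
    seq = seq.upper()
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("only the 20 standard residues are supported")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]
    counts = Counter(seq)
    freqs = [counts[a] / n for a in AMINO_ACIDS]

    def correlation(r1, r2):
        # Theta(R_i, R_j): mean squared difference over the standardized properties
        return mean((p[r2] - p[r1]) ** 2 for p in props)

    # theta_j: average correlation between residues j positions apart (j = 1..lam)
    thetas = [
        mean(correlation(seq[i], seq[i + j]) for i in range(n - j))
        for j in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def overall_and_location_accuracy(y_true, y_pred):
    """Overall accuracy over all proteins, plus the mean of the per-location accuracies."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_location = {}
    for loc in set(y_true):
        hits = [p == loc for t, p in zip(y_true, y_pred) if t == loc]
        per_location[loc] = sum(hits) / len(hits)
    return overall, mean(per_location.values()), per_location


if __name__ == "__main__":
    # Hypothetical toy property scale (NOT a published index), for illustration only
    toy_scale = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
    vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_scale], lam=2)
    print(len(vec), round(sum(vec), 6))  # prints: 22 1.0

    # Imbalanced toy labels: 8 nuclear proteins, 2 mitochondrial ones, all predicted nuclear
    y_true = ["nucleus"] * 8 + ["mitochondrion"] * 2
    y_pred = ["nucleus"] * 10
    print(overall_and_location_accuracy(y_true, y_pred)[:2])  # prints: (0.8, 0.5)
```

In the toy example, a predictor that labels every protein as nuclear still reaches 0.8 overall accuracy, while its location accuracy drops to 0.5. This is why a high location accuracy, as reported for EuLoc, says more about balance across locations than overall accuracy alone.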

Original language: English
Pages (from-to): 91-103
Number of pages: 13
Journal: Journal of Computer-Aided Molecular Design
Volume: 27
Issue number: 1
DOI: 10.1007/s10822-012-9628-0
Publication status: Published - Jan 2013

Keywords

  • Eukaryote
  • Protein function
  • Subcellular localization
  • Support vector machine

ASJC Scopus subject areas

  • Drug Discovery
  • Physical and Theoretical Chemistry
  • Computer Science Applications

Cite this

EuLoc: A web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC. / Chang, Tzu Hao; Wu, Li Ching; Lee, Tzong Yi; Chen, Shu Pin; Huang, Hsien Da; Horng, Jorng Tzong.

In: Journal of Computer-Aided Molecular Design, Vol. 27, No. 1, 01.2013, p. 91-103.

@article{f11e2238013b4d0fa703d2c2c0070837,
title = "{EuLoc}: A web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of {Chou's} {PseAAC}",
abstract = "The function of a protein is generally related to its subcellular localization. Therefore, knowing its subcellular localization is helpful in understanding its potential functions and roles in biological processes. This work develops a hybrid method for computationally predicting the subcellular localization of eukaryotic protein. The method is called EuLoc and incorporates the Hidden Markov Model (HMM) method, homology search approach and the support vector machines (SVM) method by fusing several new features into Chou's pseudo-amino acid composition. The proposed SVM module overcomes the shortcoming of the homology search approach in predicting the subcellular localization of a protein which only finds low-homologous or non-homologous sequences in a protein subcellular localization annotated database. The proposed HMM modules overcome the shortcoming of SVM in predicting subcellular localizations using few data on protein sequences. Several features of a protein sequence are considered, including the sequence-based features, the biological features derived from PROSITE, NLSdb and Pfam, the post-transcriptional modification features and others. The overall accuracy and location accuracy of EuLoc are 90.5 and 91.2 {\%}, respectively, revealing a better predictive performance than obtained elsewhere. Although the amounts of data of the various subcellular location groups in benchmark dataset differ markedly, the accuracies of 12 subcellular localizations of EuLoc range from 82.5 to 100 {\%}, indicating that this tool is much more balanced than other tools. EuLoc offers a high, balanced predictive power for each subcellular localization. EuLoc is now available on the web at http://euloc.mbc.nctu.edu.tw/.",
keywords = "Eukaryote, Protein function, Subcellular localization, Support vector machine",
author = "Chang, {Tzu Hao} and Wu, {Li Ching} and Lee, {Tzong Yi} and Chen, {Shu Pin} and Huang, {Hsien Da} and Horng, {Jorng Tzong}",
year = "2013",
month = jan,
doi = "10.1007/s10822-012-9628-0",
language = "English",
volume = "27",
pages = "91--103",
journal = "Journal of Computer-Aided Molecular Design",
issn = "0920-654X",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - EuLoc: A web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou's PseAAC

AU - Chang, Tzu Hao

AU - Wu, Li Ching

AU - Lee, Tzong Yi

AU - Chen, Shu Pin

AU - Huang, Hsien Da

AU - Horng, Jorng Tzong

PY - 2013/1

Y1 - 2013/1

AB - The function of a protein is generally related to its subcellular localization. Therefore, knowing its subcellular localization is helpful in understanding its potential functions and roles in biological processes. This work develops a hybrid method for computationally predicting the subcellular localization of eukaryotic protein. The method is called EuLoc and incorporates the Hidden Markov Model (HMM) method, homology search approach and the support vector machines (SVM) method by fusing several new features into Chou's pseudo-amino acid composition. The proposed SVM module overcomes the shortcoming of the homology search approach in predicting the subcellular localization of a protein which only finds low-homologous or non-homologous sequences in a protein subcellular localization annotated database. The proposed HMM modules overcome the shortcoming of SVM in predicting subcellular localizations using few data on protein sequences. Several features of a protein sequence are considered, including the sequence-based features, the biological features derived from PROSITE, NLSdb and Pfam, the post-transcriptional modification features and others. The overall accuracy and location accuracy of EuLoc are 90.5 and 91.2 %, respectively, revealing a better predictive performance than obtained elsewhere. Although the amounts of data of the various subcellular location groups in benchmark dataset differ markedly, the accuracies of 12 subcellular localizations of EuLoc range from 82.5 to 100 %, indicating that this tool is much more balanced than other tools. EuLoc offers a high, balanced predictive power for each subcellular localization. EuLoc is now available on the web at http://euloc.mbc.nctu.edu.tw/.

KW - Eukaryote

KW - Protein function

KW - Subcellular localization

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=84874109868&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874109868&partnerID=8YFLogxK

U2 - 10.1007/s10822-012-9628-0

DO - 10.1007/s10822-012-9628-0

M3 - Article

VL - 27

SP - 91

EP - 103

JO - Journal of Computer-Aided Molecular Design

JF - Journal of Computer-Aided Molecular Design

SN - 0920-654X

IS - 1

ER -