Prediction of pupylation sites using the composition of k-spaced amino acid pairs

Research output: Contribution to journal › Article

29 Citations (Scopus)

Abstract

Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by the proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, the pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and to gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool, iPUP, is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top-ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP, trained by using the composition of the 150 top-ranked k-spaced amino acid pairs and support vector machines, is 0.83 for the area under the receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to the C-terminus tend to be pupylated. In contrast, lysines close to the N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.
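
The peptide representation described in the abstract can be illustrated with a minimal Python sketch. This is not the iPUP implementation: the function names, the flank width, the maximum spacing `k_max`, and the use of `-` as the padding symbol for positions beyond a terminus are all assumptions chosen for illustration. The sketch cuts a window around a candidate lysine and, for each spacing k, computes the fraction of residue pairs separated by k positions that match each possible pair.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP = "-"  # assumed padding symbol for positions beyond a terminus
ALPHABET = AMINO_ACIDS + GAP


def lysine_window(sequence, position, flank=3):
    """Return the peptide centred on the lysine at sequence[position].

    Positions that fall outside the protein are filled with GAP, so
    windows near a terminus contain "space" characters.
    """
    if sequence[position] != "K":
        raise ValueError("position does not point at a lysine")
    padded = GAP * flank + sequence + GAP * flank
    return padded[position:position + 2 * flank + 1]


def cksaap(peptide, k_max=2):
    """Composition of k-spaced pairs for k = 0..k_max.

    Returns a dict keyed by (k, first, second) whose value is the
    fraction of k-spaced pairs in the peptide equal to (first, second).
    """
    if len(peptide) < k_max + 2:
        raise ValueError("peptide too short for the requested k_max")
    features = {}
    for k in range(k_max + 1):
        n_pairs = len(peptide) - k - 1  # number of pairs with k residues between
        counts = {pair: 0 for pair in product(ALPHABET, repeat=2)}
        for i in range(n_pairs):
            counts[(peptide[i], peptide[i + k + 1])] += 1
        for (first, second), count in counts.items():
            features[(k, first, second)] = count / n_pairs
    return features


if __name__ == "__main__":
    window = lysine_window("MKAAK", 4, flank=3)
    vector = cksaap(window, k_max=2)
    print(window, len(vector))
```

With a 21-symbol alphabet, each spacing k contributes 441 features. A ranking step such as the sequential backward feature elimination mentioned in the abstract would then reduce this set before a classifier is trained.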

Original language: English
Pages (from-to): 11-17
Number of pages: 7
Journal: Journal of Theoretical Biology
Volume: 336
DOI: 10.1016/j.jtbi.2013.07.009
Publication status: Published - Nov 7 2013
Externally published: Yes

Fingerprint

Amino acids
Proteins
Prediction
Chemical analysis
Lysine
Ubiquitins
Post-translational modification
Post-translational protein processing
Proteasome endopeptidase complex
Substrate specificity
Prokaryotic cells

Keywords

  • Feature selection
  • K-spaced amino acid pairs
  • Pupylation
  • Software
  • Support vector machine

ASJC Scopus subject areas

  • Medicine(all)
  • Immunology and Microbiology(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • Modelling and Simulation
  • Statistics and Probability
  • Applied Mathematics

Cite this

Prediction of pupylation sites using the composition of k-spaced amino acid pairs. / Tung, Chun Wei.

In: Journal of Theoretical Biology, Vol. 336, 07.11.2013, p. 11-17.

@article{42e56d457a4e46caac540c77d5188364,
title = "Prediction of pupylation sites using the composition of k-spaced amino acid pairs",
abstract = "Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.",
keywords = "Feature selection, K-spaced amino acid pairs, Pupylation, Software, Support vector machine",
author = "Tung, {Chun Wei}",
year = "2013",
month = "11",
day = "7",
doi = "10.1016/j.jtbi.2013.07.009",
language = "English",
volume = "336",
pages = "11--17",
journal = "Journal of Theoretical Biology",
issn = "0022-5193",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Prediction of pupylation sites using the composition of k-spaced amino acid pairs

AU - Tung, Chun Wei

PY - 2013/11/7

Y1 - 2013/11/7

N2 - Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.

AB - Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.

KW - Feature selection

KW - K-spaced amino acid pairs

KW - Pupylation

KW - Software

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=84881234023&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881234023&partnerID=8YFLogxK

U2 - 10.1016/j.jtbi.2013.07.009

DO - 10.1016/j.jtbi.2013.07.009

M3 - Article

C2 - 23871866

AN - SCOPUS:84881234023

VL - 336

SP - 11

EP - 17

JO - Journal of Theoretical Biology

JF - Journal of Theoretical Biology

SN - 0022-5193

ER -