Abstract
Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.
Original language | English |
---|---|
Pages (from-to) | 11-17 |
Number of pages | 7 |
Journal | Journal of Theoretical Biology |
Volume | 336 |
DOIs | |
Publication status | Published - Nov 7 2013 |
Externally published | Yes |
Fingerprint
Keywords
- Feature selection
- K-spaced amino acid pairs
- Pupylation
- Software
- Support vector machine
ASJC Scopus subject areas
- Medicine(all)
- Immunology and Microbiology(all)
- Biochemistry, Genetics and Molecular Biology(all)
- Agricultural and Biological Sciences(all)
- Modelling and Simulation
- Statistics and Probability
- Applied Mathematics
Cite this
Prediction of pupylation sites using the composition of k-spaced amino acid pairs. / Tung, Chun Wei.
In: Journal of Theoretical Biology, Vol. 336, 07.11.2013, p. 11-17.Research output: Contribution to journal › Article
}
TY - JOUR
T1 - Prediction of pupylation sites using the composition of k-spaced amino acid pairs
AU - Tung, Chun Wei
PY - 2013/11/7
Y1 - 2013/11/7
N2 - Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.
AB - Pupylation is an important post-translational modification in prokaryotes. A prokaryotic ubiquitin-like protein (Pup) is attached to proteins as a signal for selective degradation by proteasome. Several proteomics methods have been developed for the identification of pupylated proteins and pupylation sites. However, pupylation sites of many experimentally identified pupylated proteins are still unknown. The development of sequence-based prediction methods can help to accelerate the identification of pupylation sites and gain insights into the substrate specificity and regulatory functions of pupylation. A novel tool iPUP is developed for the computational identification of pupylation sites. A composition of k-spaced amino acid pairs is utilized to represent a peptide sequence. Top ranked k-spaced amino acid pairs are subsequently selected by using a sequential backward feature elimination algorithm. The 10-fold cross-validation performance of iPUP trained by using the composition of 150 top ranked k-spaced amino acid pairs and support vector machines is 0.83 for the area under receiver operating characteristic curve. The importance analysis of k-spaced amino acid pairs shows that terminal space-containing pairs are useful for discriminating pupylation sites from non-pupylation sites. A sequence analysis confirms that lysines close to C-terminus tend to be pupylated. In contrast, lysines close to N-terminus are less likely to be pupylated. The iPUP tool can predict pupylation sites with probability scores for prioritizing promising pupylation sites. Both the online server and the standalone software of iPUP are freely available for academic use at http://cwtung.kmu.edu.tw/ipup.
KW - Feature selection
KW - K-spaced amino acid pairs
KW - Pupylation
KW - Software
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=84881234023&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881234023&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2013.07.009
DO - 10.1016/j.jtbi.2013.07.009
M3 - Article
C2 - 23871866
AN - SCOPUS:84881234023
VL - 336
SP - 11
EP - 17
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
SN - 0022-5193
ER -