SOHSite: Incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites

Van Minh Bui, Shun Long Weng, Cheng Tsung Lu, Tzu Hao Chang, Julia Tzu Ya Weng, Tzong Yi Lee

Research output: Contribution to journal (Article)

24 Citations (Scopus)

Abstract

Background: Protein S-sulfenylation is a type of post-translational modification (PTM) involving the covalent attachment of a hydroxyl group to the thiol sulfur of a cysteine residue, forming a cysteine sulfenic acid (Cys-SOH). Recent evidence has shown the importance of S-sulfenylation in various biological processes, including transcriptional regulation, apoptosis and cytokine signaling. Determining the specific sites of S-sulfenylation is fundamental to understanding the structures and functions of S-sulfenylated proteins. However, the current lack of reliable tools often limits researchers to expensive and time-consuming laboratory techniques for identifying S-sulfenylation sites. We were therefore motivated to develop a bioinformatics method for investigating S-sulfenylation sites based on amino acid composition and physicochemical properties.
Results: In this work, physicochemical properties were used not only to identify S-sulfenylation sites in 1,096 experimentally verified S-sulfenylated proteins, but also to compare their predictive effectiveness with that of other features such as amino acid composition (AAC), amino acid pair composition (AAPC), solvent-accessible surface area (ASA), the BLOSUM62 amino acid substitution matrix, the position-specific scoring matrix (PSSM), and the positional weighted matrix (PWM). Various prediction models were built using support vector machines (SVMs) and evaluated by five-fold cross-validation. The model constructed from hybrid features, including PSSM and physicochemical properties, yielded the best performance, with sensitivity, specificity, accuracy and Matthews correlation coefficient (MCC) of 0.746, 0.737, 0.738 and 0.337, respectively. The selected model also achieved a promising accuracy of 0.693 on an independent test dataset. Additionally, we employed TwoSampleLogo to identify differences in amino acid composition among S-sulfenylation, S-glutathionylation and S-nitrosylation sites.
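The feature encodings above are only named in the abstract. As a rough illustration, the Python sketch below shows one way a cysteine-centred sequence window could be turned into a hybrid vector by concatenating amino acid composition (AAC) with one physicochemical value per window position. The 21-residue window (`half_width=10`), the `X` padding beyond the termini, the choice of the Kyte-Doolittle hydropathy scale, and all function names are assumptions for illustration, not the published SOHSite configuration; the PSSM component is omitted because it requires PSI-BLAST profiles.

```python
"""Hypothetical sketch: encode a cysteine-centred window as a hybrid feature vector."""
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy, used here only as an example physicochemical scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def extract_window(sequence: str, position: int, half_width: int = 10) -> str:
    """Return the 2*half_width+1 residues centred on `position` (0-based),
    padding with 'X' where the window runs past either terminus."""
    if sequence[position] != "C":
        raise ValueError("window must be centred on a cysteine")
    window = []
    for i in range(position - half_width, position + half_width + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else "X")
    return "".join(window)


def aac(window: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue in the window,
    ignoring padding and non-standard letters."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues) or 1
    return [residues.count(a) / n for a in AMINO_ACIDS]


def physicochemical(window: str, scale: Dict[str, float]) -> List[float]:
    """One property value per window position; padding/unknown residues map to 0.0."""
    return [scale.get(r, 0.0) for r in window]


def hybrid_features(sequence: str, position: int, half_width: int = 10) -> List[float]:
    """Concatenate composition features with positional property features."""
    window = extract_window(sequence, position, half_width)
    return aac(window) + physicochemical(window, KYTE_DOOLITTLE)


if __name__ == "__main__":
    toy = "MKTAYIAKQRQISFVKSHFSRQCLEEA"  # toy sequence, not from the SOHSite data
    features = hybrid_features(toy, toy.index("C"))
    print(len(features))  # 20 AAC values + 21 positional values = 41
```

Vectors of this kind would then be passed to an SVM classifier. Excluding padded positions from the composition while scoring them as 0.0 in the property features is one convention among several.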
Conclusion: This work proposes a computational method for exploring informative features and functions of protein S-sulfenylation. Evaluation by five-fold cross-validation indicated that the selected features were effective in identifying S-sulfenylation sites. Moreover, the independent testing results demonstrated that the proposed method could provide a feasible means for conducting preliminary analyses of protein S-sulfenylation. We also anticipate that the uncovered differences in amino acid composition may facilitate future studies of the extensive crosstalk among S-sulfenylation, S-glutathionylation and S-nitrosylation.
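The four measures reported for the cross-validation can be computed from a binary confusion matrix as in the following sketch. The counts in the usage example are invented for illustration and are not SOHSite results. They are chosen only to show that when negative sites outnumber positives, accuracy sits close to specificity (accuracy is the class-weighted average of sensitivity and specificity) while MCC stays modest even though sensitivity and specificity both exceed 0.7.

```python
"""Sketch: sensitivity, specificity, accuracy and MCC from a confusion matrix."""
import math
from typing import Dict


def evaluate(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Return Sn, Sp, Acc and MCC for one binary site-prediction run.

    tp/fn count S-sulfenylated cysteines predicted positive/negative;
    tn/fp count non-sulfenylated cysteines predicted negative/positive.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "Acc": accuracy, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts: 100 positive and 1,000 negative cysteines.
    scores = evaluate(tp=75, fn=25, tn=740, fp=260)
    print({name: round(value, 3) for name, value in scores.items()})
```

In a five-fold setting, these counts would typically be pooled over the held-out folds (or the metrics averaged per fold) before reporting.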

Original language: English
Article number: 9
Journal: BMC Genomics
Volume: 17
Issue number: 1
DOI: 10.1186/s12864-015-2299-1
Publication status: Published - Jan 11 2016

Keywords

  • Physicochemical properties
  • S-sulfenylation
  • Sulfenic acids
  • Support vector machine

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

SOHSite: Incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites. / Bui, Van Minh; Weng, Shun Long; Lu, Cheng Tsung; Chang, Tzu Hao; Weng, Julia Tzu Ya; Lee, Tzong Yi.

In: BMC Genomics, Vol. 17, No. 1, 9, 11.01.2016.

Bui, Van Minh; Weng, Shun Long; Lu, Cheng Tsung; Chang, Tzu Hao; Weng, Julia Tzu Ya; Lee, Tzong Yi. / SOHSite: Incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites. In: BMC Genomics. 2016; Vol. 17, No. 1.
@article{42ca2edf763c40d1abc26aaab3f21abb,
title = "SOHSite: Incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites",
keywords = "Physicochemical properties, S-sulfenylation, Sulfenic acids, Support vector machine",
author = "Bui, {Van Minh} and Weng, {Shun Long} and Lu, {Cheng Tsung} and Chang, {Tzu Hao} and Weng, {Julia Tzu Ya} and Lee, {Tzong Yi}",
year = "2016",
month = "1",
day = "11",
doi = "10.1186/s12864-015-2299-1",
language = "English",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central Ltd.",
number = "1",

}

TY - JOUR

T1 - SOHSite: Incorporating evolutionary information and physicochemical properties to identify protein S-sulfenylation sites

AU - Bui, Van Minh

AU - Weng, Shun Long

AU - Lu, Cheng Tsung

AU - Chang, Tzu Hao

AU - Weng, Julia Tzu Ya

AU - Lee, Tzong Yi

PY - 2016/1/11

Y1 - 2016/1/11

KW - Physicochemical properties

KW - S-sulfenylation

KW - Sulfenic acids

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=84953860034&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84953860034&partnerID=8YFLogxK

U2 - 10.1186/s12864-015-2299-1

DO - 10.1186/s12864-015-2299-1

M3 - Article

C2 - 26819243

AN - SCOPUS:84953860034

VL - 17

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 9

ER -