Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities

Van Nui Nguyen, Kai Yao Huang, Chien Hsun Huang, Tzu Hao Chang, Neil Arvin Bretaña, K. Robert Lai, Julia Tzu Ya Weng, Tzong Yi Lee

Research output: Contribution to journal › Article

13 Citations (Scopus)

Abstract

Background: In eukaryotes, ubiquitin conjugation is an important mechanism underlying proteasome-mediated protein degradation and therefore plays an essential role in regulating many cellular processes. In the ubiquitin-proteasome pathway, E3 ligases recognize a specific protein substrate and catalyze the attachment of ubiquitin to a lysine (K) residue. As more experimental data on ubiquitin conjugation sites become available, it becomes possible to develop prediction models that scale to large data sets. However, no existing work has focused on the substrate specificities of ubiquitinated sites. Herein, we present an approach that uses an iterative statistical method to identify ubiquitin conjugation sites with substrate site specificities.
Results: A total of 6259 experimentally validated ubiquitinated proteins were obtained from dbPTM. After homologous fragments were filtered out at 40% sequence identity, the training data set contained 2658 ubiquitination sites (positive data) and 5532 non-ubiquitinated sites (negative data). Because the substrate site specificities of E3 ligases are difficult to characterize by conventional sequence logo analysis, a recursive statistical method was applied to obtain significantly conserved motifs. Profile hidden Markov models (profile HMMs) were then used to build predictive models from the identified substrate motifs. Five-fold cross-validation of the predictive model yielded sensitivity, specificity, and accuracy of 73.07%, 65.46%, and 67.93%, respectively. Additionally, on an independent testing set completely blind to the training data, the proposed method achieved a promising accuracy (76.13%) and outperformed other ubiquitination site prediction tools.
Conclusion: A case study demonstrated the effectiveness of the characterized substrate motifs for identifying ubiquitination sites. The proposed method offers a practical means of preliminary analysis and greatly reduces the number of potential targets requiring further experimental confirmation. It may help unravel the mechanisms and roles of these sites in E3 recognition and ubiquitin-mediated protein degradation.
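
The cross-validation figures quoted in the abstract can be checked against the class sizes it reports. The sketch below is not the authors' code. It assumes the reported sensitivity and specificity are pooled over the full training set of 2658 positive and 5532 negative sites, and the function name `accuracy_from_rates` is purely illustrative. Under that assumption, overall accuracy is the class-size-weighted average of sensitivity and specificity.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by per-class rates and class sizes.

    Assumption: both rates are measured on the same pooled data set.
    """
    true_pos = sensitivity * n_pos   # correctly predicted ubiquitination sites
    true_neg = specificity * n_neg   # correctly predicted non-ubiquitinated sites
    return (true_pos + true_neg) / (n_pos + n_neg)


# Figures quoted in the abstract (five-fold cross-validation).
acc = accuracy_from_rates(0.7307, 0.6546, n_pos=2658, n_neg=5532)
print(f"{acc:.2%}")  # prints "67.93%"
```

The result agrees with the reported accuracy of 67.93%. Because negatives outnumber positives roughly two to one, the overall accuracy sits closer to the specificity than to the sensitivity.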

Original language: English
Article number: S1
Journal: BMC Bioinformatics
Volume: 16
Issue number: 1
DOI: 10.1186/1471-2105-16-S1-S1
Publication status: Published - Jan 21 2015

Fingerprint

Ubiquitin-Protein Ligases
Conjugation
Ubiquitin
Specificity
Substrate
Ubiquitination
Predictive Model
Substrates
Substrate Specificity
Proteins
Protein
Proteasome Endopeptidase Complex
Statistical method
Proteolysis
Statistical methods
Degradation
Position-Specific Scoring Matrices
Ubiquitinated Proteins
Sequence Analysis
Hidden Markov models

Keywords

  • Maximal dependence decomposition
  • Profile hidden Markov model
  • Substrate site specificity
  • Ubiquitin conjugation
  • Ubiquitination

ASJC Scopus subject areas

  • Applied Mathematics
  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Cite this

Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities. / Nguyen, Van Nui; Huang, Kai Yao; Huang, Chien Hsun; Chang, Tzu Hao; Bretaña, Neil Arvin; Lai, K. Robert; Weng, Julia Tzu Ya; Lee, Tzong Yi.

In: BMC Bioinformatics, Vol. 16, No. 1, S1, 21.01.2015.

@article{d666754062194dc9afc97687d9f4077e,
title = "Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities",
keywords = "Maximal dependence decomposition, Profile hidden Markov model, Substrate site specificity, Ubiquitin conjugation, Ubiquitination",
author = "Nguyen, {Van Nui} and Huang, {Kai Yao} and Huang, {Chien Hsun} and Chang, {Tzu Hao} and Breta{\~n}a, {Neil Arvin} and Lai, {K. Robert} and Weng, {Julia Tzu Ya} and Lee, {Tzong Yi}",
year = "2015",
month = "1",
day = "21",
doi = "10.1186/1471-2105-16-S1-S1",
language = "English",
volume = "16",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Characterization and identification of ubiquitin conjugation sites with E3 ligase recognition specificities

AU - Nguyen, Van Nui

AU - Huang, Kai Yao

AU - Huang, Chien Hsun

AU - Chang, Tzu Hao

AU - Bretaña, Neil Arvin

AU - Lai, K. Robert

AU - Weng, Julia Tzu Ya

AU - Lee, Tzong Yi

PY - 2015/1/21

Y1 - 2015/1/21

KW - Maximal dependence decomposition

KW - Profile hidden Markov model

KW - Substrate site specificity

KW - Ubiquitin conjugation

KW - Ubiquitination

UR - http://www.scopus.com/inward/record.url?scp=84961589464&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961589464&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-16-S1-S1

DO - 10.1186/1471-2105-16-S1-S1

M3 - Article

C2 - 25707307

AN - SCOPUS:84961589464

VL - 16

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - S1

ER -