Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles

Nguyen Quoc Khanh Le, Tuan Tu Huynh, Edward Kien Yee Yapp, Hui Yuan Yeh

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Background and Objectives: Clathrin is an adaptor protein that serves as the principal element of the vesicle-coating complex and is important for the membrane cleavage that releases the invaginated vesicle from the plasma membrane. The functional loss of clathrins has been linked to many human diseases, e.g., neurodegenerative disorders, cancer, and Alzheimer's disease. Therefore, creating a precise model to identify clathrin functions is a crucial step towards understanding human diseases and designing drug targets. Methods: We present a deep learning model that uses a two-dimensional convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to identify clathrin proteins from high-throughput sequences. Because 2D CNNs traditionally take images as input, we treated the 20 × 20 PSSM profile as an image of 20 × 20 pixels. The PSSM profile was then fed into our 2D CNN, in which we varied a range of parameters to improve the performance of the model. A hyperparameter optimization process, based on the 10-fold cross-validation results, was employed to find the best model for our dataset. Finally, an independent dataset was used to assess the predictive ability of the current model. Results: Our model identified clathrin proteins with a sensitivity of 92.2%, specificity of 91.2%, accuracy of 91.8%, and Matthews correlation coefficient (MCC) of 0.83 on the independent dataset. Compared to state-of-the-art traditional neural networks, our method achieved a significant improvement in all typical measurement metrics. Conclusions: This study provides an effective tool for investigating clathrin proteins, and our results could promote the use of deep learning in biomedical research. The source code and dataset are freely available at https://www.github.com/khanhlee/deep-clathrin/.
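
The two computational steps named in the abstract, turning a PSSM profile into a 20 × 20 input and scoring a binary classifier, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a length-L PSSM is reduced to 20 × 20, so summing the rows that belong to the same residue type and dividing by the sequence length is an assumption here. The function names `pssm_to_square` and `binary_metrics` are hypothetical, and the confusion-matrix counts in the usage lines are made up for illustration rather than taken from the paper.

```python
import math

# Standard column order of the 20 amino acids in a PSI-BLAST PSSM.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def pssm_to_square(sequence, pssm_rows):
    """Fold an L x 20 PSSM into a 20 x 20 matrix.

    Assumption (not stated in the abstract): rows belonging to the same
    residue type are summed, then divided by the sequence length.
    """
    square = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm_rows):
        idx = AMINO_ACIDS.find(residue)
        if idx < 0:
            continue  # skip non-standard residues such as 'X'
        for j, score in enumerate(row):
            square[idx][j] += score
    n = max(len(sequence), 1)  # avoid division by zero for empty input
    return [[value / n for value in row] for row in square]


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # MCC is conventionally set to 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy usage with illustrative inputs (not data from the paper).
toy_square = pssm_to_square("AAR", [[1.0] * 20, [3.0] * 20, [6.0] * 20])
toy_scores = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

The 20 × 20 result can be handed to a 2D CNN as a single-channel image. MCC is reported alongside accuracy because it uses all four cells of the confusion matrix and so remains informative when the classes are imbalanced.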

Original language: English
Pages (from-to): 81-88
Number of pages: 8
Journal: Computer Methods and Programs in Biomedicine
Volume: 177
DOI: 10.1016/j.cmpb.2019.05.016
Publication status: Published - Aug 1 2019
Externally published: Yes

Keywords

  • Adaptor protein complex
  • Clathrin coated pits
  • Convolutional neural network
  • Molecular function
  • Position specific scoring matrix
  • Vesicular transport

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Health Informatics

Cite this

Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles. / Le, Nguyen Quoc Khanh; Huynh, Tuan Tu; Yapp, Edward Kien Yee; Yeh, Hui Yuan.

In: Computer Methods and Programs in Biomedicine, Vol. 177, 01.08.2019, p. 81-88.

@article{6341d06c35bc47908ccdcb6c83186d88,
title = "Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles",
abstract = "Background and Objectives: Clathrin is an adaptor protein that serves as the principal element of the vesicle-coating complex and is important for the membrane cleavage to dispense the invaginated vesicle from the plasma membrane. The functional loss of clathrins has been tied to a lot of human diseases, i.e., neurodegenerative disorders, cancer, Alzheimer's diseases, and so on. Therefore, creating a precise model to identify its functions is a crucial step towards understanding human diseases and designing drug targets. Methods: We present a deep learning model using a two-dimensional convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to identify clathrin proteins from high throughput sequences. Traditionally, the 2D CNNs take images as an input so we treated the PSSM profile with a 20 × 20 matrix as an image of 20 × 20 pixels. The input PSSM profile was then connected to our 2D CNN in which we set a variety of parameters to improve the performance of the model. Based on the 10-fold cross-validation results, hyper-parameter optimization process was employed to find the best model for our dataset. Finally, an independent dataset was used to assess the predictive ability of the current model. Results: Our model could identify clathrin proteins with sensitivity of 92.2{\%}, specificity of 91.2{\%}, accuracy of 91.8{\%}, and MCC of 0.83 in the independent dataset. Compared to state-of-the-art traditional neural networks, our method achieved a significant improvement in all typical measurement metrics. Conclusions: Throughout the proposed study, we provide an effective tool for investigating clathrin proteins and our achievement could promote the use of deep learning in biomedical research. We also provide source codes and dataset freely at https://www.github.com/khanhlee/deep-clathrin/.",
keywords = "Adaptor protein complex, Clathrin coated pits, Convolutional neural network, Molecular function, Position specific scoring matrix, Vesicular transport",
author = "Le, {Nguyen Quoc Khanh} and Huynh, {Tuan Tu} and Yapp, {Edward Kien Yee} and Yeh, {Hui Yuan}",
year = "2019",
month = "8",
day = "1",
doi = "10.1016/j.cmpb.2019.05.016",
language = "English",
volume = "177",
pages = "81--88",
journal = "Computer Methods and Programs in Biomedicine",
issn = "0169-2607",
publisher = "Elsevier Ireland Ltd",
}

TY - JOUR
T1 - Identification of clathrin proteins by incorporating hyperparameter optimization in deep learning and PSSM profiles
AU - Le, Nguyen Quoc Khanh
AU - Huynh, Tuan Tu
AU - Yapp, Edward Kien Yee
AU - Yeh, Hui Yuan
PY - 2019/8/1
Y1 - 2019/8/1
N2 - Background and Objectives: Clathrin is an adaptor protein that serves as the principal element of the vesicle-coating complex and is important for the membrane cleavage to dispense the invaginated vesicle from the plasma membrane. The functional loss of clathrins has been tied to a lot of human diseases, i.e., neurodegenerative disorders, cancer, Alzheimer's diseases, and so on. Therefore, creating a precise model to identify its functions is a crucial step towards understanding human diseases and designing drug targets. Methods: We present a deep learning model using a two-dimensional convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles to identify clathrin proteins from high throughput sequences. Traditionally, the 2D CNNs take images as an input so we treated the PSSM profile with a 20 × 20 matrix as an image of 20 × 20 pixels. The input PSSM profile was then connected to our 2D CNN in which we set a variety of parameters to improve the performance of the model. Based on the 10-fold cross-validation results, hyper-parameter optimization process was employed to find the best model for our dataset. Finally, an independent dataset was used to assess the predictive ability of the current model. Results: Our model could identify clathrin proteins with sensitivity of 92.2%, specificity of 91.2%, accuracy of 91.8%, and MCC of 0.83 in the independent dataset. Compared to state-of-the-art traditional neural networks, our method achieved a significant improvement in all typical measurement metrics. Conclusions: Throughout the proposed study, we provide an effective tool for investigating clathrin proteins and our achievement could promote the use of deep learning in biomedical research. We also provide source codes and dataset freely at https://www.github.com/khanhlee/deep-clathrin/.

KW - Adaptor protein complex
KW - Clathrin coated pits
KW - Convolutional neural network
KW - Molecular function
KW - Position specific scoring matrix
KW - Vesicular transport
UR - http://www.scopus.com/inward/record.url?scp=85065903133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065903133&partnerID=8YFLogxK
U2 - 10.1016/j.cmpb.2019.05.016
DO - 10.1016/j.cmpb.2019.05.016
M3 - Article
VL - 177
SP - 81
EP - 88
JO - Computer Methods and Programs in Biomedicine
JF - Computer Methods and Programs in Biomedicine
SN - 0169-2607
ER -