SNARE-CNN: A 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data

Nguyen Quoc Khanh Le, Van Nui Nguyen

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the need for manual feature extraction while achieving high performance. This study uses deep learning to predict SNARE proteins, which perform some of the most vital molecular functions in life science. A functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative diseases, mental illness, and cancer). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases and designing drug targets. Our SNARE-CNN model, which applies two-dimensional convolutional neural networks to position-specific scoring matrix (PSSM) profiles, identified SNARE proteins with a sensitivity of 76.6%, specificity of 93.5%, accuracy of 89.7%, and MCC of 0.7 on the cross-validation dataset. We also evaluated the model on an independent dataset, and the results indicate that it avoids overfitting. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. This study provides an effective model for identifying SNARE proteins and a basis for further research applying deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN is freely available at https://github.com/khanhlee/snare-cnn.
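The abstract reports four standard binary-classification metrics: sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). As a minimal sketch (not code taken from the SNARE-CNN repository), they can be computed from confusion-matrix counts as follows, assuming SNARE proteins are the positive class. The function name `binary_metrics` is hypothetical.

```python
import math


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Hypothetical helper: compute Sn, Sp, Acc and MCC from a confusion matrix.

    Assumption: SNARE proteins are the positive class and non-SNARE
    proteins are the negative class.
    """
    sensitivity = _safe_div(tp, tp + fn)  # fraction of SNARE proteins recovered
    specificity = _safe_div(tn, tn + fp)  # fraction of non-SNARE proteins rejected
    accuracy = _safe_div(tp + tn, tp + tn + fp + fn)
    # MCC uses all four cells, so it stays informative when classes are unbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, denom)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Illustrative counts only; these are NOT the counts reported for SNARE-CNN.
print(binary_metrics(tp=80, tn=90, fp=10, fn=20))
```

Because MCC combines all four confusion-matrix cells, it is often reported alongside accuracy when the positive and negative sets differ in size, since accuracy alone can look high for a model that mostly predicts the majority class.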

Original language: English
Article number: e177
Journal: PeerJ Computer Science
Volume: 2019
Issue number: 5
DOI: 10.7717/peerj-cs.177
Publication status: Published - Jan 1 2019
Externally published: Yes

Fingerprint

Network architecture
Throughput
Neural networks
Proteins
Bioinformatics
Neurodegenerative diseases
Feature extraction
Deep learning

Keywords

  • Biological domain
  • Cancer
  • Deep learning
  • Human disease
  • Membrane fusion
  • Overfitting
  • Position specific scoring matrix
  • Protein family classification
  • SNARE protein function
  • Vesicular transport protein

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

SNARE-CNN: A 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data. / Le, Nguyen Quoc Khanh; Nguyen, Van Nui.

In: PeerJ Computer Science, Vol. 2019, No. 5, e177, 01.01.2019.


@article{345fbe7fa25b411487a451f3b34f9f29,
title = "SNARE-CNN: A 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data",
abstract = "Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict SNARE proteins, which is one of the most vital molecular functions in life science. A functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative, mental illness, cancer, and so on). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases, and designing the drug targets. Our SNARE-CNN model which uses two-dimensional convolutional neural networks and position-specific scoring matrix profiles could identify SNARE proteins with achieved sensitivity of 76.6{\%}, specificity of 93.5{\%}, accuracy of 89.7{\%}, and MCC of 0.7 in cross- validation dataset. We also evaluate the performance of our model via an independent dataset and the result shows that we are able to solve the overfitting problem. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. Throughout the proposed study, we provide an effective model for identifying SNARE proteins and a basis for further research that can apply deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN are freely available at https://github.com/khanhlee/snare-cnn.",
keywords = "Biological domain, Cancer, Deep learning, Human disease, Membrane fusion, Overfitting, Position specific scoring matrix, Protein family classification, SNARE protein function, Vesicular transport protein",
author = "Le, {Nguyen Quoc Khanh} and Nguyen, {Van Nui}",
year = "2019",
month = "1",
day = "1",
doi = "10.7717/peerj-cs.177",
language = "English",
volume = "2019",
journal = "PeerJ Computer Science",
issn = "2376-5992",
publisher = "PeerJ Inc.",
number = "5",

}

TY - JOUR

T1 - SNARE-CNN

T2 - A 2D convolutional neural network architecture to identify SNARE proteins from high-throughput sequencing data

AU - Le, Nguyen Quoc Khanh

AU - Nguyen, Van Nui

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict SNARE proteins, which is one of the most vital molecular functions in life science. A functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative, mental illness, cancer, and so on). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases, and designing the drug targets. Our SNARE-CNN model which uses two-dimensional convolutional neural networks and position-specific scoring matrix profiles could identify SNARE proteins with achieved sensitivity of 76.6%, specificity of 93.5%, accuracy of 89.7%, and MCC of 0.7 in cross- validation dataset. We also evaluate the performance of our model via an independent dataset and the result shows that we are able to solve the overfitting problem. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. Throughout the proposed study, we provide an effective model for identifying SNARE proteins and a basis for further research that can apply deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN are freely available at https://github.com/khanhlee/snare-cnn.

AB - Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the requirement for feature extraction and reach high performance. This study attempts to use deep learning to predict SNARE proteins, which is one of the most vital molecular functions in life science. A functional loss of SNARE proteins has been implicated in a variety of human diseases (e.g., neurodegenerative, mental illness, cancer, and so on). Therefore, creating a precise model to identify their functions is a crucial problem for understanding these diseases, and designing the drug targets. Our SNARE-CNN model which uses two-dimensional convolutional neural networks and position-specific scoring matrix profiles could identify SNARE proteins with achieved sensitivity of 76.6%, specificity of 93.5%, accuracy of 89.7%, and MCC of 0.7 in cross- validation dataset. We also evaluate the performance of our model via an independent dataset and the result shows that we are able to solve the overfitting problem. Compared with other state-of-the-art methods, this approach achieved significant improvement in all of the metrics. Throughout the proposed study, we provide an effective model for identifying SNARE proteins and a basis for further research that can apply deep learning in bioinformatics, especially in protein function prediction. SNARE-CNN are freely available at https://github.com/khanhlee/snare-cnn.

KW - Biological domain

KW - Cancer

KW - Deep learning

KW - Human disease

KW - Membrane fusion

KW - Overfitting

KW - Position specific scoring matrix

KW - Protein family classification

KW - SNARE protein function

KW - Vesicular transport protein

UR - http://www.scopus.com/inward/record.url?scp=85063286057&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063286057&partnerID=8YFLogxK

U2 - 10.7717/peerj-cs.177

DO - 10.7717/peerj-cs.177

M3 - Article

AN - SCOPUS:85063286057

VL - 2019

JO - PeerJ Computer Science

JF - PeerJ Computer Science

SN - 2376-5992

IS - 5

M1 - e177

ER -