Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Protein function prediction is a well-known problem in proteome research that has attracted the attention of numerous researchers. However, implementing deep neural networks, which can improve protein function prediction, still poses a considerable challenge. This study proposes a deep learning approach, named Fertility-GRU, that incorporates gated recurrent units and position-specific scoring matrix profiles to identify fertility-related proteins, whose biological function is highly crucial. Fertility-related proteins have also been shown to be important in many biological entities (e.g., bone marrow and peripheral blood, postnatal mammalian ovary) and parameters (e.g., daily sperm production). Our model achieves a cross-validation accuracy of 85.8% and an independent test accuracy of 91.1%. We also address overfitting by adding dropout layers to the deep learning model. The independent testing results showed sensitivity, specificity, and Matthews correlation coefficient (MCC) values of 90.5%, 91.7%, and 0.82, respectively. Fertility-GRU outperforms the state-of-the-art predictor on the same data set. This study provides a method that enables more proteins to be discovered, especially proteins associated with fertility. Moreover, these results could promote the use of recurrent networks and gated recurrent units in proteome research. The source code and data set are freely accessible via https://github.com/khanhlee/fertility-gru.
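
The evaluation metrics named in the abstract (sensitivity, specificity, accuracy, and MCC) follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas only. The counts are hypothetical, chosen so that the output matches the reported percentages; the actual composition of the independent test set is not stated here, and the function name `binary_metrics` is an assumption for illustration.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion counts.

    Assumes at least one positive and one negative sample, so the
    sensitivity and specificity denominators are non-zero.
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts, NOT taken from the paper: 21 positives, 24 negatives.
sens, spec, acc, mcc = binary_metrics(tp=19, fn=2, tn=22, fp=2)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} "
      f"accuracy={acc:.1%} MCC={mcc:.2f}")
# prints: sensitivity=90.5% specificity=91.7% accuracy=91.1% MCC=0.82
```

Because MCC uses all four cells, it stays informative when the classes are imbalanced, which is presumably why it is reported alongside accuracy.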

Original language: English
Journal: Journal of Proteome Research
DOI: 10.1021/acs.jproteome.9b00411
Publication status: Accepted/In press - Jan 1 2019
Externally published: Yes

Fingerprint

Position-Specific Scoring Matrices
Fertility
Proteins
Proteome
Learning
Research
Spermatozoa
Ovary
Bone
Blood
Bone Marrow
Research Personnel
Sensitivity and Specificity
Testing

Keywords

  • deep learning
  • embryogenesis
  • infertility
  • oogenesis process
  • position-specific scoring matrix
  • protein function prediction
  • recurrent neural network
  • reproductive physiology
  • sperm metabolism
  • spermatogenesis

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry (all)

Cite this

@article{a77f6fd0f2234f74a6205332edb493f7,
title = "Fertility-GRU: Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles",
abstract = "Protein function prediction is a well-known problem in proteome research that has attracted the attention of numerous researchers. However, implementing deep neural networks, which can improve protein function prediction, still poses a considerable challenge. This study proposes a deep learning approach, named Fertility-GRU, that incorporates gated recurrent units and position-specific scoring matrix profiles to identify fertility-related proteins, whose biological function is highly crucial. Fertility-related proteins have also been shown to be important in many biological entities (e.g., bone marrow and peripheral blood, postnatal mammalian ovary) and parameters (e.g., daily sperm production). Our model achieves a cross-validation accuracy of 85.8{\%} and an independent test accuracy of 91.1{\%}. We also address overfitting by adding dropout layers to the deep learning model. The independent testing results showed sensitivity, specificity, and Matthews correlation coefficient (MCC) values of 90.5{\%}, 91.7{\%}, and 0.82, respectively. Fertility-GRU outperforms the state-of-the-art predictor on the same data set. This study provides a method that enables more proteins to be discovered, especially proteins associated with fertility. Moreover, these results could promote the use of recurrent networks and gated recurrent units in proteome research. The source code and data set are freely accessible via https://github.com/khanhlee/fertility-gru.",
keywords = "deep learning, embryogenesis, infertility, oogenesis process, position-specific scoring matrix, protein function prediction, recurrent neural network, reproductive physiology, sperm metabolism, spermatogenesis",
author = "Le, {Nguyen Quoc Khanh}",
year = "2019",
month = "1",
day = "1",
doi = "10.1021/acs.jproteome.9b00411",
language = "English",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",

}

TY - JOUR

T1 - Fertility-GRU

T2 - Identifying Fertility-Related Proteins by Incorporating Deep-Gated Recurrent Units and Original Position-Specific Scoring Matrix Profiles

AU - Le, Nguyen Quoc Khanh

PY - 2019/1/1

Y1 - 2019/1/1

AB - Protein function prediction is a well-known problem in proteome research that has attracted the attention of numerous researchers. However, implementing deep neural networks, which can improve protein function prediction, still poses a considerable challenge. This study proposes a deep learning approach, named Fertility-GRU, that incorporates gated recurrent units and position-specific scoring matrix profiles to identify fertility-related proteins, whose biological function is highly crucial. Fertility-related proteins have also been shown to be important in many biological entities (e.g., bone marrow and peripheral blood, postnatal mammalian ovary) and parameters (e.g., daily sperm production). Our model achieves a cross-validation accuracy of 85.8% and an independent test accuracy of 91.1%. We also address overfitting by adding dropout layers to the deep learning model. The independent testing results showed sensitivity, specificity, and Matthews correlation coefficient (MCC) values of 90.5%, 91.7%, and 0.82, respectively. Fertility-GRU outperforms the state-of-the-art predictor on the same data set. This study provides a method that enables more proteins to be discovered, especially proteins associated with fertility. Moreover, these results could promote the use of recurrent networks and gated recurrent units in proteome research. The source code and data set are freely accessible via https://github.com/khanhlee/fertility-gru.

KW - deep learning

KW - embryogenesis

KW - infertility

KW - oogenesis process

KW - position-specific scoring matrix

KW - protein function prediction

KW - recurrent neural network

KW - reproductive physiology

KW - sperm metabolism

KW - spermatogenesis

UR - http://www.scopus.com/inward/record.url?scp=85071174376&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071174376&partnerID=8YFLogxK

U2 - 10.1021/acs.jproteome.9b00411

DO - 10.1021/acs.jproteome.9b00411

M3 - Article

C2 - 31362508

AN - SCOPUS:85071174376

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

ER -