Protein function prediction is one of the well-known problems in proteome research, attracting the attention of numerous researchers. However, the implementation of deep neural networks, which helps to increase the protein function prediction, still poses a big challenge. This study proposes a deep learning approach namely Fertility-GRU that incorporates gated recurrent units and position-specific scoring matrix profiles to predict the function of fertility-related protein, which is a highly crucial biological function. Fertility-related proteins also have been proven to be important in many biological entities (i.e., bone marrow and peripheral blood, postnatal mammalian ovary) and parameters (i.e., daily sperm production). As a result, our model can achieve a cross-validation accuracy of 85.8% and an independent accuracy of 91.1%. We also solve the problem of overfitting in the data set by adding dropout layers in the deep learning model. The independent testing results showed sensitivity, specificity, and Matthews correlation coefficient (MCC) values of 90.5%, 91.7%, and 0.82, respectively. Fertility-GRU demonstrates superiority in performance against the state-of-the-art predictor on the same data set. In our proposed study, we provided a method that enables more proteins to be discovered, especially proteins associated with fertility. Moreover, our achievement could promote the use of recurrent networks and gated recurrent units in proteome research. The source code and data set are freely accessible via https://github.com/khanhlee/fertility-gru.
ASJC Scopus subject areas