DeepEfflux: A 2D convolutional neural network model for identifying families of efflux proteins in transporters

Semmy Wellem Taju, Trinh Trung Duong Nguyen, Nguyen Quoc Khanh Le, Rosdyana Mangir Irawan Kusuma, Yu Yen Ou

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Motivation: Efflux proteins play a key role in pumping xenobiotics out of cells. Predicting the efflux protein families involved in the transport of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters, but they do not consider the pump specificity of efflux protein families. Efflux proteins protect cells by extruding foreign chemicals. Moreover, analysis of significant motifs shows that almost all efflux protein families share the same structure: motif sequences with the same number of residues have a high degree of residue similarity, which complicates classification. Consequently, it is challenging but vital to recognize the structures and determine the energy dependencies of efflux protein families. To identify efflux protein families efficiently while taking pump specificity into account, we developed a 2D convolutional neural network (2D CNN) model called DeepEfflux. DeepEfflux tries to capture the sequence motifs around hidden target residues and use them as hidden features of the families. The 2D CNN model takes a position-specific scoring matrix (PSSM) as input. Three datasets, one for each efflux protein family, were fed into DeepEfflux, and 5-fold cross-validation was used to evaluate training performance.

Results: In model evaluation, DeepEfflux outperforms traditional machine learning algorithms. On the independent test sets it achieves accuracies of 96.02%, 94.89% and 90.34% for classes A, B and C, respectively, indicating that the model performs well and can serve as a reliable tool for identifying families of efflux proteins in transporters.
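The abstract names three generic ingredients: a 2D CNN that reads a PSSM, 5-fold cross-validation, and accuracy as the reported metric. The Python sketch below illustrates only those generic ingredients, assuming a PSSM stored as an L×20 matrix (one row per residue, one column per amino acid): a "valid" 2D convolution, ReLU, global max pooling and a sigmoid score, plus a 5-fold index splitter and an accuracy function. It is not the DeepEfflux architecture; the single-layer design, the 3×3 kernel, the weights and the helper names (`conv2d_valid`, `family_probability`, `k_fold_indices`, `accuracy`) are hypothetical, and a real model would learn its kernels through training.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple

# Assumed layout: rows = residue positions, columns = 20 amino-acid scores.
Matrix = List[List[float]]


def conv2d_valid(pssm: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Slide one 2D kernel over a PSSM with no padding.

    Like most CNN libraries, this computes cross-correlation (no kernel flip).
    """
    rows, cols = len(pssm), len(pssm[0])
    kh, kw = len(kernel), len(kernel[0])
    out: Matrix = []
    for i in range(rows - kh + 1):
        row = []
        for j in range(cols - kw + 1):
            s = bias
            for a in range(kh):
                for b in range(kw):
                    s += pssm[i + a][j + b] * kernel[a][b]
            row.append(s)
        out.append(row)
    return out


def relu(m: Matrix) -> Matrix:
    """Zero out negative responses in a feature map."""
    return [[max(0.0, v) for v in row] for row in m]


def global_max_pool(m: Matrix) -> float:
    """Keep the strongest response of a feature map, wherever the motif occurs."""
    return max(v for row in m for v in row)


def family_probability(pssm: Matrix, kernels: Sequence[Matrix],
                       weights: Sequence[float], bias: float) -> float:
    """Hypothetical one-layer model: one pooled feature per kernel, then a sigmoid."""
    features = [global_max_pool(relu(conv2d_valid(pssm, k))) for k in kernels]
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def k_fold_indices(n: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    toy_pssm = [[0.5] * 20 for _ in range(6)]  # 6 residues x 20 amino acids, made-up values
    kernel = [[1.0] * 3 for _ in range(3)]     # one hypothetical 3x3 filter
    print(family_probability(toy_pssm, [kernel], [0.1], -1.0))
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))  # 8 2 for every fold
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Global max pooling is used here because it makes each feature independent of where a motif sits in the sequence, which matches the abstract's idea of capturing motifs as features; the published model may pool differently.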

Original language: English
Pages (from-to): 3111-3117
Number of pages: 7
Journal: Bioinformatics
Volume: 34
Issue number: 18
DOIs: 10.1093/bioinformatics/bty302
Publication status: Published - Jan 1 2018
Externally published: Yes

Fingerprint

Neural Networks (Computer)
Neural Network Model
Neural networks
Proteins
Protein
Pump
Pumps
Position-Specific Scoring Matrices
Family
Model Evaluation
Transport Processes
Extrusion
Cell
Xenobiotics
Structure-function
Energy
Scoring
Cross-validation
Learning algorithms
Learning systems

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

DeepEfflux: A 2D convolutional neural network model for identifying families of efflux proteins in transporters. / Wellem Taju, Semmy; Nguyen, Trinh Trung Duong; Le, Nguyen Quoc Khanh; Irawan Kusuma, Rosdyana Mangir; Ou, Yu Yen.

In: Bioinformatics, Vol. 34, No. 18, 01.01.2018, p. 3111-3117.


Wellem Taju, Semmy; Nguyen, Trinh Trung Duong; Le, Nguyen Quoc Khanh; Irawan Kusuma, Rosdyana Mangir; Ou, Yu Yen. / DeepEfflux: A 2D convolutional neural network model for identifying families of efflux proteins in transporters. In: Bioinformatics. 2018; Vol. 34, No. 18. pp. 3111-3117.

@article{afacf4afba6247b3adfe59e83be4d271,
title = "{DeepEfflux}: A {2D} convolutional neural network model for identifying families of efflux proteins in transporters",
abstract = "Motivation: Efflux protein plays a key role in pumping xenobiotics out of the cells. The prediction of efflux family proteins involved in transport process of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considerations of any pump specific of efflux protein families. In other words, efflux proteins protect cells from extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. The motif sequences consisting of the same amount of residues will have high degrees of residue similarity and thus will affect the classification process. Consequently, it is challenging but vital to recognize the structures and determine energy dependencies of efflux protein families. In order to efficiently identify efflux protein families with considering about pump specific, we developed a 2 D convolutional neural network (2 D CNN) model called DeepEfflux. DeepEfflux tried to capture the motifs of sequences around hidden target residues to use as hidden features of families. In addition, the 2 D CNN model uses a position-specific scoring matrix (PSSM) as an input. Three different datasets, each for one family of efflux protein, was fed into DeepEfflux, and then a 5-fold cross validation approach was used to evaluate the training performance. Results: The model evaluation results show that DeepEfflux outperforms traditional machine learning algorithms. Furthermore, the accuracy of 96.02{\%}, 94.89{\%} and 90.34{\%} for classes A, B and C, respectively, in the independent test results show that our model can perform well and can be used as a reliable tool for identifying families of efflux proteins in transporters.",
author = "{Wellem Taju}, Semmy and Nguyen, {Trinh Trung Duong} and Le, {Nguyen Quoc Khanh} and {Irawan Kusuma}, {Rosdyana Mangir} and Ou, {Yu Yen}",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/bty302",
language = "English",
volume = "34",
pages = "3111--3117",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "18",

}

TY - JOUR

T1 - DeepEfflux: A 2D convolutional neural network model for identifying families of efflux proteins in transporters

AU - Wellem Taju, Semmy

AU - Nguyen, Trinh Trung Duong

AU - Le, Nguyen Quoc Khanh

AU - Irawan Kusuma, Rosdyana Mangir

AU - Ou, Yu Yen

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Motivation: Efflux protein plays a key role in pumping xenobiotics out of the cells. The prediction of efflux family proteins involved in transport process of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considerations of any pump specific of efflux protein families. In other words, efflux proteins protect cells from extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. The motif sequences consisting of the same amount of residues will have high degrees of residue similarity and thus will affect the classification process. Consequently, it is challenging but vital to recognize the structures and determine energy dependencies of efflux protein families. In order to efficiently identify efflux protein families with considering about pump specific, we developed a 2 D convolutional neural network (2 D CNN) model called DeepEfflux. DeepEfflux tried to capture the motifs of sequences around hidden target residues to use as hidden features of families. In addition, the 2 D CNN model uses a position-specific scoring matrix (PSSM) as an input. Three different datasets, each for one family of efflux protein, was fed into DeepEfflux, and then a 5-fold cross validation approach was used to evaluate the training performance. Results: The model evaluation results show that DeepEfflux outperforms traditional machine learning algorithms. Furthermore, the accuracy of 96.02%, 94.89% and 90.34% for classes A, B and C, respectively, in the independent test results show that our model can perform well and can be used as a reliable tool for identifying families of efflux proteins in transporters.

AB - Motivation: Efflux protein plays a key role in pumping xenobiotics out of the cells. The prediction of efflux family proteins involved in transport process of compounds is crucial for understanding family structures, functions and energy dependencies. Many methods have been proposed to classify efflux pump transporters without considerations of any pump specific of efflux protein families. In other words, efflux proteins protect cells from extrusion of foreign chemicals. Moreover, almost all efflux protein families have the same structure based on the analysis of significant motifs. The motif sequences consisting of the same amount of residues will have high degrees of residue similarity and thus will affect the classification process. Consequently, it is challenging but vital to recognize the structures and determine energy dependencies of efflux protein families. In order to efficiently identify efflux protein families with considering about pump specific, we developed a 2 D convolutional neural network (2 D CNN) model called DeepEfflux. DeepEfflux tried to capture the motifs of sequences around hidden target residues to use as hidden features of families. In addition, the 2 D CNN model uses a position-specific scoring matrix (PSSM) as an input. Three different datasets, each for one family of efflux protein, was fed into DeepEfflux, and then a 5-fold cross validation approach was used to evaluate the training performance. Results: The model evaluation results show that DeepEfflux outperforms traditional machine learning algorithms. Furthermore, the accuracy of 96.02%, 94.89% and 90.34% for classes A, B and C, respectively, in the independent test results show that our model can perform well and can be used as a reliable tool for identifying families of efflux proteins in transporters.

UR - http://www.scopus.com/inward/record.url?scp=85061652064&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061652064&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty302

DO - 10.1093/bioinformatics/bty302

M3 - Article

C2 - 29668844

AN - SCOPUS:85061652064

VL - 34

SP - 3111

EP - 3117

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 18

ER -