Incorporating a transfer learning technique with amino acid embeddings to efficiently predict N-linked glycosylation sites in ion channels

Trinh Trung Duong Nguyen, Nguyen Quoc Khanh Le, The Anh Tran, Dinh Minh Pham, Yu Yen Ou

Research output: Contribution to journal › Article › peer-review

Abstract

Glycosylation is a dynamic enzymatic process that attaches glycans to proteins or other organic molecules such as lipoproteins. Research has shown that this process plays a fundamental role in modulating the functions of ion channel proteins. This study used a computational method to predict N-linked glycosylation sites, the most common type of glycosylation site, in ion channel proteins. From segments of ion channel proteins centered on N-linked glycosylation sites, the amino acid embedding vectors of the residues were concatenated to create features for prediction. We experimented with two different models for converting amino acids to their corresponding embeddings: one was trained on ion channel sequences and the other on a large dataset of more than one million protein sequences. The latter model stemmed from the idea of transfer learning and emerged as the more efficient feature extractor. Our best model was obtained from this transfer learning approach combined with hyperparameter tuning by random search on 5-fold cross-validation data. It achieved an accuracy, specificity, sensitivity, and Matthews correlation coefficient of 93.4%, 92.8%, 98.6%, and 0.726, respectively. The corresponding scores on an independent test set were 92.9%, 92.2%, 99%, and 0.717. These results surpass those obtained with position-specific scoring matrix features, which are predominantly employed in post-translational modification site prediction. Furthermore, compared to N-GlyDE, GlycoEP, and SPRINT-Gly, the most recent N-linked glycosylation site predictors, our model yields higher scores on all four metrics, further demonstrating the efficiency of our approach.
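
As a rough illustration of the feature construction and the four reported metrics, the following Python sketch builds a concatenated embedding vector for a window centered on a candidate asparagine (N) and computes accuracy, sensitivity, specificity, and Matthews correlation coefficient from confusion-matrix counts. It is not the authors' implementation: the embedding table is a random stand-in for a learned model, and the padding symbol `X`, the window half-width `flank`, the embedding dimension, and all function names are assumptions made for this example.

```python
import math

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for windows that extend past the sequence ends


def make_embedding_table(dim=8, seed=0):
    """Random stand-in for a learned embedding table (one vector per residue)."""
    rng = np.random.default_rng(seed)
    table = {aa: rng.normal(size=dim) for aa in AMINO_ACIDS}
    table[PAD] = np.zeros(dim)  # padding positions map to the zero vector
    return table


def candidate_sites(sequence):
    """Indices of asparagine (N) residues, the candidate N-linked sites."""
    return [i for i, aa in enumerate(sequence) if aa == "N"]


def window_around(sequence, center, flank):
    """Segment of length 2*flank+1 centered on index `center`, padded with PAD."""
    padded = PAD * flank + sequence + PAD * flank
    return padded[center : center + 2 * flank + 1]


def site_features(sequence, center, flank, table):
    """Concatenate the embedding vectors of every residue in the window."""
    segment = window_around(sequence, center, flank)
    # Unknown residue symbols fall back to the padding vector.
    return np.concatenate([table.get(aa, table[PAD]) for aa in segment])


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # MCC is reported as 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    table = make_embedding_table()
    seq = "MKNGTAC"  # toy sequence, not taken from the study
    for site in candidate_sites(seq):
        print(site, window_around(seq, site, 3), site_features(seq, site, 3, table).shape)
    print(binary_metrics(tp=8, tn=9, fp=1, fn=2))
```

With a window of 7 residues and 8-dimensional embeddings, each candidate site yields a 56-dimensional feature vector that could be passed to any standard classifier. In a transfer learning setting, the random table would be replaced by embeddings learned on a large, general protein corpus.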

Original language: English
Article number: 104212
Journal: Computers in Biology and Medicine
Volume: 130
Publication status: Published - Mar 2021

Keywords

  • Amino acid embeddings
  • Ion channel
  • N-linked glycosylation
  • Post-translational modification site prediction
  • Transfer learning

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics
