Gradient Boosting over Linguistic-Pattern-Structured Trees for Learning Protein–Protein Interaction in the Biomedical Literature

Neha Warikoo, Yung Chun Chang, Shang Pin Ma

Research output: Contribution to journal › Article › peer-review

Abstract

Protein-based studies contribute significantly to gathering functional information about biological systems; therefore, protein–protein interaction (PPI) detection is one of the most researched tasks in the biomedical literature. To this end, many state-of-the-art systems using syntactic tree kernels (TK) and deep learning have been developed. However, these models are computationally complex and offer limited learning interpretability. In this paper, we introduce LpGBoost, a gradient-tree boosting model based on linguistic-pattern representations. It uses linguistic patterns to optimize and generate semantically relevant representation vectors for learning with gradient-tree boosting. The patterns are learned via unsupervised modeling by clustering invariant semantic features. These linguistic representations are semi-interpretable and carry rich semantic knowledge, and owing to their shallow representation, they are also computationally less expensive. Our experiments with six PPI corpora demonstrate that LpGBoost outperforms the state-of-the-art tree-kernel models, as well as the CNN-based interaction detection studies for the BioInfer and AIMed corpora.
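As a rough illustration of the idea summarized in the abstract (pattern-based representation vectors fed to gradient-tree boosting), the following is a minimal sketch and not the LpGBoost implementation. Everything in it is an assumption made for illustration: the cue-token sets in `PATTERNS` are hand-written stand-ins for patterns that the paper learns by clustering, the four sentences are toy data, and the booster is a plain log-loss gradient boosting over one-split trees (stumps) written with the standard library only.

```python
import math

# Hypothetical cue-token patterns; stand-ins for learned linguistic patterns.
PATTERNS = {
    "interaction_cue": {"binds", "interacts", "activates"},
    "negation_cue": {"not", "no"},
}

def featurize(sentence):
    """One 0/1 entry per pattern: 1 if any cue token occurs in the sentence."""
    tokens = set(sentence.lower().split())
    return [1 if cues & tokens else 0 for cues in PATTERNS.values()]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Choose the binary feature whose split fits the residuals best (squared error)."""
    best = None
    for j in range(len(X[0])):
        left = [r for x, r in zip(X, residuals) if x[j] == 0]
        right = [r for x, r in zip(X, residuals) if x[j] == 1]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, j, lmean, rmean)
    return best[1], best[2], best[3]

def fit_boosting(X, y, rounds=20, lr=0.5):
    """Each round fits a stump to the negative gradient of the log-loss."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, lmean, rmean = fit_stump(X, residuals)
        stumps.append((j, lr * lmean, lr * rmean))  # store shrunken leaf values
        scores = [s + lr * (rmean if x[j] else lmean) for s, x in zip(scores, X)]
    return stumps

def predict_proba(stumps, x):
    """Probability of the positive (interaction) class for one feature vector."""
    return sigmoid(sum(right if x[j] else left for j, left, right in stumps))

# Toy training data (illustrative only, not taken from any PPI corpus).
SENTENCES = [
    "PROT1 binds PROT2",
    "PROT1 interacts with PROT2",
    "PROT1 did not affect PROT2",
    "no link between PROT1 and PROT2",
]
LABELS = [1, 1, 0, 0]

X = [featurize(s) for s in SENTENCES]
model = fit_boosting(X, LABELS)
for s in SENTENCES:
    print(s, round(predict_proba(model, featurize(s)), 3))
```

Because each feature corresponds to a named pattern, the stump chosen in each round can be read off directly, which loosely mirrors the semi-interpretability claimed for shallow pattern representations.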
Original language: English
Article number: 10199
Journal: Applied Sciences (Switzerland)
Volume: 12
Issue number: 20
Publication status: Published - October 2022

ASJC Scopus subject areas

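  • Materials Science (all)
  • Instrumentation
  • Engineering (all)
  • Process Chemistry and Technology
  • Computer Science Applications
  • Fluid Flow and Transfer Processes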
