LPTK: A linguistic pattern-aware dependency tree kernel approach for the BioCreative VI CHEMPROT task

Neha Warikoo, Yung Chun Chang, Wen Lian Hsu

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Identifying the interactions between chemical compounds and genes in the biomedical literature is a frequently discussed text-mining topic in the life sciences. In this paper, we describe the Linguistic Pattern-Aware Dependency Tree Kernel (LPTK), a linguistic interaction pattern learning method developed for the BioCreative VI CHEMPROT task to capture chemical-protein interaction (CPI) patterns in the biomedical literature. We also introduce a framework that integrates these linguistic patterns with a smooth partial tree kernel to extract CPIs. This new feature representation models aspects of linguistic probability in a geometric representation. It not only optimizes the sufficiency of the feature dimension for classification but also defines features as interpretable contexts rather than long vectors of numbers. To test the robustness and efficiency of our system in identifying different kinds of biological interactions, we evaluated our framework on three separate data sets: the CHEMPROT corpus, the Chemical-Disease Relation corpus and the Protein-Protein Interaction corpus. The experimental results demonstrate that our method is effective and outperforms several compared systems on each data set.

Original language: English
Journal: Database
Volume: 2018
Issue number: 2018
DOI: 10.1093/database/bay108
Publication status: Published - Jan 1 2018

ASJC Scopus subject areas

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)

Cite this

LPTK: A linguistic pattern-aware dependency tree kernel approach for the BioCreative VI CHEMPROT task. / Warikoo, Neha; Chang, Yung Chun; Hsu, Wen Lian.

In: Database, Vol. 2018, No. 2018, 01.01.2018.

@article{8444d7b02d774940815b4f946d79a9e5,
title = "{LPTK}: A linguistic pattern-aware dependency tree kernel approach for the {BioCreative} {VI} {CHEMPROT} task",
abstract = "Identifying the interactions between chemical compounds and genes from biomedical literatures is one of the frequently discussed topics of text mining in the life science field. In this paper, we describe Linguistic Pattern-Aware Dependency Tree Kernel, a linguistic interaction pattern learning method developed for CHEMPROT task-BioCreative VI, to capture chemical-protein interaction (CPI) patterns within biomedical literatures. We also introduce a framework to integrate these linguistic patterns with smooth partial tree kernel to extract the CPIs. This new method of feature representation models aspects of linguistic probability in geometric representation, which not only optimizes the sufficiency of feature dimension for classification, but also defines features as interpretable contexts rather than long vectors of numbers. In order to test the robustness and efficiency of our system in identifying different kinds of biological interactions, we evaluated our framework on three separate data sets, i.e. CHEMPROT corpus, Chemical-Disease Relation corpus and Protein-Protein Interaction corpus. Corresponding experiment results demonstrate that our method is effective and outperforms several compared systems for each data set.",
author = "Warikoo, Neha and Chang, {Yung Chun} and Hsu, {Wen Lian}",
year = "2018",
month = "1",
day = "1",
doi = "10.1093/database/bay108",
language = "English",
volume = "2018",
journal = "Database : the journal of biological databases and curation",
issn = "1758-0463",
publisher = "Oxford University Press",
number = "2018"
}

TY  - JOUR
T1  - LPTK: A linguistic pattern-aware dependency tree kernel approach for the BioCreative VI CHEMPROT task
AU  - Warikoo, Neha
AU  - Chang, Yung Chun
AU  - Hsu, Wen Lian
PY  - 2018
Y1  - 2018/01/01/
AB  - Identifying the interactions between chemical compounds and genes from biomedical literatures is one of the frequently discussed topics of text mining in the life science field. In this paper, we describe Linguistic Pattern-Aware Dependency Tree Kernel, a linguistic interaction pattern learning method developed for CHEMPROT task-BioCreative VI, to capture chemical-protein interaction (CPI) patterns within biomedical literatures. We also introduce a framework to integrate these linguistic patterns with smooth partial tree kernel to extract the CPIs. This new method of feature representation models aspects of linguistic probability in geometric representation, which not only optimizes the sufficiency of feature dimension for classification, but also defines features as interpretable contexts rather than long vectors of numbers. In order to test the robustness and efficiency of our system in identifying different kinds of biological interactions, we evaluated our framework on three separate data sets, i.e. CHEMPROT corpus, Chemical-Disease Relation corpus and Protein-Protein Interaction corpus. Corresponding experiment results demonstrate that our method is effective and outperforms several compared systems for each data set.
UR  - http://www.scopus.com/inward/record.url?scp=85055077690&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85055077690&partnerID=8YFLogxK
DO  - 10.1093/database/bay108
M3  - Article
VL  - 2018
JO  - Database (Oxford)
JF  - Database : the journal of biological databases and curation
SN  - 1758-0463
IS  - 2018
ER  - 