Developing an NLP and IR-based algorithm for analyzing gene-disease relationships

Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, Chien-Yeh Hsu

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Objectives: High-throughput techniques such as cDNA microarrays, oligonucleotide arrays, and serial analysis of gene expression (SAGE) have been developed and used to automatically screen huge amounts of gene expression data. However, researchers usually spend a great deal of time and money discovering gene-disease relationships with these techniques. We implemented a prototype algorithm that provides predicted results to biological researchers before they proceed with experiments, helping them discover gene-disease relationships more efficiently. Methods: With the rapid development of computer technology, many information retrieval techniques have been applied to analyze the huge digital biomedical databases available worldwide. We therefore set out to apply information retrieval (IR) techniques to extract useful information on the relationships between specific diseases and genes from MEDLINE articles. We also applied natural language processing (NLP) methods to perform semantic analysis of the relevant articles to discover the relationships between genes and diseases. Results: We extracted gene symbols from our literature collection according to disease MeSH classifications. We also built an IR-based retrieval system, the "Biomedical Literature Retrieval System (BLRS)", and applied an N-gram model to extract relationship features that reveal the relationships between genes and diseases. Finally, we built a relationship network for a specific disease to represent its gene-disease relationships. Conclusions: A relationship feature is a functional word that reveals the relationship between a single gene and a disease. Because it incorporates many modern IR techniques, we found BLRS to be a powerful information discovery tool for literature searching.
A relationship network containing gene symbols, relationship features, and disease MeSH terms can provide an integrated view for discovering gene-disease relationships.
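The abstract does not spell out how the N-gram model turns sentences into relationship features or how the network is assembled. The following is a minimal sketch of one possible reading, under stated assumptions, and is not the authors' BLRS implementation. It assumes each record is a (gene symbol, disease MeSH term, sentence) triple already returned by retrieval, treats the n-grams of non-stop words lying between the two mentions as candidate relationship features, and maps each gene-disease pair to counts of those features. The stop-word list, the `min_count` threshold, the helper names, and the placeholder gene symbols `GENE1`/`GENE2` are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical stop-word list; the paper's actual filtering is not described.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "with"}

# Made-up (gene symbol, disease MeSH term, sentence) records standing in for
# retrieved MEDLINE sentences; GENE1 and GENE2 are placeholders, not real genes.
RECORDS = [
    ("GENE1", "Neoplasms", "GENE1 is overexpressed in neoplasms"),
    ("GENE1", "Neoplasms", "GENE1 is overexpressed in neoplasms of the breast"),
    ("GENE2", "Neoplasms", "Loss of GENE2 is associated with neoplasms"),
]


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_features(sentence: str, gene: str, disease: str, n: int = 1) -> List[str]:
    """N-grams of the non-stop words between a gene mention and a disease mention.

    Uses the first occurrence of each mention (case-insensitive) and
    returns [] if either mention is missing.
    """
    text = sentence.lower()
    g, d = gene.lower(), disease.lower()
    gi, di = text.find(g), text.find(d)
    if gi < 0 or di < 0:
        return []
    # Take the span between the two mentions, whichever comes first.
    if gi < di:
        between = text[gi + len(g):di]
    else:
        between = text[di + len(d):gi]
    tokens = [t for t in between.split() if t not in STOP_WORDS]
    return [" ".join(gram) for gram in ngrams(tokens, n)]


def build_network(
    records: List[Tuple[str, str, str]], n: int = 1, min_count: int = 1
) -> Dict[Tuple[str, str], Counter]:
    """Map each (gene, disease) pair to counts of its relationship-feature n-grams."""
    network: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for gene, disease, sentence in records:
        network[(gene, disease)].update(candidate_features(sentence, gene, disease, n))
    # Drop features seen fewer than min_count times (hypothetical threshold).
    return {
        pair: Counter({f: c for f, c in feats.items() if c >= min_count})
        for pair, feats in network.items()
    }


if __name__ == "__main__":
    # Print one edge per (gene, feature, disease) triple in the network.
    for (gene, disease), feats in build_network(RECORDS).items():
        for feat, count in feats.most_common():
            print(f"{gene} --[{feat} x{count}]--> {disease}")
```

Run on the toy records, this prints `GENE1 --[overexpressed x2]--> Neoplasms` and `GENE2 --[associated x1]--> Neoplasms`. Setting `n=2` collects word bigrams instead of single functional words; a realistic system would also need gene-symbol normalization, a proper tokenizer, and handling of multiple mentions per sentence, which this sketch omits.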

Original language: English
Pages (from-to): 321-329
Number of pages: 9
Journal: Methods of Information in Medicine
Volume: 45
Issue number: 3
Publication status: Published - 2006

Fingerprint

Natural Language Processing
Information Storage and Retrieval
Genes
Oligonucleotide Array Sequence Analysis
Research Personnel
Gene Expression
Semantics
MEDLINE

Keywords

  • Disease
  • Gene
  • Information retrieval
  • MeSH
  • Natural language processing
  • Relationship

ASJC Scopus subject areas

  • Health Informatics
  • Health Information Management
  • Nursing(all)

Cite this

Developing an NLP and IR-based algorithm for analyzing gene-disease relationships. / Yen, Y. T.; Chen, B.; Chiu, H. W.; Lee, Y. C.; Li, Y. C.; Hsu, Chien-Yeh.

In: Methods of Information in Medicine, Vol. 45, No. 3, 2006, p. 321-329.

Research output: Contribution to journal › Article

@article{5aaa686e90874b8596dc0b55439ad162,
title = "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships",
abstract = "Objectives: High-throughput techniques such as cDNA microarrays, oligonucleotide arrays, and serial analysis of gene expression (SAGE) have been developed and used to automatically screen huge amounts of gene expression data. However, researchers usually spend a great deal of time and money discovering gene-disease relationships with these techniques. We implemented a prototype algorithm that provides predicted results to biological researchers before they proceed with experiments, helping them discover gene-disease relationships more efficiently. Methods: With the rapid development of computer technology, many information retrieval techniques have been applied to analyze the huge digital biomedical databases available worldwide. We therefore set out to apply information retrieval (IR) techniques to extract useful information on the relationships between specific diseases and genes from MEDLINE articles. We also applied natural language processing (NLP) methods to perform semantic analysis of the relevant articles to discover the relationships between genes and diseases. Results: We extracted gene symbols from our literature collection according to disease MeSH classifications. We also built an IR-based retrieval system, the {"}Biomedical Literature Retrieval System (BLRS){"}, and applied an N-gram model to extract relationship features that reveal the relationships between genes and diseases. Finally, we built a relationship network for a specific disease to represent its gene-disease relationships. Conclusions: A relationship feature is a functional word that reveals the relationship between a single gene and a disease. Because it incorporates many modern IR techniques, we found BLRS to be a powerful information discovery tool for literature searching.
A relationship network containing gene symbols, relationship features, and disease MeSH terms can provide an integrated view for discovering gene-disease relationships.",
keywords = "Disease, Gene, Information retrieval, MeSH, Natural language processing, Relationship",
author = "Yen, {Y. T.} and Chen, B. and Chiu, {H. W.} and Lee, {Y. C.} and Li, {Y. C.} and Hsu, Chien-Yeh",
year = "2006",
language = "English",
volume = "45",
pages = "321--329",
journal = "Methods of Information in Medicine",
issn = "0026-1270",
publisher = "Schattauer GmbH",
number = "3",

}

TY - JOUR

T1 - Developing an NLP and IR-based algorithm for analyzing gene-disease relationships

AU - Yen, Y. T.

AU - Chen, B.

AU - Chiu, H. W.

AU - Lee, Y. C.

AU - Li, Y. C.

AU - Hsu, Chien-Yeh

PY - 2006

Y1 - 2006

N2 - Objectives: High-throughput techniques such as cDNA microarrays, oligonucleotide arrays, and serial analysis of gene expression (SAGE) have been developed and used to automatically screen huge amounts of gene expression data. However, researchers usually spend a great deal of time and money discovering gene-disease relationships with these techniques. We implemented a prototype algorithm that provides predicted results to biological researchers before they proceed with experiments, helping them discover gene-disease relationships more efficiently. Methods: With the rapid development of computer technology, many information retrieval techniques have been applied to analyze the huge digital biomedical databases available worldwide. We therefore set out to apply information retrieval (IR) techniques to extract useful information on the relationships between specific diseases and genes from MEDLINE articles. We also applied natural language processing (NLP) methods to perform semantic analysis of the relevant articles to discover the relationships between genes and diseases. Results: We extracted gene symbols from our literature collection according to disease MeSH classifications. We also built an IR-based retrieval system, the "Biomedical Literature Retrieval System (BLRS)", and applied an N-gram model to extract relationship features that reveal the relationships between genes and diseases. Finally, we built a relationship network for a specific disease to represent its gene-disease relationships. Conclusions: A relationship feature is a functional word that reveals the relationship between a single gene and a disease. Because it incorporates many modern IR techniques, we found BLRS to be a powerful information discovery tool for literature searching.
A relationship network containing gene symbols, relationship features, and disease MeSH terms can provide an integrated view for discovering gene-disease relationships.

AB - Objectives: High-throughput techniques such as cDNA microarrays, oligonucleotide arrays, and serial analysis of gene expression (SAGE) have been developed and used to automatically screen huge amounts of gene expression data. However, researchers usually spend a great deal of time and money discovering gene-disease relationships with these techniques. We implemented a prototype algorithm that provides predicted results to biological researchers before they proceed with experiments, helping them discover gene-disease relationships more efficiently. Methods: With the rapid development of computer technology, many information retrieval techniques have been applied to analyze the huge digital biomedical databases available worldwide. We therefore set out to apply information retrieval (IR) techniques to extract useful information on the relationships between specific diseases and genes from MEDLINE articles. We also applied natural language processing (NLP) methods to perform semantic analysis of the relevant articles to discover the relationships between genes and diseases. Results: We extracted gene symbols from our literature collection according to disease MeSH classifications. We also built an IR-based retrieval system, the "Biomedical Literature Retrieval System (BLRS)", and applied an N-gram model to extract relationship features that reveal the relationships between genes and diseases. Finally, we built a relationship network for a specific disease to represent its gene-disease relationships. Conclusions: A relationship feature is a functional word that reveals the relationship between a single gene and a disease. Because it incorporates many modern IR techniques, we found BLRS to be a powerful information discovery tool for literature searching.
A relationship network containing gene symbols, relationship features, and disease MeSH terms can provide an integrated view for discovering gene-disease relationships.

KW - Disease

KW - Gene

KW - Information retrieval

KW - MeSH

KW - Natural language processing

KW - Relationship

UR - http://www.scopus.com/inward/record.url?scp=33745534882&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745534882&partnerID=8YFLogxK

M3 - Article

C2 - 16685344

AN - SCOPUS:33745534882

VL - 45

SP - 321

EP - 329

JO - Methods of Information in Medicine

JF - Methods of Information in Medicine

SN - 0026-1270

IS - 3

ER -