Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields

Hong Jie Dai, Shabbir Syed-Abdul, Chih Wei Chen, Chieh Chen Wu

Research output: Contribution to journalArticle

15 Citations (Scopus)

Abstract

Electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. In contrast to formulating the section heading recognition task as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The results of the experiments showed that the proposed method achieved a satisfactory F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presented an integrated solution without the need to develop additional heuristics rules for isolating the headings from the surrounding section contents.

Original languageEnglish
Article number873012
JournalBioMed Research International
Volume2015
DOIs
Publication statusPublished - 2015

Fingerprint

Electronic Health Records
Health
Glossaries
Information Storage and Retrieval
Population
Recognition (Psychology)
Experiments
Heuristics

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)

Cite this

Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields. / Dai, Hong Jie; Syed-Abdul, Shabbir; Chen, Chih Wei; Wu, Chieh Chen.

In: BioMed Research International, Vol. 2015, 873012, 2015.

Research output: Contribution to journalArticle

@article{07654e954c6c458cb5513dad1e824cf9,
title = "Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields",
abstract = "Electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. In contrast to formulating the section heading recognition task as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The results of the experiments showed that the proposed method achieved a satisfactory F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presented an integrated solution without the need to develop additional heuristics rules for isolating the headings from the surrounding section contents.",
author = "Dai, {Hong Jie} and Shabbir Syed-Abdul and Chen, {Chih Wei} and Wu, {Chieh Chen}",
year = "2015",
doi = "10.1155/2015/873012",
language = "English",
volume = "2015",
journal = "BioMed Research International",
issn = "2314-6133",
publisher = "Hindawi Publishing Corporation",

}

TY - JOUR

T1 - Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields

AU - Dai, Hong Jie

AU - Syed-Abdul, Shabbir

AU - Chen, Chih Wei

AU - Wu, Chieh Chen

PY - 2015

Y1 - 2015

N2 - Electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. In contrast to formulating the section heading recognition task as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The results of the experiments showed that the proposed method achieved a satisfactory F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presented an integrated solution without the need to develop additional heuristics rules for isolating the headings from the surrounding section contents.

AB - Electronic health record (EHR) is a digital data format that collects electronic health information about an individual patient or population. To enhance the meaningful use of EHRs, information extraction techniques have been developed to recognize clinical concepts mentioned in EHRs. Nevertheless, the clinical judgment of an EHR cannot be known solely based on the recognized concepts without considering its contextual information. In order to improve the readability and accessibility of EHRs, this work developed a section heading recognition system for clinical documents. In contrast to formulating the section heading recognition task as a sentence classification problem, this work proposed a token-based formulation with the conditional random field (CRF) model. A standard section heading recognition corpus was compiled by annotators with clinical experience to evaluate the performance and compare it with sentence classification and dictionary-based approaches. The results of the experiments showed that the proposed method achieved a satisfactory F-score of 0.942, which outperformed the sentence-based approach and the best dictionary-based system by 0.087 and 0.096, respectively. One important advantage of our formulation over the sentence-based approach is that it presented an integrated solution without the need to develop additional heuristics rules for isolating the headings from the surrounding section contents.

UR - http://www.scopus.com/inward/record.url?scp=84941123554&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941123554&partnerID=8YFLogxK

U2 - 10.1155/2015/873012

DO - 10.1155/2015/873012

M3 - Article

C2 - 26380302

AN - SCOPUS:84941123554

VL - 2015

JO - BioMed Research International

JF - BioMed Research International

SN - 2314-6133

M1 - 873012

ER -