Section heading recognition in electronic health records using conditional random fields

Chih Wei Chen, Nai Wen Chang, Yung Chun Chang, Hong Jie Dai

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Electronic health records (EHRs) contain a wealth of information, such as discharge diagnoses, laboratory results, and pharmacy orders, which can be used to support clinical decision support systems and to enable clinical and translational research. Unfortunately, this information is represented in highly heterogeneous semi-structured or unstructured formats with author- and domain-specific idiosyncrasies, acronyms, and abbreviations. To take full advantage of health data, researchers have applied text-mining techniques to recognize named entities (NEs) mentioned in EHRs. However, the clinical meaning of the data cannot be determined from the NE level alone. For instance, a disease mentioned in the past medical history section has a different clinical significance than the same disease mentioned in the family medical history section. To obtain high-quality information and improve the understanding of clinical records, this work developed a machine learning-based section heading recognition system and evaluated its performance on a manually annotated corpus. The experimental results showed that the machine learning-based system achieved a satisfactory F-score of 0.939, outperforming a dictionary-based system by 0.321.
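
The comparison in the abstract can be illustrated with a minimal sketch of a dictionary-based heading matcher scored by F-score. This is not the authors' system: the heading dictionary, the sample record, the gold annotations, and the function names `find_headings` and `f_score` are all hypothetical, chosen only to show how an abbreviated heading (here "FHx:") escapes exact dictionary lookup and lowers recall. If the reported gap of 0.321 is an absolute difference, the dictionary-based baseline would sit at roughly 0.618.

```python
import re

# Hypothetical heading dictionary; the abstract does not give the real lexicon.
HEADING_DICTIONARY = {
    "past medical history",
    "family medical history",
    "discharge diagnosis",
    "laboratory results",
}


def find_headings(lines):
    """Return the indices of lines whose normalized text is in the dictionary."""
    found = set()
    for i, line in enumerate(lines):
        # Lowercase and drop everything except letters and spaces.
        key = re.sub(r"[^a-z ]", "", line.lower()).strip()
        if key in HEADING_DICTIONARY:
            found.add(i)
    return found


def f_score(predicted, gold):
    """Balanced F-score (harmonic mean of precision and recall) over index sets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical record: line 2 is an abbreviated family-history heading.
record = [
    "Past Medical History:",
    "Hypertension for ten years.",
    "FHx:",
    "Mother had diabetes.",
    "Discharge Diagnosis:",
    "Pneumonia.",
]
gold = {0, 2, 4}  # hypothetical manual annotation of heading lines

predicted = find_headings(record)
score = f_score(predicted, gold)
print(sorted(predicted), round(score, 3))
```

In this toy case the matcher finds lines 0 and 4 but misses the abbreviated heading, so precision is perfect while recall drops. A sequence labeler such as a conditional random field can instead draw on contextual and orthographic cues, which is the kind of gain the abstract reports.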

Original language: English
Pages (from-to): 47-55
Number of pages: 9
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8916
Publication status: Published - 2014

Keywords

  • Electronic health record
  • Information extraction
  • Natural language processing
  • Section recognition

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

@article{f875ea06658043bc99160028c7a32776,
title = "Section heading recognition in electronic health records using conditional random fields",
keywords = "Electronic health record, Information extraction, Natural language processing, Section recognition",
author = "Chen, {Chih Wei} and Chang, {Nai Wen} and Chang, {Yung Chun} and Dai, {Hong Jie}",
year = "2014",
language = "English",
volume = "8916",
pages = "47--55",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR

T1 - Section heading recognition in electronic health records using conditional random fields

AU - Chen, Chih Wei

AU - Chang, Nai Wen

AU - Chang, Yung Chun

AU - Dai, Hong Jie

PY - 2014

Y1 - 2014

KW - Electronic health record

KW - Information extraction

KW - Natural language processing

KW - Section recognition

UR - http://www.scopus.com/inward/record.url?scp=84911940910&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84911940910&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84911940910

VL - 8916

SP - 47

EP - 55

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -