Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries

Ching Heng Lin, Nai Yuan Wu, Wei Shao Lai, Der Ming Liou

Research output: Contribution to journal › Article

4 Citations (Scopus)


Background and objective: Electronic medical records with encoded entries should enhance the semantic interoperability of document exchange. However, it remains a challenge to encode the narrative concept and to transform the coded concepts into a standard entry-level document. This study aimed to use a novel approach for the generation of entry-level interoperable clinical documents.

Methods: Using HL7 clinical document architecture (CDA) as the example, we developed three pipelines to generate entry-level CDA documents. The first approach was a semi-automatic annotation pipeline (SAAP), the second was a natural language processing (NLP) pipeline, and the third merged the above two pipelines. We randomly selected 50 test documents from the i2b2 corpora to evaluate the performance of the three pipelines.

Results: The 50 randomly selected test documents contained 9365 words, including 588 Observation terms and 123 Procedure terms. For the Observation terms, the merged pipeline had a significantly higher F-measure than the NLP pipeline (0.89 vs 0.80, p < 0.0001), but a similar F-measure to that of the SAAP (0.89 vs 0.87). For the Procedure terms, the F-measure was not significantly different among the three pipelines.

Conclusions: The combination of a semi-automatic annotation approach and the NLP application seems to be a solution for generating entry-level interoperable clinical documents.
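For readers unfamiliar with the metric used in the Results, the following is a minimal sketch of how an F-measure can be computed from term-level counts. It assumes the balanced form (F1, the harmonic mean of precision and recall); the abstract does not state which variant the authors used. The function name `f_measure` and the counts in the usage line are hypothetical and are not taken from the study.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1), assumed here; the abstract does not name the variant."""
    if true_positives == 0:
        # No correct terms: precision and recall are both 0 (or undefined), so report 0.
        return 0.0
    # Precision: share of extracted terms that are correct.
    precision = true_positives / (true_positives + false_positives)
    # Recall: share of reference terms that were extracted.
    recall = true_positives / (true_positives + false_negatives)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, not data from the study.
example_score = f_measure(true_positives=90, false_positives=10, false_negatives=20)
print(f"F-measure: {example_score:.2f}")
```

Under this assumption, a pipeline is scored separately per term type (for example, Observation and Procedure terms) by counting its correct, spurious, and missed terms against the reference annotations.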

Original language: English
Pages (from-to): 132-142
Number of pages: 11
Journal: Journal of the American Medical Informatics Association
Issue number: 1
Publication status: Published - Jan 1 2015
Externally published: Yes



Keywords

  • Auto-complete technique
  • CDA entry level
  • Natural language processing

ASJC Scopus subject areas

  • Health Informatics
