A frame-based approach for reference metadata extraction

Yu Lun Hsieh, Shih Hung Liu, Ting Hao Yang, Yu Hsuan Chen, Yung Chun Chang, Gladys Hsieh, Cheng Wei Shih, Chun Hung Lu, Wen Lian Hsu

Research output: Contribution to journal › Article

Abstract

In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performed equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA drastically reduces the average field error rate across all four independent test sets by 70% (2.24% vs. 7.54%).
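The abstract does not give the algorithm's details, so the following Python sketch only illustrates the general idea of matching a frame to a reference string by sequence alignment with approximate matching. Everything in it is an assumption made for illustration: the frame layout, the slot names, the helper functions (`cue`, `number`, `year`, `page_range`, `align_frame`), the scoring scheme, and the gap penalty. It uses Needleman-Wunsch global alignment and `difflib.SequenceMatcher` as stand-ins, and it is not the authors' implementation.

```python
import re
from difflib import SequenceMatcher


def clean(token):
    """Lower-case a token and strip surrounding punctuation."""
    return token.lower().strip(".,;:()")


def cue(*words):
    """Build a matcher for a cue word that tolerates abbreviations and typos."""
    def match(token):
        t = clean(token)
        if not t:
            return 0.0
        if any(w.startswith(t) for w in words):  # abbreviation, e.g. "vol."
            return 1.0
        # Otherwise fall back to approximate string similarity in [0, 1].
        return max(SequenceMatcher(None, t, w).ratio() for w in words)
    return match


def number(token):
    return 1.0 if clean(token).isdigit() else 0.0


def year(token):
    t = clean(token)
    return 1.0 if t.isdigit() and len(t) == 4 and 1500 <= int(t) <= 2100 else 0.0


def page_range(token):
    return 1.0 if re.fullmatch(r"\d+[-–]{1,2}\d+", clean(token)) else 0.0


# Hypothetical frame: an ordered list of (slot name, matcher) pairs.
FRAME = [
    ("VOLUME_CUE", cue("volume", "vol")),
    ("VOLUME", number),
    ("YEAR", year),
    ("PAGES_CUE", cue("pages", "pp")),
    ("PAGES", page_range),
]

# Input with a typo ("Volme"), an abbreviation ("pp.") and no year.
TOKENS = ["Volme", "8916,", "pp.", "154-163"]


def align_frame(frame, tokens, gap=-0.5):
    """Globally align frame slots with tokens; None marks a skipped side."""
    def score(i, j):
        # Map matcher similarity from [0, 1] to a reward in [-1, 1].
        return 2.0 * frame[i][1](tokens[j]) - 1.0

    n, m = len(frame), len(tokens)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(i - 1, j - 1),  # slot filled by token
                dp[i - 1][j] + gap,                      # slot left empty
                dp[i][j - 1] + gap,                      # token left unassigned
            )

    # Trace back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((frame[i - 1][0], tokens[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            pairs.append((frame[i - 1][0], None))
            i -= 1
        else:
            pairs.append((None, tokens[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    for slot, token in align_frame(FRAME, TOKENS):
        print(slot, "<-", token)
```

In this toy example, the misspelled cue "Volme" and the abbreviation "pp." still align with their slots, and the missing year becomes a gap rather than causing the whole match to fail. A strict rule that required every element in exact form would reject the same input.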

Original language: English
Pages (from-to): 154-163
Number of pages: 10
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8916
Publication status: Published - 2014
Externally published: Yes

Fingerprint

Metadata
Learning systems
Abbreviation
Conditional Random Fields
Sequence Alignment
Test Set
Threefolds
Matching Algorithm
Independent Set
Error Rate
Machine Learning
Coverage
Strings
Experiments
Flexibility
Demonstrate
Experiment

Keywords

  • Frame-based approach
  • Knowledge representation
  • Reference metadata extraction

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

A frame-based approach for reference metadata extraction. / Hsieh, Yu Lun; Liu, Shih Hung; Yang, Ting Hao; Chen, Yu Hsuan; Chang, Yung Chun; Hsieh, Gladys; Shih, Cheng Wei; Lu, Chun Hung; Hsu, Wen Lian.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8916, 2014, p. 154-163.

Hsieh, Yu Lun; Liu, Shih Hung; Yang, Ting Hao; Chen, Yu Hsuan; Chang, Yung Chun; Hsieh, Gladys; Shih, Cheng Wei; Lu, Chun Hung; Hsu, Wen Lian. / A frame-based approach for reference metadata extraction. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014; Vol. 8916. pp. 154-163.

@article{a3cee5028880466a870321efd03d8f9b,
title = "A frame-based approach for reference metadata extraction",
abstract = "In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performed equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA drastically reduces the average field error rate across all four independent test sets by 70{\%} (2.24{\%} vs. 7.54{\%}).",
keywords = "Frame-based approach, Knowledge representation, Reference metadata extraction",
author = "Hsieh, {Yu Lun} and Liu, {Shih Hung} and Yang, {Ting Hao} and Chen, {Yu Hsuan} and Chang, {Yung Chun} and Hsieh, Gladys and Shih, {Cheng Wei} and Lu, {Chun Hung} and Hsu, {Wen Lian}",
year = "2014",
language = "English",
volume = "8916",
pages = "154--163",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY  - JOUR
T1  - A frame-based approach for reference metadata extraction
AU  - Hsieh, Yu Lun
AU  - Liu, Shih Hung
AU  - Yang, Ting Hao
AU  - Chen, Yu Hsuan
AU  - Chang, Yung Chun
AU  - Hsieh, Gladys
AU  - Shih, Cheng Wei
AU  - Lu, Chun Hung
AU  - Hsu, Wen Lian
PY  - 2014
Y1  - 2014
N2  - In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performed equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA drastically reduces the average field error rate across all four independent test sets by 70% (2.24% vs. 7.54%).
AB  - In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performed equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA drastically reduces the average field error rate across all four independent test sets by 70% (2.24% vs. 7.54%).
KW  - Frame-based approach
KW  - Knowledge representation
KW  - Reference metadata extraction
UR  - http://www.scopus.com/inward/record.url?scp=84911938604&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84911938604&partnerID=8YFLogxK
M3  - Article
VL  - 8916
SP  - 154
EP  - 163
JO  - Lecture Notes in Computer Science
JF  - Lecture Notes in Computer Science
SN  - 0302-9743
ER  -