Abstract
We propose a statistical frame-based approach (FBA) for natural language processing and demonstrate its advantage over traditional machine learning methods, using topic detection as a case study. FBA identifies semantic knowledge in a more general manner by collecting important linguistic patterns within documents through a flexible matching scheme that allows word insertion, deletion, and substitution (IDS) to capture linguistic structures within the text. In addition, FBA can overcome major issues of the rule-based approach by reducing human effort through its highly automated pattern generation and summarization. Using a Yahoo! Chinese news corpus of about 140,000 news articles, we provide a comprehensive performance evaluation that demonstrates the effectiveness of FBA in detecting the topic of a document by exploiting the semantic associations and context within the text. Moreover, FBA outperforms common topic models such as Naïve Bayes, Vector Space Model, and LDA-SVM.
Original language | English |
---|---|
Title of host publication | Proceedings of the 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 |
Publisher | Faculty of Pharmaceutical Sciences, Chulalongkorn University |
Pages | 75-84 |
Number of pages | 10 |
ISBN (Electronic) | 9786165518871 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 - Phuket, Thailand; Duration: Dec 12 2014 → Dec 14 2014 |
Conference
Conference | 28th Pacific Asia Conference on Language, Information and Computation, PACLIC 2014 |
---|---|
Country | Thailand |
City | Phuket |
Period | 12/12/14 → 12/14/14 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science (miscellaneous)