Distributed keyword vector representation for document categorization

Yu Lun Hsieh, Shih Hung Liu, Yung Chun Chang, Wen Lian Hsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.
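As a rough illustration of how keyword embeddings can yield document-level similarity, the sketch below averages keyword vectors into a document vector and compares documents by cosine similarity. This is not the authors' DKV implementation: the abstract does not specify how keyword vectors are combined, so the averaging step, the toy three-dimensional vectors, the English keywords, and the names `KEYWORD_VECTORS`, `document_vector`, and `cosine` are all assumptions made for illustration. The support vector machine classifier mentioned in the abstract is omitted here; a plain similarity comparison stands in for it.

```python
import math

# Hypothetical toy embeddings. In the paper's setting these would be
# real-valued keyword vectors learned by a neural network.
KEYWORD_VECTORS = {
    "election": [0.9, 0.1, 0.0],
    "parliament": [0.8, 0.2, 0.1],
    "goalkeeper": [0.1, 0.9, 0.0],
    "tournament": [0.0, 0.8, 0.2],
}


def document_vector(tokens, vectors=KEYWORD_VECTORS):
    """Average the vectors of the known keywords in a tokenized document.

    Averaging is an assumption of this sketch, not a detail from the abstract.
    Tokens without a vector are ignored; a document with no known keywords
    maps to the zero vector.
    """
    dim = len(next(iter(vectors.values())))
    found = [vectors[t] for t in tokens if t in vectors]
    if not found:
        return [0.0] * dim
    return [sum(v[i] for v in found) / len(found) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors, defined as 0.0 if either is zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


politics = document_vector(["the", "election", "for", "parliament"])
sports = document_vector(["goalkeeper", "wins", "tournament"])
query = document_vector(["parliament", "debate"])

# The query shares a keyword with the politics document, so it should be
# closer to that document than to the sports one.
print(cosine(query, politics) > cosine(query, sports))
```

In a fuller pipeline, document vectors of this kind would presumably serve as the feature vectors passed to a classifier such as a support vector machine.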

Original language: English
Title of host publication: TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 245-251
Number of pages: 7
ISBN (Electronic): 9781467396066
DOIs: https://doi.org/10.1109/TAAI.2015.7407126
Publication status: Published - Feb 12 2016
Externally published: Yes
Event: Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015 - Tainan, Taiwan
Duration: Nov 20 2015 to Nov 22 2015

Conference

Conference: Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015
Country: Taiwan
City: Tainan
Period: 11/20/15 to 11/22/15

Fingerprint

Vector spaces
Explosions
Support vector machines
Neural networks

Keywords

  • document representation
  • neural network
  • word embedding

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications

Cite this

Hsieh, Y. L., Liu, S. H., Chang, Y. C., & Hsu, W. L. (2016). Distributed keyword vector representation for document categorization. In TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence (pp. 245-251). [7407126] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/TAAI.2015.7407126

Distributed keyword vector representation for document categorization. / Hsieh, Yu Lun; Liu, Shih Hung; Chang, Yung Chun; Hsu, Wen Lian.

TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence. Institute of Electrical and Electronics Engineers Inc., 2016. p. 245-251 7407126.

Hsieh, YL, Liu, SH, Chang, YC & Hsu, WL 2016, Distributed keyword vector representation for document categorization. in TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence, 7407126, Institute of Electrical and Electronics Engineers Inc., pp. 245-251, Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015, Tainan, Taiwan, 11/20/15. https://doi.org/10.1109/TAAI.2015.7407126
Hsieh YL, Liu SH, Chang YC, Hsu WL. Distributed keyword vector representation for document categorization. In TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence. Institute of Electrical and Electronics Engineers Inc. 2016. p. 245-251. 7407126 https://doi.org/10.1109/TAAI.2015.7407126
Hsieh, Yu Lun ; Liu, Shih Hung ; Chang, Yung Chun ; Hsu, Wen Lian. / Distributed keyword vector representation for document categorization. TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence. Institute of Electrical and Electronics Engineers Inc., 2016. pp. 245-251
@inproceedings{dd0fda2ddca4429cbf5ec8e24792b7e8,
title = "Distributed keyword vector representation for document categorization",
abstract = "In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.",
keywords = "document representation, neural network, word embedding",
author = "Hsieh, {Yu Lun} and Liu, {Shih Hung} and Chang, {Yung Chun} and Hsu, {Wen Lian}",
year = "2016",
month = "2",
day = "12",
doi = "10.1109/TAAI.2015.7407126",
language = "English",
pages = "245--251",
booktitle = "TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Distributed keyword vector representation for document categorization

AU - Hsieh, Yu Lun

AU - Liu, Shih Hung

AU - Chang, Yung Chun

AU - Hsu, Wen Lian

PY - 2016/2/12

Y1 - 2016/2/12

AB - In the age of information explosion, efficiently categorizing the topic of a document can assist our organization and comprehension of the vast amount of text. In this paper, we propose a novel approach, named DKV, for document categorization using distributed real-valued vector representation of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles and five topics, we provide a comprehensive performance evaluation to demonstrate that by exploiting the keyword embeddings, DKV paired with support vector machines can effectively categorize a document into the predefined topics. Results demonstrate that our method can achieve the best performances compared to several other approaches.

KW - document representation

KW - neural network

KW - word embedding

UR - http://www.scopus.com/inward/record.url?scp=84964228049&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964228049&partnerID=8YFLogxK

U2 - 10.1109/TAAI.2015.7407126

DO - 10.1109/TAAI.2015.7407126

M3 - Conference contribution

AN - SCOPUS:84964228049

SP - 245

EP - 251

BT - TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence

PB - Institute of Electrical and Electronics Engineers Inc.

ER -