Abstract
In an age of information explosion, efficiently categorizing documents by topic helps us organize and comprehend vast amounts of text. In this paper, we propose a novel approach, named DKV, for document categorization that uses distributed real-valued vector representations of keywords learned by neural networks. Such a representation projects rich context information (or embedding) into a vector space, and can subsequently be used to infer similarity among words, sentences, and even documents. Using a Chinese news corpus containing over 100,000 articles across five topics, we provide a comprehensive performance evaluation demonstrating that, by exploiting keyword embeddings, DKV paired with support vector machines can effectively categorize a document into predefined topics. The results show that our method outperforms several other approaches.
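As a rough illustration of how keyword embeddings can drive topic categorization, the sketch below averages keyword vectors into a document vector and assigns the topic whose centroid is most similar. It is not the DKV method itself: the abstract does not specify how DKV builds its document representation, and the paper pairs DKV with support vector machines, whereas this sketch substitutes a simpler nearest-centroid rule with cosine similarity. The embedding table, topic names, and all function names are hypothetical.

```python
import math

# Hypothetical 3-dimensional keyword embeddings (assumption for illustration).
# In practice these would be learned by a neural network from a large corpus.
KEYWORD_VECTORS = {
    "election": [0.9, 0.1, 0.0],
    "parliament": [0.8, 0.2, 0.1],
    "match": [0.1, 0.9, 0.0],
    "coach": [0.0, 0.8, 0.2],
    "stock": [0.1, 0.0, 0.9],
    "market": [0.2, 0.1, 0.8],
}


def document_vector(keywords):
    """Average the vectors of the keywords that have an embedding."""
    known = [KEYWORD_VECTORS[k] for k in keywords if k in KEYWORD_VECTORS]
    if not known:
        raise ValueError("no known keywords in document")
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_docs):
    """Compute one centroid per topic from (keywords, topic) pairs."""
    grouped = {}
    for keywords, topic in labeled_docs:
        grouped.setdefault(topic, []).append(document_vector(keywords))
    return {
        topic: [sum(column) / len(vectors) for column in zip(*vectors)]
        for topic, vectors in grouped.items()
    }


def categorize(keywords, centroids):
    """Return the topic whose centroid is most similar to the document."""
    vec = document_vector(keywords)
    return max(centroids, key=lambda topic: cosine(vec, centroids[topic]))


# Toy training set (assumption): one labeled document per topic.
TRAINING = [
    (["election", "parliament"], "politics"),
    (["match", "coach"], "sports"),
    (["stock", "market"], "finance"),
]
CENTROIDS = train_centroids(TRAINING)

print(categorize(["election", "market", "parliament"], CENTROIDS))
```

To move closer to the setup described in the abstract, the document vectors produced by `document_vector` could be fed to a support vector machine classifier in place of the centroid comparison.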
Original language | English |
---|---|
Title of host publication | TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 245-251 |
Number of pages | 7 |
ISBN (Electronic) | 9781467396066 |
DOIs | |
Publication status | Published - Feb 12 2016 |
Published externally | Yes |
Event | Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015 - Tainan, Taiwan. Duration: Nov 20 2015 → Nov 22 2015 |
Conference
Conference | Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015 |
---|---|
Country | Taiwan |
City | Tainan |
Period | 11/20/15 → 11/22/15 |
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Science Applications