Abstract
The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among terms that frequently co-occur in the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection, and these concepts can be used to categorize documents into different semantic classes. Experiments on three data sets drawn from news, Web, and medical literature show that the algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning, AutoClass, and hierarchical clustering.
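The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes plain pairwise co-occurrence counts in place of the association rules and hypergraph named in the keywords, whitespace tokenization, and a simple overlap rule for assigning documents. The names `cooccurrence_counts`, `connected_components`, `concepts_at_level`, `assign`, and the parameter `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """For each unordered term pair, count the documents containing both terms."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))  # naive tokenization (assumption)
        counts.update(combinations(terms, 2))
    return counts


def connected_components(edges):
    """Return the connected components of an undirected graph as sets of terms."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


def concepts_at_level(docs, min_support):
    """Keep term pairs co-occurring in at least min_support documents,
    then treat each connected component of the resulting graph as a concept."""
    counts = cooccurrence_counts(docs)
    edges = [pair for pair, n in counts.items() if n >= min_support]
    return connected_components(edges)


def assign(doc, concepts):
    """Return the index of the concept sharing the most terms with doc, or None."""
    terms = set(doc.lower().split())
    best, best_overlap = None, 0
    for i, concept in enumerate(concepts):
        overlap = len(terms & concept)
        if overlap > best_overlap:
            best, best_overlap = i, overlap
    return best


docs = [
    "stock market shares",
    "stock market trading",
    "gene protein cell",
    "gene protein disease",
]
concepts = concepts_at_level(docs, min_support=2)
labels = [assign(d, concepts) for d in docs]
```

Lowering `min_support` admits weaker associations, so components can only grow or merge. Sweeping the threshold from high to low therefore yields a nested, agglomerative-style hierarchy of concepts, which is one plausible reading of "various levels" in the abstract.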
Original language | English |
---|---|
Title of host publication | Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1556-1563 |
Number of pages | 8 |
ISBN (Print) | 9781479999255 |
Publication status | Published - Dec 22 2015 |
Event | 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States; Duration: Oct 29 2015 → Nov 1 2015 |
Other
Other | 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 |
---|---|
Country/Territory | United States |
City | Santa Clara |
Period | 10/29/15 → 11/1/15 |
Keywords
- agglomerative document categorization/clustering
- association rules
- hierarchical clustering
- hypergraph
ASJC Scopus subject areas
- Computer Networks and Communications
- Computer Science Applications
- Information Systems
- Software