Abstract
A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimension simplex, and a concept is represented by a connected component. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass, and Hierarchical Agglomerative Clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.
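The abstract does not say how the simplices are built, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each document is a set of keywords, that a simplex is a keyword set co-occurring in at least `min_support` documents, that two simplices are connected when they share a keyword, and that a document joins the component it overlaps most. All function names, parameters, and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_simplices(docs, min_support=2, max_size=3):
    """Keyword sets (simplices) that co-occur in at least min_support documents.

    Assumption: co-occurrence counting stands in for whatever construction
    the paper actually uses.
    """
    counts = {}
    for words in docs:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(words), size):
                counts[combo] = counts.get(combo, 0) + 1
    return {frozenset(c) for c, n in counts.items() if n >= min_support}


def maximal_simplices(simplices):
    """Simplices not contained in a larger one ("primitive concepts" here)."""
    maximal = [s for s in simplices if not any(s < t for t in simplices)]
    return sorted(maximal, key=lambda s: sorted(s))  # deterministic order


def connected_components(simplices):
    """Merge simplices that share a keyword (assumed connectivity rule)."""
    components = []
    for s in simplices:
        merged = set(s)
        rest = []
        for comp in components:
            if comp & merged:
                merged |= comp
            else:
                rest.append(comp)
        components = rest + [merged]
    return components


def cluster(docs, components):
    """Label each document with the component it overlaps most, or None."""
    labels = []
    for words in docs:
        overlaps = [len(words & comp) for comp in components]
        if overlaps and max(overlaps) > 0:
            labels.append(overlaps.index(max(overlaps)))
        else:
            labels.append(None)
    return labels


# Toy data (hypothetical): two topology documents, two medical documents.
docs = [
    {"simplex", "complex", "topology"},
    {"simplex", "complex", "homology"},
    {"clinical", "trial", "patient"},
    {"clinical", "trial", "dosage"},
]
components = connected_components(maximal_simplices(frequent_simplices(docs)))
labels = cluster(docs, components)
```

On this toy input the two topology documents receive one label and the two medical documents another. Real corpora would need a weighting scheme and a stricter connectivity rule than the one assumed here.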
Original language | English |
---|---|
Title of host publication | 2005 IEEE International Conference on Granular Computing |
Pages | 189-192 |
Number of pages | 4 |
Volume | 2005 |
DOIs | |
Publication status | Published - 2005 |
Event | 2005 IEEE International Conference on Granular Computing - Beijing, China. Duration: Jul 25 2005 → Jul 27 2005 |
Other
Other | 2005 IEEE International Conference on Granular Computing |
---|---|
Country/Territory | China |
City | Beijing |
Period | 7/25/05 → 7/27/05 |
Keywords
- Clustering
- Document
- Polyhedron
- Semantics
- Web
ASJC Scopus subject areas
- Engineering(all)