A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering

Tsau Young Lin, I. Jen Chiang

Research output: Contribution to journal, Article, peer-reviewed

29 citations (Scopus)

Abstract

This paper presents a novel approach to document clustering based on a geometric structure from combinatorial topology. Given a set of documents, the associations among terms that frequently co-occur in those documents naturally form a simplicial complex. Our general thesis is that each connected component of this simplicial complex represents a concept in the collection. Based on these concepts, documents can be clustered into meaningful classes. In this paper, however, we address a softer notion: instead of connected components, we use the maximal simplexes of highest dimension as representatives of the connected components, and the concepts so defined are called maximal primitive concepts. Experiments with three different data sets from Web pages and medical literature show that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and Hierarchical Clustering (HAC). This abstract geometric model seems to have captured the latent semantic structure of documents.
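The construction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the brute-force enumeration of term sets, the `min_support` threshold, the `max_size` cap, the toy documents, and the overlap-based assignment rule are all assumptions made for illustration. It relies only on the fact that every subset of a frequently co-occurring term set is itself frequent, so the frequent term sets are closed under taking subsets, as a simplicial complex requires.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=3):
    """Term sets that co-occur in at least `min_support` documents (brute force)."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    found = set()
    for size in range(1, max_size + 1):
        level = {
            frozenset(c)
            for c in combinations(vocab, size)
            if sum(d.issuperset(c) for d in doc_sets) >= min_support
        }
        if not level:  # no frequent set of this size, so none of any larger size
            break
        found |= level
    return found


def maximal_simplices(simplices):
    """Simplices that are not a proper face of any other simplex."""
    return {s for s in simplices if not any(s < t for t in simplices)}


def components(maximal):
    """Group maximal simplices that are linked through shared terms."""
    groups = []
    for s in maximal:
        linked = [g for g in groups if any(s & t for t in g)]
        merged = {s}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups


def primitive_concepts(docs, min_support, max_size=3):
    """Per connected component, keep the maximal simplices of highest dimension."""
    maximal = maximal_simplices(frequent_term_sets(docs, min_support, max_size))
    concepts = []
    for group in components(maximal):
        top = max(len(s) for s in group)
        concepts.extend(s for s in group if len(s) == top)
    return concepts


def assign(doc, concepts):
    """Assumed rule: pick the concept sharing the most terms with the document."""
    return max(concepts, key=lambda c: len(c & set(doc)))


# Hypothetical toy collection; each document is a list of its terms.
docs = [
    ["wall", "street", "stock"],
    ["wall", "street", "stock", "market"],
    ["wall", "street", "brick"],
    ["stock", "bond"],
    ["stock", "bond", "yield"],
    ["gene", "protein"],
    ["gene", "protein", "cell"],
]
concepts = primitive_concepts(docs, min_support=2)
for c in concepts:
    print(sorted(c))
```

In this toy collection the term pair stock/bond is a maximal simplex, but it lies in the same connected component as the larger wall/street/stock simplex, so only the latter is kept as that component's concept. Enumerating all term combinations is exponential in the vocabulary size; a real collection would need a level-wise frequent-itemset search instead.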

Original language: English
Pages (from-to): 55-80
Number of pages: 26
Journal: International Journal of Approximate Reasoning
Volume: 40
Issue number: 1-2
Publication status: Published - July 2005

ASJC Scopus subject areas

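  • Statistics and Probability
  • Electrical and Electronic Engineering
  • Statistics, Probability and Uncertainty
  • Information Systems and Management
  • Information Systems
  • Computer Science Applications
  • Artificial Intelligence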
