Generating hypergraph of term associations for automatic document concept clustering

I. J. Chiang, Tsau Young Lin, J. Y J Hsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a novel approach to document clustering using hypergraph decomposition. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally define a hypergraph, which can then be decomposed into connected components at various levels. Each connected component represents a primitive concept in the collection. The documents can then be clustered based on these primitive concepts. Experiments with three different data sets from web pages and the medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). The results indicate that hypergraphs are a perfect model for capturing association rules in text and are very useful for automatic document clustering.
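
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how term associations are mined or how the decomposition levels are chosen, so the sketch assumes that a hyperedge is any set of terms co-occurring in at least `min_support` documents, that components are found with a simple union-find, and that each document is assigned to the component it shares the most terms with. The function names and parameters (`frequent_term_sets`, `connected_components`, `cluster_documents`, `min_support`, `max_size`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_term_sets(docs, min_support=2, max_size=3):
    """Return term sets (assumed hyperedges) found in >= min_support documents."""
    counts = defaultdict(int)
    for terms in docs:
        unique = sorted(set(terms))
        for size in range(2, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return [edge for edge, n in counts.items() if n >= min_support]


def connected_components(hyperedges):
    """Group terms so that terms sharing a hyperedge land in the same component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for edge in hyperedges:
        terms = list(edge)
        root = find(terms[0])
        for term in terms[1:]:
            parent[find(term)] = root  # merge this term's component into root

    groups = defaultdict(set)
    for term in parent:
        groups[find(term)].add(term)
    return list(groups.values())


def cluster_documents(docs, concepts):
    """Label each document with the index of the best-overlapping concept.

    Documents sharing no term with any concept are labelled None.
    """
    labels = []
    for terms in docs:
        overlaps = [len(set(terms) & concept) for concept in concepts]
        if overlaps and max(overlaps) > 0:
            labels.append(overlaps.index(max(overlaps)))
        else:
            labels.append(None)
    return labels


# Toy corpus (invented for illustration): each document is a list of terms.
docs = [
    ["hypergraph", "vertex", "edge"],
    ["hypergraph", "edge", "component"],
    ["gene", "protein", "cell"],
    ["protein", "cell", "disease"],
]
concepts = connected_components(frequent_term_sets(docs, min_support=2))
labels = cluster_documents(docs, concepts)
print(concepts, labels)
```

Under these assumptions, raising `min_support` removes weaker hyperedges and splits components further, which is one possible reading of decomposition "at various levels"; the paper itself may define the levels differently.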

Original language: English
Title of host publication: Proceedings of the Eighth IASTED International Conference on Artificial Intelligence and Soft Computing
Editors: A.P. Pobil
Pages: 181-186
Number of pages: 6
Publication status: Published - 2004
Event: Proceedings of the Eighth IASTED International Conference on Artificial Intelligence and Soft Computing - Marbella, Spain
Duration: Sep 1 2004 → Sep 3 2004

Other

Other: Proceedings of the Eighth IASTED International Conference on Artificial Intelligence and Soft Computing
Country: Spain
City: Marbella
Period: 9/1/04 → 9/3/04

Fingerprint

Association rules
Clustering algorithms
Websites
Decomposition
Experiments

Keywords

  • Association Rules
  • Concept
  • Connected Components
  • Decomposition
  • Document Clustering
  • Hypergraph

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Chiang, I. J., Lin, T. Y., & Hsu, J. Y. J. (2004). Generating hypergraph of term associations for automatic document concept clustering. In A. P. Pobil (Ed.), Proceedings of the Eighth IASTED International Conference on Artificial Intelligence and Soft Computing (pp. 181-186)

@inproceedings{9dd0bc38fb984009b4be816fb28497f3,
title = "Generating hypergraph of term associations for automatic document concept clustering",
abstract = "This paper presents a novel approach to document clustering using hypergraph decomposition. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally define a hypergraph, which can then be decomposed into connected components at various levels. Each connected component represents a primitive concept in the collection. The documents can then be clustered based on these primitive concepts. Experiments with three different data sets from web pages and the medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). The results indicate that hypergraphs are a perfect model for capturing association rules in text and are very useful for automatic document clustering.",
keywords = "Association Rules, Concept, Connected Components, Decomposition, Document Clustering, Hypergraph",
author = "Chiang, {I. J.} and Lin, {Tsau Young} and Hsu, {J. Y J}",
year = "2004",
language = "English",
isbn = "0889864586",
pages = "181--186",
editor = "A.P. Pobil",
booktitle = "Proceedings of the Eighth IASTED International Conference on Artificial Intelligence and Soft Computing",

}

TY - GEN
T1 - Generating hypergraph of term associations for automatic document concept clustering
AU - Chiang, I. J.
AU - Lin, Tsau Young
AU - Hsu, J. Y J
PY - 2004
Y1 - 2004
AB - This paper presents a novel approach to document clustering using hypergraph decomposition. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally define a hypergraph, which can then be decomposed into connected components at various levels. Each connected component represents a primitive concept in the collection. The documents can then be clustered based on these primitive concepts. Experiments with three different data sets from web pages and the medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). The results indicate that hypergraphs are a perfect model for capturing association rules in text and are very useful for automatic document clustering.
KW - Association Rules
KW - Concept
KW - Connected Components
KW - Decomposition
KW - Document Clustering
KW - Hypergraph
UR - http://www.scopus.com/inward/record.url?scp=10444223893&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=10444223893&partnerID=8YFLogxK
M3 - Conference contribution
SN - 0889864586
SP - 181
EP - 186
BT - Proceedings of the Eighth IASTED International Conference on Artificial Intelligence and Soft Computing
A2 - Pobil, A.P.
ER -