Semantic based clustering of web documents

Tsau Young Lin, I. Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass, and Hierarchical Clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.
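The abstract does not say how the simplicial complex is built, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes that a set of terms forms a simplex when the terms co-occur in at least `min_support` documents, treats maximal simplices as primitive concepts and connected components as concepts, and assigns each document to the component it overlaps most. All function names, parameters, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_simplices(docs, min_support=2, max_dim=2):
    """Term sets co-occurring in >= min_support documents (assumed construction)."""
    counts = Counter()
    for terms in docs:
        # A set of k terms is treated as a (k-1)-dimensional simplex.
        for k in range(1, max_dim + 2):
            for combo in combinations(sorted(terms), k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def maximal_simplices(simplices):
    """Simplices not strictly contained in another one ("primitive concepts")."""
    return [s for s in simplices if not any(s < t for t in simplices)]


def connected_components(simplices):
    """Merge simplices that share vertices, using union-find over terms."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for simplex in simplices:
        verts = sorted(simplex)
        root = find(verts[0])
        for v in verts[1:]:
            parent[find(v)] = root
    comps = {}
    for v in list(parent):
        comps.setdefault(find(v), set()).add(v)
    # Sort for a deterministic component order.
    return sorted(comps.values(), key=sorted)


def cluster(docs, components):
    """Label each document with the index of its best-overlapping component."""
    labels = []
    for terms in docs:
        overlaps = [len(terms & comp) for comp in components]
        if overlaps and max(overlaps) > 0:
            labels.append(overlaps.index(max(overlaps)))
        else:
            labels.append(None)  # no concept covers this document
    return labels


# Toy documents, each reduced to a set of index terms (made-up data).
docs = [
    {"gene", "protein", "cell"},
    {"gene", "protein", "dna"},
    {"gene", "protein", "cell"},
    {"stock", "market", "price"},
    {"stock", "market", "trade"},
    {"weather"},
]

primitive = maximal_simplices(frequent_simplices(docs))
components = connected_components(primitive)
labels = cluster(docs, components)
print(components)
print(labels)
```

On this toy input the first three documents fall into one component, the next two into another, and the last document is left unassigned because none of its terms is frequent. Unlike k-means, the number of clusters is not fixed in advance; it follows from the components of the complex.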

Original language: English
Title of host publication: 2005 IEEE International Conference on Granular Computing
Pages: 189-192
Number of pages: 4
Volume: 2005
DOIs: https://doi.org/10.1109/GRC.2005.1547264
Publication status: Published - 2005
Event: 2005 IEEE International Conference on Granular Computing - Beijing, China
Duration: Jul 25 2005 → Jul 27 2005

Other

Other: 2005 IEEE International Conference on Granular Computing
Country: China
City: Beijing
Period: 7/25/05 → 7/27/05

Fingerprint

Semantics
Clustering algorithms
Websites
Geometry
Experiments

Keywords

  • Clustering
  • Document
  • Polyhedron
  • Semantics
  • Web

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Lin, T. Y., & Chiang, I. J. (2005). Semantic based clustering of web documents. In 2005 IEEE International Conference on Granular Computing (Vol. 2005, pp. 189-192). [1547264] https://doi.org/10.1109/GRC.2005.1547264

@inproceedings{8d241c4e05c44f248e76f5e50a08b04a,
title = "Semantic based clustering of web documents",
abstract = "A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass, and Hierarchical Clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.",
keywords = "Clustering, Document, Polyhedron, Semantics, Web",
author = "Lin, {Tsau Young} and Chiang, {I. Jen}",
year = "2005",
doi = "10.1109/GRC.2005.1547264",
language = "English",
isbn = "0780390172",
volume = "2005",
pages = "189--192",
booktitle = "2005 IEEE International Conference on Granular Computing",

}

TY - GEN

T1 - Semantic based clustering of web documents

AU - Lin, Tsau Young

AU - Chiang, I. Jen

PY - 2005

Y1 - 2005

N2 - A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a connected component represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments with three different data sets from web pages and medical literature have shown that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms, such as k-means, AutoClass, and Hierarchical Clustering (HAC). This abstract geometric model seems to have captured the intrinsic semantics of the documents.

KW - Clustering

KW - Document

KW - Polyhedron

KW - Semantics

KW - Web

UR - http://www.scopus.com/inward/record.url?scp=33845344700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33845344700&partnerID=8YFLogxK

U2 - 10.1109/GRC.2005.1547264

DO - 10.1109/GRC.2005.1547264

M3 - Conference contribution

AN - SCOPUS:33845344700

SN - 0780390172

SN - 9780780390171

VL - 2005

SP - 189

EP - 192

BT - 2005 IEEE International Conference on Granular Computing

ER -