Discovering Latent Semantics in Web Documents Using Fuzzy Clustering

I-Jen Chiang, Charles Chih Ho Liu, Yi Hsin Tsai, Ajit Kumar

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

Web documents are heterogeneous and complex. Complicated associations exist within a single web document and in its links to other documents. The high degree of interaction between terms in documents gives rise to vague and ambiguous meanings. Efficient and effective clustering methods are therefore needed to discover latent and coherent meanings in context. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called "CONCEPTS," wherein a fuzzy linguistic measure is applied to each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) its difference from the other topics. Web contents can be clustered into topics in the hierarchy according to their fuzzy linguistic measures; web users can then explore the CONCEPTS of web contents accordingly. Beyond web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.

Original language: English
Article number: 7042824
Pages (from-to): 2122-2134
Number of pages: 13
Journal: IEEE Transactions on Fuzzy Systems
Volume: 23
Issue number: 6
Publication status: Published - Dec 1 2015

Keywords

  • Fuzzy aggregation algorithm
  • fuzzy linguistic topological space
  • fuzzy semantic topology
  • fuzzy web hierarchical clustering
  • named entity recognition (NER)

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
