Discovering Latent Semantics in Web Documents Using Fuzzy Clustering

I-Jen Chiang, Charles Chih Ho Liu, Yi Hsin Tsai, Ajit Kumar

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Web documents are heterogeneous and complex. There exist complicated associations within a single web document and in its links to other documents. The strong interactions between terms in documents give rise to vague and ambiguous meanings, so efficient and effective clustering methods are needed to discover latent, coherent meanings in context. This paper presents a fuzzy linguistic topological space, along with a fuzzy clustering algorithm, to discover the contextual meaning of web documents. The proposed algorithm extracts features from web documents using conditional random fields and builds a fuzzy linguistic topological space from the associations among those features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called "CONCEPTS," and a fuzzy linguistic measure is applied to each complex to evaluate 1) the relevance of a document to a topic and 2) its difference from the other topics. Web contents can then be clustered into topics in the hierarchy according to their fuzzy linguistic measures, and web users can further explore the CONCEPTS of the web contents accordingly. Beyond its applicability to web text, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
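The pipeline outlined in the abstract can be illustrated with a heavily simplified sketch. This is not the authors' algorithm. It assumes features have already been extracted (the paper uses conditional random fields, omitted here) and replaces the fuzzy linguistic topological space with an alpha-cut co-occurrence graph whose connected components stand in for CONCEPTS. The association strength (an overlap coefficient), the relevance score (share of a concept's features present in a document), and the inter-topic difference (margin over the runner-up concept) are hypothetical choices, as are all function names.

```python
from collections import Counter
from itertools import combinations


def association_strengths(docs):
    """Fuzzy association in [0, 1] for every co-occurring feature pair.

    Hypothetical choice: co-occurrence count divided by the document
    frequency of the rarer feature (an overlap coefficient).
    """
    df = Counter()
    co = Counter()
    for doc in docs:
        feats = sorted(set(doc))
        df.update(feats)
        co.update(combinations(feats, 2))
    return {pair: n / min(df[pair[0]], df[pair[1]]) for pair, n in co.items()}


def concepts_at(strengths, features, alpha):
    """Connected components (stand-ins for CONCEPTS) of the alpha-cut graph."""
    parent = {f: f for f in features}

    def find(f):
        # Union-find root lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for (a, b), s in strengths.items():
        if s >= alpha:
            parent[find(a)] = find(b)
    groups = {}
    for f in features:
        groups.setdefault(find(f), set()).add(f)
    return [frozenset(g) for g in groups.values()]


def hierarchy(docs, alphas):
    """Concepts per alpha level; a higher alpha gives finer, nested concepts."""
    strengths = association_strengths(docs)
    features = sorted({f for doc in docs for f in doc})
    return {a: concepts_at(strengths, features, a) for a in sorted(alphas)}


def relevance(doc, concept):
    """Hypothetical fuzzy membership: share of the concept's features in the doc."""
    return len(set(doc) & concept) / len(concept)


def assign(doc, concepts):
    """Best concept, its relevance, and its margin over the runner-up concept."""
    scores = sorted(((relevance(doc, c), c) for c in concepts),
                    key=lambda t: t[0], reverse=True)
    best_score, best = scores[0]
    runner_up = scores[1][0] if len(scores) > 1 else 0.0
    return best, best_score, best_score - runner_up
```

Raising alpha only removes edges from the graph, so every concept found at a higher level lies inside some concept found at a lower level. Sweeping alpha therefore yields a nested hierarchy, loosely mirroring the hierarchy of CONCEPTS the abstract describes, and `assign` places a document in that hierarchy by its relevance and margin.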

Original language: English
Article number: 7042824
Pages (from-to): 2122-2134
Number of pages: 13
Journal: IEEE Transactions on Fuzzy Systems
Volume: 23
Issue number: 6
DOIs: 10.1109/TFUZZ.2015.2403878
Publication status: Published - Dec 1 2015

Fingerprint

Document Clustering
Fuzzy clustering
Linguistics
Semantics
Information filtering
Bioinformatics
Clustering algorithms
Topological space
Data mining
Conditional Random Fields
Fuzzy Algorithm
Collaborative Filtering
Ambiguous
Clustering Methods
Linking
Necessary

Keywords

  • Fuzzy aggregation algorithm
  • fuzzy linguistic topological space
  • fuzzy semantic topology
  • fuzzy web hierarchical clustering
  • named entity recognition (NER)

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Discovering Latent Semantics in Web Documents Using Fuzzy Clustering. / Chiang, I-Jen; Liu, Charles Chih Ho; Tsai, Yi Hsin; Kumar, Ajit.

In: IEEE Transactions on Fuzzy Systems, Vol. 23, No. 6, 7042824, 01.12.2015, p. 2122-2134.

Chiang, I-Jen ; Liu, Charles Chih Ho ; Tsai, Yi Hsin ; Kumar, Ajit. / Discovering Latent Semantics in Web Documents Using Fuzzy Clustering. In: IEEE Transactions on Fuzzy Systems. 2015 ; Vol. 23, No. 6. pp. 2122-2134.
@article{e5731fb6dc854576b1319126cf94938d,
title = "Discovering Latent Semantics in Web Documents Using Fuzzy Clustering",
abstract = "Web documents are heterogeneous and complex. There exists complicated associations within one web document and linking to the others. The high interactions between terms in documents demonstrate vague and ambiguous meanings. Efficient and effective clustering methods to discover latent and coherent meanings in context are necessary. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in the web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of cooccurring features organize a hierarchy of connected semantic complexes called {"}CONCEPTS,{"} wherein a fuzzy linguistic measure is applied on each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) the difference between the other topics. Web contents are able to be clustered into topics in the hierarchy depending on their fuzzy linguistic measures; web users can further explore the CONCEPTS of web contents accordingly. Besides the algorithm applicability in web text domains, it can be extended to other applications, such as data mining, bioinformatics, content-based, or collaborative information filtering, etc.",
keywords = "Fuzzy aggregation algorithm, fuzzy linguistic topological space, fuzzy semantic topology, fuzzy web hierarchical clustering, named entity recognition (NER)",
author = "I-Jen Chiang and Liu, {Charles Chih Ho} and Tsai, {Yi Hsin} and Ajit Kumar",
year = "2015",
month = "12",
day = "1",
doi = "10.1109/TFUZZ.2015.2403878",
language = "English",
volume = "23",
pages = "2122--2134",
journal = "IEEE Transactions on Fuzzy Systems",
issn = "1063-6706",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",

}

TY - JOUR

T1 - Discovering Latent Semantics in Web Documents Using Fuzzy Clustering

AU - Chiang, I-Jen

AU - Liu, Charles Chih Ho

AU - Tsai, Yi Hsin

AU - Kumar, Ajit

PY - 2015/12/1

Y1 - 2015/12/1

AB - Web documents are heterogeneous and complex. There exists complicated associations within one web document and linking to the others. The high interactions between terms in documents demonstrate vague and ambiguous meanings. Efficient and effective clustering methods to discover latent and coherent meanings in context are necessary. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in the web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of cooccurring features organize a hierarchy of connected semantic complexes called "CONCEPTS," wherein a fuzzy linguistic measure is applied on each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) the difference between the other topics. Web contents are able to be clustered into topics in the hierarchy depending on their fuzzy linguistic measures; web users can further explore the CONCEPTS of web contents accordingly. Besides the algorithm applicability in web text domains, it can be extended to other applications, such as data mining, bioinformatics, content-based, or collaborative information filtering, etc.

KW - Fuzzy aggregation algorithm

KW - fuzzy linguistic topological space

KW - fuzzy semantic topology

KW - fuzzy web hierarchical clustering

KW - named entity recognition (NER)

UR - http://www.scopus.com/inward/record.url?scp=84959557100&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959557100&partnerID=8YFLogxK

U2 - 10.1109/TFUZZ.2015.2403878

DO - 10.1109/TFUZZ.2015.2403878

M3 - Article

AN - SCOPUS:84959557100

VL - 23

SP - 2122

EP - 2134

JO - IEEE Transactions on Fuzzy Systems

JF - IEEE Transactions on Fuzzy Systems

SN - 1063-6706

IS - 6

M1 - 7042824

ER -