Discovering Latent Semantics in Web Documents Using Fuzzy Clustering

I-Jen Chiang, Charles Chih Ho Liu, Yi Hsin Tsai, Ajit Kumar

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)


Web documents are heterogeneous and complex. Complicated associations exist within a single web document and in its links to other documents. The strong interactions between terms in documents give rise to vague and ambiguous meanings. Efficient and effective clustering methods are therefore needed to discover latent and coherent meanings in context. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called "CONCEPTS," wherein a fuzzy linguistic measure is applied to each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) the difference from the other topics. Web contents can be clustered into topics in the hierarchy according to their fuzzy linguistic measures; web users can then explore the CONCEPTS of web contents accordingly. Beyond its applicability to web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
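
As a rough illustration of the general idea in the abstract, and not the authors' algorithm, the Python sketch below groups features that co-occur across documents into small feature sets and gives each document a graded membership in each set. Everything in it is an assumption made for illustration: the toy documents, the function names `frequent_cooccurrences` and `fuzzy_memberships`, the support threshold, and the overlap-based membership score. In particular, it uses hand-written feature sets in place of conditional random field extraction, and a simple normalized overlap in place of the paper's fuzzy linguistic measure.

```python
from collections import Counter
from itertools import combinations


def frequent_cooccurrences(docs, size=2, min_support=2):
    """Return feature sets of `size` that co-occur in at least `min_support` documents.

    Hypothetical stand-in for the paper's semantic complexes.
    """
    counts = Counter()
    for features in docs.values():
        # Sort so the same feature set always yields the same tuple key.
        for combo in combinations(sorted(features), size):
            counts[combo] += 1
    return {frozenset(c) for c, n in counts.items() if n >= min_support}


def fuzzy_memberships(features, complexes):
    """Assumed membership score: share of each complex's features found in the
    document, normalized so the scores over all complexes sum to 1."""
    overlap = {c: len(features & c) / len(c) for c in complexes}
    total = sum(overlap.values())
    if total == 0:
        return {c: 0.0 for c in complexes}
    return {c: v / total for c, v in overlap.items()}


# Toy data: each document is already reduced to a set of features.
docs = {
    "d1": {"fuzzy", "clustering", "web"},
    "d2": {"fuzzy", "clustering", "topology"},
    "d3": {"gene", "protein", "web"},
    "d4": {"gene", "protein"},
    "d5": {"fuzzy", "gene"},
}

complexes = frequent_cooccurrences(docs)
memberships = {name: fuzzy_memberships(f, complexes) for name, f in docs.items()}

for name, scores in memberships.items():
    print(name, {tuple(sorted(c)): round(v, 2) for c, v in scores.items()})
```

In this toy setting, a document such as `d5` that shares one feature with each of two feature sets belongs partly to both, which is the kind of graded assignment a fuzzy clustering permits and a hard clustering does not.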

Pages (from-to): 2122-2134
Journal: IEEE Transactions on Fuzzy Systems
Publication status: Published - Dec 1 2015

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics

