Web documents are heterogeneous and complex. There exists complicated associations within one web document and linking to the others. The high interactions between terms in documents demonstrate vague and ambiguous meanings. Efficient and effective clustering methods to discover latent and coherent meanings in context are necessary. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in the web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of cooccurring features organize a hierarchy of connected semantic complexes called "CONCEPTS," wherein a fuzzy linguistic measure is applied on each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) the difference between the other topics. Web contents are able to be clustered into topics in the hierarchy depending on their fuzzy linguistic measures; web users can further explore the CONCEPTS of web contents accordingly. Besides the algorithm applicability in web text domains, it can be extended to other applications, such as data mining, bioinformatics, content-based, or collaborative information filtering, etc.
ASJC Scopus subject areas