Agglomerative algorithm to discover semantics from unstructured big data

I-Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among frequently co-occurring terms in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection, and these concepts can be used to categorize documents into different semantic classes. Experiments on three data sets drawn from news, Web, and medical literature show that the algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning, AutoClass, and hierarchical clustering.
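The general idea in the abstract (terms linked by frequent co-occurrence, connected components read as concepts) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the function names, the `min_support` threshold, the restriction to term pairs, and the overlap-based rule for assigning documents are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_components(docs, min_support):
    """Return the connected components of a term co-occurrence graph.

    Assumption: two terms are linked when they appear together in at
    least `min_support` documents. The paper's own criterion may differ.
    """
    # Count, for each unordered term pair, how many documents contain both.
    pair_counts = Counter()
    for doc in docs:
        pair_counts.update(combinations(sorted(set(doc)), 2))

    # Union-find over the terms that take part in at least one frequent pair.
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for (a, b), count in pair_counts.items():
        if count >= min_support:
            parent[find(a)] = find(b)

    # Collect terms by their root to obtain the components.
    components = {}
    for term in list(parent):
        components.setdefault(find(term), set()).add(term)
    return sorted((frozenset(c) for c in components.values()), key=sorted)


def assign_documents(docs, components):
    """Label each document with the index of the component sharing the
    most terms with it (hypothetical rule), or None if nothing overlaps."""
    labels = []
    for doc in docs:
        overlaps = [len(comp & set(doc)) for comp in components]
        best = max(range(len(components)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return labels


if __name__ == "__main__":
    # Toy corpus, invented for this example only.
    docs = [
        ["stock", "market", "trade"],
        ["stock", "market", "price"],
        ["gene", "protein", "cell"],
        ["gene", "protein", "disease"],
        ["market", "gene"],
    ]
    concepts = cooccurrence_components(docs, min_support=2)
    print(concepts)
    print(assign_documents(docs, concepts))
```

Raising or lowering `min_support` changes how finely the graph splits, which is one simple way to picture components "at various levels": on the toy corpus a threshold of 2 separates the finance terms from the biology terms, while a threshold of 1 merges everything into a single component.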

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1556-1563
Number of pages: 8
ISBN (Print): 9781479999255
DOIs
Publication status: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 to Nov 1 2015

Other

Other: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country/Territory: United States
City: Santa Clara
Period: 10/29/15 to 11/1/15

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software
