Agglomerative algorithm to discover semantics from unstructured big data

I-Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among terms that frequently co-occur in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection, and these concepts can be used to categorize documents into different semantic classes. Experiments on three data sets drawn from news, the Web, and medical literature show that our algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning, AutoClass, and hierarchical clustering.
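The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes whitespace tokenization, treats two terms as associated when they co-occur in at least `min_support` documents, and assigns each document to the component it shares the most terms with. The function names and the `min_support` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(docs, min_support=2):
    """Build an undirected term graph: an edge joins two terms that
    appear together in at least `min_support` documents (assumed rule)."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of term sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def assign_documents(docs, components):
    """Label each document with the index of the component it shares
    the most terms with, or None when it overlaps with no component."""
    labels = []
    for doc in docs:
        terms = set(doc.lower().split())
        overlaps = [len(terms & c) for c in components]
        best = max(range(len(components)), key=overlaps.__getitem__, default=None)
        labels.append(best if best is not None and overlaps[best] > 0 else None)
    return labels


# Toy corpus, invented for illustration only.
docs = [
    "stock market rally",
    "stock market crash",
    "gene protein expression",
    "gene protein binding",
]
graph = cooccurrence_graph(docs, min_support=2)
components = connected_components(graph)
labels = assign_documents(docs, components)
print(components)
print(labels)
```

In this sketch, raising or lowering `min_support` changes which edges survive, which is one simple way to obtain components "at various levels"; the paper may define its levels differently.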

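Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1556-1563
Number of pages: 8
ISBN (Print): 9781479999255
DOI: https://doi.org/10.1109/BigData.2015.7363920
Publication status: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 to Nov 1 2015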

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Chiang, I-J. (2015). Agglomerative algorithm to discover semantics from unstructured big data. In Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 (pp. 1556-1563). [7363920] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2015.7363920