Agglomerative algorithm to discover semantics from unstructured big data

I-Jen Chiang

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among frequently co-occurring terms in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection, and these concepts can be used to categorize documents into different semantic classes. Experiments on three data sets, drawn from news, Web, and medical literature, show that our algorithm performs significantly better than traditional clustering algorithms such as k-means, principal direction divisive partitioning (PDDP), AutoClass, and hierarchical clustering.
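The pipeline the abstract describes (frequently co-occurring terms form a graph, the graph splits into connected components at several levels, and each component acts as a concept that labels documents) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the paper's algorithm: the keywords indicate the paper works with association rules on a hypergraph, whereas this sketch uses plain pairwise co-occurrence counts. The `min_support` document-frequency threshold used to define "levels", the union-find component search, and the largest-overlap rule in the hypothetical `assign` helper are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support):
    """Return term pairs that co-occur in at least `min_support` documents.

    Assumption: `min_support` is a document-frequency threshold standing in
    for the paper's notion of "frequently co-occurring terms".
    """
    counts = Counter()
    for doc in docs:
        # Count each unordered pair at most once per document.
        counts.update(combinations(sorted(set(doc)), 2))
    return {pair for pair, n in counts.items() if n >= min_support}


def connected_components(edges):
    """Group terms into connected components (hypothetical "concepts") via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)  # link the two roots

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    # Sort for a deterministic order so concept indices are stable.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def assign(docs, concepts):
    """Label each document with the index of the concept sharing the most terms.

    Hypothetical rule: ties go to the lowest index; -1 means no shared terms.
    """
    labels = []
    for doc in docs:
        terms = set(doc)
        overlaps = [len(terms & c) for c in concepts]
        best = max(range(len(concepts)), key=overlaps.__getitem__, default=-1)
        labels.append(best if best >= 0 and overlaps[best] > 0 else -1)
    return labels


if __name__ == "__main__":
    docs = [
        ["gene", "protein", "cell"],
        ["gene", "protein", "dna"],
        ["gene", "protein", "cell"],
        ["stock", "market", "price"],
        ["stock", "market", "trade"],
        ["stock", "market", "price"],
    ]
    # Lowering the threshold adds edges, so components can only grow or merge.
    for level in (3, 2, 1):
        concepts = connected_components(frequent_pairs(docs, level))
        print(level, [sorted(c) for c in concepts], assign(docs, concepts))
```

Because every edge present at a given threshold is also present at any lower threshold, the components at successive levels are nested, which is the sense in which this sketch behaves agglomeratively.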

Original language: English
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1556-1563
Number of pages: 8
ISBN (Print): 9781479999255
DOI: 10.1109/BigData.2015.7363920
Publication status: Published, Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015, Santa Clara, United States
Duration: Oct 29 2015 to Nov 1 2015

Keywords

  • agglomerative document categorization/clustering
  • association rules
  • hierarchical clustering
  • hypergraph

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Citation

Chiang, I-J. (2015). Agglomerative algorithm to discover semantics from unstructured big data. In Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 (pp. 1556-1563). [7363920] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2015.7363920

@inproceedings{9e31510de1b04a80879d4938402f23fa,
title = "Agglomerative algorithm to discover semantics from unstructured big data",
keywords = "agglomerative document categorization/clustering, association rules, hierarchical clustering, hypergraph",
author = "I-Jen Chiang",
year = "2015",
month = "12",
day = "22",
doi = "10.1109/BigData.2015.7363920",
language = "English",
isbn = "9781479999255",
pages = "1556--1563",
booktitle = "Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}