Identifying Semantic in High-Dimensional Web Data Using Latent Semantic Manifold

Ajit Kumar, Sanjeev Maskara, I-Jen Chiang

Research output: Contribution to journal (Article)

Abstract

Latent Semantic Analysis involves natural language processing techniques for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts (related to the documents and terms) called semantic topics. These semantic topics assist search engine users by providing leads to the more relevant documents. We develop a novel algorithm called Latent Semantic Manifold (LSM) that can identify the semantic topics in high-dimensional web data. The LSM algorithm is founded on concepts from topology and probability. A search tool was also developed using the LSM algorithm. This search tool was deployed for two years at two sites in Taiwan: 1) Taipei Medical University Library, Taipei, and 2) Biomedical Engineering Laboratory, Institute of Biomedical Engineering, National Taiwan University, Taipei. We evaluate the effectiveness and efficiency of the LSM algorithm by comparing it with other contemporary algorithms. The results show that the LSM algorithm outperforms the algorithms it was compared with. This algorithm can be used to enhance the functionality of currently available search engines.
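As a rough illustration of the classical Latent Semantic Analysis idea described in the first sentence of the abstract (not the authors' LSM algorithm, whose details are not given here), the sketch below projects documents into a low-dimensional "topic" space. It assumes a plain term-count matrix and a truncated eigendecomposition of the document Gram matrix, computed by power iteration so that only the standard library is needed. The toy corpus and all function names are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, A), where A[i][j] counts term i in document j."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]


def top_eigenpairs(G, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(G)
    G = [row[:] for row in G]  # work on a copy so the caller's matrix is untouched
    pairs = []
    for p in range(k):
        v = [1.0 + 0.1 * (i + p) for i in range(n)]  # fixed, non-random start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        Gv = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * Gv[i] for i in range(n))  # Rayleigh quotient, v is unit length
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_coordinates(docs, k):
    """Coordinates of each document in a k-dimensional latent space.

    With A = U S V^T, the Gram matrix A^T A has eigenvalues s^2 and
    eigenvectors V, so document j maps to (s_1 V[j][1], ..., s_k V[j][k]).
    """
    _, A = term_document_matrix(docs)
    n = len(docs)
    G = [[sum(row[a] * row[b] for row in A) for b in range(n)] for a in range(n)]
    pairs = top_eigenpairs(G, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


# Hypothetical toy corpus: two medical documents, two search-engine documents.
docs = [
    "heart disease treatment",
    "heart surgery treatment",
    "search engine ranking",
    "search engine",
]
coords = lsa_document_coordinates(docs, k=2)
print("medical vs medical:", cosine(coords[0], coords[1]))
print("medical vs search: ", cosine(coords[0], coords[2]))
```

In this sketch, documents that share vocabulary end up pointing in nearly the same direction in the latent space, while documents from the other group are close to orthogonal. A production system would typically use a sparse SVD routine and weighted counts (for example TF-IDF) rather than this hand-rolled power iteration.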
Original language: English
Pages (from-to): 136-152
Journal: Journal of Data Analysis and Information Processing
Volume: 3
Issue number: 4
DOI: 10.4236/jdaip.2015.34014
Publication status: Published - 2015

Fingerprint

Semantics
Biomedical engineering
Search engines
Topology
Processing

Keywords

  • Latent Semantic Manifold
  • Conditional Random Field
  • Hidden Markov Model
  • Graph-Based Tree-Width Decomposition

Cite this

Identifying Semantic in High-Dimensional Web Data Using Latent Semantic Manifold. / Kumar, Ajit; Maskara, Sanjeev; Chiang, I-Jen.

In: Journal of Data Analysis and Information Processing, Vol. 3, No. 4, 2015, p. 136-152.

Kumar, Ajit ; Maskara, Sanjeev ; Chiang, I-Jen. / Identifying Semantic in High-Dimensional Web Data Using Latent Semantic Manifold. In: Journal of Data Analysis and Information Processing. 2015 ; Vol. 3, No. 4. pp. 136-152.
@article{cf68c767c8d84e0fa422ce08dfd57b3c,
title = "Identifying Semantic in High-Dimensional Web Data Using Latent Semantic Manifold",
abstract = "Latent Semantic Analysis involves natural language processing techniques for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts (related to the documents and terms) called semantic topics. These semantic topics assist search engine users by providing leads to the more relevant documents. We develop a novel algorithm called Latent Semantic Manifold (LSM) that can identify the semantic topics in high-dimensional web data. The LSM algorithm is founded on concepts from topology and probability. A search tool was also developed using the LSM algorithm. This search tool was deployed for two years at two sites in Taiwan: 1) Taipei Medical University Library, Taipei, and 2) Biomedical Engineering Laboratory, Institute of Biomedical Engineering, National Taiwan University, Taipei. We evaluate the effectiveness and efficiency of the LSM algorithm by comparing it with other contemporary algorithms. The results show that the LSM algorithm outperforms the algorithms it was compared with. This algorithm can be used to enhance the functionality of currently available search engines.",
keywords = "Latent Semantic Manifold, Conditional Random Field, Hidden Markov Model, Graph-Based Tree-Width Decomposition",
author = "Ajit Kumar and Sanjeev Maskara and I-Jen Chiang",
year = "2015",
doi = "10.4236/jdaip.2015.34014",
language = "English",
volume = "3",
pages = "136--152",
journal = "Journal of Data Analysis and Information Processing",
number = "4",

}

TY - JOUR

T1 - Identifying Semantic in High-Dimensional Web Data Using Latent Semantic Manifold

AU - Kumar, Ajit

AU - Maskara, Sanjeev

AU - Chiang, I-Jen

PY - 2015

Y1 - 2015

N2 - Latent Semantic Analysis involves natural language processing techniques for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts (related to the documents and terms) called semantic topics. These semantic topics assist search engine users by providing leads to the more relevant documents. We develop a novel algorithm called Latent Semantic Manifold (LSM) that can identify the semantic topics in high-dimensional web data. The LSM algorithm is founded on concepts from topology and probability. A search tool was also developed using the LSM algorithm. This search tool was deployed for two years at two sites in Taiwan: 1) Taipei Medical University Library, Taipei, and 2) Biomedical Engineering Laboratory, Institute of Biomedical Engineering, National Taiwan University, Taipei. We evaluate the effectiveness and efficiency of the LSM algorithm by comparing it with other contemporary algorithms. The results show that the LSM algorithm outperforms the algorithms it was compared with. This algorithm can be used to enhance the functionality of currently available search engines.

AB - Latent Semantic Analysis involves natural language processing techniques for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts (related to the documents and terms) called semantic topics. These semantic topics assist search engine users by providing leads to the more relevant documents. We develop a novel algorithm called Latent Semantic Manifold (LSM) that can identify the semantic topics in high-dimensional web data. The LSM algorithm is founded on concepts from topology and probability. A search tool was also developed using the LSM algorithm. This search tool was deployed for two years at two sites in Taiwan: 1) Taipei Medical University Library, Taipei, and 2) Biomedical Engineering Laboratory, Institute of Biomedical Engineering, National Taiwan University, Taipei. We evaluate the effectiveness and efficiency of the LSM algorithm by comparing it with other contemporary algorithms. The results show that the LSM algorithm outperforms the algorithms it was compared with. This algorithm can be used to enhance the functionality of currently available search engines.

KW - Latent Semantic Manifold

KW - Conditional Random Field

KW - Hidden Markov Model

KW - Graph-Based Tree-Width Decomposition

U2 - 10.4236/jdaip.2015.34014

DO - 10.4236/jdaip.2015.34014

M3 - Article

VL - 3

SP - 136

EP - 152

JO - Journal of Data Analysis and Information Processing

JF - Journal of Data Analysis and Information Processing

IS - 4

ER -