Identifying latent semantics in high-dimensional web data

Ajit Kumar, Sanjeev Maskara, Jau Min Wong, I-Jen Chiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Search engines have become an indispensable tool for obtaining relevant information on the Web. A search engine often returns a large number of results, including several irrelevant items that make the results harder to comprehend. Search engines therefore need to be enhanced to discover the latent semantics in high-dimensional web data. This paper presents a novel framework, including its implementation and evaluation. To discover the latent semantics in high-dimensional web data, we propose a framework named Latent Semantic Manifold (LSM). LSM is a mixture model based on the concepts of topology and probability. The framework can find the latent semantics in web data and represent them in homogeneous groups. The framework was evaluated experimentally, and it outperformed the other frameworks it was compared with. In addition, we used the framework to develop a tool, which was deployed for two years at two sites in Taiwan: a library and a biomedical engineering laboratory. The tool helped researchers perform semantic searches of the PubMed database. The evaluation and deployment of LSM suggest that the framework could be used to enhance the functionality of currently available search engines by discovering latent semantics in high-dimensional web data.

Original language: English
Title of host publication: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 1114
Publication status: Published - 2013
Event: 6th International Workshop on Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2013 - Edinburgh, United Kingdom
Duration: Dec 10 2013 to Dec 10 2013

Other

Other: 6th International Workshop on Semantic Web Applications and Tools for Life Sciences, SWAT4LS 2013
Country: United Kingdom
City: Edinburgh
Period: 12/10/13 to 12/10/13

Fingerprint

World Wide Web
Semantics
Search engines
Biomedical engineering
Topology

Keywords

  • Conditional random field
  • Graph-based tree-width decomposition
  • Hidden Markov models
  • Latent semantic manifold
  • Semantic cluster

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Kumar, A., Maskara, S., Wong, J. M., & Chiang, I-J. (2013). Identifying latent semantics in high-dimensional web data. In CEUR Workshop Proceedings (Vol. 1114). CEUR-WS.

@inproceedings{17bbd8aeda8b410492a989e547984a73,
title = "Identifying latent semantics in high-dimensional web data",
keywords = "Conditional random field, Graph-based tree-width decomposition, Hidden Markov models, Latent semantic manifold, Semantic cluster",
author = "Ajit Kumar and Sanjeev Maskara and Wong, {Jau Min} and I-Jen Chiang",
year = "2013",
language = "English",
volume = "1114",
booktitle = "CEUR Workshop Proceedings",
publisher = "CEUR-WS",

}

TY - GEN

T1 - Identifying latent semantics in high-dimensional web data

AU - Kumar, Ajit

AU - Maskara, Sanjeev

AU - Wong, Jau Min

AU - Chiang, I-Jen

PY - 2013

Y1 - 2013

KW - Conditional random field

KW - Graph-based tree-width decomposition

KW - Hidden Markov models

KW - Latent semantic manifold

KW - Semantic cluster

UR - http://www.scopus.com/inward/record.url?scp=84908349381&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908349381&partnerID=8YFLogxK

M3 - Conference contribution

VL - 1114

BT - CEUR Workshop Proceedings

PB - CEUR-WS

ER -