Tea in benefits of health: A literature analysis using text mining and latent dirichlet allocation

Ching Hsue Cheng, Wei Lun Hung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Tea originated in Asia, where it was initially used as a medicinal herb. Tea varieties are distinguished by their manufacturing processes and levels of oxidation. Because different varieties of tea have different levels of effect on health, this study adopted text mining techniques and Latent Dirichlet Allocation (LDA) to analyze the literature on the health effects of tea. This study chose Web of Science as the literature database and searched for literature published from 2007 to 2017. A total of 1230 journal articles were collected. The titles, abstracts, and keywords of the collected articles were used as the dataset for the experiment. Experimental results show that the VEM method achieves significantly lower perplexity than Gibbs sampling. Hence, this study chooses K=150, where the VEM method and Gibbs sampling both reach their minimal perplexity. Many topics relate to tea and its compounds; however, some topics contain terms related to health and disease. The top 10 topics show that tea could reduce the risk of disease and benefit health.
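The model-selection step mentioned in the abstract, comparing perplexity across candidate topic counts K and keeping the K with the lowest value, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `perplexity` and `pick_k`, the toy corpus, and the perplexity values passed to `pick_k` are all hypothetical. The sketch assumes the standard definition of perplexity as the exponential of the negative average per-token log-likelihood, and it takes the fitted document-topic and topic-word distributions as given, however they were estimated (for example by VEM or Gibbs sampling).

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a corpus under a fitted topic model.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = probability of topic k in document d
    phi   : phi[k][w]   = probability of word w under topic k
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Marginal word probability: mix the topic-word
            # distributions using the document's topic weights.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)


def pick_k(perplexity_by_k):
    """Return the topic count K with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Toy check: one topic that is uniform over a 4-word vocabulary
# should give a perplexity of about 4 (the vocabulary size).
toy_docs = [[0, 1], [2, 3]]
toy_theta = [[1.0], [1.0]]
toy_phi = [[0.25, 0.25, 0.25, 0.25]]
print(perplexity(toy_docs, toy_theta, toy_phi))

# Hypothetical perplexity values per K, invented for illustration only;
# they are NOT results reported in the paper.
made_up_scores = {50: 900.0, 100: 750.0, 150: 700.0, 200: 720.0}
print(pick_k(made_up_scores))
```

In practice the perplexity would be computed on held-out documents for each candidate K, once per estimation method, and the resulting curves compared.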

Original language: English
Title of host publication: ICMHI 2018 - Proceedings of 2018 the 2nd International Conference on Medical and Health Informatics
Publisher: Association for Computing Machinery (ACM)
Pages: 148-155
Number of pages: 8
ISBN (Electronic): 9781450363891
DOIs: https://doi.org/10.1145/3239438.3239459
Publication status: Published - Jun 8 2018
Externally published: Yes
Event: 2nd International Conference on Medical and Health Informatics, ICMHI 2018 - Tsukuba, Japan
Duration: Jun 8 2018 to Jun 10 2018

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2nd International Conference on Medical and Health Informatics, ICMHI 2018
Country: Japan
City: Tsukuba
Period: 6/8/18 to 6/10/18

Fingerprint

Health
Sampling
Tea
Oxidation
Experiments

Keywords

  • Health
  • Latent Dirichlet Allocation
  • LDA
  • Literature analysis
  • Tea
  • Text mining
  • Topic model

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Cheng, C. H., & Hung, W. L. (2018). Tea in benefits of health: A literature analysis using text mining and latent dirichlet allocation. In ICMHI 2018 - Proceedings of 2018 the 2nd International Conference on Medical and Health Informatics (pp. 148-155). (ACM International Conference Proceeding Series). Association for Computing Machinery (ACM). https://doi.org/10.1145/3239438.3239459

@inproceedings{2fdad90309af402099be2120c8d5366e,
title = "Tea in benefits of health: A literature analysis using text mining and latent dirichlet allocation",
abstract = "Tea originated in Asia, where it was initially used as a medicinal herb. Tea varieties are distinguished by their manufacturing processes and levels of oxidation. Because different varieties of tea have different levels of effect on health, this study adopted text mining techniques and Latent Dirichlet Allocation (LDA) to analyze the literature on the health effects of tea. This study chose Web of Science as the literature database and searched for literature published from 2007 to 2017. A total of 1230 journal articles were collected. The titles, abstracts, and keywords of the collected articles were used as the dataset for the experiment. Experimental results show that the VEM method achieves significantly lower perplexity than Gibbs sampling. Hence, this study chooses K=150, where the VEM method and Gibbs sampling both reach their minimal perplexity. Many topics relate to tea and its compounds; however, some topics contain terms related to health and disease. The top 10 topics show that tea could reduce the risk of disease and benefit health.",
keywords = "Health, Latent Dirichlet Allocation, LDA, Literature analysis, Tea, Text mining, Topic model",
author = "Cheng, {Ching Hsue} and Hung, {Wei Lun}",
year = "2018",
month = "6",
day = "8",
doi = "10.1145/3239438.3239459",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery (ACM)",
pages = "148--155",
booktitle = "ICMHI 2018 - Proceedings of 2018 the 2nd International Conference on Medical and Health Informatics",
address = "United States",

}

TY - GEN

T1 - Tea in benefits of health

T2 - A literature analysis using text mining and latent dirichlet allocation

AU - Cheng, Ching Hsue

AU - Hung, Wei Lun

PY - 2018/6/8

Y1 - 2018/6/8

AB - Tea originated in Asia, where it was initially used as a medicinal herb. Tea varieties are distinguished by their manufacturing processes and levels of oxidation. Because different varieties of tea have different levels of effect on health, this study adopted text mining techniques and Latent Dirichlet Allocation (LDA) to analyze the literature on the health effects of tea. This study chose Web of Science as the literature database and searched for literature published from 2007 to 2017. A total of 1230 journal articles were collected. The titles, abstracts, and keywords of the collected articles were used as the dataset for the experiment. Experimental results show that the VEM method achieves significantly lower perplexity than Gibbs sampling. Hence, this study chooses K=150, where the VEM method and Gibbs sampling both reach their minimal perplexity. Many topics relate to tea and its compounds; however, some topics contain terms related to health and disease. The top 10 topics show that tea could reduce the risk of disease and benefit health.

KW - Health

KW - Latent Dirichlet Allocation

KW - LDA

KW - Literature analysis

KW - Tea

KW - Text mining

KW - Topic model

UR - http://www.scopus.com/inward/record.url?scp=85055673964&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055673964&partnerID=8YFLogxK

U2 - 10.1145/3239438.3239459

DO - 10.1145/3239438.3239459

M3 - Conference contribution

AN - SCOPUS:85055673964

T3 - ACM International Conference Proceeding Series

SP - 148

EP - 155

BT - ICMHI 2018 - Proceedings of 2018 the 2nd International Conference on Medical and Health Informatics

PB - Association for Computing Machinery (ACM)

ER -