Utilizing different word representation methods for twitter data in adverse drug reactions extraction

Wei San Lin, Hong Jie Dai, Jitendra Jonnagaddala, Nai Wun Chang, Toni Rose Jue, Usman Iqbal, Joni Yu Hsuan Shao, I. Jen Chiang, Yu Chuan Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

With the advancement of technology and development of social media, patients discuss medications and other related information, including adverse drug reactions (ADRs), with their friends, family, or other patients. Although there are various pros and cons of using social media for automatic ADR monitoring, information about drugs provided by patients on social media is widely considered a valuable resource for post-marketing drug surveillance. In this study, we developed a named entity recognition (NER) system based on conditional random fields to identify ADR-related information from Twitter data. The representation of words in the input text is one of the crucial steps in supervised learning. Recently, word vector representations, which use unlabeled data to provide a generalization that reduces data sparsity in word representation, have become popular. This study examines different word representation methods for the ADR recognition task, including token normalization and two state-of-the-art word embedding methods, namely word2vec and global vectors (GloVe). The experimental results demonstrate that all of the studied representation schemes can improve recall and overall F-measure at the cost of reduced precision. Manual analysis of the generated clusters demonstrates that word2vec has stronger cluster trends than GloVe.
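
As a rough illustration of how token normalization and embedding-derived clusters can feed a CRF-based tagger, the following is a minimal sketch and not the authors' implementation. The normalization rules, the feature names, and the `clusters` lookup (a hypothetical mapping from a normalized word to a cluster ID, for example obtained by clustering word2vec or GloVe vectors) are all assumptions made for this example.

```python
import re

# Hypothetical normalization rules; the paper's exact rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character
DIGIT_RE = re.compile(r"\d")


def normalize_token(token: str) -> str:
    """Map a raw tweet token to a coarser form to reduce data sparsity."""
    if URL_RE.fullmatch(token):
        return "<url>"
    if MENTION_RE.fullmatch(token):
        return "<user>"
    token = token.lower()
    token = REPEAT_RE.sub(r"\1\1", token)  # squeeze elongations to two chars
    token = DIGIT_RE.sub("0", token)       # collapse every digit to "0"
    return token


def token_features(tokens, i, clusters):
    """Build a feature dict for position i, in the dict-per-token style
    accepted by common CRF toolkits.

    `clusters` is an assumed dict: normalized word -> cluster ID.
    """
    norm = normalize_token(tokens[i])
    feats = {"word": tokens[i], "norm": norm}
    cluster_id = clusters.get(norm)
    if cluster_id is not None:
        feats["cluster"] = cluster_id
    if i > 0:
        feats["prev_norm"] = normalize_token(tokens[i - 1])
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_norm"] = normalize_token(tokens[i + 1])
    else:
        feats["EOS"] = True  # end of sentence
    return feats


if __name__ == "__main__":
    tweet = ["Seroquel", "made", "me", "sooo", "sleepy"]
    toy_clusters = {"sleepy": 17}  # made-up cluster assignment
    print(token_features(tweet, 4, toy_clusters))
```

Words that share a cluster ID share a feature value, so a rare surface form can borrow evidence from more frequent neighbors. That is consistent with the recall gain and precision loss the abstract reports, although the sketch itself does not reproduce those results.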
Original language: English
Title of host publication: TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 260-265
Number of pages: 6
ISBN (Print): 9781467396066
DOIs: https://doi.org/10.1109/TAAI.2015.7407070
Publication status: Published - Feb 12 2016
Event: Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015 - Tainan, Taiwan
Duration: Nov 20 2015 to Nov 22 2015

Other

Other: Conference on Technologies and Applications of Artificial Intelligence, TAAI 2015
Country: Taiwan
City: Tainan
Period: 11/20/15 to 11/22/15

Fingerprint

Supervised learning
Marketing
Monitoring
Costs

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications

Cite this

Lin, W. S., Dai, H. J., Jonnagaddala, J., Chang, N. W., Jue, T. R., Iqbal, U., ... Li, Y. C. (2016). Utilizing different word representation methods for twitter data in adverse drug reactions extraction. In TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence (pp. 260-265). [7407070] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/TAAI.2015.7407070

@inproceedings{8a95e037a440461d88f8ef470459fffb,
title = "Utilizing different word representation methods for twitter data in adverse drug reactions extraction",
keywords = "adverse drug reactions, named entity recognition, natural language processing, social media, word embedding",
author = "Lin, {Wei San} and Dai, {Hong Jie} and Jitendra Jonnagaddala and Chang, {Nai Wun} and Jue, {Toni Rose} and Usman Iqbal and Shao, {Joni Yu Hsuan} and Chiang, {I. Jen} and Li, {Yu Chuan}",
year = "2016",
month = "2",
day = "12",
doi = "10.1109/TAAI.2015.7407070",
language = "English",
isbn = "9781467396066",
pages = "260--265",
booktitle = "TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Utilizing different word representation methods for twitter data in adverse drug reactions extraction

AU - Lin, Wei San

AU - Dai, Hong Jie

AU - Jonnagaddala, Jitendra

AU - Chang, Nai Wun

AU - Jue, Toni Rose

AU - Iqbal, Usman

AU - Shao, Joni Yu Hsuan

AU - Chiang, I. Jen

AU - Li, Yu Chuan

PY - 2016/2/12

Y1 - 2016/2/12

KW - adverse drug reactions

KW - named entity recognition

KW - natural language processing

KW - social media

KW - word embedding

UR - http://www.scopus.com/inward/record.url?scp=84964284479&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964284479&partnerID=8YFLogxK

U2 - 10.1109/TAAI.2015.7407070

DO - 10.1109/TAAI.2015.7407070

M3 - Conference contribution

AN - SCOPUS:84964284479

SN - 9781467396066

SP - 260

EP - 265

BT - TAAI 2015 - 2015 Conference on Technologies and Applications of Artificial Intelligence

PB - Institute of Electrical and Electronics Engineers Inc.

ER -