Sentiment analysis on Chinese movie review with distributed keyword vector representation

Chun Han Chu, Chen Ann Wang, Yung Chun Chang, Ying Wei Wu, Yu Lun Hsieh, Wen Lian Hsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

In the area of natural language processing, applying machine learning techniques to customer or movie reviews for sentiment analysis has been frequently tried. While methods such as support vector machine (SVM) were much favored in the 2000s, recently there has been a steadily rising percentage of implementations with vector representations and artificial neural networks. In this article we present an approach that uses a word embedding method to conduct sentiment analysis on movie reviews from a renowned bulletin board system forum in Taiwan. After performing log-likelihood ratio (LLR) on the corpus and selecting the top 10000 most related keywords as representative vectors for different sentiments, we use these vectors as the sentiment classifier for the testing set. We achieved results that are not only comparable to traditional methods like Naïve Bayes and SVM, but also outperform Latent Dirichlet Allocation, TF-IDF and its variant. Our approach also tops the original LLR by a substantial margin.
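The keyword-selection step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so this assumes Dunning's log-likelihood ratio (G²) computed over document-level counts in a 2x2 contingency table. The names `llr` and `top_keywords`, the input format (pre-segmented token lists), and the rule of keeping only words over-represented in the target class are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def _xlogx(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: target-class documents containing the word
    k12: other-class documents containing the word
    k21: target-class documents without the word
    k22: other-class documents without the word
    """
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    total = _xlogx(k11 + k12 + k21 + k22)
    return 2.0 * (cells + total - rows - cols)


def top_keywords(docs, labels, target, k):
    """Return up to k words most associated with `target` by LLR.

    docs is a list of token lists (assumed already word-segmented),
    labels is a parallel list of sentiment labels.
    """
    n_target = sum(1 for y in labels if y == target)
    n_other = len(labels) - n_target
    in_target, in_other = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each word once per document (document frequency).
        (in_target if y == target else in_other).update(set(tokens))

    scored = []
    for word in set(in_target) | set(in_other):
        k11, k12 = in_target[word], in_other[word]
        # Assumption: keep only words over-represented in the target class.
        if k11 * n_other <= k12 * n_target:
            continue
        score = llr(k11, k12, n_target - k11, n_other - k12)
        scored.append((score, word))

    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]
```

In the setting described above, `k` would be 10000 and each selected keyword would then be mapped to its word embedding. How those vectors are combined into a classifier is not specified in the abstract, so that step is left out of the sketch.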

Original language: English
Title of host publication: TAAI 2016 - 2016 Conference on Technologies and Applications of Artificial Intelligence, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 84-89
Number of pages: 6
ISBN (Electronic): 9781509057320
DOIs: https://doi.org/10.1109/TAAI.2016.7880169
Publication status: Published - Mar 16 2017
Externally published: Yes
Event: 2016 Conference on Technologies and Applications of Artificial Intelligence, TAAI 2016 - Hsinchu, Taiwan
Duration: Nov 25 2016 → Nov 27 2016

Conference

Conference: 2016 Conference on Technologies and Applications of Artificial Intelligence, TAAI 2016
Country: Taiwan
City: Hsinchu
Period: 11/25/16 → 11/27/16

Keywords

  • LLR
  • machine learning
  • sentiment analysis
  • TF-IDF
  • word embedding

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Control and Optimization
  • Information Systems

Cite this

Chu, C. H., Wang, C. A., Chang, Y. C., Wu, Y. W., Hsieh, Y. L., & Hsu, W. L. (2017). Sentiment analysis on Chinese movie review with distributed keyword vector representation. In TAAI 2016 - 2016 Conference on Technologies and Applications of Artificial Intelligence, Proceedings (pp. 84-89). [7880169] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/TAAI.2016.7880169

@inproceedings{e490a99427394c29a63e336662f0de6a,
title = "Sentiment analysis on Chinese movie review with distributed keyword vector representation",
abstract = "In the area of natural language processing, applying machine learning techniques to customer or movie reviews for sentiment analysis has been frequently tried. While methods such as support vector machine (SVM) were much favored in the 2000s, recently there has been a steadily rising percentage of implementations with vector representations and artificial neural networks. In this article we present an approach that uses a word embedding method to conduct sentiment analysis on movie reviews from a renowned bulletin board system forum in Taiwan. After performing log-likelihood ratio (LLR) on the corpus and selecting the top 10000 most related keywords as representative vectors for different sentiments, we use these vectors as the sentiment classifier for the testing set. We achieved results that are not only comparable to traditional methods like Na{\"i}ve Bayes and SVM, but also outperform Latent Dirichlet Allocation, TF-IDF and its variant. Our approach also tops the original LLR by a substantial margin.",
keywords = "LLR, machine learning, sentiment analysis, TF-IDF, word embedding",
author = "Chu, {Chun Han} and Wang, {Chen Ann} and Chang, {Yung Chun} and Wu, {Ying Wei} and Hsieh, {Yu Lun} and Hsu, {Wen Lian}",
year = "2017",
month = "3",
day = "16",
doi = "10.1109/TAAI.2016.7880169",
language = "English",
pages = "84--89",
booktitle = "TAAI 2016 - 2016 Conference on Technologies and Applications of Artificial Intelligence, Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Sentiment analysis on Chinese movie review with distributed keyword vector representation

AU - Chu, Chun Han

AU - Wang, Chen Ann

AU - Chang, Yung Chun

AU - Wu, Ying Wei

AU - Hsieh, Yu Lun

AU - Hsu, Wen Lian

PY - 2017/3/16

Y1 - 2017/3/16

AB - In the area of natural language processing, applying machine learning techniques to customer or movie reviews for sentiment analysis has been frequently tried. While methods such as support vector machine (SVM) were much favored in the 2000s, recently there has been a steadily rising percentage of implementations with vector representations and artificial neural networks. In this article we present an approach that uses a word embedding method to conduct sentiment analysis on movie reviews from a renowned bulletin board system forum in Taiwan. After performing log-likelihood ratio (LLR) on the corpus and selecting the top 10000 most related keywords as representative vectors for different sentiments, we use these vectors as the sentiment classifier for the testing set. We achieved results that are not only comparable to traditional methods like Naïve Bayes and SVM, but also outperform Latent Dirichlet Allocation, TF-IDF and its variant. Our approach also tops the original LLR by a substantial margin.

KW - LLR

KW - machine learning

KW - sentiment analysis

KW - TF-IDF

KW - word embedding

UR - http://www.scopus.com/inward/record.url?scp=85017646608&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85017646608&partnerID=8YFLogxK

U2 - 10.1109/TAAI.2016.7880169

DO - 10.1109/TAAI.2016.7880169

M3 - Conference contribution

AN - SCOPUS:85017646608

SP - 84

EP - 89

BT - TAAI 2016 - 2016 Conference on Technologies and Applications of Artificial Intelligence, Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -