Yi Shunming, Zhou Hongbin, Zhou Guodong. A Matching Algorithm Between the Tweets in Twitter and SentiWordNet[J]. Journal of Nanjing Normal University (Engineering and Technology), 2016, 16(03): 41-47. [doi:10.3969/j.issn.1672-1292.2016.03.007]

Research on a Matching Algorithm Between Twitter Tweets and the Sentiment Lexicon SentiWordNet

Journal of Nanjing Normal University (Engineering and Technology) [ISSN: 1672-1292]

Volume:
16
Issue:
2016, No. 03
Pages:
41-47
Section:
Computer Engineering
Publication date:
2016-09-30

Article Info

Title:
A Matching Algorithm Between the Tweets in Twitter and SentiWordNet
Article number:
1672-1292(2016)03-0041-07
Author(s):
Yi Shunming¹, Zhou Hongbin¹, Zhou Guodong²
(1. Department of Electronics and Information Engineering, Shazhou Professional Institute of Technology, Suzhou 215600, Jiangsu, China) (2. School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China)
Keywords:
tweets; sentiment classification; SentiWordNet; matching algorithm
CLC number:
TP391
DOI:
10.3969/j.issn.1672-1292.2016.03.007
Document code:
A
Abstract:
In Twitter sentiment classification, a widely used method obtains sentiment values by matching the words of a tweet against synonym entries (synsets) in a sentiment lexicon. However, tweets are written informally and contain slang, abbreviations and special symbols, so many of their words have no matching entry in the lexicon, and the resulting low matching rate directly degrades sentiment classification performance. Based on the linguistic characteristics of Twitter, this article proposes a set of algorithms for matching tweets against the sentiment lexicon SentiWordNet. Tweets are first preprocessed by data cleaning, substitution processing, part-of-speech (POS) tagging and lemmatization; the method then adds named entity recognition, word segmentation of hashtags, negation handling based on Word Clusters, and phrase matching. Experimental results show that the matching rate exceeds 90%.
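The data cleaning and substitution steps can be illustrated with a minimal Python sketch. The abstract does not list the paper's cleaning rules or its substitution dictionary, so the regular expressions and the `SUBSTITUTIONS` table below are hypothetical examples, not the authors' resources.

```python
import re

# Hypothetical slang/abbreviation table; the paper's real dictionary is not given.
SUBSTITUTIONS = {"u": "you", "r": "are", "gr8": "great", "luv": "love"}

RT_RE = re.compile(r"^RT\s+")                 # leading retweet marker
URL_RE = re.compile(r"https?://\S+")          # links
MENTION_RE = re.compile(r"@\w+")              # @user mentions
REPEAT_RE = re.compile(r"(\w)\1{2,}")         # 3+ repeated characters, e.g. "sooooo"
TOKEN_RE = re.compile(r"#?\w[\w']*|[^\w\s]")  # words (optionally #-prefixed) or single symbols


def clean_tweet(text: str) -> str:
    """Data cleaning: drop RT markers, URLs and mentions; squeeze elongations to two letters."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)
    return " ".join(text.split())


def substitute(tokens: list[str]) -> list[str]:
    """Substitution: replace known slang/abbreviations with standard words."""
    return [SUBSTITUTIONS.get(tok.lower(), tok) for tok in tokens]


def preprocess(text: str) -> list[str]:
    return substitute(TOKEN_RE.findall(clean_tweet(text)))


print(preprocess("RT @bob u r gr8!!! sooooo happy http://t.co/x"))
# ['you', 'are', 'great', '!', '!', '!', 'soo', 'happy']
```

Squeezing elongations to two letters keeps words such as "good" or "cool" intact but leaves forms like "soo" unmatched; a lookup step could additionally retry with a single letter.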
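The abstract names hashtag word segmentation but not the algorithm used. One common way to split a hashtag such as `#BestDayEver` is dynamic programming over a unigram language model; the sketch below does that with a tiny, invented count table (`COUNTS`), which would have to be replaced by real corpus statistics. It is a swapped-in technique for illustration, not necessarily the paper's method.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts; a real system would use a large corpus.
COUNTS = {"i": 500, "love": 300, "you": 400, "this": 350, "game": 120,
          "best": 150, "day": 200, "ever": 90, "be": 250, "st": 1}
TOTAL = sum(COUNTS.values())


def word_logprob(word: str) -> float:
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown strings get a penalty that grows with length, so known words win.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


def segment_hashtag(tag: str) -> list[str]:
    """Split a hashtag into the most probable sequence of words."""
    text = tag.lstrip("#").lower()

    @lru_cache(maxsize=None)
    def best(i: int) -> tuple[float, tuple[str, ...]]:
        # Best segmentation of text[i:] as (log-probability, words).
        if i == len(text):
            return 0.0, ()
        candidates = []
        for j in range(i + 1, min(len(text), i + 20) + 1):
            score, rest = best(j)
            candidates.append((word_logprob(text[i:j]) + score, (text[i:j],) + rest))
        return max(candidates)

    return list(best(0)[1])


print(segment_hashtag("#BestDayEver"))  # ['best', 'day', 'ever']
print(segment_hashtag("#iloveyou"))     # ['i', 'love', 'you']
```

The probabilistic score, rather than a "fewest words" rule, is what keeps `best` from being split into the rarer pair `be` + `st`.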
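For negation handling, the abstract says only that Word Clusters are used; reference [18] describes word clusters for tweets built with Brown clustering [19]. One plausible reading, sketched here purely as an assumption, is to treat every word that shares a cluster with a seed negator as a negation cue (catching spellings like `dont` or `cant`), then mark the words after a cue, up to the next punctuation mark, with a `_NEG` suffix. The cluster lines follow an assumed `bitstring<TAB>word<TAB>count` layout with invented paths and counts, and the scope rule is a common heuristic, not the paper's confirmed one.

```python
# Invented excerpt in an assumed "bitstring<TAB>word<TAB>count" layout.
CLUSTER_LINES = (
    "11101011011100\tnot\t90000\n"
    "11101011011100\tnt\t4000\n"
    "11101011011100\tnever\t30000\n"
    "11101011011101\tdont\t25000\n"
    "11101011011101\tdon't\t80000\n"
    "11101011011101\tcant\t20000\n"
    "0011000110\thappy\t50000\n"
)
PUNCT = {".", ",", "!", "?", ";", ":"}


def load_clusters(text: str) -> dict[str, str]:
    """Map each word to its cluster bit-string path."""
    word_to_path = {}
    for line in text.splitlines():
        if line.strip():
            path, word, _count = line.split("\t")
            word_to_path[word] = path
    return word_to_path


def negation_cues(word_to_path: dict[str, str], seeds=("not", "don't")) -> set[str]:
    """All words in the same cluster as one of the (hypothetical) seed negators."""
    seed_paths = {word_to_path[s] for s in seeds if s in word_to_path}
    return {w for w, p in word_to_path.items() if p in seed_paths}


def mark_negation(tokens: list[str], cues: set[str]) -> list[str]:
    """Append _NEG to tokens between a negation cue and the next punctuation mark."""
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCT:
            negated = False
            out.append(tok)
        elif tok.lower() in cues:
            negated = True
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out


cues = negation_cues(load_clusters(CLUSTER_LINES))
print(mark_negation(["i", "dont", "feel", "happy", ".", "great", "day"], cues))
# ['i', 'dont', 'feel_NEG', 'happy_NEG', '.', 'great', 'day']
```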
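After lemmatization and POS tagging, a token is looked up in SentiWordNet by its (lemma, POS) pair. The sketch below parses lines in SentiWordNet 3.0's tab-separated layout (POS, synset ID, PosScore, NegScore, space-separated `word#sense` terms, gloss) and combines a word's senses with 1/sense-number weights. The sample lines use invented IDs and scores. The tagger's labels are assumed to be already mapped to SentiWordNet's `a`/`n`/`v`/`r`. The weighting is one common choice, not necessarily the paper's.

```python
from collections import defaultdict

# Illustrative lines in SentiWordNet 3.0's layout; IDs and scores are made up.
SAMPLE = (
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss\n"
    "a\t00000001\t0.75\t0\tgood#1\tillustrative gloss\n"
    "a\t00000002\t0.5\t0.125\tgood#2 full#6\tillustrative gloss\n"
    "a\t00000003\t0\t0.625\tbad#1\tillustrative gloss\n"
    "v\t00000004\t0.5\t0\tlove#1\tillustrative gloss\n"
)


def load_swn(text: str) -> dict:
    """Map (lemma, pos) -> list of (sense_number, pos_score, neg_score)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        pos, _id, p, n, terms, _gloss = line.split("\t")
        for term in terms.split():
            lemma, sense = term.rsplit("#", 1)
            lexicon[(lemma, pos)].append((int(sense), float(p), float(n)))
    return dict(lexicon)


def score(lexicon: dict, lemma: str, pos: str):
    """1/sense-weighted mean of PosScore - NegScore, or None if unmatched."""
    senses = lexicon.get((lemma.lower(), pos))
    if not senses:
        return None
    weights = [1.0 / s for s, _, _ in senses]
    total = sum(w * (p - n) for w, (_, p, n) in zip(weights, senses))
    return total / sum(weights)


lexicon = load_swn(SAMPLE)
print(score(lexicon, "good", "a"), score(lexicon, "bad", "a"), score(lexicon, "good", "n"))
# 0.625 -0.625 None
```

A `None` result is exactly the unmatched case that the paper's preprocessing is meant to reduce.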
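Phrase matching and the matching rate can be sketched together. SentiWordNet stores multiword terms with underscores (for example `look_up`), so a greedy longest-match over the lemma sequence can try joined n-grams before single words. `LEXICON_TERMS` is a hypothetical stand-in for the set of SentiWordNet lemmas. The rate is computed here as covered tokens over all tokens, an assumption since the abstract does not define it.

```python
# Hypothetical subset of SentiWordNet lemmas (multiword terms use "_").
LEXICON_TERMS = {"look", "look_forward", "look_up", "give_up", "great", "day", "happy"}


def match_phrases(lemmas: list[str], terms: set[str], max_len: int = 3) -> list[tuple[str, int, bool]]:
    """Greedy longest match; returns (unit, number_of_tokens, matched) triples."""
    units, i = [], 0
    while i < len(lemmas):
        for n in range(min(max_len, len(lemmas) - i), 0, -1):
            candidate = "_".join(lemmas[i:i + n])
            if candidate in terms:
                units.append((candidate, n, True))
                i += n
                break
        else:  # no n-gram starting here, not even the single lemma, is in the lexicon
            units.append((lemmas[i], 1, False))
            i += 1
    return units


def matching_rate(lemmas: list[str], terms: set[str], max_len: int = 3) -> float:
    """Share of tokens covered by a matched lexicon entry."""
    if not lemmas:
        return 0.0
    covered = sum(n for _, n, ok in match_phrases(lemmas, terms, max_len) if ok)
    return covered / len(lemmas)


lemmas = ["i", "look", "forward", "to", "a", "great", "day"]
print(match_phrases(lemmas, LEXICON_TERMS))
print(matching_rate(lemmas, LEXICON_TERMS))  # 4 of 7 tokens covered
```

Trying the longest candidate first means `look forward` is scored as one lexicon entry instead of matching only `look` and leaving `forward` unmatched.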

References:

[1] HU M Q,LIU B. Mining and summarizing customer reviews[C]//Proceedings of KDD,USA,2004:168-177.
[2] WIEBE J,BRUCE R,O’HARA T. Development and use of a gold standard dataset for subjectivity classifications[C]//Proceedings of ACL,USA,1999:246-253.
[3] DAVE K,LAWRENCE S,PENNOCK D. Mining the peanut gallery:opinion extraction and semantic classification of product reviews[C]//Proceedings of WWW,Hungary,2003:519-528.
[4] TURNEY P. Thumbs up or thumbs down?semantic orientation applied to unsupervised classification of reviews[C]//Proceedings of ACL,USA,2002:417-424.
[5] YU H,HATZIVASSILOGLOU V. Towards answering opinion questions:separating facts from opinions and identifying the polarity of opinion sentences[C]//Proceedings of EMNLP,Japan,2003:129-136.
[6] ESULI A,SEBASTIANI F. Determining term subjectivity and term orientation for opinion mining[C]//Proceedings of EACL,Italy,2006:193-200.
[7] PANG B,LEE L,VAITHYANATHAN S. Thumbs up?sentiment classification using machine learning techniques[C]//Proceedings of EMNLP,USA,2002:79-86.
[8] MEI Q,LING X,WONDRA M,et al. Topic sentiment mixture:modeling facets and opinions in weblogs[C]//Proceedings of WWW,Canada,2007:171-180.
[9] LI S,HUANG C,ZHOU G,et al. Employing personal/impersonal views in supervised and semi-supervised sentiment classification[C]//Proceedings of ACL,Sweden,2010:414-423.
[10] LI S,WANG Z,ZHOU G,et al. Semi-supervised learning for imbalanced sentiment classification[C]//Proceedings of IJCAI,Spain,2011:1 826-1 831.
[11] LI S,HUANG L,WANG R,et al. Sentence-level emotion classification with label and context dependence[C]//Proceedings of ACL,China,2015:1 045-1 053.
[12] SOCHER R,PERELYGIN A,WU J,et al. Recursive deep models for semantic compositionality over a sentiment treebank[C]//Proceedings of EMNLP,USA,2013:1 631-1 642.
[13] ZHU T,ZHANG F,LAN M. ECNUCS:A surface information based system description of sentiment analysis in Twitter in the SemEval-2013(Task 2)[C]//Proceedings of SemEval2013,USA,2013:408-413.
[14] TANG D,WEI F,QIN B,et al. Coooolll:a deep learning system for Twitter sentiment classification[C]//Proceedings of SemEval2014,Ireland,2014:208-212.
[15] YI S M,YI H,ZHOU G D. Twitter sentiment classification with sentimental feature vector[C]//Proceedings of the 14th China National Conference on Computational Linguistics (CCL2015),Guangzhou,2015:79. (in Chinese)
[16] KALCHBRENNER N,GREFENSTETTE E,BLUNSOM P. A convolutional neural network for modeling sentences[C]//Proceedings of ACL,USA,2014:655-665.
[17] BACCIANELLA S,ESULI A,SEBASTIANI F. SENTIWORDNET 3.0:An enhanced lexical resource for sentiment analysis and opinion mining[C]//Proceedings of LREC,Malta,2010:2200-2204.
[18] OWOPUTI O,O’CONNOR B,DYER C,et al. Improved part-of-speech tagging for online conversational text with word clusters[C]//Proceedings of NAACL,USA,2013:380-390.
[19] BROWN P,DESOUZA P,MERCER R,et al. Class-based n-gram models of natural language[J]. Computational Linguistics,1992,18(4):467-479.
[20] NAKOV P,KOZAREVA Z,RITTER A,et al. SemEval-2013 task 2:sentiment analysis in Twitter[C]//Proceedings of SemEval2013,USA,2013:312-320.
[21] POURSEPANJ H,WEISSBOCK J,INKPEN D. uOttawa:System description for SemEval 2013 task 2 sentiment analysis in Twitter[C]//Proceedings of SemEval2013,USA,2013:380-383.

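Memo

Received: 2016-07-17.
Foundation item: National Natural Science Foundation of China (61003155, 61273320).
Corresponding author: Yi Shunming, associate professor; research interest: natural language processing. E-mail: ysm2501@qq.com
Last update: 2016-09-30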