A Matching Algorithm Between the Tweets in Twitter and SentiWordNet


Yi Shunming (1), Zhou Hongbin (1), Zhou Guodong (2)
(1. Department of Electronics and Information Engineering, Shazhou Professional Institute of Technology, Suzhou 215600, China)
(2. School of Computer Science and Technology, Soochow University, Suzhou 215006, China)
Keywords: tweets; sentiment classification; SentiWordNet; matching algorithm
In research on Twitter sentiment classification, a widely used method obtains sentiment values by mapping the words in tweets to synonym terms in a sentiment lexicon. However, tweets are usually written informally and contain slang, abbreviations and special symbols, so many of their words cannot be found among the terms of the lexicon. A low matching rate directly degrades the performance of sentiment classification. Based on the features of Twitter, this article proposes a set of algorithms for matching tweets against the sentiment lexicon SentiWordNet. In this method, tweets undergo data cleaning, substitution processing, POS tagging and lemmatization, together with named entity recognition, hashtag word segmentation, negated-context recognition with Word Clusters, and phrase matching. Experimental results show that the matching rate exceeds 90%.
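The notion of a matching rate can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon stands in for SentiWordNet, the substitution table is invented for illustration, and the greedy hashtag splitter is only one simple way to segment hashtags. All names (TOY_LEXICON, SUBSTITUTIONS, segment_hashtag, tokenize, matching_rate) are hypothetical, and POS tagging, lemmatization, named entity recognition, negation handling and phrase matching are left out.

```python
import re

# Hypothetical stand-in for SentiWordNet: word -> (positive, negative) score.
TOY_LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.75, 0.0),
    "love": (0.625, 0.0),
    "bad": (0.0, 0.625),
    "movie": (0.0, 0.0),
    "day": (0.0, 0.0),
}

# Hypothetical substitution table for slang and abbreviations.
SUBSTITUTIONS = {"luv": "love", "gr8": "great", "gd": "good"}


def clean(tweet):
    """Data cleaning: drop URLs and user mentions, then lowercase."""
    tweet = re.sub(r"https?://\S+", " ", tweet)
    tweet = re.sub(r"@\w+", " ", tweet)
    return tweet.lower()


def segment_hashtag(tag, vocab):
    """Greedy longest-match split of a hashtag body into known words."""
    words, i = [], 0
    while i < len(tag):
        for j in range(len(tag), i, -1):
            if tag[i:j] in vocab:
                words.append(tag[i:j])
                i = j
                break
        else:
            # No known word starts at position i: keep the tag whole.
            return [tag]
    return words


def tokenize(tweet, vocab):
    """Clean a tweet, split hashtags, and substitute informal forms."""
    tokens = []
    for raw in re.findall(r"#?\w+", clean(tweet)):
        if raw.startswith("#"):
            tokens.extend(segment_hashtag(raw[1:], vocab))
        else:
            tokens.append(SUBSTITUTIONS.get(raw, raw))
    return tokens


def matching_rate(tweets, lexicon):
    """Fraction of all tokens that are found in the lexicon."""
    tokens = [t for tweet in tweets for t in tokenize(tweet, lexicon)]
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


if __name__ == "__main__":
    sample = ["Luv this gr8 movie #goodday http://t.co/x @bob"]
    print(tokenize(sample[0], TOY_LEXICON))
    print(matching_rate(sample, TOY_LEXICON))
```

In this sketch the rate is computed over all tokens, including function words such as "this" that the toy lexicon lacks; how the article itself counts matches is not stated in the abstract.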


