Zhang Yongjun,Liu Jinling.An Improved Efficient Bayesian Short Message Text Classifier[J].Journal of Nanjing Normal University(Engineering and Technology),2014,14(03):070.

An Improved Efficient Bayesian Short Message Text Classifier

Journal of Nanjing Normal University(Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
14
Issue:
2014, No. 03
Pages:
070
Publication date:
2014-09-30

Article Info

Title:
An Improved Efficient Bayesian Short Message Text Classifier
Author(s):
Zhang Yongjun, Liu Jinling
College of Computer Engineering, Huaiyin Institute of Technology, Huai’an 223003, China
Keywords:
short message; text classification; Bayesian; SVM; category energy space
CLC number:
TP181
Document code:
A
Abstract:
A Bayesian classifier model is proposed to classify short messages by their content. The concept of a category energy space is introduced, and each feature word is converted into an energy unit in that space; on this basis, each short message is represented by an energy feature vector computed from its words. The neighborhood density of the message's energy vector is then calculated and substituted into Bayes' formula to obtain the probability of each category. When the category probabilities differ only slightly, a support vector machine (SVM) reclassifies the message to improve classification performance. Experimental results show that the proposed model achieves good classification performance and outperforms other classification methods.
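The abstract does not define how energy units, the neighborhood density, or the "small probability difference" trigger are computed, so the Python sketch below shows only one plausible reading of the pipeline, not the authors' method. Every name in it is hypothetical. It assumes that each word's energy unit is a per-category weight vector supplied by the caller (`word_energy`), that a message's energy vector is the sum of its words' units, and that the neighborhood density for category c is the number of class-c training vectors inside a fixed Euclidean radius (a Parzen/k-nearest-style estimate). It also models the reclassification trigger as a gap smaller than `margin` between the two largest posteriors. The second-stage SVM is passed in as a callable rather than implemented.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def energy_vector(words: Sequence[str], word_energy: Dict[str, Vector],
                  n_classes: int) -> Vector:
    """Message energy vector = sum of its words' per-category energy units.

    ASSUMPTION: word_energy is a hypothetical lookup table mapping each feature
    word to one weight per category; unknown words contribute nothing.
    """
    vec = [0.0] * n_classes
    for w in words:
        for c, e in enumerate(word_energy.get(w, ())):
            vec[c] += e
    return vec


def neighborhood_posterior(x: Vector, train: Sequence[Tuple[Vector, int]],
                           n_classes: int, radius: float,
                           alpha: float = 1.0) -> List[float]:
    """Estimate P(c | x) from how many class-c training vectors lie near x.

    ASSUMPTION: the neighborhood is a Euclidean ball of fixed radius. With
    p(x | c) ~ k_c / (N_c * V) and prior P(c) = N_c / N, Bayes' rule gives
    P(c | x) proportional to k_c. Add-alpha smoothing avoids 0/0 when the
    ball contains no training vectors.
    """
    counts = [0] * n_classes
    for v, label in train:
        if math.dist(x, v) <= radius:
            counts[label] += 1
    total = sum(counts) + alpha * n_classes
    return [(k + alpha) / total for k in counts]


def classify(x: Vector, train: Sequence[Tuple[Vector, int]], n_classes: int,
             radius: float, margin: float,
             fallback: Callable[[Vector], int]) -> Tuple[int, List[float]]:
    """Return (label, posteriors); defer to `fallback` when the top two are close.

    ASSUMPTION: "probabilities differ only slightly" is modelled as a gap
    below `margin`; `fallback` stands in for the second-stage SVM.
    """
    if n_classes < 2:
        raise ValueError("need at least two categories")
    probs = neighborhood_posterior(x, train, n_classes, radius)
    best, second = sorted(range(n_classes), key=lambda c: probs[c],
                          reverse=True)[:2]
    if probs[best] - probs[second] < margin:
        return fallback(x), probs  # ambiguous: hand over to the second stage
    return best, probs
```

In practice, `fallback` might wrap a trained scikit-learn `SVC`, for instance `lambda x: int(svm.predict([x])[0])`, and `radius` and `margin` would need tuning on held-out data. Only the messages that the density-based posterior cannot separate confidently reach the second stage, which matches the division of labor described in the abstract.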

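Memo:
Received: 2013-12-11.
Foundation item: National Spark Program project and the Rural Livelihood Information Feedback Platform construction project (2011GA690190).
Corresponding author: Zhang Yongjun, lecturer; research interest: Chinese information processing. E-mail: 13511543380@139.com
Last update: 2014-09-30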