
Research on Bayes-Based Spam Filtering

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2005, No. 04
Page:
61-64
Research Field:
Publishing date:

Info

Title:
Research on Bayes-Based Spam Filtering
Author(s):
LIN Qiaomin¹, XU Jianzhen¹, XU Dihua¹, WANG Cheng²
1. Campus Network Center, Nanjing University of Posts and Telecommunications, Jiangsu Nanjing 210003, China; 2. Department of Information Engineering, Nanjing University of Posts and Telecommunications, Jiangsu Nanjing 210003, China
Keywords:
spam; text categorization; vector space model; Bayes algorithm
PACS:
TP393.098
DOI:
-
Abstract:
E-mail communication between people has been greatly affected by the spam problem. In this paper, the naive Bayesian categorization algorithm is derived and analyzed, and its application is validated in spam filtering experiments. First, the paper introduces text categorization techniques, including the commonly used vector space model for representing text and feature extraction methods such as information gain and the document frequency based method. In addition, the behavior of the information gain method in the experiments is explained. Second, it derives and analyzes naive Bayes under the premise of independence among features. Then, using previously collected mails as the corpus, it applies k-fold cross-validation and runs the naive Bayesian classifier in experiments. Based on the category probabilities and the probabilities of terms belonging to each category, both obtained from the training corpus, the paper categorizes the mails in the test corpus. Finally, the experimental results are reported using two indicators, precision and recall.
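The classification and evaluation steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the function names, and the use of Laplace (add-one) smoothing are assumptions made for illustration, and feature selection and k-fold cross-validation are omitted for brevity.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Estimate class priors and per-class term counts from tokenized mails."""
    classes = sorted(set(labels))
    vocab = {term for doc in docs for term in doc}
    priors = {c: labels.count(c) / len(labels) for c in classes}
    term_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    return priors, term_counts, vocab


def classify(doc, priors, term_counts, vocab):
    """Pick the class maximizing log P(c) + sum over terms of log P(t | c).

    The sum over terms relies on the naive independence assumption.
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(term_counts[c].values())
        score = math.log(prior)
        for term in doc:
            if term in vocab:  # terms never seen in training are ignored
                # Laplace smoothing (an assumption, not stated in the abstract)
                score += math.log((term_counts[c][term] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def precision_recall(predicted, actual, positive="spam"):
    """Precision and recall with respect to the positive (spam) class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy corpus, already tokenized.
train_docs = [
    ["cheap", "pills", "buy"],
    ["buy", "now", "offer"],
    ["meeting", "tomorrow", "agenda"],
    ["project", "report", "meeting"],
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [["buy", "cheap", "offer"], ["agenda", "report"]]
test_labels = ["spam", "ham"]

priors, term_counts, vocab = train_naive_bayes(train_docs, train_labels)
predicted = [classify(d, priors, term_counts, vocab) for d in test_docs]
print(predicted, precision_recall(predicted, test_labels))
```

Working in log space avoids numerical underflow when many small term probabilities are multiplied. In a k-fold setup, the same train and classify steps would be repeated once per fold, with precision and recall averaged over the folds.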


Memo

Memo:
-
Last Update: 2013-04-29