[1] Shang Bingzhang, Bai Qingyuan. On Classification of Associative Text Based on Rules Pruning of Mutual Information[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 173-177.

On Classification of Associative Text Based on Rules Pruning of Mutual Information

Journal of Nanjing Normal University (Engineering and Technology) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
08
Issue:
2008, No. 04
Pages:
173-177
Column:
Publication date:
2008-12-30

Article Info

Title:
On Classification of Associative Text Based on Rules Pruning of Mutual Information
Author(s):
Shang Bingzhang, Bai Qingyuan
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002, Fujian, China
Keywords:
mutual information; rules pruning; associative classification
Document code:
A
Abstract:
Traditional associative text classification algorithms generate a huge number of rules. Leaving the rules unpruned hurts classification efficiency, while earlier pruning methods reduce classification accuracy to varying degrees. This paper therefore proposes pruning the rules of each class by mutual information, selecting the rules with strong classifying capacity to form the classifier that labels the texts to be classified. After this pruning the number of rules drops substantially, and the resulting classifier achieves better classification results than both a classifier built from the unpruned rule set and the ARC-BC algorithm, which uses an earlier pruning method. Extensive experiments show that the method is effective.
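
The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not give the paper's scoring formula, so the sketch assumes pointwise mutual information between "the document contains every term of the rule antecedent" and "the document belongs to the class", and it assumes a fixed number of rules kept per class. The names `mutual_information`, `prune_rules`, `top_k`, and the toy documents are all hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(antecedent, label, docs):
    """Pointwise MI (assumed scoring) between rule match and class membership.

    docs is a list of (frozenset_of_terms, class_label) pairs.
    """
    n = len(docs)
    n_rule = sum(1 for terms, _ in docs if antecedent <= terms)
    n_class = sum(1 for _, c in docs if c == label)
    n_both = sum(1 for terms, c in docs if c == label and antecedent <= terms)
    if n_both == 0:
        # The rule never fires on this class, so it ranks last.
        return float("-inf")
    # log2( P(rule, class) / (P(rule) * P(class)) ) written with raw counts.
    return math.log2(n_both * n / (n_rule * n_class))


def prune_rules(rules, docs, top_k):
    """Keep the top_k highest-scoring rules of each class (assumed cut-off)."""
    by_class = defaultdict(list)
    for antecedent, label in rules:
        by_class[label].append((mutual_information(antecedent, label, docs), antecedent))
    pruned = {}
    for label, scored in by_class.items():
        scored.sort(key=lambda item: item[0], reverse=True)
        pruned[label] = [antecedent for _, antecedent in scored[:top_k]]
    return pruned


# Toy corpus and candidate rules (made up for illustration only).
docs = [
    (frozenset({"goal", "match"}), "sports"),
    (frozenset({"goal", "team"}), "sports"),
    (frozenset({"stock", "market"}), "finance"),
    (frozenset({"market", "team"}), "finance"),
]
rules = [
    (frozenset({"goal"}), "sports"),
    (frozenset({"team"}), "sports"),
    (frozenset({"market"}), "finance"),
    (frozenset({"team"}), "finance"),
]
pruned = prune_rules(rules, docs, top_k=1)
```

In this toy data the term "team" occurs equally often in both classes, so its rules score 0 and are pruned, while "goal" and "market" each score 1 and are kept for their class.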

References:

[1] Liu B, Hsu W, Ma Y M. Integrating classification and association rule mining[C]//ACM Int'l Conf on Knowledge Discovery and Data Mining. New York: ACM Press, 1998: 80-86.
[2] Li W, Han J, Pei J. CMAR: Accurate and efficient classification based on multiple classification rules[C]//Cercone N. Proc of the 2001 IEEE Int'l Conf on Data Mining. California: IEEE Press, 2001: 369-376.
[3] Zaïane O R, Antonie M L. Classifying text documents by associating terms with text categories[C]//Zhou X F. Proc of the 13th Australasian Database Conf. Melbourne: Australian Computer Society, 2002: 215-222.
[4] Agrawal R, Srikant R. Fast algorithms for mining association rules[C]//Bocca J B, Jarke M, Zaniolo C. Proc of the 20th Very Large Data Bases Conference. Santiago, 1994: 487-499.
[5] Han J, Pei J, Yin Y W. Mining frequent patterns without candidate generation[J]. Data Mining and Knowledge Discovery, 2004, 8(1): 53-87.
[6] Chen Xiaoyun, Chen Hui, Wang Lei, et al. Frequent pattern text classification based on rules tree[J]. Journal of Software, 2006, 17(5): 1017-1025. (in Chinese)
[7] http://sewm.pku.edu.cn/QA/reference/ICTCLAS/FreeICTCLAS/[OL]. The Site of Chinese Natural Language Processing Platform, 2006. (in Chinese)

Similar articles:

[1] Li Huafeng, Qian Huanyan. A General Hybrid Cryptograph Transfer Protocol Applied in Password Synchronization[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 178.
[2] Li Hui, Li Cunhua, Wang Xia. A Novel Individualized Video Search Ranking Algorithm[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 182.
[3] Qiang Hao, Shi Lianmin, Wang Hongyuan. Design of Flash File System Based on 6025 Platform of CDMA[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 186.
[4] Feng Maoyan. Research of the Stereoscopic Display and 3D LCD Technique[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 195.

Memo:
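Funding: Supported by the Ministry of Education Startup Fund for Returned Overseas Scholars, the Open Project Fund of the Institute of Software, Chinese Academy of Sciences (SYSKF0701), the Science and Technology Development Fund of Fuzhou University (2005-XQ-13), and the Fund of the Education Department of Fujian Province (JB06023).
Corresponding author: Bai Qingyuan, professor; research interests: database technology and data mining. E-mail: baiqy@fzu.edu.cn
Last Update: 2013-07-22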