[1] GAO Jie, JI Genlin. Incremental Bayes Text Categorization Algorithm[J]. Journal of Nanjing Normal University (Engineering and Technology), 2004, 4(3): 49-52.

Incremental Bayes Text Categorization Algorithm

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Vol. 04
Issue:
2004, No. 03
Pages:
49-52
Column:
Publication date:
2004-09-30

Article Info

Title:
Incremental Bayes Text Categorization Algorithm
Author(s):
GAO Jie, JI Genlin
School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China
Keywords:
text categorization; incremental learning; Naïve Bayes
CLC number:
TP391.1
Abstract:
Automatic text categorization is an important research area in data mining and machine learning. Because large labeled training sets are difficult to obtain, this paper presents an incremental Bayes text categorization algorithm that starts from a small labeled corpus. The algorithm handles two cases. When the new samples carry class labels, the conditional probabilities of the samples belonging to each class are recomputed directly. When the new samples are unlabeled, the existing classifier first assigns them class labels, and the newly labeled samples are then used to revise the classifier. Experimental results show that the algorithm is feasible and effective and achieves higher accuracy than the Naïve Bayes text categorization algorithm. The incremental Bayes algorithm thus provides a new way to update a classifier.
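The two update cases described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a multinomial Naïve Bayes model with Laplace smoothing, and the class name `IncrementalNB`, its method names, and the toy documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Multinomial Naive Bayes that keeps raw counts so it can be updated in place."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.doc_counts = Counter()              # number of documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()                       # all words seen so far

    def update(self, tokens, label):
        """Case 1: a labeled sample only increments the stored counts."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_posteriors(self, tokens):
        """Return unnormalized log P(class) + sum of log P(word | class)."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)

    def update_unlabeled(self, tokens):
        """Case 2: label the sample with the current classifier, then absorb it."""
        label = self.predict(tokens)
        self.update(tokens, label)
        return label


# Toy usage: two labeled documents, then one unlabeled document.
clf = IncrementalNB()
clf.update(["goal", "match", "team"], "sports")
clf.update(["stock", "market", "bank"], "finance")
label = clf.update_unlabeled(["team", "goal", "win"])
print(label)
```

Because the model stores counts rather than probabilities, adding a sample never requires retraining on the earlier documents. In the unlabeled case a wrong predicted label is absorbed into the counts as if it were correct, so a practical version might only accept predictions above some confidence threshold; that safeguard is an assumption here, not something the abstract states.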

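Memo

Funding: Supported by the Open Fund of the Jiangsu Provincial Key Laboratory (KJS03064).
About the author: GAO Jie (1979-), female, M.S., teaching assistant; research interests: database and data mining technology. E-mail: scarletg@tom.com
Corresponding author: JI Genlin (1964-), professor; engaged in teaching and research on database and data mining technology. E-mail: glji@njnu.edu.cn
Last Update: 2013-04-29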