Improvement of Feature Weighting Algorithm in Text Classification

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2008, No. 04
Page:
95-98
Research Field:
Publishing date:

Info

Title:
Improvement of Feature Weighting Algorithm in Text Classification
Author(s):
Shen Zhibin, Bai Qingyuan
College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350002, China
Keywords:
text classification; feature weight; TFIDF; class difference; BOR-TFIDF
PACS:
TP391.1
DOI:
-
Abstract:
TFIDF is a common method for weighting the terms in a document. The method is simple, but it ignores how each feature is distributed across classes, so it cannot truly reflect each feature's contribution to each class. To address this shortcoming, we put forward BOR-TFIDF and use it to readjust each feature's power to differentiate each class, i.e., to modify each feature's weight. A classifier is then used to check its validity. The method performs better than traditional TFIDF, which shows that the BOR-TFIDF method is feasible.
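
The limitation described in the abstract can be illustrated with a minimal Python sketch of classic TFIDF. This is not the paper's BOR-TFIDF, whose formula is not given in this record. The toy corpus, the class labels, and the function names (`idf_table`, `tfidf`, `class_distribution`) are all hypothetical, and the weighting shown is the common textbook form tf × log(N / df), which may differ from the variant the authors used.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (class label, tokens). Not data from the paper.
CORPUS = [
    ("sports", ["goal", "match", "report"]),
    ("sports", ["goal", "team", "news"]),
    ("finance", ["stock", "market", "report"]),
    ("finance", ["stock", "bank", "news"]),
]


def idf_table(labelled_docs):
    """Inverse document frequency log(N / df) for every term in the corpus."""
    n_docs = len(labelled_docs)
    df = Counter()
    for _label, tokens in labelled_docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """Textbook TFIDF weights for one document: relative tf times idf."""
    tf = Counter(tokens)
    return {term: (count / len(tokens)) * idf[term] for term, count in tf.items()}


def class_distribution(labelled_docs, term):
    """Number of documents containing `term`, broken down by class label."""
    return Counter(label for label, tokens in labelled_docs if term in tokens)


idf = idf_table(CORPUS)

# "goal" occurs only in sports documents; "report" is split across classes.
# Both appear in 2 of 4 documents, so classic IDF cannot tell them apart.
print(idf["goal"], dict(class_distribution(CORPUS, "goal")))
print(idf["report"], dict(class_distribution(CORPUS, "report")))
print(tfidf(CORPUS[0][1], idf))
```

In this toy corpus the class labels never enter the weight computation, so a term concentrated in one class and a term spread evenly over classes receive identical IDF values. A class-aware scheme such as the one the abstract proposes would need to use the per-class counts that `class_distribution` exposes.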

Memo

Memo:
-
Last Update: 2013-04-24