[1] Yin Junmei, Yang Ming. A Fisher Linear Discriminant Classification Approach Dealing With Single Positive Sample [J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(03): 061-65.

A Fisher Linear Discriminant Classification Approach Dealing With Single Positive Sample
Journal of Nanjing Normal University (Engineering and Technology) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
08
Issue:
2008, No. 03
Pages:
061-65
Section:
Publication date:
2008-09-30

Article Info

Title:
A Fisher Linear Discriminant Classification Approach Dealing With Single Positive Sample
Author(s):
Yin Junmei; Yang Ming
School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China
Keywords:
imbalanced data set; Fisher linear discriminant (FLD); over-sampling
CLC number:
TP181
Abstract:
An approach is proposed for imbalanced data sets in which the minority class contains only a single sample. The k nearest neighbours of the single positive sample are first found in the negative class; then, according to certain rules, synthetic samples are generated in turn on the line segments connecting the single positive sample to each of these neighbours. The synthetic samples are added to the original positive class, and the new data set is trained with the weighted Fisher linear discriminant classification approach. In the experiments, eight data sets chosen from UCI are used for training. The results show that the approach effectively improves classification performance on the minority class.
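
The over-sampling step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract says synthetic samples are placed "according to certain rules" without giving them, so the even spacing and the `max_frac` cap below are hypothetical, as are the function names `k_nearest_negatives` and `synthesize_positives` and the parameters `per_neighbor` and `max_frac`. The weighted Fisher linear discriminant training step is not shown.

```python
import math

def k_nearest_negatives(positive, negatives, k):
    """Return the k negative samples closest (Euclidean) to the single positive."""
    return sorted(negatives, key=lambda x: math.dist(positive, x))[:k]

def synthesize_positives(positive, negatives, k=3, per_neighbor=2, max_frac=0.5):
    """Place synthetic positives on the segments from `positive` to each neighbour.

    ASSUMPTION: the placement rule is not given in the abstract. Here the points
    are evenly spaced and stay within `max_frac` of the segment length, so they
    lie nearer the positive sample than the negative neighbour.
    """
    synthetic = []
    for neighbor in k_nearest_negatives(positive, negatives, k):
        for j in range(1, per_neighbor + 1):
            t = max_frac * j / per_neighbor  # interpolation fraction in (0, max_frac]
            synthetic.append(
                tuple(p + t * (n - p) for p, n in zip(positive, neighbor))
            )
    return synthetic

# Toy data (made up for illustration): one positive, four negatives in 2-D.
positive = (0.0, 0.0)
negatives = [(2.0, 0.0), (0.0, 4.0), (10.0, 10.0), (-6.0, 0.0)]

# The augmented positive class is the original sample plus the synthetic ones.
augmented_positives = [positive] + synthesize_positives(positive, negatives, k=2)
print(augmented_positives)
```

With `k=2` and the defaults above, the positive class grows from one sample to five; the augmented set would then be passed, together with the negatives, to whatever weighted Fisher linear discriminant trainer is in use.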

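Memo

Funding: Supported by the National Natural Science Foundation of China (40771163).
Corresponding author: Yang Ming, professor, Ph.D.; research interests: data mining, machine learning, and rough set theory and its applications. E-mail: myang@njnu.edu.cn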
Last Update: 2013-04-24