Classification Methods on Imbalanced Data: a Survey




Yang Ming, Yin Junmei, Ji Genlin
School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, China
imbalanced data; over-sampling; under-sampling; cost-sensitive; one-class classifier; feature selection; subspace
Classification is one of the most important research topics in machine learning. Traditional classification methods are relatively mature and perform well on well-balanced data, but real-world data is usually imbalanced. Existing classification methods are often designed on the assumption that the training sets are well balanced, so their performance may degrade when they are applied to imbalanced data. Research on imbalanced data is therefore quite important. To give readers a clear picture of the currently proposed and future work on imbalanced data classification, this paper presents a brief survey of studies on the issue and identifies some key problems that are attracting researchers' attention.


