[1] Ling Xiaohan, Ji Genlin. A Clustering Ensemble Based Unsupervised Feature Selection Approach[J]. Journal of Nanjing Normal University (Engineering and Technology), 2007, 07(03): 060-63.

A Clustering Ensemble Based Unsupervised Feature Selection Approach

Journal of Nanjing Normal University (Engineering and Technology) [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
Vol. 07
Issue:
2007, No. 03
Pages:
060-63
Column:
Publication date:
2007-09-30

Article Info

Title:
A Clustering Ensemble Based Unsupervised Feature Selection Approach
Author(s):
Ling Xiaohan; Ji Genlin
School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, Jiangsu, China
Keywords:
feature selection; unsupervised learning; ensemble learning
CLC number:
TP311.13
Abstract:
An unsupervised feature selection approach is proposed whose basic idea is to let clustering guide feature selection: for a data set without class labels, clustering is first performed to obtain class labels for the data objects, and the ReliefF algorithm is then used to select features. ReliefF, a supervised feature selection algorithm, is improved and employed as an essential part of the approach. Because the clustering results generated by some algorithms are unstable and usually differ from one another, a clustering ensemble technique is used, feature selection is performed multiple times, and all results are combined to produce the final selected features. Experimental results show that the approach selects features well and that removing irrelevant or redundant features can further improve the quality of clustering.
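The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which clustering algorithms, ensemble rule, or ReliefF modification the paper uses, so the sketch assumes k-means with random restarts as the source of differing clusterings, a simplified single-neighbour Relief in place of the improved ReliefF, and a plain average of per-run feature weights as the combination step. The function names (kmeans_labels, relief_weights, ensemble_select) and all parameters are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    # Start from k distinct data points chosen at random (assumed init scheme).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def relief_weights(X, labels):
    """Simplified Relief using one nearest hit and one nearest miss per sample.

    ReliefF proper uses several neighbours per class; this is a reduced variant.
    """
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xn = (X - X.min(axis=0)) / span            # min-max scale each feature
    n, d = Xn.shape
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xn - Xn[i])              # per-feature difference to sample i
        dist = diff.sum(axis=1)
        dist[i] = np.inf                       # exclude the sample itself
        same = labels == labels[i]
        hit_d = np.where(same, dist, np.inf)   # candidates in the same cluster
        miss_d = np.where(~same, dist, np.inf) # candidates in other clusters
        if np.isfinite(hit_d.min()) and np.isfinite(miss_d.min()):
            # Reward features that separate clusters, penalize within-cluster spread.
            w += diff[miss_d.argmin()] - diff[hit_d.argmin()]
    return w / n


def ensemble_select(X, k, n_select, n_runs=10, seed=0):
    """Average Relief weights over several clusterings and keep the top features."""
    rng = np.random.default_rng(seed)
    total = np.zeros(X.shape[1])
    for _ in range(n_runs):
        labels = kmeans_labels(X, k, rng)      # pseudo class labels from clustering
        total += relief_weights(X, labels)     # supervised-style scoring on them
    mean_w = total / n_runs
    selected = np.argsort(mean_w)[::-1][:n_select]
    return selected, mean_w
```

Calling ensemble_select(X, k, n_select) on a numeric array X returns the indices of the highest-weighted features together with the averaged weights. Averaging over runs is only one simple way to damp the effect of a single unstable clustering; a consensus clustering step could be substituted without changing the rest of the sketch.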

References:

[1] Kononenko I. Estimating attributes: analysis and extensions of Relief[C]//Proceedings of the 7th European Conference on Machine Learning. Berlin: Springer, 1994: 171-182.
[2] Liu H, Setiono R. Feature selection and classification: a probabilistic wrapper approach[C]//Proceedings of the 9th International Conference on Industrial and Engineering Applications of AI and ES. Fukuoka: Springer, 1996: 419-424.
[3] Dash M, Liu H. Feature selection for classification[J]. Intelligent Data Analysis, 1997, 1(3): 131-156.
[4] Schapire R E. The strength of weak learnability[J]. Machine Learning, 1990, 5(2): 197-227.
[5] Fred A L N, Jain A K. Data clustering using evidence accumulation[C]//Proceedings of the 16th International Conference on Pattern Recognition. Quebec: IEEE Press, 2002: 276-280.
[6] Newman D J, Hettich S, Blake C L, et al. UCI repository of machine learning databases[EB/OL]. [2006-12-21]. http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.
[7] Martin H C Law, Mário A T Figueiredo, Anil K Jain. Simultaneous feature selection and clustering using mixture models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(9): 1154-1166.
[8] Modha D S, Spangler W S. Feature weighting in k-means clustering[J]. Machine Learning, 2003, 52(3): 217-237.

Similar Articles:

[1] Wan Wenqiang, Zhang Lingwei. Privacy Preserving Feature Selection in Distributed Environment[J]. Journal of Nanjing Normal University (Engineering and Technology), 2012, 12(03): 060.
[2] Yang Yang, Lü Jing. Some Studies on Feature Selection for High Dimensional Data[J]. Journal of Nanjing Normal University (Engineering and Technology), 2012, 12(01): 057.
[3] Yang Yang, Liu Huidong. An Improved Algorithm for Feature Selection Based on Pairwise Constraint[J]. Journal of Nanjing Normal University (Engineering and Technology), 2011, 11(01): 056.
[4] Sun Liangjun, Fan Jianfeng, Yang Wanqi, et al. Group Lasso-Based Feature Selection for Off-network Analysis in Multisource Teledata[J]. Journal of Nanjing Normal University (Engineering and Technology), 2014, 14(04): 077.
[5] Zong Ying, Li Yufeng, Liu Hongyu. A Study of Coastal Wetland Vegetation Classification Based on Object-oriented Random Forest Method[J]. Journal of Nanjing Normal University (Engineering and Technology), 2021, 21(04): 047. [doi:10.3969/j.issn.1672-1292.2021.04.008]
[6] Pan Siyuan, Liu Yuankui, Mao Yu, et al. Multi-Label Feature Selection Based on Neighborhood Approximation Error Rate[J]. Journal of Nanjing Normal University (Engineering and Technology), 2023, 23(01): 066. [doi:10.3969/j.issn.1672-1292.2023.01.009]
[7] Liu Haihong, Yu Ming, Liu Jing, et al. Economic Benefit Risk Prediction Based on Feature Selection and Deep Learning Model[J]. Journal of Nanjing Normal University (Engineering and Technology), 2024, 24(04): 087. [doi:10.3969/j.issn.1672-1292.2024.04.009]

Memo:
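Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK2005135).
About the author: Ling Xiaohan (1981-), master's student, mainly engaged in study and research on ensemble learning and data mining. E-mail: nolen0@163.com
Corresponding author: Ji Genlin (1964-), professor, doctoral supervisor, mainly engaged in teaching and research on databases and data mining, machine learning, and related areas. E-mail: jigenlin@njnu.edu.cn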
Last Update: 2013-06-04