[1] Yang Yang, Lü Jing. Some Studies on Feature Selection for High Dimensional Data[J]. Journal of Nanjing Normal University (Engineering and Technology), 2012, 12(01): 57-63.
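Some Studies on Feature Selection for High Dimensional Data
Journal of Nanjing Normal University (Engineering and Technology) [ISSN: 1006-6977 / CN: 61-1281/TN]
- Volume: 12
- Issue: 2012, No. 01
- Pages: 57-63
- Column:
- Publication date: 2012-03-20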
Article Info
- Title: Some Studies on Feature Selection for High Dimensional Data
- Author(s): Yang Yang (杨杨)1; Lü Jing (吕静)2
  1. Honor School, Nanjing Normal University, Nanjing 210046, China
  2. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046, China
- Keywords: high-dimensional data; dimensionality reduction; feature selection
- CLC number: TP181
- Abstract:
Feature selection is a key topic in machine learning. Compared with feature selection for low-dimensional data, feature selection for high-dimensional data is more challenging, especially in the high-dimensional, small-sample setting, and it has therefore attracted considerable research attention. In essence, feature selection for high-dimensional data can be regarded as a sparse modeling problem, whose goal is to overcome the failure of existing feature modeling methods in high-dimensional feature spaces. This paper surveys feature selection methods for high-dimensional data and discusses directions for future work. Our main objective is to provide a reference for readers who are interested in this research field.
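To make the "sparse modeling" view in the abstract concrete, here is a minimal sketch of L1-penalized least squares (the lasso) solved by cyclic coordinate descent on a toy problem with many more features than samples. The abstract does not name a specific algorithm, so the lasso is used here only as one common instance of sparse modeling. The function names, the synthetic data (20 samples, 100 features, only features 0 and 3 relevant), and the penalty value are all assumptions made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X b, and b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of feature j with the residual, with feature j's
            # own current contribution added back in.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                beta[j] = new_b
    return beta


def make_toy_data(n=20, p=100, seed=0):
    """Hypothetical high-dimensional, small-sample data: p features, n < p samples."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 3.0 * row[3] for row in X]  # only features 0 and 3 matter
    return X, y


def lambda_max(X, y):
    """Smallest penalty at which every coefficient is exactly zero."""
    n = len(X)
    return max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(len(X[0])))


X, y = make_toy_data()
lam_max = lambda_max(X, y)
beta = lasso_coordinate_descent(X, y, lam=0.1 * lam_max)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
```

Features whose coefficients end up exactly zero are discarded, so the penalty `lam` controls how aggressive the selection is. At `lam = lambda_max(X, y)` nothing is selected, and smaller values admit more features. The factor 0.1 used above is an arbitrary choice for the sketch; in practice the penalty would usually be tuned, for example by cross-validation.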
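Memo
- Funding: Nanjing Normal University 2010 Student Science Fund (first batch of approved projects).
- Corresponding author: Lü Jing, lecturer; research interests: pattern recognition theory and applications. E-mail: 05275@njnu.edu.cn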
Last Update: 2013-03-11