
Some Studies on Feature Selection for High Dimensional Data

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2012, Issue 01
Page:
57-63
Research Field:
Publishing date:

Info

Title:
Some Studies on Feature Selection for High Dimensional Data
Author(s):
Yang Yang¹, Lü Jing²
1. Honor School, Nanjing Normal University, Nanjing 210046, China
Keywords:
high dimension data; dimensionality reduction; feature selection
PACS:
TP181
DOI:
-
Abstract:
Feature selection is a key issue in machine learning. Compared with feature selection for low dimensional data, feature selection for high dimensional data is a challenging task, especially when the data are high dimensional and the sample size is small, and many researchers therefore focus on this problem. In essence, feature selection for high dimensional data can be regarded as a sparse modeling problem, whose goal is to overcome the failure of existing feature modeling methods in high dimensional feature spaces. In this paper, we survey feature selection methods for high dimensional data and discuss directions for future work. Our main objective is to provide a reference for readers who are interested in this research field.


Memo

Memo:
-
Last Update: 2013-03-11