[1] Wang Chenxi, Liu Yuankui, Lv Yan, et al. Online Hierarchical Streaming Feature Selection Based on Neighborhood Decision Error Rate[J]. Journal of Nanjing Normal University (Engineering and Technology), 2022, 22(04): 9-18. [doi:10.3969/j.issn.1672-1292.2022.04.002]

Online Hierarchical Streaming Feature Selection Based on Neighborhood Decision Error Rate

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1672-1292/CN:61-1281/TN]

Volume:
22
Issue:
2022, No. 04
Pages:
9-18
Column:
Computer Science and Technology
Publication date:
2022-12-15

Article Info

Title:
Online Hierarchical Streaming Feature Selection Based on Neighborhood Decision Error Rate
Article ID:
1672-1292(2022)04-0009-10
Author(s):
Wang Chenxi (1,2), Liu Yuankui (1,2), Lv Yan (1,2), Lin Yaojin (1,2)
(1. School of Computer Science, Minnan Normal University, Zhangzhou 363000, China)
(2. Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou 363000, China)
Keywords:
online streaming feature selection; hierarchical classification; sibling relationships; neighborhood decision error rate
CLC number:
TP18
DOI:
10.3969/j.issn.1672-1292.2022.04.002
Document code:
A
Abstract:
In many practical applications, the full feature space is not available in advance: candidate features flow into the feature space dynamically over time, while the number of samples stays fixed. At the same time, the classes of the data often have a rich hierarchical structure, and traditional feature selection methods can no longer meet the demand in performance. To address this, an online streaming feature selection algorithm for hierarchical classification learning is proposed. First, a decision error rate formula based on the maximum nearest neighbor is designed using the relationships between sibling nodes. Second, two online evaluation criteria, online significance selection and online redundancy update, are designed to select the feature subset with the minimum decision error. Finally, experimental results on six hierarchical datasets show that the proposed algorithm outperforms some existing online streaming feature selection algorithms.
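The workflow summarized in the abstract (score a feature subset by a neighborhood-based decision error, accept an arriving feature only if it lowers that error, then re-check earlier features for redundancy) can be illustrated with a minimal sketch. This is not the authors' algorithm: the paper's sibling-based error formula is not given on this page, so the sketch assumes flat class labels and substitutes a plain leave-one-out k-nearest-neighbor majority vote. The names `neighborhood_error`, `stream_select`, the parameter `k`, and the toy data are all hypothetical.

```python
import numpy as np


def neighborhood_error(X, y, k=3):
    """Leave-one-out error of a k-nearest-neighbor majority vote.

    Assumption: a stand-in for the paper's decision error rate, which
    additionally uses sibling relationships in the class hierarchy.
    """
    n = X.shape[0]
    if X.shape[1] == 0:
        # No features yet: every sample is assigned the overall majority class.
        values, counts = np.unique(y, return_counts=True)
        return float(np.mean(y != values[np.argmax(counts)]))
    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    errors = 0
    for i in range(n):
        nn = np.argsort(dist[i])[:k]
        values, counts = np.unique(y[nn], return_counts=True)
        if values[np.argmax(counts)] != y[i]:
            errors += 1
    return errors / n


def stream_select(X, y, k=3):
    """Greedy selection over features that arrive one column at a time."""

    def err_of(features):
        cols = np.array(features, dtype=int)
        return neighborhood_error(X[:, cols], y, k)

    selected = []
    best = err_of(selected)
    for j in range(X.shape[1]):  # feature j "arrives" at step j
        candidate = selected + [j]
        err = err_of(candidate)
        if err < best:
            # Significance check: keep the new feature only if error drops.
            selected, best = candidate, err
            # Redundancy check: drop earlier features whose removal
            # does not increase the error.
            for f in list(selected[:-1]):
                reduced = [g for g in selected if g != f]
                e = err_of(reduced)
                if e <= best:
                    selected, best = reduced, e
    return selected, best


# Toy data (hypothetical): column 0 is uninformative, column 1 separates the classes.
X_demo = np.array([[3, 0.0], [0, 0.1], [5, 0.2],
                   [1, 50.0], [4, 50.1], [2, 50.2]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
selected, err = stream_select(X_demo, y_demo, k=3)
print(selected, err)
```

In this sketch a feature that was accepted early can still be discarded once a later, more informative feature arrives, which is the role the abstract assigns to the online redundancy update. A hierarchy-aware version would replace the flat majority vote with an error measure defined over sibling classes.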

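Memo:
Received: 2022-08-08.
Funding: National Natural Science Foundation of China (62076116); Natural Science Foundation of Fujian Province (2020J01811, 2020J01792, and 2021J02049).
Corresponding author: Lin Yaojin, PhD, professor. Research interests: data mining, granular computing. E-mail: zzlinyaojin@163.com
Last Update: 2022-12-15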