[1]潘思远,刘园奎,毛 煜,等.基于邻域决策误差率的多标记特征选择[J].南京师范大学学报(工程技术版),2023,23(01):066-74.[doi:10.3969/j.issn.1672-1292.2023.01.009]
 Pan Siyuan,Liu Yuankui,Mao Yu,et al.Multi-Label Feature Selection Based on Neighborhood Approximation Error Rate[J].Journal of Nanjing Normal University(Engineering and Technology),2023,23(01):066-74.[doi:10.3969/j.issn.1672-1292.2023.01.009]

基于邻域决策误差率的多标记特征选择

南京师范大学学报(工程技术版)[ISSN:1672-1292/CN:32-1684/T]

Volume:
Vol. 23
Issue:
No. 1, 2023
Pages:
066-74
Section:
Computer Science and Technology
Publication date:
2023-03-15

文章信息/Info

Title:
Multi-Label Feature Selection Based on Neighborhood Approximation Error Rate
Article ID:
1672-1292(2023)01-0066-09
作者:
潘思远1,2,刘园奎1,2,毛 煜1,2,林耀进1,2
(1.闽南师范大学计算机学院,福建 漳州 363000) (2.闽南师范大学计算机学院数据科学与智能应用福建省高等学校重点实验室,福建 漳州 363000)
Author(s):
Pan Siyuan1,2,Liu Yuankui1,2,Mao Yu1,2,Lin Yaojin1,2
(1.School of Computer Science,Minnan Normal University,Zhangzhou 363000,China) (2.Key Laboratory of Data Science and Intelligence Application of Fujian Province University,School of Computer Science,Minnan Normal University,Zhangzhou 363000,China)
关键词:
多标记学习; 特征选择; 邻域近似误差率
Keywords:
multi-label learning; feature selection; neighborhood approximation error rate
CLC number:
O643/X703
DOI:
10.3969/j.issn.1672-1292.2023.01.009
Document code:
A
摘要:
多标记学习可以同时处理与一组标记相关的数据,多标记学习的研究对于多义性对象的学习建模具有十分重要的意义. 与传统的单标记学习一样,数据的高维性是多标记学习的阻碍,因此数据降维是一项十分重要的工作,而特征选择是一种有效的数据降维技术. 提出了基于邻域近似误差率的多标记特征选择算法. 首先,在邻域粗糙集理论的基础上,引入实例的边界来对所有实例进行粒度化. 其次,基于邻域决策误差率提出了邻域近似误差率的策略来评价特征. 最后,在公开的数据集上进行了大量的实验,结果表明所提算法的有效性.
Abstract:
Multi-label learning handles data associated with a set of labels simultaneously, and its study is of great significance for modeling polysemous objects. As in traditional single-label learning, the high dimensionality of data is an obstacle to multi-label learning, so dimensionality reduction is an important task, and feature selection is an effective dimensionality reduction technique. A multi-label feature selection algorithm based on the neighborhood approximation error rate is proposed. First, on the basis of neighborhood rough set theory, instance boundaries are introduced to granulate all instances. Second, building on the neighborhood decision error rate, a neighborhood approximation error rate criterion is proposed to evaluate features. Finally, extensive experiments on publicly available datasets demonstrate the effectiveness of the proposed algorithm.
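To make the general idea concrete, the following is a minimal illustrative sketch (not the authors' algorithm): a candidate feature subset is scored by a neighborhood-based decision error on multi-label data, and the subset is grown greedily. The neighborhood radius delta, the Hamming-style error, the greedy forward search, and all function names (neighborhood_error_rate, greedy_select) are assumptions for illustration only; the paper's neighborhood approximation error rate and its granulation by instance boundaries are not reproduced here.

```python
# Hypothetical sketch of neighborhood-based multi-label feature selection.
# It illustrates the evaluation idea only, not the paper's exact method.
import numpy as np

def neighborhood_error_rate(X, Y, delta=0.3):
    """Average per-label mismatch when each instance's labels are predicted
    by a majority vote over its delta-neighborhood in the feature subspace."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)   # distances in the candidate subspace
        mask = dist <= delta
        mask[i] = False                            # exclude the instance itself
        if not mask.any():                         # empty neighborhood: count as full error
            total += 1.0
            continue
        pred = (Y[mask].mean(axis=0) >= 0.5).astype(int)   # majority vote per label
        total += float((pred != Y[i]).mean())      # Hamming-style mismatch for instance i
    return total / n

def greedy_select(X, Y, k, delta=0.3):
    """Greedy forward search: repeatedly add the feature whose inclusion
    minimizes the neighborhood error rate of the current subset."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        err, best = min(
            (neighborhood_error_rate(X[:, selected + [f]], Y, delta), f)
            for f in remaining
        )
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((60, 8))                                       # 60 instances, 8 features
    Y = (X[:, :3] + 0.1 * rng.random((60, 3)) > 0.5).astype(int)  # 3 labels driven by features 0-2
    print(greedy_select(X, Y, k=3))                               # tends to pick features 0, 1, 2
```

On synthetic data of this kind the greedy search tends to recover the label-relevant features first; real multi-label feature selection methods, including the one proposed in the paper, refine how the neighborhood and the error are defined rather than this outer search loop.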

参考文献/References:

[1]FAKHARI A,MOGHADAM A. Combination of classification and regression in decision tree for multi-labeling image annotation and retrieval[J]. Applied Soft Computing,2013,13(2):1292-1302.
[2]GAO W,ZHOU Z H. On the consistency of multi-label learning[C]//Proceedings of the 24th Annual Conference on Learning Theory. PMLR,2011,19:341-358.
[3]GU Q,LI Z,HAN J. Correlated multi-label feature selection[C]//Proceedings of the 20th ACM International Conference on Information and Knowledge Management. Glasgow,Scotland:Association for Computing Machinery,2011.
[4]DAI J H,XU Q. Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification[J]. Applied Soft Computing,2013,13(1):211-221.
[5]LIN Y J,HU Q H,LIU J H,et al. Multi-label feature selection based on neighborhood mutual information[J]. Applied Soft Computing,2016,38:244-256.
[6]SECHIDIS K,NIKOLAOU N,BROWN G. Information theoretic feature selection in multi-label data through composite likelihood[C]//Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition(SPR)and Structural and Syntactic Pattern Recognition(SSPR). Joensuu,Finland,2014.
[7]SPOLAOR N,CHERMAN E A,MONARD M C,et al. A comparison of multi-label feature selection methods using the problem transformation approach[J]. Electronic Notes in Theoretical Computer Science,2013,292:135-151.
[8]SPOLAOR N,MONARD M C,TSOUMAKAS G,et al. Label construction for multi-label feature selection[C]//2014 Brazilian Conference on Intelligent Systems. São Carlos,Brazil,2014.
[9]SLAVKOV I,KARCHESKA J,KOCEV D,et al. ReliefF for hierarchical multi-label classification[C]//International Workshop on New Frontiers in Mining Complex Patterns. Cham:Springer,2013:148-161.
[10]GHARROUDI O,ELGHAZEL H,AUSSEM A. A comparison of multi-label feature selection methods using the random forest paradigm[C]//Canadian Conference on Artificial Intelligence. Montreal,QC,Canada,2014.
[11]段洁,胡清华,张灵均,等. 基于邻域粗糙集的多标记分类特征选择算法[J]. 计算机研究与发展,2015,52(1):56-65.
[12]HU Q H,PEDRYCZ W,YU D R,et al. Selecting discrete and continuous features based on neighborhood decision error minimization[J]. IEEE Transactions on Systems,Man and Cybernetics,Part B,2010,40(1):137-150.
[13]GAO T L,JIA X H,JIANG R,et al. SaaS service combinatorial trustworthiness measurement method based on Markov Theory and cosine similarity[J]. Security and Communication Networks,2022:7080367.
[14]陈超逸,林耀进,唐莉,等. 基于邻域交互增益信息的多标记流特征选择算法[J]. 南京大学学报(自然科学),2020,56(1):30-40.
[15]ZHANG M L,PENA J M,ROBLES V. Feature selection for multi-label naive bayes classification[J]. Information Sciences,2009,179(19):3218-3229.
[16]ZHANG Y,ZHOU Z H. Multi-label dimensionality reduction via dependence maximization[J]. ACM Transactions on Knowledge Discovery from Data,2010,4(3):1-21.
[17]LEE J,KIM D W. Feature selection for multi-label classification using multivariate mutual information[J]. Pattern Recognition Letters,2013,34(3):349-357.
[18]卢舜,林耀进,吴镒潾,等. 基于多粒度一致性邻域的多标记特征选择[J]. 南京大学学报(自然科学),2022,58(1):60-70.
[19]FRIEDMAN M. A comparison of alternative tests of significance for the problem of m rankings[J]. The Annals of Mathematical Statistics,1940,11(1):86-92.
[20]NEMENYI P B. Distribution-free multiple comparisons[D]. Princeton,NJ:Princeton University,1963.

相似文献/Similar References:

[1]万文强,张伶卫.分布式环境下的隐私保护特征选择研究[J].南京师范大学学报(工程技术版),2012,12(03):060.
 Wan Wenqiang,Zhang Lingwei.Privacy Preserving Feature Selection in Distributed Environment[J].Journal of Nanjing Normal University(Engineering and Technology),2012,12(03):060.
[2]杨杨,吕静.高维数据的特征选择研究[J].南京师范大学学报(工程技术版),2012,12(01):057.
 Yang Yang,Lü Jing.Some Studies on Feature Selection for High Dimensional Data[J].Journal of Nanjing Normal University(Engineering and Technology),2012,12(01):057.
[3]杨杨,刘会东.一种基于成对约束的特征选择改进算法[J].南京师范大学学报(工程技术版),2011,11(01):056.
 Yang Yang,Liu Huidong.An Improved Algorithm for Feature Selection Based on Pairwise Constraint[J].Journal of Nanjing Normal University(Engineering and Technology),2011,11(01):056.
[4]凌霄汉,吉根林.一种基于聚类集成的无监督特征选择方法[J].南京师范大学学报(工程技术版),2007,07(03):060.
 Ling Xiaohan,Ji Genlin.A Clustering Ensemble Based Unsupervised Feature Selection Approach[J].Journal of Nanjing Normal University(Engineering and Technology),2007,07(03):060.
[5]孙良君,范剑锋,杨琬琪,等.基于Group Lasso的多源电信数据离网用户分析[J].南京师范大学学报(工程技术版),2014,14(04):077.
 Sun Liangjun,Fan Jianfeng,Yang Wanqi,et al.Group Lasso-Based Feature Selection for Off-network Analysis in Multisource Teledata[J].Journal of Nanjing Normal University(Engineering and Technology),2014,14(04):077.
[6]宗 影,李玉凤,刘红玉.基于面向对象随机森林方法的滨海湿地植被分类研究[J].南京师范大学学报(工程技术版),2021,21(04):047.[doi:10.3969/j.issn.1672-1292.2021.04.008]
 Zong Ying,Li Yufeng,Liu Hongyu.A Study of Coastal Wetland Vegetation Classification Based on Object-oriented Random Forest Method[J].Journal of Nanjing Normal University(Engineering and Technology),2021,21(04):047.[doi:10.3969/j.issn.1672-1292.2021.04.008]

备注/Memo

Received: 2022-09-15.
Funding: National Natural Science Foundation of China (62076116); Natural Science Foundation of Fujian Province (2020J01811, 2020J01792, 2021J02049).
Corresponding author: Lin Yaojin, PhD, professor. Research interests: data mining, granular computing. E-mail: zzlinyaojin@163.com
更新日期/Last Update: 2023-03-15