[1] Hu Haojie, Chen Hui, Mu Tingting, et al. Weighted k-means Algorithm Based on Outlier Detection[J]. Journal of Nanjing Normal University (Engineering and Technology), 2022, 22(01): 75-80. [doi:10.3969/j.issn.1672-1292.2022.01.011]

Weighted k-means Algorithm Based on Outlier Detection

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
22
Issue:
2022, No. 01
Pages:
75-80
Section:
Machine Learning
Publication date:
2022-03-15

Article Info

Title:
Weighted k-means Algorithm Based on Outlier Detection
Article ID:
1672-1292(2022)01-0075-06
Author(s):
Hu Haojie1, Chen Hui2, Mu Tingting3, Yao Minli1, He Fang1, Zhang Fenggan1
(1. Rocket Force Engineering University, Xi'an 710025, China) (2. The Fourth Academy of China Aerospace Science and Technology Corporation, Xi'an 710025, China) (3. Beijing New Era Global Import and Export Co., Ltd., Beijing 100027, China)
Keywords:
clustering; k-means; outlier detection; 0-norm
CLC number:
TP391
DOI:
10.3969/j.issn.1672-1292.2022.01.011
Document code:
A
Abstract:
In k-means clustering, a few outliers can easily distort the data distribution and cause the obtained cluster centers to deviate significantly. To address this problem, we assign each data point a weight based on its distance to the potential cluster centers, which reduces the influence of outliers on the data distribution. We also incorporate outlier detection into the clustering model by imposing a 0-norm constraint on the weight vector, so that outliers are removed adaptively. The model is solved with an efficient alternating minimization algorithm. Experiments on synthetic and real datasets show that the proposed model effectively reduces the influence of outliers on clustering and yields better clustering results.
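
The abstract does not give the objective function, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes binary weights w_i in {0, 1} with exactly n - m nonzero entries (a 0-norm constraint on the weight vector) and minimizes the weighted sum of squared distances to the assigned centers. Under that assumption, alternating minimization reduces to three steps: assign points to the nearest center, zero the weights of the m farthest points, and recompute each center as the weighted mean of its members. The function name `trimmed_weighted_kmeans` and the parameters `n_outliers` and `init_centers` are hypothetical names chosen for this illustration.

```python
import numpy as np


def trimmed_weighted_kmeans(X, k, n_outliers, init_centers=None, n_iter=50, seed=0):
    """Sketch of k-means with 0/1 point weights and a cardinality constraint.

    Assumption of this sketch (not taken from the paper): weights are binary
    and exactly `n_outliers` of them are forced to zero.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init_centers is None:
        # Random initialisation from k distinct data points.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init_centers, dtype=float).copy()
    weights = np.ones(n)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Step 1: assign every point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        nearest = d2[np.arange(n), labels]
        # Step 2: for fixed assignments, the constrained objective is
        # minimised by zeroing the weights of the farthest points.
        new_weights = np.ones(n)
        if n_outliers > 0:
            new_weights[np.argsort(nearest)[-n_outliers:]] = 0.0
        # Step 3: recompute each center as the mean of its retained members.
        new_centers = centers.copy()
        for j in range(k):
            mask = (labels == j) & (new_weights > 0)
            if mask.any():
                new_centers[j] = X[mask].mean(axis=0)
        converged = np.allclose(new_centers, centers) and np.array_equal(
            new_weights, weights
        )
        centers, weights = new_centers, new_weights
        if converged:
            break
    return centers, labels, weights
```

Points returned with weight 0 are the ones this sketch treats as outliers. A distance-dependent, non-binary weighting of the kind the abstract describes would replace Step 2; its exact form is not stated on this page.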

Memo

Received: 2021-08-31.
Corresponding author: He Fang, PhD, lecturer; research interest: machine learning. E-mail: fanghe1107@gmail.com
Last Update: 2022-03-15