[1] Lü Feng, Chai Bianfang, Li Wenbin, et al. An Improved Strategy of Active Semi-supervision K-means Clustering Algorithm[J]. Journal of Nanjing Normal University (Engineering and Technology), 2018, 18(02): 056. [doi:10.3969/j.issn.1672-1292.2018.02.008]

An Improved Strategy of Active Semi-supervision K-means Clustering Algorithm

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
18
Issue:
2018, No. 02
Page:
056
Column:
Computer and Information Engineering
Publication date:
2018-06-30

Article Info

Title:
An Improved Strategy of Active Semi-supervision K-means Clustering Algorithm
Article number:
1672-1292(2018)02-0056-07
Author(s):
Lü Feng, Chai Bianfang, Li Wenbin, Wang Yao
School of Information Engineering, Hebei GEO University, Shijiazhuang 050031, China
Keywords:
active semi-supervised clustering; pairwise constrained clustering; improved algorithm
CLC number:
TP181
DOI:
10.3969/j.issn.1672-1292.2018.02.008
Document code:
A
Abstract:
The classic APCKmeans (active pairwise constrained K-means) algorithm uses active learning to construct a must-link constraint set and a cannot-link constraint set, which serve as the supervision for semi-supervised clustering and improve the accuracy of the results. During sample assignment, however, the algorithm may make assignments that are not the best available at that point. This paper proposes a method that assigns labeled samples first and applies it to APCKmeans; the resulting APCKmeans_I algorithm obtains better clustering results with less supervision. The same strategy is applied to the PCKmeans (pairwise constrained K-means) algorithm, yielding the PCKmeans_I algorithm. Experiments on UCI benchmark data sets show that the improved algorithms perform clearly better.
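The abstract does not spell out the assignment rule, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It shows a single PCKmeans-style assignment pass in which each sample takes the cluster with the lowest squared distance plus a penalty for every violated pairwise constraint, and in which samples that appear in some constraint are processed before the rest. The function name `assign_with_constraints`, the uniform penalty `w`, the `constrained_first` flag, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def assign_with_constraints(X, centers, must_link, cannot_link,
                            w=1.0, constrained_first=True):
    """One constrained assignment pass (hypothetical sketch).

    A sample's cost for a cluster is its squared distance to that center
    plus w for each constraint it would violate against samples that
    already have a label in this pass.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n = X.shape[0]
    labels = np.full(n, -1)  # -1 marks "not yet assigned"

    # Look-up tables: constraint partners of every sample.
    ml = {i: set() for i in range(n)}
    cl = {i: set() for i in range(n)}
    for i, j in must_link:
        ml[i].add(j)
        ml[j].add(i)
    for i, j in cannot_link:
        cl[i].add(j)
        cl[j].add(i)

    order = list(range(n))
    if constrained_first:
        # Assumed reading of "assign labeled samples first": samples that
        # take part in any constraint go before unconstrained ones.
        # list.sort is stable, so the original order is kept within groups.
        order.sort(key=lambda i: 0 if (ml[i] or cl[i]) else 1)

    for i in order:
        cost = ((centers - X[i]) ** 2).sum(axis=1)
        for j in ml[i]:
            if labels[j] >= 0:
                # Any cluster other than the partner's breaks the must-link.
                cost += w
                cost[labels[j]] -= w
        for j in cl[i]:
            if labels[j] >= 0:
                # The partner's cluster breaks the cannot-link.
                cost[labels[j]] += w
        labels[i] = int(np.argmin(cost))
    return labels

# Toy data: two tight groups on a line plus one borderline point (index 4).
X = [[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0], [2.4, 0.0]]
centers = [[0.0, 0.0], [5.0, 0.0]]

free = assign_with_constraints(X, centers, [], [])
labels = assign_with_constraints(X, centers, must_link=[(4, 2)],
                                 cannot_link=[(0, 2)], w=10.0)
print(free.tolist())    # [0, 0, 1, 1, 0]
print(labels.tolist())  # [0, 0, 1, 1, 1]
```

Without constraints the borderline point joins the nearer center; with the must-link to sample 2 it moves to the other cluster because the penalty outweighs the small difference in distance. A full algorithm would alternate this pass with center updates until the labels stop changing, and the active-learning step that chooses which pairs to query is not covered here.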


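Memo

Received: 2017-12-10.
Funding: National Natural Science Foundation of China (61503260); Hebei Province Graduate Innovation Funding Project (CXZZSS2017131); Hebei GEO University Teaching Reform Project (2017J04).
Corresponding author: Li Wenbin, Ph.D., professor; research interests: machine learning, complex networks, etc. E-mail: 25304189@qq.com
Last Update: 2018-06-30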