Wan Wenqiang, Zhang Lingwei. Privacy Preserving Feature Selection in Distributed Environment[J]. Journal of Nanjing Normal University (Engineering and Technology), 2012, 12(03): 60-67.

Privacy Preserving Feature Selection in Distributed Environment

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume: 12
Issue: 2012, No. 03
Pages: 60-67
Publication date: 2012-09-20

Article Info

Title:
Privacy Preserving Feature Selection in Distributed Environment
Author(s):
Wan Wenqiang; Zhang Lingwei
College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
Keywords:
privacy preserving; feature selection; distributed; differential privacy; principal component analysis
CLC number:
TP309
Abstract:
Privacy preservation and feature selection are both important in data mining, so selecting features effectively while preserving privacy is an active research topic. Within the Map-Reduce distributed framework, this paper combines differential privacy and principal component analysis with statistics including entropy, misclassification gain, and the Gini index, and proposes a new privacy-preserving feature selection algorithm for distributed environments. The algorithm protects the privacy of both the data sets and the features. Simulation results on several benchmark data sets show that the algorithm performs well: it selects important features effectively while protecting privacy information to a certain extent.
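The abstract names the ingredients (differential privacy, entropy, misclassification, Gini index, Map-Reduce) but not how they are combined. The Python below is a minimal sketch under assumptions of our own, not the authors' algorithm: a map step counts (feature value, class label) pairs in each data partition, a reduce step sums the partial histograms, the merged histogram is perturbed with the Laplace mechanism (noise scale 1/ε, because adding or removing one record changes exactly one count by 1), and the feature is scored by the drop in entropy, Gini index, or misclassification error after splitting on it. The function names, the dictionary row format with a `"label"` key, and the assumption that the value and label domains are public are all hypothetical; the PCA step is omitted.

```python
# Hypothetical sketch, not the paper's algorithm: differentially private
# impurity-based scoring of one feature from Map-Reduce style class counts.
import math
import random
from collections import Counter
from functools import reduce


def map_counts(partition, feature):
    """Map step: count (feature value, class label) pairs in one data partition."""
    return Counter((row[feature], row["label"]) for row in partition)


def reduce_counts(a, b):
    """Reduce step: merge two partial histograms by summing their counts."""
    return a + b


def laplace_noise(scale, rng):
    """The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, values, labels, epsilon, rng):
    """Laplace mechanism on the full (value, label) histogram.

    Adding or removing one record changes exactly one cell by 1, so the L1
    sensitivity is 1 and the noise scale is 1/epsilon. Every cell of the
    (assumed public) domain is perturbed, including empty ones; clamping at
    zero is post-processing and does not weaken the guarantee.
    """
    scale = 1.0 / epsilon
    return {(v, c): max(0.0, counts.get((v, c), 0) + laplace_noise(scale, rng))
            for v in values for c in labels}


def entropy(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)


def gini(ps):
    return 1.0 - sum(p * p for p in ps)


def misclassification(ps):
    return 1.0 - max(ps)


def impurity_gain(counts, values, labels, measure):
    """Parent impurity minus the size-weighted impurity of the split on the feature."""
    total = sum(counts.get((v, c), 0) for v in values for c in labels)
    if total == 0:
        return 0.0
    parent = [sum(counts.get((v, c), 0) for v in values) / total for c in labels]
    child = 0.0
    for v in values:
        n_v = sum(counts.get((v, c), 0) for c in labels)
        if n_v > 0:
            child += (n_v / total) * measure([counts.get((v, c), 0) / n_v for c in labels])
    return measure(parent) - child


if __name__ == "__main__":
    # Two toy partitions: feature "a" determines the label, feature "b" does not.
    parts = [
        [{"a": 0, "b": 0, "label": 0}, {"a": 1, "b": 0, "label": 1}],
        [{"a": 0, "b": 1, "label": 0}, {"a": 1, "b": 1, "label": 1}],
    ]
    rng = random.Random(0)
    for feature in ("a", "b"):
        exact = reduce(reduce_counts, (map_counts(p, feature) for p in parts))
        noisy = noisy_counts(exact, [0, 1], [0, 1], epsilon=1.0, rng=rng)
        for measure in (entropy, gini, misclassification):
            print(feature, measure.__name__,
                  round(impurity_gain(exact, [0, 1], [0, 1], measure), 3),
                  round(impurity_gain(noisy, [0, 1], [0, 1], measure), 3))
```

On this toy data the exact gains for feature `a` are 1.0 (entropy) and 0.5 (Gini index and misclassification error), and 0 for feature `b`; the noisy scores deviate from these by an amount that grows as ε or the counts shrink. Scoring k features this way spends k·ε of privacy budget in total under sequential composition.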

Memo

Funding: National Natural Science Foundation of China (61073114). Corresponding author: Wan Wenqiang, M.S., research interests: data mining and machine learning. E-mail: aiaiyouchou@163.com
Last Update: 2013-03-11