Privacy Preserving Feature Selection in Distributed Environment




Wan Wenqiang, Zhang Lingwei
College of Computer, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
privacy preserving; feature selection; distribution; differential privacy; principal component analysis
Privacy preservation and feature selection are both important in data mining, so how to select features effectively while preserving privacy has become an active research topic. Within the MapReduce distributed framework, this paper proposes a new privacy-preserving feature selection algorithm for distributed environments. The algorithm combines differential privacy and principal component analysis with three statistics: entropy, misclassification gain, and the Gini index. It protects the privacy of both the data sets and the features. Simulation results on several benchmark data sets indicate that the algorithm performs well: while selecting the important features, it protects private information to a certain extent.
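The abstract does not give the algorithm's details, so the following is only a minimal sketch of two of its named ingredients, not the authors' method. It shows how the three statistics can score a candidate feature split from class counts, and how those counts might be perturbed with Laplace noise in the style of differential privacy. The function names, the `epsilon` parameter, and the choice of a count sensitivity of 1 are assumptions made for illustration; the principal component analysis step and the MapReduce distribution are omitted.

```python
import math
import random
from collections import Counter


def _proportions(counts):
    """Convert class counts to proportions; empty list if there is no mass."""
    total = sum(counts)
    if total <= 0:
        return []
    return [c / total for c in counts]


def entropy(counts):
    """Shannon entropy (base 2) of a class-count vector."""
    return -sum(p * math.log2(p) for p in _proportions(counts) if p > 0)


def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    props = _proportions(counts)
    return 1.0 - sum(p * p for p in props) if props else 0.0


def misclassification(counts):
    """Misclassification error: 1 minus the largest class proportion."""
    props = _proportions(counts)
    return 1.0 - max(props) if props else 0.0


def gain(impurity, parent, children):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    n = sum(parent)
    if n <= 0:
        return 0.0
    return impurity(parent) - sum(sum(ch) / n * impurity(ch) for ch in children)


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(labels, classes, epsilon, rng):
    """Class histogram with Laplace noise of scale 1/epsilon (assumed sensitivity 1).

    Negative noisy counts are clamped to zero so the statistics stay defined.
    """
    raw = Counter(labels)
    scale = 1.0 / epsilon
    return [max(raw.get(c, 0) + laplace_noise(scale, rng), 0.0) for c in classes]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    labels = ["a"] * 5 + ["b"] * 5
    parent = noisy_counts(labels, ["a", "b"], epsilon=1.0, rng=rng)
    print("noisy parent counts:", parent)
    print("entropy of noisy counts:", entropy(parent))
    # Exact (non-private) gain of a split that separates the classes perfectly.
    print("entropy gain:", gain(entropy, [5, 5], [[5, 0], [0, 5]]))
```

In a MapReduce setting, one plausible arrangement would be for mappers to emit per-class counts for each feature value and for a reducer to sum them before noise is added and the statistics are computed; that arrangement is likewise an assumption, not something the abstract states.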




Last Update: 2013-03-11