[1]Sun Liangjun,Fan Jianfeng,Yang Wanqi,et al.Group Lasso-Based Feature Selection for Off-network Analysis in Multisource Teledata[J].Journal of Nanjing Normal University(Engineering and Technology),2014,14(04):077.

Group Lasso-Based Feature Selection for Off-network Analysis in Multisource Teledata

Journal of Nanjing Normal University(Engineering and Technology)[ISSN:1006-6977/CN:61-1281/TN]

Volume:
14
Issue:
2014, No. 04
Page:
077
Column:
Publication date:
2014-12-31

Article Info

Title:
Group Lasso-Based Feature Selection for Off-network Analysis in Multisource Teledata
Author(s):
Sun Liangjun1, Fan Jianfeng2, Yang Wanqi2, Shi Yinghuan2, Gao Yang2, Zhou Xinmin3
(1. Zhongbo Information Technology Research Institute Company, Nanjing 210012, China)
(2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210046, China)
(3. Forensic Center of Jiangsu Province Public Security Bureau, Nanjing 210024, China)
Keywords:
telecom companies; customer churn; multisource data; feature selection; Group Lasso
CLC number:
TP181
Document code:
A
Abstract:
As competition in the industry intensifies, customer churn is becoming more severe and is causing telecom companies substantial financial losses. Using operational data to predict which customers will leave the network, and then making sound decisions to retain them, has therefore drawn growing attention from telecom companies. The multisource data aggregated by the telecom back end turn out, on analysis, to have a natural group structure. To select the features most discriminative for the off-network class, this paper applies a Group Lasso-based group feature selection method and uses cross-validation to choose an appropriate set of feature groups; the small number of selected feature groups is then used to predict which broadband users will leave the network or have their service suspended. Experiments on off-network user analysis data from a telecom operator in a prefecture-level city of Jiangsu show that the approach achieves precision that is on average at least 10% higher than that of the other feature selection methods compared.
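The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of group-wise feature selection under stated assumptions: a squared-loss objective, (1/2n)||y - Xw||^2 + lam * sum_g sqrt(p_g) * ||w_g||_2, minimised by plain proximal gradient descent. The function names, the synthetic data, and the value of lam are all hypothetical and are not taken from the paper; in practice lam would be chosen by cross-validation, as the abstract describes, which is not shown here.

```python
import math
import random


def group_soft_threshold(v, t):
    """Shrink a whole group of weights; return all zeros if its norm is <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient descent for the squared-loss group lasso (assumed objective)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Conservative step size: n / ||X||_F^2 never exceeds 1/L for the smooth term.
    step = n / sum(x * x for row in X for x in row)
    for _ in range(n_iter):
        residual = [sum(xi * wi for xi, wi in zip(row, w)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual)) / n
                for j in range(p)]
        z = [wj - step * gj for wj, gj in zip(w, grad)]
        # Apply the group-wise shrinkage, one feature group at a time.
        for idx in groups:
            threshold = step * lam * math.sqrt(len(idx))
            shrunk = group_soft_threshold([z[j] for j in idx], threshold)
            for j, value in zip(idx, shrunk):
                z[j] = value
        w = z
    return w


def selected_groups(w, groups):
    """Indices of the groups whose weights are not all exactly zero."""
    return [g for g, idx in enumerate(groups) if any(w[j] != 0.0 for j in idx)]


# Synthetic illustration: three groups of two features, only group 0 is informative.
rng = random.Random(0)
groups = [[0, 1], [2, 3], [4, 5]]
true_w = [2.0, -1.0, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [sum(a * b for a, b in zip(row, true_w)) + 0.1 * rng.gauss(0, 1) for row in X]

w = group_lasso(X, y, groups, lam=0.5)
kept = selected_groups(w, groups)
print("selected groups:", kept)
```

Because the penalty acts on the norm of each group rather than on individual weights, a group is either kept or zeroed out as a unit, which is what makes the method suitable for data whose features arrive in natural groups, one per source.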


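Memo

Received: 2014-07-20.
Funding: National Natural Science Foundation of China (61035003, 61175042, 61021062, 61305068); Jiangsu Provincial Department of Science and Technology projects (BK2011005, BK20130581); New Century Talents Program (NCET-10-0476); Jiangsu Provincial Medical Special Project (BL2013033); Jiangsu Province Graduate Research and Innovation Program for Colleges and Universities (CXZZ13_0055).
Corresponding author: Gao Yang, professor and doctoral supervisor; research interests: reinforcement learning, intelligent agents, intelligent applications. E-mail: gaoy@nju.edu.cn
Last Update: 2014-12-31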