[1] Wan Jiahua. Topic Model Based on MixtureLDA in Microblog Platform[J]. Journal of Nanjing Normal University (Engineering and Technology), 2017, 17(01): 080. [doi:10.3969/j.issn.1672-1292.2017.01.012]

Microblog Topic Mining Based on mixtureLDA

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
17
Issue:
2017, No. 01
Pages:
080
Section:
Computer Engineering
Publication date:
2017-03-30

Article Info

Title:
Topic Model Based on MixtureLDA in Microblog Platform
Article ID:
1672-1292(2017)01-0080-06
Author(s):
Wan Jiahua (万家华)
Institute of Information Engineering, Anhui Xinhua University, Hefei 230088, Anhui, China
Keywords:
microblog; topic model; microblog types; mixtureLDA
CLC number:
TP391.6
DOI:
10.3969/j.issn.1672-1292.2017.01.012
Document code:
A
Abstract:
Existing topic mining methods model only the probability distribution of topic content and give little attention to the time factor. This paper proposes mixtureLDA, a microblog topic mining model that jointly considers content, time, and other factors. The model estimates topic probability distributions for a user's different types of microblogs and for time microblogs. Experiments on a Sina Weibo dataset show that mixtureLDA effectively mines the topic probability distributions of both user microblogs and time microblogs. Compared with the MB-LDA and userLDA models, mixtureLDA achieves lower perplexity.
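
The abstract compares models by perplexity without defining it. As a minimal sketch, assuming the standard held-out perplexity commonly used for LDA-style models (the exponential of the negative average per-token log-likelihood), the metric could be computed as below. The function name `perplexity` and the inputs `docs`, `theta`, and `phi` are illustrative assumptions rather than names from the paper, and this is not the paper's evaluation code.

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of a topic model on a set of documents (illustrative sketch).

    docs  : list of documents, each a list of integer word ids (hypothetical input)
    theta : theta[d][k] = probability of topic k in document d
    phi   : phi[k][w]   = probability of word w under topic k

    Assumes every observed word has non-zero probability under the model,
    which holds when phi is smoothed (e.g. by a Dirichlet prior).
    """
    num_topics = len(phi)
    log_likelihood = 0.0
    num_tokens = 0
    for d, words in enumerate(docs):
        for w in words:
            # Marginalize over topics: p(w | d) = sum_k theta[d][k] * phi[k][w]
            p_w = sum(theta[d][k] * phi[k][w] for k in range(num_topics))
            log_likelihood += math.log(p_w)
            num_tokens += 1
    # exp of the negative average per-token log-likelihood
    return math.exp(-log_likelihood / num_tokens)
```

Under this definition, a model that spreads probability uniformly over a four-word vocabulary has a perplexity of about 4, and a model that assigns higher probability to the observed words scores lower. That is the sense in which a lower perplexity indicates a better fit.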


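Memo:
Received: 2016-08-08.
Funding: Key Natural Science Project of Anhui Provincial Universities (KJ2014A100).
Corresponding author: Wan Jiahua, lecturer. Research interests: data mining, Web information processing. E-mail: 349826355@qq.com
Last Update: 1900-01-01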