[1] Miao Jianxin, Ji Genlin, Zhu Yingwen. Clustering GML Documents by Structure Based on Closed Frequent Induced Subtrees[J]. Journal of Nanjing Normal University (Engineering and Technology), 2009, 09(02): 61-64.

Clustering GML Documents by Structure Based on Closed Frequent Induced Subtrees

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
09
Issue:
2009, No. 02
Pages:
61-64
Column:
Publication date:
2009-06-30

Article Info

Title:
Clustering GML Documents by Structure Based on Closed Frequent Induced Subtrees
Author(s):
Miao Jianxin; Ji Genlin; Zhu Yingwen
School of Computer Science and Technology, Nanjing Normal University, Nanjing 210097, Jiangsu, China
Keywords:
closed frequent induced subtree; clustering GML by structure; clustering
CLC number:
TP311.13
Abstract:
This paper presents MCF-CLU, a new algorithm for clustering GML documents by structure. Unlike other related algorithms, it clusters on the basis of closed frequent induced subtrees and does not need pairwise similarity comparisons between trees. Instead, it mines the closed frequent induced subtrees of the GML document database, selects one closed frequent induced subtree for each document as that document's representative tree, and groups documents that share the same representative tree into one cluster. The number of clusters is determined automatically during clustering, a description is produced for each cluster, and outliers (isolated points) can be detected. Experimental results show that MCF-CLU is effective and that it outperforms other GML clustering algorithms of the same kind.
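The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the closed frequent induced subtrees have already been mined and that, for each document, the list of mined subtrees it contains is available. The abstract does not say how the representative tree is chosen, so the rule used here (the subtree with the most edges, ties broken by tuple order) is an assumption. The function name, the edge-tuple encoding of a subtree, and the sample file names are hypothetical.

```python
from collections import defaultdict

def cluster_by_representative(doc_subtrees):
    """Group documents that share the same representative subtree.

    doc_subtrees maps a document id to the closed frequent induced
    subtrees found in it. Each subtree is encoded (hypothetically) as
    a tuple of (parent_label, child_label) edges.
    """
    clusters = defaultdict(list)
    outliers = []
    for doc_id, subtrees in doc_subtrees.items():
        candidates = list(subtrees)
        if not candidates:
            # A document containing no closed frequent subtree is
            # treated as an isolated point.
            outliers.append(doc_id)
            continue
        # ASSUMPTION: the representative is the subtree with the most
        # edges; ties are broken by ordinary tuple comparison.
        representative = max(candidates, key=lambda t: (len(t), t))
        clusters[representative].append(doc_id)
    return dict(clusters), outliers

# Hypothetical mined subtrees for four GML documents.
ROAD_TREE = (("Road", "geometry"), ("Road", "name"))
ROAD_NAME_ONLY = (("Road", "name"),)
PARK_TREE = (("Park", "boundary"),)

docs = {
    "roads1.gml": [ROAD_NAME_ONLY, ROAD_TREE],
    "roads2.gml": [ROAD_TREE],
    "parks.gml": [PARK_TREE],
    "misc.gml": [],
}

clusters, outliers = cluster_by_representative(docs)
for tree, members in clusters.items():
    print(tree, "->", members)
print("outliers:", outliers)
```

In this sketch no pairwise tree comparison takes place: each document is visited once, the number of clusters equals the number of distinct representative trees, and the representative tree itself can double as the cluster description.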

References:

[1] Chawathe S S. Comparing hierarchical data in external memory[C]//Proceedings of the VLDB Conference. San Francisco: Morgan Kaufmann Publishers Inc, 1999: 90-101.
[2] De Francesca F, Gordano G, Ortale R, et al. A general framework for XML document clustering[R]. ICAR-CNR (Consiglio Nazionale delle Ricerche, Istituto di Calcolo e Reti ad Alte Prestazioni), 2003.
[3] Lian W, Cheung D W, Mamoulis, et al. An efficient and scalable algorithm for clustering XML documents by structure[J]. IEEE Transactions on Knowledge and Data Engineering, 2004, 16(1): 82-96.
[4] Guha S, Rastogi R, Shim K. ROCK: a robust clustering algorithm for categorical attributes[C]//Proceedings of ICDE99 (International Conference on Data Engineering). Los Alamitos: IEEE Computer Society, 1999: 512-521.
[5] Dalamagas T, Cheng T, Winkel K, et al. Clustering XML documents using structural summaries[C]//Current Trends in Database Technology-EDBT 2004 Workshops. Berlin: Springer, 2004: 547-556.
[6] Chi Y, Xia Y, Yang Y, et al. Mining closed and maximal frequent subtrees from databases of labeled rooted trees[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(2): 190-202.

Memo:
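Funding: Supported by the National Natural Science Foundation of China (40771163).
Corresponding author: Ji Genlin, professor and doctoral supervisor; research interests: data mining and its applications, XML technology. E-mail: glj@i njnu. edu. cn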
Last Update: 2013-04-23