
Clustering GML Documents by Structure Based on Closed Frequent Induced Subtrees

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2009, Issue 02
Page:
61-64
Research Field:
Publishing date:

Info

Title:
Clustering GML Documents by Structure Based on Closed Frequent Induced Subtrees
Author(s):
Miao Jianxin, Ji Genlin, Zhu Yingwen
School of Computer Sciences, Nanjing Normal University, Nanjing 210097, China
Keywords:
closed frequent induced subtree; clustering GML by structure; clustering
PACS:
TP311.13
DOI:
-
Abstract:
This paper presents MCF-CLU, an algorithm for clustering GML documents by structure. Unlike other algorithms, it clusters on the basis of closed frequent induced subtrees and does not need to compare the similarity between trees. The closed frequent induced subtrees of all the GML documents are computed first, and a representative closed frequent induced subtree is then obtained for every document. Documents that have the same representative tree are regarded as a cluster. During the clustering process, not only is the number of clusters obtained automatically, but a description of each cluster is also produced. In addition, isolated points among the documents can be found. The experimental results show that MCF-CLU is effective and that its performance is superior to that of other GML clustering algorithms.
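
The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MCF-CLU implementation. It assumes the closed frequent induced subtrees have already been mined and are supplied as a hypothetical `patterns` mapping from a subtree label to its node count and its set of supporting documents. The abstract does not say how the representative subtree is chosen, so the sketch assumes the largest supported subtree, with ties broken by higher support and then by label. The function name and the sample data are also illustrative.

```python
from collections import defaultdict


def cluster_by_representative(doc_ids, patterns):
    """Group documents that share the same representative subtree.

    patterns: dict mapping a subtree label to (node_count, supporting_doc_ids).
    The mining step that produces `patterns` is assumed to have run already.
    Returns (clusters, isolated): clusters maps a representative subtree label
    to the list of documents it represents; isolated lists the documents that
    contain no closed frequent induced subtree at all.
    """
    clusters = defaultdict(list)
    isolated = []
    for doc in doc_ids:
        # Every mined subtree that occurs in this document is a candidate.
        candidates = [
            (size, len(support), label)
            for label, (size, support) in patterns.items()
            if doc in support
        ]
        if not candidates:
            isolated.append(doc)  # no frequent structure: an isolated point
            continue
        # Assumed selection rule: largest subtree, then highest support.
        _, _, representative = max(candidates)
        clusters[representative].append(doc)
    return dict(clusters), isolated


# Hypothetical mined subtrees over five GML documents d1..d5.
patterns = {
    "road(name,geometry)": (3, {"d1", "d2"}),
    "road(name)": (2, {"d1", "d2", "d3"}),
    "river(geometry)": (2, {"d4"}),
}
clusters, isolated = cluster_by_representative(
    ["d1", "d2", "d3", "d4", "d5"], patterns
)
print(clusters)
print(isolated)
```

In this sketch the number of clusters is simply the number of distinct representative subtrees, and each representative doubles as a description of its cluster, which mirrors the two properties claimed in the abstract.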


Memo

Memo:
-
Last Update: 2013-04-23