Clustering GML Documents by Structure Based on Closed Frequent Induced Subtrees
Miao Jianxin, Ji Genlin, Zhu Yingwen
School of Computer Sciences, Nanjing Normal University, Nanjing 210097, China
closed frequent induced subtree; clustering GML by structure; clustering
This paper presents an algorithm, MCF-CLU, for clustering GML documents by structure. Unlike other algorithms, it clusters documents on the basis of their closed frequent induced subtrees and does not need to compute the similarity between trees. First, the closed frequent induced subtrees of all the GML documents are computed. Then the representative closed frequent induced subtree of each document is obtained. Documents that share the same representative tree are regarded as one cluster. During the clustering process, the number of clusters is obtained automatically, and a description of each cluster is obtained as well. In addition, isolated points among the documents can be found. The experimental results show that MCF-CLU is effective and that its performance is superior to that of other GML clustering algorithms.
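The pipeline described above (mine closed frequent induced subtrees, pick one representative per document, group documents by representative) can be illustrated with a small brute-force sketch. It is not the MCF-CLU implementation. Several assumptions are made here: each GML document is modeled as an unordered labeled tree of element tags, with attributes and text ignored; subtrees are mined by exhaustive enumeration, which is only feasible for tiny trees; and since the abstract does not say how a representative is chosen or how isolated points are defined, the rules used here are hypothetical. The representative is the largest closed frequent subtree a document contains, with ties broken by higher support. A document that contains no closed frequent subtree is treated as isolated. All function names (`n`, `canon`, `rooted_induced`, `all_induced`, `closed_frequent`, `cluster`) and the toy `DOCS` are illustrative.

```python
from collections import defaultdict
from itertools import product


def n(label, *children):
    """Build a tree node as (label, children); children are unordered."""
    return (label, tuple(children))


def canon(t):
    """Canonical string of an unordered labeled tree (children sorted)."""
    label, children = t
    return label + "(" + ",".join(sorted(canon(c) for c in children)) + ")"


def size(t):
    return 1 + sum(size(c) for c in t[1])


def rooted_induced(t):
    """All induced subtrees rooted at t: keep t, and for each child either
    drop it or attach one of the induced subtrees rooted at that child."""
    label, children = t
    options = [[None] + rooted_induced(c) for c in children]
    return [(label, tuple(c for c in choice if c is not None))
            for choice in product(*options)]


def all_induced(t):
    """Map canonical form -> tree for every induced subtree of t."""
    out, stack = {}, [t]
    while stack:
        node = stack.pop()
        for sub in rooted_induced(node):
            out[canon(sub)] = sub
        stack.extend(node[1])
    return out


def closed_frequent(docs, minsup):
    """Return {canon: (tree, doc_ids)} for closed frequent induced subtrees."""
    occ, trees = defaultdict(set), {}
    for i, d in enumerate(docs):
        for c, sub in all_induced(d).items():
            occ[c].add(i)
            trees[c] = sub
    freq = {c: frozenset(ids) for c, ids in occ.items() if len(ids) >= minsup}
    closed = {}
    for c, ids in freq.items():
        # Not closed if a strictly larger frequent pattern containing c
        # occurs in exactly the same set of documents.
        absorbed = any(other != c and ids2 == ids
                       and c in all_induced(trees[other])
                       for other, ids2 in freq.items())
        if not absorbed:
            closed[c] = (trees[c], ids)
    return closed


def cluster(docs, minsup):
    """Group documents by a (hypothetical) representative closed pattern."""
    closed = closed_frequent(docs, minsup)
    clusters, isolated = defaultdict(list), []
    for i in range(len(docs)):
        cands = [(size(t), len(ids), c)
                 for c, (t, ids) in closed.items() if i in ids]
        if not cands:
            isolated.append(i)  # no closed frequent subtree: isolated point
            continue
        clusters[max(cands)[2]].append(i)  # largest, then most supported
    return dict(clusters), isolated


# Toy GML-like documents (labels are illustrative element names).
DOCS = [
    n("FC", n("fm", n("Road", n("Line")))),
    n("FC", n("fm", n("Road", n("Line"))), n("fm", n("Road", n("Line")))),
    n("FC", n("fm", n("Bldg", n("Poly")))),
    n("FC", n("fm", n("Bldg", n("Poly"))), n("fm", n("Bldg", n("Poly")))),
    n("Lake", n("Surface")),
]
clusters, isolated = cluster(DOCS, minsup=2)
print(clusters)
print(isolated)
```

With `minsup=2`, the two road documents and the two building documents should each form a cluster keyed by their shared representative subtree, and the lake document should be reported as isolated. The cluster key doubles as a readable description of the cluster, which mirrors the abstract's point that cluster descriptions come out of the process at no extra cost.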


