[1] Zhu Xuefang, Han Zhanxiao. Design and Implementation of a Web Crawler for Images[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 115-117.

Design and Implementation of a Web Crawler for Images

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
08
Issue:
2008, No. 04
Pages:
115-117
Section:
Publication date:
2008-12-30

Article Info

Title:
Design and Implementation of a Web Crawler for Images
Author(s):
Zhu Xuefang; Han Zhanxiao
Department of Information Management, Nanjing University, Nanjing 210093, Jiangsu, China
Keywords:
anchor text; link context; Web crawler; JXTA; topical crawler
CLC number:
TP393.092
Abstract:
This paper designs and implements a topical Web crawler for images. It adopts a heuristic method based on text content: topic relevance is determined from the anchor text of image files and the surrounding context, so that relevant image resources are fetched more accurately. Topic relevance is also determined for Web pages, so that the crawler's crawling path is guided more effectively. Experiments show that the system achieves a certain degree of optimization and lays a good foundation for collecting image information on targeted topics.
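The relevance test on anchor text described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AnchorImageParser`, the functions `relevance` and `relevant_images`, the term-overlap score, and the 0.5 threshold are all assumptions made for illustration. The sketch also reads "context" narrowly, as the text inside the enclosing link plus the image's `alt` attribute, whereas the paper may use a wider window of surrounding text.

```python
from html.parser import HTMLParser
import re

IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif")  # assumed set of image suffixes


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class AnchorImageParser(HTMLParser):
    """Hypothetical helper: pair each linked image with its anchor text."""

    def __init__(self):
        super().__init__()
        self.found = []      # (image_url, anchor text) pairs, in page order
        self._href = None    # href of the <a> currently open, if any
        self._imgs = []      # image URLs seen inside the open <a>
        self._text = []      # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href = attrs.get("href") or ""
            self._imgs, self._text = [], []
            # A link that points directly at an image file.
            if self._href.lower().endswith(IMAGE_EXT):
                self._imgs.append(self._href)
        elif tag == "img" and self._href is not None:
            # An image embedded in a link; its alt text counts as context.
            if attrs.get("src"):
                self._imgs.append(attrs["src"])
            self._text.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join(self._text)
            self.found.extend((url, text) for url in self._imgs)
            self._href = None


def relevance(text, topic_terms):
    """Fraction of topic terms that occur in the text (0.0 to 1.0)."""
    terms = {t.lower() for t in topic_terms}
    if not terms:
        return 0.0
    return len(tokenize(text) & terms) / len(terms)


def relevant_images(html, topic_terms, threshold=0.5):
    """Return image URLs whose anchor text meets the assumed threshold."""
    parser = AnchorImageParser()
    parser.feed(html)
    return [url for url, text in parser.found
            if relevance(text, topic_terms) >= threshold]


page = ('<a href="tiger.jpg">Bengal tiger photo</a> '
        '<a href="car.png">red sports car</a> '
        '<a href="page.html"><img src="cub.gif" alt="tiger cub"></a>')
print(relevant_images(page, ["tiger", "cub"]))
```

The same `relevance` score could in principle be applied to the text of a whole page to decide which links to follow first, which is the second use of topic relevance the abstract mentions.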

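Memo

Memo:
Corresponding author: Zhu Xuefang, professor, Ph.D. Research interests: computer image/signal processing, pattern recognition, and the theory and technology of information retrieval automation. E-mail: xfzhu@nju.edu.cn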
Last Update: 2013-04-24