[1] He Sheng, Qu Weiguang, Lu Yajun. Applying CLUCENE in Corpus Building[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 118-122.

Applying CLUCENE in Corpus Building

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
Vol. 08
Issue:
2008, No. 04
Pages:
118-122
Column:
Publication date:
2008-12-30

Article Info

Title:
Applying CLUCENE in Corpus Building
Author(s):
He Sheng1, Qu Weiguang2, Lu Yajun3
1. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China; 2. School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, China;
3. School of Tibetan Language and Culture, Northwest University for Nationalities, Lanzhou 730030, China
Keywords:
CLucene; corpus; corpus building
Classification number:
TP391.1
Abstract:
This paper examines in depth the construction models of existing corpora and the functional modules a corpus should provide. A corpus design based on the file system and the CLucene full-text search engine package is proposed. Experiments show that CLucene provides a rich set of interfaces and extends easily to large quantities of data. These characteristics make the package a promising platform for corpus building.

References:

[1] He Tingting. Study on data management of corpus[C]//Proceedings of the 1st Students Workshop on Computational Linguistics. Beijing: Tsinghua University Press, 2002: 307-310. (in Chinese)
[2] Jin Tianrong. Research on the document database and relationship database[J]. Microcomputer Information, 2008(3): 173-174. (in Chinese)
[3] Fu Aiping. Study and application summarization of corpus[DB/OL]. [2007-10-22]. http://ccl.pku.edu.cn/doubtfire/CorpusLinguistics/Introduction/FuAiping-Corpus-introduction.pdf. (in Chinese)
[4] He Sheng. Research of full-text retrieval system for large-scale corpus[J]. Library & Information, 2008(4): 93-97. (in Chinese)
[5] He Sheng. Chinese full-text retrieval system based on Lucene[J]. Chinese University Technology Transfer, 2007(6): 142-144. (in Chinese)
[6] CLucene - a C++ Search Engine[EB/OL]. [2007-10-12]. http://sourceforge.net/projects/clucene.

Memo:
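Funding: Supported by the Social Science Foundation of Jiangsu Province (07YYB003, 06JSBYY001); the National Natural Science Foundation of China (60773173); the National Social Science Foundation of China (07BYY050); the National Social Science Foundation of China 2005 Key Project (05AYY001); and the National "973" Program (2004CB318102).
Corresponding author: He Sheng, lecturer, doctoral candidate; research interest: Chinese information processing. E-mail: hesheng99@sina.com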
Last Update: 2013-04-24