[1] Ding Dexin, Qu Weiguang, Xu Tao, Dong Yu. Research of Disambiguating Combinational Ambiguity in Chinese Word Segmentation Based on CRF[J]. Journal of Nanjing Normal University (Engineering and Technology), 2008, 08(04): 073-76.

Research on Combinational Ambiguity Resolution Based on the CRF Model
Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
08
Issue:
2008, No. 04
Pages:
073-76
Column:
Publication date:
2008-12-30

Article Info

Title:
Research of Disambiguating Combinational Ambiguity in Chinese Word Segmentation Based on CRF
Author(s):
Ding Dexin¹, Qu Weiguang¹, Xu Tao¹, Dong Yu²
1. School of Mathematics and Computer Science, Nanjing Normal University, Nanjing 210097, China; 2. Longpan School, Jinling Institute of Technology, Nanjing 211169, China
Keywords:
Chinese word segmentation; combinational ambiguity; CRF
CLC number:
TP311
Abstract:
Combinational ambiguity is one of the difficult points in Chinese word segmentation. Based on the CRF (Conditional Random Fields) model, this paper builds feature templates from the contextual words and parts of speech of the ambiguous field and uses them to resolve the ambiguity. Ten frequently used combinationally ambiguous fields are tested on half a year of the 1998 People's Daily corpus, and the average disambiguation accuracy reaches 96.35%. The experimental results show that the model effectively improves disambiguation accuracy.
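As a rough illustration of what "feature templates from the contextual words and parts of speech" could look like, the following Python sketch collects word and part-of-speech features in a window around an ambiguous field. It is not the authors' implementation: the function name `context_features`, the window size of 2, the feature key format, the padding symbol, and the example sentence with its tags are all assumptions made for this example. The resulting dictionary is the kind of input a CRF toolkit would typically consume, with the label (one word or two) supplied separately.

```python
from typing import Dict, List, Tuple

# A token is a (word, part-of-speech tag) pair; tags here are illustrative.
Token = Tuple[str, str]


def context_features(tokens: List[Token], index: int, window: int = 2) -> Dict[str, str]:
    """Collect word and POS features around the ambiguous field at `index`.

    The window size and key names are assumptions for this sketch,
    not the templates used in the paper.
    """
    features = {"field": tokens[index][0]}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the ambiguous field itself is already recorded
        pos = index + offset
        if 0 <= pos < len(tokens):
            word, tag = tokens[pos]
        else:
            # Positions outside the sentence get a padding symbol.
            word, tag = "<PAD>", "<PAD>"
        features[f"w[{offset:+d}]"] = word  # context word
        features[f"p[{offset:+d}]"] = tag   # context part of speech
    return features


# Hypothetical example: the field "才能" can be one word or two ("才" + "能").
sentence = [("他", "r"), ("的", "u"), ("才能", "n"), ("很", "d"), ("突出", "a")]
feats = context_features(sentence, index=2)
for key, value in feats.items():
    print(key, value)
```

Keeping words and tags as separate keyed features mirrors how template-based CRF tools combine columns at relative offsets; a wider window or conjoined features would be natural variations.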

Memo

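Funding: supported by the National Natural Science Foundation of China (60773173); the National "973" Program of China (2004CB318102); the Jiangsu Provincial Social Science Foundation (06JSBYY001, 07YYB003); and the National Social Science Foundation of China (07BYY050).
Corresponding author: Qu Weiguang, Ph.D., associate professor; research interests: computational linguistics and artificial intelligence. E-mail: wgqu@njnu.edu.cn
Last Update: 2013-04-24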