[1] Sun Zhengyan, Chen Lei, Wei Subo, et al. Named Entity Recognition Based on Boundary Information and Word Information Enhancement[J]. Journal of Nanjing Normal University (Engineering and Technology), 2024, 24(04): 79-86. [doi:10.3969/j.issn.1672-1292.2024.04.008]

Chinese Named Entity Recognition Based on Boundary Information and Word Information Enhancement

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
24
Issue:
No. 04, 2024
Pages:
79-86
Section:
Computer Science and Technology
Publication date:
2024-12-15

Article Info

Title:
Named Entity Recognition Based on Boundary Information and Word Information Enhancement
Article number:
1672-1292(2024)04-0079-08
Author(s):
Sun Zhengyan1, Chen Lei1, Wei Subo2, Chen Baoguo1
(1. College of Computer, Huainan Normal University, Huainan 232038, China)
(2. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China)
Keywords:
named entity recognition; location information; category description information; multi-level text features
Classification number:
TP391
DOI:
10.3969/j.issn.1672-1292.2024.04.008
Document code:
A
Abstract:
Chinese named entity recognition (NER) is a natural language processing (NLP) technique that extracts entity pairs and is widely used in knowledge graph construction and information extraction tasks. Traditional Chinese NER methods focus mainly on character-level analysis and neglect important aspects such as location and word features, which hinders the accurate identification of entity boundaries. This paper introduces an enhanced Chinese NER model that emphasizes both boundary and word information in order to calibrate entity boundaries precisely. First, multi-level text features are constructed as the input of the model. Then, a strategy that fuses location information with category description information is proposed to strengthen the semantic representation. Finally, a conditional random field (CRF) model maps the enhanced feature vectors to the output label sequence, so that all entities and their category labels are extracted accurately. On the existing OntoNotes, Resume, and Weibo datasets, the model improves the F1 score by 0.82%, 0.78%, and 1.51%, respectively, which verifies its effectiveness.
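
The final step of the abstract, mapping per-character feature scores to a label sequence with a CRF, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the function names `viterbi_decode` and `bio_spans`, the BIO label set, and all scores in the example are assumptions made up for illustration, and the feature fusion that would produce the emission scores is not reproduced here.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    emissions[t][y] is the score of label y at position t; transitions[(a, b)]
    is the score of moving from label a to label b (missing pairs score 0.0).
    """
    if not emissions:
        return []
    # best[y] = score of the best path that ends in label y at the current position
    best = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for step in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, y), 0.0))
            new_best[y] = best[prev] + transitions.get((prev, y), 0.0) + step[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label to recover the path
    path = [max(labels, key=lambda y: best[y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags to (start, end_exclusive, entity_type) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


# Hypothetical scores for a four-character sentence (illustrative values only)
LABELS = ["O", "B-LOC", "I-LOC"]
EMISSIONS = [
    {"O": 0.1, "B-LOC": 2.0, "I-LOC": 0.5},
    {"O": 1.0, "B-LOC": 0.2, "I-LOC": 1.2},
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.3},
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.2},
]
# An I- tag directly after O is penalised; B-LOC followed by I-LOC is rewarded
TRANSITIONS = {("O", "I-LOC"): -10.0, ("B-LOC", "I-LOC"): 1.0}

if __name__ == "__main__":
    tags = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
    print(tags)             # ['B-LOC', 'I-LOC', 'O', 'O']
    print(bio_spans(tags))  # [(0, 2, 'LOC')]
```

The transition scores are what let a CRF enforce boundary consistency: a large negative score on the pair `("O", "I-LOC")` keeps the decoder from starting an entity in the middle, which is one reason a CRF output layer is commonly used when entity boundaries matter.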



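Memo:
Received: 2024-05-12.
Funding: Key Project of the Anhui Provincial Scientific Research Plan (2024AH051731); State Key Laboratory Open Fund Project (COGOS-2023HE02); Huainan Guiding Science and Technology Plan Project (4302); Huainan Normal University Key Special Project (Basic Education) (2023XJZD025).
Corresponding author: Chen Lei, master's degree, professor; research interests: data mining, entity recognition. Email: leichen@hnnu.edu.cn
Last Update: 2024-12-15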