Chen Jie, Xi Xuefeng, Pi Zhou, et al. ALBERT-Based Named Entity Recognition of Chinese Medical Records[J]. Journal of Nanjing Normal University (Engineering and Technology), 2021(01):36-43. [doi:10.3969/j.issn.1672-1292.2021.01.006]

ALBERT-Based Named Entity Recognition of Chinese Medical Records

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Issue:
2021, No. 01
Pages:
36-43
Section:
Computer Science and Technology
Publication date:
2021-03-15

Article Info

Title:
ALBERT-Based Named Entity Recognition of Chinese Medical Records
Article ID:
1672-1292(2021)01-0036-08
Author(s):
Chen Jie¹, Xi Xuefeng¹,², Pi Zhou¹, Victor S. Sheng³, Cui Zhiming¹,²
(1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China) (2. Suzhou Smart City Research Institute, Suzhou 215009, China) (3. Computer Science Department, Texas Tech University, Texas 79431, USA)
Keywords:
ALBERT; named entity recognition; clinical electronic medical records; BiLSTM; CRF
CLC number:
TP181
DOI:
10.3969/j.issn.1672-1292.2021.01.006
Document code:
A
Abstract:
Named entity recognition (NER) on medical records aims to convert the unstructured text of clinical electronic medical records into structured data, providing fundamental support for data mining in medical-domain tasks. This paper proposes an NER method for Chinese medical records that combines ALBERT with a fusion model. First, the sample dataset is expanded through manual annotation, and ALBERT is fine-tuned on it. Second, a bidirectional long short-term memory network (BiLSTM) extracts global features of the text. Finally, a conditional random field (CRF) assigns sequence tags to the named entities. Experimental results on a standard dataset show that the proposed method further improves the accuracy of named entity recognition on medical text and reduces the time overhead.
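The final tagging stage described in the abstract (per-token features scored against each tag, decoded by a CRF into entity labels, then turned into structured records) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: ALBERT and the BiLSTM are not modelled at all, the BIO tag scheme and the `DRUG` entity type are hypothetical, and the emission scores (standing in for BiLSTM outputs) and the transition scores from the hypothetical helper `build_transitions` (standing in for learned CRF parameters, with an assumed -100 penalty for invalid moves) are hand-set for illustration. `viterbi_decode` finds the highest-scoring tag path of a linear-chain CRF, and `bio_to_entities` groups the decoded tags into character spans.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set with a single illustrative entity type.
TAGS = ["O", "B-DRUG", "I-DRUG"]
NEG = -100.0  # assumed large penalty marking "forbidden" transitions

CHARS = "服用阿司匹林"  # "take aspirin"
# Toy per-character scores for [O, B-DRUG, I-DRUG] (stand-ins for BiLSTM outputs).
# Note that 阿 locally prefers I-DRUG, which is invalid right after O.
EMISSIONS = [
    [2.0, 0.1, 0.0],
    [2.0, 0.1, 0.0],
    [0.5, 1.0, 1.2],
    [0.0, 0.2, 2.0],
    [0.0, 0.2, 2.0],
    [0.0, 0.2, 2.0],
]


@dataclass
class Entity:
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    text: str


def build_transitions(tags: Sequence[str]) -> Tuple[List[List[float]], List[float]]:
    """Hand-set CRF scores: I-X may only follow B-X or I-X and may not start a sequence."""
    trans = [[0.0] * len(tags) for _ in tags]
    for i, prev in enumerate(tags):
        for j, cur in enumerate(tags):
            if cur.startswith("I-") and prev[2:] != cur[2:]:
                trans[i][j] = NEG
    start = [NEG if t.startswith("I-") else 0.0 for t in tags]
    return trans, start


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Return the highest-scoring tag-index path of a linear-chain CRF."""
    n = len(start)
    # score[j]: best total score of any path ending in tag j at the current position
    score = [start[j] + emissions[0][j] for j in range(n)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n):
            # Best previous tag i from which to move into tag j at position t
            best_i = max(range(n), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    best = max(range(n), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


def bio_to_entities(chars: str, tags: Sequence[str]) -> List[Entity]:
    """Group BIO tags into entity spans over the input characters."""
    spans: List[list] = []  # each item is [label, start, end]
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and spans and spans[-1][0] == tag[2:] and spans[-1][2] == i:
            spans[-1][2] = i + 1  # I-X directly continues the open X entity
        elif tag != "O":
            spans.append([tag[2:], i, i + 1])  # B-X (or a stray I-X) opens a new entity
    return [Entity(label, s, e, chars[s:e]) for label, s, e in spans]


if __name__ == "__main__":
    trans, start = build_transitions(TAGS)
    tags = [TAGS[k] for k in viterbi_decode(EMISSIONS, trans, start)]
    print(tags)  # ['O', 'O', 'B-DRUG', 'I-DRUG', 'I-DRUG', 'I-DRUG']
    print(bio_to_entities(CHARS, tags))  # [Entity(label='DRUG', start=2, end=6, text='阿司匹林')]
```

With these toy scores, choosing each character's best tag independently would label 阿 as `I-DRUG` directly after `O`, an invalid BIO sequence; the transition penalty lets Viterbi decoding pick `B-DRUG` instead. This label-to-label dependency is what a CRF layer adds on top of per-token features.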

References:

[1]BIKEL D M,SCHWARTZ R,WEISCHEDEL R M. An algorithm that learns what’s in a name[J]. Machine Learning,1999,34(1/2/3):211-231.
[2]LIAO W H,VEERAMACHANENI S. A simple semi-supervised algorithm for named entity recognition[C]//The Proceedings of NAACL HLT 2009. Boulder,USA:ACL,2009:58-65.
[3]RATINOV L,ROTH D. Design challenges and misconceptions in named entity recognition[C]//Proceedings of the Thirteenth Conference on Computational Natural Language Learning(CoNLL-2009). Boulder,USA:ACL,2009:147-155.
[4]TSAI T H,WU S H,LEE C W,et al. Mencius:a Chinese named entity recognizer using the maximum entropy-based hybrid model[J]. International Journal of Computational Linguistics and Chinese Language Processing,2004,9(1):65-82.
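[5]CHEN Y F,ZONG C Q,SU K Y. An interactive method for Chinese-English bilingual named entity recognition and alignment[J]. Chinese Journal of Computers,2011,34(9):1688-1696. (in Chinese)
[6]ZHANG H N,WU D Y,LIU Y,et al. Chinese named entity recognition based on deep neural network[J]. Journal of Chinese Information Processing,2017,31(4):28-35. (in Chinese)
[7]YANG J F,GUAN Y,HE B,et al. Corpus construction for named entities and entity relations on Chinese electronic medical records[J]. Journal of Software,2016,27(11):2725-2746. (in Chinese)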
[8]YOUNG T,HAZARIKA D,PORIA S,et al. Recent trends in deep learning based natural language processing[J]. IEEE Computational Intelligence Magazine,2018,13(3):55-75.
[9]ASAHARA M,MATSUMOTO Y. Japanese named entity extraction with redundant morphological analysis[C]//Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Edmonton,Canada:ACL,2003:8-15.
[10]CHEN A,PENG F,SHAN R,et al. Chinese named entity recognition with conditional probabilistic models[C]//Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing. Sydney,Australia:ACL,2006:173-176.
[11]CHEN Y,ZHOU C J,LI T X,et al. Named entity recognition from Chinese adverse drug event reports with lexical feature based BiLSTM-CRF and tri-training[J]. Journal of Biomedical Informatics,2019,96:103252.
[12]HUANG Z H,XU W,YU K. Bidirectional LSTM-CRF models for sequence tagging[J/OL]. arXiv preprint arXiv:1508.01991,2015.
[13]STRUBELL E,VERGA P,BELANGER D,et al. Fast and accurate entity recognition with iterated dilated convolutions[C]//EMNLP. Copenhagen,Denmark:ACL,2017:2670-2680.
[14]LIU K X,HU Q C,LIU J W. Named entity recognition in Chinese electronic medical records based on CRF[C]//2017 14th Web Information Systems and Applications Conference(WISA). Jeju,Korea:IEEE,2017:105-110.
[15]LIU Z J,YANG M,WANG X L,et al. Entity recognition from clinical texts via recurrent neural network[J]. BMC Medical Informatics and Decision Making,2017,17:53-61.
[16]QIU J,QI W,ZHOU Y,et al. Fast and accurate recognition of Chinese clinical named entities with residual dilated convolutions[C]//2018 IEEE International Conference on Bioinformatics and Biomedicine(BIBM). Madrid,Spain:IEEE,2018:935-942.
[17]PETERS M E,NEUMANN M,IYYER M,et al. Deep contextualized word representations[C]//Proceedings of NAACL-HLT. New Orleans,USA:ACL,2018:2227-2237.
[18]DEVLIN J,CHANG M W,LEE K,et al. BERT:pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. Minneapolis,USA:ACL,2019:4171-4186.
[19]LAN Z,CHEN M,GOODMAN S,et al. ALBERT:a lite BERT for self-supervised learning of language representations[C]//International Conference on Learning Representations(ICLR 2020). 2020.
[20]HOCHREITER S,SCHMIDHUBER J. Long short-term memory[J]. Neural Computation,1997,9(8):1735-1780.
[21]LAMPLE G,BALLESTEROS M,SUBRAMANIAN S,et al. Neural architectures for named entity recognition[C]//NAACL-HLT. San Diego,USA:ACL,2016:260-270.
[22]LUO L,YANG Z,YANG P,et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition[J]. Bioinformatics,2018,34(8):1381-1388.
[23]VASWANI A,SHAZEER N,PARMAR N,et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. Long Beach,USA:NeurIPS,2017:6000-6010.

Memo:
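Received: 2020-08-08.
Funding: National Natural Science Foundation of China (61673290, 61876217); Jiangsu Province "Six Talent Peaks" High-Level Talent Project (XYDXX-086); Suzhou Science and Technology Development Plan, Industrial Prospective Project (SYG201817); 2020 Jiangsu Province Postgraduate Research Innovation Program (KYCX20_2762).
Corresponding author: Xi Xuefeng, associate professor; research interests: natural language processing, high-performance parallel computing, and applications of object-oriented technology. E-mail: xfxi@usts.edu.cn
Last Update: 2021-03-15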