[1] Yang Denghui, Liu Jing. Chinese Event Extraction Method Based on RBBLC Model[J]. Journal of Nanjing Normal University (Engineering and Technology), 2022, 22(03): 038-44, 82. [doi:10.3969/j.issn.1672-1292.2022.03.006]

Chinese Event Extraction Method Based on RBBLC Model

Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:
22
Issue:
2022, No. 03
Pages:
038-44,82
Section:
Computer Science and Technology
Publication date:
2022-09-15

Article Info

Title:
Chinese Event Extraction Method Based on RBBLC Model
Article ID:
1672-1292(2022)03-0038-07
Author(s):
Yang Denghui (杨登辉), Liu Jing (刘靖)
(College of Computer Science, Inner Mongolia University, Hohhot 010021, China)
Keywords:
event extraction; RoBERTa; bidirectional LSTM; sequence tagging; text big data analysis
CLC number:
TP311.5
DOI:
10.3969/j.issn.1672-1292.2022.03.006
Document code:
A
Abstract:
In big data analysis for public security, procuratorial, and judicial organs and for discipline inspection and supervision, structured data and unstructured text are often the main data sources. Business analysis on such data must focus on extracting the implicit associations behind the data, and event extraction is the core foundation for association analysis of such text. Previous event extraction work performed event trigger identification and event element (argument) identification separately: the triggers and event types produced by the first step are fed into the subsequent element identification step, which causes error propagation. In addition, the word vectors built by earlier representation-based methods are weak at capturing sentence-level features. This paper proposes RBBLC, a joint extraction model that performs event identification and event element identification simultaneously as a sequence labeling task. The RBBLC model uses RoBERTa to build word vectors carrying richer contextual information, then applies a BiLSTM-CNN network structure to capture associations within the sentence and predict trigger and argument labels as well as event types. Extraction experiments and an analysis of the results on the CEC corpus show that, compared with the baseline method, the F1 score, precision, and recall of the proposed method improve by 16%, 28%, and 24% respectively, effectively improving event extraction performance.
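To make the sequence labeling formulation concrete, the following is a minimal sketch of how a single tag sequence can carry both triggers and event elements, and how span-level precision, recall, and F1 can be computed from it. It is not the authors' implementation: the BIO scheme, the label names (`Trigger`, `Time`, `Location`), the example sentence, and the helper names `decode_bio` and `prf` are all assumptions chosen for illustration, and the RoBERTa and BiLSTM-CNN layers that would produce the predicted tags are omitted, as is event type prediction.

```python
from typing import List, Tuple

# A span is (label, start, end_exclusive) over character positions.
Span = Tuple[str, int, int]


def decode_bio(tags: List[str]) -> List[Span]:
    """Collect labeled spans from a BIO tag sequence (hypothetical tag scheme)."""
    spans: List[Span] = []
    label, start = None, 0
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return spans


def prf(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Span-level precision, recall, and F1; a span counts only on exact match."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Illustrative character-level example (not taken from the CEC corpus).
chars = list("昨日北京发生地震")
gold_tags = ["B-Time", "I-Time", "B-Location", "I-Location",
             "O", "O", "B-Trigger", "I-Trigger"]
# A hypothetical model output that truncates the location span.
pred_tags = ["B-Time", "I-Time", "B-Location", "O",
             "O", "O", "B-Trigger", "I-Trigger"]

gold_spans = decode_bio(gold_tags)
pred_spans = decode_bio(pred_tags)
extracted = [(label, "".join(chars[s:e])) for label, s, e in gold_spans]
precision, recall, f1 = prf(gold_spans, pred_spans)
```

Because triggers and elements are decoded from the same tag sequence, no trigger prediction is passed forward to a second stage, which is the error propagation path the joint formulation is meant to avoid.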

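Memo

Received: 2022-03-16.
Funding: National Natural Science Foundation of China (61662051), Inner Mongolia Science and Technology Plan Project (2019GG372), and Open Project of the Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory (IMDBD202005).
Corresponding author: Liu Jing, Ph.D., professor. Research interests: cloud computing and big data analysis, software reliability validation. E-mail: liujing@imu.edu.cn
Last Update: 2022-09-15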