
A Multimodal Emotion Recognition Method Based on Decision Level Fusion

Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN: 1006-6977 / CN: 61-1281/TN]

Issue:
2022, No. 2
Page:
35-40
Research Field:
Computer Science and Technology
Publishing date:

Info

Title:
A Multimodal Emotion Recognition Method Based on Decision Level Fusion
Author(s):
Han Tianyi(1,2), Lin Rongheng(1,2)
(1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Keywords:
emotion recognition; convolutional neural network; combination of software and hardware; multimodal; decision-level fusion
CLC Number:
TP391
DOI:
10.3969/j.issn.1672-1292.2022.02.006
Abstract:
This paper designs a multimodal emotion recognition system that combines software and hardware. The system uses Mel-frequency cepstral coefficients (MFCC) and convolutional neural networks to recognize and classify emotions from speech and facial expressions. Speech emotion recognition is offloaded to a neural-network compute stick to reduce the load on the host environment. For modal fusion, decision-level fusion is used to improve recognition accuracy. Experimental results show that the system achieves high recognition accuracy and maintains its running speed on hardware with limited performance.
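The abstract describes decision-level fusion of the two modality classifiers but does not specify the combination rule. A common realization, sketched below as an assumption, is a weighted average of each classifier's softmax probability vector followed by an argmax over emotion classes; the weights, class set, and function name here are illustrative, not taken from the paper.

```python
import numpy as np

def decision_level_fusion(speech_probs, face_probs, w_speech=0.4, w_face=0.6):
    """Fuse per-modality class probabilities by weighted averaging.

    Returns the index of the winning emotion class and the fused
    probability vector. Weights are hypothetical and would normally
    be tuned on a validation set.
    """
    speech_probs = np.asarray(speech_probs, dtype=float)
    face_probs = np.asarray(face_probs, dtype=float)
    fused = w_speech * speech_probs + w_face * face_probs
    return int(np.argmax(fused)), fused

# Hypothetical softmax outputs over [angry, happy, neutral, sad]
speech = [0.10, 0.55, 0.25, 0.10]   # speech CNN on MFCC features
face   = [0.05, 0.30, 0.60, 0.05]   # facial-expression CNN
label, fused = decision_level_fusion(speech, face)
```

Because each modality's vector sums to 1 and the weights sum to 1, the fused vector is also a valid probability distribution; fusing at the decision level (rather than concatenating features) lets each modality run on separate hardware, consistent with the paper's offloading of speech inference to a compute stick.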

