

An Improved Speech Coding Algorithm Based on GMM and Polynomial Fitting


Journal of Nanjing Normal University (Engineering and Technology Edition) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 17
Issue:: 2017, No. 02

Page:: 063

Section:: Computer Engineering

Publication date:: 2017-06-30

Article Info

Title:: An Improved Speech Coding Algorithm Based on GMM and Polynomial Fitting

Article ID:: 1672-1292(2017)02-0063-07

Author(s):: Wang Rongrong¹; Li Ping²; Zeng Yumin¹; Wei Yi¹ (1. School of Physical Science and Technology, Nanjing Normal University, Nanjing 210023, China) (2. College of Information Technology, Taizhou Polytechnic College, Taizhou 225300, China)

Keywords:: speech coding; GMM; polynomial fitting; Vandermonde matrix

CLC number:: TN912.3

DOI:: 10.3969/j.issn.1672-1292.2017.02.010

Document code:: A

Abstract:: An improved vocoder based on the Gaussian mixture model (GMM) and polynomial fitting, denoted pGMM, is proposed. After a GMM is used to parameterize the short-time speech spectral envelope, a certain number of consecutive frames is grouped into a segment. Exploiting the correlation of spectral features between neighboring frames, the GMM parameters within each segment are jointly encoded by fitting polynomial trajectories, which further reduces the number of parameters. Simulation results show that the pGMM vocoder achieves a significantly lower bit rate than the GMM-based vocoder.
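
The segment-level polynomial fitting described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_segment` and `reconstruct_segment`, the normalized time axis on [-1, 1], the polynomial order, and the layout of one row of GMM parameters per frame are all hypothetical, and GMM estimation and quantization are omitted. The least-squares fit is set up through a Vandermonde matrix, in line with the keywords.

```python
import numpy as np


def fit_segment(params, order):
    """Fit one polynomial trajectory per parameter across a segment.

    params: array of shape (n_frames, n_params), one row of (assumed)
            GMM parameters per frame.
    order:  polynomial order (assumed; must be less than n_frames).
    Returns coefficients of shape (order + 1, n_params).
    """
    n_frames = params.shape[0]
    # Normalized frame positions; an assumption chosen for conditioning.
    t = np.linspace(-1.0, 1.0, n_frames)
    # Vandermonde matrix with columns 1, t, t^2, ..., t^order.
    V = np.vander(t, order + 1, increasing=True)
    # Least-squares solution of V @ coeffs ~= params, all parameters at once.
    coeffs, *_ = np.linalg.lstsq(V, params, rcond=None)
    return coeffs


def reconstruct_segment(coeffs, n_frames):
    """Evaluate the fitted trajectories at every frame of the segment."""
    t = np.linspace(-1.0, 1.0, n_frames)
    V = np.vander(t, coeffs.shape[0], increasing=True)
    return V @ coeffs


if __name__ == "__main__":
    # Synthetic data only: 10 frames, 12 slowly varying parameters.
    rng = np.random.default_rng(0)
    t = np.linspace(-1.0, 1.0, 10)
    frames = np.outer(t, rng.normal(size=12)) + rng.normal(size=12)
    coeffs = fit_segment(frames, order=3)
    decoded = reconstruct_segment(coeffs, n_frames=10)
    print("values per segment before:", frames.size)
    print("values per segment after: ", coeffs.size)
    print("max abs error:", np.max(np.abs(decoded - frames)))
```

In this sketch a segment of 10 frames with 12 parameters each (120 values) is represented by 4 coefficients per parameter (48 values). The saving depends on the assumed segment length and polynomial order, which trade bit rate against reconstruction error and coding delay.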


Memo

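Memo:: Received: 2016-09-18.
Funding: Jiangsu Provincial Science and Technology Support Program (BE2014139); Natural Science Foundation of Jiangsu Province (BK2010546).
Corresponding author: Zeng Yumin, professor; research interests: speech signal processing and image processing. E-mail: zengyumin@njnu.edu.cn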


Last Update: 2017-06-30