
An Improved Speech Coding Algorithm Based on GMM and Polynomial Fitting




Wang Rongrong1, Li Ping2, Zeng Yumin1, Wei Yi1
(1. School of Physical Science and Technology, Nanjing Normal University, Nanjing 210023, China) (2. College of Information Technology, Taizhou Polytechnic College, Taizhou 225300, China)
Keywords: speech coding; GMM; polynomial fitting; Vandermonde matrix
A vocoder based on polynomial fitting and the Gaussian mixture model (pGMM) is proposed. In the vocoder, a GMM is first used to parameterize the short-time speech spectrum envelope of each frame, and several frames are then collected into a segment. Exploiting the correlation between neighboring frames, a polynomial trajectory is fitted to the GMM parameters within each segment, thus reducing the number of parameters. The results show that the bit rate of the pGMM vocoder is lower than that of the vocoder based on GMM alone.
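The segment-level fitting step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the segment length, the polynomial order, or which GMM parameters are fitted, so the 8-frame segment, the cubic order, the sample values, and the helper names (`vandermonde`, `solve`, `fit_trajectory`, `eval_trajectory`) are all assumptions made for illustration. The sketch fits one parameter's trajectory by least squares on a Vandermonde matrix, so that a handful of coefficients stand in for one value per frame.

```python
# Sketch only: fit one GMM parameter's trajectory across a segment of frames
# with a low-order polynomial, using a Vandermonde least-squares system.
# Segment length, polynomial order and sample values are assumptions.

def vandermonde(ts, order):
    """Rows [1, t, t^2, ..., t^order], one row per frame position t."""
    return [[t ** k for k in range(order + 1)] for t in ts]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_trajectory(values, order):
    """Least-squares polynomial coefficients for one parameter over a segment."""
    n = len(values)
    # Map frame indices to [0, 1] to keep the system well conditioned.
    ts = [i / (n - 1) for i in range(n)]
    v = vandermonde(ts, order)
    # Normal equations: (V^T V) c = V^T y
    vtv = [[sum(v[r][i] * v[r][j] for r in range(n)) for j in range(order + 1)]
           for i in range(order + 1)]
    vty = [sum(v[r][i] * values[r] for r in range(n)) for i in range(order + 1)]
    return solve(vtv, vty)

def eval_trajectory(coeffs, n_frames):
    """Decoder side: rebuild the per-frame values from the coefficients."""
    ts = [i / (n_frames - 1) for i in range(n_frames)]
    return [sum(c * t ** k for k, c in enumerate(coeffs)) for t in ts]

# Hypothetical trajectory of one Gaussian mean (in Hz) over an 8-frame segment.
means = [500.0, 520.0, 545.0, 560.0, 570.0, 575.0, 572.0, 565.0]
coeffs = fit_trajectory(means, order=3)
decoded = eval_trajectory(coeffs, len(means))
max_err = max(abs(a - b) for a, b in zip(means, decoded))
print(len(means), "values ->", len(coeffs), "coefficients; max error", round(max_err, 2))
```

In this sketch the saving comes from sending `order + 1` coefficients per parameter per segment instead of one value per frame; the trade-off between segment length, polynomial order and reconstruction error would have to be tuned on real speech data.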




Last Update: 2017-06-30