
[1] Ren Xinshe, Miao Hua, Ma Qingyu. A Speech Segmentation Algorithm Based on Improved Eigenvalue[J]. Journal of Nanjing Normal University (Engineering and Technology), 2011, 11(03): 73-77.

A Speech Segmentation Algorithm Based on Improved Eigenvalue


Journal of Nanjing Normal University (Engineering and Technology) [ISSN:1006-6977/CN:61-1281/TN]

Volume:: 11
Issue:: 2011, No. 03

Pages:: 73-77

Publication date:: 2011-11-30

Article Info

Title:: A Speech Segmentation Algorithm Based on Improved Eigenvalue

Author(s):: Ren Xinshe¹; Miao Hua²; Ma Qingyu³ (1. Department of Educational Technology, Nanjing Normal University, Nanjing 210097, China) (2. Educational Technology Center, PLA Institute of International Relations, Nanjing 210039, China) (3. School of Physics and Technology, Nanjing Normal University, Nanjing 210046, China)

Keywords:: speech retrieval; speech segmentation; improved eigenvalue

CLC number:: TN912.3

Abstract:: With the rapid development of network technology and media applications, traditional text-based retrieval can no longer meet users' needs, and video retrieval remains impractical because of the large data volumes involved, so speech retrieval has become an important research topic. A speech sequence usually consists of many different types of speech segments, each of which often carries a different meaning; segmenting speech by its features is therefore an important means of retrieving modern media data. In this article, the basic eigenvalue of each speech frame is compared with the average basic eigenvalue of the entire speech sequence to obtain an improved eigenvalue, which is then used for speech segmentation with the K-Nearest Neighbor algorithm. The experimental results show that the proposed algorithm based on the improved eigenvalue effectively improves the accuracy of speech segmentation.
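The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say which basic features are used or how each frame is compared with the sequence average, so everything specific here is an assumption: short-time energy and zero-crossing rate stand in for the basic eigenvalues, the "improved" value is taken to be the ratio of each frame's feature to the sequence mean, and all function names (`frame_signal`, `improved_features`, `knn_predict`, and so on) are hypothetical. The K-Nearest Neighbor step is a plain majority vote over Euclidean distances.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, hop samples apart."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx]

def basic_features(frames):
    """ASSUMED basic features: short-time energy and zero-crossing rate."""
    energy = np.mean(frames ** 2, axis=1)
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return np.column_stack([energy, zcr])

def improved_features(basic, eps=1e-12):
    """ASSUMED comparison: each frame's feature divided by the sequence mean."""
    return basic / (basic.mean(axis=0) + eps)

def knn_predict(train_x, train_y, query_x, k=3):
    """Label each query frame by majority vote among its k nearest training frames."""
    dist = np.linalg.norm(query_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in train_y[nearest]])

def segment_boundaries(labels):
    """Frame indices at which the predicted label changes."""
    return (np.flatnonzero(np.diff(labels)) + 1).tolist()

def make_clip(seed, sr=8000):
    """Synthetic clip: 0.5 s of quiet noise followed by 0.5 s of a 440 Hz tone."""
    rng = np.random.default_rng(seed)
    half = sr // 2
    noise = 0.01 * rng.standard_normal(half)
    tone = np.sin(2 * np.pi * 440 * np.arange(half) / sr)
    return np.concatenate([noise, tone])

FRAME_LEN = HOP = 200  # 25 ms non-overlapping frames at 8 kHz

# Training clip with known frame labels (0 = noise, 1 = tone).
train_feats = improved_features(basic_features(frame_signal(make_clip(0), FRAME_LEN, HOP)))
train_labels = np.repeat([0, 1], len(train_feats) // 2)

# Unseen clip: classify every frame, then read boundaries off the label changes.
test_feats = improved_features(basic_features(frame_signal(make_clip(1), FRAME_LEN, HOP)))
predicted = knn_predict(train_feats, train_labels, test_feats, k=3)
boundaries = segment_boundaries(predicted)
print(boundaries)
```

Dividing by the per-sequence mean makes the features relative to each recording's overall level, which is one plausible reading of why comparing frames against the sequence average could help; the article itself should be consulted for the actual definition.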


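Memo

Memo:: Funding: National Natural Science Foundation of China (10974098), Natural Science Foundation of the Jiangsu Provincial Department of Science and Technology (BK2009407), and the Doctoral Program Foundation of the Ministry of Education (20093207120003). Corresponding author: Ma Qingyu, PhD, professor; research interests: acoustic technology and biomedical electronic technology. E-mail: maqingyu@njnu.edu.cn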


Last Update: 2013-03-21