YANG Chao, LIU Yunfei, XU Xiangxu, LIU Chuanhui, ZHU Hong
Abstract: The joint coding algorithm based on predictive coding, self-organizing map (SOM) neural network vector coding and Huffman coding (the PV algorithm) compresses speech well, but every speech segment it encodes must first be used to train the SOM neural network and obtain a codebook, which makes the algorithm complex and time-consuming. This paper therefore proposes extracting a single codebook by SOM neural network training on multiple speech segments with general features, and using that codebook for the PV coding of all speech segments. No codebook extraction is needed for each individual segment, which saves the codebook training time for every segment and removes the need to encode a segment-specific codebook, thereby reducing the bit rate. The experimental results show that PV coding with a general codebook is feasible provided that a certain speech quality is guaranteed. The proposed coding algorithm has high research value and good application prospects in speech compression coding.
Keywords: PV coding; vector coding; speech signal coding; neural network training; general codebook; specific codebook
CLC number: TN911.3-34    Document code: A    Article ID: 1004-373X(2019)12-0165-03
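0  Introduction
The purpose of speech coding is to reduce the number of symbols needed to represent a speech signal [1]. As early as 1972, ITU-T released G.711, the 64 kb/s A-law/μ-law pulse code modulation speech coding standard [2]. The basic speech coding approaches in use today are waveform coding, hybrid coding and parametric coding [3]. A joint coding algorithm based on predictive coding, self-organizing map (SOM) neural network [4-7] vector coding [8-10] and Huffman coding (hereafter the PV coding algorithm) belongs to waveform coding [11]. Its bit rate reaches 12.8 kb/s, which is lower than the 32 kb/s of the ADPCM-based waveform coding standard G.72 (the lowest bit rate among waveform coding standards). However, the SOM vector quantizer in the PV algorithm is trained on the signal to be transmitted, so the SOM neural network has to be retrained for every speech segment that is sent, which is a heavy workload. This paper proposes that the code vectors (codebook) used for SOM vector quantization in the PV algorithm be a general codebook, so that the SOM neural network does not have to be retrained for each transmitted segment. The paper studies the encoding and decoding performance of the PV coding algorithm when vector quantization uses a general codebook, with the aim of reducing the computation and run time of PV coding while keeping the change in bit rate small.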
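1  PV algorithm
Figure 1 shows the program flow chart of the encoding part of the 2-dimensional PV algorithm. By analogy, the n-dimensional PV algorithm converts a single column of speech samples into n columns and then performs linear prediction and n-dimensional vector quantization.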
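The n-dimensional front end described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the row-wise reshaping, the first-order open-loop predictor, and the helper names `to_columns`, `prediction_errors` and `quantize` are all hypothetical, since the text does not specify the predictor.

```python
def to_columns(signal, n):
    """Reshape a 1-D list of samples into rows of n samples (one row = one vector).

    Trailing samples that do not fill a complete row are dropped.
    """
    rows = len(signal) // n
    return [signal[r * n:(r + 1) * n] for r in range(rows)]


def prediction_errors(matrix):
    """Assumed first-order predictor, applied to each column independently.

    Each sample is predicted by the previous sample in the same column;
    the first row is predicted from zero. Open-loop for simplicity.
    """
    if not matrix:
        return []
    errors = []
    prev = [0] * len(matrix[0])
    for row in matrix:
        errors.append([x - p for x, p in zip(row, prev)])
        prev = row
    return errors


def quantize(vectors, codebook):
    """Map each error vector to the index of its nearest code vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(codebook)), key=lambda i: dist2(v, codebook[i]))
            for v in vectors]


# Toy 2-dimensional example with a hypothetical 3-entry codebook.
signal = [1, 2, 3, 4, 5, 7, 9, 11, 10]
errors = prediction_errors(to_columns(signal, 2))
indices = quantize(errors, [[1, 2], [2, 2], [4, 4]])
print(indices)  # [0, 1, 1, 2]
```

In the PV algorithm the resulting index stream would then be Huffman coded; that stage is omitted here.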
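2  Experimental results and analysis of the PV algorithm with a general vector codebook
In the PV coding algorithm with a specific codebook, the training samples for the SOM neural network come from the speech signal to be transmitted, so the resulting codebook gives a small quantization error only for that signal. For a general codebook, the SOM training samples come from a large number of common speech signals, so in a statistical sense the general codebook gives a small vector quantization error for speech signals in general.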
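In the experiment, three signals were selected first: a male voice, a female voice, and a male voice mixed with music. Because 8-dimensional, 64-code-vector PV speech coding has a relatively low bit rate, the three signals were each processed according to the 8-dimensional PV coding principle: 8-column linear prediction was applied and the errors were computed, giving three 8-column (8-dimensional) error matrices with 5 000, 6 017 and 5 016 rows. The three sets of error vectors were concatenated into a single 16 033 × 8 matrix, which was fed to the SOM neural network for training. To obtain 64 code vectors, the number of network outputs was set to 64. The training result gives the 64-code-vector general codebook for the 8-dimensional PV coding algorithm. The compression ratio and signal-to-noise ratio (SNR) are computed as:

$$\text{compression ratio} = \frac{\text{total number of bits after encoding}}{\text{total number of bits of the original signal}} \times 100\% \qquad (1)$$

$$\text{SNR} = 10 \lg \frac{P_s}{P_n} \qquad (2)$$

where $P_s$ is the power of the original signal and $P_n$ is the noise power of the speech signal.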
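The codebook training step described above can be illustrated with a small self-organizing map. The sketch below is only an assumption-laden illustration: the 1-D chain topology, the Gaussian neighbourhood, the linear decay schedule, the initialisation from training vectors, and the function name `train_som_codebook` are all hypothetical choices, and the random vectors merely stand in for real prediction-error matrices.

```python
import math
import random


def train_som_codebook(vectors, n_codes=64, epochs=10, lr0=0.5, seed=0):
    """Train a 1-D SOM and return its weight vectors as a codebook."""
    rng = random.Random(seed)
    # Initialise each code vector from a randomly chosen training vector.
    codebook = [list(rng.choice(vectors)) for _ in range(n_codes)]
    radius0 = max(n_codes / 2.0, 1.0)
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for idx in order:
            v = vectors[idx]
            # Learning rate and neighbourhood radius decay linearly.
            frac = 1.0 - step / total_steps
            lr = lr0 * frac
            radius = max(radius0 * frac, 0.5)
            # Best matching unit: code vector nearest to the sample.
            bmu = min(
                range(n_codes),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])),
            )
            lo = max(0, math.ceil(bmu - radius))
            hi = min(n_codes - 1, math.floor(bmu + radius))
            for j in range(lo, hi + 1):
                # Gaussian weighting by distance along the chain.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                cw = codebook[j]
                for k in range(len(cw)):
                    cw[k] += lr * h * (v[k] - cw[k])
            step += 1
    return codebook


# Synthetic stand-ins for three 8-column prediction-error matrices.
demo_rng = random.Random(1)
segments = [
    [[demo_rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(rows)]
    for rows in (50, 60, 50)
]
# Concatenate the segments row-wise, as the text does for its training set.
training = segments[0] + segments[1] + segments[2]
general_codebook = train_som_codebook(training, n_codes=64, epochs=3)
print(len(general_codebook), len(general_codebook[0]))  # 64 8
```

Because the general codebook is trained once, offline, the encoder only needs the nearest-code-vector search at run time, which is the saving the paper is after.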
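Figure 2 shows the encoding and decoding of one speech segment using the 64-code-vector general codebook of the 8-dimensional PV coding algorithm. The decoded signal retains the basic features and shape of the original speech signal in both the time domain and the frequency domain. On playback, the speech content can still be heard fairly clearly, with little change in timbre and a small amount of noise. The SNR is 6.14 dB and the compression ratio is 8.58%.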
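The two figures of merit quoted above can be computed as in the following sketch. It assumes the usual decibel definition of SNR, 10·log10(Ps/Pn), with the noise taken as the sample-wise difference between the original and decoded signals; the function names and the sample values are hypothetical and are not the paper's data.

```python
import math


def compression_ratio(encoded_bits, original_bits):
    """Compression ratio in percent, following formula (1)."""
    return 100.0 * encoded_bits / original_bits


def snr_db(original, decoded):
    """SNR in dB, assuming noise = original - decoded, sample by sample."""
    n = len(original)
    p_signal = sum(x * x for x in original) / n
    p_noise = sum((x - y) ** 2 for x, y in zip(original, decoded)) / n
    return 10.0 * math.log10(p_signal / p_noise)


# Hypothetical values chosen only to exercise the functions.
ratio = compression_ratio(encoded_bits=1200, original_bits=16000)
snr = snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -1.1, 1.1, -0.9])
print(f"{ratio:.2f}% {snr:.2f} dB")  # 7.50% 20.00 dB
```

A smaller compression ratio means fewer bits after encoding, so under formula (1) lower is better, while a higher SNR indicates a more faithful reconstruction.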
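Table 1 gives the results obtained when five sound segments are used as training samples: they are concatenated into an 8-dimensional prediction error matrix of length 26 843, a general codebook of 64 code vectors is obtained by SOM neural network training, and this general codebook is then used for 8-dimensional, 64-code-vector PV encoding and decoding of 10 sound segments.
As Table 1 shows, the 10 segments recovered with specific codebooks have better quality than those recovered with the general codebook. Of the 10 segments recovered with the general codebook, 8 are of good quality, with clearly intelligible speech content and little change in timbre; 2 are of poor quality, with content that is barely intelligible, altered timbre and considerable noise. So although the sound decoded with the general codebook is of lower quality than that decoded with a specific codebook, the approach remains feasible.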
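3  Conclusion
Although a suitable choice of parameters can make the bit rate of PV coding very low, every speech segment to be encoded must itself be used to train the SOM neural network and obtain a codebook, which makes the algorithm complex and time-consuming. This paper proposes extracting a codebook by SOM neural network training on multiple speech segments with general features and using that codebook for the PV coding of all speech segments. No codebook extraction is needed for each individual segment, which saves the codebook training time for every segment and also removes the need to encode a segment-specific codebook, thereby reducing the bit rate. The experimental results show that PV coding with a general codebook is feasible provided that a certain speech quality is guaranteed.
Note: The corresponding author of this paper is XU Xiangxu.
References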
[1] XIAO Dong, MO Fuyuan, CHEN Geng, et al. Effects of transition frame on synthesized speech in low bit rate speech coding [J]. Journal of applied acoustics, 2016, 35(1): 77-83.
[2] LI Xiaoming. Research on universal coding method for speech and audio signals [D]. Beijing: Beijing University of Technology, 2014.
[3] LIANG Donglei. Research on joint coding algorithm for audio speech [D]. Xi'an: Xidian University, 2010.
[4] QIAN Haijun. Image compression based on neural network using Matlab [J]. Computer development & applications, 2011, 24(12): 77-79.
[5] WANG Long, DU Dunwei, BAI Yanping. Application of SOM network in radar target recognition [J]. Science & technology vision, 2015(16): 52-53.
[6] YANG Chen, YAN Wei. Research on the clustering by using SOM network model [J]. Network security technology & application, 2014(2): 44-45.
[7] ZOU Yu, SHUAI Renjun. Improved segmentation algorithm of medical images based on SOM neural network [J]. Computer engineering and design, 2016, 37(9): 2533-2537.
[8] YANG Chao, HE Yijun, REN Jiancun, et al. Codebook equilibrium algorithm for vector coding [J]. Modern electronics technique, 2016, 39(13): 38-40.
[9] YANG Chao, DONG Shikun. Image compression method based on vector quantization [J]. Journal of Naval Aeronautical and Astronautical University, 2011, 26(1): 11-14.
[10] MAKHOUL J, ROUCOS S, GISH H. Vector quantization in speech coding [J]. Proceedings of the IEEE, 1985, 73(11): 1551-1588.
[11] YANG Chao, LIU Yunfei, XU Xiangxu, et al. Speech signal coding algorithm based on predictive coding and vector coding [J]. Modern electronics technique, 2018, 41(24): 128-131.