NIE Kehui, LIU Wenzhe, TONG Tong, DU Min, GAO Qinquan
CLC number: TP391; TP183
Document code: A
Abstract: The optical flow estimation methods frequently used in video quality enhancement and super-resolution reconstruction tasks can only estimate the linear motion between pixels. To solve this problem, a new multi-frame compression artifact removal network architecture was proposed. The network consisted of a motion compensation module and a compression artifact removal module. With the traditional optical flow estimation algorithms replaced by adaptive separable convolution, the motion compensation module was able to handle the curvilinear motion between pixels, which could not be well resolved by optical flow methods. For each video frame, a corresponding convolutional kernel was generated by the motion compensation module based on the image structure and the local displacement of pixels. After that, motion offsets were estimated and pixels were compensated in the next frame by means of local convolution. The obtained compensated frame and the original next frame were then concatenated as the input of the compression artifact removal module. By fusing the different pixel information of the two frames, the compression artifacts of the original frame were removed. Compared with the state-of-the-art Multi-Frame Quality Enhancement (MFQE) algorithm trained and tested on the same datasets, the proposed network achieved a Peak Signal-to-Noise Ratio improvement (ΔPSNR) that was higher by up to 0.44 dB and by 0.32 dB on average. The experimental results demonstrate that the proposed network performs well in removing video compression artifacts.
Key words: video quality enhancement; optical flow estimation; motion compensation; adaptive separable convolution; video compression artifact removal
0 Introduction
Compression artifact removal is a classic problem in computer vision. Image and video compression algorithms reduce media file sizes to lower the transmission bandwidth and thus save transmission cost and time; however, such compression inevitably discards information from images and videos and introduces unwanted artifacts, severely degrading the user's visual experience. How to remove compression artifacts and restore these images and videos has therefore become a popular research topic.
In the past few years, with the development of deep learning, many methods have been successfully applied to removing image compression artifacts. First, the Artifacts Reduction Convolutional Neural Network (AR-CNN) [1] demonstrated the effectiveness of deep Convolutional Neural Networks (CNNs) in removing JPEG (Joint Photographic Experts Group) compression artifacts from images. Subsequently, the Deep Dual-domain Convolutional Network (DDCN) [2] removed compression artifacts by processing images in the frequency domain and the pixel domain simultaneously. More recently, after the Generative Adversarial Network (GAN) [3] was proposed and widely adopted, Guo et al. [4] and Galteri et al. [5] employed GANs to remove image compression artifacts. All of the above methods verify the effectiveness of deep neural networks in removing compression artifacts from single images.
At present, video frames de-artifacted from single-frame inputs still suffer from severely blurred object contours and even loss of information, revealing the significant limitations of single-frame methods on consecutive video frames. By fusing multiple consecutive frames of a video and exploiting the pixel correlation between adjacent frames and the complementary information across frames to compensate for the information lost in each frame, better video compression artifact removal results can be obtained.
Existing research on video quality enhancement mainly focuses on video denoising and deblurring and on video super-resolution reconstruction [6-10]. Recently, Wang et al. [11] proposed the Deep CNN-based Auto Decoder (DCAD) network for compressed video quality restoration; this network consists of only 10 convolutional layers, and its small capacity limits its reconstruction quality. Yang et al. [12] proposed the Decoder-Side Convolutional Neural Network (DS-CNN) for video quality enhancement, which consists of two subnetworks: the intra-frame decoder-side CNN (DS-CNN-I) reduces the compression artifacts of intra-coded frames, while the inter-frame decoder-side CNN (DS-CNN-B) reduces those of inter-coded frames. Since neither of these two methods uses information from adjacent video frames, both can be regarded as single-frame artifact removal algorithms. Yang et al. [13] proposed the Quality Enhancement Convolutional Neural Network (QE-CNN) method, which processes HEVC (High Efficiency Video Coding) intra- and inter-coded frames with two different networks. As this method only targets HEVC-coded videos and does not generalize to all scenarios, Yang et al. [14] further proposed the Multi-Frame Quality Enhancement (MFQE) network. MFQE consists of four parts: a Support Vector Machine (SVM) that classifies frames into Peak Quality Frames (PQFs) and non-Peak Quality Frames (non-PQFs), a motion compensation network that performs inter-frame motion compensation, and two different quality enhancement networks that reduce the compression artifacts of PQFs and non-PQFs respectively. When a compressed video contains no PQFs and non-PQFs (for example, when the compression quality factor is set by CRF (Constant Rate Factor)), this network cannot work well.
Optical flow estimation computes the motion of objects between adjacent frames by exploiting the temporal variation of the images in a sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame. Traditional optical flow methods [15-18] obtain the predicted frame in two stages, optical flow estimation and pixel warping; because the ground-truth optical flow is unavailable, these methods suffer from considerable errors. Reference [19] points out that optical flow estimation can be regarded as a fixed point-to-point "transmission map", that is, the movement from pixel A to pixel B is assumed to be a straight line (and vice versa), without considering curvilinear pixel motion; moreover, when occlusion or blur occurs during video motion, optical flow methods may fail to find the corresponding pixels in adjacent frames and thus cannot recover an accurate motion path.
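To make the linearity assumption concrete, the backward-warping formulation commonly used by flow-based methods (written here in our own notation, not taken verbatim from [15-19]) predicts the next frame by displacing each pixel along a single straight offset vector:

    \hat{I}_{t+1}(x, y) = I_t\bigl(x + u(x, y),\; y + v(x, y)\bigr)

where (u, v) denotes the estimated flow field. Whatever trajectory a pixel actually follows between the two frames, this model can only represent its end-to-end linear displacement.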
The Spatial Transformer Network [20] enables a network to learn the spatial mapping between the pixels of two images and to express this point-to-point mapping as a grid transform, a form that can represent vector motion in a way similar to an optical flow field. The spatial transformer network was soon applied to encoding optical flow features in motion video [14,21] for motion compensation.
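As an illustration of such grid-based warping, the following minimal PyTorch sketch builds an identity sampling grid, offsets it with a flow field, and resamples the frame; the function name and the pixel-offset convention are our own assumptions for exposition, not the implementation used in [14,20].

    import torch
    import torch.nn.functional as F

    def warp_with_grid(frame, flow):
        """Warp frame (N, C, H, W) with flow (N, 2, H, W) given in pixel offsets."""
        n, _, h, w = frame.shape
        # Identity grid in pixel coordinates: channel 0 holds x, channel 1 holds y.
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
        # Normalize to [-1, 1], the coordinate convention expected by grid_sample.
        gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
        gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
        return F.grid_sample(frame, torch.stack((gx, gy), dim=-1), align_corners=True)

Note that each output pixel is still fetched from a single displaced location, which is exactly the point-to-point restriction discussed above.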
This paper addresses video compression artifact removal with two cascaded networks. The proposed network comprises two modules: a motion compensation module and a compression artifact removal module. Unlike the common grid-mapping-based motion compensation methods, the motion compensation network in this paper is implemented with one-dimensional separable local convolutions, which not only estimate pixel offsets effectively but also compensate information between adjacent frames, bringing more pixel information to degraded video frames. The compensated frame produced by the motion compensation module for the next frame is then concatenated with the original next frame as the input of the compression artifact removal module; by fusing the two frames containing different pixel information, the next frame is reconstructed with its compression artifacts removed. The whole network can be trained end to end.
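The local separable convolution at the heart of the motion compensation module can be sketched as follows: a minimal single-channel NumPy version of the formulation in [21], in which the per-pixel kernels kv and kh would be predicted by the motion compensation network; all names here are illustrative, and the real module operates on full video frames.

    import numpy as np

    def apply_separable_kernels(frame, kv, kh):
        """Compensate one (H, W) frame with per-pixel separable kernels.
        kv, kh: (H, W, n) vertical and horizontal 1D kernels, n odd."""
        h, w = frame.shape
        n = kv.shape[-1]
        r = n // 2
        padded = np.pad(frame, r, mode="edge")
        out = np.empty_like(frame, dtype=np.float64)
        for y in range(h):
            for x in range(w):
                patch = padded[y:y + n, x:x + n]
                # The outer product of the two 1D kernels forms a full 2D kernel,
                # so each output pixel gets its own spatially adaptive filter.
                out[y, x] = np.sum(np.outer(kv[y, x], kh[y, x]) * patch)
        return out

Because the n x n kernel can place its mass anywhere in the local window, pixel displacement and resampling are estimated jointly and need not follow a straight line.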
The main contributions of this paper are as follows:
1) Separable local convolution is adopted to estimate and compensate pixels between adjacent frames. Compared with the point-to-point straight-line motion estimation of optical flow methods, this approach can estimate possible curvilinear motion between pixels through nonlinear feature mapping and is therefore more flexible.
2) A novel CNN-based network model for removing video compression artifacts is proposed. The model connects a motion compensation module with a compression artifact removal module, and concatenates multiple frames as the network input to fuse the complementary information of adjacent frames, achieving a better video artifact removal effect (a minimal sketch of this fusion step follows this list).
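The following PyTorch sketch illustrates the fusion input described in 2); the layer count, channel widths, and the residual connection are our assumptions for illustration (residual prediction is a common choice in restoration networks), not the exact architecture of the proposed module.

    import torch
    import torch.nn as nn

    class ArtifactRemovalSketch(nn.Module):
        """Fuses the compensated frame with the original next frame."""
        def __init__(self, channels=1, features=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(2 * channels, features, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(features, features, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(features, channels, kernel_size=3, padding=1),
            )

        def forward(self, compensated, original):
            # Concatenate along channels so both sources of pixel information
            # are available to every filter, then predict a residual correction.
            x = torch.cat([compensated, original], dim=1)
            return original + self.body(x)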
3 Conclusion
This paper proposes a new multi-frame compression artifact removal network, in which the motion compensation module uses adaptive separable convolution to estimate the motion offsets of the pixels in the next frame and to compensate its missing pixels, and the compression artifact removal module fuses the compensated frame, which carries different pixel information, with the corresponding original frame to produce the final artifact removal result. In our experiments, the compensated frames obtained by the motion compensation network improved the PSNR over the corresponding compressed frames by 0.03 dB on average, and reduced the inter-frame difference from the corresponding uncompressed frames by 0.04 dB on average compared with the compressed frames, demonstrating the compensating effect of the motion compensation network on missing pixels; moreover, combining the motion compensation network yields visually much better artifact removal results than the artifact removal network alone. On the same test sequences, the artifact removal results combined with motion compensation improve the average ΔPSNR over the state-of-the-art AR-CNN, DCAD, DS-CNN and MFQE enhancement algorithms by 1.58 dB, 1.55 dB, 1.42 dB and 0.32 dB respectively, and the maximum ΔPSNR over MFQE by 0.44 dB; the visual quality after artifact removal is also clearly better than that of all of the above algorithms. These results show that the proposed network removes video compression artifacts effectively.
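For reference, the metrics quoted above follow the standard definitions (our notation):

    \mathrm{PSNR} = 10 \log_{10} \frac{MAX_I^{2}}{\mathrm{MSE}}, \qquad \Delta\mathrm{PSNR} = \mathrm{PSNR}_{\mathrm{enhanced}} - \mathrm{PSNR}_{\mathrm{compressed}}

where MAX_I is the peak pixel value (255 for 8-bit video) and both PSNR terms are computed against the same uncompressed ground-truth frame.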
In future work, we will investigate methods for accelerating the network, for example replacing the original two-dimensional convolutions with depthwise separable convolutions and adjusting the network structure to speed up the network while maintaining its performance.
References
[1] DONG C, DENG Y, CHEN C L, et al. Compression artifacts reduction by a deep convolutional network [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 576-584.
[2] GUO J, CHAO H. Building dual-domain representations for compression artifacts reduction [C]// ECCV 2016: Proceedings of the 2016 European Conference on Computer Vision. Berlin: Springer, 2016: 628-644.
[3] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks [J/OL]. arXiv Preprint, 2014, 2014: arXiv:1406.2661 [2014-06-10]. https://arxiv.org/abs/1406.2661.
[4] GUO J, CHAO H. One-to-many network for visually pleasing compression artifacts reduction [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 4867-4876.
[5] GALTERI L, SEIDENARI L, BERTINI M, et al. Deep generative adversarial compression artifact removal [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2017: 4836-4845.
[6] YANG L L, SHENG G. A mine video image denoising method based on convolutional neural network [J]. Mining Research and Development, 2018, 38(2): 106-109. (in Chinese)
[7] REN W, PAN J, CAO X, et al. Video deblurring via semantic segmentation and pixel-wise non-linear kernel [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2017: 1086-1094.
[8] SAJJADI M S M, VEMULAPALLI R, BROWN M. Frame-recurrent video super-resolution [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 6626-6634.
[9] TAO X, GAO H, LIAO R, et al. Detail-revealing deep video super-resolution [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2017.
[10] LI L H, DU J P, LIANG M Y, et al. Video super-resolution algorithm based on spatio-temporal features and neural networks [J]. Journal of Beijing University of Posts and Telecommunications, 2016, 39(4): 1-6. (in Chinese)
[11] WANG T, CHEN M, CHAO H. A novel deep learning-based method of improving coding efficiency from the decoder-end for HEVC [C]// Proceedings of the 2017 Data Compression Conference. Piscataway, NJ: IEEE, 2017: 410-419.
[12] YANG R, XU M, WANG Z. Decoder-side HEVC quality enhancement with scalable convolutional neural network [C]// Proceedings of the 2017 IEEE International Conference on Multimedia and Expo. Piscataway, NJ: IEEE, 2017: 817-822.
[13] YANG R, XU M, WANG Z, et al. Enhancing quality for HEVC compressed videos [J/OL]. arXiv Preprint, 2018, 2018: arXiv:1709.06734 (2017-09-20) [2018-07-06]. https://arxiv.org/abs/1709.06734.
[14] YANG R, XU M, LIU T, et al. Multi-frame quality enhancement for compressed video [C]// Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2018: 6664-6673.
[15] DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: learning optical flow with convolutional networks [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 2758-2766.
[16] BAILER C, TAETZ B, STRICKER D. Flow fields: dense correspondence fields for highly accurate large displacement optical flow estimation [C]// Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway, NJ: IEEE, 2015: 4015-4023.
[17] REVAUD J, WEINZAEPFEL P, HARCHAOUI Z, et al. EpicFlow: edge-preserving interpolation of correspondences for optical flow [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2015: 1164-1172.
[18] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 2462-2470.
[19] MAHAJAN D, HUANG F C, MATUSIK W, et al. Moving gradients: a path-based method for plausible image interpolation [J]. ACM Transactions on Graphics, 2009, 28(3): Article No. 42.
[20] JADERBERG M, SIMONYAN K, ZISSERMAN A, et al. Spatial transformer networks [C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2015: 2017-2025.
[21] NIKLAUS S, MAI L, LIU F. Video frame interpolation via adaptive separable convolution [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Washington, DC: IEEE Computer Society, 2017: 261-270.
[22] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770-778.
[23] HE K, ZHANG X, REN S, et al. Identity mappings in deep residual networks [C]// ECCV 2016: Proceedings of the 2016 European Conference on Computer Vision. Berlin: Springer, 2016: 630-645.
[24] DROZDZAL M, VORONTSOV E, CHARTRAND G, et al. The importance of skip connections in biomedical image segmentation [M]// Deep Learning and Data Labeling for Medical Applications. Berlin: Springer, 2016: 179-187.
[25] BOSSEN F. Common test conditions and software reference configurations [S/OL]. [2013-06-20]. http://wftp3.itu.int/av-arch/jctvc-site/2010_07_B_Geneva/JCTVC-B300.doc.
[26] GLOROT X, BENGIO Y. Understanding the difficulty of training deep feedforward neural networks [C]// Proceedings of the 13th International Conference on Artificial Intelligence and Statistics. Sardinia, Italy: JMLR, 2010: 249-256.
[27] KINGMA D, BA J. Adam: a method for stochastic optimization [EB/OL]. [2018-03-20]. http://yeolab.weebly.com/uploads/2/5/5/0/25509700/a_method_for_stochastic_optimization_.pdf.
[28] BARRON J T. A more general robust loss function [J/OL]. arXiv Preprint, 2017, 2017: arXiv:1701.03077 (2017-01-11) [2017-01-11]. https://arxiv.org/abs/1701.03077.
[29] LAI W S, HUANG J B, AHUJA N, et al. Deep Laplacian pyramid networks for fast and accurate super-resolution [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2017: 5835-5843.