No-reference image quality assessment algorithm based on saliency deep features
LI Jia1, ZHENG Yuanlin1,2*, LIAO Kaiyang1,2, LOU Haojie1, LI Shiyu1, CHEN Zehao1
(1. School of Printing, Packaging and Digital Media, Xi'an University of Technology, Xi'an 710048, China; 2. Shaanxi Provincial Key Laboratory of Printing and Packaging Engineering, Xi'an 710048, China) (*Corresponding author, e-mail: zhengyuanlin@xaut.edu.cn)
針對(duì)通用型無參考圖像質(zhì)量評(píng)價(jià)(NR-IQA)算法,提出一種基于偽參考圖像顯著性深層特征的評(píng)價(jià)算法。首先,在失真圖像的基礎(chǔ)上,利用微調(diào)的ConSinGAN模型生成相應(yīng)的偽參考圖像作為失真圖像的補(bǔ)償信息,彌補(bǔ)NR-IQA算法缺少真實(shí)參考信息的不足;然后,提取偽參考圖像的顯著性信息,將偽參考顯著性圖像與失真圖像輸入到VGG16網(wǎng)絡(luò)中提取深層特征;最后,融合二者的深層特征并將其映射到由全連接層組成的回歸網(wǎng)絡(luò)中,從而產(chǎn)生與人類視覺一致的質(zhì)量預(yù)測(cè)。為了驗(yàn)證算法的有效性,在四個(gè)大型公開的圖像數(shù)據(jù)集TID2013、TID2008、CSIQ與LIVE上進(jìn)行實(shí)驗(yàn),結(jié)果顯示所提算法在TID2013數(shù)據(jù)集上的斯皮爾曼秩相關(guān)系數(shù)(SROCC)比H-IQA算法提升了5個(gè)百分點(diǎn),比RankIQA算法提升了14個(gè)百分點(diǎn),針對(duì)單一失真類型也具有穩(wěn)定的性能。實(shí)驗(yàn)結(jié)果表明,所提算法總體表現(xiàn)優(yōu)于現(xiàn)有主流全參考圖像質(zhì)量評(píng)價(jià)(FR-IQA)和NR-IQA算法,與人類主觀感知表現(xiàn)一致。
Key words: No-Reference Image Quality Assessment (NR-IQA); Generative Adversarial Network (GAN); saliency; deep learning; super-resolution
With the rapid development of multimedia and communication technology, the number of digital images has grown explosively. Images have become a primary carrier for acquiring and transmitting information and are an indispensable part of modern life. However, images suffer various distortions during acquisition, storage, transmission and processing, leading to quality degradation. Image Quality Assessment (IQA) is therefore an important research topic with uses in many image processing applications, such as image and video coding, image denoising, image reconstruction and image synthesis.
Objective IQA algorithms can be divided into three categories according to how much reference information they require during assessment: Full-Reference IQA (FR-IQA) [1-2], Reduced-Reference IQA (RR-IQA) [3] and No-Reference IQA (NR-IQA) [4-6]. Because FR-IQA and RR-IQA can model the differences between a distorted image and its undistorted original, they have achieved remarkable results over the past decades and generally perform better. In practice, however, a reference image is usually unavailable as comparison information for the distorted image, which makes NR-IQA, requiring no original reference information, the more valuable research direction. By contrast, NR-IQA must estimate quality from the distorted image alone, so making accurate quality predictions with NR-IQA is far more challenging.
The lack of reference information has, to some extent, held back the development of NR-IQA, and researchers have made many efforts over the years to address this difficulty. Since distortion-sensitive scene statistics can detect and quantify the degree of image quality distortion, natural-scene-statistics features emerged, including discrete wavelet coefficients, log-Gabor filter responses and color statistics [7], as well as Discrete Cosine Transform (DCT) coefficients of image blocks modeled with generalized Gaussian density functions [8]. Later, inspired by the success of machine learning in computer vision tasks, several convolutional-network-based algorithms [4,6,9-10] were proposed, bringing significant progress to NR-IQA.
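For concreteness, the generalized Gaussian density used in such block-wise DCT-coefficient modeling has the standard form below (this is the common parameterization, not necessarily the exact variant of [8]); α sets the scale and β the shape, with β = 2 recovering the Gaussian and β = 1 the Laplacian:

```latex
f(x \mid \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
    \exp\!\left(-\left(\frac{|x|}{\alpha}\right)^{\beta}\right)
```

Distortion shifts the fitted (α, β) away from values typical of natural scenes, which is what makes such statistics usable as quality features.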
Nevertheless, lacking reference image information, NR-IQA still falls far short of FR-IQA performance. In recent years, some NR-IQA algorithms have therefore focused on generating pseudo-reference images [5,11-12], i.e., generating a pseudo-reference image from the distorted image to serve as its compensation information. This approach alleviates the poor performance and difficulty caused by the missing original reference, but such algorithms generally target a few specific distortions and are not suitable for general-purpose no-reference assessment.
因此,本文針對(duì)通用失真,提出了基于顯著性深層特征無參考評(píng)價(jià)算法,通過改進(jìn)生成對(duì)抗網(wǎng)絡(luò)(Generative Adversarial Network, GAN)模型,生成更逼真可靠的偽參考圖像,作為失真圖像的補(bǔ)償信息。該算法可以在一定程度上彌補(bǔ)NR-IQA缺少參考圖像的不足,進(jìn)行與人類視覺一致的質(zhì)量預(yù)測(cè)。
In summary, the main contributions of this paper are as follows:
1) Because NR-IQA lacks reference information, prediction models generally perform unsatisfactorily. This paper fine-tunes an improved GAN model, ConSinGAN [13], to generate reliable and realistic pseudo-reference images as compensation information for the distorted image, uses the super-resolution model ESPCN (Efficient Sub-Pixel Convolutional Neural Network) to restore the texture details and sharpness lost during training, and then extracts the saliency information of the pseudo-reference image, which is fed into the IQA model together with the distorted image.
2) A no-reference assessment algorithm based on saliency deep features is proposed. The distorted image and the pseudo-reference saliency map are fed into a deep feature extraction network to extract semantic information; the deep features of the two are fused and mapped into a regression network composed of fully connected layers to obtain the final predicted score.
NR-IQA由于具備更多的應(yīng)用場(chǎng)景被廣泛研究,近年來涌現(xiàn)了許多基于卷積神經(jīng)網(wǎng)絡(luò)(Convolutional Neural Network, CNN)的算法。例如Bosse等[9]基于CNN提出了一個(gè)端到端的無參考模型,將失真圖像輸入到CNN中提取特征,特征向量映射到由全連接層組成的回歸網(wǎng)絡(luò)中獲取預(yù)測(cè)得分。利用CNN提取特征相比傳統(tǒng)算法具有更好的效果,因?yàn)镃NN能獲取傳統(tǒng)算法無法提取到的深層語義特征。Liu等[4]提出的RankIQA(learning from Rankings for no-reference IQA)訓(xùn)練孿生網(wǎng)絡(luò)根據(jù)圖像質(zhì)量自動(dòng)給圖像進(jìn)行排序,而后將訓(xùn)練好的孿生網(wǎng)絡(luò)遷移到CNN中實(shí)現(xiàn)對(duì)圖像質(zhì)量的估算。RankIQA通過單個(gè)網(wǎng)絡(luò)向前傳播一批圖像并反向傳播該批次中所有圖像對(duì)得出的梯度,比傳統(tǒng)的孿生網(wǎng)絡(luò)更有效;該算法在當(dāng)時(shí)最新技術(shù)水平上提高了5個(gè)百分點(diǎn),在TID2013與LIVE數(shù)據(jù)集上都表現(xiàn)出不錯(cuò)的效果。Su等[6]提出了一個(gè)自適應(yīng)網(wǎng)絡(luò)架構(gòu)hyperIQA(hyper network IQA),該方法首先用ResNet50(50-layer Deep Residual Network)網(wǎng)絡(luò)提取圖像深層語義特征,通過構(gòu)建的超網(wǎng)絡(luò)自適應(yīng)地建立感知規(guī)則并將其用于質(zhì)量預(yù)測(cè)網(wǎng)絡(luò)。hyperIQA可以自適應(yīng)地估計(jì)圖像質(zhì)量,因此也適用于自然環(huán)境下捕獲的不同圖像。
General-purpose NR-IQA is very difficult because of the missing reference information, and most NR-IQA algorithms perform poorly; some studies have therefore introduced the concept of pseudo-reference images into IQA. For example, the IQA metric proposed by Min et al. [11] targets three distortions, blockiness, sharpness and noise: assuming the generated pseudo-reference image suffers the most severe distortion, it computes the similarity between the distorted image and the pseudo-reference image to assess quality. Min et al. [5] later improved on this by further degrading the distorted image to introduce the pseudo-reference image and then assessing quality. Hu et al. [14] proposed a pseudo-reference-based NR-IQA algorithm for three distortion types: Gaussian blur, JPEG (Joint Photographic Experts Group) compression and Gaussian noise. Lin et al. [15] used a GAN to generate higher-quality pseudo-reference images, paired the pseudo-reference information with the distorted image, and passed them to a regression network to guide it in learning perceptual differences, producing accurate quality predictions. Since introducing GANs into IQA yields higher-quality pseudo-reference images, this paper adopts a fine-tuned ConSinGAN model to generate pseudo-reference images, applies super-resolution, and combines saliency information for deep feature extraction, achieving good consistency with human vision.
An IQA algorithm is a computer-executable algorithm that builds an image prediction model according to the human visual system. This paper proposes a no-reference IQA algorithm based on the saliency deep features of pseudo-reference images. The algorithm consists of three parts: a pseudo-reference image generation network, a deep feature extraction network and a quality regression network. First, the distorted image is fed into the fine-tuned ConSinGAN to generate a pseudo-reference image, and the super-resolution model ESPCN restores the texture details and sharpness lost during training; the saliency features of the pseudo-reference image are then extracted as compensation information for the distorted image. Next, the distorted image and the pseudo-reference saliency map are fed into a CNN-based feature extraction network, where their deep features are extracted and fused. Finally, the fused features are fed into the quality regression network to obtain the predicted score of the distorted image. Figure 1 shows the pipeline of the proposed algorithm.
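As a minimal sketch of this three-stage flow (all module handles here — consingan, espcn, saliency_model, iqa_model — are placeholders standing in for the components described above, not the authors' released code), inference could be organized as follows; the feature extraction and regression model itself is sketched later, in the feature extraction section:

```python
import torch

@torch.no_grad()
def predict_quality(distorted, consingan, espcn, saliency_model, iqa_model):
    """Three-stage inference: pseudo-reference generation,
    saliency extraction, then deep-feature quality regression."""
    # Stage 1: generate a pseudo-reference from the distorted image
    # and restore the sharpness/texture lost during GAN training.
    pseudo_ref = consingan(distorted)   # fine-tuned ConSinGAN generator
    pseudo_ref = espcn(pseudo_ref)      # ESPCN super-resolution module

    # Stage 2: saliency of the pseudo-reference serves as the
    # compensation information for the distorted image.
    saliency = saliency_model(pseudo_ref)

    # Stage 3: fused deep features are regressed to a quality score.
    return iqa_model(distorted, saliency)
```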
A GAN is a generative model composed of two parts, a generative model (generator) and a discriminative model (discriminator), which play against each other so that the generated image approaches the target image [16]. The generator's main role is to produce the pseudo-reference image, while the discriminator adversarially helps the generator produce results closer to the intended target. GANs and their variants [17-18] have flourished in natural image generation, the most classic being the stable and effective DCGAN (Deep Convolutional Generative Adversarial Network) [19]. However, GANs are hard to train and prone to collapse, and generating high-resolution images (e.g., 256×256) can make training unstable and sometimes even produce meaningless output [20]. Therefore, instead of the original GAN, this paper introduces the ConSinGAN model of [13] to generate realistic and reliable pseudo-reference images. The model is fine-tuned when generating pseudo-reference images: the extra noise that the original model adds at each training stage is removed, ensuring that the generator outputs more realistic and reliable pseudo-reference images.
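As background for this adversarial setup, a minimal generic GAN training step (not ConSinGAN-specific; the discriminator is assumed to output one logit per image) might look like this:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # discriminator outputs raw logits

def gan_step(G, D, opt_g, opt_d, real, z):
    n = real.size(0)
    # Discriminator update: push real images toward 1, generated toward 0.
    fake = G(z).detach()  # detach so only D receives gradients here
    d_loss = bce(D(real), torch.ones(n, 1)) + bce(D(fake), torch.zeros(n, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to make D label generated images as real.
    g_loss = bce(D(G(z)), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```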
The pseudo-reference images generated by ConSinGAN lack sharpness: compared with the original reference image, sharpness decreases and some texture details are lost. To analyze whether this loss of detail and sharpness affects subsequent saliency feature extraction, saliency maps were extracted at different levels of Gaussian blur distortion, which likewise reduces detail and sharpness, as shown in Figure 2. Comparative analysis shows that reduced sharpness and detail do degrade saliency feature extraction, so this paper adds a super-resolution [21] module that improves the sharpness of the pseudo-reference image while preserving texture information as much as possible, enabling more accurate saliency extraction later.
Fig. 2 Saliency maps at different Gaussian blur distortion levels
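The super-resolution module follows the sub-pixel convolution idea of ESPCN [22]: all convolutions run at low resolution, and a final pixel-shuffle layer rearranges r² channels into an r-times larger image. A minimal sketch using the layer sizes of the original ESPCN paper (the upscaling factor used in this work is an assumption):

```python
import torch.nn as nn

class ESPCN(nn.Module):
    """Sub-pixel CNN: feature extraction at low resolution, then
    PixelShuffle maps (C*r^2, H, W) feature maps to a (C, rH, rW) image."""
    def __init__(self, upscale=2, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv2d(32, channels * upscale ** 2, kernel_size=3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, x):
        return self.shuffle(self.body(x))
```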
近年來,基于深度學(xué)習(xí)的通用NR-IQA算法已表現(xiàn)出優(yōu)于傳統(tǒng)算法的預(yù)測(cè)性能。本文的IQA部分主要由特征提取與質(zhì)量回歸網(wǎng)絡(luò)兩部分組成。IQA中以VGG(Visual Geometry Group)模型為原型提取失真圖像與偽參考顯著性圖像的深層特征并融合,而后將融合特征作為輸入映射到質(zhì)量回歸網(wǎng)絡(luò),最終得到與人類視覺一致的質(zhì)量預(yù)測(cè)。失真圖與偽參考顯著性圖的兩個(gè)特征提取網(wǎng)絡(luò)模型是一致的,都是以VGG網(wǎng)絡(luò)為原型,主要由卷積層和池化層構(gòu)成,在卷積與池化的過程中進(jìn)行圖像特征的提取。為更好地提取圖像語義特征,去掉原VGG網(wǎng)絡(luò)末尾三層全連接層與最末層,提取第四層(最大池化層)的特征。由于一幅圖像并非每個(gè)區(qū)域都會(huì)受到觀看者的注意,通常引起觀看者注意的區(qū)域部分的失真比其他區(qū)域中的失真影響更大,因此在特征提取部分中加入了顯著性信息作為深層特征的補(bǔ)償信息。將偽參考顯著性圖像與失真圖像的深層特征融合并映射到回歸網(wǎng)絡(luò)中獲取預(yù)測(cè)得分。回歸網(wǎng)絡(luò)由兩層全連接層(Fully Connection layer, FC)組成,分別為FC-512與FC-1。
在實(shí)際操作中,偽參考生成模型在不同階段生成不同分辨率的圖像,因此訓(xùn)練階段參數(shù)設(shè)置至關(guān)重要,設(shè)置合適的學(xué)習(xí)率和訓(xùn)練階段可改善模型的學(xué)習(xí)過程。設(shè)置學(xué)習(xí)率為0.1,訓(xùn)練階段為6,可令模型得到較好的生成效果。另外,在訓(xùn)練過程中與原模型設(shè)置不同,每個(gè)階段不需添加額外噪聲,以保證生成器輸出更可靠的偽參考圖像。
Experiments were conducted on the widely used large public datasets TID2013 [26], LIVE [27], CSIQ [28] and TID2008 [29]. TID2013 contains 25 original images subjected to 24 distortion types, ranging from simple Gaussian noise and compression distortion to more complex non-eccentricity pattern noise, each at 5 levels, for a total of 3 000 distorted images. The subjective scores of both TID2013 and TID2008 range over [0, 9]. LIVE contains 29 original images and 779 distorted images affected by 5 distortion types at various levels. CSIQ contains 30 original images and 866 distorted images covering 6 distortion types, with subjective quality scores in [0, 1].
將本文算法與現(xiàn)有主流IQA算法進(jìn)行比較來驗(yàn)證算法的性能效果。由于本文的NR-IQA算法借助了類似于FR-IQA算法功能的偽參考圖像思想,因此選擇具有代表性的FR-IQA算法分別在TID2013、TID2008、LIVE與CSIQ數(shù)據(jù)集上與本文算法進(jìn)行比較,包括結(jié)構(gòu)相似性指數(shù)(Structural SIMilarity index, SSIM)[30]、FSIMc(Feature Similarity Index Method)[31]、VSI(Visual Saliency-Induced Index)[32]、GMSD(Gradient Magnitude Similarity Deviation)[33]、SPSIM(SuperPixel-based SIMilarity index)[34]、LLM(Local Linear Model)[35]等,結(jié)果如表1所示。從表中可以看出,本文算法在四個(gè)數(shù)據(jù)集中都表現(xiàn)出與人類主觀評(píng)價(jià)良好的一致性。
在LIVE數(shù)據(jù)集中,本文算法的SROCC、PLCC性能比FSIMc[31]算法均提高了2個(gè)百分點(diǎn),PLCC性能比GMSD[33]提高了4個(gè)百分點(diǎn),RMSE性能相比其他算法也大幅降低。CSIQ數(shù)據(jù)集中,SROCC性能比GMSD算法提高了1個(gè)百分點(diǎn),KROCC性能提高了2個(gè)百分點(diǎn),RMSE性能相比其他算法也有優(yōu)勢(shì)。在TID2013、TID2008數(shù)據(jù)集中,對(duì)于SROCC、PLCC、KROCC與RMSE評(píng)估,本文算法均優(yōu)于其他FR-IQA算法,表現(xiàn)出優(yōu)秀的性能。這表明當(dāng)圖像涵蓋豐富內(nèi)容時(shí),學(xué)習(xí)圖像語義內(nèi)容有助于感知圖像質(zhì)量。
表1 實(shí)驗(yàn)數(shù)據(jù)集上與本文算法與主流IQA算法的性能比較
Table 2 Performance comparison of different algorithms on the LIVE and TID2013 datasets
The proposed algorithm extracts features with a CNN. To analyze its performance further, representative algorithms of the same type are selected for comparison, including DIQaM-NR (Deep Image Quality assessment Metric - No Reference) [9], DIIVINE (Distortion Identification-based Image Verity and INtegrity Evaluation) [36], CORNIA (COdebook Representation for No-reference Image Assessment) [37], BIQI (Blind Image Quality Index) [38], H-IQA (Hallucinated-IQA) [15], RankIQA [4] and hyperIQA [6]; the results are shown in Table 2. On LIVE, the SROCC of the proposed algorithm is 2 percentage points higher than DIQaM-NR's and 1 percentage point higher than hyperIQA's, and its PLCC is 2 percentage points higher than hyperIQA's, surpassing algorithms of the same type. On TID2013, the proposed algorithm improves substantially over the others: its SROCC is 5 percentage points higher than H-IQA's and 14 percentage points higher than that of RankIQA, a fellow deep-learning algorithm, a clear gain; its PLCC is 6 percentage points higher than BIQI's and 8 percentage points higher than DIQaM-NR's, far exceeding comparable algorithms. Comparing the algorithm's behavior on LIVE and TID2013 shows a marked gain on TID2013 but only a small gain on LIVE: LIVE has few distortion types, so once a certain performance level is reached, further gains are hard to observe, whereas the larger TID2013, with more complex distortion types, characterizes algorithm performance more accurately.
為了評(píng)估本文算法的泛化能力,進(jìn)行跨數(shù)據(jù)集實(shí)驗(yàn),以SROCC為評(píng)價(jià)指標(biāo)評(píng)估模型性能,并與主流無參考算法進(jìn)行比較,如BRISQUE(dubbed Blind/Referenceless Image Spatial QUality Evaluator)[39]、BLIINDS-II(Blind Image Integrity Notator using DCT Statistics)[40]、DIIVINE[36]、CORNIA[37]、DIQaM-NR[9]等。將在整個(gè)TID2013數(shù)據(jù)集訓(xùn)練的模型用CSIQ和LIVE數(shù)據(jù)集進(jìn)行模型性能測(cè)試,以及將在LIVE數(shù)據(jù)集訓(xùn)練的模型用CSIQ與TID2013數(shù)據(jù)集進(jìn)行測(cè)試,結(jié)果如表3所示。在訓(xùn)練集和測(cè)試集的組合中,本文算法表現(xiàn)出中等的泛化能力。TID2013數(shù)據(jù)集的訓(xùn)練模型在CSIQ和LIVE數(shù)據(jù)集上的測(cè)試結(jié)果表現(xiàn)出良好的效果,在CSIQ數(shù)據(jù)集相較于DIQaM-NR算法的SROCC提升了4個(gè)百分點(diǎn),同時(shí)也大大優(yōu)于其他無參考算法。而LIVE數(shù)據(jù)集的訓(xùn)練模型在CSIQ與TID2013數(shù)據(jù)集上進(jìn)行交叉評(píng)估時(shí),本文算法雖優(yōu)于其他無參考算法,但是這幾種無參考算法在這種情況下的表現(xiàn)都較差,在TID2013數(shù)據(jù)集上的SROCC都無法超過0.5。這是因?yàn)門ID2013數(shù)據(jù)集有24種失真類型,而LIVE數(shù)據(jù)集只有5種失真類型,二者有4種交叉失真類型,在失真類型與失真等級(jí)更多的TID2013數(shù)據(jù)集上訓(xùn)練的模型的泛化能力要優(yōu)于LIVE數(shù)據(jù)集訓(xùn)練模型的泛化能力,這遵循深度神經(jīng)網(wǎng)絡(luò)的泛化能力取決于訓(xùn)練集大小和多樣性的概念。由TID2013數(shù)據(jù)集訓(xùn)練模型優(yōu)于LIVE數(shù)據(jù)集訓(xùn)練模型的跨數(shù)據(jù)集測(cè)試結(jié)果表明,更大的數(shù)據(jù)集會(huì)有更好的泛化能力。
表3 跨數(shù)據(jù)集測(cè)試中的SROCC
將本文算法在整個(gè)TID2013數(shù)據(jù)集上與現(xiàn)有主流算法進(jìn)行比較,對(duì)數(shù)據(jù)集單一失真類型進(jìn)行分析,如表4所示。與其他算法對(duì)比,本文算法明顯提高了高斯模糊(Gaussian Blur, GB)、JPEG壓縮(JPEG)與JPEG2000壓縮(JP2K)等失真類型的性能,這說明在生成作為失真圖像補(bǔ)償信息的偽參考圖像時(shí),保持圖像的紋理細(xì)節(jié)和清晰度的做法是有意義的。掩蔽噪聲(Masked Noise, MN)和非偏心模式噪聲(Non Eccentricity Pattern Noise, NEPN)失真性能的大幅提升證明了所生成的偽參考圖像作為失真圖像補(bǔ)償信息的有效性。雖然本文算法并未大幅提高數(shù)據(jù)集中所有失真類型的性能,但是平均值相較RankIQA[4]算法仍提高了12個(gè)百分點(diǎn),對(duì)于多種失真算法性能較為穩(wěn)定,說明所提算法在不同失真類型中具有廣泛性。
表 4 在TID2013數(shù)據(jù)集上對(duì)單個(gè)失真類型的性能比較(SROCC)
本文算法針對(duì)例如量化噪聲(Quantization Noise, QN)、高斯模糊(GB)、色差(CHromatic Aberrations, CHA)與JPEG2000壓縮(JP2K)等多種常見失真類型均表現(xiàn)出較好的效果,并且在非偏心模式噪聲(NEPN)與稀疏采樣和重構(gòu)等(Sparse Sampling and Reconstruction, SSR)等不常見失真類型上也表現(xiàn)良好,因此本文算法適用于大多數(shù)失真退化的圖像。然而本文算法針對(duì)不同強(qiáng)度的局部分塊失真(Local block-wise distortions of different intensity, Block)與強(qiáng)度位移(Mean Shift, MS)兩種失真類型表現(xiàn)較差,且算法在訓(xùn)練階段使用了單一失真數(shù)據(jù)庫(kù),因此算法并不適用于多重復(fù)雜失真圖像。圖像失真類型復(fù)雜豐富,目前未出現(xiàn)一種算法能對(duì)所有失真類型均有優(yōu)異效果,因此在保持IQA模型高性能的同時(shí)提高模型的通用性是今后的研究重點(diǎn)。
圖3 TID2013數(shù)據(jù)庫(kù)上各算法的散點(diǎn)圖對(duì)比
此外,本文在失真類型復(fù)雜的TID2013數(shù)據(jù)集上對(duì)比了一些具有代表性度量(FSIM[31]、GMSD[33]、VSI[32])的散點(diǎn)圖,如圖3所示的曲線是通過Logistic函數(shù)非線性擬合獲得的,橫坐標(biāo)為客觀模型預(yù)測(cè)得分,縱坐標(biāo)為主觀得分差異值(Mean Opinion Score, MOS),散點(diǎn)圖反映兩者的相關(guān)度。從圖中可以看出,與其他算法對(duì)比,本文模型的預(yù)測(cè)得分與主觀評(píng)分之間的相關(guān)性更高,表明本文所提模型性能更好,與主觀質(zhì)量評(píng)價(jià)更加一致。
Several deep-learning-based IQA algorithms were run on a computer with an Intel Xeon Gold CPU @ 2.5 GHz, 256 GB RAM and two NVIDIA GeForce RTX 2080Ti GPUs, measuring the average running time per image on TID2013 during the test stage, including feature extraction and regression. The results show that the proposed algorithm takes 0.72 s per image on average, 0.43 s more than DIQaM-NR; although its computational complexity is slightly higher, its SROCC on TID2013 is 8 percentage points higher than DIQaM-NR's, a clear performance gain.
This paper proposed a no-reference algorithm based on saliency deep features. During training, the algorithm generates reliable and realistic pseudo-reference images as compensation information for the distorted image, partly making up for the missing reference images in NR-IQA. In the feature extraction stage, a fine-tuned VGG16 network extracts and fuses the deep semantic features of the distorted image and the pseudo-reference saliency map, which are mapped into a regression network to obtain the predicted score. Experimental results show that the algorithm's quality predictions accord with human quality perception, and its predicted scores are highly consistent with human subjective evaluation. However, its computational complexity is relatively high, so developing faster, simpler and more efficient algorithms is a key direction for future work. Moreover, given the rich variety of image distortion types, general-purpose NR-IQA algorithms for multiple distortions remain a promising research direction, and broadening the applicability of IQA models is a research focus.
[1] KIM J, LEE S. Deep learning of human visual sensitivity in image quality assessment framework[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 1969-1977.
[2] SUN R R. Image quality assessment method based on similarity maps of gray level co-occurrence matrix[J]. Journal of Computer Applications, 2020, 40(S1): 177-179.
[3] GOLESTANEH S A, KARAM L J. Reduced-reference quality assessment based on the entropy of DWT coefficients of locally weighted gradient magnitudes[J]. IEEE Transactions on Image Processing, 2016, 25(11): 5293-5303.
[4] LIU X L, VAN DE WEIJER J, BAGDANOV A D. RankIQA: learning from rankings for no-reference image quality assessment[C]// Proceedings of 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 1040-1049.
[5] MIN X K, ZHAI G T, GU K, et al. Blind image quality estimation via distortion aggravation[J]. IEEE Transactions on Broadcasting, 2018, 64(2): 508-517.
[6] SU S L, YAN Q S, ZHU Y, et al. Blindly assess image quality in the wild guided by a self-adaptive hyper network[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3664-3673.
[7] ZHANG L, ZHANG L, BOVIK A C. A feature-enriched completely blind image quality evaluator[J]. IEEE Transactions on Image Processing, 2015, 24(8): 2579-2591.
[8] SAAD M A, BOVIK A C, CHARRIER C. Blind image quality assessment: a natural scene statistics approach in the DCT domain[J]. IEEE Transactions on Image Processing, 2012, 21(8): 3339-3352.
[9] BOSSE S, MANIRY D, MÜLLER K R, et al. Deep neural networks for no-reference and full-reference image quality assessment[J]. IEEE Transactions on Image Processing, 2018, 27(1): 206-219.
[10] XU P, GUO M, CHEN L, et al. No-reference stereoscopic image quality assessment based on binocular statistical features and machine learning[J]. Complexity, 2021, 2021: No.8834652.
[11] MIN X K, MA K D, GU K, et al. Unified blind quality assessment of compressed natural, graphic, and screen content images[J]. IEEE Transactions on Image Processing, 2017, 26(11): 5462-5474.
[12] CAO Y D, CAI X B. No-reference image quality assessment algorithm with enhanced adversarial learning[J]. Journal of Computer Applications, 2020, 40(11): 3166-3171.
[13] HINZ T, FISHER M, WANG O, et al. Improved techniques for training single-image GANs[C]// Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2021: 1299-1308.
[14] HU J B, CHAI X L, SHAO F. Deep features similarity for blind quality assessment using pseudo-reference image[J]. Journal of Optoelectronics·Laser, 2019, 30(11): 1184-1193.
[15] LIN K Y, WANG G X. Hallucinated-IQA: no-reference image quality assessment via adversarial learning[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 732-741.
[16] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2672-2680.
[17] ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein generative adversarial networks[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 214-223.
[18] MIRZA M, OSINDERO S. Conditional generative adversarial nets[EB/OL]. (2014-11-06)[2021-07-07].https://arxiv.org/pdf/1411.1784.pdf.
[19] RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. (2016-01-07)[2021-07-07].https://arxiv.org/pdf/1511.06434.pdf.
[20] SØNDERBY C K, CABALLERO J, THEIS L, et al. Amortised MAP inference for image super-resolution[EB/OL]. (2017-02-21)[2021-07-07].https://arxiv.org/pdf/1610.04490.pdf.
[21] ZHANG K, VAN GOOL L, TIMOFTE R. Deep unfolding network for image super-resolution[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 3214-3223.
[22] SHI W Z, CABALLERO J, HUSZÁR F, et al. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network[C]// Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 1874-1883.
[23] GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 5769-5779.
[24] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30)[2021-07-07].https://arxiv.org/pdf/1412.6980.pdf.
[25] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15: 1929-1958.
[26] PONOMARENKO N, IEREMEIEV O, LUKIN V, et al. Color image database TID2013: peculiarities and preliminary results[C]// Proceedings of the 2013 European Workshop on Visual Information Processing. Piscataway: IEEE, 2013: 106-111.
[27] SHEIKH H R, SABIR M F, BOVIK A C. A statistical evaluation of recent full reference image quality assessment algorithms[J]. IEEE Transactions on Image Processing, 2006, 15(11): 3440-3451.
[28] LARSON E C, CHANDLER D M. Most apparent distortion: full-reference image quality assessment and the role of strategy[J]. Journal of Electronic Imaging, 2010, 19(1): No.011006.
[29] PONOMARENKO N, LUKIN V, ZELENSKY A, et al. TID2008 — a database for evaluation of full-reference visual quality assessment metrics[J]. Advances of Modern Radioelectronics, 2009, 10(4): 30-45.
[30] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
[31] ZHANG L, ZHANG L, MOU X Q, et al. FSIM: a feature similarity index for image quality assessment[J]. IEEE Transactions on Image Processing, 2011, 20(8): 2378-2386.
[32] ZHANG L, SHEN Y, LI H Y. VSI: a visual saliency-induced index for perceptual image quality assessment[J]. IEEE Transactions on Image Processing, 2014, 23(10): 4270-4281.
[33] XUE W F, ZHANG L, MOU X Q, et al. Gradient magnitude similarity deviation: a highly efficient perceptual image quality index[J]. IEEE Transactions on Image Processing, 2014, 23(2): 684-695.
[34] SUN W, LIAO Q M, XUE J H, et al. SPSIM: a superpixel-based similarity index for full-reference image quality assessment[J]. IEEE Transactions on Image Processing, 2018, 27(9): 4232-4244.
[35] WANG H, FU J, HU S, et al. Image quality assessment based on local linear information and distortion-specific compensation[J]. IEEE Transactions on Image Processing, 2017, 26(2): 915-926.
[36] MOORTHY A K, BOVIK A C. Blind image quality assessment: from natural scene statistics to perceptual quality[J]. IEEE Transactions on Image Processing, 2011, 20(12): 3350-3364.
[37] YANG L, WANG H, WEI M. Review of no-reference image quality assessment based on machine learning[J]. Computer Engineering and Applications, 2018, 54(19): 34-42.
[38] MOORTHY A K, BOVIK A C. A two-step framework for constructing blind image quality indices[J]. IEEE Signal Processing Letters, 2010, 17(5): 513-516.
[39] ALIZADEH M, SHARIFKHANI M. Subjective video quality prediction based on objective video quality metrics[C]// Proceedings of 4th Iranian Conference on Signal Processing and Intelligent Systems. Piscataway: IEEE, 2018: 7-9.
[40] LI D Q, JIANG T T, LIN W S, et al. Which has better visual quality: the clear blue sky or a blurry animal?[J]. IEEE Transactions on Multimedia, 2019, 21(5): 1221-1234.
This work is partially supported by the National Natural Science Foundation of China (61771386) and the Natural Science Foundation of Shaanxi Province (2021JM-340).
LI Jia, born in 1997, M. S. candidate. Her research interests include deep learning, image processing.
ZHENG Yuanlin, born in 1976, Ph. D., associate professor. His research interests include color management, evaluation of quality of color image, color science.
LIAO Kaiyang, born in 1976, Ph. D., lecturer. His research interests include machine vision, artificial intelligence.
LOU Haojie, born in 1996, M. S. candidate. His research interests include computer vision, object detection.
LI Shiyu, born in 1999. His research interests include image processing.
CHEN Zehao, born in 2002. His research interests include image processing.
CLC number: TP391.41; TN911.73    Document code: A    Article ID: 1001-9081(2022)06-1957-08
DOI: 10.11772/j.issn.1001-9081.2021040597
Received 2021-04-16; revised 2021-07-02; accepted 2021-07-15.
國(guó)家自然科學(xué)基金資助項(xiàng)目(61771386);陜西省自然科學(xué)基金資助項(xiàng)目(2021JM-340)。