Long-tailed image defect detection based on gradient-guide weighted-deferred negative gradient decay loss
LI Wei*, LIANG Sixin, ZHANG Jianzhou
(College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China) (*Corresponding author's e-mail: wana@stu.scu.edu.cn)
Aiming at the problem that current image defect detection models perform poorly on the tail categories of long-tailed defect datasets, a Gradient-Guide Weighted-Deferred Negative Gradient decay Loss (GGW-DND Loss) was proposed. First, the positive and negative gradients were re-weighted according to the cumulative gradient ratio of each classification node in the detector, alleviating the suppressed state of the tail-class classifiers. Second, once the model had been optimized to a certain stage, the negative gradients generated at every node were directly reduced to enhance the generalization ability of the tail-class classifiers. Experimental results on a self-built image defect dataset and on NEU-DET (NEU surface defect database for Defect dEtection Task) show that the mean Average Precision (mAP) of the proposed loss on tail categories exceeds that of Binary Cross Entropy Loss (BCE Loss) by 32.02 and 7.40 percentage points respectively, and that of EQL v2 (EQualization Loss v2) by 2.20 and 0.82 percentage points respectively, verifying that the proposed loss effectively improves the detection performance of the network on tail categories.
long-tailed dataset; cumulative gradient ratio; weighted loss; image defect detection; Convolutional Neural Network (CNN)
With the development of global informatization, images have gradually become a primary medium of information expression owing to their convenience and vividness. During image generation and transmission, unstable current or voltage, faults in transmission lines, and similar causes can easily introduce various defects into the finally rendered picture [1]; such image defects include, but are not limited to, dead pixels, abnormal brightness, color cast, mosaic artifacts, and image noise. Image defect detection [2] aims to discover visual flaws in images and is one of the key technologies for guaranteeing image quality and meeting user needs.
In the past, image defect detection was mainly performed by manual visual inspection. However, traditional manual inspection has serious shortcomings: labor cost is high and inspection is time-consuming; human judgment is highly subjective and lacks a concrete quantitative standard; and after long working hours, inspectors are prone to false and missed detections. Industrial defect detection algorithms based on traditional image processing [3-5] have been widely developed and applied owing to their speed, accuracy, and greatly reduced manual intervention; however, most of these algorithms target specific scenarios, involve a tedious feature extraction process, and cannot adapt to defect types with complex and variable backgrounds. Defect detection methods based on deep learning [6-8] not only adapt to variable industrial scenarios but also avoid complicated hand-crafted feature design while maintaining high speed and accuracy, and therefore have great research value and broad application prospects.
Compared with generic object detection, defect detection must consider the specific task requirements and the availability of data samples. In image defect detection tasks, normal samples dominate, so few defect images are available for training; moreover, in datasets collected from different application scenarios, the number of defect samples may differ greatly across categories. For example, for some image capture devices, dead pixels caused by abnormal light-collecting points on the image sensor occur far more frequently than other types of image defects, whereas in image transmission devices using low-quality transmission media, the collected defects are mainly of the noise type. Consequently, for most image defect detection tasks the dataset is scarce and follows a long-tailed distribution, and training a defect detector directly on such a dataset faces great challenges, especially for the tail categories. On the one hand, head categories occupy the vast majority of the dataset and dominate the training process; on the other hand, tail categories lack sufficient input data, so the model learns them insufficiently.
To address these problems, this paper proposes a new loss function, the Gradient-Guide Weighted-Deferred Negative Gradient decay Loss (GGW-DND Loss), to improve the detection performance of defect detection models on tail categories.
The main contributions of this paper are as follows:
1) The cumulative gradient ratio extracted at each classifier node of the detection model is used to roughly estimate the distribution of instances across categories, balancing the positive and negative gradients produced at each classification node and emphasizing the learning of nodes related to tail categories.
2) Redundant negative gradients in the late training stage are decayed, avoiding the overfitting and weak generalization that tail categories suffer from having too few positive samples.
3) Compared with current mainstream loss functions in experiments, the image defect detection model with GGW-DND Loss shows clearly improved detection on tail categories.
In machine vision tasks, the definition of a defect [9] is usually subjective, and different inspection tasks have different detection targets. Defect detection is commonly used to discover appearance flaws of various industrial products, including fabrics, materials, pharmaceuticals, chips, and buildings. According to how defects are modeled, detection approaches fall into two types. The first relies on existing defect samples to summarize where defects appear and how they manifest; it is the most similar to conventional object detection. Depending on whether region proposals are generated, mainstream deep learning defect detectors with known defect patterns are divided into one-stage detectors [10-12] and two-stage detectors [6-8]. A two-stage detector generates region proposals on the feature map produced by the backbone from the input image, and the features extracted from the candidate regions are fed separately into a classifier and a regressor; a one-stage detector performs classification and regression simultaneously and directly on the backbone's output feature map. The second type of approach builds a detection model from unlabeled or normal samples and detects defects by analyzing how much an anomalous sample deviates from normal ones. Deep learning based anomaly detection techniques include distance metrics [13-15], classification-boundary construction [16], image reconstruction [17-19], and hybrid methods combined with traditional techniques [20-22]. On the one hand, both distance-metric and classification-boundary methods impose strict requirements on the feature-space distribution of normal samples and are therefore ill-suited to defect detection against complex backgrounds such as images. For example, Ruff et al. [13] proposed the Deep Support Vector Data Description (DSVDD) method, which assumes that normal samples form a unimodal distribution in feature space, but this is hard to guarantee for complex and variable medical or natural images. On the other hand, reconstruction-based anomaly detection suffers from blurry reconstructions, the risk of overfitting to defect samples, and optimization difficulties. For example, Gong et al. [23] added a memory module to an autoencoder to store representative feature vectors; although this curbs the model's excessive ability to adapt to anomalous samples, it significantly increases storage requirements and still produces blurry reconstructions of natural images. Schlegl et al. [24] proposed the AnoGAN (Anomaly detection with Generative Adversarial Network) model, which markedly improves image quality by iteratively optimizing the reconstructed image, but its efficiency is low, making it hard to apply in real-time inspection tasks. Based on the above analysis, for the image defect detection task this paper starts from known defect samples and trains a defect detection network.
Image defect detection still differs somewhat from object detection. First, the final performance of a model depends heavily on the number of samples available for training. Datasets commonly used for object detection, such as COCO (Common Objects in COntext) [25] and VOC (Visual Object Classes) [26], contain a large number of trainable instances, whereas in defect detection the number of defect samples actually obtainable is limited. Second, after careful manual curation, the images in object detection datasets have relatively balanced numbers of instances per category, whereas in practical image defect detection the defect samples collected in different scenarios are usually extremely imbalanced across categories.
Data imbalance of this kind has also been studied in a subfield of object detection, namely long-tailed object detection. Current methods for alleviating class imbalance mainly include re-sampling, weighted losses, and transfer learning. Re-sampling methods can be divided, by the object being sampled, into under-sampling of head categories [27-29] and over-sampling of tail categories [27,30-31]. Chawla et al. [32] analyzed the shortcomings of re-sampling in detail, pointing out that when the data are extremely imbalanced, under-sampling leads to severe data insufficiency while over-sampling causes the model to overfit the minority categories, so neither achieves ideal results. Weighted losses assign adaptive weights at the class level [33-37] or the instance level [38-40], influencing parameter updates during back-propagation. The adaptive weights usually take the frequency of each class in the dataset as a reference, and most studies build on this basis. For example, Cui et al. [33] introduced an effective-number-of-samples parameter into the per-class loss computation; Cao et al. [37] proposed assigning larger classification margins to tail categories together with a deferred re-weighting training schedule. Transfer learning methods [41-43] transfer feature knowledge learned from the abundant data of head categories to tail categories to compensate for their data scarcity. Yin et al. [41] proposed a center-based feature transfer method that addresses insufficient samples for some subjects, but such methods apply only to specific domains such as face recognition and are of little help in complex and variable natural scenes. Liu et al. [42] used dynamic meta-embedding to fuse and calibrate memory features from a visual memory bank with directly learned observed features, but the visual memory bank is difficult to construct.
All experiments in this paper are based on the two-stage detection framework Faster R-CNN (Faster Region-based Convolutional Neural Network) [44], whose basic structure is shown in Fig. 1. After preprocessing, the input image is fed into the backbone to extract feature maps. The backbone is ResNet-50 (Residual Network-50) [45] with residual connections, composed of convolutional layers, Batch Normalization (BN) layers, Rectified Linear Unit (ReLU) activation layers, a (max) pooling layer, residual convolution blocks (the basic ResNet module that changes the output dimension), and residual identity blocks (the basic ResNet module that keeps the output dimension). The Region Proposal Network (RPN) [44] takes the feature maps as input: it first passes them through a set of shared convolutional and ReLU activation layers; it then generates region proposals from predefined anchors, where the upper branch decides whether an anchor contains a foreground object and the lower branch regresses the anchor boundaries; finally, the proposals and feature maps are pooled by Region Of Interest Align (ROI Align) [46] and sent to the detection head. The detection head shares two sets of fully connected and ReLU activation layers, after which the features pass through one more fully connected layer each for category prediction and bounding-box regression.
Fig. 1 Basic network structure of Faster R-CNN
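As a concrete illustration of this setup, the following minimal sketch builds such a detector. PyTorch/torchvision is an assumption (the paper does not name its implementation framework), and torchvision's FPN variant of Faster R-CNN with a ResNet-50 backbone stands in for the structure of Fig. 1:

```python
# A minimal sketch, assuming PyTorch/torchvision; not the authors' code.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False,  # older torchvision signature; newer versions use `weights=`
    num_classes=7,     # 6 defect categories + 1 background (hypothetical setting)
)

model.eval()
images = [torch.rand(3, 600, 800)]  # one preprocessed input image (C, H, W)
with torch.no_grad():
    outputs = model(images)         # per-image dict with boxes, labels, scores
print(outputs[0]["boxes"].shape)
```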
Fig. 2 illustrates how, in existing defect detection networks, the category-prediction branch of the detection head computes the loss for a single output vector. After a single logit vector is obtained, the Sigmoid function converts it into a predicted probability distribution over categories, and the cross-entropy loss is then computed against the ground-truth label. This paper simply regards the category-prediction network of the detection head as a multi-class network: for each candidate region instance output by the RPN, only the logit at the position corresponding to the ground-truth category should produce a high score, while the other categories should produce low scores. That is, to make the prediction at the labeled category as large as possible, the neurons related to this category should be more likely to be activated, and correspondingly the other neurons should be suppressed.
Fig. 2 Category prediction network based on GGW-DND Loss
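The per-proposal computation described above can be sketched as follows (PyTorch assumed; the class count and label index are illustrative only):

```python
# Per-proposal category prediction with Sigmoid + cross-entropy; a sketch.
import torch

num_classes = 6
logits = torch.randn(1, num_classes)  # logit vector of one candidate region
target = torch.zeros(1, num_classes)
target[0, 2] = 1.0                    # ground-truth category at index 2

prob = torch.sigmoid(logits)          # per-category probability predictions
# cross-entropy against the one-hot label: only the labeled position should
# score high, while every other position is pushed toward zero (suppressed)
loss = -(target * torch.log(prob) + (1 - target) * torch.log(1 - prob))
print(loss.sum().item())
```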
To address this problem, an early and common practice is to use each category's share of the total instance count in the dataset as a modulating factor in the loss function. However, because of the diversity of network inputs and the presence of easy and hard samples, different samples contribute differently to the loss; for an easy sample, for instance, a gap in instance counts relative to other classes does not much affect the final detection result. Focal Loss [38] uses the closeness between the predicted output and the ground-truth label, accounting for the model's learning of easy and hard samples while addressing positive-negative sample imbalance. This paper instead observes an indicator shaped jointly by sample counts and sample difficulty: the cumulative gradients of back-propagation [47].
Normally, insufficient instances in the dataset cause the cumulative gradient ratio of a tail classifier node to stay far below 1 for a long time. The large disparity between positive and negative gradients keeps suppressing the updates of such classifier nodes, which harms subsequent classification at test time. Therefore, this paper weights the positive and negative gradients in subsequent back-propagation according to the cumulative gradient ratio of each classifier node, and calls this the Gradient-Guide Weighted Loss (GGW Loss).
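The paper's exact weighting equations are not preserved in this extraction, so the following is only a plausible sketch of the mechanism, written in the spirit of EQL v2 [47], on which the cumulative-gradient-ratio idea builds; `gamma`, `mu`, and `alpha` are hypothetical hyperparameters:

```python
import torch

class GradientGuidedWeighter:
    """Sketch of gradient-guided re-weighting (assumed form; not the paper's
    own mapping functions). Tracks per-node cumulative positive/negative
    gradients and turns their ratio into per-node weights."""

    def __init__(self, num_classes, gamma=12.0, mu=0.8, alpha=4.0):
        self.pos_grad = torch.zeros(num_classes)  # cumulative positive gradient per node
        self.neg_grad = torch.zeros(num_classes)  # cumulative negative gradient per node
        self.gamma, self.mu, self.alpha = gamma, mu, alpha

    def weights(self):
        # cumulative gradient ratio; stays far below 1 for suppressed tail nodes
        ratio = self.pos_grad / self.neg_grad.clamp(min=1e-12)
        f = torch.sigmoid(self.gamma * (ratio - self.mu))  # maps the ratio into (0, 1)
        w_pos = 1 + self.alpha * (1 - f)  # amplify positive gradients of tail nodes
        w_neg = f                         # shrink their negative gradients
        return w_pos, w_neg

    def update(self, prob, target):
        # for Sigmoid + BCE, d(loss)/d(logit) = prob - target, so its magnitude
        # splits naturally into positive (target = 1) and negative (target = 0) parts
        grad = (prob - target).abs()
        self.pos_grad += (grad * target).sum(dim=0)
        self.neg_grad += (grad * (1 - target)).sum(dim=0)
```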
Cross Entropy Loss (CE Loss) [33] and Binary Cross Entropy Loss (BCE Loss) [33] are common loss functions for classifiers. As shown in Eqs. (6) and (7), BCE Loss and CE Loss differ in the activation function used to convert the classifier's logit vector into the per-category probability vector: CE Loss uses the Softmax function, while BCE Loss uses the Sigmoid function. CE Loss assumes that each input sample corresponds to exactly one true category, so the probabilities computed by Softmax are coupled and mutually exclusive; BCE Loss treats multi-class classification as multiple independent binary classification tasks, so the probabilities computed by Sigmoid are mutually independent. According to the preceding analysis, if the activation of a tail classifier node is suppressed for a long time during training, its predictions will also be low at test time. Using the mutually exclusive Softmax with CE Loss cannot effectively relieve the suppressed state of tail classifier nodes, so this paper adopts BCE Loss.
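A toy comparison of the two activations (PyTorch assumed; the logits and class count are arbitrary) makes the coupling difference visible:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])  # one proposal, three classes
target = torch.tensor([0])                 # ground-truth class index

# CE Loss: Softmax couples the classes, so raising one probability lowers the rest.
ce = F.cross_entropy(logits, target)

# BCE Loss: Sigmoid scores every class independently as its own binary task.
onehot = F.one_hot(target, num_classes=3).float()
bce = F.binary_cross_entropy_with_logits(logits, onehot)

print(ce.item(), bce.item())
```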
According to [37], before the learning rate is decayed, algorithms trained with BCE Loss or CE Loss produce better features than re-weighting and re-sampling algorithms. This paper adopts this setting and defers the introduction of ND Loss into training until after the learning-rate decay, with the computation still following Eq. (9). Hereafter, the Deferred Negative gradient Decay Loss (DND Loss) is used uniformly in place of ND Loss.
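Since Eq. (9) itself is not preserved in this extraction, the deferral can only be sketched schematically; the decay factor `lam` and the decay epoch below are hypothetical settings:

```python
def deferred_negative_scale(neg_term, epoch, decay_epoch=8, lam=0.1):
    """Sketch of deferred negative-gradient decay (assumed form): leave the
    negative-class loss term, the source of negative gradients, intact before
    the learning-rate decay, and scale it down afterwards."""
    if epoch < decay_epoch:
        return neg_term
    return lam * neg_term
```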
In Section 2.1, GGW Loss weights the positive and negative gradients by the cumulative gradient ratio; in Section 2.2, DND Loss scales down the negative gradients in the late training stage. Combining the two yields the Gradient-Guide Weighted-Deferred Negative Gradient decay Loss (GGW-DND Loss), computed as follows:
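The combining equations are not preserved in this extraction; the sketch below therefore shows one plausible composition of the two mechanisms under the same assumptions as above, reusing the GradientGuidedWeighter sketched for Section 2.1:

```python
import torch

def ggw_dnd_loss(logits, targets, weighter, epoch, decay_epoch=8, lam=0.1):
    """Hedged sketch of GGW-DND Loss (assumed form): a per-node weighted BCE
    whose negative-gradient weight is further decayed once training passes
    the (hypothetical) learning-rate decay epoch."""
    prob = torch.sigmoid(logits)
    w_pos, w_neg = weighter.weights()
    if epoch >= decay_epoch:        # deferred negative gradient decay (Sect. 2.2)
        w_neg = lam * w_neg
    eps = 1e-12
    loss = -(w_pos * targets * torch.log(prob.clamp(min=eps))
             + w_neg * (1 - targets) * torch.log((1 - prob).clamp(min=eps)))
    weighter.update(prob.detach(), targets)  # refresh cumulative statistics
    return loss.sum(dim=1).mean()
```

Because the weights are computed from accumulated statistics rather than the current batch alone, they vary smoothly over training in this sketch.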
To verify the effectiveness of GGW-DND Loss, experiments were conducted on a self-built image defect dataset and on the NEU-DET (NEU surface defect database for Defect dEtection Task) dataset [48]. The self-built dataset contains six typical image defects: mosaic, color cast, abnormal brightness, Gaussian noise, salt-and-pepper noise, and bad blocks. The training set contains 2 900 images; each image contains at least one defect of a designated type and may also contain defects of other types. The first three defect types have 2 000 images each, and the last three have 20 images each; samples of the different defect types are shown in Fig. 3. The validation set contains 300 images, with 200 images collected for each type of defect, and detection results are compared on this validation set. The NEU-DET dataset contains six typical surface defects of hot-rolled steel strip: crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches. It collects 300 samples per defect type, and the number of instances per defect ranges from 430 to 981.
3.2.1 Evaluation metrics
After training on the long-tailed defect detection datasets, the detection models are evaluated on the corresponding balanced validation sets, and the mean Average Precision (mAP) over all classes is reported. In addition, following [35], the classes are divided into two groups by the number of training samples per class, head classes and tail classes, and the mAP of each group is reported for comparison. For distinction, the mAP of the head classes is denoted mAPf and that of the tail classes mAPr.
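Given per-class AP values from any standard detection evaluator, the grouped metrics reduce to a split and an average; the count threshold below is a hypothetical stand-in for the paper's head/tail grouping rule:

```python
import numpy as np

def grouped_map(ap_per_class, train_counts, tail_threshold=100):
    # Sketch of the grouped metric: classes with fewer training samples than
    # the (hypothetical) threshold are treated as tail classes.
    ap = np.asarray(ap_per_class, dtype=float)
    tail = np.asarray(train_counts) < tail_threshold
    return {"mAP": ap.mean(),
            "mAPf": ap[~tail].mean(),   # head classes
            "mAPr": ap[tail].mean()}    # tail classes

# e.g. the self-built dataset: 2 000 images per head class, 20 per tail class
print(grouped_map([0.90, 0.88, 0.91, 0.40, 0.35, 0.50],
                  [2000, 2000, 2000, 20, 20, 20]))
```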
3.2.2 Implementation details
This section compares the performance of mainstream defect detection models when the image defect detection dataset exhibits a long-tailed distribution.
In the experiments, the two-stage detector Faster R-CNN uses ResNet-50 as the backbone, and its classifier computes the loss with CE Loss and with BCE Loss respectively, with the evaluation metrics reported for each. SSD (Single Shot multibox Detector) [50] and YOLOv5 [51] are typical one-stage defect detection models that learn deep features directly from the input image and predict categories and bounding boxes. SSD is configured with a VGG-16 (Visual Geometry Group-16) [52] backbone and CE Loss as the classifier loss; YOLOv5 is configured with a CSP-DarkNet (Cross Stage Partial-Dark Network) [53] backbone and BCE Loss as the classifier loss.
As shown in Table 1, when trained directly on the imbalanced defect dataset, Faster R-CNN with either CE Loss or BCE Loss as the classifier loss outperforms SSD and YOLOv5 on the tail-class metric mAPr; that is, the two-stage detector Faster R-CNN has an advantage over one-stage detectors in tail-class detection accuracy, so subsequent experiments are based on Faster R-CNN.
Table 1 Comparison of detection results of different defect detection networks  unit: %
GGW-DND Loss is compared experimentally with several mainstream loss functions for learning on long-tailed datasets. The compared loss functions are:
1) CE Loss [33]: predicts class probabilities through the Softmax function and back-propagates the gradient at the position of the ground-truth label in the prediction vector.
2) BCE Loss [33]: predicts class probabilities through the Sigmoid function and back-propagates gradients over the entire prediction vector.
3) Focal Loss [38]: tackles the training problems caused by sample imbalance from the perspective of easy and hard samples in the loss.
4) Logit Adjustment [54]: enlarges the gap between head and tail classes in the logit vector based on label frequencies.
5) EQL (EQualization Loss) [55]: simply ignores the gradients of rare categories to address the long-tail problem.
6) BAGS (BAlanced Group Softmax) [40]: groups the labels and balances the classifiers within the detection framework through group-wise training.
7) EQL v2 [47]: dynamically re-weights the positive and negative gradients during training according to the cumulative gradient ratio, re-balancing the training process of every category equally.
As shown in Table 2, Focal Loss performs poorly on tail classes under extreme data imbalance; Logit Adjustment and EQL both improve tail-class detection to some extent; BAGS adapts to tail-class learning by fine-tuning the original model but cannot achieve good results given the limited defect detection training set; EQL v2 weights the gradients of the current batch by the historical positive-to-negative gradient ratio and achieves good results; GGW-DND Loss achieves the best mAPr, exceeding BCE Loss and EQL v2 by 32.02 and 2.20 percentage points respectively.
Table 2 Comparison of detection results based on different loss functions  unit: %
This paper involves two loss functions and four hyperparameters in total. The following compares the defect detection performance of Faster R-CNN with GGW-DND Loss under different hyperparameter combinations.
Fig. 4 Mapping functions under different hyperparameter combinations
Table 3 Detection results under different hyperparameter combinations
Table 4 Detection results under different values of one hyperparameter
Table 5 Detection results under different values of another hyperparameter
Fig. 5 Corresponding scaled negative gradients under different hyperparameter values
Table 6 lists the cumulative gradient ratio of each classifier node when the classifier loss of Faster R-CNN is set to BCE Loss and to GGW-DND Loss. The first three classes (mosaic, color cast, and abnormal brightness) have many samples in the dataset and belong to the head classes; the last three (Gaussian noise, salt-and-pepper noise, and bad blocks) have few samples and belong to the tail classes. As Table 6 shows, with BCE Loss the cumulative gradient ratios differ greatly between head-class and tail-class classifier nodes, whereas with GGW-DND Loss the gaps between the nodes shrink markedly.
Table 6 Cumulative gradient ratios of classifier nodes based on BCE Loss and GGW-DND Loss
Table 7 Ablation experiment results  unit: %
To verify the generalization of the proposed loss to different datasets, following the setting in [37], a long-tailed version of the NEU-DET training set is constructed, as detailed in Table 8.
Table 8 NEU-DET dataset
Table 9 Comparison of detection results based on different loss functions on NEU-DET dataset  unit: %
From the perspective of practical application, this paper studies how to handle long-tailed datasets in image defect detection and proposes the Gradient-Guide Weighted-Deferred Negative Gradient decay Loss (GGW-DND Loss). First, based on the cumulative gradient ratio of each classification node in the classifier, the corresponding positive and negative gradients are amplified and shrunk respectively, alleviating the long-standing shortage of positive gradients and flood of negative gradients for tail classes. Second, the negative gradients produced in the late training stage are directly limited, mitigating the overfitting and weak generalization of tail classes. Finally, the experimental results on both datasets demonstrate the effectiveness of the proposed loss. Nevertheless, the NEU-DET results show that, in scenes with complex defects and widely disparate instance counts, the tail-class detection performance of the proposed loss still falls far short of the head classes, and future work will address this.
[1] WANG Z M. Review of no-reference image quality assessment[J]. Acta Automatica Sinica, 2015, 41(6): 1062-1079.
[2] FANG Y N. Research on key technologies of digital video image quality detection[J]. Science and Technology Innovation Herald, 2013(16): 22-26.
[3] WANG F L, ZUO B. Detection of surface cutting defect on MagNet using Fourier image reconstruction[J]. Journal of Central South University, 2016, 23(5): 1123-1131.
[4] WEE C Y, PARAMESRAN R. Image sharpness measure using eigenvalues[C]// Proceedings of the 9th International Conference on Signal Processing. Piscataway: IEEE, 2008: 840-843.
[5] HOU Z, PARKER J M. Texture defect detection using support vector machines with adaptive Gabor wavelet features[C]// Proceedings of the 7th IEEE Workshops on Applications of Computer Vision. Piscataway: IEEE, 2005: 275-280.
[6] CHA Y J, CHOI W, SUH G, et al. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types[J]. Computer-Aided Civil and Infrastructure Engineering, 2018, 33(9): 731-747.
[7] TAO X, ZHANG D P, WANG Z H, et al. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(4): 1486-1498.
[8] HE Y, SONG K C, MENG Q G, et al. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 69(4): 1493-1504.
[9] TAO X, HOU W, XU D. A survey of surface defect detection methods based on deep learning[J]. Acta Automatica Sinica, 2021, 47(5): 1017-1034.
[10] LI J Y, SU Z F, GENG J H, et al. Real-time detection of steel strip surface defects based on improved YOLO detection network[J]. IFAC-PapersOnLine, 2018, 51(21): 76-81.
[11] ZHANG C B, CHANG C C, JAMSHIDI M. Concrete bridge surface damage detection using a single-stage detector[J]. Computer-Aided Civil and Infrastructure Engineering, 2020, 35(4): 389-409.
[12] CHEN S H, TSAI C C. SMD LED chips defect detection using a YOLOv3-dense model[J]. Advanced Engineering Informatics, 2021, 47: No.101255.
[13] RUFF L, VANDERMEULEN R A, G?RNITZ N, et al. Deep one-class classification[C]// Proceedings of the 35th International Conference on Machine Learning. New York: JMLR.org, 2018: 4393-4402.
[14] BERGMANN P, FAUSER M, SATTLEGGER D, et al. Uninformed students: student-teacher anomaly detection with discriminative latent embeddings[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 4182-4191.
[15] LIZNERSKI P, RUFF L, VANDERMEULEN R A, et al. Explainable deep one-class classification[EB/OL]. (2021-03-18) [2022-09-09]. https://arxiv.org/pdf/2007.01760.pdf.
[16] GOLAN I, EL-YANIV R. Deep anomaly detection using geometric transformations[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2018: 9781-9791.
[17] HASELMANN M, GRUBER D P, TABATABAI P. Anomaly detection using deep learning based image completion[C]// Proceedings of the 17th IEEE International Conference on Machine Learning and Applications. Piscataway: IEEE, 2018: 1237-1242.
[18] SAKURADA M, YAIRI T. Anomaly detection using autoencoders with nonlinear dimensionality reduction[C]// Proceedings of the MLSDA 2nd Workshop on Machine Learning for Sensory Data Analysis. New York: ACM, 2014: 4-11.
[19] ZHOU C, PAFFENROTH R C. Anomaly detection with robust deep autoencoders[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017: 665-674.
[20] GUPTA K, BHAVSAR A, SAO A K. Detecting mitotic cells in HEp-2 images as anomalies via one class classifier[J]. Computers in Biology and Medicine, 2019, 111: No.103328.
[21] NAPOLETANO P, PICCOLI F, SCHETTINI R. Anomaly detection in nanofibrous materials by CNN-based self-similarity[J]. Sensors, 2018, 18(1): No.209.
[22] CANDÈS E J, LI X D, MA Y, et al. Robust principal component analysis?[J]. Journal of the ACM, 2011, 58(3): No.11.
[23] GONG D, LIU L Q, LE V, et al. Memorizing normality to detect anomaly: memory-augmented deep autoencoder for unsupervised anomaly detection[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 1705-1714.
[24] SCHLEGL T, SEEBÖCK P, WALDSTEIN S M, et al. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery[C]// Proceedings of the 2017 International Conference on Information Processing in Medical Imaging, LNCS 10265. Cham: Springer, 2017: 146-157.
[25] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: common objects in context[C]// Proceedings of the 2014 European Conference on Computer Vision, LNCS 8693. Cham: Springer, 2014: 740-755.
[26] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303-338.
[27] BUDA M, MAKI A, MAZUROWSKI M A. A systematic study of the class imbalance problem in convolutional neural networks[J]. Neural Networks, 2018, 106: 249-259.
[28] HE H B, GARCIA E A. Learning from imbalanced data[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263-1284.
[29] OUYANG W L, WANG X G, ZHANG C, et al. Factors in finetuning deep model for object detection with long-tail distribution[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 864-873.
[30] BYRD J, LIPTON Z C. What is the effect of importance weighting in deep learning?[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 872-881.
[31] SHEN L, LIN Z C, HUANG Q M. Relay backpropagation for effective learning of deep convolutional neural networks[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9911. Cham: Springer, 2016: 467-482.
[32] CHAWLA N V, BOWYER K W, HALL L O, et al. SMOTE: synthetic minority over-sampling technique[J]. Journal of Artificial Intelligence Research, 2002, 16: 321-357.
[33] CUI Y, JIA M L, LIN T Y, et al. Class-balanced loss based on effective number of samples[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9260-9269.
[34] WANG Y X, RAMANAN D, HEBERT M. Learning to model the tail[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 7032-7042.
[35] HUANG C, LI Y N, LOY C C, et al. Learning deep representation for imbalanced classification[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 5375-5384.
[36] ZHANG X, FANG Z Y, WEN Y D, et al. Range loss for deep face recognition with long-tailed training data[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 5419-5428.
[37] CAO K D, WEI C, GAIDON A, et al. Learning imbalanced datasets with label-distribution-aware margin loss[C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2019: 1567-1578.
[38] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007.
[39] LI B Y, LIU Y, WANG X G. Gradient harmonized single-stage detector[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2019: 8577-8584.
[40] LI Y, WANG T, KANG B Y, et al. Overcoming classifier imbalance for long-tail object detection with balanced group softmax[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 10988-10997.
[41] YIN X, YU X, SOHN K, et al. Feature transfer learning for face recognition with under-represented data[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5697-5706.
[42] LIU Z W, MIAO Z Q, ZHAN X H, et al. Large-scale long-tailed recognition in an open world[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 2532-2541.
[43] CHU P, BIAN X, LIU S P, et al. Feature space augmentation for long-tailed data[C]// Proceedings of the 2020 European Conference on Computer Vision, LNCS 12374. Cham: Springer, 2020: 694-710.
[44] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
[45] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[46] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988.
[47] TAN J R, LU X, ZHANG G, et al. Equalization loss v2: a new gradient balance approach for long-tailed object detection[C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 1685-1694.
[48] HE Y, SONG K C, MENG Q G, et al. An end-to-end steel surface defect detection approach via fusing multiple hierarchical features[J]. IEEE Transactions on Instrumentation and Measurement, 2020, 69(4): 1493-1504.
[49] KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2022-07-13]. https://arxiv.org/pdf/1412.6980.pdf.
[50] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multiBox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37.
[51] KUZNETSOVA A, MALEVA T, SOLOVIEV V. YOLOv5 versus YOLOv3 for apple detection[M]// KRAVETS A G, BOLSHAKOV A A, SHCHERBAKOV M. Cyber-Physical Systems: Modelling and Intelligent Control, SSDC 338. Cham: Springer, 2021: 349-358.
[52] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10) [2022-07-24]. https://arxiv.org/pdf/1409.1556.pdf.
[53] REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. (2018-04-08) [2022-08-09]. https://arxiv.org/pdf/1804.02767.pdf.
[54] MENON A K, JAYASUMANA S, RAWAT A S, et al. Long-tail learning via logit adjustment[EB/OL]. (2021-07-09) [2022-05-22]. https://arxiv.org/pdf/2007.07314.pdf.
[55] TAN J R, WANG C B, LI B Y, et al. Equalization loss for long-tailed object recognition[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11659-11668.
This work is partially supported by the China Postdoctoral Science Foundation (2022M712235) and the Postdoctoral Research and Development Foundation of Sichuan University (2022SCU12074).
LI Wei, born in 1998, M. S. candidate. His research interests include defect detection, object detection.
LIANG Sixin, born in 1995, Ph. D. candidate. His research interests include machine vision, camera calibration, medical image processing.
ZHANG Jianzhou, born in 1962, Ph. D., professor. His research interests include machine vision, computer vision.