Recognition of dense cherry tomatoes based on an improved YOLOv4-LITE lightweight neural network
Zhang Fu1,2,3, Chen Zijun1, Bao Ruofei1, Zhang Chaochen1, Wang Zhihao1
(1. College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; 2. Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, Jiangsu University, Zhenjiang 212013, China; 3. Collaborative Innovation Center of Machinery Equipment Advanced Manufacturing of Henan Province, Luoyang 471003, China)
Rapid identification and localization of dense cherry tomato fruits under occlusion and adhesion is one of the key technologies for improving the working efficiency of cherry tomato picking robots and yield prediction in facility agriculture. This study proposed a cherry tomato recognition and localization method based on an improved YOLOv4-LITE lightweight neural network. To ease migration to mobile terminals, the method uses MobileNet-v3 as the feature extraction network of the model to construct the YOLOv4-LITE network, improving the detection speed for cherry tomato fruit targets. To avoid the loss of detection accuracy caused by replacing the backbone, the structure of the Feature Pyramid Network (FPN) + Path Aggregation Network (PANet) was modified by introducing a 104×104 scale feature layer that favors small-target detection, achieving fine-grained detection; depthwise separable convolution replaced ordinary convolution in the PANet structure to reduce the computational load and make the network more lightweight; and the generalization ability of the model was improved by loading pre-trained weights and freezing some layers during training. The recognition results were compared with YOLOv4 on test sets with the same degree of occlusion or adhesion, using the F1 score, average precision and precision to evaluate the differences between models. The test results show that at an IOU of 0.50, the F1 score, average precision and precision of the proposed dense cherry tomato recognition model on the whole test set were 0.99, 99.74% and 99.15%, improvements over YOLOv4 of 0.15, 8.29 percentage points and 6.55 percentage points, respectively; the weight file is 45.3 MB, about 1/5 of that of YOLOv4; and detection of a single 416×416 (pixel) image takes as little as 3.01 ms on a Graphics Processing Unit (GPU). The proposed dense cherry tomato recognition model therefore features fast recognition, high accuracy and a lightweight design, and can provide strong support for the efficient operation of cherry tomato picking robots and for cherry tomato yield prediction in facility agriculture.
機(jī)器視覺;模型;YOLO;深度學(xué)習(xí);圖像識(shí)別;目標(biāo)檢測(cè)
As a component of fruit and vegetable picking robot systems, the visual recognition system plays a vital role in target identification and localization, automatic picking and yield estimation [1-2]. Cherry tomatoes (Cherry tomatoes) are widely grown for their high nutritional value and distinctive flavor. Because the fruits are highly dense (adhering), small and severely occluded by leaves, and because the growth cycle is long and fruit heights vary, harvesting is the most time-consuming and labor-intensive task [3-4]. Research on fast and accurate cherry tomato recognition therefore has considerable application value for automatic picking by harvesting robots and for improving operating efficiency.
In recent years, solutions based on traditional machine vision and image processing have been proposed at home and abroad for fruit and vegetable target recognition in natural environments. Ma et al. [5] proposed a saliency detection method based on dense and sparse reconstruction combined with an improved random Hough transform to identify single green tomatoes within fruit clusters, with a recognition accuracy of 77.6%. Li et al. [6] obtained fruit position and contour information from RGB-D images using an SOM-K-means algorithm, achieving a correct recognition rate of 87.2% for multiple overlapping and adhering tomatoes. Payne et al. [7] proposed a machine-vision-based mango yield estimation method that segments fruit from background pixels using RGB and YCbCr color space segmentation together with texture segmentation based on neighboring-pixel variability, enabling fruit counting and yield estimation. Li et al. [8] added edge pre-detection and fast circle-center localization modules to the Hough algorithm, achieving a recognition rate of 90.70% for occluded camellia fruit. Gong et al. [9] proposed an improved 8-connectivity coding recognition algorithm for yield prediction of clustered citrus. Xu et al. [10] combined a Histogram of Oriented Gradient (HOG) descriptor with a Support Vector Machine (SVM) classifier, achieving a detection rate of 87%. Other researchers have studied the recognition of adhering and overlapping fruits [11-15], with detection accuracies of about 90%. These methods recognize fruit from single features or feature combinations such as shape, texture and fruit-background color differences, but their accuracy drops under changing illumination, fruit adhesion and overlap, occlusion by branches and leaves, or highly similar backgrounds. Moreover, traditional machine vision techniques are constrained by their classifier algorithms and cannot meet the demands of fruit target recognition in today's complex environments [16-17].
In recent years, with the wide application of computer vision and deep learning in agriculture, Deep Convolutional Neural Networks (DCNN) have shown great advantages in fruit and vegetable target detection. They fall into two groups. One is the two-stage detection methods represented by RCNN [18], Fast RCNN [19] and Faster RCNN [20], which first generate region proposals and then classify within them. Yan et al. [21] proposed an improved Faster RCNN method to recognize Rosa roxburghii fruit of different maturities and occlusion levels in natural environments, with a recall of 81.4%, precision of 95.53%, F1 score of 94.99% and a recognition speed of 0.2 s per image; the model is accurate and generalizes well, but the region-proposal step of two-stage methods consumes substantial resources and detection is slow. The other group is the one-stage detection methods represented by SSD (Single Shot MultiBox Detector) [22] and YOLO (You Only Look Once) [23], in which box regression and classification are completed within a single network. Cheng et al. [24] recognized greenhouse tomatoes with an improved YOLOv3 network, achieving a mean Average Precision (mAP) of 95.7% for yield estimation and 15 ms per image, with better recognition of dense and occluded fruit. Lü et al. [25] built an improved YOLOv3-LITE lightweight neural network by introducing the GIoU bounding-box regression loss and using MobileNet-v2 as the backbone, to recognize and locate densely growing, heavily leaf-occluded citrus in natural environments; on the whole test set the F1 score was 93.69% and the average precision 91.13%, with a recognition speed of 16.9 ms for a single 416×416 (pixel) image. YOLO-based fruit recognition thus maintains high recognition accuracy while offering fast detection and a small memory footprint [26-29].
To achieve accurate and fast recognition of cherry tomatoes that are highly dense, small, and severely occluded or adhering in facility agriculture, an improved YOLOv4-LITE lightweight neural network is proposed. A feature layer for small-target detection is introduced into the feature fusion structure to increase fine-grained detection; pruning and depthwise separable convolution compress the network and speed up recognition; and model accuracy is improved by loading pre-trained weights and freezing some layers during training. The work is intended as a reference for fast and accurate cherry tomato recognition.
Cherry tomato images were collected in a greenhouse at a picking garden in Mengjin District, Luoyang City, Henan Province, between 10:00 and 12:00 on the morning of March 19, 2021, under natural daylight. A high-definition digital single-lens reflex camera was used at distances of 20-80 cm from the fruit. To simulate the recognition system of a picking robot, images were taken from five angles: left, right, overhead, upward and frontal [30]. A total of 249 original images of ripe cherry tomatoes were collected, from which 214 were selected. The images are 6 000×4 000 (pixels) and cover front lighting, back lighting, overlap, occlusion, adhesion and the other conditions found in actual greenhouse growing environments. Some greenhouse cherry tomatoes are shown in Fig. 1, where heavy adhesion and occlusion of the fruit can be seen.
In the training stage of a deep learning model, the more sufficient and comprehensive the data, the better the recognition, so the sample set was expanded by data augmentation. To better simulate shooting conditions in complex scenes and feed the deep neural network, the collected images were first cropped and compressed to 416×416 (pixels) under the Keras framework combined with OpenCV, and then augmented with different degrees and combinations of physical transformations, including translation, random rotation, mirroring, horizontal flipping, vertical flipping, color enhancement, brightness change and added Gaussian noise [18,31]. After augmentation, 10 710 images were obtained as the dataset.
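To make the augmentation step concrete, the following is a minimal sketch using OpenCV and NumPy under the transforms named above; the function name and parameter ranges are illustrative assumptions, and the matching bounding-box transforms that a detection dataset also needs are omitted for brevity.

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return several augmented variants of one BGR image."""
    img = cv2.resize(image, (416, 416))      # crop/compress step simplified to a resize
    variants = [img]
    variants.append(cv2.flip(img, 1))        # horizontal mirror
    variants.append(cv2.flip(img, 0))        # vertical flip
    # small random rotation about the image centre
    m = cv2.getRotationMatrix2D((208, 208), np.random.uniform(-15, 15), 1.0)
    variants.append(cv2.warpAffine(img, m, (416, 416)))
    # brightness change by scaling the V channel in HSV space
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 2] = np.clip(hsv[..., 2] * np.random.uniform(0.7, 1.3), 0, 255)
    variants.append(cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR))
    # additive Gaussian noise
    noise = np.random.normal(0, 10, img.shape).astype(np.float32)
    variants.append(np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8))
    return variants
```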
Rectangular bounding boxes were drawn manually around the cherry tomato fruits with the LabelImg tool. Fully exposed fruits were labeled with the box tangent to the fruit on its inner side; for occluded or adhering fruits, only the exposed portion was boxed; fruits at the image boundary or with less than 10% of the fruit visible due to occlusion were left unlabeled. Labeling produced .xml files containing the ground truth. The dataset was then split 9:1, with the 90% portion further split 9:1 into training and validation sets and the remaining 10% used as the test set, giving 8 674 training, 965 validation and 1 071 test images.
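A sketch of this 9:1 / 9:1 split is given below; names are illustrative, and with 10 710 images the resulting counts land within a few images of those reported, depending on how the fractions are rounded.

```python
import random

def split_dataset(samples, seed=0):
    """Split into train/val/test as roughly 81% / 9% / 10%."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    n_test = len(samples) // 10          # 10% held out as the test set
    test, rest = samples[:n_test], samples[n_test:]
    n_val = len(rest) // 10              # 10% of the remainder for validation
    return rest[n_val:], rest[:n_val], test
```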
Compared with two-stage detection methods such as R-CNN, Fast R-CNN and Faster R-CNN, YOLOv4 [32], a classic one-stage deep recognition network, generates object classifications and bounding boxes directly in the network, greatly increasing detection speed; its structure is sketched in Fig. 2. The YOLOv4 model consists of three parts: the backbone network, the neck network and the head network. The backbone is the CSPDarknet53 feature extraction network, built from five modules CSP1-CSP5, each formed by alternately stacking CSPX modules with CBM or CBL modules. YOLOv4 uses the CSPnet structure in the backbone, as shown in Fig. 2c: alongside the stack of X residual units (Res unit), a branch processed by a CBM module forms a large residual connection, strengthening the learning ability of the CNN. Within the CSPnet structure, the Mish activation function replaces Leaky ReLU. Mish is unbounded above, bounded below, non-monotonic, infinitely continuous and smooth, which helps regularize the model and stabilize the gradient flow. The Mish function is
Mish(x) = x · tanh(ln(1 + e^x))
where x is the input value, tanh is the hyperbolic tangent function, and ln is the logarithm with base e.
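As a concrete illustration (not the authors' code), Mish reduces to one line in TensorFlow, since ln(1 + e^x) is the softplus function; any Keras layer can then take it as its activation.

```python
import tensorflow as tf

def mish(x):
    # Mish(x) = x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

# usage example: layers.Conv2D(32, 3, activation=mish)
```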
The CSPDarknet53 backbone yields three feature layers: 52×52×256 (feature layer P3), 26×26×512 (feature layer P4) and 13×13×1 024. The 13×13×1 024 layer is max-pooled in the Spatial Pyramid Pooling (SPP) module at four scales, 13×13, 9×9, 5×5 and 1×1, and the results are combined by a Concatenate operation to give feature layer P5; the SPP structure greatly enlarges the receptive field and captures more contextual features. P3, P4 and P5 are first fused top-down by upsampling (Upsample) in the Feature Pyramid Network (FPN), but the FPN path for fusing low- and high-level features is long and detail information propagates poorly, so YOLOv4 adds the Path Aggregation Network (PANet), which uses downsampling (Downsample) to augment the path through the three feature layers bottom-up, greatly shortening the information propagation path while exploiting the precise localization information of low-level features. YOLOv4 also employs training techniques such as Mosaic data augmentation, label smoothing, the CIOU regression loss and cosine-annealing learning-rate decay.
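A minimal sketch of the SPP block as described, assuming Keras functional-style code: stride-1 max pooling with 'same' padding keeps the spatial size, and the 1×1 pool reduces to the identity branch.

```python
from tensorflow.keras import layers

def spp_block(x):
    # parallel max pooling at three scales, spatial size preserved
    p13 = layers.MaxPooling2D(pool_size=13, strides=1, padding='same')(x)
    p9 = layers.MaxPooling2D(pool_size=9, strides=1, padding='same')(x)
    p5 = layers.MaxPooling2D(pool_size=5, strides=1, padding='same')(x)
    # the 1x1 max pool is the identity, so the input itself is concatenated
    return layers.Concatenate()([p13, p9, p5, x])
```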
The YOLOv4 loss function Loss comprises the regression loss Loss(coord), the confidence loss Loss(conf) and the classification loss Loss(cls):
Loss = Loss(coord) + Loss(conf) + Loss(cls)
Although the CSPDarknet53 backbone with its CSPnet structure reduces the parameter count and increases speed, the YOLOv4 model remains computationally complex and memory-hungry. This study proposes a lightweight detection network model: on the basis of the traditional YOLOv4, the MobileNet-V3 network serves as the feature extraction backbone, forming the YOLOv4-LITE lightweight network model. MobileNet is an efficient CNN family for mobile and embedded devices, lighter and faster than standard CNNs. MobileNet-V3 inherits the depthwise separable convolutions of MobileNet-V1 and the inverted residual with linear bottleneck of MobileNet-V2. It is stacked from multiple blocks whose bneck structure is shown in Fig. 3: 1×1, 3×3 and 1×1 convolutions successively expand the channels, perform depthwise convolution and reduce the dimensionality. MobileNet-V3 adds the lightweight Squeeze-and-Excitation (SE) attention mechanism to the bottleneck structure and, building on MobileNet-V2, reduces the channel count of the first convolution from 32 to 16; at the tail it shrinks the feature map from 7×7 to 1×1 with average pooling and discards the spindle-shaped 3×3 and 1×1 convolutions, cutting inference time while improving accuracy.
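The bneck block just described can be sketched as follows; this is a simplified version under stated assumptions (ReLU in place of h-swish, a fixed SE reduction ratio of 4), not the authors' implementation.

```python
from tensorflow.keras import layers

def bneck(x, expand_ch, out_ch, stride=1):
    inp = x
    x = layers.Conv2D(expand_ch, 1, padding='same', use_bias=False)(x)  # 1x1 expand
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.DepthwiseConv2D(3, strides=stride, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # SE attention: squeeze globally, excite through two dense layers, rescale channels
    se = layers.GlobalAveragePooling2D()(x)
    se = layers.Dense(expand_ch // 4, activation='relu')(se)
    se = layers.Dense(expand_ch, activation='sigmoid')(se)
    x = layers.Multiply()([x, layers.Reshape((1, 1, expand_ch))(se)])
    x = layers.Conv2D(out_ch, 1, padding='same', use_bias=False)(x)     # 1x1 linear project
    x = layers.BatchNormalization()(x)
    if stride == 1 and inp.shape[-1] == out_ch:
        x = layers.Add()([x, inp])   # inverted residual connection
    return x
```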
Cherry tomato targets vary in scale. To avoid losing small-target accuracy with the MobileNet-V3 backbone, the FPN is modified as shown in Fig. 4: the MobileNet-V3 backbone of the YOLOv4-LITE lightweight network outputs four scales, 13×13 (feature layer P5'), 26×26 (feature layer P4'), 52×52 (feature layer P3') and 104×104 (feature layer P2). P5' has a large receptive field suited to large targets and P4' suits medium targets; upsampling from P3' and fusing with the P2 feature layer yields rich shallow information, making the network more sensitive to small targets and enabling fine-grained detection. During feature propagation the 13×13 feature layer still passes through the SPP structure to give P5'. The feature layers P5', P4', P3' and P2 are combined across pyramid levels by upsampling in the FPN: each layer is transformed by DBL and Upsample operations to match the scale and channel count of the next layer up, then fused with it by Concatenate to obtain a feature map richer in information. To keep the network from becoming redundant, the four feature layers output by the FPN fusion are pruned: the 104×104 FPN output no longer feeds a YOLO Head prediction but is passed directly into the PANet structure, so the improved algorithm retains YOLO Head prediction outputs at the three scales 13×13, 26×26 and 52×52 (P5'', P4'', P3'').
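One FPN fusion step of the kind described (DBL, then upsampling, then concatenation with the shallower layer) might look like the following sketch; the 1×1 DBL kernel and the channel argument are assumptions.

```python
from tensorflow.keras import layers

def dbl(x, ch):
    # DBL = Conv + BatchNorm + LeakyReLU
    x = layers.Conv2D(ch, 1, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(alpha=0.1)(x)

def fpn_fuse(deep, shallow, ch):
    up = layers.UpSampling2D(2)(dbl(deep, ch))   # e.g. 52x52 (P3') -> 104x104
    return layers.Concatenate()([up, shallow])   # fuse with P2
```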
Adding the 104×104 feature layer for small-target detection in the improved FPN inevitably increases the computational load. To reduce it, depthwise separable convolution replaces the ordinary convolution in the original Downsample modules of the PANet, performing the downsampling that carries feature information bottom-up through the path while effectively cutting network computation and parameters. Moreover, the number of output channels of a depthwise separable convolution is set freely by the number of its 1×1 filters, removing the constraints of ordinary kernel counts and strides and substituting for pooling, which saves memory while improving model accuracy [33].
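A sketch of such a depthwise-separable downsampling block: a stride-2 depthwise 3×3 followed by a pointwise 1×1 whose filter count freely sets the output channels, replacing a plain stride-2 convolution. The LeakyReLU slope is an assumption.

```python
from tensorflow.keras import layers

def separable_downsample(x, out_ch):
    x = layers.DepthwiseConv2D(3, strides=2, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(alpha=0.1)(x)
    x = layers.Conv2D(out_ch, 1, padding='same', use_bias=False)(x)  # pointwise 1x1
    x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(alpha=0.1)(x)
```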
Note: P5', P4', P3' and P2 are the input feature layers of the improved FPN+PANet structure; P5'', P4'' and P3'' are its outputs.
2.3.1 Test platform
The YOLOv4 network was improved using the TensorFlow and Keras frameworks. The test environment is listed in Table 1.
2.3.2 Training of the dense cherry tomato recognition network
The workflow of the dense cherry tomato detection network is shown in Fig. 5. Comparative tests were run on each modification of the improved model and validated on the same validation set. The collected data were first preprocessed, then the target cherry tomatoes were labeled manually in LabelImg (fruits at the image boundary or with less than 10% visible due to occlusion were left unlabeled). The labeled images and their corresponding .xml files were augmented together and saved in PASCAL VOC format. The network was trained both with and without pre-trained weights. When pre-trained weights were used, the pre-trained layers were first frozen so that more resources went into training the parameters of the later layers, and were then unfrozen; this freeze-then-unfreeze scheme effectively preserves the weight values. During training, the Keras early stopping technique was used: training stops when the loss on the training set no longer decreases, effectively preventing overfitting.
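In Keras, this early stopping (plus the per-epoch weight saving described in the next paragraph) reduces to two callbacks; the patience value below is an illustrative assumption, as the paper does not state it.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # stop once the training loss has not improved for 10 consecutive epochs
    EarlyStopping(monitor='loss', patience=10, verbose=1),
    # save a weight file after every epoch
    ModelCheckpoint('ep{epoch:03d}-loss{loss:.3f}.h5', save_weights_only=True),
]
# later: model.fit(..., callbacks=callbacks)
```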
Table 1 Test environment
Network training parameters were set as follows. Without pre-trained weights, the hyperparameters were: batch size 2, momentum factor 0.9, initial learning rate 0.001 and decay coefficient 0.0005. With pre-trained weights, the frozen phase used 50 epochs, batch size 8, momentum factor 0.9, initial learning rate 0.0001 and decay coefficient 0.0005; after unfreezing, 50 further epochs were run with batch size 2, momentum factor 0.9, initial learning rate 0.001 and decay coefficient 0.0005, for 100 epochs in total. Both training schemes used batch normalization (BN) for layer weight updates; a weight file was saved after every epoch on the training set, and log files recorded the training and validation losses. The training and validation loss curves are shown in Fig. 6.
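A sketch of this two-phase schedule is shown below. Here `model`, `backbone_layers`, `yolo_loss`, the training arrays and `callbacks` are assumed to exist; the paper does not name its optimizer, so SGD is assumed because a momentum factor is specified, with `decay` following the legacy tf.keras optimizer signature.

```python
from tensorflow.keras.optimizers import SGD

# phase 1: backbone frozen, 50 epochs, batch size 8, initial lr 1e-4
for layer in backbone_layers:
    layer.trainable = False
model.compile(optimizer=SGD(learning_rate=1e-4, momentum=0.9, decay=5e-4),
              loss=yolo_loss)
model.fit(train_images, train_targets, batch_size=8, epochs=50,
          validation_data=val_data, callbacks=callbacks)

# phase 2: unfreeze and fine-tune the whole network, 50 more epochs, batch size 2
for layer in backbone_layers:
    layer.trainable = True
model.compile(optimizer=SGD(learning_rate=1e-3, momentum=0.9, decay=5e-4),
              loss=yolo_loss)
model.fit(train_images, train_targets, batch_size=2,
          initial_epoch=50, epochs=100,
          validation_data=val_data, callbacks=callbacks)
```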
During the first 50 epochs the validation loss oscillated early on but trended downward overall, while the training loss fell steadily. At epoch 50, unfreezing raised the training loss while the validation loss kept falling. After epoch 50 both losses continued to decrease; once the validation loss no longer changed appreciably, the training loss also stabilized and the model converged.
2.3.3 Model testing
To measure the detection performance on dense cherry tomatoes objectively, the trained models were evaluated by F1 score (F1-score), recall (Recall), precision (Precision), detection speed, network parameter count and weight size, where F1, Recall and Precision are computed as in Eqs. (6)-(8):
F1 = 2 × Precision × Recall / (Precision + Recall)  (6)
Recall = TP / (TP + FN) × 100%  (7)
Precision = TP / (TP + FP) × 100%  (8)
where TP is the number of true positives, FP the number of false positives and FN the number of false negatives. The Average Precision (AP) is computed as in Eq. (9):
AP = ∫₀¹ P(r) dr  (9)
where r is the integration variable, i.e. the recall at which precision P(r) is evaluated. AP is the area enclosed by the Precision-Recall (PR) curve and the axes, taking values between 0 and 1. AP50 is the mean precision over recall levels at an Intersection Over Union (IOU) of 0.5, and AP75 the mean precision at IOU = 0.75.
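The metrics of Eqs. (6)-(9) reduce to a few lines of NumPy; the AP routine below uses a simple rectangle rule over the PR points of ranked detections, one of several common numerical schemes.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def average_precision(precisions, recalls):
    """Area under the precision-recall curve (rectangle rule)."""
    order = np.argsort(recalls)
    r = np.concatenate(([0.0], np.asarray(recalls, float)[order]))
    p = np.concatenate(([1.0], np.asarray(precisions, float)[order]))
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))
```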
To verify the performance of the dense cherry tomato target recognition algorithm, training runs with and without loading pre-trained weights and freezing some layers, and with different backbone networks, were conducted on the same dense cherry tomato dataset with the same parameter settings; comparative tests were then run on the individual improvements of the proposed model.
Taking the YOLOv4-LITE network as the subject, it was trained in two ways: with pre-trained weights and partial layer freezing, and without pre-trained weights. Training without pre-trained weights trains all parameters from scratch on the training set. Training with pre-trained weights and freezing loads MobileNet-V3 weights pre-trained on the COCO dataset, initially freezes the MobileNet-V3 layers of YOLOv4-LITE and trains the rest of the network first; after 50 epochs the frozen layers are unfrozen and the whole network is trained. Table 2 compares the results of the two training strategies.
The results show that, compared with training without pre-trained weights, loading pre-trained weights and freezing some layers raised the average precision by about 3 percentage points at both IOU = 0.50 and IOU = 0.75, with the F1 score and precision also improving to varying degrees.
To meet the production needs of cherry tomato picking robots and related embedded mobile detection devices in facility agriculture, the YOLOv4-LITE lightweight neural network was designed with MobileNet-V3 as the backbone. To verify this design, comparative tests were run on the same test set of 1 071 images; the results are given in Table 3. Compared with the original YOLOv4 with its CSPDarkNet-53 backbone, YOLOv4-LITE with MobileNet-series lightweight backbones greatly improves weight size, detection speed and parameter count, but with MobileNet-V1 or MobileNet-V2 most of the F1, AP, Recall and Precision values at IOU = 0.50 and IOU = 0.75 fall below those of the original YOLOv4. With MobileNet-V3 as the backbone, the model improves F1, AP and Precision to varying degrees: at IOU = 0.50, relative to the original YOLOv4, F1 rises by 0.01, AP by 1.71 percentage points and precision by 3.69 percentage points; detecting a single 416×416 (pixel) image takes only 2.78 ms; the weight size shrinks by 142 MB and the parameter count by 37.82%. YOLOv4-LITE therefore has clear advantages for deployment on mobile devices or embedded terminals. However, its AP at IOU = 0.75 is 43.94%, below the 50.75% of the original YOLOv4, possibly because features of small targets are not extracted sufficiently during recognition.
Table 2 Comparison of training results with different training methods
Table 3 Comparison of detection results with different backbone networks
When the original YOLOv4 algorithm detects cherry tomato targets, which vary in scale, missed and false detections often occur because the targets are dense, overlapping and small. The improved YOLOv4-LITE lightweight dense cherry tomato recognition model is therefore proposed, with the following modifications: ① Backbone replacement: MobileNet-V3 is used as the backbone. ② Improved FPN structure: a 104×104 feature layer is added to the FPN, providing a feature map favorable to small-target detection while still covering large and medium targets. ③ Improved PANet: depthwise separable convolution is introduced into PANet to offset the parameter growth caused by adding the small-target feature map. To verify the advantage of the improved FPN+PANet structure in the improved YOLOv4-LITE, comparative tests were run on the same 1 071-image test set; the results are given in Table 4.
Table 4 Comparison of detection results with different improved structures
Note: ① Backbone replacement; ② Improved FPN structure; ③ Improved PANet.
Improving the FPN structure on the basis of YOLOv4 raised AP50 by 8.29 percentage points and AP75 by 15.01 percentage points over the original YOLOv4, with F1 rising by 0.14 and 0.24 at the corresponding IOU thresholds; however, the weight size grew by 4 MB, the per-image detection time by 0.27 ms and the parameter count by 14.85%. Compared with replacing only the backbone (YOLOv4-LITE + MobileNet-V3), backbone replacement plus the improved FPN raised AP50 by 6.58 percentage points and AP75 by 21.82 percentage points, with F1 rising by 0.13 and 0.20 at the corresponding IOU thresholds, but the weight size grew by 146 MB, the per-image detection time by 2.11 ms and the parameter count by 63.23%. This shows that adding the small-target feature map improves the fine-grained detection of both YOLOv4 and YOLOv4 + MobileNet-V3, at the cost of larger weights, slower detection and more parameters.
With backbone replacement, the improved FPN structure and the improved PANet combined, the model keeps F1, AP, Recall and Precision high while the weight size is 45.3 MB, detection takes 3.01 ms per image and the network has 12 026 685 parameters. Compared with the YOLOv4 network, the weight size shrinks by 198.7 MB, the per-image detection time by 1.61 ms and the parameter count by 81.33%; compared with YOLOv4 + MobileNet-V3, the weight size shrinks by 56.7 MB, the per-image detection time grows by 0.23 ms and the parameter count shrinks by 69.97%. The improved PANet strategy thus effectively reduces memory consumption and parameter count and speeds up model recognition without sacrificing accuracy.
Overall, compared with the YOLOv4 network, the proposed improved YOLOv4-LITE lightweight network raises F1 by 0.15, AP by 8.29 percentage points and precision by 6.55 percentage points at IOU = 0.50; the weight size is about 1/5 of YOLOv4's, and the per-image detection time is reduced by 34.85%.
Fig. 7 compares the recognition results of the YOLOv4 network and the improved YOLOv4-LITE lightweight network on dense cherry tomatoes in the facility environment. Dark boxes mark the algorithm's recognitions and light boxes the manual annotations of the result images, revealing individual fruits that went undetected, i.e. the differences between the algorithms. As the figure shows, YOLOv4 misses highly adhering and occluded fruits and fails on smaller targets when recognizing highly dense cherry tomatoes, whereas the proposed algorithm achieves a high recognition rate and good generalization on highly adhering, severely occluded and small fruits.
1) An improved YOLOv4-LITE lightweight neural network algorithm was proposed for recognizing highly dense, severely adhering cherry tomato targets. MobileNet-V3 serves as the backbone of the YOLOv4-LITE lightweight network; a small-target feature layer is introduced into the Feature Pyramid Network (FPN) structure to increase the fine-grained detection of images; and depthwise separable convolution replaces ordinary convolution in the PANet, making the improved network more lightweight and easier to deploy on embedded devices and mobile terminals, providing a theoretical basis for automatic picking by harvesting robots. Loading pre-trained weights and freezing some layers during training raised the average precision by about 3 percentage points over training without pre-trained weights.
2) The feasibility and advantages of the proposed method were verified by comparative tests on the same 1 071-image test set, judged by F1 score, average precision and precision. Compared with the YOLOv4 network, through backbone replacement, the improved feature pyramid network and the improved path aggregation network, the proposed method at IOU = 0.50 raises the F1 score by 0.15, the average precision by 8.29 percentage points and the precision by 6.55 percentage points; the weight size is about 1/5 of YOLOv4's and detection takes 3.01 ms per image. The comparative tests confirm the method's clear advantages.
[1] Tang Y C, Chen M Y, Wang C L, et al. Recognition and localization methods for vision-based fruit picking robots: A review[J]. Frontiers in Plant Science, 2020, 11: 1-17.
[2] Zhao Xianli, Wang Zhiming. Application of machine learning algorithm in agricultural machine vision system[J]. Jiangsu Agricultural Sciences, 2020, 48(12): 226-231. (in Chinese with English abstract)
[3] Lin Weiming, Hu Yuntang. Image segmentation method based on YUV color space for tomato harvesting robot[J]. Transactions of the Chinese Society for Agricultural Machinery, 2012, 43(12): 176-180. (in Chinese with English abstract)
[4] Ochida C O, Itodo A U, Nwanganga P A. A review on postharvest storage, processing and preservation of tomatoes (Lycopersicon esculentum Mill)[J]. Asian Food Science Journal, 2018, 6(2): 1-10.
[5] Ma Cuihua, Zhang Xueping, Li Yutao, et al. Identification of immature tomatoes based on salient region detection and improved Hough transform method[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2016, 32(14): 219-226. (in Chinese with English abstract)
[6] Li Han, Tao Hanxiao, Cui Lihao, et al. Recognition and localization method of tomato based on SOM-K-means algorithm[J]. Transactions of the Chinese Society for Agricultural Machinery, 2021, 52(1): 23-29. (in Chinese with English abstract)
[7] Payne A B, Walsh K B, Subedi P P, et al. Estimation of mango crop yield using image analysis-segmentation method[J]. Computers and Electronics in Agriculture, 2013, 91: 57-64.
[8] Li Xin, Li Lijun, Gao Zicheng, et al. Revised quasi-circular randomized Hough transform and its application in camellia-fruit recognition[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(1): 164-170. (in Chinese with English abstract)
[9] Gong A, Yu J, He Y, et al. Citrus yield estimation based on images processed by an Android mobile phone[J]. Biosystems Engineering, 2013, 115(2): 162-170.
[10] Xu Y, Imou K, Kaizu Y, et al. Two-stage approach for detecting slightly overlapping strawberries using HOG descriptor[J]. Biosystems Engineering, 2013, 115(2): 144-153.
[11] Xie Zhonghong, Ji Changying, Guo Xiaoqing, et al. Target detection of quasi-circular fruits based on improved Hough transform[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2010, 26(7): 157-162. (in Chinese with English abstract)
[12] Zhao C, Lee W S, He D. Immature green citrus detection based on colour feature and sum of absolute transformed difference (SATD) using colour images in the citrus grove[J]. Computers and Electronics in Agriculture, 2016, 124: 243-253.
[13] Liu S, Yang C, Hu Y, et al. A method for segmentation and recognition of mature citrus and branches-leaves based on regional features[C]//Chinese Conference on Image and Graphics Technologies. Singapore: Springer, 2018: 292-301.
[14] Lu Jun, Sang Nong. Detection of citrus fruits within tree canopy and recovery for occlusion contour in variable illumination[J]. Transactions of the Chinese Society for Agricultural Machinery, 2014, 45(4): 76-81. (in Chinese with English abstract)
[15] Stein M, Bargoti S, Underwood J. Image based mango fruit detection, localisation and yield estimation using multiple view geometry[J]. Sensors, 2016, 16(11): 1915.
[16] Xu Y, Imou K, Kaizu Y, et al. Two-stage approach for detecting slightly overlapping strawberries using HOG descriptor[J]. Biosystems Engineering, 2013, 115(2): 144-153.
[17] Wachs J P, Stern H I, Burks T, et al. Low and high-level visual feature-based apple detection from multi-modal images[J]. Precision Agriculture, 2010, 11(6): 717-735.
[18] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014: 580-587.
[19] Girshick R. Fast R-CNN[C]//IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1440-1448.
[20] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Annual Conference on Neural Information Processing Systems, Montreal, Canada, 2015: 91-99.
[21] Yan Jianwei, Zhao Yuan, Zhang Lewei, et al. Recognition of Rosa roxbunghii in natural environment based on improved Faster RCNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(18): 143-150. (in Chinese with English abstract)
[22] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//European Conference on Computer Vision. Springer, Cham, 2016: 21-37.
[23] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016: 779-788.
[24] Cheng Wei, Zhang Wenai, Feng Qingchun, et al. Method of greenhouse tomato fruit identification and yield estimation based on improved YOLOv3[J]. Journal of Chinese Agricultural Mechanization, 2021, 42(4): 176-182. (in Chinese with English abstract)
[25] Lü Shilei, Lu Sihua, Li Zhen, et al. Orange recognition method using improved YOLOv3-LITE lightweight neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(17): 205-214. (in Chinese with English abstract)
[26] Xue Yueju, Huang Ning, Tu Shuqin, et al. Immature mango detection based on improved YOLOv2[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(7): 173-179. (in Chinese with English abstract)
[27] Zhao Dean, Wu Rendi, Liu Xiaoyang, et al. Apple positioning based on YOLO deep convolutional neural network for picking robot in complex background[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(3): 164-173. (in Chinese with English abstract)
[28] Zhang Jian. Research on Orchard Pedestrian Detection Method Based on Improved YOLOv3[D]. Zhenjiang: Jiangsu University, 2020. (in Chinese with English abstract)
[29] Cai Fenghuang, Zhang Yuexin, Huang Jie. Bridge surface crack detection algorithm based on YOLOv3 and attention mechanism[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(10): 926-933. (in Chinese with English abstract)
[30] Hu X L, Liu Y, Zhao Z X, et al. Real-time detection of uneaten feed pellets in underwater images for aquaculture using an improved YOLO-V4 network[J]. Computers and Electronics in Agriculture, 2021, 185: 106135.
[31] Li Jiuhao, Lin Lejian, Tian Kai, et al. Detection of leaf diseases of balsam pear in the field based on improved Faster R-CNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(12): 179-185. (in Chinese with English abstract)
[32] Bochkovskiy A, Wang C Y, Liao H Y M. YOLOv4: Optimal speed and accuracy of object detection[J]. arXiv preprint arXiv:2004.10934, 2020.
[33] Howard A, Sandler M, Chen B, et al. Searching for MobileNetV3[C]//IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 1314-1324.
Recognition of dense cherry tomatoes based on improved YOLOv4-LITE lightweight neural network
Zhang Fu1,2,3, Chen Zijun1, Bao Ruofei1, Zhang Chaochen1, Wang Zhihao1
(1. College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471003, China; 2. Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, Jiangsu University, Zhenjiang 212013, China; 3. Collaborative Innovation Center of Machinery Equipment Advanced Manufacturing of Henan Province, Luoyang 471003, China)
Small and hidden conditions of dense cherry tomatoes have posed a great challenge to the rapid identification and positioning of fruits. New key technology with strong robustness is highly demanded to improve the efficiency and yield prediction of cherry tomatoes in the facility agriculture environment. In this study, a novel recognition method was proposed to locate dense cherry tomatoes using an improved YOLOv4-LITE lightweight neural network. MobileNet-v3, which migrates easily to mobile terminals, was selected as the feature extraction network of the model to construct YOLOv4-LITE for a higher detection speed of cherry tomatoes. The Feature Pyramid Network (FPN) + Path Aggregation Network (PANet) structure was modified, in order to avoid the loss of detection accuracy caused by replacing the backbone network. Specifically, a 104×104 feature map was introduced to achieve fine-grained detection of small targets. More importantly, depthwise separable convolution was used in the PANet structure to reduce the number of model calculations. The new network was more lightweight, and the generalization ability of the model was improved by loading pre-training weights and freezing partial layers during training. A comparison was made with YOLOv4 on test sets with the same degree of occlusion or adhesion, using F1 and AP to evaluate the differences between the models. The test results show that the improved FPN structure on the basis of YOLOv4 raised the AP50 of the original YOLOv4 by 8.29 percentage points and the AP75 by 15.01 percentage points, with the F1 increasing by 0.14 and 0.24 under the corresponding IOU thresholds. However, the weight increased by 4 MB, the per-image detection time increased by 0.27 ms, and the number of network parameters increased by 14.85%. With the improved FPN structure on the basis of YOLOv4+MobileNet-V3, AP50 increased by 6.58 percentage points, AP75 increased by 21.82 percentage points, and the F1 value increased by 0.13 and 0.20 under the corresponding IOU thresholds, indicating that YOLOv4 and YOLOv4+MobileNet-V3 underperformed on small targets. The feature map for small targets was added to improve the fine-grained detection of the model, but the number of model parameters and the weights increased accordingly. As such, the PANet structure was improved by introducing a depthwise separable convolutional network, while ensuring high F1, AP, Recall and Precision. Optimal performance was achieved, where the model weight was compressed to 45.3 MB, the detection time was 3.01 ms per image, and the network had 12 026 685 parameters. Specifically, the new network was 198.7 MB smaller than the original YOLOv4. The data indicated that the improved PANet strategy presented similar accuracy under such circumstances, while effectively reducing memory consumption and the number of model parameters and accelerating model recognition. The F1, AP50 and precision of the proposed recognition model for dense cherry tomatoes on all test sets were 0.99, 99.74% and 99.15%, respectively, improvements over YOLOv4 of 0.15, 8.29 percentage points and 6.55 percentage points, and the weight size was 45.3 MB, about 1/5 of that of YOLOv4. Additionally, the detection of a single 416×416 image reached a speed of 3.01 ms per frame on the GPU. Therefore, the recognition model of dense cherry tomatoes offers faster recognition, higher accuracy, and a lighter weight than before.
The findings can provide strong support for efficient production forecasting of cherry tomatoes in the facility agriculture environment.
computer vision; models; YOLO; deep learning; image recognition; target detection
Zhang Fu, Chen Zijun, Bao Ruofei, et al. Recognition of dense cherry tomatoes based on improved YOLOv4-LITE lightweight neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(16): 270-278. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2021.16.033 http://www.tcsae.org
Received: 2021-06-02
Revised: 2021-08-14
Supported by the National Natural Science Foundation of China (52075149); the Key Science and Technology Program of Henan Province (212102110029); the Open Fund of the Key Laboratory of Modern Agricultural Equipment and Technology, Ministry of Education, and the Jiangsu Province High-Tech Key Laboratory of Agricultural Equipment and Intelligence (JNZ201901); and the Henan Province Higher Education Teaching Reform Research and Practice Project (Graduate Education) (2019SJGLX063Y)
Zhang Fu, Professor, research interests: agricultural informatization and bionic technology of agricultural equipment. Email: zhangfu30@126.com
10.11975/j.issn.1002-6819.2021.16.033
CLC number: TP391; TP81
Document code: A
Article ID: 1002-6819(2021)-16-0270-09