·Research Letter·
Multi-pose dragon fruit detection system for picking robots based on the optimal YOLOv7 model
WANG Jinpeng, ZHOU Jialiang, ZHANG Yueyue, HU Haoruo
(College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210000, China)
Abstract: To detect dragon fruits growing in various postures in complex natural environments, this study proposes a multi-pose dragon fruit detection method based on an optimally selected YOLOv7 model and builds a vision system that distinguishes fruits in different postures. The detection performance of different models was first compared, and recommended models were given for different devices. In tests, the YOLOv7 series outperformed YOLOv4, YOLOv5, and YOLOX models of comparable size. YOLOv7-tiny, the variant suited to mobile devices, achieved a precision of 83.6%, a recall of 79.9%, a mean average precision (mAP) of 88.3%, and a classification accuracy of 80.4% for front-view and side-view fruits, while inferring a single image in only 1.8 ms. Compared with YOLOv3-tiny, YOLOv4-tiny, and YOLOX-tiny, its precision was higher by 16.8, 4.3, and 4.8 percentage points and its mAP by 7.3, 21.0, and 3.9 percentage points, respectively; compared with EfficientDet, SSD, Faster R-CNN, and CenterNet, its mAP was higher by 8.2, 5.8, 4.0, and 42.4 percentage points, respectively. The model was then tested under different illumination and maintained high precision under strong light, weak light, and artificial supplementary light. Finally, the YOLOv7-tiny detection model was deployed on a Jetson Xavier NX and validated in picking experiments on front-view fruits: inference and classification took about 22.6% of the total picking time, and the picking success rate for front-view fruits was 90%, verifying the performance of the multi-pose dragon fruit detection system based on the optimal YOLOv7 model.
Keywords: deep learning; convolutional neural network; picking robot; YOLOv7; object detection; dragon fruit
In recent years, with the continued development of smart agriculture, fruit-picking robots have become a popular research topic [1], and building a machine vision system that detects fruit quickly and accurately is one of the key problems in picking-robot research [2-3]. With its refreshing taste, striking appearance, and auspicious name, dragon fruit has become one of the most popular fruits in China and Southeast Asia. At present it is mostly picked by hand; its branches are hard and thorny, so picking is labor-intensive and costly [4]. Intelligent dragon fruit picking is therefore an inevitable trend.
Fruit recognition and localization are essential for intelligent picking-robot operation and have been widely studied [5-7]. DENG et al. [8] segmented dragon fruit images with the Otsu algorithm and morphological operations, then located fruits by computing the centroid of the processed regions; compared with other segmentation algorithms, this method greatly reduced noise and produced smoother edges. SHU et al. [9] identified different parts of dragon fruit plants (canopy, fruit, branch, flower) from raw spectra, spectral transforms, and vegetation indices, and built a BP model to analyze the band features of each part; the red-green vegetation index discriminated fruit best, with an accuracy of 82.8%. XIAO et al. [10] fused four color indices with point-cloud spatial structure to build rules for segmenting individual dragon fruit plants; fusion with the visible-band difference color index gave the highest accuracy, with a root mean square error of 0.28 m² for single-plant area.
In recent years, with the development of deep learning [11], more and more researchers have used convolutional neural networks to extract features [12-16], avoiding hand-crafted feature engineering and thus generalizing better [17]. In agriculture, the single-stage YOLO family of detectors is widely used for its accuracy and efficiency [18-21]. WANG et al. [22] recognized dragon fruit with an improved YOLOv4, replacing the parameter-heavy CSPDarknet backbone with the lighter MobileNetv3 to raise detection speed and directly fusing upsampled features from layers 39 and 46 to improve small-target accuracy; the improved YOLOv4 reached an average precision of 96.48% and an average detection time of 2.28 ms, 160.32 ms shorter than the original. YANG et al. [23] recognized tomatoes with an improved YOLOv4-tiny, adding a 76×76 detection layer for small tomatoes, a convolutional attention module for occluded tomatoes, and densely connected convolutions to strengthen global feature fusion; the method achieved an average precision of 97.9% at 111 frames per second. WEN et al. [24] detected Panax notoginseng leaf diseases with an improved YOLOv3, replacing the original feature pyramid with an attention feature pyramid to suppress interference during feature fusion and screening it with a double bottleneck layer to strengthen generalization; average precision improved by 1.47 percentage points over YOLOv3, with clearly better robustness in complex environments. LIU et al. [25] recognized winter jujube in natural scenes with an improved YOLOv3, using the SE structure from SENet to recalibrate feature weights; precision reached 88.71%, recall 83.80%, and average precision 82.01%, 4.78 percentage points higher than the original model. Most of these methods consider only fruit position and rarely the effect of fruit posture on picking. Research on dragon fruit recognition and localization has progressed, but problems remain: 1) because natural environments are complex and changeable, existing algorithms generalize poorly, with frequent false and missed detections; 2) dragon fruit is usually picked by cutting, and its growth postures are complex, so for some fruits the center point alone cannot guide the gripper to pick precisely. To address these problems, this paper proposes a multi-pose dragon fruit detection method based on an optimally selected YOLOv7 model, which classifies fruits by growth posture on top of recognition and localization. After comparing the performance of the YOLOv7 series and other state-of-the-art detectors on dragon fruit, the fast and accurate YOLOv7-tiny was selected and, together with a depth camera, deployed on a mobile device to build the vision detection system. Finally, validation picking experiments on front-view fruits were carried out, providing technical guidance for intelligent, precise dragon fruit picking.
The images used in this study were taken in the field at the Lile Agriculture dragon fruit orchard in Lishui District, Nanjing, Jiangsu Province, with a Canon 77D camera; images were saved as JPEG at a resolution of 2 400×1 600. Fruit color varies with natural light: fruits facing the light appear bright red, and backlit fruits appear dark red. To adapt to different working conditions, images were collected in the morning, afternoon, and evening, yielding 1 281 raw images, including fruits under strong light, weak light, and artificial supplementary light. Figure 1 shows some of the collected images.
Fig.1 Images of dragon fruit in natural environments
The raw images were annotated with the LabelImg tool and randomly divided into training, validation, and test sets of 1 036, 116, and 129 images, respectively. Table 1 shows the dataset split under the three light conditions.
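As a minimal sketch of such a random split (the directory layout and file naming here are illustrative assumptions, not details from the paper):

```python
import random
from pathlib import Path

random.seed(0)  # fix the seed so the split is reproducible

images = sorted(Path("images").glob("*.jpg"))  # the 1 281 raw images
random.shuffle(images)

train, val, test = images[:1036], images[1036:1152], images[1152:]
print(len(train), len(val), len(test))  # -> 1036 116 129 for 1 281 inputs
```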
Table 1 Dataset division
In this study, dragon fruit postures were divided into side-view and front-view classes according to the fruit's position relative to the depth camera, which is mounted directly in front of the robot's waist joint. When a fruit in the field of view grows on the side of a branch, it is detected as a side-view fruit; a robot with a bite-type end effector must then adjust its approach angle so that the end effector approaches from the tail of the fruit before it can pick. When the tail of a fruit on a branch points toward the robot, it is detected as a front-view fruit, and the bite-type end effector can pick it with little adjustment. Figure 2 shows the initial positional relationship between these two classes of fruit and the robot (end effector/vision system).
As shown in Figure 2, the fruit in Figure 2a can be cut conveniently by the existing bite-type end effector, whereas the fruit in Figure 2b cannot; the vision system must guide the robot to adjust the end-effector posture precisely. Classifying dragon fruits by posture is therefore meaningful.
Fig.2 Positional relationship between front-view and side-view dragon fruits and the end effector
1.2.1 Selection of the YOLOv7 object detection model
YOLOv7 is an object detection algorithm released in early July 2022; its authors' comparison experiments on public datasets indicate that it achieved the highest detection speed and accuracy in object detection at the time. YOLOv7-tiny has the fewest parameters in the series, and this study selects it to build the vision detection system; Figure 3 shows its structure. The input image passes through the backbone for feature extraction, producing feature maps at three scales; the head further integrates features across these scales and finally outputs predictions at three scales for detecting large, medium, and small targets, respectively.
Note: Convolution denotes a convolutional layer, BN a batch normalization layer, and LeakyReLU the activation function.
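The three-scale prediction described above follows the standard YOLOv5/YOLOv7-style anchor decoding; the sketch below illustrates that step under stated assumptions (the tensor layout, anchor sizes, and strides are illustrative, not taken from the paper):

```python
import torch

def decode(pred, anchors, stride):
    # pred: (batch, n_anchors, H, W, 5 + n_classes), already passed through sigmoid;
    # anchors: (n_anchors, 2) anchor sizes in pixels for this scale.
    b, na, h, w, _ = pred.shape
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((gx, gy), dim=-1).float()                   # cell offsets
    xy = (pred[..., 0:2] * 2 - 0.5 + grid) * stride                # box centers in pixels
    wh = (pred[..., 2:4] * 2) ** 2 * anchors.view(1, na, 1, 1, 2)  # box width/height
    return torch.cat((xy, wh, pred[..., 4:]), dim=-1)              # boxes + scores

# The head emits three such tensors, e.g. strides 8, 16, and 32 on a
# 640x640 input, covering small, medium, and large targets.
```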
1.2.2 Experimental environment
The hardware used for training and testing was an AMD Ryzen 7 5800X CPU, an RTX 3070 Ti GPU, and 2×8 GB of RAM; the software environment was Windows 10, Python 3.8, PyTorch 1.10, Torchvision 0.11, CUDA 11.3, and cuDNN 8.2.0.
The hardware deployed for the outdoor picking experiments consisted mainly of a Jetson Xavier NX development board, a ZED stereo camera, and an S6H4D_Plus six-axis robotic arm, as shown in Figure 4.
1.2.3 Model training
Seven YOLOv7 networks were trained: YOLOv7, YOLOv7-d6, YOLOv7-e6, YOLOv7-e6e, YOLOv7-tiny, YOLOv7-w6, and YOLOv7x. YOLOv7-tiny, YOLOv7, and YOLOv7-w6 are designed for edge GPUs, normal GPUs, and cloud GPUs, respectively; YOLOv7x is obtained by scaling the depth and width of YOLOv7; YOLOv7-e6 and YOLOv7-d6 are obtained by scaling YOLOv7-w6; and YOLOv7-e6e is obtained from YOLOv7-e6 by applying E-ELAN.
Fig.4 Schematic of the robot picking system
The training hyperparameters were as follows: input image size 640×640 pixels, 200 training epochs, batch size 4, SGD (stochastic gradient descent) optimizer with momentum 0.937 and weight decay 0.000 5, initial learning rate 0.01, minimum learning rate 0.001, a cosine annealing learning-rate schedule, and randomly initialized weights.
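A minimal sketch of the cosine-annealing schedule with the hyperparameters listed above (the function name is an illustrative assumption; it only assumes the rate decays from the initial to the minimum value over the 200 epochs):

```python
import math

LR0, LR_MIN, EPOCHS = 0.01, 0.001, 200

def cosine_lr(epoch: int) -> float:
    # Half-cosine decay from LR0 at epoch 0 down to LR_MIN at the last epoch.
    return LR_MIN + 0.5 * (LR0 - LR_MIN) * (1 + math.cos(math.pi * epoch / EPOCHS))

print(cosine_lr(0), cosine_lr(100), cosine_lr(200))  # 0.01 0.0055 0.001
```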
1.2.4 Evaluation metrics
This paper uses precision (P), recall (R), the harmonic mean of precision and recall (F1), average precision (AP), and mean average precision (mAP) as evaluation metrics [12]. P reflects how many detections are correct, R how many ground-truth fruits are found, F1 the balance between the two, AP the average precision of a single class, and mAP the mean of AP over all classes.
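In standard form, with TP, FP, and FN denoting true positives, false positives, and false negatives:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}, \qquad
AP = \int_0^1 P(R)\,\mathrm{d}R, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```

where N is the number of classes (here N = 2: side view and front view).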
This section compares the detection performance of the seven YOLOv7 models to provide a basis for choosing the model best suited to a dragon fruit picking robot. The seven models were tested on the 129 test-set images; their attributes and detection performance are shown in Table 2.
As Table 2 shows, YOLOv7-e6e has the highest precision, 85.0%, meaning the lowest probability of false detections and thus of the robot grasping a non-target fruit. YOLOv7x has the highest recall, 85.4%, meaning the lowest miss rate and the lowest probability of leaving fruit unpicked; it is also the best model for side-view fruits, with an average precision of 92.3% for that class. YOLOv7 has the highest mean average precision, 89.3%, and is the best model for front-view fruits. When precision and recall are weighed together, YOLOv7x has the highest F1, 83.1%. YOLOv7-tiny has 0.6×10? parameters, a model size of 12 MB, 255 layers, and an inference time of 1.8 ms; its parameter count, model size, and layer count are the smallest, so it can process more frames per unit time, which helps picking efficiency. The AP0 and AP1 columns show that the average precision for front-view fruits is lower than for side-view fruits in every model: the network finds front-view fruits harder to detect, which places higher demands on the spatial arrangement of the vision system and the arm. Analysis suggests the difference arises because some fruit postures lie between front view and side view and lack a quantitative boundary, which affects front-view fruits in particular.
Table 2 Detection performance and parameters of seven YOLOv7 series models
Note: AP0 and AP1 are the average precision for side-view and front-view dragon fruits, respectively. P, R, and F1 are the mean precision, mean recall, and their harmonic mean over the two classes of dragon fruit.
For dragon fruit picking, experience shows that false detections cause more serious damage than missed ones: when the model misclassifies a side-view fruit as front view, it may damage the fruit and branch, or even the end effector. The selection principle is therefore to choose the model with the highest precision, provided recall is not too low and detection remains real-time. For a high-performance host computer, the most precise model, YOLOv7-e6e, can be used to minimize the impact of false detections while still keeping detection real-time. For an ordinary host computer, the lightest model, YOLOv7-tiny, is the better choice: it achieves real-time detection even on low-end devices, and its precision of 83.6% is second only to YOLOv7-e6e. Since the host computer in this study is a Jetson Xavier NX, on which YOLOv7-e6e would be too slow for real-time detection, YOLOv7-tiny was selected as the dragon fruit detection model for the subsequent robot picking experiments.
The YOLO series has been among the most popular object detection algorithms in recent years. To fully demonstrate the effectiveness of YOLOv7-tiny, it was compared with other commonly used object detection models; the results are shown in Table 3.
As Table 3 shows, compared with YOLOv3-tiny, YOLOv4-tiny, and YOLOX-tiny of the same size class, YOLOv7-tiny improves precision by 16.8, 4.3, and 4.8 percentage points and mAP by 7.3, 21.0, and 3.9 percentage points, respectively; compared with YOLOv5s, YOLOXs, YOLOv4G, YOLOv4M, YOLOv5x, and YOLOXx, it improves precision by 7.3, 4.2, 7.3, 6.5, 3.5, and 3.9 percentage points; and compared with EfficientDet, SSD, Faster R-CNN, and CenterNet, it improves mAP by 8.2, 5.8, 4.0, and 42.4 percentage points.
In addition, CenterNet has the highest precision on dragon fruit (90.9%), but its recall is only 22.1%, implying many missed detections, so it is unsuitable for a picking robot. Faster R-CNN has the highest recall (89.6%), but its precision is only 57.5%, implying many false detections, so it is also unsuitable. YOLOv7-tiny has the highest mAP, 88.3%, and its precision is second only to CenterNet, making it the most suitable model for building the vision detection system of a dragon fruit picking robot.
Table 3 Detection performance of YOLOv7-tiny and other common object detection models
Note: YOLOv4G and YOLOv4M denote YOLOv4 models with GhostNet and MobileNetv3 backbones, respectively.
According to growth posture, dragon fruits were divided into front-view and side-view classes, and the trained YOLOv7-tiny model was run on the 129 test images; the resulting confusion matrix is shown in Table 4. The background column covers objects other than fruit as well as cases where the model detected fruit at locations with none.
Table 4 shows 137 side-view and 113 front-view labels. Of the 137 side-view fruits, 116 were classified correctly, 16 were misclassified as front view, and 5 were missed; of the 113 front-view fruits, 85 were classified correctly, 18 were misclassified as side view, and 10 were missed. In addition, 4 background regions were misdetected as side-view fruits and 1 as a front-view fruit. Of the 250 target fruits in the test set, 201 were classified correctly, a classification accuracy of 80.4%. Two causes were identified: 1) a small number of fruits lie on the boundary between front view and side view, so the manual annotation contains some error; 2) a few distant fruits were overlooked during annotation but were still detected by the network, which counts as background misdetected as fruit.
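The 80.4% figure follows directly from these counts; a minimal check (the row/column layout below is reconstructed from the description above, not copied from Table 4):

```python
import numpy as np

# Rows: true class (side view, front view); columns: predicted side view,
# predicted front view, missed.
cm = np.array([[116, 16, 5],
               [18, 85, 10]])
labels = cm.sum(axis=1)        # [137, 113] ground-truth labels per class
correct = cm[0, 0] + cm[1, 1]  # 201 fruits classified correctly
print(correct / labels.sum())  # 0.804 -> the 80.4% classification accuracy
```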
Table 4 Confusion matrix predicted by YOLOv7-tiny
Before the model is deployed on an edge computing device, its generalization ability should be verified. The depth camera used here is a ZED stereo camera, which works outdoors but is strongly affected by illumination. To ensure the model does not perform poorly under any particular illumination, the three light conditions in the test set of Table 1 were tested separately; the results are shown in Table 5. The precision under artificial supplementary light is slightly lower than under strong and weak light, probably because that condition has fewer test samples, some of which are hard to recognize. Overall, YOLOv7-tiny maintains high detection precision under all three light conditions, demonstrating good generalization and suitability for detecting dragon fruit in natural environments. Figure 5 visualizes the detection results under the three light conditions.
Table 5 Detection performance of the YOLOv7-tiny model on dragon fruits under different light conditions
Note: The rectangular boxes mark the positions and boundaries of detected fruits; df_side denotes a detected side-view dragon fruit and df_front a detected front-view dragon fruit.
The outdoor picking experiment was conducted at Lile Agriculture in Lishui District, Nanjing, Jiangsu Province, where the fruit is grown in greenhouses; the weather that day was overcast with weak light. Because side-view fruits cannot be picked directly from the detected center-point coordinates, the picking experiment covered only detected front-view fruits; detected side-view fruits require further processing to obtain the fruit's growth direction and related information, which the authors have studied in detail elsewhere [26-27].
YOLOv7-tiny was deployed on the Jetson Xavier NX development board and combined with the depth camera and the robotic arm to build a dragon fruit picking system. The system works as follows: the host computer processes the dragon fruit images captured by the depth camera to obtain the fruit's 3D coordinates in the camera frame and converts them to 3D coordinates in the robotic arm frame, and the arm plans a path to the picking point. For detected front-view fruits the arm is commanded to pick normally; for detected side-view fruits no picking is commanded. Figure 6 shows the robot picking a dragon fruit.
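The camera-to-arm conversion in this pipeline is a rigid homogeneous transform; the sketch below is a minimal illustration assuming a prior hand-eye calibration (the matrix values are placeholders, not the system's actual calibration):

```python
import numpy as np

# arm_T_cam maps homogeneous points from the camera frame into the arm
# base frame; in practice it comes from hand-eye calibration.
arm_T_cam = np.eye(4)
arm_T_cam[:3, 3] = [0.10, 0.00, 0.25]  # placeholder offset: 10 cm forward, 25 cm up

def cam_to_arm(p_cam):
    """Convert a 3D point (in meters) from the camera frame to the arm frame."""
    p = np.append(np.asarray(p_cam, dtype=float), 1.0)  # homogeneous coordinates
    return (arm_T_cam @ p)[:3]

print(cam_to_arm([0.0, 0.0, 0.6]))  # a fruit 0.6 m in front of the camera
```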
Fig.6 Process of the robot picking a dragon fruit
In Figure 6, panel a shows the arm moving along the planned path to the picking point after the software finishes processing, panel b shows the arm reaching the picking point, panel c shows the end effector performing the cut, and panel d shows the fruit picked successfully.
To verify the success rate of the picking system and the efficiency of the YOLOv7 model, multiple picking trials were run on front-view fruits; a pick was counted as successful if the fruit showed no obvious damage. Table 6 records the data of 20 picking trials. The host-computer inference time accounts for a low share of the whole picking process, 22.6%, showing that the deployed YOLOv7-tiny model infers quickly and localizes accurately enough to meet real-time requirements. The picking success rate for front-view fruits was 90%, showing that the system can essentially complete automated dragon fruit picking. The two failed picks were caused mainly by errors in the distance computed by the depth camera, which led the end effector to cut into the fruit.
Table 6 Picking experiment results
Note: The host-computer inference time is measured from the start of the program until the robotic arm begins to move; the total picking time is measured from the start of the program until the arm returns to its home position after picking.
To achieve effective automated dragon fruit picking, this study classified dragon fruits by posture based on the YOLOv7 model and picked front-view fruits. The main conclusions are as follows:
1) Among the seven YOLOv7 models, YOLOv7-e6e has the highest precision but too many parameters to deploy easily. YOLOv7-tiny detects dragon fruit with a precision of 83.6%, a recall of 79.9%, an mAP of 88.3%, a posture classification accuracy of 80.4%, and an average inference time of 1.8 ms, making it the YOLOv7 model best suited to mobile devices. Compared with YOLOv3-tiny, YOLOv4-tiny, and YOLOX-tiny of the same parameter scale, its precision is higher by 16.8, 4.3, and 4.8 percentage points and its mAP by 7.3, 21.0, and 3.9 percentage points; compared with EfficientDet, CenterNet, SSD, and Faster R-CNN, its mAP is higher by 8.2, 42.4, 5.8, and 4.0 percentage points. Under strong light, weak light, and artificial supplementary light, YOLOv7-tiny maintains a high level of precision, indicating strong generalization.
2) The picking system built on YOLOv7-tiny achieved a 90% picking success rate in the natural environment, with host-computer computation averaging 22.6% of the total picking time. This verifies the feasibility of automated dragon fruit picking with the proposed system, provides a reference for recognizing and localizing dragon fruit and similar fruits in natural environments, and supports the development of fruit-picking robots.
[1] XU Zhibo, HUANG Xiaopeng, HUANG Yuan, et al. A real-time zanthoxylum target detection method for an intelligent picking robot under a complex background, based on an improved YOLOv5s architecture[J]. Sensors, 2022, 22(2): 682.
[2] ZHOU Yiyang, DENG Sanpeng, QI Yuming, et al. Mobile robot target recognition algorithm based on YOLOv5[J]. Equipment Manufacturing Technology, 2021, 8: 15-18. (in Chinese with English abstract)
[3] WANG Dandan, HE Dongjian. Recognition of apple targets before fruits thinning by robot based on R-FCN deep convolution neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(3): 156-163. (in Chinese with English abstract)
[4] HUANG Fengzhu, LU Guifeng, WU Zhijiang, et al. Diversity analysis of fruit quality traits in pitaya germplasm resources[J]. South China Fruits, 2019, 48(6): 46-52, 58. (in Chinese with English abstract)
[5] YI Shi, LI Junjie, ZHANG Peng, et al. Detecting and counting of spring-see citrus using YOLOv4 network model and recursive fusion of features[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 161-169. (in Chinese with English abstract)
[6] DU Wensheng, WANG Chunying, ZHU Yanjun, et al. Fruit stem clamping points location for table grape thinning using improved Mask R-CNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(1): 169-177. (in Chinese with English abstract)
[7] WANG Peng, NIU Tong, HE Dongjian. Tomato young fruits detection method under near color background based on improved Faster R-CNN with attention mechanism[J]. Agriculture, 2021, 11(11): 1059.
[8] DENG Ziqing, WANG Yang, ZHANG Bing, et al. Research on pitaya image segmentation based on Otsu algorithm and morphology[J]. Intelligent Computer and Applications, 2022, 12(6): 106-109. (in Chinese with English abstract)
[9] SHU Tian, CHEN Zhihu, LIU Chunyan, et al. Hyperspectral identification and characteristic bands extraction of pitaya plant[J]. Guizhou Agricultural Sciences, 2022, 50(3): 117-124. (in Chinese with English abstract)
[10] XIAO Dongna, ZHOU Zhongfa, YIN Linjiang, et al. Identification of single plant of karst mountain pitaya by fusion of color index and spatial structure[J]. Laser & Optoelectronics Progress, 2022, 59(10): 489-503. (in Chinese with English abstract)
[11] ZHAO Lixin, XING Runzhe, BAI Yinguang, et al. Review on survey of deep learning in target detection[J]. Science Technology and Engineering, 2021, 21(30): 12787-12795. (in Chinese with English abstract)
[12] LIN Sen, LIU Meiyi, TAO Zhiyong. Detection of underwater treasures using attention mechanism and improved YOLOv5[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 307-314. (in Chinese with English abstract)
[13] PARICO A I B, AHAMED T. Real time pear fruit detection and counting using YOLOv4 models and deep sort[J]. Sensors, 2021, 21(14): 4803.
[14] DING F, ZHUANG Z, LIU Y, et al. Detecting defects on solid wood panels based on an improved SSD algorithm[J]. Sensors, 2020, 20(18): 5315.
[15] LONG Jiehua, ZHAO Chunjiang, LIN Sen, et al. Segmentation method of the tomato fruits with different maturities under greenhouse environment based on improved Mask R-CNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(18): 100-108. (in Chinese with English abstract)
[16] CAI Shuping, SUN Zhongming, LIU Hui, et al. Real-time detection methodology for obstacles in orchards using improved YOLOv4[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(2): 36-43. (in Chinese with English abstract)
[17] GUO Q, CHEN Y, TANG Y, et al. Lychee fruit detection based on monocular machine vision in orchard environment[J]. Sensors, 2019, 19(19): 4091.
[18] WANG Z, WALSH K, KOIRALA A. Mango fruit load estimation using a video based MangoYOLO-Kalman filter-Hungarian algorithm method[J]. Sensors, 2019, 19(12): 2742.
[19] ZHENG Z, XIONG J, LIN H, et al. A method of green citrus detection in natural environments using a deep convolutional neural network[J]. Frontiers in Plant Science, 2021, 12: 705737.
[20] ZHAO Hui, QIAO Yanjun, WANG Hongjun, et al. Apple fruit recognition in complex orchard environment based on improved YOLOv3[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(16): 127-135. (in Chinese with English abstract)
[21] XIONG Juntao, LIU Zhen, TANG Linyue, et al. Visual detection technology of green citrus under natural environment[J]. Transactions of the Chinese Society for Agricultural Machinery (Transactions of the CSAM), 2018, 49(4): 45-52. (in Chinese with English abstract)
[22] WANG J, GAO K, JIANG H, et al. Method for detecting dragon fruit based on improved lightweight convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2020, 36(20): 218-225. (in English with Chinese abstract)
[23] YANG Jian, QIAN Zhen, ZHANG Yanjun, et al. Real-time recognition of tomatoes in complex environments based on improved YOLOv4-tiny[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(9): 215-221. (in Chinese with English abstract)
[24] WEN Bin, CAO Renxuan, YANG Qiliang, et al. Detecting leaf disease for Panax notoginseng using an improved YOLOv3 algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(3): 164-172. (in Chinese with English abstract)
[25] LIU Tianzhen, TENG Guifa, YUAN Yingchun, et al. Winter jujube fruit recognition method based on improved YOLOv3 under natural scene[J]. Transactions of the Chinese Society for Agricultural Machinery (Transactions of the CSAM), 2021, 52(5): 17-25. (in Chinese with English abstract)
[26] ZHOU J, ZHANG Y, WANG J. RDE-YOLOv7: An improved model based on YOLOv7 for better performance in detecting dragon fruits[J]. Agronomy, 2023, 13: 1042.
[27] ZHOU J, ZHANG Y, WANG J. A dragon fruit picking detection method based on YOLOv7 and PSP-Ellipse[J]. Sensors, 2023, 23(8): 3803.
Multi-pose dragon fruit detection system for picking robots based on the optimal YOLOv7 model
WANG Jinpeng, ZHOU Jialiang, ZHANG Yueyue, HU Haoruo
(College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210000, China)
Abstract: Dragon fruit is one of the most popular fruits in Asia. Manual picking, a labor-intensive task, cannot fully meet the demands of large-scale production, and automated picking can be expected to greatly reduce labor intensity. The vision system is one of the most important parts of a picking robot, yet commonly used recognition methods do not consider the complex growth postures of dragon fruit, whose hard branches and varied postures make automatic picking difficult. It is therefore necessary to distinguish dragon fruits in different postures and then guide the robotic arm to approach the fruit along an appropriate path. In this study, a multi-pose detection method for dragon fruit was proposed for automatic picking using the optimally selected YOLOv7-tiny model. A total of 1 281 images of dragon fruit were taken in the field, including 450, 535, and 296 images under strong, weak, and artificial light conditions, respectively. The dataset was divided into 1 036 images for training, 116 for validation, and 129 for testing according to the three light levels; light conditions were the largest factor influencing detection performance. A series of experiments was conducted on this dataset. First, the detection performance of the seven models in the YOLOv7 series was compared, and the optimal model was recommended for different devices in terms of parameter count and detection performance. Second, the detection performance of the YOLOv7 series was compared with that of other object detection models. Finally, the YOLOv7-tiny model was deployed on a mobile device, and the depth camera was combined with the robotic arm for field picking. The results showed that YOLOv7-e6e achieved the highest precision in the series, 85.0%; YOLOv7x the highest recall, 85.4%; and YOLOv7 the highest mean average precision (mAP), 89.3%. YOLOv7-tiny had the fewest parameters (0.6×10?), the smallest weight file (12 MB), the fewest layers (255), and the shortest inference time (1.8 ms), making it the most suitable model for mobile devices owing to its fast inference. YOLOv7-tiny reached a detection precision of 83.6%, a recall of 79.9%, an mAP of 88.3%, and a classification accuracy of 80.4% for multi-pose dragon fruits. Its precision was 16.8, 4.3, and 4.8 percentage points higher, and its mAP 7.3, 21.0, and 3.9 percentage points higher, than those of YOLOv3-tiny, YOLOv4-tiny, and YOLOX-tiny, respectively. Its precision was 7.3, 4.2, 7.3, 6.5, 3.5, and 3.9 percentage points higher than those of YOLOv5s, YOLOXs, YOLOv4G, YOLOv4M, YOLOv5x, and YOLOXx, respectively, and its mAP was 8.2, 5.8, 4.0, and 42.4 percentage points higher than those of EfficientDet, SSD, Faster R-CNN, and CenterNet, respectively, indicating a high level of detection accuracy. A dragon fruit picking system was then built and verified in picking experiments with the trained YOLOv7-tiny model. The inference time of the vision system accounted for only 22.6% of the total picking time, and the picking success rate for front-view dragon fruits was 90%, demonstrating the performance of automatic picking. The conclusions can also provide technical support for fruit picking.
Keywords: deep learning; convolutional neural networks; picking robot; YOLOv7; object detection; dragon fruit
Received: 2022-08-03; Revised: 2023-04-01
Funding: Jiangsu Agricultural Science and Technology Innovation Fund (CX[22]3099); Emergency Science and Technology Project of the National Forestry and Grassland Administration (202202-3); Jiangsu Modern Agricultural Machinery Equipment and Technology Promotion Project (NJ2021-18); Jiangsu Provincial Key Research and Development Program (BE2021016-2)
WANG Jinpeng, PhD, associate professor; research interests: intelligent agricultural equipment and intelligent picking robots. Email: jpwang@njfu.edu.cn
doi: 10.11975/j.issn.1002-6819.202208031
CLC number: TP301.6; TP181; S24    Document code: A    Article ID: 1002-6819(2023)-08-0276-08
WANG Jinpeng, ZHOU Jialiang, ZHANG Yueyue, et al. Multi-pose dragon fruit detection system for picking robots based on the optimal YOLOv7 model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(8): 276-283. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.202208031 http://www.tcsae.org