Recognition of banana bunches and localization of the bottom fruit axis using improved YOLOv5
Duan Jieli1,2, Wang Zhaorui1, Zou Xiangjun1, Yuan Haotian1, Huang Guangsheng1, Yang Zhou1,2,3※
(1. College of Engineering, South China Agricultural University, Guangzhou 510642, China; 2. Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510600, China; 3. Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Jiaying University, Meizhou 514015, China)
To improve the operating efficiency and quality of banana-picking robots and to achieve accurate positioning of the robot's end catching mechanism, this study proposes a method that recognizes banana bunches with a YOLOv5-based algorithm and locates the fruit axis at the bottom of the bunch. A CA (Coordinate Attention) mechanism is fused into the backbone network, and the C3 (Concentrated-Comprehensive Convolution Block) feature-extraction module is combined with the CA module to form a C3CA module, thereby strengthening the extraction of banana-bunch feature information. The original CIoU (Complete Intersection over Union) loss function is replaced with the EIoU (Efficient Intersection over Union) loss, which speeds up model convergence and lowers the loss value. The locating point required for the tests is obtained by improving the regression formula of the predicted bounding box, and the point is transformed through the camera coordinate system to solve for its three-dimensional coordinates. A D435i depth camera was used in localization tests of the fruit axis at the bottom of the bunch. Recognition tests show that, compared with the YOLOv5 and Faster R-CNN models, the mean Average Precision (mAP) of the improved YOLOv5 model increased by 0.17 and 21.26 percentage points, respectively. Localization tests show that the mean error and mean error ratio of the improved YOLOv5 model for the bottom fruit axis are 0.063 m and 2.992%; compared with YOLOv5 and Faster R-CNN, the mean error and mean error ratio decreased by 0.022 m and 1.173 percentage points, and by 0.105 m and 5.054 percentage points, respectively. Real-time visualization of the tests shows that the improved model can rapidly recognize and locate banana bunches in the orchard environment and guarantee operation quality, laying a foundation for subsequent research on fruit-picking robots.
image recognition; robots; banana picking; fruit axis location; attention mechanism; loss function
中國(guó)是香蕉生產(chǎn)和消費(fèi)大國(guó),但是香蕉采摘效率低,損傷較大,是一項(xiàng)高耗勞動(dòng)力的活動(dòng)[1]。傳統(tǒng)的香蕉采收通常需要兩個(gè)青壯年手工操作,勞動(dòng)強(qiáng)度大,并隨著人口老齡化和勞動(dòng)力高成本化,勢(shì)必會(huì)降低采收效率[2]。為了提高香蕉的采收效率、減少損傷和應(yīng)對(duì)勞動(dòng)力短缺等問(wèn)題,本研究對(duì)蕉穗進(jìn)行識(shí)別定位,為其應(yīng)用于采摘機(jī)器人的托接系統(tǒng)做前期研究。近年來(lái),國(guó)內(nèi)外對(duì)于機(jī)器人的研究逐漸從工業(yè)領(lǐng)域拓展到了農(nóng)業(yè)領(lǐng)域[3],與此同時(shí)基于圖像處理和機(jī)器學(xué)習(xí)的目標(biāo)檢測(cè)方法也更廣泛地應(yīng)用于農(nóng)業(yè)作業(yè)。盧軍等[4]針對(duì)果實(shí)遮擋,在變化光照條件下提出了基于彩色信息和目標(biāo)輪廓整合的柑橘遮擋檢測(cè)方法。顧蘇杭等[5]考慮到光照等因素,提出一種基于顯著性輪廓的蘋(píng)果目標(biāo)識(shí)別方法應(yīng)用于蘋(píng)果采摘機(jī)器人,該方法能通過(guò)圖像分割以及圖像后處理完整地提取蘋(píng)果輪廓,對(duì)蘋(píng)果目標(biāo)識(shí)別率達(dá)到98%。Yamamoto等[6]針對(duì)傳統(tǒng)果實(shí)識(shí)別需要對(duì)其特征設(shè)置特定閾值帶來(lái)的泛化性低的問(wèn)題,提出了基于傳統(tǒng)RGB數(shù)碼相機(jī)結(jié)合機(jī)器學(xué)習(xí)的方法對(duì)番茄圖像特征生成的分類模型進(jìn)行圖像分割,達(dá)到理想的分割效果。
The traditional machine-learning methods above are affected by background information when detecting fruit targets in images with complex backgrounds[7], and they require hand-crafted features, which makes good detection results difficult to obtain. Deep-learning methods, by contrast, only require labeled datasets and can extract target features from the data without manually designed features[8]. Because convolutional neural networks embed multiple hidden layers and learn higher-level feature representations of the data, they have a great advantage in problems such as object detection[9]. Commonly used object-detection networks include the Fast R-CNN (Fast Region with CNN) series[10-12], SSD (Single Shot multibox Detector)[13-16], and the YOLO (You Only Look Once) series[17-20]. Compared with the other two, YOLO has a more compact model, the fastest runtime, and better real-time performance[21], so it is increasingly used for agricultural object detection. Lü et al.[22] proposed a lightweight network model based on an improved YOLOv3-LITE to achieve fast and accurate recognition of citrus fruit in natural environments. Zhao et al.[23] used a YOLOv3 convolutional neural network to traverse the whole image and regress target classes and positions, achieving end-to-end detection while maintaining efficiency and accuracy. Also based on an improved YOLOv3 model, Tian et al.[24] detected apples at different growth stages in orchards under fluctuating illumination, complex backgrounds, overlap, and occlusion by branches and leaves; the method adds densely connected networks (DenseNet), which effectively improves network performance and enables apple detection under overlap and occlusion. To overcome the same problems, Liu et al.[25] proposed YOLO-Tomato, an improved tomato-detection model based on YOLOv3 that uses circular bounding boxes to match tomato fruits more accurately and improves detection performance.
Object detection of spherical fruits is approaching maturity at home and abroad, but results on bananas remain scarce. Fu et al.[26] first proposed a banana-detection method for natural environments based on color and texture features; the authors subsequently proposed the YOLO-banana detection network based on YOLOv4[27-29], achieving fast multi-class detection of banana fruit in orchard environments. Wu et al.[30-32] studied banana robots, proposed multi-target banana feature recognition based on the YOLO family of algorithms, and visually located the inflorescence-axis cutting point on the banana fruit axis, meeting operational requirements in tests.
Building on the studies above, this research proposes a vision system, based on improved YOLOv5, for recognizing and locating banana bunches in the orchard environment, intended to be mounted on the catching mechanism of a banana-harvesting robot. The system recognizes banana bunches and locates the fruit axis at the bottom of the bunch, and its recognition and localization accuracy are verified.
The banana-bunch images used in this study were taken on March 17, 2021 at the banana orchard of the Guangdong Academy of Agricultural Sciences and on February 10, 2022 at a banana plantation in Raoping County, Chaozhou, Guangdong Province; both days were sunny. Images were captured with a high-resolution smartphone at a distance of 500-1 500 mm from the fruit, with an image resolution of 3 024×4 000 pixels. A total of 500 images of banana bunches in the orchard environment were collected and divided into training and test sets at a ratio of 4:1, i.e., 400 training images and 100 test images.
The 500 collected banana-bunch images were annotated with the labeling tool LabelImg, generating the XML annotation files used for the YOLO model; each file contains the coordinates of the banana bunch in the image, the image size, and the label name banana. The images and annotation files were then split into training and test sets and placed in the images and labels folders, respectively, forming the banana-bunch dataset used in this study.
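As an illustration of the 4:1 split and images/labels folder layout described above, the following minimal Python sketch performs the split; it assumes the LabelImg output has already been converted to YOLO-format txt labels, and the folder name raw_data is an assumption, not the authors' actual directory structure.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_images: str, src_labels: str, dst_root: str,
                  train_ratio: float = 0.8, seed: int = 0) -> None:
    """Copy image/label pairs into YOLOv5-style images/ and labels/ folders at a 4:1 ratio."""
    random.seed(seed)
    images = sorted(Path(src_images).glob("*.jpg"))
    random.shuffle(images)
    n_train = int(len(images) * train_ratio)
    splits = {"train": images[:n_train], "test": images[n_train:]}  # 400 / 100 for 500 images
    for split, files in splits.items():
        img_dir = Path(dst_root) / "images" / split
        lbl_dir = Path(dst_root) / "labels" / split
        img_dir.mkdir(parents=True, exist_ok=True)
        lbl_dir.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, img_dir / img.name)
            label = Path(src_labels) / (img.stem + ".txt")  # YOLO txt label with the same stem
            if label.exists():
                shutil.copy(label, lbl_dir / label.name)

if __name__ == "__main__":
    split_dataset("raw_data/images", "raw_data/labels", "banana_dataset")
```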
The YOLOv5 network consists mainly of a backbone, a neck, and a prediction head. From small to large, the YOLOv5 family comprises YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Considering the real-time requirement and efficiency of banana-bunch recognition in the orchard environment, this study improves the relatively lightweight YOLOv5s. As shown in Fig. 1, improvements are made in the following three respects:
1) A CA (Coordinate Attention) module is added after each C3 (Concentrated-Comprehensive Convolution Block) module of the backbone to form a C3CA module; the four C3CA modules reuse the useful feature information in the network and strengthen the extraction of banana-bunch features.
2) A CA module is added after the last C3CA module to strengthen the extraction of positional and channel information, which helps improve target localization.
3) The original CIoU (Complete Intersection over Union) loss function is replaced with the EIoU (Efficient Intersection over Union) loss, which computes the width and height differences of the bounding boxes separately, alleviates the uneven sample distribution, speeds up model convergence, and lowers the model loss.
Note: CBS denotes a convolution unit; the number after a C3CA or C3 module indicates the number of stacked modules; SPPF denotes the pooling operation; Concat denotes feature concatenation; Upsample denotes feature up-sampling; Conv2d denotes a two-dimensional convolution; the outputs 80×80×255, 40×40×255, and 20×20×255 are the length, width, and depth of the output feature maps.
2.1.1 CA attention module
由于果園環(huán)境下生長(zhǎng)的蕉穗背景復(fù)雜、形態(tài)多樣,這會(huì)對(duì)算法最終的識(shí)別效果造成一定影響,因此該研究在YOLOv5算法網(wǎng)絡(luò)中添加注意力機(jī)制,通過(guò)突出蕉穗的重要特征從而提升識(shí)別效果。最早用于計(jì)算機(jī)視覺(jué)的通道注意力機(jī)制對(duì)于提升網(wǎng)絡(luò)性能具有顯著效果,但通常會(huì)忽略非常重要的位置信息[33]。因此本研究采用將位置信息結(jié)合通道注意力的輕量級(jí)移動(dòng)網(wǎng)絡(luò)CA注意力模塊強(qiáng)化蕉穗特征信息并弱化背景信息。CA注意力模塊的實(shí)現(xiàn)分為全局信息嵌入和坐標(biāo)注意力生成兩個(gè)部分。
Note: C, H, and W denote the depth (number of channels), height, and width of the feature map; r is the reduction ratio; n is the number of corresponding modules; X Avg Pool and Y Avg Pool denote global average pooling along the horizontal and vertical directions, respectively.
Equation (1) is the global pooling operation used for global information embedding in channel attention. Given an input feature tensor X = [x1, x2, x3, ..., xC] ∈ R^(C×H×W), the output z_c of the global pooling associated with the c-th channel (c ∈ {1, 2, ..., C}) is given below.
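The equation bodies themselves did not survive extraction. In the coordinate-attention formulation of Hou et al.[33], which the surrounding description follows, the global pooling of Eq. (1) and the two direction-wise poolings that the next paragraph refers to as Eqs. (2) and (3) presumably read

$$z_c=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}x_c(i,j)$$

$$z_c^{h}(h)=\frac{1}{W}\sum_{0\le i<W}x_c(h,i),\qquad z_c^{w}(w)=\frac{1}{H}\sum_{0\le j<H}x_c(j,w)$$

where $z_c^{h}$ and $z_c^{w}$ are the height- and width-direction descriptors of channel $c$.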
The two transformations above aggregate features along the two spatial directions separately; their outputs allow the attention module to capture dependencies along one spatial direction while preserving precise positional information along the other, which makes the network locate targets more accurately.
To make full use of the global information obtained above and generate attention from the features it expresses, the results of Eqs. (2) and (3) are concatenated and then transformed by a convolution to form the intermediate feature map f.
where σ is the sigmoid activation function. Finally, g^h and g^w are expanded and used as attention weights, and the output Y = [y1, y2, y3, ..., yC] of the attention module is obtained by re-weighting the input features with the two directional attention maps.
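Since Eqs. (4)-(6) also did not survive extraction, the following PyTorch sketch of a coordinate-attention block, written from the two stages described above (and from Hou et al.[33]), shows how the directional pooling and the attention generation fit together; appending such a block after a C3 module gives the C3CA combination used in the improved backbone. Class and variable names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Minimal sketch of a Coordinate Attention block (after Hou et al. [33]).

    Stage 1 (global information embedding): pool the input along H and W separately.
    Stage 2 (coordinate attention generation): share a 1x1 convolution over the concatenated
    descriptors, split them, and apply sigmoid gates along each direction.
    `reduction` corresponds to the ratio r in Fig. 2.
    """

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # X Avg Pool: keep H, squeeze W
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # Y Avg Pool: keep W, squeeze H
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                          # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)      # (n, c, w, 1)
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(y_h))                      # attention along the height
        g_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # attention along the width
        return x * g_h * g_w                          # re-weight the input features
```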
2.1.2 Loss function
The CIoU (Complete Intersection over Union) loss used by the original YOLOv5 model speeds up bounding-box regression to some extent, but its relative aspect-ratio penalty term does not work when the width-height ratios of the predicted box and the ground-truth box are linearly proportional, and the predicted width and height cannot increase or decrease simultaneously. To solve these problems, the EIoU (Efficient Intersection over Union) loss[34] is adopted as the loss function of the improved model; it computes the width and height differences of the boxes separately, removing the ambiguity of the aspect-ratio description in CIoU. The EIoU loss L_EIoU is given below.
where L_IoU, L_dis, and L_asp denote the IoU loss, the distance loss, and the aspect loss, respectively; ρ denotes the Euclidean distance between center points; b and b^gt denote the center points of the predicted box and the ground-truth box; c denotes the diagonal length of the smallest rectangle enclosing the predicted and ground-truth boxes; w and h denote the width and height of the predicted box; w^gt and h^gt denote the width and height of the ground-truth box; and c_w and c_h denote the width and height of the smallest enclosing rectangle.
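The loss expression itself was lost in extraction; the EIoU loss defined in [34], which the symbol definitions above describe, is

$$L_{\mathrm{EIoU}}=L_{\mathrm{IoU}}+L_{\mathrm{dis}}+L_{\mathrm{asp}}=1-\mathrm{IoU}+\frac{\rho^{2}(\mathbf{b},\mathbf{b}^{gt})}{c^{2}}+\frac{\rho^{2}(w,w^{gt})}{c_{w}^{2}}+\frac{\rho^{2}(h,h^{gt})}{c_{h}^{2}}$$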
The banana-bunch detection network was built with the PyTorch framework on a machine with an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30 GHz processor, 16 GB of RAM, and a GeForce RTX 3060 Laptop GPU. The batch size was set to 4, the number of training epochs to 300, images were normalized to a resolution of 640×640, the learning rate was 0.01, and the optimizer was stochastic gradient descent (SGD). Precision, recall, mean Average Precision (mAP), and model size were used as evaluation metrics, and a detection with IoU (Intersection over Union) ≥ 0.5 was counted as a correct banana-bunch detection.
The tests used an Intel RealSense D435i stereo depth camera with a depth-image resolution of 1 280×720 pixels, a color-image resolution of 848×480 pixels, a depth range of 0.2-10.0 m, and USB power. The camera intrinsic parameters were obtained with the Intel RealSense Viewer software supplied with the camera, as shown in Table 1.
Table 1 Intrinsic parameters of the RealSense D435i camera
In the YOLOv5 detection network, the center point (b_x, b_y) and the width and height (b_w, b_h) of the bounding box are predicted by the following regression formulas.
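The regression formulas themselves were lost in extraction; the standard YOLOv5 decoding, to which the symbol definitions below correspond, is (with σ the sigmoid function)

$$b_x=2\sigma(t_x)-0.5+c_x,\quad b_y=2\sigma(t_y)-0.5+c_y,\quad b_w=p_w\left(2\sigma(t_w)\right)^{2},\quad b_h=p_h\left(2\sigma(t_h)\right)^{2}$$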
where t_x and t_y are the predicted offsets of the target center; t_w and t_h are the predicted scaling factors of the width and height; (c_x, c_y) are the coordinates of the top-left corner of the corresponding grid cell; and p_w and p_h are the width and height of the anchor template mapped onto the feature layer. From b_x, b_y, b_w, and b_h, the xywh2xyxy( ) function uniquely determines the top-left and bottom-right corner coordinates (x_1, y_1) and (x_2, y_2) of the predicted box. Finally, the locating point (u_1, v_1) is obtained from the geometric relationship as follows.
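The expression for the locating point was likewise lost; from the geometric description (the center of the bottom edge of the predicted box), it presumably is

$$u_1=\frac{x_1+x_2}{2}=b_x,\qquad v_1=y_2=b_y+\frac{b_h}{2}$$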
This gives the center point of the bottom edge of the detection box; because this point essentially coincides with the fruit axis at the bottom of the banana bunch, it is taken as the locating point (u_1, v_1) of the bottom fruit axis in the pixel coordinate system, as shown in Fig. 3.
As shown in Fig. 4, to determine the position of the locating point in the world coordinate system, the transformation from that point to the camera coordinate system {x_c, y_c, z_c} must first be obtained, followed by the transformations from the camera coordinate system to the image coordinate system {x, y} and from the image coordinate system to the pixel coordinate system {u, v}. The image coordinate system {x, y} is parallel to the plane formed by x_c and y_c in the camera coordinate system, and the origin of the pixel coordinate system {u, v} lies at the top-left corner of the image. The locating point is transformed into the point P_c in the camera coordinate system by a rigid-body transformation matrix. From the similar-triangles principle of the projection ray, the transformation between {x_c, y_c, z_c} and {x, y} is derived as follows.
Note: The pixel coordinate system of the image is {u, v}; c_x and c_y are the offsets of the top-left corner of the corresponding grid cell along each direction; p_w and p_h are the width and height of the anchor box; (b_x, b_y) is the center point of the predicted bounding box; b_w and b_h are the width and height of the predicted bounding box; (x_1, y_1) and (x_2, y_2) are the two pairs of corner coordinates of the predicted box; (u_1, v_1) is the locating point of this study.
Fig.3 Regression of the target bounding box and locating point
Note: {x_c, y_c, z_c} is the camera coordinate system with origin O_c; {x, y} is the image coordinate system with origin O_i; {u, v} is the pixel coordinate system with origin O_p; P_c is the locating point in the camera coordinate system; P is the intersection of the projection ray O_cP_c with the image plane; f is the focal length, mm.
Because the image coordinate system and the pixel coordinate system are related by a translation, the two are further linked by the following transformation.
Combining the above relations gives the transformation formula between the camera coordinate system and the pixel coordinate system.
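The projection equations themselves did not survive extraction. Under the pinhole model described above, they take the standard form (the symbols here are the conventional ones and are assumptions: f_x and f_y are the focal lengths in pixels and (u_0, v_0) is the principal point, both taken from the intrinsics in Table 1)

$$z_c\begin{bmatrix}u\\ v\\ 1\end{bmatrix}=\begin{bmatrix}f_x&0&u_0\\ 0&f_y&v_0\\ 0&0&1\end{bmatrix}\begin{bmatrix}x_c\\ y_c\\ z_c\end{bmatrix}$$

so that, given the measured depth $z_c$, the camera-frame coordinates of the locating point $(u_1, v_1)$ follow as $x_c=(u_1-u_0)\,z_c/f_x$ and $y_c=(v_1-v_0)\,z_c/f_y$.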
As shown in Fig. 5, the left and right infrared cameras each acquire an infrared image (P1 and P2) of the same banana bunch; the depth image P_d is then obtained by triangulation and registered to the color image P_c. With the color pixel coordinate system {u, v} and the depth pixel coordinate system {u_d, v_d}, after registration every pixel (u, v) detected in each color frame corresponds to a depth pixel (u_d, v_d). Finally, Eq. (14) and the get_distance( ) function of the pyrealsense2 library give the three-dimensional coordinates (x_c, y_c, z_c) of the locating point in the camera coordinate system.
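A minimal Python sketch of this acquisition step is given below, assuming the stream resolutions stated above and a placeholder pixel for the locating point (in practice the detector would supply (u_1, v_1) as the bottom-center of the predicted box); it is an illustration, not the authors' code.

```python
import pyrealsense2 as rs

# Stream color + depth from a D435i, align depth to color, and read the
# 3-D camera-frame coordinates of a pixel (u1, v1).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 848, 480, rs.format.bgr8, 30)
pipeline.start(config)

align = rs.align(rs.stream.color)            # register the depth frame to the color frame

try:
    frames = pipeline.wait_for_frames()
    aligned = align.process(frames)
    depth_frame = aligned.get_depth_frame()
    color_frame = aligned.get_color_frame()

    u1, v1 = 424, 400                        # placeholder pixel, for illustration only

    z_c = depth_frame.get_distance(u1, v1)   # depth in metres at that pixel
    intrin = color_frame.profile.as_video_stream_profile().get_intrinsics()
    x_c, y_c, z_c = rs.rs2_deproject_pixel_to_point(intrin, [u1, v1], z_c)
    print(f"locating point in camera frame: ({x_c:.3f}, {y_c:.3f}, {z_c:.3f}) m")
finally:
    pipeline.stop()
```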
Note: P1 and P2 are the infrared images acquired by the left and right infrared cameras, respectively; P_d and P_c are the depth image and the color image, respectively; {u, v} is the color pixel coordinate system; {u_d, v_d} is the depth pixel coordinate system.
Fig.5 Camera registration diagram
In the localization test, the trained model detects the banana bunch, the center of the bottom edge of the detection box is taken as the locating point, and its three-dimensional coordinates are solved. Because substituting the measured camera-to-axis distance z_c into Eq. (14) yields the other two coordinates x_c and y_c of the axis, the error source lies in the measured value of z_c. Many factors affect the localization error, chiefly image locating-point errors caused by environmental noise, random errors, and measurement-tool errors. To verify the reliability of the recognition and localization in this study, the accuracy of the distance z_c is evaluated, with a laser rangefinder used as the auxiliary reference device.
As shown in Fig. 6, to simulate real-time localization of banana bunches in the orchard environment, the laser rangefinder was mounted on top of the depth camera. The measured height difference between the rangefinder and the depth camera is far smaller than the distance to the locating point and is therefore ignored. The device was connected to a laptop; a program developed in Python drove the depth camera to obtain the distance measurement z_c, while the laser rangefinder simultaneously recorded the measurement z_dc, and the two were saved in pairs. To evaluate the localization accuracy, the mean error E_z and the mean error ratio E_r are used as metrics: E_z reflects the absolute error between the estimated and true values, and E_r the relative error. They are computed as follows.
where n is the number of banana bunches successfully recognized and located in the same image.
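The two formulas were lost in extraction; from the definitions above (the absolute and relative errors of the depth-camera measurement against the laser-rangefinder reference, averaged over the n located bunches), they presumably take the form

$$E_z=\frac{1}{n}\sum_{i=1}^{n}\left|z_{ci}-z_{dci}\right|,\qquad E_r=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|z_{ci}-z_{dci}\right|}{z_{dci}}\times 100\%$$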
Note: z_ci is the depth-camera measurement, m; z_dci is the laser-rangefinder measurement, m.
To enrich the dataset, improve the robustness and generalization of the network, and prevent overfitting caused by an insufficient number of samples, the banana-bunch images were augmented during training by translation, scaling, left-right flipping, mixup, and HSV adjustment, expanding the training set to 844 images. With the initial parameters (batch size, epochs, optimizer, and learning rate) and hardware kept identical, the YOLOv5, Faster R-CNN, and improved YOLOv5 models were trained. Table 2 lists the training results. Compared with the original YOLOv5 model, the precision of the improved model increased by 2.8 percentage points, the recall reached 100%, and the mAP increased by 0.17 percentage points; compared with the Faster R-CNN model, precision, recall, and mAP increased by 52.96, 17.91, and 21.26 percentage points, respectively; the improved YOLOv5 model is 1.06 MB smaller than the original. The results show that, after fusing the CA attention mechanism and the C3CA feature-extraction module, the improved model has the highest mAP and the smallest memory footprint; it strengthens feature extraction for banana bunches in the orchard environment and improves recognition while maintaining the mAP.
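In YOLOv5 these augmentations are switched on through the hyperparameter file; a fragment with the keys that correspond to the operations listed above might look like the sketch below (the numeric values are placeholders for illustration, not the settings used in this study).

```python
# Illustrative YOLOv5-style augmentation hyperparameters (values are placeholders;
# only the keys correspond to the operations named in the text).
augment_hyp = {
    "translate": 0.1,    # image translation (+/- fraction)
    "scale":     0.5,    # image scale (+/- gain)
    "fliplr":    0.5,    # probability of left-right flip
    "mixup":     0.1,    # probability of mixup
    "hsv_h":     0.015,  # HSV hue augmentation (fraction)
    "hsv_s":     0.7,    # HSV saturation augmentation (fraction)
    "hsv_v":     0.4,    # HSV value augmentation (fraction)
}
```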
Table 2 Training results of the models
The improved YOLOv5 model fuses the CA attention module, the C3CA feature-extraction module, and the EIoU loss function into the YOLOv5 baseline. To analyze the contribution of each improvement more intuitively, five groups of experiments were conducted and their mAP values compared; the results are shown in Table 3.
Experiment 1 is the original YOLOv5 model; without CA, C3CA, or EIoU its mAP is 99.12%. Compared with the original model, experiments 2 and 3 add only the CA module and only the C3CA module, raising the mAP by 0.03 and 0.02 percentage points, respectively. Experiment 4 is the improved YOLOv5 model with both CA and C3CA added, raising the mAP by 0.14 percentage points. Experiment 5 is the improved model of this study, with an mAP of 99.29%, an increase of 0.17 percentage points.
Table 3 Results of the model ablation experiments
Note: “√” indicates that the method is used.
Fig. 7 shows the bounding-box regression loss and the objectness loss of the original YOLOv5 model (experiment 1) and the improved YOLOv5 model (experiment 5) after training. The curves level off after 300 epochs and the models converge, and the improved model reaches lower loss values than the original, indicating better training.
Fig.7 Training loss functions
To improve operating efficiency and determine the best localization range, a preliminary test was first carried out at the banana orchard of the Zhaoqing Agricultural Science Institute, Guangdong Province, and the measured distance range was limited to 1.0-2.5 m.
The formal test was carried out at the Dongguan Fruit and Vegetable Research Institute, Guangdong Province. Random real-time localization tests of the fruit axis at the bottom of banana bunches were conducted to simulate the orchard environment: the YOLOv5, Faster R-CNN, and improved YOLOv5 models were used to recognize and measure the straight-line distance to single and double banana bunches at distances of 1.0-2.5 m, with 10 localization trials per model. Fig. 8 shows visualization images from some of the trials of the three models; within the localization range, all three models recognized the banana bunches in the field of view and obtained distance measurements. In terms of confidence, the improved YOLOv5 model reached 90%, so detection confidence was assured. The depth-camera measurement z_ci and the laser-rangefinder measurement z_dci were recorded, and the mean error E_z and mean error ratio E_r were calculated and averaged; the results are shown in Table 4.
Note: FPS is the camera frame rate; "banana" on the detection box is the class name and the number after it is the confidence; the values at the bottom of the detection box are the three-dimensional coordinates (x_c, y_c, z_c) of the locating point in the camera coordinate system.
Table 4 Results of the localization tests of the fruit axis at the bottom of banana bunches
Note: z_ci and z_dci are the depth-camera measurement and the laser-rangefinder measurement, respectively, m; E_z and E_r are the mean error (m) and the mean error ratio (%), respectively.
From the test results in Table 4, the mean errors E_z of the YOLOv5, Faster R-CNN, and improved YOLOv5 models are 0.085, 0.168, and 0.063 m, and the mean error ratios E_r are 4.165%, 8.046%, and 2.992%, respectively. Compared with the Faster R-CNN model, the improved YOLOv5 model reduces E_z and E_r by 0.105 m and 5.054 percentage points; compared with the original YOLOv5 model, by 0.022 m and 1.173 percentage points. In addition, a measured error greater than 0.2 m in the test was counted as a localization failure. The original YOLOv5 model failed in trials 3 and 4, the Faster R-CNN model in trials 1, 4, and 8, and the improved YOLOv5 model only in trial 6, a comparatively lower failure rate. In summary, the improved YOLOv5 model meets the test objective for locating the fruit axis at the bottom of the banana bunch and outperforms both the original YOLOv5 and Faster R-CNN models.
To achieve rapid recognition and localization of banana bunches in the orchard environment, this study added a CA attention module and a C3CA feature-extraction module to the YOLOv5 model to strengthen the extraction of banana-bunch feature information, replaced the original CIoU loss function with the EIoU loss, regressed the locating point of the bottom fruit axis in the detection stage, and obtained the three-dimensional coordinates of the locating point in the camera coordinate system with a depth camera.
1) Compared with the YOLOv5 model, the improved YOLOv5 model raises the recognition precision for banana bunches by 2.8 percentage points and the mAP by 0.17 percentage points, while the model size is reduced by 1.06 MB. The improved model gains in both model size and mAP, reduces memory consumption during training, is easier to migrate and deploy, and strengthens real-time recognition outdoors.
2) In the localization tests, the mean error E_z and mean error ratio E_r of the improved YOLOv5 model are 0.063 m and 2.992%, which are 0.022 m and 1.173 percentage points lower than those of the YOLOv5 model, and 0.105 m and 5.054 percentage points lower than those of the Faster R-CNN model. The improved model has the lowest localization error, further reducing the error in locating the fruit axis at the bottom of the banana bunch.
The improved YOLOv5 model migrates well to orchard environments for rapid recognition and localization, and can meet the requirement of the catching mechanism of a banana-picking robot for locating the fruit axis at the bottom of the banana bunch in the orchard.
[1] Duan Jieli, Lu Huazhong, Wang Weizu, et al. Present situation and development of the fruit harvesting machinery[J]. Guangdong Agricultural Sciences, 2012, 39(16): 189-192. (in Chinese with English abstract)
[2] Duan Jieli, Wang Zhaorui, Ye Lei, et al. Research progress and development trend of motion planning of fruit picking robot arm[J]. Journal of Intelligent Agricultural Mechanization, 2021, 2(2): 7-17. (in English with Chinese abstract)
[3] Liu Y, Ma X, Shu L, et al. From industry 4.0 to agriculture 4.0: Current status, enabling technologies, and research challenges[J]. IEEE Transactions on Industrial Informatics, 2020, 17(6): 4322-4334.
[4] Lu Jun, Sang Nong. Detection of citrus fruits within tree canopy and recovery of occlusion contour under variable illumination[J]. Transactions of the Chinese Society of Agricultural Machinery, 2014, 45(4): 76-81. (in Chinese with English abstract)
[5] Gu Suhang, Ma Zhenghua, Lü Jidong. Recognition method of apple target based on significant contour[J]. Application Research of Computers, 2017, 34(8): 2551-2556. (in Chinese with English abstract)
[6] Yamamoto K, Guo W, Yoshioka Y, et al. On plant detection of intact tomato fruits using image analysis and machine learning methods[J]. Sensors, 2014, 14(7): 12191-12206.
[7] Kang D, Benipal Sukhpreet S, Gopal Dharshan L, et al. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning[J]. Automation in Construction, 2020, 118: 103291.
[8] Hu Gensheng, Wu Jitian, Bao Wenxia, et al. Detection of Ectropis obliqua in complex background images using improved YOLOv5[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(21): 191-198. (in Chinese with English abstract)
[9] LeCun Yann, Bengio Yoshua, Hinton Geoffrey. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
[10] Girshick R. Fast R-CNN[C]//Santiago: Proceedings of the IEEE International Conference on Computer Vision, 2015: 1440-1448.
[11] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. Advances in Neural Information Processing Systems, 2015, 28: 91-99.
[12] Li J, Liang X, Shen S, et al. Scale-aware fast R-CNN for pedestrian detection[J]. IEEE Transactions on Multimedia, 2017, 20(4): 985-996.
[13] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector[C]//Cham: European Conference on Computer Vision. Springer. 2016: 21-37.
[14] Womg A, Shafiee Mohammad J, Li F, et al. Tiny SSD: A tiny single-shot detection deep convolutional neural network for real-time embedded object detection[C]//Toronto: 2018 15th Conference on Computer and Robot Vision (CRV), 2018: 95-101.
[15] Wang X, Hua X, Xiao F, et al. Multi-object detection in traffic scenes based on improved SSD[J]. Electronics, 2018, 7(11): 302.
[16] Zhai S, Shang D, Wang S, et al. DF-SSD: An improved SSD object detection algorithm based on DenseNet and feature fusion[J]. IEEE Access, 2020, 8: 24344-24357.
[17] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection[C]//Las Vegas: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779-788.
[18] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//Honolulu: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2017: 6517-6525.
[19] Redmon J, Farhadi A. YOLOv3: An incremental improvement[EB/OL]. (2018-04-08) [2022-08-12] https://arxiv.org/abs/1804.02767.
[20] Alexey B, Wang C, Liao H. YOLOv4: Optimal speed and accuracy of object detection[EB/OL]. (2020-04-23) [2022-08-12] https://arxiv.org/abs/2004.10934.
[21] Fang W, Wang L, Ren P. Tinier-YOLO: A real-time object detection method for constrained environments[J]. IEEE Access, 2019, 8: 1935-1944.
[22] Lü Shilei, Lu Sihua, Li Zhen, et al. Orange recognition method using improved YOLOv3-LITE lightweight neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(17): 205-214. (in Chinese with English abstract)
[23] Zhao De'an, Wu Rendi, Liu Xiaoyang, et al. Apple positioning based on YOLO deep convolutional neural network for picking robot in complex background[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(3): 164-173. (in Chinese with English abstract)
[24] Tian Y, Yang G, Wang Z, et al. Apple detection during different growth stages in orchards using the improved YOLO-V3 model[J]. Computers and Electronics in Agriculture, 2019, 157: 417-426.
[25] Liu G, Nouaze J C, Touko Mbouembe P L, et al. YOLO-Tomato: A robust algorithm for tomato detection based on YOLOv3[J]. Sensors, 2020, 20(7): 2145.
[26] Fu L, Duan J, Zou X, et al. Banana detection based on color and texture features in the natural environment[J]. Computers and Electronics in Agriculture, 2019, 167: 105057.
[27] Fu L, Duan J, Zou X, et al. Fast and accurate detection of banana fruits in complex background orchards[J]. IEEE Access, 2020, 8: 196835-196846.
[28] Fu L, Yang Z, Wu F, et al. YOLO-Banana: A lightweight neural network for rapid detection of banana bunches and stalks in the natural environment[J]. Agronomy, 2022, 12(2), 391.
[29] Fu L, Wu F, Zou X, et al. Fast detection of banana bunches and stalks in the natural environment based on deep learning[J]. Computers and Electronics in Agriculture, 2022, 194: 106800.
[30] Wu F, Duan J, Chen S, et al. Multi-target recognition of bananas and automatic positioning for the inflorescence axis cutting point[J/OL]. Frontiers in Plant Science, (2021-11-02)[2022-08-12] https://doi.org/10.3389/fpls.2021.705021.
[31] Wu Fengyun, Ye Yaxin, Chen Siyu, et al. Research on fast recognition of banana multi-target features by visual robot in complex environment[J]. Journal of South China Agricultural University, 2022, 43(2): 96-104. (in Chinese with English abstract)
[32] Wu F, Duan J, Ai P, et al. Rachis detection and three-dimensional localization of cut off point for vision-based banana robot[J]. Computers and Electronics in Agriculture, 2022, 198: 107079.
[33] Hou Q, Zhou D, Feng J. Coordinate attention for efficient mobile network design[EB/OL]. (2021-03-04) [2022-08-12] https://arxiv.org/abs/2103.02907.
[34] Zhang Y, Ren W, Zhang Z, et al. Focal and efficient IOU loss for accurate bounding box regression[EB/OL]. (2022-07-16) [2022-08-12] https://arxiv.org/abs/2101.08158.
Recognition of bananas to locate bottom fruit axis using improved YOLOv5
Duan Jieli1,2, Wang Zhaorui1, Zou Xiangjun1, Yuan Haotian1, Huang Guangsheng1, Yang Zhou1,2,3※
(1. College of Engineering, South China Agricultural University, Guangzhou 510642, China; 2. Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510600, China; 3. Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Jiaying University, Meizhou 514015, China)
Banana is one of the major fruits produced and consumed in China, but banana harvesting is a labor-intensive activity with low efficiency and considerable fruit damage. This study aims to improve the operating efficiency and quality of banana-picking robots. An accurate and rapid method was proposed to recognize banana bunches and locate the fruit axis at the bottom of the bunch using the YOLOv5 algorithm. Specifically, a Coordinate Attention (CA) mechanism was fused into the backbone network. The Concentrated-Comprehensive Convolution Block (C3) feature extraction module was fused with the CA attention module to form the C3CA module, in order to enhance the extraction of banana feature information. The original Complete Intersection over Union (CIoU) loss function was replaced with the Efficient Intersection over Union (EIoU) loss. As such, the convergence of the model was sped up and the loss value reduced. After that, the locating point required for the tests was determined by improving the regression formula of the predicted bounding box. The camera coordinate system of the point was transformed to obtain its three-dimensional coordinates. A D435i depth camera was then used to locate the fruit axis at the bottom of the banana bunch. The original YOLOv5, Faster R-CNN, and improved YOLOv5 models were trained to verify the model. The precision of the improved model increased by 2.8 percentage points, the recall rate reached 100%, and the mean average precision increased by 0.17 percentage points, compared with the original. Precision, recall, and mean average precision were 52.96, 17.91, and 21.26 percentage points higher, respectively, compared with the Faster R-CNN model. The size of the improved model was reduced by 1.06 MB, compared with the original. The field test was conducted on July 1, 2022 at the Dongguan Fruit and Vegetable Research Institute, Guangdong Province, China. A random real-time localization test of the fruit axis at the bottom of the banana bunch was carried out in the field environment. The original YOLOv5, Faster R-CNN, and improved YOLOv5 models were used to recognize and localize single and double plants in the range of 1.0-2.5 m. Each model was tested 10 times. The estimated and real values were recorded to calculate the mean error and the mean error ratio. Within the localization range, all three models recognized the banana bunches in the field of view and obtained estimated values. Their mean errors were 0.085, 0.168, and 0.063 m, and their mean error ratios were 4.165%, 8.046%, and 2.992%, respectively. The mean error and mean error ratio of the improved model were reduced by 0.105 m and 5.054 percentage points, respectively, compared with the Faster R-CNN model, and by 0.022 m and 1.173 percentage points, respectively, compared with the original YOLOv5. In addition, a measurement error greater than 0.2 m in the test was counted as a locating error. The improved YOLOv5 model showed a locating error only in test 6, a lower error rate; the original model showed locating errors in tests 3 and 4, while the Faster R-CNN model showed locating errors in tests 1, 4, and 8.
With its low localization error and high accuracy, the improved YOLOv5 model is well suited to migration and rapid recognition of bananas in complex environments. The vision module of the banana-picking robot can therefore meet the requirement of the catching mechanism for locating the fruit axis at the bottom of the banana bunch in the field environment.
image recognition; robots; banana picking; fruit axis location; attention mechanism; loss function
doi: 10.11975/j.issn.1002-6819.2022.19.014
CLC number: S225.93    Document code: A    Article ID: 1002-6819(2022)-19-0122-09
Duan Jieli, Wang Zhaorui, Zou Xiangjun, et al. Recognition of bananas to locate bottom fruit axis using improved YOLOv5[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2022, 38(19): 122-130. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2022.19.014 http://www.tcsae.org
Received: 2022-08-12
Revised: 2022-09-29
Supported by the Research Project of the Guangdong Laboratory for Lingnan Modern Agriculture (NT2021009), the National Key Research and Development Program of China (2020YFD1000104), the earmarked fund for the Construction of the Modern Agricultural Industry Technology System from the Ministry of Finance and the Ministry of Agriculture and Rural Affairs (CARS-31-10), and the Special Fund for the Construction of Innovation Teams of the Guangdong Modern Agricultural Industry Technology System (2022KJ109)
Duan Jieli, Ph.D., doctoral supervisor; research interest: intelligent agricultural equipment. Email: duanjieli@scau.edu.cn
Yang Zhou, Ph.D., doctoral supervisor; research interest: mechanization and informatization of fruit production. Email: yangzhou@scau.edu.cn