Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks
Fu Longsheng1,2, Feng Yali1, Elkamil Tola3, Liu Zhihao1, Li Rui1, Cui Yongjie1,2
(1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Yangling 712100, China; 3. Precision Agriculture Research Chair, King Saud University, Riyadh 11451, Saudi Arabia)
To recognize multi-cluster kiwifruit quickly and accurately under field conditions, and in line with the overhead-pergola training system used for kiwifruit, images were captured vertically upward from below the canopy, and a deep-learning recognition method for multi-cluster kiwifruit images based on the LeNet convolutional neural network was proposed. The constructed network was optimized by applying batch normalization, using ReLU as the activation function, max-pooling as the sub-sampling method, and a Softmax regression classifier. Tests on 100 field images of multi-cluster kiwifruit showed that the recognition rates for occluded, overlapped, adjacent and separated fruits were 78.97%, 83.11%, 91.01% and 94.78%, respectively. In a comparison with 5 existing algorithms, the proposed algorithm improved the recognition rate by 5.73 percentage points over the method tested under the same imaging conditions, and recognized a fruit in 0.27 s on average, the fastest among the compared algorithms. The results demonstrate that the proposed algorithm recognizes kiwifruit in field images with high accuracy and in real time, and that convolutional neural networks have good application prospects for fruit recognition in the field.
image processing; image recognition; algorithms; deep learning; convolutional neural network; kiwifruit
China has the largest kiwifruit cultivation area in the world [1], and the fruit is mostly harvested by hand [2]. Against the background of rural labour migrating to cities, developing automated kiwifruit harvesting, and in particular kiwifruit harvesting robots, is of great significance [3-7]. Fast and effective fruit recognition is one of the primary and key technologies for a kiwifruit harvesting robot [7]. In natural scenes, the colour of kiwifruit is close to that of complex background elements such as withered leaves, branches and fruit stalks [8], and the fruits grow in clusters with many overlapped and occluded fruits. Learning the features of kiwifruit in the field environment and recognizing the fruit accordingly is therefore a key problem that kiwifruit harvesting robots urgently need to solve [9].
In recent years, researchers have studied the recognition of kiwifruit in natural environments in depth. The work falls into two categories: recognition from images taken obliquely from the side of the fruit, and recognition from images taken vertically upward from below the fruit. For images taken from the side, Ding Yalan et al. [10] used colour factors and a fixed threshold to segment kiwifruit images, but could not effectively recognize fruits in strongly reflective or shadowed regions; Cui Yongjie et al. [11] segmented kiwifruit images in a single channel of a colour space and fitted the contours of individual fruits with an elliptical Hough transform, but the segmentation of kiwifruit against field backgrounds was unsatisfactory; Cui Yongjie et al. [8] compared different colour spaces, proposed a colour feature with a weighting coefficient of 0.9 combined with an elliptical Hough transform for fruit recognition, but the method targeted only specific fruit types, which limits its applicability in practice; Zhan Wentian et al. [5] built a kiwifruit classifier based on the Adaboost algorithm by introducing RGB, HSI and other colour models, but the recognition speed needs improvement; Mu Junying et al. [12] segmented the image in a single colour channel with the Otsu algorithm and recognized fruits by applying an upright-ellipse Hough transform to edges extracted with the Canny operator, but distant fruits were not recognized well. For images taken vertically upward, Scarfe et al. [13] removed the background with a fixed threshold, extracted Sobel edges and recognized kiwifruit by template matching, but did not use the shape information of the fruit; Fu et al. [14] proposed a colour feature with a weighting coefficient of 1.1 for segmenting kiwifruit images at night and recognized each fruit by combining the minimum bounding rectangle with an elliptical Hough transform, but the method could only handle a single cluster; Fu Longsheng et al. [9] exploited the fact that the calyx of every fruit is visible and distinguishable from the fruit when imaged vertically upward, and recognized kiwifruit at night based on the calyx, but occluded and overlapped fruits were not considered and the performance on multiple clusters was poor. Kiwifruit images in the field show diverse features, complex backgrounds and large morphological variations. Existing recognition methods rely mainly on experience, are affected by the samples and by human subjectivity, lack generality and robustness, can hardly recognize all types of kiwifruit with a single method, and cannot recognize multiple clusters simultaneously, so they fail to meet the application requirements of complex field environments.
Compared with conventional methods, the recently popular convolutional neural network (CNN) [15] learns features and their representations directly from the data themselves and has a very strong capability for representing image data. CNNs have achieved good results in handwritten character recognition [16-18], face recognition [19-21], activity recognition [22-23] and crop recognition [24-25]. Researchers have also begun to apply CNNs to fruit recognition: Wang Qiancheng [26] applied a CNN to a pre-processed data set of 6 kinds of fruit images and demonstrated the effectiveness of CNNs for fruit image recognition; Sa et al. [27] built a deep fruit-detection network based on a CNN model and obtained good results on images of different fruits. These studies provide references and evidence of feasibility for applying CNNs to fruit recognition, and also show that CNNs can overcome the shortcomings of traditional image-recognition methods.
In this paper, on the basis of a large number of sample images collected in the field, a CNN is used to recognize kiwifruit against complex backgrounds so that human subjective factors do not affect the recognition results. According to the characteristics of kiwifruit images in the field, the structure and parameters of the LeNet convolutional neural network are optimized to build a CNN-based recognition model for kiwifruit images in the field, so that multi-cluster kiwifruit can be recognized quickly and effectively in the complex field environment.
The test images were collected from October to November 2016 at the Kiwifruit Experiment Station in Meixian, Shaanxi Province (34°07'39''N, 107°59'50''E, altitude 648 m). A digital camera (Canon EOS 40D) was mounted on a tripod about 100 cm below the fruit and pointed vertically upward to photograph the 'Hayward' cultivar. A total of 700 original images were collected, 350 each in the morning and afternoon of sunny days, in JPEG format with a resolution of 2 352×1 568 pixels, as shown in Fig.1.
Fig.1 Kiwifruit images in the natural field environment
Kiwifruit is trained on overhead pergolas, so the fruits hang naturally below the branches and leaves, and when imaged vertically upward from below, the calyx of every fruit is visible. In this study, 600 images (300 each from morning and afternoon) were randomly selected, single fruits with a visible calyx were cropped as target regions, and invalid image regions were discarded; the smallest cropped sample was 74×76 pixels. The collected images were then screened manually to avoid wrongly selected or overly uniform data samples. The final data set consisted of positive samples (6 000 images) and negative samples (4 020 images), evenly distributed over the two periods (5 010 images each for morning and afternoon). The data set was used for training the convolutional neural network and for validating the parameter optimization: 80% of the positive and negative samples were randomly selected to build the training set and the remaining 20% formed the validation set. Some positive and negative samples are shown in Fig.2.
Fig.2 Examples of positive and negative samples in the data set
After the model had been trained, the remaining 100 original kiwifruit images (50 each from morning and afternoon) were used as the test set for evaluating the model. To reduce the computation and running time, the original images were scaled to 600×400 pixels for testing; there was no overlap between the training data set and the test images. Finally, the method was compared with existing kiwifruit recognition methods. Because the samples of the two periods are evenly distributed in the test set, the average accuracy on the test set was taken as the evaluation index of the recognition performance of the model [28].
The convolutional neural network was built with the MatConvNet toolbox [29] for Matlab. LeNet [30] is a classical convolutional neural network that was first applied successfully to handwritten digit recognition. Recognizing kiwifruit likewise means identifying and matching an unknown kiwifruit image, a process similar to the recognition of handwritten characters by LeNet. LeNet was therefore adopted as the basic network architecture, and its key structural parameters and training strategy were optimized to obtain a model suitable for kiwifruit image recognition. The LeNet convolutional neural network is described as follows:
1) Convolutional layer
The size and number of the convolution kernels are crucial to the performance of a CNN. The input image is convolved with a set of different kernels to generate the same number of different feature maps; the convolutional layer is given by Eq. (1).
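Eq. (1) is not reproduced in the text above; as a reference, a standard formulation of a convolutional layer consistent with this description (the notation is an assumption, not necessarily that of the original equation) is
x_j^{l} = f\left( \sum_{i \in M_j} x_i^{l-1} \ast k_{ij}^{l} + b_j^{l} \right)
where x_j^{l} is the j-th feature map of layer l, M_j the set of input maps connected to it, k_{ij}^{l} a 5×5 convolution kernel, b_j^{l} a bias term and f(·) the activation function.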
2) Sub-sampling layer
The sub-sampling layer down-samples its input, as given by Eq. (2).
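Eq. (2) is likewise not reproduced; a standard max-pooling form matching the 2×2 sub-sampling used here (assumed notation) is
x_{m,n}^{l} = \max_{0 \le p,q < 2} \; x_{2m+p,\,2n+q}^{l-1}
i.e. each output element takes the maximum over a non-overlapping 2×2 neighbourhood of the corresponding feature map in the previous layer.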
The processing platform was a laptop computer with an Intel(R) Core(TM) i3 processor at 2.40 GHz, 4 GB of RAM and a 500 GB hard disk, running Windows 7 (64 bit), Matlab R2016a and Microsoft Visual Studio 12.0.
To apply the LeNet structure directly to feature extraction and classification of kiwifruit images, the differences from the samples used by the original network (handwritten characters) and the number of colour channels of kiwifruit images have to be considered. The positive and negative sample images were therefore resized by interpolation into 3×32×32 matrices, and the positive and negative samples were labelled "2" and "1", respectively, as the input for network training. Because kiwifruit images are little affected by twisting and deformation, the number of local receptive fields in each convolutional layer of the original LeNet can be reduced to speed up training. Networks with different structures were trained, and their recognition accuracy and time consumption were compared on the validation set; the size of all local receptive fields finally adopted was 5×5, and the numbers of local receptive fields in the 3 convolutional layers C1, C3 and C5 were 6, 16 and 120, respectively.
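A minimal MATLAB sketch of this resizing and labelling step, using standard image functions (the file name is hypothetical):
im = imread('positive_sample.jpg');        % a cropped single-fruit patch (file name hypothetical)
im = single(imresize(im, [32 32]));        % interpolate to a 32x32x3 array, i.e. the 3x32x32 network input described above
label = 2;                                 % 2 = kiwifruit (positive sample), 1 = background (negative sample)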
To address the uneven data distributions across layers and the resulting loss of precision, batch normalization (BN) was introduced to reduce these effects, accelerate network convergence and prevent over-fitting. BN layers were added after the 1st, 3rd and 5th convolutional layers of the original network to normalize the outputs of the same batch to the same distribution, as shown below.
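The normalization formulas themselves are not shown in this text; the standard batch-normalization transform, as implemented by MatConvNet's bnorm layer, is
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i-\mu_B)^2, \quad \hat{x}_i = \frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}}, \quad y_i = \gamma\hat{x}_i + \beta
where m is the mini-batch size (100 here), \epsilon a small constant for numerical stability, and \gamma, \beta learned scale and shift parameters.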
The non-saturating rectified linear unit (ReLU) was used as the activation function. Max-pooling, a non-linear sub-sampling method, can to some extent reduce the feature-extraction error caused by the shift of the estimated mean due to convolution-layer parameter errors, so max-pooling was chosen as the sub-sampling method. The network was trained with mini-batch stochastic gradient descent.
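For reference, the ReLU activation and the momentum form of mini-batch stochastic gradient descent take the standard forms
f(x) = \max(0, x), \qquad v_{t+1} = \mu v_t - \eta \nabla_w L(w_t), \qquad w_{t+1} = w_t + v_{t+1}
where L is the loss averaged over a mini-batch, \eta the learning rate and \mu the momentum factor (set to 0.001 and 0.9 respectively in the training settings given later); the symbols are a generic notation, not taken from the original.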
The Softmax loss function (corresponding to the Softmax regression classifier) was selected for the comparative analysis of network performance. The final convolutional neural network structure can be expressed as 32×32-6C-2S-16C-2S-120C-2, as shown in Fig.3.
Note: C1, S2, C3, S4, C5 and FC denote the 1st convolutional layer, the 2nd sub-sampling layer, the 3rd convolutional layer, the 4th sub-sampling layer, the 5th convolutional layer and the fully connected layer, respectively.
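A minimal MatConvNet (SimpleNN) sketch of this 32×32-6C-2S-16C-2S-120C-2 structure is given below. The struct-based layer format and the layer sizes follow the description above; details such as the exact bnorm weight layout depend on the MatConvNet version and are assumptions to be checked, not the authors' released code.
net.layers = {};
net.layers{end+1} = struct('type','conv', 'weights',{{0.01*randn(5,5,3,6,'single'),   zeros(1,6,'single')}},   'stride',1, 'pad',0);  % C1: 6 kernels of 5x5 -> 28x28x6
net.layers{end+1} = struct('type','bnorm','weights',{{ones(6,1,'single'),  zeros(6,1,'single'),  zeros(6,2,'single')}});              % BN after C1
net.layers{end+1} = struct('type','relu');
net.layers{end+1} = struct('type','pool', 'method','max', 'pool',[2 2], 'stride',2, 'pad',0);                                         % S2: 2x2 max-pooling -> 14x14x6
net.layers{end+1} = struct('type','conv', 'weights',{{0.01*randn(5,5,6,16,'single'),  zeros(1,16,'single')}},  'stride',1, 'pad',0);  % C3: 16 kernels -> 10x10x16
net.layers{end+1} = struct('type','bnorm','weights',{{ones(16,1,'single'), zeros(16,1,'single'), zeros(16,2,'single')}});             % BN after C3
net.layers{end+1} = struct('type','relu');
net.layers{end+1} = struct('type','pool', 'method','max', 'pool',[2 2], 'stride',2, 'pad',0);                                         % S4: -> 5x5x16
net.layers{end+1} = struct('type','conv', 'weights',{{0.01*randn(5,5,16,120,'single'),zeros(1,120,'single')}}, 'stride',1, 'pad',0);  % C5: 120 kernels -> 1x1x120
net.layers{end+1} = struct('type','bnorm','weights',{{ones(120,1,'single'),zeros(120,1,'single'),zeros(120,2,'single')}});            % BN after C5
net.layers{end+1} = struct('type','relu');
net.layers{end+1} = struct('type','conv', 'weights',{{0.01*randn(1,1,120,2,'single'), zeros(1,2,'single')}},  'stride',1, 'pad',0);   % FC: 2 outputs (background / fruit)
net.layers{end+1} = struct('type','softmaxloss');                                                                                     % Softmax loss used during training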
The steps of LeNet-based kiwifruit recognition [31] are as follows:
1) Classify the cropped kiwifruit images and pre-process them so that they meet the requirements of network training;
2) Randomly sample the images from step 1) to obtain a data set of appropriate size, and initialize the LeNet structure to obtain the initial filter weights;
3) Convolve the filters from step 2) with the training images from step 1) to obtain the prescribed number of feature maps, and process the data with the BN method;
4) Apply max-pooling according to Eq. (2) to the feature maps obtained in step 3) to obtain generalized feature maps;
5) Apply the operations of steps 3) and 4) to the feature maps output by step 4), i.e. a second convolution, a second batch normalization and a second sub-sampling, to obtain the required feature maps;
6) In the same way, apply a third convolution and a third batch normalization to the feature maps output by step 5);
7) Reshape all feature maps from step 6) into a single column vector as the input of the fully connected layer, compute the difference between the recognition result and the label, and update the network parameters from top to bottom by the back-propagation algorithm;
8) Input the pre-processed test images, classify them with the trained network model through the Softmax classifier, and display the recognition results in combination with a multi-scale sliding-window algorithm (a sketch of this step follows the list).
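A simplified MATLAB sketch of step 8), assuming the trained SimpleNN model from above and vl_simplenn for inference; the window sizes, stride and score threshold are illustrative assumptions rather than the parameters used in the paper:
net_test = net;
net_test.layers{end} = struct('type','softmax');                       % swap the training loss for softmax at test time
img = single(imread('test_image.jpg'));                                % a 600x400 test image (file name hypothetical)
boxes = [];
for win = [24 32 48]                                                   % multiple window scales (sizes assumed)
    for y = 1:8:size(img,1)-win
        for x = 1:8:size(img,2)-win
            patch  = imresize(img(y:y+win-1, x:x+win-1, :), [32 32]); % rescale the window to the network input size
            res    = vl_simplenn(net_test, patch);                     % forward pass through the trained CNN
            scores = squeeze(res(end).x);                              % probabilities of class 1 (background) and class 2 (fruit)
            if scores(2) > 0.5
                boxes = [boxes; x y win win scores(2)];                %#ok<AGROW> candidate fruit window
            end
        end
    end
end
% overlapping candidates would still need to be merged, e.g. by non-maximum suppression (not shown)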
Because the fruits in kiwifruit images taken in the field are not all independent of each other, the fruits were divided into 4 types according to the completeness of their contours in the image: the first type comprises fruits whose contour is incomplete because part of the fruit is covered, called occluded fruits, as shown in Fig.4a; the second type comprises 2 or more fruits whose regions cover each other and are hard to separate, called overlapped fruits, marked by the rectangles in Fig.4b; the third type comprises 2 or more fruits whose contours touch, called adjacent fruits, as shown in Fig.4c; the fourth type comprises fruits whose contours are complete and separate from each other, called separated fruits, as shown in Fig.4d.
Fig.4 Categories of kiwifruit images
The CNN with the structure described above was trained with the training samples. The initial network weights were drawn from a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. The number of training epochs was set to 45, the batch size to 100, the initial learning rate of the weights to 0.001 and the momentum factor to 0.9. The training curves over the 45 epochs are shown in Fig.5.
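A sketch of the corresponding training call, assuming MatConvNet's cnn_train example function and an imdb structure holding the 32×32 training and validation samples (the imdb field layout and the getBatch helper are assumptions, not part of the paper):
getBatch = @(imdb, batch) deal(imdb.images.data(:,:,:,batch), imdb.images.labels(batch));  % fetch one mini-batch
[net, info] = cnn_train(net, imdb, getBatch, ...
    'batchSize', 100, 'numEpochs', 45, 'learningRate', 0.001, 'momentum', 0.9);            % settings stated in the text
% the Gaussian weight initialization (mean 0, std 0.01) is applied when the layer weights are created, e.g. 0.01*randn(...)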
The results show that the classification errors on the training and validation sets decrease as the number of epochs increases. At the 28th epoch the misclassification rates on both the training and validation sets drop to 0, after which the classification accuracy remains stable; from the 3rd epoch onward the gap between the training and validation errors is small, indicating that the model behaves well. After 28 epochs the training loss has essentially converged to a stable value, showing that the convolutional neural network achieved the expected training effect.
Fig.5 Training and validation error curves
With the network structure shown in Fig.3, the trained model was used to recognize kiwifruit samples. Fig.6 shows the feature maps output by the 3 convolutional layers for an input kiwifruit image; outputs 1 and 2 of the output layer denote background and fruit, respectively. The layer outputs in Fig.6 show that the convolution operations extract kiwifruit features effectively, indicating that the network structure, through local receptive fields and weight sharing, can suppress background interference and enhance the target features.
Fig.6 Examples of the outputs of each convolutional layer of the CNN
To verify the reliability and stability of the model, the 100 field images of the test set (50 each from morning and afternoon, containing 5 918 target kiwifruit in total) were recognized. The overlap coefficient [32], i.e. the ratio of the recognized target that coincides with the real target, was chosen as the evaluation index of the validity of the results. The end-effector designed by our group [7] exploits the growth characteristics of kiwifruit: it rises from below the fruit while rotating, enters the gap between neighbouring fruits, and separates and grasps a fruit by gradually enveloping it; experiments showed that the permissible error radius is 10 mm, so a fruit can be picked as long as most of its region (80%) is known, which avoids the difficulty of locating the exact fruit region. A detection was therefore counted as correct when the overlap coefficient was at least 80%. The fruit recognition rate is the ratio of the number of correctly recognized fruits to the number of actual target fruits, and the recognition time per fruit is the running time for one image divided by the number of fruits correctly recognized in that image. The recognition results are listed in Table 1 and illustrated in Fig.7.
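A minimal MATLAB sketch of this criterion for axis-aligned boxes; reading the overlap coefficient as the intersection area divided by the true fruit area is our interpretation of the description above:
% det and gt are [x y width height] rectangles of a detection and a manually labelled fruit
overlap   = rectint(det, gt) / (gt(3) * gt(4));   % fraction of the true fruit region covered by the detection
isCorrect = overlap >= 0.8;                        % counted as correctly recognized when the coefficient is >= 80%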
Table 1 Recognition results of kiwifruit
Table 1 shows that separated fruits have the highest recognition rate (94.78%), followed by adjacent fruits (91.01%) and overlapped fruits (83.11%), while occluded fruits are recognized worst (78.97%). Fruits far from the image centre, which are therefore deformed, or fruits largely occluded by branches and leaves, are easily misrecognized. When several fruits in an image overlap consecutively, the rear fruit and the front fruit tend to be detected as one fruit, or a heavily overlapped rear fruit cannot be detected at all, causing missed detections, as shown in Fig.7a. When part or all of a fruit is strongly reflective under direct sunlight, that region is difficult or impossible to recognize, which lowers the recognition accuracy. When the contours of several adjacent fruits touch on both sides, they are easily recognized as a single fruit, causing false detections; this is the main cause of the lower recognition rate, as shown in Fig.7b. A possible reason for the false detections in Fig.7b is that the cropping of single kiwifruit was not ideal when the training data set was built (the edges were not handled well when overlapping regions were cropped).
Fig.7 Kiwifruit recognition results and examples of false recognition
In Fig.7, the regions occupied by some kiwifruit are not recognized precisely, and the recognized regions (the black boxes in Fig.7) deviate slightly from the actual fruit regions. On the whole, however, the main region of each fruit has been recognized, and the end-effector of the kiwifruit harvesting robot developed by our group [7] can still pick the fruit.
Recognition methods for kiwifruit images in the field have been proposed in the literature. To verify the performance of the proposed algorithm, it was compared with 5 conventional methods, those of Scarfe [13], Zhan Wentian et al. [5], Cui Yongjie et al. [8], Fu et al. [14] and Fu Longsheng et al. [9]; the results are listed in Table 2.
Table 2 Performance comparison of different kiwifruit recognition methods
Table 2 shows that the recognition rates of the algorithms of Cui Yongjie et al. [8] and Fu et al. [14] are close to that of the proposed algorithm, while the recognition rates of Zhan Wentian et al. [5] and Fu Longsheng et al. [9] are 7.41 and 5.01 percentage points higher, respectively. However, Fu et al. [14] and Fu Longsheng et al. [9] recognized kiwifruit images taken at close range from below the fruit and handled only single clusters with few, separated or adjacent fruits; their performance on 5 or more fruits is poor. Zhan Wentian et al. [5] and Cui Yongjie et al. [8] recognized images taken at close range from the side; the images were largely captured deliberately at specific angles, and each test image contains only one type of fruit feature, which limits practical application. In addition, the methods of Scarfe [13], Zhan Wentian et al. [5], Cui Yongjie et al. [8], Fu et al. [14] and Fu Longsheng et al. [9] all require manually selected low-level features to be extracted from the kiwifruit images and a large amount of image pre-processing, which makes them complicated to operate. Moreover, these conventional algorithms lack high-level representations and can hardly capture the spatial relations among the selected low-level features, so recognizing multiple fruits is relatively difficult for them.
The proposed algorithm needs only simple pre-processing of the images; every test image contains all 4 types of fruit and at least 30 fruits, and in the speed of recognizing a single kiwifruit the proposed algorithm clearly outperforms the other 3 algorithms. Under the same imaging conditions, the recognition rate of the proposed algorithm, 89.29%, is 5.73 percentage points higher than the 83.56% of Scarfe [13]. Overall, the proposed CNN-based recognition method has strong resistance to interference, can simultaneously recognize multi-cluster kiwifruit in the complex field environment, takes little time, is relatively robust to illumination changes and to occlusion by branches and leaves, and better meets the picking requirements of kiwifruit harvesting robots in practical applications.
1) To meet the needs of kiwifruit harvesting, a CNN-based method for recognizing kiwifruit in the field was proposed. The parameters of the LeNet model were optimized and its structure simplified, and experiments verified that the recognition model can automatically and effectively learn kiwifruit features from complex data, avoiding the drawback of conventional methods in which the researcher selects features subjectively. At the same time, the simplified model largely meets the demands of practical application and is better suited to computing platforms of ordinary performance.
2) After training, the constructed convolutional neural network with the structure 32×32-6C-2S-16C-2S-120C-2 achieved a recognition rate of 89.29% for the 5 918 kiwifruit contained in 100 images, 5.73 percentage points higher than the other method that recognizes multi-cluster kiwifruit from long-range bottom-view images. In terms of speed, the algorithm recognizes a kiwifruit in 0.27 s on average, which basically meets the working requirements of a kiwifruit harvesting robot.
3) The model can be applied to the recognition of multiple kiwifruit in the field environment, overcoming the limitation that most conventional recognition algorithms cannot recognize multi-cluster kiwifruit simultaneously, and provides strong support for research on multi-arm operation of kiwifruit harvesting robots.
At present, the model can accurately determine whether a kiwifruit is present, but it does not perform well on some occluded and overlapped fruits; in particular, the outer contours of 2 or more adjacent or overlapped fruits are easily recognized as a single fruit, causing false detections, which requires further study. In addition, to make the method suitable for wider application, the next step is to deepen the network structure and increase the variety and number of training samples so as to improve the recognition ability of the classifier.
[1] Zhang Jiyu, Mo Zhenghai, Huang Shengnan, et al. Development of kiwifruit industry in the world and analysis of trade and international competitiveness in China entering 21st century[J]. China Agricultural Science Bulletin, 2014, 30(23): 48-55. (in Chinese with English abstract)
[2] Chen Jun, Wang Hu, Jiang Haoran, et al. Design of end-effector for kiwifruit harvesting robot[J]. Transactions of the Chinese Society for Agricultural Machinery, 2012, 43(10): 151-154. (in Chinese with English abstract)
[3] Zhang L, Wang Y, Yang Q, et al. Kinematics and trajectory planning of a cucumber harvesting robot manipulator[J]. International Journal of Agricultural & Biological Engineering, 2009, 2(1): 1-7.
[4] Rakun J, Stajnko D, Zazula D. Detecting fruits in natural scenes by using spatial-frequency based texture analysis and multiview geometry[J]. Computers & Electronics in Agriculture, 2011, 76(1): 80-88.
[5] Zhan Wentian, He Dongjian, Shi Shilian. Recognition of kiwifruit in field based on Adaboost algorithm[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(23): 140-146. (in Chinese with English abstract)
[6] Bechar A, Vigneault C. Agricultural robots for field operations: Concepts and components[J]. Biosystems Engineering, 2016, 149: 94-111.
[7] Fu Longsheng, Zhang Fanian, Gejima Yoshinori, et al. Development and experiment of end-effector for kiwifruit harvesting robot[J]. Transactions of the Chinese Society for Agricultural Machinery, 2015, 46(3): 1-8. (in Chinese with English abstract)
[8] Cui Yongjie, Su Shuai, Wang Xiaxia, et al. Recognition and feature extraction of kiwifruit in natural environment based on machine vision[J]. Transactions of the Chinese Society for Agricultural Machinery, 2013, 44(5): 247-252. (in Chinese with English abstract)
[9] Fu Longsheng, Sun Shipeng, Vázquez-Arellano Manuel, et al. Kiwifruit recognition method at night based on fruit calyx[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(2): 199-204. (in Chinese with English abstract)
[10] Ding Yalan, Geng Nan, Zhou Quancheng. Research on the object extraction of kiwifruit based on images[J]. Microcomputer Information, 2009, 18(4): 294-295. (in Chinese with English abstract)
[11] Cui Yongjie, Su Shuai, Lü Zhihai, et al. A method for separation of kiwifruit adjacent fruits based on Hough transformation[J]. Journal of Agricultural Mechanization Research, 2012, 34(12): 166-169. (in Chinese with English abstract)
[12] Mu Junying, Chen Jun, Sun Gaojie, et al. Characteristic parameters extraction of kiwifruit based on machine vision[J]. Journal of Agricultural Mechanization Research, 2014, 36(6): 138-142. (in Chinese with English abstract)
[13] Scarfe A J. Development of an Autonomous Kiwifruit Harvester[D]. Manawatu, New Zealand: Massey University, 2012.
[14] Fu L, Wang B, Cui Y, et al. Kiwifruit recognition at nighttime using artificial lighting based on machine vision[J]. International Journal of Agricultural and Biological Engineering, 2015, 8(4): 52-59.
[15] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]// Curran Associates Inc. International Conference on Neural Information Processing Systems. 2012: 1097-1105.
[16] Alwzwazy H, Albehadili H, Alwan Y, et al. Handwritten digit recognition using convolutional neural networks[J]. International Journal of Innovative Research in Computer & Communication Engineering, 2016, 4(2): 1101-1106.
[17] Yang W, Jin L, Tao D, et al. Dropsample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition[J]. Pattern Recognition, 2016, 58(4): 190-203.
[18] Albu R D. Human face recognition using convolutional neural networks[J]. Journal of Electrical & Electronics Engineering, 2009, 2(2): 110-113.
[19] Ramaiah N P, Ijjina E P, Mohan C K. Illumination invariant face recognition using convolutional neural networks[C]// IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems, 2015: 1-4.
[20] Singh R, Om H. Newborn face recognition using deep convolutional neural network[J]. Multimedia Tools & Applications, 2017, 76(18): 19005-19015.
[21] Dobhal T, Shitole V, Thomas G, et al. Human activity recognition using binary motion image and deep learning [J]. Procedia Computer Science, 2015, 58: 178-185.
[22] Ronao C A, Cho S B. Human activity recognition with smartphone sensors using deep learning neural networks[J]. Expert Systems with Applications, 2016, 59: 235-244.
[23] Wang Zhongmin, Cao Hongjiang, Fan Lin. Method on human activity recognition based on convolutional neural networks[J]. Computer Science, 2016, 43(s2): 56-58. (in Chinese with English abstract)
[24] Gao Zhenyu, Wang An, Liu Yong, et al. Intelligent fresh-tea-leaves sorting system research based on convolution neural network[J]. Transactions of the Chinese Society for Agricultural Machinery, 2017, 48(7): 53-58. (in Chinese with English abstract)
[25] Zhou Yuncheng, Xu Tongyu, Zheng Wei, et al. Classification and recognition approaches of tomato main organs based on DCNN[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(15): 219-226. (in Chinese with English abstract)
[26] Wang Qiancheng. The Algorithm Research of Fruit Image Recognition Based on Deep Learning[D]. Baoding: Hebei University, 2016. (in Chinese with English abstract)
[27] Sa I, Ge Z, Dayoub F, et al. Deepfruits: A fruit detection system using deep neural networks[J]. Sensors, 2016, 16(8): 1-23.
[28] Yang Guoguo, Bao Yidan, Liu Ziyi. Localization and recognition of pests in tea plantation based on image saliency analysis and convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(6): 156-162. (in Chinese with English abstract)
[29] Vedaldi A, Lenc K. MatConvNet: Convolutional neural networks for MATLAB[C]// 23rd ACM International Conference on Multimedia, Brisbane, Australia, 2015: 689-692.
[30] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[31] Li Hui, Shi Bo. Face recognition algorithm based on convolutional neural network[J]. Software Guide, 2017, 16(3): 26-29. (in Chinese)
[32] Song Huaibo, Zhang Weiyuan, Zhang Xinxin, et al. Shadow removal method of apples based on fuzzy set theory[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2014, 30(3): 135-141. (in Chinese with English abstract)
Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks
Fu Longsheng1,2, Feng Yali1, Elkamil Tola3, Liu Zhihao1, Li Rui1, Cui Yongjie1,2
(1. College of Mechanical and Electronic Engineering, Northwest A&F University, Yangling 712100, China; 2. Key Laboratory of Agricultural Internet of Things, Ministry of Agriculture, Yangling 712100, China; 3. Precision Agriculture Research Chair, King Saud University, Riyadh 11451, Saudi Arabia)
China is the largest country for cultivating kiwifruit, and Shaanxi Province provides the largest production, which accounts for approximately 70% of the production in China and 33% of the global production. Harvesting kiwifruit in this region relies mainly on manual picking, which is labor-intensive. Therefore, the introduction of robotic harvesting is highly desirable and suitable. The fast and effective recognition of kiwifruit in the field under natural scenes is one of the key technologies for robotic harvesting. To date, studies on kiwifruit recognition have been limited to a single cluster, and multiple clusters in the field have seldom been considered. In this paper, according to the growth characteristics of kiwifruit grown on sturdy support structures, an RGB (red, green, blue) camera was placed around 100 cm underneath the canopy so that kiwifruit clusters could be included in the images. We proposed a kiwifruit image recognition system based on the convolutional neural network (CNN), which is robust and avoids the subjectivity and limitations of feature selection by artificial means. The CNN could be trained end to end, from raw pixels to ultimate categories, and we optimized the critical structure parameters and the training strategy. Ultimately, the network was made up of 1 input layer, 3 convolutional layers, 2 sub-sampling layers, 1 fully connected layer, and 1 output layer. The CNN architecture was optimized by using the batch normalization (BN) method, which normalized the data distribution of the middle layers and the output data, accelerating the training convergence and reducing the training time. Therefore, BN layers were added after the 1st, 3rd and 5th convolutional layers (Conv1, Conv3, and Conv5) of the original LeNet network. The size of all convolutional kernels was 5×5, and that of all the sub-sampling layers was 2×2. The feature map numbers of Conv1, Conv3, and Conv5 were 6, 16 and 120, respectively. After manual selection and normalization, the RGB image of kiwifruit was transformed into a matrix with the size of 32×32 as the input of the network; stochastic gradient descent was used to train our models with a mini-batch size of 100 examples, and the momentum was set as 0.9. In addition, the CNN took advantage of local connections, weight sharing and max-pooling to lower the complexity and improve the training performance of the model simultaneously. The network used rectified linear units (ReLU) as the activation function, which could greatly accelerate network convergence. The proposed model for training kiwifruit was represented as 32×32-6C-2S-16C-2S-120C-2. Finally, 100 images of kiwifruit in the field (including 5 918 fruits) were used to test the model, and the results showed that the recognition ratios of occluded fruit, overlapped fruit, adjacent fruit and separated fruit were 78.97%, 83.11%, 91.01% and 94.78%, respectively. The overall recognition rate of the model reached 89.29%, and it took only 0.27 s on average to recognize a fruit. There was no overlap between the testing samples and the training samples, which indicated that the network had a high generalization performance, and the testing images were captured from 9 a.m. to 5 p.m., which indicated that the network had good robustness to illumination variations. However, some fruits were wrongly detected or missed, including fruits occluded by branches or leaves, fruits overlapping each other, and fruits under extremely strong sunlight.
In particular, 2 or more overlapping fruits were sometimes recognized as one fruit, which was the main reason why the success rate was not higher; this phenomenon demands further research. Comparison with the conventional methods suggests that the proposed method obtains a higher recognition rate and speed, and in particular it can simultaneously identify multi-cluster kiwifruit in the field, which provides significant support for the multi-arm operation of harvesting robots. This proves that the CNN has great potential for the recognition of fruits in the field.
image processing; image recognition; algorithms; deep learning; convolutional neural network; kiwifruit
doi: 10.11975/j.issn.1002-6819.2018.02.028
CLC number: TP391.41    Document code: A    Article ID: 1002-6819(2018)-02-0205-07
Received: 2017-08-28    Revised: 2017-12-26
Supported by the Key Research and Development Program of Shaanxi Province (2017NY-164), the Science and Technology Overall Planning and Innovation Project of Shaanxi Province (2015KTCQ02-12), the National Natural Science Foundation of China (61175099), and the International Cooperation Seed Fund of Northwest A&F University (A213021505)
Fu Longsheng, born in Ji'an, Jiangxi, PhD, associate professor, engaged in research on intelligent agricultural technology and equipment. Email: fulsh@nwafu.edu.cn
Member of the Chinese Society of Agricultural Engineering: Fu Longsheng (E042600025M)
Fu Longsheng, Feng Yali, Elkamil Tola, Liu Zhihao, Li Rui, Cui Yongjie. Image recognition method of multi-cluster kiwifruit in field based on convolutional neural networks[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(2): 205-211. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2018.02.028 http://www.tcsae.org