Lactating sow high-dangerous body movement recognition from depth videos based on hidden Markov model
Xue Yueju1, Yang Xiaofan1, Zheng Chan2, Chen Changxin1, Gan Haiming1, Li Shimei1
(1. College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China; 2. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China)
The high-dangerous body movements of lactating sows are closely related to piglet survival and directly reflect maternal behavioral ability, and these movements are in turn closely related to the frequency and duration of posture changes. In commercial piggeries, illumination variation, adhesion between the sow and piglets, and sow body deformation make it difficult to recognize the posture changes of lactating sows. Taking Small-ears Spotted sows as the research object and depth video images captured by a Kinect 2.0 as the data source, this paper proposes a posture-change recognition algorithm for lactating sows based on Faster R-CNN and a hidden Markov model: candidate regions are generated by Faster R-CNN and linked into localization tubes with the Viterbi algorithm; Otsu segmentation and morphological processing are used to extract the height sequences of the sow trunk, tail, and upper and lower sides of the body within suspected change segments; and the hidden Markov model recognizes posture changes. The results show that the precision and recall of posture-change segment recognition are 93.67% and 87.84%, respectively. The results can provide a technical reference for round-the-clock automatic recognition of sow behavior.
image processing; algorithms; models; high-dangerous body movement; lactating sows; Faster R-CNN; hidden Markov model; localization tube
In commercial pig farms, the high-dangerous body movements of sows are reflected mainly in the number, frequency and duration of posture changes; these are also an important indicator of the quality of maternal behavior and are closely related to piglet survival [1]. Observing sow posture changes by direct human observation or through surveillance video is highly subjective, time-consuming and laborious [2]. Automatic recognition of sow posture-change behavior can provide basic information for studying maternal behavioral traits and patterns, and is of great significance for preventing piglets from being crushed to death, improving piglet survival, and reducing the labor cost of farm management [3-4].
Sensor technology has been used to monitor sow postures. Thompson et al. [4] acquired acceleration data from sensors and used a support vector machine to recognize sow postures and posture changes, achieving a precision of 82.60% and a recall of 76.10% for posture-change recognition; Yan et al. [5] collected sow behavior data with an MPU6050 module attached to the neck or leg of the sow, and recognized high-dangerous movements with an accuracy of 81.7%. To overcome problems such as sow stress and sensor detachment or damage, computer vision has been applied to obtain sow posture information. Lao et al. [2] used depth images to recognize sow lying, sitting, kneeling, standing and feeding/drinking, with accuracies of 99.9%, 96.4%, 78.1%, 99.2% and 92.7%, respectively; Zheng et al. [6] and Xue et al. [7] used the Faster R-CNN object detector to recognize standing, sitting, sternal-lying, ventral-lying and lateral-lying postures of sows in depth images, with mean average precisions of 87.1% and 93.25% over the five postures, respectively. However, the work in [2, 6, 7] was limited to posture recognition on static images.
Posture-change behavior of a lactating sow is the process of changing from one posture to another, involving the postures before and after the change as well as the transition process itself. Because they do not fully exploit the temporal information of behavior, existing object detection algorithms for static images are difficult to apply directly to behavior recognition [8]; ongoing posture-change behavior must instead be analyzed automatically from video frame sequences. However, video exhibits strong correlations in both the temporal and spatial domains and carries large amounts of spatio-temporal and redundant information, which poses challenges for behavior recognition [9-10].
To achieve behavior recognition, researchers have used different methods to extract and analyze spatio-temporal information from videos. Wang et al. [11] proposed a behavior recognition method based on dense spatio-temporal trajectories, reaching a recognition accuracy of 57.2% on the HMDB51 data set; Peng et al. [12] encoded the dense trajectories of [11] with two stacked Fisher-vector coding layers and aggregated them into video descriptors, improving recognition accuracy on HMDB51 by 9.59% compared with [11]. However, such handcrafted-feature methods rely heavily on expert experience, generalize poorly, and are difficult to deploy in real-world, real-time scenarios [13]. Recently, convolutional neural networks (CNNs) have been introduced into video behavior recognition [14-15]. For scenes containing multiple moving objects, to overcome the influence of other moving objects on the behavior recognition of the object of interest, researchers have used CNNs to detect action regions on video frames and then performed tracking and behavior analysis [16-18]. Gkioxari et al. [16] and Saha et al. [17] used two-stream CNNs to extract appearance and motion features of each candidate region, computed action scores for the candidate regions, and finally linked the candidate regions along the temporal dimension with an optimization algorithm to form action tubes, achieving good recognition performance. However, the methods of [16-17] are computationally complex and only suitable for action classification in short videos (fewer than 200 frames), making them difficult to apply to sow posture-change recognition in long videos.
To cope with the uncertainty of postures and the body deformation during posture changes in loose farrowing pens, as well as the influence of piglets on the recognition of lactating sow posture changes, this paper draws on the action-tube generation idea of [16-18] and studies sow posture-change recognition with a detection-tracking-behavior-analysis approach. Using depth video images as the data source, we form localization tubes of suspected change segments from the posture probabilities output by Faster R-CNN [19], extract height time-series features of the sow within the localization tubes, and feed them into a hidden Markov model (HMM) [20] to recognize high-dangerous posture-change movements; experiments verify the effectiveness of the proposed method.
The experimental data were all collected at the Lejiazhuang farm in Sanshui District, Foshan City, Guangdong Province, in five sessions: May 30, 2016; November 29-30, 2016; April 19-20, 2017; April 25, 2017; and September 5-6, 2018 (Table 1). Data sets B1 and B4 were collected to cover as many different sow sizes as possible, with 30-50 min of recording per pen. Data sets B2, B3 and B5 were collected to cover sow activity throughout the day as fully as possible, with only one or two pens recorded per session. Each pen was 3.8 m long and 2.0 m wide and housed one lactating sow and 6-10 piglets aged 2-21 d. A Kinect 2.0 camera was mounted above the pen, pointing vertically downward, and captured RGB-D video at 5 frames/s; to cover the whole pen, the camera was mounted at a height of 200-230 cm. Depth video images of 512×424 pixels were used to study automatic recognition of sow posture-change behavior.
Table 1 Experimental data sets
The training, validation and test sets should ensure data diversity, covering different sow sizes, postures and posture-change behaviors. Parts of B1, B4 and B5 in Table 1 were selected as the training and validation sets for Faster R-CNN and the HMM, while B2 and B3 were used as the test set to evaluate the performance of the proposed algorithm.
For the Faster R-CNN training and validation sets, to avoid problems such as temporal correlation, at most five depth images were randomly sampled from each video segment in B1, B4 and B5 during which the sow posture did not change, and the sow bounding box and posture class were manually annotated under the guidance of animal behavior experts, yielding 2 941 standing, 2 916 sitting, 2 844 ventral-lying and 3 085 lateral-lying images. From each posture class, 1 500 images were then randomly sampled (6 000 in total) and augmented by 90°, 180° and 270° clockwise rotations and by horizontal and vertical mirroring, producing 36 000 images as the training set; the remaining 1 441 standing, 1 416 sitting, 1 344 ventral-lying and 1 585 lateral-lying images formed the validation set.
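As an illustration, a minimal Python sketch of this six-fold expansion (the original image plus the five transformed copies) is given below; the function name is illustrative and not taken from the study.

import numpy as np

def augment_five(img):
    """Return the five augmented copies; together with the original image
    this gives the six-fold expansion described above (6 000 -> 36 000)."""
    return [
        np.rot90(img, k=3),   # 90 degrees clockwise
        np.rot90(img, k=2),   # 180 degrees
        np.rot90(img, k=1),   # 270 degrees clockwise (= 90 degrees counter-clockwise)
        np.fliplr(img),       # horizontal mirror
        np.flipud(img),       # vertical mirror
    ]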
For the HMM training and validation sets, 120 image sequences containing a posture change and 120 without one (240 in total) were randomly extracted from B1, B4 and B5. Based on statistics of posture-change duration, each sequence was 60-120 frames long. Of these, 120 sequences were randomly selected for training and the rest were used for validation.
Data sets B2 and B3 were used as test samples. For B2, 26 segments of 30-47 min each were manually selected from 29.7 h of continuous video; segments in which the sow remained still for long periods were removed before splitting. B3 is a continuous 29 h video; to fit the computing and memory limits of the image-processing server, it was simply split manually into 34 long videos of 36-50 min each. This manual splitting did not mark the start or end of any posture change; it served only to fit the server's memory limit, and split points were placed so as not to fall within a posture-change process. The start, end and class of each posture change were identified from these long videos by the algorithm. After each long video was analyzed, the results were concatenated in temporal order to obtain recognition results covering more than one day (29 h).
Because the B2 and B3 videos contain 380 073 depth frames in total, manually annotating the sow bounding box in every frame would be prohibitively laborious. Instead, at most five depth images were randomly sampled from each video segment in which the sow posture did not change and annotated with bounding boxes, yielding 967 standing, 940 sitting, 972 ventral-lying and 984 lateral-lying images as the Faster R-CNN test set. In addition, the start and end frames of each posture and each posture change were manually annotated in the videos (156 posture changes in total) for comparison with the automatic results and evaluation of the recognition accuracy of the proposed algorithm.
Downward body movements of the sow from a high posture to a low one pose a large threat to piglets and are referred to as high-dangerous movements [21]. Weary et al. [22] observed that piglet crushing mainly occurs during three types of posture change: 1) from standing to lying, where piglets are easily crushed under the sow's belly; 2) from sitting to lying, of which sitting to lateral lying poses the greater threat; and 3) between ventral lying and lateral lying, which poses a smaller threat. Upward body movements of the sow rarely crush piglets. According to the degree of threat to piglets and the frequency of occurrence, sow posture changes are divided into four classes in this paper (Table 2).
Table 2 Four classes of sow posture changes
Note: DM1 represents descending movements 1, DM2 represents descending movements 2, AM represents ascending movements, and RL represents rolling. The same below.
Because of body deformation and posture diversity during posture changes, Faster R-CNN is prone to localization and posture-classification errors. This paper therefore proposes a sow localization-tube algorithm (see Section 2.3) to correct localization errors, and feeds the height sequences of different body parts of the sow within the localization tube of each suspected change segment into an HMM to further determine whether a posture change has occurred.
The proposed posture-change recognition method consists of five steps (Fig.1). Step 1: depth image enhancement. Step 2: object localization and posture classification with an improved Faster R-CNN [7]; the most probable posture in each frame is selected to form a posture sequence, used for detecting suspected change segments in Step 3 and classifying single-posture segments in Step 5, while the five detection boxes with the highest probabilities are kept as candidate regions for tube construction in Step 3. Step 3: classification errors in the posture sequence are corrected with temporal median filtering based on temporal correlation; suspected change segments are then detected in the video, and the maximum-score localization tube of each suspected change segment is constructed with the Viterbi algorithm. Step 4: within the localization tube, the sow in each frame is segmented with the Otsu method [23], and the heights of the trunk, tail and both sides of the body are computed to form height-sequence features. Step 5: the height sequence of each localization tube is fed into the HMM, which classifies suspected change segments as posture-change or non-change segments; posture-change segments are then classified according to the classes of the preceding and following single-posture segments, completing posture-change recognition.
Fig.1 Flowchart of sow posture-change recognition
In the piggery, dust prevents the Kinect 2.0 infrared light from reaching the subject, and direct sunlight around the open sides of the building makes it difficult to capture the reflected infrared light, so the captured depth images contain considerable noise. A 5×5 median filter is used for denoising, and contrast-limited adaptive histogram equalization is used to improve image contrast.
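As an illustration, a minimal Python/OpenCV sketch of this enhancement step is given below; scaling the 16-bit depth map to 8 bits, the CLAHE clip limit and the tile size are assumptions for illustration, not settings reported in this paper.

import cv2
import numpy as np

def enhance_depth_frame(depth_u16: np.ndarray) -> np.ndarray:
    # Scale the raw 16-bit depth map to 8 bits for processing (assumption).
    depth_u8 = cv2.normalize(depth_u16, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # 5x5 median filter to suppress speckle noise from missing infrared returns.
    denoised = cv2.medianBlur(depth_u8, 5)
    # Contrast-limited adaptive histogram equalization (CLAHE); parameters are illustrative.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)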
An improved Faster R-CNN [7] is used to localize the sow and classify its posture on each video frame. The network mainly consists of convolutional layers, max-pooling layers, normalization layers, residual structures and fully connected layers. Residual structures are introduced into the shared convolutional layers to improve convergence speed and accuracy [24], and a Center Loss supervision signal is introduced to enlarge the feature distances between posture classes and reduce intra-class variation [25].
The model is trained with the Caffe framework [26], fine-tuning parameters with stochastic gradient descent and back-propagation. The maximum number of iterations is 9×10^4, with a learning rate of 10^-3 for the first 6×10^4 iterations and 10^-4 for the last 3×10^4; the momentum is 0.9, the weight decay is 5×10^-4, and the mini-batch size is 256. Network layers are initialized from a Gaussian distribution with mean 0 and standard deviation 0.1. According to the size of sows in the depth images, the anchor areas are set to 96^2, 128^2 and 160^2 pixels with aspect ratios of 1:1, 1:3 and 3:1. A detection is considered correct when its intersection over union (IoU) with the manually annotated box exceeds 0.7 and its class matches the annotation.
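The evaluation rule can be illustrated with the following hypothetical helper; the (x1, y1, x2, y2) box format is an assumption made only for this sketch.

def iou(box_a, box_b):
    # Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) format.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def detection_is_correct(pred_box, pred_cls, gt_box, gt_cls, thr=0.7):
    # Correct only if the class matches and the IoU exceeds the 0.7 threshold.
    return pred_cls == gt_cls and iou(pred_box, gt_box) > thr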
At test time, the B2 and B3 depth video frames are fed into the model frame by frame. The most probable posture in each frame is selected to form the posture sequence used for suspected-change-segment detection in Step 3 and single-posture-segment classification in Step 5, and the five detection boxes with the highest probabilities in each frame are kept as candidate regions for localization-tube construction in Step 3.
Because the posture changes continuously during a posture-change behavior, it is uncertain: over time it may partially resemble standing, sitting, lateral lying or ventral lying, or differ from all of them. In Fig.2, the part between the two dashed lines shows a sow changing from ventral lying to standing. Before the change, the ventral-lying probability is highest; during the change, the probabilities of all four postures are low and the most probable class switches frequently, so the posture class output by Faster R-CNN is uncertain and changes frequently; after the change, the standing probability is highest.
Fig.2 Posture probabilities before and after a sow changes from ventral lying to standing
Therefore, after correcting the posture classes with a temporal median filter of length 5, suspected posture-change segments are detected according to how frequently the posture changes. A sliding window of 20 frames with a step of 1 frame is used to count the number of posture changes within each window; based on statistics of the number of changes in true posture-change segments, windows with more than 3 changes are extracted as suspected change segments, and the remaining segments are treated as single-posture segments. Some suspected change segments are caused by Faster R-CNN misclassification rather than an actual posture change; these are called non-change segments, so suspected change segments must be further classified into posture-change and non-change segments (see Section 2.5). To reduce the influence of Faster R-CNN localization errors on this judgment, following the action-tube idea of [16], a sow localization-tube construction method is proposed: for the five most probable candidate regions obtained in Step 2 in each frame t of a suspected change segment, a linking score is computed between candidate regions in adjacent frames, and the sequence of candidate regions with the maximum accumulated score is selected as the localization tube of the segment.
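The linking and tube-construction step can be sketched in Python as follows. The exact linking score used in this paper is not reproduced here; following the action-tube formulation of [16], the sketch assumes that linking box b_i in frame t-1 to box b_j in frame t scores p(b_i) + p(b_j) + lam*IoU(b_i, b_j), and it reuses the iou helper from the earlier sketch. Dynamic programming (Viterbi) then picks one candidate per frame with the maximum total score.

def build_tube(frames, lam=1.0):
    """frames: list over time; each element is a list of (box, score) candidates."""
    T = len(frames)
    best = [0.0] * len(frames[0])            # accumulated link score up to frame 0
    back = []                                # back-pointers, one list per frame t >= 1
    for t in range(1, T):
        cur_best, cur_back = [], []
        for box_j, score_j in frames[t]:
            cands = [
                best[i] + score_i + score_j + lam * iou(box_i, box_j)
                for i, (box_i, score_i) in enumerate(frames[t - 1])
            ]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur_best.append(cands[i_star])
            cur_back.append(i_star)
        best, back = cur_best, back + [cur_back]
    # Trace back the highest-scoring path (one candidate index per frame).
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for ptrs in reversed(back):
        j = ptrs[j]
        path.append(j)
    return list(reversed(path))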
When a sow changes posture, body deformation and posture diversity make it difficult to use body-shape information, whereas the body height changes accordingly, so height time series of different body parts can be used to classify suspected change segments. Because the height and orientation of the head change frequently when the sow is standing or sitting, only the heights of body parts other than the head are used.
Within the optimized localization tube, the sow is segmented from the background frame by frame inside the detection box using the Otsu method and morphological processing. Because sows in loose pens spend most of their time near the walls, the sow segmented by Otsu thresholding is often connected to the wall or to piglets. The Hough transform is therefore used to detect and remove walls from the Otsu segmentation result, a closing operation is then applied to break adhesions between the sow and piglets, and the segmentation result is finally mapped back onto the original depth image.
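A minimal OpenCV sketch of this segmentation step is given below; the Hough-transform parameters and the structuring-element size are illustrative assumptions rather than the values used in this study.

import cv2
import numpy as np

def segment_sow(depth_roi_u8: np.ndarray) -> np.ndarray:
    # Otsu thresholding separates the (higher) sow body from the pen floor.
    _, mask = cv2.threshold(depth_roi_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Detect long straight edges (pen walls) and erase them from the mask.
    lines = cv2.HoughLinesP(mask, 1, np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=5)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(mask, (x1, y1), (x2, y2), 0, thickness=7)
    # Morphological closing, as described in the text, to clean up the mask.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask  # applied back onto the original depth image as a mask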
The heights of different body parts differ among sow postures. To extract these height features, the sow body is divided into regions (Fig.4). Because the sow appears at different angles in the image, the target region is first rotated using bicubic interpolation before division. The head and tail ends are distinguished by their arc curvature, a line is drawn connecting the head and tail points, and perpendiculars are drawn at the 1/4 and 3/4 points of this line, dividing the body into the head (regions 1 and 4), trunk (regions 2 and 5), tail (regions 3 and 6), upper side (regions 2 and 3) and lower side (regions 5 and 6). The average heights of the trunk, tail, and upper and lower sides are computed for each frame and concatenated across frames to form the height sequence of the suspected change segment, i.e., a height time series with 4 attributes and a length of 60-120 frames. Each height sequence is mean-subtracted as preprocessing.
Fig.4 Schematic diagram of sow body region division
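A simplified Python sketch of the per-frame height features described above is given below. It assumes the sow mask has already been rotated so that the head-tail axis is horizontal with the head on the left, uses the mean row of the mask as the upper/lower split (an approximation of the head-tail line), and uses the depth value directly as a proxy for height; all of these are assumptions made for illustration.

import numpy as np

def frame_height_features(depth: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """depth: 2-D depth/height map; mask: binary sow mask of the same shape."""
    body = mask > 0
    ys, xs = np.nonzero(body)
    x0, x1 = xs.min(), xs.max()          # head end (left) and tail end (right)
    q1 = x0 + (x1 - x0) // 4             # 1/4 point of the head-tail line
    q3 = x0 + 3 * (x1 - x0) // 4         # 3/4 point of the head-tail line
    y_mid = int(round(ys.mean()))        # approximate body axis for the upper/lower split
    rows, cols = np.indices(depth.shape)

    def mean_val(region):
        vals = depth[region & body]
        return float(vals.mean()) if vals.size else 0.0

    trunk = (cols >= q1) & (cols < q3)           # regions 2 and 5
    tail = cols >= q3                            # regions 3 and 6
    upper = (rows < y_mid) & (cols >= q1)        # regions 2 and 3
    lower = (rows >= y_mid) & (cols >= q1)       # regions 5 and 6
    # Four attributes per frame: trunk, tail, upper side, lower side.
    return np.array([mean_val(trunk), mean_val(tail), mean_val(upper), mean_val(lower)])

Stacking these per-frame vectors over a segment and subtracting the column means (for example, seq - seq.mean(axis=0)) gives the mean-subtracted height time series used as the HMM input.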
The height time-series features are fed into a continuous HMM [28] for classification, with a Gaussian mixture probability density function describing the distribution of each hidden state. An HMM can be written as λ = (N, M, A, B, π),
where N is the number of hidden states, M is the number of kernel functions in the Gaussian mixture model, A is the hidden-state transition probability matrix, B is the vector of observation probability distribution functions, and π is the initial hidden-state probability vector.
Both the number of kernel functions and the number of hidden states of the HMM are set to 2, the maximum number of iterations to 500, and the minimum training error to 10^-6. The experiment is repeated 10 times and the model with the highest accuracy is kept. At test time, the height sequences of all suspected change segments in B2 and B3 are fed into the HMM for classification; non-change segments are then merged with single-posture segments to form the segmentation result.
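Under the settings above (2 hidden states, 2 mixture components, at most 500 iterations, tolerance 10^-6), a minimal two-class classifier can be sketched with the hmmlearn library as follows; training one class-conditional GMM-HMM per class and comparing log-likelihoods is a common scheme assumed here, not necessarily the authors' exact implementation.

import numpy as np
from hmmlearn.hmm import GMMHMM

def fit_class_hmm(sequences):
    """sequences: list of (T_i, 4) mean-subtracted height-series arrays for one class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = GMMHMM(n_components=2, n_mix=2, n_iter=500, tol=1e-6,
                   covariance_type="diag", random_state=0)
    model.fit(X, lengths)
    return model

def classify(seq, hmm_change, hmm_nochange):
    # The model with the higher log-likelihood wins: posture change vs. no change.
    return "change" if hmm_change.score(seq) > hmm_nochange.score(seq) else "no_change"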
Because non-change segments still contain many misclassified frames, each merged single-posture segment is assigned the class that accounts for the largest proportion of its posture sequence. Posture-change segments are then classified according to the classes of the single-posture segments before and after them, forming the final result.
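As an illustration, this final labelling step can be sketched as follows; the mapping from the before/after postures to DM1, DM2, AM and RL is inferred from the note of Table 2 and the discussion of Weary et al. [22] above, and is an assumption rather than the exact definition given in Table 2.

from collections import Counter

def segment_posture(frame_postures):
    # Majority vote over the frame-level posture labels of a merged segment.
    return Counter(frame_postures).most_common(1)[0][0]

def change_type(prev_posture, next_posture):
    lying = {"VTL", "LTL"}
    if prev_posture == "STD" and next_posture in lying:
        return "DM1"            # assumed: standing to lying (descending movement 1)
    if prev_posture == "SIT" and next_posture in lying:
        return "DM2"            # assumed: sitting to lying (descending movement 2)
    if prev_posture in lying and next_posture in lying:
        return "RL"             # assumed: rolling between ventral and lateral lying
    return "AM"                 # assumed: remaining upward (ascending) movements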
The Faster R-CNN model is evaluated with a classification confusion matrix [29], and the HMM is evaluated with accuracy. The localization-tube results are evaluated with the success plot [30], which reflects tracking accuracy. Posture-change segment recognition is evaluated with recall and precision [31], which jointly reflect the accuracy of split-point localization and segment classification. The metrics are defined as follows:
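For reference, the standard definitions assumed here are given below, where TP, FP, FN and TN denote the numbers of true positives, false positives, false negatives and true negatives; for segment-level evaluation, a detected segment is assumed to count as a true positive when its overlap with a manually labeled change segment exceeds the chosen threshold and the classes agree, and the success rate of [30] is the fraction of frames whose detected box overlaps the ground-truth box by more than a given IoU threshold.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)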
The algorithm was developed in the following environment: an Intel(R) Xeon(R) E3-1246 v3 processor at 3.50 GHz, 32 GB of memory, an NVIDIA GTX 980Ti GPU, the Windows 7 64-bit operating system, and Matlab 2014b as the development platform.
On the test set, the classification confusion matrix of the Faster R-CNN model, the success plot of the localization tube, and the accuracy of the HMM are computed following the algorithm pipeline; finally, the precision and recall of posture-change recognition are evaluated for the algorithm as a whole.
Table 3 shows the classification confusion matrix of the Faster R-CNN model (IoU > 0.7; test set described in Section 1.2). The average precision and average recall over the four postures were 98.49% and 92.35%, and the detection speed was 0.058 s per frame.
Table 3 shows that the Faster R-CNN model has high recognition precision; its detection time is below the sampling interval (0.2 s), so it can run in real time, and it generalizes well to sows of different body colors and sizes.
Table 3 Classification confusion matrix of the Faster R-CNN model on the test set
Note: STD represents standing, SIT represents sitting, VTL represents ventral lying, and LTL represents lateral lying. The same below.
Fig.5 shows that the optimized localization tube tracks the sow well in the piggery, effectively overcoming the influence of heat-lamp lighting and sow body deformation on tracking, which indicates that the proposed algorithm is highly robust.
Fig.5 Success plots under different IoU thresholds
Using the first three steps of the algorithm, 433 suspected change segments were detected in the test set; manual inspection identified 158 of them as posture-change segments and the remaining 275 as non-change segments. When the HMM classified them into change and non-change segments, the numbers of true positives and true negatives were 153 and 266, giving an accuracy of 96.77%.
The HMM can thus effectively classify suspected change segments. Segments in which a posture change actually occurred but was classified as non-change mainly involved rolling and changes between sitting and ventral lying, because the body movement before and after such changes is small and the height change is not obvious. Segments in which no change occurred but were classified as posture changes were mainly caused by depth-image noise: parts of the sow body were missing, so the heights of body parts were computed incorrectly.
Feeding the B2 and B3 videos into the proposed algorithm yields the posture-change recognition results. Table 4 shows the classification confusion matrix of the four posture-change classes (overlap threshold of 0.5). The test set contains 156 posture changes in total; 139 detected posture-change segments agreed in class with the manual segmentation, giving an average precision of 93.67% and a recall of 87.84%.
Table 4 Confusion matrix of posture-change classification on the test set
Because the sow changes posture infrequently during most periods, Fig.6 shows only 15 min video segments from the test results that contain relatively many changes, to highlight the effect of the algorithm. The pink parts indicate posture-change segments, and the other colors indicate the different postures. Compared with single-frame Faster R-CNN detection, the proposed algorithm produces better segmentation. In Fig.6a, Faster R-CNN misclassifies many frames while the sow is standing or sitting; the localization tube and the HMM classify these suspected change segments as non-change segments and thereby correct the frame labels. Fig.6b applies the algorithm to night-time scenes in which the selected segments are all poorly lit, yet most segments are located correctly, indicating that the algorithm generalizes well to lighting variation and overcomes the difficulty that ambient light poses for recognition. In Fig.6c, however, the split point of the last change is not accurate, mainly because some frames after the posture change are misclassified, and the sliding-window method extracts segments with frequent changes from the whole posture sequence but cannot precisely locate the start and end frames of the posture-change segment. Fig.6d has too many split points, mainly because during a posture change the sow, to avoid crushing piglets, first moves to an intermediate transitional posture and pauses, looking around and nosing the piglets, and only continues the change after confirming that the piglets are safe. The uncertainty of this transitional posture is easily misrecognized as two separate posture changes, producing extra split points.
Fig.7 shows the automatic posture-change recognition results of the proposed algorithm on the continuous 29 h test video (B3); 84 changes were detected in total. From 21:00 to 7:00 the sow rests and changes posture infrequently, whereas posture changes are more frequent during the day; the recognition results agree with manual video observation and with known patterns of sow behavior [6].
Fig.6 Segmentation results of some test samples
Fig.7 Automatic recognition results of the proposed algorithm on a 29 h continuous video
This study provides an approach to recognizing posture-change behavior for automatic identification of high-dangerous movements of lactating sows, and lays a foundation for subsequent research such as sow welfare assessment. It still has the following limitations: 1) the recognition accuracy of the Faster R-CNN model needs to be improved, as some sow postures are hard to distinguish in depth images; future work will consider fusing RGB images. 2) The accuracy of split-point localization and segment classification needs to be improved; a better segmentation algorithm would further benefit recognition of sow posture changes.
[1] Marchant J N, Broom D M, Corning S. The influence of sow behaviour on piglet mortality due to crushing in an open farrowing system[J]. Animal Science, 2001, 72(1): 19-28.
[2] Lao F, Brown-Brandl T, Stinn J P, et al. Automatic recognition of lactating sow behaviors through depth image processing[J]. Computers and Electronics in Agriculture, 2016, 125: 56-62.
[3] Wang J S, Wu M C, Chang H L, et al. Predicting parturition time through ultrasonic measurement of posture changing rate in crated landrace sows[J]. Asian Australasian Journal of Animal Sciences, 2007, 20(5): 682-692.
[4] Thompson R, Matheson S M, Pl?tz T, et al. Porcine lie detectors: Automatic quantification of posture state and transitions in sows using inertial sensors[J]. Computers and Electronics in Agriculture, 2016, 127: 521-530.
[5] 閆麗, 沈明霞, 謝秋菊, 等. 哺乳母豬高危動(dòng)作識(shí)別方法研究[J]. 農(nóng)業(yè)機(jī)械學(xué)報(bào), 2016, 47(1): 266-272. Yan Li, Shen Mingxia, Xie Qiuju, et al. Research on recognition method of lactating sows' dangerous body movement[J]. Transactions of the Chinese Society for Agricultural Machinery, 2016, 47(1): 266-272. (in Chinese with English abstract)
[6] Zheng C, Zhu X, Yang X, et al. Automatic recognition of lactating sow postures from depth images by deep learning detector[J]. Computers and Electronics in Agriculture, 2018, 147: 51-63.
[7] 薛月菊, 朱勛沐, 鄭嬋, 等. 基于改進(jìn)Faster R-CNN識(shí)別深度視頻圖像哺乳母豬姿態(tài)[J]. 農(nóng)業(yè)工程學(xué)報(bào), 2018, 34(9): 189-196. Xue Yueju, Zhu Xunmu, Zheng Chan, et al. Lactating sow postures recognition from depth image of videos based on improved Faster R-CNN[J]. Transactions of the Chinese Society for Agricultural Engineering (Transactions of the CSAE), 2018, 34(9): 189-196. (in Chinese with English abstract)
[8] Jhuang H, Gall J, Zuffi S, et al. Towards understanding action recognition[C]// IEEE International Conference on Computer Vision, 2014.
[9] Simonyan K, Zisserman A. Two-stream convolutional networks for action recognition in videos[C]. Advances in Neural Information Processing Systems, 2014: 568-576.
[10] Aggarwal J K, Ryoo M S. Human activity analysis: A review[J]. Acm Computing Surveys, 2011, 43(3): 1-43.
[11] Wang H, Schmid C. Action recognition with improved trajectories[C]// Proceedings of the IEEE International Conference on Computer Vision, 2013: 3551-3558.
[12] Peng X, Zou C, Qiao Y, et al. Action recognition with stacked fisher vectors[C]// European Conference on Computer Vision, 2014: 581-595.
[13] Du Y, Yuan C, Li B, et al. Hierarchical nonlinear orthogonal adaptive-subspace self-organizing map based feature extraction for human action recognition[C]// Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[14] Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(1): 221-231.
[15] Feichtenhofer C, Pinz A, Zisserman A. Convolutional two-stream network fusion for video action recognition[C]. Computer Vision & Pattern Recognition, 2016.
[16] Gkioxari G, Malik J. Finding action tubes[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015: 759-768.
[17] Saha S, Singh G, Sapienza M, et al. Deep learning for detecting multiple space-time action tubes in videos[C]// British Machine Vision Conference, 2016.
[18] Miao M, Marturi N, Li Y, et al. Region-sequence based six-stream CNN features for general and fine-grained human action recognition in videos[J]. Pattern Recognition, 2017, 76: 506-521.
[19] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[20] Rabiner L, Juang B. An introduction to hidden Markov models[J]. IEEE ASSP Magazine, 1986, 3(1): 4-16.
[21] Moustsen V, Hales J, Lahrmann H, et al. Confinement of lactating sows in crates for 4 days after farrowing reduces piglet mortality[J]. Animal, 2013, 7(4): 648-654.
[22] Weary D M, Pajor E A, Fraser D, et al. Sow body movements that crush piglets: A comparison between two types of farrowing accommodation[J]. Applied Animal Behaviour Science, 1996, 49(2): 149-158.
[23] Ohtsu N. A threshold selection method from gray-level histograms[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9(1): 62-66.
[24] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[25] Wen Y, Zhang K, Li Z, et al. A Discriminative Feature Learning Approach for Deep Face Recognition[M]. Springer International Publishing, 2016: 499-515.
[26] Jia Y, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding[C]// Proceedings of the 22nd ACM International Conference on Multimedia, 2014: 675-678.
[27] Viterbi A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm[J]. IEEE Transactions on Information Theory, 1967, 13(2): 260-269.
[28] Guo D, Zhou W, Wang M, et al. Sign language recognition based on adaptive HMMs with data augmentation[C]// IEEE International Conference on Image Processing, 2016: 2876-2880.
[29] Stehman S V. Selecting and interpreting measures of thematic classification accuracy[J]. Remote Sensing of Environment, 1997, 62(1): 77-89.
[30] Wu Y, Lim J, Yang M H. Online object tracking: A benchmark[C]// Computer Vision and Pattern Recognition, 2013: 2411-2418.
[31] Gao J, Sun C, Yang Z, et al. TALL: Temporal activity localization via language query[C]// Proceedings of the IEEE International Conference on Computer Vision, 2017: 5277-5285.
Lactating sow high-dangerous body movement recognition from depth videos based on hidden Markov model
Xue Yueju1, Yang Xiaofan1, Zheng Chan2, Chen Changxin1, Gan Haiming1, Li Shimei1
(1. College of Electronic Engineering, South China Agricultural University, Guangzhou 510642, China; 2. College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China)
The high-dangerous body movements of lactating sows are closely related to the survival rate of piglets and directly reflect their maternal behavioral ability, and these movements are closely related to the frequency and duration of posture changes. In commercial piggeries, illumination variation, heat lamps, adhesion between the sow and piglets, body deformation, etc., bring great difficulties and challenges to the automatic identification of the posture changes of lactating sows. This study took the Small-ears Spotted sows raised on the Lejiazhuang farm in Sanshui District, Foshan City, Guangdong Province as the research object and used the depth video images collected by Kinect 2.0 as the data source. We proposed a localization and recognition algorithm for sow posture changes based on Faster R-CNN and HMM (hidden Markov model) from the depth videos. Our algorithm consists of five steps: 1) a 5×5 median filter is used to denoise the video images, and the contrast of the images is then improved by contrast-limited adaptive histogram equalization; 2) an improved Faster R-CNN model is used to detect the most probable posture in each frame to form a posture sequence, and the five detection boxes with the highest probabilities are reserved as candidate regions for action-tube generation in the third step; 3) a sliding window with a length of 20 frames and a step size of 1 frame is used to detect the suspected change segments in the video, and the maximum-score action tube of each suspected change segment is then constructed by the Viterbi algorithm; 4) each frame in the suspected change segments is segmented by Otsu thresholding and morphological processing, and the heights of the sow trunk, tail and both sides of the body are calculated to form height sequences; 5) the height sequence of each suspected change segment is fed into the HMM model and classified as posture change or non-change, and the posture-change segments are finally classified according to the classes of the preceding and following single-posture segments. According to the degree of threat to piglets and the frequency of sow behavior, the posture changes of sows were divided into four categories: descending movement 1, descending movement 2, ascending movement and rolling. The data set included 240 video segments covering different sow sizes, postures and posture changes; 120 of these segments were used for training and the remaining 120 for validation of the HMM, while the Faster R-CNN model was trained and validated on annotated depth images. Our Faster R-CNN model was trained with the Caffe deep learning framework on an NVIDIA GTX 980Ti GPU (graphics processing unit), and the algorithm was developed on the Matlab 2014b platform. The experimental results showed that the Faster R-CNN model had high recognition accuracy, and the detection time of each frame was 0.058 seconds, so the model could practicably be used in a real-time vision system. For sows with different body colors and sizes, the Faster R-CNN model had good generalization ability. The HMM model could effectively identify the posture-change segments from the suspected change segments with an accuracy of 96.77%. The precision of posture-change identification was 93.67%, and the recall rates of the four classes of posture change, i.e. descending movement 1, descending movement 2, ascending movement and rolling, were 90%, 84.21%, 90.77% and 86.36%, respectively. The success rate was 97.40% when the overlap threshold was 0.7, which showed that the optimized localization tube had a good tracking effect for sows in commercial piggeries, effectively overcoming the influence of heat lamps and sow body deformation. Our method can provide a technical reference for 24-hour automatic recognition of lactating sow posture changes and lays a foundation for subsequent research on sow high-dangerous body movement recognition and welfare evaluation.
image processing; algorithms; models; high-dangerous body movement; lactating sows; Faster R-CNN; HMM(hidden Markov model); action tube
10.11975/j.issn.1002-6819.2019.13.021
TP391
A
1002-6819(2019)-13-0184-07
2018-12-18
2019-04-24
National Science and Technology Support Program (2015BAD06B03-3); Science and Technology Planning Project of Guangdong Province (2015A020209148); Applied Science and Technology Research and Development Project of Guangdong Province (2015B010135007); Science and Technology Program of Guangzhou (201605030013); Science and Technology Program of Guangzhou (201604016122)
Xue Yueju, Han nationality, born in Wusu, Xinjiang, professor; research interests: machine vision and image processing. Email: xueyueju@163.com
Xue Yueju, Yang Xiaofan, Zheng Chan, Chen Changxin, Gan Haiming, Li Shimei. Lactating sow high-dangerous body movement recognition from depth videos based on hidden Markov model[J]. Transactions of the Chinese Society for Agricultural Engineering (Transactions of the CSAE), 2019, 35(13): 184-190. (in Chinese with English abstract) doi:10.11975/j.issn.1002-6819.2019.13.021 http://www.tcsae.org