YANG Shiqiang, LUO Xiaoyu, QIAO Dan, LIU Peilei, LI Dexin
CLC number: TP391.4
Document code: A
Abstract: Concerning the facts that continuous action recognition has received relatively little attention in the field of action recognition and that a single algorithm performs poorly on continuous actions, a segmentation and recognition method for continuous actions was proposed that combines the sliding window method and the dynamic programming method on the basis of single action modeling. Firstly, each single action model was constructed with a hybrid of the Deep Belief Network and Hidden Markov Model (DBN-HMM). Secondly, the log-likelihoods of the trained action models and the sliding window method were used to estimate scores over the continuous action sequence and detect the initial segmentation points. Thirdly, the dynamic programming method was used to optimize the locations of the segmentation points and recognize the single actions. Finally, segmentation and recognition experiments were conducted on the public action database MSR Action3D. The experimental results show that dynamic programming based on sliding windows can optimize the selection of segmentation points and thereby improve recognition accuracy, so the proposed method can be used for continuous action recognition.
Key words: Hidden Markov Model (HMM); action segmentation; action recognition; sliding window; dynamic programming
0 Introduction
人體動作識別是近年來諸多鄰域研究的熱點[1], 如視頻監(jiān)控[2]、人機交互[3]等領域。隨著人口老齡化,服務機器人將在未來的日常生活中發(fā)揮重要作用,觀察和反映人類行動將成為服務機器人的基本技能[4]。動作識別逐漸應用到人們生活和工作的各個方面,具有深遠的應用價值。
Human behavior generally manifests as continuous actions containing multiple single actions. According to the order of segmentation and recognition, behavior recognition methods can be divided into direct segmentation and indirect segmentation. Direct segmentation first determines segmentation boundaries from simple changes in parameter magnitudes and then recognizes the segmented clips; for example, Bai et al.[5] performed initial segmentation of action sequences according to changes in joint velocity and joint angle. This approach is simple and fast, but its segmentation error is large for more complex continuous actions. In indirect segmentation, segmentation and recognition are performed simultaneously: in practice the two are coupled, segmentation results affect recognition, and segmentation generally requires the support of recognition. Algorithms widely used in continuous action recognition include Dynamic Time Warping (DTW)[6], Continuous Dynamic Programming (CDP)[7] and the Hidden Markov Model (HMM).
Gong et al.[8] used Dynamic Manifold Warping (DMW) to compute the similarity between two multivariate time series and thereby achieve action segmentation and recognition. Zhu et al.[9] used an online segmentation method based on feature displacement to divide the feature sequence into posture feature segments and motion feature segments, and computed, through online model matching, the likelihood that each feature segment can be labeled as an extracted key posture or atomic motion. Lei et al.[10] proposed a hierarchical framework combining a Convolutional Neural Network (CNN) and an HMM (CNN-HMM) that segments and recognizes continuous actions simultaneously; it extracts effective and robust action features, achieves good recognition results on action video sequences, and the HMM offers strong extensibility. Kulkarni et al.[11] designed a visual alignment technique, Dynamic Frame Warping (DFW), which trains a super template for each action video and can segment and recognize multiple actions; however, at test time the computation of distances between test frames and the templates is expensive, and compared with probabilistic methods the model's learning ability in training is weaker. Evangelidis et al.[12] adopted sliding windows to construct frame-wise Fisher vectors classified by a multi-class Support Vector Machine (SVM); because the sliding window fixes the length of the action sequence, recognition is poor for actions of the same class whose lengths differ considerably.
In this paper, single actions are modeled with DBN-HMM, a hybrid model combining the Deep Belief Network (DBN) and the HMM; this hybrid model has strong modeling and learning capability for time-series data. A scoring mechanism together with the sliding window method is then used to detect initial segmentation points, and finally dynamic programming is applied to optimize the segmentation points and recognize the actions. The sliding window reduces the computational complexity of dynamic programming, while dynamic programming compensates for the window's fixed length, so that optimal segmentation points can ultimately be detected.
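As a concrete illustration of this pipeline, the following Python sketch scores fixed-length sliding windows under every trained single-action model, takes local minima of the best per-window score as initial segmentation points, and then refines those points by dynamic programming within a small search radius. All names here (`log_likelihood`, `initial_cuts`, `refine_and_label`, `win_len`, `radius`) are illustrative, and a simple Gaussian scorer stands in for the DBN-HMM log-likelihood; this is a minimal sketch of the idea, not the paper's exact procedure.

```python
import numpy as np

def log_likelihood(model, segment):
    """Stand-in for the DBN-HMM forward log-likelihood: an i.i.d.
    Gaussian score with the model's per-dimension mean and variance."""
    mean, var = model
    return -0.5 * np.sum((segment - mean) ** 2 / var + np.log(2 * np.pi * var))

def initial_cuts(frames, models, win_len=30, stride=5):
    """Slide a fixed-length window over the sequence, score it under every
    trained model, and take local minima of the best per-window score as
    candidate boundaries (a window straddling two actions tends to score
    low under all models; this detection rule is an assumption)."""
    T = len(frames)
    starts = np.arange(0, T - win_len + 1, stride)
    best = np.array([max(log_likelihood(m, frames[s:s + win_len]) / win_len
                         for m in models) for s in starts])
    centers = starts + win_len // 2
    return [int(centers[i]) for i in range(1, len(best) - 1)
            if best[i] < best[i - 1] and best[i] < best[i + 1]]

def refine_and_label(frames, models, cuts, radius=5):
    """Dynamic programming around the initial cuts: each cut may shift by
    up to +/- radius frames; choose the shifts (and per-segment model
    labels) that maximize the total length-normalized log-likelihood."""
    T = len(frames)
    cands = ([[0]]
             + [sorted({min(T - 1, max(1, c + d))
                        for d in range(-radius, radius + 1)}) for c in cuts]
             + [[T]])
    # dp[i][j] = (best score with cut i at cands[i][j], backpointer, label)
    dp = [[(-np.inf, -1, -1)] * len(c) for c in cands]
    dp[0][0] = (0.0, -1, -1)
    for i in range(1, len(cands)):
        for j, e in enumerate(cands[i]):
            for p, s in enumerate(cands[i - 1]):
                if s >= e or dp[i - 1][p][0] == -np.inf:
                    continue
                scores = [log_likelihood(m, frames[s:e]) / (e - s)
                          for m in models]
                k = int(np.argmax(scores))
                total = dp[i - 1][p][0] + scores[k]
                if total > dp[i][j][0]:
                    dp[i][j] = (total, p, k)
    # backtrack the optimized cut positions and per-segment labels
    j = int(np.argmax([d[0] for d in dp[-1]]))
    cuts_out, labels = [], []
    for i in range(len(cands) - 1, 0, -1):
        _, p, k = dp[i][j]
        cuts_out.append(cands[i][j])
        labels.append(k)
        j = p
    return cuts_out[::-1], labels[::-1]

# Example: two synthetic "actions" drawn around different means
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0, 1, (40, 6)), rng.normal(3, 1, (50, 6))])
models = [(np.zeros(6), np.ones(6)), (3 * np.ones(6), np.ones(6))]
cuts, labels = refine_and_label(frames, models, initial_cuts(frames, models))
```

The candidate sets restrict the dynamic program to a small neighborhood of each initial cut, which is what keeps the search cheap compared with optimizing over every frame position.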
1 Single action modeling
In continuous action recognition, each single action in the continuous sequence is first modeled separately; here the hybrid model DBN-HMM, which combines the DBN and the HMM, is used for action modeling.
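Before turning to the details, the following sketch illustrates how a hybrid model of this kind typically scores an observation sequence: the DBN's softmax outputs p(q|o_t) are converted into scaled emission likelihoods by dividing by the state priors, and the HMM forward algorithm accumulates them with the transition probabilities. This follows the standard hybrid NN/HMM recipe and uses random placeholder posteriors; the paper's exact architecture and parameters may differ.

```python
import numpy as np
from scipy.special import logsumexp

def hybrid_log_emissions(posteriors, state_priors):
    """Turn DBN state posteriors p(q|o_t) into scaled emission
    log-likelihoods: log p(o_t|q) = log p(q|o_t) - log p(q) + const."""
    return np.log(posteriors + 1e-12) - np.log(state_priors)

def forward_log_likelihood(log_b, log_pi, log_A):
    """HMM forward algorithm in log space; returns log p(o_1..o_T)."""
    alpha = log_pi + log_b[0]                      # (N,) over states
    for t in range(1, len(log_b)):
        alpha = log_b[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return logsumexp(alpha)

# Example: a 3-state left-to-right HMM with random stand-in posteriors
T, N = 20, 3
rng = np.random.default_rng(1)
posteriors = rng.dirichlet(np.ones(N), size=T)     # placeholder DBN outputs
log_b = hybrid_log_emissions(posteriors, np.full(N, 1.0 / N))
log_pi = np.log(np.array([1.0, 1e-12, 1e-12]))
log_A = np.log(np.array([[0.7, 0.3, 1e-12],
                         [1e-12, 0.7, 0.3],
                         [1e-12, 1e-12, 1.0]]))
print(forward_log_likelihood(log_b, log_pi, log_A))
```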
1.1 特征提取
A human action can be represented as rotations of different limbs in three-dimensional space. Combined with a human skeleton model composed of joint points, a human posture can be represented by the 3D spatial coordinates of 20 joints: head, left/right shoulder, shoulder center, left/right elbow, spine center, left/right wrist, left/right hand, left/right hip, hip center, left/right knee, left/right ankle and left/right foot. In the limb angle model, a limb is represented by the relative spatial position of two adjacent joints among these 20. Assuming that all joints extend from the spine joint, in a limb formed by two adjacent joints the joint closer to the spine is defined as the parent joint and the other as the child joint. The relative position of each limb is expressed by converting the world coordinate system into a local spherical coordinate system: the parent joint of the limb is taken as the origin, the length of the line connecting the child joint to the parent joint is r, the angle between this line and the Z axis is φ, and the angle between its projection onto the XOY plane and the X axis is θ, so a limb angle model can be expressed as (r, θ, φ), as shown in Fig. 1. Since the distance r is affected by body size, it is discarded and the limb angle model is represented by (θ, φ).
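For illustration, the following sketch computes the (θ, φ) feature of one limb from the world coordinates of its parent and child joints, following the spherical-coordinate definitions above; the function name and the example joint positions are hypothetical.

```python
import numpy as np

def limb_angles(parent, child):
    """Limb angle feature (theta, phi) for one limb.

    parent, child : (..., 3) world coordinates of the two joints.
    The parent joint is the origin of the local spherical system; phi is
    the angle between the limb vector and the Z axis, theta the angle of
    its XOY-plane projection with the X axis. The length r is computed
    but dropped, removing the influence of body size."""
    v = np.asarray(child, float) - np.asarray(parent, float)
    r = np.linalg.norm(v, axis=-1)
    phi = np.arccos(np.clip(v[..., 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    theta = np.arctan2(v[..., 1], v[..., 0])
    return theta, phi

# e.g. a hypothetical right elbow (parent) -> right wrist (child) pair
theta, phi = limb_angles([0.2, 0.0, 1.2], [0.45, 0.05, 1.0])
```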
5 Conclusion
Concerning the facts that continuous action recognition has received little attention in existing action recognition research and that a single algorithm performs poorly on continuous actions, this paper presented a segmentation and recognition method for continuous actions that combines the sliding window method with dynamic programming. The constructed DBN-HMM has strong modeling capability, and combining the sliding window with dynamic programming to detect segmentation points makes the two methods complementary: computational complexity is reduced while the limitation of a fixed window length is overcome. The experimental results show that the proposed method achieves good results in the segmentation and recognition of complex continuous actions. There is still room to improve the recognition rate, and future work will consider action segmentation and recognition on self-collected continuous action videos.
References
[1] HU Q, QIN L, HUANG Q M. A survey on visual human action recognition [J]. Chinese Journal of Computers, 2013, 36(12): 2512-2524. (in Chinese)
[2] AGGARWAL J K, RYOO M S. Human activity analysis: a review [J]. ACM Computing Surveys, 2011, 43(3): Article No. 16.
[3] KOPPULA H S, SAXENA A. Anticipating human activities using object affordances for reactive robotic response [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 38(1): 1-14.
[4] ZHANG C, TIAN Y. RGB-D camera-based daily living activity recognition [J]. Journal of Computer Vision and Image Processing, 2012, 2(4): 1-7.
[5] BAI D T, ZHANG L, HUANG H. Recognition of continuous human actions from RGB-D videos [J]. China Science Paper, 2016(2): 168-172. (in Chinese)
[6] DARRELL T, PENTLAND A. Space-time gestures [C]// Proceedings of the 1993 IEEE Conference on Computer Vision and Pattern Recognition. Washington, DC: IEEE Computer Society, 1993: 335-340.
[7] OKA R. Spotting method for classification of real world data[J]. Computer Journal, 1998, 41(8): 559-565.
[8] GONG D, MEDIONI G, ZHAO X. Structured time series analysis for human action segmentation and recognition [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(7): 1414-1427.
[9] ZHU G, ZHANG L, SHEN P, et al. An online continuous human action recognition algorithm based on the Kinect sensor [J]. Sensors, 2016, 16(2): 161-179.
[10] LEI J, LI G, ZHANG J, et al. Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model[J]. IET Computer Vision, 2016, 10(6): 537-544.
[11] KULKARNI K, EVANGELIDIS G, CECH J, et al. Continuous action recognition based on sequence alignment[J]. International Journal of Computer Vision, 2015, 112(1): 90-114.
[12] EVANGELIDIS G D, SINGH G, HORAUD R. Continuous gesture recognition from articulated poses [C]// Proceedings of the 2014 European Conference on Computer Vision. Cham: Springer, 2014: 595-607.
[13] SONG Y, GU Y, WANG P, et al. A Kinect based gesture recognition algorithm using GMM and HMM [C]// Proceedings of the 2013 6th International Conference on Biomedical Engineering and Informatics. Piscataway, NJ: IEEE, 2013: 750-754.
[14] VITERBI A J. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm [J]. IEEE Transactions on Information Theory, 1967, 13(2): 260-269.
[15] TAYLOR G W, HINTON G E, ROWEIS S. Modeling human motion using binary latent variables [C]// Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007: 1345-1352.
[16] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18(7): 1527-1554.
[17] LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points [C]// Proceedings of the 2010 IEEE Computer Vision and Pattern Recognition Workshops. Washington, DC: IEEE Computer Society, 2010: 9-14.