CLC number: TP391.41  Document code: A  Article ID: 2095-2457(2019)29-0112-002
DOI:10.19694/j.cnki.issn2095-2457.2019.29.051
Deep Learning and Metric Learning Based Person Re-identification
HOU Li, LIU Qi
(School of Information Engineering, Huangshan University, Huangshan, Anhui 245041, China)
【Abstract】Pedestrians may vary greatly in appearance due to differences in illumination, viewpoint, and pose across cameras, which brings serious challenges to person re-identification. This paper proposes a person re-identification algorithm based on deep learning and metric learning. Features of pedestrian images are first extracted by a feature fusion network (FFN) that combines handcrafted and deep features, and a kernel matrix is then applied to KISSME distance metric learning to obtain a better distance metric model. Experimental results show that the proposed algorithm effectively improves recognition rates on two challenging datasets (VIPeR, PRID450S).
【Key words】Person re-identification; Feature fusion net; Deep learning; Distance metric learning
0 Introduction
Person re-identification is an intelligent video analysis technique of great significance for cross-camera tracking of pedestrian targets and for pedestrian behavior analysis. The task is to have a computer judge whether pedestrian images captured by different cameras belong to the same identity, matching images across cameras by pedestrian appearance. Because surveillance scenes are highly variable and cross-camera appearance changes are complex, person re-identification remains a very challenging research problem.
Current research on person re-identification concentrates on two aspects: extracting discriminative features to describe pedestrian appearance [1-11], and exploring discriminative distance metric learning methods [12-18]. However, when matching pedestrians across cameras, most handcrafted features (color, texture, shape, etc.) are either insufficiently discriminative or not robust to viewpoint changes. Deep features compensate for these shortcomings to some extent, but obtaining a good feature model requires supervised learning over a large number of samples. Distance metric learning likewise alleviates cross-camera appearance differences to some extent, yet with limited training data it may fail to yield an optimal cross-camera distance metric.
To better handle the significant cross-camera variations in pedestrian appearance, this paper combines deep learning and metric learning for person re-identification; the algorithm pipeline is shown in Fig. 1. Discriminative features are first extracted from the training samples by the feature fusion network (FFN), which fuses handcrafted and deep features; the kernel matrix K is then applied to KISSME distance metric learning to obtain a better distance metric model, thereby improving the accuracy and robustness of person re-identification.
Fig. 1 Algorithm pipeline
1 Discriminative Feature Extraction
To describe pedestrian appearance more accurately, this paper adopts the feature fusion network (FFN), which fuses handcrafted and deep features, to extract pedestrian image features [3], as shown in Fig. 2. The FFN consists of two sub-networks. The first sub-network processes the input pedestrian image with a conventional CNN (convolution, pooling, and activation functions); the second represents the same image with additional handcrafted features (RGB, HSV, LAB, YCbCr, and YIQ color features, plus Gabor texture features). The two sub-networks jointly form a fuller description of the pedestrian image, with the second sub-network steering the learning direction of the first during feature learning. Finally, the fusion layer produces a 4096-dimensional FFN feature vector (a code sketch follows Fig. 2).
Fig. 2 Illustration of FFN feature extraction [3]
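To make the two-branch structure concrete, below is a minimal PyTorch sketch of an FFN-style network. It is only a sketch: the layer sizes, the handcrafted-descriptor dimensionality, and the names FFNSketch and handcrafted_dim are assumptions rather than the exact configuration of [3]; only the overall design (a CNN branch, a handcrafted-feature branch, and a fusion layer producing a 4096-dimensional vector) follows the description above.

```python
import torch
import torch.nn as nn

class FFNSketch(nn.Module):
    """Two-branch feature fusion network in the spirit of FFN [3].

    Branch 1 learns deep features with a small CNN; branch 2 embeds a
    precomputed handcrafted descriptor (color histograms + Gabor
    responses). A fusion layer maps their concatenation to a 4096-d
    feature. Layer sizes are illustrative, not those of [3].
    """

    def __init__(self, handcrafted_dim: int, fused_dim: int = 4096):
        super().__init__()
        self.cnn = nn.Sequential(  # deep branch
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),  # -> 64 * 4 * 4 = 1024-d deep feature
        )
        self.hand = nn.Sequential(  # handcrafted-feature branch
            nn.Linear(handcrafted_dim, 1024), nn.ReLU(),
        )
        self.fusion = nn.Sequential(  # fusion layer -> 4096-d FFN feature
            nn.Linear(1024 + 1024, fused_dim), nn.ReLU(),
        )

    def forward(self, image: torch.Tensor, handcrafted: torch.Tensor):
        f_deep = self.cnn(image)
        f_hand = self.hand(handcrafted)
        return self.fusion(torch.cat([f_deep, f_hand], dim=1))
```

After training (e.g., with an identity-classification loss), the 4096-dimensional fusion output would serve as the feature vector passed to the metric learning stage below.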
2 Kernel Distance Metric Learning
To mitigate cross-camera variations in pedestrian appearance, the matching stage applies the kernel-trick-based KISSME [12] distance metric learning method to obtain an optimal Mahalanobis distance metric model.
Given a sample pair (x_i, x_j), its Mahalanobis distance is defined as in Eq. (1):

d_M^2(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)    (1)

where M = \Sigma_S^{-1} - \Sigma_D^{-1} is a positive semi-definite Mahalanobis matrix that can easily be learned from the training samples. Here \Sigma_S = \frac{1}{|S|} \sum_{(i,j) \in S} (x_i - x_j)(x_i - x_j)^T and \Sigma_D = \frac{1}{|D|} \sum_{(i,j) \in D} (x_i - x_j)(x_i - x_j)^T denote the covariance matrices of the set S of similar pairs and the set D of dissimilar pairs of pedestrian images, respectively.
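These statistics translate directly into code. The following NumPy sketch estimates M under the definitions above; the names kissme_metric and mahalanobis_dist are ours, the eigenvalue clipping follows the common practice of projecting M back onto the PSD cone, and in practice the 4096-dimensional FFN features would first be reduced (e.g., by PCA) so that \Sigma_S and \Sigma_D stay invertible.

```python
import numpy as np

def kissme_metric(X, pair_idx, is_similar):
    """KISSME estimate M = inv(Sigma_S) - inv(Sigma_D) [12].

    X          : (n, d) feature matrix (rows are image features).
    pair_idx   : (m, 2) integer index pairs (i, j).
    is_similar : (m,) boolean, True for same-identity pairs (set S).
    """
    diffs = X[pair_idx[:, 0]] - X[pair_idx[:, 1]]  # pairwise differences
    d_s, d_d = diffs[is_similar], diffs[~is_similar]
    sigma_s = d_s.T @ d_s / len(d_s)  # covariance of similar pairs
    sigma_d = d_d.T @ d_d / len(d_d)  # covariance of dissimilar pairs
    M = np.linalg.inv(sigma_s) - np.linalg.inv(sigma_d)
    # Clip negative eigenvalues to keep M positive semi-definite.
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def mahalanobis_dist(M, x_i, x_j):
    """Squared Mahalanobis distance of Eq. (1)."""
    diff = x_i - x_j
    return float(diff @ M @ diff)
```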
This paper uses the kernel trick to map the sample feature vectors from the input feature space into a high-dimensional kernel space. The kernel matrix K between sample feature vectors is obtained through a kernel function, i.e., K = \Phi^T(X)\Phi(X), where X denotes the sample features and \Phi(X) the nonlinear mapping from the input feature space to the kernel space. Introducing a kernel function avoids the "curse of dimensionality" and greatly reduces computation, and the algorithm's performance can be improved by freely choosing a suitable kernel function.
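As an illustration of this mapping, the snippet below builds the kernel matrix with scikit-learn and represents every sample by its row of kernel values against the training set; KISSME is then learned on these kernel-space representations rather than on the raw features. The random arrays stand in for FFN features, and since the paper does not state which kernel function it uses, the RBF kernel here is an assumption.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))  # stand-in for training FFN features
X_test = rng.normal(size=(50, 64))    # stand-in for test FFN features

gamma = 1.0 / X_train.shape[1]
K_train = rbf_kernel(X_train, X_train, gamma=gamma)  # (200, 200) kernel matrix K
K_test = rbf_kernel(X_test, X_train, gamma=gamma)    # test samples vs. training set

# Rows of K_train / K_test are the kernel-space representations; the
# KISSME metric (kissme_metric above) is then learned on these rows
# instead of on the raw feature vectors.
```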
3 Experimental Results
The cumulative matching characteristic (CMC) of the proposed algorithm is evaluated on two challenging public datasets, VIPeR and PRID450S. Half of the pedestrian identities are randomly selected as the training set and the other half as the test set; the training samples are used to learn the distance metric model, and the test samples are used to measure feature distances between cross-camera pedestrian images.
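For reference, the CMC can be computed from a probe-gallery distance matrix as in the sketch below (single-shot setting with one gallery image per identity, as in the VIPeR and PRID450S protocols); the function name cmc and its interface are ours.

```python
import numpy as np

def cmc(dist, probe_ids, gallery_ids, ranks=(1, 5, 10, 20)):
    """Cumulative matching characteristic from a distance matrix.

    dist        : (n_probe, n_gallery) cross-camera feature distances.
    probe_ids   : (n_probe,) identity label of each probe image.
    gallery_ids : (n_gallery,) identity label of each gallery image.
    Returns {rank: fraction of probes whose true match is in the top rank}.
    """
    order = np.argsort(dist, axis=1)  # gallery sorted by increasing distance
    match_rank = np.array([
        np.where(gallery_ids[order[p]] == probe_ids[p])[0][0]  # position of true match
        for p in range(dist.shape[0])
    ])
    return {r: float(np.mean(match_rank < r)) for r in ranks}
```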
Table 1 and Fig. 3 report the results on the VIPeR and PRID450S datasets. As Table 1 shows, with the same FFN features the recognition rate is higher on PRID450S: the rank-1 recognition rate is only 26.9% on VIPeR, versus 49.33% on PRID450S.
Table 1 Top recognition rates (%) on the VIPeR and PRID450S datasets; the cumulative matching scores at ranks 1, 5, 10, and 20 are listed.
Fig. 3 Top recognition rates (%) on the VIPeR and PRID450S datasets
4 Conclusion
This paper proposed a person re-identification algorithm based on deep learning and metric learning. Pedestrian image features are extracted by the FFN, which fuses deep and handcrafted features, and the kernel matrix K is applied to KISSME distance metric learning to obtain a better distance metric model. Experimental results on the two challenging person re-identification datasets VIPeR and PRID450S demonstrate the effectiveness of the proposed algorithm.
References
[1]S. Liao, Y. Hu, X. Zhu, and S. Z. Li, “Person re-identification by local maximal occurrence representation and metric learning,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, 2015.6.7-2015.6.12.
[2]T. Xiao, H. Li, W. Ouyang, and X. Wang, “Learning deep feature representations with domain guided dropout for person re-identification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 2016.6.26-2016.7.1.
[3]S. Wu, Y. C. Chen, X. Li, A. C. Wu, J. J. You, and W. S. Zheng, “An enhanced deep feature representation for person re-identification,” IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 2016.3.7-2016.3.9.
[4]D. Cheng, Y. Gong, S. Zhou, J. Wang, and N. Zheng, “Person re-identification by multi-channel parts-based CNN with improved triplet loss function,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, Nevada, USA, 2016.6.26-2016.7.1.
[5]Y. Chen, X. Zhu, and S. Gong, “Person re-identification by deep learning multi-scale representations,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 2017.7.21-2017.7.26.
[6]H. Zhao, et al., “Spindle net: Person re-identification with human body region guided feature decomposition and fusion,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, Hawaii, USA, 2017.7.21-2017.7.26.
[7]X. Liu, et al., “Hydraplus-net: Attentive deep features for pedestrian analysis,” IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.10.22-2017.10.29.
[8]Y. Sun, et al., “Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline),” European Conference on Computer Vision (ECCV), Munich, Germany, 2018.9.8-2018.9.14.
[9]L. Zhao, et al., “Deeply-learned part-aligned representations for person re-identification,” IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017.10.22-2017.10.29.
[10]L. He, et al., “Deep spatial feature reconstruction for partial person re-identification: Alignment-free approach,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 2018.6.18-2018.6.22.
[11]X. Chang, et al., “Multi-level factorisation net for person re-identification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 2018.6.18-2018.6.22.
[12]M. Koestinger, et al., “Large scale metric learning from equivalence constraints,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, Rhode Island, USA, 2012.6.16-2012.6.21.
[13]S. Pedagadi, et al., “Local fisher discriminant analysis for pedestrian re-identification,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Portland, Oregon, USA, 2013.6.23-2013.6.28.
[14]F. Xiong, M. Gou, O. Camps, M. Sznaier, “Person re-identification using kernel-based metric learning methods,” European conference on computer vision (ECCV), Zurich, Switzerland, 2014.9.6-2014.9.12.
[15]S. Paisitkriangkrai, C. Shen, A. Hengel, “Learning to rank in person re-identification with metric ensembles,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, Massachusetts, USA, 2015.6.7-2015.6.12.
[16]Y. Yang, S. Liao, Z. Lei, S. Z. Li, “Large scale similarity learning using similar pairs for person verification,” AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, USA, 2016.2.12-2016.2.17.
[17]L. Hou, K. Han, W. G. Wan, J-N Hwang, H. Y. Yao, “Normalized Distance Aggregation of Discriminative Features for Person Re-identification,” Journal of Electronic Imaging, 2018, 27(2): 023006.
[18]X. Yang, M. Wang, and D. Tao, “Person re-identification with metric learning using privileged information,” IEEE Transactions on Image Processing, 2018, 27(2): 791-805.