Image annotation based on structured grouping sparsity
Yuan Ying
(Department of Computer and Information Technology, Zhejiang Police College, Hangzhou, Zhejiang 310058, China)
DOI:10.16644/j.cnki.cn33-1094/tp.2016.09.005
Abstract: Heterogeneous features describe different aspects of the visual characteristics of an image, including global features (color, shape and texture) and local features (LBP and SIFT), and there is clear structural information among them. Different groups of heterogeneous features differ in their intrinsic discriminative power to characterize particular high-level semantics, so selecting the right features is of great significance for image annotation. To make effective use of the structural grouping effect among heterogeneous visual features, a high-dimensional feature selection method based on structured grouping sparsity is proposed and applied to image annotation. A comparison with three other algorithms on image annotation shows that the proposed algorithm achieves better annotation results.
Key words: heterogeneous features; group sparsity; feature selection; image annotation
CLC number: TP311  Document code: A  Article ID: 1006-8228(2016)09-17-04
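0 Introduction
With the rapid development of digital photography, network technology and storage technology, image data on the Internet has grown enormously. Many websites, such as Flickr and Wikipedia, let users upload, store and share photos free of charge and tag the pictures for browsing and searching. These sites generate and use massive amounts of image data every day, accompanied by large amounts of textual information such as annotations. However, this annotation information is often disordered and contains many errors. While such image data brings convenience to people's lives, it also makes finding the required information quickly and accurately an urgent problem. Annotating images correctly is therefore of great significance.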
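1 Overview of sparse representation
In recent years, compressed sensing (CS), which grew out of statistical signal processing, has attracted increasing attention. Compressed sensing reconstructs signals using the prior knowledge that the data is sparse and compressible. Combining the theory and methods of compressed sensing and feature selection to build more effective sparse representations of images has become a hot research topic in computer vision, machine learning and related fields. Tibshirani of Stanford University and Breiman of the University of California, Berkeley, proposed almost simultaneously the idea of the lasso (least absolute shrinkage and selection operator), which imposes an ℓ1-norm constraint on the feature coefficients [1-2]. This constraint encourages the selected features to be as sparse as possible, which keeps the results stable and makes the data processing more interpretable. However, lasso-based feature selection methods do not take into account the grouping effect among features, that is, the strong correlation between one feature (or class of features) and other features (or classes of features).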
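For reference, the lasso idea described above is commonly written as the penalized least-squares problem below, which is the Lagrangian counterpart of the ℓ1-norm constraint. The symbols (response vector y, feature matrix X with p columns, coefficient vector β, regularization weight λ) are notation assumed here for illustration; this section does not define them.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}} \;
  \frac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda\,\lVert \beta \rVert_1,
\qquad
\lVert \beta \rVert_1 \;=\; \sum_{j=1}^{p} \lvert \beta_j \rvert .
```

The ℓ1 term can set individual coefficients exactly to zero, but it penalizes every coefficient separately, so it has no way of keeping or discarding correlated features together.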
To overcome this shortcoming, this paper exploits the group sparsity of heterogeneous features to select the important features corresponding to a given semantic concept, and proposes a high-dimensional feature selection algorithm based on Structured Grouping Sparsity (SGS).
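The formulation of SGS itself is not given in this section, so the following is only a minimal sketch of the general group-sparsity idea, using the standard group-lasso penalty of Yuan and Lin [7] rather than the author's algorithm. It implements the proximal (group soft-thresholding) step, which either shrinks a whole feature group or sets it to zero. The function name, the group layout and the numeric values are hypothetical.

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of lam * sum_g ||w_g||_2 (group-lasso penalty).

    w      : 1-D coefficient vector
    groups : list of index arrays, one per non-overlapping feature group
    lam    : non-negative threshold
    """
    out = np.zeros_like(w, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        if norm > lam:
            # Shrink the whole group toward zero but keep its direction.
            out[idx] = (1.0 - lam / norm) * w[idx]
        # Otherwise every coefficient in the group stays at zero,
        # i.e. the whole feature type is deselected.
    return out

# Hypothetical layout: 3 color-histogram dimensions, then 2 SIFT-based ones.
groups = [np.arange(0, 3), np.arange(3, 5)]
w = np.array([3.0, 4.0, 0.0, 0.3, 0.4])
shrunk = group_soft_threshold(w, groups, lam=1.0)
print(shrunk)
```

In this toy input the first group has Euclidean norm 5 and is scaled by 0.8, while the second group has norm 0.5, below the threshold, and is removed as a unit. That all-or-nothing behavior per group is what distinguishes group sparsity from the coefficient-wise sparsity of the lasso.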
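4 Conclusion
This paper assigns visual features of the same type to one group (for example, SIFT features form one group and color histograms form another), so that the representation of heterogeneous image features can fully exploit this structural grouping effect. In addition, to overcome the linear inseparability caused by high-dimensional heterogeneous features, this paper proposes SGS, a high-dimensional feature selection algorithm for image annotation based on structured grouping sparsity. A comparison with three other algorithms on image annotation shows that SGS achieves better annotation results.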
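However, kernel-learning-based algorithms on high-dimensional features must map the data into a new feature space through a kernel function, and the dimension of the resulting kernel matrix depends only on the number of samples. For large-scale image data, kernel learning algorithms therefore run slowly and cannot handle annotation problems in which the image collection keeps growing. How to build learning models for large-scale image data and how to handle image data that grows in real time are important open problems in image annotation.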
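The scaling issue can be illustrated with a small sketch. It assumes an RBF (Gaussian) kernel, which this section does not specify, and uses random data. The function name and the sizes are hypothetical. The point is only that the kernel matrix is n x n for n samples, whatever the feature dimension.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Return the n x n matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives from round-off
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
n = 50  # number of samples (images)
K_low = rbf_kernel_matrix(rng.standard_normal((n, 10)), gamma=1.0 / 10)
K_high = rbf_kernel_matrix(rng.standard_normal((n, 2000)), gamma=1.0 / 2000)
print(K_low.shape, K_high.shape)  # both are (n, n)
```

Because storage grows with the square of the number of samples, a dense float64 kernel matrix for 100,000 images would already need about 80 GB, and every newly added image adds a full row and column.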
References:
[1] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 1996: 267-288
[2] Leo Breiman. Heuristics of instability and stabilization in model selection. The Annals of Statistics, 1996, 24(6): 2350-2383
[3] F. Wu, Y. Yuan, Y. Rui, S. Yan, Y. Zhuang. Annotating web images using NOVA: Non-convex group sparsity. In Proceedings of the 20th ACM International Conference on Multimedia, 2012: 509-518
[4] Alexander Loui, Jiebo Luo, Shih-Fu Chang, Dan Ellis, Wei Jiang, Lyndon Kennedy, Keansub Lee, Akira Yanagawa. Kodak's consumer video benchmark data set: concept definition and annotation. In Proceedings of the International Workshop on Multimedia Information Retrieval, 2007: 245-254
[5] Hao Li, Meng Wang, Xian-Sheng Hua. MSRA-MM 2.0: A large-scale web multimedia dataset. In IEEE International Conference on Data Mining Workshops (ICDMW'09), 2009: 164-169
[6] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, Yantao Zheng. NUS-WIDE: a real-world web image database from National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval, 2009: 48
[7] M. Yuan, Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006, 68(1): 49-67
[8] Ying Yuan, Jian Shao, Fei Wu, Yue-Ting Zhuang. Image annotation by the multiple kernel learning with group sparsity effect. Ruanjian Xuebao/Journal of Software, 2012, 23(9): 2500-2509