Kai-Xuan Chen, Xiao-Jun Wu
Abstract In pattern recognition, the task of image set classification has often been performed by representing data using symmetric positive definite (SPD) matrices, in conjunction with the metric of the resulting Riemannian manifold. In this paper, we propose a new data representation framework for image sets, which we call the component symmetric positive definite representation (CSPD). Firstly, we obtain sub-image sets by dividing the images in the set into square blocks of the same size, and use a traditional SPD model to describe them. Then, we use the Riemannian kernel to determine similarities of corresponding sub-image sets. Finally, the CSPD matrix appears in the form of the kernel matrix for all the sub-image sets; its i,j-th entry measures the similarity between the i-th and j-th sub-image sets. The Riemannian kernel is shown to satisfy Mercer's theorem, so the CSPD matrix is symmetric and positive definite, and also lies on a Riemannian manifold. Tests on three benchmark datasets show that CSPD is both a lower-dimensional and a more discriminative data descriptor than standard SPD for the task of image set classification.
Keywords symmetric positive definite (SPD) matrices; Riemannian kernel; image classification; Riemannian manifold
Image set classification has received wide attention in the domains of artificial intelligence and pattern recognition [1–8]. An image set contains a large number of images taken under different conditions, allowing more robust discrimination than use of single-shot images [9–13]. Representations of image sets used for the task of image classification commonly include Gaussian mixture models [14], linear subspaces [1], and covariance descriptors (CovDs) [7, 8, 15]. The latter have been widely applied, e.g., in virus recognition [13], object detection [4], and gesture classification [5]. The traditional SPD model is based on CovDs, and SPD matrices lie on a non-linear manifold called the SPD manifold.
The dimensionality of traditional SPD matrices [2, 4–8] used for representing image sets is relatively high, which results in excessive computation and low efficiency of algorithms. Dimensionality reduction is always important in computer vision and machine learning. Classical methods, such as principal component analysis (PCA) [16] and linear discriminant analysis (LDA) [17], are widely used in various applications. However, as SPD matrices lie on a non-linear Riemannian manifold, these methods are unsuitable for analyzing SPD matrices. Recently, work extending dimensionality reduction to Riemannian manifolds has received wide attention. Bidirectional covariance matrices (BCM) [8] and SPDML [2] are dimensionality reduction methods which can be used with the SPD manifold. BCM [8] applies two-directional two-dimensional PCA [18] directly to the SPD matrices to obtain low-dimensional descriptors. SPDML [2] embeds the high-dimensional SPD matrices into a lower-dimensional, more discriminative SPD manifold through projection.
Fig. 1 Processing an image set using the traditional SPD approach and our CSPD approach. Above: traditional SPD. The SPD matrix is computed using a covariance descriptor; it lies on a non-linear SPD manifold. Below: our CSPD approach. It firstly divides the images in the set into square blocks of the same size. The i-th sub-image set B_i is represented by a traditional SPD model. The Riemannian kernel is then used to describe the similarity between the sub-image sets. The final CSPD appears in the form of the Riemannian kernel matrix of the representations of the sub-image sets.
In this paper, we propose a new framework to obtain low-dimensional, discriminative descriptors for representing image sets. Figure 1 shows pipelines for producing descriptors of image sets, for both our framework and the traditional SPD model. Given an image set with n images, in traditional SPD, the images are vectorized to obtain an image set matrix S = [s_1, ..., s_n] (see Fig. 1(b)), where s_i ∈ R^D represents the i-th image in the image set. Using CovDs results in a D × D SPD matrix (see Fig. 1(c)) for the representation of the entire image set. Its i,j-th element is the covariance between the i-th and j-th rows of the image set matrix S (see Fig. 1(b)). Unlike the traditional SPD model, we describe the image set by measuring similarities between sub-image sets. We firstly divide the images in the set into d × d blocks of the same size, and obtain d^2 sub-image sets. We then vectorize the sub-images (see Fig. 1(d)) and use covariance descriptors to represent them (Fig. 1(e)). Finally, the CSPD (Fig. 1(f)) takes the form of the Riemannian kernel matrix; its i,j-th element denotes the similarity between the i-th and j-th sub-image sets. The dimensionality of the CSPD matrix is d^2 × d^2, which depends on the number of sub-image sets.
The rest of this paper is organized as follows. In Section 2, we briefly overview the geometry of the SPD manifold and some related classical Riemannian metrics. In Section 3, we present the traditional SPD model and our CSPD model, and introduce some SPD manifold-based classification algorithms used in our experiments. In Section 4, we present experimental results, which show that CSPD is a lower-dimensional and more discriminative data descriptor than standard SPD for the task of image set classification. In Section 5, we consider the effects of block size. In Section 6, we present our conclusions and discuss future directions.
In this section, we overview the geometry of the SPD manifold and some related classical Riemannian metrics. We adopt the following notation: S_n^+ is the space spanned by real n × n SPD matrices, S_n is the tangent space spanned by real n × n symmetric matrices at the identity matrix I_n ∈ R^{n×n}, and T_P is the tangent space spanned by real n × n symmetric matrices at the point P ∈ S_n^+.
As described in Ref. [2], the SPD manifold spanned by SPD matrices is a non-linear Riemannian manifold, and does not satisfy the scalar multiplication axiom of a vector space. For example, the matrix resulting from multiplying an SPD matrix by a negative scalar does not lie on S_n^+ [11]. Thus, the similarity of two SPD matrices cannot be meaningfully computed using the Euclidean metric; instead, Riemannian metrics are a better tool for analyzing SPD matrices.
A variety of Riemannian metrics have been proposed for SPD manifolds. In particular, the affine invariant Riemannian metric (AIRM) [2, 8, 19] is the most widely studied Riemannian metric, and has the property of affine invariance. The Stein divergence and Jeffrey divergence [2, 10, 20], which are efficient metrics for measuring geodesic distance on the SPD manifold, are equivalent to the Bregman divergence for particular seed functions. The log-Euclidean metric (LEM) [7, 8] gives the similarity between two SPD matrices by computing their distance in the tangent space. We now consider AIRM and LEM in detail, as these two metrics are used in our experiments.
The AIRM geodesic distance d_AIRM between two points Sp_i and Sp_j on the SPD manifold can be written as

d_AIRM(Sp_i, Sp_j) = ‖log(Sp_i^{-1/2} Sp_j Sp_i^{-1/2})‖_F    (1)

where ‖·‖_F denotes the Frobenius norm and log(·) is the matrix logarithm operator.
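As an illustrative sketch (our own, in Python with numpy/scipy; not code from the paper), d_AIRM can be computed from the generalized eigenvalues of the pair (Sp_j, Sp_i), which coincide with the eigenvalues of Sp_i^{-1/2} Sp_j Sp_i^{-1/2}:

```python
import numpy as np
from scipy.linalg import eigh

def airm_distance(Spi, Spj):
    """AIRM geodesic distance between SPD matrices Spi and Spj.

    Uses d = sqrt(sum_k log^2(lambda_k)), where lambda_k are the
    generalized eigenvalues of (Spj, Spi); these equal the eigenvalues
    of Spi^{-1/2} @ Spj @ Spi^{-1/2}, so the Frobenius norm of its
    matrix logarithm reduces to this sum.
    """
    lam = eigh(Spj, Spi, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

This form avoids explicit matrix square roots; the affine invariance of AIRM means airm_distance(A @ X @ A.T, A @ Y @ A.T) equals airm_distance(X, Y) for any invertible A.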
The log-Euclidean metric [4, 7, 8, 11] is a bi-invariant Riemannian metric arising from a Lie group multiplication on SPD matrices [11]. The distance d_LogED between two points Sp_i and Sp_j on the SPD manifold computed by this metric can be written as

d_LogED(Sp_i, Sp_j) = ‖log(Sp_i) − log(Sp_j)‖_F    (2)
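A minimal numpy sketch of d_LogED (our assumption of a straightforward implementation, using an eigen-decomposition for the matrix logarithm):

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of an SPD matrix via its eigen-decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def logE_distance(Spi, Spj):
    """Log-Euclidean distance: Frobenius distance between matrix logs."""
    return np.linalg.norm(spd_logm(Spi) - spd_logm(Spj), 'fro')
```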
The LEM can be viewed as the distance between points in the domain of the matrix logarithm, which is the tangent space of the SPD manifold, obtained by logarithmic mapping [7, 8]:

log: S_n^+ → S_n,  Sp ↦ log(Sp)    (3)

d_LogED(Sp_i, Sp_j) = ‖log(Sp_i) − log(Sp_j)‖_F    (4)
Fig.2 Logarithmic mapping.
where S_n is a vector space. Figure 2 illustrates the concept of logarithmic mapping. Furthermore, the associated Riemannian kernel function can be computed as the inner product of points in the tangent space [7, 11]:

k_LogE(Sp_i, Sp_j) = tr(log(Sp_i) · log(Sp_j))    (5)
For all points Sp_1, ..., Sp_N ∈ S_n^+, k_LogE is a symmetric function, since k_LogE(Sp_i, Sp_j) = k_LogE(Sp_j, Sp_i). From Ref. [7], we have

Σ_{i=1}^N Σ_{j=1}^N c_i c_j k_LogE(Sp_i, Sp_j) = ‖ Σ_{i=1}^N c_i log(Sp_i) ‖_F^2 ≥ 0    (6)

for any real coefficients c_1, ..., c_N. Equation (6) shows that the log-Euclidean kernel guarantees the positive definiteness of the Riemannian kernel, which thus satisfies Mercer's theorem.
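This guarantee is easy to check numerically; the following sketch (ours, in Python) builds the Gram matrix of Eq. (5) for random SPD matrices and verifies that it is positive semi-definite:

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of an SPD matrix via its eigen-decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def k_logE(X, Y):
    """Log-Euclidean kernel: inner product of matrix logs (Eq. (5))."""
    return np.trace(spd_logm(X) @ spd_logm(Y))

# Gram matrix of random SPD matrices: symmetric and positive semi-definite,
# since sum_ij c_i c_j k(X_i, X_j) = ||sum_i c_i log(X_i)||_F^2 >= 0
rng = np.random.default_rng(1)
mats = []
for _ in range(5):
    A = rng.standard_normal((4, 4))
    mats.append(A @ A.T + 4 * np.eye(4))   # SPD by construction
K = np.array([[k_logE(Xi, Xj) for Xj in mats] for Xi in mats])
```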
In this section, we briefly describe the traditional SPD model [7, 8] and then introduce our proposed CSPD model in detail. We also discuss classification algorithms based on these approaches.
For an image set matrix S = [s_1, ..., s_n] with n images, let s_i ∈ R^D be a feature vector obtained by vectorizing the i-th image in the set. The final descriptor of the image set is a D × D SPD matrix [2, 7, 8] computed from covariance descriptors:

C* = (1/(n−1)) Σ_{i=1}^n (s_i − μ)(s_i − μ)^T + λI = C + λI    (7)

where μ is the mean of the vectors s_1, ..., s_n, C is the sample covariance, λ is set to 10^{-3} tr(C), and I is the identity matrix [7].
In this way, the image sets are represented as SPD matrices, which lie on the SPD manifold.
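A sketch of the descriptor of Eq. (7) (Python/numpy; `cov_descriptor` is our name, not the authors'):

```python
import numpy as np

def cov_descriptor(S):
    """Regularized covariance descriptor of an image set matrix S (D x n).

    Columns of S are vectorized images; entry (i, j) of the covariance is
    taken between the i-th and j-th rows of S. Adding 1e-3 * tr(C) * I
    makes the descriptor strictly positive definite, following Eq. (7).
    """
    C = np.cov(S)                                      # D x D sample covariance
    return C + 1e-3 * np.trace(C) * np.eye(S.shape[0])
```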
Our proposed framework, which describes the similarities between sub-image sets, offers lower-dimensional and more discriminative descriptors for image sets than SPD. We firstly divide the images in the set into d × d square blocks of the same size, and each sub-image set is described by a covariance descriptor (see Eq. (7)). This results in d^2 SPD matrices representing the sub-image sets. The final CSPD matrix is the Riemannian kernel matrix of these SPD matrices, and its dimensionality is d^2 × d^2.
See Fig. 1 (below) again. In this figure, the images were divided into 2 × 2 square blocks to form 4 sub-image sets: B_1, ..., B_4. There are 4 corresponding covariance descriptors: C_1, ..., C_4. The CSPD lies in R^{4×4}, and is a matrix describing the similarity between these 4 sub-image sets. In order to measure similarity between the sub-image sets, we use the log-Euclidean kernel (Eq. (5)) to compute the similarity of the covariance descriptors (Fig. 1(f)):

CSPD_{i,j} = k_LogE(C_i, C_j) = tr(log(C_i) · log(C_j))    (8)
where CSPD_{i,j} measures the similarity between the i-th and j-th sub-image sets. Note that, as Eq. (6) shows, the log-Euclidean kernel guarantees positive definiteness of the Riemannian kernel, so the CSPD matrix also lies on the SPD manifold.
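Putting the pieces together, the full CSPD construction of Fig. 1 can be sketched as follows (our Python sketch; the function names and block-division details are our assumptions):

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of an SPD matrix via its eigen-decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def cov_descriptor(S):
    """Regularized covariance descriptor of Eq. (7) for S (D x n)."""
    C = np.cov(S)
    return C + 1e-3 * np.trace(C) * np.eye(S.shape[0])

def cspd(images, d):
    """CSPD descriptor of an image set.

    images: list of h x w arrays (h and w divisible by d).
    Each image is divided into d x d blocks; the (bi, bj) blocks of all
    images form one sub-image set, represented by a covariance descriptor.
    Returns the d^2 x d^2 log-Euclidean kernel matrix of Eq. (8).
    """
    h, w = images[0].shape
    bh, bw = h // d, w // d
    logs = []
    for bi in range(d):
        for bj in range(d):
            # vectorize the (bi, bj) block of every image -> (bh*bw) x n matrix
            S = np.stack([im[bi*bh:(bi+1)*bh, bj*bw:(bj+1)*bw].ravel()
                          for im in images], axis=1)
            logs.append(spd_logm(cov_descriptor(S)))
    return np.array([[np.trace(Li @ Lj) for Lj in logs] for Li in logs])
```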
The nearest neighbor (NN) algorithm is one of the simplest methods for classification and regression used in computer vision and pattern recognition. Nearest neighbor classification algorithms based on AIRM and LEM have been utilized with the SPD manifold [8], and these simple classification algorithms clearly show the advantages of our CSPD model (see later).
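Such a nearest neighbor classifier is only a distance computation plus an argmin; a sketch using the LEM distance (ours, in Python):

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of an SPD matrix via its eigen-decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def logE_distance(X, Y):
    """Log-Euclidean distance between SPD matrices."""
    return np.linalg.norm(spd_logm(X) - spd_logm(Y), 'fro')

def nn_classify(train_mats, train_labels, test_mat, dist=logE_distance):
    """Return the label of the training descriptor nearest to test_mat
    under the given Riemannian distance."""
    dists = [dist(test_mat, T) for T in train_mats]
    return train_labels[int(np.argmin(dists))]
```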
Covariance discriminative learning (CDL) has been proposed for image set classification [7], with classical classification algorithms applied directly on the SPD manifold. It derives a kernel function that maps the SPD matrices from the Riemannian manifold to Euclidean space through the LEM metric. This allows classical classification algorithms operating on a linear space to be exploited in the kernel formulation. Linear discriminant analysis (LDA) and partial least squares (PLS) in this linear space have been used for the task of classification [7].
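The key step of CDL can be sketched as an explicit Euclidean embedding (our sketch; the sqrt(2) scaling of the off-diagonal entries makes the embedding isometric with respect to the LEM, after which LDA or PLS can be run on ordinary vectors):

```python
import numpy as np

def spd_logm(X):
    """Matrix logarithm of an SPD matrix via its eigen-decomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def log_embed(X):
    """Embed an SPD matrix X into R^{n(n+1)/2} so that the Euclidean
    distance between embeddings equals the log-Euclidean distance
    between the matrices (off-diagonals scaled by sqrt(2))."""
    L = spd_logm(X)
    iu = np.triu_indices_from(L, k=1)
    return np.concatenate([np.diag(L), np.sqrt(2) * L[iu]])
```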
We also consider the Riemannian sparse coding algorithm LogEKSR [11], which takes the Riemannian geometry of the SPD manifold into account and applies sparse representation and dictionary learning to SPD matrices by mapping them into a reproducing kernel Hilbert space (RKHS). Note that the log-Euclidean kernels in this algorithm, which are derived from the kernel in Eq. (5) and satisfy Mercer's theorem, are respectively the polynomial kernel, exponential kernel, and Gaussian kernel.
In order to verify the effectiveness of our model, we have carried out experiments on three tasks: object categorization, hand gesture recognition, and virus cell classification, using the ETH-80 [4], Cambridge hand gesture (CG) [5], and virus [13] datasets respectively. In our experiments, we compared the accuracies of our proposed CSPD model with those of the traditional SPD model using the same classification algorithms. To do so, we used the most common nearest neighbor classifiers based on AIRM [2, 8] and LEM [4, 8, 11], as described earlier. As well as these two NN classifiers, we made use of the classical Riemannian classification algorithms LogEKSR [11] and CDL [7], which are efficient methods on SPD manifolds. In the following, we use the following notation:
· NN-AIRM_SPD: AIRM-based nearest neighbor classifier on the SPD manifold spanned by traditional SPD matrices.
· NN-AIRM_CSPD: AIRM-based nearest neighbor classifier on the SPD manifold spanned by our proposed CSPD matrices.
· NN-LogED_SPD: LEM-based nearest neighbor classifier on the SPD manifold spanned by traditional SPD matrices.
· NN-LogED_CSPD: LEM-based nearest neighbor classifier on the SPD manifold spanned by our proposed CSPD matrices.
· CDL_SPD: CDL on the SPD manifold spanned by traditional SPD matrices.
· CDL_CSPD: CDL on the SPD manifold spanned by our proposed CSPD matrices.
· LogEKSR_SPD: LogEKSR on the SPD manifold spanned by traditional SPD matrices.
· LogEKSR_CSPD: LogEKSR on the SPD manifold spanned by our proposed CSPD matrices.
In our experiments, we resized all the images to 24 × 24, allowing them to be divided into 2 × 2, 3 × 3, 4 × 4, 6 × 6, 8 × 8, and 12 × 12 blocks. With this image size, the dimensionality of the traditional SPD matrix is 576 × 576, whereas the dimensionality of the CSPD matrix is 4 × 4, 9 × 9, 16 × 16, 36 × 36, 64 × 64, or 144 × 144 respectively. It is thus clear that our approach provides a lower-dimensional data representation. Next, we use the results of the experiments to verify the discriminative power of our model.
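These dimensionalities are simple to verify (a trivial sketch, assuming 24 × 24 images as above):

```python
img_side = 24
blocks = [2, 3, 4, 6, 8, 12]          # each divides 24 evenly

spd_dim = img_side * img_side          # traditional SPD: 576 x 576
cspd_dims = [d * d for d in blocks]    # CSPD: d^2 x d^2 per block division
```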
The ETH-80 dataset has eight categories of images: apples, pears, tomatoes, cows, dogs, horses, cups, and cars. Each class has 10 image sets, and each image set comprises 41 images taken from different angles. Figure 3 (top) shows some images from the ETH-80 dataset. For each class, we randomly chose 2 image sets as training data, and the remaining image sets were used as test data. We give average accuracies and standard deviations over ten cross-validation experiments. Table 1 shows the performance of both our proposed CSPD model and the traditional SPD model using four classification algorithms. The results for our CSPD model here use 6 × 6 blocks.
Fig. 3 Images from the three datasets. Top: ETH-80 [4]. Middle: CG [5]. Bottom: virus [13].
Table 1 Recognition rates with standard deviations, ETH-80 dataset
The Cambridge hand gesture dataset consists of a set of high-resolution color sequences acquired by a Senz3D sensor, showing image sequences of hand gestures defined by 3 primitive hand shapes and 3 primitive motions. This dataset has 900 image sets in 9 classes, with 100 image sets in each class (see Fig. 3 (middle)). For the task of hand gesture recognition, 20 image sets of each class were randomly selected as training data, and the remaining image sets were used as test data. Ten-fold cross-validation experiments were carried out on this dataset. We give the average accuracies with standard deviations over the ten experiments in Table 2. The results for our CSPD model are again based on 6 × 6 blocks.
The virus dataset contains 15 categories; each category has 5 image sets, each with 20 images taken from different angles (see Fig. 3 (bottom)). We randomly chose 3 image sets per category for training and used the rest for testing. Table 3 shows the results for our proposed CSPD model and the traditional SPD model using four classification algorithms. The results for our CSPD model are this time based on 4 × 4 blocks.
Table 2 Recognition rates with standard deviations, hand gesture dataset
Table 3 Recognition rates with standard deviations, virus cell dataset
For all three datasets, the results for all four classifiers show that our proposed CSPD can provide more discriminative and robust features than traditional SPD. Especially for the ETH-80 dataset, the recognition rates of the two NN classifiers based on our proposed CSPD, NN-AIRM_CSPD and NN-LogED_CSPD, are higher than those of the two state-of-the-art classification algorithms CDL and LogEKSR based on traditional SPD. Also, the accuracies of the two classifiers CDL and LogEKSR show that our proposed CSPD is more discriminative, and LogEKSR_CSPD achieves the best result, with an accuracy of 89.92% and standard deviation of 3.84. For the hand gesture and virus datasets, the advantages are not as obvious as for ETH-80. Nevertheless, LogEKSR_CSPD still achieves the best result for the hand gesture dataset, with an accuracy of 91.02% and standard deviation of 1.54, while CDL_CSPD achieves the best result on the virus dataset, with an accuracy of 54.50% and standard deviation of 7.38.
Next, we present the effects of varying block size on accuracy and run time for a fixed classification algorithm. We show results using the ETH-80 dataset as an example. The notation used is as follows:
· SPD_TR: traditional SPD data representation.
· CSPD_{n×n}: CSPD descriptor obtained by dividing the images into n × n blocks.
In order to display the effects of block size, we give the average accuracies achieved for traditional SPD as well as for the 6 different CSPD descriptors arising from different block sizes. Table 4 shows the average accuracies achieved with different data descriptors and the NN-AIRM, NN-LogED, CDL, and LogEKSR classification algorithms. The data in each row are the average recognition rates using the same data descriptor with different classification algorithms; the columns give the average recognition rates for the same classification algorithm with different data descriptors. The recognition rates using the CSPD model are lower than for the traditional SPD model when using 2 × 2 blocks. However, the algorithms have better recognition rates than for SPD when using our proposed CSPD model with larger blocks.
Table 4 Effect of block size on average accuracy
In order to show the robustness of our proposed CSPD model, we give average standard deviations over ten experiments in Table 5. The data in each row are the standard deviations of accuracy using the same data descriptor with different classification algorithms, while the columns give the standard deviations of accuracy for the same classification algorithm with different data descriptors. The results show that standard deviations for our CSPD model are generally lower than for the traditional SPD model with the same classification algorithm, especially for larger block sizes.
The results in these two tables show that, typically, our CSPD model works best when the images are divided into 6 × 6 blocks, justifying the use of this size in the results in Tables 1 and 2. The results of classification algorithms based on the CSPD model in Table 3 were obtained by dividing the images in the virus cell dataset into 4 × 4 blocks, which also works well.
Table 5 Effect of block size on standard deviation of accuracy
CSPD matrices have lower dimensionality than traditional SPD matrices. This property saves run time. We consider the efficiency of our CSPD model in two ways: (i) the run time using different data representation models with the same classifier, and (ii) the time needed to compute the data descriptors (SPD or CSPD).
Table 6 shows the time needed to compute the data descriptors (SPD or CSPD). Clearly, the time needed for CSPD is less than for traditional SPD when the image set is divided into 2 × 2, 3 × 3, 4 × 4, or 6 × 6 blocks. In general, all descriptors take comparable time.
Secondly, we compare classification times using different data descriptors in Table 7. Each row gives the run time for the same data descriptor with different classification algorithms, while the columns give the run times for the same classification algorithm with different data descriptors.
These two tables show that our CSPD model takes the least time when the images are divided into 2 × 2 blocks. Even with 12 × 12 blocks, our CSPD model takes far less time than traditional SPD with the same classification algorithm. We can thus conclude that our CSPD model improves the efficiency of algorithms significantly.
Table 6 Time needed to compute data descriptors
Table 7 Time taken by different data descriptors with the same classification algorithm
In this paper, we have proposed the component symmetric positive definite (CSPD) model as a novel descriptor for image sets. Its superior time performance is due to its lower dimensionality, and it also shows better discriminative ability, providing higher recognition rates than traditional SPD when using the same classification algorithm. The latter is clearly demonstrated by the results from the two nearest neighbor classification algorithms. In future work, we hope to devise further data descriptors for image set classification.
Computational Visual Media, 2018, Issue 3