Chuanli Wang, En Zhu, Xinwang Liu, Jiaohua Qin, Jianping Yin and Kaikai Zhao
Abstract: Multiple kernel clustering based on local kernel alignment achieves outstanding clustering performance by applying kernel alignment locally, on each sample. However, we observe that most existing works assume that every local kernel alignment contributes equally to the clustering performance, whereas local kernel alignments on different samples actually contribute differently. This assumption can therefore degrade the clustering performance. To address this issue, we design a multiple kernel clustering algorithm based on self-weighted local kernel alignment, which learns a proper weight for the contribution of each local kernel alignment. Specifically, we introduce a new optimization variable, the weight, to denote the contribution of each local kernel alignment to the clustering performance; the weights, the kernel combination coefficients and the cluster membership are then alternately optimized under the kernel alignment framework. In addition, we develop a three-step alternate iterative optimization algorithm to solve the resulting optimization problem. Extensive experiments on five benchmark data sets have been conducted to evaluate the clustering performance of the proposed algorithm. The experimental results clearly demonstrate that the proposed algorithm outperforms typical multiple kernel clustering algorithms, which illustrates its effectiveness.
Keywords: Multiple kernel clustering, kernel alignment, local kernel alignment, self-weighted.
On the one hand, kernel-based clustering algorithms are simple and effective [Filippone, Camastra, Masulli et al. (2008); Tzortzis and Likas (2009)]; on the other hand, many typical clustering algorithms, such as spectral clustering and non-negative matrix factorization clustering, can be interpreted from the perspective of kernels [Dhillon, Guan and Kulis (2007); Ding, He, Simon et al. (2005)]. Therefore, kernel-based clustering has become a research hotspot in various applications [Gönen and Margolin (2014); Li, Qin, Xiang et al. (2015)]. Compared with a single kernel, multiple kernels can provide more useful and complementary information for clustering [Cai, Nie and Huang (2013); Cai, Jiao, Zhuge et al. (2018); Hou, Nie, Tao et al. (2017)]. Multiple kernel clustering (MKC) has attracted more and more attention, and many MKC algorithms and their variants have been proposed recently [Han, Yang, Yang et al. (2018); Du, Zhou, Shi et al. (2015)].
MKC algorithms aim to improve the clustering performance by jointly optimizing a group of kernel combination coefficients and the cluster membership [Liu, Dou, Yin et al. (2016)]. In light of the differences in their optimization frameworks, existing MKC algorithms can be roughly grouped into two categories. The spirit of the first category is that the single kernel in the clustering objective function is replaced with a combined kernel, and the optimal kernel combination coefficients and cluster membership are solved under the clustering framework. The algorithms in this category mainly include: multiple kernel K-means [Huang, Chuang, Chen et al. (2012)], multiple kernel fuzzy C-means [Chen, Chen, Lu et al. (2011)], robust multiple kernel K-means [Du, Zhou, Shi et al. (2015)], optimal neighborhood clustering [Liu, Zhou, Wang et al. (2017)], etc. In contrast, the idea of the second category is that the cluster membership is viewed as a pseudo label and put into the objective of kernel alignment [Wang, Zhao and Tian (2015)], a widely used learning criterion in supervised learning; the optimal kernel combination coefficients and the pseudo label are then optimized under the multiple kernel learning framework. Along this line, Lu et al. [Lu, Wang, Lu et al. (2014)] proposed centered kernel alignment for multiple kernel clustering, Liu et al. [Liu, Dou, Yin et al. (2016)] proposed kernel alignment maximization for clustering, and Li et al. [Li, Liu, Wang et al. (2016)] proposed multiple kernel clustering based on local kernel alignment. Our work in this paper pays close attention to the clustering algorithms belonging to the second category.
Among the algorithms of the second category, multiple kernel clustering based on local kernel alignment (LKAMKC) obtains prominent clustering performance by using local kernels to exploit the local structure information of the data. Concretely, the sum of the objectives of the local kernel alignments is defined as the optimization objective of LKAMKC; that is, it conducts local kernel alignment on each sample.
Although LKAMKC achieves significant clustering performance, we observe that most existing works assume that each local kernel alignment contributes equally to the clustering performance, that is, each local kernel alignment is considered equally throughout the clustering process. Obviously, this assumption does not take the differences among the local kernel alignments into account, which could hinder the improvement of the clustering performance. To address this issue, we propose a multiple kernel clustering algorithm based on self-weighted local kernel alignment. In detail, we introduce a weight variable to denote the contribution of each local kernel alignment to the clustering performance, and then the weight of each local kernel alignment, the kernel combination coefficients and the cluster membership are jointly optimized. The proposed algorithm improves the clustering performance by imposing the learned weight on each local kernel alignment. We further develop a three-step alternate iterative optimization algorithm to solve the new optimization problem. Extensive experiments on five benchmark data sets have been conducted to evaluate the clustering performance of the proposed algorithm. The experimental results clearly show that the proposed algorithm outperforms the typical compared methods, which illustrates its effectiveness.
In this section, we review the related work on kernel alignment and local kernel alignment for multiple kernel clustering.
Suppose a data set with n samples has m kernel matrices {K_p}_{p=1,…,m}, and the data set needs to be divided into k clusters. Let H ∈ R^{n×k} and K_μ denote the relaxed cluster membership matrix and the combined kernel, respectively. K_μ can be calculated as:

$$K_\mu = \sum_{p=1}^{m} \mu_p^2 K_p, \tag{1}$$

where μ_p ≥ 0 denotes the combination coefficient of kernel matrix K_p in K_μ.
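As a minimal illustration (ours, not code from the paper), the combined kernel of Eq. (1) can be formed in a few lines of Python; `kernels` is assumed to be a list of n×n numpy arrays:

```python
import numpy as np

def combined_kernel(kernels, mu):
    """Eq. (1): K_mu = sum_p mu_p^2 * K_p, with mu on the simplex (mu >= 0, sum mu = 1)."""
    mu = np.asarray(mu, dtype=float)
    return sum(m ** 2 * K for m, K in zip(mu, kernels))

# usage: three random PSD kernels combined with uniform coefficients
rng = np.random.default_rng(0)
kernels = [(lambda X: X @ X.T)(rng.standard_normal((6, 4))) for _ in range(3)]
K_mu = combined_kernel(kernels, np.ones(3) / 3)
```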
According to Lu et al. [Lu, Wang, Lu et al. (2014)], HH^⊤ can be regarded as a pseudo ideal kernel matrix. By substituting HH^⊤ for the true ideal kernel, the objective of kernel alignment for multiple kernel clustering (KAMKC) can be expressed as:

$$\max_{H,\mu}\ \frac{\langle K_\mu,\, HH^\top \rangle_F}{\sqrt{\langle K_\mu,\, K_\mu \rangle_F\, \langle HH^\top,\, HH^\top \rangle_F}} \quad \text{s.t.}\ H^\top H = I_k,\ \mu^\top 1_m = 1,\ \mu_p \ge 0, \tag{2}$$

where ⟨·,·⟩_F denotes the Frobenius inner product of two matrices and μ = [μ_1, …, μ_m]^⊤. H^⊤H = I_k means that H satisfies the orthogonality constraint, and μ^⊤1_m = 1 means that μ satisfies the ℓ1-norm constraint.
Because Eq. (2) is too complicated to optimize directly, Liu et al. [Liu, Dou, Yin et al. (2016)] not only theoretically discuss the connection between KAMKC and multiple kernel K-means (MKKM) but also derive a simple and equivalent optimization objective of KAMKC based on MKKM. The new optimization formula of KAMKC can be written as:

$$\min_{H,\mu}\ \mathrm{Tr}\big(K_\mu (I_n - HH^\top)\big) + \frac{\lambda}{2}\, \mu^\top M \mu \quad \text{s.t.}\ H^\top H = I_k,\ \mu^\top 1_m = 1,\ \mu_p \ge 0, \tag{3}$$

where M ∈ R^{m×m} with M_{pq} = Tr(K_p^⊤ K_q) and λ is a regularization parameter.
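To make Eq. (3) concrete, the following Python sketch (our illustration; the numpy-based setup and names are our own, not the paper's code) evaluates the objective for given μ and H:

```python
import numpy as np

def kamkc_objective(kernels, mu, H, lam):
    """Eq. (3): Tr(K_mu (I - H H^T)) + (lam/2) * mu^T M mu,
    with K_mu from Eq. (1) and M_pq = Tr(K_p^T K_q)."""
    mu = np.asarray(mu, dtype=float)
    n = kernels[0].shape[0]
    K_mu = sum(m ** 2 * K for m, K in zip(mu, kernels))
    M = np.array([[np.trace(Kp.T @ Kq) for Kq in kernels] for Kp in kernels])
    return np.trace(K_mu @ (np.eye(n) - H @ H.T)) + 0.5 * lam * mu @ M @ mu
```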
As seen from Eq. (2) or Eq. (3), KAMKC only utilizes the global structure information of the kernel and ignores its local structure information. Local kernel alignment for multiple kernel clustering (LKAMKC) enhances the clustering performance by exploiting the local structure around each sample with a local kernel.
Let S^{(i)} ∈ {0,1}^{n×τ} denote the indicator matrix of the τ nearest neighbors of the i-th sample, so that the local quantities are

$$K_\mu^{(i)} = S^{(i)\top} K_\mu S^{(i)}, \tag{4}$$
$$H^{(i)} = S^{(i)\top} H, \tag{5}$$
$$M^{(i)}_{pq} = \mathrm{Tr}\big(K_p^{(i)\top} K_q^{(i)}\big). \tag{6}$$

Replacing the global K_μ, H and M with the local K_μ^{(i)}, H^{(i)} and M^{(i)}, respectively, the objective of local kernel alignment (LKA) on the i-th sample can be written as:

$$\min_{H,\mu}\ \mathrm{Tr}\big(K_\mu^{(i)} (I_\tau - H^{(i)} H^{(i)\top})\big) + \frac{\lambda}{2}\, \mu^\top M^{(i)} \mu. \tag{7}$$
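The neighborhood indicator can be built directly from a chosen base kernel. The sketch below (our illustration, with assumed helper names) selects the τ nearest neighbors of each sample by kernel similarity and extracts the corresponding local block of a kernel matrix:

```python
import numpy as np

def neighborhood_masks(K_base, tau):
    """For each sample i, return the boolean mask of its tau nearest
    neighbors measured by similarity in the base kernel K_base."""
    n = K_base.shape[0]
    masks = np.zeros((n, n), dtype=bool)
    for i in range(n):
        nbrs = np.argsort(-K_base[i])[:tau]   # largest similarities first
        masks[i, nbrs] = True
    return masks

def local_kernel(K, mask_i):
    """K^{(i)} = S^{(i)T} K S^{(i)}: the tau x tau block of K on the neighborhood."""
    idx = np.flatnonzero(mask_i)
    return K[np.ix_(idx, idx)]
```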
By accumulating the objectives of the LKAs over all samples, the objective of LKAMKC can be written as:

$$\min_{H,\mu}\ \sum_{i=1}^{n} \Big[ \mathrm{Tr}\big(K_\mu^{(i)} (I_\tau - H^{(i)} H^{(i)\top})\big) + \frac{\lambda}{2}\, \mu^\top M^{(i)} \mu \Big] \quad \text{s.t.}\ H^\top H = I_k,\ \mu^\top 1_m = 1,\ \mu_p \ge 0. \tag{8}$$
As shown in Eq. (8), LKAMKC considers each local kernel alignment equally and thereby inappropriately ignores the differences among the local kernel alignments. Thus, the contribution of each LKA to the clustering performance is not properly exploited, which could hinder the improvement of the clustering performance. To address this issue, we introduce a new weight variable to denote the contribution of each local kernel alignment to the clustering performance. The new optimization variable and the original optimization variables in Eq. (8) are jointly optimized. By imposing a weight on each local kernel alignment in Eq. (8), the formulation of the proposed multiple kernel clustering algorithm can be written as:

$$\min_{H,\mu,w}\ \sum_{i=1}^{n} w_i^2 \Big[ \mathrm{Tr}\big(K_\mu^{(i)} (I_\tau - H^{(i)} H^{(i)\top})\big) + \frac{\lambda}{2}\, \mu^\top M^{(i)} \mu \Big] \quad \text{s.t.}\ H^\top H = I_k,\ \mu^\top 1_m = 1,\ \mu_p \ge 0,\ w^\top 1_n = 1,\ w_i \ge 0, \tag{9}$$
where w = [w_1, w_2, …, w_n]^⊤ is the weight vector of the local kernel alignments, and w^⊤1_n = 1 means that w satisfies the ℓ1-norm constraint.
Although the proposed algorithm introduces a new variable, the optimization problem in Eq. (9) can still be solved efficiently. Specifically, we propose a three-step alternating iterative method to optimize Eq. (9).
(i) Optimizing H when μ and w are given

Suppose the other two optimization variables are given beforehand; then Eq. (9) can be translated into the following optimization problem:

$$\max_{H}\ \mathrm{Tr}\big(H^\top V H\big) \quad \text{s.t.}\ H^\top H = I_k, \qquad V = \sum_{i=1}^{n} w_i^2\, A^{(i)} K_\mu A^{(i)}, \quad A^{(i)} = S^{(i)} S^{(i)\top}. \tag{10}$$

Eq. (10) is a standard kernel k-means problem, and the optimal H consists of the k eigenvectors corresponding to the k largest eigenvalues of V.
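In code, this eigenvector step can be sketched as follows (our illustration; `update_H` is an assumed helper name):

```python
import numpy as np
from scipy.linalg import eigh

def update_H(V, k):
    """Top-k eigenvectors of the symmetric matrix V solve Eq. (10)."""
    n = V.shape[0]
    # eigh returns eigenvalues in ascending order; take the last k columns
    _, vecs = eigh(V, subset_by_index=[n - k, n - 1])
    return vecs
```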
(ii) Optimizing μ when H and w are given

With H and w fixed, Eq. (9) is equivalent to a quadratic programming problem in μ:

$$\min_{\mu}\ \mu^\top Q \mu \quad \text{s.t.}\ \mu^\top 1_m = 1,\ \mu_p \ge 0, \qquad Q = \mathrm{diag}(c) + \frac{\lambda}{2} \sum_{i=1}^{n} w_i^2 M^{(i)}, \quad c_p = \sum_{i=1}^{n} w_i^2\, \mathrm{Tr}\big(K_p^{(i)} (I_\tau - H^{(i)} H^{(i)\top})\big). \tag{11}$$

Eq. (11) can be effectively solved by existing off-the-shelf packages.
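One minimal way to solve this QP (a sketch using SciPy's general-purpose SLSQP solver, rather than any specific package used by the authors):

```python
import numpy as np
from scipy.optimize import minimize

def update_mu(Q):
    """Solve Eq. (11): min mu^T Q mu s.t. mu >= 0, sum(mu) = 1."""
    m = Q.shape[0]
    res = minimize(
        fun=lambda mu: mu @ Q @ mu,
        x0=np.ones(m) / m,                       # start from uniform coefficients
        jac=lambda mu: 2 * Q @ mu,
        bounds=[(0, None)] * m,                  # mu_p >= 0
        constraints={"type": "eq", "fun": lambda mu: mu.sum() - 1},
        method="SLSQP",
    )
    return res.x
```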
(iii) Optimizing w when H and μ are given

With H and μ fixed, Eq. (9) is equivalent to the following optimization problem:

$$\min_{w}\ \sum_{i=1}^{n} w_i^2 a_i \quad \text{s.t.}\ w^\top 1_n = 1,\ w_i \ge 0, \qquad a_i = \mathrm{Tr}\big(K_\mu^{(i)} (I_\tau - H^{(i)} H^{(i)\top})\big) + \frac{\lambda}{2}\, \mu^\top M^{(i)} \mu. \tag{12}$$

Clearly, if a_i is greater than zero, Eq. (12) is a convex quadratic programming problem with an analytic solution. By applying the KKT conditions to Eq. (12), the globally optimal w_i can be computed as:

$$w_i = \frac{a_i^{-1}}{\sum_{j=1}^{n} a_j^{-1}}. \tag{13}$$
To show that a_i is greater than zero, we only need to show that Tr(K_μ^{(i)}(I_τ − H^{(i)}H^{(i)⊤})) is non-negative, because μ^⊤M^{(i)}μ is strictly positive given that M^{(i)} is a positive definite matrix.
Proof: Since H^⊤H = I_k, we have HH^⊤H = H. Let h_i denote the i-th column of H, where 1 ≤ i ≤ k. Clearly HH^⊤h_i = h_i, which shows that HH^⊤ has k eigenvalues equal to one and n − k eigenvalues equal to zero. Likewise, I_n − HH^⊤ has n − k eigenvalues equal to one and k eigenvalues equal to zero, so I_n − HH^⊤ is a positive semi-definite matrix. In addition, K_μ is a positive semi-definite kernel matrix; therefore, Tr(K_μ(I_n − HH^⊤)) ≥ 0.
Proof: We have A^{(i)} = A^{(i)}A^{(i)} and A^{(i)} = A^{(i)⊤} since A^{(i)} = S^{(i)}S^{(i)⊤}, and hence A^{(i)} − A^{(i)}HH^⊤A^{(i)} = A^{(i)}(I_n − HH^⊤)A^{(i)}. Let y be an arbitrary vector. Then y^⊤A^{(i)}(I_n − HH^⊤)A^{(i)}y ≥ 0, because (y^⊤A^{(i)})^⊤ = A^{(i)}y and I_n − HH^⊤ is positive semi-definite, as justified by Theorem 1. Therefore, A^{(i)} − A^{(i)}HH^⊤A^{(i)} is a positive semi-definite matrix, and correspondingly Tr(K_μ(A^{(i)} − A^{(i)}HH^⊤A^{(i)})) = Tr(K_μ^{(i)}(I_τ − H^{(i)}H^{(i)⊤})) ≥ 0.
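Putting the pieces together, the w-step of Eq. (13) is a one-liner. The snippet below (an illustrative sketch with our own names) also checks numerically that I − HH^⊤ is positive semi-definite for a random orthonormal H, as Theorem 1 asserts:

```python
import numpy as np

def update_w(a):
    """Eq. (13): w_i proportional to 1/a_i, normalized to sum to one."""
    inv = 1.0 / np.asarray(a, dtype=float)
    return inv / inv.sum()

# sanity check: I - H H^T is PSD when H has orthonormal columns
rng = np.random.default_rng(0)
H, _ = np.linalg.qr(rng.standard_normal((8, 3)))   # 8 x 3, H^T H = I_3
eigvals = np.linalg.eigvalsh(np.eye(8) - H @ H.T)
assert eigvals.min() > -1e-10                       # eigenvalues are 0 or 1
```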
In the proposed algorithm, the neighborhood of samples is crucial, yet it is difficult to define exactly during clustering. To simplify the optimization problem, we keep the neighborhood of samples fixed throughout the whole optimization process. In this way, Eq. (10) is a standard kernel k-means optimization problem, and Eq. (11) and Eq. (12) are convex quadratic programming problems, so each step monotonically decreases the objective. Moreover, the objective of the proposed algorithm has a lower bound. Therefore, the proposed clustering algorithm is convergent. The experimental results below also illustrate the convergence of the proposed algorithm.
We use Algorithm 1 to describe the implementation of the proposed algorithm, where t is the iteration number. The input of Algorithm 1 includes the kernel matrices, the number of clusters k, the regularization parameter λ and the convergence threshold θ. The output includes the relaxed cluster membership H, the kernel combination coefficients μ and the weight w of each local kernel alignment. The convergence condition of Algorithm 1 is that the difference between the last two objective values is less than θ.
Algorithm 1 Multiple Kernel Clustering based on Self-weighted Local Kernel Alignment
Input: kernel matrices {K_p}_{p=1}^m, number of clusters k, regularization parameter λ, neighborhood size τ, convergence threshold θ.
Output: H, μ and w.
1: Initialize A^{(i)} for all samples according to one criterion of τ nearest neighbors.
2: repeat
3:   Update H by solving Eq. (10) with μ and w fixed.
4:   Update μ by solving Eq. (11) with H and w fixed.
5:   Update w by Eq. (13) with H and μ fixed.
6:   t ← t + 1.
7: until the difference between the last two objective values is less than θ.
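A compact end-to-end sketch of Algorithm 1, under the reconstructions above, is given below. The uniform initialization of μ and w and all helper names (combined_kernel, neighborhood_masks, update_H, update_mu, update_w, from the earlier sketches) are our assumptions rather than the paper's specification:

```python
import numpy as np

def swlka_cluster(kernels, k, lam, tau, theta=1e-6, max_iter=100):
    """Three-step alternating optimization of Eq. (9).  Written for clarity,
    not speed: A^{(i)} = S^{(i)} S^{(i)T} is kept as a dense 0/1 diagonal matrix."""
    n, m = kernels[0].shape[0], len(kernels)
    mu = np.ones(m) / m                                  # assumed uniform init
    w = np.ones(n) / n                                   # assumed uniform init
    masks = neighborhood_masks(sum(kernels) / m, tau)    # fixed neighborhoods (average kernel)
    A = [np.diag(mask.astype(float)) for mask in masks]
    # local kernels A^{(i)} K_p A^{(i)} and matrices M^{(i)} never change
    locK = [[Ai @ Kp @ Ai for Kp in kernels] for Ai in A]
    M = [np.array([[np.trace(Kp @ Kq) for Kq in lk] for Kp in lk]) for lk in locK]
    prev_obj = np.inf
    for t in range(max_iter):
        K_mu = combined_kernel(kernels, mu)
        V = sum(wi**2 * Ai @ K_mu @ Ai for wi, Ai in zip(w, A))
        H = update_H(V, k)                               # step (i), Eq. (10)
        P = np.eye(n) - H @ H.T
        c = np.array([sum(wi**2 * np.trace(locK[i][p] @ P) for i, wi in enumerate(w))
                      for p in range(m)])
        Q = np.diag(c) + 0.5 * lam * sum(wi**2 * Mi for wi, Mi in zip(w, M))
        mu = update_mu(Q)                                # step (ii), Eq. (11)
        K_mu = combined_kernel(kernels, mu)
        a = np.array([np.trace(Ai @ K_mu @ Ai @ P) + 0.5 * lam * mu @ Mi @ mu
                      for Ai, Mi in zip(A, M)])
        w = update_w(a)                                  # step (iii), Eq. (13)
        obj = (w**2 * a).sum()
        if abs(prev_obj - obj) < theta:                  # convergence test
            break
        prev_obj = obj
    return H, mu, w
```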
In this section, we conduct extensive experiments to evaluate the clustering performance of the proposed algorithm and compare it with many recently proposed state-of-the-art MKC algorithms.
To conveniently and convincingly evaluate the clustering performance of the proposed algorithm, five benchmark data sets from multiple kernel learning are adopted in our experiments: Yale (https://vismod.media.mit.edu/vismod/classes/mas62200/datasets/), Digital (https://ss.sysu.edu.cn/ py/), ProteinFold (https://mkl.ucsd.edu/dataset/protein-fold-prediction), Movement (https://archive.ics.uci.edu/ml/datasets/Libras+Movement) and Caltech102 (https://mkl.ucsd.edu/dataset/). The numbers of samples, kernels and classes of these data sets are listed in Tab. 1.
Table 1:The details of data sets in our experiments
Local kernel alignment for multiple kernel clustering (LKAMKC) [Li, Liu, Wang et al. (2016)] is a strong baseline since the proposed clustering algorithm directly extends it. In addition, the compared algorithms include many related and state-of-the-art multiple kernel clustering algorithms. Details of the compared algorithms are as follows: Multiple kernel K-means (MKKM) [Huang, Chuang, Chen et al. (2012)], Localized multiple kernel K-means (LMKKM) [Gönen and Margolin (2014)], Robust multiple kernel K-means (RMKKM) [Du, Zhou, Shi et al. (2015)], Co-regularized spectral clustering (CRSC) [Kumar and Daumé (2011)], Robust multi-view spectral clustering (RMSC) [Xia, Pan, Du et al. (2014)], Robust multiple kernel clustering (RMKC) [Zhou, Du, Shi et al. (2015)], Kernel alignment for multiple kernel clustering (KAMKC) [Liu, Dou, Yin et al. (2016)], Optimal kernel clustering with multiple kernels (OKMKC) [Liu, Zhou, Wang et al. (2017)].
For the Movement data set, 12 kernel matrices are computed according to Zhou et al. [Zhou, Du, Shi et al. (2015)], and the kernel matrices of the other data sets are downloaded from the respective websites. To eliminate differences between kernels, we make the diagonal elements of all kernel matrices equal to one by applying centering and scaling to the kernels [Cortes, Mohri and Rostamizadeh (2013)].
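As an illustrative sketch of this preprocessing (following standard kernel centering and unit-diagonal scaling; the function name is ours):

```python
import numpy as np

def center_and_scale(K):
    """Center the kernel in feature space, then scale so that diag(K) = 1."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    Kc = J @ K @ J                         # centered kernel
    d = np.sqrt(np.clip(np.diag(Kc), 1e-12, None))
    return Kc / np.outer(d, d)             # K_ij / sqrt(K_ii * K_jj)
```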
The LKAMKC algorithm and the proposed algorithm have the same two parameters: the number of neighbors τ and the regularization parameter λ. For the number of neighbors, we respectively select the first kernel, the second kernel, the third kernel and the average kernel to measure the neighborhood of samples, and the optimal τ is obtained by grid search over {0.05, 0.1, …, 0.95} × n, where n is the number of samples. For the regularization parameter λ, the optimal value is chosen by grid search over {2^{-10}, …, 2^{10}}. For the other compared algorithms, the parameters are set according to the methods used in the corresponding references.
To objectively evaluate the performance of the clustering algorithms, in all experiments we use the true number of classes as the number of clusters, and we adopt clustering accuracy (ACC), normalized mutual information (NMI) and purity as indicators of clustering performance. All experiments, for both the proposed algorithm and the compared algorithms, are carried out in MATLAB 2013b on a Windows 8 operating system. To reduce the effect of the randomness caused by K-means as much as possible, we repeat each experiment 30 times and report the best result.
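For reference, ACC, NMI and purity can be computed as below (a Python sketch using scikit-learn and SciPy; the paper's MATLAB implementation is not shown, so this is our own illustration). ACC matches predicted clusters to true classes with the Hungarian algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def clustering_acc(y_true, y_pred):
    """Best one-to-one match between clusters and classes (Hungarian algorithm)."""
    C = contingency_matrix(y_true, y_pred)
    row, col = linear_sum_assignment(-C)      # maximize matched counts
    return C[row, col].sum() / len(y_true)

def purity(y_true, y_pred):
    """Fraction of samples assigned to the majority class of their cluster."""
    C = contingency_matrix(y_true, y_pred)
    return C.max(axis=0).sum() / len(y_true)

def nmi(y_true, y_pred):
    return normalized_mutual_info_score(y_true, y_pred)
```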
Table 2:Clustering performance of all algorithms on all data sets
Tab. 2 reports the best experimental results of the proposed algorithm and all compared algorithms, and Tab. 3 reports more detailed comparison results between the proposed algorithm and the LKAMKC algorithm. In all experiments, the neighborhood of samples is fixed, but the criterion used to measure the neighborhood of samples is adjustable. In Tab. 2, LKAMKC and the proposed algorithm use the average combination kernel to measure the neighborhood of samples. In Tab. 3, LKAMKC-K1, LKAMKC-K2, LKAMKC-K3 and LKAMKC-A denote LKAMKC adopting the first kernel, the second kernel, the third kernel and the average combination kernel, respectively, to measure the neighborhood of samples; likewise, Proposed-K1, Proposed-K2, Proposed-K3 and Proposed-A denote the proposed algorithm with the same four criteria. From Tab. 2, we have the following observations.
Table 3:The detailed comparison between the proposed algorithm and LKAMKC
The clustering algorithms that utilize local kernels, including LKAMKC and the proposed algorithm, significantly outperform the compared ones that do not; among the latter, OKMKC demonstrates the best performance. Taking ACC as an example, the proposed algorithm exceeds OKMKC by 4.02%, 10.92%, 3.53%, 3.47% and 6.29% on Yale, Digital, ProteinFold, Movement and Caltech102, respectively. Similar conclusions can also be drawn in terms of NMI and purity. This clearly indicates the importance of the local geometrical structure of data for clustering.
In terms of all three performance indicators, ACC, NMI and purity, the proposed algorithm obtains the best clustering performance on all data sets. Taking ACC as an example, it exceeds LKAMKC, a strong baseline that the proposed algorithm directly extends, by 0.99%, 3.23%, 3.53%, 3.01% and 4.13% on Yale, Digital, ProteinFold, Movement and Caltech102, respectively. The excellent performance of the proposed algorithm in terms of NMI and purity can also be seen from Tab. 2, where similar observations hold. This clearly shows the superiority of suitably weighting the local kernel alignments.
From Tab. 3, we can draw the following points:
Both LKAMKC and the proposed algorithm are sensitive to the neighborhood of samples. Taking Digital as an example, both LKAMKC and the proposed algorithm achieve better performance when using the third kernel to measure the neighborhood of samples than when using the first kernel.
Using the average kernel to measure the neighborhood of samples achieves better performance than using a single kernel. Taking ACC as an example, Proposed-A and LKAMKC-A exceed Proposed-K1 and LKAMKC-K1 by 0.06% and 0.08%, 3.45% and 3.20%, 5.75% and 2.65%, 4.51% and 2.85%, and 0.73% and 0.78% on Yale, Digital, ProteinFold, Movement and Caltech102, respectively, which also shows that the combined kernel contains more information about the neighborhood of samples than a single kernel does.
No matter which criterion for the neighborhood of samples is chosen, the proposed algorithm is always better than LKAMKC. Taking ACC as an example, Proposed-K1 exceeds LKAMKC-K1 by 1.82%, 1.35%, 2.62%, 0.97% and 2.19% on Yale, Digital, ProteinFold, Movement and Caltech102, respectively, which again confirms the superiority and effectiveness of the proposed algorithm.
When applying the proposed algorithm to cluster data, two parameters need to be set manually: the number τ of nearest neighbors and the regularization parameter λ. Tab. 3 has analyzed the effect of the neighborhood of samples on the clustering performance. To evaluate the stability with respect to the parameter λ, we select the average kernel to measure the neighborhood of samples, fix τ, and carry out a series of experiments on all data sets. The experimental results of the proposed algorithm and of a baseline, namely the best result of LKAMKC under the same setting, are plotted in Fig. 1. From Fig. 1, the following observations can be made.
1) The clustering performance of the proposed algorithm on all data sets is stable when the parameter λ varies over a wide range.
2) For Yale, λ = 2^{-1} is a watershed: if λ is less than the watershed, the ACC of the proposed algorithm is higher than the baseline; otherwise, it is lower.
3) For Digital and Caltech102, λ also has a watershed; differently, if λ is less than the watershed, the ACC of the proposed algorithm is lower than the baseline; otherwise, it is higher.
4) For ProteinFold and Movement, the ACC of the proposed algorithm is better than the baseline when λ varies within a bounded range; for instance, when 2^{-4.5} ≤ λ ≤ 2^{8.5}, the curve of the proposed algorithm lies above the baseline.
To validate the convergence of the proposed algorithm, we record its objective value at each iteration with the parameters τ and λ fixed. Fig. 2 plots the objective value of the proposed algorithm against the iteration number. As seen from Fig. 2, the objective value decreases monotonically over the iterations, and the proposed algorithm converges quickly, in fewer than eleven iterations, which confirms its convergence from an experimental point of view.
Figure 1:The performance of the proposed algorithm with regard to parameter λ
Figure 2:The convergence of the proposed algorithm
In this paper, we propose a multiple kernel clustering algorithm based on self-weighted local kernel alignment, which improves the clustering performance by exploiting the contribution of each local kernel alignment to the clustering performance more rationally. A convergent three-step alternate optimization algorithm is developed to solve the resulting optimization problem. Extensive experiments on five benchmark data sets validate the effectiveness and superiority of the proposed algorithm.
As shown in Eq. (8) and Eq. (9), both LKAMKC and the proposed algorithm utilize all local kernels for clustering. However, when the number of samples is large, clustering algorithms based on local kernel alignment are very time-consuming. Therefore, a fast version of the proposed algorithm suitable for large data sets [Xiao, Wang, Liu et al. (2018)] is worth studying in the future.
Acknowledgement: This work was supported by the National Key R&D Program of China (No. 2018YFB1003203), the National Natural Science Foundation of China (Nos. 61672528, 61773392, 61772561), the Educational Commission of Hunan Province, China (No. 14B193) and the Key Research & Development Plan of Hunan Province (No. 2018NK2012).