
      Variational Bayesian data analysis on manifold

Control Theory and Technology, 2018, No. 3

      Yang MING

Key Lab of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Abstract In this paper, variational inference is studied on manifolds with certain metrics. To solve the problem, the analysis is first developed for variational Bayes on Lie groups, and then extended to manifolds that can be approximated by Lie groups. The convergence of the proposed algorithm with respect to the manifold metric is then proved for the two iterative processes: the variational Bayesian expectation (VB-E) step and the variational Bayesian maximization (VB-M) step. Moreover, the effectiveness of different metrics for Bayesian analysis is discussed.

Keywords: Variational Bayesian, Lie group, data analysis

      1 Introduction

Variational Bayesian (VB) methods have been studied for a wide range of models to approximate posterior probability distributions; they offer low computational cost and analytic tractability [1–4], and have therefore been successfully applied to many areas related to control, estimation, and signal processing [5]. Based on VB methods, the efficiency of online model selection has been greatly improved [6]. Furthermore, VB approximation has been applied to recursive noise-adaptive Kalman filtering [7]. In [8], the VB method was used for state estimation under heavy-tailed noise. Additionally, in [9], the VB idea was employed to estimate the joint distribution of states and non-Gaussian noises, where the non-Gaussian noises were modeled by Gaussian mixture distributions.

In many practical situations, data may be located on some special manifold rather than in the whole space. By the well-known Whitney embedding theorem, the dimension of a manifold may be much less than that of the Euclidean space in which it is embedded, and therefore we can largely reduce the dimension when we know the structure of a data manifold. In fact, the geometric structure of the data is indeed very helpful in data mining and data-based decision making, according to the famous manifold learning hypothesis [10]. Since it is very hard to study data distributions directly on manifolds in general, a number of results have been obtained for specific manifolds such as Lie groups [11]. Thanks to their symmetric group properties and smooth differential calculus, Lie groups can readily describe data arising from various settings, from rigid body motions [12] to phase transitions [13]. For most of these data, Lie groups provide a rigorous mathematical framework and reduce complicated computations with the help of geometric structure [14, 15]. However, the study of learning and reasoning methods on general manifolds is still under development.

Most VB-based algorithms make no direct use of the geometric structure. The motivation of this paper is to develop a geometric approach to VB analysis on given manifolds, particularly those that can be approximated by Lie groups. To do this, we need to fully exploit the geometric structure when deriving VB-based algorithms. The contributions of the paper may be summarized as follows:

• We consider VB on Lie groups with the help of geometric analysis, and we discuss the convergence of the proposed algorithm in terms of the VB-E and VB-M steps.

• We consider VB on a manifold that can be approximated by a Lie group, and obtain estimates for the error bounds.

• We discuss various metrics for VB methods, comparing them from the data analysis viewpoint.

The remainder of the paper is organized as follows. Section 2 presents preliminaries and the problem description. Section 3 gives the design and analysis for approximating a data manifold by distributions supported on a Lie group, with a specific error bound, while Section 4 presents theoretical inference for variational Bayes on a Lie group. Furthermore, Section 5 discusses several metrics and their strong and weak points for computing with data distributions on manifolds. Finally, Section 6 provides concluding remarks.

      2 Preliminaries

In this section, we review some basic properties of Lie groups from the computational perspective. Note that all of the groups considered here are matrix Lie groups for convenience. We then introduce VB in view of its extension to manifolds.

      2.1 Lie group

A Lie group is a manifold with a smooth differential structure that is compatible with its group operations. Lie groups can be studied through calculus and continuous group symmetry [16], and they represent the best developed theory of continuous symmetries of mathematical objects and structures, which makes them powerful tools for many ideal geometric distributions of real data samples.

Consider the set of 3×3 rotation matrices

SO(3) = { M ∈ R^(3×3) : M Mᵀ = I, det(M) = 1 }.

It can be shown that the nine entries of a 3×3 real matrix are restricted by the orthogonality condition M Mᵀ = I to a subspace with three degrees of freedom. The condition det(M) = 1 then limits the discussion to one connected component of this subspace. It is common to describe the three degrees of freedom of the rotation group using a standard parametrization such as the Euler angles (e.g., in the ZXZ convention):

M(α, β, γ) = M₃(α) M₁(β) M₃(γ),

where Mᵢ(θ) is a counterclockwise rotation by θ about the i-th coordinate axis.

The group SO(3) is a compact Lie group, and therefore it has a finite volume measure. When using Euler angles, the volume is computed with respect to the integration measure

dM = (1/(8π²)) sin β dα dβ dγ,

which, integrated over 0 ≤ α, γ ≤ 2π and 0 ≤ β ≤ π, gives the value 1.
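As a quick numerical sanity check (a sketch, assuming the ZXZ Euler convention above), the normalization of this measure can be verified by direct integration; the integrand depends only on β, so the triple integral factorizes:

```python
# Check that (1/(8*pi^2)) sin(beta) dalpha dbeta dgamma integrates to 1
# over 0 <= alpha, gamma <= 2*pi and 0 <= beta <= pi.
import numpy as np

alpha = np.linspace(0.0, 2.0 * np.pi, 1001)
beta = np.linspace(0.0, np.pi, 1001)
gamma = np.linspace(0.0, 2.0 * np.pi, 1001)

volume = (
    np.trapz(np.ones_like(alpha), alpha)    # integral over alpha = 2*pi
    * np.trapz(np.sin(beta), beta)          # integral over beta  = 2
    * np.trapz(np.ones_like(gamma), gamma)  # integral over gamma = 2*pi
) / (8.0 * np.pi ** 2)
print(volume)  # ~= 1.0 up to discretization error
```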

The Lie algebra so(3) consists of real skew-symmetric matrices of the following form:

X = [  0   −x₃   x₂ ]
    [  x₃   0   −x₁ ]
    [ −x₂   x₁   0  ],   x = (x₁, x₂, x₃)ᵀ ∈ R³.

In general, a Lie algebra and its Lie group are connected by the exponential mapping. For the Lie groups discussed here, the exponential mapping is the matrix exponential. In this specific case, Rodrigues' formula gives

exp(X) = I + (sin ‖x‖ / ‖x‖) X + ((1 − cos ‖x‖) / ‖x‖²) X².

Moreover, it is well known that the exponential map from so(3) onto SO(3) is surjective.

Although this low-dimensional example of a Lie group is presented to make the discussion concrete, a great number of other Lie groups exist. For example, the same construction used to define SO(3) relative to R^(3×3) can be used to define SO(n) from R^(n×n). The result is an n(n−1)/2-dimensional Lie group with a natural volume element dM. In general, a real matrix Lie algebra of dimension n is defined through a basis of real matrices {Xᵢ}, i = 1, ..., n, satisfying the commutator relations

[Xᵢ, Xⱼ] = Xᵢ Xⱼ − Xⱼ Xᵢ = Σₖ Cᵢⱼᵏ Xₖ

with structure constants Cᵢⱼᵏ.

The exponential parametrization g(x) = exp(x₁X₁ + ··· + xₙXₙ) is always admissible in some neighborhood of the identity of the corresponding Lie group. As a matter of fact, this parametrization is smoothly differentiable with respect to the coordinates xᵢ over the group SO(n), except on a set of measure zero. The logarithm mapping log g(x) = X is defined as the inverse of the exponential function. In practical computation it is useful to identify a vector x ∈ Rⁿ with a Lie algebra element via

X = Σᵢ xᵢ Xᵢ,   X∨ = x = Σᵢ xᵢ eᵢ.

Here {eᵢ} represents the standard basis of Rⁿ.
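To make this identification concrete, here is a minimal sketch of the ∧ (hat) and ∨ (vee) operations for so(3) and of the exponential map, checked with SciPy's matrix exponential; the helper names hat and vee are our own labels, not notation from the paper:

```python
import numpy as np
from scipy.linalg import expm

def hat(x):
    """Identify x in R^3 with the skew-symmetric matrix X in so(3)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def vee(X):
    """Inverse identification: X^vee = x."""
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

x = np.array([0.3, -0.5, 0.2])
R = expm(hat(x))  # exponential map so(3) -> SO(3)
print(np.allclose(R @ R.T, np.eye(3)))    # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))  # True: R lies in SO(3)
print(np.allclose(vee(hat(x)), x))        # True: vee inverts hat
```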

In terms of the definitions above, the adjoint operators Ad and ad are the following matrix functions:

Ad(g) x = (g X g⁻¹)∨,   ad(X) y = [X, Y]∨,

where x = X∨ and y = Y∨.

The dimension of these matrices equals the dimension of the Lie group, which may differ from the size of the group elements as matrices. The function Δ(g) = det Ad(g) is called the modular function of G. For a unimodular Lie group, Δ(g) ≡ 1.

      2.2 Metrics and variational Bayesian inference

Suppose p_r and p_φ are two distributions on the manifold M. The following are well-known metrics and divergences:

• The KL divergence

KL(p_r ‖ p_φ) = ∫ p_r(x) ln( p_r(x) / p_φ(x) ) dx.

Note that the KL divergence is not symmetric, and it is infinite when there are points such that p_φ(x) = 0 and p_r(x) > 0.

• The total variation metric

δ(p_r, p_φ) = sup_{A ∈ Ω} | p_r(A) − p_φ(A) |,

where Ω represents the collection of all Borel sets of M.

• The Wasserstein metric

W(p_r, p_φ) = inf_{ξ ∈ Π(p_r, p_φ)} E_{(x,y)~ξ}[ ‖x − y‖ ],

where Π(p_r, p_φ) represents the set of all joint distributions ξ(x, y) whose marginals are p_r and p_φ, respectively. Geometrically, ξ(x, y) denotes the amount of mass transported from x to y when transforming the distribution p_r into the distribution p_φ.

In the context of Bayesian inference, the posterior probability distribution comes from a prior distribution and a likelihood function arising from the observed data of a statistical model [17]. Bayesian inference studies the posterior probability distribution based on Bayes' theorem:

p(Φ | D) = p(D | Φ) p(Φ) / p(D),

where Φ denotes the general model parameters, whose probability is updated by the observed data D.

In variational Bayesian methods, the joint posterior distribution p(x, Φ | Y) is approximated by a free-form distribution q(x, Φ), where Y represents hidden variables and other hypotheses. A mean-field factorization is assumed:

q(x, Φ) = q(x) Πᵢ q(Φᵢ),

where q(x) and the q(Φᵢ) are, for example, Gaussian or Wishart distributions. The purpose of the VB approximation is to minimize the distance between the posterior distribution p(x, Φ | Y) and the variational distribution q(x, Φ) under the KL divergence:

KL( q(x, Φ) ‖ p(x, Φ | Y) ) = c − L(q),

where c, the log evidence ln p(Y), is a constant independent of the variational distribution q(x, Φ). The evidence lower bound is

L(q) = ∫∫ q(x, Φ) ln( p(x, Φ, Y) / q(x, Φ) ) dx dΦ.

Therefore, the problem becomes

q* = argmax_q L(q).

The variational Bayesian method iterates between updating each variational factor q(Φᵢ) (VB-E step) and updating the global distribution q(x) (VB-M step), while keeping the other factors fixed; the updates can be summarized as

ln q*(Φᵢ) = E_{q(x) Π_{j≠i} q(Φⱼ)}[ ln p(x, Φ, Y) ] + const,
ln q*(x) = E_{Πᵢ q(Φᵢ)}[ ln p(x, Φ, Y) ] + const.

In this paper, we consider how to develop VB on manifolds that can be approximated by Lie groups. First, we study VB on Lie groups; then we give an error analysis for manifolds that can be approximated by Lie groups. Finally, we discuss several metrics and their strong and weak points for computation on manifolds.

      3 Approximating data manifold

In order to learn a probability distribution supported on a general manifold, our strategy is to use variational Bayesian methods, which define a set of parametric densities {p_φ}_{φ∈Φ} and search for the parameter that maximizes the likelihood function on the manifold [18]. That is, for real data samples {xᵢ}_{i=1}^N, our goal is to find the following solution:

φ* = argmax_{φ∈Φ} (1/N) Σᵢ₌₁ᴺ ln p_φ(xᵢ).

Asymptotically, this is equivalent to minimizing the KL divergence KL(p_r ‖ p_φ) on the data manifold through its nearby Lie group structure. Thus we have to modify the probability density p_φ so that it exists over the manifold. This is not the common situation: here we work with density distributions supported on a more complex manifold [10]. Such a data manifold is usually not a unimodular Lie group and need not even admit symmetric structures. The standard construction is to add a noise term to approximate the real distribution, which is a natural consequence of the structure of the high-dimensional ambient space.

Theorem 1 Let ε be a random variable with density p_ε, independent of p_M. If the manifold M admits a distribution p_M supported on the Lie group G, then p_{M+ε} is absolutely continuous and

p_{M+ε}(x) = E_{y~p_M}[ p_ε(x − y) ].

Proof Let N be a Borel set with measure 0. Since ε and p_M are independent, by Fubini's theorem

P(M + ε ∈ N) = E_{x~p_M}[ p_ε(N − x) ].

Here we use the fact from real analysis that if N has measure 0, then so does N − x. Therefore p_ε(N − x) = 0, which shows that p_{M+ε} is absolutely continuous.

Moreover, we compute the density of p_{M+ε} using the independence of ε and p_M. For a Borel set K ⊆ M we have

P(M + ε ∈ K) = E_{y~p_M}[ ∫_K p_ε(x − y) dx ] = ∫_K E_{y~p_M}[ p_ε(x − y) ] dx.  □

Corollary 1 For ε ~ N(0, σ²I) in R^d,

p_{M+ε}(x) = (2πσ²)^(−d/2) E_{y~p_M}[ exp( −‖x − y‖² / (2σ²) ) ].

Hence, the theorem shows that the density p_{M+ε}(x) is a distance-weighted average over the points of p_M supported in G, whose weights are exactly the probabilities of these points. When the support of p_M is a Lie group, we choose the average weighted distance to the points according to the group operation. From the above corollary, we can see the impact of various random errors with different types of decay by altering the covariance matrix.
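The corollary can be illustrated by a small Monte Carlo experiment (a sketch under our own choice of p_M, here the uniform distribution on the unit circle in R²): the smoothed density at a query point is estimated as the average of Gaussian kernels centered at manifold samples.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, d, n = 0.1, 2, 100_000

# Samples y ~ p_M: uniform on the unit circle in R^2.
theta = rng.uniform(0.0, 2.0 * np.pi, n)
ys = np.stack([np.cos(theta), np.sin(theta)], axis=1)

def smoothed_density(x):
    # p_{M+eps}(x) ~ (2*pi*sigma^2)^(-d/2) * mean_i exp(-||x - y_i||^2 / (2*sigma^2))
    sq = np.sum((x - ys) ** 2, axis=1)
    return np.mean(np.exp(-sq / (2.0 * sigma ** 2))) / (2.0 * np.pi * sigma ** 2) ** (d / 2)

print(smoothed_density(np.array([1.0, 0.0])))  # large: query point lies on the circle
print(smoothed_density(np.array([0.0, 0.0])))  # ~0: query point far from the support
```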

The phenomenon described above is quite intriguing because these distribution models are highly accurate on Lie group data sets. Thus we can first define a simple synthetic geometric model to investigate some examples. For the Lie group SO(3), the corresponding geometric model is the 3-dimensional sphere

S³ = { x ∈ R⁴ : ‖x‖₂ = 1 },

which we treat here simply as a manifold rather than as a group. There is, however, a canonical double cover homomorphism χ between them,

χ : S³ → SO(3),   χ(q) = χ(−q),

which maps a unit quaternion q to its rotation matrix. Thus all the distribution-learning processes on SO(3) can be pulled back to S³.

Therefore, the data distribution on this 3-dimensional sphere is generated by a random x ∈ R⁴ whose direction is assigned with equal probability and whose norm ‖x‖₂ is fixed. Studying a synthetic sphere data set has many advantages, since the probability density p_r(x) of the data is well defined and uniform over all x in its support. We can also sample uniformly from p_r(x) by drawing z ~ N(0, I) and setting x = z/‖z‖₂, or x = Rz/‖z‖₂ for a sphere of radius R. Thus, we can control the difficulty of the learning problem by varying R.
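The synthetic construction above can be sketched as follows; the particular quaternion-to-rotation convention used in chi is our assumption, since any fixed convention realizes the double cover:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sphere(n, R=1.0):
    """Sample x = R * z / ||z||_2 with z ~ N(0, I): uniform on the radius-R sphere in R^4."""
    z = rng.normal(size=(n, 4))
    return R * z / np.linalg.norm(z, axis=1, keepdims=True)

def chi(q):
    """Unit quaternion (w, x, y, z) -> rotation matrix; note chi(q) == chi(-q)."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

q = sample_sphere(1)[0]
print(np.allclose(chi(q), chi(-q)))               # True: two-to-one covering
print(np.allclose(chi(q) @ chi(q).T, np.eye(3)))  # True: chi(q) lies in SO(3)
```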

      4 Variational inference on Lie group

Lie groups carry integration measures that are invariant under shifts (and, for unimodular groups, under inversions). In any parametrization, this measure, or the corresponding volume element, can be expressed as in the example above by first computing a left or right Jacobian matrix and then setting dg = |J(q)| dq₁ dq₂ ··· dqₙ, where n is the dimension of the Lie group. In the special case when q = x are the exponential coordinates,

dg = | det( (I − e^(−ad X)) / ad X ) | dx,

where x = X∨ and dx = dx₁ dx₂ ··· dxₙ. In the above expression it makes sense to write the division of one matrix by another because the matrices involved commute. The symbol 𝔤 is used to denote the Lie algebra corresponding to G. In practice, the integral is performed over a subset of 𝔤, which is equivalent to defining f(e^X) to be zero over some portion of G.

Suppose that f(g) is a probability density function on a Lie group G. Then

∫_G f(g) dg = 1.

It can be shown that unimodularity implies the following equalities for arbitrary h ∈ G [19]:

∫_G f(g) dg = ∫_G f(h g) dg = ∫_G f(g h) dg = ∫_G f(g⁻¹) dg.

Then the variational approximation is to minimize the KL divergence between the posterior probability p_G(X, Θ | Y) and the variational distribution q_G(X, Θ), where X are data points supported on G, Θ is the set of unknown parameters, and Y represents hidden variables:

q* = argmin_q KL( q_G(X, Θ) ‖ p_G(X, Θ | Y) ).

Moreover, let us consider a basic Bayesian model for a set of data points from a Gaussian distribution on the Lie group G with unknown mean μ and variance τ⁻¹. For mathematical convenience, we work with the precision τ, i.e., the reciprocal of the variance, and we use conjugate priors for the inference. In other words,

p(μ | τ) = N( μ | μ₀, (λ₀ τ)⁻¹ ),   p(τ) = Gamma( τ | a₀, b₀ ),

where μ₀, λ₀, a₀, b₀ are fixed hyperparameters. Given N data points X = {x₁, ..., x_N} supported on this Lie group G, our purpose is to infer the posterior probability distribution

p(μ, τ | X)

of the parameters μ and τ.

Thus, the joint probability distribution can be written as

p(X, μ, τ) = p(X | μ, τ) p(μ | τ) p(τ),

where

p(X | μ, τ) = Πᵢ₌₁ᴺ N( xᵢ | μ, τ⁻¹ ).

Assume that

q(μ, τ) = q(μ) q(τ),

that is, the posterior probability distribution is approximated by independent factors for μ and τ. Then

ln q*(μ) = E_{q(τ)}[ ln p(X | μ, τ) + ln p(μ | τ) ] + c₁.

Using the following identity for a sum of quadratics, we can simplify this expression.

Lemma 1 For any real numbers A, B, x, y, z with A + B ≠ 0, we have the identity

A (x − y)² + B (x − z)² = (A + B) ( x − (A y + B z)/(A + B) )² + (A B/(A + B)) (y − z)².

Then we can reduce the expansion of ln q*(μ):

ln q*(μ) = −(E[τ]/2) { λ₀ (μ − μ₀)² + Σᵢ₌₁ᴺ (xᵢ − μ)² } + c₂
         = −(E[τ]/2) { (λ₀ + N) μ² − 2 (λ₀ μ₀ + N x̄) μ } + c₃
         = −((λ₀ + N) E[τ]/2) ( μ − (λ₀ μ₀ + N x̄)/(λ₀ + N) )² + c₄,

where x̄ = (1/N) Σᵢ xᵢ. Note that in the above equations, c₁, c₂, c₃, and c₄ refer to values that are constant with respect to μ.

As a result, q*(μ) is Gaussian with mean μ_N = (λ₀ μ₀ + N x̄)/(λ₀ + N) and precision λ_N = (λ₀ + N) E[τ]. Similarly,

ln q*(τ) = (a₀ − 1) ln τ − b₀ τ + ((N + 1)/2) ln τ − (τ/2) E_{q(μ)}[ λ₀ (μ − μ₀)² + Σᵢ₌₁ᴺ (xᵢ − μ)² ] + c₅,

where c₅ is constant with respect to τ.

Therefore, q*(τ) is a gamma distribution:

q*(τ) = Gamma( τ | a_N, b_N ),   a_N = a₀ + (N + 1)/2,   b_N = b₀ + (1/2) E_{q(μ)}[ λ₀ (μ − μ₀)² + Σᵢ₌₁ᴺ (xᵢ − μ)² ].

In all these cases, the parameters of the distribution over one of the variables depend on expectations computed with respect to the other variable. We can expand the expectations using the standard formulas for the moments of the Gaussian and gamma distributions:

E_{q(τ)}[τ] = a_N / b_N,   E_{q(μ)}[μ] = μ_N,   E_{q(μ)}[μ²] = μ_N² + 1/λ_N.

Applying these formulas to the above equations, it is straightforward to check that

μ_N = (λ₀ μ₀ + N x̄)/(λ₀ + N),   λ_N = (λ₀ + N) a_N / b_N,   a_N = a₀ + (N + 1)/2,

and, working out the last term,

b_N = b₀ + (1/2) [ λ₀ ( (μ_N − μ₀)² + 1/λ_N ) + Σᵢ₌₁ᴺ (xᵢ − μ_N)² + N/λ_N ].

Note that λ_N and b_N depend on each other circularly through the expectations above, and therefore the computation can be done iteratively as follows:

• Set an initial value for λ_N.

• VB-E step: use the current value of λ_N and the known values of the other parameters to compute b_N.

• VB-M step: use the current value of b_N and the known values of the other parameters to compute λ_N.

• Iterate the last two steps until neither value changes by more than some small amount, as the criterion for convergence; a code sketch of this loop follows.
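A minimal sketch of this loop for scalar data, using the conjugate updates derived above (for Lie-group data one would additionally map samples to exponential coordinates xᵢ = (log gᵢ)∨, which we omit here; the initialization λ_N = λ₀ + N is our own choice):

```python
import numpy as np

def vb_gaussian(x, mu0=0.0, lam0=1.0, a0=1e-3, b0=1e-3, tol=1e-10, max_iter=100):
    """Coordinate-ascent VB for a Gaussian with unknown mean mu and precision tau."""
    N, xbar = len(x), float(np.mean(x))
    # mu_N and a_N are fixed by conjugacy and never change during the loop.
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    a_N = a0 + (N + 1) / 2.0
    lam_N = lam0 + N  # initial value (equivalent to guessing E[tau] = 1)
    for _ in range(max_iter):
        # VB-E step: update b_N from the current lambda_N.
        b_N = b0 + 0.5 * (np.sum((x - mu_N) ** 2) + N / lam_N
                          + lam0 * ((mu_N - mu0) ** 2 + 1.0 / lam_N))
        # VB-M step: update lambda_N from the current b_N via E[tau] = a_N / b_N.
        lam_next = (lam0 + N) * a_N / b_N
        if abs(lam_next - lam_N) < tol:
            lam_N = lam_next
            break
        lam_N = lam_next
    return mu_N, lam_N, a_N, b_N

rng = np.random.default_rng(2)
data = rng.normal(loc=2.0, scale=0.5, size=500)
mu_N, lam_N, a_N, b_N = vb_gaussian(data)
print(mu_N, a_N / b_N)  # posterior mean ~ 2.0, E[tau] ~ 1 / 0.5^2 = 4
```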

      5 Analysis for different metrics

It is known that existing VB methods basically rely on the KL divergence. However, this divergence may not be so efficient when VB is considered on manifolds. In this section, we investigate different ways to measure the topology induced by the geometric distribution and the real distribution on the data manifold. Once we find a suitable metric, we may easily rewrite VB with the new metric. Therefore, here we only discuss a comparison between several well-known metrics and divergences.

As mentioned above, the distribution p_r on a real data manifold is generally more intractable than in the Lie group case [13]. If there exists a proper metric or divergence ρ(p_r, p_φ), then the intractable distribution can be well approximated.

Here is an example showing that some simple sequences of probability distributions on a manifold converge under the Wasserstein metric but do not converge under the others.

Suppose ξ ~ U[0, 1] is uniformly distributed on the real interval [0, 1]. Let p₀ be the distribution of (0, ξ) ∈ R², uniform on the vertical segment through (0, 0). Then let p_φ be the distribution of (φ, ξ), with φ a single real parameter. It is easy to check that in this situation,

W(p₀, p_φ) = |φ|,
δ(p₀, p_φ) = 1 if φ ≠ 0, and 0 if φ = 0,
KL(p_φ ‖ p₀) = KL(p₀ ‖ p_φ) = +∞ if φ ≠ 0, and 0 if φ = 0,
JSD(p₀ ‖ p_φ) = ln 2 if φ ≠ 0, and 0 if φ = 0,

where JSD denotes the Jensen–Shannon divergence.

As we can see, the basic difference between these metrics is their effect on the convergence of sequences of probability distributions. When φₙ → 0, the sequence {p_{φₙ}}_{n∈N} converges to p₀ under the Wasserstein metric, but not under the others. Geometrically, when the metric ρ makes it easier for a sequence of distributions to converge, it induces a weaker topology on the manifold. This example presents a geometric setting in which we can infer a probability distribution over a lower-dimensional data manifold by doing variational approximation with the Wasserstein metric. Although this simple example has density functions with disjoint supports on two parallel lines, the same conclusion holds for lower-dimensional manifolds intersecting in general position, as long as the intersection of their supports has measure zero.
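A small numerical companion to this example (a sketch: the TV, KL, and JSD values are the exact closed-form constants for disjoint supports, so only W is estimated): since the second coordinates of p₀ and p_φ coincide, the optimal plan transports each point horizontally, and the two-dimensional Wasserstein distance reduces to the one-dimensional distance |φ| between the first coordinates.

```python
import numpy as np
from scipy.stats import wasserstein_distance

n = 10_000
for phi in (1.0, 0.1, 0.001):
    # First coordinates of samples from p_0 and p_phi (point masses at 0 and phi).
    w = wasserstein_distance(np.zeros(n), np.full(n, phi))
    # Exact values for disjoint supports (phi != 0):
    tv, js = 1.0, np.log(2.0)
    print(f"phi={phi}: W={w:.4f}  TV={tv}  JSD={js:.4f}  KL=inf")
```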

Because the Wasserstein metric induces a weaker topology than the KL divergence, it is natural to ask whether W(p_r, p_φ) is a continuous loss function of the parameter φ. In fact, we have the following result.

Theorem 2 If p_r is a given distribution over the data manifold M, ξ is an absolutely continuous random variable over a parameter space Ξ, and h : R^d × Ξ → M, denoted by h_φ(ξ), is the map inducing the distribution p_φ, then W(p_r, p_φ) is continuous in φ; it is moreover differentiable almost everywhere when h is locally Lipschitz in φ.

Proof Suppose that φ and φ′ are two parameter vectors in R^d. Then, with λ the joint distribution of (h_φ(ξ), h_{φ′}(ξ)), we have λ ∈ Π(p_φ, p_{φ′}). By definition,

W(p_φ, p_{φ′}) ≤ E_{(x,y)~λ}[ ‖x − y‖ ] = E_ξ[ ‖h_φ(ξ) − h_{φ′}(ξ)‖ ].

If h is locally Lipschitz in φ, then h_φ(ξ) → h_{φ′}(ξ) as φ → φ′, so ‖h_φ(ξ) − h_{φ′}(ξ)‖ → 0 pointwise as a function of ξ. Since the manifold M is bounded and connected, there is a uniform bound L̄ on the distance between any two points. Therefore,

‖h_φ(ξ) − h_{φ′}(ξ)‖ ≤ L̄.

According to the dominated convergence theorem, we have

E_ξ[ ‖h_φ(ξ) − h_{φ′}(ξ)‖ ] → 0 as φ → φ′.

Finally, by the triangle inequality,

| W(p_r, p_φ) − W(p_r, p_{φ′}) | ≤ W(p_φ, p_{φ′}) ≤ E_ξ[ ‖h_φ(ξ) − h_{φ′}(ξ)‖ ],

which implies the continuity of W(p_r, p_φ).

Now, for any pair (φ, ξ), there exist a constant K(φ, ξ) and an open set U with (φ, ξ) ∈ U such that, for every (φ′, ξ′) ∈ U,

‖h_φ(ξ) − h_{φ′}(ξ′)‖ ≤ K(φ, ξ) ( ‖φ − φ′‖ + ‖ξ − ξ′‖ ).

Taking expectations on both sides and setting ξ = ξ′, we get

E_ξ[ ‖h_φ(ξ) − h_{φ′}(ξ)‖ ] ≤ E_ξ[ K(φ, ξ) ] ‖φ − φ′‖,

as long as (φ′, ξ) ∈ U. Thus we may define U_φ = { φ′ | (φ′, ξ) ∈ U }; it is routine to check that U_φ is open since U is open. We can also define C(φ) = E_ξ[ K(φ, ξ) ] and derive

| W(p_r, p_φ) − W(p_r, p_{φ′}) | ≤ W(p_φ, p_{φ′}) ≤ C(φ) ‖φ − φ′‖

for all φ′ ∈ U_φ, which shows that W(p_r, p_φ) is locally Lipschitz. Thus W(p_r, p_φ) is continuous, and by Rademacher's theorem it is also differentiable almost everywhere. □

From this theorem we can see that, if two distributions p_r and p_φ are supported on a manifold M that is compact and connected, then the error terms may force the modified distribution p_{r+ε} to overlap with p_{φ+ε} almost everywhere. The following lemma shows that, under the Wasserstein metric, the modified distribution p_{M+ε} changes smoothly when the error term has a bounded second moment.

Lemma 2 Let ε be a random variable with zero mean and L = E[‖ε‖₂²]. Then

W(p_M, p_{M+ε}) ≤ √L.

Proof Suppose that η ~ p_M and ξ = η + ε, with ε independent of η. If λ is the joint distribution of (η, ξ), then it clearly has marginals p_M and p_{M+ε}. Hence,

W(p_M, p_{M+ε}) ≤ E_λ[ ‖η − ξ‖₂ ] = E[ ‖ε‖₂ ] ≤ ( E[ ‖ε‖₂² ] )^(1/2) = √L,

where the last step is by Jensen's inequality. □

Because the distances between actual sample points on the data manifold are closely related to these error modifications and to continuity, we ultimately want a strategy for investigating this approximation regardless of whether the metrics involved are continuous. The next theorem shows the advantage of the Wasserstein metric in this respect.

Theorem 3 Suppose that p_r and p_φ are two distributions on the manifold M and that ε is a random variable with zero mean and L = E[‖ε‖₂²]. If p_{r+ε} and p_{φ+ε} are supported on a ball of diameter D in M, then

W(p_r, p_φ) ≤ 2√L + 2D √( JSD(p_{r+ε} ‖ p_{φ+ε}) ).

Proof By Lemma 2 and the triangle inequality,

W(p_r, p_φ) ≤ W(p_r, p_{r+ε}) + W(p_{r+ε}, p_{φ+ε}) + W(p_{φ+ε}, p_φ) ≤ 2√L + W(p_{r+ε}, p_{φ+ε}).

Moreover, we use the fact that, on a support of diameter D, the Wasserstein metric is bounded by the total variation: W(p_{r+ε}, p_{φ+ε}) ≤ D δ(p_{r+ε}, p_{φ+ε}). Finally, we apply Pinsker's inequality to each KL divergence term, each of which is a nonnegative component of the JSD, to obtain δ(p_{r+ε}, p_{φ+ε}) ≤ 2 √( JSD(p_{r+ε} ‖ p_{φ+ε}) ). □

This theorem presents an intriguing phenomenon: the two seemingly unrelated terms in this inequality can both be bounded, since L can be decreased by reducing the error and JSD(p_{r+ε} ‖ p_{φ+ε}) can be minimized by optimizing the divergence between the two smoothed distributions. This can be seen as an extension of the Wasserstein metric in terms of manifold approximation.

      6 Conclusions

In this paper, we have analysed the approximation of a data manifold by distributions supported on Lie groups, with a specific error bound, and we have given a basic variational inference example for a Gaussian distribution on a Lie group. Variational Bayes has been studied on general manifolds with certain metrics. Since embedded low-dimensional data manifolds may have supports with empty intersection, we have compared different metrics for the suitable cases, where the posterior distribution of the error terms and the variance are bounded by a noise modification.
