Jian-Feng Cai,Meng Huang,Dong Liand Yang Wang
1Department of Mathematics,The Hong Kong University of Science and Technology,Clear Water Bay,Kowloon,Hong Kong
2SUSTech International Center for Mathematics and Department of Mathematics,Southern University of Science and Technology,Shenzhen,Guangdong 518055,China
Abstract.A fundamental problem in phase retrieval is to reconstruct an unknown signal from a set of magnitude-only measurements.In this work we introduce three novel quotient intensity models(QIMs)based on a deep modiif cation of the traditional intensity-based models.A remarkable feature of the new loss functions is that the corresponding geometric landscape is benign under the optimal sampling complexity.When the measurements ai∈Rnare Gaussian random vectors and the number of measurements m≥Cn,the QIMs admit no spurious local minimizers with high probability,i.e.,the target solution x is the unique local minimizer(up to a global phase)and the loss function has a negative directional curvature around each saddle point.Such benign geometric landscape allows the gradient descent methods to find the global solution x(up to a global phase)without spectral initialization.
Key words:Phase retrieval,landscape analysis,non-convex optimization.
The intensity-based model for phase retrieval is
whereaj∈Rn,j=1,···,mare given vectors andmis the number of measurements.The phase retrieval problem aims to recover the unknown signalx∈Rnbased on the measurementsA natural approach to solve this problem is to consider the minimization problem
However,as shown in[28],to guarantee the above loss function to have benign geometric landscape,the requirement of sampling complexity isO(nlog3n).This result is recently improved toO(nlogn)in[6].On the other hand,due to the heavy tail of the quartic random variables in(1.1),such results seem to be optimal for this class of loss functions.
To remedy this issue,we propose in this work three novel quotient intensity models(QIMs)to recoverxunder optimal sampling complexity.We rigorously prove that,for Gaussian random measurements,those empirical loss functions admit the benign geometric landscapes with high probability under the optimal sampling complexityO(n).Here,the phrase“benign” means:(1)the loss function has no spurious local minimizers;and(2)the loss function has a negative directional curvature around each saddle point.The three quotient intensity models are
The phase retrieval problem arises in many fields of science and engineering such as X-ray crystallography[17,23],microscopy[22],astronomy[8],coherent diffractive imaging[16,27]and optics[33]etc.In practical applications due to the physical limitations optical detectors can only record the magnitude of signals while losing the phase information.Many algorithms have been designed to solve the phase retrieval problem,which includes convex algorithms and non-convex ones.The convex algorithms usually rely on a “matrix-lifting”technique,which is computationally inefficient for large scale problems[2,4,32].In contrast,many non-convex algorithms bypass the lifting step and operate directly on the lower-dimensional ambient space,making them much more computationally efficient.Early non-convex algorithms were mostly based on the technique of alternating projections,e.g.,Gerchberg-Saxton[15]and Fineup[10].The main drawback,however,is the lack of theoretical guarantee.Later Netrapalli et al.[24]proposed the AltMinPhase algorithm based on a technique known asspectral initialization.They proved that the algorithm linearly converges to the true solution withO(nlog3n)resampling Gaussian random measurements.This work led further to several other non-convex algorithms based on spectral initialization.A common thread is first choosing a good initial guess through spectral initialization,and then solving an optimization model through gradient descent,such as the Wirtinger Flow method[3],Truncated Wirtinger Flow algorithm[7],randomized Kaczmarz method[18,30,35],Gauss-Newton method[12],Truncated Amplitude Flow algorithm[34],Reshaped Wirtinger Flow(RWF)[36]and so on.
As was already mentioned earlier,producing a good initial guess using spectral initialization seems to be a prerequisite for prototypical non-convex algorithms to succeed with good theoretical guarantees.A natural and fundamental question is:
Is it possible for non-convex algorithms to achieve successful recovery with a random initialization(i.e.,without spectral initialization or any additional truncation)?
In the recent work[28],Ju Sun et al.carried out a deep study of the global geometric structure of phase retrieval problem.They proved that the loss function does not have any spurious local minima underO(nlog3n)Gaussian random measurements.More specifically,it was shown in[28]that all local minimizers coincide with the target signalxup to a global phase,and the loss function has a negative directional curvature around each saddle point.Thanks to this benign geometric landscape any algorithm which can avoid saddle points converges to the true solution with high probability.A trust-region method was employed in[28]to find the global minimizers with random initialization.To reduce the sampling complexity,it has been shown in[21]that a combination of the loss function with a judiciously chosen activation function also possesses the benign geometry structure underO(n)Gaussian random measurements.Recently,a smoothed amplitude flow estimator has been proposed in[5]and the authors show that the loss function has benign geometry structure under the optimal sampling complexity.Numerical tests show that the estimator in[5]yields very stable and fast convergence with random initialization and performs as good as or even better than the existing gradient descent methods with spectral initialization.
The emerging concept of a benign geometric landscape has also recently been explored in many other applications of signal processing and machine learning,e.g.,matrix sensing[1,25],tensor decomposition[13],dictionary learning[29]and matrix completion[14].For general optimization problems there exist a plethora of loss functions with well-behaved geometric landscapes such that all local optima are also global optima and each saddle point has a negative direction curvature in its vincinity.Correspondingly several techniques have been developed to guarantee that the standard gradient based optimization algorithms can escape such saddle points efficiently,see e.g.,[9,19,20].
This paper aims to show the intensity-based model(1.1)with some deep modification has a benign geometry structure under the optimal sampling complexity.More specifically,we first introduce three novel quotient intensity models and then we prove rigorously that each loss function of them has no spurious local minimizers.Furthermore,the loss function of quotient intensity model has a negative directional curvature around each saddle point.Such properties allow first order method like gradient descent to locate a global minimum with random initial guess.
Our first result shows that the loss function of(1.2)has the benign geometric landscape,as stated below.
Theorem 1.1(Informal).Consider the quotient intensity model(1.2).Assumeare i.i.d.standard Gaussian random vectors and x/=0.There exist positiveabsolute constants c,C,such that if m≥Cn,then with probability at least1-e-cmthe loss function f=f(u)has no spurious local minimizers.The only local minimizers are±x.All other critical points are strict saddles.
The second result is the global analysis for the estimator(1.3).
Theorem 1.2(Informal).Consider the quotient intensity model(1.3).Let0<β<∞.Assumeare i.i.d.standard Gaussian random vectors and x/=0.There exist positive constants c,C depending only on β,such that if m≥Cn,thenwith probability at least1-e-cmthe loss function f=f(u)has no spurious local minimizers.The only local minimizer is±x and all other critical points are strict saddles.
Remark 1.1.There appears some subtle differences between estimators(1.2)and(1.3).Although the former looks more singular,one can prove full strong convexity in the neighborhood of the global minimizers.In the latter case,however,we only have certain restricted convexity.
The third result is the global landscape for the estimator(1.4).
Theorem 1.3(Informal).Consider the quotient intensity model(1.4).Let0<β1,β2<∞.Assumeare i.i.d.standard Gaussian random vectors and x/=0.There exist positive constants c,C depending only on β,such that if m≥Cn,then with probability at least1-e-cmthe loss function f=f(u)has no spurious local minimizers.The only local minimizer is±x and all other critical points are strict saddles.
Remark 1.2.For this case,thanks to the strong damping,we have full strong convexity in the neighborhood of the global minimizers.
Throughout this proof we fixβ>0 as a constant and do not study the precise dependence of other parameters onβ.We writeu∈Sn-1ifu∈Rnand‖u‖2=We useχto denote the usual characteristic function.For exampleχA(x)=1 ifx∈AandχA(x)=0 ifx/∈A.We denote byδ1,∈,η,η1various constants whose value will be taken sufficiently small.The needed smallness will be clear from the context.For any quantityX,we shall writeX=O(Y)if|X|≤CYfor some constantC>0.We writeX?YifX≤CYfor some constantC>0.We shall writeX?YifX≤cYwhere the constantc>0 will be sufficiently small.In our proof it is important for us to specify the precise dependence of the sampling sizemin terms of the dimensionn.For this purpose we shall writem?nifm≥Cnwhere the constantCis allowed to depend onβand the small constants∈,∈ietc used in the argument.One can extract more explicit dependence ofCon the small constants andβbut for simplicity we suppress this dependence here.We shall say an eventAhappens with high probability if P(A)≥1-Ce-cm,wherec>0,C>0 are constants.The constantscandCare allowed to depend onβand the small constants∈,δmentioned before.
In Sections 2-4 we carry out an in-depth analysis of the corresponding geometric landscape of QIM1,QIM2 and QIM3 under optimal sampling complexityO(n).In Section 5,we report some numerical experiments to demonstrate the efficiency of our proposed estimators.In Appendix,we collect the technique lemmas which are used in the proof.
In this section,we consider the first quotient intensity model and prove that it has benign geometric landscape,as demonstrated below.
Theorem 2.1.Assume{ak}mk=1are i.i.d.standard Gaussian random vectors and x/=0.There exist positive absolute constants c,C,such that if m≥Cn,then with probability at least1-e-cmthe loss function f=f(u)defined by(2.1)has no spurious local minimizers.The only local minimizer is±x,and the loss function is strongly convex in a neighborhood of±x.The point u=0is a local maximum point with strictly negative-definite Hessian.All other critical points are strict saddles,i.e.,each saddle point has a neighborhood where the function has negative directional curvature.Without loss of generality we shall assumex=e1throughout the rest of the proof.
Note that the set has measure zero.Thus for typical realization we haveak·e1/=0 for allk.This means that the loss functionf(u)defined by(2.1)is smooth almost surely.We denote the Hessian of the functionf(u)along theξ-direction(ξ∈Sn-1)as
Theorem 2.2(Strong convexity nearu=±e1).There exists an absolute constant0<∈0?1such that the following hold.For m?n,it holds with high probability that
Proof.By Lemma A.1,we can take∈>0 sufficiently small,Nsufficiently large such that
The above term inside the expectation is clearly OK for union bounds.Thus for‖u±e1‖≤∈0andm?n,it holds with high probability that
Thus the desired inequality follows.
We first investigate the pointu=0.It is trivial to verify that?f(0)=0 sinceak·e1/=0 for allkalmost surely.
Lemma 2.1(u=0 has strictly negative-de finite Hessian).We have u=0is a local maximum point with strictly negative-definite Hessian.More precisely,for m?n,it holds with high probability that
Proof.By(2.2),it is obvious that
The desired conclusion then easily follows from Bernstein’s inequality.
A simple calculation leads to
Lemma 2.2(The regime‖u‖2≥1+∈0is OK).Let0<∈0?1be any given small constant.Then the following hold:For m?n,with high probability it holds that
Proof.DenoteXk=ak·e1andZk=ak·.By(2.3a)and Cauchy-Schwartz,we have
where 0<δ1?1 is an absolute constant which we can take to be sufficiently small,and in the last inequality we have used Bernstein.The desired result then easily follows by takingand choosingδ1such thatR1≤1+∈0.
From(2.3a),due to the highly irregular coefficients nearR,it is difficult to control the upper bound of?Rfin the regimeR?1.To resolve this difficulty,we shall examine the Hessian in this regime.
where He1e1is defined in(2.2).
Proof.It follows from(2.2)together with Bernstein’s inequality that form?nwith high probability,it holds
This completes the proof.
1.We have
2.The point u=0is a local maximum point with strictly negative-definite Hessian,
3.We have
Proof.This follows from Lemmas 2.1,2.2 and 2.3.
Proof.By(2.3a),we have if?Rf(u)=0,then
Theorem 2.5(Localization ofRwhen||·e1|-1|≤η0,R≤1+η0anduis a critical point).Let0<η0?1be given.For mn,the following hold with high probability:Assume u=is a critical point with≤R≤1+η0,and||·e1|-1|≤η0.Then wemust have
where c(η0)→0as η0→0.
Proof.Denote?ξf=ξ·?fforξ∈Sn-1.It is not difficult to check that
whereXk=ak·e1.Settingξ=andξ=e1,respectively give us two equations:
Without loss of generality we assume1‖2≤η?1.Then with high probability we have
Observe that by Cauchy-Schwartz,
Plugging the above estimates into(2.5),we obtain
Using(2.4a),we then get
The desired result then easily follows.
We now complete the proof of the main theorem.
Proof of Theorem2.1.We proceed in several steps.
1.By Theorem 2.2,the functionf(u)is strongly convex when‖u±e1‖2?1.
2.By Theorem 2.3,fhas non-vanishing gradient whenThe pointu=0 is a strict local maximum point with strictly negative-definite Hessian.
This completes the proof.
Consider forβ>0,
Theorem 3.1.Let0<β<∞.are i.i.d.standard Gaussian randomvectors and x/=0.There exist positive constants c,C depending only on β,such that if m≥Cn,then with probability at least1-e-cmthe loss function f=f(u)defined by(3.1)has no spurious local minimizers.The only local minimizer is±x,and the loss function is restrictively convex in a neighborhood of±x.The point u=0is a local maximum point with strictly negative-definite Hessian.All other critical points are strict saddles,i.e.,each saddle point has a neighborhood where the function has negative directional curvature.
Remark 3.1.See Theorem 3.4 for the precise statement concerning restrictive convexity.
Without loss of generality we shall assumex=e1throughout the rest of the proof.
We first investigate the pointu=0.It is trivial to verify that?f(0)=0 sinceak·e1/=0 for allkalmost surely.
Lemma 3.1(u=0 has strictly negative-de finite Hessian).We have u=0is local maximum point with strictly negative-definite Hessian.More precisely,for m?n,it holds with high probability that
where d1>0is an absolute constant.
Proof.We begin by noting that since almost surelyak·e1/=0 for allk,the functionfis smooth atu=0.It suffices for us to consider(write
Clearly
The desired conclusion then easily follows by using Bernstein’s inequality.
Clearly
Lemma 3.2(The regime‖u‖2?1 is OK).There exist constants R1=R1(β)>0,d1=d1(β)>0such that the following hold:For m?n,with high probability it holds that
Proof.We only sketch the proof.DenoteXk=ak·e1andZk=akUsing the inequalities(assumeR?1 and denote byC1>0 a constant depending only onβ)
It is also easy to show that
Thus we can takeNlarge such that
It is easy to show that by takingRlarge,form?n,it holds with high probability that
Since all the other terms are OK for union bounds,the desired result then clearly follows by takingRlarge.
Lemma 3.3(The regime‖u‖2?1 withis OK).There exist a constantR2=R2(β)>0such that the following hold:For m?n,with high probability it holds that
Proof.We only sketch the proof.DenoteA short computation gives
whereu1=u·e1andu′=u-u1e1.Now observe that
where in the above the constant∈1>0 will be taken sufficiently small.The needed smallness will become clear momentarily.Sinceit is clear that for some absolute constantC1>0,
Now
All the other terms can be treated by takingRsufficiently small,and the desired result follows easily.
Lemma 3.4(The regime‖u‖2?1 withis OK).There exist a constantR3=R3(β)>0such that the following hold:For m?n,with high probability it holds that the loss function f=f(u)has no critical points in the regime
Proof.We assume that for 0<R?1 there exists some critical point.The idea is to examine the necessary conditions for a potential critical point and then derive a lower bound onR.DenoteXk=ak·e1andZk=ak.By(3.2),we have?Rf=0 which gives
On the other hand,by using?u1f(u)=0,we obtain
Thus
Without loss of generality we assumeObserve that for 0<R≤1,we have
Thus with high probabilityB1~1.
Now by Lemma B.1,for 0<R?1,we have
Also for 0<R≤1,we have
By Lemma B.2,for 0<R?1,we have
wherec1,c2>0 are constants depending only onβ.Thus with high probability we have for 0<R?1,B2~1.
Now since
we obtain
whereB3=B1/B2.Observe thatA1>0,A2>0,and
It follows easily that with high probability we have
But then it follows from the equationRA1=B1that we must haveR~1.Thus the desired result follows.
Theorem 3.2(The regimes‖u‖2?1 and‖u‖2?1 are OK).For m?n,with high probability the following hold:
1.We have
where d1,R1are constants depending only on β.
2.We have
where R2>0is a constant depending only on β.
3.The loss function f=f(u)has no critical points in the regime
where R3>0is a constant depending only on β.
4.The point u=0is a local maximum point with strictly negative-definite Hessian,
where d2>0is an absolute constant.
Proof.This follows from Lemmas 3.1,3.2,3.3 and 3.4.
Lemma 3.5(The regime‖u‖2~1 with∈01|≤1-∈0is OK).Let0<∈0?1be given.Assume0<c1<c2<∞are two given constants.Then for m?n,the following hold with high probability:The loss function f=f(u)has no critical points in the regime:
More precisely,introduce the parametrization=e1cosθ+e⊥sinθ,where θ∈[0,π]and e⊥∈Sn-1satisfies e⊥·e1=0.Then in the aforementioned regime,we have
where α1depends only on(β,∈0,c1,c2).
Proof.See appendix.
Lemma 3.6(The regime‖u‖2~1 with|·e1|≤∈1is OK).Let0<∈1?1be a sufficiently small constant.Assume0<c1<c2<∞are two given constants.Then for m?n,the following hold with high probability:Consider the regime
Introduce the parametrizationsatisfies e⊥·e1=0.Then in the aforementioned regime,we have
where α2>0depends only on(β,∈1,c1,c2).
Proof.See appendix.
Theorem 3.3(The regime‖u‖2~1,||·e1|-1|≤∈0,|‖u‖2-1|≥c(∈0)is OK).Let0<R1<1<R2<∞be given cons?tants.Le?t0<∈0?1be a given sufficiently smallconstant and consider the regimeThere exists aconstant c0=c0(∈0,R1,R2,β)>0which tends to zero as∈0→0such that the followinghold:For m?n,with high probability it holds that
Proof.We first consider the regimeR≥1+c.Letbe an even function satisfying 0≤φ(x)≤1 for allx,φ(x)=1 for|x|≤1 andφ(x)=0 for|x|>2.By using(3.2),we have
By takingKsufficiently large,we can easily obtain
wherea~N(0,In).For fixedK,it is not difficult to check that the lower bound(3.4)are OK for union bounds and they can be made close to the expectation with high probability,uniformly inR~1 andSn-1.The perturbation argument(i.e.,estimating the error terms coming from replacingak·byak·e1and so on)becomes rather easy after taking the expectation.It is then not difficult to show that
forR≥1+c(∈0).
Next we turn to the regimeR2≤R≤1-c(∈0).Without loss of generality we may assume|11|≤∈0.The idea is to exploit the decomposition used in the proof of Lemma 3.4.Namely using?Rf=0 and?u1f=0,we have
It is not difficult to check that with high probability,we haveB1~1,B2~1,and
whereη(∈0)→0 as∈0→0.We then obtain
From this it is easy(similar to an argument used in the proof of Lemma 3.4)to derive that
Now note that the pre-factor ofA2is1=1+O(∈0).By using the relation
we obtain
By using localization,i.e.,decomposing
whereη1(∈0)→0 as∈0→0.It then follows easily that(with high probability)
whereη2(∈0)→0 as∈0→0.
Now observe that forA1,we have
whereC∈>0 depends only on∈.Clearly by taking∈>0 sufficiently small and using the derived quantitative estimates preceding this paragraph,we can guarantee that(with high probability)
It follows that we must have|R-1|?1 for a potential critical point.By using(3.2)we have?Rf(R=0)<0.By using(3.3)we have?RRf>0.Since we have shown?Rf>0 forR>1+c(∈0),it then follows that?Rf=0 occurs at a unique point|R-1|?1 and?Rf<0 forR<1-c(∈0)providedc(∈0)is suitably re-defined.
We now show restrictive convexity of the loss functionf(u)near the global minimizeru=±e1.
Theorem 3.4(Restrictive convexity near the global minimizer).There exists0<∈0?1sufficiently small such that if m?n,then the following hold with high probability:
where γis a constant depending only on β.
3.Alternatively we can use the parametrizationand|t|≤∈0.Then with this special parametrization,we have
Note that this includes the global minimizers u=±e1.
In yet other words,f(u)is restrictively convex in a sufficiently small neighborhood of±e1.
Proof.See appendix.
Proof of Theorem3.1.We proceed in several steps as follows.
1.For the regime‖u‖2?1 and‖u‖2?1,we use Theorem 3.2.The pointu=0 is a local maximum point with strictly negative-definite Hessian.All other possible critical points must have negative curvature direction.
2.For the regime‖u‖2~1,1|-1|≥∈0,we use Lemma 3.5 and 3.6.The loss function either has a nonzero gradient,or it is a strict saddle with a negative curvature direction.
3.For the regime‖u‖2~1,1|-1|≤∈0,|‖u‖2-1|≥c(∈0),we apply Theorem 3.3.The loss function has nonzero gradient in this regime.
4.Finally for the regime close to the global minimizers±e1,we use Theorem 3.4 to show restrictive convexity.This ensures that±e1are the only minimizers.
This completes the proof.
Consider forβ1>0,β2>0,
Theorem 4.1.Let0<β1,β2<∞.are i.i.d.standard Gaussianrandom vectors and x/=0.There exist positive constants c,C depending only on(β1,β2),such that if m≥Cn,then with probability at least1-e-cmthe loss function f=f(u)defined by(4.1)has no spurious local minimizers.The only local minimizer is±x,and the loss function is strongly convex in a neighborhood of±x.The point u=0is a local maximum point with strictly negative-definite Hessian.All other critical points are strict saddles,i.e.,each saddle point has a neighborhood where the function has negative directional curvature.
Without loss of generality we shall assumex=e1throughout the rest of the proof.Thus we consider
We first investigate the pointu=0.It is trivial to verify that?f(0)=0 sinceak·e1/=0 for allkalmost surely.
Lemma 4.1(u=0 has strictly negative-definite Hessian).We have u=0is a local maximum point with strictly negative-definite Hessian.More precisely,it holds(almost surely)that
where d1>0is a constant depending only on β2.
Proof.We begin by noting that since almost surelyak·e1/=0 for allk,the functionfis smooth atu=0.It suffices for us to consider
By a simple computation,we have
The desired conclusion then easily follows.
Clearly
Lemma 4.2(The regimes‖u‖2?1 or‖u‖2?1 are OK).There exist constants Ri=Ri(β1,β2)>0,di=di(β1,β2)>0,i=1,2such that the following hold:For m?n,with high probability it holds that
Proof.DenoteZk=ak·.We first consider the regimeR?1.Observe that
where the last inequality holds form?nwith high probability.On the other hand we note that
where again the last inequality holds forRsufficiently large,and form?nwith high probability.Similarly we have forRsufficiently large,
Thus it follows easily that?Rf?1 forR?1.
Now we turn to the regime 0<R?1.First we note that the main negative term is OK.This is due to the fact that for 0<R≤1,we have(form?nand with high probability)
On the other hand,we have(form?nand with high probability)
if we first takeKsufficiently large followed by takingRsufficiently small.The estimate of the other termis similar and we omit further details.
Collecting the estimates,it is then clear that we can obtain the desired estimate for?Rfwhen 0<R?1.
Lemma 4.3(The regime‖u‖2~1 with∈01|≤1-∈0is OK).Let0<∈0?1be given.Assume0<c1<c2<∞are two given constants.Then for m?n,the following hold with high probability:The loss function f=f(u)has no critical points in the regime:
More precisely,introduce the parametrization=e1cosθ+e⊥sinθ,where θ∈[0,π]and e⊥∈Sn-1satisfies e⊥·e1=0.Then in the aforementioned regime,we have
where α1depends only on(β,∈0,c1,c2).
Proof.We first recall
Clearlyak·=Xkcosθ+(ak·e⊥)sinθ,and
In particular,ifθis away from the end-points 0,π,then
We then obtain(belowZk=ak·)
Thanks to the strong damping,it is not difficult to check that for any∈>0,ifm?n,then with high probability we have
The desired result then follows from Lemma C.1.
Lemma 4.4(The regime‖u‖2~1 with|1|≤∈0is OK).Let0<∈1?1be a sufficiently small constant.Assume0<c1<c2<∞are two given constants.Then for m?n,the following hold with high probability:Consider the regime
Introduce the parametrizationsatisfies e⊥·e1=0.Then in the aforementioned regime,we have
where α2>0depends only on(β,∈1,c1,c2).
Proof.This is similar to the argument in the proof of Lemma 4.3.By a tedious computation,we have
where
It is then tedious but not difficult to check that that for any∈>0,ifm?n,then with high probability we have
The desired result then follows from Lemma C.1.
Theorem 4.2(The regime‖u‖2~1,||1|-1|≤∈0,|‖u‖2-1|≥c(∈0)is OK).Let0<c1<1<c2<∞be given constants.Let0<∈0?1be a given sufficiently smallconstant and consider the regimeThere exists aconstant c0=c0(∈0,c1,c2,β)>0which tends to zero as∈0→0such that the followinghold:For m?n,with high probability it holds that
Proof.We rewrite
where
It is not difficult to check that forR~1,we have
On the other hand,note that(?Rg)(1,b,b)=0,and forR~1,
Thus forR=1+η,η>0 we have
whereγ1>0,γ2>0 are constants depending only on(β1,β2,c1,c2).The desired result(for?Rf>0 whenR→1+)then follows from this and simple application of Bernstein’s inequalities.The estimate for the regimeR→1-is similar.We omit the details.
Theorem 4.3(Strong convexity near the global minimizer).There exist0<∈0?1and a positive constant γsuch that if m?n,then the following hold with high probability:
1.If‖u-e1‖2≤∈0,then
2.If‖u+e1‖2≤∈0,then
In yet other words,f(u)is strongly convex in a sufficiently small neighborhood of±e1.
Proof.See appendix.
Finally we complete the proof of Theorem 4.1.
Proof of Theorem4.1.We proceed in several steps.All the statements below hold under the assumption thatm?nand with high probability.
1.Foru=0,we use Lemma 4.1.In particularu=0 is a local maximum point with strictly negative Hessian.
2.For‖u‖2?1 or‖u‖2?1,we use Lemma 4.2.The loss functions has a nonzero gradient(?Rf/=0)in this regime.
3.For‖u‖2~1 with∈01|≤1-∈0,we use Lemma 4.3 to show that the loss function has a nonzero gradient(?θf/=0)in this regime.
4.For‖u‖2~1 with1|≤∈0,by Lemma 4.4,the loss function has a negative curvature direction(i.e.,?θθf<0)in this regime.
5.For‖u‖2~1,1|-1|≤∈0,|‖u‖2-1|≥c(∈0),Theorem 4.2 shows that the gradient of the loss function does not vanish(i.e.,?Rf/=0).
6.For‖u±e1‖?1,Theorem 4.3 gives the strong convexity in the full neighborhood.
It is not difficult to check that the above 6 scenarios cover the whole of Rn.We omit further details.
In this section,we demonstrate the numerical efficiency of our estimators by simple gradient descent and compare their performance with other competitive algorithms.Our Quotient intensity models are:
We have shown theoretically that any gradient descent algorithm will not get trapped in a local minimum for the estimators above.Here we present numerical experiments to show that the estimators perform very well with randomized initial guess.
We test the performance of our QIM2 and QIM3 and compare with SAF[5],Trust Region[29],WF[3],TWF[7]and TAF[34].Here,it is worth emphasizing that random initialization is used for SAF,Trust Region[29]and our QIM2,QIM3 algorithms while all other algorithms have adopted a spectral initialization.
In our numerical experiments,the target vectorx∈Rnis chosen randomly from the standard Gaussian distribution and the measurement vectorsai,i=1,···,mare generated randomly from standard Gaussian distribution or CDP model.For the real Gaussian case,the signalx~N(0,In)and measurement vectorsai~N(0,In)fori=1,···,m.For the complex Gaussian case,the signalx~N(0,In)+iN(0,In)and measurement vectorsai~N(0,In/2)+iN(0,In/2).For the CDP model,we use masks of octanary patterns as in[3].For simplicity,our parameters and step size are fixed for all experiments.Specifically,we adopt parameterβ=1 and step sizeμ=0.4 for QIM2 and choose the parameterβ1=0.1,β2=1,step sizeμ=0.3 for QIM3.For Trust Region,WF,TWF and TAF,we use the codes provided in the original papers with suggested parameters.
Example 5.1.In this example,we test the empirical success rate of QIM2,QIM3 versus the number of measurements.We conduct the experiments for the real Gaussian,complex Gaussian and CDP cases,respectively.We choosen=128 and the maximum number of iterations isT=2500.For real and complex Gaussian cases,we varymwithin the range[n,10n].For CDP case,we set the ratiom/n=Lfrom 2 to 10.For eachm,we run 100 times trials to calculate the success rate.Here,we say a trial to have successfully reconstructed the target signal if the relative error satisfies dist(uT-x)/‖x‖≤10-5.The results are plotted in Fig.1.It can be seen that 6nGaussian phaseless measurement or 7 octanary patterns are enough for exactly recovery for QIM2 and QIM3.
Figure 1:The empirical success rate for different m/n based on 100 random trails.(a)Success rate for real Gaussian case,(b)Success rate for complex Gaussian case,(c)Success rate for CDP case.
Example 5.2.In this example,we compare the convergence rate of QIM2,QIM3 with those of SAF,WF,TWF,TAF for real Gaussian and complex Gaussian cases.We choosen=128 andm=6n.The results are presented in Fig.2.We can see that our algorithms perform well comparing with state-of-the-art algorithms with spectral initialization.
Figure 2:Relative error versus number of iterations for QIM,SAF,WF,TWF,and TAF method:(a)Real-valued signals;(b)Complex-valued signals.
Example 5.3.In this example,we compare the time elapsed and the iteration needed for WF,TWF,TAF,SAF and our QIM2,QIM3 to achieve the relative error 10-5and 10-10,respectively.We choosen=1000 withm=8n.We adopt the same spectral initialization method for WF,TWF,TAF and the initial guess is obtained by power method with 50 iterations.We run 50 times trials to calculate the average time elapsed and iteration number for those algorithms.The results are shown in Table 1.The numerical results show that QIM3 takes around 27 and 50 iterations to escape the saddle points for the real and complex Gaussian cases,respectively.
Table 1:Time elapsed and iteration number among algorithms on Gaussian signals with n=1000.
We next compare the performance of the above algorithms on recovering a natural image from masked Fourier intensity measurements.The image is the Milky Way Galaxy with resolution 1080×1920.The colored image has RGB channels.We useL=20 random octanary patterns to obtain the Fourier intensity measurements for each R/G/B channel as in[3].Table 2 lists the averaged time elapsed and the iteration needed to achieve the relative error 10-5and 10-10over the three RGB channels.We can see that our algorithms have good performance comparing with state-of-the-art algorithms with spectral initialization.
Table 2:Time elapsed and iteration number among algorithms on recovery of galaxy image.
We now demonstrate the robustness of QIM2,QIM3 to noise and compare them with SAF,WF,TWF,TAF.We consider the noisy modelyi=|〈ai,x〉|+ηiand add different level of Gaussian noises to explore the relationship between the signal-tonoise rate(SNR)of the measurements and the mean square error(MSE)of the recovered signal.Specifically,SNR and MSE are evaluated by
whereuis the output of the algorithms given above after 2500 iterations.We choosen=128 andm=8n.The SNR varies from 20db to 60db.The result is shown in Fig.3.We can see that our algorithms are stable for noisy phase retrieval.
Figure 3:SNR versus relative MSE on a dB-scale under the noisy Gaussian model:(a)Real Gaussian case;(b)Complex Gaussian case.
Appendix A:technical estimates for Section 2
where a~N(0,In).
Proof.We first show that there exist∈>0,such that
Clearly it suffices for us to show
where in the last inequality we used the fact that
Thus(A.2)and(A.1)hold.Now∈is fixed.To show the final inequality,we note that
asNtend to infinity.Thus the desired inequality easily follows.
Lemma A.2.Let0<η0?1be given.Then if m?n,then the following hold with high probability:
Proof.Without loss of generality we write
Clearly|s|≥s0=s0(η0)>0,wheres0(η0)is a constant depending only onη0.Takea~N(0,In)and observe that
if∈>0 is taken sufficiently small.Now we fix this∈.Clearly form?nwith high probability it holds that
Thus,we complete the proof.
Appendix B:technical estimates for Section 3
Lemma B.1.For any∈>0,there exists R0=R0(β,∈)>0sufficiently small,such that if m?n,then the following hold with high probability:
Proof.We then split the sum as
Clearly the first term is amenable to union bounds,and we can make it sufficiently small with high probability by takingη0small(depending only onβand∈).The second term is trivial since we can takeRsufficiently small.Thus we complete the proof.
Lemma B.2.There exists R1=R1(β)>0sufficiently small,such that if m?n,then the following hold with high probability:
In the above c1,c2>0are constants depending only on β.
Proof.DenoteXk=ak·e1.Writeande⊥∈Sn-1satisfiese⊥·e1=0.We then write
For the first term we note that for 0<R≤1,
Thus we clearly have for allwith high probability it holds that
The second term is clearly OK for union bounds and with high probability it can be made sufficiently small.For the last term,observe that with high probability,
ifR≤R1andR1is sufficiently small.The desired result then clearly follows.
Proof of Lemma3.5.Without loss of generality we consider the situation=e1cosθ+where 0<∈1,∈2?1.The point is thatθstays away from the end-points 0 andDenoteXk=ak·e1,Yk=ak·e⊥andZk=ak·.Then
We then obtain
SinceR~1,it is not difficult to check that the third and fourth terms above are amenable to union bounds??The union bound includes covering in and R.,i.e.,with high probability(form?n)we have
Next we treat the second term.Letφ(x)=1 for|x|≤1 andφ(x)=0 for|x|≥2.We have
ForH2we have(η0will be taken sufficiently small)
We first takeη0sufficiently small so thatH2,acan be included in the estimate ofH0without affecting too much the main order.On the other hand,onceη0is fixed,we can takeMsufficiently large such that
Finally we treatH0.Clearly
By takingKlarge,it can be easily checked that
On the other hand,for fixedK,clearlyH0,ais OK for union bounds.It holds with high probability that
Collecting all the estimates,we obtain
where|Error|?1.The desired lower bound for?θfthen easily follows from Lemma B.3 below.
and γ3>0,γ4>0are constants depending only on(β,c1,c2).
Proof.We have
where
Integrating further inxthen gives
where the value ofc3is unimportant for us,and
First we show thatc2<0.By a short computation,we have
Thusc2<0.
Next we show thatc1>0.We have
It amounts to checking
This follows from Lemma B.4 below.
Finally we showc1+c2<0.We have
2(c1+c2)
Since we have shown(B.1),we then only need to check
This in turn follows from Lemma B.4.
Finally we consider the polynomial
Lemma B.4(Refined upper and lower bounds on the Complementary Error function).Let
Then
Remark B.1.In the regimey≥1,one can check that the upper and lower bounds here are sharper than(B.2).One should also recall that the usual way to derive the lower bound in(B.2)through conditional expectation.Namely one can regardas the conditional meanμ(y)=E(X|X>y)whereXhas the p.d.f.Then evaluating the variance E((X-μ1)2|X>y)>0 givesThis yields the upper bound forμ1which in turn is the desired lower bound in(B.2).An interesting question is to derive a sharper two-sided bounds via more careful conditioning.However we shall not dwell on this issue here.
Proof of LemmaB.4.We focus on the regimex>1.By performing successive simple change of variables,we have
where in the last line we adopted Pochhammer’s symbol(a)n=a(a+1)···(a+n-1).Note that the above is an asymptotic series,and it is not difficult to check that
Moreover,ifmis an even integer,then
and ifmis odd,then
Now takingm=4,we have
Forx≥3,it is not difficult to verify that
Hence the upper bound is OK forx≥3.
Next takingm=5,we have
It is not difficult to verify that forx≥4,we have
Hence the lower bound is OK forx≥4.
Finally for the regimex∈[0,4],we use rigorous numerics to verify the inequality.Since we are on a compact interval,this can be done by a rigorous computation with controllable numerical errors.
Proof of Lemma3.6.Again denoteXk=ak·e1andZk=ak·.Without loss of generality we assumefor some sufficiently smallη>0.By a tedious computation,we have
Note that the third,fourth and fifth terms are OK for union bounds.The second and the first term can be handled in a similar way as in the proof of Lemma 3.5.The only difference is that the sign is now negative in the regimeUsing Lemma B.3 it follows that?θθf<0 in this regime.We omit the repetitive details.
Proof of Theorem3.4.Without loss of generality we consider the regime‖u-e1‖2?1.Before we work out the needed estimates for the restricted convexity,we explain the main difficulty in connection with the full Hessian matrix.DenoteXk=ak·e1.Then for anyξ∈Sn-1,we have
First observe that ifu=e1,then the Hessian can be controlled rather easily thanks to the damping
On the other hand,foru/=e1,as far as the lower bound is concerned,the main difficult terms are(B.7)and(B.5)which are out of control if we do not impose any condition onξ(i.e.,using(B.3)to control it).On the other hand,if we restrictξto the directionu-e1,then we can control these difficult terms by using the main good term(B.3).Namely,introduce the decomposition
wheret=‖u-e1‖2?1.Then for(B.5)we write
Sincet?1,the termt(ak·u)2(ak·e1)2(together with the pre-factor term in(B.5))an be included into(B.3)which still has a good lower bound by using localization.On the other hand,the term(ak·u)2(ak·e1)(ak·ξ)can be split as
whereφis a smooth cut-off function satisfying 0≤φ(z)≤1 for allz∈R,φ(z)=1 for|z|≤1 andφ(z)=0 for|z|≥2.Clearly the contribution of(B.9)in(B.5)is OK for union bounds.On the other hand,for(B.10)we have
Clearly this is under control(the first term can again be controlled using(B.3)).
Now we turn to(B.7).The main term is(ak·u)4.We write
Clearly then this is also under control.
By further using localization,we can then show that with high probability,it holds that
where|Error|?1.The desired conclusion then follows from Lemma B.5.
Remark B.2.Introduce the parametrizationwheree⊥∈e1=0,|R-1|?1 and|θ|?1.One might hope to prove that the Hessian matrix
is positive definite nearu=e1under the mere assumem?nand with high probability.However there is a subtle issue which we explain as follows.Consider the main term(writeX=ak·e1andY=ak·e⊥)
The most troublesome piece come from quartic and cubic terms inY,and we consider
In yet other words,the sign is not favorable and this renders the Hessian out of control(before taking the expectation).
where c0>0is a constant depending only on β.
Proof.Introduce the parametrizationwheree⊥·e1=0,|s|≤1.
It is not difficult to check that
Thus it follows that
The desired result then follows by a simple perturbation argument using the fact that E?tttfis uniformly bounded and taking|t|sufficiently small.
Appendix C:technical estimates for Section 4
where
and γ1>0,γ2>0are constants depending only on(β1,β2,c1,c2).Furthermore for some sufficiently small constants θ0=θ0(β1,β2,c1,c2)>0,θ1=θ1(β1,β2,c1,c2)>0,we have
where γi>0,i=3,···,6depend only on(β1,β2,c1,c2).
Proof.We have
Denote
Then
By using integration by parts,we then obtain
Now denote
It is not difficult to check that fora≥0,b≥0,β1,β2>0,R>0,
Observe that
Then ifx,y>0 andθ∈[0,π],then
Integrating inxandy,we then obtain
wherea1~1 and is a smooth function ofθ.Differentiating inθthen gives
Then second term clearly vanishes nearThus the desired estimate for E?θθffollows.
Lemma C.2(Strong convexity of Efwhen‖u±e1‖?1).Let h(u)=Ef(u).There exists0<∈0?1such that the following hold:
Proof.We shall employ the same approach as in the proof of Theorem 2.5 in the second paper of this series of work and sketch only the needed modifications.Without loss of generality consider the regime‖u-e1‖2?1 and introduce the change of variables:
where|ρ-1|?1 and 0≤s?1.Denote
where we note that the value ofh(u)depends only on(ρ,s).Clearly
It is easy to check that
By a tedious computation,we have
where
Sinceρ→1,it is clear thatk1>0 andk2>0,and thus
It is not difficult to check that?sh1(ρ,0)=0 for anyρ>0.Clearly also?ρsh1(ρ,0)=0 for anyρ>0.To compute?ssh1(1,0)we shall use Lemma C.1.Observe that(s=sinθwithθ→0+)
Clearly it follows that
The rest of the argument is then essentially the same as in the proof of Theorem 2.5 in the second paper.We omit further details.
Proof of Theorem4.3.We rewrite
where
Clearly for anyξ∈Sn-1,
In the above,
Then for any(u,)with
Thus the union bound is also OK for this term,and we have
Estimate of(C.4)and(C.5).We begin by noting that(C.4)and(C.5)can be combined into one term.Namely,observe that
whereh3is a bounded smooth function with bounded derivatives in all of its arguments.Now letbe such that 0≤φ(x)≤1 for allx,φ(x)=1 for|x|≤1 andφ(x)=0 for|x|≥2.We then split the sum as
whereKwill be taken sufficiently large.Clearly the first term will be OK for union bounds.On the other hand,the second term can be dominated by
which can be made small by takingKlarge.Thus we have
Collecting the estimates,we have form?nand with high probability,
The desired result then follows from Lemma C.2.
Acknowledgements
J.F.Cai was supported in part by Hong Kong Research Grant Council General Research Grant Nos.16309518,16309219,16310620,and 16306821.Y.Wang was supported in part by Hong Kong Research Grant Council General Research Grant Nos.16306415 and 16308518.
Annals of Applied Mathematics2022年1期