Xiangming Meng, Sheng Wu*, Michael Riis Andersen, Jiang Zhu, Zuyao Ni
1 Huawei Technologies Co. Ltd., Shanghai 201206, China
2 Tsinghua Space Center, Tsinghua University, Beijing 100084, China
3 Department of Computer Science, Aalto University P.O. Box 15400, FI-00076, Finland
4 Ocean College, Zhejiang University, Zhoushan 316021, China
Due to limited volume, weight, and power consumption, micro-satellites have to reduce data transmission and storage requirements through image compression when performing earth observation missions. However, the quality of the compressed images may become unsatisfactory. It is well known that compressed sensing (CS) can accurately reconstruct sparse signals from under-sampled linear measurements [1]. CS makes it possible to compress onboard images of micro-satellites maximally, thereby minimizing the required data transmission and storage space, since the images can be recovered by CS to meet the requirements of the mission. In the last decade, CS has been applied in many areas such as image processing, machine learning, radar detection, and computer science. Moreover, exploiting the sparsity of the target signal in wireless communications has been studied intensively in recent years [2], [3]. Typical examples include channel estimation [4]–[6], multiuser detection [7], and low-resolution analog-to-digital converters [8]–[10]. To this end, a plethora of methods have been studied in the past years. The Bayesian interpretation of sparse reconstruction involves maximum a posteriori (MAP) inference with some sparsity-promoting prior, e.g., the Laplace prior [11], automatic relevance determination [12], the Dirichlet process prior [13], and the spike and slab prior [14]. Among various Bayesian methods, approximate message passing (AMP) [15] is one state-of-the-art algorithm for sparse reconstruction. AMP can be seen as a large-system-limit approximation of belief propagation [16] and is deeply related to the seminal Thouless-Anderson-Palmer (TAP) equations in spin glass theory [17]. Moreover, to deal with general linear mixing problems, AMP has been extended to generalized AMP (GAMP) [18], which greatly widens the applicability of the AMP framework in sparse reconstruction.
While many practical signals can be described as sparse, they often exhibit an underlying structure, such as clustered sparsity, i.e., the nonzero coefficients occur in clusters, which is also known as group sparsity or block sparsity [19], [20]. In such settings, nearby coefficients exhibit dependencies, and exploiting this intrinsic structure beyond simple sparsity can significantly boost the reconstruction performance [20]. From the optimization perspective, various regularizers that model specific sparsity patterns have been proposed, e.g., group LASSO [21] and StructOMP [22]. From the Bayesian perspective, a number of methods have been developed that use structured priors to model both sparsity and cluster patterns simultaneously. The main effort of these algorithms lies in constructing a hierarchical prior model, e.g., the Markov tree [23], the structured spike and slab prior [24]–[26], or the hierarchical Gamma-Gaussian model [27], to encode the structured sparsity pattern.
In this paper, using the structured spike and slab prior [24]–[26], which offers high flexibility in modeling the sparsity pattern, an efficient message passing algorithm, termed AMP with structured spike and slab priors (AMP-SSS), is proposed to recover structured sparse signals without prior knowledge of the sparsity pattern. Different from [24]–[26], which used expectation propagation (EP) [28], [29] to perform approximate Bayesian inference, this paper resorts to AMP [15] and a fast direct method [30], so that the computational complexity of sparse reconstruction with the spike and slab prior is significantly reduced. In practice, the sparsity pattern is unknown and the original AMP cannot be directly applied to recover signals with a structured prior distribution. To address these problems, following the turbo AMP framework [31], [32], an efficient method is proposed to learn the hyperparameters of the structured spike and slab prior using expectation maximization (EM) [14], [33] and Bethe free energy [4], [34], [35] optimization. It is important to note that although this paper considers the linear Gaussian model, the proposed method can be extended to generalized linear models in a straightforward way using GAMP [36], [37]. To test the effectiveness of the proposed method, various experiments on both synthetic and real data are performed, showing that it achieves excellent performance in recovering structured sparse signals with a Gaussian process prior.
The rest of this paper is organized as follows. In Section II, the generalized linear model with the structured spike and slab prior is described, which encodes the structured sparsity using a transformed Gaussian process. In Section III, the posterior of the proposed model is computed within the framework of AMP. Section IV presents a unified scheme for learning the hyperparameters via EM and Bethe free energy minimization. To reduce the computational complexity, a fast direct method is described in Section V. Extensive experiments are conducted in Section VI to demonstrate the efficiency of our method. Finally, conclusions and future directions are given in Section VII.
Consider the generalized linear model (GLM) with structured prior as shown in figure 1. The unknown input signal vector x ∈ ℝ^N is generated following a structured prior distribution p0(x). The signal vector x then passes through a linear transform
where A ∈ ℝ^{M×N} is the measurement matrix and is assumed to be known. The output observation vector y is obtained through a component-wise random mapping, which is described by a factorized conditional distribution
The GLM arises in various problems in signal processing, communications, and machine learning. The classic linear Gaussian model,
where the noise is Gaussian, is a special case of the GLM with a Gaussian conditional distribution; here N(x; m, C) denotes the Gaussian distribution of x with mean m and covariance C, and I_K denotes the identity matrix of dimension K. This paper only considers the linear Gaussian model (3); the extension to the GLM is straightforward.
To model the structure of the sparsity pattern of signal x, the authors in [24] proposed a structured spike and slab prior inspired by Gaussian processes [38]. Specifically, the hierarchical prior distribution of x reads
where s ∈ {0, 1}^N is the hidden support vector, δ(·) is the Dirac delta function, f(xi) is the distribution of the nonzero entry xi, φ(·) is the standard normal cumulative distribution function (CDF), Ber(s|p) denotes the Bernoulli distribution with p(s = 1|p) = p, and pa(γ) is the a priori distribution of γ. Furthermore, the prior covariance matrix Σa is constructed using kernel functions, which introduce a further set of hyperparameters. In this paper, the squared exponential kernel is taken as an example. That is, the (i, j)-th element of Σa is defined as
so that this kind of covariance matrix includes the hyperparameters κ and s. Due to the marginalization property of multivariate Gaussian distributions, the marginal prior of γi is pa(γi) = N(γi; γa,i, Σa,ii), where γa,i and Σa,ii are the ith element of γa and the ith diagonal element of Σa, respectively. With pa(γi), the marginal prior probability of xi being active can be calculated as
Note that the choice of f(xi) in (4) is flexible, which is an advantage of the spike and slab prior; this paper focuses only on the Gaussian distribution for f(xi). The structured spike and slab prior can encode prior information about the sparsity pattern. Specifically, the mean vector γa controls the expected degree of sparsity, while the covariance matrix Σa determines the prior correlation of the support [24]–[26]. As in Gaussian processes, the covariance matrix Σa can be constructed using various kernel functions such as the radial basis function (RBF) [38]. The joint posterior distribution of x, s, and γ can be written as in (9), where Z is the normalization constant.
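As a concrete illustration, the sketch below builds a squared exponential covariance of the kind in (7) and evaluates the marginal activation probability, assuming the form Σa[i, j] = κ exp(−(i − j)²/(2s²)) and the standard Gaussian-CDF marginalization identity ∫Φ(γ)N(γ; μ, σ²)dγ = Φ(μ/√(1 + σ²)); the exact parameterization in the paper may differ in detail.

```python
import numpy as np
from scipy.stats import norm

def squared_exponential_cov(N, kappa=1.0, length_scale=10.0):
    """Prior covariance Sigma_a from a squared exponential kernel,
    Sigma_a[i, j] = kappa * exp(-(i - j)^2 / (2 * length_scale^2))."""
    idx = np.arange(N)
    d2 = (idx[:, None] - idx[None, :]) ** 2
    return kappa * np.exp(-d2 / (2.0 * length_scale ** 2))

def marginal_activation_prob(gamma_a, Sigma_a):
    """Marginal prior probability that x_i is active:
    E_{gamma_i ~ N(gamma_a_i, Sigma_a_ii)}[Phi(gamma_i)] = Phi(gamma_a_i / sqrt(1 + Sigma_a_ii))."""
    return norm.cdf(gamma_a / np.sqrt(1.0 + np.diag(Sigma_a)))

# Example: length-100 signal with support correlation over ~10 neighbouring indices
N = 100
Sigma_a = squared_exponential_cov(N, kappa=1.0, length_scale=10.0)
gamma_a = -1.0 * np.ones(N)   # more negative prior mean -> sparser signal
print(marginal_activation_prob(gamma_a, Sigma_a)[:5])
```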
Fig. 1. Generalized linear model with structured prior [36].
Algorithm 1. AMP [18], [40].
Fig. 2. Factor graph of the joint distribution.
The goal is to estimate x from the noisy observations y using the minimum mean square error (MMSE) criterion. It is well known that the MMSE estimate of xi is the posterior mean, i.e., the mean of the marginal posterior distribution p(xi|y), defined as
where x\i denotes all the variables in x excluding xi. Direct computation of (10) requires high-dimensional summations and integrals, rendering the complexity of exact calculation prohibitively high. The factorization in (9) can be explicitly encoded by a factor graph, a kind of undirected bipartite graph that connects the distribution factors in (9) with the random variables that constitute their arguments [16], as shown in figure 2. The round nodes denote the random variables while the square nodes denote the factors in (9). In fact, since the overall factor graph in figure 2 has loops, exact inference is NP-hard [39]. As such, we resort to approximate inference methods.
Before proceeding to the case with structured priors, first consider the simple case with separable priors, i.e.,
As shown in figure 3, the factor graph of the joint distribution with separable priors is a subgraph of that with structured priors. Consequently, efficient algorithms such as AMP and GAMP can be used to perform reconstruction on this subgraph of figure 2. To deal with arbitrary separable priors, AMP has been extended to Bayesian AMP (B-AMP) [18], [40]. Specifically, our method resorts to this Bayesian form of AMP, B-AMP, which is summarized in Algorithm 1. For a detailed derivation, the reader is referred to [18], [40].
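For reference, below is a minimal sketch of such a B-AMP recursion, written as GAMP [18] specialized to the AWGN output channel with a Bernoulli-Gaussian (spike and slab) denoiser; the function names, variable names, and exact update ordering of Algorithm 1 are assumptions of this sketch rather than the paper's pseudocode.

```python
import numpy as np
from scipy.special import expit

def bg_denoiser(r, tau_r, lam, mu0, tau0):
    """MMSE denoiser for the Bernoulli-Gaussian prior
    p(x) = (1 - lam) * delta(x) + lam * N(x; mu0, tau0),
    applied to pseudo-observations r = x + N(0, tau_r)."""
    log_on = np.log(lam) - 0.5 * np.log(tau0 + tau_r) - 0.5 * (r - mu0) ** 2 / (tau0 + tau_r)
    log_off = np.log1p(-lam) - 0.5 * np.log(tau_r) - 0.5 * r ** 2 / tau_r
    pi = expit(log_on - log_off)                  # posterior activation probability
    v = tau0 * tau_r / (tau0 + tau_r)             # conditional (active) posterior variance
    m = v * (r / tau_r + mu0 / tau0)              # conditional (active) posterior mean
    x_hat = pi * m
    x_var = np.maximum(pi * (v + m ** 2) - x_hat ** 2, 1e-12)
    return x_hat, x_var, pi

def b_amp(A, y, lam, mu0, tau0, tau_w, n_iter=50):
    """GAMP-style recursion for y = A x + w, w ~ N(0, tau_w I), with the denoiser above."""
    M, N = A.shape
    A2 = A ** 2
    x_hat = np.zeros(N)
    x_var = lam * (tau0 + mu0 ** 2) * np.ones(N)  # rough initialization at the prior second moment
    s_hat = np.zeros(M)
    for _ in range(n_iter):
        tau_p = A2 @ x_var
        p_hat = A @ x_hat - tau_p * s_hat          # Onsager-corrected output estimate
        s_hat = (y - p_hat) / (tau_w + tau_p)      # AWGN specialization of the output step
        tau_s = 1.0 / (tau_w + tau_p)
        tau_r = 1.0 / (A2.T @ tau_s)               # effective noise variance of the pseudo-channel
        r_hat = x_hat + tau_r * (A.T @ s_hat)
        x_hat, x_var, _ = bg_denoiser(r_hat, tau_r, lam, mu0, tau0)
    return x_hat, x_var
```

In the turbo scheme described next, `lam` may also be supplied as a length-N vector of per-coefficient activation probabilities produced by the structured-prior subgraph; the code above broadcasts over such a vector.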
The AMP algorithm cannot be directly applied to reconstruct structured signals of the form (4)-(6) due to the structured prior. To address this problem, this paper resorts to the turbo approach proposed in [31], [32], [41]. In particular, the factor graph in figure 2 is divided into two subgraphs, and the algorithm alternates between message passing within the subgraphs and message passing between the subgraphs in a turbo-like manner until a common agreement is reached, i.e., the iteration converges. Specifically, probabilistic beliefs about the hidden support elements s are exchanged between the two subgraphs in figure 2, the left one exploiting the observation model and the right one exploiting the structured spike and slab prior. One full round of alternation is designated as a "turbo iteration".
Denote the message from factor node fci to variable node si at the tth turbo iteration in the usual way, with the message in the opposite direction denoted analogously; the other messages in the factor graph of figure 2 follow the same notation. The messages arriving at the variable nodes si from the right subgraph can be viewed as soft estimates of the support elements and are treated as priors for the left subgraph. Thanks to the decoupling characteristic of the turbo approach, these tentative priors can be seen as separable priors. As a result, the left subgraph reduces to the ordinary linear Gaussian model, for which AMP can be applied. After convergence of AMP, the left subgraph yields extrinsic messages on the support, which are then treated as priors for the right subgraph. Then, the soft estimates of the support elements are updated using EP without inner iterations, as detailed in Section 3.3. The resulting leftward messages are treated as priors for AMP on the left subgraph at the next turbo iteration. This process continues until convergence or until a maximum number of turbo iterations is reached. The final estimate of the signal x is the output of AMP in the last turbo iteration.
This subsection addresses the computation of the messages in the factor graph within one full turbo iteration. Note that message passing within the left subgraph is performed using AMP as described in Algorithm 1, so the main focus in this section is on the message computation within the right subgraph and between the two subgraphs. Specifically, at the tth turbo iteration, we assume that the message from factor node fdi to variable node γi is
Fig. 3. Factor graph of the joint distribution with separable prior.
which implies that
Note that the normalization constant is written as two sub-terms, related to the zero support and the active support, respectively, as in [42]. Then, by definition, the posterior mean and variance are
To avoid potential numerical problems, it is better to rewrite
Now we focus on the computation of the messages after the convergence of AMP. From the marginal posterior distribution in (17), the posterior support probability is
In the logarithmic domain, we have
From (26) and (27), note that, given the prior support probability of si, the extrinsic information about the support si proposed by the left subgraph is
where
Therefore, the message from node fbi to variable si is
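A minimal sketch of this extrinsic-information exchange, assuming the standard turbo rule that the extrinsic log-likelihood ratio (LLR) on si equals the posterior LLR minus the prior LLR (the displayed expressions above are not reproduced here):

```python
import numpy as np

def prob_to_llr(p, eps=1e-12):
    """Convert activation probabilities p(s_i = 1) to LLRs log(p / (1 - p))."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)

def llr_to_prob(llr):
    """Convert LLRs back to activation probabilities."""
    return 1.0 / (1.0 + np.exp(-llr))

def extrinsic_support_prob(pi_posterior, pi_prior):
    """Extrinsic support probability: subtract the prior LLR from the posterior LLR."""
    return llr_to_prob(prob_to_llr(pi_posterior) - prob_to_llr(pi_prior))

# Example: posterior from AMP on the left subgraph, prior from the structured-prior subgraph
pi_ext = extrinsic_support_prob(np.array([0.9, 0.2, 0.7]), np.array([0.5, 0.5, 0.5]))
```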
To compute the message from fci to γi, the EP [28] method is used. Before proceeding to the detailed message computation, we give the definition of the probability distribution projection operation induced by the Kullback-Leibler divergence. Mathematically, the projection of a particular distribution p onto a distribution set Φ is defined as
where D(p||q) denotes the Kullback-Leibler divergence. If p ∈ Φ, then the projection reduces to the identity mapping, i.e., q = p.
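For intuition, here is a small numerical sketch of this projection onto the Gaussian family, which reduces to moment matching; the grid-based normalization and integration below are purely illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def project_to_gaussian(log_p_unnorm, grid):
    """KL projection of an (unnormalized) density onto the Gaussian family:
    the minimizer of D(p || q) over Gaussians q matches the mean and variance of p."""
    w = np.exp(log_p_unnorm - np.max(log_p_unnorm))
    w /= np.trapz(w, grid)                          # normalize p on the grid
    mean = np.trapz(grid * w, grid)                 # first moment
    var = np.trapz((grid - mean) ** 2 * w, grid)    # second central moment
    return mean, var

# Example: project the tilted density Phi(gamma) * N(gamma; 0, 1) onto a Gaussian
grid = np.linspace(-8, 8, 4001)
log_p = norm.logcdf(grid) + norm.logpdf(grid, loc=0.0, scale=1.0)
print(project_to_gaussian(log_p, grid))
```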
The joint posterior probability of si and γi is
where ∝ denotes equality between two distributions up to a normalization constant. The tentative marginal posterior probability of γi can be evaluated as
where Zγi is the normalization constant
For notational brevity, let us define
Then, we have
According to the message-passing update rule of EP [28], we get
Choosing Φ to be the set of Gaussian distributions, the projection operation q = Proj_Φ[p] reduces to moment matching, i.e., the mean and variance are the same under p and q. Therefore, the mean and variance of γi with respect to qt(γi) need to be computed. Specifically, the mean can be evaluated as
For notational brevity, we follow the derivation in [38] and define
where the two quantities are defined as in (35) and (36), respectively. Thus,
To calculate the variance of γi, we first evaluate its second moment
As in [38], the integral can be calculated as
Therefore, the posterior variance of γi can be easily calculated as
Having derived the posterior mean and variance of γi, the message from factor node fci to variable node γi is updated as
Then, combining all the messages from the factor nodes fci to γi, i = 1, ..., N, and the Gaussian process prior (6), the updated posterior distribution of γ at the (t+1)th iteration is obtained as
Finally, the message from factor node fdi to variable node γi at the (t+1)th iteration is updated as
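The moment computations above rely on standard Gaussian-CDF integrals (see [38], Section 3.9). A hedged sketch of these identities is given below, assuming the tilted distribution is proportional to Φ(γ)N(γ; μ, σ²) (the si = 0 branch follows by replacing Φ(γ) with Φ(−γ)); the paper's exact expressions are not reproduced.

```python
import numpy as np
from scipy.stats import norm

def probit_tilted_moments(mu, sigma2):
    """Moments of q(gamma) ∝ Phi(gamma) * N(gamma; mu, sigma2), using the standard identities:
      Z    = Phi(z), z = mu / sqrt(1 + sigma2)
      mean = mu + sigma2 * phi(z) / (Phi(z) * sqrt(1 + sigma2))
      var  = sigma2 - sigma2^2 * phi(z) * (z + phi(z)/Phi(z)) / ((1 + sigma2) * Phi(z))."""
    z = mu / np.sqrt(1.0 + sigma2)
    Z = norm.cdf(z)
    ratio = norm.pdf(z) / np.maximum(Z, 1e-300)
    mean = mu + sigma2 * ratio / np.sqrt(1.0 + sigma2)
    var = sigma2 - sigma2 ** 2 * ratio * (z + ratio) / (1.0 + sigma2)
    return Z, mean, var

# Quick numerical check against grid integration
mu, sigma2 = 0.5, 2.0
grid = np.linspace(-15, 15, 20001)
w = norm.cdf(grid) * norm.pdf(grid, mu, np.sqrt(sigma2))
w /= np.trapz(w, grid)
print(probit_tilted_moments(mu, sigma2)[1], np.trapz(grid * w, grid))  # should agree
```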
In practice, since the prior parameters that encode the prior distribution of γ are unknown, they need to be learned from the data. In the sequel, the expectation maximization (EM) algorithm is utilized to learn these hyperparameters. The hidden variables are chosen as x, s, and γ; let θ denote the hyperparameters and θt denote their estimate at the tth iteration. The EM algorithm alternates between the following two steps:
where E{·|y; θt} denotes expectation conditioned on the observations y under parameters θt, i.e., the expectation is with respect to the posterior distribution p(x, s, γ|y, θt). The exact computation of p(x, s, γ|y, θt) is intractable in practice. Fortunately, the turbo AMP framework offers an efficient approximation, whereby the E step in (55) can be readily calculated as
The joint optimization over θ is impractical, and thus the incremental EM update rule is adopted, i.e., one element is updated at a time while the other parameters are held fixed.
Following the derivation in [14], Q(θ, θt) is maximized with respect to the slab parameters and τa, and the resulting update equations are
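The sketch below illustrates EM re-estimation of this kind in the spirit of [14]; the variable names (pi, m, v for the AMP posterior activation probabilities and conditional means/variances, tau_w for the noise variance) and the exact weighting are assumptions of this sketch, not the paper's displayed equations.

```python
import numpy as np

def em_update_slab_and_noise(y, A, x_hat, x_var, pi, m, v):
    """EM-style re-estimation of the slab mean/variance and noise variance (cf. [14]).
    pi, m, v : per-coefficient posterior activation probabilities and conditional
               (active) posterior means/variances from the AMP denoiser."""
    w = pi / np.sum(pi)
    mu0 = np.sum(w * m)                                   # weighted mean of active coefficients
    tau0 = np.sum(w * (v + (m - mu0) ** 2))               # weighted variance of active coefficients
    resid = y - A @ x_hat
    tau_w = (np.sum(resid ** 2) + np.sum((A ** 2) @ x_var)) / len(y)   # crude noise-variance refit
    return mu0, tau0, tau_w
```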
For γa, the EM update is
Setting the derivative of Q(θ, θt) with respect to γa to zero results in
To learn the parameters of Σa defined by (7), the EM update can also be applied. However, the computation is a bit tedious. By defining
we have Σa=Σκ from (7) and
Setting the derivative of Q(θ, θt) with respect to κ to zero results in
After some algebra, we get
where ξij is the (i, j)th element of Σ^{-1} and the last equality is due to the update of γa in (67).
As for the learning of the length-scale parameter s, however, there is no closed-form update equation. To address this problem, a free energy view is adopted for optimizing s. Following a derivation similar to [35], the Bethe free energy after the tth turbo iteration can be calculated as
where the required quantities are obtained from AMP-SSS (refer to Algorithm 1 and Algorithm 2). At first glance, it seems that the Bethe free energy is not related to the length-scale parameter. In fact, it depends on s implicitly through these quantities. For each value of s, the associated Bethe free energy is obtained as in (72), and the optimal s corresponds to the minimum free energy, i.e.,
Though the search space for s is huge, numerical results show that the performance is insensitive to s. As a result, the search space can be restricted to a small set of typical values. Denote this search set by S. It is empirically demonstrated that S = {10^-3, 10^-2, 1, 10, 50, 100, 500} performs quite well most of the time, and this set is used in Section VI.
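A minimal sketch of this length-scale selection is shown below, assuming a callable run_amp_sss(s) (hypothetical here) that runs the inference with length-scale s and returns the resulting Bethe free energy.

```python
import numpy as np

CANDIDATE_LENGTH_SCALES = [1e-3, 1e-2, 1, 10, 50, 100, 500]

def select_length_scale(run_amp_sss, candidates=CANDIDATE_LENGTH_SCALES):
    """Run inference once per candidate length-scale and keep the one
    with the smallest Bethe free energy."""
    free_energies = {s: run_amp_sss(s) for s in candidates}
    best = min(free_energies, key=free_energies.get)
    return best, free_energies

# Dummy usage: a toy free-energy surface with a minimum near s = 10
best_s, _ = select_length_scale(lambda s: (np.log10(s) - 1.0) ** 2)
```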
Algorithm 2. AMP with structured spike and slab priors (AMP-SSS).
In this section, the computational complexity of the proposed algorithm is analyzed. The turbo message schedule implies that the complexity consists of two parts: the AMP operations and the update of the soft support by the structured prior. The complexity of AMP is O(MN) [18], [36]. However, the update of the soft support requires matrix inversions, as shown in (50) and (51), whose direct solution scales as O(N³), which is prohibitively high in large-scale settings. To make the algorithm applicable in high-dimensional settings, the complexity needs to be further reduced. Note that (50) and (51) can be rewritten as
Therefore, the computational bottleneck lies in inverting a matrix of the form Σa plus a diagonal matrix. In [43], a fast direct method is proposed to compute the inverse of a matrix of the form C = D + K with complexity O(N log² N), where D is a general diagonal matrix with non-negative entries and K is computed using a specified covariance kernel. Since the matrix added to Σa is diagonal, and only the diagonal terms of the resulting posterior covariance need to be computed, the complexity of (74) scales as O(N²). The matrix-by-matrix product in (75) need not be computed directly; instead, performing the matrix-by-vector multiplications first reduces the complexity of (75) to O(N²) as well. Therefore, the overall per-iteration complexity of the proposed algorithm reduces to O(MN + N²), ignoring the N log² N term.
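As a dense-algebra reference for the structure of these updates, the sketch below computes the Gaussian posterior mean and the diagonal of the posterior covariance given the prior N(γa, Σa) and diagonal pseudo-observation noise, using a Cholesky factorization (O(N³)); in the paper this solve is replaced by the O(N log² N) fast direct method of [43]. The expressions here are the standard Gaussian-conditioning identities, assumed to correspond to (74)-(75), which are not reproduced.

```python
import numpy as np
from scipy.linalg import solve_triangular

def gp_posterior_moments(gamma_a, Sigma_a, r, d):
    """Posterior of gamma with prior N(gamma_a, Sigma_a) and pseudo-observations
    r_i = gamma_i + N(0, d_i):
      mean      = gamma_a + Sigma_a (Sigma_a + D)^{-1} (r - gamma_a)
      diag(cov) = diag(Sigma_a) - diag(Sigma_a (Sigma_a + D)^{-1} Sigma_a)."""
    C = Sigma_a + np.diag(d)
    L = np.linalg.cholesky(C)
    alpha = solve_triangular(L.T, solve_triangular(L, r - gamma_a, lower=True), lower=False)
    mean = gamma_a + Sigma_a @ alpha
    V = solve_triangular(L, Sigma_a, lower=True)     # V = L^{-1} Sigma_a, so V^T V = Sigma_a C^{-1} Sigma_a
    var_diag = np.diag(Sigma_a) - np.sum(V ** 2, axis=0)
    return mean, var_diag
```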
In this section, a series of numerical experiments is conducted to investigate the efficiency of the proposed algorithm, referred to as AMP-SSS. Comparisons are made to algorithms that do not consider structure, such as Basis Pursuit (BP) [44] and EM-BG-GAMP [14], as well as some state-of-the-art methods that take structure into account without prior knowledge of the sparsity pattern, e.g., MBCS-LBP [27], PC-SBL [45], and its AMP version PCSBL-GAMP [46]. The performance of the oracle least squares estimator (Oracle LS), which knows the true support, is given as a benchmark. Note that since the proposed AMP-SSS requires no prior knowledge of the sparsity pattern, e.g., sparsity ratio, number of groups, etc., no comparisons are made to algorithms that need partial or full knowledge of the sparsity pattern.
Throughout the experiments, no prior knowledge of the sparsity pattern, e.g., sparsity ratio or number of nonzero groups, is assumed except for Oracle LS. The maximum number of iterations for PCSBL-GAMP and EM-BG-GAMP is set to Tmax = 200, and the termination tolerance is ε_tol = 10^-6. For AMP-SSS, there are two iterative loops: the maximum number of inner AMP iterations is set to Lmax = 50 and the maximum number of outer turbo iterations is set to Tmax = 50. The tolerance values for the inner AMP and outer turbo iterations are set to ε_amp = 10^-6 and ε_turbo = 10^-6, respectively. To avoid divergence, the damping technique is used for AMP-SSS, and the damping factor is set to 0.3. For the other algorithms, the default settings are used. The elements of the measurement matrix A ∈ ℝ^{M×N} are independently generated following the standard Gaussian distribution, and the columns are normalized to unit norm. The success rate is defined as the ratio of the number of successful trials to the total number of experiments, where a trial is successful if the normalized mean square error (NMSE) of the recovered signal is less than -50 dB. The pattern recovery success rate is defined as the ratio of the number of successful trials to the total number of experiments, where a trial is successful if the support is exactly recovered; a coefficient whose magnitude is less than 10^-4 is deemed a zero coefficient.
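For concreteness, a small sketch of these metrics, assuming the conventional NMSE definition ||x̂ − x||²/||x||² (in dB); the paper's displayed definition is not reproduced here.

```python
import numpy as np

def nmse_db(x_true, x_hat):
    """Normalized mean square error in dB (assumed conventional definition)."""
    return 10.0 * np.log10(np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2))

def is_success(x_true, x_hat, nmse_threshold_db=-50.0):
    """Recovery is declared successful if the NMSE is below the threshold."""
    return nmse_db(x_true, x_hat) < nmse_threshold_db

def is_pattern_success(x_true, x_hat, zero_tol=1e-4):
    """Exact support recovery: magnitudes below zero_tol are treated as zeros."""
    return np.array_equal(np.abs(x_true) > zero_tol, np.abs(x_hat) > zero_tol)
```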
Synthetic block-sparse signals are generated in a similar way to [45], [47], where K nonzero elements are partitioned into L blocks with random sizes and random locations. We set N = 100, K = 25, and L = 4, and the nonzero elements are generated independently following a Gaussian distribution with mean μ0 = 3 and variance τ0 = 1. The results are averaged over 100 independent runs. Figure 4(a) and figure 4(b) depict the success rate and the pattern recovery success rate, respectively. It can be seen that in both metrics AMP-SSS performs nearly the same as PC-SBL and slightly better than PCSBL-GAMP. In the noisy setting, figure 5 shows the average NMSE of the different algorithms when the signal-to-noise ratio (SNR) is 50 dB. Note that AMP-SSS and PC-SBL outperform the other methods in terms of NMSE in the noisy case.
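A sketch of one way to generate such block-sparse test signals, assuming uniformly random block sizes and non-overlapping placements; the exact generation procedure of [45], [47] may differ in its details.

```python
import numpy as np

def random_composition(total, parts, rng):
    """Split `total` into `parts` positive integers uniformly at random (stars and bars)."""
    cuts = np.sort(rng.choice(np.arange(1, total), size=parts - 1, replace=False))
    return np.diff(np.concatenate(([0], cuts, [total])))

def block_sparse_signal(N, K, L, mu0=3.0, tau0=1.0, rng=None):
    """Generate a length-N signal with K nonzero Gaussian entries grouped into L random blocks."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = random_composition(K, L, rng)                        # block lengths, summing to K
    gaps = random_composition(N - K + L + 1, L + 1, rng) - 1     # L+1 non-negative gaps summing to N-K
    x = np.zeros(N)
    pos = 0
    for gap, size in zip(gaps[:-1], sizes):
        pos += gap
        x[pos:pos + size] = mu0 + np.sqrt(tau0) * rng.standard_normal(size)
        pos += size
    return x

x = block_sparse_signal(N=100, K=25, L=4)
```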
Fig. 4. Block sparse Gaussian signal reconstruction, noiseless case.
Synthetic block-sparse binary signals, i.e., signals whose elements are either 0 or 1, are generated in a similar way to the sparse Gaussian case, with N = 100, K = 25, and L = 4. The results are averaged over 100 independent runs.
Fig. 5. NMSE vs. M/N for block sparse Gaussian signals, SNR = 50 dB.
Figure 6(a) and figure 6(b) depict the success rate and the pattern recovery success rate, respectively. It can be seen that AMP-SSS achieves the highest success rate and pattern recovery rate at various measurement ratios, implying that the proposed AMP-SSS is very robust with respect to the true prior of the nonzero elements. In addition, compared with PC-SBL and PCSBL-GAMP, AMP-SSS has much more flexibility to encode the distribution of the nonzero elements by specifying various distributions for the slab part.
In the noisy setting, figure 7 shows the average NMSE of the different algorithms when the signal-to-noise ratio (SNR) is 50 dB. Note that AMP-SSS clearly outperforms the other algorithms in terms of average NMSE. Note also that for measurement ratios higher than 0.5 in the noisy setting, both AMP-SSS and EM-BG-GAMP outperform the Oracle LS estimator in terms of NMSE. This is because both AMP-SSS and EM-BG-GAMP can learn and exploit the distribution of the binary data, while the Oracle LS estimator cannot.
A last series of experiments is carried out to reconstruct 2D images of handwritten digits from the MNIST data set (available at http://yann.lecun.com/exdb/mnist/). The MNIST data set contains 60,000 gray-scale digit images, each of size 28 × 28 pixels. These images are sparse, since most of the pixels are inactive and take the value zero while only a few pixels are active. Moreover, due to the inherent structure of digits, these images also exhibit a structured sparsity pattern. One image for each digit is randomly extracted from the MNIST data set. Due to lack of space, the full results are given only for digit 5, in both the noiseless and the noisy case, for the different algorithms; for the remaining digits, average NMSEs in the noisy case at a specific measurement ratio are reported. Note that to recover 2D images, the algorithms MBCS-LBP [27], PC-SBL [45], PCSBL-GAMP [46], and the proposed AMP-SSS are all modified to their 2D versions. The results are averaged over 100 independent runs. Figure 8(a) and figure 8(b) depict the success rate and the pattern recovery success rate, respectively, which show that AMP-SSS achieves the highest success rate and pattern recovery rate (except Oracle LS) at various measurement ratios. Figure 9 shows the average NMSEs at different measurement ratios when the signal-to-noise ratio (SNR) is 30 dB. Note that AMP-SSS outperforms the other methods (except Oracle LS) in terms of NMSE. Typical recovery results are shown in figure 10 for measurement ratio M/N = 0.30 and SNR = 30 dB. In this case AMP-SSS recovers the original image with NMSE = -25.04 dB, which is much lower than those of the other methods.
Table I shows the average NMSEs of the different algorithms for the different digits when the measurement ratio is M/N = 0.30 and SNR = 30 dB. It can be seen that Oracle LS achieves the lowest NMSE in all cases since it knows the support information. Among the algorithms with no prior knowledge of the sparsity pattern, the proposed AMP-SSS performs the best for almost all of the digits. From Table I, it can also be seen that the ten digits exhibit quite different sparsity patterns, which implies that the reconstruction performance of AMP-SSS is quite robust to the specific sparsity pattern.
Fig. 7. NMSE vs. M/N for block sparse binary signals, SNR = 50 dB.
Fig. 8. Digit 5 reconstruction from the MNIST data set, noiseless case.
Fig. 9. NMSE vs. M/N for one digit 5 from the MNIST data set, noisy case. SNR =30 dB.
Fig. 10. Typical recovery results for one digit 5 from the MNIST data set, noisy case. M/N=0.30 and SNR = 30 dB.
Table I. NMSE for different digits from the MNIST data set, noisy case. M/N = 0.30 and SNR = 30 dB.
This paper addresses the problem of recovering structured sparse signals using AMP with the structured spike and slab prior. The prior correlation of the support of the solution is encoded by imposing a transformed Gaussian process on the spike and slab probabilities. Under this model, an efficient AMP-based algorithm is derived for posterior inference, which reduces the computational complexity significantly. Furthermore, an efficient method is proposed to learn the hyperparameters using EM and Bethe free energy optimization. Experimental results on both synthetic and real data demonstrate the superior reconstruction performance of the proposed algorithm.
ACKNOWLEDGEMENTS
This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 91438206 and 91638205), and by the Zhejiang Province Natural Science Foundation of China (Grant No. LQ18F010001).
[1] D. L. Donoho, “Compressed sensing,”IEEE Trans.Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr.2006.
[2] J. W. Choi, B. Shim, Y. Ding, B. Rao, and D. I. Kim,“Compressed sensing for wireless communications: Useful tips and tricks,”IEEE Communications Surveys Tutorials, vol. 19, no. 3, pp.1527–1550, thirdquarter 2017.
[3] Z. Gao, L. Dai, S. Han, C.-L. I, Z. Wang, and L.Hanzo, “Compressive sensing techniques for next-generation wireless communications,”IEEE Wireless Commun., vol. PP, no. 99, pp. 2–11,2018.
[4] S. Wu, Z. Ni, X. Meng, and L. Kuang, “Block expectation propagation for downlink channel estimation in massive MIMO systems,”IEEE Commun. Lett., vol. 20, no. 11, pp. 2225–2228,Nov 2016.
[5] X. Lin, S. Wu, L. Kuang, Z. Ni, X. Meng, and C. Jiang, “Estimation of sparse massive MIMO-OFDM channels with approximately common support,”IEEE Commun. Lett., vol. 21, no.5, pp. 1179–1182, May 2017.
[6] X. Lin, S. Wu, C. Jiang, L. Kuang, J. Yan, and L.Hanzo, “Estimation of broadband multiuser millimeter-wave massive MIMO-OFDM channels by exploiting their sparse structure,”IEEE Trans.Wireless Commun., vol. PP, no. 99, pp. 1–14,2018.
[7] X. Meng, S. Wu, L. Kuang, D. Huang, and J. Lu,“Multi-user detection for spatial modulation via structured approximate message passing,”IEEE Commun. Lett., vol. 20, no. 8, pp. 1527–1530,Aug 2016.
[8] S. Wang, Y. Li, and J. Wang, “Multiuser detection in massive spatial modulation MIMO with low-resolution ADCs,”IEEE Trans. Wireless Commun., vol. 14, no. 4, pp. 2156–2168, April 2015.
[9] S. Wang, L. Zhang, Y. Li, J. Wang, and E. Oki,“Multiuser MIMO transmission aided by massive one-bit magnitude measurements,”IEEE Trans. Wireless Commun., vol. 15, no. 10, pp.7058–7073, Oct 2016.
[10] H. Cao, J. Zhu, and Z. Xu, “Adaptive one-bit quantization via approximate message passing with nearest neighbour sparsity pattern learning,”IET Signal Processing, Jan. 2018.
[11] T. Park and G. Casella, “The Bayesian LASSO,”Journal of the American Statistical Association,vol. 103, no. 482, pp. 681–686, 2008.
[12] Z. Zhang and B. D. Rao, “Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning,”IEEE J. Sel. Topics Signal Process., vol. 5, no. 5, pp. 912–926, 2011.
[13] Z. Yuan, C. Zhang, Q. Guo, Z. Wang, X. Lu, and S. Wu, “Combined message passing based SBL with Dirichlet process prior for sparse signal recovery with multiple measurement vectors,”IEEE Access, vol. PP, no. 99, pp. 1–1, 2018.
[14] J. P. Vila and P. Schniter, "Expectation-maximization Gaussian-mixture approximate message passing," IEEE Trans. Signal Process., vol. 61, no. 19, pp. 4658–4672, 2013.
[15] D. L. Donoho, A. Maleki, and A. Montanari,“Message-passing algorithms for compressed sensing,”Proc. Nat. Acad. Sci., vol. 106, no. 45,Nov. 2009, pp. 18914–18919.
[16] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger,“Factor graphs and the sum-product algorithm,”IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[17] M. Mezard and A. Montanari,Information, physics, and computation. Oxford University Press,2009.
[18] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,”Proc. IEEE Int. Symp. Inf. Theory, 2011, pp.2168–2172.
[19] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,”IEEE Trans.Signal Process., vol. 57, no. 8, pp. 3075–3085,2009.
[20] Y. C. Eldar, P. Kuppinger, and H. Bölcskei, "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Trans. Signal Process., vol. 58, no. 6, pp. 3042–3054, 2010.
[21] X. Lv, G. Bi, and C. Wan, “The group LASSO for stable recovery of block-sparse signal representations,”IEEE Transactions on Signal Processing,vol. 59, no. 4, pp. 1371–1382, 2011.
[22] J. Huang, T. Zhang, and D. Metaxas, “Learning with structured sparsity,”The Journal of Machine Learning Research, vol. 12, pp. 3371–3412, 2011.
[23] S. Som and P. Schniter, “Compressive imaging using approximate message passing and a Markov-tree prior,”IEEE Trans. Signal Process., vol.60, no. 7, pp. 3439–3448, 2012.
[24] M. R. Andersen, O. Winther, and L. K. Hansen,“Bayesian inference for structured spike and slab priors,”Proc.Advances in Neural Information Processing Systems, 2014, pp. 1745–1753.
[25] M. R. Andersen, A. Vehtari, O. Winther, and L. K. Hansen, "Bayesian inference for spatio-temporal spike and slab priors," arXiv preprint arXiv:1509.04752, 2015.
[26] M. R. Andersen, A. Vehtari, O. Winther, and L.K. Hansen, “Bayesian inference for spatio-temporal spike and slab priors,”Journal of Machine Learning Research, vol. 18, no. 139, pp. 1–58,2017.
[27] L. Yu, H. Sun, G. Zheng, and J. P. Barbot, “Model based Bayesian compressive sensing via local beta process,”Signal Processing, vol. 108, pp.259–271, 2015.
[28] T. Minka, “A family of algorithms for approximate Bayesian inference,” Ph.D. dissertation,Massachusetts Institute of Technology, 2001.
[29] X. Meng, S. Wu, L. Kuang, and J. Lu, “An expectation propagation perspective on approximate message passing,”IEEE Signal Processing Letters,vol. 22, no. 8, pp. 1194–1197, 2015.
[30] S. Ambikasaran, D. Foreman-Mackey, L. Greengard, D. Hogg, and M. O'Neil, "Fast direct methods for Gaussian processes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 252–265, Feb 2016.
[31] P. Schniter, "Turbo reconstruction of structured sparse signals," Proc. IEEE 44th Annual Conference on Information Sciences and Systems (CISS), 2010, pp. 1–6.
[32] J. Ziniel and P. Schniter, “Efficient high-dimensional inference in the multiple measurement vector problem,”IEEE Trans. Signal Process., vol.61, no. 2, pp. 340–354, 2013.
[33] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), pp. 1–38, 1977.
[34] F. Krzakala, M. Mézard, F. Sausset, Y. Sun, and L. Zdeborová, “Probabilistic reconstruction in compressed sensing: algorithms, phase diagrams, and threshold achieving matrices,”Journal of Statistical Mechanics: Theory and Experiment, vol. 2012, no. 08, p. P08009, 2012.
[35] F. Krzakala, A. Manoel, E. Tramel, and L. Zdeborová, “Variational free energies for compressed sensing,”Proc.IEEE International Symposium on Information Theory (ISIT), Jun. 2014, pp. 1499–1503.
[36] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,”arXiv preprint arXiv:1010.5141v2, 2012.
[37] X. Meng, S. Wu, and J. Zhu, “A unified Bayesian inference framework for generalized linear models,”IEEE Signal Processing Letters, vol. 25,no. 3, pp. 398–402, March 2018.
[38] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.
[39] D. Koller and N. Friedman,Probabilistic Graphical Models: Principles and Techniques. USA: MIT Press, 2009.
[40] D. L. Donoho, A. Maleki, and A. Montanari, "Message passing algorithms for compressed sensing: I. Motivation and construction," in IEEE Information Theory Workshop (ITW), Jan. 2010, pp. 1–5.
[41] J. Ziniel and P. Schniter, "Dynamic compressive sensing of time-varying signals via approximate message passing," IEEE Trans. Signal Process., vol. 61, no. 21, pp. 5270–5284, 2013.
[42] E. W. Tramel, A. Drémeau, and F. Krzakala, "Approximate message passing with restricted Boltzmann machine priors," arXiv preprint arXiv:1502.06470, 2015.
[43] S. Ambikasaran and E. Darve, “An O(NlogN) fast direct solver for partial hierarchically semi-separable matrices,”Journal of Scientific Computing,vol. 57, no. 3, pp. 477–501, 2013.
[44] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 33–61, 1998.
[45] J. Fang, Y. Shen, H. Li, and P. Wang, “Pattern-coupled sparse Bayesian learning for recovery of block-sparse signals,”IEEE Trans. Signal Process., vol. 63, no. 2, pp. 360–372, 2015.
[46] J. Fang, L. Zhang, and H. Li, “Two-dimensional pattern-coupled sparse Bayesian learning via generalized approximate message passing,”IEEE Transactions on Image Processing, vol. 25,no. 6, pp. 2920–2930, 2016.
[47] Z. Zhang and B. Rao, "Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation," IEEE Trans. Signal Process., vol. 61, no. 8, pp. 2009–2015, 2013.