Resolution Limits of Sparse Coding in High Dimensions

Alyson K. Fletcher, Sundeep Rangan, and Vivek K Goyal

This work was supported in part by a University of California President's Postdoctoral Fellowship, NSF CAREER Grant CCF-643836, and the Centre Bernoulli at École Polytechnique Fédérale de Lausanne. A. K. Fletcher (email: alyson@eecs.berkeley.edu) is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley. S. Rangan (email: srangan@qualcomm.com) is with Qualcomm Technologies, Bedminster, NJ. V. K. Goyal (email: vgoyal@mit.edu) is with the Department of Electrical Engineering and Computer Science and the Research Laboratory of Electronics, Massachusetts Institute of Technology.

Abstract

This paper addresses the problem of sparsity pattern detection for unknown k-sparse n-dimensional signals observed through m noisy, random linear measurements. Sparsity pattern recovery arises in a number of settings including statistical model selection, pattern detection, and image acquisition. The main results in this paper are necessary and sufficient conditions for asymptotically reliable sparsity pattern recovery in terms of the dimensions m, n and k as well as the signal-to-noise ratio (SNR) and the minimum-to-average ratio (MAR) of the nonzero entries of the signal. With the SNR and MAR fixed, it is shown that the scaling m = O(k log(n − k)) is necessary for any algorithm to succeed, and also sufficient for a computationally trivial correlation algorithm. Moreover, the trivial algorithm requires at most 4(1 + SNR) times more measurements than maximum likelihood and 4/MAR times more measurements than the well-known lasso method. This provides insight on the precise value and limitations of convex programming-based algorithms.

1 Introduction

Sparse signal models have been used successfully in a variety of applications including wavelet-based image processing and pattern recognition. Recent research has shown that certain naturally occurring neurological processes may exploit sparsity as well [1–3]. For example, there is now evidence that the V1 visual cortex naturally generates a sparse representation of the visual data relative to a certain Gabor-like basis. Due to the nonlinear nature of sparse signal models, developing and analyzing algorithms for sparse signal processing has been a major research challenge.

This paper considers the problem of estimating sparse signals in the presence of noise. We are specifically concerned with understanding the theoretical estimation limits and how far practical algorithms are from those limits. In the context of visual cortex modeling, this analysis may help us understand what visual features are resolvable from visual data.

To keep the analysis general, we consider the following abstract estimation problem: An unknown sparse signal x is modeled as an n-dimensional real vector with k nonzero components. The set of locations of the nonzero components is called the sparsity pattern. We consider the problem of detecting the sparsity pattern of x from an m-dimensional measurement vector y = Ax + d, where A ∈ R^{m×n} is a known measurement matrix and d ∈ R^m is an additive noise vector with a known distribution. We are interested in determining necessary and sufficient conditions on the ability to reliably detect the sparsity pattern based on the problem dimensions m, n and k, and the signal and noise statistics.
                                                   | finite SNR                                                      | SNR → ∞
Any algorithm must fail                            | m < (2/(MAR·SNR)) k log(n − k) + k − 1   (Theorem 1)             | m ≤ k   (elementary)
Necessary and sufficient for lasso                 | unknown (the expressions above and to the right are necessary)   | m ≈ 2k log(n − k) + k + 1   (Wainwright [14])
Sufficient for maximum correlation estimator (9)   | m > (8(1 + SNR)/(MAR·SNR)) k log(n − k)   (Theorem 2)            | m > (8/MAR) k log(n − k)   (from Theorem 2)

Table 1: Summary of results on measurement scaling for reliable sparsity recovery (see body for definitions and technical limitations).

1.1 Previous Work

While optimal sparsity pattern detection is NP-hard [4], greedy heuristics (matching pursuit [5] and its variants) and convex relaxations (basis pursuit [6], lasso [7], and others) have been widely used since at least the mid 1990s. While these algorithms worked well in practice, until recently little could be shown analytically about their performance. Some remarkable recent results are sets of conditions that can guarantee exact sparsity recovery based on certain simple "incoherence" conditions on the measurement matrix A [8–10]. These conditions and others have been exploited in developing the area of "compressed sensing," which considers large random matrices A with i.i.d. Gaussian components [11–13]. The main theoretical results are conditions that guarantee sparse detection with convex programming methods. The best of these results is due to Wainwright [14], who shows that the scaling

    m ≈ 2k log(n − k) + k + 1    (1)

is necessary and sufficient for lasso to detect the sparsity pattern, provided the SNR scales to infinity.

This paper presents new necessary and sufficient conditions, summarized in Table 1 along with the Wainwright bound (1). The parameters MAR and SNR represent the minimum-to-average ratio and the signal-to-noise ratio, respectively; the exact definitions and measurement model are given below. The necessary condition, which applies to all algorithms regardless of complexity, shows that the scaling m = Ω(k log(n − k)) must be satisfied for any algorithm to reliably detect the sparsity pattern. Comparing this condition with the Wainwright bound (1), we see that, for a given SNR and MAR, lasso requires a number of measurements m that is within a constant factor of the minimum number of measurements.

Previous necessary conditions had been based on information-theoretic analyses such as [15–17]. More recent publications with necessary conditions include [18–21]. As described in Section 3, our new necessary condition is significantly stronger than the previous results.

As a sufficient condition, we also show that a trivial "maximum correlation" estimator can achieve reliable detection with the optimal m = O(k log(n − k)) scaling. On this basis we argue that the main benefit of more sophisticated methods, such as lasso, is not in achieving the optimal scaling, but rather in the dependence on the minimum-to-average ratio.

2 Problem Statement

Consider estimating a k-sparse vector x ∈ R^n through a vector of observations,

    y = Ax + d,    (2)

where A ∈ R^{m×n} is a random matrix with i.i.d. N(0, 1/m) entries and d ∈ R^m is i.i.d. unit-variance Gaussian noise. Denote the sparsity pattern of x (the positions of the nonzero entries) by the set I_true, which is a k-element subset of the set of indices {1, 2, ..., n}. Estimates of the sparsity pattern will be denoted by Î with subscripts indicating the type of estimator. We seek conditions under which there exists an estimator such that Î = I_true with high probability.
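As a concrete, non-authoritative illustration of the measurement model (2), the following NumPy sketch draws one problem instance: a k-sparse x, a matrix A with i.i.d. N(0, 1/m) entries, and y = Ax + d with unit-variance noise. The function name generate_problem and the particular construction of the nonzero values (one small entry and k − 1 equal larger ones, which pins down the SNR and MAR quantities defined in the next section) are our own illustrative choices, not part of the paper.

```python
import numpy as np

def generate_problem(n, k, m, snr=10.0, mar=1.0, rng=None):
    """Draw one instance of the model y = A x + d from Section 2.

    A has i.i.d. N(0, 1/m) entries and d is i.i.d. N(0, 1) noise, so by (4)
    the SNR equals ||x||^2 / m.  The nonzero values of x are chosen so that
    the target SNR and MAR are met exactly; this construction (one small
    entry, k-1 equal larger entries) is only an illustration.
    """
    rng = np.random.default_rng(rng)
    support = rng.choice(n, size=k, replace=False)       # true sparsity pattern I_true

    energy = m * snr                                      # ||x||^2 implied by (4)
    x = np.zeros(n)
    if k == 1:
        x[support[0]] = np.sqrt(energy)                   # MAR is trivially 1 when k == 1
    else:
        small = np.sqrt(mar * energy / k)                 # minimum-magnitude entry, from (5)
        big = np.sqrt((energy - small**2) / (k - 1))      # remaining k-1 equal entries
        x[support] = big
        x[support[0]] = small

    A = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))    # i.i.d. N(0, 1/m) entries
    d = rng.normal(0.0, 1.0, size=m)                      # unit-variance noise
    y = A @ x + d
    return A, x, y, set(support.tolist())
```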
In addition to the signal dimensions m, n and k, we will show that there are two variables that dictate the ability to detect the sparsity pattern reliably: the SNR, and what we will call the minimum-to-average ratio. The SNR is defined by

    SNR = E[‖Ax‖²] / E[‖d‖²] = E[‖Ax‖²] / m.    (3)

Since we are considering x as an unknown deterministic vector, the SNR can be further simplified as follows. The entries of A are i.i.d. N(0, 1/m), so any two columns a_i ∈ R^m and a_j ∈ R^m of A satisfy E[a_i' a_j] = δ_ij. Therefore, the signal energy is given by

    E[‖Ax‖²] = E[ Σ_{i,j ∈ I_true} a_i' a_j x_i x_j ] = Σ_{i,j ∈ I_true} x_i x_j δ_ij = ‖x‖².

Substituting into the definition (3), the SNR is given by

    SNR = (1/m) ‖x‖².    (4)

The minimum-to-average ratio of x is defined as

    MAR = min_{j ∈ I_true} |x_j|² / (‖x‖² / k).    (5)

Since ‖x‖²/k is the average of {|x_j|² : j ∈ I_true}, MAR ∈ (0, 1], with the upper limit occurring when all the nonzero entries of x have the same magnitude.

3 Necessary Condition for Sparsity Recovery

We first consider sparsity recovery without being concerned with the computational complexity of the estimation algorithm. Since the vector x ∈ R^n is k-sparse, the vector Ax belongs to one of L = (n choose k) subspaces spanned by k of the n columns of A. Estimation of the sparsity pattern is the selection of one of these subspaces, and since the noise d is Gaussian, the probability of error is minimized by choosing the subspace closest to the observed vector y. This results in the maximum likelihood (ML) estimate.

Mathematically, the ML estimator can be described as follows. Given a subset J ⊆ {1, 2, ..., n}, let P_J y denote the orthogonal projection of the vector y onto the subspace spanned by the vectors {a_j : j ∈ J}. The ML estimate of the sparsity pattern is

    Î_ML = arg max_{J : |J| = k} ‖P_J y‖²,

where |J| denotes the cardinality of J. That is, the ML estimate is the set of k indices such that the subspace spanned by the corresponding columns of A contains the maximum signal energy of y.

Since the number of subspaces, L, grows exponentially in n and k, an exhaustive search is computationally infeasible. However, the performance of ML estimation provides a lower bound on the number of measurements needed by any algorithm that cannot exploit a priori information on x other than it being k-sparse.
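To make the ML estimator concrete, here is a minimal brute-force sketch of the definition above: it enumerates all (n choose k) supports and returns the one whose column span captures the most energy of y, computing ‖P_J y‖² via a least-squares fit. The function name is ours, and the sketch is feasible only for very small n and k, which is exactly why the simulations reported below are limited to n = 20.

```python
import itertools
import numpy as np

def ml_support(A, y, k):
    """Exhaustive maximum-likelihood search: argmax over |J| = k of ||P_J y||^2.

    P_J is the orthogonal projection onto span{a_j : j in J}.  All (n choose k)
    subsets are examined, so this is only usable for tiny problems.
    """
    n = A.shape[1]
    best_J, best_energy = None, -np.inf
    for J in itertools.combinations(range(n), k):
        A_J = A[:, list(J)]
        # Least-squares coefficients give the projection of y onto span(A_J).
        coef, *_ = np.linalg.lstsq(A_J, y, rcond=None)
        proj = A_J @ coef                     # P_J y
        energy = float(proj @ proj)           # ||P_J y||^2
        if energy > best_energy:
            best_J, best_energy = set(J), energy
    return best_J
```

For instance, with the generate_problem sketch above, ml_support(A, y, k) can be compared against the true support for n around 20, mirroring the scale of the Monte Carlo results reported later.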
ML estimation for sparsity recovery was first examined in [17]. There, it was shown that there exists a constant C > 0 such that the condition

    m > C max{ (k / (SNR · MAR)) log(n − k),  k log(n/k) }    (6)

is sufficient for ML to asymptotically reliably recover the sparsity pattern. Our first theorem provides a corresponding necessary condition.

Theorem 1. Let k = k(n) and m = m(n) vary with n such that lim_{n→∞} k(n) = ∞ and

    m(n) < ((2 − δ) / (MAR · SNR)) k log(n − k) + k − 1    (7)

for some δ > 0. Then even the ML estimator asymptotically cannot detect the sparsity pattern, i.e.,

    lim_{n→∞} Pr( Î_ML = I_true ) = 0.

Proof sketch: The basic idea in the proof is to consider an "incorrect" subspace formed by removing, from the k correct vectors, the one with the least energy, and adding, from the n − k incorrect vectors, the one with the largest energy. The change in energy can be estimated using tail distributions of chi-squared random variables. A complete proof appears in [22].

The theorem shows that for fixed SNR and MAR, the scaling m = Ω(k log(n − k)) is necessary for reliable sparsity pattern recovery. The next section will show that this scaling can be achieved with an extremely simple method. (A small numerical illustration of the constants in (7) follows the remarks below.)

Remarks

1. The theorem strengthens an earlier necessary condition in [18], which showed that there exists a C > 0 such that m = (C/SNR) k log(n − k) is necessary for asymptotic reliable recovery. Theorem 1 strengthens that result to reflect the dependence on MAR and to make the constant explicit.

2. The theorem applies for any k(n) such that lim_{n→∞} k(n) = ∞, including both cases with k = o(n) and k = Θ(n). In particular, under linear sparsity (k = αn for some constant α), the theorem shows that approximately

    m ≥ (2α / (MAR · SNR)) n log n

measurements are necessary for sparsity recovery. Similarly, if m/n is bounded above by a constant, then sparsity recovery will certainly fail unless k = O(n / log n).

3. In the case where SNR · MAR is constant, the sufficient condition (6) reduces to m = (C / (SNR · MAR)) k log(n − k), which matches the necessary condition (7) within a constant factor.

4. In the case of MAR · SNR < 1, the bound (7) improves upon the necessary condition of [14] for the asymptotic success of lasso by the factor (MAR · SNR)^{−1}.

5. The bound (7) can be compared against the information-theoretic bounds mentioned earlier. The tightest of these bounds is in [16] and shows that the problem dimensions must satisfy

    log₂(1 + SNR) − log₂(1 + SNR/k) ≥ (2/m) log₂ (n choose k),    (8)

where α = k/n is the sparsity ratio. For large n and k, using log₂ (n choose k) ≈ n h(α), the bound can be rearranged as

    m ≥ [ log₂(1 + SNR) − log₂(1 + SNR/k) ]^{−1} 2 h(α) n,

where h(·) is the binary entropy function. In particular, when the sparsity ratio α is fixed, the bound shows only that m needs to grow at least linearly with k. In contrast, Theorem 1 shows that with a fixed sparsity ratio, m = Ω(k log(n − k)) is necessary for reliable sparsity recovery. Thus, the bound in Theorem 1 is significantly tighter and reveals that the previous information-theoretic necessary conditions from [15–17, 20, 21] are overly optimistic.

6. Results more similar to Theorem 1, based on direct analyses of error events rather than information-theoretic arguments, appeared in [18, 19]. Those earlier results showed that, with fixed SNR as defined here, sparsity recovery with m = O(k) must fail. The more refined analysis in this paper gives the additional log(n − k) factor and the precise dependence on MAR · SNR.

7. Theorem 1 is not contradicted by the relevant sufficient condition of [20, 21]. That sufficient condition holds for a scaling that gives linear sparsity and MAR · SNR = Θ(n log n). For MAR · SNR = n log n, the right-hand side of (7) is approximately k, so Theorem 1 only shows that ML decoding fails with fewer than roughly k measurements, while [21, Thm. 3.1] shows that a typicality-based decoder will succeed with m = O(k) measurements; the two results are consistent.

8. Note that the necessary condition of [17] is proven for MAR = 1. Theorem 1 gives a bound that increases for smaller MAR; this suggests (though does not prove, since the condition is merely necessary) that smaller MAR makes the problem harder.
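To get a feel for the constants in the necessary condition (7), the short sketch below evaluates its right-hand side (with the slack δ set to 0 and log taken as the natural logarithm) for a few illustrative (SNR, MAR) pairs at n = 20 and k = 5, matching the scale of the simulations that follow. The parameter choices are ours, not values prescribed by the paper.

```python
import numpy as np

def theorem1_threshold(n, k, snr, mar):
    """Right-hand side of (7) with the slack delta set to 0: below this many
    measurements, Theorem 1 says even ML detection must (asymptotically) fail."""
    return 2.0 / (mar * snr) * k * np.log(n - k) + k - 1

# Illustrative values only (these (SNR, MAR) pairs appear among the panels of Fig. 1).
for snr, mar in [(1, 1), (10, 1), (10, 0.2)]:
    m_min = theorem1_threshold(n=20, k=5, snr=snr, mar=mar)
    print(f"SNR={snr:4}, MAR={mar:4}:  (7) threshold ~ {m_min:5.1f} measurements")
```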
Numerical validation. Computational confirmation of Theorem 1 is technically impossible (the statement is asymptotic), and even qualitative support is hard to obtain because of the high complexity of ML detection. Nevertheless, we may obtain some evidence through Monte Carlo simulation. Fig. 1 shows the probability of success of ML detection for n = 20 as k, m, SNR, and MAR are varied, with each point representing at least 500 independent trials. Each subpanel gives simulation results for k ∈ {1, 2, ..., 5} and m ∈ {1, 2, ..., 40} for one (SNR, MAR) pair. Signals with MAR < 1 are created by having one small nonzero component and k − 1 equal, larger nonzero components. Overlaid on the color-intensity plots is a black curve representing (7).

[Figure 1: Simulated success probability of ML detection for n = 20 and many values of k, m, SNR, and MAR. Each subfigure corresponds to one (SNR, MAR) pair, given in its heading, with k ∈ {1, ..., 5} on the horizontal axis and m ∈ {1, ..., 40} on the vertical axis; each point represents at least 500 independent trials, and the black curve represents (7).]

Taking any one column of one subpanel from bottom to top shows that as m is increased, there is a transition from ML failing to ML succeeding. One can see that (7) follows the failure-success transition qualitatively. In particular, the empirical dependence on SNR and MAR approximately follows (7). Empirically, for the (small) value of n = 20, it seems that with MAR · SNR held fixed, sparsity recovery becomes easier as SNR increases (and MAR decreases).

4 Sufficient Condition with Maximum Correlation Detection

Consider the following simple estimator. As before, let a_j be the j-th column of the random matrix A. Define the maximum correlation (MC) estimate as

    Î_MC = { j : |a_j' y| is one of the k largest values of |a_i' y|, i = 1, ..., n }.    (9)

This algorithm simply correlates the observed signal y with all the frame vectors a_j and selects the indices j with the highest correlation. It is significantly simpler than both lasso and matching pursuit and is not meant to be proposed as a competitive alternative. Rather, the MC method is introduced and analyzed to illustrate that a trivial method can obtain the optimal scaling with respect to n and k.
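The MC estimate (9) amounts to one matrix-vector product followed by a top-k selection. A minimal sketch (the function name is ours):

```python
import numpy as np

def mc_support(A, y, k):
    """Maximum correlation estimate (9): indices of the k largest |a_j' y|."""
    correlations = np.abs(A.T @ y)                  # |a_j' y| for j = 1, ..., n
    top_k = np.argpartition(correlations, -k)[-k:]  # k largest, in arbitrary order
    return set(top_k.tolist())
```

In contrast with the exhaustive ML search sketched earlier, this costs O(mn) operations plus the selection of the k largest correlations, which is why it can be simulated at much larger problem sizes.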
The MC algorithm was previously studied in [23], which proves a sufficient condition for MC to detect the sparsity pattern when there is no noise. The following theorem tightens and generalizes that result to the noisy case.

Theorem 2. Let k = k(n) and m = m(n) vary with n such that lim_{n→∞} k = ∞, lim sup_{n→∞} k/n ≤ 1/2, and

    m > ((8 + δ)(1 + SNR) / (MAR · SNR)) k log(n − k)    (10)

for some δ > 0. Then the maximum correlation estimator asymptotically detects the sparsity pattern, i.e.,

    lim_{n→∞} Pr( Î_MC = I_true ) = 1.

Proof sketch: Combining (10) with tail distributions of chi-squared random variables, it is shown that the minimum value of the correlation |a_j' y|² over j ∈ I_true is greater than the maximum correlation over j ∉ I_true. A complete proof appears in [22].

Remarks

1. Comparing (7) and (10), we see that for a fixed SNR and minimum-to-average ratio, the simple MC estimator needs only a constant factor more measurements than the optimal ML estimator. In particular, the results show that the scaling of the minimum number of measurements, m = Θ(k log(n − k)), is both necessary and sufficient. Moreover, achieving the optimal scaling not only does not require ML estimation, it does not even require lasso or matching pursuit: it can be achieved with a remarkably simple method such as maximum correlation. There is, of course, a difference in the constant factors of the expressions (7) and (10). Specifically, the MC method requires a factor 4(1 + SNR) more measurements than ML detection. In particular, for low SNRs (i.e., SNR ≪ 1), the factor reduces to 4.

2. For high SNRs, the gap between the MC estimator and the ML estimator can be large. In particular, the lower bound on the number of measurements required by ML decreases to k − 1 as SNR → ∞ (of course, at least k + 1 measurements are necessary). In contrast, with the MC estimator, increasing the SNR has diminishing returns: as SNR → ∞, the bound on the number of measurements in (10) approaches

    m > (8 / MAR) k log(n − k).    (11)

Thus, even with SNR → ∞, the minimum number of measurements is not improved from m = O(k log(n − k)). This diminishing return for improved SNR exhibited by the MC method is also a problem for more sophisticated methods such as lasso. For example, the analysis of [14] shows that when SNR = Θ(n) (so SNR → ∞) and MAR is bounded strictly away from zero, lasso requires

    m > 2k log(n − k) + k + 1    (12)

for reliable recovery. Therefore, like the MC method, lasso does not achieve a scaling better than m = O(k log(n − k)), even at infinite SNR.

3. There is certainly a gap between MC and lasso. Comparing (11) and (12), we see that, at high SNR, the simple MC method requires a factor of at most 4/MAR more measurements than lasso. This factor is largest when MAR is small, which occurs when some nonzero components are relatively small. Thus, in the high-SNR regime, the main benefit of lasso is not that it achieves the optimal scaling with respect to k and n (which can be achieved with the simpler MC), but rather that lasso is able to detect small coefficients, even when they are much below the average power.

4. The high-SNR limit (11) matches the sufficient condition in [23] for the noise-free case, except that the constant in (11) is tighter.

Numerical validation. MC sparsity pattern detection is extremely simple and can thus be simulated easily for large problem sizes. Fig. 2 reports the results of a large number of Monte Carlo simulations of the MC method with n = 100. The threshold predicted by (10) matches well the parameter combinations where the probability of success drops below about 0.995.

[Figure 2: Simulated success probability of MC detection for n = 100 and many values of k, m, SNR, and MAR. Each subfigure gives simulation results for k ∈ {1, ..., 20} and m ∈ {25, 50, ..., 1000} for one (SNR, MAR) pair, given in its heading; SNR = 1, 10, 100 for the three columns and MAR = 1, 0.5, 0.2 for the three rows. Each point represents 1000 independent trials. Overlaid on the color-intensity plots (with the scale as in Fig. 1) is a black curve representing (10).]
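An experiment in the spirit of Fig. 2 can be reproduced with the sketches given earlier; the code below estimates the empirical success probability of MC detection on a small illustrative grid and compares it with the Theorem 2 threshold (10) (with δ = 0). The grid values, trial count, and helper names (generate_problem, mc_support) are our own assumptions carried over from the earlier sketches, not the exact settings used for the figure.

```python
import numpy as np

def mc_success_rate(n, k, m, snr, mar, trials=200, seed=0):
    """Fraction of trials in which the MC estimate recovers the exact support."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        A, x, y, support = generate_problem(n, k, m, snr, mar, rng)
        hits += (mc_support(A, y, k) == support)
    return hits / trials

# Compare the empirical success rate with the Theorem 2 threshold (10), delta = 0.
n, k, snr, mar = 100, 5, 10.0, 1.0
m_thresh = 8 * (1 + snr) / (mar * snr) * k * np.log(n - k)
for m in (50, 100, 200, int(np.ceil(m_thresh))):
    rate = mc_success_rate(n, k, m, snr, mar)
    print(f"m = {m:4d}:  empirical success {rate:.2f}   (threshold (10) ~ {m_thresh:.0f})")
```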
5 Conclusions

We have considered the problem of detecting the sparsity pattern of a sparse vector from noisy random linear measurements. Our main conclusions are:

· Necessary and sufficient scaling with respect to n and k. For a given SNR and minimum-to-average ratio, the scaling m = O(k log(n − k)) of the number of measurements is both necessary and sufficient for asymptotically reliable sparsity pattern detection. This scaling is significantly worse than predicted by previous information-theoretic bounds.

· Scaling optimality of a trivial method. The optimal scaling with respect to k and n can be achieved with a trivial maximum correlation (MC) method. In particular, both lasso and OMP, while likely to do better, are not necessary to achieve this scaling.

· Dependence on SNR. While the threshold numbers of measurements for successful ML and MC sparsity recovery have the same dependence on n and k, the dependence on SNR differs significantly. Specifically, the MC method requires a factor of up to 4(1 + SNR) more measurements than ML. Moreover, as SNR → ∞, the number of measurements required by ML may be as low as m = k + 1. In contrast, even letting SNR → ∞, the maximum correlation method still requires a scaling m = O(k log(n − k)).

· Lasso and dependence on MAR. MC can also be compared to lasso, at least in the high-SNR regime. There is a potential gap between MC and lasso, but the gap is smaller than the gap to ML. Specifically, in the high-SNR regime, MC requires at most a factor of 4/MAR more measurements than lasso, where MAR is the minimum-to-average ratio defined in (5). Both lasso and MC scale as m = O(k log(n − k)). Thus, the benefit of lasso is not in its scaling with respect to the problem dimensions, but rather in its ability to detect the sparsity pattern even in the presence of relatively small nonzero coefficients (i.e., low MAR).

While our results settle the question of the optimal scaling of the number of measurements m in terms of k and n, there is clearly a gap between the necessary and sufficient conditions in terms of their scaling with the SNR. We have seen that full ML estimation could potentially have a scaling in SNR as small as m = O((k/SNR) log(n − k)) + k − 1. An open question is whether there is any practical algorithm that can achieve a similar scaling. (A numerical comparison of the thresholds (7), (10), and (12) as the SNR grows is sketched at the end of this section.)

A second open issue is to determine conditions for partial sparsity recovery. The above results define conditions for recovering all the positions in the sparsity pattern. However, in many practical applications, obtaining some large fraction of these positions would be sufficient. Neither the limits of partial sparsity recovery nor the performance of practical algorithms are completely understood, though some results have been reported in [19–21, 24].
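As a summary illustration of the SNR dependence discussed above, and under the same illustrative conventions as the earlier sketches (δ = 0, natural logarithm, parameters chosen by us), the following snippet tabulates the ML necessary threshold (7), the MC sufficient threshold (10), and the high-SNR lasso threshold (12) side by side. Note that (12) is only meaningful in the high-SNR regime.

```python
import numpy as np

n, k, mar = 1000, 10, 1.0
print("   SNR |  ML nec. (7) | MC suff. (10) | lasso (12)")
for snr in (0.5, 1, 10, 100, 1e4):
    ml = 2 / (mar * snr) * k * np.log(n - k) + k - 1        # right-hand side of (7)
    mc = 8 * (1 + snr) / (mar * snr) * k * np.log(n - k)    # right-hand side of (10)
    lasso = 2 * k * np.log(n - k) + k + 1                   # right-hand side of (12)
    print(f"{snr:6g} | {ml:12.0f} | {mc:13.0f} | {lasso:10.0f}")
```

As SNR grows, the (7) column falls toward k − 1 while the (10) column levels off near (8/MAR) k log(n − k), which is the diminishing-returns behavior described in the conclusions.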
References

[1] M. Lewicki. Efficient coding of natural sounds. Nature Neuroscience, 5:356–363, 2002.
[2] B. A. Olshausen and D. J. Field. Sparse coding of sensory inputs. Curr. Op. in Neurobiology, 14:481–487, 2004.
[3] C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen. Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 2008. In press.
[4] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM J. Computing, 24(2):227–234, April 1995.
[5] S. G. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process., 41(12):3397–3415, Dec. 1993.
[6] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comp., 20(1):33–61, 1999.
[7] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal Stat. Soc., Ser. B, 58(1):267–288, 1996.
[8] D. L. Donoho, M. Elad, and V. N. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory, 52(1):6–18, Jan. 2006.
[9] J. A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231–2242, Oct. 2004.
[10] J. A. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory, 52(3):1030–1051, March 2006.
[11] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, Feb. 2006.
[12] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, April 2006.
[13] E. J. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, Dec. 2006.
[14] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. arXiv:0605.740v1 [math.ST], May 2006.
[15] S. Sarvotham, D. Baron, and R. G. Baraniuk. Measurements vs. bits: Compressed sensing meets information theory. In Proc. 44th Ann. Allerton Conf. on Commun., Control and Comp., Monticello, IL, Sept. 2006.
[16] A. K. Fletcher, S. Rangan, and V. K. Goyal. Rate-distortion bounds for sparse approximation. In IEEE Statist. Sig. Process. Workshop, pages 254–258, Madison, WI, Aug. 2007.
[17] M. J. Wainwright. Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. Tech. Rep. 725, Univ. of California, Berkeley, Dept. of Statistics, Jan. 2007.
[18] V. K. Goyal, A. K. Fletcher, and S. Rangan. Compressive sampling and lossy compression. IEEE Sig. Process. Mag., 25(2):48–56, March 2008.
[19] G. Reeves. Sparse signal sampling using noisy linear projections. Tech. Rep. UCB/EECS-2008-3, Univ. of California, Berkeley, Dept. of Elec. Eng. and Comp. Sci., Jan. 2008.
[20] M. Akçakaya and V. Tarokh. Shannon theoretic limits on noisy compressive sampling. arXiv:0711.0366v1 [cs.IT], Nov. 2007.
[21] M. Akçakaya and V. Tarokh. Noisy compressive sampling limits in linear and sublinear regimes. In Proc. Conf. on Inform. Sci. & Sys., Princeton, NJ, March 2008.
[22] A. K. Fletcher, S. Rangan, and V. K. Goyal. Necessary and sufficient conditions on sparsity pattern recovery. arXiv:0804.1839v1 [cs.IT], April 2008.
[23] H. Rauhut, K. Schnass, and P. Vandergheynst. Compressed sensing and redundant dictionaries. IEEE Trans. Inform. Theory, 54(5):2210–2219, May 2008.
[24] S. Aeron, M. Zhao, and V. Saligrama. On sensing capacity of sensor networks for the class of linear observation, fixed SNR models. arXiv:0704.3434v3 [cs.IT], June 2007.