Random Projections for Manifold Learning

Chinmay Hegde
ECE Department, Rice University
ch3@rice.edu

Michael B. Wakin
EECS Department, University of Michigan
wakin@eecs.umich.edu

Richard G. Baraniuk
ECE Department, Rice University
richb@rice.edu

Abstract

We propose a novel method for linear dimensionality reduction of manifold-modeled data. First, we show that with a small number $M$ of random projections of sample points in $\mathbb{R}^N$ belonging to an unknown $K$-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in $K$ and logarithmic in $N$, meaning that $K < M \ll N$. To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage, and transmission costs.

1 Introduction

Recently, we have witnessed a tremendous increase in the sizes of data sets generated and processed by acquisition and computing systems. As the volume of the data increases, memory and processing requirements need to increase at the same rapid pace, and this is often prohibitively expensive. Consequently, there has been considerable interest in the task of effectively modeling high-dimensional observed data and information; such models must capture the structure of the information content in a concise manner.

A powerful data model for many applications is the geometric notion of a low-dimensional manifold. Data that possess merely $K$ "intrinsic" degrees of freedom can be assumed to lie on a $K$-dimensional manifold in the high-dimensional ambient space. Once the manifold model is identified, any point on it can be represented using essentially $K$ pieces of information. Thus, dimensionality reduction algorithms in this vein attempt to learn the structure of the manifold given high-dimensional training data.

While most conventional manifold learning algorithms are adaptive (i.e., data dependent) and nonlinear (i.e., they involve the construction of a nonlinear mapping), a linear, nonadaptive manifold dimensionality reduction technique has recently been introduced that employs random projections [1]. Consider a $K$-dimensional manifold $\mathcal{M}$ in the ambient space $\mathbb{R}^N$ and its projection onto a random subspace of dimension $M = CK\log(N)$; note that $K < M \ll N$. The result of [1] is that the pairwise metric structure of sample points from $\mathcal{M}$ is preserved with high accuracy under projection from $\mathbb{R}^N$ to $\mathbb{R}^M$.

[Figure 1: Manifold learning using random projections. (a) Input data consisting of 1000 images of a shifted disk, each of size $N = 64 \times 64 = 4096$. (b) True $\theta_1$ and $\theta_2$ values of the sampled data. (c,d) Isomap embedding learned from (c) the original data in $\mathbb{R}^N$ and (d) a randomly projected version of the data in $\mathbb{R}^M$ with $M = 15$.]

This result has far-reaching implications. Prototypical devices that directly and inexpensively acquire random projections of certain types of data (signals, images, etc.) have been developed [2, 3]; these devices are hardware realizations of the mathematical tools developed in the emerging area of Compressed Sensing (CS) [4, 5].
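To make the metric-preservation result of [1] concrete, the following is a minimal numerical sketch (our illustration, not the authors' code): it samples points from a one-dimensional manifold in $\mathbb{R}^N$, projects them to $\mathbb{R}^M$ with a random Gaussian matrix, and compares pairwise distances before and after. The particular curve and the choice $M = 15$ are arbitrary stand-ins for illustration.

```python
# A minimal sketch (illustration only, not the authors' code) of the result
# of [1]: pairwise distances between samples of a low-dimensional manifold
# are approximately preserved under a random projection R^N -> R^M.
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Sample n points from a K = 1 manifold (a smooth curve) embedded in R^N.
N, n = 4096, 200
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.stack([np.cos((k + 1) * t) / (k + 1) for k in range(N)], axis=1)

# Random Gaussian projection with i.i.d. N(0, 1/M) entries, so squared
# lengths are preserved in expectation (a stand-in for the orthoprojector).
M = 15
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
Y = X @ Phi.T  # n x M projected points

# Ratios of projected to original pairwise distances cluster around 1,
# more tightly as M grows.
ratios = pdist(Y) / pdist(X)
print(f"min {ratios.min():.2f}, median {np.median(ratios):.2f}, "
      f"max {ratios.max():.2f}")
```

Increasing M in this sketch tightens the spread of the distance ratios, mirroring the $M = CK\log(N)$ scaling discussed above.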
The theory of [1] suggests that a wide variety of signal processing tasks can be performed directly on the random projections acquired by these devices, thus saving valuable sensing, storage, and processing costs.

The advantages of random projections extend even to cases where the original data are available in the ambient space $\mathbb{R}^N$. For example, consider a wireless network of cameras observing a scene. To perform joint image analysis, the following steps might be executed:

1. Collate: Each camera node transmits its respective captured image (of size $N$) to a central processing unit.
2. Preprocess: The central processor estimates the intrinsic dimension $K$ of the underlying image manifold.
3. Learn: The central processor performs a nonlinear embedding of the data points (for instance, using Isomap [6]) into a $K$-dimensional Euclidean space, using the estimate of $K$ from the previous step.

In situations where $N$ is large and communication bandwidth is limited, the dominating costs will be in the first transmission/collation step. On the one hand, to reduce the communication needs one may perform nonlinear image compression (such as JPEG) at each node before transmitting to the central processor. But this requires a good deal of processing power at each sensor, and the compression would have to be undone during the learning step, thus adding to the overall computational cost. On the other hand, every camera could encode its image by computing (either directly or indirectly) a small number of random projections to communicate to the central processor. These random projections are obtained by linear operations on the data and thus are cheaply computed. Clearly, in many situations it will be less expensive to store, transmit, and process such randomly projected versions of the sensed images. The question now becomes: how much information about the manifold is conveyed by these random projections, and is there any advantage to analyzing such measurements from a manifold learning perspective?

In this paper, we provide theoretical and experimental evidence that reliable learning of a $K$-dimensional manifold can be performed not just in the high-dimensional ambient space $\mathbb{R}^N$ but also in an intermediate, much lower-dimensional random projection space $\mathbb{R}^M$, where $M = CK\log(N)$. See the toy example of Figure 1.

Our contributions are as follows. First, we present a theoretical bound on the minimum number of measurements per sample point required to estimate the intrinsic dimension (ID) of the underlying manifold, up to an accuracy level comparable to that of the Grassberger-Procaccia algorithm [7, 8], a widely used geometric approach for dimensionality estimation. Second, we present a similar bound on the number of measurements $M$ required for Isomap [6], a popular manifold learning algorithm, to be "reliably" used to discover the nonlinear structure of the manifold. In both cases, $M$ is shown to be linear in $K$ and logarithmic in $N$. Third, we formulate a procedure to determine, in practical settings, this minimum value of $M$ with no a priori information about the data points. This paves the way for a weakly adaptive, linear algorithm (ML-RP) for dimensionality reduction and manifold learning.

The rest of the paper is organized as follows. Section 2 recaps the manifold learning approaches we utilize. Section 3 presents our main theoretical contributions, namely, the bounds on $M$ required to perform reliable dimensionality estimation and manifold learning from random projections. Section 4 describes a new adaptive algorithm that estimates the minimum value of $M$ required to provide a faithful representation of the data so that manifold learning can be performed. Experimental results on a variety of real and simulated data are provided in Section 5. Section 6 concludes with a discussion of potential applications and future work.
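As a concrete illustration of the camera-network pipeline above, here is a hedged sketch in which synthetic "images" driven by two hidden parameters stand in for the sensor data. The data model and all parameter values are our own assumptions, and scikit-learn's Isomap stands in for the learning step; ID estimation (step 2) is the subject of Section 2.

```python
# A sketch of steps 1-3 with random projections in place of full images:
# each "camera" transmits only y = Phi @ x, and the central processor runs
# Isomap on the M-dimensional measurements alone. The synthetic data model
# below is an assumption made purely for illustration.
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(1)
N, M, n, K = 4096, 15, 500, 2

# Synthetic stand-in for images governed by K = 2 hidden parameters
# (loosely mimicking the shifted-disk data of Figure 1).
theta = rng.uniform(0.0, 1.0, size=(n, K))       # hidden parameters
X = np.sin(theta @ rng.standard_normal((K, N)))  # n x N "images"

# Sensor side: cheap linear encoding by a random projection.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
Y = X @ Phi.T                                    # n x M: all that is sent

# Central processor: nonlinear embedding into K dimensions (step 3).
emb = Isomap(n_neighbors=10, n_components=K).fit_transform(Y)
print(emb.shape)                                 # (n, K) learned coordinates
```

The point of the sketch is that the sensor-side work is a single matrix-vector product per image, while all nonlinear processing happens once, centrally, on $M$-dimensional data.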
2 Background

An important input parameter for all manifold learning algorithms is the intrinsic dimension (ID) of a point cloud. We aim to embed the data points in as low-dimensional a space as possible in order to avoid the curse of dimensionality. However, if the embedding dimension is too small, then distinct data points might be collapsed onto the same embedded point. Hence a natural question to ask is: given a point cloud in $N$-dimensional Euclidean space, what is the dimension of the manifold that best captures the structure of this data set? This problem has received considerable attention in the literature and remains an active area of research [7, 9, 10].

For the purposes of this paper, we focus our attention on the Grassberger-Procaccia (GP) algorithm [7] for ID estimation. This is a widely used geometric technique that takes as input the set of pairwise distances between sample points. It then computes the scale-dependent correlation dimension of the data, defined as follows.

Definition 2.1 Suppose $X = (x_1, x_2, \ldots, x_n)$ is a finite dataset of underlying dimension $K$. Define
$$C_n(r) = \frac{1}{n(n-1)} \sum_{i \neq j} I\big(\|x_i - x_j\| < r\big).$$
Then the scale-dependent correlation dimension of $X$ between scales $r_1$ and $r_2$ is defined as
$$\widehat{D}_{\mathrm{corr}}(r_1, r_2) = \frac{\log C_n(r_1) - \log C_n(r_2)}{\log r_1 - \log r_2}.$$

[...]

[Figure caption (fragment): "... for $M > 50$. (right) ML-RP on the hand rotation database ($N = 3840$). For $M > 60$, the Isomap variance is indistinguishable from the variance obtained in the ambient space."]

We plot the intrinsic dimension of the dataset against the minimum number of projections $M$ required such that the ID estimate $\widehat{K}$ computed from the projected data is within 10% of the conventional GP estimate (this is equivalent to choosing an accuracy level of 0.1 in Theorem 3.1). We observe the predicted linearity (Theorem 3.1) in the variation of $M$ vs. $K$.

Finally, we turn our attention to two common datasets (Figure 3) found in the literature on dimension estimation: the face database [6] (available at http://isomap.stanford.edu) and the hand rotation database [17] (available at http://vasc.ri.cmu.edu//idb/html/motion/hand/index.html; note that we use a version subsampled from the database used in the literature, both in terms of image resolution and sampling of the manifold). The face database is a collection of 698 artificial snapshots of a face ($N = 64 \times 64 = 4096$) varying under 3 degrees of freedom: 2 angles for pose and 1 for lighting. The signals are therefore believed to reside on a 3D manifold in an ambient space of dimension 4096. The hand rotation database is a set of 90 images ($N = 64 \times 60 = 3840$) of rotations of a hand holding an object. Although the image appearance manifold is ostensibly one-dimensional, estimators in the literature always overestimate its ID [11].

Random projections of each sample in the databases were obtained by computing the inner products of the image samples with an increasing number of rows of the random orthoprojector $\Phi$. We note that in the case of the face database, for $M > 60$, the Isomap variance on the randomly projected points closely approximates the variance obtained with the full image data. This convergence of the variance to the best possible value is observed even more sharply in the hand rotation database, in which the two variance curves are indistinguishable for $M > 60$. These results are particularly encouraging and demonstrate the validity of the claims made in Section 3.
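Definition 2.1 translates directly into a few lines of code. The sketch below is our illustration, not the authors' implementation: the quantile-based choice of the scales $r_1 < r_2$ is an ad hoc assumption, and in practice the scales should lie in the linear regime of the $\log C_n(r)$ vs. $\log r$ curve.

```python
# A sketch of the GP scale-dependent correlation dimension (Definition 2.1)
# applied to randomly projected data. Illustration only; the quantile-based
# scale selection is an ad hoc assumption, not the authors' procedure.
import numpy as np
from scipy.spatial.distance import pdist

def gp_dimension(X, q1=0.1, q2=0.3):
    """Slope of log C_n(r) vs. log r between two quantiles of the
    pairwise-distance distribution (the scales r1 < r2)."""
    d = pdist(X)                     # all pairwise distances ||x_i - x_j||
    r1, r2 = np.quantile(d, [q1, q2])
    C = lambda r: np.mean(d < r)     # C_n(r) from Definition 2.1
    return (np.log(C(r1)) - np.log(C(r2))) / (np.log(r1) - np.log(r2))

# Example: a K = 2 sphere embedded in R^100, then M = 15 random projections.
rng = np.random.default_rng(2)
Z = rng.standard_normal((1000, 3))
S = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # uniform on the 2-sphere
X = S @ rng.standard_normal((3, 100))             # embed in R^100
Phi = rng.standard_normal((15, 100)) / np.sqrt(15)
print(gp_dimension(X @ Phi.T))                    # close to 2
```

Because the estimate is a slope on a log-log plot, it is invariant to any global rescaling of distances, which is why it behaves well under the (approximately isometric) random projection.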
6 Discussion

Our main theoretical contributions in this paper are explicit values for the lower bounds on the minimum number of random projections required to perform ID estimation and subsequent manifold learning using Isomap, with high guaranteed accuracy levels. We also developed an empirical greedy algorithm (ML-RP) for practical situations. Experiments on simple cases, such as uniformly sampled hyperspheres of varying dimension, and on more complex situations, such as the image databases displayed in Figure 3, provide sufficient evidence of the nature of the bounds described above.

The method of random projections is thus a powerful tool for ensuring the stable embedding of low-dimensional manifolds into an intermediate space of reasonable size. The motivation for developing results and algorithms that involve random measurements of high-dimensional data is significant, particularly due to the increasing attention that Compressive Sensing (CS) has received recently. It is now possible to think of settings involving a huge number of low-power devices that inexpensively capture, store, and transmit a very small number of measurements of high-dimensional data. ML-RP is applicable in all such situations. In situations where the bottleneck lies in the transmission of the data to the central processing node, ML-RP provides a simple solution to the manifold learning problem and ensures that effective manifold learning can be performed with a minimal amount of transmitted information. The metric structure of the projected dataset upon termination of ML-RP closely resembles that of the original dataset with high probability; thus, ML-RP can be viewed as a novel adaptive algorithm for finding an efficient, reduced representation of data of very large dimension.
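The precise specification of ML-RP appears in Section 4; the following is only a plausible sketch of the greedy idea summarized above, with an invented stopping rule that is not the paper's criterion: add random projections one at a time and stop once the quantity estimated from the projected data stabilizes. The same loop applies with the Isomap residual variance in place of the ID estimate.

```python
# A hedged sketch of the greedy ML-RP idea: grow the projection operator one
# random row at a time until the estimate computed from the projected data
# stabilizes. The stopping rule below is our own invention for illustration;
# Section 4 of the paper specifies the actual ML-RP criterion.
import numpy as np

def ml_rp_sketch(X, estimate, max_M=200, tol=0.1, rng=None):
    """estimate: a function of the projected data, e.g. the gp_dimension
    sketch above (or an Isomap residual-variance computation)."""
    rng = rng if rng is not None else np.random.default_rng()
    N = X.shape[1]
    Phi = np.empty((0, N))
    prev = None
    while Phi.shape[0] < max_M:
        # Greedily append one more random row. (A global scaling of Phi does
        # not affect a scale-invariant estimate like correlation dimension.)
        Phi = np.vstack([Phi, rng.standard_normal((1, N))])
        est = estimate(X @ Phi.T)
        if prev is not None and abs(est - prev) < tol:
            return Phi, est  # successive estimates agree: stop
        prev = est
    return Phi, prev

# Usage with the gp_dimension sketch: Phi, K_hat = ml_rp_sketch(X, gp_dimension)
```

In practice one would demand agreement over several consecutive increments rather than a single pair, and increment $M$ in larger steps for efficiency; the single-step rule here can terminate early by coincidence.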
References

[1] R. G. Baraniuk and M. B. Wakin. Random projections of smooth manifolds. 2007. To appear in Foundations of Computational Mathematics.
[2] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk. An architecture for compressive imaging. In IEEE International Conference on Image Processing (ICIP), pages 1273–1276, Oct. 2006.
[3] S. Kirolos, J. N. Laska, M. B. Wakin, M. F. Duarte, D. Baron, T. Ragheb, Y. Massoud, and R. G. Baraniuk. Analog-to-information conversion via random demodulation. In Proc. IEEE Dallas Circuits and Systems Workshop (DCAS), 2006.
[4] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Info. Theory, 52(2):489–509, Feb. 2006.
[5] D. L. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4):1289–1306, Apr. 2006.
[6] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–2323, 2000.
[7] P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. Physica D: Nonlinear Phenomena, 9:189–208, 1983.
[8] J. Theiler. Statistical precision of dimension estimators. Physical Review A, 41(6):3038–3051, 1990.
[9] F. Camastra. Data dimensionality estimation methods: A survey. Pattern Recognition, 36:2945–2954, 2003.
[10] J. A. Costa and A. O. Hero. Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Trans. Signal Processing, 52(8):2210–2221, Aug. 2004.
[11] E. Levina and P. J. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in NIPS, volume 17. MIT Press, 2005.
[12] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323–2326, 2000.
[13] D. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc. National Academy of Sciences, 100(10):5591–5596, 2003.
[14] S. Dasgupta and A. Gupta. An elementary proof of the JL lemma. Technical Report TR-99-006, University of California, Berkeley, 1999.
[15] C. Hegde, M. B. Wakin, and R. G. Baraniuk. Random projections for manifold learning: Proofs and analysis. Technical Report TREE 0710, Rice University, 2007.
[16] M. Bernstein, V. de Silva, J. Langford, and J. Tenenbaum. Graph approximations to geodesics on embedded manifolds. Technical report, Stanford University, 2000.
[17] B. Kégl. Intrinsic dimension estimation using packing numbers. In Advances in NIPS, volume 14. MIT Press, 2002.