Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization

Haiping Lu (hplu@ieee.org), Konstantinos N. Plataniotis (kostas@comm.toronto.edu)
Department of Electrical and Computer Engineering, University of Toronto, ON, M5S 3G4, Canada

Anastasios N. Venetsanopoulos (tasvenet@ryerson.ca)
Ryerson University, Toronto, ON, M5B 2K3, Canada

(Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).)

Abstract

Tensorial data are frequently encountered in various machine learning tasks today, and dimensionality reduction is one of their most important applications. This paper extends classical principal component analysis (PCA) to its multilinear version by proposing a novel unsupervised dimensionality reduction algorithm for tensorial data, named uncorrelated multilinear PCA (UMPCA). UMPCA seeks a tensor-to-vector projection that captures most of the variation in the original tensorial input while producing uncorrelated features through successive variance maximization. We evaluate UMPCA on a second-order tensorial problem, face recognition, and the experimental results show its superiority, especially in low-dimensional spaces, through comparison with three other PCA-based algorithms.

1. Introduction

Various machine learning problems take multidimensional data as input, which are formally called tensors. The elements of a tensor are addressed by several indices, and the number of indices used in the description defines the order of the tensor object, with each index defining one "mode" (Lathauwer et al., 2000). Many real-world data are naturally tensor objects. For example, matrix data such as gray-level images are second-order tensors, while gray-scale video sequences and 3-D objects are third-order tensors. In addition, streaming data and mining data are frequently organized as third-order tensors. For instance, data in environmental sensor monitoring are often organized in three modes of time, location and type, and data in web graph mining are commonly organized in three modes of source, destination and text. Other applications involving tensorial data include data center monitoring, social network analysis, network forensics and face recognition (Faloutsos et al., 2007).

In these practical applications, tensor objects are often specified in a high-dimensional tensor space, leading to the so-called curse of dimensionality. Nonetheless, the class of tensor objects in most applications is highly constrained to a subspace, a manifold of intrinsically low dimension (Shakhnarovich & Moghaddam, 2004), and feature extraction or dimensionality reduction is frequently employed to transform a high-dimensional data set into a low-dimensional space of equivalent representation while retaining most of the underlying structure (Law & Jain, 2006). PCA is a classical linear method for unsupervised dimensionality reduction that transforms a data set consisting of a large number of interrelated variables into a new set of uncorrelated variables, while retaining as much as possible of the variation present in the original data set (Jolliffe, 2002). Applying PCA to tensor objects requires their reshaping (vectorization) into vectors in a very high-dimensional space, which not only results in high computational and memory demands but also breaks the natural structure and correlation in the original data (Ye, 2005; Ye et al., 2004; Lu et al., 2008a). It is believed by many researchers that potentially more compact or useful representations can be obtained from the original form, and PCA extensions operating directly on tensor objects rather than their vectorized versions have emerged recently (Ye et al., 2004; Lu et al., 2008a; Xu et al., 2005).

In (Shashua & Levin, 2001), the tensor rank-one decomposition (TROD) is used to represent a class of images based on variance maximization and (greedy) successive residue calculation. A two-dimensional PCA (2DPCA) is proposed in (Yang et al., 2004) that constructs an image covariance matrix using image matrices as inputs. However, the linear transformation is applied only to the right side of the image matrices, so the image data are projected in one mode only, resulting in poor dimensionality reduction. A more general algorithm, named generalized low rank approximation of matrices (GLRAM), was introduced in (Ye, 2005); it applies linear transforms to both the left and right sides of the input image matrices and achieves better dimensionality reduction than 2DPCA. GLRAM is developed from the perspective of approximation, while the generalized PCA (GPCA) proposed in (Ye et al., 2004) takes the view of variance maximization, as an extension of PCA. Later, the concurrent subspaces analysis (CSA) was formulated in (Xu et al., 2005) for optimal reconstruction of general tensor objects, which can be considered a generalization of GLRAM, and the multilinear PCA (MPCA) introduced in (Lu et al., 2008a) targets variance maximization for general tensor objects in the extension of PCA to the multilinear case, which can be considered a further generalization of GPCA.

However, none of the existing multilinear extensions of PCA mentioned above takes an important property of PCA into account: PCA derives uncorrelated features, which contain minimum redundancy and ensure independence among features. Instead, most of them produce orthogonal bases in each mode. Although uncorrelated features imply orthogonal projection bases in PCA, this is not necessarily the case for its multilinear extension. With this motivation, this paper investigates a multilinear extension of PCA that can produce uncorrelated features. We propose a novel uncorrelated multilinear PCA (UMPCA) for unsupervised tensor object dimensionality reduction (feature extraction). UMPCA is based on the tensor-to-vector projection (TVP) (Lu et al., 2008b) and follows the classical PCA derivation of successive variance maximization (Jolliffe, 2002). Thus, a number of elementary multilinear projections (EMPs) are solved to maximize the captured variance under the zero-correlation constraint. The solution is iterative in nature, as are many other multilinear algorithms (Xu et al., 2005; Ye et al., 2004; Shashua & Levin, 2001).

The rest of this paper is organized as follows. Section 2 reviews basic multilinear notations and operations, as well as the concept of the tensor-to-vector projection. In Sec. 3, the problem of UMPCA is formulated and the solution is derived as a sequential iterative process. Next, Sec. 4 evaluates the effectiveness of UMPCA in the popular face recognition task through comparison with PCA, MPCA and TROD. Finally, conclusions are drawn in Sec. 5.
2. Multilinear Fundamentals

This section introduces the multilinear notations, operations and projections needed in the presentation of UMPCA; for further study of multilinear algebra, (Lathauwer et al., 2000) is a good reference. The important notations used in this paper are listed in Table 1 for handy reference.

Table 1. Notations.

  Notation                                Description
  X_m, m = 1, ..., M                      the mth input tensor sample
  u^(n), n = 1, ..., N                    the n-mode projection vector
  {u_p^(n)T, n = 1, ..., N}               the pth EMP, where p is the index of the EMP
  y_m                                     the projection of X_m on the TVP {u_p^(n)T, n = 1, ..., N}_{p=1}^P
  y_m(p) = y_mp = g_p(m)                  the projection of X_m on the pth EMP {u_p^(n)T, n = 1, ..., N}
  g_p                                     the pth coordinate vector

2.1. Notations and basic multilinear operations

Due to the multilinear nature of tensor objects, new notations have been introduced in the literature for mathematical analysis. Following the notations in (Lathauwer et al., 2000), we denote vectors by lowercase boldface letters, e.g., x; matrices by uppercase boldface letters, e.g., U; and tensors by calligraphic letters, e.g., A. Their elements are denoted with indices in parentheses. Indices are denoted by lowercase letters and span the range from 1 to the uppercase letter of the index, e.g., n = 1, 2, ..., N. An Nth-order tensor A ∈ R^{I_1 × I_2 × ... × I_N} is addressed by N indices i_n, n = 1, ..., N, and each i_n addresses the n-mode of A.

The n-mode product of a tensor A by a matrix U ∈ R^{J_n × I_n}, denoted by A ×_n U, is a tensor with entries:

  (A ×_n U)(i_1, ..., i_{n-1}, j_n, i_{n+1}, ..., i_N) = Σ_{i_n} A(i_1, i_2, ..., i_N) · U(j_n, i_n).   (1)

The scalar product of two tensors A, B ∈ R^{I_1 × I_2 × ... × I_N} is defined as:

  < A, B > = Σ_{i_1} ... Σ_{i_N} A(i_1, ..., i_N) · B(i_1, ..., i_N).   (2)

A rank-one tensor A equals the outer product of N vectors: A = u^(1) ∘ u^(2) ∘ ... ∘ u^(N), which means that A(i_1, i_2, ..., i_N) = u^(1)(i_1) · u^(2)(i_2) · ... · u^(N)(i_N) for all values of the indices.

2.2. Tensor-to-vector projection

In order to extract uncorrelated features from tensorial data directly, we employ the TVP introduced in (Lu et al., 2008b), which is a more general form of the projection in (Shashua & Levin, 2001) and consists of multiple EMPs. An EMP is a multilinear projection {u^(1)T, u^(2)T, ..., u^(N)T} consisting of one unit projection vector in each mode, i.e., ||u^(n)|| = 1 for n = 1, ..., N, where ||·|| is the Euclidean norm for vectors. It projects a tensor X ∈ R^{I_1 × I_2 × ... × I_N} to a scalar y through the N unit projection vectors as y = X ×_1 u^(1)T ×_2 u^(2)T ... ×_N u^(N)T = < X, U >, where U = u^(1) ∘ u^(2) ∘ ... ∘ u^(N). An EMP can thus be viewed as a constrained linear projection, since < X, U > = < vec(X), vec(U) > = [vec(U)]^T vec(X), where vec(·) denotes the vectorized representation.

The TVP of a tensor object X to a vector y ∈ R^P consists of P EMPs {u_p^(1)T, u_p^(2)T, ..., u_p^(N)T}, p = 1, ..., P, which can be written concisely as {u_p^(n)T, n = 1, ..., N}_{p=1}^P:

  y = X ×_{n=1}^N {u_p^(n)T, n = 1, ..., N}_{p=1}^P,   (3)

where the pth component of y is obtained from the pth EMP as: y(p) = X ×_1 u_p^(1)T ×_2 u_p^(2)T ... ×_N u_p^(N)T. The TROD (Shashua & Levin, 2001) in fact seeks a TVP to maximize the captured variance; however, it takes a heuristic greedy approach. In the next section, we propose a systematic, more principled formulation that takes the correlation among features into consideration.

In addition, the TVP for dimensionality reduction here is related mathematically to the parallel factor analysis (PARAFAC) originating from psychometrics (Harshman, 1970), also known as the canonical decomposition (CANDECOMP) (Carroll & Chang, 1970), which is popular in factor analysis of multi-way data, i.e., tensors. However, they are developed from different perspectives. PARAFAC in the factorization literature aims to decompose a higher-order tensor, often formed by arranging lower-order tensors, into a number of rank-one tensorial factors explaining the formation of the data. In contrast, the objective of the TVP for dimensionality reduction here is to learn a low-dimensional (subspace) representation of a class of tensor objects from a number of samples so that the underlying (class) structure is well captured.
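To make the operations in (1)-(3) concrete, the following is a minimal NumPy sketch (not part of the original paper) of the n-mode product, the tensor scalar product, and an EMP projecting a tensor to a scalar. The function names and the random example data are our own illustrative choices, and this is only a sketch under the notation of Sec. 2, not the authors' code.

```python
import numpy as np

def mode_n_product(A, U, n):
    """n-mode product A x_n U as in Eq. (1): contracts mode n of A with the columns of U.
    A has shape (I_1, ..., I_N); U has shape (J_n, I_n); the result has I_n replaced by J_n."""
    A_n = np.moveaxis(A, n, 0)                      # bring mode n to the front
    prod = np.tensordot(U, A_n, axes=([1], [0]))    # shape (J_n, remaining modes)
    return np.moveaxis(prod, 0, n)                  # put the new mode back in position n

def scalar_product(A, B):
    """Tensor scalar product <A, B> as in Eq. (2)."""
    return np.sum(A * B)

def emp_project(X, us):
    """Project tensor X to a scalar with one unit vector per mode (one EMP),
    as in the component formula following Eq. (3)."""
    y = X
    for u in us:
        # Contract the leading mode repeatedly; remaining modes shift down after each step.
        y = np.tensordot(y, u, axes=([0], [0]))
    return float(y)

# Tiny example on a random third-order tensor (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 6))
assert np.allclose(mode_n_product(X, np.eye(5), 1), X)      # identity in the 2-mode leaves X unchanged
us = [rng.standard_normal(d) for d in X.shape]
us = [u / np.linalg.norm(u) for u in us]                    # unit projection vectors
U = np.einsum('i,j,k->ijk', *us)                            # rank-one tensor u1 o u2 o u3
assert np.isclose(emp_project(X, us), scalar_product(X, U)) # EMP projection equals <X, U>
```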
3. Uncorrelated Multilinear PCA

This section proposes UMPCA for unsupervised dimensionality reduction of tensor objects, by first formulating the UMPCA objective function and then adopting the successive variance maximization approach and the alternating projection method to solve the problem. In the presentation, for the convenience of discussion, the training samples are assumed to be zero-mean (when the training sample mean is not zero, it can be subtracted to make the training samples zero-mean), so that the constraint of uncorrelated features is the same as that of orthogonal features (Koren & Carmel, 2004).

3.1. Problem formulation

Following the standard derivation of PCA given in (Jolliffe, 2002), we consider the variance of the principal components (PCs) one by one. In the TVP setting, the pth PCs are {y_mp, m = 1, ..., M}, where M is the number of training samples and y_mp is the projection of the mth sample X_m by the pth EMP {u_p^(n)T, n = 1, ..., N}:

  y_mp = X_m ×_{n=1}^N {u_p^(n)T, n = 1, ..., N}.

Accordingly, the variance is measured by the total scatter S_Tp^y, which is defined as

  S_Tp^y = Σ_{m=1}^M (y_mp − ȳ_p)^2,   (4)

where ȳ_p = (1/M) Σ_m y_mp. In addition, let g_p denote the pth coordinate vector, with its mth component g_p(m) = y_mp.

A formal definition of the unsupervised multilinear feature extraction problem to be solved in UMPCA is then given as follows. A set of M tensor object samples {X_1, X_2, ..., X_M} is available for training. Each tensor object X_m ∈ R^{I_1 × I_2 × ... × I_N} assumes values in the tensor space R^{I_1} ⊗ R^{I_2} ⊗ ... ⊗ R^{I_N}, where I_n is the n-mode dimension of the tensor and ⊗ denotes the Kronecker product. The objective of UMPCA is to find a TVP, consisting of P EMPs {u_p^(n) ∈ R^{I_n × 1}, n = 1, ..., N}_{p=1}^P, that maps from the original tensor space R^{I_1} ⊗ R^{I_2} ⊗ ... ⊗ R^{I_N} into a vector subspace R^P (with P < Π_{n=1}^N I_n):

  y_m = X_m ×_{n=1}^N {u_p^(n)T, n = 1, ..., N}_{p=1}^P, m = 1, ..., M,   (5)

such that the variance of the projected samples, measured by S_Tp^y, is maximized in each EMP direction, subject to the constraint that the P coordinate vectors {g_p ∈ R^M, p = 1, ..., P} are uncorrelated.

In other words, the UMPCA objective is to determine a set of P EMPs {u_p^(n)T, n = 1, ..., N}_{p=1}^P that maximize the variance while producing features with zero correlation. Thus, the objective function for the pth EMP is

  {u_p^(n), n = 1, ..., N} = arg max Σ_{m=1}^M (y_mp − ȳ_p)^2,   (6)
  subject to u_p^(n)T u_p^(n) = 1 and (g_p^T g_q) / (||g_p|| ||g_q||) = δ_pq, p, q = 1, ..., P,

where δ_pq is the Kronecker delta (defined as 1 for p = q and as 0 otherwise).
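For illustration only (not from the paper), the following NumPy sketch computes the coordinate vectors g_p of a given TVP, the total scatter S_Tp^y of (4), and the pairwise feature correlations that the constraint in (6) drives to zero; the helper names (emp_project, coordinate_vectors, and so on) are our own.

```python
import numpy as np

def emp_project(X, us):
    """Project tensor X to a scalar with one unit vector per mode (one EMP)."""
    y = X
    for u in us:                       # contract the leading mode repeatedly
        y = np.tensordot(y, u, axes=([0], [0]))
    return float(y)

def coordinate_vectors(samples, tvp):
    """M x P feature matrix whose pth column is the coordinate vector g_p of Eq. (5);
    tvp is a list of P EMPs, each a list of N unit vectors."""
    return np.array([[emp_project(X, emp) for emp in tvp] for X in samples])

def total_scatter(g):
    """Total scatter S_Tp^y of one coordinate vector, Eq. (4)."""
    return float(np.sum((g - g.mean()) ** 2))

def feature_correlations(G):
    """Pairwise correlations of the columns of G; UMPCA requires this to be (close to) the identity."""
    Gc = G - G.mean(axis=0)            # zero-mean features, as assumed at the start of Sec. 3
    norms = np.linalg.norm(Gc, axis=0)
    return (Gc.T @ Gc) / np.outer(norms, norms)
```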
3.2. The UMPCA algorithm

To solve the UMPCA problem (6), we follow the successive variance maximization approach in the derivation of PCA in (Jolliffe, 2002). The P EMPs {u_p^(n)T, n = 1, ..., N}_{p=1}^P are determined one by one in P steps, with the pth step obtaining the pth EMP:

Step 1: Determine the first EMP {u_1^(n)T, n = 1, ..., N} by maximizing S_T1^y without any constraint.

Step 2: Determine the second EMP {u_2^(n)T, n = 1, ..., N} by maximizing S_T2^y subject to the constraint that g_2^T g_1 = 0.

Step p (p = 3, ..., P): Determine the pth EMP {u_p^(n)T, n = 1, ..., N} by maximizing S_Tp^y subject to the constraint that g_p^T g_q = 0 for q = 1, ..., p − 1.

In order to solve for the pth EMP {u_p^(n)T, n = 1, ..., N}, we need to determine N sets of parameters corresponding to the N projection vectors u_p^(1), u_p^(2), ..., u_p^(N), one in each mode. Unfortunately, simultaneous determination of these N sets of parameters in all modes is a complicated nonlinear problem without an existing optimal solution, except when N = 1, which is classical PCA where only one projection vector is to be solved. Therefore, we follow the approach of the alternating least squares (ALS) algorithm (Harshman, 1970) to solve this multilinear problem. For each EMP to be determined, the parameters of the projection vector u_p^(n*) for each mode n* are estimated one mode at a time, conditioned on {u_p^(n), n ≠ n*}, the parameter values of the projection vectors in the other modes.

To solve for u_p^(n*) in the n*-mode, assuming that {u_p^(n), n ≠ n*} is given, the tensor samples are first projected in these (N − 1) modes {n ≠ n*} to obtain the vectors

  ỹ_mp^(n*) = X_m ×_1 u_p^(1)T ... ×_{n*−1} u_p^(n*−1)T ×_{n*+1} u_p^(n*+1)T ... ×_N u_p^(N)T,   (7)

where ỹ_mp^(n*) ∈ R^{I_n*}. This conditional subproblem then becomes one of determining the u_p^(n*) that projects the vector samples {ỹ_mp^(n*), m = 1, ..., M} onto a line so that the variance is maximized, subject to the zero-correlation constraint; this is a PCA problem with the input samples {ỹ_mp^(n*), m = 1, ..., M}. The corresponding total scatter matrix S̃_Tp^(n*) is then defined as

  S̃_Tp^(n*) = Σ_{m=1}^M (ỹ_mp^(n*) − ȳ̃_p^(n*)) (ỹ_mp^(n*) − ȳ̃_p^(n*))^T,   (8)

where ȳ̃_p^(n*) = (1/M) Σ_m ỹ_mp^(n*). With (8), we are ready to solve for the P EMPs. For p = 1, the u_1^(n*) that maximizes the total scatter u_1^(n*)T S̃_T1^(n*) u_1^(n*) in the projected space is obtained as the unit eigenvector of S̃_T1^(n*) associated with the largest eigenvalue.
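Before turning to the constrained case p > 1, here is a brief NumPy sketch (ours, not the authors' code) of the conditional subproblem in (7)-(8) and the unconstrained p = 1 update: partial projection of each sample in all modes except n*, the mode-n* scatter matrix, and its leading eigenvector. The function names are illustrative only.

```python
import numpy as np

def partial_projection(X, us, n_star):
    """Eq. (7): project X with the given unit vectors in every mode except n_star;
    returns a vector of length I_{n_star}."""
    y = X
    for n in reversed(range(X.ndim)):   # contract the highest mode first so axis indices stay valid
        if n != n_star:
            y = np.tensordot(y, us[n], axes=([n], [0]))
    return y

def mode_scatter(Y_tilde):
    """Eq. (8): total scatter matrix of the partially projected samples (rows of Y_tilde)."""
    centered = Y_tilde - Y_tilde.mean(axis=0)
    return centered.T @ centered

def first_emp_update(samples, us, n_star):
    """One ALS update of u_1^{(n_star)}: the unit eigenvector of the scatter in Eq. (8)
    associated with the largest eigenvalue (the p = 1 case)."""
    Y_tilde = np.array([partial_projection(X, us, n_star) for X in samples])
    eigvals, eigvecs = np.linalg.eigh(mode_scatter(Y_tilde))   # symmetric scatter matrix
    return eigvecs[:, -1]                                      # eigh sorts eigenvalues ascending
```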
Next, we show how to determine the pth (p > 1) EMP given the first (p − 1) EMPs. Given the first (p − 1) EMPs, the pth EMP aims to maximize the total scatter S_Tp^y, subject to the constraint that features projected by the pth EMP are uncorrelated with those projected by the first (p − 1) EMPs. Let Ỹ_p^(n*) ∈ R^{I_n* × M} be the matrix with ỹ_mp^(n*) as its mth column, i.e., Ỹ_p^(n*) = [ỹ_1p^(n*), ỹ_2p^(n*), ..., ỹ_Mp^(n*)]; then the pth coordinate vector is g_p = Ỹ_p^(n*)T u_p^(n*). The constraint that g_p is uncorrelated with {g_q, q = 1, ..., p − 1} can be written as

  g_p^T g_q = u_p^(n*)T Ỹ_p^(n*) g_q = 0, q = 1, ..., p − 1.   (9)

Thus, u_p^(n*) (p > 1) can be determined by solving the following constrained optimization problem:

  u_p^(n*) = arg max u_p^(n*)T S̃_Tp^(n*) u_p^(n*),   (10)
  subject to u_p^(n*)T u_p^(n*) = 1 and u_p^(n*)T Ỹ_p^(n*) g_q = 0, q = 1, ..., p − 1.

The solution is given by the following theorem:

Theorem 1. The solution to the problem (10) is the (unit-length) eigenvector corresponding to the largest eigenvalue of the following eigenvalue problem:

  Ψ_p^(n*) S̃_Tp^(n*) u = λ u,   (11)

where

  Ψ_p^(n*) = I_{I_n*} − Ỹ_p^(n*) G_{p−1} Φ_p^{−1} G_{p−1}^T Ỹ_p^(n*)T,   (12)
  Φ_p = G_{p−1}^T Ỹ_p^(n*)T Ỹ_p^(n*) G_{p−1},   (13)
  G_{p−1} = [g_1 g_2 ... g_{p−1}] ∈ R^{M × (p−1)},   (14)

and I_{I_n*} is an identity matrix of size I_n* × I_n*.

Proof. First, Lagrange multipliers can be used to transform the problem (10) into the following objective that includes all the constraints:

  F(u_p^(n*)) = u_p^(n*)T S̃_Tp^(n*) u_p^(n*) − λ (u_p^(n*)T u_p^(n*) − 1) − Σ_{q=1}^{p−1} µ_q u_p^(n*)T Ỹ_p^(n*) g_q,   (15)

where λ and {µ_q, q = 1, ..., p − 1} are Lagrange multipliers. The optimization is performed by setting the partial derivative of F(u_p^(n*)) with respect to u_p^(n*) to zero:

  ∂F(u_p^(n*)) / ∂u_p^(n*) = 2 S̃_Tp^(n*) u_p^(n*) − 2λ u_p^(n*) − Σ_{q=1}^{p−1} µ_q Ỹ_p^(n*) g_q = 0.   (16)

Multiplying (16) by u_p^(n*)T results in

  2 u_p^(n*)T S̃_Tp^(n*) u_p^(n*) − 2λ u_p^(n*)T u_p^(n*) = 0,  i.e.,  λ = (u_p^(n*)T S̃_Tp^(n*) u_p^(n*)) / (u_p^(n*)T u_p^(n*)),   (17)

which indicates that λ is exactly the criterion to be maximized, with the constraint on the norm of the projection vector incorporated. Next, a set of (p − 1) equations is obtained by multiplying (16) by g_q^T Ỹ_p^(n*)T, q = 1, ..., p − 1, respectively:

  2 g_q^T Ỹ_p^(n*)T S̃_Tp^(n*) u_p^(n*) − Σ_{q'=1}^{p−1} µ_{q'} g_q^T Ỹ_p^(n*)T Ỹ_p^(n*) g_{q'} = 0.   (18)

Let

  µ_{p−1} = [µ_1 µ_2 ... µ_{p−1}]^T   (19)

and use (13) and (14); then the (p − 1) equations of (18) can be represented in a single matrix equation as follows:

  2 G_{p−1}^T Ỹ_p^(n*)T S̃_Tp^(n*) u_p^(n*) − Φ_p µ_{p−1} = 0.   (20)

Thus,

  µ_{p−1} = 2 Φ_p^{−1} G_{p−1}^T Ỹ_p^(n*)T S̃_Tp^(n*) u_p^(n*).   (21)

Since, from (14) and (19),

  Σ_{q=1}^{p−1} µ_q Ỹ_p^(n*) g_q = Ỹ_p^(n*) G_{p−1} µ_{p−1},   (22)

equation (16) can be written as

  2 S̃_Tp^(n*) u_p^(n*) − 2λ u_p^(n*) − Ỹ_p^(n*) G_{p−1} µ_{p−1} = 0,

and substituting (21) gives

  λ u_p^(n*) = [I_{I_n*} − Ỹ_p^(n*) G_{p−1} Φ_p^{−1} G_{p−1}^T Ỹ_p^(n*)T] S̃_Tp^(n*) u_p^(n*).

Using the definition in (12), an eigenvalue problem is obtained as Ψ_p^(n*) S̃_Tp^(n*) u = λ u. Since λ is the criterion to be maximized, the maximization is achieved by setting u_p^(n*) to be the (unit) eigenvector corresponding to the largest eigenvalue of (11).

By setting Ψ_1^(n*) = I_{I_n*} and from Theorem 1, we have a unified solution for UMPCA: for p = 1, ..., P, u_p^(n*) is obtained as the unit eigenvector of Ψ_p^(n*) S̃_Tp^(n*) associated with the largest eigenvalue. Algorithm 1 summarizes the UMPCA procedure developed here.

Algorithm 1 Uncorrelated Multilinear Principal Component Analysis (UMPCA)

  Input: A set of tensor samples {X_m ∈ R^{I_1 × ... × I_N}, m = 1, ..., M}, the subspace dimensionality P, and the maximum number of iterations K.
  for p = 1 to P do
    for n = 1 to N do
      Initialize u_{p(0)}^(n) = 1 / ||1||.
    end for
    for k = 1 to K do
      for n = 1 to N do
        Calculate ỹ_mp^(n) = X_m ×_1 u_{p(k)}^(1)T ... ×_{n−1} u_{p(k)}^(n−1)T ×_{n+1} u_{p(k−1)}^(n+1)T ... ×_N u_{p(k−1)}^(N)T, for m = 1, ..., M.
        Calculate Ψ_p^(n) and S̃_Tp^(n). Set u_{p(k)}^(n) to be the (unit) eigenvector of Ψ_p^(n) S̃_Tp^(n) associated with the largest eigenvalue.
      end for
    end for
    Set u_p^(n) = u_{p(K)}^(n) for all n. Calculate the coordinate vector g_p.
  end for
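Putting the pieces together, the sketch below is our own illustrative NumPy rendering of Algorithm 1 (not the authors' released code): uniform initialization as described in Sec. 3.3, K alternating iterations per EMP, the constraint matrix Ψ_p of (12)-(14), and the eigenproblem (11) in each mode. The function name umpca and all variable names are our choices, and the input is assumed to be zero-mean as stated at the start of Sec. 3.

```python
import numpy as np

def umpca(samples, P, K=10):
    """Illustrative sketch of Algorithm 1 (UMPCA).
    samples: array of shape (M, I_1, ..., I_N), assumed zero-mean over the M samples.
    Returns the P EMPs (each a list of N unit vectors) and the M x P feature matrix."""
    dims = samples.shape[1:]
    N = len(dims)
    emps, G = [], []                                    # G holds the coordinate vectors g_1, ..., g_{p-1}

    def partial_proj(X, us, n_star):                    # Eq. (7)
        y = X
        for n in reversed(range(N)):
            if n != n_star:
                y = np.tensordot(y, us[n], axes=([n], [0]))
        return y

    for p in range(P):
        us = [np.ones(d) / np.sqrt(d) for d in dims]    # uniform initialization (Sec. 3.3)
        for _ in range(K):                              # alternating projections, 1-mode to N-mode
            for n_star in range(N):
                Yt = np.array([partial_proj(X, us, n_star) for X in samples])   # M x I_{n_star}
                centered = Yt - Yt.mean(axis=0)
                St = centered.T @ centered              # scatter matrix, Eq. (8)
                if G:                                   # Psi_p from Eqs. (12)-(14); Psi_1 is the identity
                    Gm = np.column_stack(G)             # M x (p-1)
                    YG = Yt.T @ Gm                      # I_{n_star} x (p-1), i.e. Y~_p^{(n*)} G_{p-1}
                    Phi = YG.T @ YG                     # Eq. (13)
                    Psi = np.eye(dims[n_star]) - YG @ np.linalg.pinv(Phi) @ YG.T
                else:
                    Psi = np.eye(dims[n_star])
                # Eq. (11): leading eigenvector of Psi_p * S~_Tp (generally non-symmetric).
                w, V = np.linalg.eig(Psi @ St)
                u = np.real(V[:, np.argmax(np.real(w))])
                us[n_star] = u / np.linalg.norm(u)
        emps.append([u.copy() for u in us])
        G.append(np.array([partial_proj(X, us, 0) @ us[0] for X in samples]))   # g_p, Eq. (5)
    return emps, np.column_stack(G)
```

As a usage note, for the face experiments in Sec. 4 this sketch would be called on the 80 × 80 training faces after mean removal, e.g. emps, G = umpca(train_tensors, P=30, K=10), and test samples would then be projected with the same learned EMPs.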
3.3. Initialization, projection order and termination

As an iterative algorithm, UMPCA may be affected by the initialization method, the projection order and the termination conditions. Due to the space constraint, these issues, as well as convergence and computational issues, are not studied here. Instead, we adopt simple implementation strategies. First, we use uniform initialization, where all n-mode projection vectors are initialized to have unit length and the same value along the I_n dimensions of the n-mode, which is equivalent to the all-ones vector 1 with proper normalization. Second, as shown in Algorithm 1, the projection order, i.e., the mode ordering in computing the projection vectors, is from 1-mode to N-mode, as in other multilinear algorithms (Ye, 2005; Xu et al., 2005; Lu et al., 2008a). Third, the iteration is terminated by setting K, the maximum number of iterations.

4. Experimental Evaluation

The proposed UMPCA can potentially benefit various applications involving tensorial data, as mentioned in Sec. 1. Since face recognition has practical importance in security-related applications such as biometric authentication and surveillance, it has been used widely for the evaluation of unsupervised learning algorithms (Shashua & Levin, 2001; Yang et al., 2004; Xu et al., 2005; Ye, 2005). Therefore, in this section, we focus on evaluating the effectiveness of UMPCA on this popular classification task through performance comparison with existing unsupervised dimensionality reduction algorithms.

4.1. The FERET database

The Facial Recognition Technology (FERET) database (Phillips et al., 2000) is widely used for testing face recognition performance, with 14,126 images from 1,199 subjects covering a wide range of variations in viewpoint, illumination, facial expression, race and age. A subset of this database is selected for our experimental evaluation; it consists of those subjects having at least eight images with at most 15 degrees of pose variation, resulting in 721 face images from 70 subjects. Since our focus here is on the recognition of faces rather than their detection, all face images are manually cropped, aligned (using manually annotated eye coordinates) and normalized to 80 × 80 pixels, with 256 gray levels per pixel. Figure 1 shows some sample face images from two subjects in this FERET subset.

[Figure 1. Examples of face images from two subjects in the FERET subset used in our experimental evaluation.]
4.2. Face recognition performance comparison

In the evaluation, we compare the performance of UMPCA against three PCA-based unsupervised learning algorithms: the PCA (eigenface) algorithm (Turk & Pentland, 1991), the MPCA algorithm (Lu et al., 2008a), and the TROD algorithm (Shashua & Levin, 2001). (Note that MPCA with N = 2 is equivalent to GPCA.) The number of iterations in TROD and UMPCA is set to ten, with the same (uniform) initialization used. For MPCA, we obtain the full projection and select the P most descriptive features for recognition. The features obtained by these four algorithms are arranged in descending order of variation captured (measured by the respective total scatter). For classification of the extracted features, we use the nearest neighbor classifier (NNC) with the Euclidean distance measure.

Gray-level face images are naturally second-order tensors (matrices), i.e., N = 2. Therefore, they are input directly as 80 × 80 tensors to the multilinear algorithms (MPCA, TROD, UMPCA), while for PCA they are vectorized to 6400 × 1 vectors as input. For each subject in a face recognition experiment, L (= 1, 2, 3, 4, 5, 6, 7) samples are randomly selected for unsupervised training and the rest are used for testing. We report results averaged over ten such random splits (repetitions).

Figures 2 and 3 show the detailed results for L = 1 and L = 7, respectively. (Note that for PCA and UMPCA there are at most 69 features when L = 1, since only 70 faces are available for training.) L = 1 is an extreme small sample size scenario where only one sample per class is available for training, the so-called one training sample (OTS) case that is important in practice (Wang et al., 2006), and L = 7 is the maximum number of training samples we can use in our experiments. Figures 2(a) and 3(a) plot the correct recognition rates against P, the dimensionality of the subspace, for P = 1, ..., 10, and Figs. 2(b) and 3(b) plot those for P = 15, ..., 80. From the figures, UMPCA outperforms the other three methods in both cases and across all dimensionalities, indicating that the uncorrelated features extracted directly from the tensorial face data are more effective in classification.

The figures also show that for UMPCA the recognition rate saturates around P = 30, which can be explained by observing the variance captured by individual features, shown in Figs. 2(c) and 3(c) (in log scale). These figures show that the variance captured by UMPCA is considerably lower than that captured by the other methods, which is due to its constraints of zero-correlation and TVP. Despite capturing lower variance, UMPCA is superior in the recognition task performed. Nonetheless, when the variance captured is too low, the corresponding features are no longer descriptive enough to contribute to classification, leading to the saturation. In addition, we also plot the average correlation of individual features with all the other features in Figs. 2(d) and 3(d). As supported by the theoretical derivation, features extracted by PCA and UMPCA are uncorrelated. In contrast, features extracted by MPCA and TROD are correlated, with TROD features having higher correlation on average.

[Figure 2. Detailed face recognition results on the FERET database for L = 1: (a) performance curves for the low-dimensional case, (b) performance curves for the high-dimensional case, (c) the variation captured by individual features and (d) the correlation among features.]

[Figure 3. Detailed face recognition results on the FERET database for L = 7: (a) performance curves for the low-dimensional case, (b) performance curves for the high-dimensional case, (c) the variation captured by individual features and (d) the correlation among features.]
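For concreteness, the short NumPy sketch below illustrates the scoring protocol of this section: a nearest neighbor classifier with Euclidean distance on the extracted features, averaged over random splits. It is our own illustration, not the authors' evaluation code, and the comment at the end refers to the hypothetical umpca sketch given after Algorithm 1.

```python
import numpy as np

def nnc_recognition_rate(train_feats, train_labels, test_feats, test_labels):
    """Nearest neighbor classifier with Euclidean distance on extracted features,
    as used for all four compared methods; returns the correct recognition rate."""
    correct = 0
    for x, y in zip(test_feats, test_labels):
        d = np.linalg.norm(train_feats - x, axis=1)      # distances to all training features
        correct += int(train_labels[np.argmin(d)] == y)
    return correct / len(test_labels)

# Illustrative protocol for one random split (faces: M x 80 x 80 array, labels: length-M array):
# 1) randomly pick L images per subject for unsupervised training;
# 2) learn the projection on the mean-removed training tensors,
#    e.g. emps, _ = umpca(train - train.mean(0), P)      (hypothetical helper from Sec. 3);
# 3) project both training and test faces with the learned EMPs and keep the first P features;
# 4) score with nnc_recognition_rate, then average over ten random splits.
```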
The recognition rates for P = 1, 5, 10, 20, 50, 80 and L = 2, 3, 4, 5, 6 are listed in Table 2; more detailed results are omitted here to save space. From the table, UMPCA achieves the best recognition result in all cases except for P = 80 and L = 2, where its difference from the best result, obtained by MPCA, is small (0.3%). In particular, for smaller P (1, 5, 10, 20), UMPCA outperforms the other algorithms significantly, demonstrating its superior capability in classifying faces in low-dimensional spaces.

Table 2. Face recognition results on the FERET database: recognition rates (in percent) for various values of L and P.

  L   Method   P=1   P=5    P=10   P=20   P=50   P=80
  2   PCA      2.8   20.2   32.0   39.1   43.6   45.1
  2   MPCA     2.6   21.4   28.1   38.9   44.6   46.0
  2   TROD     3.6   19.3   30.6   38.4   43.0   44.3
  2   UMPCA    8.1   27.6   40.6   45.0   45.8   45.7
  3   PCA      2.7   23.9   37.1   45.9   51.3   52.6
  3   MPCA     2.3   25.9   34.8   45.5   52.0   53.3
  3   TROD     4.0   23.5   36.1   44.5   50.1   51.7
  3   UMPCA    7.5   35.5   49.8   56.0   56.6   56.6
  4   PCA      2.7   25.5   41.7   49.4   56.8   57.9
  4   MPCA     2.3   28.7   39.4   50.2   57.5   58.9
  4   TROD     4.2   25.3   41.1   49.0   55.1   56.6
  4   UMPCA    8.5   39.5   56.2   63.5   64.1   64.2
  5   PCA      3.0   28.9   47.1   55.6   63.9   64.6
  5   MPCA     2.6   33.0   43.2   56.8   64.3   65.8
  5   TROD     4.5   28.4   47.2   55.6   62.0   63.9
  5   UMPCA    8.1   43.6   61.7   68.2   69.1   69.1
  6   PCA      2.8   30.3   49.0   58.5   66.7   68.1
  6   MPCA     2.2   33.5   45.7   59.7   67.9   69.7
  6   TROD     4.3   27.3   49.3   58.6   64.7   66.9
  6   UMPCA    9.1   45.6   62.9   70.7   71.8   71.8

5. Conclusions

This paper proposes a novel uncorrelated multilinear PCA algorithm, in which uncorrelated features are extracted directly from the tensorial representation through a tensor-to-vector projection. The algorithm successively maximizes the variance captured by each elementary projection while enforcing the zero-correlation constraint. The solution employs the alternating projection method and is iterative in nature. Experiments on face recognition demonstrate that, compared with other unsupervised learning algorithms including PCA, MPCA and TROD, UMPCA achieves the best results and is particularly effective in low-dimensional spaces. Thus, face recognition through unsupervised learning benefits from the proposed UMPCA, and in future research it is worthwhile to investigate whether UMPCA can contribute to other unsupervised learning tasks, such as clustering.
Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments. This work is partially supported by the Ontario Centres of Excellence through the Communications and Information Technology Ontario Partnership Program and the Bell University Labs at the University of Toronto.

References

Carroll, J. D., & Chang, J. J. (1970). Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika, 35, 283–319.

Faloutsos, C., Kolda, T. G., & Sun, J. (2007). Mining large time-evolving data using matrix and tensor tools. Int. Conf. on Data Mining 2007 Tutorial.

Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis. UCLA Working Papers in Phonetics, 16, 1–84.

Jolliffe, I. T. (2002). Principal component analysis, second edition. Springer Series in Statistics.

Koren, Y., & Carmel, L. (2004). Robust linear dimensionality reduction. IEEE Trans. Vis. Comput. Graphics, 10, 459–470.

Lathauwer, L. D., Moor, B. D., & Vandewalle, J. (2000). On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors. SIAM Journal of Matrix Analysis and Applications, 21, 1324–1342.

Law, M. H. C., & Jain, A. K. (2006). Incremental nonlinear dimensionality reduction by manifold learning. IEEE Trans. Pattern Anal. Mach. Intell., 28, 377–391.

Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008a). MPCA: Multilinear principal component analysis of tensor objects. IEEE Trans. Neural Netw., 19, 18–39.

Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2008b). Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition. IEEE Trans. Neural Netw., accepted pending minor revision.

Phillips, P. J., Moon, H., Rizvi, S. A., & Rauss, P. (2000). The FERET evaluation method for face recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 22, 1090–1104.

Shakhnarovich, G., & Moghaddam, B. (2004). Face recognition in subspaces. Handbook of Face Recognition (pp. 141–168). Springer-Verlag.

Shashua, A., & Levin, A. (2001). Linear image coding for regression and classification using the tensor-rank principle. Proc. IEEE Conf. on Computer Vision and Pattern Recognition (pp. 42–49).

Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3, 71–86.

Wang, J., Plataniotis, K. N., Lu, J., & Venetsanopoulos, A. N. (2006). On solving the face recognition problem with one training sample per subject. Pattern Recognition, 39, 1746–1762.

Xu, D., Yan, S., Zhang, L., Zhang, H.-J., Liu, Z., & Shum, H.-Y. (2005). Concurrent subspaces analysis. Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (pp. 203–208).

Yang, J., Zhang, D., Frangi, A. F., & Yang, J. (2004). Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell., 26, 131–137.

Ye, J. (2005). Generalized low rank approximations of matrices. Machine Learning, 61, 167–191.

Ye, J., Janardan, R., & Li, Q. (2004). GPCA: An efficient dimension reduction scheme for image compression and retrieval. The 10th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (pp. 354–363).