Robust Subspace Segmentation by Low-Rank Representation

Guangcan Liu (roth@sjtu.edu.cn)
Zhouchen Lin (zhoulin@microsoft.com)
Yong Yu (yyu@apex.sjtu.edu.cn)

Shanghai Jiao Tong University, No. 800 Dongchuan Road, Min Hang District, Shanghai, China, 200240
Microsoft Research Asia, No. 49 Zhichun Road, Hai Dian District, Beijing, China, 100190

Abstract

We propose low-rank representation (LRR) to segment data drawn from a union of multiple linear (or affine) subspaces. Given a set of data vectors, LRR seeks the lowest-rank representation among all the candidates that represent all vectors as linear combinations of the bases in a dictionary. Unlike the well-known sparse representation (SR), which computes the sparsest representation of each data vector individually, LRR aims at finding the lowest-rank representation of a collection of vectors jointly. LRR better captures the global structure of the data, giving a more effective tool for robust subspace segmentation from corrupted data. Both theoretical and experimental results show that LRR is a promising tool for subspace segmentation.

1. Introduction

In scientific data analysis and system engineering, one usually needs a parametric model to characterize a given set of data. To this end, linear models such as linear subspaces¹ are possibly the most common choice, mainly because they are easy to compute and are often effective in real applications. So subspaces have been gaining much attention in recent years. For example, the hotly discussed matrix completion problem (Candès & Recht, 2009; Keshavan et al., 2009; Candès et al., 2009) is essentially based on the hypothesis that the data are drawn from a low-rank subspace. However, a given data set is seldom well described by a single subspace. A more reasonable model is to consider the data as lying near several subspaces, leading to the challenging problem of subspace segmentation, where the goal is to segment (or cluster) the data into clusters with each cluster corresponding to a subspace. Subspace segmentation is an important data clustering problem, as it arises in numerous research areas, including machine learning (Lu & Vidal, 2006), computer vision (Ho et al., 2003), image processing (Fischler & Bolles, 1981) and system identification.

¹ There is no loss of generality in assuming that the subspaces are linear, i.e., that they all contain the origin.
For an affine subspace that does not contain the origin, we can always increase the dimension of the ambient space by one and identify the affine subspace with the linear subspace that it spans. So we use the term "subspace" to denote both "linear subspace" and "affine subspace" in this work.

Appearing in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010. Copyright 2010 by the author(s)/owner(s).

Previous Work. According to their mechanisms of representing the subspaces, existing works can be roughly divided into four main categories: mixture of Gaussians, factorization, algebraic and compressed sensing.

In statistical learning, mixed data are typically modeled as a set of independent samples drawn from a mixture of probabilistic distributions. As a single subspace can be well modeled by a Gaussian distribution, it is straightforward to assume that each probabilistic distribution is Gaussian, which is known as the mixture-of-Gaussians model. The problem of segmenting the data is then converted into a model estimation problem. The estimation can be performed either by using the Expectation Maximization (EM) algorithm to find a maximum likelihood estimate, as done in (Gruber & Weiss, 2004), or by iteratively finding a min-max estimate, as adopted by K-subspaces (Ho et al., 2003) and Random Sample Consensus (RANSAC) (Fischler & Bolles, 1981). These methods are sensitive to noise and outliers, so several efforts have been made to improve their robustness, e.g., the Median K-flats (Zhang et al., 2009) for K-subspaces, the work of (Yang et al., 2006) for RANSAC, and (Ma et al., 2008a; Wright et al., 2008a), which use a coding length to characterize a mixture of Gaussians and thereby gain some robustness. Nevertheless, the problem is still not well solved: the difficulty of the underlying optimization remains a bottleneck for these methods to achieve robustness.

Factorization based methods (Costeira & Kanade, 1998; Gruber & Weiss, 2004) seek to represent the given data matrix as a product of two matrices, so that the support pattern of one of the factors reveals the grouping of the points. These methods aim at modifying popular factor analysis algorithms (often based on alternating minimization or EM-style algorithms) to produce such factorizations. Nevertheless, these methods are also sensitive to noise and outliers, and it is not easy to make them robust because they usually need iterative optimization algorithms to obtain the factorizations.

Generalized Principal Component Analysis (GPCA) (Ma et al., 2008b) presents an algebraic way to model data drawn from a union of multiple subspaces. By describing the subspace containing a data point via the gradient of a polynomial at that point, subspace segmentation becomes equivalent to fitting the data with polynomials. GPCA can guarantee the success of the segmentation under certain conditions, and it does not impose any restriction on the subspaces. However, this method is sensitive to noise and outliers due to the difficulty of estimating the polynomials from real data, which also causes the high computational cost of GPCA. Recently, Robust Algebraic Segmentation (RAS) (Rao et al., 2010) was proposed to resolve the robustness issue of GPCA. However, the computational difficulty of fitting polynomials remains, so RAS is practical only when the data dimension is low and the number of subspaces is small.

Recently, the work of (Rao et al., 2009) and Sparse Subspace Clustering (SSC) (Elhamifar & Vidal, 2009) introduced compressed sensing techniques to subspace segmentation. SSC uses the sparsest representation produced by ℓ1-minimization (Wright et al., 2008b; Eldar & Mishali, 2008) to define the affinity matrix of an undirected graph; subspace segmentation is then performed by spectral clustering algorithms such as Normalized Cuts (NCut) (Shi & Malik, 2000). Under the assumption that the subspaces are independent, SSC shows that the sparsest representation is also "block-sparse": the within-cluster affinities are sparse (but nonzero) and the between-cluster affinities are all zeros. This implies that the graph is well defined and easy to segment. However, as SR finds the sparsest representation of each data vector individually, there is no global constraint on its solution, so the method may be inaccurate at capturing the global structure of the data.
This drawback can largely depress the performance when the data is grossly corrupted. When no extra "clean" data are available, SR approaches may not be robust to noise and outliers in the subspace segmentation problem².

Our Contributions. In this work we propose a novel method, called low-rank representation (LRR), to address the subspace segmentation problem. As a compressed sensing technique, LRR also represents a data vector as a linear combination of the other vectors. However, given a set of data vectors drawn from a union of multiple subspaces, unlike SR, LRR finds the lowest-rank representation of all data jointly. The lowest-rank representation can be used to define the affinities of an undirected graph, and the final segmentation results are then obtained by spectral clustering. Compared to SR, LRR is better at capturing the global structure of the data, so it better fits the subspace segmentation problem. When the subspaces are independent, we prove that there is a lowest-rank representation that reveals the membership of the samples: the within-cluster affinities are dense, and the between-cluster affinities are all zeros. We show that this solution solves a nuclear norm minimization problem, and we give an efficient algorithm for solving this problem. For corrupted data, since corruption largely increases the rank, as shown in the matrix completion literature (e.g., (Candès & Recht, 2009; Keshavan et al., 2009; Candès et al., 2009)), the lowest-rank criterion can enforce the correction of the corruptions, and LRR can therefore be robust to noise and outliers. In summary, the contributions of this work include:

- We introduce a simple and efficient new subspace segmentation algorithm that outperforms state-of-the-art algorithms in handling corrupted data.

- Our work extends the recovery of corrupted data from a single subspace (Candès et al., 2009) to multiple subspaces. Compared to (Eldar & Mishali, 2008), which requires the bases of the subspaces to be known in order to handle corrupted data from multiple subspaces, our method is autonomous, i.e., no extra clean data is required.

² SR can be robust to noise and outliers when extra clean data is available, e.g., a well-defined dictionary as adopted by (Wright et al., 2008b).

2. Problem Formulation

More precisely, this paper addresses the following problem.

Problem 2.1 Given a set of sufficiently dense data vectors X = [x_1, x_2, ..., x_n] (each column is a sample) drawn from a union of k subspaces {S_i}_{i=1}^{k} of unknown dimensions, embedded in a D-dimensional Euclidean space, segment all data vectors into their respective subspaces.

We consider the problem under two assumptions of increasing practicality and difficulty.

Assumption 1. The subspaces are low-rank and independent³, and the data is noiseless.

Assumption 2. A fraction of the data vectors is corrupted by noise or contaminated by outliers; more precisely, the data contains sparse and properly bounded errors.

³ The subspaces are independent if and only if Σ_{i=1}^{k} S_i = ⊕_{i=1}^{k} S_i, where ⊕ denotes the direct sum.

The independence assumption is mild, because it usually holds, especially when the subspaces are low-rank. What is critical is that real data may not strictly follow subspace structures, i.e., there exist errors in the data. When no extra clean data is available, this problem is rather challenging. When the clean data is sufficient to represent the subspaces (i.e., the errors are sparse), and the errors do not "confuse" different subspaces (i.e., the errors are properly bounded), it will be shown that LRR is rather robust.
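Assumption 1 can also be checked numerically for a given set of sampled bases: the subspaces are independent exactly when the dimension of their sum equals the sum of their dimensions. The following short sketch is our own illustration (it is not part of the paper); the ambient dimension, the subspace dimensions, and all names are arbitrary.

    import numpy as np

    # Illustrative check of Assumption 1 (independence), not from the paper.
    # Subspaces S_1, ..., S_k are independent iff dim(S_1 + ... + S_k) = sum_i dim(S_i),
    # i.e., the stacked bases have full column rank.
    rng = np.random.default_rng(0)
    D, dims = 100, [4, 4, 4]            # ambient dimension and subspace dimensions (arbitrary)
    bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for d in dims]

    joint = np.hstack(bases)            # columns span S_1 + ... + S_k
    print("independent:", np.linalg.matrix_rank(joint) == sum(dims))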
3. Subspace Segmentation via LRR

3.1. Low-Rank Representation

Consider a set of data vectors X = [x_1, x_2, ..., x_n] (each column is a sample) in R^D, each of which can be represented as a linear combination of the bases in a "dictionary" A = [a_1, a_2, ..., a_m]:

    X = AZ,                                                    (1)

where Z = [z_1, z_2, ..., z_n] is the coefficient matrix with each z_i being the representation of x_i. The dictionary is often overcomplete, and hence there can be multiple feasible solutions to problem (1). It has been observed (e.g., by (Elhamifar & Vidal, 2009)) that a sparse representation with respect to an appropriate dictionary A may reveal the clustering of the points x_i. However, as mentioned above, sparse representation may not capture the global structure of the data X. As we will see, low rankness may be a more appropriate criterion. That is, we look for a representation Z by solving the problem

    min_Z rank(Z),   s.t.  X = AZ.                             (2)

We call the optimal solutions Z* of the above problem the "lowest-rank representations" of the data X with respect to the dictionary A. The above optimization problem is difficult to solve due to the discrete nature of the rank function. Fortunately, as suggested by matrix completion methods (e.g., (Candès & Recht, 2009; Keshavan et al., 2009; Candès et al., 2009)), the following convex optimization provides a good surrogate for problem (2):

    min_Z ||Z||_*,   s.t.  X = AZ.                             (3)

Here, ||·||_* denotes the nuclear norm (Fazel, 2002) of a matrix, i.e., the sum of the singular values of the matrix.

3.2. The Basic Messages

Let X = [x_1, x_2, ..., x_n] be a set of data vectors drawn from a union of k subspaces {S_i}_{i=1}^{k}. Let {d_i}_{i=1}^{k} be the unknown dimensions of the k subspaces, and let X_i be the collection of n_i data vectors drawn from the i-th subspace S_i. Without loss of generality, we assume that X = [X_1, X_2, ..., X_k] (i.e., the indices have been rearranged according to the true segmentation of the data). In order to segment the data into their respective subspaces, we need to compute an affinity matrix that encodes the pairwise affinities between data vectors. So we use the data X itself as the dictionary, i.e., problem (3) becomes

    min_Z ||Z||_*,   s.t.  X = XZ.                             (4)

Note that there always exist feasible solutions even when the data sampling is insufficient, because a data vector can be used to represent itself in LRR. More precisely, since the solution space is I - null(X), there always exist nontrivial solutions when rank(X) < n. This is different from SR, which is prone to a "trivial" solution if a data vector is used to represent itself.
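Before turning to the structure of the minimizers, note that small instances of problem (4) can be solved directly with an off-the-shelf convex solver. The sketch below is our illustration only; the paper's own solver is the ALM algorithm of Section 3.4, and the choice of CVXPY, the toy data, and the variable names are ours.

    import numpy as np
    import cvxpy as cp

    # Our sketch of problem (4): min ||Z||_*  s.t.  X = XZ, on a tiny synthetic example.
    rng = np.random.default_rng(0)
    U1 = np.linalg.qr(rng.standard_normal((30, 2)))[0]   # basis of a 2-dim subspace in R^30
    U2 = np.linalg.qr(rng.standard_normal((30, 2)))[0]   # basis of another, independent subspace
    X = np.hstack([U1 @ rng.standard_normal((2, 10)), U2 @ rng.standard_normal((2, 10))])

    n = X.shape[1]
    Z = cp.Variable((n, n))
    prob = cp.Problem(cp.Minimize(cp.normNuc(Z)), [X == X @ Z])
    prob.solve()                                         # the default conic solver handles the nuclear norm

    # With independent subspaces the recovered Z is (numerically) block-diagonal,
    # which is exactly the structure established by Theorem 3.1 below.
    print(np.round(np.abs(Z.value), 2))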
Theorem 3.1 Assume that the data sampling is sufficient such that n_i > rank(X_i) = d_i. If the subspaces are independent, then there exists an optimal solution Z* to problem (4) that is block-diagonal:

    Z* = diag(Z_1*, Z_2*, ..., Z_k*) ∈ R^{n×n},

with Z_i* being an n_i × n_i matrix with rank(Z_i*) = d_i for all i.

The proof is based on the following well-known lemma, whose proof is provided in the appendix for completeness.

Lemma 3.1 Let A and D be square matrices. Then for any matrices B and C of compatible dimensions,

    || [A  B; C  D] ||_*  ≥  || [A  0; 0  D] ||_*  =  ||A||_* + ||D||_*,

where [A B; C D] denotes the block matrix with blocks A, B, C, D.

The lemma allows us to lower-bound the objective value at any feasible solution Z by the value of the block-diagonal restriction of Z. This leads to a simple proof of Theorem 3.1.

Proof (of Theorem 3.1). Let Z* be any optimizer of (4). Form a block-diagonal matrix W by setting

    W_ij = Z*_ij  if x_i and x_j belong to the same subspace,  and  W_ij = 0  otherwise.

Write Q = Z* - W. For any matrix M, let [M]_j denote its j-th column. Let x_j belong to the l-th subspace, i.e., [XZ*]_j ∈ S_l. Then by construction [XW]_j ∈ S_l and [XQ]_j ∈ ⊕_{i≠l} S_i. But [XQ]_j = [XZ*]_j - [XW]_j ∈ S_l. By independence, S_l ∩ (⊕_{i≠l} S_i) = {0}, and so [XQ]_j = 0. Hence XQ = 0, and W is feasible for (4). By Lemma 3.1, ||Z*||_* ≥ ||W||_*, and so W is optimal for (4). We can write W as W = diag(W_1, W_2, ..., W_k) ∈ R^{n×n} with W_i ∈ R^{n_i×n_i}. For each i, let P_i ∈ R^{n_i×n_i} be the projection onto the null space of X_i. Then X_i(I - P_i)W_i = X_i W_i = X_i. So, if we set Z_i* = (I - P_i)W_i, then Z* = diag(Z_1*, Z_2*, ..., Z_k*) ∈ R^{n×n} is again feasible for (4). Now

    ||Z*||_* = Σ_i ||Z_i*||_* = Σ_i ||(I - P_i)W_i||_* ≤ Σ_i ||W_i||_* = ||W||_*,

where we have used (e.g., (Horn & Johnson, 1991), Corollary 3.5.10) to show that ||(I - P_i)W_i||_* ≤ ||I - P_i|| ||W_i||_* ≤ ||W_i||_*. Hence Z* is again optimal for (4). Moreover, for each i, rank(Z_i*) ≤ rank(I - P_i) = d_i, and since X_i = X_i Z_i*, rank(Z_i*) ≥ d_i. So we conclude that rank(Z_i*) = d_i for each i.

Theorem 3.1 does not guarantee that an arbitrary optimal solution to problem (4) is block-diagonal; the difficulty is essentially that the minimizer is not unique. In theory, this could be resolved, e.g., by choosing the minimizer of (4) with the smallest Frobenius norm (which is again a convex problem). However, in our simulations the solution obtained is always block-diagonal, so we do not pursue this here.

3.3. Robustness to Noise and Outliers

In real applications, our observations are often noisy, or even grossly corrupted, and some observations may be missing. For small noise (e.g., Gaussian), a reasonable strategy is simply to relax the equality constraint in (4), similar to (Candes & Plan, 2009). If we imagine instead that a fraction of the data vectors are grossly corrupted, a more reasonable objective might be

    min_{Z,E} ||Z||_* + λ||E||_{2,1},   s.t.  X = XZ + E,      (5)

where ||E||_{2,1} = Σ_{j=1}^{n} ( Σ_i ([E]_ij)^2 )^{1/2} is called the ℓ2,1-norm, i.e., the sum of the ℓ2 norms of the columns of E, and the parameter λ > 0 is used to balance the effects of the two parts; it could be chosen according to properties of the two norms, or tuned empirically. Since the ℓ2,1-norm encourages the columns of E to be zero, the underlying assumption here is that the corruptions are "sample-specific", i.e., some data vectors are corrupted and the others are clean.

Figure 1. Illustrating that LRR can automatically correct the corruptions in data: (a) the data, a part of which is corrupted; (b) the optimal solution Z* of problem (5) with λ = 0.05; (c) the corrected data obtained by XZ*.

After obtaining an optimal solution (Z*, E*), we can recover the original data by using X - E* (or XZ*). To illustrate how the corruptions are corrected, we refer to the example shown in Fig. 1. There are about 80 data vectors sampled from two one-dimensional subspaces embedded in R^3, and about 25% of the data vectors are corrupted by large Gaussian errors. The results in Fig. 1 show that LRR can handle the corruptions well. One message of these results is that LRR is unlikely to produce "false positives" that change the clean data. So possibly existing corruptions can be corrected in the following way: rearrange the data vectors into X = [X_l, X_c], where X_l is the clean data without corruptions and X_c is the corrupted data. Provided the remaining clean data X_l is still sufficient to represent the subspaces and the corruptions are properly bounded, LRR automatically corrects the corruptions so as to obtain the lowest-rank representation.
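As a concrete note on the ℓ2,1-norm used in (5): it is the ℓ1 norm of the vector of column ℓ2 norms, so minimizing it drives whole columns of E to zero, which is exactly the sample-specific corruption pattern assumed above. A tiny sketch of the computation (ours, not from the paper):

    import numpy as np

    def l21_norm(E):
        # sum of the l2 norms of the columns of E
        return np.linalg.norm(E, axis=0).sum()

    # An error matrix supported on two columns, i.e., two corrupted samples.
    E = np.zeros((50, 100))
    E[:, [3, 17]] = np.random.default_rng(0).standard_normal((50, 2))
    print(l21_norm(E))      # only the two corrupted columns contribute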
3.4. Solving the Optimization Problem

Since problem (5) falls back to problem (4) when the parameter λ is set to be relatively large, we only present how to solve problem (5). We first convert it to the following equivalent problem:

    min_{Z,E,J} ||J||_* + λ||E||_{2,1},   s.t.  X = XZ + E,  Z = J,      (6)

which can be solved via the following Augmented Lagrange Multiplier (ALM) problem:

    min_{Z,E,J,Y_1,Y_2} ||J||_* + λ||E||_{2,1}
        + tr(Y_1^T (X - XZ - E)) + tr(Y_2^T (Z - J))
        + (μ/2)(||X - XZ - E||_F^2 + ||Z - J||_F^2),                     (7)

where Y_1 and Y_2 are Lagrange multipliers and μ > 0 is a penalty parameter. The above problem can be solved by either exact or inexact ALM algorithms (Lin et al., 2009). For efficiency, we choose the inexact ALM, which we outline in Algorithm 1. Its convergence properties can be proved similarly to those in (Lin et al., 2009).

Algorithm 1 Solving Problem (5) by Inexact ALM
Input: data matrix X, parameter λ
Initialize: Z = J = 0, E = 0, Y_1 = 0, Y_2 = 0, μ = 10^{-6}, μ_max = 10^{10}, ρ = 1.1, ε = 10^{-8}.
while not converged do
  1. fix the others and update J by
         J = argmin_J (1/μ)||J||_* + (1/2)||J - (Z + Y_2/μ)||_F^2
  2. fix the others and update Z by
         Z = (I + X^T X)^{-1} (X^T X - X^T E + J + (X^T Y_1 - Y_2)/μ)
  3. fix the others and update E by
         E = argmin_E (λ/μ)||E||_{2,1} + (1/2)||E - (X - XZ + Y_1/μ)||_F^2
  4. update the multipliers by
         Y_1 = Y_1 + μ(X - XZ - E),   Y_2 = Y_2 + μ(Z - J)
  5. update the parameter μ by μ = min(ρμ, μ_max)
  6. check the convergence conditions: ||X - XZ - E|| < ε and ||Z - J|| < ε
end while

Notice that although steps 1 and 3 of the algorithm are convex problems, they both have closed-form solutions. Step 1 is solved via the singular value thresholding operator (Cai et al., 2008), while step 3 is solved via the following lemma:

Lemma 3.2 Let Q = [q_1, q_2, ..., q_i, ...] be a given matrix and ||·||_F be the Frobenius norm. If the optimal solution of

    min_W λ||W||_{2,1} + (1/2)||W - Q||_F^2

is W*, then the i-th column of W* is

    [W*](:, i) = ((||q_i|| - λ)/||q_i||) q_i,  if λ < ||q_i||;   0,  otherwise.
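Since every step of Algorithm 1 has a closed-form update, it is straightforward to prototype. Below is a minimal NumPy sketch of the inexact ALM iteration, written by us from the pseudocode above; it is not the authors' released Matlab code. The function names, the maximum iteration count, and the use of entrywise-maximum residuals in the stopping test are our own choices, while the constants follow the initialization of Algorithm 1.

    import numpy as np

    def svt(M, tau):
        # singular value thresholding (Cai et al., 2008): prox of tau * ||.||_*  (step 1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def l21_shrink(Q, tau):
        # column-wise shrinkage of Lemma 3.2: prox of tau * ||.||_{2,1}  (step 3)
        W = np.zeros_like(Q)
        norms = np.linalg.norm(Q, axis=0)
        keep = norms > tau
        W[:, keep] = Q[:, keep] * (norms[keep] - tau) / norms[keep]
        return W

    def lrr_inexact_alm(X, lam, mu=1e-6, mu_max=1e10, rho=1.1, tol=1e-8, max_iter=1000):
        # sketch of Algorithm 1: min ||Z||_* + lam ||E||_{2,1}  s.t.  X = XZ + E
        d, n = X.shape
        Z, J, E = np.zeros((n, n)), np.zeros((n, n)), np.zeros((d, n))
        Y1, Y2 = np.zeros((d, n)), np.zeros((n, n))
        XtX = X.T @ X
        inv = np.linalg.inv(np.eye(n) + XtX)                      # (I + X^T X)^{-1}, reused
        for _ in range(max_iter):
            J = svt(Z + Y2 / mu, 1.0 / mu)                        # step 1
            Z = inv @ (XtX - X.T @ E + J + (X.T @ Y1 - Y2) / mu)  # step 2
            E = l21_shrink(X - X @ Z + Y1 / mu, lam / mu)         # step 3
            R1, R2 = X - X @ Z - E, Z - J
            Y1, Y2 = Y1 + mu * R1, Y2 + mu * R2                   # step 4
            mu = min(rho * mu, mu_max)                            # step 5
            if max(np.abs(R1).max(), np.abs(R2).max()) < tol:     # step 6
                break
        return Z, E

For the small problems considered in Section 4, the dominant cost per iteration is the SVD in step 1; the matrix inverse in step 2 is computed only once, since X does not change across iterations.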
3.5. The Complete Segmentation Algorithm

After solving problem (5), as in (Elhamifar & Vidal, 2009), we utilize the lowest-rank representation (denoted by Z*) to define the affinity matrix of an undirected graph. The data vectors correspond to the vertices, and the affinity between x_i and x_j is computed by |[Z*]_ij| + |[Z*]_ji|. We then use spectral clustering algorithms such as Normalized Cuts (Shi & Malik, 2000) to produce the final segmentation results. Integrating LRR with spectral clustering has some advantages. First, since LRR may fail to obtain a block-diagonal representation in complex applications, spectral clustering ensures the robustness of the segmentation. Second, it is convenient to integrate the lowest-rank representation with other information by defining such an undirected graph; for example, in specific applications such as image segmentation, one may want to enforce that only neighboring samples can be connected by edges. Algorithm 2 summarizes the whole segmentation algorithm of LRR.

Algorithm 2 Subspace Segmentation by LRR
Input: data matrix X, number of subspaces k
  1. obtain the lowest-rank representation by solving problem (5)
  2. construct an undirected graph by using the lowest-rank representation to define the affinity matrix of the graph
  3. use NCut to segment the vertices of the graph into k clusters

4. Experiments

4.1. Toy Data

In this experiment, we compare the robustness of LRR and SR in the context of subspace segmentation. For fairness of comparison, we implement two versions of SR. The first one is the standard version used by most previous methods:

    SR_1:      min_{Z,E} ||Z||_1 + λ||E||_1,      s.t.  X = XZ + E,  [Z]_ii = 0,      (8)

where ||Z||_1 = Σ_{i,j=1}^{n} |[Z]_ij| is called the ℓ1-norm. The second one is modified to better fit the corruptions:

    SR_{2,1}:  min_{Z,E} ||Z||_1 + λ||E||_{2,1},  s.t.  X = XZ + E,  [Z]_ii = 0.      (9)

We construct 5 independent subspaces {S_i}_{i=1}^{5} ⊂ R^{100} whose bases {U_i}_{i=1}^{5} are computed by U_{i+1} = T U_i, 1 ≤ i ≤ 4, where T is a random rotation and U_1 is a random orthogonal matrix of dimension 100 × 4. So, each subspace has a dimension of 4. We sample 20 data vectors from each subspace by X_i = U_i Q_i, 1 ≤ i ≤ 5, with Q_i being a 4 × 20 i.i.d. N(0, 1) matrix. Some data vectors are randomly chosen to be corrupted: for a data vector x chosen to be corrupted, its observed vector is computed by adding Gaussian noise with zero mean and variance 0.3||x|| (||x|| mostly ranges from 0.1 to 1.7 in this experiment).
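For concreteness, here is a sketch (ours, not the authors' Matlab implementation) of this data generation together with the full pipeline of Algorithm 2 on the toy problem. It reuses the hypothetical lrr_inexact_alm sketch from Section 3.4, takes the random rotation T from SciPy's ortho_group, substitutes scikit-learn's spectral clustering for NCut, uses λ = 0.12 as in the legend of Fig. 2, corrupts 20% of the samples, and scores the result with the majority-label rule used for the segmentation accuracy (footnote 4 below).

    import numpy as np
    from scipy.stats import ortho_group
    from sklearn.cluster import SpectralClustering

    rng = np.random.default_rng(0)

    # Five independent 4-dimensional subspaces in R^100: U_{i+1} = T U_i, T a random rotation.
    T = ortho_group.rvs(100, random_state=0)
    bases = [np.linalg.qr(rng.standard_normal((100, 4)))[0]]
    for _ in range(4):
        bases.append(T @ bases[-1])

    # 20 samples per subspace: X_i = U_i Q_i with Q_i a 4 x 20 iid N(0, 1) matrix.
    X = np.hstack([U @ rng.standard_normal((4, 20)) for U in bases])
    labels = np.repeat(np.arange(5), 20)

    # Corrupt a fraction of the samples with zero-mean Gaussian noise of variance 0.3 ||x||.
    corrupted = rng.choice(X.shape[1], size=int(0.2 * X.shape[1]), replace=False)
    for j in corrupted:
        X[:, j] += rng.normal(0.0, np.sqrt(0.3 * np.linalg.norm(X[:, j])), size=X.shape[0])

    # Algorithm 2: lowest-rank representation, affinity |Z*| + |Z*^T|, spectral clustering.
    Z, E = lrr_inexact_alm(X, lam=0.12)            # sketch from Section 3.4
    affinity = np.abs(Z) + np.abs(Z.T)
    pred = SpectralClustering(n_clusters=5, affinity="precomputed",
                              random_state=0).fit_predict(affinity)

    # Segmentation accuracy: each predicted cluster takes the majority ground-truth label.
    correct = sum((labels[pred == c] == np.bincount(labels[pred == c]).argmax()).sum()
                  for c in np.unique(pred))
    print("segmentation accuracy:", correct / len(labels))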
After obtaining the coefficient matrix, we use the algorithm presented in Section 3.5 to segment the data into 5 clusters and record the segmentation accuracy⁴ of each method. Although the ℓ2,1-norm fits the corruptions in this experiment better than the ℓ1-norm, both SR_{2,1} and SR_1 are significantly outperformed by LRR, as shown in Fig. 2. This illustrates the superiority of the lowest-rank criterion: since corruption does not necessarily decrease sparsity, SR cannot handle corrupted data well in an unsupervised setting, whereas LRR is good at handling such corrupted data, as analyzed in Section 3.3.

⁴ As clustering methods cannot predict the class label of each cluster, we use a post-processing step to assign each cluster a label: given the ground-truth classification results, the label of a cluster is the index of the ground-truth class that contributes the maximum number of samples to the cluster. We then obtain the segmentation accuracy by computing the percentage of correctly classified data vectors.

Figure 2. The segmentation accuracy (%), averaged over 20 random trials, across the entire range of corruption for the various methods: SR_1 (λ = 0.81), SR_{2,1} (λ = 4.24) and LRR (λ = 0.12). LRR clearly outperforms both SR_1 and SR_{2,1}, performing nearly perfectly up to 60 percent corruption. For each method, the parameter λ is tuned to be the best at 20% corruption.

4.2. Slightly Corrupted Data

In this section, we evaluate LRR on the Hopkins155 motion database (Tron & Vidal, 2007). The database consists of 156 sequences, each of which has 39 to 550 data vectors drawn from two or three motions (a motion corresponds to a subspace). Each sequence is a sole clustering task, so there are 156 clustering tasks in total. As the data dimension is bounded above by 12 (Elhamifar & Vidal, 2009), we use a preprocessing step that projects the data to 12 dimensions by PCA. The outliers in the data have been manually removed, and the PCA also reduces some noise. So this database can be regarded as containing only slight corruptions.

In order to compare LRR with the state of the art, we also list the results of GPCA, Local Subspace Analysis (LSA) (Yan & Pollefeys, 2006), RANSAC and SSC⁵. We manually tune the parameters of each method and report its best results. Table 1 shows that LRR significantly outperforms GPCA, LSA and RANSAC, and also outperforms SSC. This confirms that the lowest-rank criterion is accurate for modeling the structures of subspaces.

Table 1. Segmentation errors (%) on Hopkins155. For LRR, the parameter is set as λ = 2.4. The parameters of the other methods have also been tuned to be the best.

            GPCA     LSA      RANSAC    SSC      LRR
    Max     55.67    38.37    41.31     37.44    32.50
    Mean    30.51     8.77     7.81      3.66     3.13
    Std.    11.79     9.80     9.72      7.21     5.96

⁵ The Matlab code of these baselines can be downloaded from http://www.vision.jhu.edu/data/hopkins155/. The Matlab code of our method is available at www.apexlab.org/apex_wiki/gcliu.

4.3. Heavily Corrupted Data

We test LRR's ability to cope with large corruptions in data using a part of Extended Yale Database B (Lee et al., 2005), which consists of 640 frontal face images of 10 classes (there are 38 classes in the whole database and we use the first 10 classes for experiments). Each class contains about 64 images. We resize the images to 42 × 48 and use the raw pixel values to form data vectors of dimension 2016. As shown in Fig. 3, more than half of the data vectors have been corrupted by "shadows" and noise, so the corruption in this database is heavy. Since the computation of GPCA and RANSAC is unbearable on this database, we only list the results of LSA, SSC and LRR. Table 2 shows that LRR distinctly outperforms the baselines. The advantage of LRR mainly comes from its ability to automatically correct the corruptions in data, as shown in Fig. 4.

Figure 3. Some examples of the images of a class in Extended Yale Database B.

Table 2. Segmentation accuracy (%) on Extended Yale Database B. We have tuned the parameters of all methods to the best. The parameter of LRR is set to λ = 0.15.

            GPCA     LSA      RANSAC    SSC      LRR
    Acc.    NA       31.72    NA        37.66    62.53

Figure 4. Some examples of using LRR to correct the corruptions in faces (X = XZ* + E*). Left: the original data (X); Middle: the corrected data (XZ*); Right: the error (E*).

5. Conclusion and Future Work

In this work we propose the low-rank representation (LRR) to recover the lowest-rank representation of a set of data vectors in a joint way, i.e., to recover the lowest-rank representation of matrix data. Compared to the widely used SR, LRR is better at capturing global structures and at correcting the corruptions in data automatically. Both theoretical and experimental results illustrate the effectiveness of LRR. However, there remain several directions for future work:

· It would be better to learn a compact dictionary for LRR, i.e., to recover the structure that generates the data.

· Subspace segmentation should not be the only application of LRR. For example, one may use LRR to do supervised classification in a way similar to (Wright et al., 2008b).

· LRR also gives a way to recover corrupted data drawn from multiple subspaces. The theoretical conditions for the success of the recovery should be established.
Appendix

Proof (of Lemma 3.1). It is well known (e.g., (Horn & Johnson, 1991), Theorem 3.4.1) that for any matrix M,

    ||M||_* = max_{U^T U = I, V^T V = I} tr(U^T M V).

Hence, writing [A B; C D] for the block matrix,

    || [A  B; C  D] ||_*
        ≥ max_{U_1^T U_1 = I, U_2^T U_2 = I, V_1^T V_1 = I, V_2^T V_2 = I}
              tr( [U_1  0; 0  U_2]^T [A  B; C  D] [V_1  0; 0  V_2] )
        = max_{U_1^T U_1 = I, V_1^T V_1 = I, U_2^T U_2 = I, V_2^T V_2 = I}
              ( tr(U_1^T A V_1) + tr(U_2^T D V_2) )
        = max_{U_1^T U_1 = I, V_1^T V_1 = I} tr(U_1^T A V_1) + max_{U_2^T U_2 = I, V_2^T V_2 = I} tr(U_2^T D V_2)
        = ||A||_* + ||D||_*.

Acknowledgments

The authors thank John Wright for sharing his creative ideas, helping design the experiments, and helping revise the paper, and Yi Ma for giving us constructive instructions. The authors also thank the support of grants from the National Natural Science Foundation of China (No. 60873211) and RGC/NSFC (No. 60910123).

References

Cai, Jian-Feng, Candès, Emmanuel J., and Shen, Zuowei. A singular value thresholding algorithm for matrix completion. 2008.

Candès, Emmanuel J. and Plan, Yaniv. Matrix completion with noise. Proceedings of the IEEE, 2009.

Candès, Emmanuel J. and Recht, Benjamin. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 2009.

Candès, Emmanuel J., Li, Xiaodong, Ma, Yi, and Wright, John. Robust principal component analysis?, 2009.

Costeira, João Paulo and Kanade, Takeo. A multibody factorization method for independently moving objects. IJCV, pp. 159-179, 1998.

Eldar, Yonina C. and Mishali, Moshe. Robust recovery of signals from a union of subspaces. CoRR, 2008.

Elhamifar, Ehsan and Vidal, René. Sparse subspace clustering. In CVPR, pp. 2790-2797, 2009.

Fazel, M. Matrix rank minimization with applications. PhD thesis, 2002.

Fischler, Martin A. and Bolles, Robert C. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, pp. 381-395, 1981.

Gruber, Amit and Weiss, Yair. Multibody factorization with uncertainty and missing data using the EM algorithm. In CVPR, pp. 707-714, 2004.

Ho, Jeffrey, Yang, Ming-Hsuan, Lim, Jongwoo, Lee, Kuang-Chih, and Kriegman, David J. Clustering appearances of objects under varying illumination conditions. In CVPR, pp. 11-18, 2003.

Horn, R. and Johnson, C. Topics in Matrix Analysis. 1991.

Keshavan, Raghunandan, Montanari, Andrea, and Oh, Sewoong. Matrix completion from noisy entries. In NIPS, 2009.

Lee, K.C., Ho, J., and Kriegman, D. Acquiring linear subspaces for face recognition under variable lighting. TPAMI, pp. 684-698, 2005.

Lin, Zhouchen, Chen, Minming, Wu, Leqin, and Ma, Yi. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Technical report, UIUC Technical Report UILU-ENG-09-2215, 2009.

Lu, Le and Vidal, René. Combined central and subspace clustering for computer vision applications. In ICML, pp. 593-600, 2006.

Ma, Yi, Derksen, H., Hong, Wei, and Wright, John. Segmentation of multivariate mixed data via lossy data coding and compression. CVIU, 2008a.

Ma, Yi, Yang, Allen, Derksen, Harm, and Fossum, Robert. Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, pp. 413-458, 2008b.

Rao, Shankar, Tron, Roberto, Vidal, René, and Ma, Yi. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. TPAMI, 2009.

Rao, Shankar, Yang, Allen, Sastry, Shankar, and Ma, Yi. Robust algebraic segmentation of mixed rigid-body and planar motions in two views. IJCV, pp. 425-446, 2010.

Shi, Jianbo and Malik, Jitendra. Normalized cuts and image segmentation. TPAMI, pp. 888-905, 2000.

Tron, Roberto and Vidal, René. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, 2007.

Wright, John, Tao, Yangyu, Lin, Zhouchen, Ma, Yi, and Shum, Heung-Yeung. Classification via minimum incremental coding length (MICL). In NIPS, 2008a.

Wright, John, Yang, Allen, Ganesh, A., Sastry, S. S., and Ma, Yi. Robust face recognition via sparse representation. TPAMI, pp. 210-227, 2008b.

Yan, Jingyu and Pollefeys, Marc. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, pp. 94-106, 2006.

Yang, Allen, Rao, Shankar, and Ma, Yi. Robust statistical estimation and segmentation of multiple subspaces. In CVPR Workshop, 2006.

Zhang, Teng, Szlam, Arthur, and Lerman, Gilad. Median K-flats for hybrid linear modeling with many outliers. CoRR, 2009.