Adaptive Hausdorff Estimation of Density Level Sets Aarti Singh and Robert D. Nowak University of Wisconsin - Madison, USA singh@cae.wisc.edu, nowak@engr.wisc.edu Clayton D. Scott University of Michigan - Ann Arbor, USA cscott@eecs.umich.edu Abstract Hausdorff accurate estimation of density level sets is relevant in applications where a spatially uniform mode of convergence is desired to ensure that the estimated set is close to the target set at all points. The minimax optimal rate of error convergence for the Hausdorff metric is known to be (n/ log n)-1/(d+2) for level sets with boundaries that have a Lipschitz functional form, and where the parameter characterizing the regularity of the density around the level of interest is known. Thus, all previous work is non-adaptive to the density regularity and assumes knowledge of the regularity parameter . Moreover, the estimators proposed in previous work achieve the minimax optimal rate for rather restricted classes of sets (for example, the boundary fragment and star-shaped sets) that effectively reduce the set estimation problem to a function estimation problem. This characterization precludes level sets with multiple connected components, which are fundamental to many applications. This paper presents a fully data-driven procedure that is adaptive to unknown local density regularity, and achieves minimax optimal Hausdorff error control for a class of level sets with very general shapes and multiple connected components. The goal of the density level set estimation problem is to generate an estimate G of the level set based on the n observations {Xi }n 1 , such that the error between the estimate G i= and the target set G , as assessed by some performance mea sure which gauges the closeness of the two sets, is small. Most literature available on level set estimation [SHS05, SN06, SD07, WN07, KT93, Tsy97, Pol95, RV06] considers global error measures related to the symmetric set difference. However some applications may need a more local or spatially uniform error measure as provided by the Hausdorff metric, for example, to ensure robustness to outliers or preserve topological properties of the level set. The Hausdorff error metric is defined as follows between two non-empty sets: d (G1 , G2 ) = max{ sup (x, G1 ), sup (x, G2 )} x G 2 x G 1 1 Introduction Density level sets provide useful summaries of a density function for many applications including clustering [Har75, Stu03], anomaly detection [SHS05, SN06, VV06], and data ranking [LPS99]. In practice, however, the density function itself is unknown a priori and only a finite number of observations from the density are available. Let X1 , . . . , Xn be independent, identically distributed observations drawn from an unknown probability measure P , having density f with respect to the Lebesgue measure, and defined on the domain X Rd . Given a desired density level , consider the -level set of the density f G := {x X : f (x) }. This work was partially supported by the National Science Foundation grants CNS-0519824 and ECS-0529381. where (x, G) = inf yG ||x - y ||, the smallest Euclidean distance of a point in G to the point x. If G1 or G2 is empty, then let d (G1 , G2 ) be defined as the largest distance between any two points in the domain. Control of this error measure provides a uniform mode of convergence as it implies control of the deviation of a single point from the desired set. A symmetric set difference based estimator may not provide such a uniform control as it is easy to see that a set estimate can have very small symmetric difference error but large Hausdorff error. Conversely, as long as the set boundary is not space-filling, small Hausdorff error implies small symmetric difference error. Existing results pertaining to nonparametric level set estimation using the Hausdorff metric [KT93, Tsy97, Cav97] focus on rather restrictive classes of level sets (for example, the boundary fragment and star-shaped set classes). These restrictions, which effectively reduce the set estimation problem to a boundary function estimation problem (in rectangular or polar coordinates, respectively), are typically not met in practical applications. In particular, the characterization of level set estimation as a boundary function estimation problem precludes level sets with multiple connected components, which are fundamental to many applications. Moreover, the estimation techniques proposed in [KT93, Tsy97, Cav97] require precise knowledge of the local regularity of the distribution (quantified by the parameter , to be defined below) in the vicinity of the desired level set in order to achieve minimax optimal rates of convergence. Such prior knowledge is unavailable in most practical ap- plications. Recently, a plug-in method based on sup-norm density estimation was put forth in [CMC06] that can handle more general classes than boundary fragments or starshaped sets, however sup-norm based methods require global smoothness assumptions on the density to ensure that the density estimate is good everywhere. Also, the method only deals with a special case of the density regularity condition considered in this paper ( = 1), and is therefore not adaptive to unknown density regularity. In this paper, we propose a plug-in procedure based on a regular histogram partition that can adaptively achieve minimax optimal rates of Hausdorff error convergence over a broad class of level sets with very general shapes and multiple connected components, without assuming a priori knowledge of the density regularity parameter . Adaptivity is achieved by a new data-driven procedure for selecting the histogram resolution. The procedure is specifically designed for the level set estimation problem and only requires local regularity of the density in the vicinity of the desired level. [B] Level set regularity: There exist constants o > 0 and C3 > 0 such that for all o , I (G ) = and (x, I (G )) C3 for all x G . This assumption states that the level set is not arbitrarily narrow anywhere. It precludes features like cusps and arbitrarily thin ribbons, as well as connected components of arbitrarily small size. This condition is necessary since arbitrarily small features cannot be detected and resolved from a finite sample. However, from a practical perspective, if the assumption fails to hold then it simply means that it is not possible to theoretically guarantee that such small features will be recover ed . [C] Level set boundary dimension: There exists a constant C4 > 0 such that for all x G and all , such that 0 < , the minimum number of -balls required to cover G B (x, ) is C4 ( /)-(d-1) . This assumption is related to the box-counting dimension [Fal90] of the boundary of the level set. It essentially says that, at any scale, the boundary behaves locally like a (d - 1)-dimensional surface in the d dimensional domain and is not space-filling. This condition is not restrictive since the Hausdorff error itself is inappropriate for space-filling curves, and in fact it is not required if the density regularity parameter is known. However, the condition is needed to achieve adaptivity using the proposed method, as we shall discuss later. Let F1 () denotes the class of densities satisfying as sumptions [A, B], and F2 () denotes the class of densities satisfying assumptions [A, B, C]. The dependence on other parameters is omitted as these do not influence the minimax optimal rate of convergence. The classes F1 (), F2 () are a generalization of the Lipschitz boundary fragments or starshaped sets considered in [KT93, Tsy97, Cav97] since assumptions [B, C] basically imply that the boundary looks locally like a Lipschitz function. In fact assumptions [B, C] are satisfied by a Lipschitz boundary fragment or star-shaped set; please refer to Section 5.3 for a formal proof. However, there is a slight difference between the upper bound of assumption [A] here and that employed in [Tsy97, Cav97]. The upper bound assumption in [Tsy97, Cav97] only requires that the set {x : |f (x) - | 0 } be non-empty. So as long as there is at least one point on the boundary where the density regularity assumption [A] holds, this determines the complexity of the class. Our assumption requires the density regularity to hold for an open neighborhood about at least one point on the boundary. This is necessary for adaptivity since a procedure cannot sense the regularity as characterized by unless the regularity holds in a region with positive measure. In [Tsy97], Tsybakov established a minimax lower 1 bound of (n/ log n)- d+2 for the class of Lipschitz starshaped sets, which satisfy our assumptions [B, C] (see Section 5.3) and the slightly modified version of assumption [A], as discussed above. His proof uses Fano's lemma to derive the lower bound for a discrete subset of densities from this class. It is easy to see that the discrete subset of densities 2 Density assumptions In this paper, we assume that the density f is supported on the unit hypercube in d-dimensions, that is X = [0, 1]d , and is bounded with range [0, fmax ]. Controlling the Hausdorff accuracy of level set estimates also requires some smoothness assumptions. The most crucial assumption is the first, which characterizes the relationship between distances and changes in density. The last two are topological assumptions on the level set and essentially generalize the notion of Lipschitz functions to closed hypersurfaces. Here we define an -ball centered at a point x as B (x, ) = {y X : ||x - y || }, where || · || denotes the Euclidean distance. Also, an inner -cover of a set G is defined as the union of all -balls contained in G. Formally, x I (G) = :B (x,)G B (x, ). [A] Local density regularity: The density is -regular around the -level set, 0 < < and < fmax , if there exist constants C2 > C1 > 0 and 0 , 1 > 0 such that C1 (x, G ) |f (x) - | C2 (x, G ) for all x X with |f (x) - | 0 , where G is the boundary of the true level set G . And there exists y0 G such that for all x B (y0 , 1 ), |f (x) - | 0 . This assumption is similar to the one used in [Tsy97, Cav97] (we elaborate on the differences later on). The regularity parameter determines the rate of error convergence for level set estimation. Accurate estimation is more difficult at levels where the density is relatively flat (large ), as intuition would suggest. In this paper, we do not assume knowledge of unlike previous investigations into Hausdorff accurate level set estimation [KT93, Tsy97, Cav97, CMC06]. Therefore, here the assumption simply states that there is a relationship between distance and density level, but the precise nature of the relationship is unknown. used in his construction also satisfy our form of assumption [A]. Hence, the same lower bound holds for the classes F1 () and F2 () under consideration as well and we have the following proposition. Here E denotes expectation with respect to the random data sample. Proposition 1 There exists c > 0 such that inf Gn f F () 1 sup E[d (Gn , G )] inf Gn f F () 2 sup n c E[d (Gn , G )] 1 - d+2 factor. Hence the proposed estimator can achieve near minimax optimal rates, given knowledge of the density regularity. We would like to point out that if the parameter 0 characterizing assumption [A] and the density bound fmax are also known, thc n the appropriate resoution can be chosen e l as j = log2 -1 (n/ log n)1/(d+2) , where the constant c c(0 , fmax ). With this choice, the optimal sidelength scales as 2-j (n/ log n)-1/(d+2) , and the estimator Gj exactly achieves the minimax optimal rate. log n . In the paper, we present a method that achieves this min imax lower bound for the class F1 (), given knowledge of the density regularity parameter . We also extend the method to achieve adaptivity to for the class F2 () under the additional assumption [C], while preserving the minimax optimal rate of convergence. Here the inf is taken over all possible set estimators Gn . 3.1 Adaptivity to unknown local density regularity In this section we present a procedure that automatically selects the appropriate resolution j without prior knowledge of . The selected resolution needs to be adapted to the local regularity of the density around the level of interest. To achieve this, we propose the following vernier: ¯ V ,j = min max | - f (A )|. AAj A Aj A 3 Proposed method In this section, we propose a plug-in level set estimator that is based on a regular histogram. The histogram resolution is adaptively selected in a purely data-driven way without assuming knowledge of the local density regularity. Let Aj denote the collection of cells in a regular partition of [0, 1]d into hypercubes of dyadic sidelength 2-j , where j is a non-negative integer. The estimator at this resolution is given as Gj = {A Aj : f (A) }. (1 ) n 1 Here f (A) = P (A)/µ(A), where P (A) = n i=1 I{Xi A} denotes the empirical probability of an observation occurring in A and µ is the Lebesgue measure. Our first result shows that, if the density regularity parameter is known, then the correct resolution can be chosen (as in [Tsy97, Cav97]), and the corresponding estimator achieves near minimax optimal rate over the class of densi ties given by F1 (). Here E denotes expectation with respect to the random data sample. We introduce the notation an bn to denote that an = O(bn ) and bn = O(an ). Theorem 1 Assume that the local density regularity is known. Pick resolution j such that 2-j 1 - (d+2) , where sn is a monotone diverging sesn (n/ log n) quence. Then 1 n - d+2 sup E[d (Gj , G )] C sn log n f F1 () for all n, where C C (C1 , C3 , o , fmax , 0 , d, ) > 0 is a constant. The proof is given in Section 5.1. Theorem 1 provides an upper bound on the Hausdorff error of our estimate. If sn is slowly diverging, for example if sn = (log n) where > 0, this upper bound agrees with the minimax lower bound of Proposition 1 up to a (log n) ¯ Here f (A) = P (A)/µ(A), and j = j + log2 sn , where sn is a slowly diverging monotone sequence, for example log n, log log n, etc. Hence Aj A denotes the collection of sub cells with sidelength 2-j [2-j /sn , 2-j +1 /sn ) within the cell A. The vernier focuses on cells A at resolution j that intersect the boundary (have smallest density deviation from the desired level ), and then evaluates the deviation in average density within subcells of A to judge whether or not the density is uniformly close to over the cell. Thus, the vernier is sensitive to the local density regularity in the vicinity of the desired level and in fact minimizing the vernier leads to selection of the appropriate resolution adapted to the unknown density regularity parameter . By choosing sn with arbitrarily slow divergence, it is possible to get arbitrarily close to the optimal rate of convergence in the Hausdorff sense. However, note that the vernier may not function properly if the boundary of G passes through every subcell of A (since then the subcell averages may be arbitrarily close to irrespective of the density regularity). Assumption [C] precludes this possibility at sufficiently high resolutions. Since V ,j requires knowledge of the unknown probability measure, we must work with the empirical version, defined analogously as: V ,j = min AAj A Aj A max We propose a complexity regularization scheme wherein the empirical vernier V ,j is balanced by a penalty term: j w := AAj | - f (A )|. max 8 log( 2 16 ) max nµ(A) j (d+1) f log( 2 16 ) (A), 8 nµ(A) j (d+1) here 0 < < 1 is a confidence parameter, and µ(A) = 2-j d . Notice that the penalty is computable from the given observations. The precise form of is chosen so that minimizing the empirical vernier plus penalty provides control over the true vernier (refer to Section 5.2 for a formal proof). The final level set estimate is given by G = Gj (2 ) where j = arg min 0j J Thus the search is focused on regular partitions of dyadic sidelength 2-j , j {0, 1, . . . , J }. The choice of J will be specified below. Observe that the value of the empirical vernier decreases with increasing resolution as better approximations to the true level are available. On the other hand, the penalty is designed to increase with resolution to penalize high complexity estimates that might overfit the given sample of data. Thus, the above procedure chooses the appropriate resolution automatically by balancing these two terms. We now establish that our complexity penalized procedure leads to minimax optimal rates of convergence without requiring prior knowledge of any parameters. Theorem 2 Pick J J (n) such that 2-J 1 sn (n/ log n)- d , where sn is a monotone diverging sequence. Let j denote the resolution chosen by the complexity penalized method as given by Eq. (3), and G denote the final estimate of Eq. (2). Then with probability at least 1 - 3/n, for all densities in the class F2 (), 1 1 n - d+2 n - d+2 d d d+2 d+2 -j c1 sn 2 c2 sn sn log n log n for n large enough (so that sn > c(C4 , d)), where c1 , c2 > 0 are constants. In addition, 1 - d+2 n 2 sup E[d (G, G )] C sn log n f F2 () V ,j + j ( 3) straightforward consequence of assuming a functional form for the boundary. We would also like to comment that while we only addressed the density level set problem in this paper, extensions to general regression level set estimation should be possible using a similar approach. The complexity regularization approach (Eq. 3) based on the vernier is similar in spirit to the so-called Lepski methods (for example, [LMS97]) for function estimation which are spatially adaptive bandwidth selectors, however the vernier focuses on cells close to the desired level and thus is specifically tailored to the level set problem. The vernier provides the key to achieve adaptivity while requiring only local regularity of the density in the vicinity of the desired level. In this paper, we assume that the density regularity is the same everywhere along the level set. This might be somewhat restrictive, particularly if the level set consists of multiple components. Adaptivity to spatial variations in the density regularity can be achieved using a spatially adapted partition instead of a regular histogram partition. This might be possible by developing a tree-based approach or a modified Lepski method, and is the subject of current research. 5 Proofs Before proceeding to the proofs, we establish two lemmas that are used throughout. The first one establishes a bound on the deviation of true and empirical density averages. Lemma 1 Consider 0 < < 1. With probability at least 1 - , the following is true for all dyadic resolutions j : AAj for all n, where C C (C1 , C2 , C3 , C4 , o , fmax , 0 , 1 , d, ) > 0 is a constant. The proof is given in Section 5.2. 1 The maximum resolution 2J s-1 (n/ log n) d can be easn ily chosen, based only on n, and allows the optimal resolution for any to lie in the search space. Observe that by appropriate choice of sn , for example sn = (log n)/2 with a small number > 0, the bound of Theorem 2 matches the minimax lower bound of Proposition 1, except for an additional (log n) factor. Hence our method adaptively achieves near minimax optimal rates of convergence for the class F2 (). an d Proof: The proof relies on a pair of VC inequalities (See [DL01] Chapter 3) that bound the relative deviation of true and empirical probabilities. For the collection Aj with shatter coefficient bounded by 2j d , the relative VC inequalities state that for any > 0 s 2 P (A) - P (A) P P up 4 · 2j d e-n /4 > (A) AAj P (A) - P (A) 2 P P sup > 4 · 2j d e-n /4 . AAj (A) P P ¯ max |f (A) - f (A)| j . 4 Concluding Remarks In this paper, we propose a Hausdorff accurate level set estimation method that is adaptive to unknown local density regularity and achieves minimax optimal rates of error convergence over a very general classes of level sets. The analysis in this paper assumes > 0, however the case 0 that allows jumps in the density can also be handled (see [SSN07]), but is omitted here to keep the presentation and proofs simpler. Also, this paper considers locally Lipschitz boundaries, however extensions to additional boundary smoothness (for example, Holder regularity > 1) may be possible in ¨ the proposed framework using techniques such as wedgelets [Don99] or curvelets [CD99]. The earlier work on Hausdorff accurate level set estimation [KT93, Tsy97, Cav97] does address higher smoothness of the boundary but that follows as a Also observe that (A) = P (A) 2 max(P (A), 22 ). To see the first statement, consider 1) P (A) 42 - The statement is obvious. 2) P (A) > 42 - This gives a bound on , which implies P (A) P (A) + P (A)/2 and hence P (A) 2P (A). The second statement follows similarly. Using these statements and the relative VC inequalities for the collection Aj , we have: With probability > 1 - 8 · 2 2j d e-n /4 , A Aj both 2 P P (A) P (A) - (A) max(P (A), 22 ) P (A) P (A) + P (A) P (A) + (A) = P (A) 2 max(P (A), 22 ) and P (A) - P (A) 4 Setting = log(2j d 8/j )/n, we have with probability > 1 - j , A Aj |P (A) - P (A)| 8 T P log(2j d 8/j ) log(2j d 8/j ) max (A), 8 n n he result follows by dividing the result by µ(A), setting j = 2-(j +1) and taking union bound. The next lemma states how the density deviation bound or penalty j scales with resolution. This will be used to derive rates of convergence. Lemma 2 Fo1 all resolutions such that 2j r = O((n/ log n) d ), there exist constants c3 , c4 c4 (fmax , d) > 0 such that for all n, with probability at least 1 - 1/n, 2 2 log n log n jd jd j c4 c3 n n Proof: The lower bound follows by observing that A 1= P (A) maxP (A)×|Aj | = max AAj AAj µ(A) P (A) 2 max(P (A), 22 ) Proof: Let J0 = log2 4 d/o , where o is as defined in assumption [B]. Also define 1 . -j j j := + d2 C1 Consider two cases: I. j < J0 . For this case, since the domain X = [0, 1]d , we use the trivial bound d (Gj , G ) d 2J0 ( d2-j ) 8 d-1 j . o The last step follows by choice of J0 and since j , C1 > 0. II. j J0 . Observe that assumption [B] implies that G is not empty since G I (G ) = for o . We will show that for large enough n, with high probability, Gj G = for j J0 and hence Gj is not empty. Thus the Hausdorff error is given as (4 ) and we need bounds on the two terms in the right hand side. We now prove that Gj is not empty and obtain bounds on the two terms in the Hausdorff error. Towards this end, we establish two propositions. The first proposition proves that for large enough n, with high probability, the distance of all points that are erroneously excluded or included in the level set estimate, from the true set boundary is bounded by j . Notice that, if Gj is non-empty, this provides an upper bound on the second term of the Hausdorff error (Eq. 4). The second proposition establishes that, for large enough n and j J0 , 2j o and hence the inner cover I2j (G ) is not empty. And using the first proposition, with high probability, I2j (G ) contains points that are correctly in cluded in the level set estimate and lie in Gj G . Thus Gj is not empty. Further, along with assumption [B], this provides a bound of j on the distance of any point in G from the estimate Gj , thus bounding the first term of the Hausdorff error (Eq. 4). We end the proof of the two propositions with a white box to indicate that these propositions are included within the proof of Lemma 3, and do not signify end of the proof of Lemma 3. Proposition 2 If Gj G = , then for resolutions sat isfying 2j = O(s-1 (n/ log n)1/d ) and n n1 (fmax , n d, 0 ) with probability at least 1 - 2/n 1/ j + d2-j = j . sup (x, G ) C1 x G j G Aj P (A) d (Gj , G ) = max{ sup (x, Gj ), sup (x, G )}, x G x G j Dividing by µ(A) = 2-j d , using density bound fmax , setting j = 2-(j +1) and taking union bound, we have with probability > 1 - , for all dyadic resolutions j f . log(2j (d+1) 16/ ) max f (A) 2 max max , 2j d 8 AAj n This implies the upper bound using = 1/n and 2j = O((n/ log n)1/d ). and using = 1/n, j 0 and µ(A) = 2-j d . To get an upper bound, using the same arguments as in proof of Lemma 1 based on the relative VC inequality it follows that [SSN07] with probability > 1 - j , for all A Aj P . log(2j d 8/j ) P (A) 2 max (A), 8 n = maxf (A) AAj 5.1 Proof of Theorem 1 The proof relies on the following lemma that will also be used in the proof of Theorem 2. Lemma 3 Consider densities satisfying assumptions [A] and [B]. Then for all resolutions such that 2j = 1 O(s-1 (n/ log n) d ), where sn is a monotone diverging sen quence, and n n0 n0 (fmax , d, 0 , o , C1 , ) with probability at least 1 - 3/n 1 . -1 -j j d (Gj , G ) max(2C3 +3, 8 do ) + d2 C1 Proof:Since by assumption Gj G = , consider x Gj G . Let Ax Aj denote the cell containing x at resolution j . Consider two cases: (i) Ax G = . This implies that (x, G ) d2-j . (ii) Ax G = . Since x Gj G , it is erro neously included or excluded from the level set ¯ estimate Gj . Therefore, if f (Ax ) , then f (Ax ) < otherwise if f (Ax ) < , then ¯ f (Ax ) . This implies that | - f (Ax )| ¯ ¯(Ax ) - f (Ax )|. Using Lemma 1, we get | - |f ¯ f (Ax )| j with probability at least 1 - . Now let x0 be any point in Ax such that | - ¯ f (x0 )| | - f (Ax )| (Notice that at least one such point must exist in Ax since this cell does not intersect the boundary). As argued above, ¯ | - f (Ax )| j with probability at least 1 - 1/n (for = 1/n) and using Lemma 2, j decreases with n for resolutions satisfying 1 2j = O(s-1 (n/ log n) d ) with probability at least n 1 - 1/n. So for large enough n n1 (fmax , d, 0 ), j 0 and hence | - f (x0 )| 0 . Thus, the density regularity assumption [A] holds at x0 with probability > 1 - 2/n and we have | 1 - f (x0 )| (x0 , G ) C1 1 1 | ¯ - f (Ax )| j . C1 C1 Since x, x Ax , (x, G ) (x0 , G ) + -j 0 d2 . Therefore, 1/ j (x, G ) + d2-j . C1 So for both cases, we can say that for resolutions satisfying 2j = O(s-1 (n/ log n)1/d ) and n n1 (fmax , n d, 0 ) with probability at least 1 - 2/n, x Gj G 1/ j (x, G ) + d 2 -j = j . C1 P roposition 3 Recall assumption [B] and denote the inner cover of G with 2j -balls, I2j (G ) I2j . For resolutions satisfying 2j = O(s-1 (n/ log n)1/d ), n j J0 and n n0 n0 (fmax , d, 0 , o , C1 , ), with probability at least 1 - 3/n, P of:Observe that for j J0 , 2 d2-j ro 2 d2-J0 o /2. And using Lemma 2 for large enough n n2 n2 (o , fmax , C1 , ), Gj = and xI2j 2(j /C1 )1/ o /2 with probability at least 1 - 1/n. Therefore for all j J0 and n n2 , 2j o with probability at least 1 - 1/n and hence I2j = . Now consider any 2j -ball in I2j . Then the distance of all points in the interior of the concentric j -ball from the boundary of I2j , and hence from the boundary of G is greater than j . As per Proposition 2 for n n0 = max(n1 , n2 ), with probability > 1 - 3/n, none of these points can lie in Gj G , and hence must lie in Gj G since they are in I2j G . Therefore, N Gj = an d xI2j sup (x, Gj G ) j . ow since G and Gj are non-empty sets, we bound the two terms that contribute to the Hausdorff error x G For this, we will use Propositions 2 and 3, hence all the following statements will hold for resolutions satisfying 2j = O(s-1 (n/ log n)1/d ), j J0 and n n n0 n0 (fmax , d, 0 , o , C1 , ), with probability at least 1 - 3/n. To bound the second term, observe that (i) If Gj \ G = , then supxGj (x, G ) = 0. x G j sup (x, Gj ) an d x G j sup (x, G ). (ii) If Gj \ G = , it implies that Gj G = . Hence, using Proposition 2, we have sup (x, G ) = sup = Thus, for either case x G j x G j \G (x, G ) sup x G j \G (x, G ) (x, G ) j . sup x G j G sup (x, G ) j . (5 ) To bound the first term, observe that (i) If G \ Gj = , then supxG (x, Gj ) = 0. x G (ii) If G \ Gj = , we proceed as follows: sup (x, Gj ) sup (x, Gj G ) x G xI2j sup (x, Gj G ) j . = max{ sup (x, Gj G ), xG \I2j max{j , sup (x, Gj G )} xG \I2j The last step follows using Proposition 3. sup (x, Gj G )}. Now consider any x G \ I2j . Then using tri angle inequality, y G and z I2j , (x, Gj G ) (x, y ) + (y , z ) + (z , Gj G ) (x, y ) + (y , z ) + j . The last step follows using Proposition 3. This implies that y G , (x, Gj G ) (x, y ) + inf (y , z ) + j z I2j y G (x, y ) + (y , z ) + sup (z , Gj G ) z I2j Since the chosen resolution 2-j sn (n/ log n)- (d+2) satisfies conditions of Lemma 3, proof of Theorem 1 now follows using the bound on j from Lemma 2. Let denote the event such that the bounds of Lemma 2 and Lemma 3 ¯ hold. Then for n n0 , P () 4/n. Hence for all n, ¯ P () max(4, n0 )/n. So f F1 (): (Here C may denote a different constant from line to line. Explanation for each step is provided after the equations.) E[d (Gj , G )] = ¯ ¯ P ()E[d (Gj , G )|] + P ()E[d (Gj , G )|] ¯ E[d (Gj , G )|] + P () d 1 / d j C + d 2 -j + C1 n 2 1 2 1 j d log n C max , 2 -j , n n 1 - d+2 n C (C1 , C3 , o , fmax , 0 , d, )sn . log n 1 = (x, y ) + (y , I2j ) + j (x, y ) + sup (y , I2j ) + j (x, y ) + 2C3 j + j . Here the last step invokes assumption [B]. This in turn implies that (x, Gj G ) y G inf (x, y ) + (2C3 + 1)j The second step is true for x G \ I2j . If it was not true, then y G , (x, y ) > 2j and hence there exists a closed 2j -ball around x that is in G . This contradicts the fact that x I2j . Therefore, we have: sup xG \I2j (x, Gj G ) (2C3 + 3)j . 2j + (2C3 + 1)j . The second step follows using the trivial bounds P () ¯ 1 and since the domain X = [0, 1]d, E[d (Gj , G )|] d. The third step follows from Lemma 3 and the fourth one using Lemma 2. The last step follows since the chosen 1 resolution 2-j sn (n/ log n)- (d+2) . 5.2 Proof of Theorem 2 To analyze the resolution chosen by the complexity penalized procedure of Eq. (3) based on the vernier, we first establish two results regarding the vernier. Using Lemma 1, we have the following corollary that bounds the deviation of true and empirical vernier. Corollary 1 Consider 0 < < 1. With probability at least 1 - , the following is true for all dyadic resolutions j : And going back to the start of case (ii) we get: x G Therefore, for either case we have x G sup (x, Gj ) (2C3 + 3)j . (6 ) From Eq. (5) and (6), we have that for all densities satisfying assumptions [A, B], for resolutions satisfying 2j = O(s-1 (n/ log n)1/d ), j J0 and n n0 n n0 (fmax , d, 0 , o , C1 , ), with probability > 1 - 3/n, d (Gj , G ) = max{ sup (x, Gj ), sup (x, G )} x G x G j sup (x, Gj ) (2C3 + 3)j . Proof: Let A0 Aj denote the cell achieving the min defining V ,j and A1 Aj denote the cell achieving the min defining V ,j . Also let A and A denote the subcells at 0 1 resolution j within A0 and A1 , respectively, that have maximum average density deviation from . Similarly, let A and 0 A1 denote the subcells at resolution j within A0 and A1 , respectively, that have maximum empirical density deviation from . Then we have: (Explanation for the steps are given after the equations.) V , j - V , j = = ¯ | - f (A )| - | - f (A )| 0 1 ¯(A )| - | - f (A )| | - f 1 1 1 1 |V ,j - V ,j | j . (2C3 + 3)j . And addressing both Case I (j < J0 ) and Case II (j J0 ), we finally have that for all densities satisfying assumptions [A, B], for resolutions sat1 isfying 2j = O(s-1 (n/ log n) d) and n n0 n n0 (fmax , d, 0 , o , C1 , ), with probability > 1 - 3/n, d (Gj , G ) max(2C3 + 3, 8 d-1 )j . o AAj j ¯ ¯ max{f (A ) - f (A ), f (A ) - f (A )} 1 1 1 1 ¯ max |f (A) - f (A)| ¯ |f (A ) - f (A )| 1 1 ¯ ¯ max{f (A ) - f (A ), f (A ) - f (A )} 1 1 The first inequality invokes definition of A0 , the third inequality invokes definitions of the subcells A , A , and the 1 1 last one follows from Lemma 1. Similarly, V , j - V , j = ¯ | - f (A )| - | - f (A )| 1 0 ¯ | - f (A0 )| - | - f (A )| 0 ¯ |f (A ) - f (A )| 0 0 We establish this statement formally later on, but for now assume that it holds. The local density regularity condition [A] now gives that for all x A , | - f (x)| o min(0 , C1 2-j ) min(0 , C1 )2-j . So we have: A AAj max ¯ ¯ | - f (A )| | - f (A )| min(0 , C1 )2-j o Here the first inequality invokes definition of A1 . The rest follows as above, considering cell A0 instead of A1 . The second result establishes that the vernier is sensitive to the resolution and density regularity. Lemma 4 Consider densities satisfying assumptions [A] and [C]. Recall that j = j + log2 sn , where sn is a monotone diverging sequence. Then for all dyadic resolutions j min(0 , C1 )2-j V ,j C ( d2-j ) holds for n large enough such that sn > 4C4 6d . Here C C (C2 , fmax , 1 , )> 0. Since this is true for any A Aj , in particular, this is true for the cell achieving the min defining V ,j . Hence, the lower bound on the vernier V ,j follows. We now formally prove that assumption [C] on the level set boundary dimension implies that for large enough n (so that sn > 4C4 6d ), A A Aj s.t. x Ao , o (x, G ) 2-j . Observe that it suffices to show that for large enough n, A A Aj -2 s.t. A G = . To prove this last statement, consider two cases: (i) A G = . For sn 8, j - 2 j (recall defini tion of j ), and since A does not intersect the boundary, clearly A A Aj -2 s.t. A G = . (ii) A G = . Let x A G . Consider = d2-j (the diagonal l gth of a cell), then A B (x, ). en Also let = d2-(j -2) /2 (the choice will be justified below). For sn 4, 0 < and using assumption [C], the minimum number of -balls required to cover G B (x, ) is C4 ( /)-(d-1) . Since A B (x, ), the minimum number of -balls required to cover G A is also C4 ( /)-(d-1) . Now consider a uniform partition of the cell A into sub cells of sidelength 2 / = 2-(j -2) . Since the diagd onal length of a subcell d2-(j -2) = 2 , this choice of implies that a subcell at resolution 2-(j -2) is inscribed within an aligned -ball. Observe that at this resolution, in d-dim, an unaligned -ball can intersect up to 3d - 1 subcells (number of neighbors of any hypercube). Therefore, the number of subcells in A Aj -2 that intersect the boundary can be no more than -(d-1) d2-(j -2) d -(d-1) d 3 C4 ( /) = 3 C4 2 d2-j = < C4 6d (j -2-j )d -(j -2-j ) 2 2 2 4C4 6d (j -2-j )d 2 sn Proof: We first establish the upper bound. Recall assumption [A] and consider the cell A Aj that contains the point y0 . Then A G = . Let A denote the subcell at resolu tion j within A that has maximum average density deviation from . Consider two cases: (i) If the resolution is large enough so that d2-j 1 , then the density regularity assumption [A] holds x A since A B (y0 , 1 ), the 1 -ball around y0 . The same holds also for the subcell A . Hence ¯ | - f (A )| C2 ( d2-j ) (ii) If the resolution is not large enough and d2-j > 1 , the following trivial bound holds: fmax ¯ | - f (A )| fmax ( d2-j ) 1 -j The last step holds since d2 > 1 . Hence we can say for all j there exists A Aj such that C ( fmax -j ¯(A )| max | - f d2 ) 2, 1 This yields the upper bound on the vernier: C ( fmax -j V ,j max , d2 ) := C ( d2-j ) 2 1 where C C (C2 , fmax , 1 , ). For the lower bound, consider a cell A Aj . We will show that assumption [C] on the level set boundary dimension basically implies that the boundary does not intersect all subcells at resolution j within the cell A at resolution j . And in fact for large enough n (so that 2-j is small enough, recall that j = j + log2 sn where sn is a monotone diverging sequence), there exists at least one subcell A A Aj o such that x A , o (x, G ) 2-j . where the last step uses the fact 2-j < 2-j +1 /sn . For sn > 4C4 6d , the number of subcells within A at resolution j - 2 that intersect the boundary is less than the total number of subcells within A at that resolution. Therefore, A A Aj -2 s.t. A G = . This in turn implies that for n large enough (so that sn > 4C4 6d ), A A Aj such that x A , (x, G ) o o 2 -j . We are now ready to prove Theorem 2. Observe that Lemmas 2, 3 and Corollary 1 hold together with probability at least 1 - 5/n (taking = 1/n). Using these lemmas, we will show that for the resolution j chosen by Eq. (3), both d V ,j and j are upper bounded by C sn+2 (n/ log n)- d+2 , where C C (C2 , fmax , 1 , d, ) > 0. If this holds, then using Lemma 4 and the definition of j , we have the following upper bound on the sidelength: For sn > 4C4 6d V 1 ,j -j -j 2 sn 2 sn min(0 , C1 ) 1 n - d+2 d d c2 sn sn+2 , log n d 1 n - d+2 d d terms for optimal resolution 2-j sn+2 log n . This establishes the desired bounds on V ,j and j . Now we can invoke Lemma 3 to derive the rate of convergence for the Hausdorff error. Consider large enough n n1 (C4 , d) so that sn > 4C4 6d . Also, recall that the condition of Lemma 3 requires that n n0 (fmax , d, 0 , o , C1 , ). Pick n max(n0 , n1 ) and let denote the event such that the bounds of Lemma 2, Lemma 3 and Corollary ¯ 1 hold with = 1/n. Then, we have P () 5/n for ¯ ) max(5, n0 , n1 )/n. n max(n0 , n1 ), or for all n, P ( So f F2 (), we have: (Here C may denote a different constant from line to line. Explanation for each step is provided after the equations.) where c2 c2 (C1 , C2 , fmax , 0 , 1 , d, ) > 0. Also notice that since 2J s-1 (n/ log n)1/d , we have 2j 2J n sn 2J (n/ log n)1/d , and hence Lemma 2 can be used to provide a lower bound on the sidelength: 1 2 -d sn - j sn j n -j 2> 2 2 2 c2 log n 3 1 s -d 2 n - d+ 2 2d n d c1 sn n+2 log n log n 1 -2 n d+ d d = c1 sn+2 , log n where c1 c1 (C2 , fmax , 1 , d, ) > 0. So we have for sn > 4C4 6d , with probability at least 1 - 5/n, 1 1 n - d+2 n - d+2 d d d+2 d+2 -j c1 sn 2 c2 sn sn . log n log n (7 ) Hence the automatically chosen resolution behaves as desired. Let us now derive the claimed bounds on V ,j and j . Using Corollary 1 and Eq. (3), we have the following oracle inequality: V , j = V + j V min ,j + j ,j E[d (G, G )] ¯ ¯ = P ()E[d (G, G )|] + P ()E[d (G, G )|] ¯ E[d (G, G )|] + P () d 1/ -j j d + d2 + C C1 n 2 1 2 j1 j d log n C max , 2- , n n 1 1 - d+2 n - d+2 n d d+2 2 C sn s n C sn . log n log n Here C C (C1 , C2 , C3 , C4 , o , fmax , 0 , 1 , d, ). The second step follows by observing the trivial bounds P () ¯ 1 and since the domain X = [0, 1]d , E[d (G, G )|] d. The third step follows from Lemma 3 and the fourth one from Lemma 2. The last step follows using the upper j and lower bounds established on 2- in Eq. (7). 5.3 Star-shaped sets satisfy assumptions [B] and [C] Recall the definition of FS L as defined in [Tsy97]. The class corresponds to densities bounded above by fmax , satisfying a slightly modified form of the local density regularity assumption [A]: 0j J Here C C (C2 , fmax , 1 , d, ). The second step uses the definition of j and the last step follows by balancing the two Lemma 4 provides an upper bound on the vernier V ,j , and Lemma 2 provides an upper bound on the penalty j . We now plug these bounds into the oracle inequality. Here C may denote a different constant from line to line. 2 2 log n -j j d V ,j V ,j + j C min + 0j J n m2 2 log n j d sd C min a x -j , n 0j J n n - d+2 d d C sn+2 . log n 0j J min {V ,j + 2j } [A'] Local density regularity: The density is -regular around the -level set, 0 < < and < fmax , if there exist constants C2 > C1 > 0 and 0 > 0 such that C1 (x, G ) |f (x) - | C2 (x, G ) for all x X with |f (x) - | 0 , where G is the boundary of the true level set G , and the set {x : |f (x) - | 0 } is non-empty. and the densities have level sets of the form G = {(r, ); [0, )d-2 × [0, 2 ), 0 r g () R}, where (r, ) denote the polar/hyperspherical coordinates and R > 0 is a constant. g is a periodic Lipschitz function that satisfies g () h, where h > 0 is a constant, and |g () - g ()| L|| - ||1 , , [0, )d-2 × [0, 2 ). Lemma 5 Consider the level set G of a density f FS L . Then G satisfies the assumptions [B] and [C] on the level set regularity and the level set boundary dimension, respectively. Here L > 0 is the Lipschitz constant, and || · ||1 denotes the 1 n o rm. We set R = 1/2 in the definition of the star-shaped set so that the domain is a subset of [-1/2, 1/2]d. With this domain, the following lemma shows that the level set G of a density f FS L satisfies [B] and [C]. The second inequality follows as 1 1 6 sin-1 h- o o since h-o h-o 1 by choice of o h/3. The 2 third inequality is true since sin-1 (z /2) z for 0 z /2. The last step follows by choice of o h/3. Proof: We first present a sketch of the main ideas, and then provide a detailed proof. Consider the -level set G of a density f FS L . To see that it satisfies [B], divide the star-shaped set G into sectors of width so that each sec tor contains at least one -ball and the inner cover I (G ) touches the boundary at some point(s) in each sector. Now one can argue that, in each sector, all other points on the boundary are O() from the inner cover since the boundary is Lipschitz. Since this is true for each sector, we have x G , (x, I (G )) = O(). To see that G satis fies [C], consider any sector of width and divide it into sub-sectors of width O( ), 0 < . Since the boundary is Lipschitz, a constant number of -balls can cover the boundary in each sub-sector. Thus, the minimum number of -balls needed to cover the boundary in all sub-sectors is of the order of the minimum number of sub-sectors, that is, O((/ )d-1 ). Hence, the result follows. We now present the proof in detail. To see that G satisfies [B], fix o h/3. Then for all o , B (0, ) G (since g () h > o ), and hence I (G ) = . We also need to show that C3 > 0 such that for all x G , (x, I (G)) C3 . For this, divide G into M d-1 sectors indexed by m = (m1 , m2 , . . . , md-1 ) { 1 , . . . , M } d -1 ( Sm = r, ) : 0 r g (), mi (mi - 1) i < , i = 1, . . . , d - 2, M M , 2 (md-1 - 1) 2 md -1 d-1 < M M where = (1 , 2 , . . . , d-1 ). Let T M= 2 sin-1 h- o Now from (i) above, each sector contains at least one -ball. Consider any m {1, . . . , M }d-1 . We claim that there exists a point xm G Sm , xm = (g ( ), ) for some [0, )d-2 × [0, 2 ), such that (xm , I (G )) = 0. Suppose not. Then one can slide the -ball within the sector towards the periphery and never touch the boundary, implying that the set G is unbounded. This is a contradic tion by the definition of the class FS L . So now we have, y G Sm , y = (g (), ) (y , I (G )) = (y , xm ) = ||y - xm || ||(g (), ) - (g ( ), )|| g |g () - g ( )| + 2 ()g () · d -1 s i i - i in 2 =1 L|| - ||1 + = (L + 1/2) d -1 i =1 d -1 i =1 |i - i | 2 |i - i | (L + 1/2)d M 9d(L + 1/2) := C3 h his choice of M implies that: (i) There exists an -ball within Sm B (0, h) for every m {1, . . . , M }d-1, and hence within each sector Sm . This follows because the minimum angular width of a sector with radius h required to fit an -ball within is 2 sin-1 2 sin-1 . h- h - o M (ii) The angular-width of the sectors scales as O(). 1 < = 1 1 M -1 - 2 sin-1 2 sin-1 h-o h-o The third step follows using simple algebra (see [SSN07]), the fourth step follows by the Lipschitz condition on g (·), g (·) R = 1/2 and since | sin(z )| |z |. The sixth step follows since x, y Sm and hence |i - i | /M for i = 1, . . . , d - 2 and |d-1 - d-1 | 2 /M . The last step invokes (ii) above. Therefore, we have for all y G Sm , (y , I (G )) C3 . And since the result is true for any sector, condition [B] is satisfied by any level set G with density f FS L . To see that G satisfies [C], consider x G . Let x = (g (0 ), 0 ). Also let i = min{i : (g (), ) B (x, )} (2) and i = max{i :(g (), ) B (x, )}. Define the sector O x S = {(r, ) : 0 r g (), (1) i (1) i i , i = 1, . . . , d - 1 (2) 3 sin-1 9 6 h - o h - o h x bserve that if h/4 < h, the width of S in the ith (2) (1) -1 coordinate, i = i - i 2 sin b y co n g ( 0 ) struction. Since g (·) h, we have i 2 sin-1 h 4/h, where the last step follows since for 0 z /2, sin-1 (z /2) z . If > h/4, then use the trivial bound i 2 8/h. Equivalently, we can say for all and all i, i 8/h. (8 ) x d -1 Further subdivide S into M sub-sectors indexed by m = (m1 , . . . , md-1 ) ( (mi - 1)i (1) i Sm = r, ) : 0 r g (), i + M P mi i (1) < i + , i = 1, . . . , d - 1 M ick M such that for all coordinates, the sub-sector width i 2 M (d-1)(L+1/2) , where 0 < . With this choice of sub-sector width, Sm G can be covered by a -ball. To see this, consider two points in Sm G - (g (), ) and (g (), ). Proceeding as before, we have: ||(g (), ) - (g ( ), )|| (L + 1/2) (L + 1/2) d -1 i =1 d -1 i =1 [CMC06] [ DL 0 1 ] [ Do n 9 9 ] [Fal90] [ Ha r 7 5 ] [ KT 9 3 ] [LMS97] |i - i | i 2. M [LPS99] Since each sub-sector can be covered by a -ball, the minimum number of -balls needed to cover B (x, ) G is equal to the minimum number of sub-sectors needed (M d-1 ). This corresponds to the smallest M such that 2 maxi i (d-1)(L+1/2) . Therefore, minimum number M of -balls needed to cover B (x, ) G is equal to ( d -1 d - 1)(L + 1/2) maxi i 2 ( d -1 d - 1)(L + 1/2) maxi i +1 2 d -1 2 (d - 1)(2L + 1) + h 2 d -1 d -1 (d - 1)(2L + 1) +1 h d -1 := C4 The second inequality follows since from Eq. (8), i 8 h for all i, and since . Therefore, any level set G with density f FS L also satisfies [C]. [Pol95] [RV06] [SD07] [SHS05] [SN06] [SSN07] [Stu03] [Tsy97] [ VV0 6 ] [WN07] Acknowledgements The authors would like to thank Rui Castro for helpful discussions and carefully reviewing the paper. References [Cav97] [CD99] L. Cavalier. Nonparametric estimation of regression level sets. Statistics, 29:131­160, 1997. E. Candes and D. L. Dohono. Curvelets: A sur´ prisingly effective nonadaptive representation for objects with edges. Curves and Surfaces, Larry Schumaker et al., Ed. Vanderbilt University Press, Nashville, TN, 1999. A. Cuevas, W. G. Manteiga, and A. R. Casal. Plug-in estimation of general level sets. Aust. N. Z. J. Stat., 48(1):7­19, 2006. L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer, NY, 2001. D. L. Donoho. Wedgelets: Nearly-minimax estimation of edges. Ann. Statist., 27:859­897, 1999. K. Falconer. Fractal Geometry: Mathematical Foundations and Applications. Wiley, West Sussex, England, 1990. J. A. Hartigan. Clustering Algorithms. Wiley, NY, 1975. A. P. Korostelev and A. B. Tsybakov. Minimax Theory of Image Reconstruction. Springer, NY, 1993. O. V. Lepski, E. Mammen, and V. G. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: An approach based on kernel estimates with variable bandwidth selectors. Ann. Statist., 25(3):929­947, 1997. R. Y. Liu, J. M. Parelius, and K. Singh. Multivariate analysis by data depth: Descriptive statistics, graphics and inference. Ann. Statist., 27(3):783­ 858, 1999. W. Polonik. Measuring mass concentrations and estimating density contour cluster-an excess mass approach. Ann. Statist., 23(3):855­881, 1995. Philippe Rigollet and Regis Vert. Fast rates for plug-in estimators of density level sets, available at http://www.citebase.org/abstract?id=oai: arxiv.org:math/0611473, 2006. C. Scott and M. Davenport. Regression level set estimation via cost-sensitive classification. IEEE Trans. Signal Process., 55(6):2752­2757, 2007. I. Steinwart, D. Hush, and C. Scovel. A classification framework for anomaly detection. J. Mach. Learn. Res., 6:211­232, 2005. C. Scott and R. Nowak. Learning minimum volume sets. J. Mach. Learn. Res., 7:665­704, 2006. A. Singh, C. Scott, and R. D. Nowak. Adaptive hausdorff estimation of density level sets. Technical Report ECE-07-06, University of Wisconsin - Madison, ECE Dept., available at www.cae.wisc.edu/singh/TR Hausdorff.pdf, 2007. W. Stuetzle. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classification, 20(5):25­47, 2003. A. B. Tsybakov. On nonparametric estimation of density level sets. Ann. Statist., 25:948­969, 1997. R. Vert and J.-P. Vert. Consistency and convergence rates of one-class svms and related algorithms. J. Mach. Learn. Res., 7:817­854, 2006. R. Willett and R. Nowak. Minimax optimal level set estimation. IEEE Trans. Image Proc., 16(12):2965­2979, 2007.