Privacy-preserving logistic regression

Kamalika Chaudhuri
Information Theory and Applications
University of California, San Diego
kamalika@soe.ucsd.edu

Claire Monteleoni*
Center for Computational Learning Systems
Columbia University
cmontel@ccls.columbia.edu

*The majority of this work was done while at UC San Diego.

Abstract

This paper addresses the important tradeoff between privacy and learnability that arises when designing algorithms for learning from private databases. We focus on privacy-preserving logistic regression. First we apply an idea of Dwork et al. [7] to design a privacy-preserving logistic regression algorithm. This involves bounding the sensitivity of regularized logistic regression and perturbing the learned classifier with noise proportional to the sensitivity. We show that for certain data distributions, this algorithm generalizes poorly compared with standard regularized logistic regression. We then provide a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique: solving a perturbed optimization problem. We prove that our algorithm preserves privacy in the model due to [7], and we provide learning guarantees. We show that our algorithm performs almost as well as standard regularized logistic regression in terms of generalization error. Experiments demonstrate improved learning performance of our method versus the sensitivity method. Our privacy-preserving technique does not depend on the sensitivity of the function, and extends easily to a class of convex loss functions. Our work also reveals an interesting connection between regularization and privacy.

1 Introduction

Privacy-preserving machine learning is an emerging problem, due in part to the increased reliance on the internet for day-to-day tasks such as banking, shopping, and social networking. Moreover, private data such as medical and financial records are increasingly being digitized, stored, and managed by independent companies. In the literature on cryptography and information security, data privacy definitions have been proposed; however, they sometimes severely limit the ability to learn meaningfully from the resulting (privatized) databases. On the other hand, data-mining algorithms have been introduced that aim to respect other notions of privacy that may be less formally justified. Our goal is to bridge the gap between approaches in the cryptography and information security community and those in the data-mining community. This is necessary, as there is a tradeoff between the privacy of a protocol and the learnability of functions that respect the protocol. In addition to the specific contributions of our paper, we hope to encourage the machine learning community to embrace the goals of privacy-preserving machine learning, as it is still a fledgling endeavor.

In this work, we provide algorithms for learning in a privacy model introduced by Dwork et al. [7]. The ε-differential privacy model limits how much information an adversary can gain about a particular private value by observing a function learned from a database containing that value, even if she knows every other value in the database. An initial positive result [7] in this setting depends on the sensitivity of the function to be learned, which is the maximum amount the function value can change due to an arbitrary change in one input. Using this method requires bounding the sensitivity of the function class to be learned, and then adding noise proportional to the sensitivity.
This can be difficult for some functions that are important for machine learning.

The contributions of this paper are as follows. First we apply the sensitivity-based method of designing privacy-preserving algorithms [7] to a specific machine learning algorithm, logistic regression. Then we present a second privacy-preserving logistic regression algorithm. The second algorithm is based on solving a perturbed objective function, and does not depend on the sensitivity. We prove that the new method is private in the ε-differential privacy model, and we provide a generalization bound on its performance. Finally, we provide experiments demonstrating superior learning performance of our new method with respect to the algorithm based on [7]. Our technique may have broader applications, and we show that it can be applied to certain classes of optimization problems.

1.1 Overview and related work

At first glance, it may seem that anonymizing a data-set (namely, stripping it of identifying information about individuals, such as names and addresses) is sufficient to preserve privacy. However, this is problematic, because an adversary may have some auxiliary information, which may even be publicly available, and which can be used to breach privacy. For more details on such attacks, see [13]. To formally address this issue, we need a definition of privacy which works in the presence of auxiliary knowledge held by the adversary. The definition we use is due to Dwork et al. [7], and has been used in several applications [5, 12, 3]. We explain this definition and privacy model in more detail in Section 2.

Privacy and learning. The work most related to ours is that of [10] and [4]. [10] shows how to find classifiers that preserve ε-differential privacy; however, their algorithm takes time exponential in d for inputs in R^d. [4] provides a general method for publishing data-sets while preserving ε-differential privacy, such that the answers to all queries of a certain type with low VC-dimension are approximately correct. However, their algorithm can also be computationally inefficient.

Additional related work. There has been a substantial amount of work on privacy in the literature, spanning several communities. Much work on privacy has been done in the data-mining community [1, 8, 15, 11]; however, the privacy definitions used in these papers are different, and some are susceptible to attacks when the adversary has some prior information. In contrast, the privacy definition we use avoids these attacks and is very strong.

2 Sensitivity and the ε-differential privacy model

Before we define the privacy model that we study, we note a few preliminary points. Both in that model, and for our algorithm and analyses, we assume that each value in the database is a real vector with norm at most one. That is, a database contains values x_1, ..., x_n, where x_i ∈ R^d and ||x_i|| ≤ 1 for all i ∈ {1, ..., n}. This assumption is used in the privacy model. In addition, we assume that when learning linear separators, the best separator passes through the origin. Note that this is not an assumption that the data is separable, but rather an assumption that a vector's classification is based on its angle, regardless of its norm.

In both privacy-preserving logistic regression algorithms that we state, the output is a parameter vector w, which makes the prediction SGN(w · x) on a point x. For a vector x, we use ||x|| to denote its Euclidean norm. For a function G(x) defined on R^d, we use ∇G to denote its gradient and ∇²G to denote its Hessian.
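In practice, the unit-norm assumption is enforced by preprocessing. The following is a minimal sketch (in Python with NumPy; the function names are ours and purely illustrative, and in a private setting the norm bound should be fixed independently of the data rather than computed from the private examples):

```python
import numpy as np

def scale_to_unit_ball(X, bound=None):
    """Rescale rows of X so that every example has Euclidean norm at most 1.

    Dividing by a norm bound preserves each point's direction, which is what
    determines its label when the separator passes through the origin.  In a
    private setting the bound should come from domain knowledge, not from the
    private data itself.
    """
    if bound is None:
        bound = np.max(np.linalg.norm(X, axis=1))  # non-private convenience default
    return X / max(bound, 1.0)

def predict(w, X):
    """Predict labels in {-1, +1} as SGN(w . x)."""
    return np.sign(X @ w)
```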
Privacy Definition. The privacy definition we use is due to Dwork et al. [7, 6]. In this model, users have access to private data about individuals through a sanitization mechanism, usually denoted by M. The interaction between the sanitization mechanism and an adversary is modelled as a sequence of queries, made by the adversary, and responses, made by the sanitizer. The sanitizer, which is typically a randomized algorithm, is said to preserve ε-differential privacy if the private value of any one individual in the data set does not affect the likelihood of any specific answer by the sanitizer by more than a factor of e^ε. More formally, ε-differential privacy can be defined as follows.

Definition 1. A randomized mechanism M provides ε-differential privacy if, for all databases D_1 and D_2 which differ by at most one element, and for any t,

$$\Pr[M(D_1) = t] \le e^{\epsilon} \, \Pr[M(D_2) = t].$$

It was shown in [7] that if a mechanism satisfies ε-differential privacy, then an adversary who knows the private values of all the individuals in the data-set except for one single individual cannot figure out the private value of the unknown individual, with sufficient confidence, from the responses of the sanitizer. ε-differential privacy is therefore a very strong notion of privacy. [7] also provides a general method for computing an approximation to any function f while preserving ε-differential privacy. Before we can describe their method, we need a definition.

Definition 2. For any function f with n inputs, we define the sensitivity S(f) as the maximum, over all inputs, of the difference in the value of f when one input of f is changed. That is,

$$S(f) = \max_{(a, a')} \; \left| f(x_1, \ldots, x_{n-1}, x_n = a) - f(x_1, \ldots, x_{n-1}, x_n = a') \right|.$$

[7] shows that for any input x_1, ..., x_n, releasing f(x_1, ..., x_n) + η, where η is a random variable drawn from a Laplace distribution with mean 0 and standard deviation S(f)/ε, preserves ε-differential privacy.

In [14], Nissim et al. showed that, given a function f and an input x to it, it is sufficient to draw η from a Laplace distribution with standard deviation SS(f)/ε, where SS(f) is the smoothed sensitivity of f around x. Although this method sometimes requires adding a smaller amount of noise to preserve privacy, in general the smoothed sensitivity of a function can be hard to compute.

3 A Simple Algorithm

Based on [7], one can come up with a simple algorithm for privacy-preserving logistic regression, which adds noise to the classifier obtained by logistic regression, proportional to its sensitivity. From Corollary 2, the sensitivity of logistic regression is at most 2/(nλ). This leads to Algorithm 1, which obeys the privacy guarantees in Theorem 1.

Algorithm 1:
1. Compute w*, the classifier obtained by regularized logistic regression on the labelled examples (x_1, y_1), ..., (x_n, y_n).
2. Pick a noise vector η as follows. First, pick a positive scalar x from the Laplace distribution with mean 0 and standard deviation 2/(nλε). Next, pick a unit vector η̂ uniformly at random. The vector η is set equal to x · η̂.
3. Output w* + η.

Theorem 1. Let (x_1, y_1), ..., (x_n, y_n) be a set of labelled points over R^d such that ||x_i|| ≤ 1 for all i. Then Algorithm 1 preserves ε-differential privacy.

Proof: The proof follows by a combination of [7] and Corollary 2, which states that the sensitivity of logistic regression is at most 2/(nλ).
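As a concrete illustration, here is a minimal Python sketch (our own, using NumPy and SciPy; the paper's experiments use CVX in Matlab, so the solver choice and function names here are illustrative) of regularized logistic regression and the output-perturbation step of Algorithm 1. We read the "standard deviation" in step 2 as the scale of a Laplace variable and take an absolute value to obtain a positive magnitude; this interpretation is an assumption on our part.

```python
import numpy as np
from scipy.optimize import minimize

def logistic_objective(w, X, y, lam, b=None):
    """(lam/2)||w||^2 + (1/n) sum_i log(1 + exp(-y_i w.x_i)), plus (b.w)/n if b is given."""
    n = X.shape[0]
    margins = y * (X @ w)
    obj = 0.5 * lam * (w @ w) + np.mean(np.logaddexp(0.0, -margins))
    if b is not None:
        obj += (b @ w) / n
    return obj

def train_logistic(X, y, lam, b=None):
    """Minimize the (optionally perturbed) regularized logistic loss with L-BFGS."""
    d = X.shape[1]
    res = minimize(logistic_objective, np.zeros(d), args=(X, y, lam, b),
                   method="L-BFGS-B")
    return res.x

def random_direction(d, rng):
    """A unit vector drawn uniformly at random from the d-dimensional sphere."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def algorithm1_sensitivity_method(X, y, lam, eps, rng):
    """Algorithm 1: train, then add noise with magnitude ~ |Laplace(scale 2/(n*lam*eps))|."""
    n, d = X.shape
    w_star = train_logistic(X, y, lam)
    magnitude = abs(rng.laplace(loc=0.0, scale=2.0 / (n * lam * eps)))
    return w_star + magnitude * random_direction(d, rng)
```

For example, `algorithm1_sensitivity_method(X, y, lam=0.01, eps=0.2, rng=np.random.default_rng(0))` returns a perturbed classifier; note how the noise scale 2/(nλε) grows as λ or ε shrinks, which is the effect seen in Figure 1a).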
Lemma 1. Let G(w) and g(w) be two convex functions which are continuous and differentiable at all points. If w_1 = argmin_w G(w) and w_2 = argmin_w G(w) + g(w), then ||w_1 − w_2|| ≤ g_1 / G_2, where g_1 = max_w ||∇g(w)|| and G_2 = min_v min_w vᵀ ∇²G(w) v over unit vectors v.

The main idea of the proof is to examine the gradient and the Hessian of the functions G and g around w_1 and w_2. Due to lack of space, the full proof appears in the full version of our paper.

Corollary 2. Given a set of n examples x_1, ..., x_n in R^d with labels y_1, ..., y_n, such that ||x_i|| ≤ 1 for all i, the sensitivity of logistic regression with regularization parameter λ is at most 2/(nλ).

Proof: We use the triangle inequality together with Lemma 1, and the facts that G_2 ≥ λ and g_1 ≤ 1/n.

3.1 An Example

In this section, we exhibit a family of example distributions D_d(γ) on R^d such that the classifier provided by Algorithm 1 on D_d(γ) has worse error than the classifier provided by logistic regression. The support of D_d(γ) is a set of 2^d labelled points in d dimensions. Let D_d be the uniform distribution over the hypercube {−1, +1}^d. To get a sample from D_d(γ), we draw a sample from D_d and scale it as follows: coordinates x_2, ..., x_d are multiplied by a factor of sqrt((1 − γ²)/(d − 1)), and coordinate x_1 is multiplied by γ. The label of a point is the sign of its first coordinate. Notice that for this labelled distribution, the vector (1, 0, 0, ..., 0) is a perfect classifier, and the norm of each point in the distribution is 1.

Figure 1a) shows the test error of the classifiers provided by (non-private) logistic regression and by Algorithm 1 on D_10(0.001), as a function of the sample size n. We see that even though logistic regression achieves very low error over D_10(0.001), the error due to Algorithm 1 is quite high when λ is small, and does not decrease substantially with increasing n. In the next section, we provide an algorithm which addresses this issue.

4 A New Algorithm

In this section, we provide a new privacy-preserving algorithm for logistic regression. The input to our algorithm is a set of examples x_1, ..., x_n over R^d such that ||x_i|| ≤ 1 for all i, a set of labels y_1, ..., y_n for the examples, a regularization constant λ, and a privacy parameter ε; the output is a vector w* in R^d. Our algorithm works as follows.

Algorithm 2:
1. Pick a random vector b as follows. First, pick a number X from the Laplace distribution with mean 0 and standard deviation 2/ε. Next, pick a unit vector b̂ uniformly from the d-dimensional unit sphere. The vector b is then set to X · b̂.
2. Given examples x_1, ..., x_n with labels y_1, ..., y_n and a regularization constant λ, compute

$$w^* = \arg\min_w \; \frac{1}{2} \lambda w^T w + \frac{b^T w}{n} + \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + e^{-y_i w^T x_i}\right).$$

Output w*.

We observe that our method solves a convex programming problem very similar to the logistic regression convex program, and therefore it has running time similar to that of logistic regression.
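Reusing `train_logistic` and `random_direction` from the sketch following Algorithm 1 (again an illustrative Python rendering of ours, with the same caveat about treating the stated standard deviation as a Laplace scale), Algorithm 2 changes only the objective that is minimized:

```python
def algorithm2_objective_perturbation(X, y, lam, eps, rng):
    """Algorithm 2: draw b with norm ~ |Laplace(scale 2/eps)| and a uniformly random
    direction, then minimize the regularized logistic loss plus the linear term (b.w)/n."""
    n, d = X.shape
    magnitude = abs(rng.laplace(loc=0.0, scale=2.0 / eps))
    b = magnitude * random_direction(d, rng)
    return train_logistic(X, y, lam, b=b)
```

Note that the scale of the noise vector b, namely 2/ε, depends on neither n nor λ; its influence on the solution enters only through the term bᵀw/n in the objective.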
In the sequel, we show that the output of Algorithm 2 is privacy-preserving.

Theorem 2. Given a set of n examples x_1, ..., x_n over R^d with labels y_1, ..., y_n, where ||x_i|| ≤ 1 for each i, the output of Algorithm 2 preserves ε-differential privacy.

Proof: Let a and a' be any two vectors over R^d with norm at most 1, and let y, y' ∈ {−1, 1}. For any such (a, y), (a', y'), consider the inputs (x_1, y_1), ..., (x_{n−1}, y_{n−1}), (a, y) and (x_1, y_1), ..., (x_{n−1}, y_{n−1}), (a', y'). Then, for any w* output by our algorithm, there is a unique value of b that maps the input to the output. Let the values of b for the first and second inputs be b_1 and b_2, respectively. Since w* minimizes the perturbed objective in each case, the derivative of each objective at w* is 0. This implies that for every b_1 in the first case, there exists a b_2 in the second case such that

$$b_1 - \frac{y\,a}{1 + e^{y\,w^{*T} a}} = b_2 - \frac{y'\,a'}{1 + e^{y'\,w^{*T} a'}}.$$

Since ||a|| ≤ 1, ||a'|| ≤ 1, and $\frac{1}{1 + e^{y w^{*T} a}} \le 1$ and $\frac{1}{1 + e^{y' w^{*T} a'}} \le 1$ for any w*, we have ||b_1 − b_2|| ≤ 2. Using the triangle inequality, ||b_1|| − 2 ≤ ||b_2|| ≤ ||b_1|| + 2. Therefore, for any pair (a, y), (a', y'),

$$\frac{\Pr[w^* \mid x_1, \ldots, x_{n-1}, y_1, \ldots, y_{n-1}, x_n = a, y_n = y]}{\Pr[w^* \mid x_1, \ldots, x_{n-1}, y_1, \ldots, y_{n-1}, x_n = a', y_n = y']} = \frac{h(b_1)}{h(b_2)},$$

where h(b_i), for i = 1, 2, is the density of b_i. Since −2 ≤ ||b_1|| − ||b_2|| ≤ 2, and the direction of each b_i is chosen uniformly at random, this ratio lies between e^{−ε} and e^{ε}. The theorem follows.

[Figure 1: Test error on a 1000-point holdout set (10 random restarts), as a function of n (in hundreds), for d = 10, ε = 0.2, γ = 0.001, λ = γ². a) Example from Section 3.1: standard LR versus the sensitivity method. b) Including the new method. c) Zooming in on low error rates.]

Other Applications. Our algorithm for privacy-preserving logistic regression can be generalized to provide privacy-preserving outputs for more general convex optimization problems, so long as the problems satisfy certain conditions, which are formalized in the theorem below.

Theorem 3. Let X = {x_1, ..., x_n} be a database containing private data of individuals. Suppose we would like to compute a vector w that minimizes the function F(w) = G(w) + Σ_{i=1}^{n} ℓ(w, x_i) over w ∈ R^d for some d, such that (1) G(w) and ℓ(w, x_i) are differentiable everywhere and have continuous derivatives, (2) G(w) is strongly convex and ℓ(w, x_i) is convex for all i, and (3) ||∇_w ℓ(w, x_i)|| ≤ κ for all i. Let b = B · b̂, where B is drawn from a Laplace distribution with mean 0 and standard deviation 2κ/ε, and b̂ is drawn uniformly from the surface of a d-dimensional unit sphere. Then computing w* = argmin_w G(w) + Σ_{i=1}^{n} ℓ(w, x_i) + bᵀw provides ε-differential privacy.
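The recipe of Theorem 3 translates directly into code. The sketch below is ours and is only illustrative: it assumes the caller supplies a regularizer `G`, a total loss `example_loss` summing the per-record losses, and a gradient bound `kappa` satisfying condition (3); the rows of `X` stand for the private records, and it reuses `random_direction`, `minimize`, and NumPy from the earlier sketches.

```python
def perturbed_convex_minimizer(G, example_loss, kappa, X, eps, rng):
    """Objective perturbation following Theorem 3.

    G(w)               : strongly convex, differentiable regularizer.
    example_loss(w, X) : sum_i l(w, x_i) for convex, differentiable per-record
                         losses with ||grad_w l(w, x_i)|| <= kappa for all i
                         (the caller is responsible for this bound).
    Returns argmin_w G(w) + example_loss(w, X) + b.w for a random b whose norm
    is |Laplace(scale 2*kappa/eps)| and whose direction is uniform on the sphere.
    """
    d = X.shape[1]
    magnitude = abs(rng.laplace(loc=0.0, scale=2.0 * kappa / eps))
    b = magnitude * random_direction(d, rng)
    objective = lambda w: G(w) + example_loss(w, X) + b @ w
    return minimize(objective, np.zeros(d), method="L-BFGS-B").x
```

For logistic regression one can fold the labels into the records by setting z_i = y_i x_i, take `example_loss = lambda w, Z: np.sum(np.logaddexp(0.0, -(Z @ w)))` and `G = lambda w: 0.5 * n * lam * (w @ w)` with κ = 1 for unit-norm data; this recovers Algorithm 2 up to a rescaling of the objective by n.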
5 Learning Guarantees

In this section, we show a theoretical bound on the number of samples required by our algorithm to learn a linear classifier. For the rest of the section, we use the following notation. For a classifier w, we use L(w) to denote its expected loss over the data distribution, and L̂(w) to denote its empirical average loss over the training data; in other words, L̂(w) = (1/n) Σ_{i=1}^{n} log(1 + e^{−y_i wᵀx_i}), where (x_i, y_i), i = 1, ..., n, are the training examples. Further, for a classifier w, we write f(w) for the quantity (λ/2)||w||² + L(w) and f̂(w) for the quantity (λ/2)||w||² + L̂(w). The main result of this section is the following theorem.

Theorem 4. Let w_0 be a classifier with expected loss L_0 = L(w_0) over the data distribution. If the training examples are drawn independently from the data distribution, and if

$$n > C \log(1/\delta) \, \max\!\left(\frac{\|w_0\|^2}{\epsilon_g^2}, \; \frac{\|w_0\|}{\epsilon_g\,\epsilon}\right)$$

for some constant C, then, with probability at least 1 − δ, the classifier output by Algorithm 2 has loss at most L_0 + ε_g over the data distribution.

We observe that, for constant ||w_0||, if the privacy parameter ε is at least the generalization error ε_g, then the number of examples required by our algorithm to achieve generalization error ε_g is of the same order as that required by standard logistic regression. Usually ε_g < ε, and therefore one needs only a constant factor more examples to guarantee good generalization along with privacy-preserving learning.

Before we can prove Theorem 4, we need the following lemma, which shows that the values of f̂(w_2) and f̂(w_1) are close. It can be shown that a bound similar to Lemma 3 does not hold for the output of Algorithm 1.

Figure 2: Test error (mean ± standard deviation). Training set sizes: N = 1000 for the uniform data, N = 600 for Diabetes, N = 450 for Breast.

                       Sensitivity method    New method         Standard LR
Uniform, margin=0.1    0.0924 ± 0.0522       0.0259 ± 0.0149    0 ± 0
Uniform, margin=0.05   0.2842 ± 0.0283       0.0687 ± 0.0202    0 ± 0
UCI Pima Diabetes      0.4976 ± 0.0546       0.4262 ± 0.0496    0.3184 ± 0.0155
UCI Breast             0.4569 ± 0.0655       0.1900 ± 0.0554    0.0755 ± 0.0381

[Figure 3: Epsilon curve. Average test error over 5-fold cross-validation (10 random restarts) for the new method and the sensitivity method, as ε varies; data is uniform, d = 20, margin = 0.1, N = 1000.]

Lemma 3. Given a logistic regression problem with regularization parameter λ, let w_1 be the classifier that minimizes f̂, and let w_2 be the classifier output by Algorithm 2. Then, with probability 1 − δ over the randomness in the privacy mechanism,

$$\hat{f}(w_2) \le \hat{f}(w_1) + \frac{8 \log^2(1/\delta)}{\lambda n^2 \epsilon^2}.$$

The proof is in the full version of our paper. Now we are ready to prove Theorem 4.

Proof: Let w* be the classifier that minimizes f(w) over the data distribution, and let w_1 and w_2 be the classifiers that minimize f̂(w) and f̂(w) + bᵀw/n, respectively. We can use an analysis as in [16] to write

$$L(w_2) = L(w_0) + \left(f(w_2) - f(w^*)\right) + \left(f(w^*) - f(w_0)\right) + \frac{\lambda}{2}\|w_0\|^2 - \frac{\lambda}{2}\|w_2\|^2. \qquad (1)$$

From Lemma 3, f̂(w_2) − f̂(w_1) ≤ 8 log²(1/δ)/(λ n² ε²). Using this and [17], we can bound the second quantity in Equation 1 as f(w_2) − f(w*) ≤ 16 log²(1/δ)/(λ n² ε²) + O(1/(λn)). By the definition of w*, the third quantity in Equation 1 is non-positive. If λ is set to ε_g/||w_0||², then the fourth quantity in Equation 1 is at most ε_g/2.

We now consider two cases, depending on the relative values of ε and ε_g/||w_0||. First, suppose ε ≥ ε_g/||w_0||. In this case n > C log(1/δ)||w_0||²/ε_g², and if the constant C is suitably large, then 16 log²(1/δ)/(λ n² ε²) + O(1/(λn)) ≤ ε_g/2. Now suppose ε < ε_g/||w_0||. In this case n > C log(1/δ)||w_0||/(ε_g ε), and if the constant C is suitably large, then again 16 log²(1/δ)/(λ n² ε²) + O(1/(λn)) ≤ ε_g/2. In either case, the total loss of the classifier w_2 output by Algorithm 2 is at most L(w_0) + ε_g.

Example from 3.1. Finally, we show that Algorithm 2, when applied to the example in Section 3.1, provides a classifier with much lower error than Algorithm 1. Figures 1b) and 1c) compare the error rates of the classifiers provided by Algorithm 1, Algorithm 2, and logistic regression on D_10(0.001) as a function of the number of samples n. As we see from the figures, Algorithm 2 and logistic regression have similar error rates, whereas Algorithm 1 provides a classifier with a much higher error rate.
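To reproduce the synthetic comparison, one can sample from D_d(γ) directly. The sketch below is ours; the scaling constant sqrt((1 − γ²)/(d − 1)) is reconstructed from the requirement in Section 3.1 that every point have unit norm, and it reuses the NumPy setup from the earlier sketches.

```python
def sample_Dd_gamma(n, d, gamma, rng):
    """Draw n labelled samples from D_d(gamma) (Section 3.1).

    Points start uniform on the hypercube {-1, +1}^d; coordinate 1 is scaled
    by gamma and coordinates 2..d by sqrt((1 - gamma**2)/(d - 1)), so each
    point has unit norm.  The label is the sign of the first coordinate.
    """
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = np.sign(X[:, 0]).copy()
    X[:, 0] *= gamma
    X[:, 1:] *= np.sqrt((1.0 - gamma**2) / (d - 1))
    return X, y
```

Running standard logistic regression and the two private sketches above on `sample_Dd_gamma(n, 10, 0.001, rng)` with λ = γ² and ε = 0.2 should qualitatively reproduce the behaviour shown in Figure 1.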
6 Experiments

The purpose of our experiments is to verify that using a privacy-preserving approach to logistic regression does not degrade learning performance terribly, relative to standard logistic regression. Some performance degradation is inevitable, however: in both cases, in order to address privacy concerns, we add noise, either to the learned classifier or to the objective. Another goal of the experiments is to compare the two privacy-preserving approaches against standard logistic regression.

[Figure 4: Learning curves. Average test error over 5-fold cross-validation (10 random restarts), ε = 0.2. a) Uniform distribution, d = 20, margin = 0.1; b) uniform distribution, d = 20, margin = 0.05; c) UCI Pima Diabetes (n = 768, d = 8); d) UCI Breast Cancer (n = 569, d = 32).]

In order to obtain a clean comparison between the various logistic regression variants, we first experimented with artificial data that is separable through the origin. We used a uniform distribution on the hypersphere, with zero mass within a small margin of the generating linear separator. We also experimented on several real data sets from the UC Irvine Machine Learning Repository [2]: Pima Indians Diabetes and Breast Cancer Wisconsin (Diagnostic), chosen in part for their potentially private nature as disease data. Our implementations use the CVX convex optimization package [9].

Figure 2 gives the mean and standard deviation of test error over five folds of cross-validation. In all four problems, the new algorithm is superior to the sensitivity method, though it incurs more error than standard (non-private) logistic regression.

We used the following evaluation framework in all reported results. The simulations contain 1250 points in 20 dimensions, Diabetes has 768 points in 8 dimensions, and Breast has 569 points in 32 dimensions. On each problem, we tuned the logistic regression parameter λ to minimize the test error of standard logistic regression (the tuned values are λ = 0.01 for margin 0.1, λ = 0.001 for margin 0.05, and λ = 10⁻⁶ for the UCI data). For each test error computation, the performance of each of the privacy-preserving algorithms was evaluated by averaging over 10 random restarts, since both are randomized algorithms.
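The following sketch of that evaluation loop is ours (it assumes scikit-learn's KFold for the splits, reuses the trainers sketched earlier, and the commented parameter values are illustrative):

```python
from sklearn.model_selection import KFold

def cv_error(train_fn, X, y, n_folds=5, n_restarts=10, seed=0):
    """Average 0/1 test error of a (possibly randomized) training routine over
    n_folds cross-validation folds, with n_restarts restarts per fold."""
    rng = np.random.default_rng(seed)
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    errors = []
    for train_idx, test_idx in folds.split(X):
        for _ in range(n_restarts):
            w = train_fn(X[train_idx], y[train_idx], rng)
            errors.append(np.mean(np.sign(X[test_idx] @ w) != y[test_idx]))
    return float(np.mean(errors))

# Example (illustrative parameter values):
# err = cv_error(lambda Xtr, ytr, rng:
#                algorithm2_objective_perturbation(Xtr, ytr, lam=0.01, eps=0.2, rng=rng),
#                X, y)
```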
In order to observe the effect of the level of privacy on the learning performance of the privacy-preserving algorithms, in Figure 3 we vary ε, the privacy parameter of the two algorithms, on the simulation with margin 0.1. As per the definition of ε-differential privacy in Section 2, strengthening the privacy guarantee corresponds to reducing ε. Both algorithms' learning performance degrades in this direction; however, at all values of ε that we tested, the new method is superior in managing the tradeoff between privacy and learning performance.

In Figure 4 we provide learning curves. For the simulations, we graph the test error after each increment of 100 points, averaged over five-fold cross-validation. For the UCI data we used increments of 50 points. The learning curves reveal that not only does the new method reach a lower final error than the sensitivity method, it also maintains better performance at most smaller training set sizes. In the simulations, as the problem gets more difficult for standard logistic regression (with the decrease in margin from 0.1 to 0.05), the performance gap between logistic regression and the privacy-preserving methods widens, yet the gap between the new method and the sensitivity method increases as well.

7 Conclusion

In conclusion, we show two ways to construct a privacy-preserving linear classifier through logistic regression. The first uses the method of [7], and the second is a new algorithm. Using the ε-differential privacy definition of Dwork et al. [7], we prove that our new algorithm is privacy-preserving. We then provide a generalization bound on the performance of our new algorithm, and run experiments in which our new algorithm outperforms the method based on [7].

Our work reveals an interesting connection between regularization and privacy: the larger the regularization constant, the less sensitive the logistic regression function is to any one individual example, and as a result, the less noise one needs to add to make it privacy-preserving. Therefore, regularization not only prevents overfitting, but also helps with privacy by making the classifier less sensitive. An interesting future direction would be to explore whether other methods that prevent overfitting also have such properties.

Other future directions would be to apply our techniques to other commonly used machine-learning algorithms, and to explore whether our techniques can be applied to more general optimization problems. Theorem 3 shows that our method can be applied to a class of optimization problems with certain restrictions; an open question is to remove some of these restrictions.

Acknowledgements. We thank Sanjoy Dasgupta and Daniel Hsu for several pointers.

References

[1] R. Agrawal and R. Srikant. Privacy-preserving data mining. SIGMOD Rec., 29(2):439–450, 2000.
[2] A. Asuncion and D. Newman. UCI machine learning repository, 2007.
[3] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In PODS, pages 273–282, 2007.
[4] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In R. E. Ladner and C. Dwork, editors, STOC, pages 609–618. ACM, 2008.
[5] K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In C. Dwork, editor, CRYPTO, volume 4117 of Lecture Notes in Computer Science, pages 198–213. Springer, 2006.
[6] C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, ICALP (2), volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006.
[7] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284, 2006.
[8] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS, pages 211–222, 2003.
[9] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming. http://stanford.edu/~boyd/cvx, 2008.
[10] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? In Proc. of Foundations of Computer Science, 2008.
[11] A. Machanavajjhala, J. Gehrke, D. Kifer, and M. Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. In ICDE, page 24, 2006.
[12] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007.
[13] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. In IEEE Symposium on Security and Privacy, pages 111–125. IEEE Computer Society, 2008.
[14] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In D. S. Johnson and U. Feige, editors, STOC, pages 75–84. ACM, 2007.
[15] P. Samarati and L. Sweeney. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In Proc. of the IEEE Symposium on Research in Security and Privacy, 1998.
[16] S. Shalev-Shwartz and N. Srebro. SVM optimization: Inverse dependence on training set size. In International Conference on Machine Learning (ICML), 2008.
[17] K. Sridharan, N. Srebro, and S. Shalev-Shwartz. Fast rates for regularized objectives. In Neural Information Processing Systems, 2008.