This year, the Best Paper Awards at USENIX Security went to two papers from the field of data-driven security. These papers are great examples of the benefits of systematic data collection and analysis for answering new questions on topics that have traditionally relied on math and experimentation. In both cases, the data sets analyzed were key to the reported results.
The paper “The Million-Key Question—Investigating the Origins of RSA Public Keys”, by Petr Švenda from Masaryk University, Czech Republic, and his colleagues, teaches us how to fingerprint RSA public keys. This result is surprising because (i) RSA keys should look like uniform random numbers and (ii) Švenda’s method works without finding the prime factors of the RSA modulus. In other words, you don’t have to break RSA to find out which algorithm implementation generated the key. After testing 22 software libraries and 16 smart cards and creating 1 million key pairs with each implementation, the authors discovered artifacts in the most significant bits (MSBs) of the public keys. These artifacts partition the keys into 13 clusters. Given a single key, the Masaryk researchers can guess the correct cluster with 40% accuracy on average. While this may sound like poor odds, it is far better than the roughly 8% expected from randomly guessing among 13 clusters, and the gap between the distribution of real public keys and the theoretical space of RSA moduli is striking. Moreover, with 10 keys from the same implementation, the guessing accuracy increases to 85% on average. This enables attacks such as quickly searching for keys from a vulnerable library or linking related Tor hidden service operators. To mitigate these attacks, the authors have developed a Web service for testing multiple keys and selecting the most anonymous one.

These estimates assume that all RSA implementations are equally popular; if, say, all real-world keys came from OpenSSL, fingerprinting would not work. To test this assumption, the Masaryk team classified 11 million real keys (for example, from TLS scans or PGP key dumps) and compared the prevalence of the key clusters with the market share of the respective products. The numbers matched within a few percentage points.

The paper also discusses the seven implementation choices that bias the MSBs, such as directly manipulating the high-order bits of the RSA primes or avoiding small factors. Remarkably, all of these choices make factorization more difficult. However, the lack of agreement about the best way to generate keys reduces the anonymity set of the users who rely on these keys. Unlike other implementation flaws found by reviewing code (including a previous method for fingerprinting OpenSSL keys based on their prime factors), this threat became clear only after analyzing millions of keys from real-world RSA implementations. The researchers have released their data sets online.
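To make the source of this bias concrete, here is a minimal simulation (not the authors’ classifier) of one of the implementation choices above: forcing the top two bits of each prime so that the modulus always reaches its full length. The sketch skips primality testing, on the assumption that primality has negligible effect on the distribution of the modulus’s most significant byte, and all function names are our own:

    import random
    from collections import Counter

    def rsa_prime_candidate(bits, fix_top_two):
        # Random odd integer shaped like an RSA prime. Primality testing is
        # skipped: it barely affects the most significant byte of the
        # product, which is all this simulation looks at.
        p = random.getrandbits(bits) | 1            # force the number odd
        p |= 1 << (bits - 1)                        # force full bit length
        if fix_top_two:
            p |= 1 << (bits - 2)                    # common library choice that
                                                    # guarantees a 2*bits modulus
        return p

    def msb_histogram(fix_top_two, trials=20000, bits=512):
        counts = Counter()
        for _ in range(trials):
            n = (rsa_prime_candidate(bits, fix_top_two) *
                 rsa_prime_candidate(bits, fix_top_two))
            counts[n >> (n.bit_length() - 8)] += 1  # most significant byte of n
        return counts

    for fix in (False, True):
        print("top two bits fixed:", fix,
              "-> most common MSBs:", msb_histogram(fix).most_common(3))

Running this sketch shows the most significant byte confined to roughly 144–255 when the top two bits are fixed, versus spread across 128–255 when they are not. The Masaryk classifier exploits gaps of exactly this kind, along with the other biases mentioned above, and aggregating the evidence from several keys of the same source is what drives the accuracy from 40% toward 85%.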
The paper “Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks”, by William Melicher of Carnegie Mellon University and his colleagues, describes a machine learning tool for predicting how easy it is to guess a password. The tool is essentially a strength meter that incorporates recent advances in password research. The past five years have seen a resurgence of this old topic, focused mainly on measuring password strength. For example, researchers have shown that the widely implemented FIPS guidelines for evaluating passwords measure the wrong thing (they don’t account for the fact that password-cracking tools improve with each new password leak, as observed in practice), that password expiration policies are a measurably bad idea (as Bruce Schneier has been arguing), and that humans can learn strong passwords as long as they don’t have to invent them (after all, we used to remember whole address books of 10-digit phone numbers).

The new strength metrics focus on guessability, which estimates how many guesses a skilled attacker would need to break a password. Unfortunately, evaluating guessability has involved slow, hand-crafted heuristics (such as the word-mangling rules of popular password crackers) and gigabyte-sized dictionaries. Instead, the CMU researchers trained a neural network to predict the next character of a password given the preceding characters; a minimal sketch of this idea follows below. They used 105 million passwords, collected in their prior studies or leaked by various organizations, to train and validate this classifier. The neural network evaluated guessability more accurately than four previous techniques and two client-side password meters (which often overestimate password strength). With a few clever tricks to compress the model, the classifier can check a password with sub-second latency and requires only a few hundred kilobytes of storage. This makes it suitable for a real-time password meter, bundled with a client-side encryption application (like GPG or TrueCrypt) or used in a Web site sign-up form. The researchers released their tool on GitHub.

This approach has a drawback: unlike hand-crafted rules, the neural network is a black box that cannot explain why a chosen password is weak or how to improve it. Nevertheless, as the first paper to apply neural networks to the problem of measuring password strength, this work opens a new research direction for the study of passwords.
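Here is the promised sketch of the prediction step. The paper trains a recurrent neural network; as a stand-in, this sketch uses a character bigram model over a hypothetical toy training list, since both approaches reduce to estimating P(next character | preceding characters) and multiplying those probabilities along the password:

    import math
    from collections import defaultdict

    START, END = "\x02", "\x03"    # sentinels marking password start and end

    def train_bigram(passwords):
        # Count character bigrams: a tiny stand-in for the paper's recurrent
        # network, which learns the same quantity, P(next char | context),
        # from a much longer context and far more data.
        counts = defaultdict(lambda: defaultdict(int))
        for pw in passwords:
            chars = START + pw + END
            for a, b in zip(chars, chars[1:]):
                counts[a][b] += 1
        return counts

    def log2_prob(model, pw, alphabet=100):
        # Sum log2 P(next | prev) over the password, with add-one smoothing
        # (alphabet ~ printable ASCII) so unseen transitions are not zero.
        lp = 0.0
        chars = START + pw + END
        for a, b in zip(chars, chars[1:]):
            total = sum(model[a].values())
            lp += math.log2((model[a][b] + 1) / (total + alphabet))
        return lp

    # Toy training list; the real classifier was trained on 105 million
    # passwords from prior studies and public leaks.
    model = train_bigram(["password", "password1", "letmein", "123456",
                          "qwerty", "dragon", "monkey", "111111"])

    for pw in ["password1", "xK9#qv2!"]:
        # A lower log-probability means more guesses needed: a stronger password.
        print(pw, "log2 P =", round(log2_prob(model, pw), 1))

A deployed meter goes one step further and converts this probability into an estimated guess number (the password’s position in a highest-probability-first enumeration), which the CMU tool approximates with Monte Carlo estimation; the compressed network is small enough to run client-side.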
Besides being instances of good research, these two papers illustrate the unique insights that data-driven methods can provide into security problems. Development-time assumptions often do not hold in the real world, and as a result software behaves in surprising ways that make security a moving target. Without measuring the statistical properties of public keys on a large corpus, the Masaryk team would not have been able to observe the idiosyncrasies of various crypto libraries. Defenses are also more effective when they account for the capabilities and limitations of real-world adversaries. Without the opportunity to learn the features of millions of real passwords, the CMU neural network would have been an inaccurate model of a hacker’s guessing ability. Similarly, answering some of the questions raised by these papers (such as how to make ML-based password meters more usable) will require further data-driven research.