Mining Internet-Scale Software Repositories
Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, Pierre Baldi

Open Source Exposed!
M68

New Methods for Code Search and Analysis
· Over 4,600 projects and 38 million lines of code
· Automated program understanding with topic modeling
· Improved software retrieval via CodeRank and probabilistic keyword analysis
· Probabilistic techniques for refactoring

Developer Analysis!

Topics = {keywords}
Web Programming   {servlet, session, http}
File Processing   {file, dir, directory, path}
Debugging         {target, debug, breakpoint, source}

Power Laws!
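
The text above names CodeRank but does not define it. As a rough illustration only, the sketch below assumes CodeRank is a PageRank-style score computed over a directed graph in which code entities point to the entities they reference. That reading, the function name `code_rank`, the damping value, and the toy entity names are all assumptions made for this example, not details taken from the authors.

```python
def code_rank(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration over a code reference graph.

    graph maps each entity name to the list of entities it references.
    This is an assumed formulation, not the authors' implementation.
    """
    # Collect every entity, including ones that only appear as targets.
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Entities with no outgoing references spread their rank uniformly.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}

        # Each entity splits its damped rank evenly among what it references.
        for source, targets in graph.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share

        rank = new_rank

    return rank


# Hypothetical reference graph: each entity lists the entities it uses.
EXAMPLE_GRAPH = {
    "LoginServlet": ["HttpServlet", "SessionStore"],
    "AdminServlet": ["HttpServlet"],
    "SessionStore": ["FileUtil"],
    "HttpServlet": [],
    "FileUtil": [],
}

if __name__ == "__main__":
    scores = code_rank(EXAMPLE_GRAPH)
    for name, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{name}: {score:.4f}")
```

Under this assumption, entities that many others depend on, such as the hypothetical `HttpServlet`, score higher than entities nothing refers to, which is the property a retrieval system could use to order search results.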