Mining Internet-Scale Software Repositories
Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, Pierre Baldi
Open Source Exposed!

M68

New Methods for Code Search and Analysis
· Over 4,600 projects and 38 million lines of code
· Automated program understanding with topic modeling
· Improved software retrieval via CodeRank and probabilistic keyword analysis
· Probabilistic techniques for refactoring
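CodeRank ranks code entities for retrieval. The sketch below assumes it works like PageRank applied to a graph of code entities, where an edge points from an entity to an entity it uses, so heavily reused entities rank higher. The function name `code_rank`, the damping factor, the convergence tolerance, and the toy entity names are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List


def code_rank(graph: Dict[str, List[str]], damping: float = 0.85,
              iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Hypothetical PageRank-style ranking of code entities by power iteration.

    graph maps each entity to the entities it uses (calls, extends, imports).
    Returned scores sum to 1.
    """
    # Collect every entity, including ones that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    ordered = sorted(nodes)
    n = len(ordered)
    rank = {v: 1.0 / n for v in ordered}

    for _ in range(iterations):
        # Score held by entities with no outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in ordered if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in ordered}
        # Each entity passes its score evenly to the entities it uses.
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in ordered)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy = {"App.main": ["Util.log", "Db.query"],
           "Db.query": ["Util.log"],
           "Util.log": []}
    for name, score in sorted(code_rank(toy).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

In the toy graph, `Util.log` is used by both other entities, so it ends up with the highest score. A retrieval system could combine such a score with keyword relevance when ordering search results.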

Developer Analysis!

Topics = {keywords}
Web Programming: {servlet, session, http}
File Processing: {file, dir, directory, path}
Debugging: {target, debug, breakpoint, source}
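The topics listed above are keyword sets of the kind a probabilistic topic model such as latent Dirichlet allocation (LDA) extracts from source code. The sketch below is a generic collapsed Gibbs sampler for LDA. It assumes each source file has already been tokenized into identifier terms. The names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, and the toy corpus are illustrative assumptions, not the authors' pipeline.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200, seed: int = 0
              ) -> Tuple[List[str], List[List[float]]]:
    """Illustrative collapsed Gibbs sampler for LDA; returns (vocab, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # tokens in file d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times term v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k
    assign = []

    # Randomly initialize a topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional for each candidate topic.
                weights = [(doc_topic[d][t] + alpha)
                           * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic-term distributions; each row sums to 1.
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
            for v in range(V)] for k in range(K)]
    return vocab, phi


def top_words(vocab: List[str], phi: List[List[float]], n: int = 4) -> List[List[str]]:
    """The n most probable terms per topic, i.e. a {keywords} set per topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


if __name__ == "__main__":
    files = [["servlet", "session", "http", "servlet", "http"],
             ["file", "dir", "path", "file", "directory"],
             ["session", "http", "servlet", "session"],
             ["path", "file", "dir", "directory", "file"]]
    vocab, phi = lda_gibbs(files, num_topics=2)
    for k, words in enumerate(top_words(vocab, phi)):
        print(k, words)
```

Labels such as "Web Programming" are not produced by the model. A person assigns them after inspecting each topic's top terms.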
Power Laws!