This year the International Symposium on Research in Attacks, Intrusions and Defenses (RAID) decided to include public reviews of all the accepted papers in the program (see the “Public Review” link next to each paper). A public review is supposed to summarize the program committee discussions that led to the acceptance decision, in order to justify this decision to the readers. Perhaps more importantly, a public review provides an independent perspective on the research, highlighting its strengths and weaknesses as perceived by 3–4 members of the program committee who did not participate in the research. While these strengths and weaknesses may be obvious to experts in the paper's topic, other researchers and practitioners will benefit from this independent perspective. Unlike the reviews published by other conferences, each RAID’15 public review is a summary statement about the paper rather than the verbatim reviews exported from the conference management software. It is written by a program committee member after the PC meeting, vetted by one of the PC chairs, and meant to reflect the final version of the paper (not the submitted one) and all the PC discussions regarding the paper. Public reviews can be a great tool for advancing our knowledge of computer security, if reviewers take the task of writing them seriously. I am happy that RAID decided to experiment with public reviews, and I am curious how this experiment will evolve in future years, especially as IMC stopped posting public reviews last year.
In the meantime, I include below the public review I wrote for the paper “Ensemble Learning for Low-level Hardware-supported Malware Detection” by Khaled N. Khasawneh, Meltem Ozsoy, Caleb Donovick, Nael Abu-Ghazaleh and Dmitry V. Ponomarev. This review reflects the consensus opinion of the program committee rather than my own views (though I do stand by the review as a whole). For this reason, RAID public reviews are posted anonymously, on behalf of the PC, but I hope that nobody will be too upset if I reveal that I wrote this one.
Anti-virus software imposes a performance overhead on the hosts it aims to protect. To reduce this overhead, the paper investigates the effectiveness of hardware-supported malware detection. The authors consider several low-level features that can be collected in hardware and evaluate how well they work for particular categories of malware. They also propose metrics to quantify the benefit that anti-virus products derive from these hardware signals, and briefly outline a hardware implementation of the feature collection and classification system, built by extending the AO486 open core and synthesizing it on an FPGA platform.
The significance of the work lies in its contribution to our understanding of hardware-based malware detection. While the security community has devoted considerable effort to developing software for detecting malicious programs, hardware detectors are a more recent idea. Demme et al. explored malware detection using the performance counters available on modern CPUs, and the authors’ prior work proposed additional features that could potentially be collected in hardware, such as features derived from the instruction mix and from memory reference patterns. Because these low-level features carry less semantic information than the features used in software-based detectors, a key question is how much they raise the bar for the attacker.
The program committee liked the paper’s attempt to answer this question by defining metrics that quantify the benefit software-based anti-virus products can derive from the low-level hardware features. Specifically, the authors envision a system where the hardware detector is always on and produces a list of suspected processes, and the software detector uses this list to prioritize the use of more heavyweight protection mechanisms (e.g., Control Flow Integrity). The authors define and evaluate the work advantage (the ratio of the malware volumes detected with and without hardware support, in a scenario where the software detector scans only a fraction of the programs executed) and the time-to-detection (the expected time to detect a specific malware sample when the software detector scans all the programs).
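To make the two metrics concrete, here is a minimal Python sketch of one way they could be computed. It is not the authors' code, and the paper's exact definitions may differ. It assumes that the software detector catches every malware sample it scans, that scanning one program takes one unit of time, and that without hardware support the software detector picks programs in a uniformly random order. The function names `work_advantage` and `time_to_detection_speedup`, and the input format (one hardware suspicion score and one ground-truth label per program), are hypothetical.

```python
from typing import Sequence


def _rank_by_suspicion(scores: Sequence[float]) -> list:
    """Program indices ordered from most to least suspicious (hypothetical helper)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def work_advantage(scores: Sequence[float], is_malware: Sequence[bool], fraction: float) -> float:
    """Malware caught when scanning the top `fraction` of programs ranked by the
    hardware detector, divided by the expected catch when scanning a random
    `fraction` of programs. Assumes a perfect software detector."""
    n = len(scores)
    budget = int(round(fraction * n))  # number of programs the software detector can scan
    total_malware = sum(is_malware)
    if budget == 0 or total_malware == 0:
        raise ValueError("need a non-zero scan budget and at least one malware sample")
    ranked = _rank_by_suspicion(scores)
    # With hardware support: scan the `budget` most suspicious programs.
    detected_with_hw = sum(is_malware[i] for i in ranked[:budget])
    # Without hardware support: a random subset of size `budget`
    # contains budget/n of the malware on average.
    detected_without_hw = total_malware * budget / n
    return detected_with_hw / detected_without_hw


def time_to_detection_speedup(scores: Sequence[float], sample_index: int) -> float:
    """How many times sooner a given malware sample is reached when all programs
    are scanned in hardware-ranked order instead of a random order.
    Assumes each scan takes one unit of time."""
    n = len(scores)
    position_with_hw = _rank_by_suspicion(scores).index(sample_index) + 1  # 1-based
    expected_without_hw = (n + 1) / 2  # mean position under a uniformly random order
    return expected_without_hw / position_with_hw
```

For example, with ten programs of which two are malware and both receive the highest hardware scores, scanning 20% of the programs catches both samples instead of the 0.4 expected from a random choice, a work advantage of 5 under these assumptions.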
The reviewers also liked the fact that the authors break malware down into several categories (with different behaviors) and train logistic regression classifiers specialized to detect malware in each category. The final hardware detector is an ensemble classifier that combines the outputs of the specialized detectors; the feature collection, the logistic regression and the combination of specialized models are all implemented in hardware. The experimental results suggest that the specialized ensemble classifier has a work advantage of 11x (1.87x better than the best single detector) and reduces the time-to-detection by 6.6x when the fraction of malware programs is low (the hardware detectors exhibit diminishing returns as the amount of malware increases). The program committee felt that these results represent a promising, but not definitive, evaluation of hardware support for anti-virus scanning, as they are derived from micro-benchmarks. For example, the testing set for a specialized detector (and the experiments comparing it to the general detectors) includes only normal programs and the specialized detector’s type of malware; it is less clear what happens when a specialized detector (e.g., for backdoors) is presented with a malware sample from another category (e.g., a worm).
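The structure of such an ensemble of specialized detectors can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's design: the combination rule used here (flag a program if the most confident specialized detector exceeds a threshold, which amounts to a logical OR) is only one simple choice among several, the weights and two-element feature vectors are made up, and a hardware implementation would likely use fixed-point rather than floating-point arithmetic. The names `LogisticDetector` and `ensemble_decision` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class LogisticDetector:
    """A logistic regression model specialized for one malware category (hypothetical weights)."""
    weights: Sequence[float]
    bias: float

    def probability(self, features: Sequence[float]) -> float:
        # Linear combination of the low-level features, squashed into [0, 1].
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return _sigmoid(z)


def ensemble_decision(
    detectors: Dict[str, LogisticDetector],
    features: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[bool, str, Dict[str, float]]:
    """Assumed combination rule: take the most confident specialized detector and
    flag the program if its probability reaches the threshold."""
    scores = {name: d.probability(features) for name, d in detectors.items()}
    best = max(scores, key=scores.get)
    return scores[best] >= threshold, best, scores
```

With hypothetical detectors `{"backdoor": LogisticDetector([2.0, 0.0], -1.0), "worm": LogisticDetector([-1.0, 3.0], 0.0)}`, the feature vector `[1.0, 0.0]` is flagged by the backdoor detector, while `[0.0, -1.0]` is not flagged by either. The reviewers' concern maps directly onto this sketch: nothing in it guarantees how the backdoor detector scores a worm it was never trained on.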
Consequently, the effectiveness of hardware-based malware detection with more representative workload mixes remains an open question. In particular, a machine typically runs multiple processes, and it is unclear how easily the hardware detector could separate the features belonging to each process. Another open question is why these low-level features work, and how easily an attacker could bypass them. The authors provide some intuition for this in Section 3, but a more rigorous answer will require follow-on research.
Link to paper:
K. Khasawneh, M. Ozsoy, C. Donovick, N. Abu-Ghazaleh and D. Ponomarev. “Ensemble Learning for Low-level Hardware-supported Malware Detection,” in International Symposium on Research in Attacks, Intrusions and Defenses (RAID), Kyoto, Japan, 2015.