Since my blog post a few days ago, a few people have asked about the “Top Ten Mistakes Made When Evaluating Anti-Malware Software” that Kevin Townsend quoted here. Kevin was actually quoting a press release, but it’s something I’ve used in quite a few contexts: a sort of mini-update to “A Reader’s Guide to Reviews“, originally credited to Sarah Tanner in Virus News International but actually written by Dr. Alan Solomon.
So here it is again (slightly expanded): perhaps more up-to-date but considerably less detailed than Alan’s article.
1. Using samples received via email or on a honeypot machine, without checking that they really are malicious software. Some tests we’ve come across have included well over 10% false positives, corrupted samples and so on, and have used them uncritically (i.e. without validation).
2. Using one of the programs to be tested to validate the samples. Or we might use the phrase pseudo-validate, since this takes no account of the possibility of false positives.
3. Assuming any sample detected by two or more scanners as malicious to be valid. This may bias the test in favour of products that flag everything that meets very broad criteria as suspicious, and against products that are more discriminating and fastidious about false positives. “It’s executable! It’s suspicious!”
4. Using VirusTotal or a similar service to check the samples and assuming that any product that doesn’t report them as malicious can’t detect them. This will once again give the advantage to scanners that flag everything as “suspicious”, and will also disadvantage scanners that use some form of dynamic or behavioural analysis. It’s certainly not a real test, and it’s a form of pseudo-testing that VirusTotal itself discourages.
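To see why points 3 and 4 matter, here’s a toy sketch (entirely hypothetical data and scanners, not any real product or the VirusTotal API) of how “pseudo-validating” a sample set with an over-flagging scanner inflates that scanner’s own score:

```python
# Toy illustration: why validating samples with a scanner that flags
# everything biases the test in that scanner's favour.
# All names and data here are hypothetical.

samples = [
    ("real_malware_1", True),
    ("real_malware_2", True),
    ("clean_app_1", False),   # not actually malicious
    ("clean_app_2", False),   # not actually malicious
]

def paranoid(name, is_malicious):
    """'Paranoid' scanner: flags every sample as suspicious."""
    return True

def discriminating(name, is_malicious):
    """'Discriminating' scanner: no false positives, but misses one real sample."""
    return is_malicious and name != "real_malware_2"

# Pseudo-validation: treat anything the paranoid scanner flags as a
# confirmed malicious sample -- so the clean files slip into the test set.
validated = [s for s in samples if paranoid(*s)]

paranoid_score = sum(paranoid(*s) for s in validated) / len(validated)
discriminating_score = sum(discriminating(*s) for s in validated) / len(validated)

# The paranoid scanner "detects" 100% of the pseudo-validated set,
# even though half of that set is not malware at all.
print(paranoid_score, discriminating_score)  # 1.0 0.25
```

The discriminating product looks four times worse, yet on the genuinely malicious samples it missed only one; the gap is an artefact of the validation method, not of detection quality.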
5. Using the default settings for detection testing, without trying to configure each product to the same level of paranoia. This isn’t a test of detection, but a test of design philosophy. Which is fine as long as you and your readers understand that.
6. Using default settings for scanning speed. This may introduce a bias in favour of products that get their speed advantage by cutting corners on detection, which may not be what the tester (or his audience) had in mind.
7. Asking vendors to supply samples. This may allow the vendor to bias the results in their own favour by including samples that other companies are unlikely to have access to, and to the disadvantage of companies who consider it unethical to share samples outside their web of trust. Some companies won’t cooperate with this sort of testing, but it puts them at an unfair disadvantage because it looks as if they’re scared to compete.
8. Categorising samples incorrectly, leading to possible errors in configuration. For instance, not all products flag certain kinds of “greyware” (described by some vendors as “possibly unwanted applications” or similar) as malware by default. That can be particularly misleading in combination with error 5.
9. Too much self-belief. If two products that use the same version of the same engine score completely differently, it is unsafe to assume that there must be something wrong with the lower-scoring product. It is just as likely to be a problem with the setup or methodology. But some testers will not discuss the possibility that they may have tested incorrectly, and will not allow vendors to validate their sample set or methodology in any way. Of course, this may not be overconfidence, but a fear that their test will be found to be invalid.
10. Not including a contact point or allowing any right to reply. Be open about the methodology used and the objective of the evaluation, so that others have the possibility of verifying the validity of the test.
Mapping these points against the nine principles is left as an exercise for the reader. Or maybe I’ll come back to that. 😉
David Harley CITP FBCS CISSP