Posted by: David Harley | June 21, 2010

How to Screw Up Testing

Since my blog post a few days ago, a few people have asked about the “Top Ten Mistakes Made When Evaluating Anti-Malware Software” that Kevin Townsend quoted here. Kevin was quoting a press release here, but it’s actually something I’ve used in quite a few contexts: a sort of mini-update to “A Reader’s Guide to Reviews“, originally credited to Sarah Tanner in Virus News International but actually written by Dr. Alan Solomon.

So here it is again (slightly expanded): perhaps more up-to-date but considerably less detailed than Alan’s article.

1. Using samples received via email or on a honeypot machine without checking that they really are malicious software. Some tests we’ve come across have included well over 10% false positives, corrupted samples and the like, and used them uncritically (i.e. without validation). There’s a minimal sanity-checking sketch after this list.

2. Using one of the programs to be tested to validate the samples. Or we might use the phrase pseudovalidate, since this takes no account of the possibility of false positives.

3. Assuming that any sample detected as malicious by two or more scanners is valid. This may bias the test in favour of products that flag everything meeting very broad criteria as suspicious, and against products that are more discriminating and fastidious about false positives. “It’s executable! It’s suspicious!”

4. Using VirusTotal or a similar service to check the samples and assuming that any product that doesn’t report them as malicious can’t detect them. This will once again give the advantage to scanners that flag everything as “suspicious”, and will also disadvantage scanners that use some form of dynamic or behavioural analysis. It’s certainly not a real test, and it’s a form of pseudo-testing that VirusTotal itself discourages.

5. Using the default settings for detection testing, without trying to configure each product to the same level of paranoia. This isn’t a test of detection, but a test of design philosophy. Which is fine as long as you and your readers understand that.

6. Using default settings for scanning speed. This may introduce a bias in favour of products that get their speed advantage by cutting corners on detection, which may not be what the tester (or his audience) had in mind.

7. Asking vendors to supply samples. This may allow a vendor to bias the results in its own favour by including samples that other companies are unlikely to have access to, to the disadvantage of companies that consider it unethical to share samples outside their web of trust. Some companies won’t cooperate with this sort of testing at all, but that puts them at an unfair disadvantage because it looks as if they’re scared to compete.

8. Categorising samples incorrectly, leading to possible errors in configuration. For instance, not all products flag certain kinds of “greyware” (described by some vendors as “possibly unwanted applications” or similar) as malware by default. That can be particularly misleading in combination with error 5.

9. Too much self-belief. If two products that use the same version of the same engine score completely differently in a test, it is unsafe to assume that there must be something wrong with the lower-scoring product: it is just as likely to be a problem with the setup or methodology. But some testers will not discuss the possibility that they may have tested incorrectly, and will not allow vendors to validate their sample set or methodology in any way. Of course, this may not be overconfidence, but a fear that their test will be found to be invalid.

10. Not including a contact point or allowing any right to reply. Be open about the methodology used and the objective of the evaluation, so that others have the opportunity to verify the validity of the test.
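A footnote to mistake 1: real validation means confirming that each sample actually does something malicious, which takes replication or analysis rather than a script. Still, even basic sanity checks catch a lot of the junk described above. The sketch below is purely illustrative (the directory name and the PE-only check are my assumptions, not part of any particular test methodology): it throws out empty files and duplicates, flags anything that isn’t obviously an executable for manual inspection, and never asks a product under test to vouch for the samples.

```python
import hashlib
from pathlib import Path

# Hypothetical location of candidate samples; adjust to your own collection.
SAMPLE_DIR = Path("candidate_samples")

def looks_like_pe(data: bytes) -> bool:
    """Very rough check for a Windows executable: 'MZ' magic at offset 0."""
    return len(data) > 2 and data[:2] == b"MZ"

def triage(sample_dir: Path):
    seen_hashes = set()
    keep, reject = [], []
    for path in sorted(sample_dir.iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if len(data) == 0:
            reject.append((path.name, "empty file"))
            continue
        sha256 = hashlib.sha256(data).hexdigest()
        if sha256 in seen_hashes:
            reject.append((path.name, "duplicate of another sample"))
            continue
        seen_hashes.add(sha256)
        if not looks_like_pe(data):
            # Not necessarily invalid (scripts, documents, ELF binaries...),
            # but flag it for manual inspection rather than assuming.
            reject.append((path.name, "no PE header - inspect manually"))
            continue
        keep.append((path.name, sha256))
    return keep, reject

if __name__ == "__main__":
    kept, rejected = triage(SAMPLE_DIR)
    print(f"{len(kept)} samples pass basic sanity checks, {len(rejected)} need attention")
    for name, reason in rejected:
        print(f"  REJECT {name}: {reason}")
```

None of this proves a file is malicious, of course; it only removes the obviously broken material, and it deliberately sidesteps mistake 2 by leaving the products under test out of the validation loop.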

Mapping these points against the nine principles is left as an exercise for the reader. Or maybe I’ll come back to that. 😉

David Harley CITP FBCS CISSP


Responses

  1. I like numbers (5) and (7).

    (5) At one organization (not to be named), we changed our default out-of-the-box settings to be more in line with competitors because we were getting burned in tests.

    (7) More reviewers probably do this than would care to admit it. Stating where the samples are obtained from would be useful.

  2. […] to an AMTSO blog post, I've returned to it (and slightly tweaked it) as another AMTSO blog post. I'll probably return to it in more detail here, […]

  3. […] software testing results, keep the methodology used in mind. See if you can identify any of the top ten testing mistakes frequently made by testers and prepare to question the conclusions in the […]

  4. […] Commentary without comment spam… By David Harley I should also have pointed out in my previous post that Alice Decker, a Trend Micro researcher who is very active in AMTSO, posted an interesting blog providing commentary on the latest guidelines documents approved at Helsinki and published here, the Kevin Townsend blog considered at some length here, and the top ten testing screw-ups blog here. […]

  5. […] one of David Harley’s ‘common mistakes’ in How to Screw Up Testing is “Using VirusTotal or a similar service to check the samples and assume that any product […]

  6. […] attention from sites pushing fake AV from re-posts of blogs that reference ours (especially this one on “how to screw up testing”). This blog offers a way for people who aren’t […]

