January 28, 2013

Taking Testing Seriously

A hat tip to Andy Hayter for drawing our attention to an article by Ellyne Phneah: Antivirus tests need better methodology.

The summary makes an essential point: “While most antivirus tests do not have the right assessment methods to determine their effectiveness, vendors cannot afford to ignore them as they are key to branding.” But some of the people quoted also made excellent points.

Simon Piff of IDC commented on the importance of the test’s selection process. No-one is going to test with every known malware sample at this point in time, so the sample set used – as Mr. Harley has remarked many times – has to be truly representative of the whole population of known malware. (Making it representative of all malware, known and unknown, might be just too much of a challenge.) That applies irrespective of the size of the sample set.

I suppose even Imperva’s much-criticized 82-sample pseudo-test might have some validity if they were the right 82 samples, but how do you determine that? (Leaving aside the fact that VirusTotal is really not a testing tool, because it doesn’t tell you whether a given product is capable of detecting given malcode in a real-life scenario. I think we might have mentioned that before.) Peter Stelzhammer of AV-Comparatives rightly commented on the need for “a real-life scenario with a statistically valid number of test cases”, though what constitutes statistically valid remains moot. (Obviously it will, in any case, vary according to the type of test: for example, AV-Comparatives’ own recent test of OS X-hosted anti-malware products, using 477 samples, sounds quite reasonable in the context of the much less densely-populated Mac malware landscape.)
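To put a rough number on "statistically valid", here is a minimal sketch of how much uncertainty sample size alone leaves in a measured detection rate. It uses the standard Wilson score interval for a proportion. The 90% detection rate is an assumption chosen purely for illustration, not a result from any of the tests mentioned, and the function name is my own. Note too that the interval only means anything if the samples were drawn at random from the population you care about, which is exactly the point in dispute.

```python
import math

def wilson_interval(detected, total, z=1.96):
    """Approximate 95% Wilson score interval for a detection rate.

    detected: number of samples the product flagged
    total:    number of samples in the test set
    z:        normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if total <= 0 or not 0 <= detected <= total:
        raise ValueError("need 0 <= detected <= total and total > 0")
    p = detected / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width

# Sample-set sizes mentioned in the post; the 90% rate is hypothetical.
for n in (82, 100, 477):
    detected = round(0.9 * n)
    low, high = wilson_interval(detected, n)
    print(f"n={n:3d}: observed {detected / n:.1%}, plausible range {low:.1%} to {high:.1%}")
```

The range narrows as the sample set grows, but no amount of narrowing helps if the set is unrepresentative to begin with: a tight interval around the wrong population is still the wrong answer.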

The article also cites Microsoft’s defence of its own relatively poor performance in a recent AV-Test report, in which the company claimed that the malware used in the test did not reflect real-life conditions. And the company has a point: whole-product testing is hard work, and only practical with small sample sets like the 100 samples that AV-Test used, again raising the question as to whether the sample set was truly representative. Microsoft says not, based on its own perception of the prevalence of individual samples. AV-Test justifies its choice on the grounds that its samples, while low in prevalence as individual samples, were from highly prevalent malware families. I have sympathy with both viewpoints, but it seems to me that the argument as to whether these were the ‘right’ 100 samples remains unresolved.
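The disagreement is easier to see with a toy calculation. The sketch below scores the same set of detection results twice: once weighting each sample by how often that exact file is seen, and once by how prevalent its family is. Every name and number here is hypothetical and invented for illustration; none of it comes from Microsoft, AV-Test, or any real test.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    family: str
    sample_prevalence: float  # hypothetical share of sightings for this exact file
    family_prevalence: float  # hypothetical share of sightings for its whole family
    detected: bool

def weighted_rate(samples, weight):
    """Detection rate where each sample counts in proportion to weight(sample)."""
    total = sum(weight(s) for s in samples)
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(weight(s) for s in samples if s.detected) / total

# Made-up test set: one sample per family, to keep the arithmetic obvious.
samples = [
    Sample("FamilyA", 0.50, 0.10, True),
    Sample("FamilyB", 0.30, 0.20, True),
    Sample("FamilyC", 0.01, 0.40, False),
    Sample("FamilyD", 0.01, 0.30, False),
]

unweighted = weighted_rate(samples, lambda s: 1.0)
by_sample = weighted_rate(samples, lambda s: s.sample_prevalence)
by_family = weighted_rate(samples, lambda s: s.family_prevalence)

print(f"unweighted:          {unweighted:.1%}")
print(f"by sample prevalence: {by_sample:.1%}")
print(f"by family prevalence: {by_family:.1%}")
```

Same product, same samples, same detections, and yet the score swings widely depending on which notion of prevalence you accept. That is why both sides can argue in good faith and still not agree.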

The point about branding that constitutes the article’s summary also comes from Peter Stelzhammer:

…antivirus tests are not completely irrelevant as it can prove a good branding opportunity…When antivirus vendors score well across a variety of tests, they prove their quality and performance, which is way more effective than claiming to have the best product on the market…

And that is pretty much why AV vendors may decline to participate in a given test (when given a choice), but are unlikely to withdraw from all testing. Mainstream testers provide the nearest we have to an impartial and competent assessment of the comparative efficacy of individual products, and a product vendor that refuses to participate in a high-profile test risks being asked why it won’t participate if it believes in itself. The answer may, quite truthfully, be that the vendor has no faith in the test. But it just sounds like sour grapes.

AMTSO, to return to another theme we’ve talked about many times, is based on the hope and expectation that there is a synergistic, symbiotic relationship between the security industry and the testing industry, and that both sectors will benefit from higher standards of testing. If they can continue to cooperate without either party trying to dominate the other, AMTSO may yet accomplish something that will truly benefit the end user. We shall see…

Old Mac Bloggit


