Posted by: David Harley | May 7, 2015

Anti-Malware Test Cheats revisited: AMTSO speaks

Here’s more about the companies that have been chastised by AV-Test, AV-Comparatives and Virus Bulletin for cheating in comparative tests.

First, AV-Comparatives announced on its Facebook page that one of the vendors participating in its tests had infringed its testing agreement by submitting a version for testing that ‘had been specifically engineered for the major testing labs.’ Since there wasn’t much in the way of hard information there, my article here on Gaming the tests: who’s being cheated? was equally sketchy on detail, but hopefully highlighted some issues by way of some commentary leavened with reminiscence.

Subsequently, a joint statement by AV-Comparatives, AV-Test and Virus Bulletin here announced that the products submitted by Qihoo for testing had the Bitdefender engine enabled by default and its own QVM engine disabled, whereas ‘all versions made generally available to users in Qihoo’s main market regions had the Bitdefender engine disabled and the QVM engine active…’

Which led me (and hopefully many others) to wonder why Qihoo was ‘apparently going out of its way to provide its customers with a default configuration that – according to the joint statement – not only demonstrates inferior detection performance, but actually impacts on usability by increasing the risk of false positives.’
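
To make the discrepancy concrete, here is a minimal sketch of how the two sets of defaults described in the joint statement might be compared. The setting names (`bitdefender_engine`, `qvm_engine`) and the dictionary layout are purely illustrative assumptions, not Qihoo's actual configuration format; only the on/off values follow the statement quoted above.

```python
def default_config_diff(submitted, public):
    """Return {setting: (submitted_value, public_value)} for each setting that differs."""
    keys = sorted(set(submitted) | set(public))
    return {
        key: (submitted.get(key), public.get(key))
        for key in keys
        if submitted.get(key) != public.get(key)
    }


# Hypothetical setting names; the True/False values mirror the joint statement.
submitted_build = {"bitdefender_engine": True, "qvm_engine": False}
public_build = {"bitdefender_engine": False, "qvm_engine": True}

diff = default_config_diff(submitted_build, public_build)
for setting, (in_test, in_public) in diff.items():
    print(f"{setting}: test build={in_test}, public build={in_public}")
```

An empty result would mean the version under test has the same defaults as the one customers actually run, which is the condition a real-world test relies on.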

Qihoo (or Qihu) subsequently attempted to answer some of the criticisms and questions on its Facebook page, claiming that the criticism of its engine was unfair because “many popular software add-ons in China that are flagged as malware by the AV-C definition are in fact performing proper functions and not malicious. Therefore, Qihoo 360 and other domestic vendors’ security products in China treat such add-ons as legitimate and non-threatening.” This may sound similar to an issue I cited in a previous blog:

In general, security products are cautious about detecting PUAs/PUPs/PUS by default, for a number of reasons. That’s problematical, though, where testers insist on using default settings and don’t filter PUAs out of their sample sets.

That’s a scenario that has irritated me for many years – which is why I cited it – but it was just an example: I didn’t know at the time that Qihoo were going to use much the same issue as a defence. In fact, as Simon Edwards quite rightly pointed out, testers (at least, reputable testers) nowadays are pretty careful about filtering correctly, and that certainly includes the three testers we’re concerned with here, so Qihoo’s argument is of doubtful relevance to the testers’ criticism. Qihoo also has a blog article offering an explanation for its preference for its own QVM engine in its public versions, and claims that the testing labs were made aware that the version supplied was configured differently. Clearly this differs from the statement made by the labs, and it seems that Qihoo has announced its withdrawal from their tests.

Next, Tencent was criticized by the same testers on somewhat similar grounds, though in this case it seems that the product (not only the version submitted for testing, but apparently all recent publicly available versions) was optimized for fast scanning by bypassing objects that are normally – and quite rightly – routinely scanned by anti-malware scanners. The conclusion, though, is pretty much the same:

These optimizations, which have been found in all recent public versions of the products, provide minimal benefit to normal users and could even degrade the level of protection offered by the products.

According to The Register, Virus Bulletin’s John Hawes comments that:

“Their software has so many feedback systems and each user was pumping the data back to Tencent’s labs.”

The Register also suggests that Baidu is still being investigated, so perhaps there’s more to come on that. Not to mention a report that Tencent plans to take legal action against one of the labs, apparently in the hope of persuading it ‘to lift its allegations and resume all certifications and awards granted to Tencent.’

And finally, AMTSO, the Anti-Malware Testing Standards Organization, also weighed in: Why we cannot tolerate unethical behavior in the anti-malware industry. This is a big deal: when I was heavily involved with AMTSO, I and other Board members spent a lot of time debating testing issues with people outside the organization, and some of those discussions were pretty heated. AMTSO has seemed subsequently to avoid controversy, and in fact has been pretty quiet altogether, but while it doesn’t name names in its statement, it makes its position quite clear. It doesn’t approve of vendors that try to game tests, and is particularly concerned when vendors seem to be putting test scores ahead of their users’ safety. Can’t argue with that.

Well, I did have a bit more to say than that in an article elsewhere, but I’m pleased to see AMTSO taking a firm stand on inappropriate practice by vendors, who have been known to use the organization as a threatening response to an unfavourable review. But there are still plenty of poor tests out there: it will be interesting to see whether AMTSO will be as ready to comment on genuinely poor practice by testers when appropriate.

David Harley

Posted by: David Harley | May 1, 2015

Follow-up to the article on test cheats

If you find my article on Gaming the tests: who’s being cheated? of any interest, you may find this follow-up article for ITSecurity of some interest too, as it takes up the next exciting installment: Product test cheats: this could run and run.

David Harley
Small Blue-Green World

Posted by: David Harley | April 30, 2015

Gaming the tests: who’s being cheated?

[Update: a joint statement by AV-Comparatives, AV-Test and Virus Bulletin is now available here: it appears that the products submitted by Qihoo for testing had the Bitdefender engine enabled by default and its own QVM engine disabled, whereas ‘all versions made generally available to users in Qihoo’s main market regions had the Bitdefender engine disabled and the QVM engine active.’ The testers state that this engine provides ‘a considerably lower level of protection and a higher likelihood of false positives.’]

You may have the impression, if you’ve read some of the stuff I’ve written about testing over the years (surely somebody must have read a bit of it????), that I’m anti-tester. It’s not the case, though I’m passionately against bad testing: while many tests and testers make me want to shake somebody, I recognize that the Internet would be a more (ok, an even more) dangerous place without competent testers. Of whom there are quite a few, these days, and I think AMTSO, for all its false steps, can take some of the credit for that.

It’s easy to forget what a free-for-all testing was when AMTSO was actually conceived. Let’s be clear: there were always good (and bad, and mediocre) testers, and that’s still the case today, but many technical and ethical issues have been resolved – in the mainstream, at any rate – by exhaustive (and sometimes exhausting) discussion at workshops, in forums and by email.

I remember a time when there was much criticism of AMTSO because people suspected collusion between the two sides of the vendor/tester divide. In fact, a more accurate picture might be of two parties whose aims overlap but are by no means totally compatible, working towards methodologies that actually benefit customers rather than mislead them. There are testing organizations that decline to compromise their credibility by engaging with security companies in AMTSO or elsewhere, and I can see why they’d want to preserve their neutrality: the problem there is that testing is difficult, requiring a depth and breadth of knowledge and experience that is rarely found outside the security industry, and they’re cutting themselves off from a major source of information on how they can improve their testing.

Mainstream testers and vendors have a good knowledge of each other’s area of expertise, but there are consumer organizations who are convinced that testing AV is as easy as evaluating a pair of headphones or a car insurance policy. If they don’t feel competent to do it themselves and outsource the testing to a professional tester, that’s fine, but sometimes they prefer to use outside organizations who may have strong security connections, but aren’t sufficiently au fait with the subtleties of anti-malware technology.

Incompetent and downright dishonest testers, on the other hand, should be held accountable to and by the users of security products who are exposed to misleading test results and conclusions. But that doesn’t mean that vendors qualify for sainthood.

To some extent, it’s inevitable that vendors bear in mind the sort of tests they expect their products to undergo and configure them accordingly. Years ago, many anti-virus products would flag all sorts of non-viral, non-malicious files because they knew that high-profile testers were using poorly-filtered virus libraries that contained all sorts of unverified ‘garbage files’. In other words, they would detect and flag objects that posed no real threat to the user, because they knew that they would be penalized in comparative tests for not detecting them when other products did.

In the 90s, there was some controversy when a particular product (Dr Solomon’s) was found to be configured so that if it found more than ten known viruses on a system, it assumed that it was being used by a tester/reviewer to scan a library of virus samples, and switched from using only static signatures to heuristic mode, to increase the likelihood that it would catch malware for which it didn’t yet have a static signature. McAfee (among others) claimed that this ‘cheat mode’ gave the Dr Solomon’s product an unfair advantage and misled the public, since the ‘extra’ viruses would not be detected in a real-world situation. Which did seem to be the case according to McAfee’s own testing, but it also contributed to making people aware that (a) heuristic scanning might be quite a good idea as more and more previously unknown viruses were appearing and (b) the Dr Solomon’s range of products were really rather good at heuristics. (Unfortunately, it would be hard for any product to match that sort of performance on unknown malware today, because scanners need to detect a far wider range of malicious behaviour today than just the ability to self-replicate.)
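
For anyone who finds the mechanism easier to follow in code, here is a minimal sketch of the mode switch as described above. It is not the actual Dr Solomon’s implementation: the `Sample` class, its two flags and the `scan` function are hypothetical names invented for illustration, and the only detail taken from the account above is the threshold of more than ten known viruses.

```python
from dataclasses import dataclass

# Threshold from the description above ("more than ten known viruses").
LIBRARY_THRESHOLD = 10


@dataclass
class Sample:
    name: str
    matches_signature: bool  # hypothetical: a static signature exists for this file
    looks_suspicious: bool   # hypothetical: a heuristic check would flag this file


def scan(samples):
    """Return the names flagged, enabling heuristics once more than
    LIBRARY_THRESHOLD signature hits have been seen in this scan."""
    flagged = []
    signature_hits = 0
    heuristics_on = False
    for sample in samples:
        if sample.matches_signature:
            signature_hits += 1
            flagged.append(sample.name)
        elif heuristics_on and sample.looks_suspicious:
            flagged.append(sample.name)
        # Assume a tester's virus library once the hit count passes the threshold.
        if signature_hits > LIBRARY_THRESHOLD:
            heuristics_on = True
    return flagged


known = [Sample(f"known{i}", True, True) for i in range(11)]
unknown = Sample("unknown", False, True)

library_scan = scan(known + [unknown])      # eleven hits first, so heuristics switch on
desktop_scan = scan(known[:3] + [unknown])  # only three hits, so heuristics stay off
```

In this sketch the unknown sample is flagged in the library-sized scan but missed in the desktop-sized one, which is essentially the discrepancy McAfee objected to.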

Nowadays, static testing using huge collections of everything anyone had ever considered to be a ‘virus’ is the exception rather than the norm. At least, it’s not how competent mainstream testers work. And a product that restricted itself to non-heuristic detection would be of very little use, and it seems ludicrous that one company thought that another company was cheating by using heuristics. Perhaps that’s more understandable if you recall that there were concerns at the time that heuristics might increase the risk of false positives and have a negative impact on general processing speed.

But there is still a fine line between accommodating known testing methodologies and actually gaming a test. The Dr Solomon’s ‘cheat’ involved adding functionality to the same package used by its customers, even though that functionality was of doubtful direct benefit to the user (though it was obviously intended to benefit the vendor). Was that cheating? Lots of us didn’t think so at the time, and it seems to have become a non-issue since. Especially to McAfee, who actually bought the company subsequently.

However, AV-Comparatives has announced that it is investigating (in collaboration with AV-Test and Virus Bulletin) vendors who have submitted versions of their products for testing that are specifically engineered to optimize their performance in a testing environment, and that are not the same product generally in use among their customers. A joint statement is expected, but hasn’t yet been published, so we don’t know for sure which products/vendors are at issue.

Nor do we know exactly how the products at issue differ from the usual production versions, though I can think of a number of ways in which a product’s test performance might be boosted. For example:

  • By detecting ‘possible unwanted’ software by default. In general, security products are cautious about detecting PUAs/PUPs/PUS by default, for a number of reasons. That’s problematical, though, where testers insist on using default settings and don’t filter PUAs out of their sample sets.
  • By enabling by default a heuristic level so paranoid that in the real world it would generate an unacceptable level of false positives. (Though this might be a less effective strategy where a tester included FP testing in its test suite.)

Well, we’ll see what the testers’ joint statement tells us. What is clear, though, is this. A well-conceived test should reflect real-world experience as closely as possible. When the actual product isn’t the one that the real-world customer is using, the test can’t reflect real-world experience accurately, though we can’t say at the moment how much real difference to the test results the tweaking of the submitted version actually made.

But it isn’t just the tester who’s being cheated, it’s all the potential customers who will expect more of the out-of-the-box product than it actually provides. Hopefully, it can be configured to generate the same detection rates, if in fact boosting detection rates artificially in some way was the actual purpose of the tweak. However, many customers expect not to have to make any decisions at all about configuration: I think that’s an unhealthy expectation, but it is what it is. And if the product’s performance can be boosted to equal its detection under test, what are the implications for its performance in other respects, with or without tweaking?

David Harley

Posted by: David Harley | March 27, 2015

Choosing an antivirus program

Not so long ago, Heimdal Security’s Aurelian Neagu put together a blog to which a number of security people – including me – contributed tips: 50+ Internet Security Tips & Tricks from Top Experts.

So, given my continuing interest in testing and related issues, I was interested to see that Aurelian had put up a security guide on What Is The Best Antivirus For My PC? A Step-By-Step Research Guide. After all, a lot of my writing (including the work I did with AMTSO) has been concerned with helping people to be able to make informed judgements on what security products they should invest in. Not from the point of view of recommending specific products – since much of my income comes from working with a company that is a major player in the anti-malware market, it would be hard to avoid conflicts of interest – but in terms of making the best possible decision. However, I’ve tended to focus on product testing, and in particular, approaches to evaluating comparative tests.

The Heimdal guide takes a slightly broader approach exemplified by a Venn diagram where three circles representing Expert Reviews, Independent Testing, and User Opinions overlap to make up the label Complete Antivirus Assessment.

I’m not altogether on board with this approach:

  • Due to decades of reading forum discussions – from the chaos of alt.comp.virus in the 1990s, where marketroids, researchers, virus writers and confused computer users all rubbed shoulders, through to various LinkedIn groups where most of the posts are by vendor marketing managers –  I’m far from convinced that crowd-sourced information is reliable. It’s one of those areas where if you know enough to distinguish between good and bad advice, maybe you don’t need advice. There’s an article demanding to be written here on what snippets of advice should raise red flags, but Heimdal hasn’t written it.
  • The trouble with expert reviews is that so many of them are not written by experts (and they’re not always independent). It’s one of those areas where if you know enough to distinguish between good and bad advice, maybe you don’t need advice. (Is there an echo in here?)
  • Not all independent tests are competent. And some that look independent aren’t.

If past experience is anything to go by, I stand a good chance of inviting accusations of being at least negative and possibly elitist by these comments. But I don’t think it’s enough to direct people towards a forum where all shades of opinion and expertise may be represented. How do you decide whose advice to take (especially when it’s based on one-size-fits-all criteria like price – how many tests, reviews and commentaries assume that free AV is best)?

There are, in fact, some useful ideas here as regards sources of information, like several of the more competent testers. But it’s downright bizarre that there’s no mention of AMTSO here. Admittedly, one of the reasons I no longer have formal ties with AMTSO is that I always felt that the organization could have done more to engage with the everyday user, rather than focusing on testers. And it’s a pity that the AMTSO site seems to have dropped linking to articles other than its own guidelines documents, most of which are focused on testing methodologies rather than evaluation of tests by non-experts. (However, the AMTSO Fundamental Principles of Testing is still a must-read for anyone who wants to understand more about testing.)

Heimdal are to be applauded for trying to provide clarity where there is none – or very little – but I’m disappointed.

David Harley

Posted by: David Harley | March 2, 2014

AMTSO Feature Settings Checks Expanded

With a very muted fanfare, AMTSO has adjusted and expanded its web page for anti-malware feature settings by splitting it into two pages: the main page now links to “Feature Settings Check for Desktop Solutions” and “Feature Settings Check for Android based Solutions”.

The Desktop Solutions page still links to the following tests:

  1. Test if my protection against the manual download of malware (EICAR.COM) is enabled
  2. Test if my protection against a drive-by download (EICAR.COM) is enabled
  3. Test if my protection against the download of a Potentially Unwanted Application (PUA) is enabled
  4. Test if protection against accessing a Phishing Page is enabled
  5. Test if my cloud protection is enabled

The Android links are as follows:

  1. Test if my protection against the manual download of malware is enabled
  2. Test if my protection against a drive-by download is enabled
  3. Test if my protection against the download of a Potentially Unwanted Application (PUA) is enabled
  4. Test if protection against accessing a Phishing Page is enabled

I haven’t looked at the new links, as I don’t have an Android device to test them with.

Feature testing is about checking whether your security product has specific features available and activated, and isn’t really related to the comparative testing that AMTSO mostly focuses on. Still, a lot of people seem to find tools like the EICAR ‘test’ file useful and reassuring.

David Harley
Small Blue-Green World
