Posted by: David Harley | September 19, 2015

AMTSO feature check – compressed files

The recently refurbished AMTSO site has added a new feature settings check to its pages.

The Feature Settings Check for Desktop Solutions page now includes an option to ‘Test if my protection against the download of compressed malware is enabled’. Perhaps I should have said ‘your protection’ rather than ‘my protection’: I already know about the packages I use. :)

The page contains the EICAR test file presented in 11 file compression formats:

  1. ZIP
  2. ZIPX
  3. 7-ZIP
  4. WinRAR
  5. tar.gz
  6. ACE
  7. CAB
  8. JAR
  9. LZH
  10. RAR-SFX
  11. ZIP-SFX

In short, if your anti-malware product lets you download the file in one of these formats, there is a problem. You’d think that most products would support scanning inside these types of file in this day and age (though even a decade or so ago it was quite a different story), but the feature may not be activated for all products by default. At the moment only two products are listed as supporting this check page, but that doesn’t mean that other products don’t have support for compressed file scanning. I imagine others will be added in due course.
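As a rough illustration of what the check page exercises, here's a minimal Python sketch that wraps the standard 68-byte EICAR test string in a ZIP archive; saving or downloading such a file should trigger a product whose archive scanning is enabled. The helper and file names are my own invention, not AMTSO's, and the string is split across two literals only so that this source file isn't itself flagged by a scanner reading it.

```python
import io
import zipfile

# The standard 68-byte EICAR anti-virus test string, split into two
# literals so a literal-matching scanner doesn't flag this source file.
EICAR = (
    "X5O!P%@AP[4\\PZX54(P^)7CC)7}"
    "$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
)

def make_eicar_zip() -> bytes:
    """Return the bytes of a ZIP archive containing eicar.com."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("eicar.com", EICAR)
    return buf.getvalue()
```

Writing `make_eicar_zip()` to disk on a protected machine should produce a detection (or a block on download) if archive scanning is switched on; if the file sits there unremarked, that's the problem the AMTSO page is designed to reveal.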

David Harley

Posted by: David Harley | July 23, 2015

AV-Comparatives OS X Product test

AV-Comparatives has released its 2015 OS X review/test report, comparing 10 products. Its remit is far wider than the malware protection test (consisting of 105 samples that OS X Yosemite doesn’t or didn’t block), including:

  • Operating systems supported
  • Additional features, such as a firewall or anti-phishing capability
  • Installation/deinstallation options and issues
  • Interface aspects and issues
  • Operating system integration
  • Whether a user with a standard user account can disable the protection
  • Scanning options
  • The quality of the help facilities

Unusually, it also notes the type of alert shown when the EICAR test file and various features on the AMTSO Security Features pages are accessed.

On a quick scan, it looks like AV-Comparatives has done its usual thorough job.

David Harley

Posted by: David Harley | May 7, 2015

Anti-Malware Test Cheats revisited: AMTSO speaks

Here’s more about the companies that have been chastised by AV-Test, AV-Comparatives and Virus Bulletin for cheating in comparative tests.

First, AV-Comparatives announced on its Facebook page that one of the vendors participating in its tests had infringed its testing agreement by submitting a version for testing that ‘had been specifically engineered for the major testing labs.’ Since there wasn’t much in the way of hard information there, my article here on Gaming the tests: who’s being cheated? was equally sketchy on detail, but hopefully highlighted some issues by way of some commentary leavened with reminiscence.

Subsequently, a joint statement by AV-Comparatives, AV-Test and Virus Bulletin here announced that the products submitted by Qihoo for testing had the Bitdefender engine enabled by default and its own QVM engine disabled, whereas ‘all versions made generally available to users in Qihoo’s main market regions had the Bitdefender engine disabled and the QVM engine active…’

Which led me (and hopefully many others) to wonder why Qihoo was ‘apparently going out of its way to provide its customers with a default configuration that – according to the joint statement – not only demonstrates inferior detection performance, but actually impacts on usability by increasing the risk of false positives.’

Qihoo (or Qihu) subsequently attempted to answer some of the criticisms and questions on its Facebook page, claiming that the criticism of its engine was unfair because “many popular software add-ons in China that are flagged as malware by the AV-C definition are in fact performing proper functions and not malicious. Therefore, Qihoo 360 and other domestic vendors’ security products in China treat such add-ons as legitimate and non-threatening.” This may sound similar to an issue I cited in a previous blog:

In general, security products are cautious about detecting PUAs/PUPs/PUS by default, for a number of reasons. That’s problematical, though, where testers insist on using default settings and don’t filter PUAs out of their sample sets.

That’s a scenario that has irritated me for many years, which is why I cited it, but it was just an example: I didn’t know at the time that Qihoo was going to use much the same issue as a defence. In fact, as Simon Edwards quite rightly pointed out, reputable testers nowadays are pretty careful about filtering correctly, and that certainly includes the three testers we’re concerned with here, so Qihoo’s argument is of doubtful relevance to the testers’ criticism. Qihoo also has a blog article offering an explanation for its preference for its own QVM engine in its public versions, and claims that the testing labs were made aware that the version supplied was configured differently. Clearly this differs from the statement made by the labs, and it seems that Qihoo has announced its withdrawal from their tests.

Next, Tencent was criticized by the same testers on somewhat similar grounds, though in this case it seems that the product (not only the version submitted for testing, but apparently all recent publicly available versions) was optimized for fast scanning by bypassing objects that are normally – and quite rightly – routinely scanned by anti-malware scanners. The conclusion, though, is pretty much the same:

These optimizations, which have been found in all recent public versions of the products, provide minimal benefit to normal users and could even degrade the level of protection offered by the products.

According to The Register, Virus Bulletin’s John Hawes comments that:

“Their software has so many feedback systems and each user was pumping the data back to Tencent’s labs.”

The Register also suggests that Baidu is still being investigated, so perhaps there’s more to come on that. Not to mention a report that Tencent plans to take legal action against one of the labs, apparently in the hope of persuading it ‘to lift its allegations and resume all certifications and awards granted to Tencent.’

And finally, AMTSO, the Anti-Malware Testing Standards Organization, also weighed in: Why we cannot tolerate unethical behavior in the anti-malware industry. This is a big deal: when I was heavily involved with AMTSO, I and other Board members spent a lot of time debating testing issues with people outside the organization, and some of those discussions were pretty heated. AMTSO has seemed subsequently to avoid controversy, and in fact has been pretty quiet altogether, but while it doesn’t name names in its statement, it makes its position quite clear. It doesn’t approve of vendors that try to game tests, and is particularly concerned when vendors seem to be putting test scores ahead of their users’ safety. Can’t argue with that.

Well, I did have a bit more to say than that in another article, but I’m pleased to see AMTSO taking a firm stand on inappropriate practice by vendors, some of whom have been known to invoke the organization as a threat in response to an unfavourable review. But there are still plenty of poor tests out there: it will be interesting to see whether AMTSO will be as ready to comment on genuinely poor practice by testers when appropriate.

David Harley

Posted by: David Harley | May 1, 2015

Follow-up to the article on test cheats

If you found my article on Gaming the tests: who’s being cheated? of any interest, you may find this follow-up for ITSecurity worth a look too: it takes up the next exciting installment, Product test cheats: this could run and run.

David Harley
Small Blue-Green World

Posted by: David Harley | April 30, 2015

Gaming the tests: who’s being cheated?

[Update: a joint statement by AV-Comparatives, AV-Test and Virus Bulletin is now available here: it appears that the products submitted by Qihoo for testing had the Bitdefender engine enabled by default and its own QVM engine disabled, whereas ‘all versions made generally available to users in Qihoo’s main market regions had the Bitdefender engine disabled and the QVM engine active.’ The testers state that this engine provides ‘a considerably lower level of protection and a higher likelihood of false positives.’]

You may have the impression, if you’ve read some of the stuff I’ve written about testing over the years (surely somebody must have read a bit of it????), that I’m anti-tester. It’s not the case, though I’m passionately against bad testing: while many tests and testers make me want to shake somebody, I recognize that the Internet would be a more (ok, an even more) dangerous place without competent testers. Of whom there are quite a few, these days, and I think AMTSO, for all its false steps, can take some of the credit for that.

It’s easy to forget what a free-for-all testing was when AMTSO was actually conceived. Let’s be clear: there were always good (and bad, and mediocre) testers, and that’s still the case today, but many technical and ethical issues have been resolved – in the mainstream, at any rate – by exhaustive (and sometimes exhausting) discussion at workshops, in forums and by email.

I remember a time when there was much criticism of AMTSO because people suspected collusion between the two sides of the vendor/tester divide. In fact, a more accurate picture might be of two parties whose aims overlap but are by no means totally compatible, working towards methodologies that actually benefit customers rather than mislead them. There are testing organizations that decline to compromise their credibility by engaging with security companies in AMTSO or elsewhere, and I can see why they’d want to preserve their neutrality: the problem there is that testing is difficult, requiring a depth and breadth of knowledge and experience that is rarely found outside the security industry, and they’re cutting themselves off from a major source of information on how they can improve their testing.

Mainstream testers and vendors have a good knowledge of each other’s area of expertise, but there are consumer organizations who are convinced that testing AV is as easy as evaluating a pair of headphones or a car insurance policy. If they don’t feel competent to do it themselves and outsource the testing to a professional tester, that’s fine, but sometimes they prefer to use outside organizations who may have strong security connections, but aren’t sufficiently au fait with the subtleties of anti-malware technology.

Incompetent and downright dishonest testers, on the other hand, should be held accountable to and by the users of security products who are exposed to misleading test results and conclusions. But that doesn’t mean that vendors qualify for sainthood.

To some extent, it’s inevitable that vendors bear in mind the sort of tests they expect their products to undergo and configure them accordingly. Years ago, many anti-virus products would flag all sorts of non-viral, non-malicious files because they knew that high-profile testers were using poorly-filtered virus libraries that contained all sorts of unverified ‘garbage files’. In other words, they would detect and flag objects that posed no real threat to the user, because they knew that they would be penalized in comparative tests for not detecting them when other products did.

In the 90s, there was some controversy when a particular product (Dr Solomon’s) was found to be configured so that if it found more than ten known viruses on a system, it assumed that it was being used by a tester/reviewer to scan a library of virus samples, and switched from using only static signatures to heuristic mode, to increase the likelihood that it would catch malware for which it didn’t yet have a static signature. McAfee (among others) claimed that this ‘cheat mode’ gave the Dr Solomon’s product an unfair advantage and misled the public, since the ‘extra’ viruses would not be detected in a real-world situation. Which did seem to be the case according to McAfee’s own testing, but it also contributed to making people aware that (a) heuristic scanning might be quite a good idea as more and more previously unknown viruses were appearing, and (b) the Dr Solomon’s range of products was really rather good at heuristics. (Unfortunately, it would be hard for any product to match that sort of performance on unknown malware today, because scanners need to detect a far wider range of malicious behaviour today than just the ability to self-replicate.)
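The ‘cheat mode’ described above amounts to a simple threshold rule. Here's a purely hypothetical sketch of that logic, with invented names, data structures and sample data; it is not a reconstruction of the actual Dr Solomon's code:

```python
def scan(files, known_signatures, looks_suspicious, threshold=10):
    """Scan (name, content) pairs against known signatures. After more
    than `threshold` signature hits, assume a test library is being
    scanned and enable heuristic detection as well."""
    detections = []
    heuristics_on = False
    hits = 0
    for name, content in files:
        if content in known_signatures:
            # Known-virus signature match.
            detections.append(name)
            hits += 1
            heuristics_on = heuristics_on or hits > threshold
        elif heuristics_on and looks_suspicious(content):
            # Heuristic catch: only active once the threshold is passed.
            detections.append(name)
    return detections
```

Scanning a large sample library trips the threshold, so the heuristic then catches samples with no signature; a typical end-user machine with only a few infections never leaves signature-only mode, which is exactly why McAfee argued the published test scores didn't reflect real-world detection.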

Nowadays, static testing using huge collections of everything anyone had ever considered to be a ‘virus’ is the exception rather than the norm. At least, it’s not how competent mainstream testers work. And a product that restricted itself to non-heuristic detection would be of very little use, and it seems ludicrous that one company thought that another company was cheating by using heuristics. Perhaps that’s more understandable if you recall that there were concerns at the time that heuristics might increase the risk of false positives and have a negative impact on general processing speed.

But there is still a fine line between accommodating known testing methodologies and actually gaming a test. The Dr Solomon’s ‘cheat’ involved adding functionality to the same package used by its customers, even though that functionality was of doubtful direct benefit to the user (though it was obviously intended to benefit the vendor). Was that cheating? Lots of us didn’t think so at the time, and it seems to have become a non-issue since. Especially to McAfee, who actually bought the company subsequently.

However, AV-Comparatives has announced that it is investigating (in collaboration with AV-Test and Virus Bulletin) vendors who have submitted versions of their products for testing that are specifically engineered to optimize their performance in a testing environment, and that are not the same product generally in use among their customers. A joint statement is expected, but hasn’t yet been published, so we don’t know for sure which products/vendors are at issue.

Nor do we know exactly how the products at issue differ from the usual production versions, though I can think of a number of ways in which a product’s test performance might be boosted. For example:

  • By detecting ‘possible unwanted’ software by default. In general, security products are cautious about detecting PUAs/PUPs/PUS by default, for a number of reasons. That’s problematical, though, where testers insist on using default settings and don’t filter PUAs out of their sample sets.
  • By enabling by default a heuristic level so paranoid that in the real world it would generate an unacceptable level of false positives. (Though this might be a less effective strategy where a tester included FP testing in its test suite.)

Well, we’ll see what the testers’ joint statement tells us. What is clear, though, is this. A well-conceived test should reflect real-world experience as closely as possible. When the actual product isn’t the one that the real-world customer is using, the test can’t reflect real-world experience accurately, though we can’t say at the moment how much real difference to the test results the tweaking of the submitted version actually made.

But it isn’t just the tester who’s being cheated, it’s all the potential customers who will expect more of the out-of-the-box product than it actually provides. Hopefully, it can be configured to generate the same detection rates, if in fact boosting detection rates artificially in some way was the actual purpose of the tweak. However, many customers expect not to have to make any decisions at all about configuration: I think that’s an unhealthy expectation, but it is what it is. And if the product’s performance can be boosted to equal its detection under test, what are the implications for its performance in other respects, with or without tweaking?

David Harley
